This commit improves the URL health checking mechanism to reduce false negatives.
- Treat all 2XX status codes as successful, addressing issues with codes like `204`.
- Exclude URLs within Markdown inline code blocks.
- Send the Host header for improved handling of webpages behind proxies.
- Improve formatting and context for output messages.
- Fix the defaulting options for redirects and cookie handling.
- Add URL exclusion support for non-responsive URLs.
- Update the user agent pool to modern browsers and platforms.
- Improve CI/CD workflow to respond to modifications in the `test/checks/external-urls` directory, offering immediate feedback on potential impacts to the external URL test.
- Add support for randomizing the TLS fingerprint to better mimic various clients, improving the effectiveness of checks. However, this is not fully supported by Node.js's HTTP client; see nodejs/undici#1983 for more details.
- Use `AbortSignal` instead of `AbortController` as a more modern and simpler way to handle timeouts.
status-checker
A CLI and SDK for checking the availability of external URLs.
🧐 Why?
- 🏃 Fast: Batch checks the statuses of URLs in parallel.
- 🤖 Easy-to-Use: Zero-touch startup with pre-configured settings for reliable results, yet customizable.
- 🤞 Reliable: Mimics real web browser behavior by following redirects and maintaining cookie storage.
🍭 Additional features
- 😇 Rate Limiting: Queues requests by domain to be polite.
- 🔁 Retries: Implements a retry pattern with exponential back-off.
- ⌚ Timeouts: Configurable timeout for each request.
- 🎭️ Impersonation: Impersonates different browsers for each request.
- 🌐 User-Agent Rotation: Rotates user agents between requests.
- 🔑 TLS Handshakes: Performs TLS and HTTP handshakes that are identical to those of a real browser.
- 🫙 Cookie jar: Preserves cookies during redirects to mimic a real browser.
CLI
Coming soon 🚧
Programmatic usage
The SDK supports both Node.js and browser environments.
getUrlStatusesInParallel
// Simple example
const statuses = await getUrlStatusesInParallel([ 'https://privacy.sexy', /* ... */ ]);
// Arrays have no `all` method; `every` checks that each result matches.
if (statuses.every((r) => r.code === 200)) {
  console.log('All URLs are alive!');
} else {
  console.log('Dead URLs:', statuses.filter((r) => r.code !== 200).map((r) => r.url));
}
// Fastest configuration
const statuses = await getUrlStatusesInParallel([ 'https://privacy.sexy', /* ... */ ], {
  domainOptions: {
    sameDomainParallelize: false,
  },
});
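The simple example above compares against `200` exactly, while the change notes mention treating every 2XX code (such as `204`) as successful. A minimal sketch of a looser check follows, assuming each result exposes `url` and `code` as in the examples above. `isSuccessful` and `findDeadUrls` are hypothetical helper names, not part of the SDK.

```javascript
// Hypothetical helpers (not part of the SDK): treat any 2XX code as success.
function isSuccessful(status) {
  return Number.isInteger(status.code) && status.code >= 200 && status.code < 300;
}

// Returns the URLs whose status code is outside the 2XX range.
function findDeadUrls(statuses) {
  return statuses
    .filter((status) => !isSuccessful(status))
    .map((status) => status.url);
}

// Hand-written sample results, shaped like the ones in the examples above.
const sampleStatuses = [
  { url: 'https://privacy.sexy', code: 200 },
  { url: 'https://example.com/no-content', code: 204 },
  { url: 'https://example.com/missing', code: 404 },
];

// Logs only the URL that answered with 404.
console.log('Dead URLs:', findDeadUrls(sampleStatuses));
```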
Batch request options
- `domainOptions`:
  - `sameDomainParallelize` (boolean), default: `false`
    - Determines if requests to the same domain will be parallelized.
    - Setting to `false` makes all requests parallel.
    - Setting to `true` queues requests for each unique domain while parallelizing across different domains.
    - Requests to different domains are always parallelized regardless of this option.
    - 💡 This helps to avoid `429 Too Many Requests` and be nice to websites.
  - `sameDomainDelayInMs` (number), default: `3000` (3 seconds)
    - Sets the delay between requests to the same domain.
- `requestOptions` (object): See request options.
- `followOptions` (object): See follow options.
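To illustrate the queued mode described above, where requests to the same domain run one after another with a delay while different domains proceed in parallel, here is a rough sketch. It is not the SDK's implementation: `groupUrlsByDomain`, `checkQueuedByDomain`, and the `checkUrl` callback are hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch (not the SDK's implementation): group URLs by hostname
// so each group can be processed sequentially while groups run in parallel.
function groupUrlsByDomain(urls) {
  const groups = new Map();
  for (const url of urls) {
    const domain = new URL(url).hostname;
    if (!groups.has(domain)) {
      groups.set(domain, []);
    }
    groups.get(domain).push(url);
  }
  return groups;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `checkUrl` is assumed to be any async function that resolves to the status of one URL.
async function checkQueuedByDomain(urls, checkUrl, sameDomainDelayInMs = 3000) {
  const groups = groupUrlsByDomain(urls);
  const perDomain = [...groups.values()].map(async (domainUrls) => {
    const results = [];
    for (let i = 0; i < domainUrls.length; i++) {
      if (i > 0) {
        await sleep(sameDomainDelayInMs); // pause between requests to the same domain
      }
      results.push(await checkUrl(domainUrls[i]));
    }
    return results;
  });
  // Domains are processed concurrently; results come back grouped by domain.
  return (await Promise.all(perDomain)).flat();
}
```

Note that in this sketch the results are ordered by domain group rather than by the order of the input list.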
getUrlStatus
Check the availability of a single URL.
// Simple example
const status = await getUrlStatus('https://privacy.sexy');
console.log(`Status code: ${status.code}`);
Request options
- `retryExponentialBaseInMs` (number), default: `5000` (5 seconds)
  - Base time for the exponential back-off calculation for retries.
  - The longer the base time, the greater the intervals between retries.
- `additionalHeaders` (object), default: `false`
  - Additional HTTP headers to send along with the default headers. Overrides default headers if specified.
- `requestTimeoutInMs` (number), default: `60000` (60 seconds)
  - Time limit to abort the request if no response is received within the specified time frame.
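As a rough illustration of how a base time and a timeout can work together, the sketch below retries an action with exponentially growing delays. The exact formula the SDK uses is not stated here, so the sketch assumes `base * 2^attempt`; `computeRetryDelayInMs`, `retryWithBackOff`, and the `maxRetries` parameter are hypothetical and not part of the SDK.

```javascript
// Hypothetical sketch: assumes delay = base * 2^attempt, with attempt starting at 0.
// The SDK's actual back-off formula may differ.
function computeRetryDelayInMs(attempt, retryExponentialBaseInMs = 5000) {
  return retryExponentialBaseInMs * 2 ** attempt;
}

const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `action` (an async function) and retries it up to `maxRetries` times on failure,
// waiting longer before each new attempt.
async function retryWithBackOff(action, maxRetries = 3, retryExponentialBaseInMs = 5000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await action();
    } catch (error) {
      if (attempt >= maxRetries) {
        throw error; // give up after the last allowed retry
      }
      await wait(computeRetryDelayInMs(attempt, retryExponentialBaseInMs));
    }
  }
}

// Usage idea: abort each attempt after a time limit using the standard AbortSignal.timeout.
// retryWithBackOff(() => fetch('https://privacy.sexy', { signal: AbortSignal.timeout(60000) }));
```

With the default base of 5000 ms, this assumed formula waits 5, 10, and then 20 seconds between attempts, which shows why a longer base time widens the intervals.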
fetchFollow
Follows 3XX redirects while preserving cookies.
Exposes the same API as `fetch`, except for a third parameter that specifies the follow options. The `redirect: 'follow' | 'manual' | 'error'` option is ignored in favor of this third parameter.
const status = await fetchFollow('https://privacy.sexy', 1000 /* timeout in milliseconds */);
console.log(`Status code: ${status.code}`);
Follow options
- `followRedirects` (boolean), default: `true`
  - Determines whether or not to follow redirects with `3XX` response codes.
- `maximumRedirectFollowDepth` (number), default: `20`
  - Specifies the maximum number of sequential redirects that the function will follow.
  - 💡 Helps to solve maximum redirect reached errors.
- `enableCookies` (boolean), default: `true`
  - Enables cookie storage to facilitate seamless navigation through login or other authentication challenges.
  - 💡 Helps to overcome sign-in challenges with callbacks.
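The interplay of the three follow options can be sketched as a manual redirect loop. This is an illustrative approximation, not the SDK's code: `followRedirectsSketch` and `parseCookiePair` are hypothetical names, the cookie handling ignores domain, path, and expiry rules, and the loop assumes a Node.js-style `fetch` (browsers hide the status and headers of manually handled redirects).

```javascript
// Hypothetical helper: extracts [name, value] from a Set-Cookie header value.
function parseCookiePair(setCookieHeader) {
  const [pair] = setCookieHeader.split(';');
  const separatorIndex = pair.indexOf('=');
  if (separatorIndex < 0) {
    return undefined;
  }
  return [pair.slice(0, separatorIndex).trim(), pair.slice(separatorIndex + 1).trim()];
}

// Hypothetical sketch (not the SDK's implementation).
// `fetchFn` defaults to the global fetch and can be swapped for a stub when experimenting.
async function followRedirectsSketch(url, options = {}, fetchFn = fetch) {
  const {
    followRedirects = true,
    maximumRedirectFollowDepth = 20,
    enableCookies = true,
  } = options;
  const cookies = new Map(); // cookie name -> value, shared across all hops
  let currentUrl = url;
  for (let depth = 0; depth <= maximumRedirectFollowDepth; depth++) {
    const headers = {};
    if (enableCookies && cookies.size > 0) {
      headers.cookie = [...cookies].map(([name, value]) => `${name}=${value}`).join('; ');
    }
    const response = await fetchFn(currentUrl, { redirect: 'manual', headers });
    if (enableCookies) {
      // getSetCookie is not available in every runtime, hence the optional call.
      const setCookieHeaders = response.headers.getSetCookie?.() ?? [];
      for (const header of setCookieHeaders) {
        const pair = parseCookiePair(header);
        if (pair) {
          cookies.set(pair[0], pair[1]);
        }
      }
    }
    const isRedirect = response.status >= 300 && response.status < 400;
    const location = response.headers.get('location');
    if (!followRedirects || !isRedirect || !location) {
      return response;
    }
    currentUrl = new URL(location, currentUrl).toString(); // resolve relative locations
  }
  throw new Error(`Maximum redirect depth of ${maximumRedirectFollowDepth} reached`);
}
```

Carrying the cookie map across hops is what lets a sign-in redirect chain succeed: a cookie set by the first response is sent back on the request to the redirect target.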