Scrape

Extract clean Markdown content, metadata, and links from any URL.

POST https://api.crawly.bikal.co/v1/scrape

The Scrape endpoint extracts the main content from a single URL and returns it as clean Markdown. It automatically handles JavaScript-rendered pages using a headless browser, follows redirects, and strips navigation, ads, and boilerplate.

Each request costs 1 credit ($0.001). Failed requests are automatically refunded. The endpoint supports any publicly accessible webpage, including SPAs and dynamically loaded content.


Request Body

url (string, required)

The URL to scrape. Must be a valid HTTP or HTTPS URL including the protocol.

timeout (number, default: 30)

Maximum time in seconds to wait for the page to load. Range: 5 to 60.
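The constraints above can be checked on the client before a request is spent. Below is a minimal Python sketch. `build_scrape_payload` is a hypothetical helper and not part of any Crawly SDK. Rejecting an out-of-range timeout locally is an assumption, because this page does not say how the API reports one.

```python
from urllib.parse import urlparse

# Limits taken from the Request Body description.
MIN_TIMEOUT = 5
MAX_TIMEOUT = 60
DEFAULT_TIMEOUT = 30


def build_scrape_payload(url: str, timeout: float = DEFAULT_TIMEOUT) -> dict:
    """Return a JSON-ready request body, or raise ValueError for invalid input.

    Hypothetical helper. The API itself answers a missing or invalid url with
    400 VALIDATION_ERROR; the local timeout check is an assumption.
    """
    parsed = urlparse(url)
    # The URL must be absolute and include an http or https scheme.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")
    if not MIN_TIMEOUT <= timeout <= MAX_TIMEOUT:
        raise ValueError(
            f"timeout must be between {MIN_TIMEOUT} and {MAX_TIMEOUT} seconds"
        )
    payload = {"url": url}
    # Omit timeout when it matches the documented server default.
    if timeout != DEFAULT_TIMEOUT:
        payload["timeout"] = timeout
    return payload
```

A bare domain such as `example.com` is rejected because it has no scheme. This matches the requirement that the URL include the protocol.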


Examples

```shell
curl -X POST https://api.crawly.bikal.co/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```
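The same call can be made from Python with only the standard library. This is a sketch, not an official client. The function names are hypothetical, and the 70-second client-side socket timeout is an assumed value chosen to sit above the 60-second server maximum, so the client does not give up before a 504 arrives.

```python
import json
import urllib.request

API_URL = "https://api.crawly.bikal.co/v1/scrape"


def make_scrape_request(api_key: str, url: str) -> urllib.request.Request:
    """Build the same POST request as the curl example (no network access)."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape(api_key: str, url: str, client_timeout: float = 70) -> dict:
    """Send the request and decode the JSON body of a 2xx response.

    urlopen raises urllib.error.HTTPError for non-2xx statuses. The
    client_timeout default is a hypothetical choice, not an API requirement.
    """
    req = make_scrape_request(api_key, url)
    with urllib.request.urlopen(req, timeout=client_timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `scrape(api_key, "https://example.com")["markdown"]`. Each call is a live request and costs 1 credit.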

Response

Success 200

```json
{
  "success": true,
  "url": "https://example.com",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "metadata": {
    "title": "Example Domain",
    "description": "Example Domain for documentation",
    "language": "en",
    "og_image": "https://example.com/og.png",
    "status_code": 200
  }
}
```

Response Fields

success (boolean)

Whether the scrape completed successfully.

url (string)

The final URL after following any redirects.

markdown (string)

The extracted page content as clean Markdown.

metadata (object)

Page metadata: title, description, language, og_image, and status_code (the HTTP status code returned by the target page).
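The fields above can be mapped onto typed objects so that downstream code does not work with raw dictionaries. This Python sketch uses hypothetical class names. Treating every metadata field as optional is an assumption: a page may simply lack a description or an og:image.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageMetadata:
    title: Optional[str]
    description: Optional[str]
    language: Optional[str]
    og_image: Optional[str]
    status_code: Optional[int]


@dataclass
class ScrapeResult:
    url: str  # final URL after redirects
    markdown: str  # extracted main content
    metadata: PageMetadata


def parse_scrape_result(body: dict) -> ScrapeResult:
    """Convert a decoded 200 response into typed objects."""
    if not body.get("success"):
        raise ValueError("not a successful scrape response")
    # Missing metadata keys become None (assumed to be legitimately absent).
    meta = body.get("metadata") or {}
    return ScrapeResult(
        url=body["url"],
        markdown=body["markdown"],
        metadata=PageMetadata(
            title=meta.get("title"),
            description=meta.get("description"),
            language=meta.get("language"),
            og_image=meta.get("og_image"),
            status_code=meta.get("status_code"),
        ),
    )
```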


Errors

| Status | Type | Description |
|---|---|---|
| 400 | VALIDATION_ERROR | Missing or invalid url parameter |
| 401 | AUTH_ERROR | Missing or invalid API key |
| 402 | CREDITS_EXHAUSTED | Insufficient credits |
| 422 | EMPTY_CONTENT | Page loaded but no content could be extracted |
| 429 | RATE_LIMITED | Rate limit exceeded (120 requests per minute) |
| 502 | CONNECTION_ERROR | Could not connect to the target URL |
| 504 | PAGE_TIMEOUT | Page took too long to load |
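To stay under the 120 requests per minute limit, a client can throttle itself instead of waiting for 429 responses. This Python sketch assumes the limit is enforced as a rolling 60-second window, but the server's exact algorithm is not documented here. `SlidingWindowLimiter` is a hypothetical class. The clock and sleep functions are injectable so the limiter can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` seconds.

    Assumes the documented 120 requests/minute behaves like a rolling window.
    """

    def __init__(self, limit: int = 120, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.sent: Deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        """Block until another request may be sent, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have aged out of the window.
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.limit:
                self.sent.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.sent[0]))
```

Call `limiter.acquire()` immediately before each scrape request. Because the limiter keeps all of its state in memory, it only coordinates requests within a single process.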

All error responses follow this format:

```json
{
  "success": false,
  "error": "Could not extract content from this page",
  "error_type": "EMPTY_CONTENT"
}
```
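Because every error shares this shape, a single function can turn failed responses into exceptions. This Python sketch uses hypothetical names. The split between retryable and non-retryable types is a judgment call that the API does not state: 429, 502, and 504 may be transient, while 400, 401, 402, and 422 will likely fail again with the same input.

```python
import json

# Assumed classification; not defined by the API.
RETRYABLE = {"RATE_LIMITED", "CONNECTION_ERROR", "PAGE_TIMEOUT"}


class ScrapeError(Exception):
    def __init__(self, status: int, error_type: str, message: str) -> None:
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.error_type in RETRYABLE


def raise_for_error(status: int, raw_body: bytes) -> None:
    """Raise ScrapeError for an error status or a success=false body."""
    try:
        body = json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        body = {}
    if not isinstance(body, dict):
        body = {}
    if status >= 400 or body.get("success") is False:
        raise ScrapeError(
            status,
            body.get("error_type", "UNKNOWN"),
            body.get("error", "no error message in response"),
        )
```

With `urllib`, catch `urllib.error.HTTPError` as `e` and call `raise_for_error(e.code, e.read())`.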

Pricing

1 credit per request ($0.001)

Credits are deducted upfront when the request is made. If the request fails, the credit is automatically refunded to your balance.
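Because failed requests are refunded, the net charge for a batch is the number of successful requests multiplied by $0.001. For example, 1,000 requests with 20 failures consume 980 credits ($0.98). Below is a small Python sketch of that arithmetic. The function names are hypothetical.

```python
CREDIT_PRICE_USD = 0.001  # 1 credit = $0.001


def net_credits(requests_made: int, failed: int) -> int:
    """Credits consumed after refunds for failed requests are applied."""
    if not 0 <= failed <= requests_made:
        raise ValueError("failed must be between 0 and requests_made")
    # Each request debits 1 credit upfront; each failure refunds that credit.
    return requests_made - failed


def net_cost_usd(requests_made: int, failed: int) -> float:
    # Round to whole tenths of a cent to hide float noise from 0.001.
    return round(net_credits(requests_made, failed) * CREDIT_PRICE_USD, 3)
```

Because debits happen upfront, requests that are still in flight also hold credits. A client running many requests concurrently therefore needs a balance that covers all of them at once.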


Notes

  • Every scrape fetches a live, up-to-date version of the page.
  • JavaScript-rendered pages (SPAs) are fully supported via headless browser rendering.
  • The API automatically follows redirects and resolves the final URL.
  • Content is extracted as clean Markdown with ads, navigation, and boilerplate removed.
  • The maximum timeout is 60 seconds. If the page does not load within the configured timeout, the request fails with 504 PAGE_TIMEOUT.
  • All URLs must include the protocol (http:// or https://).
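The response has no dedicated links field. This sketch therefore assumes that links survive inline in `markdown` as standard `[text](url)` syntax. `extract_links` is a hypothetical helper. It resolves relative targets against the response `url`, which is the final URL after redirects. The regular expression is deliberately simple: it skips images and does not handle reference-style links or nested brackets.

```python
import re
from typing import List, Tuple
from urllib.parse import urljoin

# Inline links [text](target "optional title"), excluding images ![alt](src).
LINK_RE = re.compile(r'(?<!!)\[([^\]]*)\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)')


def extract_links(markdown: str, base_url: str) -> List[Tuple[str, str]]:
    """Return (text, absolute_url) pairs for inline links in scraped Markdown."""
    return [
        (text, urljoin(base_url, target))
        for text, target in LINK_RE.findall(markdown)
    ]
```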