Scrape

Extract clean Markdown content, metadata, and links from any URL.

POST https://api.crawly.bikal.co/v1/scrape

The Scrape endpoint extracts the main content from a single URL and returns it as clean Markdown. It automatically handles JavaScript-rendered pages using a headless browser, follows redirects, and strips navigation, ads, and boilerplate.

Each request costs 1 credit ($0.001). Failed requests are automatically refunded. The endpoint supports any publicly accessible webpage, including SPAs and dynamically loaded content.

Request Body

url string REQUIRED

The URL to scrape. Must be a valid HTTP or HTTPS URL including the protocol.

timeout number Default: 30

Maximum time in seconds to wait for the page to load. Range: 5 to 60.
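
As a rough sketch, the two parameter rules above can be checked client-side before a request is sent. The helper name `build_payload` is hypothetical and not part of the API; the API performs its own validation and may apply rules beyond the ones documented here.

```python
from urllib.parse import urlparse


def build_payload(url: str, timeout: float = 30) -> dict:
    """Build a Scrape request body, checking the documented constraints.

    `build_payload` is an illustrative helper, not part of the API.
    """
    parsed = urlparse(url)
    # The URL must include the protocol and be HTTP or HTTPS.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute HTTP(S) URL: {url!r}")
    # The documented timeout range is 5 to 60 seconds.
    if not 5 <= timeout <= 60:
        raise ValueError(f"timeout must be between 5 and 60, got {timeout}")
    return {"url": url, "timeout": timeout}
```

Calling `build_payload("example.com")` raises `ValueError` because the protocol is missing, which mirrors the case the API reports as `VALIDATION_ERROR`.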

Examples

curl -X POST https://api.crawly.bikal.co/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
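
The same request can be sketched in Python using only the standard library. The function names `make_request` and `scrape` are illustrative, and the 70-second socket timeout is an assumption chosen to sit slightly above the documented 60-second maximum page timeout.

```python
import json
import urllib.request

API_URL = "https://api.crawly.bikal.co/v1/scrape"


def make_request(api_key: str, url: str) -> urllib.request.Request:
    """Build the same POST request as the curl example (nothing is sent yet)."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def scrape(api_key: str, url: str) -> dict:
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    The socket timeout of 70 seconds is an assumption, not an API value.
    """
    with urllib.request.urlopen(make_request(api_key, url), timeout=70) as resp:
        return json.load(resp)
```

A call such as `scrape("YOUR_API_KEY", "https://example.com")` would return the decoded JSON body as a dictionary.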

Response

Success `200`

json

{
  "success": true,
  "url": "https://example.com",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "metadata": {
    "title": "Example Domain",
    "description": "Example Domain for documentation",
    "language": "en",
    "og_image": "https://example.com/og.png",
    "status_code": 200
  }
}

Response Fields

success boolean

Whether the scrape completed successfully.

url string

The final URL after following any redirects.

markdown string

The extracted page content as clean Markdown.

metadata object

Page metadata including title, description, language, og_image, and status_code.
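
One possible way to consume these fields is to map the response onto a small typed structure. The `ScrapeResult` class and `parse_result` function below are hypothetical names used for illustration; the sketch also assumes that metadata keys may be absent for some pages, which the documentation does not state either way.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeResult:
    url: str                    # final URL after redirects
    markdown: str               # extracted page content
    title: Optional[str]        # metadata.title, if present
    status_code: Optional[int]  # metadata.status_code, if present


def parse_result(raw: str) -> ScrapeResult:
    """Parse a successful Scrape response body (illustrative helper)."""
    data = json.loads(raw)
    if not data.get("success"):
        raise ValueError(f"scrape failed: {data.get('error_type')}")
    # Assumption: tolerate a missing metadata object or missing keys.
    meta = data.get("metadata") or {}
    return ScrapeResult(
        url=data["url"],
        markdown=data["markdown"],
        title=meta.get("title"),
        status_code=meta.get("status_code"),
    )
```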

Errors

Status	Type	Description
`400`	`VALIDATION_ERROR`	Missing or invalid url parameter
`401`	`AUTH_ERROR`	Missing or invalid API key
`402`	`CREDITS_EXHAUSTED`	Insufficient credits
`422`	`EMPTY_CONTENT`	Page loaded but no content could be extracted
`429`	`RATE_LIMITED`	Rate limit exceeded (120 requests per minute)
`502`	`CONNECTION_ERROR`	Could not connect to the target URL
`504`	`PAGE_TIMEOUT`	Page took too long to load
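
To stay under the 120 requests per minute limit listed for `RATE_LIMITED`, a client can throttle itself. The following is a minimal sketch of a sliding-window limiter; the class name `MinuteRateLimiter` is hypothetical, and whether the server counts requests in a sliding or fixed window is not documented, so treat the window model as an assumption.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Client-side sliding-window limiter (illustrative, not part of the API)."""

    def __init__(self, max_requests: int = 120, window: float = 60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self) -> float:
        """Seconds to wait before the next request may be sent (0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        # The oldest request must age out before another one fits.
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.sent.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, send the request, and then call `record()`.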

All error responses follow this format:

json

{
  "success": false,
  "error": "Could not extract content from this page",
  "error_type": "EMPTY_CONTENT"
}
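
Because every error shares this shape, a client can turn it into a single exception type. This is a sketch only: `ScrapeError` and `error_from_body` are hypothetical names, and the choice of which error types to treat as retryable is a client-side judgment rather than something the API specifies.

```python
import json

# Assumption: these error types are transient and worth retrying.
# The API documentation does not classify errors this way.
RETRYABLE = {"RATE_LIMITED", "CONNECTION_ERROR", "PAGE_TIMEOUT"}


class ScrapeError(Exception):
    """Exception carrying the fields of an error response (illustrative)."""

    def __init__(self, status: int, error_type: str, message: str):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.retryable = error_type in RETRYABLE


def error_from_body(status: int, raw: str) -> ScrapeError:
    """Build a ScrapeError from an HTTP status and a JSON error body."""
    data = json.loads(raw)
    return ScrapeError(
        status,
        data.get("error_type", "UNKNOWN"),
        data.get("error", ""),
    )
```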

Pricing

1 credit per request ($0.001)

Credits are deducted upfront when the request is made. If the request fails, the credit is automatically refunded to your balance.
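
The pricing arithmetic can be sketched as follows, assuming every failed request is refunded as described. The function name `estimate_cost` is hypothetical; `Decimal` is used to avoid floating-point rounding on small currency amounts.

```python
from decimal import Decimal

PRICE_PER_CREDIT = Decimal("0.001")  # USD per credit, from the pricing above


def estimate_cost(total_requests: int, failed_requests: int = 0) -> Decimal:
    """Net cost in USD: failed requests are refunded, so only successes are billed."""
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    return (total_requests - failed_requests) * PRICE_PER_CREDIT
```

For example, 10,000 requests with 250 failures net out to 9,750 credits, or $9.75.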

Notes

Every scrape fetches a live, up-to-date version of the page.
JavaScript-rendered pages (SPAs) are fully supported via headless browser rendering.
The API automatically follows redirects and resolves the final URL.
Content is extracted as clean Markdown with ads, navigation, and boilerplate removed.
The maximum timeout is 60 seconds. Pages that do not finish loading within the timeout return a 504 error.
All URLs must include the protocol (http:// or https://).
