How to Give Claude and Cursor Web Scraping Abilities (MCP)

TL;DR: Install the Crawly MCP server in about 30 seconds and your AI tools can scrape publicly accessible web pages and extract YouTube transcripts directly. Works with Claude Desktop, Cursor, Windsurf, and any MCP-compatible client. No code needed after setup.

Why AI Tools Need Web Access

Claude, Cursor, and other AI coding assistants are trained on static data. Without an external tool, they cannot visit a URL, read a webpage, or check the current content of a site. This is a significant limitation when you are working on tasks that need live data: competitive research, documentation from external sources, analyzing a client's website, or pulling content from YouTube videos.

Until recently, the workaround was manual: copy the page content, paste it into the chat, and ask the AI to analyze it. This breaks down when you need to process multiple URLs, work with dynamic JavaScript-rendered pages, or extract transcripts from videos. You need a way for the AI to access the web directly.

What is MCP?

The Model Context Protocol (MCP) is an open standard created by Anthropic that lets AI tools connect to external services. It works like a plugin system. When you add an MCP server to your AI tool, the AI gains new capabilities it can call during conversations.

MCP servers expose "tools" that the AI can use. The Crawly MCP server exposes two tools:

crawly_docs - Fetches the live API schema so the AI knows what endpoints and parameters are available
crawly_request - Makes a request to any Crawly endpoint with the right parameters

When you ask your AI to scrape a website, it automatically calls these tools behind the scenes, sends the URL to Crawly, and returns the Markdown content directly in your chat.

Setup: Claude Desktop

Open your Claude Desktop configuration file. On macOS, it is at:

text

~/Library/Application Support/Claude/claude_desktop_config.json

On Windows:

text

%APPDATA%\Claude\claude_desktop_config.json

Add the Crawly server to your config:

json

{
  "mcpServers": {
    "crawly": {
      "command": "npx",
      "args": ["-y", "crawly-mcp", "--api-key", "cr_your_key_here"]
    }
  }
}

Restart Claude Desktop. You should see Crawly appear in the MCP tools list. If you already have other MCP servers configured, add the crawly entry alongside them inside the existing mcpServers object.
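
If you would rather not edit the file by hand, the merge can be scripted. The following Python sketch adds the crawly entry to a config while leaving any other servers in place. The function name add_crawly_server and the "filesystem" entry in the usage example are illustrative only; they are not part of Crawly or Claude Desktop.

```python
import copy
import json


def add_crawly_server(config: dict, api_key: str) -> dict:
    """Return a copy of an MCP config with the crawly entry added.

    Entries already present under "mcpServers" are preserved.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["crawly"] = {
        "command": "npx",
        "args": ["-y", "crawly-mcp", "--api-key", api_key],
    }
    return merged


if __name__ == "__main__":
    # Hypothetical existing config with one unrelated server.
    existing = {
        "mcpServers": {
            "filesystem": {"command": "npx", "args": ["-y", "some-other-server"]}
        }
    }
    print(json.dumps(add_crawly_server(existing, "cr_your_key_here"), indent=2))
```

Load the real file with json.load, pass the result through the function, and write it back with json.dump. Keep a backup of the original file first.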

Setup: Cursor

Create or edit .cursor/mcp.json in your project root:

json

{
  "mcpServers": {
    "crawly": {
      "command": "npx",
      "args": ["-y", "crawly-mcp", "--api-key", "cr_your_key_here"]
    }
  }
}

Cursor will detect the file automatically. You can verify it is connected by checking Settings > MCP Servers in the Cursor interface. The Crawly server should show a green status.

Setup: Windsurf and Other Clients

Windsurf, Cline, and other MCP-compatible tools use the same configuration format. Add the Crawly entry to your MCP config file with the command and args shown above. The exact file location varies by tool, but the JSON structure is identical.

Using Environment Variables (Recommended)

Instead of putting your API key directly in the config file, you can use an environment variable. This is safer because the key does not appear in files you might commit to version control.

bash

export CRAWLY_API_KEY=cr_your_key_here

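Then omit the --api-key flag from your MCP config; the server reads the key from the environment. The variable must be visible to the process that launches the server (a desktop app started from the Dock or Start menu may not inherit variables exported in a terminal session). The config then becomes: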

json

{
  "mcpServers": {
    "crawly": {
      "command": "npx",
      "args": ["-y", "crawly-mcp"]
    }
  }
}
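
Before restarting your client, you can check that the variable is actually set in the environment you expect. This is a minimal Python sketch, assuming only the variable name CRAWLY_API_KEY and the cr_ key prefix mentioned in this article. The function name check_crawly_key is hypothetical, and the check does not contact Crawly or prove the key is valid.

```python
import os


def check_crawly_key(env=None) -> str:
    """Return "ok", "missing", or "unexpected prefix" for CRAWLY_API_KEY."""
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("CRAWLY_API_KEY", "")
    if not key:
        return "missing"
    if not key.startswith("cr_"):
        return "unexpected prefix"
    return "ok"


if __name__ == "__main__":
    print("CRAWLY_API_KEY:", check_crawly_key())
```

Run it from the same shell or launcher you use to start the AI tool, since that is the environment the MCP server will inherit.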

What You Can Do After Setup

Once connected, you interact with Crawly using natural language. The AI figures out which endpoint to call and what parameters to use. Here are real examples:

Scrape a Webpage

Prompt: "Scrape https://stripe.com/docs/api and give me a summary of their API structure."

The AI calls crawly_request with the scrape endpoint, gets back clean Markdown, and summarizes the content in your chat.

Extract a YouTube Transcript

Prompt: "Get the transcript of https://youtube.com/watch?v=example and list the main topics."

The AI extracts the full transcript using the transcript endpoint and analyzes the content. This is useful for creating study notes, blog posts, or summaries from video content.

Competitive Research

Prompt: "Scrape these three competitor landing pages and compare their pricing: site1.com, site2.com, site3.com."

The AI makes three separate scraping requests and synthesizes the results into a comparison. This would take you 15 minutes manually. With MCP, it takes seconds.

Documentation Research

Prompt: "Scrape the Next.js docs page on middleware and explain how to implement rate limiting."

Instead of reading documentation yourself and then asking the AI about it, the AI reads the documentation directly and gives you an answer based on the current content.

How It Works Under the Hood

Discovery: The AI calls crawly_docs to fetch the live API schema from api.crawly.bikal.co/v1/schema. This tells it what endpoints exist and what parameters they accept.
Request: The AI calls crawly_request with the right endpoint and parameters. For scraping, it sends the URL to /v1/scrape. For transcripts, it sends the URL to /v1/transcript.
Response: Crawly returns clean Markdown (for scraping) or transcript text (for transcripts). The AI receives this directly and can analyze, summarize, or transform it.
Auto-discovery: When new endpoints are added to Crawly, they appear in the schema automatically. You never need to update the MCP server or reconfigure anything.
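
The request step can be pictured with a short Python sketch. In practice the AI chooses the endpoint from the schema; the hostname rule here is only a stand-in for that decision. The endpoint paths and API host come from this article, while the https scheme, the "url" parameter name, and the function names are assumptions made for illustration. The code builds a description of the request and does not send it.

```python
from urllib.parse import urlparse

API_BASE = "https://api.crawly.bikal.co"  # host from the article; https is assumed

ENDPOINTS = {"scrape": "/v1/scrape", "transcript": "/v1/transcript"}


def pick_endpoint(target_url: str) -> str:
    """Choose the transcript endpoint for YouTube hosts, scrape otherwise."""
    host = (urlparse(target_url).hostname or "").lower()
    is_youtube = (
        host == "youtu.be"
        or host == "youtube.com"
        or host.endswith(".youtube.com")
    )
    return ENDPOINTS["transcript" if is_youtube else "scrape"]


def build_request(target_url: str) -> dict:
    """Describe the call the AI would make through crawly_request."""
    return {
        "endpoint": API_BASE + pick_endpoint(target_url),
        "params": {"url": target_url},  # parameter name "url" is an assumption
    }


if __name__ == "__main__":
    for url in ("https://stripe.com/docs/api", "https://youtube.com/watch?v=example"):
        print(build_request(url))
```

The real parameter names are whatever the schema returned by crawly_docs says they are, which is why the discovery step comes first.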

Alternatives Compared

Tool	Web Scraping	YouTube Transcripts	Setup Time	Cost
Crawly MCP	✅ Any URL	✅ All formats	30 seconds	$0.001/request
Firecrawl MCP	✅ Any URL	❌	1-2 minutes	$0.003+/request
Browser-use	✅ Via browser	❌	5-10 minutes	Free (self-hosted)
Manual copy-paste	✅ Manual	❌	N/A	Free

Of the options compared here, Crawly MCP is the only one that combines web scraping and YouTube transcript extraction in a single server. Firecrawl has an MCP server but does not support transcripts and costs at least 3x as much per request ($0.003+ versus $0.001). Browser-use automates a real browser, which is powerful but slow and requires local setup with dependencies.

Troubleshooting

MCP server not appearing in Claude/Cursor

Make sure you have Node.js 18+ installed. Run node --version to check. The MCP server is launched with npx, which ships with Node.js. Also verify that your JSON config file has no syntax errors. A missing comma or bracket can stop the config from loading without any visible error message.
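
A syntax error in the config can be located with Python's standard json module, which reports the line and column where parsing stopped. The sketch below is one way to do it; the function name find_json_error is illustrative.

```python
import json


def find_json_error(text: str):
    """Return None if text is valid JSON, else a "line N, column M: message" string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


if __name__ == "__main__":
    # A config with a missing comma between "command" and "args".
    broken = '{\n  "mcpServers": {\n    "crawly": {\n      "command": "npx"\n      "args": ["-y", "crawly-mcp"]\n    }\n  }\n}'
    print(find_json_error(broken))
```

To check a real file, read it with open(path, encoding="utf-8").read() and pass the contents to the function.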

"command not found: npx"

Install Node.js from nodejs.org. On macOS, you can also use brew install node. After installing, restart your AI tool.

API key not working

Make sure your key starts with cr_. You can verify it works by making a direct curl request. If the key is expired or invalid, create a new one at the API Keys page.

Scrape returning empty content

Some sites block scraping entirely. If a page returns empty content, the request is automatically refunded. Try a different URL to verify your setup is working.

Security Best Practices

Use environment variables for your API key instead of hardcoding it in config files
Add MCP config files to .gitignore so keys are not committed to version control
Rotate keys periodically by creating a new key and deleting the old one
Monitor usage in the Crawly dashboard to catch unexpected activity
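
The first two practices can be backed by a quick check before committing. This Python sketch flags any server entry in an MCP config that passes --api-key inline, assuming the mcpServers layout shown earlier in this article. The function name servers_with_inline_keys is hypothetical.

```python
def servers_with_inline_keys(config: dict) -> list:
    """Return the names of servers whose args include an inline --api-key flag."""
    flagged = []
    for name, server in config.get("mcpServers", {}).items():
        if "--api-key" in server.get("args", []):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    config = {
        "mcpServers": {
            "crawly": {
                "command": "npx",
                "args": ["-y", "crawly-mcp", "--api-key", "cr_your_key_here"],
            }
        }
    }
    print(servers_with_inline_keys(config))
```

An empty result means no entry passes the flag inline. It does not rule out secrets stored elsewhere in the file.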

Pricing

The MCP server itself is free and open source. You pay only for the API requests it makes: 1 credit ($0.001) per scrape or transcript extraction. You get 100 free credits on signup, no credit card required. Credits never expire.

For reference, 100 credits lets you scrape 100 webpages or extract 100 YouTube transcripts. That is enough for weeks of normal AI-assisted research.
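
To estimate costs beyond the free tier, the arithmetic is simple enough to script. This Python sketch assumes every request costs exactly 1 credit at $0.001 and that the 100 signup credits are used first, as described above. The function name estimate_cost is illustrative.

```python
from decimal import Decimal

PRICE_PER_CREDIT = Decimal("0.001")  # USD per request, from the pricing above
FREE_CREDITS = 100  # one-time signup credits


def estimate_cost(requests: int) -> Decimal:
    """Return the USD cost of a number of requests after the free credits."""
    billable = max(0, requests - FREE_CREDITS)
    return billable * PRICE_PER_CREDIT


if __name__ == "__main__":
    for n in (50, 100, 1100):
        print(n, "requests:", estimate_cost(n), "USD")
```

Decimal is used instead of float so that small per-request prices add up without rounding error.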

Frequently Asked Questions

Does this work with ChatGPT?

Not yet, at the time of writing. ChatGPT does not support MCP. Crawly works with Claude Desktop, Cursor, Windsurf, Cline, and any tool that implements the MCP standard. OpenAI may add MCP support in the future.

Is MCP free?

MCP is a free, open protocol. The Crawly MCP server is free and open source (published on npm as crawly-mcp). You only pay for the API requests: $0.001 per request.

How many requests can I make?

The rate limit is 60 requests per minute. For most AI-assisted workflows, this is more than enough. Each conversation typically involves 1-5 scraping requests.

Can the AI scrape pages that require login?

No. Crawly scrapes publicly accessible pages only. It cannot access content behind authentication walls, paywalls, or login forms. The AI will receive whatever content is available to an unauthenticated visitor.

Do I need to update the MCP server when Crawly adds new features?

No. The MCP server discovers endpoints dynamically by fetching the API schema. When new endpoints are added, the AI sees them automatically on the next crawly_docs call. You never need to update or reinstall the MCP package.

Ready to give your AI web access? Get your API key and follow the setup above. It takes 30 seconds. Or read the full MCP documentation for advanced configuration.
