scrapinator

One API for AI search responses.

Web-UI scraping. LLM API.

We drive ChatGPT, Perplexity, Gemini, Google AI Overview, Claude, Grok, and DeepSeek in real Chrome browsers. Captures match what users see, citations included. One Bearer token, one JSON shape.

POST /v1/capture/aioverview · 200 · 6.3s
{
  "success": true,
  "result": {
    "provider": "aioverview",
    "text": "Top EVs for 2026 include …",
    "sources": [11 items],
    "latencyMs": 6302
  }
}
  • ChatGPT
  • Perplexity
  • Gemini
  • AI Overview
  • Claude
  • Grok
  • DeepSeek

What you get.

01

Uniform JSON shape

ChatGPT, Perplexity, Gemini, AI Overview, Claude, Grok, and DeepSeek all return the same response: text, sources[], html, latencyMs. Swap a provider, your parser stays the same.

02

Real responses, not LLM API output

We scrape the actual chat UIs — your captures include the same web-search citations end users see, not bare model completions.

03

Bearer auth

One header. Same key works across every endpoint. No OAuth dance, no per-provider auth schemes to wire up.

04

Built for batches

Send N prompts in one request, get N independent results back. Parallel fan-out — wall-clock ≈ slowest single capture, not the sum.
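The wall-clock claim above can be illustrated with a small local simulation. This is only a sketch: `fake_capture` and `fan_out` are hypothetical names, the sleeps stand in for real captures, and nothing here calls the Scrapinator API. It shows that when N tasks run concurrently, elapsed time tracks the slowest task rather than the sum.

```python
import asyncio
import time


async def fake_capture(name: str, seconds: float) -> str:
    """Stand-in for one capture: sleeps for the given latency (hypothetical)."""
    await asyncio.sleep(seconds)
    return name


async def fan_out(latencies: dict) -> tuple:
    """Run all simulated captures concurrently and measure wall-clock time."""
    start = time.perf_counter()
    done = await asyncio.gather(
        *(fake_capture(name, secs) for name, secs in latencies.items())
    )
    return list(done), time.perf_counter() - start


# Three simulated captures: the sum is 0.35 s, the slowest is 0.20 s.
results, elapsed = asyncio.run(fan_out({"a": 0.05, "b": 0.10, "c": 0.20}))
print(results, f"{elapsed:.2f}s")
```

With these inputs the elapsed time should land near 0.20 s rather than 0.35 s, which is the behaviour the batch description promises for real captures.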

One request shape, every provider.

Pick a provider — the endpoint, request body, and response update.

POST /v1/capture/chatgpt
$ curl -X POST https://api.scrapinator.dev/v1/capture/chatgpt \
  -H "Authorization: Bearer sk_…" \
  -H "content-type: application/json" \
  -d '{"prompt":"Best podcasts for software engineers in 2026?","country":"US"}'
↳ Response · 200 OK · application/json
{
  "success": true,
  "result": {
    "provider": "chatgpt",
    "text": "Three podcasts I'd recommend for software engineers in 2026 …",
    "sources": [
      { "position": 1, "title": "The Changelog", "url": "https://changelog.com/…" },
      { "position": 2, "title": "Software Engineering Daily", "url": "https://softwareengineeringdaily.com/…" }
    ],
    "latencyMs": 47213
  }
}
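
For callers who prefer Python to curl, the same request and response can be sketched with the standard library. This is a minimal sketch, not an official client: `build_request`, `parse_capture`, `Source`, and `Capture` are hypothetical names, the field names are taken from the example response above, and the shape of a failed response is not shown on this page, so the error handling is an assumption.

```python
import json
import urllib.request
from dataclasses import dataclass, field
from typing import Optional

API_BASE = "https://api.scrapinator.dev/v1/capture"  # from the curl example


@dataclass
class Source:
    position: int
    title: str
    url: str


@dataclass
class Capture:
    provider: str
    text: str
    latency_ms: int
    sources: list = field(default_factory=list)
    html: Optional[str] = None  # listed as a response field; absent in the example


def build_request(provider: str, prompt: str, country: str,
                  api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"prompt": prompt, "country": country}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{provider}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def parse_capture(payload: dict) -> Capture:
    """Map the response envelope onto a typed object."""
    if not payload.get("success"):
        # Assumption: the failure body is not documented here, so just raise.
        raise ValueError("capture did not succeed")
    result = payload["result"]
    sources = [Source(s["position"], s["title"], s["url"])
               for s in result.get("sources", [])]
    return Capture(result["provider"], result["text"], result["latencyMs"],
                   sources, result.get("html"))
```

To send the request, pass the result of `build_request(...)` to `urllib.request.urlopen(req, timeout=120)` and feed the decoded JSON to `parse_capture`. The generous timeout is a guess based on the 47 s latency in the example. Because every provider returns the same shape, only the `provider` argument should need to change.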

Comparison

Why not just build it yourself?

Feature · Scrapinator · DIY scraping · Generic SERP APIs
Captures real ChatGPT / Perplexity / Gemini / Claude / Grok / DeepSeek answers
Source attributions included (partial)
One API across all supported AI providers (partial)
No CAPTCHA solving on caller side
Maintained against provider UI changes (partial)
Parallel batch fan-out (partial)
Predictable per-call pricing

FAQ

Questions, answered.

We scrape the actual chat UIs — the same ones humans use — so you see what real users see: cited sources, web-search results, AI Overview blocks. The official APIs return base-model output without retrieval or citations.

No. Every call captures a fresh response in real time. AI-search outputs change daily; caching would defeat the point of monitoring them.

Yes — AI Overview goes through our parsed SERP path. p50 ≈ 6 s, with full text + sources + bullet structure. When Google chooses not to render AI Overview for a query, we return a clean "not present" error so you can fall back gracefully.

Every ISO 3166-1 alpha-2 country code is accepted. The country setting routes the underlying proxy IP geographically and biases provider-side responses toward that locale. Perplexity is currently pinned to US because of its gating of anonymous EU traffic; this is documented in the API reference.

Per-key concurrency limits are enforced server-side. Batch endpoints run prompts in parallel and surface per-prompt failures inside results[] without failing the batch. Selectively retry failed prompts at your own cadence.
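
The selective-retry pattern described above might look like the following sketch. It rests on assumptions that this page does not confirm: that each entry in results[] carries the same `success` flag and `result` object as a single capture, and that entries come back in the same order as the submitted prompts. `split_batch` and `retry_delays` are hypothetical helper names.

```python
def split_batch(prompts: list, results: list) -> tuple:
    """Pair prompts with their results; return (succeeded, failed_prompts).

    Assumes results[i] corresponds to prompts[i] and that each entry has a
    boolean "success" field, mirroring the single-capture response.
    """
    if len(prompts) != len(results):
        raise ValueError("prompts and results must have the same length")
    succeeded, failed = [], []
    for prompt, entry in zip(prompts, results):
        if entry.get("success"):
            succeeded.append((prompt, entry["result"]))
        else:
            failed.append(prompt)
    return succeeded, failed


def retry_delays(attempts: int, base_seconds: float = 2.0,
                 cap_seconds: float = 60.0) -> list:
    """Exponential backoff schedule (in seconds) for resubmitting failures."""
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(attempts)]
```

Only the prompts in the second return value need to be resubmitted, for example as a smaller batch after each delay from `retry_delays(4)`. The backoff values are illustrative, not a documented recommendation.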

Wired into our pipeline. CapSolver handles Cloudflare Turnstile (ChatGPT, Perplexity) and Google reCAPTCHA v2 transparently. You will never see a captcha in your response.

Yes — the html field returns the raw rendered HTML for each capture, and capturedAt is an ISO-8601 server-side timestamp. Use both together as evidence-grade records.
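
One way to turn those two fields into a tamper-evident record is to store a digest of the HTML next to the timestamp. The sketch below assumes `html` and `capturedAt` both sit inside the `result` object, which this page implies but does not show. `evidence_record` and the output key names are hypothetical.

```python
import hashlib


def evidence_record(result: dict) -> dict:
    """Summarise a capture as a timestamp plus a SHA-256 digest of its HTML."""
    html_bytes = result["html"].encode("utf-8")
    return {
        "provider": result.get("provider"),
        "capturedAt": result["capturedAt"],  # server-side ISO-8601 timestamp
        "htmlSha256": hashlib.sha256(html_bytes).hexdigest(),
        "htmlBytes": len(html_bytes),
    }
```

Keeping the raw HTML in archival storage and the digest in a separate log lets you show later that the stored page has not been altered since capture.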

Sign up, copy your API key, and curl one of the examples. Five minutes including reading the docs.

Capture your first AI response in 5 min.

One Bearer token. One JSON shape across every provider.