Scrape Content
Create a new scraping session to extract content from any website with optional AI-powered data extraction
POST /scrape
Overview
The `/scrape` endpoint starts a new web scraping job for the specified URL and options. It returns the scraped content directly and supports scraping both without JavaScript rendering (renderless) and with it (renderful). You can also use AI-powered extraction to get structured data from the page.
Parameters
Request Body
The URL to scrape
Whether the request should be asynchronous. When `true`, returns a job ID immediately that can be used to check status.
Whether to render JavaScript on the page. Enable for dynamic content loaded via JavaScript.
HTTP method for the request. Options: `get`, `post`, `put`, `delete`, `patch`, `head`, `options`
Response format. Options: `raw`, `json`, `markdown`. When `format` is `markdown`, ScrapEngine auto-detects the response type:
- HTML → converted via Turndown
- PDF / XLSX / DOCX → extracted via the built-in document parser (see `documentOptions` below)
Proxy country for geo-targeted requests (e.g., `us`, `uk`, `de`)
Whether to include response headers in the response
HTTP headers for the request
Request body content (for POST/PUT requests)
Document Extraction Options
Applied only when `format` is `markdown` AND the target response is a PDF, XLSX, or DOCX file. Ignored for HTML responses.
PDF / XLSX / DOCX extraction options.
Password for encrypted PDFs.
1-based page numbers to include (PDF only). Omit for the whole document.
How PDF page boundaries appear in the output markdown. Options: `none` (no separator), `hr` (horizontal rule), `comment` (HTML comment).
Strip recurring page headers and footers from PDF output.
LLM Extraction Options
AI-powered extraction options. When provided, the scraped content will be processed by an LLM to extract structured data.
JSON Schema defining the structure to extract. Use this for precise, typed extraction.
Natural language prompt describing what to extract. Use this for flexible, conversational extraction.
Custom system prompt to guide the LLM behavior
LLM model to use for extraction. Options: `gpt-4o`, `gpt-4o-mini`, `claude-3-5-sonnet`
Whether to include extraction metadata (tokens used, cost) in response
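
The extraction options above can be combined into a request body roughly as follows. This is a minimal sketch, not the confirmed request shape: the parameter names did not survive on this page, so the keys `url`, `extract`, `schema`, `prompt`, and `model` are hypothetical placeholders. Only the model names come from the options listed above.

```python
import json

# Model names as listed in the options above.
ALLOWED_MODELS = {"gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet"}


def build_extraction_options(schema=None, prompt=None, model=None):
    """Assemble LLM extraction options.

    The keys "schema", "prompt" and "model" are hypothetical names;
    check the real field names before using this.
    """
    if model is not None and model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    options = {}
    if schema is not None:
        options["schema"] = schema      # JSON Schema for typed extraction
    if prompt is not None:
        options["prompt"] = prompt      # natural language instruction
    if model is not None:
        options["model"] = model
    return options


# A small JSON Schema describing the structure to extract.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name"],
}

# "url" and "extract" are hypothetical body keys.
schema_body = {
    "url": "https://example.com/product",
    "extract": build_extraction_options(schema=product_schema, model="gpt-4o-mini"),
}
prompt_body = {
    "url": "https://example.com/product",
    "extract": build_extraction_options(prompt="List the product name and price."),
}

print(json.dumps(schema_body, indent=2))
print(json.dumps(prompt_body, indent=2))
```

A schema suits cases where downstream code expects fixed, typed fields; a prompt is quicker to write when the shape of the result is allowed to vary.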
Example Requests
Basic Scraping
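
A basic request might be built as in the sketch below. The host `https://api.example.com`, the `x-api-key` header, and the `url` body key are assumptions made for illustration and are not taken from this page; `format` and its allowed values are. The request is built but not sent.

```python
import json
import urllib.request

# Assumptions: the host, the auth header name and the "url" key are placeholders.
API_BASE = "https://api.example.com"   # hypothetical host
API_KEY = "YOUR_API_KEY"               # placeholder credential

ALLOWED_FORMATS = {"raw", "json", "markdown"}


def build_scrape_request(target_url, response_format="raw"):
    """Build (but do not send) a POST request for the /scrape endpoint."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    payload = {"url": target_url, "format": response_format}
    return urllib.request.Request(
        f"{API_BASE}/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": API_KEY,      # hypothetical auth header
        },
        method="POST",
    )


request = build_scrape_request("https://example.com")
print(request.get_method(), request.full_url)
# prints: POST https://api.example.com/scrape

# To actually send it (requires a real host and key):
# with urllib.request.urlopen(request) as response:
#     html = response.read().decode("utf-8")
```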
Scrape a PDF as Markdown
Point `/scrape` at a PDF (or XLSX / DOCX) URL with `format: "markdown"` to get extracted text back as Markdown.
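
A request body for this case could look like the sketch below. `format` and `documentOptions` are named on this page; the keys `url`, `pages`, `pageSeparator`, and `password` are hypothetical stand-ins for the document extraction options described earlier, whose real names are not shown here.

```python
import json

# Separator styles listed under Document Extraction Options.
PAGE_SEPARATORS = {"none", "hr", "comment"}


def build_pdf_markdown_body(pdf_url, pages=None, page_separator="hr", password=None):
    """Build a request body that asks for a document as Markdown.

    Keys inside "documentOptions" are hypothetical names.
    """
    if page_separator not in PAGE_SEPARATORS:
        raise ValueError(f"unsupported page separator: {page_separator}")
    if pages is not None and any(page < 1 for page in pages):
        raise ValueError("page numbers are 1-based")

    options = {"pageSeparator": page_separator}
    if pages is not None:
        options["pages"] = list(pages)     # omit for the whole document
    if password is not None:
        options["password"] = password     # only for encrypted PDFs

    return {
        "url": pdf_url,                    # hypothetical key
        "format": "markdown",
        "documentOptions": options,
    }


body = build_pdf_markdown_body("https://example.com/report.pdf", pages=[1, 2, 3])
print(json.dumps(body, indent=2))
```

Validating the page numbers and separator on the client side avoids spending a request on a body the server would reject.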
With LLM Extraction (Schema)
Extract structured data using a JSON schema.
With LLM Extraction (Prompt)
Extract data using a natural language prompt.
Response
Success Response (200)
Without extraction: returns the scraped HTML content directly.
Response Headers
| Header | Description |
|---|---|
| `x-remaining-credits` | Number of API credits remaining |
| `x-trace-id` | Unique identifier for the request |
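
The two headers in the table can be read from a response as sketched below. The sketch assumes the credit count arrives as a plain integer string, which this page does not confirm, and the sample values are made up.

```python
from typing import Mapping, Optional, Tuple


def read_usage_headers(headers: Mapping[str, str]) -> Tuple[Optional[int], Optional[str]]:
    """Return (remaining_credits, trace_id) from response headers.

    HTTP header names are case-insensitive, so keys are lowercased first.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    raw_credits = lowered.get("x-remaining-credits")
    # Assumption: the value is a non-negative integer string.
    if raw_credits is not None and raw_credits.strip().isdigit():
        credits = int(raw_credits)
    else:
        credits = None

    return credits, lowered.get("x-trace-id")


# Made-up sample headers for illustration.
sample_headers = {"X-Remaining-Credits": "4821", "X-Trace-Id": "abc123"}
credits, trace_id = read_usage_headers(sample_headers)
print(credits, trace_id)
# prints: 4821 abc123
```

Logging the trace ID alongside each request makes it easier to reference a specific call when reporting a problem.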
Error Responses
| Status | Description |
|---|---|
| 400 | Bad Request - Invalid parameters or URL |
| 401 | Unauthorized - Invalid or missing API key |
| 403 | Forbidden - Access denied to target resource |
| 404 | Not Found - Target URL not found |
| 408 | Request Timeout |
| 500 | Internal Server Error |
| 550 | Faulted After Retries - Job failed after multiple attempts |
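
A client can map these status codes to messages and decide whether a retry is worthwhile. In the sketch below the descriptions come from the table above, while the choice of retryable codes (408 and 500) is an assumption and not documented behavior. 550 is treated as final because the job has already been retried.

```python
# Descriptions copied from the error table above.
ERROR_DESCRIPTIONS = {
    400: "Bad Request - Invalid parameters or URL",
    401: "Unauthorized - Invalid or missing API key",
    403: "Forbidden - Access denied to target resource",
    404: "Not Found - Target URL not found",
    408: "Request Timeout",
    500: "Internal Server Error",
    550: "Faulted After Retries - Job failed after multiple attempts",
}

# Assumption: only transient failures are worth retrying from the client.
RETRYABLE_STATUSES = {408, 500}


def describe_status(status: int) -> str:
    """Return a human-readable description for a response status."""
    if status == 200:
        return "Success"
    return ERROR_DESCRIPTIONS.get(status, f"Unexpected status {status}")


def should_retry(status: int) -> bool:
    """Return True when a client-side retry is a reasonable next step."""
    return status in RETRYABLE_STATUSES


for code in (200, 404, 408, 550):
    print(code, describe_status(code), "retry" if should_retry(code) else "no retry")
```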
Use Cases
- E-commerce scraping: Extract product information, prices, and availability
- Content aggregation: Collect articles, blog posts, and news content
- Lead generation: Extract contact information and company details
- Competitor analysis: Monitor competitor websites and pricing
- SEO analysis: Extract meta tags, headings, and content structure
- AI-powered extraction: Use LLM to extract structured data without writing parsers