Overview
The /scrape endpoint initiates a new web scraping job for the specified URL and options. It returns the scraped content directly and supports both static (non-rendered) and JavaScript-rendered scraping modes. You can also use AI-powered extraction to get structured data from the page.
Parameters
Request Body
- Whether the request should be asynchronous. When `true`, the endpoint returns a job ID immediately that can be used to check the job's status.
- Whether to render JavaScript on the page (`render`). Enable this for dynamic content loaded via JavaScript.
- HTTP method for the request. Options: `get`, `post`, `put`, `delete`, `patch`, `head`, `options`.
- Response format. Options: `raw`, `json`, `markdown`.
- Proxy country for geo-targeted requests (e.g., `us`, `uk`, `de`).
- Whether to include response headers in the response.
- HTTP headers for the request, for example:

  ```json
  {
    "User-Agent": "Mozilla/5.0...",
    "Accept-Language": "en-US,en;q=0.9"
  }
  ```

- Request body content (for POST/PUT requests).
- AI-powered extraction options (`extract`). When provided, the scraped content is processed by an LLM to extract structured data. The `extract` object accepts the fields below.
- JSON Schema defining the structure to extract (`schema`). Use this for precise, typed extraction, for example:

  ```json
  {
    "type": "object",
    "properties": {
      "title": { "type": "string", "description": "The page title" },
      "price": { "type": "number", "description": "Product price" },
      "features": {
        "type": "array",
        "items": { "type": "string" }
      }
    },
    "required": ["title", "price"]
  }
  ```

- Natural language prompt describing what to extract (`prompt`). Use this for flexible, conversational extraction.
- Custom system prompt to guide the LLM's behavior.
- LLM model to use for extraction (`model`). Options: `gpt-4o`, `gpt-4o-mini`, `claude-3-5-sonnet`.
- Whether to include extraction metadata (tokens used, cost) in the response.
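Assembled in code, a request body combining these options can be built as in the following sketch. Only field names that appear in this document's examples (`url`, `render`, `extract`, and the `extract` sub-fields `prompt` and `model`) are used; the helper function itself is illustrative, not part of any official client.

```python
import json

def build_scrape_request(url, render=False, extract=None):
    """Sketch: assemble a /scrape JSON request body.

    Uses only field names shown in the documented examples
    (url, render, extract); everything else is illustrative.
    """
    body = {"url": url, "render": render}
    if extract is not None:
        body["extract"] = extract
    return json.dumps(body)

payload = build_scrape_request(
    "https://www.example.com/product",
    render=True,
    extract={"prompt": "Extract the product name and price",
             "model": "gpt-4o-mini"},
)
```

The resulting string is what you would pass as the request body (the `-d` value in the curl examples below).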
Example Requests
Basic Scraping
```shell
curl -X POST "https://api.scrapengine.io/api/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.example.com/product",
    "render": true
  }'
```
Schema-Based Extraction
Extract structured data using a JSON schema:
```shell
curl -X POST "https://api.scrapengine.io/api/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.example.com/product",
    "render": true,
    "extract": {
      "schema": {
        "type": "object",
        "properties": {
          "productName": { "type": "string", "description": "The product name" },
          "price": { "type": "number", "description": "Price in USD" },
          "rating": { "type": "number", "description": "Average rating out of 5" },
          "features": {
            "type": "array",
            "items": { "type": "string" },
            "description": "List of product features"
          }
        },
        "required": ["productName", "price"]
      },
      "model": "gpt-4o-mini"
    }
  }'
```
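Because the schema's `required` array declares which keys must be present, a client can sanity-check an extraction result against it before using the data. This is a client-side check sketched in Python, not a feature of the API:

```python
def missing_required(data, schema):
    """Return the schema's required keys that are absent from a result dict."""
    return [key for key in schema.get("required", []) if key not in data]

schema = {
    "type": "object",
    "properties": {
        "productName": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["productName", "price"],
}

result = {"productName": "Premium Wireless Headphones"}
print(missing_required(result, schema))  # → ['price']
```

For full validation (types, nested objects, array items), a JSON Schema validator library would be the more thorough choice; this sketch only covers presence of required keys.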
Prompt-Based Extraction
Extract data using a natural language prompt:
```shell
curl -X POST "https://api.scrapengine.io/api/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.example.com/article",
    "extract": {
      "prompt": "Extract the article title, author name, publication date, and a brief summary of the main points",
      "model": "claude-3-5-sonnet"
    }
  }'
```
Response
Success Response (200)
Without extraction - Returns the scraped HTML content directly:
```html
<html>
  <head>
    <title>Example Product</title>
  </head>
  <body>
    <!-- Scraped content here -->
  </body>
</html>
```
With extraction - Returns structured JSON:
```json
{
  "data": {
    "productName": "Premium Wireless Headphones",
    "price": 299.99,
    "rating": 4.5,
    "features": [
      "Active Noise Cancellation",
      "40-hour battery life",
      "Bluetooth 5.0"
    ]
  },
  "metadata": {
    "tokensUsed": 1250,
    "model": "gpt-4o-mini"
  }
}
```
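A client therefore has to handle two response shapes: raw HTML for plain scrapes and a JSON object (`data` plus optional `metadata`) for extraction requests. The sketch below branches on the response's Content-Type header; that signal is an assumption about how the two shapes are distinguished, so adapt it to whatever your client observes in practice:

```python
import json

def parse_scrape_response(content_type, body):
    """Sketch: split a /scrape response into ('html', str) or ('extracted', dict).

    Assumes extraction responses arrive as application/json and raw
    scrapes as HTML; adjust the branch if your client distinguishes
    the two shapes differently.
    """
    if "application/json" in content_type:
        return "extracted", json.loads(body)
    return "html", body

kind, payload = parse_scrape_response(
    "application/json",
    '{"data": {"productName": "Premium Wireless Headphones"},'
    ' "metadata": {"model": "gpt-4o-mini"}}',
)
```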
The response also includes the following headers:

| Header | Description |
|---|---|
| x-remaining-credits | Number of API credits remaining |
| x-trace-id | Unique identifier for the request |
Error Responses
| Status | Description |
|---|---|
| 400 | Bad Request - Invalid parameters or URL |
| 401 | Unauthorized - Invalid or missing API key |
| 403 | Forbidden - Access denied to target resource |
| 404 | Not Found - Target URL not found |
| 408 | Request Timeout |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error |
| 550 | Faulted After Retries - Job failed after multiple attempts |
Error Response Format:
```json
{
  "error": "Error description message",
  "traceId": "abc123-def456",
  "timestamp": "2024-01-15T10:30:00.000Z"
}
```
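Several of the statuses above (408, 429, 500, 550) are transient, so a client typically retries them with exponential backoff. A minimal sketch, with the HTTP call injected as a callable so the example runs without network access; in real code `do_request` would issue the /scrape request and return the status code and body:

```python
import time

# Statuses from the error table that are worth retrying; which codes
# to treat as transient is a client-side policy choice.
RETRYABLE = {408, 429, 500, 550}

def scrape_with_retries(do_request, max_attempts=3, base_delay=1.0):
    """Retry a request on transient statuses with exponential backoff.

    do_request: callable returning (status_code, body). Injected so the
    sketch is testable offline; a real client would call /scrape here.
    """
    for attempt in range(max_attempts):
        status, body = do_request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status, body
```

On a persistent failure the final status and body are returned as-is, so the caller can log the `traceId` from the error payload for support.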
Use Cases
- E-commerce scraping: Extract product information, prices, and availability
- Content aggregation: Collect articles, blog posts, and news content
- Lead generation: Extract contact information and company details
- Competitor analysis: Monitor competitor websites and pricing
- SEO analysis: Extract meta tags, headings, and content structure
- AI-powered extraction: Use LLM to extract structured data without writing parsers