Scrape Content

POST /scrape
curl --request POST \
  --url https://api.scrapengine.io/api/v1/scrape \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "<string>",
  "async": true,
  "render": true,
  "method": "<string>",
  "format": "<string>",
  "country": "<string>",
  "includeHeaders": true,
  "headers": {},
  "body": {},
  "extract": {
    "schema": {},
    "prompt": "<string>",
    "systemPrompt": "<string>",
    "model": "<string>",
    "includeMetadata": true
  }
}
'

Overview

The /scrape endpoint initiates a new web scraping job for the specified URL and options. By default it returns the scraped content directly; with "async": true it instead returns a job ID for status polling. Both static scraping and JavaScript-rendered scraping are supported, and AI-powered extraction can be used to get structured data from the page.

Parameters

Request Body

url (string, required)
The URL to scrape.

async (boolean, default: false)
Whether the request should be asynchronous. When true, returns a job ID immediately that can be used to check status.

render (boolean, default: false)
Whether to render JavaScript on the page. Enable for dynamic content loaded via JavaScript.

method (string, default: "get")
HTTP method for the request. Options: get, post, put, delete, patch, head, options.

format (string, default: "raw")
Response format. Options: raw, json, markdown.

country (string, default: "us")
Proxy country for geo-targeted requests (e.g., us, uk, de).

includeHeaders (boolean, default: false)
Whether to include response headers in the response.

headers (object)
HTTP headers for the request, for example:
{
  "User-Agent": "Mozilla/5.0...",
  "Accept-Language": "en-US,en;q=0.9"
}

body (object)
Request body content (for POST/PUT requests).
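
The request body above can be assembled programmatically before sending. A minimal Python sketch, using the field names and defaults documented in this reference (the helper function itself is illustrative, not an official SDK):

```python
import json

def build_scrape_payload(url, **options):
    """Assemble a /scrape request body, applying the documented defaults.

    Any documented field (render, method, format, country, ...) can be
    passed as a keyword argument to override its default.
    """
    defaults = {
        "async": False,         # synchronous by default
        "render": False,        # no JavaScript rendering by default
        "method": "get",
        "format": "raw",
        "country": "us",
        "includeHeaders": False,
    }
    return {"url": url, **defaults, **options}

payload = build_scrape_payload("https://www.example.com/product", render=True)
print(json.dumps(payload, indent=2))
```

Serializing with json.dumps keeps booleans lowercase (true/false) as the API expects.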

LLM Extraction Options

extract (object)
AI-powered extraction options. When provided, the scraped content will be processed by an LLM to extract structured data.

extract.schema (object)
JSON Schema defining the structure to extract. Use this for precise, typed extraction, for example:
{
  "type": "object",
  "properties": {
    "title": { "type": "string", "description": "The page title" },
    "price": { "type": "number", "description": "Product price" },
    "features": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["title", "price"]
}

extract.prompt (string)
Natural language prompt describing what to extract. Use this for flexible, conversational extraction.

extract.systemPrompt (string)
Custom system prompt to guide the LLM behavior.

extract.model (string, default: "gpt-4o-mini")
LLM model to use for extraction. Options: gpt-4o, gpt-4o-mini, claude-3-5-sonnet.

extract.includeMetadata (boolean, default: true)
Whether to include extraction metadata (tokens used, cost) in the response.
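
extract.schema uses standard JSON Schema keywords, so the required and type keywords directly constrain what the LLM returns. A hand-rolled sketch of that check for illustration only (a real client would use a JSON Schema validation library):

```python
# Map JSON Schema type names to Python types (subset, for illustration).
TYPE_MAP = {"string": str, "number": (int, float), "array": list,
            "object": dict, "boolean": bool}

def matches_schema(data, schema):
    """Minimal check of an extracted object against an object-typed schema:
    verifies required keys are present and property types line up."""
    if schema.get("type") == "object":
        if not isinstance(data, dict):
            return False
        for key in schema.get("required", []):
            if key not in data:
                return False
        for key, sub in schema.get("properties", {}).items():
            expected = TYPE_MAP.get(sub.get("type"), object)
            if key in data and not isinstance(data[key], expected):
                return False
    return True

schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "price": {"type": "number"}},
    "required": ["title", "price"],
}
print(matches_schema({"title": "Headphones", "price": 299.99}, schema))  # True
print(matches_schema({"title": "Headphones"}, schema))  # False: price missing
```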

Example Requests

Basic Scraping

curl -X POST "https://api.scrapengine.io/api/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.example.com/product",
    "render": true
  }'
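
The same request can be built in Python with only the standard library. A sketch (YOUR_API_KEY and the target URL are placeholders, as in the curl example; the actual send via urllib.request.urlopen is left to the caller):

```python
import json
import urllib.request

def scrape_request(url, api_key, **options):
    """Build a POST request for the /scrape endpoint."""
    body = json.dumps({"url": url, **options}).encode("utf-8")
    return urllib.request.Request(
        "https://api.scrapengine.io/api/v1/scrape",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = scrape_request("https://www.example.com/product", "YOUR_API_KEY",
                     render=True)
# To send: urllib.request.urlopen(req).read()
print(req.get_method(), req.full_url)
```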

With LLM Extraction (Schema)

Extract structured data using a JSON schema:
curl -X POST "https://api.scrapengine.io/api/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.example.com/product",
    "render": true,
    "extract": {
      "schema": {
        "type": "object",
        "properties": {
          "productName": { "type": "string", "description": "The product name" },
          "price": { "type": "number", "description": "Price in USD" },
          "rating": { "type": "number", "description": "Average rating out of 5" },
          "features": {
            "type": "array",
            "items": { "type": "string" },
            "description": "List of product features"
          }
        },
        "required": ["productName", "price"]
      },
      "model": "gpt-4o-mini"
    }
  }'

With LLM Extraction (Prompt)

Extract data using a natural language prompt:
curl -X POST "https://api.scrapengine.io/api/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.example.com/article",
    "extract": {
      "prompt": "Extract the article title, author name, publication date, and a brief summary of the main points",
      "model": "claude-3-5-sonnet"
    }
  }'

Response

Success Response (200)

Without extraction - Returns the scraped HTML content directly:
<html>
  <head>
    <title>Example Product</title>
  </head>
  <body>
    <!-- Scraped content here -->
  </body>
</html>
With extraction - Returns structured JSON:
{
  "data": {
    "productName": "Premium Wireless Headphones",
    "price": 299.99,
    "rating": 4.5,
    "features": [
      "Active Noise Cancellation",
      "40-hour battery life",
      "Bluetooth 5.0"
    ]
  },
  "metadata": {
    "tokensUsed": 1250,
    "model": "gpt-4o-mini"
  }
}

Response Headers

Header                 Description
x-remaining-credits    Number of API credits remaining
x-trace-id             Unique identifier for the request
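
A sketch of reading these headers to track credit usage (the headers argument is any mapping of header name to value, e.g. the response headers from your HTTP client; the helper is illustrative):

```python
def remaining_credits(headers):
    """Parse x-remaining-credits from a response-header mapping,
    tolerating case differences in the header name."""
    normalized = {k.lower(): v for k, v in headers.items()}
    value = normalized.get("x-remaining-credits")
    return int(value) if value is not None else None

headers = {"X-Remaining-Credits": "4982", "X-Trace-Id": "abc123-def456"}
print(remaining_credits(headers))  # 4982
```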

Error Responses

Status  Description
400     Bad Request - Invalid parameters or URL
401     Unauthorized - Invalid or missing API key
403     Forbidden - Access denied to target resource
404     Not Found - Target URL not found
408     Request Timeout
429     Too Many Requests - Rate limit exceeded
500     Internal Server Error
550     Faulted After Retries - Job failed after multiple attempts
Error Response Format:
{
  "error": "Error description message",
  "traceId": "abc123-def456",
  "timestamp": "2024-01-15T10:30:00.000Z"
}
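
Of the statuses above, 429 (rate limit) and 500 are natural candidates for client-side retry with backoff, while 550 signals the service has already retried the job. A minimal sketch (send is a stand-in for whatever HTTP client you use; it must return a (status_code, body) pair):

```python
import time

RETRYABLE = {429, 500}  # rate limited / transient server error

def scrape_with_retry(send, max_attempts=4, base_delay=1.0):
    """Call send() until success or attempts run out, doubling the
    delay after each retryable failure (exponential backoff)."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    return status, body

# Simulated client: fails twice with 429, then succeeds.
responses = iter([(429, None), (429, None), (200, "<html>...</html>")])
status, body = scrape_with_retry(lambda: next(responses), base_delay=0.0)
print(status)  # 200
```

On a persistent failure, log the traceId from the error body so support can locate the request.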

Use Cases

  • E-commerce scraping: Extract product information, prices, and availability
  • Content aggregation: Collect articles, blog posts, and news content
  • Lead generation: Extract contact information and company details
  • Competitor analysis: Monitor competitor websites and pricing
  • SEO analysis: Extract meta tags, headings, and content structure
  • AI-powered extraction: Use LLM to extract structured data without writing parsers