Scrape Content

Overview

The /scrape endpoint scrapes the specified URL with the given options. By default it returns the scraped content directly in the response; with async set to true it returns a job ID instead. It supports scraping with or without JavaScript rendering (the render option), and can also use AI-powered (LLM) extraction to return structured data from the page.

Parameters

Request Body

url

string

required

The URL to scrape

async

boolean

default:"false"

Whether the request should be asynchronous. When true, returns a job ID immediately that can be used to check status.

render

boolean

default:"false"

Whether to render JavaScript on the page. Enable for dynamic content loaded via JavaScript.

method

string

default:"get"

HTTP method for the request. Options: get, post, put, delete, patch, head, options

format

string

default:"raw"

Response format. Options: raw, json, markdown

country

string

default:"us"

Proxy country for geo-targeted requests (e.g., us, uk, de)

includeHeaders

boolean

default:"false"

Whether to include response headers in the response

headers

object

HTTP headers for the request

{
  "User-Agent": "Mozilla/5.0...",
  "Accept-Language": "en-US,en;q=0.9"
}

body

object

Request body content (for POST/PUT requests)
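
As a rough sketch of how the request-body fields above fit together, the following Python helper assembles a payload and rejects values outside the documented option lists. The helper name build_payload and its client-side validation are illustrative assumptions, not part of the API; the server may accept or reject values differently.

```python
import json

# Option lists copied from the parameter descriptions above.
ALLOWED_METHODS = {"get", "post", "put", "delete", "patch", "head", "options"}
ALLOWED_FORMATS = {"raw", "json", "markdown"}


def build_payload(url, method="get", format="raw", headers=None, body=None, **extra):
    """Assemble a /scrape request body (hypothetical helper, client-side checks only)."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    if format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {format!r}")
    payload = {"url": url, "method": method, "format": format, **extra}
    # Optional fields are only sent when the caller actually sets them.
    if headers:
        payload["headers"] = dict(headers)
    if body is not None:
        payload["body"] = body
    return payload


payload = build_payload(
    "https://www.example.com/search",  # placeholder target URL
    method="post",
    format="markdown",
    country="de",
    headers={"Accept-Language": "en-US,en;q=0.9"},
    body={"q": "headphones"},
)
print(json.dumps(payload, indent=2))
```

Other documented fields such as render, async, or includeHeaders can be passed as extra keyword arguments; because async is a reserved word in Python, it has to be supplied via dictionary unpacking, for example `**{"async": True}`.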

LLM Extraction Options

extract

object

AI-powered extraction options. When provided, the scraped content will be processed by an LLM to extract structured data.

extract.schema

object

JSON Schema defining the structure to extract. Use this for precise, typed extraction.

{
  "type": "object",
  "properties": {
    "title": { "type": "string", "description": "The page title" },
    "price": { "type": "number", "description": "Product price" },
    "features": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["title", "price"]
}
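
Because the extracted values come from an LLM, it can be worth checking them against the schema before using them. The sketch below is a deliberately minimal client-side check covering only the required list and top-level type keywords; it is not a full JSON Schema validator, and the function name check_extracted is a hypothetical helper, not something the API provides.

```python
# Maps JSON Schema type names to the Python types json.loads produces.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_extracted(data, schema):
    """Return a list of problems found in `data`; an empty list means it passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown or absent type keyword: nothing to check
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields.
        bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if bool_as_number or not isinstance(value, expected):
            problems.append(f"{name}: expected {type_name}, got {type(value).__name__}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "The page title"},
        "price": {"type": "number", "description": "Product price"},
        "features": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "price"],
}

print(check_extracted({"title": "Widget", "price": "19.99"}, schema))
```

For stricter validation (nested objects, array item types, formats), a dedicated JSON Schema library would be the more robust choice.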

extract.prompt

string

Natural language prompt describing what to extract. Use this for flexible, conversational extraction.

extract.systemPrompt

string

Custom system prompt to guide the LLM behavior

extract.model

string

default:"gpt-4o-mini"

LLM model to use for extraction. Options: gpt-4o, gpt-4o-mini, claude-3-5-sonnet

extract.includeMetadata

boolean

default:"true"

Whether to include extraction metadata (tokens used, cost) in response
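
The extraction options above can be assembled in one place, as in this minimal sketch. The helper name build_extract_options is hypothetical, and the rule that at least one of schema or prompt must be set is an assumption made here for illustration; the documentation above does not state whether the API enforces it.

```python
# Model names copied from the extract.model description above.
ALLOWED_MODELS = {"gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet"}


def build_extract_options(schema=None, prompt=None, system_prompt=None,
                          model="gpt-4o-mini", include_metadata=True):
    """Build the `extract` object for a /scrape request (hypothetical helper)."""
    if schema is None and prompt is None:
        # Assumption: an extract object with neither field has nothing to extract.
        raise ValueError("provide a schema, a prompt, or both")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    options = {"model": model, "includeMetadata": include_metadata}
    if schema is not None:
        options["schema"] = schema
    if prompt is not None:
        options["prompt"] = prompt
    if system_prompt is not None:
        options["systemPrompt"] = system_prompt
    return options


extract = build_extract_options(
    prompt="Extract the article title and author name",
    model="claude-3-5-sonnet",
)
print(extract)
```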

Example Requests

Basic Scraping

curl -X POST "https://api.scrapengine.io/api/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.example.com/product",
    "render": true
  }'
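
For readers calling the API from code, the same request might look like the following in Python, using only the standard library. This is a sketch mirroring the curl command above; the function names build_scrape_request and scrape and the 60-second timeout are illustrative choices, not part of the API.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.scrapengine.io/api/v1/scrape"


def build_scrape_request(api_key, url, **options):
    """Build (but do not send) the POST request for /scrape."""
    payload = {"url": url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape(api_key, url, timeout=60.0, **options):
    """Send the request and return the response body as text.

    urllib raises urllib.error.HTTPError for non-2xx status codes.
    """
    request = build_scrape_request(api_key, url, **options)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Replace YOUR_API_KEY before running; this performs a real network call.
    html = scrape("YOUR_API_KEY", "https://www.example.com/product", render=True)
    print(html[:200])
```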

With LLM Extraction (Schema)

Extract structured data using a JSON schema:

curl -X POST "https://api.scrapengine.io/api/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.example.com/product",
    "render": true,
    "extract": {
      "schema": {
        "type": "object",
        "properties": {
          "productName": { "type": "string", "description": "The product name" },
          "price": { "type": "number", "description": "Price in USD" },
          "rating": { "type": "number", "description": "Average rating out of 5" },
          "features": {
            "type": "array",
            "items": { "type": "string" },
            "description": "List of product features"
          }
        },
        "required": ["productName", "price"]
      },
      "model": "gpt-4o-mini"
    }
  }'

With LLM Extraction (Prompt)

Extract data using a natural language prompt:

curl -X POST "https://api.scrapengine.io/api/v1/scrape" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.example.com/article",
    "extract": {
      "prompt": "Extract the article title, author name, publication date, and a brief summary of the main points",
      "model": "claude-3-5-sonnet"
    }
  }'

Response

Success Response (200)

Without extraction - Returns the scraped HTML content directly:

<html>
  <head>
    <title>Example Product</title>
  </head>
  <body>
    <!-- Scraped content here -->
  </body>
</html>

With extraction - Returns structured JSON:

{
  "data": {
    "productName": "Premium Wireless Headphones",
    "price": 299.99,
    "rating": 4.5,
    "features": [
      "Active Noise Cancellation",
      "40-hour battery life",
      "Bluetooth 5.0"
    ]
  },
  "metadata": {
    "tokensUsed": 1250,
    "model": "gpt-4o-mini"
  }
}
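
A client might unpack the extraction response shown above along these lines. This sketch assumes the response has the data and metadata keys from the example, and treats metadata as optional since includeMetadata can be turned off; the ExtractionResult type and parse_extraction function are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExtractionResult:
    data: Any
    tokens_used: Optional[int]
    model: Optional[str]


def parse_extraction(body):
    """Parse the JSON body of an extraction response into an ExtractionResult."""
    doc = json.loads(body)
    # metadata may be missing when includeMetadata is false (assumption).
    meta = doc.get("metadata") or {}
    return ExtractionResult(
        data=doc["data"],
        tokens_used=meta.get("tokensUsed"),
        model=meta.get("model"),
    )


sample = """
{
  "data": {"productName": "Premium Wireless Headphones", "price": 299.99},
  "metadata": {"tokensUsed": 1250, "model": "gpt-4o-mini"}
}
"""
result = parse_extraction(sample)
print(result.data["productName"], result.tokens_used, result.model)
```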

Response Headers

Header	Description
`x-remaining-credits`	Number of API credits remaining
`x-trace-id`	Unique identifier for the request
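
These headers can be read from any HTTP client's response object. The sketch below assumes x-remaining-credits carries an integer, which the table above implies but does not state outright; read_usage_headers is a hypothetical helper. It accepts any object with a case-insensitive get method, such as the headers attribute of a urllib response.

```python
from email.message import Message


def read_usage_headers(headers):
    """Return (remaining_credits, trace_id); either may be None if absent."""
    raw_credits = headers.get("x-remaining-credits")
    trace_id = headers.get("x-trace-id")
    try:
        remaining = int(raw_credits) if raw_credits is not None else None
    except ValueError:
        # Assumption did not hold: the value was not a plain integer.
        remaining = None
    return remaining, trace_id


# Stand-in for real response headers; urllib's response.headers is a subclass
# of email.message.Message, so lookups behave the same way.
demo = Message()
demo["X-Remaining-Credits"] = "4821"
demo["X-Trace-Id"] = "abc123-def456"
print(read_usage_headers(demo))
```

Logging the trace ID alongside each request makes it easier to reference a specific call when reporting a problem.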

Error Responses

Status	Description
`400`	Bad Request - Invalid parameters or URL
`401`	Unauthorized - Invalid or missing API key
`403`	Forbidden - Access denied to target resource
`404`	Not Found - Target URL not found
`408`	Request Timeout
`429`	Too Many Requests - Rate limit exceeded
`500`	Internal Server Error
`550`	Faulted After Retries - Job failed after multiple attempts

Error Response Format:

{
  "error": "Error description message",
  "traceId": "abc123-def456",
  "timestamp": "2024-01-15T10:30:00.000Z"
}
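
One way a client could normalize these errors is sketched below. Which status codes are worth retrying is an assumption made here (408, 429, and 500), not something the API documents; 550 is treated as final because its description says the job already failed after multiple attempts. The function name summarize_error is hypothetical.

```python
import json

# Assumption: these statuses are transient and may succeed on a later attempt.
RETRYABLE_STATUSES = {408, 429, 500}


def summarize_error(status, body):
    """Turn an error status and raw body bytes into a flat dictionary."""
    try:
        doc = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        doc = {}
    if not isinstance(doc, dict):
        doc = {}  # body was valid JSON but not the documented object shape
    return {
        "status": status,
        "message": doc.get("error", "unknown error"),
        "trace_id": doc.get("traceId"),
        "timestamp": doc.get("timestamp"),
        "retryable": status in RETRYABLE_STATUSES,
    }


sample_body = json.dumps({
    "error": "Error description message",
    "traceId": "abc123-def456",
    "timestamp": "2024-01-15T10:30:00.000Z",
}).encode("utf-8")
print(summarize_error(429, sample_body))
```

With urllib, a non-2xx response raises urllib.error.HTTPError; its code attribute and read() method supply the two arguments.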

Use Cases

E-commerce scraping: Extract product information, prices, and availability
Content aggregation: Collect articles, blog posts, and news content
Lead generation: Extract contact information and company details
Competitor analysis: Monitor competitor websites and pricing
SEO analysis: Extract meta tags, headings, and content structure
AI-powered extraction: Use LLM to extract structured data without writing parsers
