/extract

Extracts structured data from HTML using AI or rule-based extractors, converting raw HTML into clean, structured content without the scraping overhead.

POST /extract

When to Use

Converting HTML to Markdown for LLM processing
Extracting main article content from noisy pages
Running custom extraction logic on HTML
Processing HTML you've already fetched elsewhere

Request Parameters

Required Parameters

Parameter	Type	Description
`input`	string	The HTML content to extract from

Extraction Options

Parameter	Type	Description
`preset`	string	Built-in extractor preset: `markdown`, `markdown_content`, or `content` (see Extraction Presets below)
`extractor`	string	Custom JavaScript extractor code

Extraction Presets

`markdown`

Converts HTML to clean Markdown with metadata extraction.

Output:

markdown - Full page as Markdown
html - Cleaned HTML
meta - Extracted metadata (title, author, description, date)

`markdown_content`

Extracts main content and converts to Markdown. Best for articles and blog posts.

Output:

markdown - Main content as Markdown
html - Main content HTML
meta - Metadata

`content`

Extracts the main readable content using the Readability algorithm.

Output:

title - Article title
content - Main content HTML
textContent - Plain text
length - Content length in characters
excerpt - Short excerpt
byline - Author
siteName - Site name

Custom Extractor

Write a JavaScript function that takes the input HTML and the Cheerio module as arguments and returns the extracted data:

extractor.js

function(input, cheerio) {
  const $ = cheerio.load(input);
  return {
    title: $('title').text(),
    heading: $('h1').first().text(),
    links: $('a[href]').map((i, el) => ({
      text: $(el).text(),
      href: $(el).attr('href')
    })).get().slice(0, 10)
  };
}
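
The request body carries the extractor as a string, which is awkward to maintain by hand. Below is a minimal sketch of one way to produce that string. It assumes the API accepts the function source exactly as `Function.prototype.toString()` returns it, newlines included. `headingExtractor` and `buildExtractorBody` are hypothetical names, not part of the API:

```javascript
// Hypothetical extractor, written as a normal function expression so that
// editors and linters can check it before it is sent.
const headingExtractor = function (input, cheerio) {
  const $ = cheerio.load(input);
  return {
    title: $('title').text(),
    heading: $('h1').first().text(),
  };
};

// Hypothetical helper: builds the JSON body for POST /extract.
// Assumption: the API accepts the source text produced by toString().
function buildExtractorBody(html, extractorFn) {
  if (typeof extractorFn !== 'function') {
    throw new TypeError('extractorFn must be a function');
  }
  return JSON.stringify({
    input: html,
    extractor: extractorFn.toString(),
  });
}

const body = buildExtractorBody(
  '<html><body><h1>Hi</h1></body></html>',
  headingExtractor
);
console.log(body);
```

Keeping the extractor as real code and serializing it at request time avoids hand-escaping quotes inside a JSON string.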

Example Requests

Using Markdown Preset

request.json

{
  "input": "<html><head><title>My Article</title></head><body><article><h1>Hello World</h1><p>This is content.</p></article></body></html>",
  "preset": "markdown"
}

Using Content Preset

request.json

{
  "input": "<html><body><article><h1>Breaking News</h1><p>Important story content here...</p><p>More details...</p></article><footer>Copyright 2026</footer></body></html>",
  "preset": "content"
}

Using a Custom Extractor

request.json

{
  "input": "<html><body><ul><li>Item 1</li><li>Item 2</li><li>Item 3</li></ul></body></html>",
  "extractor": "function(input, cheerio) { let $ = cheerio.load(input); return $('li').map((i,el) => $(el).text()).get(); }"
}

cURL Example

Terminal

curl -X POST https://scraperex1.p.rapidapi.com/extract \
  -H "Content-Type: application/json" \
  -H "X-RapidAPI-Key: YOUR_API_KEY" \
  -H "X-RapidAPI-Host: scraperex1.p.rapidapi.com" \
  -d '{
    "input": "<html><body><h1>Title</h1><p>Content</p></body></html>",
    "preset": "markdown"
  }'
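
The same call can be made from JavaScript. The sketch below mirrors the cURL example using the global `fetch` available in Node.js 18+ and in browsers. `buildExtractRequest` and `extract` are hypothetical helper names, and treating any non-2xx status as an error is an assumption, since error responses are not described here:

```javascript
// Hypothetical helper: builds the fetch() arguments for the cURL request above.
function buildExtractRequest(html, preset, apiKey) {
  return {
    url: 'https://scraperex1.p.rapidapi.com/extract',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-RapidAPI-Key': apiKey,
        'X-RapidAPI-Host': 'scraperex1.p.rapidapi.com',
      },
      body: JSON.stringify({ input: html, preset }),
    },
  };
}

// Hypothetical helper: sends the request and returns the parsed JSON response.
// Assumption: non-2xx responses should be surfaced as errors.
async function extract(html, preset, apiKey) {
  const { url, options } = buildExtractRequest(html, preset, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Extract request failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (needs a real key and network access):
// extract('<html><body><h1>Title</h1><p>Content</p></body></html>', 'markdown', 'YOUR_API_KEY')
//   .then((data) => console.log(data.result.markdown));
```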

Response (markdown preset)

Response to the Using Markdown Preset request above:

response.json

{
  "result": {
    "markdown": "# Hello World\n\nThis is content.",
    "html": "<h1>Hello World</h1><p>This is content.</p>",
    "meta": {
      "title": "My Article",
      "author": null,
      "description": null,
      "date": null
    }
  }
}

Response (content preset)

Response to the Using Content Preset request above:

response.json

{
  "result": {
    "title": "Breaking News",
    "content": "<h1>Breaking News</h1><p>Important story content here...</p><p>More details...</p>",
    "textContent": "Breaking News\nImportant story content here...\nMore details...",
    "length": 85,
    "excerpt": "Important story content here...",
    "byline": null,
    "siteName": null
  }
}

Response Fields

Field	Type	Description
`result`	object	Extraction result (structure depends on preset/extractor)
`result.markdown`	string	Markdown output (markdown/markdown_content presets)
`result.html`	string	Cleaned HTML (markdown/markdown_content presets)
`result.meta`	object	Metadata (markdown/markdown_content presets)
`result.title`	string	Article title (content preset)
`result.content`	string	Main content HTML (content preset)
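`result.textContent`	string	Plain text content (content preset)
`result.length`	number	Content length (content preset)
`result.excerpt`	string	Short excerpt (content preset)
`result.byline`	string	Author, or null if not found (content preset)
`result.siteName`	string	Site name, or null if not found (content preset)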
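
Because the shape of `result` varies, client code usually has to branch on it. Below is a small sketch that assumes the field names in the table above, and also assumes a custom extractor's return value is passed through as `result` unchanged. `resultToText` is a hypothetical helper:

```javascript
// Hypothetical helper: picks a text representation from an /extract response.
function resultToText(response) {
  const result = response ? response.result : undefined;
  if (result === undefined || result === null) {
    return '';
  }
  if (typeof result === 'string') {
    return result;
  }
  if (typeof result.markdown === 'string') {
    // markdown and markdown_content presets
    return result.markdown;
  }
  if (typeof result.textContent === 'string') {
    // content preset
    return result.textContent;
  }
  // Custom extractors can return any JSON value, so fall back to JSON text.
  return JSON.stringify(result);
}

console.log(resultToText({ result: { markdown: '# Hello World\n\nThis is content.' } }));
console.log(resultToText({ result: ['Item 1', 'Item 2'] })); // prints ["Item 1","Item 2"]
```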

Use Cases

LLM Content Preparation

Convert web pages to Markdown for feeding to language models:

request.json

{
  "input": "<your-html-content>",
  "preset": "markdown_content"
}
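
Once the response arrives, the Markdown and metadata can be combined into a single text document for the model. The sketch below assumes the `markdown_content` preset returns the same `meta` fields as the `markdown` preset (`title`, `author`, `date`). `toLlmDocument` and the `Title:` / `Author:` / `Date:` header format are illustrative choices, not part of the API:

```javascript
// Hypothetical helper: prepends available metadata to the Markdown body.
// Assumption: result.markdown is a string and result.meta may hold nulls.
function toLlmDocument(result) {
  const meta = result.meta || {};
  const header = [];
  if (meta.title) header.push(`Title: ${meta.title}`);
  if (meta.author) header.push(`Author: ${meta.author}`);
  if (meta.date) header.push(`Date: ${meta.date}`);

  const parts = header.length > 0
    ? [header.join('\n'), result.markdown]
    : [result.markdown];
  return parts.join('\n\n');
}

const sampleResult = {
  markdown: '# Hello World\n\nThis is content.',
  html: '<h1>Hello World</h1><p>This is content.</p>',
  meta: { title: 'My Article', author: null, description: null, date: null },
};
console.log(toLlmDocument(sampleResult));
```

Null metadata fields are skipped, so the model only sees values that were actually extracted.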

Article Extraction

Extract clean article content from news sites:

request.json

{
  "input": "<news-page-html>",
  "preset": "content"
}

Custom Data Extraction

Extract product data from e-commerce pages:

request.json

{
  "input": "<product-page-html>",
  "extractor": "function(input, cheerio) { let $ = cheerio.load(input); return { name: $('.product-title').text(), price: $('.price').text(), rating: $('.rating').attr('data-value') }; }"
}