/extract
Extract structured data from HTML using AI or rule-based extractors. The endpoint converts HTML that you supply into clean, structured content. It does not fetch pages itself, so there is no scraping overhead.
POST /extract
When to Use
- Converting HTML to Markdown for LLM processing
- Extracting main article content from noisy pages
- Running custom extraction logic on HTML
- Processing HTML you've already fetched elsewhere
Request Parameters
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| input | string | The HTML content to extract from |
Extraction Options
| Parameter | Type | Description |
|---|---|---|
| preset | string | Built-in extractor preset (see Extraction Presets below) |
| extractor | string | Custom JavaScript extractor function, sent as a string (see Custom Extractor below) |
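A minimal JavaScript sketch of calling the endpoint with the built-in fetch (Node.js 18+ or a browser), using the same URL and headers as the cURL example. The helper names buildExtractRequest and extractHtml are hypothetical, and the rule that exactly one of preset or extractor is set is an assumption of this sketch: the parameters above do not say how the API treats a request that sets both or neither.

```javascript
// Hypothetical client helpers for POST /extract (not an official SDK).
const API_HOST = "scraperex1.p.rapidapi.com";
const PRESETS = ["markdown", "markdown_content", "content"];

// Build the fetch() arguments for an extraction request.
// Assumption: exactly one of `preset` or `extractor` is sent.
function buildExtractRequest(apiKey, html, { preset, extractor } = {}) {
  if ((preset === undefined) === (extractor === undefined)) {
    throw new Error("Set exactly one of preset or extractor");
  }
  if (preset !== undefined && !PRESETS.includes(preset)) {
    throw new Error(`Unknown preset: ${preset}`);
  }
  // JSON.stringify escapes quotes, backslashes and newlines in the raw HTML.
  const body = JSON.stringify(
    preset !== undefined ? { input: html, preset } : { input: html, extractor }
  );
  return {
    url: `https://${API_HOST}/extract`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-RapidAPI-Key": apiKey,
        "X-RapidAPI-Host": API_HOST,
      },
      body,
    },
  };
}

// Send the request and return the `result` field of the JSON response.
async function extractHtml(apiKey, html, options) {
  const { url, init } = buildExtractRequest(apiKey, html, options);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`extract failed: HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.result;
}
```

For example, `await extractHtml(process.env.RAPIDAPI_KEY, html, { preset: "markdown" })` would resolve to an object whose markdown field holds the converted page. Building the body with JSON.stringify avoids hand-escaping HTML inside a JSON string.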
Extraction Presets
markdown
Converts HTML to clean Markdown with metadata extraction.
Output:
- markdown: Full page as Markdown
- html: Cleaned HTML
- meta: Extracted metadata (title, author, description, date)
markdown_content
Extracts main content and converts to Markdown. Best for articles and blog posts.
Output:
- markdown: Main content as Markdown
- html: Main content HTML
- meta: Metadata
content
Extracts the main readable content using the Readability algorithm.
Output:
- title: Article title
- content: Main content HTML
- textContent: Plain text
- length: Content length
- excerpt: Short excerpt
- byline: Author
- siteName: Site name
Custom Extractor
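Write a JavaScript function that receives the raw HTML (input) and the Cheerio library (cheerio) and returns the data you want. Send the function source as a string in the extractor parameter; the structure of result follows whatever the function returns.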
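extractor.js
```javascript
// Receives the raw HTML string and the Cheerio library; returns the extracted data.
function(input, cheerio) {
  const $ = cheerio.load(input);
  return {
    title: $('title').text(),
    heading: $('h1').first().text(),
    // First 10 links as { text, href }; .get() turns the Cheerio selection into a plain array
    links: $('a[href]').map((i, el) => ({
      text: $(el).text(),
      href: $(el).attr('href')
    })).get().slice(0, 10)
  };
}
```
Example Requests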
Using Markdown Preset
request.json
```json
{
  "input": "<html><head><title>My Article</title></head><body><article><h1>Hello World</h1><p>This is content.</p></article></body></html>",
  "preset": "markdown"
}
```
Using Content Preset
request.json
```json
{
  "input": "<html><body><article><h1>Breaking News</h1><p>Important story content here...</p><p>More details...</p></article><footer>Copyright 2026</footer></body></html>",
  "preset": "content"
}
```
Custom Extractor
request.json
```json
{
  "input": "<html><body><ul><li>Item 1</li><li>Item 2</li><li>Item 3</li></ul></body></html>",
  "extractor": "function(input, cheerio) { let $ = cheerio.load(input); return $('li').map((i,el) => $(el).text()).get(); }"
}
```
cURL Example
Terminal
```bash
curl -X POST https://scraperex1.p.rapidapi.com/extract \
  -H "Content-Type: application/json" \
  -H "X-RapidAPI-Key: YOUR_API_KEY" \
  -H "X-RapidAPI-Host: scraperex1.p.rapidapi.com" \
  -d '{
    "input": "<html><body><h1>Title</h1><p>Content</p></body></html>",
    "preset": "markdown"
  }'
```
Response (markdown preset, for the "Using Markdown Preset" request)
response.json
```json
{
  "result": {
    "markdown": "# Hello World\n\nThis is content.",
    "html": "<h1>Hello World</h1><p>This is content.</p>",
    "meta": {
      "title": "My Article",
      "author": null,
      "description": null,
      "date": null
    }
  }
}
```
Response (content preset, for the "Using Content Preset" request)
response.json
```json
{
  "result": {
    "title": "Breaking News",
    "content": "<h1>Breaking News</h1><p>Important story content here...</p><p>More details...</p>",
    "textContent": "Breaking News\nImportant story content here...\nMore details...",
    "length": 85,
    "excerpt": "Important story content here...",
    "byline": null,
    "siteName": null
  }
}
```
Response Fields
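| Field | Type | Description |
|---|---|---|
| result | object | Extraction result. Its structure depends on the preset; for a custom extractor it is whatever the function returns |
| result.markdown | string | Markdown output (markdown and markdown_content presets) |
| result.html | string | Cleaned HTML (markdown and markdown_content presets) |
| result.meta | object | Metadata: title, author, description, date (markdown and markdown_content presets) |
| result.title | string | Article title (content preset) |
| result.content | string | Main content HTML (content preset) |
| result.textContent | string | Plain text content (content preset) |
| result.length | number | Content length (content preset) |
| result.excerpt | string | Short excerpt (content preset) |
| result.byline | string or null | Author (content preset) |
| result.siteName | string or null | Site name (content preset) |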
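Because the shape of result depends on the preset, client code typically branches on it. A minimal sketch, assuming only the field names listed in the table above; normalizeResult is a hypothetical helper that maps both preset families to a common { title, text } shape (Markdown for the two Markdown presets, plain text for content). Custom extractors are deliberately not handled, since their output has no fixed fields.

```javascript
// Hypothetical helper: map an /extract `result` to { title, text }.
// Field names follow the Response Fields table; custom extractors are not covered.
function normalizeResult(preset, result) {
  switch (preset) {
    case "markdown":
    case "markdown_content":
      return {
        title: result.meta ? result.meta.title : null, // metadata lives under `meta`
        text: result.markdown,
      };
    case "content":
      return { title: result.title, text: result.textContent }; // Readability fields
    default:
      throw new Error(`Unsupported preset: ${preset}`);
  }
}

// Usage with the sample markdown-preset response shown earlier.
const example = normalizeResult("markdown", {
  markdown: "# Hello World\n\nThis is content.",
  meta: { title: "My Article", author: null, description: null, date: null },
});
console.log(example.title); // My Article
```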
Use Cases
LLM Content Preparation
Convert web pages to Markdown for feeding to language models:
request.json
```json
{
  "input": "<your-html-content>",
  "preset": "markdown_content"
}
```
Article Extraction
Extract clean article content from news sites:
request.json
```json
{
  "input": "<news-page-html>",
  "preset": "content"
}
```
Custom Data Extraction
Extract product data from e-commerce pages:
request.json
```json
{
  "input": "<product-page-html>",
  "extractor": "function(input, cheerio) { let $ = cheerio.load(input); return { name: $('.product-title').text(), price: $('.price').text(), rating: $('.rating').attr('data-value') }; }"
}
```
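Writing the extractor inline as a JSON string means escaping quotes by hand and gives no syntax checking. One workaround, sketched here under the assumption that the API only needs the function's source text (as in the examples above), is to write the extractor as an ordinary function and serialize it with Function.prototype.toString(); JSON.stringify then escapes both the HTML and the code. The selectors .product-title, .price and .rating come from the example above and will differ per site; buildProductRequestBody is a hypothetical name, and the .trim() calls are an addition that strips surrounding whitespace.

```javascript
// Extractor written as an ordinary function so it can be linted and tested locally.
// Only its source text is sent, so it must not reference any outer variables:
// everything it needs comes from its `input` and `cheerio` parameters.
const productExtractor = function (input, cheerio) {
  const $ = cheerio.load(input);
  return {
    name: $(".product-title").text().trim(),
    price: $(".price").text().trim(),
    rating: $(".rating").attr("data-value"),
  };
};

// Hypothetical helper: serialize the HTML and the extractor source into the
// JSON body for POST /extract. JSON.stringify escapes quotes and newlines.
function buildProductRequestBody(html) {
  return JSON.stringify({
    input: html,
    extractor: productExtractor.toString(),
  });
}
```

The returned string can be used directly as the request body, with the same headers as the cURL example.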