Parse
Parses a PDF into Markdown or JSON, with support for OCR, table formatting, and image extraction. Ideal for content extraction, knowledge base creation, and retrieval-augmented generation (RAG) workflows.
POST https://pdf.ai/api/v2/parse
Accepts a docId, url, or file and returns a JSON schema and citations.
Caching
When you supply a docId, the parse results are cached for subsequent calls with the same settings. Cached responses use 0 credits. Using a docId with caching also lets you save credits on the /extract and /split endpoints.
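The caching rule can be modelled client-side, for example to estimate which calls in a batch should come back free. This is a minimal sketch, not the service's actual cache logic: it assumes the cache is keyed on docId plus the exact settings sent, that entries never expire, and that an omitted setting is not treated as equal to its default (so leaving out lang_list is not assumed to match lang_list=["en"]). The function names are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Tuple


def settings_key(doc_id: str, settings: Dict[str, object]) -> Tuple:
    # Sort by field name and JSON-encode each value so that dict ordering
    # and list values (e.g. lang_list) do not change the key.
    return (
        doc_id,
        tuple(sorted((name, json.dumps(value, sort_keys=True))
                     for name, value in settings.items())),
    )


def expected_cache_hits(calls: Iterable[Tuple[str, Dict[str, object]]]) -> List[bool]:
    """Predict which parse calls should be served from cache (0 credits).

    Hypothetical model of "cached for subsequent calls with the same settings".
    Assumes cache entries never expire and that defaults are not normalized.
    """
    seen = set()
    hits = []
    for doc_id, settings in calls:
        key = settings_key(doc_id, settings)
        hits.append(key in seen)  # a repeat of an earlier (docId, settings) pair
        seen.add(key)
    return hits
```

Under these assumptions, repeating a call with the same docId and settings (in any field order) is a predicted cache hit, while changing quality or using a different docId is a miss.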
Sample Requests
cURL

```shell
curl -X POST https://pdf.ai/api/v2/parse \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@/path/to/document.pdf" \
  -F "quality=standard" \
  -F "lang_list=[\"en\"]"
```

Python

```python
import requests

url = "https://pdf.ai/api/v2/parse"
headers = {"X-API-Key": "YOUR_API_KEY"}

# Keep the file open while the request is sent.
with open("/path/to/document.pdf", "rb") as f:
    files = {"file": f}
    data = {
        "quality": "standard",
        "lang_list": '["en"]',  # array sent as a JSON-encoded string
    }
    response = requests.post(url, headers=headers, files=files, data=data)

print(response.json())
```

Node.js

```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

const form = new FormData();
form.append('file', fs.createReadStream('/path/to/document.pdf'));
form.append('quality', 'standard');
form.append('lang_list', '["en"]'); // array sent as a JSON-encoded string

axios.post('https://pdf.ai/api/v2/parse', form, {
  headers: {
    'X-API-Key': 'YOUR_API_KEY',
    ...form.getHeaders() // multipart Content-Type with boundary
  }
}).then(response => {
  console.log(response.data);
}).catch(error => {
  console.error(error.response ? error.response.data : error.message);
});
```

Replace placeholder values such as YOUR_API_KEY and /path/to/document.pdf with your actual values.
Headers
| Header | Type | Required | Value |
|---|---|---|---|
| X-API-Key | string | Yes | Your API key |
Request Format
Content-Type: multipart/form-data
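When you send a url or docId instead of uploading a file, check that your HTTP client still sends multipart/form-data. With Python's requests, a plain data= dict is encoded as application/x-www-form-urlencoded; the body only becomes multipart when files= is non-empty. A minimal sketch that forces multipart for a URL-only request by passing each field as a (None, value) pair, which requests sends as a form field with no filename. The helper names and the timeout value are illustrative assumptions, not part of the API.

```python
from typing import Dict, Tuple

PARSE_URL = "https://pdf.ai/api/v2/parse"


def multipart_fields(fields: Dict[str, str]) -> Dict[str, Tuple[None, str]]:
    """Wrap plain form values as (filename=None, value) tuples.

    Passing these via files= makes requests build a multipart/form-data body
    even though no file is uploaded.
    """
    return {name: (None, value) for name, value in fields.items()}


def parse_from_url(api_key: str, pdf_url: str, quality: str = "standard") -> dict:
    """Hypothetical helper: parse a PDF by URL instead of uploading it."""
    import requests  # imported lazily so multipart_fields works without requests installed

    fields = multipart_fields({"url": pdf_url, "quality": quality})
    response = requests.post(
        PARSE_URL,
        headers={"X-API-Key": api_key},
        files=fields,
        timeout=120,  # assumed value; large PDFs may need longer
    )
    response.raise_for_status()
    return response.json()
```

For example, parse_from_url("YOUR_API_KEY", "https://example.com/document.pdf") sends url and quality as multipart fields with no file part.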
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| docId | string | No | Document ID for caching parsed results |
| url | string | No | URL of the PDF to parse (alternative to file upload) |
| file | File | No | PDF file to upload (alternative to url) |
| quality | string | No | Parsing quality: 'standard' or 'advanced' (default: 'standard') |
| lang_list | array | No | Languages to detect (default: ['en']); sent as a JSON-encoded string in form data, as in the sample requests |
| llm | boolean | No | Enable LLM processing for images (default: false) |
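Because every multipart field is a string, array and boolean parameters have to be encoded before sending. A minimal sketch of a form-field builder, assuming (not confirmed by these docs) that llm is accepted as the string "true" and that parameters left unset can simply be omitted so the server defaults apply. lang_list is JSON-encoded exactly as in the sample requests. The function name is hypothetical, and the file itself is passed separately.

```python
import json
from typing import Dict, List, Optional

VALID_QUALITIES = ("standard", "advanced")


def build_parse_params(
    doc_id: Optional[str] = None,
    url: Optional[str] = None,
    quality: str = "standard",
    lang_list: Optional[List[str]] = None,
    llm: bool = False,
) -> Dict[str, str]:
    """Build the non-file form fields for POST /api/v2/parse (hypothetical helper)."""
    if quality not in VALID_QUALITIES:
        raise ValueError(f"quality must be one of {VALID_QUALITIES}, got {quality!r}")

    params = {"quality": quality}
    if doc_id is not None:
        params["docId"] = doc_id
    if url is not None:
        params["url"] = url
    if lang_list is not None:
        # Arrays are sent as JSON text, e.g. '["en"]'; omitted means server default ['en'].
        params["lang_list"] = json.dumps(lang_list)
    if llm:
        # Assumed boolean encoding; false is the default, so it is omitted.
        params["llm"] = "true"
    return params
```

The result can be passed as data= alongside files= when uploading, or wrapped into (None, value) pairs for a request with no file.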
Quality Options
standard (default): Supports language selection for OCR through lang_list.
advanced: Uses vision language models (VLMs) and may give better accuracy on complex documents. You do not need to pass a language selection.
Response format
Credit Usage
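Credits depend on the parsing quality, the page count, and whether optional LLM processing is enabled. Cached results use 0 credits.

| Quality | Base cost | LLM processing (llm=true) |
|---|---|---|
| standard | 1 credit per page | +1 credit per 5 images (rounded up) |
| advanced | 2 credits per page | +1 credit per 5 images (rounded up) |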
Credit Usage Examples
- 10-page document, standard quality, no LLM: 1 × 10 = 10 credits
- 10-page document, advanced quality, no LLM: 2 × 10 = 20 credits
- 10-page document, standard quality, LLM enabled, 12 images:
  - Base: 1 × 10 = 10 credits
  - LLM: 1 × ceil(12 / 5) = 1 × 3 = 3 credits
  - Total: 13 credits
- Cached result (any configuration): 0 credits
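The pricing rules above reduce to a short formula: base credits per page times the page count, plus ceil(images / 5) when LLM processing is on, or 0 for a cached result. A minimal estimator sketch, assuming the image count is the total for the whole document and that images add nothing when llm is off; the function name is hypothetical and the service's own billing is authoritative.

```python
import math

BASE_CREDITS_PER_PAGE = {"standard": 1, "advanced": 2}
IMAGES_PER_LLM_CREDIT = 5


def parse_credits(pages: int, quality: str = "standard", llm: bool = False,
                  images: int = 0, cached: bool = False) -> int:
    """Estimate credits for one parse call (hypothetical helper)."""
    if cached:
        return 0  # cached results are free regardless of configuration
    if quality not in BASE_CREDITS_PER_PAGE:
        raise ValueError(f"unknown quality {quality!r}")
    if pages < 0 or images < 0:
        raise ValueError("pages and images must be non-negative")

    credits = BASE_CREDITS_PER_PAGE[quality] * pages
    if llm:
        # +1 credit per started block of 5 images (rounded up).
        credits += math.ceil(images / IMAGES_PER_LLM_CREDIT)
    return credits
```

With these definitions, parse_credits(10), parse_credits(10, quality="advanced"), and parse_credits(10, llm=True, images=12) reproduce the 10, 20, and 13 credits in the examples above.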