Extract
Extract structured data from PDF documents based on a JSON schema. The endpoint returns the extracted data along with citations linking each field to its source segments in the document. Use it when you need structured information from PDFs with traceable source references.
POST https://pdf.ai/api/v2/extract
Returns JSON matching the supplied schema, plus citations, given a docId, url, or file.
Caching
Sample Code
Here are examples of how to use the extraction API in different programming languages.
cURL
curl -X POST https://pdf.ai/api/v2/extract \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "docId=your-document-id" \
  -F 'schema={"type":"object","properties":{"title":{"type":"string"},"author":{"type":"string"}},"required":["title"]}' \
  -F "system_prompt=Extract document metadata accurately."

Python
import json

import requests

url = "https://pdf.ai/api/v2/extract"
headers = {"X-API-Key": "YOUR_API_KEY"}

# JSON schema describing the fields to extract
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"}
    },
    "required": ["title"]
}

# The schema is sent as a JSON-encoded string
data = {
    "docId": "your-document-id",
    "schema": json.dumps(schema),
    "system_prompt": "Extract document metadata accurately."
}

response = requests.post(url, headers=headers, data=data)
print(response.json())

Replace placeholder values like YOUR_API_KEY with your actual values.
Headers
| Name | Type | Value |
| --- | --- | --- |
| X-API-Key* | string | <API-Key> |
Request Format
Content type: multipart/form-data
Request Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| schema | string (JSON) | Yes | JSON schema defining the structure to extract |
| system_prompt | string | No | Custom system prompt for extraction (default: "Be precise and thorough.") |
| docId | string | No | Document ID for caching parsed results |
| url | string | No | URL of the PDF to parse (alternative to file upload) |
| file | File | No | PDF file to upload (alternative to URL) |
| quality | string | No | Quality to use: 'standard' or 'advanced' (default: 'standard') |
| lang_list | array | No | List of languages to detect (default: ['en']) |
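As an illustration of the parameters above, here is a minimal sketch of uploading a local PDF instead of passing a docId. The helper names `build_extract_form` and `extract_from_file` are hypothetical, as is the 120-second timeout; only the endpoint, header, and field names come from this page. How `lang_list` should be encoded in a multipart form is not stated here, so it is left out.

```python
import json

EXTRACT_URL = "https://pdf.ai/api/v2/extract"


def build_extract_form(schema, doc_id=None, url=None,
                       quality="standard", system_prompt=None):
    """Build the non-file form fields (hypothetical helper)."""
    if quality not in ("standard", "advanced"):
        raise ValueError("quality must be 'standard' or 'advanced'")
    # The schema travels as a JSON-encoded string
    form = {"schema": json.dumps(schema), "quality": quality}
    if doc_id is not None:
        form["docId"] = doc_id
    if url is not None:
        form["url"] = url
    if system_prompt is not None:
        form["system_prompt"] = system_prompt
    return form


def extract_from_file(path, schema, api_key, **options):
    """Upload a local PDF and return the parsed JSON response."""
    import requests  # third-party; imported here so the form builder works without it

    with open(path, "rb") as pdf:
        response = requests.post(
            EXTRACT_URL,
            headers={"X-API-Key": api_key},
            data=build_extract_form(schema, **options),
            files={"file": pdf},  # passing files= makes requests send multipart/form-data
            timeout=120,  # assumed value, not from the API docs
        )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()
```

A call might look like `extract_from_file("report.pdf", schema, "YOUR_API_KEY", quality="advanced")`. Keeping the form construction separate from the network call makes the field handling easy to check without spending credits.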
Response format
Credit Usage
Before data can be extracted from a PDF, the document must be parsed. Parsing incurs credit usage unless a cached parsed result is available. See the parse credit usage documentation for parse rates.
Extraction Credits
| Schema size | Extraction credits |
| --- | --- |
| Schema has ≤ 5 fields | 2 credits × page count |
| Schema has > 5 fields | 4 credits × page count |
Total Credit Formula
Total credits = Parse credits + Extraction credits
Examples
10-page document, uncached parse, advanced quality, schema with 3 fields
Parse Credits: 2 × 10 = 20 credits
Extraction Credits: 2 × 10 = 20 credits
Total: 40 credits
5-page document, cached parse, schema with 8 fields
Parse Credits: 0 credits (cached)
Extraction Credits: 4 × 5 = 20 credits
Total: 20 credits
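The formula and the two examples above can be sketched as a small estimator. The function names are hypothetical, and the parse rate is passed in by the caller because parse pricing is documented separately; the value 2 credits per page for advanced quality is taken from the first example only.

```python
def extraction_credits(page_count, field_count):
    """Extraction cost: 2 credits/page for schemas with <= 5 fields, else 4."""
    rate = 2 if field_count <= 5 else 4
    return rate * page_count


def total_credits(page_count, field_count,
                  parse_credits_per_page=0, parse_cached=False):
    """Total credits = parse credits + extraction credits."""
    # A cached parse costs nothing; otherwise the caller supplies the parse rate
    parse = 0 if parse_cached else parse_credits_per_page * page_count
    return parse + extraction_credits(page_count, field_count)


# Example 1: 10 pages, uncached parse at 2 credits/page, 3 fields
print(total_credits(10, 3, parse_credits_per_page=2))  # 40
# Example 2: 5 pages, cached parse, 8 fields
print(total_credits(5, 8, parse_cached=True))  # 20
```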