Extract

Extract structured data from PDF documents based on a JSON schema. The endpoint returns the extracted data along with citations linking each field to its source segments in the document. Use it when you need structured information from PDFs with traceable source references.

POST https://pdf.ai/api/v2/extract

Returns data structured according to the supplied JSON schema, plus citations, given a docId, url, or file.

Caching

If a docId is passed, no parsing credits will be used. However, if a file or url is used, parsing credits will apply during the extract call. The docId will be returned after this call, allowing users to use it in future extract requests without incurring additional parsing credits.
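The caching rule above can be sketched as a small helper that prefers a stored docId and falls back to a URL only when no docId is known. This is a minimal sketch: the function name `build_extract_fields` and its arguments are hypothetical, and only the form field names (`schema`, `docId`, `url`) come from this page.

```python
import json


def build_extract_fields(schema, doc_id=None, url=None):
    """Return the text form fields for an extract call, preferring a cached docId.

    Hypothetical helper: only the field names schema, docId and url are
    taken from the documentation.
    """
    fields = {"schema": json.dumps(schema)}
    if doc_id:
        # Cached parse result: no parsing credits are used.
        fields["docId"] = doc_id
    elif url:
        # The document is parsed during this call, so parsing credits apply.
        fields["url"] = url
    else:
        raise ValueError("provide doc_id or url (file uploads are not covered here)")
    return fields
```

Storing the docId returned by the first call and passing it as `doc_id` on later calls avoids paying for parsing again.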

Sample Code

Here are examples of how to use the extraction API in different programming languages.

curl -X POST https://pdf.ai/api/v2/extract \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "docId=your-document-id" \
  -F 'schema={"type":"object","properties":{"title":{"type":"string"},"author":{"type":"string"}},"required":["title"]}' \
  -F "system_prompt=Extract document metadata accurately."

Replace placeholder values such as YOUR_API_KEY and your-document-id with your actual values.
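For readers working in Python, the same request could be built with the standard library alone. The sketch below mirrors the curl command: it encodes the text fields as multipart/form-data by hand and sets the X-API-Key header. The helper names (`encode_multipart`, `build_extract_request`, `send_request`) are hypothetical, and because the response shape is not described on this page, the reply is simply decoded as JSON.

```python
import json
import urllib.request
import uuid

API_URL = "https://pdf.ai/api/v2/extract"


def encode_multipart(fields):
    """Encode text-only fields as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_extract_request(api_key, doc_id, schema, system_prompt=None):
    """Build (but do not send) the POST request for a cached document."""
    fields = {"docId": doc_id, "schema": json.dumps(schema)}
    if system_prompt is not None:
        fields["system_prompt"] = system_prompt
    body, content_type = encode_multipart(fields)
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": content_type},
        method="POST",
    )


def send_request(request):
    """Send the request and decode the JSON reply (response shape not assumed)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `send_request(build_extract_request("YOUR_API_KEY", "your-document-id", schema, "Extract document metadata accurately."))` would then correspond to the curl example, with `schema` being the same JSON schema as a Python dict.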

Headers

| Name | Type | Description |
| --- | --- | --- |
| X-API-Key* | string | <API-Key> |

Request Format

Content type: multipart/form-data

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| schema | string (JSON) | Yes | JSON schema defining the structure to extract |
| system_prompt | string | No | Custom system prompt for extraction (default: "Be precise and thorough.") |
| docId | string | No | Document ID for caching parsed results |
| url | string | No | URL of the PDF to parse (alternative to file upload) |
| file | File | No | PDF file to upload (alternative to URL) |
| quality | string | No | Quality to use: 'standard' or 'advanced' (default: 'standard') |
| lang_list | array | No | List of languages to detect (default: ['en']) |

Response format

Credit Usage

Before data can be extracted from a PDF, the document must be parsed, which incurs parse credits unless a cached parse result is available. Parse credit usage is documented separately.

| Component | Condition | Credit Calculation |
| --- | --- | --- |
| Extraction Credits | Schema has ≤ 5 fields | 2 credits × page count |
| Extraction Credits | Schema has > 5 fields | 4 credits × page count |

Total Credit Formula

Total credits = Parse credits + Extraction credits

Examples

  • 10-page document, parse not cached, advanced quality, schema with 3 fields

    • Parse Credits: 2 × 10 = 20 credits

    • Extraction Credits: 2 × 10 = 20 credits

    • Total: 40 credits

  • 5-page document, cached parse, schema with 8 fields

    • Parse Credits: 0 credits (cached)

    • Extraction Credits: 4 × 5 = 20 credits

    • Total: 20 credits
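The credit formula and both examples above can be checked with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, parse credits are passed in as a number because parse rates are not defined on this page, and how fields are counted for nested schemas is not specified here, so the caller supplies the field count.

```python
def extraction_credits(page_count, field_count):
    """Extraction credits: 2 per page for schemas with <= 5 fields, else 4 per page."""
    rate = 2 if field_count <= 5 else 4
    return rate * page_count


def total_credits(page_count, field_count, parse_credits=0):
    """Total credits = parse credits + extraction credits.

    parse_credits is 0 when a cached parse result is used; otherwise the
    caller supplies the value from the parse pricing.
    """
    return parse_credits + extraction_credits(page_count, field_count)


# First example: 10 pages, 3 fields, 20 parse credits (not cached).
print(total_credits(10, 3, parse_credits=20))  # 40
# Second example: 5 pages, 8 fields, cached parse.
print(total_credits(5, 8))  # 20
```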
