Parse

Parses a PDF into Markdown or JSON with support for OCR, table formatting, and image extraction. Ideal for content extraction, knowledge base creation, and retrieval-augmented generation (RAG) workflows.

POST https://pdf.ai/api/v2/parse

Returns JSON schema and citations given a docId, url, or file.

Caching

When you pass a docId, the results of the parse are cached for subsequent calls with the same settings. Cached responses use 0 credits. Using a docId with caching also saves credits on the /extract and /split endpoints.

Sample Requests

curl -X POST https://pdf.ai/api/v2/parse \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@/path/to/document.pdf" \
  -F "quality=standard" \
  -F "lang_list=[\"en\"]"

Replace placeholder values such as YOUR_API_KEY and /path/to/document.pdf with your actual values.
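For readers calling the endpoint from Python, the curl sample above could be translated roughly as follows. This is a minimal sketch, not an official client: it assumes the third-party `requests` library is installed, the helper names `parse_form_fields` and `parse_pdf` are hypothetical, and the 120-second timeout is an arbitrary choice. The `lang_list` field is sent as a JSON array string, mirroring the curl sample.

```python
import json

PARSE_URL = "https://pdf.ai/api/v2/parse"


def parse_form_fields(quality="standard", lang_list=("en",)):
    """Build the non-file form fields used in the curl sample.

    lang_list is encoded as a JSON array string, e.g. '["en"]',
    to match -F "lang_list=[\"en\"]" in the sample request.
    """
    return {"quality": quality, "lang_list": json.dumps(list(lang_list))}


def parse_pdf(path, api_key, quality="standard", lang_list=("en",)):
    """Upload a local PDF as multipart/form-data and return the raw response."""
    import requests  # third-party dependency (assumption), imported lazily

    with open(path, "rb") as fh:
        response = requests.post(
            PARSE_URL,
            headers={"X-API-Key": api_key},
            files={"file": fh},  # requests sets the multipart Content-Type itself
            data=parse_form_fields(quality, lang_list),
            timeout=120,  # arbitrary value chosen for this sketch
        )
    response.raise_for_status()  # raise on 4xx/5xx status codes
    return response
```

A call would look like `parse_pdf("/path/to/document.pdf", "YOUR_API_KEY")`. The function returns the raw response object because the response body layout is not described in this section.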

Headers

| Name | Type | Description |
| --- | --- | --- |
| X-API-Key* | string | <API-Key> |

Request Format

Content-Type: multipart/form-data

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| docId | string | No | Document ID for caching parsed results |
| url | string | No | URL of the PDF to parse (alternative to file upload) |
| file | File | No | PDF file to upload (alternative to URL) |
| quality | string | No | Quality to use: 'standard' or 'advanced' (default: 'standard') |
| lang_list | array | No | List of languages to detect (default: ['en']) |
| llm | boolean | No | Enable LLM processing for images (default: false) |

Quality Options

  • standard: This is the default and supports language selection for OCR.

  • advanced: Uses Vision Language Models (VLMs), which may offer better accuracy for complex documents. Language selection is not needed.
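The parameter rules above can be sketched as a small client-side helper that assembles the form fields before a request is sent. The function name `build_parse_fields` and the `has_file` flag are hypothetical, and several behaviors are assumptions rather than documented facts: that at least one of docId, url, or file must be supplied, that `lang_list` is sent as a JSON array string (as in the curl sample), and that the boolean `llm` is sent as the string "true".

```python
import json


def build_parse_fields(doc_id=None, url=None, has_file=False,
                       quality="standard", lang_list=None, llm=False):
    """Assemble form fields for POST /api/v2/parse (illustrative only)."""
    if quality not in ("standard", "advanced"):
        raise ValueError("quality must be 'standard' or 'advanced'")
    # Assumption: the API needs at least one document source.
    if not (doc_id or url or has_file):
        raise ValueError("provide a docId, a url, or a file")

    fields = {"quality": quality}
    if doc_id:
        fields["docId"] = doc_id
    if url:
        fields["url"] = url
    # Only standard quality uses language selection for OCR.
    if quality == "standard":
        fields["lang_list"] = json.dumps(lang_list or ["en"])
    # Assumption: booleans are encoded as the string "true" in form data.
    if llm:
        fields["llm"] = "true"
    return fields
```

For example, `build_parse_fields(url="https://example.com/a.pdf", quality="advanced")` omits `lang_list` entirely, since the advanced option does not need it.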

Response format

Credit Usage

Credits are based on the parsing quality, page count, and optional LLM processing.

Cached results use 0 credits.

| Quality | Base Cost per Page | LLM Cost (if enabled) |
| --- | --- | --- |
| standard | 1 credit | +1 credit per 5 images (rounded up) |
| advanced | 2 credits | +1 credit per 5 images (rounded up) |

Credit Usage Examples

  • 10-page document, standard quality, no LLM:

    • Credits: 1 × 10 = 10 credits

  • 10-page document, advanced quality, no LLM:

    • Credits: 2 × 10 = 20 credits

  • 10-page document, standard quality, LLM enabled, 12 images:

    • Base: 1 × 10 = 10 credits

    • LLM: 1 × ceil(12 / 5) = 1 × 3 = 3 credits

    • Total: 13 credits

  • Cached result (any configuration):

    • Credits: 0 credits
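The arithmetic in these examples can be captured in a short estimator, which may be handy for budgeting before a batch of calls. This is a sketch based only on the pricing stated above. The function name `estimate_credits` and its parameters are hypothetical, and actual billing is determined by the API.

```python
import math

# Base cost per page for each quality level, as listed under Credit Usage.
BASE_COST_PER_PAGE = {"standard": 1, "advanced": 2}
IMAGES_PER_LLM_CREDIT = 5  # +1 credit per 5 images, rounded up


def estimate_credits(pages, quality="standard", llm=False, images=0, cached=False):
    """Estimate the credits for one parse call."""
    if cached:
        return 0  # cached results use 0 credits
    if quality not in BASE_COST_PER_PAGE:
        raise ValueError("quality must be 'standard' or 'advanced'")
    base = BASE_COST_PER_PAGE[quality] * pages
    llm_cost = math.ceil(images / IMAGES_PER_LLM_CREDIT) if llm else 0
    return base + llm_cost


print(estimate_credits(10))                                   # 10
print(estimate_credits(10, quality="advanced"))               # 20
print(estimate_credits(10, llm=True, images=12))              # 13
print(estimate_credits(10, quality="advanced", cached=True))  # 0
```

The four printed values correspond to the four examples listed above.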
