Parse

Parses a PDF into Markdown or JSON with support for OCR, table formatting, and image extraction. Ideal for content extraction, knowledge base creation, and retrieval-augmented generation (RAG) workflows.

POST https://pdf.ai/api/v2/parse

Returns JSON schema and citations given a docId, url, or file.

Caching

When you pass a docId, the parse results are cached for subsequent calls with the same settings. Cached responses use 0 credits. Using a docId with caching also saves credits on the /extract and /split endpoints.
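A minimal Python sketch of reusing a docId, assuming the docId field returned by an earlier /parse response is the value to send back. The helper names followup_fields and as_multipart are illustrative and not part of the API. The (None, value) tuples are the requests idiom for sending plain fields as multipart/form-data without a file.

```python
import json


def followup_fields(first_response, quality="standard", lang_list=("en",)):
    """Build form fields that re-request a parse by docId.

    Assumption: the "docId" in a previous response is the value the
    docId parameter expects. Settings are repeated unchanged because
    caching applies to calls with the same settings.
    """
    return {
        "docId": first_response["docId"],
        "quality": quality,
        # lang_list is sent as a JSON-encoded string, as in the sample requests
        "lang_list": json.dumps(list(lang_list)),
    }


def as_multipart(fields):
    """Wrap plain fields so requests encodes them as multipart/form-data."""
    return {name: (None, value) for name, value in fields.items()}


# Hypothetical earlier response, trimmed to the one field used here
fields = followup_fields({"docId": "example-doc-id"})
print(as_multipart(fields))
```

Passing `files=as_multipart(fields)` to `requests.post` together with the X-API-Key header should then repeat the parse with identical settings, which is the condition the caching description gives for a 0-credit response.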

Sample Requests

curl -X POST https://pdf.ai/api/v2/parse \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@/path/to/document.pdf" \
  -F "quality=standard" \
  -F "lang_list=[\"en\"]"

import requests

url = "https://pdf.ai/api/v2/parse"
headers = {"X-API-Key": "YOUR_API_KEY"}

with open("/path/to/document.pdf", "rb") as f:
    files = {"file": f}
    data = {
        "quality": "standard",
        "lang_list": '["en"]'
    }
    response = requests.post(url, headers=headers, files=files, data=data)

print(response.json())

const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

const form = new FormData();
form.append('file', fs.createReadStream('/path/to/document.pdf'));
form.append('quality', 'standard');
form.append('lang_list', '["en"]');

axios.post('https://pdf.ai/api/v2/parse', form, {
  headers: {
    'X-API-Key': 'YOUR_API_KEY',
    ...form.getHeaders()
  }
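}).then(response => {
  console.log(response.data);
}).catch(error => {
  // Log the API error body when available, otherwise the network error
  console.error(error.response ? error.response.data : error.message);
});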

<?php
$url = "https://pdf.ai/api/v2/parse";
$apiKey = "YOUR_API_KEY";

$postFields = [
    'file' => new CURLFile('/path/to/document.pdf'),
    'quality' => 'standard',
    'lang_list' => '["en"]'
];

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
curl_setopt($ch, CURLOPT_HTTPHEADER, ["X-API-Key: $apiKey"]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);
curl_close($ch);

echo $response;
?>

Replace placeholder values like YOUR_API_KEY with your actual values.

Headers

| Name | Type | Description |
| --- | --- | --- |
| X-API-Key* | string | <API-Key> |

Request Format

Content-Type: multipart/form-data

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| docId | string | | Document ID for caching parsed results |
| url | string | | URL of the PDF to parse (alternative to file upload) |
| file | File | | PDF file to upload (alternative to URL) |
| quality | string | | Quality to use: 'standard' or 'advanced' (default: 'standard') |
| lang_list | array | | List of languages to detect (default: ['en']) |
| llm | boolean | | Enable LLM processing for images (default: false) |
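The parameter descriptions present docId, url, and file as alternative ways to identify the document. A small sketch of a client-side check, assuming exactly one of them should be sent per request (the API itself may accept other combinations). pick_source is a hypothetical helper, not part of the API.

```python
def pick_source(doc_id=None, url=None, file_path=None):
    """Return the single (field_name, value) pair identifying the document.

    Assumption: exactly one of docId, url, or file is sent per request.
    """
    candidates = {"docId": doc_id, "url": url, "file": file_path}
    provided = [(name, value) for name, value in candidates.items() if value]
    if len(provided) != 1:
        names = ", ".join(name for name, _ in provided) or "none"
        raise ValueError(
            "provide exactly one of docId, url, or file; got " + names
        )
    return provided[0]


print(pick_source(url="https://example.com/report.pdf"))
# ('url', 'https://example.com/report.pdf')
```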

Quality Options

standard: The default. Supports language selection for OCR.
advanced: Uses Vision Language Models (VLM), which may offer better accuracy for complex documents. No language selection is needed.
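The two options translate into slightly different form fields. A sketch, assuming lang_list is JSON-encoded as in the sample requests and is simply left out for advanced quality. quality_fields is a hypothetical helper name.

```python
import json


def quality_fields(quality="standard", lang_list=None):
    """Build the quality-related form fields for a /parse request."""
    if quality not in ("standard", "advanced"):
        raise ValueError("quality must be 'standard' or 'advanced'")
    fields = {"quality": quality}
    if quality == "standard":
        # Only standard quality uses language selection for OCR
        fields["lang_list"] = json.dumps(lang_list or ["en"])
    return fields


print(quality_fields("standard", ["en", "de"]))
print(quality_fields("advanced"))
```

Omitting lang_list for advanced is a design choice in this sketch; the description above only says the selection is not needed, not that sending it is an error.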

Response Format

{
  "success": true,
  "markdown": "string",
  "contents": [
    {
      "bbox": [number, number, number, number], // [x0, y0, x1, y1]
      "content": "string",
      "pageNumber": number,
      "type": "string (optional)",
      "imageIds": ["string"] (optional),
      "conf": number (optional),
      "description": "string (optional)"
    }
  ],
  "images": [
    {
      "id": "string",
      "data": "string",
      "pageNumber": number (optional),
      "bbox": [number, number, number, number] (optional),
      "description": "string (optional)"
    }
  ],
  "pageCount": number,
  "docId": "string"
}
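As an illustration of consuming this structure, the sketch below groups the content strings by page and optionally drops low-confidence blocks. The sample payload is invented for the example, and two things are assumptions not stated in the format above: that conf is a score between 0 and 1, and that pageNumber starts at 1. contents_by_page is a hypothetical helper.

```python
from collections import defaultdict


def contents_by_page(response, min_conf=None):
    """Group content strings by pageNumber, skipping low-confidence blocks.

    Blocks without a conf value are always kept, since conf is optional.
    """
    pages = defaultdict(list)
    for item in response.get("contents", []):
        conf = item.get("conf")
        if min_conf is not None and conf is not None and conf < min_conf:
            continue
        pages[item["pageNumber"]].append(item["content"])
    return dict(pages)


# Hypothetical response shaped like the format above (values are made up)
sample = {
    "success": True,
    "markdown": "# Title\n\nBody text\n\nNext page",
    "contents": [
        {"bbox": [0, 0, 100, 20], "content": "Title", "pageNumber": 1, "conf": 0.98},
        {"bbox": [0, 30, 100, 60], "content": "Body text", "pageNumber": 1, "conf": 0.42},
        {"bbox": [0, 0, 100, 20], "content": "Next page", "pageNumber": 2},
    ],
    "images": [],
    "pageCount": 2,
    "docId": "example-doc-id",
}

print(contents_by_page(sample, min_conf=0.5))
# {1: ['Title'], 2: ['Next page']}
```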

Credit Usage

Credits are based on the parsing quality, page count, and optional LLM processing.

Cached results use 0 credits.

| Quality | Base Cost per Page | LLM Cost (if enabled) |
| --- | --- | --- |
| standard | 1 credit | +1 credit per 5 images (rounded up) |
| advanced | 2 credits | +1 credit per 5 images (rounded up) |

Credit Usage Examples

10-page document, standard quality, no LLM:
- Credits: 1 × 10 = 10 credits
10-page document, advanced quality, no LLM:
- Credits: 2 × 10 = 20 credits
10-page document, standard quality, LLM enabled, 12 images:
- Base: 1 × 10 = 10 credits
- LLM: 1 × ceil(12 / 5) = 1 × 3 = 3 credits
- Total: 13 credits
Cached result (any configuration):
- Credits: 0 credits
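The pricing rules can be written as a small estimator that reproduces the examples above. This is a client-side sketch for budgeting only; estimate_credits and its parameter names are hypothetical, and the actual charge is whatever the API reports.

```python
import math

# Base cost per page, taken from the credit table
BASE_CREDITS_PER_PAGE = {"standard": 1, "advanced": 2}


def estimate_credits(page_count, quality="standard", llm=False,
                     image_count=0, cached=False):
    """Estimate credits for one /parse call under the documented rules."""
    if cached:
        return 0  # cached results use 0 credits
    credits = BASE_CREDITS_PER_PAGE[quality] * page_count
    if llm:
        # +1 credit per 5 images, rounded up
        credits += math.ceil(image_count / 5)
    return credits


print(estimate_credits(10))                               # 10
print(estimate_credits(10, quality="advanced"))           # 20
print(estimate_credits(10, llm=True, image_count=12))     # 13
print(estimate_credits(10, quality="advanced", cached=True))  # 0
```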
