# Parse

<mark style="color:green;">`POST`</mark> `https://pdf.ai/api/v2/parse`

Returns the parsed content of a PDF as JSON, including citations, given a `docId`, `url`, or `file`.

#### Caching

When you pass a `docId`, the parse results are cached for subsequent calls with the same settings. Cached responses use 0 credits. Using a `docId` with caching also saves credits on the `/extract` and `/split` endpoints.
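
A minimal sketch of the caching flow, assuming a `docId` (for example, the one in an earlier parse response) is sent back as a `docId` form field with the same settings. The helper names `parse_fields` and `parse_cached` are hypothetical and not part of the API. The `(None, value)` tuples are the `requests` convention for sending plain fields as `multipart/form-data` without a file.

```python
import json

PARSE_URL = "https://pdf.ai/api/v2/parse"


def parse_fields(doc_id, quality="standard", lang_list=("en",)):
    """Build multipart form fields for a docId-based parse request.

    Hypothetical helper. Each (None, value) tuple makes requests send a
    plain form field, so the body is multipart/form-data with no file part.
    """
    return {
        "docId": (None, doc_id),
        "quality": (None, quality),
        # lang_list is sent as a JSON-encoded string, as in the samples below
        "lang_list": (None, json.dumps(list(lang_list))),
    }


def parse_cached(doc_id, api_key, **settings):
    """POST a parse request for an existing docId and return the JSON body."""
    import requests  # third-party; imported here so parse_fields has no dependencies

    response = requests.post(
        PARSE_URL,
        headers={"X-API-Key": api_key},
        files=parse_fields(doc_id, **settings),
        timeout=60,
    )
    response.raise_for_status()
    return response.json()
```

Calling `parse_cached(doc_id, "YOUR_API_KEY")` twice with identical settings should, per the caching rule above, make the second call a cached response.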

#### Sample Requests

{% tabs %}
{% tab title="cURL" %}

```shellscript
curl -X POST https://pdf.ai/api/v2/parse \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@/path/to/document.pdf" \
  -F "quality=standard" \
  -F "lang_list=[\"en\"]"
```

{% endtab %}

{% tab title="Python" %}

```python
import requests

url = "https://pdf.ai/api/v2/parse"
headers = {"X-API-Key": "YOUR_API_KEY"}

with open("/path/to/document.pdf", "rb") as f:
    files = {"file": f}
    data = {
        "quality": "standard",
        "lang_list": '["en"]'
    }
    response = requests.post(url, headers=headers, files=files, data=data)

print(response.json())
```

{% endtab %}

{% tab title="Node.js" %}

```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

const form = new FormData();
form.append('file', fs.createReadStream('/path/to/document.pdf'));
form.append('quality', 'standard');
form.append('lang_list', '["en"]');

axios.post('https://pdf.ai/api/v2/parse', form, {
  headers: {
    'X-API-Key': 'YOUR_API_KEY',
    ...form.getHeaders()
  }
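}).then(response => {
  console.log(response.data);
}).catch(error => {
  // Log the API error body if there is one, otherwise the network error message
  console.error(error.response ? error.response.data : error.message);
});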
```

{% endtab %}

{% tab title="PHP" %}

```php
<?php
$url = "https://pdf.ai/api/v2/parse";
$apiKey = "YOUR_API_KEY";

$postFields = [
    'file' => new CURLFile('/path/to/document.pdf'),
    'quality' => 'standard',
    'lang_list' => '["en"]'
];

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
curl_setopt($ch, CURLOPT_HTTPHEADER, ["X-API-Key: $apiKey"]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);
curl_close($ch);

echo $response;
?>
```

{% endtab %}
{% endtabs %}

Replace placeholder values like `YOUR_API_KEY` and `/path/to/document.pdf` with your actual values.

#### Headers

| Name                                        | Type   | Description |
| ------------------------------------------- | ------ | ----------- |
| X-API-Key<mark style="color:red;">\*</mark> | string | Your API key |

#### Request Format

Content-Type: `multipart/form-data`

#### Request Parameters

| Parameter  | Type    | Required | Description                                                    |
| ---------- | ------- | -------- | -------------------------------------------------------------- |
| docId      | string  | No       | Document ID for caching parsed results                         |
| url        | string  | No       | URL of the PDF to parse (alternative to file upload)           |
| file       | File    | No       | PDF file to upload (alternative to URL)                        |
| quality    | string  | No       | Quality to use: 'standard' or 'advanced' (default: 'standard') |
| lang\_list | array   | No       | List of languages to detect, sent as a JSON-encoded array string (default: \['en']) |
| llm        | boolean | No       | Enable LLM processing for images (default: false)              |

Although `docId`, `url`, and `file` are each marked as not required, a request needs one of them to identify the document to parse.

#### Quality Options

* **standard**: This is the default and supports language selection for OCR.
* **advanced**: Uses Vision Language Models (VLMs), which may offer better accuracy for complex documents. No language selection is needed.
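
The two quality options can be sketched as a small builder for the form fields. This is an illustration only: `build_form_data` is a hypothetical helper, and sending `llm` as the string `"true"` is an assumption about how booleans are encoded in the form, not something this page confirms.

```python
import json

VALID_QUALITIES = ("standard", "advanced")


def build_form_data(quality="standard", lang_list=None, llm=False):
    """Return the non-file form fields for a parse request (hypothetical helper)."""
    if quality not in VALID_QUALITIES:
        raise ValueError(f"quality must be one of {VALID_QUALITIES}, got {quality!r}")

    data = {"quality": quality}

    # Language selection applies to standard-quality OCR only;
    # advanced quality does not need it, so the field is left out.
    if quality == "standard":
        data["lang_list"] = json.dumps(lang_list or ["en"])

    if llm:
        # Assumption: the boolean is sent as a lowercase string form field.
        data["llm"] = "true"

    return data
```

The returned dictionary can be passed as the `data` argument of `requests.post`, alongside `files` for an upload.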

#### Response Format

{% tabs %}
{% tab title="200 Parsed content" %}
{% code overflow="wrap" %}

```json
{
  "success": true,
  "markdown": "string",
  "contents": [
    {
      "bbox": [number, number, number, number], // [x0, y0, x1, y1]
      "content": "string",
      "pageNumber": number,
      "type": "string", // optional
      "imageIds": ["string"], // optional
      "conf": number, // optional
      "description": "string" // optional
    }
  ],
  "images": [
    {
      "id": "string",
      "data": "string",
      "pageNumber": number, // optional
      "bbox": [number, number, number, number], // optional
      "description": "string" // optional
    }
  ],
  "pageCount": number,
  "docId": "string"
}
```

{% endcode %}
{% endtab %}

{% tab title="401 Invalid API key" %}

```json
{
    "error": "Invalid API key"
}
```

{% endtab %}

{% tab title="400 No API key or docId is present" %}

```json
{
    "error": "No API key present"
}
```

{% endtab %}
{% endtabs %}
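
As a rough illustration of consuming the response shapes above, the sketch below groups the `content` strings by `pageNumber` and raises if the body carries an `error` field. The function name `text_by_page` is hypothetical, and the `sample` dictionary is made-up data in the documented shape, not real API output.

```python
from collections import defaultdict


def text_by_page(result):
    """Group the `content` strings of a parse result by `pageNumber`."""
    # Error responses carry a single "error" field instead of parsed content.
    if "error" in result:
        raise RuntimeError(result["error"])

    pages = defaultdict(list)
    for block in result.get("contents", []):
        pages[block["pageNumber"]].append(block["content"])
    return dict(pages)


# Illustrative data only, shaped like the 200 response.
sample = {
    "success": True,
    "markdown": "# Title\n\nFirst paragraph.\n\nSecond page.",
    "contents": [
        {"bbox": [0, 0, 100, 20], "content": "Title", "pageNumber": 1},
        {"bbox": [0, 30, 100, 60], "content": "First paragraph.", "pageNumber": 1},
        {"bbox": [0, 0, 100, 20], "content": "Second page.", "pageNumber": 2},
    ],
    "images": [],
    "pageCount": 2,
    "docId": "example-doc-id",
}

pages = text_by_page(sample)
```

Optional fields such as `type`, `conf`, and `imageIds` may be absent, so code that reads them should use `block.get(...)` rather than direct indexing.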

#### Credit Usage

Credits are based on the parsing quality, page count, and optional LLM processing.

**Cached results use 0 credits.**

| Quality    | Base Cost per Page | LLM Cost (if enabled)               |
| ---------- | ------------------ | ----------------------------------- |
| `standard` | 1 credit           | +1 credit per 5 images (rounded up) |
| `advanced` | 2 credits          | +1 credit per 5 images (rounded up) |

#### Credit Usage Examples

* **10-page document, standard quality, no LLM:**
  * Credits: `1 × 10 = 10 credits`
* **10-page document, advanced quality, no LLM:**
  * Credits: `2 × 10 = 20 credits`
* **10-page document, standard quality, LLM enabled, 12 images:**
  * Base: `1 × 10 = 10 credits`
  * LLM: `1 × ceil(12 / 5) = 1 × 3 = 3 credits`
  * Total: `13 credits`
* **Cached result (any configuration):**
  * Credits: `0 credits`
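
The arithmetic in these examples can be written as a small estimator. This is a sketch of the pricing table as stated on this page, not an official calculator; `estimate_credits` and its parameter names are hypothetical.

```python
import math

# Base cost per page for each quality level, from the table above.
BASE_COST_PER_PAGE = {"standard": 1, "advanced": 2}


def estimate_credits(pages, quality="standard", llm=False, images=0, cached=False):
    """Estimate the credits one parse call would use."""
    if cached:
        return 0  # cached results use 0 credits

    credits = BASE_COST_PER_PAGE[quality] * pages
    if llm:
        # +1 credit per 5 images, rounded up
        credits += math.ceil(images / 5)
    return credits
```

For instance, `estimate_credits(10, "standard", llm=True, images=12)` reproduces the third example: 10 base credits plus 3 LLM credits.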
