Read PDFs in n8n Without Losing Your Mind: Read Binary File + Read PDF
Manually copying text out of PDFs is a special kind of torture. Highlight, copy, paste, fix weird line breaks, repeat. If you are doing this more than once, you deserve better.
That is where n8n comes in. With just two nodes, Read Binary File and Read PDF, you can turn stubborn PDF files into clean, usable text that flows through your automations like it always belonged there.
This guide walks you through a simple n8n workflow that:
- Reads a PDF from disk (for example /data/pdf.pdf).
- Extracts searchable text using the Read PDF node.
- Makes that text available to any downstream node for processing, indexing, or sending around the internet.
What this n8n PDF workflow actually does
At a high level, the workflow looks like this:
- Something triggers the workflow (manual, webhook, schedule, you name it).
- Read Binary File grabs the PDF file from your n8n environment and turns it into binary data.
- Read PDF takes that binary data and extracts the readable text from it.
- You use the extracted text in later nodes for emails, search indexing, or AI magic.
That is it. No more copy-paste marathons, just two nodes quietly doing the boring stuff for you.
Before you start: what you need
- An n8n instance running on desktop, Docker, or n8n cloud.
- A PDF file that n8n can actually see.
  - For Docker, put it in a mounted folder, for example /data/pdf.pdf.
- A bit of basic n8n knowledge: how to add nodes, connect them, and execute a workflow.
If you are comfortable dragging nodes onto the canvas and clicking “Execute”, you are ready.
Quick-start: import the example workflow
If you prefer to start from something that already works instead of building from scratch, here is a minimal JSON workflow you can import into n8n:
{
  "nodes": [
    {
      "name": "On clicking 'execute'",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [680, 400],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "name": "Read Binary File",
      "type": "n8n-nodes-base.readBinaryFile",
      "position": [880, 400],
      "parameters": { "filePath": "/data/pdf.pdf" },
      "typeVersion": 1
    },
    {
      "name": "Read PDF",
      "type": "n8n-nodes-base.readPDF",
      "position": [1090, 400],
      "parameters": {},
      "typeVersion": 1
    }
  ],
  "connections": {
    "Read Binary File": {
      "main": [[{ "node": "Read PDF", "type": "main", "index": 0 }]]
    },
    "On clicking 'execute'": {
      "main": [[{ "node": "Read Binary File", "type": "main", "index": 0 }]]
    }
  }
}
Import that into your n8n instance, point it at your own PDF, and you are already halfway to automated PDF bliss.
Step-by-step setup (with just enough detail)
1. Add a trigger so the workflow actually runs
First, drop in a trigger node. For testing, the Manual Trigger is perfect:
- Add the Manual Trigger node.
- Use the “Execute workflow” button to run it on demand.
Once you move to production, you can swap this for something more serious like:
- A Webhook trigger, so an incoming HTTP request starts the workflow.
- A Schedule trigger to process PDFs regularly.
- A file watcher style trigger if you use integrations that support that pattern.
2. Read Binary File: get the PDF into n8n
Next, drag in the Read Binary File node and connect it to your trigger. Configure it like this:
- File Path: the path to your PDF inside the n8n runtime, for example:
/data/pdf.pdf
- Binary Property (optional): this is where the file contents are stored in the item’s binary section.
  - By default, it uses something like data or file.
  - After running the node, check the Execution Data to see the exact property name.
Run the workflow once, click on the Read Binary File node, and inspect the output. You are looking for:
- A binary section that confirms the file was read successfully.
- The property name inside binary, for example data. You will need this for the next node.
Think of this node as the “bring the PDF to the party” step.
3. Read PDF: extract the text and stop suffering
Now for the fun part. Add a Read PDF node and connect it to Read Binary File.
Configure the Read PDF node:
- Binary Property: enter the exact binary property name from the previous node, for example:
data
- Page Range (optional, if your version of the node exposes this setting):
  - Use this if you only need specific pages instead of the whole document.
  - Handy when the PDF is huge and you only care about page 1 or the last page.
- Output:
  - The node outputs JSON that includes the extracted text.
  - The text is usually in a field like text. Check the execution output to confirm the exact property name.
Click Execute again and inspect the Read PDF node’s output. If your PDF has selectable text, you should now see clean, readable content sitting in the JSON output instead of inside a locked file.
If the PDF is just scanned images with no text layer, you will get little or nothing back. That is normal, and it means you need OCR, not just text extraction. More on that in troubleshooting.
Using the extracted PDF text in your automations
Once the Read PDF node has done its job, the text is available to any node downstream. This is where the real automation fun begins.
Common use cases
- Email the contents: use the extracted text as the body of an email, or include it as part of a summary.
- Index for search: send the text to Elasticsearch, Algolia, or a vector store so you can search or embed it.
- Run AI or NLP: feed the text into an AI node or external API for summarization, classification, or entity extraction.
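For indexing or AI use cases, long documents often need to be cut into smaller pieces before they are sent anywhere. The following is a minimal sketch of that idea, not part of n8n itself: the helper name chunkText and the default of 1000 characters are assumptions chosen for illustration.

```javascript
// Hypothetical helper: splits text into chunks of at most maxChars characters,
// preferring to break at a space so words stay intact.
function chunkText(text, maxChars = 1000) {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new Error('maxChars must be a positive integer');
  }
  const chunks = [];
  let rest = String(text || '').trim();
  while (rest.length > maxChars) {
    // Look for the last space inside the allowed window.
    let cut = rest.lastIndexOf(' ', maxChars);
    if (cut <= 0) cut = maxChars; // no space found: fall back to a hard cut
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

In a Function node you could call it on the extracted text and return one item per chunk, so each chunk flows to the indexing or AI node separately.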
Accessing the text in a Function node
Here is a simple Function node example that takes the text from Read PDF and exposes it under a new field:
// simple Function node that returns the extracted text as a new field
const text = items[0].json.text || '';
return [{ json: { extractedText: text } }];
Change the text property in the code if your field name is different.
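The example above only looks at the first item. If several PDFs pass through Read PDF in one run, a variation along these lines handles every item. It is a sketch that assumes the classic Function node convention of an items array and a text field; the helper name mapExtractedText is made up for illustration.

```javascript
// Hypothetical helper: copies the extracted text of every item into a new field.
// Assumes each item looks like { json: { text: "..." } }.
function mapExtractedText(items, field = 'text') {
  return items.map((item) => {
    const value = item && item.json ? item.json[field] : undefined;
    return {
      json: {
        // Fall back to an empty string when the field is missing or not a string.
        extractedText: typeof value === 'string' ? value : '',
      },
    };
  });
}

// Inside an n8n Function node the last line would be:
// return mapExtractedText(items);
```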
Using expressions in other nodes
You can also use the extracted text directly in any node parameter with an expression, for example:
{{$node["Read PDF"].json["text"]}}
Again, swap text with your actual property name if needed.
Troubleshooting: when your PDF refuses to cooperate
Sometimes PDFs like to be difficult. Here is how to handle the most common issues.
1. “File not found” in Read Binary File
If the Read Binary File node complains that it cannot find the file, check:
- File path correctness: make sure the path you entered matches the path inside the n8n runtime, not just your local machine.
- Docker volume mapping: if you use Docker, map a host folder to something like /data:
  -v /host/path:/data
  Then place your PDF inside that folder and reference it as /data/pdf.pdf.
- Permissions: confirm that the n8n process has permission to read the file.
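To check the path and permission points above from inside the runtime, a small Node.js script can help. This is only a sketch: it assumes you can run Node.js in the same environment (and as the same user) as n8n, and the helper name checkPdfPath is hypothetical.

```javascript
const fs = require('fs');

// Hypothetical helper: reports whether a path exists, is a regular file,
// and is readable by the current process.
function checkPdfPath(filePath) {
  try {
    const stats = fs.statSync(filePath);
    if (!stats.isFile()) {
      return { ok: false, reason: 'not a regular file' };
    }
    fs.accessSync(filePath, fs.constants.R_OK);
    return { ok: true, reason: 'readable' };
  } catch (err) {
    // Typical codes: ENOENT (path does not exist), EACCES (permission denied).
    return { ok: false, reason: err.code || err.message };
  }
}

console.log(checkPdfPath('/data/pdf.pdf'));
```

A reason of ENOENT points to a wrong path or a missing volume mapping, while EACCES points to permissions.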
2. Empty text or weird gibberish
If the Read PDF node returns nothing useful, the PDF is probably a scanned document without an embedded text layer.
Important detail: the Read PDF node only extracts existing text. It does not perform OCR.
For scanned PDFs, consider:
- Using an OCR tool like Tesseract via the Execute Command node.
- Calling an OCR API such as Google Vision or AWS Textract.
- Converting each page to an image and running OCR on each image.
Once you have OCR output, you can feed that text back into the rest of your workflow.
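Deciding whether a PDF needs OCR can be automated by checking how much real text came back. Here is one possible check; the function name needsOcr and the threshold of 20 characters are arbitrary assumptions you would tune for your own documents.

```javascript
// Hypothetical helper: decides whether extracted text is too thin to be useful.
// minChars is an arbitrary threshold chosen for illustration.
function needsOcr(text, minChars = 20) {
  if (typeof text !== 'string') return true;
  // Count only letters and digits, so whitespace and stray
  // page-break characters do not make an empty page look full.
  const meaningful = text.replace(/[^\p{L}\p{N}]/gu, '');
  return meaningful.length < minChars;
}
```

A Function node could store the result in a field such as ocrRequired (a made-up name), and an IF node could then route only those items to your OCR step.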
3. Binary property mismatch in Read PDF
If Read PDF complains about missing binary data, it usually means the binary property name does not match.
Fix it by:
- Opening the Read Binary File node output.
- Looking under the binary section for the property name, for example data.
- Pasting that exact name into the Binary Property field of the Read PDF node.
One small typo here and the node will pretend it never saw your file.
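If you would rather not dig through the execution view, a temporary Function node placed right after Read Binary File can print the property names for you. This is a debugging sketch that assumes the classic Function node items array; listBinaryProperties is a hypothetical helper name.

```javascript
// Hypothetical helper: lists the binary property names present on each item.
function listBinaryProperties(items) {
  return items.map((item, index) => ({
    json: {
      itemIndex: index,
      // An item without a binary section yields an empty list.
      binaryProperties: Object.keys((item && item.binary) || {}),
    },
  }));
}

// Inside an n8n Function node the last line would be:
// return listBinaryProperties(items);
```

Whatever name shows up in binaryProperties is the value to enter in the Read PDF node. Remove the debugging node afterwards, since it does not pass the binary data along.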
Advanced automation ideas for PDF processing
Once you have the basic PDF-to-text pipeline working, you can start to get fancy.
- Process multiple PDFs in a folder: list the files, then use a SplitInBatches node to loop through each file and send it through Read Binary File and Read PDF.
- Extract specific fields with regex: after extracting text, use a Function node and regular expressions to pull out invoice numbers, dates, totals, or other structured data.
- Automatic OCR fallback: run Read PDF first, then:
  - If the extracted text is empty, trigger an OCR service automatically.
This way you get the best of both worlds: fast extraction when text is available and OCR only when necessary.
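As a rough illustration of the regex idea from the list above, here is a sketch that pulls three fields out of invoice-like text. The patterns, the field names, and the helper extractFields are all assumptions for demonstration; real documents will need expressions tuned to their actual layout.

```javascript
// Illustrative patterns only: they expect labels such as "Invoice No:",
// ISO-style dates (YYYY-MM-DD), and totals with two decimal places.
const PATTERNS = {
  invoiceNumber: /Invoice\s*(?:Number|No\.?|#)[\s:]*([A-Z0-9][A-Z0-9-]*)/i,
  date: /\b(\d{4}-\d{2}-\d{2})\b/,
  total: /Total\s*:?\s*\$?\s*([\d,]+\.\d{2})/i,
};

// Hypothetical helper: returns the first match for each pattern, or null.
function extractFields(text) {
  const result = {};
  for (const [name, pattern] of Object.entries(PATTERNS)) {
    const match = pattern.exec(text || '');
    result[name] = match ? match[1] : null;
  }
  return result;
}
```

Fields that cannot be found come back as null, which makes it easy to route incomplete documents to manual review.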
Security and performance considerations
PDFs often contain sensitive data, so it is worth being a bit paranoid in a good way.
- Access control: limit who can access your n8n instance, especially if it handles personal or confidential documents.
- Data hygiene: before storing extracted text long term, consider redacting or cleaning sensitive parts.
- Large PDFs and memory: very large PDFs can be heavy to process. If you hit memory issues:
  - Split the document into smaller files or page ranges.
  - Process pages in batches instead of all at once.
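For the data hygiene point above, a simple redaction pass can run before the text is stored. The sketch below masks email addresses and long digit runs; the helper name redactText and both patterns are assumptions, and they are deliberately simple, so treat this as a starting point rather than a complete privacy filter.

```javascript
// Hypothetical helper: masks email addresses and long digit runs before storage.
// The patterns are simple on purpose and will not catch every kind of sensitive data.
function redactText(text) {
  return String(text || '')
    // Replace anything that looks like an email address.
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[REDACTED EMAIL]')
    // Replace runs of 9 or more digits, optionally separated by spaces or dashes,
    // which often correspond to account or card numbers.
    .replace(/\b\d(?:[ -]?\d){8,}\b/g, '[REDACTED NUMBER]');
}
```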
Recap: from stubborn PDF to usable text
By combining the Read Binary File and Read PDF nodes in n8n, you get a simple, reliable way to extract text from PDFs that live on your server or inside a container.
Key points to remember:
- Use Read Binary File to load the PDF into a binary property.
- Point Read PDF at that exact binary property name.
- Read PDF only works on searchable text, not raw scanned images.
- Once extracted, the text can be emailed, indexed, or fed into AI and NLP workflows.
If you want to extend this to cloud storage like Google Drive or S3, or to add a proper OCR fallback, you can build on the same pattern and just change how the file is fetched or how you handle empty text.
Next steps: try the n8n PDF template
Ready to stop manually wrestling with PDFs and let automation do the boring parts?
- Import the example workflow into your n8n instance.
- Point the Read Binary File node at your own PDF path.
- Execute the workflow and inspect the Read PDF output.
- Hook the extracted text into email nodes, search indexing, or AI processing.
If you have a specific PDF use case, like invoices, reports, or contracts, you can build a production-ready workflow with OCR, error handling, and storage integration on top of this base.
