
OCR a PDF Online — Extract Text from Scans, Free & Private

Turn a scanned document — a picture of text — into real, copyable text with optical character recognition. English and Thai are supported, and it all runs in your browser.

The first run downloads language data (about 10–20 MB); recognition then takes a few seconds per page.

What OCR Does, and When You Need It

A scanned page or a photographed document is just an image — to a computer it's a grid of pixels, not letters, so you can't select, copy or search the words in it. Optical Character Recognition (OCR) bridges that gap: it analyses the shapes in the image and reconstructs the actual text. That's what lets you lift a quote out of a scanned contract, make an old report searchable, or paste figures from a faxed invoice without retyping them.

That is the key difference between OCR and the PDF to Word tool, which extracts an existing text layer and so works only on born-digital PDFs. OCR creates a text layer from scratch for documents that never had one.
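One way to tell the two cases apart is to check whether a page already carries extractable text. The sketch below is only an illustration: it assumes text items shaped like those pdf.js returns from `page.getTextContent()` (objects with a `str` field), and the function name `needsOcr` and the `minChars` threshold are hypothetical, not something this tool is known to use.

```typescript
// Shape of a text item, modelled on what pdf.js's page.getTextContent() yields.
interface TextItemLike {
  str: string;
}

/**
 * Returns true when a page has (almost) no extractable text,
 * which suggests it is a scan and needs OCR.
 * `minChars` is a hypothetical threshold chosen for illustration.
 */
function needsOcr(items: TextItemLike[], minChars = 10): boolean {
  // Count non-whitespace characters across all text items on the page.
  const visibleChars = items
    .map((item) => item.str.replace(/\s+/g, ""))
    .join("").length;
  return visibleChars < minChars;
}

// A scanned page typically yields no text items; a born-digital page yields many.
const scannedPage: TextItemLike[] = [];
const digitalPage: TextItemLike[] = [{ str: "Quarterly report" }, { str: " 2023" }];
const scannedNeedsOcr = needsOcr(scannedPage);
const digitalNeedsOcr = needsOcr(digitalPage);
```

A threshold rather than a strict zero check is used here because some scans carry a stray page number or watermark as real text while the body is still an image.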

How In-Browser OCR Works

First, the pdf.js engine renders each page of your PDF to a high-resolution image inside the browser. Then Tesseract.js, a browser port of the open-source Tesseract OCR engine (whose development Google sponsored for many years), examines that image and recognises the characters. The resulting plain text is assembled page by page into a downloadable file.
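The two-stage flow described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not this tool's actual code: the rendering and recognition steps are passed in as functions so the sketch stays self-contained, and the names `OcrDeps`, `ocrDocument` and `assemblePages`, as well as the page-separator format, are hypothetical. In a real page, `renderPage` would typically wrap pdf.js's `page.render()` and `recognize` would wrap a Tesseract.js worker's `recognize()`.

```typescript
// A rendered page image; in a browser this would typically be a canvas.
type PageImage = unknown;

interface OcrDeps {
  pageCount: number;
  // Renders page `n` (1-based) to an image, e.g. via pdf.js page.render().
  renderPage: (n: number) => Promise<PageImage>;
  // Recognises text in an image, e.g. via a Tesseract.js worker's recognize().
  recognize: (image: PageImage) => Promise<string>;
}

// Joins per-page text into one document with a visible page separator.
// The separator format is an assumption made for this example.
function assemblePages(pageTexts: string[]): string {
  return pageTexts
    .map((text, i) => `--- Page ${i + 1} ---\n${text.trim()}`)
    .join("\n\n");
}

// Handles pages sequentially: render one page, recognise it, then move on.
async function ocrDocument(deps: OcrDeps): Promise<string> {
  const pageTexts: string[] = [];
  for (let n = 1; n <= deps.pageCount; n++) {
    const image = await deps.renderPage(n);
    pageTexts.push(await deps.recognize(image));
  }
  return assemblePages(pageTexts);
}
```

Processing pages one after another keeps memory use predictable on long documents, at the cost of not overlapping rendering with recognition.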

Crucially, this all happens on your own device: your scanned document is never uploaded. The only things downloaded are the open-source OCR engine and its language data, which are fetched once on the first run (around 10–20 MB) and then cached by your browser for next time.

Getting the Best Results

OCR accuracy depends heavily on input quality. Use the highest-resolution scan you have, make sure the page is straight and evenly lit, and pick the language that matches the document: English for an English page, Thai for a Thai page, or both for a mixed document. Clean printed text reads very well; faint photocopies, heavy backgrounds and handwriting are much harder and may need a clearer source.
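Both pieces of advice map onto simple settings. The sketch below shows how a language choice could be turned into Tesseract's language string (`eng`, `tha`, or `eng+tha` for a mixed document) and how a target resolution translates into a render scale, given that PDF page sizes are measured in points at 72 per inch. The function names and the `Language` type are hypothetical; how this tool actually configures its engine is not stated here.

```typescript
type Language = "english" | "thai";

// Tesseract's traineddata codes for the two supported languages.
const LANGUAGE_CODES: Record<Language, string> = { english: "eng", thai: "tha" };

// Builds a Tesseract language string such as "eng", "tha" or "eng+tha".
function toTesseractLangs(selected: Language[]): string {
  if (selected.length === 0) throw new Error("Select at least one language");
  // Drop duplicates while keeping the order the user chose.
  const unique = selected.filter((lang, i) => selected.indexOf(lang) === i);
  return unique.map((lang) => LANGUAGE_CODES[lang]).join("+");
}

// PDF page sizes are in points (72 per inch), so DPI / 72 is the render scale.
function renderScaleForDpi(dpi: number): number {
  if (!(dpi > 0)) throw new Error("dpi must be positive");
  return dpi / 72;
}

const mixedDocumentLangs = toTesseractLangs(["english", "thai"]); // "eng+tha"
const scaleFor144Dpi = renderScaleForDpi(144); // 2
```

Around 300 DPI is a commonly recommended target for Tesseract, which corresponds to a scale of roughly 4.17; higher values produce larger images and slower recognition without necessarily improving accuracy.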

OCR PDF — Frequently Asked Questions

Is my scanned document uploaded to a server for OCR?

No. Recognition runs entirely in your browser with Tesseract.js. Only the open-source OCR engine and language data are downloaded; your document never leaves your device.

Which languages are supported?

English and Thai, individually or together. Choosing the correct language — or both for a mixed document — gives the most accurate results.

Why does OCR take longer than the other tools?

Recognising characters from an image is computation-heavy and runs on your own device, so a few seconds per page is normal. The first run also has to download the language data before recognition begins.