What languages are supported?

English, Simplified Chinese, English + Chinese combined, and Japanese. The language model is downloaded once and cached in your browser.
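As a rough sketch, the four options above could map to Tesseract language codes as follows. The option names and the lookup table are hypothetical and not taken from FileKit's source. The codes `eng`, `chi_sim`, and `jpn` are Tesseract's standard identifiers, and Tesseract accepts several codes joined with `+`.

```typescript
// Hypothetical option names for the language selector.
type LanguageOption =
  | "english"
  | "chinese-simplified"
  | "english+chinese"
  | "japanese";

// Tesseract language codes for each option (assumed mapping).
const TESSERACT_CODES: Record<LanguageOption, string[]> = {
  "english": ["eng"],
  "chinese-simplified": ["chi_sim"],
  "english+chinese": ["eng", "chi_sim"],
  "japanese": ["jpn"],
};

// Builds the language string Tesseract expects, e.g. "eng+chi_sim".
function toTesseractLang(option: LanguageOption): string {
  return TESSERACT_CODES[option].join("+");
}
```

For example, `toTesseractLang("english+chinese")` returns `"eng+chi_sim"`.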

Can I OCR a scanned PDF?

Yes. The tool accepts both images and PDF files. For PDFs, each page is rendered as an image and then processed by the OCR engine.

How accurate is the OCR?

Accuracy depends on image quality. Clear, high-contrast printed text at 150+ DPI typically achieves 90–99% accuracy. Handwritten text, low-resolution scans, or unusual fonts may produce lower accuracy.
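To check whether a scan meets the 150 DPI guideline, divide the pixel count along one edge by the physical length of that edge in inches. The helper names below are illustrative only and are not part of the tool.

```typescript
// Minimum resolution recommended in the answer above.
const MIN_RECOMMENDED_DPI = 150;

// Effective resolution: pixels along one edge divided by that edge's length in inches.
function effectiveDpi(pixels: number, inches: number): number {
  if (pixels <= 0 || inches <= 0) {
    throw new RangeError("pixels and inches must be positive");
  }
  return pixels / inches;
}

// True when the scan reaches the recommended resolution.
function meetsRecommendedDpi(pixels: number, inches: number): boolean {
  return effectiveDpi(pixels, inches) >= MIN_RECOMMENDED_DPI;
}
```

A US Letter page is 8.5 inches wide, so a scan 2550 pixels wide works out to 300 DPI and passes, while one 850 pixels wide is only 100 DPI and falls short.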

Is my data sent to any server?

No. The entire OCR process runs locally in your browser using WebAssembly. Your images and text never leave your device.

Why is the first use slow?

The OCR engine and language data (~4 MB for English) must be downloaded on first use. This data is then cached in your browser, so subsequent uses start almost instantly.
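The download-once behaviour can be illustrated with a small cache wrapper. This is an in-memory stand-in for the browser's persistent cache. How FileKit actually stores the data is not stated here, and the `Loader` type and function name are hypothetical.

```typescript
// Hypothetical loader: downloads the language data for a code such as "eng".
type Loader = (lang: string) => Promise<Uint8Array>;

// Wraps a loader so that each language is downloaded at most once.
function createLanguageDataCache(
  load: Loader,
): (lang: string) => Promise<Uint8Array> {
  const entries = new Map<string, Promise<Uint8Array>>();
  return (lang: string): Promise<Uint8Array> => {
    let entry = entries.get(lang);
    if (entry === undefined) {
      // First request: start the download and remember the pending result.
      entry = load(lang);
      entries.set(lang, entry);
      // Forget failed downloads so a later call can retry.
      entry.catch(() => {
        entries.delete(lang);
      });
    }
    return entry;
  };
}
```

The first call for a language triggers the slow download. Later calls reuse the stored result, which is why subsequent runs start quickly.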

What languages does the OCR support?

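The underlying Tesseract engine supports many languages, including English, Chinese, Japanese, Korean, Spanish, French, and German. This tool currently offers English, Simplified Chinese, Japanese, and English + Chinese combined.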

How accurate is the text recognition?

Accuracy depends on image quality. Clean, high-resolution images typically achieve 95% accuracy or better. Blurry, rotated, or low-contrast images may produce more errors.

Can I OCR a multi-page PDF?

Yes. Each page is processed individually and the extracted text from all pages is combined in the output.
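The page-by-page flow can be sketched as below. The recognizer is passed in as a parameter, so the sketch makes no claim about the real OCR call. The type and function names are hypothetical, and the blank-line separator between pages is an assumption.

```typescript
// Hypothetical recognizer: takes one rendered page and resolves to its text.
type RecognizePage<Page> = (page: Page) => Promise<string>;

// Joins per-page text in page order, separated by a blank line (assumed format).
function combinePageTexts(texts: readonly string[]): string {
  return texts.map((t) => t.trim()).join("\n\n");
}

// Runs OCR on each page individually, in order, then combines the results.
async function ocrAllPages<Page>(
  pages: readonly Page[],
  recognize: RecognizePage<Page>,
): Promise<string> {
  const texts: string[] = [];
  for (const page of pages) {
    texts.push(await recognize(page));
  }
  return combinePageTexts(texts);
}
```

Processing pages one at a time keeps memory use predictable, at the cost of longer runs on large documents.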

Extract Text (OCR) Beta

Maximum file size: 50 MB per file. Supported formats: JPG, PNG, WebP, BMP, TIFF, PDF.
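A minimal sketch of how these upload limits might be checked before OCR starts. The function name, the `Candidate` shape, and the extra `jpeg` and `tif` spellings are assumptions, as is reading 50 MB as 50 × 1024 × 1024 bytes.

```typescript
// Assumes "50 MB" means 50 * 1024 * 1024 bytes.
const MAX_FILE_BYTES = 50 * 1024 * 1024;

// Listed formats plus common alternative extensions (assumed).
const SUPPORTED_EXTENSIONS = new Set([
  "jpg", "jpeg", "png", "webp", "bmp", "tif", "tiff", "pdf",
]);

// Minimal description of an uploaded file (hypothetical shape).
interface Candidate {
  name: string;
  size: number; // bytes
}

// Returns an error message, or null when the file is acceptable.
function validateFile(file: Candidate): string | null {
  const dot = file.name.lastIndexOf(".");
  const ext = dot === -1 ? "" : file.name.slice(dot + 1).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return `Unsupported format: "${ext}"`;
  }
  if (file.size > MAX_FILE_BYTES) {
    return "File exceeds the 50 MB limit";
  }
  return null;
}
```

For instance, `validateFile({ name: "scan.PDF", size: 1024 })` returns `null`, while a `.gif` file or a 60 MB PNG returns an error message.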

How OCR works

FileKit uses Tesseract.js, a WebAssembly port of the Tesseract OCR engine, to recognise text entirely in your browser. The language model is downloaded once (~4 MB for English) and cached locally; nothing is uploaded. For best results, use high-contrast images with clearly printed text at a resolution of at least 150 DPI.

How to OCR a Document

1. Upload an image or scanned PDF
Drag and drop a scanned document, photo of a page, or screenshot. Supported formats are JPG, PNG, WebP, BMP, TIFF, and PDF.

2. Select the language
Choose the primary language of the document: English, Chinese (Simplified), Japanese, or English + Chinese combined. Selecting the correct language improves accuracy significantly.

3. Extract and copy text
FileKit runs Tesseract.js (WebAssembly OCR) entirely in your browser. The recognised text appears in an editable area, where you can copy it or download it as a .txt file.
