Searchable PDF
Also known as: OCR PDF, text-searchable PDF, searchable scan
A searchable PDF is a scanned document that has a hidden, selectable text layer added by OCR (optical character recognition) beneath the page image. You see the original scan, but you can search, select, and copy the recognized text — unlike a plain image-only scan.
- A scan with a hidden OCR text layer underneath
- Searchable, selectable, and accessible — not just an image
- The text layer is tiny; page images set the file size
Image-only vs searchable
When you scan a page, the result is usually just a picture of the text: to a computer it is pixels, not words. OCR analyzes that image and writes the recognized characters into an invisible layer, each word positioned over its location in the scan, turning the file into a searchable PDF.
The visible page looks identical, but now Find works, screen readers can read it aloud, and you can copy passages. This is the difference between an archive you can dig through and a stack of pictures you cannot.
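The invisible layer can be made concrete with a minimal Python sketch. It is not how any particular OCR tool works internally. It assumes the common approach of drawing the recognized text with PDF text rendering mode 3 (`3 Tr`), which the PDF specification defines as neither filled nor stroked, so the words cannot be seen but can still be searched and copied. The function name `build_invisible_text_pdf` is hypothetical. The page deliberately has no scan image: a real searchable PDF would also draw the page image and place each recognized word over its position in that image.

```python
# Minimal sketch (hypothetical helper, not a real library API): build a
# one-page PDF whose only text is drawn with text rendering mode 3 ("3 Tr"),
# i.e. invisible but still present as searchable characters.
# A real searchable PDF would also draw the scanned page image; this sketch
# omits it to stay self-contained.


def build_invisible_text_pdf(text: str) -> bytes:
    # Escape characters that are special inside PDF literal strings.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: 12 pt Helvetica, render mode 3 (invisible), placed
    # at (72, 720) in points from the bottom-left corner of the page.
    content = f"BT /F1 12 Tf 3 Tr 72 720 Td ({safe}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, for the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes long.
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode()
    return bytes(out)
```

Writing the returned bytes to a `.pdf` file gives a page that looks blank, yet a viewer's Find command should still locate the text, which is exactly the behavior of a searchable scan minus the image. The sketch only handles Latin-1 characters because it relies on the built-in Helvetica font without a custom encoding.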
Making and using searchable PDFs
The text layer adds little to the file size because it stores only characters and their positions. The page images account for nearly all of the total, so compressing the scans is what keeps the file manageable.
To add a text layer to a scan, /tools/searchable-scan-to-pdf runs OCR and outputs a searchable PDF; /tools/ocr-pdf-to-text instead extracts the recognized words as plain text.