OCR (Optical Character Recognition) is the technology that converts images of text (scanned documents, photos of signs, screenshots) into machine-readable, editable text. Modern OCR uses deep learning models trained on millions of text samples to recognize character shapes, words, and layout structure. High-quality OCR can achieve 99%+ accuracy on clean printed text and can handle multiple languages, fonts, and layouts.
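To make an accuracy figure like "99%+" concrete, here is a minimal Python sketch of one common way to score OCR output: count the character edits needed to turn the OCR result into the correct text, then divide by the length of the correct text. The function names and the sample strings are illustrative assumptions, and OCR vendors may measure accuracy differently (for example, per word rather than per character).

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions separating the two strings."""
    # previous[j] = distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy = 1 - (edit distance / reference length)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical example: the OCR engine confused "l" with "1" and "0" with "O".
ground_truth = "Invoice total: 1050 USD"
ocr_output = "Invoice tota1: 1O50 USD"

print(edit_distance(ground_truth, ocr_output))
print(f"{character_accuracy(ground_truth, ocr_output):.1%}")
```

By this measure, 99% accuracy still means roughly one wrong character in every hundred, so a page of 2,000 characters would contain about 20 errors.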
OCR is essential for making scanned documents searchable. A scanned PDF without OCR is just a collection of page images: you can't select text, search for content, or copy text out. OCR processing adds an invisible "text layer" aligned with the images, which makes the document searchable. This is the difference between a "scanned PDF" (images only) and a "searchable PDF" (images plus a text layer).
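The difference between the two kinds of PDF can be sketched with a toy model in Python. This is not how any real PDF library represents a document: the `Page` class, the `search` and `add_text_layer` functions, and the lookup table standing in for an OCR engine are all hypothetical, chosen only to show why search finds nothing until a text layer exists.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    image: bytes                      # the scanned picture of the page
    text_layer: Optional[str] = None  # recognized text; None until OCR has run


def search(pages: List[Page], query: str) -> List[int]:
    """Return the 1-based numbers of pages whose text layer contains the query."""
    needle = query.lower()
    return [
        number
        for number, page in enumerate(pages, start=1)
        if page.text_layer is not None and needle in page.text_layer.lower()
    ]


def add_text_layer(pages: List[Page], recognize: Callable[[bytes], str]) -> List[Page]:
    """Return new pages that keep each image and add the text `recognize` produces."""
    return [Page(page.image, recognize(page.image)) for page in pages]


# A "scanned PDF": images only, so there is nothing for search to match.
scanned = [Page(b"page-1-pixels"), Page(b"page-2-pixels")]
print(search(scanned, "invoice"))

# Stand-in for a real OCR engine (assumption): a fixed lookup table.
fake_ocr_results = {
    b"page-1-pixels": "Invoice 2024-001",
    b"page-2-pixels": "Shipping label",
}
searchable = add_text_layer(scanned, lambda image: fake_ocr_results[image])
print(search(searchable, "invoice"))
```

Note that the images are kept unchanged in the searchable version. The text layer is extra data stored with each page, which is why a searchable PDF still looks identical to the original scan.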
FileCurve's Image to Text tool uses OCR to extract text from images and scanned documents. For full OCR on PDFs (creating searchable PDFs from scanned documents), tools like Adobe Acrobat and ABBYY FineReader are the standard options. Google Drive's built-in OCR is an alternative when editable text is enough: right-click a PDF in Drive and choose Open with > Google Docs to convert it to an editable Google Doc.