Tag Archives: ocr

How to Convert Scanned JPEG with Text to Searchable PDF Document

JPEG is probably the most popular format for storing and transferring images. Many users prefer it for sending photographs by email or uploading them to social networks. The reasons are obvious: JPEG files can be opened with even the most basic image viewer, and they combine small file size with decent image quality.

Many scanning and faxing devices also use JPEG as their output format. With JPEG you can get a quick, compact digital copy of a document. Because it is a raster format, however, what you get is an image of the document, not actual text. This means a scanned JPEG cannot be searched or edited. Depending on your area of work, this can be a considerable problem. For example, if you scan a certificate or an invoice that you need to make quick corrections to, the JPEG format will not be of much help.


How to Convert Scanned TIFF Images with Text to Searchable PDFs

Do you have a printed document and need to make a digital copy of it? Scanning is a quick option. An even quicker one is photographing it with your smartphone. In both cases, what you will get is an image of the text, most likely saved in a raster format such as PNG, JPEG or TIFF. In fact, TIFF is often preferred by scanning and faxing devices because it renders text well.

But while TIFF is an accessible and flexible format that supports both lossy and lossless compression, it has one major disadvantage: text scanned into this format cannot be edited or searched. This might be a problem if, for example, you have a scanned book and need to look for a certain keyword, or if you want to run the scanned document through machine translation. The best solution in such cases is converting to a format that can be edited and searched: PDF.
