Text Recognition for Documents in ScanPapyrus

Convert scanned documents to Microsoft Word or PDF while retaining the formatting of the text.

A scanned document is a set of images saved in a chosen format. For instance, the DjVu and TIFF formats are designed primarily for storing images, whereas the PDF format can store both images and text. Often it is enough to save the scanned pages as a PDF file without text recognition. This keeps the appearance of the source document unchanged and avoids any errors introduced by text recognition.

If you need the text of a document to be recognized, ScanPapyrus lets you use the OCR cloud service from ABBYY (http://www.abbyy.com). ABBYY has been developing its recognition algorithms for many years, and they are among the best available: they rarely make recognition mistakes, retain as much of the original formatting as possible, and work quickly.

You need an Internet connection to perform text recognition. In addition, the service is not free of charge: recognition requires a purchased recognition key.

How to Use Text Recognition in ScanPapyrus

Starting with version 19.0, ScanPapyrus includes a Recognition tab.

This tab contains the Recognize Document button. Clicking it opens a dialog box with the recognition parameters. Here you specify the language of the document, or several languages if the document is multilingual (for instance, German and English).

In the same dialog box, select the output format of the document. The following options are available:

  • Microsoft Word – the document is saved as a Microsoft Word file. The formatting of the source document is retained, and images are placed in the document as they appear in the original.
  • Microsoft Excel – if the scanned source document contains a table, you can save it as a Microsoft Excel spreadsheet.
  • PDF text and images – the document is saved as a PDF. The text of the source document is stored as real text in the PDF, so you can copy it and search within the document. Images are placed as they appear in the original, and the formatting of the source document is retained as well.
  • Rich text – the document is saved in RTF, a simpler format than Microsoft Word that most word processors can open.
  • Plain text – the document is saved as a plain text file without images. The formatting is lost.

After you specify the parameters, click the Recognize button. ScanPapyrus sends your document to the cloud recognition service, and the list of documents sent for recognition appears on the screen.

The service usually completes recognition in one or two minutes. You can check the progress in the Recognition Task List dialog box, which opens when you click the Task List button in the main window of the program. Click the Update Status button to refresh the status of a task. When recognition is complete, the task status changes to Finished and a Save As... button appears in the File column. Click this button to save the document to the chosen location on your hard drive.

Recognition Key Activation

To use the recognition service, you need to activate a recognition key. Each key allows you to recognize 100 pages. You can purchase a recognition key on the Price page, which also opens when you click the Buy Key button on the Recognition tab. After your purchase, the key is sent to your e-mail address. Once you receive it, activate it in ScanPapyrus by clicking the Activate Key button on the Recognition tab.

Paste your key into the input field and click the Activate button. A message confirms that the key was activated successfully. The key remains valid for 90 days from the moment of activation.

You can monitor the status of your keys in the Activated Recognition Keys List dialog box, which opens when you click the Key Status button. It shows how many pages you still have available for recognition and when your key expires.

Your Data Security

ScanPapyrus uses the ABBYY OCR Cloud service for recognition (https://www.abbyy.com). Your data is stored and processed by ABBYY in accordance with its privacy policy (https://www.abbyy.com/privacy/). ABBYY will not access, view, use, publish, reproduce, or disclose any data you upload. Uploaded data is stored for processing purposes on a server in the European Union for a limited period of time.

ABBYY automatically deletes all uploaded data within forty-two (42) hours after it is uploaded to the service. Processed uploaded data is deleted within twenty-four (24) hours after it is processed by the service.

When sending your pages, you alone are responsible for evaluating the legality, safety, appropriateness, intellectual property rights, and usage rights of the data you send to the service.

All data is sent over the secure HTTPS protocol, which protects it against interception.

Apart from the ABBYY recognition service, ScanPapyrus neither sends your data to nor stores it on third-party servers.