How to recognize a scanned document or book
Get 500 Free Pages for Recognizing Documents
A scanned document is a set of images saved in a chosen format. The DjVu and TIFF formats, for instance, are designed primarily for storing images, whereas PDF can store both images and text. Often it is sufficient to save the scanned pages as a PDF file without text recognition. In this case, the document looks exactly like the source, with none of the distortions that text recognition can introduce.
If you want to recognize the text of a document, ScanPapyrus lets you use the cloud OCR service from ABBYY (http://www.abbyy.com). The company's recognition algorithms have been developed over many years and are perhaps the best in the world. They rarely make mistakes, retain as much formatting as possible, and work quickly.
You will need an Internet connection to perform text recognition. Furthermore, ABBYY charges for access to the service. However, you get 500 free pages the first time you sign up for the recognition service.
Creating an account on ABBYY Cloud OCR
Enter your e-mail address, create a password, and complete the captcha. After registration, you can log in to the control panel.
Now you need to create a recognition application. To do this, click ADD NEW APPLICATION.
The page for creating your application opens. The application ID is created automatically. You need to enter the app name and select the location of the recognition server (USA or Europe). After you click the CREATE APPLICATION button, the password for the created application is sent to your e-mail address.
Connecting the created application to ScanPapyrus
Now, you can connect the created application to ScanPapyrus and use text recognition. Start ScanPapyrus, go to the Recognition tab and click Service Settings.
In the window that opens, enter the settings of the application you created. Copy the Application ID and password from the ABBYY Cloud OCR e-mail and paste them into the appropriate fields.
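ScanPapyrus handles the connection for you once these two fields are filled in. For readers curious about what such credentials typically turn into, here is a minimal sketch that combines an application ID and password into an HTTP Basic authorization header. It is an assumption that the service accepts credentials in this form; the function name and the placeholder values are hypothetical and do not come from ScanPapyrus or ABBYY.

```python
import base64


def basic_auth_header(application_id: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header from an ID and a password.

    Assumption: the recognition service accepts HTTP Basic authentication.
    """
    token = f"{application_id}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(token).decode("ascii")}


# Placeholder values; use the ID and password from the ABBYY e-mail instead.
headers = basic_auth_header("my-scanpapyrus-app", "password-from-email")
print(sorted(headers))
```

Because the header is only an encoding of the two values and not an encryption, it should be sent over HTTPS only.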
How to Use Text Recognition in ScanPapyrus
Now you can send documents for recognition to the ABBYY Cloud OCR service. In the Recognition tab, click Recognize online.
The Recognition options window allows you to specify the language of the document - or several languages, if the document is in multiple languages (for instance, German and English).
Select the output format of the document here as well. There are several options available:
- Microsoft Word – the document will be saved as a file in the Microsoft Word format. The formatting of the source document will be retained, and the images will be inserted into the document the way they appear in the original document.
- Microsoft Excel – if the scanned source document is a table, you can save it as a spreadsheet in the Microsoft Excel format.
- PDF text and images – the document will be saved as a PDF. The text of the source document will be saved as text in the PDF, and you will be able to copy it from the PDF and search within the document. Images will be inserted into the document as they appear in the original file. The formatting of the source document will be retained as well.
- Rich text – the document will be saved in the RTF format, which is simpler than the Microsoft Word format.
- Plain text – the document will be saved as a plain text document without images. The formatting will be lost.
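The language and output format choices above can be pictured as two request parameters. The following is a minimal sketch under stated assumptions: the parameter names `language` and `exportFormat` and the format values are modeled on typical cloud OCR request parameters and are not taken from ScanPapyrus; the mapping table and the `build_query` helper are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical mapping from the dialog's output options to export-format values.
EXPORT_FORMATS = {
    "Microsoft Word": "docx",
    "Microsoft Excel": "xlsx",
    "PDF text and images": "pdfTextAndImages",
    "Rich text": "rtf",
    "Plain text": "txt",
}


def build_query(languages, output_format):
    """Return a URL query string for the chosen languages and output format."""
    if not languages:
        raise ValueError("select at least one document language")
    if output_format not in EXPORT_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    return urlencode({
        # Several languages are joined into one comma-separated value.
        "language": ",".join(languages),
        "exportFormat": EXPORT_FORMATS[output_format],
    })


# A document written in German and English, saved as a searchable PDF.
print(build_query(["German", "English"], "PDF text and images"))
```

Validating the choices before sending anything avoids spending a recognition task on a request that would be rejected.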
After you specify the parameters, click the Recognize button. The program will send your document to the recognition cloud service. You will see your list of documents sent for recognition on the screen.
Usually, the service completes recognition in one or two minutes. You can see the status of the recognition process in the Recognition Task List dialog box, which you open by clicking the Task List button in the main window of the program. Click the Update Status button to refresh the status of a task. When recognition is complete, the status of the task changes to Finished and the Save As... button appears in the File column. Click this button to save the document to the selected location on your hard drive.
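Clicking Update Status repeatedly amounts to polling a task until it reports Finished. The sketch below shows that pattern in a generic form. The `get_status` callable stands in for whatever actually queries the service and is hypothetical, as are the intermediate status names in the usage example; only the Finished status comes from the text above.

```python
import time


def wait_until_finished(get_status, interval_seconds=5.0,
                        timeout_seconds=300.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns "Finished".

    Raises TimeoutError if the task is still not finished after roughly
    timeout_seconds of waiting.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "Finished":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"task still {status!r} after {waited:.0f} s")
        sleep(interval_seconds)
        waited += interval_seconds


# Usage with a fake status source, so the example runs without a network.
# The status names other than "Finished" are made up for illustration.
fake_statuses = iter(["Queued", "In progress", "Finished"])
result = wait_until_finished(lambda: next(fake_statuses), interval_seconds=0.0)
print(result)
```

A fixed interval is enough for tasks that take a minute or two; the timeout keeps the loop from running forever if a task never completes.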
Your Data Security
ABBYY automatically deletes all uploaded data within forty-two (42) hours after it is uploaded to the service. Processed uploaded data is deleted within twenty-four (24) hours after it is processed by the service.
When sending your pages, you are solely responsible for evaluating the legality, safety, appropriateness, intellectual property rights, and usage rights of the data you send to the service.
All data is sent via the secure HTTPS protocol, ensuring your protection against data interception.
Apart from sending pages to the ABBYY Cloud OCR service for recognition, ScanPapyrus neither sends nor stores your data on third-party servers. Your application password is stored on your computer in encrypted form.