From Picture to Text: A Primer on OCR

This presentation assumes that you want to make your documents editable so that you can use CAT tools, or simply work on screen in the original document, replacing text or working bilingually. It also assumes that you receive documents as image-based files. These may be graphics formats (TIFF, JPEG, GIF, PNG, BMP, etc.) or, more commonly, image-based PDF files (ones in which you cannot edit or manipulate the text directly).

The solution is Optical Character Recognition (OCR). This presentation will look primarily at two popular OCR programs, ABBYY FineReader (multi-language interface, including English and Japanese) and Yomitori Kakumei (読取革命; Japanese interface), and will mention helpful features of some others. It will demonstrate uses, techniques, and recommendations for getting the most out of OCR. The aim is to help you decide whether OCR will help you in your work, to give some guidance on selecting a program, and to help those who already use OCR programs use them more effectively.