Scan and OCR a Document

Here's what I've got to do: transfer three storage rooms full of boxed documents into an electronic library that everyone can access through their PCs using Word 2003. I've got plenty of time, but I think my patience will wear through before too long if I spend all day copy-typing documents. Still, I need the money.

Quell any thoughts of arson that have strayed through your mind. First, you need a clear policy on which documents to keep and which to shred immediately. This tutorial can't help you on that, but anything more than a few years old is unlikely to be of much use. Did you know that Microsoft encourages its employees to clear out their email when it reaches the ripe old age of three months? Heck, even some cheeses are older than that.

Second, you need to get to work with a decent scanner and the Microsoft Office Document Scanning feature that lurks, frequently unnoticed, on the Office Tools menu for Office 2003 and Office XP. In the Scan New Document dialog box (Figure 6-4), choose the preset to use (try "Black and white" to start) and then click the Scan button. When the scan is completed, the Microsoft Office Document Imaging window opens.

From here, you have two options. You can choose File » Save to save the scanned picture, either under the default name that Microsoft Office Document Scanning has assigned or under a name of your choosing. (Note that the document isn't saved yet, even though it appears to have a filename.) Alternatively, you can choose Tools » Send Text to Word to send the recognized text to a document in a new Word session. You'll probably need to clean up the text in Word before saving it. Arrange the windows so that you can see both the scan of the document and the OCRed text, and make the text match the original.
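If you end up with many OCRed files, some of the purely mechanical cleanup can be scripted before you proofread. What follows is only a minimal sketch in Python, assuming you have saved the OCRed text as plain text; the function name tidy_ocr_text is hypothetical and is not part of Office. It rejoins words that were hyphenated across line breaks and merges hard line breaks within a paragraph. It cannot spot misrecognized characters, so you still need to compare the text with the scan.

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Apply a few mechanical cleanups to OCR output (hypothetical helper)."""
    # Rejoin words split by a hyphen at the end of a line:
    # "docu-\nment" becomes "document".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Treat blank lines as paragraph breaks.
    paragraphs = re.split(r"\n\s*\n", text)
    # Within each paragraph, collapse line breaks and runs of spaces
    # into single spaces.
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    # Drop empty paragraphs and separate the rest with one blank line.
    return "\n\n".join(p for p in cleaned if p)
```

Be aware that this also joins genuinely hyphenated words that happen to break at the end of a line (for example, "well-" followed by "known" becomes "wellknown"), so treat the result as a first pass rather than a finished document.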

Figure 6-4. Office 2003 and Office XP include built-in scanning and optical character recognition (OCR) capabilities. You provide the scanner.

