Scan as text

agantuk · May 15, 2010

Looking for a good OCR program. I have lots of pages to scan and need to save them as text instead of images. The pages contain only standard fonts.

Can someone suggest a good FREE OCR, if one is available? If not, good commercial ones will do, but I will have to get the *ahem* versions of those.

seshu · May 15, 2010

For direct users:

FreeOCR

TopOCR

IrfanView - OCR plug-in

For developers:

Tesseract OCR

GUI for Tesseract - VietOCR (Vietnamese, English)

agantuk · May 15, 2010

Thanks for the list. I haven't tried these yet - only SimpleOCR, and the results weren't that great.

Which of the above would you recommend?

seshu · May 15, 2010

TopOCR has been working fine for me. SimpleOCR, FreeOCR, TopOCR ...

all of these work with similar technology. Perhaps you need ABBYY FineReader.

agantuk · May 15, 2010

Just tried out FreeOCR and TopOCR.

FreeOCR is average at best. Even standard pages seem to be a tough job for this.

However, TopOCR is awesome. Text only pages are fantastic. Even on pages filled with technical stuff having non-standard English words, it is doing a very good job.

Thanks a lot

Have started downloading the *ahem* version of ABBYY. Let me see how it does. Will keep you posted!

seshu · May 15, 2010

Most of the time, ABBYY FineReader is better than Acrobat Pro's built-in OCR ...

it should suit your requirements

there should be no need for the other OCRs !

Praks · May 16, 2010

@ajab.ghajab

Do post your results bro

agantuk · May 16, 2010

^ I will, once I am done with my scans

Just finished with some of the pages. Used ABBYY this time round and am pretty impressed.

I was scanning some material that has Java code in it, along with some pages that are plain English. The plain English ones came out great. The ones with the code weren't bad either. I didn't have to edit much in the final document. Attached are the original and scanned versions.

Set 1: Text + code

Original: 4shared.com - document sharing - download 01.pdf

Scanned text: 4shared.com - document sharing - download Part 01.pdf

Set 2: Text + tables

Original: http://www.4shared.com/document/S4BjgVDT/04_online.html

Scanned text: 4shared.com - document sharing - download Part 04.pdf

Praks · May 16, 2010

Gr8,

So which one is best as per your rating?

agantuk · May 16, 2010

^ As stated in my previous post, I would go with ABBYY. It is professional software and does an extremely good job. Recommended if you have composite documents - pages with text and tables / XML / non-standard content.

For plain English documents, TopOCR would do the job well.

Praks · May 17, 2010

Gr8. Mind telling which version of ABBYY you used? Will find it on *ahem* sites.

coolraghav · May 17, 2010

Why don't you use Document Imaging for scanning?

It comes as standard in Office 2003.

Praks · May 17, 2010

^^

Do you mind giving a link explaining this feature? I could not find it in Office 2007.

seshu · May 17, 2010

^ About Microsoft Office Document Imaging

agantuk · May 17, 2010

Praks said:
Gr8. Mind telling which version of ABBYY you used? Will find it on *ahem* sites.

ABBYY FineReader 10 Professional Edition

Lots of seeds available

..:: Free Radical ::.. · May 18, 2010

If you are using ABBYY, switch to version 8.

It has the best and most functional interface.

Later versions are all bloated.

Praks · May 18, 2010

Thanks for sharing.

Worst part is that all the ones with seeds are bundled with viruses/trojans.

agantuk · May 18, 2010

I installed the 'trojan' version. The trojan thing is BS; all sorts of cracks cause the AVs to go berserk with alarms. Haven't been impacted so far, though.