Re: gocr (GPL'ed Optical Character Recognition)



>For those of you with low-volume or occasional OCR requirements, this is
>the best effort I've seen so far. There are others on the link page from
>this site, some of which I've tried and found very much wanting.
>Some impressions (based on a few trials):
>1. Takes longer to compile, per program, than almost anything else I've
>   ever seen. This is an indicator (not a guarantee) of a high degree of
>   complexity in a fairly small amount of code.

You noticed this too?  I sat there, staring at the gnome-terminal window waiting
for the thing to abend.  I've never seen a file take that long to compile.

>2. The packaging stinks. This is actually good, because this guy is paying
>   lots of attention to the core content, not all the froo-furrah. Later he
>   can get fancy.

Not sure what you mean by this.  It's a single binary; what packaging would he
need?

>5. OCR'ing a page of typed copy took about 2 minutes on a 233 MHz Pentium;
>   (2:30 for a virtually flawless page of Courier). Performance ain't great.
>   Who cares? For small volumes, it's livable, for large volumes and production
>   apps, there's commercial software, which costs big bux... but then
>   there's budget for that on large projects. This levels the playing
>   field a bit.

I'd also note that it is not for machines that are short on memory.  This thing
eats up memory like the Tasmanian Devil at a buffet.

>Whether you spend nothing or thousands for OCR, you still need to
>proofread, although there's generally fewer mistakes at the very pricey end of
>things. So far, this does a lot of the easy stuff as well as some really
>expensive packages, and it's not even close to release 1.0
>gocr looks good enough, as is, for block-level prototyping. For an early
>release, this is great stuff!

I snatched an excerpt out of a manual last night, about two paragraphs of
Garamond type, trimmed it down to just the text with xv, OCR'd it, and had only
two minor errors, which I thought was very good.  On a whole page, it didn't do
as well.  But if this gets only a little better, it will be as good as the
commercial Win32 OCR software I've used.

Systems and Network Administrator
Morrison Industries
1825 Monroe Ave NW.
Grand Rapids, MI. 49505