Re: OCR and Machine Readable Text
Accuracy will depend on the quality of the original being scanned, as
well as the capability of the OCR system; flat originals scan much better
than the "bent open" pages of a book or magazine, heavy stock tends to
let less "bleed" through from the reverse side, fonts with extreme
kerning are more difficult, point size is a factor, etc.
I've seen 97%+ w/ Calera (about 2 years ago) when using flat, first-generation,
high-quality photocopies w/ minimal skew and Courier or a similar
typeface. OTOH, the same system did not scan well at all w/ badly skewed
photocopies (caused by the "bend" induced by the binding of the original).
If you are scanning medical journals, take a look at your originals and
also at where the errors are occurring.
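To put a number like "97%+" on a test run, one common measure is character accuracy: hand-key a sample page as ground truth, then count the minimum insertions, deletions and substitutions (Levenshtein edit distance) needed to turn it into the OCR output. The sketch below assumes you have both texts as plain strings; the function names are made up for illustration and don't come from any OCR package.

```python
def edit_distance(truth: str, ocr: str) -> int:
    """Levenshtein distance: fewest single-character edits from truth to ocr."""
    prev = list(range(len(ocr) + 1))          # row for the empty prefix of truth
    for i, t in enumerate(truth, start=1):
        curr = [i]
        for j, o in enumerate(ocr, start=1):
            cost = 0 if t == o else 1
            curr.append(min(prev[j] + 1,          # delete t
                            curr[j - 1] + 1,      # insert o
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]


def char_accuracy(truth: str, ocr: str) -> float:
    """1 - errors / length of the ground truth, clamped at 0."""
    if not truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1.0 - edit_distance(truth, ocr) / len(truth))


# One "8" misread as "B" in an 11-character line: 1 - 1/11, about 90.9%.
print(round(char_accuracy("Dose: 38 mg", "Dose: 3B mg"), 3))
```

Running this per page (or per region of a page) is one way to see where the errors cluster, e.g. near the binding on skewed copies.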
You can also use a spell checker (after building up a suitable dictionary
for your application) to catch some of the errors.
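A rough sketch of that spell-check pass, assuming the domain dictionary is just a set of lowercase words you've collected (the tiny word list below is made up). It uses Python's standard `difflib.get_close_matches` to suggest candidates; it only flags words for a person to review and doesn't correct anything itself.

```python
import difflib
import re


def flag_unknown_words(text, dictionary, cutoff=0.8):
    """Map each unrecognized word in OCR output to close dictionary matches.

    Assumes `dictionary` is a set of lowercase words built for the application.
    """
    candidates = sorted(dictionary)  # fixed order keeps suggestions repeatable
    flagged = {}
    # Digits are kept inside tokens so a misread like "insu1in" stays one word.
    for token in re.findall(r"[A-Za-z0-9]+", text):
        word = token.lower()
        # Skip known words, pure numbers, and words already flagged.
        if word in dictionary or word.isdigit() or word in flagged:
            continue
        flagged[word] = difflib.get_close_matches(word, candidates, n=3, cutoff=cutoff)
    return flagged


medical_words = {"patient", "dose", "mg", "insulin", "daily"}
print(flag_unknown_words("Patient dose 38 mg insu1in daily", medical_words))
```

Raising `cutoff` gives fewer but safer suggestions; a word with no suggestion at all is often a sign of a badly scanned region.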
I'd expect results to be less satisfactory in other applications
where extreme accuracy is a must. "3", "8", and "B", for example, are
often confused; not a big problem w/ a medical journal, but it plays havoc
w/ code, accounting data, etc.
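For numeric data, a dictionary doesn't help, but two cheap checks can catch some of these confusions. The sketch below is illustrative only: the letter-to-digit table is a hypothetical example, not taken from any OCR product. The first check flags a digits-only field that came back with a letter such as "B" and proposes a fix for a person to confirm. A digit-for-digit swap like "3" read as "8" passes that check, so the second function cross-foots the line items against the stated total, which is one way such a swap tends to surface in accounting data.

```python
import re
from decimal import Decimal

# Hypothetical, illustrative table of letters OCR may return in place of digits.
CONFUSABLE = {"B": "8", "O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "Z": "2"}


def check_numeric_field(value):
    """Return (suggestion, needs_review) for a field that should hold only digits."""
    if re.fullmatch(r"[0-9]+", value):
        return value, False
    suggestion = "".join(CONFUSABLE.get(ch, ch) for ch in value)
    return suggestion, True  # never auto-accept: a person confirms the fix


def totals_match(amounts, stated_total):
    """Cross-foot: do the OCR'd line items add up to the OCR'd total?"""
    # Decimal avoids binary floating-point rounding on currency amounts.
    return sum(Decimal(a) for a in amounts) == Decimal(stated_total)


print(check_numeric_field("1B3"))                 # letter caught, needs review
print(totals_match(["12.50", "8.50"], "16.00"))   # a "3" misread as "8" shows up here
```

Neither check proves a field is right; they only narrow down which fields are worth a second look.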
-r.w.
On Fri, 3 Jan 1997, /**\anonymous/**\ wrote:
> Alan Olsen wrote:
> > I used to work for a company that would transfer entire archives of medical
> > journals. Much of it we would just OCR. Some of it we would send
> > offshore. The OCR software was about 95% reliable, and this was over 5 years
> > ago. (And we were using 286 boxes for much of the OCR work. Not a heavy
> > technological investment.) I am sure that things have improved a great
> > deal since then. (My new scanner included OCR software. I will have to
> > run a test and report the findings.)
>
> I'd like to know what OCR software you were using. All tests we
> completed at my place of employment were very poor quality-wise. We
> showed a 65% accuracy rate. Not very good when you need to transfer a
> five-year backlog of medical and technical journals. This was using a
> high-resolution scanner with a package that was bundled along with it.
> About a year ago, my employer considered transferring data taken off of
> forms into a relational database using an OCR program. Again, we found
> the results too inaccurate for our needs. I may have just been using
> the wrong programs for the job, but the findings were depressing...
>
> panther