I’ve been having fun with Tesseract, an open source OCR engine. It works from the command line, taking image files (TIFF and JPEG work for me) and outputting plain text. That’s all. It doesn’t do anything fancy like overlaying text on an image to generate a searchable PDF (it does output hOCR and handles multiple columns, so I assume that …
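A minimal sketch of driving that command-line workflow from Python follows. It assumes the `tesseract` binary is on the `PATH` and uses its basic `tesseract <image> <outputbase> [configfile]` form. Plain text goes to `<outputbase>.txt`, and passing the `hocr` config produces `<outputbase>.hocr`. The helper names `tesseract_command` and `ocr_to_text` are hypothetical and exist only for illustration.

```python
import subprocess
from pathlib import Path


def tesseract_command(image, output_base, hocr=False):
    """Build the argv list for a basic Tesseract run (hypothetical helper)."""
    cmd = ["tesseract", str(image), str(output_base)]
    if hocr:
        # The "hocr" config file switches output to hOCR (<output_base>.hocr).
        cmd.append("hocr")
    return cmd


def ocr_to_text(image, output_base):
    """Run Tesseract on one image and return the plain text it wrote.

    Assumes the tesseract binary is installed and on PATH.
    """
    subprocess.run(tesseract_command(image, output_base), check=True)
    # Without a config, Tesseract writes plain text to <output_base>.txt.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

For example, `ocr_to_text("scan.tif", "scan")` would run `tesseract scan.tif scan` and read back `scan.txt`. Keeping the argument list separate from the call makes it easy to inspect or tweak before anything is executed.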