Optical Character Recognition in C# using Tesseract

In this post, I’ll demonstrate how to use Tesseract to build an Optical Character Recognition (OCR) application in C#.

In my recent post about OCR in C#, I used Puma.NET to create the OCR application.

https://www.mishelshaji.com/howto/optical-charactor-recognition-ocr-in-c/

The main drawbacks of using Puma.NET were:

  • Lower recognition accuracy.
  • Puma.NET must be installed on the machine.
  • It requires older versions of .NET.

Creating an OCR application in C# using Tesseract

  • Open Visual Studio and create a new C# Console application.
  • Open the Package Manager Console and install the Tesseract NuGet package:
Install-Package Tesseract

If you prefer not to type commands, right-click the project in Solution Explorer and select Manage NuGet Packages… -> open the Online tab and search for Tesseract -> click Install.

This will add Tesseract and other binaries to the project.

  • Next, add the language files. You can get the English language files from here. Create a folder named tessdata in the Debug folder of your project (typically bin\Debug, where the executable is built) and copy the language files into it.
  • Finally, add the C# code and run the project.
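using System;
using Tesseract;

namespace TesserectOCR
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the English language data from the tessdata folder
            // (the path is relative to the current working directory).
            using (var ocrEngine = new TesseractEngine(@".\tessdata", "eng", EngineMode.Default))
            // Load the image to recognize; change this path to point to your own image.
            using (var img = Pix.LoadFromFile(@"E:\Capture.png"))
            // Run recognition; the engine, image, and page are all disposed when done.
            using (var page = ocrEngine.Process(img))
            {
                Console.WriteLine(page.GetText());
            }

            // Keep the console window open until a key is pressed.
            Console.ReadKey();
        }
    }
}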
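
The relative path .\tessdata in the code above is resolved against the current working directory, which is not always the folder that contains the executable. The following is a minimal sketch of one way to locate the folder and fail early with a clear message when the language file is missing. TessdataLocator and Resolve are hypothetical names, not part of the Tesseract package, and the sketch assumes the language file follows Tesseract's usual naming (eng.traineddata for English).

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; it is not part of the Tesseract package.
static class TessdataLocator
{
    // Returns the absolute path of the tessdata folder under baseDirectory.
    // Throws FileNotFoundException if "<language>.traineddata" is not in it.
    public static string Resolve(string baseDirectory, string language)
    {
        string tessdataPath = Path.Combine(baseDirectory, "tessdata");
        string languageFile = Path.Combine(tessdataPath, language + ".traineddata");

        if (!File.Exists(languageFile))
        {
            throw new FileNotFoundException(
                "Language file not found. Copy it into the tessdata folder.",
                languageFile);
        }

        return tessdataPath;
    }
}
```

To use it, you could call TessdataLocator.Resolve(AppDomain.CurrentDomain.BaseDirectory, "eng") and pass the returned path to the TesseractEngine constructor in place of @".\tessdata".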

Possible errors

You may get the following error when running the project.

The type ‘System.Drawing.Bitmap’ is defined in an assembly that is not referenced. You must add a reference to assembly ‘System.Drawing, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a’.

To fix this, go to Solution Explorer -> right-click References -> Add Reference -> search for Drawing -> select System.Drawing in the results (a checkmark appears on the left when it is selected) and click OK.

If you enjoyed this post, let me know by leaving a comment below.
