Tesseract is the most popular open-source OCR engine in the industry and is widely used in OCR projects. Installing it on Windows, however, is a tedious task, and it is common to run into issues during setup. Let’s resolve these issues for good by following this step-by-step guide to installing Tesseract on Windows.
Installation Steps
Step 1 – Download and run the installer tesseract-ocr-w64-setup-v4.0.0.20181030.exe
If you want a newer version, you can download it from https://digi.bib.uni-mannheim.de/tesseract/
You can choose any version, but I recommend 4.0.0, which is a stable release.
Step 2 – In the System Environment Variables, add the Python installation path as shown below (use the path where you have installed Python)
In the Windows search bar, type “system environment variables” –> click Open –> click the Environment Variables button at the bottom; the Environment Variables window will appear
Variable Name – PY_HOME
Variable Value – C:\Python3.6
Step 3 – Then, in the System Environment Variables, add PYTHONPATH as shown below:
Variable Name – PYTHONPATH
Variable Value – %PY_HOME%\Lib;%PY_HOME%\DLLs;%PY_HOME%\Lib\lib-tk;C:\another-library
Step 4 – In the PATH environment variable, add the following value:
Variable Value – %PY_HOME%;%PY_HOME%\Scripts
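To sanity-check Steps 2 to 4, a small Python sketch along the following lines can expand the PYTHONPATH value and report which entries actually exist on disk. The helper name expand_windows_path_list is my own, and ntpath.expandvars is used so that the %PY_HOME% syntax is expanded the Windows way; treat it as a rough check rather than a definitive test.

```python
import ntpath
import os


def expand_windows_path_list(value):
    """Expand %VAR% references and split a ';'-separated Windows path list."""
    expanded = ntpath.expandvars(value)
    # Drop empty entries caused by a trailing or doubled ';'.
    return [entry for entry in expanded.split(";") if entry]


if __name__ == "__main__":
    # Assumption: fall back to the value shown in Step 2 if PY_HOME is not set.
    os.environ.setdefault("PY_HOME", r"C:\Python3.6")
    python_path = (
        r"%PY_HOME%\Lib;%PY_HOME%\DLLs;%PY_HOME%\Lib\lib-tk;C:\another-library"
    )
    for entry in expand_windows_path_list(python_path):
        status = "found" if os.path.isdir(entry) else "missing"
        print(f"{entry}: {status}")
```

Any entry reported as missing is either a typo in the variable value or a folder that does not exist in your Python installation.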
Step 5 – Install Microsoft Visual Studio 2022 – Community Version from the link below:
https://visualstudio.microsoft.com/downloads/
After opening this link, follow the path below and download the installer:
All Downloads –> Visual Studio 2022 –> Visual Studio Community 2022
Click Download, then run the installer once the download finishes. During installation, the installer will ask you to choose options on the Workloads tab. Select the following:
- Under the Windows section, select only the Desktop development with C++ checkbox and leave the options that are selected by default under it as they are
- Under Web & Cloud, select only the Python development checkbox
After this, click Install to begin the installation. Note that it will take some time to complete.
Step 6 – Install “Build Tools for Visual Studio 2022” from the link below:
https://visualstudio.microsoft.com/downloads/
Go to All Downloads –> Tools for Visual Studio 2022 –> Build Tools for Visual Studio 2022 and click Download
During this installation, the installer will ask you to choose options, but you don’t have to select any of them; simply click Install to begin the installation.
Step 7 – Install “Microsoft Visual C++ Redistributable for Visual Studio 2022” from the link below:
https://visualstudio.microsoft.com/downloads/
Go to All Downloads –> Other Tools, Framework, and Redistributables –> Microsoft Visual C++ Redistributable for Visual Studio 2022
Select x64 and click Download. Then run the downloaded installer to begin the installation.
Step 8 – Add TESSDATA_PREFIX in the System Environment Variables:
Variable Name – TESSDATA_PREFIX
Variable Value – C:\Program Files (x86)\Tesseract-OCR\tessdata
Step 9 – Add another environment variable, “tesseract”:
Variable Name – tesseract
Variable Value – C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
Step 10 – In the PATH environment variable, add the Tesseract installation path:
Variable Value – C:\Program Files (x86)\Tesseract-OCR
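Once Steps 8 to 10 are done, open a new terminal (so the updated variables are picked up) and check the result. The sketch below is one way to do that from Python; the function name check_tesseract_setup is my own, and it only reports what it finds rather than fixing anything.

```python
import os
import shutil
import subprocess


def check_tesseract_setup(environ=None):
    """Report where 'tesseract' resolves on PATH and what TESSDATA_PREFIX holds."""
    environ = os.environ if environ is None else environ
    return {
        "executable_on_path": shutil.which("tesseract", path=environ.get("PATH")),
        "tessdata_prefix": environ.get("TESSDATA_PREFIX"),
    }


if __name__ == "__main__":
    report = check_tesseract_setup()
    for key, value in report.items():
        print(f"{key}: {value or 'NOT SET'}")
    if report["executable_on_path"]:
        # 'tesseract --version' prints the engine version and linked libraries.
        result = subprocess.run(
            [report["executable_on_path"], "--version"],
            capture_output=True, text=True, check=False,
        )
        print(result.stdout or result.stderr)
```

If executable_on_path shows NOT SET, the PATH entry from Step 10 is the first thing to recheck.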
This setup generally works well on Windows, but on Windows 10 you might sometimes run into issues, especially when running Tesseract through pytesseract. So, let me detail the steps you might need to take to resolve this.
With this, you have learned how to install Tesseract, but the real challenge is learning how to implement OCR and create your first project with Tesseract and Python.
Tesseract Setup Issues on Windows 10
Step 1 – First, go to the drive where Python is installed. In my case it’s on the C drive under the Python36 folder. From there, open the pytesseract Python file.
Go to C:\Python36\Lib\site-packages\pytesseract and open the file pytesseract.py
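Step 2 – Once you have opened the file, change the value of tesseract_cmd from 'tesseract' to the full path of the executable, as shown in the “to” line. Note the r prefix, which makes it a raw string: without it, Python would read the \t in \tesseract.exe as a tab character and the path would be wrong.
from: tesseract_cmd = 'tesseract'
to: tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'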
With this change, pytesseract should be able to find and run Tesseract.
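An edit made inside site-packages is lost whenever pytesseract is reinstalled or upgraded, so you may prefer to set the same value from your own script instead. pytesseract exposes it as pytesseract.pytesseract.tesseract_cmd. The sketch below assumes the installation path used in this guide; the helper resolve_tesseract_cmd and its search_path parameter are my own names, not part of pytesseract.

```python
import shutil

# Path used in this guide; adjust it if Tesseract is installed elsewhere.
# The r prefix keeps backslashes literal (otherwise "\t" would become a tab).
DEFAULT_TESSERACT_EXE = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(fallback=DEFAULT_TESSERACT_EXE, search_path=None):
    """Prefer a tesseract executable found on PATH, else use the fallback path."""
    return shutil.which("tesseract", path=search_path) or fallback


if __name__ == "__main__":
    try:
        import pytesseract
    except ImportError:
        print("pytesseract is not installed; run: pip install pytesseract")
    else:
        # Point pytesseract at the executable without touching site-packages.
        pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()
        print("Using:", pytesseract.pytesseract.tesseract_cmd)
```

Set this once near the top of your script, before the first call to pytesseract.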
Note – In my experience, version 4.3.0.36 of OpenCV (the opencv-python package) has been the most stable and reliable release. If you are planning to do any development work with OpenCV, I would recommend using this version.