I searched online and tried a pile of articles about this problem; none of them worked.
Install pytesseract:
pip install pytesseract
When running the Python OCR program, the following call fails:
pytesseract.image_to_string(Image.open(filename))
Error:
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
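The message means Tesseract could not find eng.traineddata at the path shown. As the Windows and CentOS settings below illustrate, TESSDATA_PREFIX is sometimes set to the install directory (the parent of tessdata) and sometimes to the tessdata directory itself, depending on the Tesseract version. A quick way to see what your setting resolves to is to look in both places. This is a minimal sketch; find_traineddata is a hypothetical helper, not part of pytesseract.

```python
import os


def find_traineddata(lang="eng", prefix=None):
    """Return the path of <lang>.traineddata under TESSDATA_PREFIX, or None.

    Hypothetical helper: it checks both layouts, where the variable names
    either the tessdata directory itself or its parent directory.
    """
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        return None  # variable unset: Tesseract falls back to its compiled-in default
    filename = lang + ".traineddata"
    for candidate in (os.path.join(prefix, filename),
                      os.path.join(prefix, "tessdata", filename)):
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    prefix = os.environ.get("TESSDATA_PREFIX")
    print("TESSDATA_PREFIX =", prefix)
    for lang in ("eng", "chi_sim"):
        print(lang, "->", find_traineddata(lang, prefix) or "not found")
```

If both languages print "not found", the variable points at the wrong directory or the data files were never installed.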
On Windows, the Tesseract engine itself must also be installed:
1. Download tesseract-ocr-setup-4.00.00dev.exe and install it.
2. Create a new user environment variable:
Name: TESSDATA_PREFIX
Value: D:\Program Files (x86)\Tesseract-OCR
3. Run the program again; this time it fails with:
tesseract.exe has stopped working
pytesseract.pytesseract.TesseractError: (3221225477, '')
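The cause is that version 4.0 was installed. The fix is to uninstall it and install Tesseract 3.02.02 instead; SourceForge hosts the older installers as well as the Chinese language pack: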
https://sourceforge.net/projects/tesseract-ocr-alt/files/
https://nchc.dl.sourceforge.net/project/tesseract-ocr-alt/tesseract-ocr-setup-3.02.02.exe
4. Download the Simplified Chinese (chi_sim) language data and extract it into D:\Program Files (x86)\Tesseract-OCR\tessdata:
https://nchc.dl.sourceforge.net/project/tesseract-ocr-alt/tesseract-ocr-3.02.chi_sim.tar.gz
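5. Run the program again. It works now, but the recognition results are really poor:
材料成分 ("material composition") came out as 材料咸分...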
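For Chinese text, the language has to be passed explicitly: pytesseract's image_to_string uses English unless lang is given. Small or noisy scans also tend to recognize better after basic cleanup. The sketch below assumes Pillow and pytesseract are installed and chi_sim.traineddata is in place; the grayscale conversion, 2x upscale, and threshold of 150 are assumed starting points chosen for illustration, not settings from this setup, and they will not necessarily fix errors like the one above.

```python
from PIL import Image


def preprocess(img, scale=2, threshold=150):
    """Grayscale, upscale, and binarize an image before OCR.

    scale and threshold are assumed starting values; tune them per document.
    """
    gray = img.convert("L")  # single 8-bit channel
    if scale != 1:
        gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # Pixels brighter than threshold become white, everything else black.
    return gray.point(lambda p: 255 if p > threshold else 0)


def ocr_chinese(filename):
    """Run Simplified Chinese OCR on an image file."""
    import pytesseract  # imported here so preprocess() is usable without pytesseract

    with Image.open(filename) as img:
        return pytesseract.image_to_string(preprocess(img), lang="chi_sim")


if __name__ == "__main__":
    import sys
    print(ocr_chinese(sys.argv[1]))
```

Comparing the output with and without preprocess() on the same image is a quick way to tell whether the cleanup helps for a given document.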
Installing tesseract on CentOS 7
sudo yum install tesseract -y
pip3 install pytesseract
vi ~/.bash_profile   # append the export line below
export TESSDATA_PREFIX=/usr/share/tesseract/tessdata
source ~/.bash_profile
yum install -y tesseract-langpack-chi_sim # Simplified Chinese language pack
Reference:
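To confirm that the language pack is actually visible to Tesseract, `tesseract --list-langs` prints the installed languages. The sketch below runs it and parses the result. It reads both stdout and stderr because some older builds may write the list to stderr (an assumption worth checking on your version), and it keeps only single-token lines, which drops the "List of available languages" header (its wording varies between versions) as well as any error messages. The helper names are hypothetical.

```python
import re
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Language codes look like "eng", "chi_sim", or "script/Latin";
        # headers and error messages contain spaces and are skipped.
        if re.fullmatch(r"[\w/-]+", line):
            langs.append(line)
    return langs


def installed_langs(cmd="tesseract"):
    """Run tesseract and return the languages it reports."""
    result = subprocess.run([cmd, "--list-langs"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    # Some builds may print the list on stderr, so look at both streams.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    print(langs)
    if "chi_sim" not in langs:
        print("chi_sim missing: check tesseract-langpack-chi_sim and TESSDATA_PREFIX")
```

The stdout/stderr=PIPE and universal_newlines arguments are used instead of capture_output and text so the sketch also runs on interpreters older than Python 3.7.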
https://www.devzoneoriginal.com/2020/11/how-to-install-tesseract-on-centos.html