01 September 2025

How to extract text from an image on Ubuntu?

To extract English text -
$ sudo apt install tesseract-ocr
$ tesseract your_image_name.png extracted_text

Tesseract appends .txt to the output name on its own, so this writes extracted_text.txt.

 

To extract Simplified Chinese text -

$ sudo apt install tesseract-ocr tesseract-ocr-chi-sim

$ tesseract your_image.tiff output_text -l chi_sim

 

To extract Traditional Chinese text -

$ sudo apt install tesseract-ocr tesseract-ocr-chi-tra

$ tesseract your_image.tiff output_text -l chi_tra
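
If a -l run fails, it may help to check that the language pack was actually installed. Below is a minimal sketch. The helper name has_lang is my own and not part of Tesseract. It reads a language list on standard input and succeeds only if the requested code appears as a whole line.

```shell
# has_lang CODE
# Hypothetical helper (not part of Tesseract): reads a list of language
# codes on stdin, one per line, and returns success if CODE is among them.
has_lang() {
    # -x matches whole lines only, so "chi_sim" does not match "chi_sim_vert"
    # -F treats the code as a fixed string rather than a regular expression
    grep -qxF -- "$1"
}
```

It can be fed from tesseract --list-langs. The 2>&1 is there because some older Tesseract versions are reported to print the list on stderr.

$ tesseract --list-langs 2>&1 | has_lang chi_tra && echo "chi_tra is installed"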

 

To extract text in multiple languages, e.g. English, Simplified Chinese, and Traditional Chinese (the language packs above must be installed) -

$ tesseract your_image.tiff output_text -l eng+chi_sim+chi_tra
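
For more than one image, the same command can be wrapped in a loop. This is only a sketch under a few assumptions: the function names join_langs and ocr_images are made up for illustration, every image file name has an extension, and the text file should sit next to the image with the same base name.

```shell
# join_langs CODE...
# Joins language codes with "+", the separator that tesseract's -l option expects.
join_langs() {
    local IFS='+'
    printf '%s\n' "$*"
}

# ocr_images LANGS IMAGE...
# Runs tesseract on each image. The output base is the image path without
# its extension, so scans/page1.png produces scans/page1.txt.
ocr_images() {
    langs=$1
    shift
    for img in "$@"; do
        tesseract "$img" "${img%.*}" -l "$langs"
    done
}
```

For example:

$ ocr_images "$(join_langs eng chi_sim chi_tra)" scans/*.png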
