- Tesseract.js is a pure JavaScript port of the Tesseract OCR engine.
- It extracts text from images and supports more than 60 languages.
- It runs both in the browser and in Node.js.
- Demo (see)
Include the tesseract.js library in your HTML:
<script src='https://cdn.rawgit.com/naptha/tesseract.js/1.0.10/dist/tesseract.js'></script>
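// Recognize the text in imgObj using the language given by langValue.
Tesseract.recognize(imgObj, {
  lang: langValue
})
.progress(function (p) {
  // Called repeatedly while the job is running.
  console.log('progress', p);
})
.then(function (result) {
  // Called once when recognition succeeds.
  console.log('Image read successfully');
  /* Do something with result, e.g. result.text */
})
.catch(function (err) {
  // Called if recognition fails.
  console.log('Failed to read the image');
  /* Handle the error */
})
.finally(function (resultOrError) {
  // Called after either success or failure.
  console.log('Finally');
  /* Clean up here */
});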
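imgObj is any ImageLike object (see). In the browser this can be, for example, an img, video, or canvas element, a File or Blob, or the URL of an image; in Node.js it can be a path to a local image file or a Buffer.
langValue is the code of the language to recognize, for example 'eng' for English (see).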
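The progress callback in the chain above just logs the raw message. As a minimal sketch, the helper below turns such a message into a readable line. It assumes the message is an object of the form { status: 'recognizing text', progress: 0.5 }, which may differ between versions; formatProgress is a hypothetical helper name, not part of Tesseract.js.

```javascript
// Hypothetical helper (not part of Tesseract.js).
// Assumption: progress messages look like { status: '...', progress: 0..1 }.
function formatProgress(p) {
  // Fall back to a generic label when no status is present.
  const status = (p && p.status) || 'working';
  const hasFraction = Boolean(p) && typeof p.progress === 'number';
  if (!hasFraction) {
    return status;
  }
  // Convert the 0..1 fraction into a whole-number percentage.
  const percent = Math.round(p.progress * 100);
  return status + ': ' + percent + '%';
}

// Usage with the chain shown above (imgObj and langValue as before):
// Tesseract.recognize(imgObj, { lang: langValue })
//   .progress(function (p) { console.log(formatProgress(p)); });
```

Keeping the formatting in a separate function means it can be checked without loading an image or the OCR engine.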
How to detect the language
Tesseract.detect(myImage)
.then(function (result) {
  // result.script holds the script detected in the image.
  console.log(result.script);
});
(The then, progress, catch, and finally methods can be chained on detect as well.)
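One way to use the detection result is to pick the language for a later recognize call. The sketch below is only an illustration: SCRIPT_TO_LANG and langForScript are hypothetical names, and the script names in the table are assumptions about what result.script may contain, not an official list.

```javascript
// Hypothetical mapping from a detected script name to a Tesseract language code.
// The entries are illustrative assumptions, not an official table.
const SCRIPT_TO_LANG = {
  Latin: 'eng',
  Cyrillic: 'rus',
  Arabic: 'ara'
};

// Pick a language code for a detection result, falling back to English
// when the script is missing or not in the table.
function langForScript(detection) {
  const script = detection && detection.script;
  return Object.prototype.hasOwnProperty.call(SCRIPT_TO_LANG, script)
    ? SCRIPT_TO_LANG[script]
    : 'eng';
}

// Usage sketch: detect first, then recognize with the chosen language.
// Tesseract.detect(myImage).then(function (result) {
//   Tesseract.recognize(myImage, { lang: langForScript(result) })
//     .then(function (r) { console.log(r.text); });
// });
```

A script is not the same thing as a language (Latin script covers many languages), so a real application would likely need a better rule than this fixed table.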
Install the tesseract.js package with npm:
npm install tesseract.js --save
(Requires Node.js v6.8.0 or greater.)
Use it:
const Tesseract = require('tesseract.js');
For testing, I use a picture taken from a wiki.
My Node.js source code is in test_ocr.js, which I run with this command:
node test_ocr.js
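The script itself is not reproduced here. As a rough sketch only, a file like test_ocr.js might look as follows, assuming the chainable job API shown earlier. The recognizeText helper is a hypothetical name, and unlike the command above this version takes the image path as a command-line argument (for example node test_ocr.js picture.png, where picture.png is a placeholder).

```javascript
// test_ocr.js (sketch). Assumes the chainable job API of tesseract.js 1.x.

// Hypothetical helper: wraps a recognize job in a standard Promise.
// The Tesseract module is passed in so the function does not load it itself.
function recognizeText(Tesseract, imagePath, lang) {
  return new Promise(function (resolve, reject) {
    Tesseract.recognize(imagePath, { lang: lang })
      .then(function (result) { resolve(result.text); })
      .catch(reject);
  });
}

// Run only when an image path is given on the command line.
const imagePath = process.argv[2];
if (imagePath) {
  recognizeText(require('tesseract.js'), imagePath, 'eng')
    .then(function (text) { console.log(text); })
    .catch(function (err) { console.error('OCR failed:', err); })
    // Exit explicitly in case background workers keep the process alive.
    .then(function () { process.exit(); });
}
```

Wrapping the job in a Promise is a design choice for this sketch; the job's own then and catch methods could be used directly, as in the browser example.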
Example output:
## References