Coursework for ISSALE - 2014 Project Demonstration

SINHALA LANGUAGE OCR

● Kasun Perera
● Chamila Liyanage
● Tharaka Viswakula
● Laksri Wijerathna

Sinhala Script consists of:

● 18 vowels
● 40 consonants
● 18 modifiers
● other symbols (rakaranshaya, yansaya)

Font: Abhaya
Font Size: 12

Selected characters

700 අ 708 ල්

701 ැ 709 න්

702 නි 710 ණ

703 ර 711 සි

704 ස 712 ත්

705 ත 713 යි

706 ක් 714 එ

707 කි 708 ල්

Document Image

The document image contains 16 different character types, with 11 samples of each type.

Line and Main Body segmentation

● All lines were segmented correctly
  o Number of lines in the input image: 9
  o Program outputs 9 line segments
  o 100% accuracy
● All main bodies were segmented correctly (no diacritics)
  o 100% accuracy
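The slides do not say which algorithm produced the line segments. A common choice for printed text is a horizontal projection profile, sketched below purely as an assumption: the function name `segment_lines` and the 0/1 list-of-rows image format are hypothetical, and the sketch covers line segmentation only, not the main-body step.

```python
def segment_lines(image):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    `image` is a list of rows; each row is a list of 0 (background) or 1 (ink).
    Assumed technique: horizontal projection profile (not confirmed by the slides).
    """
    # Projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    segments = []
    start = None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                    # first inked row of a new line
        elif ink == 0 and start is not None:
            segments.append((start, y))  # a blank row closes the line
            start = None
    if start is not None:                # line that touches the bottom edge
        segments.append((start, len(image)))
    return segments


# Tiny synthetic page: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
lines = segment_lines(page)
print(lines)  # [(1, 3), (4, 5)]
```

Comparing `len(lines)` with the known number of lines in the page gives the kind of check reported above (9 expected, 9 found).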

Decision Tree Recognition results

● Creation of training (35) and test (15) data
● Decision tree created in Weka from the training data
● Accuracy tested on the test data

Overall accuracy: 70%

Poorly recognized characters: 702 නි / 708 ල් / 711 සි / 712 ත්
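Finding the poorly recognized characters amounts to computing accuracy per class rather than overall. The sketch below shows one way to do that in plain Python; it is independent of Weka, the function names and the 0.8 cut-off are hypothetical, and the label lists are made up for illustration, not the project's data.

```python
from collections import defaultdict


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def per_class_accuracy(true_labels, predicted_labels):
    """Map each true label to the fraction of its samples predicted correctly."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Made-up labels for illustration only; these are NOT the slides' results.
truth = [700, 700, 702, 702, 703, 703]
guess = [700, 700, 711, 702, 703, 703]

per_class = per_class_accuracy(truth, guess)
# Hypothetical threshold: flag classes recognized less than 80% of the time.
weak = sorted(label for label, acc in per_class.items() if acc < 0.8)
print(weak)  # [702]
```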

Tesseract Recognition results

Overall accuracy: 93.181%

Complete OCR - DT Method

Overall accuracy - 28%

Complete OCR - Tesseract

Overall accuracy - 92.8%
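The slides do not state how the complete-OCR accuracy was measured. One common measure, shown here only as an assumption, is character accuracy derived from the Levenshtein edit distance between the ground-truth text and the OCR output. The function names are hypothetical, and the sketch counts Unicode code points, so a Sinhala consonant with a modifier counts as more than one unit.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in code points."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion from the reference
                current[j - 1] + 1,      # insertion into the reference
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# Illustrative strings only: one of four letters is misrecognized.
print(character_accuracy("අරසත", "අරණත"))  # 0.75
```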

Tesseract Output File

Conclusion

Test dataset (15)

● Tesseract accuracy: 93%
● DT accuracy: 70%

Document Image

● Tesseract accuracy: 92.8%
● DT accuracy: 28%

ස්තුතියි...! (Thank you...!)
