A Semi-Automatic Annotation Tool For Arabic Online Handwritten Text
Prepared by:
Eng. Randa Ibrahim M. Elanwar
In the name of Allah, the Most Gracious, the Most Merciful
﴾And my success is only through Allah; in Him I trust, and to Him I turn﴿
Eng. Randa Ibrahim M. Elanwar (M.D.), Assistant Researcher
Electronic Research Institute
Under the supervision of:
Prof. Dr. Mohsen A. A. Rashwan
Professor of Digital Signal Processing Faculty of Engineering
Cairo University
Prof. Dr. Samia A. A. Mashaly
Professor of Digital Signal Processing Computers & Systems Dept.
Electronic Research Institute
Presentation Organization
1. Introduction, Thesis Goals & Contributions
2. Text Lines Extraction
3. Words Extraction
4. Words Segmentation
5. User Interfaces
6. Annotation performance evaluation
7. Conclusions & Future Work
Introduction
• What is 'Annotation'?
• What is 'Document Annotation'?
• Why Document Annotation?
• How to Annotate a document?
Introduction
• Annotation: identifying data of a particular type using additional data of a different type that precisely describes its entities.
• Document annotation: associating the ASCII/Unicode text with the corresponding document image (offline) or ink information (online).
Introduction
[Diagram: an annotated document pairs the image/ink with its transcription (ground truth). Applications include digital libraries, keyword search engines, information retrieval, and web search.]
Introduction
[Diagram: when building recognizers, annotated documents supply training data to train models, and test data for performance evaluation and result analysis.]
Introduction
• Document annotation accelerates & enhances recognizers.
• It accelerates digital library construction.
• Annotation maximizes efficiency, productivity & profitability.
Introduction
• Region of interest: line, word, character.
• Annotation: identify boundaries and associate ASCII/Unicode.
Introduction
• Annotation schemes:
  - Manual: manual annotation and validation; laborious, time-consuming, error prone.
  - Semi-automatic: manual truth entry, automatic annotation, manual validation.
  - Automatic: automatic recognition/truth alignment, manual validation.
Thesis Goals
1. Contribute to the Arabic LR problem solution.
2. Save researchers' efforts spent on data manipulation.
3. Pave the path to generic toolkit construction.
We provide the 1st online Arabic sentence dataset (OHASD) and the 1st Arabic semi-automatic annotation tool for online handwriting (ATAOH).
Contribution: OHASD Dataset
• Unconstrained and natural.
• Texts sampled from daily newspapers.
• Texts are dictated to writers.
• 154 paragraphs by 48 writers.
• More than 3,800 words and more than 19,400 characters.
Contribution: ATAOH Tool
1. Easy document browsing and display.
2. Automatic text-line/word extraction and segmentation.
3. Manual options for segmentation validation & annotation correction.
4. Designed and evaluated using OHASD.
5. Composed of a guiding set of interactive user interfaces.
6. Reduces human effort through high-performance automation.
Text Line Extraction
• Text line extraction techniques:
  - Bottom-up grouping: smearing, Hough-based, graph-based, cut-text minimization.
  - Top-down: projection-based.
Text Line Extraction
• Extraction errors are due to:
1. Fluctuating lines
2. Skew variability
3. Touching text lines
4. Fragments due to the massive presence of diacritics
Text Line Extraction
ATAOH provides an automatic text line extraction utility based on dynamic programming (DP):
1. Read the input document (stroke data).
2. Preprocessing: remove dots.
3. Shred the document into strips.
4. For each strip, build CCs/units.
5. Extend units horizontally to build segments.
6. Start DP to merge segment pairs.
7. Proceed with DP as long as valid merges exist.
8. If DP stops, the final paths are the text lines.
9. Restore dots.
Text Line Extraction
DP cost function design

MergingCost = (DistancePenalty × WidthPenalty) + log(DirectionPenalty) + log(CrossOverPenalty)

• Direction penalty: ensures merging proceeds from right to left.
• Cross-over penalty: ensures merging of adjacent segments.
• Distance penalty: ensures merging of close segments.
• Width penalty: avoids leaving segments on their own.
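The cost function and DP merge loop above can be sketched as follows. The penalty terms are supplied by a caller-provided function and the stopping threshold is a tuned parameter; the slides give the formula's shape but not the penalty definitions or threshold value, so those parts are assumptions:

```python
import math

def merging_cost(distance_penalty, width_penalty,
                 direction_penalty, crossover_penalty):
    """Cost of merging two text-line segments, per the slide's formula:
    (DistancePenalty * WidthPenalty) + log(DirectionPenalty) + log(CrossOverPenalty).
    Lower cost means a more plausible merge."""
    return (distance_penalty * width_penalty
            + math.log(direction_penalty)
            + math.log(crossover_penalty))

def dp_merge(segments, penalties, stop_threshold):
    """Repeatedly merge the cheapest valid segment pair until no pair's
    cost is below the stopping threshold; the surviving merged paths are
    the text lines. `penalties(a, b)` returns the four penalty terms."""
    while len(segments) > 1:
        pairs = [(merging_cost(*penalties(a, b)), a, b)
                 for i, a in enumerate(segments) for b in segments[i + 1:]]
        cost, a, b = min(pairs)
        if cost >= stop_threshold:
            break  # DP stops: no valid merges remain
        segments = [s for s in segments if s not in (a, b)] + [a + b]
    return segments
```

For example, with a distance-only penalty, two nearby segments merge into one text line while a distant one stays separate.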
Text Line Extraction
• Stuck text lines are fixed using a post-processing step of sticking detection and correction. Gap-separated text line segments (bridges) are also fixed.
• The OHASD dataset is divided into 124 documents for training (558 text lines) and 30 documents for testing (112 text lines).
• Experiments are conducted for system parameter optimization (DP stopping thresholds).
Text Line Extraction
• The results obtained for both the training and test sets (document accuracy % / text line accuracy %):

Training set:
  Exp 1 results: 95.16 / 84.36
  After stick resolution: 97.58 / 85.1
  After bridge concatenation: 100 / 100
Test set:
  Exp 1 results: 93.33 / 87.22
  After stick resolution & bridge concatenation: 96.67 / 98.5
Text Line Extraction
Performance comparison:
  Liwicki (2006): 100 docs from IAMonDB; 98% doc. acc., 99.94% stroke acc.
  Zahour (2001): 100 offline Arabic docs; 97%
  Li (2006): 100 offline Arabic docs; 92% text line acc.
  Our system: 154 online Arabic docs; 98% text line acc., 96.7% doc. acc.
Text Line Extraction
• Conclusions:
1. Our method gives promising results.
2. Applicable to offline documents (with minor changes).
3. Overcomes writing on multiple text lines.
4. Applicable to English, French & Greek (overcomes diacritics).
Words Extraction
• Word extraction techniques (break the text line into CCs):
  - Threshold-based: compare gap width to a fixed threshold.
  - Classifier-based: classify each gap as an inter-/intra-word gap.
Words Extraction
• Extraction errors are due to:
1. Wide intra-word gaps (word split)
2. Narrow inter-word gaps (word stick)
3. Total overlap: no inter-word gap (word stick)
Words Extraction
ATAOH provides an automatic word extraction utility based on fusing classifier decisions:
1. Read the input text line (stroke data).
2. Preprocessing: remove dots, build OCs.
3. Feature extraction (global/local).
4. Initial word extraction (best classifier).
5. Word stick detection.
6. Decision fusion and stick correction.
7. Restore dots.
Words Extraction
• Experiments were done to identify the best performing classifiers.
Words Extraction
• Feature vectors are fed to an SVM (polynomial kernel) for initial word extraction.
• The extracted words undergo stick detection tests.
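The gap-classification step can be sketched with scikit-learn. The polynomial-kernel SVM is the classifier named on the slide, but the toolkit, the two toy features (gap width and neighbouring stroke height), and all numeric values here are assumptions, not the thesis's actual setup:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in data: intra-word gaps are narrow, inter-word gaps wide.
# The real system extracts richer global/local features from the ink.
rng = np.random.default_rng(0)
intra = np.column_stack([rng.uniform(0.0, 0.3, 50), rng.uniform(0.5, 1.0, 50)])
inter = np.column_stack([rng.uniform(0.6, 1.0, 50), rng.uniform(0.5, 1.0, 50)])
X = np.vstack([intra, inter])
y = np.array([0] * 50 + [1] * 50)  # 0 = intra-word gap, 1 = inter-word gap

# Polynomial kernel, as stated on the slide (degree/coef0 are guesses).
clf = SVC(kernel="poly", degree=3, coef0=1).fit(X, y)
pred = clf.predict([[0.1, 0.7], [0.9, 0.7]])  # a narrow gap and a wide gap
```

Breaking the text line at every gap predicted as inter-word yields the initial word extraction.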
Words Extraction
• The stick detection tests are based on the likelihood probability of the output word parameter values:
1. Number of OCs
2. Word width
3. Number of strokes
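A minimal sketch of this likelihood test: model each word parameter as a Gaussian and flag the word as stuck when the joint likelihood is too low. The means, standard deviations, and threshold below are illustrative assumptions; the real values would be estimated from the OHASD training set:

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Illustrative single-word statistics (mean, std) per parameter.
PARAM_STATS = {
    "num_ocs":     (5.0, 2.0),     # number of OCs
    "width":       (120.0, 40.0),  # word width in pen units
    "num_strokes": (3.0, 1.5),     # number of strokes
}

def is_stuck(word, threshold=1e-6):
    """Flag a candidate word as stuck (two words merged) when the joint
    likelihood of its parameters under the single-word model is too low."""
    likelihood = 1.0
    for name, (mean, std) in PARAM_STATS.items():
        likelihood *= gaussian_pdf(word[name], mean, std)
    return likelihood < threshold
```

A word twice the typical width with many OCs and strokes lands far out in all three tails, so its joint likelihood collapses and the test fires.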
Words Extraction
• If a word is stuck, all gap decisions from the 5 classifiers are fused. A separate pre-trained SVM gives the final decision whether or not to break the word up.
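The fusion step can be sketched as follows. The thesis fuses the five classifiers' gap decisions with a separate pre-trained SVM; a plain majority vote stands in for that SVM here, purely as an illustrative stand-in:

```python
def fuse_gap_decisions(votes):
    """Fuse per-gap decisions from several classifiers.
    votes: one boolean per classifier, True = inter-word gap (break here).
    Majority vote is a stand-in for the thesis's fusion SVM."""
    return sum(votes) > len(votes) / 2

def resolve_stuck_word(gap_votes_per_gap):
    """For a word flagged as stuck, return the indices of gaps to break at."""
    return [i for i, votes in enumerate(gap_votes_per_gap)
            if fuse_gap_decisions(votes)]
```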
Words Extraction
OHASD dataset split:
  Training: 110 docs, 2,802 words; 2,264 inter-word and 3,988 intra-word gaps.
  Validation: 14 docs, 334 words; 277 inter-word and 437 intra-word gaps.
  Test: 30 docs, 688 words; 616 inter-word and 1,117 intra-word gaps.
Words Extraction
• Validation set results:
  - Split words represent about 2% of the validation dataset words.
  - 96% of stuck words are detected; 62% are resolved correctly, 8% are wrongly resolved, and 16% of the lengthy correct words are damaged.
  - Stick resolution led to a 31.19% error reduction in GCR and a 43.89% error reduction in WER.
Words Extraction
• Test set results:
  - GCR of 88.4% and WER of 71.5%.
  - Word split and total overlap errors show up excessively.
Words Extraction
Performance comparison:
  Liwicki (2006): threshold-based; 86% WER, 95% GCR
  Quiniou (2009): RBF NN classification; 96% WER
  Sun (2004): LDA/KNN/GMM/MLP/SVM; 89.5%, 90%, 89.8%, 93.2%, 93.7% GCR
  Our system: 4 SVM + RBF NN; 71.5% WER, 88.4% GCR
Words Extraction
• Conclusions:
1. No publications address the Arabic online word extraction problem so far.
2. Results are promising given the difficulty of Arabic.
3. Odd writers' habits add more challenge.
4. Limitations stem from not using context help (stick/split detection).
Words Segmentation
1. Essential for analytic word recognition approaches.
2. Touching/overlapping characters and word ambiguity make it difficult.
3. Impossible to segment a given word without knowing its identity.
Words Segmentation
• Segmentation techniques:
  - Rule-based: propose many SPs & validate using rules; human experts perform classification; results measured by WSR, SPRR or CSR.
  - Classifier-based: propose many SPs & validate by recognition; classifiers (e.g. NN) perform classification; results measured by WRR.
Words Segmentation
• Segmentation errors can be:
1. Over-segmentation: excessive number of PSPs
2. Under-segmentation: too few PSPs
3. Bad segmentation: correct number of PSPs but mislocated
Words Segmentation
ATAOH provides an automatic word segmentation-annotation utility using an HMM:
1. Word preprocessing: re-sampling, smoothing, removing secondary strokes.
2. Feature extraction (local/vicinity).
3. HMM (recognizer/aligner).
4. PSP rule-based validation.
5. Restoring secondary strokes.
Words Segmentation
• Local and vicinity features: delta x-y, aspect, writing direction, chain code, eye (word-PAW), curliness, slope, chords (angles, length ratio).
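Three of these features can be sketched for a window of pen points. The formulas below follow standard definitions from the online-handwriting literature; the thesis's exact formulas are not given on the slide, so treat them as assumptions:

```python
import math

def vicinity_features(points):
    """Compute aspect, slope, and curliness for a window of (x, y) pen points.

    - aspect: (h - w) / (h + w) of the window's bounding box
    - slope: angle of the chord from the first to the last point
    - curliness: trajectory length relative to the bounding-box size
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    aspect = (h - w) / (h + w) if (h + w) else 0.0
    slope = math.atan2(points[-1][1] - points[0][1],
                       points[-1][0] - points[0][0])
    length = sum(math.dist(points[i], points[i + 1])
                 for i in range(len(points) - 1))
    curliness = length / max(w, h) - 2 if max(w, h) else 0.0
    return aspect, slope, curliness
```

A straight horizontal stroke, for instance, yields aspect -1, slope 0, and the minimum curliness, while a looping stroke (as in an 'eye') pushes curliness up.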
Words Segmentation
• HMM parameters:
  - Models: 28 characters (reduced to 19) in all positions; 6 ligatures (لم،�،لح،بح،مح،بم).
  - Window: number of samples per window; window overlap ratio.
  - States: number of states per model; number of Gaussian mixtures per state.
Words Segmentation
• First, the HMM is used as a recognizer → automatic segmentation-annotation.
• Experiments are conducted for:
1. Feature set selection
2. System parameter optimization
• Best feature set: eye, chord angles, aspect and curliness.
Words Segmentation
• Best window parameters: 9 samples/window, no overlap.
• Best HMM parameters: 20 states, 16 mixtures/state.
• Varying the HMM Gaussian mixtures did not affect the results remarkably.
• We define a new HMM with a variable number of Gaussian mixtures per state.
Words Segmentation
• Varying the number of HMM states, keeping 16 mixtures only for the first 8 states and a single Gaussian elsewhere: the best HMM has 36 states.
Words Segmentation
• Varying the location of the multi-mixture states along the HMM: the best location is the first 8 states.
Words Segmentation
• HMM average result on the validation data set using the best HMM design: 46.23% WSR, 80.87% CSR.
• The same writer may have significantly different WSR per document.
• He may have almost the same WSR but significantly different WRR.
Words Segmentation
• Segmentation accuracy is related not only to writer habits but also to the character position within the word PAW.
• Segmentation succeeds when a PAW has a reasonable number of obvious valleys.
• ∴ HMMs need to be trained on a huge open-vocabulary dataset composed of a huge variety of words written by multiple writers.
Words Segmentation
• The HMM proposes SPs, which are validated using 8 rules.
[Figures across four slides illustrate Rules 1-8 on sample words.]
Words Segmentation
• Applying the PSP validation rules: [results figure]
Words Segmentation
• We have limits on improvement, as we have no solution for most of the under-segmentation caused by the HMM.
Words Segmentation
• Spatial information is used to assign the secondary strokes to the nearest/overlapping main character.
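A minimal sketch of this assignment step, reducing each main character to a horizontal extent and each secondary stroke (dot/diacritic) to its x-centre; the real system works on full 2-D stroke geometry, so this is an illustrative simplification:

```python
def assign_secondary_strokes(chars, dots):
    """Assign each secondary stroke to the main character whose horizontal
    extent overlaps it, else to the nearest character by edge distance.
    chars: list of (x_min, x_max) per main character; dots: list of x centres.
    Returns one character index per dot."""
    owners = []
    for x in dots:
        overlapping = [i for i, (lo, hi) in enumerate(chars) if lo <= x <= hi]
        if overlapping:
            owners.append(overlapping[0])
        else:
            owners.append(min(range(len(chars)),
                              key=lambda i: min(abs(x - chars[i][0]),
                                                abs(x - chars[i][1]))))
    return owners
```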
Words Segmentation
• Applying our system to the test data set: [results figure]
Words Segmentation
• Second, the HMM is used as an aligner → semi-automatic segmentation-annotation.
• Experiments were done on the best features & window parameters.
• HMM parameter optimization: best HMM parameters are 24 states, 16 mixtures/state.
• We tried the new HMM design with a variable number of Gaussian mixtures per state (1st octave).
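In aligner mode the HMM is given the word truth and only has to place the character boundaries. A minimal forced-alignment sketch over a left-to-right model is below; 1-D Gaussian emissions and uniform transitions stand in for the real feature vectors and trained HMM states, so this is a toy illustration rather than the thesis's implementation:

```python
import math

def align(frames, char_models):
    """Forced alignment of feature frames to the characters of a known word.

    Left-to-right model: each frame either stays in the current character
    or advances to the next. `frames` is a list of 1-D observations and
    `char_models` a list of (mean, std) Gaussian emission parameters.
    Returns the best character index for every frame."""
    n, m = len(frames), len(char_models)
    NEG = float("-inf")

    def emit(c, t):  # Gaussian log-likelihood (constant term dropped)
        mean, std = char_models[c]
        return -0.5 * ((frames[t] - mean) / std) ** 2 - math.log(std)

    # dp[t][c]: best log-score of frames[0..t] with frame t in character c
    dp = [[NEG] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    dp[0][0] = emit(0, 0)  # alignment must start in the first character
    for t in range(1, n):
        for c in range(m):
            stay = dp[t - 1][c]
            move = dp[t - 1][c - 1] if c > 0 else NEG
            if move > stay:
                dp[t][c], back[t][c] = move + emit(c, t), c - 1
            else:
                dp[t][c], back[t][c] = stay + emit(c, t), c
    # Backtrace from the last character at the last frame; the transitions
    # in the recovered path are the proposed segmentation points.
    path, c = [m - 1], m - 1
    for t in range(n - 1, 0, -1):
        c = back[t][c]
        path.append(c)
    return path[::-1]
```

Where consecutive frames switch character index, a segmentation point is proposed between them.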
Words Segmentation
• Again we notice a rapid enhancement in WSR and CSR compared to the common HMM design.
• We notice two peaks at 34 states and 45 states. The 45-state HMM design is better at the writer/document level.
Words Segmentation
• The PSP validation rules are modified to benefit from knowing the word truth (Rule 9).
Words Segmentation
• System results on the validation dataset:

                                WSR    WUSR   WOSR   WBSR    CSR
  Reference                    73.35   0.00   0.00   26.65   88.85
  R1-R5                        81.74   0.00   2.10   16.17   93.44
  R1-R5-R4                     82.93   0.00   2.10   14.97   93.88
  R1-R5-R4-R2                  84.13   0.00   0.90   14.97   94.33
  R1-R5-R4-R2-R8               85.03   0.00   0.90   14.07   94.46
  R1-R5-R4-R2-R8-R7            92.81   0.00   0.90    6.29   96.75
  R1-R5-R4-R2-R8-R7-R9         94.91   0.00   0.60    4.49   97.83
  Secondary stroke restoration 94.61   0.00   0.60    4.49   97.10
Words Segmentation
• System results on the test dataset:

                          WSR    WUSR   WOSR   WBSR    CSR
  HMM output             52.23   3.38   4.06   40.32   74.47
  After PSP validation   75.64   4.19   8.12   12.04   89.42
  After dot restoration  74.42   3.52   8.25   13.80   87.04
Words Segmentation
• Computing SPRR for the test data set: 93.74%.
• Of the total number of test set words, 13.93% have a single SPR error, 5.55% have a double SPR error, 2.3% have a triple SPR error, and 3.52% have dot restoration errors.
Words Segmentation
• Comparing system results in both HMM modes for the validation and test datasets, we conclude:
1. The PSP validation stage recovers 15-20% of mis-segmented words.
2. Dot restoration may cause a loss of 0-3% due to irregular writing habits.
3. Validation set results are higher than the test set's, as its HMM output is higher.
Words Segmentation
• Conclusions:
1. Poor HMM results are due to limited training PAW variability and odd writing styles.
2. Segmentation succeeds when a PAW has a reasonable number of obvious valleys.
3. No single classifier can achieve good results for all writers.
4. A classifier ensemble over writer clusters may accomplish the mission successfully.
Words Segmentation
Performance comparison (SPRR):
  Kurniawan (2011): 1,902 SPs; 82.63% SPRR
  Rehman Khan (2008): 2,936 SPs; 91.21% SPRR
  Our system: 2,859 SPs; 93.74% SPRR
Words Segmentation
Performance comparison:
  Kavallieratou (2000): 500 English and Greek words; 77.8% WSR
  De Stefano (2002): 1,600 English words; 68% CSR
  Abdulla (2008): IFN/ENIT and AHD/AUST; 90.58%, 95.66% WSR
  Our system (recognizer): OHASD; 36.64% WSR, 71.36% CSR
  Our system (aligner): OHASD; 74.42% WSR, 87% CSR
User Interfaces
• The Main GUI opens at start-up, showing the user all operations that can be done.
User Interfaces
• Word Extraction GUI: appears on pressing the "Word Extraction" pushbutton on the Main GUI and specifying the document path.
User Interfaces
• The Add Transcription GUI appears when pressing the "Transcript Data File" pushbutton on the Main GUI and specifying the document path.
User Interfaces
• Annotation is done by entering the word truth in the ground truth text area.
User Interfaces
• Automatic segmentation can be done by pressing the "Auto Segment" pushbutton.
User Interfaces
• Manual segmentation correction is done by drawing lines with mouse clicks after pressing the "Manual Segment" pushbutton.
User Interfaces
• Each character model's stroke data are calculated and displayed by pressing the 'Insert data' pushbutton.
User Interfaces
• The 'CHECK' pushbutton plots each character model in a separate figure.
User Interfaces
• In the output text file format, each word is indexed, and each character name is listed in order (from right to left).
• Beside each character name, stroke information is listed: prototype, number of stroke parts, stroke number(s), and start and end indices.
Annotation Performance Evaluation
• We used samples from the test dataset.
• AWAT: average word annotation time. ADAT: average document annotation time.

Performance comparison:
Our test:
  Automation: 15.09 sec AWAT, 5.42 min ADAT
  Manual: 26 sec AWAT, 9.26 min ADAT
  Average time save: 43%
Volunteers:
  Automation: 16.18 sec AWAT, 9.89 min ADAT
  Manual: 32.75 sec AWAT, 16.20 min ADAT
  Average time save: 51.5% (word), 40% (doc)
Annotation Performance Evaluation
• The time saved is proportional to:
1. The number of characters per word
2. The number of words per document (database size)
3. The character overlapping (decorative writing styles)
4. The GUI compiler
5. The SPR error type being corrected
6. The automatic segmentation result
Annotation Performance Evaluation
• Model reliability test: compute the WSR and CSR variances among the validation dataset writers.
• The most robust model turned out to be the 44-state HMM.
• Although robust, this does not guarantee a higher result for the test dataset.
• Robust model (69.16% WSR, 85.72% CSR) compared to best model (73.35% WSR, 88.85% CSR).
Conclusions and Future Work
• With our work we aimed at:
1. Facilitating development of annotated online datasets for Arabic recognizers.
2. Providing robust implementations of tools and algorithms.
3. Providing and using OHASD, the first sentence dataset of its type.
• As future work we want to:
1. Extend and cluster writer sample variability.
2. Extend the dataset vocabulary to all words in Arabic lexica.
3. Collaborate with research groups to enhance the ATAOH tool.
Conclusions and Future Work
• Our text line extraction utility:
1. Gives promising results.
2. Is applicable to offline documents with minor changes.
3. Can be appropriate for use with English, French and Greek.
• As future work we want to propose solutions to open issues like skew and touching lines.
Conclusions and Future Work
• Our word extraction utility:
1. Achieves promising results on the validation dataset.
2. Obtains lower rates on the test dataset due to excessive occurrence of overlapping and split-word problems.
• As future work we want to use the help of natural language resources for stick/split detection on a context basis.
Conclusions and Future Work
• Our word segmentation-annotation utility:
1. Achieves promising results when employing the HMM as an aligner (semi-automated annotation).
2. Shows remarkable performance of the new HMM design compared to the common HMM.
3. Its powerful rule-based PSP validation stage enhanced the HMM output results remarkably.
• As future work we want to:
1. Use a large open-vocabulary database with huge varieties of words and writing styles.
2. Integrate different classifiers covering different divisions of the feature space.
Conclusions and Future Work
• Ultimately, we aim at:
1. Upgrading the tool to a generic toolkit used to build online handwriting recognition engines by simple integration.
2. Adding plug-in tools for handwriting data collection, and standard algorithms for preprocessing, feature extraction, pattern classification, and error analysis.
﴾And their final prayer is: Praise be to Allah, Lord of the Worlds﴿
﴾Praise be to Allah, Who guided us to this; we would not have been guided had Allah not guided us﴿
Thank You