A Semi-Automatic Annotation Tool For Arabic Online Handwritten Text
Prepared by:
Eng. Randa Ibrahim M. Elanwar
In the name of Allah, the Most Gracious, the Most Merciful
﴾And my success is only through Allah; in Him I trust, and to Him I turn﴿
Eng. Randa Ibrahim M. Elanwar (M.D.), Assistant Researcher
Electronic Research Institute
Under the supervision of:
Prof. Dr. Mohsen A. A. Rashwan
Professor of Digital Signal Processing Faculty of Engineering
Cairo University
Prof. Dr. Samia A. A. Mashaly
Professor of Digital Signal Processing Computers & Systems Dept.
Electronic Research Institute
Presentation Organization
1. Introduction, Thesis Goals & Contributions
2. Text Lines Extraction
3. Words Extraction
4. Words Segmentation
5. User Interfaces
6. Annotation performance evaluation
7. Conclusions & Future Work
Introduction
• What is 'Annotation'?
• What is 'Document Annotation'?
• Why Document Annotation?
• How to Annotate a document?
Introduction
• Annotation: identifying data of a particular type using additional data of a different type that precisely describes its entities.
• Document annotation: associating the ASCII/Unicode text with the corresponding document image (offline) or ink information (online).
Introduction
[Diagram: an annotated document pairs the image/ink with its transcription (ground truth). Applications include digital libraries, keyword search engines, information retrieval, and web search.]
Introduction
[Diagram: when building recognizers, annotated documents supply training data to train models, and test data for performance evaluation and result analysis.]
Introduction
• Document annotation accelerates & enhances recognizers.
• It accelerates digital library construction.
• Annotation maximizes efficiency, productivity & profitability.
Introduction
• Region of interest: line, word, character.
• Annotation: identify boundaries and associate ASCII/Unicode.
Introduction
• Annotation schemes:
  - Manual: manual annotation and validation; laborious, time-consuming, error prone.
  - Semi-automatic: manual truth entry, automatic annotation, manual validation.
  - Automatic: automatic recognition/truth alignment, manual validation.
Thesis Goals
1. Contribute to the Arabic LR problem solution.
2. Save researchers' efforts spent on data manipulation.
3. Pave the path to generic toolkit construction.
We provide the 1st online Arabic sentence dataset (OHASD) and the 1st Arabic semi-automatic annotation tool for online handwriting (ATAOH).
Contribution: OHASD Dataset
• Unconstrained and natural.
• Texts sampled from daily newspapers.
• Texts are dictated to writers.
• 154 paragraphs by 48 writers.
• More than 3,800 words and more than 19,400 characters.
Contribution: ATAOH Tool
1. Easy document browsing and display.
2. Automatic text-line/word extraction and segmentation.
3. Manual options for segmentation validation & annotation correction.
4. Designed and evaluated using OHASD.
5. Composed of a guiding set of interactive user interfaces.
6. Reduces human effort through high-performance automation.
Text Line Extraction
• Text line extraction techniques:
  - Bottom-up grouping: smearing, Hough-based, graph-based, cut-text minimization.
  - Top-down: projection-based.
Text Line Extraction
• Extraction errors are due to:
1. Fluctuating lines
2. Skew variability
3. Touching text lines
4. Fragments due to the massive presence of diacritics
Text Line Extraction
ATAOH provides an automatic text line extraction utility based on dynamic programming (DP):
1. Read the input document (stroke data).
2. Preprocessing: remove dots.
3. Shred the document into strips.
4. For each strip, build CCs/units.
5. Extend units horizontally to build segments.
6. Start DP to merge segment pairs.
7. Proceed with DP as long as valid merges exist.
8. If DP stops, the final paths are the text lines.
9. Restore dots.
Text Line Extraction
DP cost function design

MergingCost = (DistancePenalty × WidthPenalty) + log(DirectionPenalty) + log(CrossOverPenalty)

• Direction penalty: ensures merging proceeds from right to left.
• Cross-over penalty: ensures merging of adjacent segments.
• Distance penalty: ensures merging of close segments.
• Width penalty: avoids leaving segments on their own.
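The cost function and DP merge loop above can be sketched as follows. The penalty terms are supplied by a caller-provided function and the stopping threshold is a tuned parameter; the slides give the formula's shape but not the penalty definitions or threshold value, so those parts are assumptions:

```python
import math

def merging_cost(distance_penalty, width_penalty,
                 direction_penalty, crossover_penalty):
    """Cost of merging two text-line segments, per the slide's formula:
    (DistancePenalty * WidthPenalty) + log(DirectionPenalty) + log(CrossOverPenalty).
    Lower cost means a more plausible merge."""
    return (distance_penalty * width_penalty
            + math.log(direction_penalty)
            + math.log(crossover_penalty))

def dp_merge(segments, penalties, stop_threshold):
    """Repeatedly merge the cheapest valid segment pair until no pair's
    cost is below the stopping threshold; the surviving merged paths are
    the text lines. `penalties(a, b)` returns the four penalty terms."""
    while len(segments) > 1:
        pairs = [(merging_cost(*penalties(a, b)), a, b)
                 for i, a in enumerate(segments) for b in segments[i + 1:]]
        cost, a, b = min(pairs)
        if cost >= stop_threshold:
            break  # DP stops: no valid merges remain
        segments = [s for s in segments if s not in (a, b)] + [a + b]
    return segments
```

For example, with a distance-only penalty, two nearby segments merge into one text line while a distant one stays separate.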
Text Line Extraction
• Stuck text lines are fixed using a post-processing step of sticking detection and correction. Gap-separated text line segments (bridges) are also fixed.
• The OHASD dataset is divided into 124 documents for training (558 text lines) and 30 documents for testing (112 text lines).
• Experiments are conducted for system parameter optimization (DP stopping thresholds).
Text Line Extraction
• The results obtained for both the training and test sets (document accuracy % / text line accuracy %):

Training set:
  Exp 1 results: 95.16 / 84.36
  After stick resolution: 97.58 / 85.1
  After bridge concatenation: 100 / 100
Test set:
  Exp 1 results: 93.33 / 87.22
  After stick resolution & bridge concatenation: 96.67 / 98.5
Text Line Extraction
Performance comparison:
  Liwicki (2006): 100 docs from IAMonDB; 98% doc. acc., 99.94% stroke acc.
  Zahour (2001): 100 offline Arabic docs; 97%
  Li (2006): 100 offline Arabic docs; 92% text line acc.
  Our system: 154 online Arabic docs; 98% text line acc., 96.7% doc. acc.
Text Line Extraction
• Conclusions:
1. Our method gives promising results.
2. Applicable to offline documents (with minor changes).
3. Overcomes writing on multiple text lines.
4. Applicable to English, French & Greek (overcomes diacritics).
Words Extraction
• Word extraction techniques (break the text line into CCs):
  - Threshold-based: compare gap width to a fixed threshold.
  - Classifier-based: classify each gap as an inter-/intra-word gap.
Words Extraction
• Extraction errors are due to:
1. Wide intra-word gaps (word split)
2. Narrow inter-word gaps (word stick)
3. Total overlap: no inter-word gap (word stick)
Words Extraction
ATAOH provides an automatic word extraction utility based on fusing classifier decisions:
1. Read the input text line (stroke data).
2. Preprocessing: remove dots, build OCs.
3. Feature extraction (global/local).
4. Initial word extraction (best classifier).
5. Word stick detection.
6. Decision fusion and stick correction.
7. Restore dots.
Words Extraction
• Experiments were done to identify the best performing classifiers.
Words Extraction
• Feature vectors are fed to an SVM (polynomial kernel) for initial word extraction.
• The extracted words undergo stick detection tests.
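The gap-classification step can be sketched with scikit-learn. The polynomial-kernel SVM is the classifier named on the slide, but the toolkit, the two toy features (gap width and neighbouring stroke height), and all numeric values here are assumptions, not the thesis's actual setup:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in data: intra-word gaps are narrow, inter-word gaps wide.
# The real system extracts richer global/local features from the ink.
rng = np.random.default_rng(0)
intra = np.column_stack([rng.uniform(0.0, 0.3, 50), rng.uniform(0.5, 1.0, 50)])
inter = np.column_stack([rng.uniform(0.6, 1.0, 50), rng.uniform(0.5, 1.0, 50)])
X = np.vstack([intra, inter])
y = np.array([0] * 50 + [1] * 50)  # 0 = intra-word gap, 1 = inter-word gap

# Polynomial kernel, as stated on the slide (degree/coef0 are guesses).
clf = SVC(kernel="poly", degree=3, coef0=1).fit(X, y)
pred = clf.predict([[0.1, 0.7], [0.9, 0.7]])  # a narrow gap and a wide gap
```

Breaking the text line at every gap predicted as inter-word yields the initial word extraction.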
Words Extraction
• The stick detection tests are based on the likelihood probability of the output word parameter values:
1. Number of OCs
2. Word width
3. Number of strokes
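A minimal sketch of this likelihood test: model each word parameter as a Gaussian and flag the word as stuck when the joint likelihood is too low. The means, standard deviations, and threshold below are illustrative assumptions; the real values would be estimated from the OHASD training set:

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Illustrative single-word statistics (mean, std) per parameter.
PARAM_STATS = {
    "num_ocs":     (5.0, 2.0),     # number of OCs
    "width":       (120.0, 40.0),  # word width in pen units
    "num_strokes": (3.0, 1.5),     # number of strokes
}

def is_stuck(word, threshold=1e-6):
    """Flag a candidate word as stuck (two words merged) when the joint
    likelihood of its parameters under the single-word model is too low."""
    likelihood = 1.0
    for name, (mean, std) in PARAM_STATS.items():
        likelihood *= gaussian_pdf(word[name], mean, std)
    return likelihood < threshold
```

A word twice the typical width with many OCs and strokes lands far out in all three tails, so its joint likelihood collapses and the test fires.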
Words Extraction
• If a word is stuck, all gap decisions from the 5 classifiers are fused. A separate pre-trained SVM gives the final decision whether or not to break the word up.
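The fusion step can be sketched as follows. The thesis fuses the five classifiers' gap decisions with a separate pre-trained SVM; a plain majority vote stands in for that SVM here, purely as an illustrative stand-in:

```python
def fuse_gap_decisions(votes):
    """Fuse per-gap decisions from several classifiers.
    votes: one boolean per classifier, True = inter-word gap (break here).
    Majority vote is a stand-in for the thesis's fusion SVM."""
    return sum(votes) > len(votes) / 2

def resolve_stuck_word(gap_votes_per_gap):
    """For a word flagged as stuck, return the indices of gaps to break at."""
    return [i for i, votes in enumerate(gap_votes_per_gap)
            if fuse_gap_decisions(votes)]
```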
Words Extraction
OHASD dataset split:
  Training: 110 docs, 2,802 words; 2,264 inter-word and 3,988 intra-word gaps.
  Validation: 14 docs, 334 words; 277 inter-word and 437 intra-word gaps.
  Test: 30 docs, 688 words; 616 inter-word and 1,117 intra-word gaps.
Words Extraction
• Validation set results:
  - Split words represent about 2% of the validation dataset words.
  - 96% of stuck words are detected; 62% are resolved correctly, 8% are wrongly resolved, and 16% of the lengthy correct words are damaged.
  - Stick resolution led to a 31.19% error reduction in GCR and a 43.89% error reduction in WER.
Words Extraction
• Test set results:
  - GCR of 88.4% and WER of 71.5%.
  - Word split and total overlap errors show up excessively.
Words Extraction
Performance comparison:
  Liwicki (2006): threshold-based; 86% WER, 95% GCR
  Quiniou (2009): RBF NN classification; 96% WER
  Sun (2004): LDA/KNN/GMM/MLP/SVM; 89.5%, 90%, 89.8%, 93.2%, 93.7% GCR
  Our system: 4 SVM + RBF NN; 71.5% WER, 88.4% GCR
Words Extraction
• Conclusions:
1. No publications address the Arabic online word extraction problem so far.
2. Results are promising given the difficulty of Arabic.
3. Odd writers' habits add more challenge.
4. Limitations stem from not using context help (stick/split detection).
Words Segmentation
1. Essential for analytic word recognition approaches.
2. Touching/overlapping characters and word ambiguity make it difficult.
3. Impossible to segment a given word without knowing its identity.
Words Segmentation
• Segmentation techniques:
  - Rule-based: propose many SPs & validate using rules; human experts perform classification; results measured by WSR, SPRR or CSR.
  - Classifier-based: propose many SPs & validate by recognition; classifiers (e.g. NN) perform classification; results measured by WRR.
Words Segmentation
• Segmentation errors can be:
1. Over-segmentation: excessive number of PSPs
2. Under-segmentation: too few PSPs
3. Bad segmentation: correct number of PSPs but mislocated
Words Segmentation
ATAOH provides an automatic word segmentation-annotation utility using an HMM:
1. Word preprocessing: re-sampling, smoothing, removing secondary strokes.
2. Feature extraction (local/vicinity).
3. HMM (recognizer/aligner).
4. PSP rule-based validation.
5. Restoring secondary strokes.
Words Segmentation
• Local and vicinity features: delta x-y, aspect, writing direction, chain code, eye (word-PAW), curliness, slope, chords (angles, length ratio).
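Three of these features can be sketched for a window of pen points. The formulas below follow standard definitions from the online-handwriting literature; the thesis's exact formulas are not given on the slide, so treat them as assumptions:

```python
import math

def vicinity_features(points):
    """Compute aspect, slope, and curliness for a window of (x, y) pen points.

    - aspect: (h - w) / (h + w) of the window's bounding box
    - slope: angle of the chord from the first to the last point
    - curliness: trajectory length relative to the bounding-box size
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    aspect = (h - w) / (h + w) if (h + w) else 0.0
    slope = math.atan2(points[-1][1] - points[0][1],
                       points[-1][0] - points[0][0])
    length = sum(math.dist(points[i], points[i + 1])
                 for i in range(len(points) - 1))
    curliness = length / max(w, h) - 2 if max(w, h) else 0.0
    return aspect, slope, curliness
```

A straight horizontal stroke, for instance, yields aspect -1, slope 0, and the minimum curliness, while a looping stroke (as in an 'eye') pushes curliness up.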
Words Segmentation
• HMM parameters:
  - Models: 28 characters (reduced to 19) in all positions; 6 ligatures (لم،�،لح،بح،مح،بم).
  - Window: number of samples per window; window overlap ratio.
  - States: number of states per model; number of Gaussian mixtures per state.
Words Segmentation
• First, the HMM is used as a recognizer → automatic segmentation-annotation.
• Experiments are conducted for:
1. Feature set selection
2. System parameter optimization
• Best feature set: eye, chord angles, aspect and curliness.
Words Segmentation
• Best window parameters: 9 samples/window, no overlap.
• Best HMM parameters: 20 states, 16 mixtures/state.
• Varying the HMM Gaussian mixtures did not affect the results remarkably.
• We define a new HMM with a variable number of Gaussian mixtures per state.
Words Segmentation
• Varying the number of HMM states, keeping 16 mixtures only for the first 8 states and a single Gaussian elsewhere: the best HMM has 36 states.
Words Segmentation
• Varying the location of the multi-mixture states along the HMM: the best location is the first 8 states.
Words Segmentation
• HMM average result on the validation data set using the best HMM design: 46.23% WSR, 80.87% CSR.
• The same writer may have significantly different WSR per document.
• He may have almost the same WSR but significantly different WRR.
Words Segmentation
• Segmentation accuracy is related not only to writer habits but also to the character position within the word PAW.
• Segmentation succeeds when a PAW has a reasonable number of obvious valleys.
• ∴ HMMs need to be trained on a huge open-vocabulary dataset composed of a huge variety of words written by multiple writers.
Words Segmentation
• The HMM proposes SPs, which are validated using 8 rules.
[Figures across four slides illustrate Rules 1-8 on sample words.]
Words Segmentation
• Applying the PSP validation rules: [results figure]
Words Segmentation
• We have limits on improvement, as we have no solution for most of the under-segmentation caused by the HMM.
Words Segmentation
• Spatial information is used to assign the secondary strokes to the nearest/overlapping main character.
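A minimal sketch of this assignment step, reducing each main character to a horizontal extent and each secondary stroke (dot/diacritic) to its x-centre; the real system works on full 2-D stroke geometry, so this is an illustrative simplification:

```python
def assign_secondary_strokes(chars, dots):
    """Assign each secondary stroke to the main character whose horizontal
    extent overlaps it, else to the nearest character by edge distance.
    chars: list of (x_min, x_max) per main character; dots: list of x centres.
    Returns one character index per dot."""
    owners = []
    for x in dots:
        overlapping = [i for i, (lo, hi) in enumerate(chars) if lo <= x <= hi]
        if overlapping:
            owners.append(overlapping[0])
        else:
            owners.append(min(range(len(chars)),
                              key=lambda i: min(abs(x - chars[i][0]),
                                                abs(x - chars[i][1]))))
    return owners
```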
Words Segmentation
• Applying our system to the test data set: [results figure]
Words Segmentation
• Second, the HMM is used as an aligner → semi-automatic segmentation-annotation.
• Experiments were done on the best features & window parameters.
• HMM parameter optimization: best HMM parameters are 24 states, 16 mixtures/state.
• We tried the new HMM design with a variable number of Gaussian mixtures per state (1st octave).
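In aligner mode the HMM is given the word truth and only has to place the character boundaries. A minimal forced-alignment sketch over a left-to-right model is below; 1-D Gaussian emissions and uniform transitions stand in for the real feature vectors and trained HMM states, so this is a toy illustration rather than the thesis's implementation:

```python
import math

def align(frames, char_models):
    """Forced alignment of feature frames to the characters of a known word.

    Left-to-right model: each frame either stays in the current character
    or advances to the next. `frames` is a list of 1-D observations and
    `char_models` a list of (mean, std) Gaussian emission parameters.
    Returns the best character index for every frame."""
    n, m = len(frames), len(char_models)
    NEG = float("-inf")

    def emit(c, t):  # Gaussian log-likelihood (constant term dropped)
        mean, std = char_models[c]
        return -0.5 * ((frames[t] - mean) / std) ** 2 - math.log(std)

    # dp[t][c]: best log-score of frames[0..t] with frame t in character c
    dp = [[NEG] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    dp[0][0] = emit(0, 0)  # alignment must start in the first character
    for t in range(1, n):
        for c in range(m):
            stay = dp[t - 1][c]
            move = dp[t - 1][c - 1] if c > 0 else NEG
            if move > stay:
                dp[t][c], back[t][c] = move + emit(c, t), c - 1
            else:
                dp[t][c], back[t][c] = stay + emit(c, t), c
    # Backtrace from the last character at the last frame; the transitions
    # in the recovered path are the proposed segmentation points.
    path, c = [m - 1], m - 1
    for t in range(n - 1, 0, -1):
        c = back[t][c]
        path.append(c)
    return path[::-1]
```

Where consecutive frames switch character index, a segmentation point is proposed between them.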
Words Segmentation
• Again we notice a rapid enhancement in WSR and CSR compared to the common HMM design.
• We notice two peaks at 34 states and 45 states. The 45-state HMM design is better at the writer/document level.
Words Segmentation
• The PSP validation rules are modified to benefit from knowing the word truth (Rule 9).
Words Segmentation
• System results on the validation dataset:

                                WSR    WUSR   WOSR   WBSR    CSR
  Reference                    73.35   0.00   0.00   26.65   88.85
  R1-R5                        81.74   0.00   2.10   16.17   93.44
  R1-R5-R4                     82.93   0.00   2.10   14.97   93.88
  R1-R5-R4-R2                  84.13   0.00   0.90   14.97   94.33
  R1-R5-R4-R2-R8               85.03   0.00   0.90   14.07   94.46
  R1-R5-R4-R2-R8-R7            92.81   0.00   0.90    6.29   96.75
  R1-R5-R4-R2-R8-R7-R9         94.91   0.00   0.60    4.49   97.83
  Secondary stroke restoration 94.61   0.00   0.60    4.49   97.10
Words Segmentation
• System results on the test dataset:

                          WSR    WUSR   WOSR   WBSR    CSR
  HMM output             52.23   3.38   4.06   40.32   74.47
  After PSP validation   75.64   4.19   8.12   12.04   89.42
  After dot restoration  74.42   3.52   8.25   13.80   87.04
Words Segmentation
• Computing SPRR for the test data set: 93.74%.
• Of the total number of test set words, 13.93% have a single SPR error, 5.55% have a double SPR error, 2.3% have a triple SPR error, and 3.52% have dot restoration errors.
Words Segmentation
• Comparing system results in both HMM modes for the validation and test datasets, we conclude:
1. The PSP validation stage recovers 15-20% of mis-segmented words.
2. Dot restoration may cause a loss of 0-3% due to irregular writing habits.
3. Validation set results are higher than the test set's, as its HMM output is higher.
Words Segmentation
• Conclusions:
1. Poor HMM results are due to limited training PAW variability and odd writing styles.
2. Segmentation succeeds when a PAW has a reasonable number of obvious valleys.
3. No single classifier can achieve good results for all writers.
4. A classifier ensemble over writer clusters may accomplish the mission successfully.
Words Segmentation
Performance comparison (SPRR):
  Kurniawan (2011): 1,902 SPs; 82.63% SPRR
  Rehman Khan (2008): 2,936 SPs; 91.21% SPRR
  Our system: 2,859 SPs; 93.74% SPRR
Words Segmentation
Performance comparison:
  Kavallieratou (2000): 500 English and Greek words; 77.8% WSR
  De Stefano (2002): 1,600 English words; 68% CSR
  Abdulla (2008): IFN/ENIT and AHD/AUST; 90.58%, 95.66% WSR
  Our system (recognizer): OHASD; 36.64% WSR, 71.36% CSR
  Our system (aligner): OHASD; 74.42% WSR, 87% CSR
User Interfaces
• The Main GUI opens at start-up, showing the user all operations that can be done.
User Interfaces
• Word Extraction GUI: appears on pressing the "Word Extraction" pushbutton on the Main GUI and specifying the document path.
User Interfaces
• The Add Transcription GUI appears when pressing the "Transcript Data File" pushbutton on the Main GUI and specifying the document path.
User Interfaces
• Annotation is done by entering the word truth in the ground truth text area.
User Interfaces
• Automatic segmentation can be done by pressing the "Auto Segment" pushbutton.
User Interfaces
• Manual segmentation correction is done by drawing lines with mouse clicks after pressing the "Manual Segment" pushbutton.
User Interfaces
• Each character model's stroke data are calculated and displayed by pressing the 'Insert data' pushbutton.
User Interfaces
• The 'CHECK' pushbutton plots each character model in a separate figure.
User Interfaces
• In the output text file format, each word is indexed, and each character name is listed in order (from right to left).
• Beside each character name, stroke information is listed: prototype, number of stroke parts, stroke number(s), and start and end indices.
Annotation Performance Evaluation
• We used samples from the test dataset.
• AWAT: average word annotation time. ADAT: average document annotation time.

Performance comparison:
Our test:
  Automation: 15.09 sec AWAT, 5.42 min ADAT
  Manual: 26 sec AWAT, 9.26 min ADAT
  Average time save: 43%
Volunteers:
  Automation: 16.18 sec AWAT, 9.89 min ADAT
  Manual: 32.75 sec AWAT, 16.20 min ADAT
  Average time save: 51.5% (word), 40% (doc)
Annotation Performance Evaluation
• The time saved is proportional to:
1. The number of characters per word
2. The number of words per document (database size)
3. The character overlapping (decorative writing styles)
4. The GUI compiler
5. The SPR error type being corrected
6. The automatic segmentation result
Annotation Performance Evaluation
• Model reliability test: compute the WSR and CSR variances among the validation dataset writers.
• The most robust model turned out to be the 44-state HMM.
• Although robust, this does not guarantee a higher result for the test dataset.
• Robust model (69.16% WSR, 85.72% CSR) compared to best model (73.35% WSR, 88.85% CSR).
Conclusions and Future Work
• With our work we aimed at:
1. Facilitating development of annotated online datasets for Arabic recognizers.
2. Providing robust implementations of tools and algorithms.
3. Providing and using OHASD, the first sentence dataset of its type.
• As future work we want to:
1. Extend and cluster writer sample variability.
2. Extend the dataset vocabulary to all words in Arabic lexica.
3. Collaborate with research groups to enhance the ATAOH tool.
Conclusions and Future Work
• Our text line extraction utility:
1. Gives promising results.
2. Is applicable to offline documents with minor changes.
3. Can be appropriate for use with English, French and Greek.
• As future work we want to propose solutions to open issues like skew and touching lines.
Conclusions and Future Work
• Our word extraction utility:
1. Achieves promising results on the validation dataset.
2. Obtains lower rates on the test dataset due to excessive occurrence of overlapping and split-word problems.
• As future work we want to use the help of natural language resources for stick/split detection on a context basis.
Conclusions and Future Work
• Our word segmentation-annotation utility:
1. Achieves promising results when employing the HMM as an aligner (semi-automated annotation).
2. Shows remarkable performance of the new HMM design compared to the common HMM.
3. Its powerful rule-based PSP validation stage enhanced the HMM output results remarkably.
• As future work we want to:
1. Use a large open-vocabulary database with huge varieties of words and writing styles.
2. Integrate different classifiers covering different divisions of the feature space.
Conclusions and Future Work
• Ultimately, we aim at:
1. Upgrading the tool to a generic toolkit used to build online handwriting recognition engines by simple integration.
2. Adding plug-in tools for handwriting data collection, and standard algorithms for preprocessing, feature extraction, pattern classification, and error analysis.
﴾And their final prayer is: Praise be to Allah, Lord of the Worlds﴿
﴾Praise be to Allah, Who guided us to this; we would not have been guided had Allah not guided us﴿
Thank You