Weekly Report
Semantic Web Research Center
진두현, 2011-2-25
Contents
Progress
Work environment
   Work flow
Experiment
   Time measured
   Problems
The instructions
Plan
Progress
1. Selecting vocabulary (~ 2/11)
2. Extracting sentences (~ 2/18)
3. Workflow modeling (~ 2/24)
   Calculating time duration
   Making instructions
4. Developing tools
5. Constructing
6. Evaluation
Work environment
Tools: CoreNet Browser, text editor, spreadsheet
   Checking sentences in the text editor
   Searching concepts in the CoreNet Browser
   Typing case frames into the spreadsheet
Work flow
A work flow for one headword
1. Selecting a headword
2. Determining the word sense
3. Reading a sentence from a file
4. Working
   a. Parsing
      Partial parsing
      Grammaticality check
      Assigning NPs (distinguishing complement from adjunct)
   b. Comparing the NPs to the NPs the annotator has in mind
   c. Listing a case frame, discarding, or listing under another category
   d. Assigning CoreNet concepts to arguments
      Search CoreNet with a word
      Find the appropriate concept
      Insert the information
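Example with the word "뭉치다" (to clump together)
   Select the word "뭉치다".
   From the headword "뭉치다", decide the word sense for which the case frame will be built: "뭉치다" (결합, combination).
   Sentence: '온몸에 솜털이 나고 아주 작은 꽃들이 뭉쳐서 피지.' ("It is covered in downy hair, and very small flowers bloom clustered together.")
   The annotator partially parses the sentence mentally: NPsubject(꽃, flower) + Verb(뭉치다)
   The annotator's selectional restriction: (something that is not an animate being such as a human) + 뭉치다
   (꽃)이 + 뭉치다: matches the restriction
   Other outcomes: no argument or ungrammatical sentence; matches another word sense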
Experiment
5 headwords (7 word senses): "끼우다", "묶다", "뭉치다 (intransitive, transitive)", "통틀다", "합하다 (intransitive, transitive)"
All words belong to the concept "결합" (combination) in CoreNet
Time duration was measured
Instructions were made with a supervisor (Dr. Lee)
No guarantee of coherence in the constructed data
Time measured
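| Word | Test sentences | Target sentences (WS match) | Not constructed | Extracted | Discarded | Case frames | Time duration | Time per case frame (min) |
|---|---|---|---|---|---|---|---|---|
| 끼우다 | 503 | 110 | 0 | 86 | 24 | 86 | 136 min | 1.58 |
| 묶다 | 976 | 367 | 287 | 70 | 10 | 70 | 100 min | 1.42 |
| 뭉치다 | 295 | 136 | 0 | 98 | 31 | 98 | 60 min | 0.61 |
| 통틀다 | 145 | 145 | 0 | 109 | 36 | 112 | 53 min | 0.47 |
| 합하다 | 158 | 158 | 0 | 115 | 43 | 118 | 84 min | 0.71 |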
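The last column is the time duration divided by the number of case frames. The sketch below recomputes it from the figures in the table. Truncating to two decimals is an assumption: it is the only way 100 / 70 reproduces the 1.42 shown for 묶다, since rounding would give 1.43. The function names and the overall average are illustrative additions, not part of the report.

```python
import math

# (headword, case frames built, minutes spent), copied from the table above.
MEASUREMENTS = [
    ("끼우다", 86, 136),
    ("묶다", 70, 100),
    ("뭉치다", 98, 60),
    ("통틀다", 112, 53),
    ("합하다", 118, 84),
]


def minutes_per_case_frame(case_frames: int, minutes: int) -> float:
    """Minutes spent per case frame, truncated (not rounded) to two decimals."""
    return math.floor(minutes / case_frames * 100) / 100


def overall_minutes_per_case_frame(rows) -> float:
    """Total minutes divided by total case frames over all headwords."""
    total_frames = sum(frames for _, frames, _ in rows)
    total_minutes = sum(minutes for _, _, minutes in rows)
    return total_minutes / total_frames


for word, frames, minutes in MEASUREMENTS:
    print(word, minutes_per_case_frame(frames, minutes))
print("overall", overall_minutes_per_case_frame(MEASUREMENTS))
```

Over all five headwords this comes to 433 minutes for 484 case frames, a little under one minute per case frame.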
Problems
Much of the time was wasted on:
   Typing a word into the CoreNet Browser (switching from mouse to keyboard)
   Finding the spreadsheet, then typing and checking the argument information
The instructions
1. Discarding useless sentences
2. Syntactic parsing
3. Selecting arguments into case frame
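1. Discarding useless sentences
1. Has no explicit arguments: e.g. 끼워팔기 (tie-in selling) ..
2. Ungrammatical sentence: heavily colloquial expressions
3. Can't find any arguments, for the reasons below
   Can't find concepts
      Only 'numeral + classifier' appears: 5 개를 뭉쳐서 ("clumping five of them together")
      Dependent noun with no semantic category or an ambiguous one: 짱 박은 것까지 합해서 ("adding in even what was stashed away")
      Nominalized clause forms: ~ㅁ, ~기
      Quoted material whose semantic category cannot be determined: '너에게'를 끼워 넣어 ("inserting 'to you'")
   Can't find the word anywhere: 삼적 (?) 을 합하여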
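The discard rules above can be recorded as an explicit decision, so that each discarded sentence carries its reason. This is a minimal sketch: the four boolean inputs stand for judgments the annotator makes by hand, and `DiscardReason` and `discard_reason` are hypothetical names, not part of any existing tool.

```python
from enum import Enum
from typing import Optional


class DiscardReason(Enum):
    NO_EXPLICIT_ARGUMENT = "has no explicit arguments"
    UNGRAMMATICAL = "ungrammatical sentence"
    NO_CONCEPT = "no concept can be found for the argument"
    UNKNOWN_WORD = "the argument word cannot be found anywhere"


def discard_reason(
    has_explicit_argument: bool,
    is_grammatical: bool,
    concept_found: bool,
    word_found: bool,
) -> Optional[DiscardReason]:
    """Return the first matching discard rule, or None to keep the sentence."""
    if not has_explicit_argument:
        return DiscardReason.NO_EXPLICIT_ARGUMENT
    if not is_grammatical:
        return DiscardReason.UNGRAMMATICAL
    if not word_found:
        return DiscardReason.UNKNOWN_WORD
    if not concept_found:
        return DiscardReason.NO_CONCEPT
    return None


# "5 개를 뭉쳐서": only numeral + classifier, so no concept can be assigned.
print(discard_reason(True, True, concept_found=False, word_found=True))
```

Checking the rules in a fixed order keeps the tally of reasons consistent when a sentence fails more than one rule.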
2. Syntactic parsing
Omitted subject
   Many verbs in the text have no explicit subject.
   Solution: find the subject in the context; if there is no clue, set the most probable concept or set the sentence aside.
Noun + numeral + classifier: 나무 한 그루 ("one tree")
   The noun is the argument (head).
Relative clause form: 뭉친 거품 ("clumped foam")
   The subject of '뭉치다' is '거품'.
Passive problem with "Verb + '지다'"
   No absolute measure: 먹어지다.
   Follow the context.
3. Selecting arguments into case frame
Coordinated NP
   Make a case frame for each argument
   Ex> "골프 (스포츠) 와 승마 (스포츠) 를 끼워" (golf (sports) and horse riding (sports))
      Case frame 1: 끼우 ; arg1: 골프 (스포츠)
      Case frame 2: 끼우 ; arg1: 승마 (스포츠)
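The coordinated-NP rule amounts to emitting one case frame per conjunct. A minimal sketch, assuming the conjuncts have already been identified and given CoreNet concepts by the annotator; the `Argument` and `CaseFrame` types are hypothetical and only mirror the "predicate ; arg1" notation of the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Argument:
    word: str     # surface form, e.g. "골프"
    concept: str  # concept label assigned by the annotator, e.g. "스포츠"


@dataclass(frozen=True)
class CaseFrame:
    predicate: str
    arg1: Argument


def split_coordinated(predicate: str, conjuncts: List[Argument]) -> List[CaseFrame]:
    """Build one case frame per conjunct of a coordinated NP."""
    return [CaseFrame(predicate, conjunct) for conjunct in conjuncts]


frames = split_coordinated(
    "끼우",
    [Argument("골프", "스포츠"), Argument("승마", "스포츠")],
)
for number, frame in enumerate(frames, start=1):
    print(f"Case frame {number}: {frame.predicate} ; "
          f"arg1: {frame.arg1.word} ({frame.arg1.concept})")
```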
Unknown words
   Proper noun: "중국과 동남아국가연합을 묶는…" ("tying together China and ASEAN…")
      Make a case frame but exclude the 'word form' from it
      Make a proper noun list
   Unknown in CoreNet but familiar: "펀드는 …" ("the fund …")
      Make a new word list and assign the word to an appropriate concept
      Make a case frame
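The two unknown-word cases differ only in what is stored: a proper noun goes on the proper noun list and its case frame keeps the concept without the word form, while a familiar new word goes on the new word list with its assigned concept and keeps its word form. A minimal sketch under those assumptions; `FrameArg`, `WordLists` and `resolve_unknown` are hypothetical names, and the concept labels in the usage lines are placeholders, not actual CoreNet concepts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FrameArg:
    concept: str
    word: Optional[str] = None  # None when the word form is excluded


@dataclass
class WordLists:
    proper_nouns: List[str] = field(default_factory=list)
    new_words: Dict[str, str] = field(default_factory=dict)  # word -> concept


def resolve_unknown(word: str, concept: str, is_proper_noun: bool,
                    lists: WordLists) -> FrameArg:
    """Record a word missing from CoreNet and return its case-frame argument."""
    if is_proper_noun:
        lists.proper_nouns.append(word)
        return FrameArg(concept=concept, word=None)  # word form excluded
    lists.new_words[word] = concept
    return FrameArg(concept=concept, word=word)


lists = WordLists()
# Placeholder concept labels, chosen only for illustration.
asean = resolve_unknown("동남아국가연합", "ORGANIZATION", True, lists)
fund = resolve_unknown("펀드", "MONEY", False, lists)
print(asean, fund, lists)
```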
Plan
1. Selecting vocabulary (~ 2/11)
2. Extracting sentences (~ 2/18)
3. Workflow modeling (~ 2/24)
   Calculating time duration
   Making instructions
// up to the comments..
4. Developing tools (~ 3/4)
5. Constructing (3/?? ~ )
6. Evaluation