2
• WP3 as part of CLARIAH
  – Discipline: Linguistics
  – Data type: primarily textual data
• WP3 as successor of CLARIN
• WP3 ‘incorporates’ Nederlab (NWO-groot)
3
• Linguistics
  – Support for the researcher in each stage of a research project
• What is needed
• What is available
• What functionality must be created / improved
• Cooperation projects with WP2, WP4 Soc Econ & WP5 Media Studies
4
• Theme 1: Data and metadata
• Theme 2: Interoperability
• Theme 3: Enrichment and annotation
• Theme 4: Search and research
5
• New Resources
  – text corpora, crowd sourcing, survey tool, databases
• Existing Resources
  – browsing & searching for data and tools and selecting them
• Enriching resources
  – curation, linguistic annotations, transcription, named entities
• Searching / analyzing (enriched) resources
• Representation / visualization of search results
• Store new resources in CLARIAH
• Make enhanced publications
6
• Incorporate data / tools in CLARIAH
  – With proper metadata
  – With IPR/Ethical Issues properly dealt with
  – Archiving / Ingest functionality
  – Deployment Framework
• How to run services efficiently
  – Required: standardization (input – output formats), metadata, interface elements
  – Interoperability (syntactic and semantic)
8
• Cooperation WP4 / WP5
  – Text -> structured data
    – WP4: e.g. detect strikes in newspapers of 1965, Athena
    – WP5: probably convert scanned and OCR’ed ‘filmladders’ into structured data
  – Speech -> text
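To make the "Text -> structured data" idea concrete, here is a minimal sketch of turning newspaper text into structured records. It is a toy keyword match, not the method used by WP4 or Athena, which the slides do not describe. The sample articles, the newspaper names, the `StrikeMention` record and the `extract_strike_mentions` function are all invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class StrikeMention:
    """One structured record derived from a free-text article (hypothetical schema)."""
    date: str
    newspaper: str
    snippet: str


# Invented sample articles; real input would be OCR'ed newspaper text.
ARTICLES = [
    {"date": "1965-03-02", "newspaper": "Example Daily",
     "text": "Havenarbeiders in Rotterdam begonnen gisteren een staking."},
    {"date": "1965-03-03", "newspaper": "Example Daily",
     "text": "De gemeenteraad vergaderde over de nieuwe brug."},
    {"date": "1965-04-10", "newspaper": "Example Courier",
     "text": "De stakers eisen hoger loon; de werkgevers weigeren."},
]

# Assumption: a handful of Dutch word forms related to "strike" is enough for a demo.
STRIKE_PATTERN = re.compile(r"\b(staking(en)?|stakers?|staken)\b", re.IGNORECASE)


def extract_strike_mentions(articles):
    """Return one StrikeMention per article whose text matches the keyword pattern."""
    mentions = []
    for article in articles:
        match = STRIKE_PATTERN.search(article["text"])
        if match:
            mentions.append(
                StrikeMention(article["date"], article["newspaper"], match.group(0))
            )
    return mentions


mentions = extract_strike_mentions(ARTICLES)
for mention in mentions:
    print(mention)
```

A realistic pipeline would also need OCR correction and disambiguation; the sketch only shows the shape of the output, one record per detected event.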
10
• Search application for treebanks
  – LASSY, CGN
  – One’s own corpus
• Special word relations interface, XPATH interface
• New:
  – metadata in the search query (period, sex, region, etc.)
  – results can be presented as aggregate or split by metadata
• Illustrations:
  – CGN (Spoken Dutch Corpus) with metadata
  – Dutch CHILDES Corpora with metadata
• http://zardoz.service.rug.nl:8067/
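The combination of an XPATH query with metadata filters, and the choice between aggregate and split results, can be sketched as follows. This is an illustration under stated assumptions, not the search application's actual implementation. The mini-treebank, its element and attribute names (`sentence`, `node`, `cat`, `sex`, `region`) and the `search` function are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented mini-treebank; the XML layout is an assumption made for this sketch.
TREEBANK = """
<treebank>
  <sentence id="s1" sex="female" region="north">
    <node cat="smain">
      <node cat="np"><node pos="det" word="de"/><node pos="noun" word="hond"/></node>
      <node pos="verb" word="blaft"/>
    </node>
  </sentence>
  <sentence id="s2" sex="male" region="south">
    <node cat="smain">
      <node cat="np"><node pos="det" word="het"/><node pos="noun" word="kind"/></node>
      <node pos="verb" word="ziet"/>
      <node cat="np"><node pos="det" word="een"/><node pos="noun" word="kat"/></node>
    </node>
  </sentence>
  <sentence id="s3" sex="female" region="south">
    <node cat="smain">
      <node pos="pron" word="zij"/>
      <node pos="verb" word="slaapt"/>
    </node>
  </sentence>
</treebank>
"""


def search(root, xpath, split_by=None, **filters):
    """Count XPath hits per sentence, filtered on metadata and optionally split by it."""
    counts = Counter()
    for sentence in root.iter("sentence"):
        # Skip sentences whose metadata does not satisfy every filter.
        if any(sentence.get(key) != value for key, value in filters.items()):
            continue
        hits = len(sentence.findall(xpath))
        group = sentence.get(split_by) if split_by else "all"
        counts[group] += hits
    return dict(counts)


root = ET.fromstring(TREEBANK)
np_query = ".//node[@cat='np']"  # all noun-phrase nodes

aggregate = search(root, np_query)                   # one total over the corpus
by_sex = search(root, np_query, split_by="sex")      # totals split by a metadata field
south_only = search(root, np_query, region="south")  # metadata used as a filter

print(aggregate, by_sex, south_only)
```

The standard-library `ElementTree` supports only a subset of XPath; a full XPATH interface would need a complete XPath engine.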
11
• Search application for treebanks (LASSY, CGN, SONAR)
• Example-based interface, XPATH interface
• New: Uploading one’s own corpus
15
• Meertens (Metadata, Search, Ingest, Interoperability)
• RUN (Curate, Enrich)
• VU (Interoperability, Text -> Structured)
• INL (Search, Metadata, Interoperability)
• RUG (Enrich, Search)
• UU (Metadata, Search, Interoperability)