Overview of the CLARIN project, the Center of Estonian Language Resources, cooperation and the future


• Overview of the CLARIN project

• Center of Estonian Language Resources

• Cooperation

• The future

A bit of history

• Paris 2006

• Genoa 2006

• Budapest 2007

• Lund 2007

• Nijmegen 2008

ESFRI: European Strategy Forum on Research Infrastructures

• ESFRI is a strategic instrument to develop the scientific integration of Europe and to strengthen its international outreach.

• The competitive and open access to high quality Research Infrastructures supports and benchmarks the quality of the activities of European scientists, and attracts the best researchers from around the world.

European Roadmap for Research Infrastructures

Brussels, 19 October 2006

• The ESFRI roadmap identifies 35 large-scale infrastructure projects at various stages of development for the next 10 to 20 years.

CLARIN

• The CLARIN project is a large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily useable.

• www.clarin.eu

Expertise and Standards

• CLARIN will make extensive use of the expertise that has developed in the European language resources and technology (LRT) community over the last decades.

• CLARIN will rely on a number of standards that have been released and also push new standards where this seems to be necessary.

Standardization Initiatives

• linguistic terminology: EAGLES, TEI, ISLE, ISO TC37/SC4 etc

• generic schemas: ISO TC37/SC4 etc

• knowledge representation: W3C, ISO TC37/SC4 etc

• grids, registries and generic APIs: W3C, GGF, OASIS etc

• metadata: Dublin Core, IMDI/ISLE, OLAC, METS, MPEG7, ISO 11173 etc

• corpus construction: BLARK

Grid and Digital Library Initiatives

• Grid/Federation Technology: GGF, DEISA, EGEE, EUGridPMA, TERENA etc

• DL Initiatives: Internet2, OAI etc

• European RI Projects: DAM-LR, LIRICS, Kalmar Union etc

Integration and Dissemination Projects

• Resource Integration: TELRI, INTERA, ECHO, LTWorld, TDS etc

• Dissemination: ELSNET, LREC, (E)ACL, ENABLER, LTRC etc

Existing LRT Associations

• ELRA

• ELDA

• TELRI

• LDC

The following diagram illustrates the structure of CLARIN:

Executive Board

– Steven Krauwer (UU) - coordinator of CLARIN and chairperson of the EB

– Peter Wittenburg (MPG) - leading work package 2

– Tamás Váradi (HASRIL) - leading work package 3

– Martin Wynne (OTA) - as the Humanities Liaison Officer

– Erhard Hinrichs (UTU) - leading work package 5

– Dan Cristea (UAIC) - leading work package 6

– Kimmo Koskenniemi (UHEL) - leading work package 7

– Bente Maegaard (UCPH) - leading work package 8

Work Package 2 - Technical infrastructure

• CLARIN is devoted to establishing an integrated and interoperable research infrastructure for the language resources and technology (LRT) domain.

• The goal is to make language resources and technology much more accessible to all researchers working with language material, in particular in the humanities and social sciences.

• Building such an eScience enabling infrastructure requires investments at various layers - an important one is to establish its technical infrastructure.

Work Package 2 - Technical infrastructure

• Working Groups

• Working Group 1 - Requirements for LRT centres

• Working Group 2 - Requirements for the LRT federation

• Working Group 3 - LRT federation pilot

• Working Group 4 - Specification of the registry infrastructure

WG 2.1: Requirements for LRT centres

Within CLARIN we have to carry out the following steps to build up a first network of service centres:

• determine the types of centres we will need and their "business models"

• define a few initial services we expect from CLARIN centres

• define requirements for LRT centres

• launch an open call for participation in the LRT service centre network prototype

• analyze the repository/archive systems and make suggestions for changes/adaptations

• select centres for participation

WG 2.2: Requirements for the LRT federation

• define the special requirements of the LRT providers

• talk intensively with all national federations and with TERENA about their practices to establish trust and about the schemas used

• define a suitable architecture for the LRT Federation

• define the set of attributes and their usage

• define the rules of the LRT federation (in collaboration with WP7)

• define criteria for the participation of centres as LRT service providers

• define criteria for the necessary national support

• ask for applications to participate in the prototype network of centres

• investigate the situation of each centre in detail (national support, local expertise, local repository/archiving system architecture, etc), select suitable centres and make priority lists

• run training courses to train local experts

• install the necessary components with Shibboleth in the core and make suitable adaptations of the authentication and authorization integration components

• design and implement methods for the delegated authentication for web applications and portals

• come to agreements with national federations

WG 2.3: LRT federation pilot

• Implementation of the requirements as specified by Working Group 2.2.

WG 2.4: Specification of the registry infrastructure.

1. we need to analyse the experiences with the current metadata systems and refine the suggestions for overcoming their shortcomings

2. we need to analyse the Web Services requirements for registries and the experiences made with various suggestions such as UDDI and ebXML

3. a widely accepted reference taxonomy of resources and tools needs to be worked out by WP5

4. we need to arrive at a generic ODD-based component model that is based on an agreed core and suggested extensions for various resource and tool types (similar to LMF)

5. a standardized component schema needs to be created as well for the XML output

6. a requirements specification document for the registry infrastructure (portals, repositories, tools, etc) needs to be worked out
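The idea in items 4 and 5 (an agreed core, type-specific extensions, and XML output) can be illustrated with a minimal Python sketch. It is not the CLARIN component model or schema: the class names, the field names and the XML element names below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Component:
    """A named group of metadata fields (hypothetical structure)."""
    name: str
    fields: dict[str, str] = field(default_factory=dict)


@dataclass
class ResourceDescription:
    """An agreed core component plus optional type-specific extensions."""
    core: Component
    extensions: list[Component] = field(default_factory=list)

    def to_xml(self) -> str:
        # Element and attribute names are illustrative, not a real schema.
        root = ET.Element("resource")
        for component in [self.core, *self.extensions]:
            node = ET.SubElement(root, "component", name=component.name)
            for key, value in component.fields.items():
                ET.SubElement(node, "field", name=key).text = value
        return ET.tostring(root, encoding="unicode")


# A corpus described by the shared core plus one corpus-specific extension.
record = ResourceDescription(
    core=Component("core", {"title": "Example corpus", "language": "et"}),
    extensions=[Component("corpus", {"annotation": "morphological"})],
)
xml_text = record.to_xml()
```

In this sketch every resource or tool type shares the same core fields, so a registry could search across all entries on the core alone, while the extensions carry whatever is specific to a corpus, lexicon or tool.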

WP3 (Humanities overview)

• The sole purpose of inviting Humanities projects to collaborate with CLARIN in the preparatory phase is to enable us to assess the technological, methodological, organizational and other requirements involved in serving the Humanities in the later phases of CLARIN.

• We are committed to the idea of collaboration with Humanities projects on a prototype scale as the best means of identifying needs and removing any potential obstacle to future synergies between the two fields.

WG3.1 Scoping and Impact Study

• The aim of this working group is to identify, mobilize and bring together a critical mass of producers and users around the infrastructure.

• The initial scoping study will identify actual and potential users of language tools and resources across the heterogeneous fields that constitute the humanities.

WG3.2 Overview of relevant Humanities projects and professional associations

• The aim of this working group is to make an in-depth survey of past and existing Humanities projects and establish contact with leading professional associations in the Humanities that are potential partners in employing language technology in their research.

• The overall aim is to have a clear understanding of the research concerns and methods of the Humanities field so that CLARIN could make a maximal impact on the field.

WG3.3 Call for Humanities Projects

• The task of this working group is to compile a call for Humanities projects that CLARIN will assist.

• It will work out evaluation and decision criteria that need to reflect national wishes, relevance of the proposed projects for testing the infrastructure, concepts and standards and the capability to demonstrate the potential of the infrastructure to other humanities disciplines.

WP5 (Language resources and technology overview)

• WP5 deals with specifying and implementing standards for language resources of all kinds, including e.g. corpora, lexica, grammars and tools for processing them.

• This is a prerequisite for achieving interoperability between linguistic resources and tools.

• Both will be made available through webservices, and workflows integrating several resources, tools, and services will be defined.

WP5 (Language resources and technology overview)

• WP5, Working Group 1, Tools

• WP5, Working Group 2, Lexical Resources

• WP5, Working Group 3, Corpora

Working Group 5.1, Tools

The aims of this working group are:

– To take stock of basic language processing tools (tokenizing, morphological analysis, part-of-speech tagging, parsing, named-entity recognition).

– To take stock of language processing platforms or middleware (UIMA, GATE, CLARK etc.).

– To create a taxonomy of these tools.

– To investigate the input and output formats / interfaces of these tools.

– To investigate other features of the tools (e.g. language / domain dependence, resources needed).

– To outline steps towards the integration of the tools into the infrastructure.

– To outline criteria for the quality assessment of tools.
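One reason to record input and output formats alongside a tool taxonomy is that it lets an infrastructure check whether tools can be chained. The Python sketch below illustrates that check under simple assumptions: the tool names, taxonomy labels and format names are invented for the example and do not describe any real tool inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """One inventory entry (all values used below are hypothetical)."""
    name: str
    category: str       # taxonomy label, e.g. "tokenizer"
    input_format: str
    output_format: str


def chain_is_compatible(tools: list[Tool]) -> bool:
    """True if each tool's output format matches the next tool's input format."""
    return all(a.output_format == b.input_format for a, b in zip(tools, tools[1:]))


tokenizer = Tool("toy-tokenizer", "tokenizer", "plain-text", "tokens")
tagger = Tool("toy-tagger", "part-of-speech tagger", "tokens", "tagged-tokens")
parser = Tool("toy-parser", "parser", "tagged-tokens", "parse-trees")

# The full chain lines up; skipping the tagger leaves a format mismatch.
pipeline_ok = chain_is_compatible([tokenizer, tagger, parser])
pipeline_bad = chain_is_compatible([tokenizer, parser])
```

Comparing format names as plain strings is the simplest possible rule; a real inventory would likely need richer descriptions of formats and interfaces.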

Working group 5.2, Lexical Resources

The aims of this working group are:

– To take stock of lexical resources (monolingual / bilingual, form based / content based / multimedia, terminological data etc.).

– To investigate existing standards, adapt them and make suggestions for changes.

– To create a taxonomy of these resources.

– To investigate the encoding format of these resources.

– To investigate other features of these resources (e.g. coverage, data types).

– To outline steps towards the integration of these resources into the infrastructure.

– To outline criteria for the quality assessment of these resources.

Working Group 5.3, Corpora

The aims of this working group are:

– To take stock of corpus resources (monolingual / bilingual (aligned), domain specific / general, annotated etc.).

– To investigate existing standards, adapt them and make suggestions for changes.

– To create a taxonomy of these resources.

– To investigate the encoding format of these resources.

– To investigate other features of these resources (e.g. coverage).

– To outline steps towards the integration of these resources into the infrastructure.

– To outline criteria for the quality assessment of these resources.

WP6 (Dissemination)

WP6 will be concerned with the following main activities:

• to co-ordinate the posting of information inside the consortium during the project's life.

• to co-ordinate the wide dissemination of information gathered by the project. This activity will be concerned,

– firstly, with organizing a public website area where information acquired by the project and of interest to people outside the project's consortium will be displayed.

– secondly, with a newsletter appearing electronically 4 times per year and other promotional materials (brochures, leaflets, posters etc.), which will be designed, produced and widely disseminated.

• to accomplish preparatory work for introducing infrastructure services able to promote appropriate linguistic digital technologies to researchers in the humanities and social sciences, to help them work more efficiently and to facilitate new types of research.

WP6 (Dissemination)

Working Groups

• Working Group 6.1: Planning and Dissemination

• Working Group 6.2: Website and Newsletter

• Working Group 6.3: Referral Help Desk and Registry of Expertise

WP7 (Intellectual property rights and business models)

• This work package deals with the legal issues of CLARIN, including licensing, authorization and authentication, which are necessary for the proper handling and use of language resources.

WP7 (Intellectual property rights and business models)

Working groups of WP7

• Groups to be formed immediately

– Working group 7.2A: Licensing and authorization of materials

– Working group 7.4: Trust relations

– Working group 7.3: ELDA/ELRA coordination

• Groups to be formed later on

– Working group 7.2B: Software licensing

– Working group 7.2C: IPR legislation

• Topics not yet assigned to any group

– Business models

– Ethical issues

WP8 (Construction and exploitation agreement)

• The main objective of WP8 is the preparation of a ready-to-sign agreement between the participating countries whereby they commit themselves to the joint construction and exploitation of the CLARIN Infrastructure.

• This agreement document is called the CLARIN Construction and Exploitation Agreement (CCEA), and in order to be able to reach consensus about such an agreement, a wide variety of organizational and financial topics will have to be addressed.

WP8 (Construction and exploitation agreement)

• Working Group 8.1 Governance and Management

• The working group will first make an inventory of known problems and known best (or current) practice solutions with respect to governance and management of international infrastructures, as well as a list of requirements that follow from the way we see the construction and exploitation of CLARIN. It will then make proposals for the governance and management of CLARIN after the preparatory phase.

Links to other sites relevant to CLARIN

• Association for Computational Linguistics, http://www.aclweb.org/

• Digital Research Infrastructure for the Arts and Humanities (DARIAH), http://www.dariah.eu/

• Distributed Access Management for Language Resources (DAM-LR), http://www.mpi.nl/DAM-LR/

• European Chapter of the ACL (EACL), http://www.eacl.org/

• Evaluations and Language resources Distribution Agency (ELDA), http://www.elda.org/

• Linguistic Infrastructure for Interoperable Resources and Systems, Project No.22236 - LIRICS Programme e-content, http://lirics.loria.fr/

• Northern European Association for Language Technology, http://omilia.uio.no/nealt/

• Text Encoding Initiative (TEI), http://www.tei-c.org/

Center of Estonian Language Resources

• Project supported by EKKT, 2008-2010

• Partners

• Activities in 2008

WP1 (General Management)

• National steering committee

• HTM (the Ministry of Education and Research) should appoint a new representative to CLARIN's Strategic Coordination Board.

WP2 (Technical infrastructure)

• WG 2.1 - Requirements for LRT centres

• WG 2.2 - Requirements for the LRT federation

• WG 2.3 - LRT federation pilot

• WG 2.4 - Specification of the registry infrastructure

WP3 (Humanities overview)

• WG 3.1 - Scoping and Impact Study

• WG 3.2 - Overview of relevant Humanities projects and professional associations

• WG 3.3 - Call for Humanities Projects

WP5 (Language resources and technology overview)

• WG 5.1 - Tools

• WG 5.2 - Lexical Resources

• WG 5.3 - Corpora

WP6 (Dissemination)

• WG 6.1 - Planning and Dissemination

• WG 6.2 - Website and Newsletter

• WG 6.3 - Referral Help Desk and Registry of Expertise

WP7 (Intellectual property rights and business models)

WG 7.2A - Licensing and authorization of materials

WG 7.4 - Trust relations

WG 7.3 - ELDA/ELRA coordination

WP8 (Construction and exploitation agreement)

WG 8.1 - Governance and Management
