26-27/11/2008, Copenhagen. Jaanus Heinlaid, Systems Consultant, TietoEnator Eesti



ICT roadmap for SEIS: draft structure


1 What is ICT?
2 What is SEIS (in ICT context)?
  2.1 SEIS concept
  2.2 Underlying principles
  2.3 Environmental information flows according to SEIS
  2.4 Base elements of SEIS
3 The road to implementing SEIS
  3.1 Identify requirements to data
    3.1.1 How data is found on SEIS nodes
    3.1.2 How do we recognise data
    3.1.3 How data on SEIS nodes is understood by machines
    3.1.4 How do we know the quality of data
  3.2 Build tools for data
    3.2.1 Data Discovery tools
    3.2.2 Data repositories
    3.2.3 Data conversion services
    3.2.4 Quality Assessment tools
    3.2.5 Tools for further data processing


What is ICT?

ICT stands for Information and Communication Technology. Why does it matter? Everybody knows what ICT stands for, right? Well, not quite. It is paradoxical, but the term ICT often doesn't ring a bell with IT (Information Technology) people. This is because different communities use different terms for what is basically the same thing. According to Wikipedia, the term ICT is used in preference to IT in two communities: education and government.

So, since it is the IT people who will eventually implement the technology behind SEIS, maybe the masterminds of this initiative could consider replacing ICT with IT.



What is SEIS (in ICT context)?

This chapter should basically serve as a reminder of what it is we're making the roadmap for.

It should probably collect and summarise everything ICT-related that has been written about SEIS in different documents. In short, this chapter should present a concise overview of SEIS from an ICT perspective, so that an IT person completely new to SEIS would quickly grasp the idea behind it.



Underlying principles

This chapter should probably list the well-known underlying principles of SEIS, but maybe only those that are most relevant in the IT world:

– Information is managed as close as possible to its source
– Information is provided once and shared with others for many purposes
– Information should be readily accessible to end-users



Environmental information flows according to SEIS

Here is probably a good place for the EEA's picture about SEIS data flows and some explanations:



Base elements of SEIS

Here once more are the 3 general-level elements of SEIS:

Content
– Data (all relevant data and information related to Europe’s environment from local to global scale)
– Metadata (the information describing the data: content, quality, condition, other characteristics)

Infrastructure
– A network for accessing and sharing environmental data between SEIS nodes.
– Tools and services: to help discover and make use of data as well as to streamline it.

Organisation
– Division of roles and responsibilities among all actors involved.



How data is found on SEIS nodes

It depends on the possibilities and willingness of the organisations behind SEIS nodes (i.e. stakeholders).

Given the concept of SEIS and the much-stated guiding principles, especially the principle of managing information as close as possible to its source, the ideal way would be to build a network of Web Services.

A Web Service is a software system designed to support interoperable machine-to-machine interaction over a network (http://en.wikipedia.org/wiki/Web_services).

This is a popular concept nowadays, but it requires stakeholders to have or buy IT expertise. Since not all stakeholders have equal possibilities in this matter, some mechanism must still be established for stakeholders to simply report their data the plain old way: sending it in any of numerous formats (e.g. MS Excel, MS Access, XML) to some central repository.

The keepers of that repository will process the data and publish it in a machine-readable way to other stakeholders.



How do we recognise data

Web Services are just a technological mechanism for finding the data. Once some data is found, how do we know it's what we searched for? Well, that's where the magic of metadata comes in. Naturally, not all stakeholders are equally attentive in tagging their data with metadata. But SEIS must provide mechanisms for tagging (http://en.wikipedia.org/wiki/Tag_(metadata)) other stakeholders' data with metadata.

Metadata is crucial to enable structured searches and comparison of data in different repositories.
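To illustrate how tags could enable a structured search across repositories, here is a minimal sketch in Python. The record layout, the node names and the helper functions `add_tag` and `search` are all assumptions made for illustration; SEIS does not prescribe them.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Metadata describing one data set on a (hypothetical) SEIS node."""
    node: str
    title: str
    tags: set = field(default_factory=set)


def add_tag(record, tag):
    """Let any stakeholder attach a tag; tags are normalised to lower case."""
    record.tags.add(tag.strip().lower())


def search(records, required_tags):
    """Return the records that carry every one of the required tags."""
    wanted = {t.strip().lower() for t in required_tags}
    return [r for r in records if wanted <= r.tags]


# Illustrative records only; the node names and titles are made up.
records = [
    DatasetRecord("node-a", "Ozone hourly values", {"air", "ozone"}),
    DatasetRecord("node-b", "River nitrate samples", {"water"}),
]

# Another stakeholder adds a tag the owner of the data did not provide.
add_tag(records[1], "Nitrate")

matches = search(records, ["water", "nitrate"])
print([r.title for r in matches])
```

The point of the sketch is that the search only works on `node-b` once somebody has tagged the record, which is why tagging by third parties matters.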



How data on SEIS is understood by machines

Standardise data formats (per topic) and make the data flows comply with those standards.

– For those unable to report in standardised formats, some special SEIS nodes must be designed to understand various proprietary data formats and produce XML from them.

– Trying to make stakeholders agree on standardised formats is slow and painful. One of the first attempts was EEA's Waterbase project that started and died in year 2000. But there have also been successful attempts: http://www.eea.europa.eu/maps/ozone.

Make SEIS a Semantic Web.

– SEIS seems to be a typical case of a Semantic Web (http://en.wikipedia.org/wiki/Semantic_web).

– The idea is that different parties don't report data in prescribed formats; instead they publish data in the formats they wish and provide semantics for machines to understand those formats.

– We still cannot do completely without standards: such formats have to be XML-based, and the syntax of the semantics is standardised too. But this is done by the W3C and similar consortia.

The right answer is probably to apply both approaches.
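The Semantic Web idea can be illustrated with a deliberately simplified Python sketch. It does not use RDF or any W3C syntax; it only shows the principle that each node keeps its own field names and publishes a mapping to shared concepts. The node contents, field names and concept names below are hypothetical.

```python
# Each node publishes rows in its own layout, plus a mapping from its
# own field names to shared concept names (all names are made up).
NODE_A = {
    "mapping": {"o3_ugm3": "ozone_concentration", "stn": "station_id"},
    "rows": [{"stn": "EE001", "o3_ugm3": 61.0}],
}
NODE_B = {
    "mapping": {"Ozone": "ozone_concentration", "StationCode": "station_id"},
    "rows": [{"StationCode": "DK042", "Ozone": 48.5}],
}


def to_shared_terms(node):
    """Rename node-specific fields to shared concepts; drop unmapped fields."""
    mapping = node["mapping"]
    return [
        {mapping[name]: value for name, value in row.items() if name in mapping}
        for row in node["rows"]
    ]


# A consumer can now combine data from both nodes without either node
# having changed its own format.
combined = to_shared_terms(NODE_A) + to_shared_terms(NODE_B)
print(combined)
```

In a real Semantic Web setting the mapping would be expressed in a standardised syntax rather than a Python dictionary, which is the part that still needs standards.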


How do we know the quality of data

This is again where metadata comes in handy.

At some nodes in SEIS there will definitely be some Quality Assessments (QA) done.

It is vital that SEIS provides ways of reflecting the results of a data quality assessment in the metadata of the data source.

The resulting QA report must become a permanent subjective comment on the data.
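One way to picture a QA report becoming a permanent comment in the metadata is an append-only list of reports attached to the data source. The following Python sketch assumes hypothetical class and field names (`QAReport`, `SourceMetadata`, `verdict` and so on); it is an illustration, not a proposed SEIS schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class QAReport:
    """One quality assessment; frozen so it cannot be edited afterwards."""
    assessor: str
    assessed_on: date
    verdict: str
    comment: str


@dataclass
class SourceMetadata:
    """Metadata of a data source, including its QA history."""
    source_id: str
    qa_reports: list = field(default_factory=list)

    def attach_qa(self, report):
        """Append only: earlier assessments are never overwritten."""
        self.qa_reports.append(report)

    def latest_verdict(self):
        """Verdict of the most recently attached report, or None."""
        return self.qa_reports[-1].verdict if self.qa_reports else None


# Illustrative values only.
meta = SourceMetadata("node-a/ozone-hourly")
meta.attach_qa(QAReport("assessor-1", date(2008, 11, 26),
                        "acceptable", "Some hourly values are missing."))
print(meta.latest_verdict())
```

Keeping every report, rather than only the latest, reflects the fact that an assessment is a subjective comment: a later assessor may disagree without erasing the earlier opinion.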



Data Discovery tools

These are probably the most important tools we're going to have for data.

They have to be quite intelligent and they will rely on what is published on SEIS nodes.

They will probably need to provide some web services for directly registering data sources (i.e. data existence is pushed into discovery tools).
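The "push" registration described above might look roughly like the following in-memory Python sketch. A real discovery tool would expose this as a web service; the class name `DiscoveryRegistry`, its methods and the example URLs are assumptions for illustration.

```python
class DiscoveryRegistry:
    """Toy stand-in for a discovery tool that nodes push their sources to."""

    def __init__(self):
        self._sources = {}  # source URL -> set of lower-cased keywords

    def register(self, url, keywords):
        """Called by a node to announce a data source.

        Registering the same URL again replaces its keywords.
        """
        self._sources[url] = {k.lower() for k in keywords}

    def find(self, keyword):
        """Return the URLs of all sources carrying the keyword, sorted."""
        key = keyword.lower()
        return sorted(url for url, kws in self._sources.items() if key in kws)


registry = DiscoveryRegistry()
# Hypothetical nodes announcing that their data exists.
registry.register("http://node-b.example/nitrate", ["water", "Nitrate"])
registry.register("http://node-a.example/ozone", ["air", "ozone"])

print(registry.find("NITRATE"))
```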



Data repositories

Ideally there would be no need for central data repositories, because all SEIS nodes would comply with the specification of "how data is found".

Unfortunately, the reality is a bit more complicated and not all nodes will be that advanced.

For these nodes, central data repositories will have to be created where the data can be reported to.

The repositories make sure the reported data they have is published in a discoverable way.



Data conversion services

There will definitely have to be services for converting data from a proprietary format into more general machine-understandable formats.

This is for the sake of better discoverability, automated quality assessment and other kinds of automatic data processing.
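As a small illustration of such a conversion service, the Python sketch below turns comma-separated text into XML using only the standard library. CSV stands in here for whatever format a stakeholder actually reports in; the function name `csv_to_xml`, the element names and the sample values are assumptions. The sketch also assumes that the column headers are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="dataset", row_tag="record"):
    """Convert CSV text with a header row into a simple XML string.

    Each CSV row becomes one <record>; each column becomes a child
    element named after the column header.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for name, value in row.items():
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample data.
sample = "station,ozone\nEE001,61.0\nDK042,48.5\n"
xml_text = csv_to_xml(sample)
print(xml_text)
```

Once the data is in XML, the discovery and quality assessment tools can process it without knowing anything about the original format.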



Quality Assessment tools

Quality assessment is probably the first and most important data processing action.

So, some kind of web-based workbench must be created where quality assessment operations can be carried out on the discovered data and the result permanently linked to the data source.

The quality assessment results themselves will become new data sources that must be discoverable by discovery tools.



Tools for further data processing

After the quality of data has become known, further processing can be carried out.

This is probably the stage where manual work starts to come into play, but various helper tools can certainly be created to support this work.

What kind of helper tools are needed has to be found out by analysing current processes and picking out the most important cases where machines can help.
