Page 1: Case Study: UNIDO

Case Study: UNIDO

Valentin Todorov

UNIDO

v.todorov@unido.org

METIS 2008 (Luxembourg, 9-11 April 2008)

11.4.2008 METIS 2008, Luxembourg: Valentin Todorov

Page 2: Case Study: UNIDO

Outline

• Introduction and Overview
• Statistical Metadata Systems and the Statistical Cycle
• Statistical Metadata in each phase of the Statistical Cycle
• Systems and Design Issues
• Organizational and Cultural Issues

Page 3: Case Study: UNIDO

About UNIDO

• UNIDO was set up in 1966
• Became a specialized agency of the UN in 1985
• Promote industrialization throughout the developing world
• 172 Member States (as of 3 December 2007)
• Headquarters in Vienna
• Represented in 35 developing countries

Page 4: Case Study: UNIDO

About Statistics in UNIDO

• Service Module “Industrial Governance and Statistics”:
– monitor, benchmark and analyse their industrial performance and capabilities
– formulate, implement and monitor strategies, policies and programmes to improve the contribution of industry to productivity growth and the achievement of the UN Millennium Development Goals (MDGs)
• Building capabilities in industrial statistics - providing technical assistance to:
– Introduce best practice methodologies and software systems
– Enhance the quality and consistency of the industrial statistics databases

Page 5: Case Study: UNIDO

About the Organisation

All statistical activities are carried out by the Research and Statistics Branch – PCF/RST

Page 6: Case Study: UNIDO

Overall strategy and metadata management principles

• Conceptual development was initiated in 1999
• An integrated data and data documentation (metadata) framework
• A smooth migration policy - must not disrupt established UNIDO data services
• Stepwise development in the context of a migration project of the statistical databases from an IBM mainframe to a client/server platform
• Backed by the UNIDO Quality Assurance Framework

Page 7: Case Study: UNIDO

Overall strategy (cont.)

• Following the International Recommendations for Industrial Statistics (2008)

• Common formats and nomenclatures for exchange and sharing of statistical data and metadata - SDMX

• Availability of the metadata in three languages (English, French and Spanish)

• Based on a formal framework - the proposed information system architecture comprises two cubes, one for statistical data and another for the metadata interrelated by a set of shared dimensions - see Froeschl et al. (2002), Froeschl and Yamada (2000)
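
The two-cube idea can be illustrated with a minimal Python sketch. The dimension names (country, ISIC code, variable, year), the sample values and the helper `lookup` are assumptions made for illustration only; they are not taken from the actual ISDE data model.

```python
from typing import NamedTuple, Optional


class Key(NamedTuple):
    """Shared dimensions (a hypothetical choice) addressing a cell in both cubes."""
    country: str
    isic: str
    variable: str
    year: int


# Data cube: one numeric value per cell (made-up number).
data_cube: dict[Key, float] = {
    Key("XX", "1511", "output", 2005): 1234.0,
}

# Metadata cube: notes attached to the same cell coordinates.
metadata_cube: dict[Key, list[str]] = {
    Key("XX", "1511", "output", 2005): ["1511 includes 1512"],
}


def lookup(key: Key) -> tuple[Optional[float], list[str]]:
    """Return the value and its metadata for one cell of the shared dimensions."""
    return data_cube.get(key), metadata_cube.get(key, [])


value, notes = lookup(Key("XX", "1511", "output", 2005))
```

Because both cubes are keyed by the same dimension tuple, a data item and its documentation can be retrieved together with one lookup.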

Page 8: Case Study: UNIDO

UNIDO Statistical Process

• Initialisation
– Pre-filling of the out-going UNIDO General Industrial Statistics Questionnaire with previously reported statistical data and metadata (non-OECD countries)
– Excel format
– In the appropriate language - English, French or Spanish
– Automated using the available data and metadata
• Data Collection
– NSO: the questionnaires completed and returned to UNIDO by the NSO (Excel format, rarely hard copy) are entered into the system and are ready for further validation and processing
– OECD: Data for OECD member countries (Excel format) are ready for further validation and processing

Page 9: Case Study: UNIDO

UNIDO Statistical Process

• Transformation/Processing
– The data collected from the primary or secondary sources are further transformed into ready-to-use data sets
– The data transformation is done in five stages, which not only constitute an operational framework for UNIDO statisticians but also provide additional description of the statistics (generated metadata which are attributed to each data item)
– After undergoing the complete processing phase, the incoming and generated data and metadata are stored in the databases
• Dissemination
– International Yearbook of Industrial Statistics
– INDSTAT and IDSB CD products
– Web Country Statistics (Country Brief)
– Ad hoc requests by internal and external users

Page 10: Case Study: UNIDO

Mapping of the UNIDO cycle phases to those developed by the METIS group

METIS                 UNIDO
Need                  Need [optional]
Develop and design    Develop and design [optional]
Build                 Initialisation
Collect               Data Collection
Process               Transformation/Processing
Analyse               Analysis
Disseminate           Dissemination
Archive               -
Evaluate              Evaluation
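
As a small illustration, the phase mapping above can be written as a lookup table. This is only a sketch: the names `METIS_TO_UNIDO`, `OPTIONAL_UNIDO_PHASES` and `unido_phase` are hypothetical, and `None` is used here to stand for the "-" entry (no UNIDO counterpart).

```python
from typing import Optional

# METIS phase -> UNIDO phase, following the mapping on this slide.
# None marks the "-" entry (no corresponding UNIDO phase).
METIS_TO_UNIDO: dict[str, Optional[str]] = {
    "Need": "Need",
    "Develop and design": "Develop and design",
    "Build": "Initialisation",
    "Collect": "Data Collection",
    "Process": "Transformation/Processing",
    "Analyse": "Analysis",
    "Disseminate": "Dissemination",
    "Archive": None,
    "Evaluate": "Evaluation",
}

# UNIDO phases marked [optional] on the slide.
OPTIONAL_UNIDO_PHASES = {"Need", "Develop and design"}


def unido_phase(metis_phase: str) -> Optional[str]:
    """Return the UNIDO phase for a METIS phase, or None if there is none."""
    return METIS_TO_UNIDO[metis_phase]
```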

Page 11: Case Study: UNIDO

Overall structure of ISDE

Page 12: Case Study: UNIDO

ISDE Applications

• ADMIN – provides administrative services, like user and authorisation management, logging and auditing of the system, backup and restore management
– outside of the life cycle
• Nomenclature Explorer - maintenance of the core definitional metadata (not related to particular data sets or items)
– outside of the life cycle
• Questionnaire - management of the pre-filling and distributing of the questionnaires
– used in the Initialisation phase

Page 13: Case Study: UNIDO

ISDE Applications

• Data Wizard – the main data and metadata maintenance tool
– Used in the Data Collection and Transformation phases
– Provides services for
    • Reading in the data and metadata from the returned questionnaire (Excel)
    • Initial validation of the read-in data and storing in the database (at stage 1)
    • Maintenance of the data and metadata
    • Screening
    • Aggregation and further data validations and transformations

Page 14: Case Study: UNIDO

ISDE: Publication applications

• Yearbook – a complex set of applications for production of the International Yearbook of Industrial Statistics
– aggregation, layout
– PDF file generation according to pre-defined templates and other tools
– The final result is a publication-ready PDF file of about 700 pages
• INDSTAT CD – produce the INDSTAT type of CD products
• IDSB CD – produce the IDSB type of CD products
• WEB – generate the necessary data and metadata for updating the WEB dissemination database
– This database is outside of the ISDE system
– Managed by the computer section

Page 15: Case Study: UNIDO

ISDE Applications

• Presentation Wizard – mainly a visualization tool which can be used in the Dissemination phase for answering ad hoc requests, but because of its versatile functionality it is also widely used in the Data Transformation phase

• Other applications – in this category are included any other applications used in the process, like SAS, R, tools for compilation of Production index numbers and National Accounts data (which are outside of the scope of this document)

Page 16: Case Study: UNIDO

Implementation Strategy

• Developed in the context of migration from Mainframe to a Client/Server platform
• A stepwise approach was chosen for the following reasons:
– The project was not urgent
– Software testing and maintenance of the new system - in-house
– Only limited resources/funds were available
– The staff was very willing to participate in the project
– The goal was not only to migrate the system but rather to develop a completely new one, and the requirements were not yet completely specified
– A key requirement was that the established UNIDO data services must not be disrupted

Page 17: Case Study: UNIDO

Implementation Steps

• Step 1: High level architecture design, Data model, physical C/S database, definitional metadata tool
– Rigorous analysis of the existing system and development of a data model - as generic as possible in order to be able to accommodate any subsequent changes
– Based on the data model, a loader application was developed which made it possible to synchronize the data in the Mainframe and in the Sybase database at any moment
– The development of the new metadata subsystem was initiated by implementing a tool for maintenance of the definitional metadata
– Thus a kind of proof of concept was successfully completed

Page 18: Case Study: UNIDO

Implementation Steps

• Step 2: Reference metadata, dissemination applications
– A capture/maintenance tool was developed
– The descriptive/methodological metadata (Word, Excel) were entered into the system
– The Mainframe footnote database (data-item level metadata) was imported
– Thus the complete process of maintenance of the available metadata was migrated to the Client/Server platform
– Data dissemination applications were developed which made it possible to produce the recurrent statistical publications/products from the Mainframe system and from the Client/Server platform in parallel - an ideal acceptance test for the new applications by just comparing the results
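
The parallel-run acceptance test described in Step 2 amounts to comparing two outputs cell by cell. The sketch below assumes each run can be exported as a mapping from a cell label to a number; the function name `compare_runs`, the labels and the tolerance parameter are hypothetical and not part of the UNIDO system.

```python
def compare_runs(
    mainframe: dict[str, float],
    client_server: dict[str, float],
    tol: float = 0.0,
) -> list[str]:
    """Return the labels whose values differ or that exist in only one output."""
    diffs = []
    for key in sorted(mainframe.keys() | client_server.keys()):
        a = mainframe.get(key)
        b = client_server.get(key)
        # A label missing on either side counts as a difference.
        if a is None or b is None or abs(a - b) > tol:
            diffs.append(key)
    return diffs
```

An empty result means that both platforms produced the same product, which is the acceptance criterion the slide refers to.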

Page 19: Case Study: UNIDO

Implementation Steps

• Example: International Yearbook of Industrial Statistics
– From the Mainframe it was produced as camera-ready line printer output which was glued together with many MS Word and MS Excel documents
– From the Client/Server system a page-numbered PDF file of about 700 pages is automatically generated
• Step 3: Pre-filled questionnaire, data capturing and maintenance
– Pre-filling of the questionnaire - for a second time from the new Client/Server data- and metadata-base
– Development of the data capturing/maintenance tools - now in the phase of final acceptance testing
– From June 2008 - only the Client/Server system will be used
– Ultimate decoupling of the new system from the Mainframe

Page 20: Case Study: UNIDO

Metadata classification

No formal metadata classification, but according to their usage and their role in the statistical production process we distinguish roughly between:

• Structural or definitional metadata: refer to metadata that act as identifiers and descriptors of the data (and metadata)

• Reference metadata: describe the properties and quality of the statistical data

• System metadata: used to drive automated processing throughout the phases of the lifecycle

Page 21: Case Study: UNIDO

Metadata in the lifecycle

• In each phase of the lifecycle the structural/definitional metadata are used
• The structural metadata are created/updated relatively independently from the lifecycle
– Add a new country (e.g. Serbia and Montenegro recently)
– Currency change (e.g. Slovenia, Malta and Cyprus recently)
– Country groupings: two more countries joined EU (Bulgaria and Romania)
• No metadata are created in the first and last phase (Initialisation and Dissemination) but it is possible that corrections are performed

Page 22: Case Study: UNIDO

Metadata in the life cycle: Initialisation

• Pre-filling of the out-going UNIDO General Industrial Statistics Questionnaire with previously reported statistical data and metadata
• System metadata: drive the automated processing
– Template for the questionnaire
– Language
– ISIC revision
– Output format (unit exponent, digits)
• Operational metadata: stage 1 data used for pre-filling
• Descriptive, methodological, implicit metadata used for pre-filling into the questionnaire
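
How system metadata might drive the pre-filling can be sketched as follows. The class `QuestionnaireSettings`, its field names, the sample values and the reading of "unit exponent" as a power of ten are all assumptions for illustration; the slides do not describe the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionnaireSettings:
    """Hypothetical system metadata for one out-going questionnaire."""
    template: str        # name of the questionnaire template
    language: str        # "en", "fr" or "es"
    isic_revision: int   # ISIC revision used by the country
    unit_exponent: int   # assumed: values are shown in units of 10**unit_exponent
    digits: int          # decimal places in the output


def format_value(value: float, settings: QuestionnaireSettings) -> str:
    """Scale and round a stored value according to the output format."""
    scaled = value / 10 ** settings.unit_exponent
    return f"{scaled:.{settings.digits}f}"


settings = QuestionnaireSettings("general_industrial_statistics", "fr", 3, 3, 1)
print(format_value(1234567.0, settings))  # prints 1234.6
```

Keeping these settings as data rather than code is what allows the same pre-filling routine to serve every country and language.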

Page 23: Case Study: UNIDO

Metadata in the life cycle: Data Collection

• After receiving back the completed questionnaires, they are entered (automatically) in the system for validation and further processing

• Together with the data the received metadata are entered into the system

• The provided metadata are sometimes not described from the viewpoint of international comparability but rather from the viewpoint of national standards. In such cases the UNIDO statistical staff re-describes/rearranges the provided metadata into explicit information for the deviation from the international standard

Page 24: Case Study: UNIDO

Metadata in the life cycle: Data Collection (cont.)

• Metadata can be attached to each data item
– “Missing because of confidentiality reasons” or
– combinations of ISIC codes like “1511 includes 1512”
• Data for OECD member countries
– collected through joint OECD/UNIDO questionnaire and
– transmitted to UNIDO (Excel format)
– do not contain metadata (extracted from other OECD publications)

Page 25: Case Study: UNIDO

Metadata in the life cycle: Transformation

• The metadata collected from the NSOs together with the data undergo the same transformation process as the data and are complemented by metadata generated by the transformation process

• The data transformation is done in five stages - additional description of the data

• At the same time Source and Method metadata are maintained for each data item

• If appropriate, re-description of the provided metadata from the viewpoint of international comparability is performed

Page 26: Case Study: UNIDO

Metadata in the life cycle: Dissemination

• International Yearbook of Industrial Statistics
– the main UNIDO statistical product
– the latest yearbook, released in 2008, covered the data for the period from 1995 to the latest year
– The country data was updated for 74 countries and is compiled from Stage 1 and Stage 2 data
• CD products, which might include data from all stages described earlier - www.unido.org/statistics
• Country Brief - statistics by selected variables from the different UNIDO databases for each member state, posted on the UNIDO website: http://www.unido.org/statistics

Page 27: Case Study: UNIDO

Metadata in the life cycle: Dissemination

Page 28: Case Study: UNIDO

Metadata in the life cycle: Dissemination

Page 29: Case Study: UNIDO

Metadata in the life cycle: Dissemination

Page 30: Case Study: UNIDO

Metadata in the life cycle: Dissemination

Page 31: Case Study: UNIDO

Systems and Design Issues

• Client/Server architecture built on .NET technology
• Centralised database:
– Sybase ASE 12.5 on Linux
– Test and production databases
• Client (desktop) applications developed using MS Visual Studio in C#
• Commonality through using shareable component libraries – C#
• Other tools:
– SAS, R, STATA
• Development tools

Page 32: Case Study: UNIDO

Organizational and Cultural Issues

• No specialised metadata roles are necessary
– processing of metadata and data are tightly coupled
– responsibilities are organized by country
• No special training for the staff was necessary
– all statisticians participated actively in the specification and the development of the system
– the system testing was performed by parallel runs on the Client/Server and Mainframe
• Nevertheless a complete set of documentation and training materials is being prepared
– unifying the terminology and the information about the system
– induction training of new colleagues
– operational and maintenance concept documents

Page 33: Case Study: UNIDO

THE END

Page 34: Case Study: UNIDO

Examples

Page 35: Case Study: UNIDO

Example: Nomenclature Explorer

Page 36: Case Study: UNIDO

Example: ADMIN - Topics

Page 37: Case Study: UNIDO

Example: Data Wizard - View/Edit Questionnaire

Page 38: Case Study: UNIDO

Example: Data Wizard - View/Edit Metadata

Page 39: Case Study: UNIDO

Example: R Graphics

[Figure: R graphics of the iris data. Panels: Histogram (Sepal.Width, Density); Boxplot (Sepal.Width; setosa, versicolor); Bagplot (Sepal.Length, Sepal.Width); Normal Q-Q Plot (norm quantiles, Sepal.Width); Scatter Plot Matrix "Three Varieties of Iris" (SepalLength, SepalWidth, PetalLength for setosa, versicolor, virginica)]

Page 40: Case Study: UNIDO

Example: Implicit metadata

• For example several industry categories can be combined and reported together by a given country for a given indicator and years
• In the questionnaire returned by the NSOs such a combination is expressed in the following way

…
1511 Processing/preserving of meat                  1234  a/
1512 Processing/preserving of fish                  …     a/
1513 Processing/preserving of fruit & vegetables    …     a/
…
REMARKS: a/ 1511 includes 1512 and 1513

• ‘Exclude’ for other country specific classification discrepancies
• ‘Substitute’ for synonyms
• Aggregations
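
A remark such as "1511 includes 1512 and 1513" can be turned into a machine-readable relation. The following is a minimal sketch only: the function name `parse_includes` and the assumed remark wording (a code, the word "includes", then one or more codes) are illustrative and do not reflect how the Data Wizard actually reads the questionnaire.

```python
import re


def parse_includes(remark: str) -> dict[str, list[str]]:
    """Parse '1511 includes 1512 and 1513' into {'1511': ['1512', '1513']}."""
    match = re.fullmatch(r"\s*(\d+)\s+includes\s+(.+?)\s*", remark)
    if match is None:
        raise ValueError(f"unrecognised remark: {remark!r}")
    parent, rest = match.groups()
    # Every run of digits after 'includes' is treated as an included ISIC code.
    children = re.findall(r"\d+", rest)
    return {parent: children}


combination = parse_includes("1511 includes 1512 and 1513")
```

With the combination stored explicitly, the value reported under 1511 can be flagged as covering 1512 and 1513 instead of being read as a figure for 1511 alone.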

Page 41: Case Study: UNIDO

Example: System metadata in the Initialisation phase - I

Page 42: Case Study: UNIDO

Example: System metadata in the Initialisation phase - II

Page 43: Case Study: UNIDO

Example: Descriptive and methodological metadata used in the Initialisation/Data Collection phase

Page 44: Case Study: UNIDO

Example: Metadata attached to each data item used or created in the Initialisation and Data Collection phase

Page 45: Case Study: UNIDO

Overall structure of ISDE

Page 46: Case Study: UNIDO

Backup slides

Page 47: Case Study: UNIDO

Operational Framework: Stages

• Stage 1 – responses to national questionnaires. Detection and if possible correction of obvious reporting errors
– Used for pre-filling the following edition of the questionnaire
– Data are considered official
• Stage 2 – incorporation of published national data. Inconsistent data are corrected using supplementary information from national publications
– Published in International Yearbook of Industrial Statistics
– Data are considered official

Page 48: Case Study: UNIDO

Operational Framework: Stages (cont.)

• Stage 3 – disaggregation of data. Data are adjusted to eliminate the departures from the level of ISIC aggregation
– using national and international sources
– using supplementary data
• Stage 4 – automatic disaggregation and interpolation. Missing data are estimated applying related proportion or interpolation whenever applicable
– For ISIC 3-digit only
• Stage 5 – estimation of provisional data for the latest years
– Selected variables only
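
The interpolation part of Stage 4 can be illustrated with a short sketch. It assumes equally spaced years and plain linear interpolation between the nearest reported values; the slides do not state which interpolation method UNIDO applies, and the function name `interpolate_gaps` is hypothetical.

```python
from typing import Optional


def interpolate_gaps(series: list[Optional[float]]) -> list[Optional[float]]:
    """Fill interior runs of None linearly; leading and trailing gaps stay None."""
    result = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    # Walk over consecutive pairs of reported values and fill what lies between.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = series[left] + step * (i - left)
    return result


filled = interpolate_gaps([10.0, None, None, 40.0, None])
```

In this example the two missing years between 10.0 and 40.0 become 20.0 and 30.0, while the trailing gap is left empty because estimating the latest years is a separate step (Stage 5).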

Page 49: Case Study: UNIDO

Reference metadata

• Implicit metadata – a special class of metadata arising throughout the specific usage of other metadata. A typical example is the ISIC combinations
• Operational Metadata – generated by the process of data transformation and attributed to the respective data items
– a stage indicator reflecting the data item’s credibility
– “Source” and “Methods” metadata, describing the source of the data item and methods applied for its generation
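
One way to picture operational metadata attached to a data item is a small record type. The class `DataItem` and its field names are hypothetical. The slides describe Stage 1 and Stage 2 data as official; treating the remaining stages as not official is an assumption made here.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """A value together with its (hypothetical) operational metadata fields."""
    value: float
    stage: int                      # stage indicator, 1 to 5
    source: str = ""                # "Source" metadata
    method: str = ""                # "Methods" metadata
    notes: list[str] = field(default_factory=list)

    def is_official(self) -> bool:
        # Assumption: only stages 1 and 2 count as official data.
        return self.stage in (1, 2)


reported = DataItem(1234.0, stage=1, source="NSO questionnaire")
estimated = DataItem(1300.0, stage=4, method="interpolation")
```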

Page 50: Case Study: UNIDO

Reference metadata (cont.)

• Descriptive and Methodological metadata – received from the primary data reporters and then further processed together with the data.
– During this processing additional metadata can be added.
– Can be attached to all possible levels ranging from the complete data set down to individual data items.

Page 51: Case Study: UNIDO

System metadata

• Used to drive automated processing throughout the phases of the life cycle.
– layout definitions for the yearbook (for each country, for each edition of the yearbook).
– country lists, used in the automatic generation of the PDF.
– installation and packaging lists, directories, templates, etc. for creation of the CD product.
– specific for the application where they are used.