Paul Grooten, Matjaž Jug, Robbert Renssen Next Generation Data Management Architecture in Statistical Organisation



Page 2: CBS - Key Characteristics

Autonomous Public Body with a Legal Entity (“ZBO”)

Bonaire

180 mEur 2000+

The Hague Heerlen

Official Statistics Economic - Social - Census

National and Regional

Page 3: Policy and Opinion Support “Machine”

INTELLIGENCE

INFORMATION

DATA

POLICY OPINION

Page 4: Great Ambitions…

Page 5: Great Ambitions…

Page 6: Great Ambitions…

Page 7: …but also Challenges

Page 8: Vision of Enterprise Data Lake

Data Lake

Microdata

Stat. data

Articles / Visualisations

Reuse / Combine

Smart / flexible processes

Customers / Researchers

Use independently

Customers

Publish

Customers

Retrieve

Data providers / Respondents

Streaming data

Registers

Explore

Page 9: Key Capabilities

Ability to:

Discover, access and understand

Load, store, model, retrieve

Transform, harmonize, integrate

Access, derive, catalogue

Use (prepare, visualise, analyse…)

Manage as an asset

Secure


Page 10: Capability Groups

Consumer Layer (CL)

Data Source Layer (BL)

Data Transformation Layer (DTL)

Data Provisioning Layer (DPL)

Data Governance

Metadata Management

Security & Authorization

Data Preparation

Data Visualization

Data Analyses

Self-service Reporting

Search & Explore

Dashboarding/ Scorecarding

Messaging

Data Access

Data Hub

Data Aggregation

Derive Views

Data Catalog

Data Harmonization

Data Transformation

Data Enrichment

Data Validation

Data Storage

Data Load

Data Access

Data Deletion

Data Extraction

Data Profiling

Data Cleansing

Data Quality Management

Classification Management

Variables Management

Data Set Descriptions

Data Set Relations

Backup & Restore

Change Management

Enterprise Architecture

System Management

Configuration Management

Documentation Management

Usage Monitoring

Authentication

Authorization

User Management

Logging

Auditing

Encryption

Meta Data Catalog

Datameer | Version 0.4

Data Mining

Model Data Source

Model Data Source

Ingest data

Page 11: Data Architecture Layers

Request

Response

(Legacy) Datasources

Data Source Layer (BL)

CSV SQL DB

Web Srv

ETL tooling

XLS

App

CBDS

Request

Consumer Layer (CL)

Web Page S2S

Tooling P V A

P = Data Prep, V = Data Visualization, A = Data Analytics

Security

Data Virtualization

Data Transformation Layer (DTL)

Data Provisioning Layer (DPL)

Building Block 1

Building Block 2

Building Block 3

Building Block 4

Web Service C

OData Web Service B

Web Service A

Security

User Que.

Data Governance

Tech Meta

Metadata Management

Import Conceptual Meta

Conn. String

Existing New
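The layer diagram above separates consumers from sources through a provisioning and transformation tier. The following is a minimal sketch of that decoupling idea only, not the CBS implementation (the PoC used a data virtualization product). The `VirtualView` class, the two sample sources, their column names and the sample rows are all hypothetical and exist purely for illustration.

```python
import csv
import io
import sqlite3

# Data Source Layer: two hypothetical sources with different technologies
# and different column names (sample values are made up for illustration).
CSV_TEXT = "id,region\n1,The Hague\n2,Heerlen\n"


def read_csv_source():
    """Read rows from an in-memory CSV 'file'."""
    return list(csv.DictReader(io.StringIO(CSV_TEXT)))


def read_sql_source():
    """Read rows from an in-memory SQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE units (unit_id INTEGER, place TEXT)")
    conn.execute("INSERT INTO units VALUES (3, 'Bonaire')")
    rows = conn.execute("SELECT unit_id, place FROM units").fetchall()
    conn.close()
    return [{"unit_id": r[0], "place": r[1]} for r in rows]


class VirtualView:
    """Transformation/provisioning tier: one harmonized schema, read on demand."""

    def __init__(self):
        self._sources = []

    def add_source(self, reader, column_map):
        # column_map maps a source column name to the harmonized name.
        self._sources.append((reader, column_map))

    def rows(self):
        # Nothing is copied or stored; each call reads the sources again.
        for reader, column_map in self._sources:
            for row in reader():
                yield {target: str(row[source])
                       for source, target in column_map.items()}


# Consumer Layer: only sees the harmonized names "id" and "region".
view = VirtualView()
view.add_source(read_csv_source, {"id": "id", "region": "region"})
view.add_source(read_sql_source, {"unit_id": "id", "place": "region"})
regions = [row["region"] for row in view.rows()]
```

Because the consumer code depends only on the harmonized schema, a source can be replaced or added by changing the `add_source` calls, without touching the consumer.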

Page 12: Key Building Blocks

Metadata Model

Semantic Technology

Data Virtualisation

Big Data Platform

Self-Service BI / workflow orchestration

Page 13: PoC: from internal Data Lake

Consumer Layer (CL)

Data Provisioning & Transformation Layers (DPL & DTL)

Metadata Management

Data Source Layer (BL)

RinPN GBAGSL GBAGLND

101210 M NL

GBAPersoon2012V1

Import Conceptual Meta XML

Data Prep

DSC

GbaPersoon2012V1

EHB

BE

Harvesting process tech meta

Meta Data Repository

Export Conceptual Meta

OG-ID Desc

354 XXX

OG Table

Data Visual

= New technology for CBS

CIO office | Version 0.7

Security
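The technical-metadata harvesting step in the PoC diagram can be illustrated with a small sketch. This is not the PoC's tooling: it uses an in-memory SQLite database as a stand-in source, and the function name `harvest_technical_metadata` is hypothetical. The table and column names are taken from the sample row on the slide; the declared column types are assumptions.

```python
import sqlite3


def harvest_technical_metadata(conn):
    """Return {table: [(column, declared_type), ...]} for every table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    metadata = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [(col[1], col[2]) for col in columns]
    return metadata


# Stand-in source holding the sample row shown on the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GBAPersoon2012V1 (RinPN INTEGER, GBAGSL TEXT, GBAGLND TEXT)")
conn.execute("INSERT INTO GBAPersoon2012V1 VALUES (101210, 'M', 'NL')")

harvested = harvest_technical_metadata(conn)
```

The harvested structure describes only what the source system knows (names and types). Conceptual metadata, such as what a variable means, would have to come from a separate repository.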

Page 14: …towards DaaS Architecture

CIO office | Version 1.3

Security

CL

Datasources

DSL

Data Virtualization

DTL

DPL

Data Governance

Existing New

User Que.

Metadata Management

Tech Meta

UDC = Urban Data Center | CL = Consumer Layer | DPL = Data Provisioning Layer | DTL = Data Transformation Layer | DSL = Data Source Layer

P = Data Prep, V = Data Visualization, A = Data Analytics

Security

EHB

Zone CBS Zone UM

Building Block 3

Building Block 4

Web Service B

Secured VPN

P V A

Zone UDC1

Building Block 5

Building Block 6

Web Service C

Secured VPN

P V A

Zone UDC2

Building Block 7

Building Block 8

Web Service D

Secured VPN

P V A

Building Block 1

Building Block 2

Web Service A

CBDS DSC

P V A

Page 15: Data Lake project – work in progress

Status | Topic | Description
Finished | 4-layers Data Architecture | Possibility to decouple the Data Source Layer from the Consumer Layer and create virtual datasets / virtual views. Web Service interface implemented for business register. EHB project demonstrated that the architecture delivers benefits.
Finished | Metadata Model | Developed a model that describes statistical data in a formal and exact way. In theory it is possible to map any statistical dataset to the model, which is represented as a graph, and to use the metadata to find data (including ranking).
Finished | PoC Data Virtual | Successfully connected Denodo to Documentum Database (DSC); improved query possibilities and a performance boost.
In progress | PoC Metadata | Implement metadata model in PoolParty semantic web platform, harvest technical and conceptual metadata, and provide URL to DV platform.
In progress | Connect Data Sources | Expand number of data sources to improve usability of test platform. Perform stress tests.
Scope defined | PoC Multi-Zone DV | Use Data Lake as a research platform for distributed data. Implement secure infrastructure.
Planned | Data Governance and Security | Define Data Governance for managing and securing virtual datasets.
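The Metadata Model entry above mentions representing the model as a graph and using metadata to find data with ranking. The sketch below shows one simple way that idea could work; it is not the CBS metadata model. The dataset and variable names of `GBAPersoon2012V1` come from the PoC slide, while the concept labels, the second dataset `ExampleHouseholds`, the edge predicates and the ranking rule (count of matched concepts) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical metadata graph as (subject, predicate, object) edges.
# Concept labels and the second dataset are illustrative assumptions.
EDGES = [
    ("GBAPersoon2012V1", "has_variable", "RinPN"),
    ("GBAPersoon2012V1", "has_variable", "GBAGSL"),
    ("GBAPersoon2012V1", "has_variable", "GBAGLND"),
    ("ExampleHouseholds", "has_variable", "RinPN"),
    ("RinPN", "represents", "person identifier"),
    ("GBAGSL", "represents", "sex"),
    ("GBAGLND", "represents", "country of birth"),
]


def build_index(edges):
    """Split the edge list into dataset->variables and variable->concepts."""
    variables = defaultdict(set)
    concepts = defaultdict(set)
    for subject, predicate, obj in edges:
        if predicate == "has_variable":
            variables[subject].add(obj)
        elif predicate == "represents":
            concepts[subject].add(obj)
    return variables, concepts


def find_datasets(edges, terms):
    """Rank datasets by how many requested concepts their variables cover."""
    variables, concepts = build_index(edges)
    wanted = {term.lower() for term in terms}
    scores = {}
    for dataset, dataset_vars in variables.items():
        labels = set()
        for variable in dataset_vars:
            labels |= {label.lower() for label in concepts.get(variable, set())}
        hits = len(wanted & labels)
        if hits:
            scores[dataset] = hits
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = find_datasets(EDGES, ["sex", "person identifier"])
```

Here `GBAPersoon2012V1` ranks above `ExampleHouseholds` because it covers both requested concepts, while the second dataset covers only one. A production setup would more likely express the graph in a semantic-web store and query it there.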


Page 16: What do we want to achieve with the Data Lake Vision?

Stimulate: data access, growth, re-use

Reduce: cost, statistical risk, time to market

Page 17: Foundation for New Business Models