Springboard to a Data Lake: A Unit-base derived from the SBR

A Unit-base derived from the SBR - Statistics Bureau Home Page (統計局ホームページ) · PDF file




SN Strategic Agenda


Open up new data sources and use new methods

# Be innovative!

Make the innovation process more effective and efficient

Reduce the use of spreadsheets and manual work

# Improve and secure quality

Reduce vulnerability caused by spoks

# Make processes more effective and efficient

Implement a flexible, fit-for-purpose infrastructure

Make data more accessible to statisticians; implement a data lake

# Towards a state-of-the-art data and information infrastructure


CBS Data Lake definition:

“A concept to ensure that next to a decoupling of input, processing and output, also the demand for flexibility and coherence is satisfied thereby guaranteeing that the information needs of the statistical producer and statistical user are fulfilled as independently as possible without the interference of methodology and IT support”.


For all aspects of future focussed data management

[Diagram labels: Respondents; Registers; Streaming data; Microdata; Stat. data; Smart & flexible processes; Re-use & combining; Exploring; Self-reliant use; Retrieve; Publishing; Papers; Visualisations; clients; users / researchers]


Top 7 goals from end-user perspective

1. Enable more phenomenon-based output (a phenomenon is a striking event that you want to explain)
2. Enable more current and coherent statistics
3. Stimulate the reuse of data
4. Accelerate the statistical processes
5. Grow and stimulate the access to a large number of existing and new data sources
6. Provide faster response and output to requests from external clients
7. Accelerate the design process around collecting and storing data


How to get there? Enterprise Data Lake Project

Project for a new, data-driven architecture

Focus on end-user goals:
• Better accessibility of available datasets
• Dealing with many data sources and many formats
• Faster, phenomenon-based reporting

The Data Lake project consists of three pillars:
• Metadata repository (technical & conceptual)
• Data Virtualisation as the technology to provide a single data platform
• User-friendly, self-service front end by making use of Data Preparation Tools (DPT)


Data lake capabilities and main focus


Change to a Data Driven Architecture

[Architecture diagram labels: Data Source Layer (DSL) with DB1, DB2, DB3 and DB4, each marked METADATA; Data Provisioning Layer (DPL); Data Transformation Layer (DTL); Consumer Layer (CL); backbone (four times); DPT (four times); Metadata Management; Data Governance]

Data consumers: custom fit, standard applications, scripts, batch etc.


Yeah, great, but how about the Statistical Business Register???


From…

[Diagram labels: (Legacy) Databases; SBR; Data Source Layer (DSL); Consumer Layer (CL)]

Clients:
• At a set time, specifically designed and with a set content (inflexible)
• More “custom fit” datasets needed
• Have limited opportunities to create datasets themselves
• Increasing demand for SBR-derived datasets (content and quantity)
• Limited coordination in the use of datasets

SBR process environment:
• Complex; heavy knowledge of content and technique needed
• Technically directly coupled to statistical production processes, which affects the stability of the total process
• Not “in rest”: a live register
• Snapshots and frozen frames in the same system, and from the same system to clients

Systems:
• Retrieve SBR data periodically
• Inflexible
• Not all data used
• Custom fit datasets made “by hand”


To:

[Diagram labels: (Legacy) Databases; SBR; Data Source Layer (DSL); Data Provisioning Layer (DPL); Unit Base; satellites SAT1, SAT2, SATn; Data Transformation Layer (DTL); Building blocks 1 to 4; Web-Services A, B and C; Web Page; S2S; Consumer Layer (CL); Data preparation tooling]

• Unlimited addition of content, i.e. linkable to the Unit Base
• Outside SBR (system)
• SBR as a core of SU, not complicated by surplus data

DTL:
• The Unit base is the “Key cabinet”
• Data (characteristics, variables) is added via the satellites
• Backbone role of the SBR strengthened

Building blocks are:
• Simple (technical/content)
• Coordinated (business logic)
• “On demand”
• Expandable by the business

Data preparation tooling:
• Easy use of building blocks (process)
• Easy access to (complex) datasets

• Systems coupled via webservices
• Data “on demand”
• Webservices easily adjustable and expandable

Unit base:
• SBR data “in rest”
• More content
• Coupled via additional sources
• Accessible via building blocks and webservices
• Simple data structure
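
The "key cabinet" idea can be sketched in a few lines: the Unit base holds only the unit keys, each satellite adds variables linked by that key, and a building block is a reusable query that combines them on demand. This is a minimal illustration, not the actual SBR design; the table names (`unit_base`, `sat_employment`, `sat_turnover`), the columns, the sample rows and the `building_block` function are all hypothetical, and SQLite stands in for whatever database is really used.

```python
import sqlite3

# Hypothetical schema: names and sample rows are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE unit_base (unit_id INTEGER PRIMARY KEY, unit_name TEXT);
    CREATE TABLE sat_employment (unit_id INTEGER REFERENCES unit_base(unit_id), employees INTEGER);
    CREATE TABLE sat_turnover (unit_id INTEGER REFERENCES unit_base(unit_id), turnover_eur REAL);
""")
conn.executemany("INSERT INTO unit_base VALUES (?, ?)", [(1, "Bakery A"), (2, "Garage B")])
conn.executemany("INSERT INTO sat_employment VALUES (?, ?)", [(1, 12), (2, 40)])
conn.executemany("INSERT INTO sat_turnover VALUES (?, ?)", [(1, 850000.0)])

# Satellites known to this sketch, with the variable each one contributes.
SATELLITES = {"sat_employment": "employees", "sat_turnover": "turnover_eur"}


def building_block(conn, satellites):
    """Return Unit base rows with the requested satellite variables attached.

    Every satellite is LEFT JOINed on unit_id, so units without a value in a
    satellite are kept and get None for that variable.
    """
    columns = ["u.unit_id", "u.unit_name"]
    joins = []
    for sat in satellites:
        variable = SATELLITES[sat]  # raises KeyError for an unknown satellite
        columns.append(f"{sat}.{variable}")
        joins.append(f"LEFT JOIN {sat} ON {sat}.unit_id = u.unit_id")
    sql = (
        f"SELECT {', '.join(columns)} FROM unit_base AS u "
        f"{' '.join(joins)} ORDER BY u.unit_id"
    )
    return conn.execute(sql).fetchall()


rows = building_block(conn, ["sat_employment", "sat_turnover"])
```

Adding a new satellite in this sketch only requires a new table keyed on `unit_id` and one entry in `SATELLITES`; the Unit base itself stays unchanged, which mirrors the "unlimited addition of content" point above.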


Results

By realizing the Unit base, a 0-version of the data-driven architecture (concepts) has been established.

Users have on-demand and easy access to a wide and expandable set of SBR-coupled data.

Building blocks are adjustable and expandable without IT interference (webservices).

Increased use of SBR (coupled) data.

System in rest; decoupling of Statistical Business Register processes and other sources.

• Unlimited addition of content (characteristics, variables) that can be linked to the Unit Base
• SBR as a core of SU, not complicated by surplus data: true backbone role


Unit base as a springboard…

Unit base:
• Scope: SBR department
• Only SQL Server sources
• Physical data storage
• Implicit metadata management

Data lake:
• Scope: Statistics Netherlands
• All possible sources
• Virtual data
• Explicit metadata management
• Extensive testing of commercially available tools

Building blocks from the Unit Base can be re-used.

With the Unit base it has been proven that the concept of a Data lake works.
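
The contrast between physical data storage and virtual data can be illustrated with a small, single-database analogy. In the sketch below a copied table plays the role of physical storage and a SQL view plays the role of virtual data. This is an assumption-laden simplification: real data virtualisation tooling works across many heterogeneous sources, and the table and view names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_units (unit_id INTEGER PRIMARY KEY, employees INTEGER);
    INSERT INTO source_units VALUES (1, 12);

    -- Physical storage: rows are copied once, at this point in time.
    CREATE TABLE physical_copy AS SELECT * FROM source_units;

    -- Virtual data: only the query is stored; rows are read from the source on demand.
    CREATE VIEW virtual_units AS SELECT * FROM source_units;
""")

# The source changes after the copy was made.
conn.execute("UPDATE source_units SET employees = 15 WHERE unit_id = 1")

physical = conn.execute("SELECT employees FROM physical_copy").fetchone()[0]
virtual = conn.execute("SELECT employees FROM virtual_units").fetchone()[0]
```

After the update, `physical` still holds the old value while `virtual` reflects the current source, which is the trade-off between a system "in rest" and data served "on demand".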


The new architecture

[Architecture diagram labels: Consumer Layer (CL) with Web Page, S2S and Data Prep Tooling; Data Transformation Layer (DTL) with Building Blocks 1 to 4; Data Provisioning Layer (DPL) with Web-Services A, B and C; Future Data Lake (Data Virtualization); Data Source Layer (BL) with (Legacy) Databases, SQL DB, Web Srv, CSV, XLS and ETL tooling; Existing / New; Metadata Management with App, User Que., Tech Meta, Conn. String and Import Conceptual Meta; Data Governance; Security]


Contact information: Irene Salemink

Thank You!