
© 2011 IBM Corporation


Luncheon Webinar Series March 16, 2011

InfoSphere Information Analyzer – gets Analyzed!!!

Sponsored By:


IBM InfoSphere Information Analyzer

Questions and suggestions regarding presentation topics? Send them to [email protected]

Downloading the presentation

• http://www.dsxchange.net/2011MarchAnalyzer.html

• The replay will be available within one day; details will follow by email.

Pricing and configuration: send to [email protected] with the subject line “Pricing”.

For those who stay through the entire presentation, we have an extra giveaway!

Bonus Offer: free premium membership for your DataStage management! Submit your management’s email address and we will offer them access on your behalf.

• Email [email protected] with the subject line “Managers special”.

• Join us all on LinkedIn: http://tinyurl.com/DSXmembers


March 16, 2011

Piyush Gupta, VP Product Management, Information Integration

Guenter Sauter, Product Manager, Data Quality & Governance

IBM Software Group, Information Management Division

InfoSphere Information Analyzer – Analyze!!!


Agenda

Overview

• Motivation / business drivers

• InfoSphere Information Server

• Data Quality Portfolio

• InfoSphere Information Analyzer

InfoSphere Information Analyzer Deep Dive

• Common quality measurements

• Data rules & metrics

• Reporting and delivery

Customer Case Studies


Business initiatives depend on trusted information

• Empowering risk & compliance initiatives with the information they require

• Optimizing revenue opportunities by ensuring effective and efficient interactions with customers, partners, and suppliers

• Enabling collaborative business processes with consistent and trustworthy information

• Reducing the total cost of ownership for maintaining consistent information across the enterprise


IBM InfoSphere Information Server

[Figure: Information Server overview. Sources: legacy, apps, databases, XLS/XML/flat files, warehouse, z/OS, custom, SAP, ERP systems. Business initiatives: BI, warehouse, MDM, application consolidation. Roles: business analysts, executives, enterprise architects, data analysts & architects, subject matter experts, managers, developers, DBAs, system architects, data stewards.]

Everything you need to integrate heterogeneous information and deliver it when and where it is needed


IBM Data Quality

[Figure: Data quality lifecycle (Understand, Cleanse, Monitor) spanning business, technical, and process perspectives under Information Governance. Products: InfoSphere QualityStage, InfoSphere Business Glossary, Blueprint Director, InfoSphere Discovery, InfoSphere Information Analyzer. Legend: product / included capability.]


Data Quality: Pervasive, Progressive, Continuous

Information Analyzer supports the full spectrum across all levels

[Figure: Levels from generic through business driven and business aligned to business measured, covering Common Measurements, Data Rules, Metrics, and DQ Dashboard + Reports. Cycle: define (business-driven), test, deploy.]


Applying Information Analyzer: the solution perspective in a variety of use cases

[Figure: Information Analyzer applied to external sources, packaged applications, master data management, the data warehouse, and BI applications (analyze, integrate).]

• Monitor quality at the source to address issues where information originates

• Monitor your trusted systems and their consistency with sources through transformations

• Report status & progress to the business

• Supply metrics to the governance initiative


Starting with the end goal in mind …


You cannot manage (data quality) if you cannot measure it

Aligning goal & outcome to business impact

1. Overall quality standing: How are we doing overall?

2. Rules category breakdown

3. Computed KPI metrics: How many $$$ are we losing?
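The three dashboard levels above can be illustrated with a small aggregation over rule results. This is a minimal sketch, not Information Analyzer's own computation: the `RuleResult` record, the rule names and categories, and especially the flat cost per failed record used for the dollar KPI are hypothetical assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RuleResult:
    name: str       # hypothetical rule name
    category: str   # hypothetical rule category, e.g. "validity"
    tested: int     # records evaluated by the rule
    failed: int     # records that violated the rule


def pass_rate(tested: int, failed: int) -> float:
    """Share of evaluated records that passed; 1.0 when nothing was tested."""
    return 1 - failed / tested if tested else 1.0


def summarize(results, cost_per_exception: float):
    """Return (overall pass rate, pass rate per category, estimated loss)."""
    total_failed = sum(r.failed for r in results)

    # 1. Overall quality standing across all rule evaluations
    overall = pass_rate(sum(r.tested for r in results), total_failed)

    # 2. Rules category breakdown: accumulate [tested, failed] per category
    totals = defaultdict(lambda: [0, 0])
    for r in results:
        totals[r.category][0] += r.tested
        totals[r.category][1] += r.failed
    breakdown = {cat: pass_rate(t, f) for cat, (t, f) in totals.items()}

    # 3. Computed KPI: assumes a flat, hypothetical cost for every failed record
    loss = total_failed * cost_per_exception
    return overall, breakdown, loss


if __name__ == "__main__":
    results = [
        RuleResult("gender_valid", "validity", 1000, 20),
        RuleResult("ssn_format", "format", 1000, 50),
        RuleResult("branch_in_master", "consistency", 500, 30),
    ]
    overall, breakdown, loss = summarize(results, cost_per_exception=12.5)
    print(f"{overall:.1%}", breakdown, f"${loss:,.2f}")
```

A real cost model would usually weight failures by rule or category rather than use one flat figure.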


From a technical perspective: common measurements often build the foundation for DQ


Common Data Quality Dimensions and Measurements

Domain quality: completeness, validity, length & format

Cross-domain fitness

• Redundancy

• Inconsistency
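The domain and cross-domain measurements listed above can be sketched as simple ratios. This is an illustrative sketch, not Information Analyzer's column analysis: it assumes a column arrives as a Python list of strings (with `None` or `""` meaning missing), that validity means membership in a supplied value set, that format is a regular expression, and that redundancy and inconsistency are measured as value overlap and key-by-key disagreement between two hypothetical sources.

```python
import re


def _ratio(hits: int, total: int) -> float:
    """Fraction of hits; treat an empty population as fully conforming."""
    return hits / total if total else 1.0


def domain_quality(values, valid_values=None, pattern=None, max_length=None):
    """Completeness, validity, format, and length ratios for one column."""
    present = [v for v in values if v not in (None, "")]
    stats = {"completeness": _ratio(len(present), len(values))}
    if valid_values is not None:
        stats["validity"] = _ratio(sum(v in valid_values for v in present), len(present))
    if pattern is not None:
        regex = re.compile(pattern)
        stats["format"] = _ratio(sum(regex.fullmatch(v) is not None for v in present), len(present))
    if max_length is not None:
        stats["length"] = _ratio(sum(len(v) <= max_length for v in present), len(present))
    return stats


def redundancy(column_a, column_b) -> float:
    """Cross-domain: share of distinct values in A that also occur in B."""
    a = {v for v in column_a if v not in (None, "")}
    b = {v for v in column_b if v not in (None, "")}
    return len(a & b) / len(a) if a else 0.0


def inconsistency(source_a: dict, source_b: dict) -> float:
    """Cross-domain: share of keys present in both sources whose values disagree."""
    shared = source_a.keys() & source_b.keys()
    return sum(source_a[k] != source_b[k] for k in shared) / len(shared) if shared else 0.0


if __name__ == "__main__":
    print(domain_quality(["M", "F", "", "X"], valid_values={"M", "F"}, max_length=1))
    print(domain_quality(["123-45-6789", "12345", None], pattern=r"\d{3}-\d{2}-\d{4}"))
    print(redundancy(["B01", "B02", "B03"], ["B02", "B03", "B04"]))
    print(inconsistency({"c1": "NY", "c2": "CA"}, {"c1": "NY", "c2": "TX", "c3": "WA"}))
```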


Business-driven data rule definition


Data Rules: specify consistent and reusable data rules, driven by the business

Examples of rules:

• The Gender field must be populated and must be in the list of accepted values

• The Social Security Number must be numeric and in the format 999-99-9999

• If Date of Birth exists AND Date of Birth > 1900-01-01 AND Date of Birth < TODAY, then Customer Type equals 'P'

• The Bank Account Branch ID is valid in the Branch Reference master list
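The example rules above can be expressed as plain predicates over a record, which makes their logic easy to test. This sketch uses ordinary Python rather than Information Analyzer's rule definition language; the field names, the accepted gender codes, and the branch master list are hypothetical. Note that the Date of Birth rule is an implication: a record that does not meet the condition passes.

```python
import re
from datetime import date

ACCEPTED_GENDERS = {"M", "F", "U"}            # hypothetical list of accepted values
SSN_FORMAT = re.compile(r"\d{3}-\d{2}-\d{4}")  # 999-99-9999, digits only


def gender_rule(rec: dict) -> bool:
    # None and "" are not accepted values, so this also enforces "populated"
    return rec.get("gender") in ACCEPTED_GENDERS


def ssn_rule(rec: dict) -> bool:
    ssn = rec.get("ssn")
    return ssn is not None and SSN_FORMAT.fullmatch(ssn) is not None


def dob_rule(rec: dict, today: date) -> bool:
    dob = rec.get("date_of_birth")
    if dob is not None and date(1900, 1, 1) < dob < today:
        return rec.get("customer_type") == "P"
    return True  # condition not met: the rule is satisfied


def branch_rule(rec: dict, branch_master: set) -> bool:
    return rec.get("branch_id") in branch_master


def evaluate(records, branch_master, today=None):
    """Return, for each rule, the indexes of records that violate it."""
    today = today or date.today()
    rules = {
        "gender": gender_rule,
        "ssn": ssn_rule,
        "dob_customer_type": lambda r: dob_rule(r, today),
        "branch": lambda r: branch_rule(r, branch_master),
    }
    return {name: [i for i, r in enumerate(records) if not rule(r)] for name, rule in rules.items()}
```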

[Figure: A data rule (“The account number must meet the following condition: …”) is driven by business users and validated against the data.]


Data Rules: consistent and reusable data rules, driven by the business

[Figure: A data rule (“The account number must meet the following condition: …”) driven by business users.]

You can use governed business language in data rule definitions.


Data Rules: bind the consistent, reusable data rules to the data objects where they apply

[Figure: A data rule validated against data objects.]

• Ability to bind one rule to multiple locations

• Navigating to the actual metadata
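The separation between a reusable rule and the places it applies can be sketched as a logical definition plus bindings. This is a conceptual sketch only: `RuleDefinition`, `RuleBinding`, the table and column names, and the in-memory tables are hypothetical stand-ins, not Information Analyzer objects. One definition (the Branch ID check from the examples) is bound to two different columns.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RuleDefinition:
    """Logical rule written against a variable name, not a physical column."""
    name: str
    variable: str                    # logical variable the rule talks about
    check: Callable[[object], bool]  # predicate applied to each value


@dataclass(frozen=True)
class RuleBinding:
    """Hypothetical binding of a rule's variable to one concrete table column."""
    definition: RuleDefinition
    table: str
    column: str

    def run(self, tables: Dict[str, List[dict]]) -> int:
        """Return the number of rows in the bound column that fail the rule."""
        return sum(not self.definition.check(row.get(self.column)) for row in tables[self.table])


BRANCHES = {"B01", "B02"}  # hypothetical Branch Reference master list
branch_valid = RuleDefinition("branch_id_valid", "branch_id", lambda v: v in BRANCHES)

# One definition, two locations
bindings = [
    RuleBinding(branch_valid, "accounts", "branch_id"),
    RuleBinding(branch_valid, "loans", "branch_code"),
]
tables = {
    "accounts": [{"branch_id": "B01"}, {"branch_id": "B07"}],
    "loans": [{"branch_code": "B02"}],
}
failures = {f"{b.table}.{b.column}": b.run(tables) for b in bindings}
```

Changing `BRANCHES` or the predicate updates every bound location at once, which is the point of keeping definitions separate from bindings.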


Business-driven definition of metrics


Measure results vs. targets

• View metric & benchmark summaries

• Organize metrics and rules within user-defined folders

• Create metrics across single or multiple data rules
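A metric built from one or more rules and measured against a target can be sketched as follows. The weighted-average formula is an assumption chosen for illustration; Information Analyzer lets users define their own metric expressions, and the rule names, weights, folder, and benchmark here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Metric:
    """Hypothetical metric: weighted average of one or more rule pass rates."""
    name: str
    folder: str                # user-defined folder used to organize metrics
    weights: Dict[str, float]  # rule name -> weight in the metric
    benchmark: float           # target the metric is measured against

    def evaluate(self, pass_rates: Dict[str, float]) -> dict:
        total = sum(self.weights.values())
        value = sum(pass_rates[rule] * w for rule, w in self.weights.items()) / total
        return {
            "metric": self.name,
            "value": value,
            "benchmark": self.benchmark,
            "met": value >= self.benchmark,
            "gap": value - self.benchmark,
        }


if __name__ == "__main__":
    rates = {"gender_valid": 0.98, "ssn_format": 0.95}
    single = Metric("gender_quality", "Customer", {"gender_valid": 1.0}, 0.97)
    combined = Metric("customer_id_quality", "Customer", {"gender_valid": 1.0, "ssn_format": 3.0}, 0.97)
    for m in (single, combined):
        print(m.folder, m.evaluate(rates))
```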


Business-driven data rule definition


Improved performance through flexible deployment options: execute rules up to 75% faster by grouping them into rule sets

• Flexibly determine which data rules to execute together as one unit

• Process the same data only once, even if it needs to be validated by multiple rules

• Significant performance gains

• Flexibility in when rules are executed (per defined schedule or “on demand” by invocation through API ..)

• Select the rules from the catalog above that need to be executed together
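The "process the same data only once" idea behind rule sets can be sketched as a single pass in which every row is checked by all rules in the set. This shows only the single-scan principle; it is not Information Analyzer's execution engine and does not reproduce the 75% figure. The rule names and row fields are hypothetical.

```python
from typing import Callable, Dict, Iterable

Rule = Callable[[dict], bool]


def run_rule_set(rows: Iterable[dict], rules: Dict[str, Rule]) -> Dict[str, int]:
    """Evaluate every rule in the set during one pass over the rows.

    Returns the number of failing rows per rule. Running each rule on its
    own would instead read the data once per rule.
    """
    failures = {name: 0 for name in rules}
    for row in rows:                        # each row is read once ...
        for name, rule in rules.items():    # ... and checked by every rule
            if not rule(row):
                failures[name] += 1
    return failures


if __name__ == "__main__":
    rows = [
        {"gender": "F", "ssn": "123-45-6789"},
        {"gender": "", "ssn": "1"},
    ]
    rule_set = {
        "gender_valid": lambda r: r["gender"] in {"M", "F"},
        "ssn_length": lambda r: len(r["ssn"]) == 11,
    }
    print(run_rule_set(rows, rule_set))
```

Because `rows` may be any iterable, the same function works over a generator that streams records from a source without materializing them.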


Starting with the end goal in mind …


Comprehensive reporting and tracking environment: from high-level dashboard to flexible views


• Quickly assess the health of your information in the summary dashboard view

• Drill into specific data quality assessment results

• Understand the details from multiple perspectives, based on flexible configuration


Flexible reporting back to the business

Reports: over 80 out-of-the-box analysis reports as a foundation

Control Output: Schedule execution; determine expiration policy; establish access controls

Custom Processing: API and CLI options to extract and utilize results in your environment and applications


Results Delivery / Publication: REST API/CLI. Example: generation of custom reports

A broad set of functions is exposed through the API, beyond reporting needs

[Figure: A GET … request to the server returns XML; XSLT stylesheets (XSLT1, XSLT2, XSLT3) transform it into HTML Report1, a CSV report, and HTML Report2.]
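The XML-to-report step of this pipeline can be sketched in a few lines. The slide uses XSLT stylesheets; Python's standard library has no XSLT processor, so this sketch swaps in `xml.etree.ElementTree` and `csv` to perform an equivalent XML-to-CSV transformation. The XML layout below (`<results>` with `<rule name tested failed>` elements) is a hypothetical stand-in, not the schema actually returned by the Information Analyzer API, and the GET request itself is not shown because its endpoint is not given here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical result document; the real API's XML schema will differ.
SAMPLE_XML = """<results>
  <rule name="gender_valid" tested="1000" failed="20"/>
  <rule name="ssn_format" tested="1000" failed="50"/>
</results>"""


def xml_to_csv(xml_text: str) -> str:
    """Flatten <rule> elements into CSV rows (rule, tested, failed, pass_rate)."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["rule", "tested", "failed", "pass_rate"])
    for rule in root.iter("rule"):
        tested = int(rule.get("tested"))
        failed = int(rule.get("failed"))
        rate = 1 - failed / tested if tested else 1.0
        writer.writerow([rule.get("name"), tested, failed, f"{rate:.3f}"])
    return out.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(SAMPLE_XML), end="")
```

Where stylesheets are preferred, as on the slide, a third-party XSLT processor such as lxml can apply an `.xsl` file to the same XML instead.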


Shared Health delivers patient information at the point of care with IBM

Challenge

Shared Health’s health information exchange, which connects patients, doctors, employers and insurers with medical information, was expected to grow substantially, from 1.8 million to 3 million supported patients.

They needed to rapidly consolidate, standardize and manage information from third-party partners (insurers, labs, prescription drug clearing houses and health providers) that use a wide array of data sources and structures.

They wanted to improve healthcare delivery by making patient information available at the point of care.

Solution

Shared Health is using IBM InfoSphere Information Server (DataStage, Information Analyzer, and QualityStage) to create a single, accurate and trusted source of information for populating its health record repository, clinician portal and data warehouse.

Benefits

IBM InfoSphere Information Analyzer is helping Shared Health understand the structure, content and quality of its data sources; this has helped uncover missing, inaccurate and inconsistent data.

The solution’s ability to execute processes in parallel has enabled them to analyze millions of rows with hundreds of columns in less than 2 hours, a task that previously took over 24 hours.

IBM InfoSphere QualityStage is helping them standardize the format of data such as names, addresses and phone numbers so that it is displayed in a single format regardless of the source.

IBM InfoSphere Information Server is enabling Shared Health to compete successfully with larger, more prevalent companies in their industry.


InfoSphere Information Analyzer

Requirements

• Perform data quality assessment

• Define business rules to monitor data quality

• Establish stewards for governance of data quality

Information Analyzer

• Assess data quality and facilitate ongoing data quality monitoring and exception management

Benefits

• Identify data quality issues early to reduce project risks

• Monitor quality metrics over time for compliance

• Create business confidence with trusted information

• Results promotable across IBM Information Server
