© 2011 IBM Corporation
Luncheon Webinar Series March 16, 2011
InfoSphere Information Analyzer – gets Analyzed!!!
Sponsored By:
IBM InfoSphere Information Analyzer
Questions and suggestions regarding presentation topics? - send to
Downloading the presentation
• http://www.dsxchange.net/2011MarchAnalyzer.html
• Replay will be available within one day with email with details
Pricing and configuration - send to [email protected] Subject line : Pricing
For those who stay through the entire presentation, we have an extra giveaway!
Bonus Offer – Free premium membership for your DataStage management! Submit your manager’s email address and we will offer them access on your behalf.
• Email [email protected] subject line “Managers special”.
• Join us all at LinkedIn http://tinyurl.com/DSXmembers
March 16, 2011
Piyush Gupta, VP Product Management, Information Integration
Guenter Sauter, Product Manager, Data Quality & Governance
IBM Software Group, Information Management Division
InfoSphere Information Analyzer – Analyze!!!
Agenda
Overview
• Motivation / business drivers
• InfoSphere Information Server
• Data Quality Portfolio
• InfoSphere Information Analyzer
InfoSphere Information Analyzer Deep Dive
• Common quality measurements
• Data rules & metrics
• Reporting and delivery
Customer Case Studies
Business initiatives depend on trusted information
• Empowering risk & compliance initiatives with the information they require
• Optimizing Revenue Opportunities by ensuring effective and efficient interactions with customers, partners, and suppliers
• Enabling collaborative business processes with consistent and trustworthy information
• Reducing the total cost of ownership for maintaining consistent information across the enterprise
IBM InfoSphere Information Server
Sources: legacy, apps, dbs, xls, xml, flat, warehouse, z/OS, custom
Business Initiatives: BI, SAP, Warehouse, MDM, App Consolidation
Roles: Business Analysts, Executives, Enterprise Architects, Data Analysts & Architects, Subject Matter Experts, ERP System Manager, Developer, DBA, System Architect, Data Steward
Everything you need to integrate heterogeneous information and deliver it when and where it is needed
IBM Data Quality
Data Quality: Understand, Cleanse, Monitor
Diagram labels: Business, Process, Technical
Information Governance
InfoSphere QualityStage, InfoSphere Business Glossary, Blueprint Director, InfoSphere Discovery, InfoSphere Information Analyzer
Legend: product, included capability
Data Quality: Pervasive, Progressive, Continuous
Information Analyzer supports the full spectrum across all levels
Steps: define (bus.-driven), test, deploy
Levels: Business Measured, Generic, Business Driven, Business Aligned
Capabilities: Common Measurements, Data Rules, Metrics, DQ Dashboard + Reports
Applying Information Analyzer
The solution perspective in a variety of use cases
Diagram labels: Analyze, Integrate, External Sources, Master Data Management, BI Applications, Packaged Applications, …, Information Analyzer (Packaged App.), Data Warehouse
• Monitor quality at the source to address issues where information originates
• Monitor your trusted systems and their consistency with sources through transformations
• Report status & progress to the business
• Supply metrics to the governance initiative
Starting with the end goal in mind …
You cannot manage (data quality) if you cannot measure
Aligning goal & outcome to business impact
1. Overall Quality Standing: How are we doing overall?
2. Rules category breakdown
3. Computed KPI metrics: How many $$$ are we losing?
From a technical perspective
Common measurements often build a foundation for DQ
Common Data Quality Dimensions and Measurements
Domain quality: completeness, validity, length & format
Cross-domain fitness
• Redundancy
• Inconsistency
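The measurements listed above can be illustrated with a minimal Python sketch. It does not show how Information Analyzer computes them; the function names (`profile_column`, `redundancy`) and the sample values are assumptions made for illustration only.

```python
import re

def profile_column(values, valid_values=None, pattern=None):
    """Compute simple domain-quality measurements for one column (illustrative only)."""
    total = len(values)
    # A value counts as populated if it is neither None nor blank.
    populated = [v for v in values if v is not None and str(v).strip() != ""]
    result = {
        "completeness": len(populated) / total if total else 0.0,
        "min_length": min((len(str(v)) for v in populated), default=0),
        "max_length": max((len(str(v)) for v in populated), default=0),
    }
    if valid_values is not None:
        # Validity: share of populated values found in the accepted list.
        valid = sum(1 for v in populated if v in valid_values)
        result["validity"] = valid / len(populated) if populated else 0.0
    if pattern is not None:
        # Format: share of populated values that fully match the expected pattern.
        matches = sum(1 for v in populated if re.fullmatch(pattern, str(v)))
        result["format_conformance"] = matches / len(populated) if populated else 0.0
    return result

def redundancy(column_a, column_b):
    """Cross-domain check: share of distinct values in column_a also present in column_b."""
    a, b = set(column_a), set(column_b)
    return len(a & b) / len(a) if a else 0.0

# Hypothetical sample data
gender = ["M", "F", "", "X", None, "F"]
stats = profile_column(gender, valid_values={"M", "F"})
overlap = redundancy([1, 2, 3, 4], [3, 4, 5])
```

Here completeness is measured against all rows, while validity and format conformance are measured only against populated values, so a missing value is not counted twice.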
Business-driven data rule definition
Data Rules
Specify consistent & re-usable data rules, driven by business
Examples of Rules:
• The Gender field must be populated and must be in the list of accepted values
• The Social Security Number must be numeric and in the format 999-99-9999
• If Date of Birth Exists AND Date of Birth > 1900-01-01 and < TODAY Then Customer Type Equals ‘P’
• The Bank Account Branch ID is valid in the Branch Reference master list
Business users: “The account number must meet the following condition: …”
Diagram: the Data Rule is driven by business users and validated against the data
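To make the four example rules above concrete, here is a minimal Python sketch that expresses each one as a predicate over a record. This is not Information Analyzer's rule syntax; the field names, the accepted gender values, and the branch reference list are assumptions chosen for illustration.

```python
import re
from datetime import date

ACCEPTED_GENDERS = {"M", "F"}          # assumed list of accepted values
BRANCH_REFERENCE = {"B001", "B002"}    # assumed Branch Reference master list
SSN_FORMAT = re.compile(r"\d{3}-\d{2}-\d{4}")  # the 999-99-9999 format

def gender_rule(rec):
    # Must be populated and in the list of accepted values.
    return rec.get("gender") in ACCEPTED_GENDERS

def ssn_rule(rec):
    # Must consist of digits in the format 999-99-9999.
    return bool(SSN_FORMAT.fullmatch(rec.get("ssn") or ""))

def birth_date_rule(rec):
    # IF date of birth exists AND lies between 1900-01-01 and today
    # THEN customer type must equal 'P'. Otherwise the rule does not apply.
    dob = rec.get("date_of_birth")
    if dob is not None and date(1900, 1, 1) < dob < date.today():
        return rec.get("customer_type") == "P"
    return True

def branch_rule(rec):
    # Branch ID must exist in the reference list.
    return rec.get("branch_id") in BRANCH_REFERENCE

RULES = {
    "gender": gender_rule,
    "ssn": ssn_rule,
    "birth_date": birth_date_rule,
    "branch": branch_rule,
}

def evaluate(rec):
    """Return a pass/fail flag per rule for one record."""
    return {name: rule(rec) for name, rule in RULES.items()}

# Hypothetical record
record = {
    "gender": "F",
    "ssn": "123-45-6789",
    "date_of_birth": date(1980, 5, 17),
    "customer_type": "C",
    "branch_id": "B001",
}
outcome = evaluate(record)
```

In this sample the record passes the gender, SSN, and branch rules but fails the date-of-birth rule, because the customer type is not 'P'.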
Data Rules
Consistent & re-usable data rules, driven by business
Business users: “The account number must meet the following condition: …”
Diagram: the Data Rule is driven by business users
You can use governed business language in data rule definitions
Data Rules
Bind the consistent & re-usable data rules to the data objects where they apply
Diagram: the Data Rule is validated against the bound data objects
Ability to bind one rule to multiple locations, navigating to actual metadata
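The idea of defining a rule once and binding it to several locations can be sketched in Python as follows. The classes `RuleDefinition` and `Binding`, the table and column names, and the 10-digit condition are all hypothetical. The slide elides the actual account number condition, so the check here is only a stand-in.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class RuleDefinition:
    """Rule logic written once against a logical variable, not a physical column."""
    name: str
    check: Callable[[object], bool]

@dataclass(frozen=True)
class Binding:
    """Attaches a rule definition to one physical table.column location."""
    rule: RuleDefinition
    table: str
    column: str

def run_binding(binding, tables):
    """Evaluate the bound rule against every row of the bound column."""
    rows = tables[binding.table]
    failed = [r for r in rows if not binding.rule.check(r.get(binding.column))]
    return {
        "rule": binding.rule.name,
        "location": f"{binding.table}.{binding.column}",
        "tested": len(rows),
        "failed": len(failed),
    }

# Stand-in condition (assumption): account number is a 10-digit string.
account_rule = RuleDefinition(
    "account_number_is_10_digits",
    lambda v: isinstance(v, str) and len(v) == 10 and v.isdigit(),
)

# Hypothetical data sources
tables = {
    "CUSTOMER": [{"ACCT_NO": "1234567890"}, {"ACCT_NO": "12AB"}],
    "ORDERS": [{"ACCOUNT": "0987654321"}, {"ACCOUNT": None}, {"ACCOUNT": "1111111111"}],
}

# One rule definition, two bindings.
bindings = [
    Binding(account_rule, "CUSTOMER", "ACCT_NO"),
    Binding(account_rule, "ORDERS", "ACCOUNT"),
]
results = [run_binding(b, tables) for b in bindings]
```

Keeping the logic separate from the binding means a change to the condition is made in one place and applies to every location the rule is bound to.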
Business-driven definition of metrics
Measure results vs. targets
View Metric & Benchmark summaries
Organize Metrics and Rules within user-defined folders
Create Metrics across single or multiple Data Rules
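A metric that spans multiple data rules and is compared with a target could look like the following Python sketch. The weighting scheme, the benchmark value, the per-failure cost, and all names are assumptions for illustration. The product's own metric expressions are not shown in the slides.

```python
def pass_rate(rule_result):
    """Share of tested records that passed a single rule."""
    return rule_result["passed"] / rule_result["tested"]

def evaluate_metric(rule_results, weights, benchmark):
    """Weighted average pass rate across several rules, compared with a benchmark."""
    total_weight = sum(weights[name] for name in rule_results)
    score = sum(
        pass_rate(result) * weights[name] for name, result in rule_results.items()
    ) / total_weight
    return {"score": score, "benchmark": benchmark, "meets_benchmark": score >= benchmark}

def estimated_loss(rule_results, cost_per_failure):
    """Simple KPI: failed records multiplied by an assumed cost per failure."""
    failed = sum(r["tested"] - r["passed"] for r in rule_results.values())
    return failed * cost_per_failure

# Hypothetical rule execution results
rule_results = {
    "gender_valid": {"tested": 1000, "passed": 980},
    "ssn_format": {"tested": 1000, "passed": 900},
}
weights = {"gender_valid": 1.0, "ssn_format": 3.0}  # assumed business weighting

metric = evaluate_metric(rule_results, weights, benchmark=0.95)
loss = estimated_loss(rule_results, cost_per_failure=2.50)
```

With these sample numbers the weighted score is about 0.92, which falls below the 0.95 benchmark, and the 120 failed records translate into an estimated loss of 300.0 at the assumed cost.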
Business-driven data rule definition
Improved performance through flexible deployment options
Execute rules up to 75% faster by grouping rules into rule sets
• Flexibly determine which data rules to execute together as one unit
• Process the same data only once, even if it needs to be validated by multiple rules
• Significant performance gains
• Flexibility in when rules are executed (per defined schedule or “on demand” by invocation through the API)
• Select the rules from the catalog that need to be executed together
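The principle behind rule sets, reading the data once and applying every rule to each row, can be sketched in Python. This is only an illustration of the single-pass idea, not the product's execution engine, and it makes no claim about the performance figure quoted above. `CountingSource` and the sample rules are hypothetical.

```python
class CountingSource:
    """Hypothetical data source that counts how many times it is scanned."""
    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def __call__(self):
        self.scans += 1
        return iter(self.rows)

def run_individually(rules, read_rows):
    """Each rule triggers its own scan of the data."""
    return {
        name: sum(1 for row in read_rows() if not rule(row))
        for name, rule in rules.items()
    }

def run_as_rule_set(rules, read_rows):
    """All rules in the set are evaluated during a single scan."""
    failures = {name: 0 for name in rules}
    for row in read_rows():
        for name, rule in rules.items():
            if not rule(row):
                failures[name] += 1
    return failures

# Hypothetical rows and rules
ROWS = [
    {"gender": "F", "age": 30},
    {"gender": "", "age": -1},
    {"gender": "M", "age": 200},
]
RULES = {
    "gender_populated": lambda r: r["gender"] != "",
    "age_in_range": lambda r: 0 <= r["age"] <= 120,
}

source_a = CountingSource(ROWS)
individual = run_individually(RULES, source_a)

source_b = CountingSource(ROWS)
grouped = run_as_rule_set(RULES, source_b)
```

Both approaches report the same failure counts, but the individual run scans the source once per rule while the rule set scans it once in total.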
Starting with the end goal in mind …
Comprehensive reporting and tracking environment
From high-level dashboard to flexible views
• Quickly assess the health of your information in summary dashboard view
• Drill into specific data quality assessment results
• Understand the details in multiple perspectives and based on flexible configuration
Flexible reporting back to the business
Reports: Over 80 out-of-the-box analysis reports as foundation
Control Output: Schedule execution; determine expiration policy; establish access controls
Custom Processing: API and CLI options to extract and utilize results in your environment and applications
Results Delivery / Publication
REST API/CLI – Example: Generation of custom reports
Broad set of functions exposed through API beyond reporting needs
Diagram: a GET … request to the Server returns XML; stylesheets XSLT1, XSLT2, and XSLT3 transform it into HTML Report1, CSV Report, and HTML Report2
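The pattern of turning one XML result into several report formats can be sketched in Python. Two caveats apply. The slide uses XSLT stylesheets, whereas this sketch swaps in plain Python transformations so that it runs with the standard library alone. The XML structure below is also invented for illustration and is not the actual response format of the Information Analyzer REST API.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML payload standing in for a server response.
SAMPLE_XML = """<results>
  <rule name="gender_valid" tested="1000" failed="20"/>
  <rule name="ssn_format" tested="1000" failed="100"/>
</results>"""

def parse_results(xml_text):
    """Extract one dictionary per <rule> element."""
    root = ET.fromstring(xml_text)
    return [
        {
            "name": rule.get("name"),
            "tested": int(rule.get("tested")),
            "failed": int(rule.get("failed")),
        }
        for rule in root.findall("rule")
    ]

def to_csv(rows):
    """Render the rows as a CSV report with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "tested", "failed"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_html(rows):
    """Render the rows as a simple HTML table."""
    body = "".join(
        f"<tr><td>{escape(r['name'])}</td><td>{r['tested']}</td><td>{r['failed']}</td></tr>"
        for r in rows
    )
    return f"<table><tr><th>Rule</th><th>Tested</th><th>Failed</th></tr>{body}</table>"

rows = parse_results(SAMPLE_XML)
csv_report = to_csv(rows)
html_report = to_html(rows)
```

The same parsed result feeds every output format, which mirrors the one-source, many-reports flow in the diagram.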
Shared Health delivers patient information at the point of care with IBM
Challenge
• Shared Health’s health information exchange, which connects patients, doctors, employers and insurers with medical information, was expected to grow substantially, from 1.8 million to 3 million patients supported.
• They needed to rapidly consolidate, standardize and manage information from third-party partners (insurers, labs, prescription drug clearing houses and health providers) that use a wide array of data sources and structures.
• They wanted to improve healthcare delivery by making patient information available at the point of care.
Solution
• Shared Health is using IBM InfoSphere Information Server (DataStage, Information Analyzer, and QualityStage) to create a single, accurate and trusted source of information for populating its health record repository, clinician portal and data warehouse.
Benefits
• IBM InfoSphere Information Analyzer is helping Shared Health understand the structure, content and quality of data sources; this has helped uncover missing, inaccurate and inconsistent data.
• The solution’s ability to execute processes in parallel has enabled them to perform analysis on millions of rows with hundreds of columns in less than 2 hours, a task that previously took over 24 hours.
• IBM InfoSphere QualityStage is helping them standardize the format of data such as names, addresses and phone numbers so that it is displayed in a single format regardless of the source.
• IBM InfoSphere Information Server is enabling Shared Health to successfully compete with larger, more prevalent companies in their industry.
InfoSphere Information Analyzer
Assess data quality and facilitate ongoing data quality monitoring and exception management
Requirements
• Perform data quality assessment
• Define business rules to monitor data quality
• Establish stewards for governance of data quality
Benefits
• Identify data quality issues early to reduce project risks
• Monitor quality metrics over time for compliance
• Create business confidence with trusted information
• Results promotable across IBM Information Server