
    Data Quality Essentials

    For any Data-Intensive Project

Summary: Projects that involve data (CRM projects, MDM initiatives, ERP implementations, business intelligence and data warehouse projects, data governance programs, migrations, consolidations, and harmonizations) all offer the opportunity to improve data quality. This paper is a phase-by-phase guide that identifies, for both business team members and IT resources, the tasks that should be incorporated into a project plan for each team function as they relate to data quality. This provides a roadmap for optimal effectiveness and coordination. These data quality essentials are based upon best practices collected from experiences on thousands of data management projects and successes over the past 30 years.

    Harte-Hanks Trillium Software

    www.trilliumsoftware.com

    Corporate Headquarters

    + 1 (978) 436-8900

    [email protected]


About Trillium Software

Harte-Hanks Trillium Software has been selected by companies worldwide, both large and small, to improve their operational and analytic business decisions through accurate and timely information. Trillium Software offers an integrated data quality suite that delivers complete end-to-end data quality life cycle management. The Trillium Software System is recognized as critical to the success of customer relationship management, master data management, customer data integration, data warehouse, business intelligence, enterprise resource planning, supply chain management, e-business, and other enterprise applications, as well as data integration, data migration, data stewardship, and data governance initiatives.

Designed for collaboration and information sharing, the Trillium Software System lets businesses individually define what data quality means to their organizations. The Trillium Software System comprises:

TS Discovery: Collaborate across business and IT resources to assess large volumes of data within and across systems. Robust data profiling capabilities allow users to understand data domains, formats, patterns, and relationships as they exist within the data itself, as well as to see whether data conforms to specific business rules and defined data standards. Ongoing monitoring assesses data to ensure that high quality is maintained at all times.

TS Quality: Cleanses, standardizes, and matches any data: name and address data; product data; asset, material, and location data; transactions; etc. World-class global capabilities and automated, rules-based intelligence give organizations a simple but complete solution for handling massive volumes of data out of the box. Organizations can further customize rules and adapt to meet changing business needs.

TS Enrichment: Complements, supplements, and amplifies data by drawing on over 5,000 third-party sources. This service provides a fully automated process for appending third-party data seamlessly for storage and distribution.

TS Insight: Monitor business rules and data quality metrics graphically through a customizable Data Quality Dashboard. Use scorecards and trending information to communicate data initiatives, results, and goals. TS Insight shows which data sources have data quality issues and which do not meet minimum corporate threshold and acceptability levels, helping you forecast and allocate the right resources to improve and optimize the business processes that impact them.

Usage Notice

Permission to use this document is granted, provided that: (1) the copyright notice, © 2008 by Harte-Hanks Trillium Software, appears in all copies, along with this permission notice; (2) use of this document is only for informational and noncommercial or personal use and does not include copying or posting the document on any network computer or broadcasting the document through any medium; and (3) the document is not modified from the original version. It is illegal to reproduce, distribute, or broadcast this document in any context without express written permission from Trillium Software. Use for any other purpose is expressly prohibited by law and may result in severe civil and criminal penalties. Violators will be prosecuted to the maximum extent possible.


Project Manager's Guide to Data Quality

    Incorporating an effective data quality solution into your project requires a number of

    additional activities throughout the lifecycle of your project plan. Some tasks are more

    suited for business user resources to take the lead on while other tasks are primarily

    technical activities.

    This white paper introduces some of the techniques used by successful companies to plan

    and successfully implement data quality processes as part of an initiative. While

    technology greatly facilitates and automates data quality management, it should be

    applied in accordance with a measurable, objective methodology to assure success and a

high ROI for the project. As you'll see in the pages to follow, process, people, and

    business expertise are major components in achieving an improvement in data quality,

    leaving technology as a way to automate and improve processes.

    This paper is aimed squarely at project managers and describes the step-by-step process

    for implementing data quality as part of a project. Specifically, this white paper highlights:

- The importance of involving business users in the project, and ways to do so, to ensure their needs are met

- Ways to accomplish a specific limited-scope project while considering the big picture and the ongoing concern of data quality within your organization

- How to incorporate technology throughout a project to expedite data quality initiatives

The process is best outlined in the phases described below.


Most projects have six phases. They are:

1. Project Preparation: defining a team and business objectives, and assessing the risks involved in project completion for realistic project planning.

2. Blueprint: writing the plan and detailed designs to meet project requirements while mitigating the risks discovered in phase one.

3. Implement: executing the action plan for establishing new processes and using new technologies.

4. Rollout Preparation: getting the organization ready for the new improvements.

5. Go Live: the transition period when the new processes and/or technologies are first being utilized, and being ready to resolve any issues that arise.

6. Maintain: tuning the processes and technologies and getting ready for the next project. The process is almost never one-and-done; rather, it is iterative.

    Each phase requires that both business people and technical resources work together to

    accomplish the project effectively and efficiently.


    Phase One: Project Preparation

    In this phase, you will evaluate what resources and time are

    needed to execute the project and what issues, roadblocks and

    risks will need to be overcome. Start by setting up your team,

    defining the scope, expectations, and deliverables of the project,

    and conducting an analysis of the current state of your data.

    Define Project Team and Roles

Ultimately, data and its level of quality must be supported by many people in the company, not just IT. Involving subject matter experts from affected business areas is an essential step for success, because successful interpretation and treatment of data derives from both proper syntax and context.

Syntax: IT is generally very capable of conforming data to proper syntax with relative ease. An example might be that telephone numbers should all appear in the same format in a database.

Context: Business users are generally the best source of information regarding context, or the meaning behind the data. An example might be the repeated occurrence of an undocumented comment or code embedded in a name and address field.
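To make the syntax example concrete, here is a minimal sketch (not from the paper) of the kind of rule IT might automate, normalizing US-style telephone numbers to one canonical format; the chosen format and the fallback behavior are assumptions:

```python
import re

def normalize_phone(raw: str) -> str | None:
    """Normalize a US-style phone number to (NNN) NNN-NNNN; None if unparseable."""
    digits = re.sub(r"\D", "", raw)              # strip everything but digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                      # drop the country code
    if len(digits) != 10:
        return None                              # hand off to the exceptions process
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

assert normalize_phone("978.436.8900") == "(978) 436-8900"
assert normalize_phone("+1 (978) 436-8900") == "(978) 436-8900"
```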

    Each of the types of team members has clearly defined roles for making the initiative a

    success and must be accountable for his/her part. Here is how roles and responsibilities

are typically defined with regard to the data quality elements of a project.

Role: Executive Leaders (CIO, CFO, VP)
Responsibility: Publicly endorse the data quality initiative. Foster support. Secure funding for the project. Resolve issues and remove roadblocks.

Role: Line-of-Business Managers
Responsibility: Champion the cause. Interface between IT and the business. Work with the executive leaders to understand business objectives, remove barriers and political opposition, and influence political change and cooperation between lines of business. Articulate the data quality problem in terms of business value and effect change among business users.

Role: Data Stewards
Responsibility: Understand the technology available to meet business objectives. Define what is possible and what is not. Develop a deep understanding of available data assets, usage, and issues. Drive specific requirements, provide feedback, and participate in UAT activities.

Role: Information Professionals
Responsibility: Implement business rules for cleansing, standardizing, and de-duplicating the data; support data stewards; run day-to-day operations.


    The right technology will help keep team members engaged and communicating. A data

    analysis environment with a central repository provides the right architecture for multi-role,

    multi-member projects where resources need to interact and communicate about source

    data and target environment designs.

This architecture provides an infrastructure that allows a common understanding of the issues around the quality of data, recommendations for use of the data, and what transformations may need to take place as data is migrated.

    Identify Short and Long-term Business Objectives

    During project planning, business objectives are defined. As part of this task, both short-

and long-term data quality goals should be identified. Short-term objectives usually relate directly to the project and to activities related to data movement and manipulation. Longer-term goals usually take into consideration how the work being done on the immediate project can be leveraged by the organization and extended for further value.

    In the short term, begin improving data by starting small and keeping the scope well-

    defined. In the long term, keep in mind that if all goes well, you will have success, and you

    will be asked to replicate this success across the company.

    Scope

    Scoping draws clear parameters around the data you are capturing, moving, cleansing,

    standardizing, linking, and enriching, and its use. Each requirement must be assessed to

    determine whether or not the data involved in this project can or will meet the requirement

    to the satisfaction of the business. There are several basic questions to answer:

    1. Does the suggested data exist within the organization?

    2. What source or sources contain this data?

    3. What is the level of quality within each source, for this information?

    4. What cleansing, standardization, or de-duplication is necessary to meet the requirement?

    5. What problems or anomalies must be addressed as part of this project?

    In a data migration, for example, you might be looking for certain key elements to appear

    in the target data model. You may first need to confirm that the anticipated target data

    physically exists within source systems and may next need to determine the best source

    for the data, or most trusted source. If taking data from multiple sources, you may have to

    establish a set of standards that all source systems conform to in order to produce a

    consistent representation of that data in the new target system.

    Understanding the scope of the project early is key to its successful and timely delivery.

    Be sure to categorize the need-to-have data and the nice-to-have data. Be prepared to

    drop off the nice-to-haves if time becomes short or if the effort of moving, cleansing,

    standardizing, etc. outweighs the anticipated business benefit.

There are ways to limit scope. For example, if you're integrating multiple data sources, will it be one large movement of data or several smaller movements? Will the data need to be the entire database, or is six months of history enough? Working through these issues with the

    business team and IT will keep the project on-time and on target, and will help manage

    expectations during the project lifecycle so there are no surprises as the project nears a

    close.


    Analyze Current Technology

    Early in the process, it is a good idea to take inventory of the current data quality

    technology in place. Perform interviews of the key technologists and determine what is

    working and what is not meeting user expectations.

If a process or technology exists that meets user expectations, can it be leveraged within the new solution? If so, can it also be leveraged for other solutions, in accordance with long-term business objectives? If not, is it a good source of standards or logic that can be designed into a new data quality solution that offers more options for future growth?

Often, solutions to data quality problems are point solutions, built without regard to the entire enterprise. The key is to develop a solution that can serve the needs of the entire organization.

    Assess Data Risks

    Does your source data actually support the business objectives? During a data risk

    assessment, it is crucial to ensure that the available data satisfactorily meets business

requirements. Much of the legwork for this analysis is performed by IT through mapping-data-to-requirements exercises and extensive data investigation of source systems. Should questions arise, key business stakeholders should immediately be

    involved to ensure the project is ultimately successful in delivering what the business

    expects.

    If data does not meet expectations, what are the root causes of these gaps and how must

    they be addressed before proceeding with your project? Does project scope need to be

    revisited or do isolated requirements need to be classified as high risk?

    Use a data discovery process on the source data to determine if the data is viable. If the

    data cannot support key business requirements, the project is at a high risk of failure

    despite investments of time and money. Thus, before committing to development, first

    assess data to ensure that the project can ultimately meet user expectations.

Data discovery is the process of discovering the unknown about your data by bringing together IT and business team members familiar with the meaning of the data content and how the data is to be used. This team will address issues that arise early in the

    project lifecycle and create workable solutions. For example, you may not want to

    incorporate specific data elements into your CRM system if there is a high degree of null

    values, since the data will not meet your business needs.
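A minimal sketch of that kind of discovery check, profiling null rates per column in a source extract so that sparsely populated elements (like the CRM example above) can be flagged; the file name and the 40% threshold are illustrative assumptions:

```python
import csv
from collections import Counter

def null_rates(path: str) -> dict[str, float]:
    """Fraction of empty values per column in a delimited extract."""
    nulls: Counter[str] = Counter()
    total = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            for col, val in row.items():
                if val is None or not val.strip():
                    nulls[col] += 1
    return {col: nulls[col] / total for col in nulls} if total else {}

# Flag columns too sparse to be worth migrating (path and threshold hypothetical).
rates = null_rates("crm_extract.csv")
for col, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    if rate > 0.40:
        print(f"{col}: {rate:.0%} null - review before including in the target model")
```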


    Phase 2: Making the Blueprint

    In this phase, youll assess data quality issues in detail and

    begin to build a plan for improving data as part of your overall

    project. Together, key players from IT and the business define

    corporate standards for data, and a baseline measurement of

your current state of data quality is documented. The baseline will serve two purposes. First, it will help to enlist executive support by showing the business impact of poor data quality. Second, it allows you to tangibly show the improvement in the data at named milestones after the new system or solution is in production.

    Define Success Metrics

    Most organizations are cost-conscious, so it is necessary to

    produce a business case or cost justification for a new initiative.

Even if not required, it is recommended that you quantify the impact of data quality processing, as a methodical way to measure the impact of your efforts and the value you are providing to your organization. Though frequently overlooked during project execution, these numbers are necessary to drive future investments and promotion within an organization.

    Data Quality Metrics and Business Impact

Define the data quality metrics that you will track. These may include both high-level, data-centric rules and specific rules that apply to a particular system or application, such as:

Metric: Number of records with changes to address data fields for compliance with USPS standards; number of duplicate records identified; number of processed records with incomplete mailing addresses but with valid phone numbers or emails
Business impact: Affects the ability to complete marketing programs that are on budget and successful. Affects billing effectiveness.

Metric: Number of records with duplicate primary keys
Business impact: Unique keys must be generated by IT, causing delays in the project if unexpected.

Metric: Blank values for critical fields of data, such as quantity per box in supply chain data or shipping dimensions
Business impact: Does the customer get the right quantity of items ordered? Can the customer logistically handle the package they receive?

Metric: Adherence to standards such as the metric or English systems of measurement
Business impact: Do the same exact or similar parts exist in the supply chain, but under different measurement systems?

Metric: Total dollar value of bills with no invoices; total dollar value of invoices with no bills
Business impact: Is the billing system in compliance with regulations? Is revenue being reported properly? Are orders being fulfilled without a purchase order and invoice?

Technology can play a significant role in uncovering data conditions such as those listed above and establishing a recorded baseline of those conditions. Not only will technology help

    you organize and document results, but it can further be used to manage conditions going

    forward. Automated data profiling analysis and exception reporting along with drill-down

    functionality gives you results and the tools to involve non-technical users in the analysis

    of these results. You can set conditions, such as those listed above, and understand

    immediately to what degree the metrics are met.
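As a rough sketch of what such automated condition checks look like (the record layout and metric choices here are assumptions, not the Trillium implementation), the snippet below computes two of the metrics from the table above:

```python
from collections import Counter

def duplicate_key_count(records: list[dict], key: str = "customer_id") -> int:
    """Metric: number of records whose primary key value appears more than once."""
    counts = Counter(r.get(key) for r in records)
    return sum(n for n in counts.values() if n > 1)

def blank_critical_fields(records: list[dict], critical: tuple[str, ...]) -> int:
    """Metric: number of records with any blank critical field."""
    return sum(1 for r in records
               if any(not str(r.get(f) or "").strip() for f in critical))

records = [
    {"customer_id": "C1", "qty_per_box": "12"},
    {"customer_id": "C1", "qty_per_box": ""},     # duplicate key, blank critical field
    {"customer_id": "C2", "qty_per_box": "6"},
]
print("duplicate keys:", duplicate_key_count(records))                     # 2
print("blank critical fields:", blank_critical_fields(records, ("qty_per_box",)))  # 1
```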

    Formulate Communication Strategy

    A communication strategy should also be put in place at this time. Key business users

    have defined metrics and identified how those metrics can be related to the business to

    quantitatively demonstrate value, but how will the organization hear about the upcoming

    results?

    Moreover, do other members of the business community agree with the relationships that

    have been identified between data, its quality, and the impact of that data on the

business? For the ROI message to be effective and useful, it is essential to establish buy-in as part of an early communication plan task, and to follow up with updated metrics at pre-determined milestones.

    Define Standards

Project team members representing the business play a key role in standards definition. The team members involved in this step should be a fair representation of the ultimate user audience. For example, if the end-user audience will include sales, marketing, and potentially shipping, someone from each of the named departments should be involved in defining system standards. Also, a representative from each of the company's departments should act as a data steward to make sure data adheres to the defined standards in the new system, if not also in the source systems.

    With every business, there are certain standards that can be applied to every piece of

    data. For example, a name and address should almost always conform to postal

standards. E-mail addresses conform to a certain shape, with a user name, an internet domain, and an @ sign in the middle. However, there may be data for which your team needs to define a new standard: typically a part number, item description, supply chain data, or other non-address data. For this, you need to set the definition with the business team. As part of the process, explore the current data, decide what special data exists in your required fields, and establish system standards that can then be automated and monitored for compliance.
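A minimal sketch of turning such standards into automatable checks; the e-mail shape test is generic, and the part-number convention shown is a hypothetical standard a business team might define:

```python
import re

# Shape checks only; the part-number convention (AA-9999) is a made-up example
# of a standard a business team might define for non-address data.
EMAIL_SHAPE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
PART_NUMBER = re.compile(r"[A-Z]{2}-\d{4}")

def conforms(value: str, standard: re.Pattern) -> bool:
    """True when the whole value matches the defined standard."""
    return bool(standard.fullmatch(value.strip()))

assert conforms("jane.doe@example.com", EMAIL_SHAPE)
assert not conforms("jane.doe@example", EMAIL_SHAPE)   # missing domain suffix
assert conforms("AB-1234", PART_NUMBER)
```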

    Create Executive Buy-in

    While the technical team members are busy with technical designs for the new system or

    solution, business team members are ensuring that the efforts have positive impact on the


business. Most project managers find it helpful at this point to seek out the endorsement of a ranking executive. Using the data quality metrics and business impact generated in the previous step, keeping the executives in the loop about your initiative will help you maintain their endorsement of the data quality initiative, foster support, and secure funding for future projects or additional resources. If there are any internal political challenges, executives can help resolve issues and remove roadblocks. If they are already well informed of your efforts, status, and potential positive impact, it will be much easier to invoke their support.

    Access Data

    At this point in design, it is necessary to take a deep dive into data extracts, representative

    of the actual data that will be used as part of the production system. The purpose here is

    to understand what mappings, transformations, processing, cleansing, etc. must be

    established to create and maintain data that meets the needs and standards of the new

    system or solution.

    IT resources are generally responsible for defining data extracts and gaining appropriate

    access to source systems. This data can then be shared with other team members to

    support detailed design tasks.

    Analyze Source Data

    The same principles and benefits of a collaborative approach between IT and business

    team members (described earlier for risk mitigation analysis) are relevant for the in-depth

source system analysis necessary for effective and efficient detailed design. Most often, a collaborative effort is hindered by the fact that involving business users to clarify data questions becomes time consuming and inefficient, because they lack the technical skills necessary to self-sufficiently access data, investigate anomalies, and thus offer insights to drive design. Advanced data discovery tools eliminate this challenge if they offer an intuitive interface through which business users can do all the tasks named above independently of IT, yet in collaboration with IT.

It is very feasible to leverage the same technology used for risk mitigation and data metric definition to help with this up-front analysis of the data. Additional functionality,

    available within a data discovery tool, presents users with statistics, results, and a window

    into the data, so that information can be easily digested and navigated by IT and business

    users alike. A special purpose data browser makes it possible for users to identify and

    review issues in the data, collaborate and reach consensus about what should be done.

    Use technology to facilitate source system analysis. Employ advanced profiling and data

    discovery functions for comprehensive column and attribute analysis. Identify potential

    problems within structured data fields such as dates, postal codes, product codes,

    customer codes, addresses or any attributes that should conform to a particular format

    and structure. Configure custom data quality rules, and flag any attributes that do not

    conform.

When you're done with the analysis, you will have a very good idea of the challenges you

    face in integrating data and the information necessary to develop designs that address the

    challenges proactively. At this time it is also a good idea to revisit the project plan and

    confirm that appropriate time and resources have been allocated to deal with any data

    issues that have been uncovered.


    Capture a Baseline

    Business team members have defined the data quality metrics and business impact in a

previous step. Now is the time to take a baseline measurement. As part of the source system analysis, a baseline of each source system should be captured and stored, along with a record of how multiple systems conform to expected metrics or business rules. In some cases, it will make sense to look not only at each source system in isolation, but also across systems.
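One lightweight way to capture and store such a baseline is to snapshot metric results per source with a timestamp, so they can be re-run and compared at later milestones; this sketch and its JSON-lines storage format are assumptions for illustration:

```python
import json
import time

def capture_baseline(source: str, metrics: dict[str, float], path: str) -> None:
    """Append a timestamped snapshot of metric results for one source system."""
    snapshot = {
        "source": source,
        "captured_at": time.strftime("%Y-%m-%d"),
        "metrics": metrics,
    }
    with open(path, "a") as f:
        f.write(json.dumps(snapshot) + "\n")

# Metric values would come from the profiling checks defined earlier.
capture_baseline("crm_prod",
                 {"duplicate_keys": 142, "null_rate_email": 0.37},
                 "baseline.jsonl")
```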

    Data Architecture & Schema/Data Model

    As the data model is being developed, a crucial step often overlooked is confirming that

    the source data supports the anticipated data model design. The best way to have

    confidence that this is the case is to reverse engineer the data and understand the

    relationships that naturally exist within the data. This should occur independently of

    metadata and system documentation, and should be a complete reflection of the data

    itself. Here again is an opportunity to leverage the technology that the team has been

    using and is comfortable with: a data discovery tool will already contain all the source

    information required for this analysis and should contain the functionality to display a data

    model or schema that represents the native state of the data itself. These schematics can

    then easily be compared and cross-referenced to intended data model plans by the data

    modeling team, saving them potentially weeks of manual efforts and missed exceptions.
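As a small illustration of reverse engineering relationships from the data itself (a sketch, not the Trillium tooling), the snippet below infers candidate keys from uniqueness and candidate foreign keys from value containment:

```python
def candidate_keys(rows: list[dict]) -> list[str]:
    """Columns whose values are unique across all rows (candidate primary keys)."""
    if not rows:
        return []
    return [c for c in rows[0]
            if len({r[c] for r in rows}) == len(rows)]

def containment(child: list[dict], col: str, parent: list[dict], pcol: str) -> float:
    """Fraction of child values found in the parent column (candidate foreign key)."""
    parent_vals = {r[pcol] for r in parent}
    vals = [r[col] for r in child]
    return sum(v in parent_vals for v in vals) / len(vals)

orders = [{"order_id": 1, "cust": "C1"}, {"order_id": 2, "cust": "C9"}]
customers = [{"cust_id": "C1"}, {"cust_id": "C2"}]
print(candidate_keys(orders))                             # both unique in this tiny sample
print(containment(orders, "cust", customers, "cust_id"))  # 0.5: C9 has no parent record
```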

    Data Architecture & Platforms

    There are any number of different data quality solutions that may be built as part of your

    overall project. The biggest consideration for a non-custom or in-house data quality

    solution is whether or not the technology you are acquiring supports process execution on

    all platforms of the source and target systems within your given project.

To take things a step further and offer more long-term value, however, as the designs are being set and if technology investments are being made, revisit the long-term business objectives outlined during the Project Preparation phase. Evaluate vendor tools against your longer-term vision and ensure that you have options for future connectivity requirements. Provide

    your organization with the flexibility to extend the data quality processes you design for

    your immediate project to other systems, in different environments, and on other platforms

    that exist within your technical enterprise infrastructure.

    Develop Test Case Scenarios

    As you examine your data, you will uncover patterns and common occurrences in the data

that require resolution. For example, names may appear in your CRM sources in any one of the following formats:

- Smith, John
- John Smith
- Smith/John
- John and Jan Smith

It's up to you and your business team members to decide how to standardize each of these name formats for optimal efficiency in the target systems. Should "John and Jan Smith" be linked but separate records in your master file, or remain as a single entry?

Set up a test file or database of records that present these common data situations for QA purposes during this stage of the project. A quality assurance (QA) task will be completed prior to going live with new data. This test case scenario definition effectively begins to build a list of data quality anomalies, which you can leverage to build and test business rules and quality processes. Some of the business rules and test cases will come standard with the cleansing process of packaged data quality solutions; these should be highly tunable to meet your organization's specific needs. Others you can begin to build based on your needs.
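A minimal sketch of such test case scenarios, written as assertions against a hypothetical standardize_name routine; the expected outputs encode business-team decisions (such as splitting joint names into linked records), which your team may well decide differently:

```python
import re

def standardize_name(raw: str) -> list[str]:
    """Hypothetical standardizer: returns one 'First Last' entry per person."""
    raw = raw.strip()
    if "," in raw:                        # "Smith, John" -> "John Smith"
        last, first = [p.strip() for p in raw.split(",", 1)]
        return [f"{first} {last}"]
    if "/" in raw:                        # "Smith/John" -> "John Smith"
        last, first = [p.strip() for p in raw.split("/", 1)]
        return [f"{first} {last}"]
    m = re.match(r"(\w+) and (\w+) (\w+)$", raw)
    if m:                                 # business decision: split joint names
        return [f"{m.group(1)} {m.group(3)}", f"{m.group(2)} {m.group(3)}"]
    return [raw]

# Test case scenarios drawn from the formats above.
assert standardize_name("Smith, John") == ["John Smith"]
assert standardize_name("Smith/John") == ["John Smith"]
assert standardize_name("John and Jan Smith") == ["John Smith", "Jan Smith"]
assert standardize_name("John Smith") == ["John Smith"]
```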

    Define Exceptions Process

In a data quality process, an exception occurs when a piece of data cannot be interpreted by the business rules and process engine your team has defined; for example, an address does not contain enough information to be verified against the USPS standardization business rules.

When a data quality exception occurs, the data steward must resolve the exception and decide whether the anomaly is an unusual occurrence or whether new rules should become part of the data quality process. Your project should define a clear way to handle exceptions, including automated distribution of error records where possible, areas of responsibility for correcting them, and a method to report anomalies back to the source.
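A minimal sketch of an exceptions process under these guidelines; the queue structure, owner mapping, and reason text are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class ExceptionQueue:
    """Routes records that failed a rule to the steward responsible for the source."""
    owners: dict[str, str]                 # source system -> responsible steward
    items: list[dict] = field(default_factory=list)

    def raise_exception(self, record: dict, source: str, reason: str) -> None:
        self.items.append({"record": record, "source": source, "reason": reason,
                           "assigned_to": self.owners.get(source, "dq_team")})

    def report_by_source(self) -> dict[str, int]:
        """Counts to feed back to each source system's owners."""
        counts: dict[str, int] = {}
        for item in self.items:
            counts[item["source"]] = counts.get(item["source"], 0) + 1
        return counts

queue = ExceptionQueue(owners={"crm": "jane.doe"})
queue.raise_exception({"addr": "Main St"}, source="crm",
                      reason="address too incomplete to verify against USPS rules")
print(queue.report_by_source())   # {'crm': 1}
```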


    Phase 3: Implement

When all the planning is done, it's time to begin to put the technology in place to improve data, using automation wherever possible. For the technology resources' implementation tasks, we recommend Trillium Software Data Quality Methodology, a white paper detailing how to standardize, enrich, and match data, and how to fine-tune the business rules to optimize data. Although this is the most technical of the plan's phases, business users still play an important role in this phase.

    Create User Acceptance Test Plan

    As team members create User Acceptance Test (UAT) plans,

    additional considerations should be incorporated that investigate

    and display the results of data quality processes that have been

    built into the new system or solution. As a result, UAT should

    include not only testing of new functionality and/or reports, but

    should also be prepared to include data quality test case

scenarios. Test data should include both good and problematic input data so that a wider business audience (UAT resources) is

    forced to confirm that the data quality processes are producing

    desirable results.

    Create Data Quality Processes

    During the implementation phase, the technical team puts

    together the data quality process defined and designed during the Blueprint phase. This

    usually includes cleansing, standardization, enrichment, and matching/linking processing.

    For more details and best practices on data quality process creation, refer to Trillium

    Software Data Quality Methodology, another white paper available through the resource

    library reached via www.trilliumsoftware.com.

    QA Initial Results

The most important part of the data quality process is that business users are happy with the results. As you begin to implement new data quality process

    designs, project managers should have business users run sample data through the data

    quality processes to ensure results meet their expectations. Business users can compare

    results before and after processing with the same data discovery tool that they have been

    using all along. Coarse-tune processes using sample data, then switch over to a complete

    data set for formal QA.

Once results have been verified, it's time to load sample data into the target applications and begin testing more thoroughly. By taking this extra step with the business during the QA cycle, you're much more likely to be successful the first time you load data and will avoid loading and reloading data repeatedly.

    Validate Rules

In phases one and two, you determined both what you have and what you need. Rules are developed in an iterative analytic process. This requires access to knowledge about the intended meaning of the data. Business users and data analysts should work together on

    this process, applying the same technology and process described for analyzing source

    data, if additional questions come up. Give business users an opportunity to set up test

    data scenarios and allow them to review the results after the cleansing process.


This is also your opportunity to review and add your specific terminology (e.g., industry-specific terms, company-specific definitions, and regional colloquialisms) not initially part of the standardization terminology. This is also a chance to determine whether you will require geography-specific standardization.

    Tune Business Rules and Standards

You may find that some of the initial data quality process design does not meet expectations or act as expected. Business team members should be able to interact

    directly with a rules-based engine and tune the rules to produce results more in line with

    expectations. This requires an intuitive interface and tuning tools to be built into any

    products that are purchased to facilitate data quality processing.

    Involving business team members directly with the tuning process ensures that the rules

    exactly meet their needs and removes the risk of failed expectations late in the game.

Integrate Data Quality Processes with Applications and Services

    Once data quality processes are designed and tested, and business team members are

    well underway validating and tuning rules, the processes are ready for integration into one

or more applications and/or services. If architected with an eye to the future, data quality processes should be reusable across multiple systems, platforms, and applications.

Vendor products should likewise support the principle of reuse across the enterprise, to support growth over time as well as to provide flexibility of deployment options within your project.

Although your project's solution may not need the capability to expand and grow over time, having the option of extending to any and all applications, even those that may come to your company through mergers and acquisitions, is a best practice that should be seriously considered. This also includes the ability to carry the rules from one application to the next.

    This will likewise meet any need to expand from a scheduled batch process to a multi-use

    real-time process wherever and whenever you need it within enterprise systems.
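A minimal sketch of that reuse principle, with one cleansing entry point shared by a scheduled batch job and a simple real-time HTTP service; the trivial cleanse rule and the service shape are assumptions:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

def cleanse(record: dict) -> dict:
    """Single rules entry point, reused by both batch and real-time callers."""
    return {k: " ".join(str(v).split()).upper() for k, v in record.items()}

def batch_job(records: list[dict]) -> list[dict]:
    return [cleanse(r) for r in records]          # scheduled batch path

class CleanseHandler(BaseHTTPRequestHandler):     # real-time path
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        out = json.dumps(cleanse(json.loads(body))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)

if __name__ == "__main__":
    print(batch_job([{"name": "  jan   smith "}]))    # [{'name': 'JAN SMITH'}]
    HTTPServer(("localhost", 8080), CleanseHandler).serve_forever()
```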


    Phase 4: Rollout Preparation

    During this phase of a project plan, business users and IT must

    determine how and when the development environment is

    migrated to production. Before that however, UAT must be

    completed and users must be trained on the changes they will

encounter when using the new system or solution. The Help Desk must also be properly prepared and able to answer any

    questions that arise as a result of the new technology or

    changes to existing technology.

    Execute User Acceptance Testing Plan (UAT)

    The User Acceptance Test plan should include a record of the

business users' sign-off of the documented scenarios and the

    data quality processes that influence automated changes.

Some different types of UAT strategies include:

- New System Test: the application to be tested is entirely new (not an enhancement or system upgrade).

- Regression Test: the amount of change in an existing application requires a full system retest.

- Limited Test: the amount of change in an existing application requires only change-specific testing.

Data discovery tools can here again help the project during the UAT process by giving both business users and technical users a view into the data.

    Teams can collaborate and view the results of any data quality process, before and after

    the process is run.

It's valuable to test inside the target application, too. Things to test include:

- All forms: particularly important when using a real-time interface into the data quality tool.

- All reports: ensures the results from the reports are as expected.

- Test scenarios: test the results of the data quality process's impact on systems and applications that interface with the ones included in your project.

Throughout your UAT, make sure your business users have easy access to the data, whether through the tools and technologies used throughout the project or otherwise, to quickly address any questions that arise.

User Training/Help Desk Training

Users must be made aware of new applications going online, and the Help Desk should know who to call to escalate any technical issues. Effective user training is a critical factor for a successful implementation. Here, the goal is simple: give your users the skills and confidence they need to use the new solution, to facilitate end-user adoption. Make them aware of:

- Any new required fields or formats as they enter data into the system

- Any new screens or pop-ups requesting validation of automated cleansing and matching of data

- The positive impact and business benefits of new, cleaner data

- The involvement of both business users and IT users in the process of creating high quality data

    Production System Cutover Plan

    As the system is rolled out to end users, the operations and support team should have all

    the tools, processes, and knowledge to support them. A plan for transitioning from the

    project team to the operations and support team is crucial.

Most project managers will create a schedule and plan to engage the new system with newly cleansed data. The migration to production generally occurs during an off-peak time. The decisions you need to make include training; if and how to phase the rollout; the expertise needed when the cutover occurs; whether to run multiple systems (old and new) in tandem and, if so, for how long; whether to hire additional resources (e.g., consultants or contractors) to assist; and any additional security considerations.

    Successfully Complete Initial Cleanse/Load

    For many projects, the first step of going live involves an initial load or an initial cleanse

process. Data is rarely loaded without encountering errors during the extraction, transformation, and loading of data. The errors generally fall into three categories:

- Incomplete: consists of missing records or missing fields. What is not being loaded, and what should happen to those records or fields absent of data?

- Syntax: relates to data formatting and how data is represented. Is the data the right shape? Does the data fall within the value range?

- Semantics: conveys what the data means. Is there hidden value in unstructured data? Are there names in address fields, despite compliance with correct data shape? Are there slightly different duplicate records?
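A minimal sketch of triaging failed records into these three categories; the required fields, date shape, and the semantic heuristic are illustrative assumptions:

```python
import re

REQUIRED = ("customer_id", "address")
DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")

def classify_load_error(record: dict) -> str | None:
    """Return 'incomplete', 'syntax', or 'semantics' for a failed record, else None."""
    if any(not record.get(f) for f in REQUIRED):
        return "incomplete"                        # missing records or fields
    if not DATE_SHAPE.fullmatch(record.get("order_date", "")):
        return "syntax"                            # wrong shape or out of range
    if re.search(r"attn:|c/o ", record["address"], re.I):
        return "semantics"                         # hidden meaning: a name in an address field
    return None

print(classify_load_error({"customer_id": "C1",
                           "address": "Attn: J. Smith, 25 Main St",
                           "order_date": "2008-06-30"}))   # semantics
```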

If you have executed the tasks communicated so far, you have significantly reduced the likelihood of any of the above-mentioned issues occurring on your project. By taking the time upfront to thoroughly investigate source system data, incorporate necessary processing into your designs, and perform UAT that includes anticipated problematic data conditions, you have proactively addressed the issues that cause most project teams severe headaches late in the game.

    Should something unexpected occur and require attention, you already have the

    resources and infrastructure in place to quickly react: your team of both IT and business

    users is already familiar with the project, the data, and any technology you have been

    using (i.e., your data discovery tool) and can swiftly look at the data and assess the

    problem for a quick resolution.


    Phase 5: Go Live

Congratulations, you are going live! During this phase your

    team will turn on the switch and your new data quality processes

    will begin to provide immediate benefits to your organization.

    The fruits of your labor will begin to be realized.

    SWOT Team

At this stage, it's a good idea to have in place a cross-functional

    SWOT (Strengths, Weaknesses, Opportunities, Threats) team

    including business analysts or departmental resources familiar

    with business processes, performance engineers, data

    architects, field technicians, and contacts from any vendors, to

    be available on an emergency basis to provide rapid problem

    resolution.

    Teams may adopt different processes to help them understand

    the problem presented and to design a response. Practitioners

    using problem-solving processes believe that it is important toanalyze a problem thoroughly to understand it and design

    interventions that have a high probability of working.

    The intent is to intervene early after a problem is identified and to

    provide ways by which that problem may be alleviated and the

    corporation can achieve success.

    Teams should meet to complete a post mortem, discussing how the project went and how

    to further improve on data quality during the next round.

    Problem Resolution

    All support organizations have some form of processes and procedures in place for

    helping to resolve user and system-generated queries, issues, or problems in a consistent

    manner. In some organizations these processes are very structured; in others they are

    more informal.

In addition to efficient processes, it is also very important that the support team have well-defined roles and responsibilities to reduce response time to customer needs. Here is an

    example of an escalation hierarchy, along with the individuals who perform these tasks, for

    a fairly large implementation. For this example, when a problem is identified, it is escalated

    as follows.

    Tier 1: Help Desk - Help Desk technicians provide first-line support to the user community

    and perform any additional training and remote operations to resolve issues. If the help

    desk is unable to resolve the issue, it is escalated to Tier 2.

Tier 2: Information Professionals - Information Professionals are typically more aware of

    the data aspect of the operation than the Help Desk. With the aid of a data discovery tool

    and access to the end-user application, they troubleshoot the issue. If the information

    professionals are unable to resolve the issue, they escalate it to the Data Stewards at

    Tier 3.


    Tier 3: Data Stewards - Depending on the nature of the problem, Information

    Professionals can contact Data Stewards, who tend to have an enterprise view of a data

    subject area, as opposed to knowledge of data and processes within a given application

    only. Although most issues are resolved at the first three support tiers, in rare cases,

    issues can get escalated to Tier 4.

Tier 4: Project Managers - An issue usually reaches this level if an architectural change is required to resolve it. The project managers will have to analyze the situation and take the

    appropriate actions to resolve it.

    The hierarchy just described is merely one example of a support and escalation hierarchy.

    No matter what type of support hierarchy you have, it is crucial that each group within it

    understand its role and responsibilities. Moreover, the team must be able to quickly

    resolve or escalate any issues that arise.

    Post Mortem

    Re-run your baseline processes and collect updated results for a quantified measurement

    of your impact. Gather up your metrics, your support log, your exceptions processing log,

    and other relevant documentation. Call a meeting to:

- Ensure that the project met the business objectives

- Ensure that the project met the outlined success criteria

- List the lessons learned during the project; use them as input to improve future project delivery

- Conduct performance reviews for team members

    Perform Ongoing Data Quality Processing

Now that your system is live, you not only have cleansed historical data loaded into the system or solution, but your ongoing data quality processes should be keeping new incoming data free of the problems you identified and prepared for.

Define Monitoring Processes

    Given that all systems and processes are assumed to be operating well, it is now time to

ensure that appropriate monitoring processes are in place. Regularly scheduled data audits are a great way to ensure that data continues to meet expectations, and they highlight any areas of quality that have slipped or new problem areas that have become evident.

    To facilitate this process, many organizations leverage the technology used for risk

    assessment, baseline measurements, source system analysis and design, and user

acceptance testing. For example, a data discovery tool that you have trained business users to use to investigate data, collaborate with IT, and measure data metrics, along with all the knowledge capital built into that environment, is easily adapted to perform scheduled audits and ongoing monitoring. Email workflows can be defined to alert key players when problems arise or when metrics exceed or fall below defined thresholds.
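A minimal sketch of such an alerting workflow using Python's standard library; the thresholds, addresses, and SMTP host are assumptions, and a production deployment would typically use the monitoring tool's built-in notification features:

```python
import smtplib
from email.message import EmailMessage

THRESHOLDS = {"null_rate_email": 0.10, "duplicate_keys": 0}   # illustrative limits

def alert_on_violations(metrics: dict[str, float], smtp_host: str = "localhost") -> None:
    """Email data stewards when any monitored metric exceeds its threshold."""
    violations = {m: v for m, v in metrics.items()
                  if m in THRESHOLDS and v > THRESHOLDS[m]}
    if not violations:
        return
    msg = EmailMessage()
    msg["Subject"] = "Data quality threshold exceeded"
    msg["From"] = "dq-monitor@example.com"
    msg["To"] = "stewards@example.com"
    msg.set_content("\n".join(f"{m}: {v} (limit {THRESHOLDS[m]})"
                              for m, v in violations.items()))
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)

# Run on a schedule against the latest audit results (requires a reachable SMTP host):
# alert_on_violations({"null_rate_email": 0.23, "duplicate_keys": 1})
```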

    Monitoring ensures that you continue to meet or even exceed user expectations over time

so that your data assets become a trusted source, actively used by the business.


    Phase 6: Maintain

In most religions of the world, there is a day to reflect on the good work you've done, admit your shortcomings, and set a plan in place to improve. Phase six is that day for those who believe in data quality. It is also a time of joy, however. In this phase, the fruits of your labor will be realized, and you should not be shy about telling the world what you have accomplished.

    Announce Successes

One of the keys to maintaining funding for your project is to internally publicize the successes you've had. In reality, a data quality initiative should be constantly re-sold at every opportunity, to continue to reinforce in people's minds the value you are introducing to your organization.

Ways to communicate your success include:

- Create a monthly data quality email update.

- Establish a presence on the company Web site or intranet.

- Ask the sponsor(s) to send out a memo about the project from time to time. Feed the sponsor business benefit information, such as money saved on marketing mailings, improved marketing sell-through rates, improved inventory and supply chain savings, etc.

- Identify and work closely with a select group of users to help with the communication. Identify what they are contributing to the project, and publish that information.

- Recognize the customers/users of the data first, and how they are benefiting from your improved data.

This is also a very good time to remind the company that data quality is everyone's problem, and to point out ways everyone can help solve data quality issues.

    Monitor

You can keep track of data quality in a number of ways. A full analysis with your data discovery tool is one way: each time you compare results to your original baseline or to the previously measured baseline, you will get a very detailed idea of how your data quality initiative is progressing.

    Some tools include a way to automatically keep track of data quality, too. For example,

    they may include an e-mail notification feature to inform key personnel when business

rules are violated, such as when data values do not meet predefined requirements,

    threshold values are exceeded, or nulls are present where unacceptable. These powerful

    features prevent errors from impacting your business, should your enterprise use data

    sources that are prone to change.

    Data stewards, system owners, and/or key business contacts can receive alerts on critical

    changes and errors. The tools can then allow users to call up the violation(s) and drill

    down on the error(s).


    Organizations with action-oriented governance programs use such features to alert key

    stakeholders and responsible parties of data anomalies. Each day, stewards can address

    the issues at hand and create prioritized tasks to resolve the issues identified.

    Collect New Requirements for Next Phase

With your success in hand, it's time to begin gathering requirements for the next phase.

    Business users will be inspired by the new intelligence available to them and begin to askfor additional data. They may want you to expand the systems exposed to your newly

    developed data quality process, add additional data sources to your new system or

    solution, or incorporate new systems and applications if your company is in acquisition

    mode. If you play it right, word will get out about your successes, and your solution and/or

    data quality services will be in demand.

Hold a meeting, or a series of meetings, to gather new requirements for version 1.1 of the project.

    Manage Change Requests/Exceptions

A change request conveys a major change to the project or the requirements of your new system. Now that the system has passed UAT, users may see additional opportunities to improve business processes, and a way to manage these requests is necessary. Most project

    managers feel that every project should have some formal change request process and

    that every request should follow the process. A simple change request process might look

    something like this:

    1. User submits a change request.

2. The assigned resource, perhaps the data steward, assesses the change request to see if it's worth investigating. Compare the benefits to how difficult the change is (its impact). Weed out the obvious change requests with limited benefit and place them in a nice-to-have list for future reference. Assess the risks in making the necessary changes.

    3. The data steward and project manager should document and communicate the

    assessment to key stakeholders.

    4. If the change sounds reasonable and has obvious benefit, ask the corporate

    sponsor to accept the changes in schedule, cost, quality and risk. Remain

    objective and let the sponsor decide if the change request has merit.

    5. Communicate the schedule and status of the change request to key stakeholders.


    Trillium Software Value Proposition

We have covered a lot of ground herein, but if you distill this paper down to its essence, there are just a few guiding principles for incorporating a successful and complete data quality strategy, regardless of whether your project is related to CRM, ERP, SCM, CDI, DW, BI, MDM, SOA, data harmonization, OLTP, legacy migration, and so on.

When building a solution for data quality, make sure your solution is designed for expansion over time, as your organization develops new needs and faces new challenges. The guiding principles that will best prepare you for extension and growth over time recommend a solution that is:

Principle: Comprehensive
Description: Delivers fit-for-purpose data for all types of data, everywhere, anytime. This includes the ability to support global business: not just information from the US and UK, but from China, Japan, Germany, Mexico, etc.; not just single-byte, but double-byte data; not just name and address data, but all types of data. Important for consolidation/migration projects with international or cross-functional reach.

Principle: Intelligent
Description: Contains intelligence to identify and address problems in context, so you do not have to apply heavy human resources to fix the data. Important for lowering IT costs and saving money on human resources.

Principle: Seamless
Description: Does your solution have the capability to expand and grow over time, extending to any and all applications, even those that may come to your company through mergers and acquisitions? Important when you want to apply data quality to key enterprise applications.

Principle: Dynamic
Description: Can you quickly and precisely change the rules if you need to, adapting to meet changing business needs? Important because business models and business processes can quickly change as technology advances.

Principle: Measurable
Description: Can you measure that your solution is working, both immediately and over time? Does it produce quantifiable results that can impact the business? Important as an internal self-justification of your team, a way to continue improvement, and a way to justify expenditures.

    The Trillium Software System answers these challenges with a scalable, flexible

    framework that supports the integration of data quality processes into any system, at any

    time, anywhere in the world. From tactical projects to strategic practices, the Trillium

    Software System increases integration efficiency, lowers development costs, and provides

    faster return on investment (ROI) from data quality initiatives through:

- Modular software design

- Universal connectivity components

- Architecture-neutral core technology

- Portable, reusable resources

- Tunable processes

- Expandable, global support


    The Trillium Software System facilitates near-instantaneous replication and portability

    across practically any platform or system. It lets you leverage efforts from one Trillium

    Software System implementation across new projects and the entire enterprise,

    dramatically reducing costs in multiple implementations and allowing you to easily create,

    propagate, and maintain an enterprise data quality standard.

The Trillium Software System includes TS Discovery, a data discovery tool that you can leverage across the life of your project. It facilitates the inclusion of business users in all phases of your project to reduce risk and better meet the demands of the business. The

    Trillium Software System also includes TS Quality, a rules-based engine that promotes

    reusable data quality processes, business user-defined and -tuned rules, and the most

    deployment options of any data quality product on the market. The Trillium Software

    System also offers TS Enrichment, a data enrichment service available for supplementing

    and increasing data available within source systems.

Copyright 2008 Harte-Hanks Trillium Software