
Page 1: Datastage to ODI

IBM Datastage to Oracle Data Integrator

Nagendra Kandala

Page 2: Datastage to ODI

ETL vs ELT

ETL: transform in a separate ETL server
- Proprietary engine
- Poor performance
- High costs
- IBM's and Informatica's approach

ELT: transform in the existing RDBMS (see the sketch after this list)
- Leverages existing resources
- Efficient
- High performance
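To make the contrast concrete, here is a minimal sketch of the kind of set-based transformation ELT pushes down to the RDBMS; the staging and target tables (stg_orders, dw_sales) and their columns are hypothetical, not from these slides:

  -- Hedged ELT sketch: the transformation runs entirely inside the
  -- target RDBMS as one set-based statement; no separate
  -- transformation server ever touches the rows.
  INSERT INTO dw_sales (order_id, order_day, net_amount)
  SELECT o.order_id,
         TRUNC(o.order_ts),                  -- derive date-level grain
         o.gross_amount - o.discount_amount  -- derive net amount
  FROM   stg_orders o
  WHERE  o.status = 'COMPLETE';

An ETL tool would instead stream the same rows out to its own engine, apply the derivations there, and stream the results back into the target.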

Page 3: Datastage to ODI

ETL Hassles

- Maintenance overhead: because ETL transforms data in a separate server, extra hardware must be maintained.
- Low flexibility: dependency on the ETL tool vendor.
- Multi-layered architecture: data needs to travel across one more layer before it lands in the data mart.
- No near-real-time processing: inefficient operation of business process workflows; ETL operates more as an "IT data integration process workflow engine".
- Demands specialized skills, with a learning curve for implementation.
- The developer must define every step of complex ETL flow logic.
- Slowly Changing Dimension implementation requires complex logic and takes longer.

Page 4: Datastage to ODI

ELT Approach

- High performance: ELT is parallelized according to the data set, and disk I/O is usually optimized at the engine level for faster throughput.
- High flexibility: ELT leverages RDBMS engine hardware for scalability; when the RDBMS engine receives upgrades, performance tuning or additional hardware, the ELT engine takes advantage of it right away.
- Lightweight: no middle tier and no extra servers for deployment of jobs.
- No stress on the network: transformation is done on the RDBMS server once the data is on the target server, so no separate transformation server is needed.
- Knowledge Modules make complex transformations such as Slowly Changing Dimension implementations relatively easy (see the sketch after this list).
- Simple transformation specification via SQL, and easy processing of XML.
- Declarative design reduces the number of steps needed to perform complex conversions.
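As an illustration of the Slowly Changing Dimension point above, an SCD IKM generates set-based statements conceptually similar to this hedged Type 2 sketch; the tables (dim_customer, stg_customer) and columns are assumptions for the example, and the real generated code differs by KM and topology:

  -- Step 1: close the current dimension row when a tracked
  -- attribute has changed in the incoming staging data.
  UPDATE dim_customer d
  SET    d.end_date     = SYSDATE,
         d.current_flag = 'N'
  WHERE  d.current_flag = 'Y'
  AND    EXISTS (SELECT 1
                 FROM   stg_customer s
                 WHERE  s.customer_id = d.customer_id
                 AND    s.segment    <> d.segment);

  -- Step 2: insert a new current version for changed or brand-new
  -- customers (changed ones no longer have a current row).
  INSERT INTO dim_customer (customer_id, segment, start_date,
                            end_date, current_flag)
  SELECT s.customer_id, s.segment, SYSDATE, NULL, 'Y'
  FROM   stg_customer s
  WHERE  NOT EXISTS (SELECT 1
                     FROM   dim_customer d
                     WHERE  d.customer_id = s.customer_id
                     AND    d.current_flag = 'Y');

In ODI the developer only flags which columns are SCD attributes on the datastore; the SCD IKM then emits logic of this shape.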

Page 5: Datastage to ODI

DS vs ODI Features

DataStage        ODI          Details
Job Sequence     Load Plan    Kicks off a flow of tasks in a sequence with defined dependencies.
Job Sequence     Package      The intermediate block in both tools; combines a subset of detail tasks into a group that can be reused if needed.
Job run          Scenario     A wrapper around the detail task (job/interface); required for execution.
Job              Interface    The basic block in both tools; defines the transformations needed to translate the data from the source and load it into the target as required.
Source/Target    Datastore    Imports the metadata objects needed for defining sources and targets.
Containers       Folder       Groups objects based on their functionality; DataStage has one level, ODI can have multiple levels.

Page 6: Datastage to ODI

DataStage Architecture

Page 7: Datastage to ODI

Oracle Data Integrator High-level Architecture

Page 8: Datastage to ODI

Before Migrating to the New Approach

The Oracle Data Integrator Edge

Page 9: Datastage to ODI

Approach One: Using Converters

- We can use converter tools available in the market, such as AnalytiX DS LiteSpeed and Stagekillaz.
- The expected approach is to extract the DataStage export file (DSX) and upload it into an ODI metadata repository using these tools.
- However, there may be a few custom changes that are hard to migrate from the older system, e.g. custom scripts and new subject areas.
- Accuracy depends on the complexity of the customizations.

Page 10: Datastage to ODI

Approach Two: ODI KMs

Use ODI Knowledge Modules to leverage the complex code existing in the old system.

Knowledge Modules (KMs) are components of Oracle Data Integrator's Open Connector Technology: generic, highly reusable code templates that define the overall data integration process. Each KM contains the knowledge required by ODI to perform a specific set of actions against a given technology, such as connecting to it, extracting data from it, transforming the data, checking it and integrating it.
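To give a feel for what a KM looks like inside, here is a minimal sketch of an IKM-style "insert rows" step. The <%=odiRef...%> tags are calls to ODI's substitution API, expanded into real table and column names at code-generation time; the step shown is deliberately simplified, and shipped KMs contain more steps and options:

  -- Hedged sketch of one KM step template. It moves rows from the
  -- I$ integration (staging) table into the target; odiRef fills
  -- in the names from the interface's metadata.
  insert into <%=odiRef.getTable("L", "TARG_NAME", "A")%>
  (
    <%=odiRef.getColList("", "[COL_NAME]", ",\n  ", "", "")%>
  )
  select
    <%=odiRef.getColList("", "[COL_NAME]", ",\n  ", "", "")%>
  from <%=odiRef.getTable("L", "INT_NAME", "A")%>

Because the template is generic, the same KM serves every interface that targets that technology.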

Page 11: Datastage to ODI

Approach Two contd.

KM Types:

- RKM (Reverse-engineering Knowledge Module): performs a customized reverse-engineering of data models for a specific technology, extracting metadata from a metadata provider into the ODI repository.
- LKM (Loading Knowledge Module): extracts data from heterogeneous source systems (files, middleware, databases, etc.) into a staging area.
- JKM (Journalizing Knowledge Module): creates a journal of data modifications (insert, update and delete) on the source databases to keep track of changes.
- IKM (Integration Knowledge Module): integrates (loads) data from the staging area into the target tables; used in interfaces.
- CKM (Check Knowledge Module): checks data consistency, i.e. that constraints on the sources and targets are not violated (see the sketch after this list).
- SKM (Service Knowledge Module): generates Data Services, specialized web services that provide access to application data in datastores and to the changes captured for those datastores using Changed Data Capture.
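As a rough illustration of the CKM bullet above, a flow-check step isolates rows that violate a declared rule into an error table and removes them from the flow. The i$_customer / e$_customer tables and the not-null rule here are hypothetical:

  -- Hedged sketch of CKM-style checking (real CKMs generate this
  -- from the constraints declared on the model).
  insert into e$_customer (err_type, err_mess, customer_id, customer_name)
  select 'F', 'CUSTOMER_ID must not be null', customer_id, customer_name
  from   i$_customer
  where  customer_id is null;

  delete from i$_customer
  where  customer_id is null;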

Page 12: Datastage to ODI

Approach Two contd.

KMs driving SQLs:

- The advantage of this approach is that the developer can extensively analyze the effects of migrating to the new system.
- The driver SQLs are tested, validated and used in the IKMs and LKMs of ODI, where they can be called in source or target commands as required.
- Scripts such as shell or PL/SQL for integrating the custom changes can be handled efficiently using KMs (a sketch follows this list).
- Accuracy is somewhat higher than with conversion tools, since all the customizations are considered and there is more visibility.
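For instance, a legacy PL/SQL routine carried over from the DataStage implementation can be wrapped in a custom KM step (or an ODI procedure) on the target command. A minimal sketch, where legacy_pkg.apply_business_rules is a hypothetical carried-over package, not anything shipped with ODI:

  -- Hedged sketch: a custom KM/procedure step invoking legacy
  -- PL/SQL; odiRef substitutes the resolved target table name.
  begin
    legacy_pkg.apply_business_rules(
      p_table_name => '<%=odiRef.getTable("L", "TARG_NAME", "A")%>'
    );
  end;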

Page 13: Datastage to ODI

References

- http://www.ateam-oracle.com
- Raiffeisen International
- KPI Partners