224756455-Datastage-Material.pdf

  • 8/15/2019 224756455-Datastage-Material.pdf

    1/110

    Contents

    Introduction to Datastage
    History of Datastage
    Architecture of Datastage
    Components of Datastage
    How to create Project:
    Steps to Create Sample Job:
    How to Open DS Designer
    What is Job?
    Types of Jobs?
    Creating a Datastage Job:
    How to Create ODBC Connection.
    Business Requirement:
    Add Different Stages (SQL Enterprise, Oracle Enterprise)
    Configure the Stages (Source Stage and Target Stage)
    Compile the Job
    Populating Surrogate Key in Order Method Dimension Table:
    Surrogate key File Creation
    RCP (Runtime Column Propagation)
    Employee Dimension Population
    Parameters:
    Global Parameters:
    ORDER METHOD DIM POPULATION Using Insert and Update.
    Change Capture Stage:
    Datastage Administrator Activity.
    Deleting a Corrupted Project
    Restarting RTI server
    Adding Oracle DSN Entries
    Configuring .odbc.ini file to add DSN for DB2 connectivity:
    Configuring uvconfig file for avoiding Timeout:

    Steps for Creating a New Datastage Project
    Deadlock Daemon Locks using UNIVERSE Commands
    Deleting a Project using Datastage Administrator
    LDAP Configuration
    Package Installation guidelines
    Releasing Resource Locks using UNIVERSE Commands
    Renaming Datastage Project Using Universe
    Restart of DataStage 7.5.1a Services
    Restart of DataStage 8.1 Services
    Restarting RTI Agent
    Overview of Datastage Stages:
    Aggregator Stage
    Change Apply Stage: takes the change data set, which contains the changes between the before and after data sets, from the Change Capture stage and applies the encoded change operations to a before data set to compute an after data set.
    Filter Stage
    Funnel Stage
    Join Stage
    Lookup Stage
    Merge Stage: combines a sorted master data set with one or more update data sets. The columns from the records in the master and update data sets are merged so that the output record contains all the columns from the master record plus any additional columns from each update record.
    Modify Stage
    Pivot Stage
    Remove Duplicates Stage
    Surrogate Key Generator Stage
    Switch Stage
    Compress Stage
    Expand Stage

    Introduction to Datastage

    History of Datastage

    Architecture of Datastage

    Components of Datastage

    How to create Project:

    1) Log in to Datastage Administrator.

    2) Go to the Projects tab and click the Add button.

    3) Provide the name of the project you want to create.

    4) Click Project Properties and grant access to the required users.

    Steps to Create Sample Job:

    1) Understand the Business Requirement
    2) Open Datastage Designer
    3) Create a Parallel Job or a Server Job
    4) Add stages
    5) Configure each stage
        a. Datasource info
        b. Table names, column names
        c. Mapping
    6) Save, Compile and Run

    How to Open DS Designer

    Enter your Datastage user name and password, select the project you want to work on, and click OK.

    What is Job?

    • Executable DataStage program

    • Created in DataStage Designer

    • Built using a graphical user interface

    • Compiles into Orchestrate shell language (OSH)

    Types of Jobs?

    Creating a Datastage Job:

    How to Create ODBC Connection.

    Business Requirement:

    Source: Gosales (MSSQL)    Target: Gosalesdw (Oracle)
    Order Method               ORDER_METHOD_DIM
    OrderMethodCD              ORDER_METHOD_ID
    OrderNm                    ORDER_METHOD_DESC

    Add Different Stages (SQL Enterprise, Oracle Enterprise)

    Configure the Stages (Source Stage and Target Stage)

    Compile the Job

    Populating Surrogate Key in Order Method Dimension Table:

    The requirement is to load DIM_ORDER_METHOD from the Order_Method table hosted on the MS SQL Server database. In this case, a surrogate key must be generated for the target table.

    Mapping

    Source Table (Order_Method)    Target Table (DIM_ORDER_METHOD)
    --                             ORDER_METHOD_KEY (surrogate key)
    ORDER_METHOD_CODE              ORDER_METHOD_CODE
    ORDER_METHOD_DESC              ORDER_METHOD_DESC

    There are two ways to populate the surrogate key (Flat File or DB Sequence):

    1) By using the Surrogate Key stage in the job itself

    2) By using the Transformer stage in the job

    Steps to design the Job

    1) Add stages (SQL Enterprise, Transformer, Oracle Enterprise)
    2) Configure each and every stage

    Surrogate key File Creation

    1) Create a new job to generate the surrogate key file
    2) Add a Surrogate Key Generator stage
    3) Configure, compile and run the job to generate the file

    RCP (Runtime Column Propagation)

    RCP can be used to propagate columns available in the source table through the job without defining them at each stage.

    RCP functionality is enabled in Datastage Administrator.

    To use RCP in a job, enable the RCP flag on the Source stage and the Transformer stage.

    Employee Dimension Population

    Business Requirement: Populate the Employee Dimension table from the source Employee table available in Gosales.

    1) Add 3 stages (SQL Enterprise, Oracle Enterprise, Transformer)

    EMP_DIM: R2: Populate Gender Name in the target table based on Gender Code.

    EMP_DIM: R3: Store Year of Hire and Month of Hire based on the Hire Date available in the source.

    [Job diagram: stages EMP, Trsnfr, Emp_dim]

    EMP_DIM: R4: Populate Termination Reason in the Employee Dim.

    Add a Lookup stage and a Database stage to read the records from the Termination Lookup table.

    EMP_DIM: R5: Populate Manager Code1, Manager_name1, Manager Code2, Manager Code3, Manager Code4 and Manager Code5 in the Employee DIM table.

    [Job diagram: Data Set, EMp_Hist, Joiner (emp_cd = emp_cd), EMP HIST, Joiner (mgr1_cd = emp_cd)]

    Parameters:

    Parameters can be divided into 2 types: 1) Global Parameters (project level) and 2) Local Parameters (job level).

    Global Parameters are created in Datastage Administrator.

    Local Parameters are created in Datastage Designer.

    Global Parameters:

    To use these global parameters in a job, include them in the job's Job Parameters.

    To group parameters logically, first create a Parameter Set.

    ORDER METHOD DIM POPULATION Using Insert and Update.

    Change Capture Stage:

    How to Implement SCD Type 2 using Change Capture:

    Source table (Order Method):

    Order Method Code    Order Method NM
    1                    Fax
    2                    Web
    3                    Email
    4                    Telephone

    Target table should be:

    KEY    CD    NM           CURR_INDICATOR
    601    1     Fax          Y
    602    2     Web          Y
    603    3     E-Mail       N
    604    3     Email        Y
    605    4     Telephone    Y
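    The target table above is consistent with the following change-detection logic, sketched here with awk over pipe-delimited text files rather than with DataStage stages. The file layout, the function name scd2_apply and the starting key are assumptions made for this illustration. It also assumes that, before the load, the target held keys 601 to 603 with "E-Mail" marked as current.

```shell
# Sketch of SCD Type 2 change detection (illustrative, not the Change Capture stage itself).
# Source rows: CD|NM        Target rows: KEY|CD|NM|CURR_INDICATOR
# Usage: scd2_apply SOURCE_FILE TARGET_FILE NEXT_KEY   (prints the new target table)
# Note: the source file must not be empty, because NR == FNR identifies the first file.
scd2_apply() {
  awk -F'|' -v OFS='|' -v next_key="$3" '
    NR == FNR { src[$1] = $2; order[++n] = $1; next }   # first file: remember source rows
    {
      # Target row: expire it when it is current and the source name has changed.
      if ($4 == "Y" && ($2 in src)) {
        if (src[$2] != $3) { $4 = "N" } else { unchanged[$2] = 1 }
      }
      print
    }
    END {
      # Insert a new current row for every changed or brand-new code.
      for (i = 1; i <= n; i++) {
        cd = order[i]
        if (!(cd in unchanged)) print next_key++, cd, src[cd], "Y"
      }
    }
  ' "$1" "$2"
}
```

    With a source file holding the four rows above and a target file holding 601|1|Fax|Y, 602|2|Web|Y and 603|3|E-Mail|Y, calling scd2_apply with a next key of 604 expires key 603 and adds keys 604 and 605.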

    Add different stages as mentioned below:

    Configure the Source DB stage to connect to the SQL Server DB

    Configure the Lkp stage to connect to the Order Method Dim table on the Target DB:

    Configure Change Capture Stage as mentioned Below:

    Configure the Target DB stage to connect to the target table (Order Method Dim) for updates.

    SCD population using SCD Stage:

    Configure the SCD stage as follows

    Datastage Administrator Activity.

    Deleting a Corrupted Project

    Steps to remove the project manually through UV:

    1) Log in to UNIX as dsadm.

    2) Make sure you are in the DSEngine folder.

    3) Source the dsenv file and log in to UV:

    $ . ./dsenv

    $ bin/uv

    4) Execute "LIST UV.ACCOUNT" and, if you see the project name, type "DELETE UV.ACCOUNT project_name".

    5) Execute "LIST UV_SCHEMA" to see the list of project names. If you see the project name, type "VERIFY.SQL SCHEMA FIX".

    6) Check that the project is no longer listed by typing "LIST UV_SCHEMA".

    7) If you still see the project, enter "DROP SCHEMA project_name CASCADE;".

    Restarting RTI server

    1. Logon as Super dsadm

    2. Change the directory to the RTIServer bin directory

    $ cd /opt/dsSoftware/Ascential/RTIServer/bin/

    3. Start or stop the server using the nohup command

    For starting

    $ nohup ./RTIServer.sh start &

    For stopping

    $ nohup ./RTIServer.sh stop &

    4. Check whether the RTI server has been restarted by executing the command below

    $ ps -ef | grep RTI

    Sample output for the above command:

    dsadm 4977 4946 0 16:26:09 pts/7 0:00 grep RTI

    dsadm 20018 1 0 Feb 27 ? 1061:21 /opt/dsSoftware/Ascential/RTIServer/apps/jre/bin/java -Xmx256m -server -Dprogra
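    The check above can be scripted so that the grep helper line is not mistaken for the server. The sketch below is only an illustration: the function name rti_server_running is hypothetical, and it assumes the server shows up as a java process under an RTIServer path, as in the sample output.

```shell
# rti_server_running PS_OUTPUT
# Succeeds (exit status 0) when the given ps listing contains an RTIServer java process.
# Assumption: the process path contains "RTIServer" and "java", as in the sample output.
rti_server_running() {
  printf '%s\n' "$1" | grep 'RTIServer' | grep -v 'grep' | grep -q 'java'
}

# Example call against the live process table:
#   rti_server_running "$(ps -ef)" && echo "RTI server is running"
```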

    Alternatively, open the IE browser and enter the following URL in the address bar, where hostname is the RTI server host

    http://hostname:8080/rti/

    Eg: http://stvsauxpac01.corpnet2.com:8080/rti/

    http://kopsapace02.corpnet2.com:8080/rti/

    Adding Oracle DSN Entries

    1. Log on to the Datastage server as dsadm

    2. Change directory to the DSEngine folder

    3. Edit the .odbc.ini file to add an entry for the DSN. A sample Oracle DSN entry is shown below

    [ukdev495]
    Driver=/opt/dsSoftware/Ascential/DataStage/branded_odbc/lib/VMor820.so
    Description=DataDirect Oracle
    ServerName=ukdev495
    CatalogOptions=0
    ProcedureRetResults=0
    EnableDescribeParam=0
    EnableStaticCursorsForLongData=0
    ApplicationUsingThreads=1

    4. Save the file

    Note: DataStage does not need to be stopped or restarted after this change.
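    After saving the file, it can be useful to confirm that the new entry is present. The sketch below lists the section names of an .odbc.ini style file and checks for one of them. The function names list_dsns and has_dsn are hypothetical, and the sketch reports every [section] header, so general sections such as [ODBC Data Sources] would be listed alongside the DSNs.

```shell
# list_dsns ODBC_INI_FILE
# Prints the name of every [section] header in the file, one per line.
list_dsns() {
  sed -n 's/^[[:space:]]*\[\(.*\)][[:space:]]*$/\1/p' "$1"
}

# has_dsn ODBC_INI_FILE DSN_NAME
# Succeeds when a section with exactly that name exists.
has_dsn() {
  list_dsns "$1" | grep -Fxq -- "$2"
}

# Example (run from the DSEngine directory):
#   has_dsn .odbc.ini ukdev495 && echo "DSN entry found"
```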

    Configuring .odbc.ini file to add DSN for DB2 connectivity:

    The .odbc.ini file in the Datastage home directory (/opt/dsSoftware/Ascential/DataStage/DSEngine) should have an entry for every database to which the user wants to connect using ODBC connectivity.

    This section describes how to add an entry to the .odbc.ini file to allow DB2 connectivity.

    A sample entry for DB2 connectivity is given below

    [PMAR_JDE_446_ODBC]
    Driver=/opt/dsSoftware/Ascential/DataStage/branded_odbc/lib/VMdb220.so
    Description=DataDirect 5.00 DB2 Wire Protocol Driver
    AddStringToCreateTable=
    AlternateID=
    Collection=JDFDATA
    DynamicSections=100
    GrantAuthid=PUBLIC
    GrantExecute=1
    IpAddress=166.71.155.29
    IsolationLevel=CURSOR_STABILITY
    Location=NETDATA
    LogonID=SCDWUSER
    Password=
    Package=PMARPCK
    PackageOwner=SCDWUSER
    TcpPort=446
    WithHold=1

    The entry within [ ] is the name of the DSN (PMAR_JDE_446_ODBC in this case).

    Driver is the location of the ODBC driver for DB2. An ODBC driver is needed to allow connectivity from Datastage to any database.

    AddStringToCreateTable is the string that should be added when issuing CREATE TABLE commands.

    Collection is the name of the library that has the tables to which the user has access. (I believe that, no matter which library is named here, you can access every library for which your DB user has privileges.)

    IpAddress is the IP address of the database server.

    Location is the name of the Relational Database (RDB) on the AS/400 server.

    LogonID is the user ID used to log on to the RDB on the AS/400.

    Password is the password for that user.

    Package is any name of up to 7 characters that uniquely identifies this connection.

    PackageOwner is typically the same user.

    TcpPort is the port number on which DB2 is listening.
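    The settings described above can be read back from the file to check them before binding. The sketch below is an illustration only: the function names dsn_value and package_name_ok are hypothetical, and it assumes plain key=value lines without leading spaces, as in the sample entry.

```shell
# dsn_value ODBC_INI_FILE DSN_NAME KEY
# Prints the value of KEY inside the [DSN_NAME] section (first match only).
dsn_value() {
  awk -v dsn="$2" -v key="$3" '
    /^\[/ { in_dsn = ($0 == "[" dsn "]"); next }     # track which section we are in
    in_dsn {
      eq = index($0, "=")                            # split on the first "=" only
      if (eq > 0 && substr($0, 1, eq - 1) == key) { print substr($0, eq + 1); exit }
    }
  ' "$1"
}

# package_name_ok NAME
# Succeeds when NAME is non-empty and at most 7 characters long.
package_name_ok() {
  [ -n "$1" ] && [ "${#1}" -le 7 ]
}

# Example:
#   package_name_ok "$(dsn_value .odbc.ini PMAR_JDE_446_ODBC Package)" || echo "Package name too long"
```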

    Finding out the Location:

    Run the "WRKRDBDIRE" command on the AS/400 and use the entry that is marked *LOCAL.

    Finding out the Port Number:

    To determine the correct port number, execute 'NETSTAT' from an AS/400 command line. Choose option 3 to display a list of active ports on the AS/400. Find the entry for DRDA and press F14 to toggle the display to show the port number.

    DRDA is the protocol used for communicating with DB2.

    For Collection we have used JDFDATA, which is a vanilla library.

    Once the changes are made to the .odbc.ini file, the next step is to bind the package. This is essential before checking the DSN connectivity.

    The bind command can be executed from the branded_odbc/lib directory as

    $ ./bind20 PMAR_JDE_8471_ODBC

    ./bin/dssh

    The DSEngine shell starts.

    3. Log to the project:

    LOGTO PMARDev

    The project name is case sensitive.

    4. Get a list of available DSNs by typing:

    DS_CONNECT

    5. Test the required connection by typing:

    DS_CONNECT PMAR_JDE_8471_ODBC

    6. Once the test is successful, exit by typing .Q

    Once the DSN connectivity is tested from the Unix box, the next step is to import tables from Datastage using ODBC and start using them in Datastage jobs.

    Configuring uvconfig file for avoiding Timeout :

    1. Logon using dsadm.

    2. Check that there are no client connections or phantom jobs running in the background

    This can be checked by issuing the commands

    $ ps -efd | grep phantom

    $ ps -efd | grep dsapi

    There should not be any processes as a result of the above commands. If there are any phantom processes or client connections, they need to be killed using the process below.

    Request the client (szs42740 for example) to close their client connections and/or log on to the Unix box and kill the process.

    If the clients (szs42740 for example) are not traceable and/or there is a pressing need to restart the Datastage service, issue the following commands

    $ super mdc-kill-phantom

    $ super mdc-kill-dsapi_slave

    The first command kills all the phantom processes and the second command kills all the dsapi_slave connections.

    3. Change the directory to

    $ cd /opt/dsSoftware/Ascential/DataStage/DSEngine/bin

    4. Source the dsenv file.

    $ . ../dsenv

    5. Stop the service

    $ ./uv -admin -stop

    6. Change the directory

    $ cd /opt/dsSoftware/Ascential/DataStage/DSEngine

    7. Take a backup of the uvconfig file.

    8. Change the below mentioned values in the uvconfig file using the vi editor

    RLTABSZ 100

    GLTABSZ 100

    MAXRLOCK 99

    9. Save the uvconfig file.
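    Steps 7 to 9 can also be scripted instead of edited by hand. The sketch below is an illustration under assumptions: the function name set_uvconfig_param is hypothetical, and it assumes each parameter sits on its own line as "NAME value", as shown in step 8. The regen step that follows is still required for the change to take effect.

```shell
# set_uvconfig_param FILE NAME VALUE
# Keeps a one-time backup in FILE.bak, then rewrites the "NAME value" line in FILE.
set_uvconfig_param() {
  file=$1 name=$2 value=$3
  # Create the backup only once so that it always holds the original file.
  [ -e "$file.bak" ] || cp -p "$file" "$file.bak" || return 1
  tmp=$(mktemp) || return 1
  awk -v name="$name" -v value="$value" '
    $1 == name { print name, value; next }   # replace the matching parameter line
    { print }                                # keep every other line unchanged
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example (run from the DSEngine directory with the service stopped):
#   set_uvconfig_param uvconfig RLTABSZ 100
#   set_uvconfig_param uvconfig GLTABSZ 100
#   set_uvconfig_param uvconfig MAXRLOCK 99
```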

    10. Make the changes take effect

    $ bin/uv -admin -regen

    11. Restart the DS server

    $ bin/uv -admin -start

    12. To confirm the changes have taken effect, change the directory to DSEngine and issue the command below

    $ cd /opt/dsSoftware/Ascential/DataStage/DSEngine/

    $ bin/uvregen -t

    Note: $ is the Unix prompt

    Steps for Creating a New Datastage Project

    Step 1: Login to Datastage Administrator using dsadm

    Step 2: After successful login click on the “Projects” Tab and then click “Add”.

    Step 3: “Add Project” window will be displayed.

    Step 4: Enter the name of the Project.

    Step 5: Enter the Project Path (/datastage/Projects/)

    Step 6: Click “OK”

    Step 7: Select the created project and click “Properties”

    Step 8: Check the options in the Project Properties as displayed in the image below

    Steps for Creating Access Permissions for a Datastage Project

    Step 1: Login to Datastage Administrator using dsadm

    Step 2: Change the directory to /datastage/Projects/

    Step 3: Identify the .developer.adm file

    Step 4: Open the .developer.adm file and enter only the primary group (Eg: dstage) or secondary group (Eg: ds_scdw) that should be given access permission.

    Step 5: Save the .developer.adm file.

    >Q

    The messages in the Dsdlock.log are explained below.

    1. DeadLock Daemon started in Query Mode by pid.

    Someone executed the dsdlockd -p command.

    2. DeadLock Daemon started in Dead Process Cleanup Mode.

    The deadlock daemon, on waking, found defunct processes and initiated a cleanup.

    3. DeadLock Daemon started in Normal Mode by pid.

    Someone (whose pid is given) started the deadlock daemon, maybe from DEADLOCK.MENU, maybe from the command line. If pid = 1, this was the auto-start on reboot.

    Deleting a Project using Datastage Administrator

    Steps to remove the project through Datastage Administrator:

    1) Login to Datastage Administrator using dsadm.

    2) Navigate to the Projects tab.

    3) Select the name of the project that needs to be deleted.

    4) Once the project name is selected, click the Delete button.

    5) A confirmation prompt appears: ‘Are you sure you want to delete the project?’ Click ‘Yes’ to delete the project.

    6) This deletes the selected Datastage project.

    LDAP Configuration

    Initial Setting in the WAS for Global security

    Please follow the steps below to configure LDAP for IBM Information Server.

    Prerequisites: Create a VSED user ID and password with full administrative rights, and obtain the LDAP Type, Host, Port and Base Distinguished Name.

    Step 1: Log in to the WAS Web Console at https://hostname:9043/ibm/console/logon.jsp (where hostname is the WAS host) and click Security -> Global Security. Under User registries, click LDAP.

    Enter the required details as given below

    Step 2: Change Active User Registry to LDAP

    Step 3: Log in as root to the Datastage server. Stop the IBM Information Server

    cd etc/rc2.d

    # ./AppServerAdmin.sh -was -user yqz99739 -password mask31july

    Info WAS instance /Node:stvus059Node01/Server:server1/ updated with new user information

    Info MetadataServer daemon script updated with new user information

    # ./DirectoryAdmin.sh -delete_groups

    # ./DirectoryAdmin.sh -delete_users

    Package Installation guidelines

    Step 1: Login as dsadm

    Step 2: Change the directory to DSEngine Directory

    Eg: cd /local/apps/dsSoftware/715A/Ascential/DataStage/DSEngine

    Step 3: Change the directory to bin in DSEngine

    $ cd bin

    Step 4: Source the dsenv file

    $ . ../dsenv

    Step 5: Execute the Datastage Package installer command

    $ ./dspackinst

    Please find the screen shot for steps 3, 4, and 5

    Please find the screenshot 1, 2 3 after the execution of Step 5

    Screenshot 1:

    Screenshot 2:

    Screenshot 3:

    Step 6: Enter the package directory (Screen shot 4)

    Screen shot 4:

    The package Installer will display the package information.(Screen shot 5)

    Screen shot 5:

    Step 7: The package installer will search for the projects on the server. Select the project in which you want the plug-in to be registered (Screen shot 6)

    Screen shot 6:

    Step 8: Enter the Log file destination directory (Screen shot 7)

    Screen shot 7:

    The package installer will show the installation details which you have given in the previous steps (Screen shot 8)

    Screen shot 8:

    Step 9: Enter the options if you want to proceed (Screen shot 9)

    Screen shot 9:

    The installation confirmation will be displayed as below (Screen shot 10)

    Screen shot 10:

    Note: Proper considerations have to be taken when doing an FTP from the plug-in source to the Datastage server.

    Releasing Resource Locks

    In DataStage Director, pull down Job -> Cleanup Resources. Choosing this option opens the Job Resources interface.

    To release a locked item:

    • Select Show All in the Processes pane.
    • Select Show All in the Locks pane.
    • Locate the Item Id you wish to unlock and note the PID/User#. For example, rjPLAW_P1_LoadSTG_Seq has a PID of 27645.
    • Locate the PID in the Processes pane and select the row.
    • Release the lock by clicking the Logout button. This kills the process holding the lock, thus releasing it.

    Releasing Resource Locks using UNIVERSE Commands

    Step 1. Logon using dsadm.

    Step 2. Change the directory to

    $ cd /opt/dsSoftware/Ascential/DataStage/DSEngine/bin

    Step 3. Source the dsenv file.

    $ . ../dsenv

    Step 4.Issue the below command

    $ ./uv

    and the following message will be displayed

    DataStage Command Language 7.0

    Copyright (c) 1997 - 2003 Ascential Software Corporation. All Rights Reserved

    DSEngine logged on: Fri May 12 13:24:49 2006

    Step 5. Issue the below command

    $ LOGTO project_name eg: LOGTO SCDW

    $ LOGTO UV

    $ UNLOCK INODE 6997 USER 65053 ALL

    The below messages will be displayed

    Clearing Record locks.

    Clearing GROUP locks.

    Clearing FILE locks.

    Renaming Datastage Project Using Universe

    Back up and save the original project first, in case anything goes wrong.

    1. From within DS Administrator, create project newname

    2. From the server, remove directory newname

    3. Rename newname.tmp to newname

    4. Source the dsenv file

    5. Type bin/uv or bin/uvsh

    6. LOGTO project

    7. Type UPDATE.ACCOUNT (this ensures that pointers are updated to reflect the correct installation directories etc.)

    8. Type DS.TOOLS and select option 2 (this rebuilds the repository indexes)

    9. Once complete, type n or press Return

    10. Access the Project as normal

    Restart of DataStage 7.5.1a Services

    Restarting the Datastage service may be necessary under various circumstances. The most common reason for a restart is a change made to the environment file “dsenv”.

    Restarting the Datastage service is a two-step process

    1. Stop Datastage Service
    2. Start Datastage Service

    Prerequisites

    Stop SITESCOPE MONITOR: VERY IMPORTANT

    1. Log on as dsadm:

    After logging on to the Unix box using our login credentials, we switch the user to dsadm

    This can be done by issuing the command

    su - dsadm

    2. Check existence of client connections –

    Before attempting to stop Datastage service, ensure that there are no client connections orphantom jobs running in the background.

    This can be checked by issuing the commands

    ps – efd | grep phantom

    ps – efd | grep dsapi

    There should not be any processes as a result of the above commands. If there are any phantomprocesses or client connections, they need to be killed using the process below.

    a. Find out the user of the process. This can be found by looking at the process entry

    Eg: szs42740 7854 7846 0 11:29:25 ? 0:07 dsapi_slave 9 8 0

    The sample entry above indicates that the user “szs42740” has a client connection (dsapi_slave).

    In such a case, request the client (szs42740 in the example) to close their client connections, and/or log on to the Unix box and kill the process.

    If the clients are not traceable and/or there is a pressing need to restart the Datastage service, issue the following commands

    super mdc-kill-phantom

    super mdc-kill-dsapi_slave

    The first command kills all the phantom processes and the second command kills all the dsapi_slave connections.
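The connection check described above can be wrapped in a small helper. This is a minimal sketch, assuming `ps -efd` output in the same column layout as the sample entry; the function name `blocking_sessions` and the second demo line are hypothetical and not part of DataStage.

```shell
# Hypothetical helper: reads `ps -efd` style lines on stdin and prints
# "user pid" for every phantom or dsapi process, skipping the grep itself.
blocking_sessions() {
    awk '/phantom|dsapi/ && !/grep/ { print $1, $2 }'
}

# Demo: the sample entry from the text, plus a made-up grep line that
# must be ignored.
printf '%s\n' \
    'szs42740 7854 7846 0 11:29:25 ? 0:07 dsapi_slave 9 8 0' \
    'dsadm 7900 7846 0 11:30:01 pts/1 0:00 grep dsapi' |
    blocking_sessions
```

On a live server the input would come from `ps -efd | blocking_sessions`; an empty result suggests there are no sessions left to close before stopping the service.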

    Stop Datastage Service

    Attempt to stop the service after performing prerequisite activities detailed above.

    The Datastage Service can be stopped by issuing the commands

    cd $DSHOME

    . ./dsenv

    bin/uvsh

    cd bin

    ./uv -admin -stop

    This shuts down the server engine and frees any resources held by the server engine process.

    Start Datastage Service

    Wait for at least 30 seconds after stopping the Datastage service before you attempt to restart it.

    The Datastage Service can be started by issuing the commands

    /opt/dsSoftware/Ascential/Datastage/DSEngine/bin/uv -admin -start

    This command starts the dsrpcd daemon, which is the daemon for the server engine.

    Check Datastage Service

    Check whether the Datastage service is running by issuing the following command

    netstat -na | grep 31538

    The above command may produce several lines of output, but if the service is running there should be a row with “LISTEN”.

    *.31538 *.* 0 0 49152 0 LISTEN
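The LISTEN check can be scripted so that it returns a success or failure status. This is a minimal sketch, assuming netstat prints the local address as `*.PORT` or `host:PORT` and the state in the last column, as in the sample row; the function name `is_listening` is hypothetical.

```shell
# Hypothetical helper: reads netstat-style lines on stdin and succeeds
# only if some line mentions the given port and ends in LISTEN.
is_listening() {
    awk -v port="$1" '
        index($0, "." port " ") || index($0, ":" port " ") {
            if ($NF == "LISTEN") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Demo on the sample row from the text.
if printf '%s\n' '*.31538 *.* 0 0 49152 0 LISTEN' | is_listening 31538; then
    echo "port 31538 is listening"
else
    echo "port 31538 is not listening"
fi
```

On a live server the input would come from `netstat -na | is_listening 31538`.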

    Common Problems – Restarting Datastage Service

    1. Datastage Service is started, but cannot connect from Datastage clients

    Symptom

    The Datastage service was stopped and restarted, but attempting to connect to the host from a Datastage client (eg: Designer) results in a connection error, and issuing the command

    netstat -na | grep 31538

    does not return any record with “LISTEN”.

    Cause

    The service was restarted without ensuring that client connections were closed. This causes the “port” to be unavailable for any connections.

    Remedy

    Restart the service once again using the commands under the sections “Stop Datastage Service” and “Start Datastage Service”. Stopping and starting the service again is known to resolve this issue.

    For any other problems, contact IBM Support

    Restart of DataStage 8.1 Services

    Restarting the Datastage service may be necessary under various circumstances. The most common reason for a restart is a change made to the environment file “dsenv”.

    Restarting IBM Information Server 8.1 involves two steps:

    1. Stop Datastage Service

    2. Start Datastage Service

    Prerequisites

    1. Disable the SiteScope Monitor for the server that you are going to restart (e.g. breus002)

    http://stvsawnv0539:8888/SiteScope/accounts/

    2. Log in as the dsadm user and switch to the super root user.

    After logging on to the Unix box with our login credentials, we switch user to dsadm and then to the root shell.

    The root shell can be opened by issuing the command

    super root-shell

    3. Check for existing client connections

    Before attempting to stop the Datastage service, ensure that there are no client connections or phantom jobs running in the background.

    This can be checked by issuing the commands

    ps -efd | grep phantom

    ps -efd | grep dsapi

    Neither command should return any processes (other than the grep command itself). If there are any phantom processes or client connections, they need to be killed using the process below.

    a. Find out the user of the process. This can be found by looking at the process entry

    Eg: szs42740 7854 7846 0 11:29:25 ? 0:07 dsapi_slave 9 8 0

    The sample entry above indicates that the user “szs42740” has a client connection (dsapi_slave).

    In such a case, request the client (szs42740 in the example) to close their client connections, and/or log on to the Unix box and kill the process.

    If the clients are not traceable and/or there is a pressing need to restart the Datastage service, issue the following commands

    super mdc-kill-phantom

    super mdc-kill-dsapi_slave

    The first command kills all the phantom processes and the second command kills all the dsapi_slave connections.

    Stop Datastage Service

    Attempt to stop the service after performing prerequisite activities detailed above.

    The Datastage Service can be stopped by issuing the commands

    cd /etc/rc2.d

    # ./S99ds.rc 'stop'

    Stopping JobMonApp

    JobMonApp has been shut down.

    DataStage Engine 8.1.0.0 instance "ade" has been brought down.

    # ./S99ISFAgents 'stop'

    Agent stopped.

    LoggingAgent stopped.

    # ./S99ISFServer 'stop'

    ADMU0116I: Tool information is being logged in file

    /local/apps/DRS_dstage/IS81/IBM/AppServer/profiles/default/logs/server1/stopServer.log

    ADMU0128I: Starting tool with the default profile

    ADMU3100I: Reading configuration for server: server1

    ADMU3201I: Server stop request issued. Waiting for stop status.

    ADMU4000I: Server server1 stop completed.

    # ps -efd|grep java

    root 6042 5544 0 21:39:52 pts/3 0:00 grep java

    # ps -efd|grep ds

    dsadm 3201 3200 0 20:27:24 ? 0:00 /opt/openssh/libexec/sftp-server

    dsadm 3094 3092 0 20:24:47 ? 0:00 /opt/openssh/sbin/sshd -R

    dsadm 3096 3094 0 20:24:47 pts/2 0:00 -ksh

    dsadm 14084 14082 0 14:50:16 pts/4 0:00 -ksh

    dsadm 14082 14080 0 14:50:16 ? 0:00 /opt/openssh/sbin/sshd -R

    dsadm 2050 25672 0 20:00:24 pts/5 0:00 tail -f startServer.log

    root 6045 5544 0 21:40:01 pts/3 0:00 grep ds

    dsadm 3200 3198 0 20:27:24 ? 0:00 /opt/openssh/sbin/sshd -R

    dsadm 25670 25668 0 17:53:45 ? 0:00 /opt/openssh/sbin/sshd -R

    dsadm 25672 25670 0 17:53:45 pts/5 0:00 -ksh

    dsadm 5509 5497 0 21:31:36 ? 0:00 /opt/openssh/sbin/sshd -R

    dsadm 26245 1 0 18:01:36 ? 0:05 /local/apps/DRS_dstage/IS81/IBM/InformationServer/Server/PXEngine/bin/resource_

    dsadm 5511 5509 0 21:31:36 pts/3 0:00 -ksh

    # kill 26245

    # ps -efd|grep ds

    dsadm 3201 3200 0 20:27:24 ? 0:00 /opt/openssh/libexec/sftp-server

    dsadm 3094 3092 0 20:24:47 ? 0:00 /opt/openssh/sbin/sshd -R

    dsadm 3096 3094 0 20:24:47 pts/2 0:00 -ksh

    dsadm 14084 14082 0 14:50:16 pts/4 0:00 -ksh

    dsadm 14082 14080 0 14:50:16 ? 0:00 /opt/openssh/sbin/sshd -R

    root 6100 5544 0 21:41:23 pts/3 0:00 grep ds

    dsadm 2050 25672 0 20:00:24 pts/5 0:00 tail -f startServer.log

    dsadm 3200 3198 0 20:27:24 ? 0:00 /opt/openssh/sbin/sshd -R

    dsadm 25670 25668 0 17:53:45 ? 0:00 /opt/openssh/sbin/sshd -R

    dsadm 25672 25670 0 17:53:45 pts/5 0:00 -ksh

    dsadm 5509 5497 0 21:31:36 ? 0:00 /opt/openssh/sbin/sshd -R

    dsadm 5511 5509 0 21:31:36 pts/3 0:00 -ksh

    netstat -na | grep 3153

    This shuts down the server engine and frees any resources held by the server engine process.

    Start Datastage Service

    # ./S99ISFServer 'start'

    # ./S99ISFAgents 'start'

    LoggingAgent.pid: No such file or directory

    Starting LoggingAgent...

    LoggingAgent started.

    Agent.pid: No such file or directory

    Starting Agent...

    Agent started.

    # ./S99ds.rc 'start'

    # ./S99dsrfcd.rc 'start'

    Check Datastage Service

    Check whether the Datastage service is running by issuing the following command

    netstat -na | grep 3153

    The above command may produce several lines of output, but if the service is running there should be a row with “LISTEN”.

    *.31538 *.* 0 0 49152 0 LISTEN

    Common Problems – Restarting Datastage Service

    1. Datastage Service is started, but cannot connect from Datastage clients

    Symptom

    The Datastage service was stopped and restarted, but attempting to connect to the host from a Datastage client (eg: Designer) results in a connection error, and issuing the command

    netstat -na | grep 31538

    does not return any record with “LISTEN”.

    Cause

    The service was restarted without ensuring that client connections were closed. This causes the “port” to be unavailable for any connections.

    Remedy

    Restart the service once again using the commands under the sections “Stop Datastage Service” and “Start Datastage Service”. Stopping and starting the service again is known to resolve this issue.

    For any other problems, contact IBM Support

    super root-shell

    Restarting RTI Agent

    1. Logon using super dsadm.

    2. Change the directory to the RTIAgent bin

    /opt/Ascential/RTIAgent

    $ cd /opt/dsSoftware/Ascential/RTIAgent/bin/

    3. Start or stop the server using nohup command

    For starting

    $ nohup ./RTIAgent.sh start &

    For stopping

    $ nohup ./RTIAgent.sh stop &

    4. Check whether the RTI server has been restarted by executing the command below

    $ ps -ef| grep RTIAgent

    Sample output for the above command:

    ps -efd|grep RTIAgent

    dsadm 26190 26178 0 14:02:32 pts/3 0:00 grep RTIAgent

    dsadm 26164 1 0 14:02:00 pts/2 0:01 /opt/dsSoftware/Ascential/RTIAgent/jre/bin/java -Djava.library.path=/opt/dsSoft

    Overview of Datastage Stages:

    Aggregator Stage :

    The Aggregator stage classifies data rows from a single input link into groups and calculates totals or other aggregate functions for each group. The summed totals for each group are output from the stage through the output link. A group is a set of records with the same value for one or more columns.

    Example : Transaction records might be grouped by both day of the week and by month. These groupings might show that the busiest day of the week varies by season.
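The grouping in the example can be imitated outside DataStage with awk. This is only a sketch of the idea, not how the stage is implemented: it assumes comma-separated rows of the form day,month,amount (made-up data), and the function name `aggregate_by_day_month` is hypothetical.

```shell
# Hypothetical helper: groups rows by (day, month) and sums the amount,
# which is the kind of total an Aggregator stage would output per group.
aggregate_by_day_month() {
    awk -F, '
        { total[$1 "," $2] += $3 }
        END { for (key in total) print key "," total[key] }
    ' | LC_ALL=C sort   # awk iterates in unspecified order, so sort the result
}

# Demo with made-up transactions: two rows fall into the Mon/Jan group.
printf '%s\n' 'Mon,Jan,10' 'Tue,Jan,5' 'Mon,Jan,7' 'Mon,Jul,3' |
    aggregate_by_day_month
```

Each output row carries the grouping columns followed by the group total.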

    Change Apply Stage

    Takes the change data set, which contains the changes between the before and after data sets, from the Change Capture stage and applies the encoded change operations to a before data set to compute an after data set.

    The Change Apply stage reads a record from the change data set and from the before data set, compares their key column values, and acts accordingly.

    Filter Stage :

    The Filter stage transfers, unmodified, the records of the input data set that satisfy the specified requirements and filters out all other records.

    The Filter stage can have a single input link, any number of output links and, optionally, a single reject link. You can specify different requirements to route rows down different output links. The filtered-out records can be routed to the reject link, if required.
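The routing behaviour can be pictured with a short awk sketch. It assumes rows of the form id,amount and two made-up conditions; the link names `high`, `low` and `reject` and the function name `route_rows` are hypothetical. Each row is tagged with the link it would travel down and is otherwise passed through unmodified.

```shell
# Hypothetical helper: tags each row with the output link whose condition
# it satisfies; rows matching no condition go to the reject link.
route_rows() {
    awk -F, '
        $2 >= 1000 { print "high:" $0; matched = 1 }
        $2 < 100   { print "low:" $0; matched = 1 }
        !matched   { print "reject:" $0 }
        { matched = 0 }   # reset the flag before the next row
    '
}

# Demo with made-up rows, one per link.
printf '%s\n' 'a,1500' 'b,50' 'c,500' | route_rows
```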

    Funnel Stage :

    The Funnel stage copies multiple input data sets to a single output data set. This operation is useful for combining separate data sets into a single large data set. The stage can have any number of input links and a single output link.

    Join Stage :

    Definition : The Join stage performs join operations on two or more data sets input to the stage and then outputs the resulting data set.

    The input data sets are notionally identified as the "left" set, the "right" set, and "intermediate" sets. The stage can have two or more input links and a single output link.
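For two inputs, the effect of a join can be reproduced with the standard `join` utility. This is a sketch under stated assumptions: both files are comma-separated, keyed on the first column and already sorted on it; the data and the function names `inner_join` and `left_outer_join` are made up for illustration.

```shell
# Hypothetical helpers around the POSIX join utility.
# $1 = "left" file, $2 = "right" file, both sorted on the first column.
inner_join()      { join -t, "$1" "$2"; }
left_outer_join() { join -t, -a 1 "$1" "$2"; }   # also keep unmatched left rows

# Demo with made-up data in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' '1,Alice' '2,Bob' '3,Cara'   > "$tmp/left.csv"
printf '%s\n' '1,Sales' '3,Ops' '4,Legal' > "$tmp/right.csv"

echo "inner:"
inner_join "$tmp/left.csv" "$tmp/right.csv"
echo "left outer:"
left_outer_join "$tmp/left.csv" "$tmp/right.csv"

rm -rf "$tmp"
```

The inner join keeps only keys present in both inputs, while the left outer variant also keeps left rows that have no partner.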

    Lookup Stage :

    The Lookup stage is used to perform lookup operations on a data set read into memory from any other Parallel job stage that can output data.

    It can also perform lookups directly in a DB2 or Oracle database or in a lookup table contained in a Lookup File Set stage.

    Merge Stage :

    The Merge stage combines a sorted master data set with one or more update data sets. The columns from the records in the master and update data sets are merged so that the output record contains all the columns from the master record plus any additional columns from each update record.

    A master record and an update record are merged only if both of them have the same values for the merge key column(s) that you specify. Merge key columns are one or more columns that exist in both the master and update records.

    The data sets input to the Merge stage must be key partitioned and sorted. This ensures that rows with the same key column values are located in the same partition and will be processed by the same node.

    Modify Stage

    The Modify stage alters the record schema of its input data set. The modified data set is then output. It is a processing stage. It can have a single input and a single output.

    Pivot Stage :

    The Pivot stage converts columns into rows.

    Eg., Mark-1 and Mark-2 are two columns.

    Task : Convert all the columns into one column.

    Implication : Can be used to convert SCD Type-3 to Type-2.

    Using Methodology : In the derivation field of the output column, map the input columns into one column.

    Eg., Column Name – "Marks".

    Derivation : Mark-1 and Mark-2.

    Note : Column "Marks" is derived from the input columns Mark-1 and Mark-2.
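The Marks example can be sketched with awk. It assumes comma-separated rows of the form student,Mark-1,Mark-2 (made-up data); the function name `pivot_marks` is hypothetical. Every mark column becomes its own output row under a single "Marks" column.

```shell
# Hypothetical helper: emits one output row per mark column,
# repeating the first (key) column on each row.
pivot_marks() {
    awk -F, '{ for (i = 2; i <= NF; i++) print $1 "," $i }'
}

# Demo: one input row with two marks becomes two output rows.
printf '%s\n' 'S1,80,90' 'S2,65,70' | pivot_marks
```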

    Remove Duplicates Stage

    The Remove Duplicates stage takes a single sorted data set as input, removes all duplicate records, and writes the results to an output data set.

    Removing duplicate records is a common way of cleansing a data set before you perform further processing. Two records are considered duplicates if they are adjacent in the input data set and have identical values for the key column(s).
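The adjacency rule explains why the input must be sorted, which a small awk sketch makes visible. It assumes comma-separated rows keyed on the first column and keeps the first record of each run; the function name `remove_adjacent_duplicates` is hypothetical.

```shell
# Hypothetical helper: drops a row when its key equals the key of the
# row immediately before it (keys are compared as strings).
remove_adjacent_duplicates() {
    awk -F, 'NR == 1 || ($1 "") != prev { print } { prev = $1 "" }'
}

echo "sorted input:"
printf '%s\n' '1,a' '1,b' '2,c' | remove_adjacent_duplicates

echo "unsorted input (the duplicate key survives):"
printf '%s\n' '1,a' '2,c' '1,b' | remove_adjacent_duplicates
```

With unsorted input the two rows for key 1 are not adjacent, so neither is removed.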

    Surrogate Key Generator Stage

    The Surrogate Key stage generates key columns for an existing data set.

    The user can specify certain characteristics of the key sequence. The stage generates sequentially incrementing unique integers from a given starting point. The existing columns of the data set are passed straight through the stage.

    If the stage is operating in parallel, each node will increment the key by the number of partitions being written to.
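Incrementing by the number of partitions keeps keys unique across nodes without any coordination. The sketch below illustrates the arithmetic only; it assumes that partition p (counted from 0) starts at start + p, which is an assumption about the numbering and not something the text states. The function name `keys_for_partition` is hypothetical.

```shell
# Hypothetical helper: prints the keys one partition would generate.
# $1 = starting value, $2 = partition index (0-based, assumed),
# $3 = number of partitions, $4 = number of rows in this partition.
keys_for_partition() {
    awk -v start="$1" -v part="$2" -v nparts="$3" -v rows="$4" '
        BEGIN { for (i = 0; i < rows; i++) print start + part + i * nparts }
    '
}

echo "partition 0 of 2:"
keys_for_partition 1 0 2 3
echo "partition 1 of 2:"
keys_for_partition 1 1 2 3
```

Under these assumptions the two partitions produce interleaved, non-overlapping sequences.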

    Switch Stage

    The Switch stage takes a single data set as input and assigns each input row to an output data set based on the value of a selector field.

    It can have a single input link, up to 128 output links and a single rejects link. This stage performs an operation similar to a C switch statement. Rows that satisfy none of the cases are output on the rejects link.
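The comparison with a C switch statement maps directly onto a shell `case`. This sketch assumes the selector is the first comma-separated field; the case values `A` and `B`, the link names and the function name `route_by_selector` are hypothetical.

```shell
# Hypothetical helper: routes each row by the value of its selector field,
# with a default branch playing the part of the rejects link.
route_by_selector() {
    while IFS=, read -r selector rest; do
        case "$selector" in
            A) printf 'out1:%s,%s\n' "$selector" "$rest" ;;
            B) printf 'out2:%s,%s\n' "$selector" "$rest" ;;
            *) printf 'reject:%s,%s\n' "$selector" "$rest" ;;
        esac
    done
}

# Demo with made-up rows; Z matches no case and is rejected.
printf '%s\n' 'A,1' 'B,2' 'Z,3' | route_by_selector
```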

    Compress Stage :

    The Compress stage uses the UNIX compress or GZIP utility to compress a data set. It converts a data set from a sequence of records into a stream of raw binary data.

    A compressed data set cannot be processed by many stages until it is expanded, i.e., until its rows are returned to their normal format. Stages that do not perform column-based processing or reorder the rows can operate on compressed data sets. For example, you can use the Copy stage to create a copy of the compressed data set.

    Expand Stage

    The Expand stage uses the UNIX compress or GZIP utility to expand the data set. It converts a data set from a stream of raw binary data into a sequence of records.
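The Compress and Expand stages are inverses of each other, which can be checked with gzip on the command line. This is a sketch of the round trip only, assuming the gzip utility is installed; the function names `compress_rows` and `expand_rows` are hypothetical.

```shell
# Hypothetical helpers: turn a sequence of records into a gzip byte stream
# and back again.
compress_rows() { gzip -c; }
expand_rows()   { gzip -dc; }

# Demo: made-up rows survive the round trip unchanged.
printf '%s\n' '1,alpha' '2,beta' | compress_rows | expand_rows
```

Between the two steps the data is an opaque binary stream, which is why column-based processing has to wait until after expansion.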