
ETL with SSIS


DESCRIPTION

This presentation gives an overview of ETL with SSIS. Topics include architecture, basics, sources and destinations, development features, reusability with the ETL Toolkit, metadata (impact, dependency analysis, and lineage), the Metadata Toolkit, Pragmatic Works BI Documenter, Task Factory and BI xPress, Enterprise Information Management, data warehousing, and high availability and scalability.


Page 1: ETL with SSIS
Page 2: ETL with SSIS

Agenda


Page 4: ETL with SSIS

SQL Server Integration Services

Page 5: ETL with SSIS

Key Components:

• Integration Services service

• Package

• Control Flow

− Tasks

− Containers

− Precedence Constraints

• Data Flow

− Data Sources

− Transformations

− Data Destinations

Refer to the SSIS 101 section for a walk-through of ETL basics

Page 6: ETL with SSIS

Heterogeneous Sources and Targets

[Diagram labels: Unstructured data; Legacy data: Binary files; Application database; Change; Tables; OLTP; DW]

Page 7: ETL with SSIS

Visual Development Environment

Drag and Drop Package Designer

Edit and Debug in Visual Studio Environment

Breakpoints, watches, variable inspection

Data viewers enable visualization of data flow

Full Source Control management

Build and deploy

Impact analysis with build and validate

Deployment utility moves packages from development and test to production

Custom code integration

Script objects using VB.NET

Custom components using C# .NET

Integrate existing code, or develop new solutions

Page 8: ETL with SSIS

• http://sqlmetadata.codeplex.com

Page 9: ETL with SSIS

Tool / feature: Description

Dependency Analyzer (DependencyAnalyzer.exe): Command-line application that parses your BI project and stores the metadata in a SQL Server database. It parses:

• Integration Services packages (finds data flows, captures metadata)

• SQL databases

• Analysis Services cubes (provides data lineage to source databases)

Dependency Viewer (DependencyViewer.exe): Shows a dynamic graph of dependencies for SSAS, SSIS, and SQL RDBMS (dependencies and lineage of objects in the lineage repository).

Data Source View: A DSV that connects to the lineage repository (SSIS_META database) and can be used by Reporting Services.

Lineage Repository: A database called SSIS_META that can be used to house metadata from nearly any system.

Reports: Standard reports for impact analysis studies. Two key reports are available out of the box, with several sub-reports.

Report Model: A report model that you can use with Report Builder to allow end users to create ad hoc reports.

Integration Services Samples: A few sample packages to start auditing and viewing lineage on.
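To make the impact-analysis idea concrete, here is a minimal Python sketch of a lineage walk over dependency edges. The edge list, the object names, and the `downstream` function are hypothetical illustrations only; they do not reflect the actual SSIS_META schema or the output of DependencyAnalyzer.exe.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (source object -> dependent object).
EDGES = [
    ("oltp.Orders", "ssis.LoadOrders.dtsx"),
    ("ssis.LoadOrders.dtsx", "dw.FactOrders"),
    ("dw.FactOrders", "ssas.SalesCube"),
    ("oltp.Customers", "ssis.LoadCustomers.dtsx"),
    ("ssis.LoadCustomers.dtsx", "dw.DimCustomer"),
    ("dw.DimCustomer", "ssas.SalesCube"),
]


def downstream(edges, start):
    """Return every object reachable from `start`, in breadth-first order."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    seen, order, pending = {start}, [], deque([start])
    while pending:
        node = pending.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                pending.append(nxt)
    return order


# Which objects are affected if oltp.Orders changes?
impacted = downstream(EDGES, "oltp.Orders")
print(impacted)
```

Reversing each edge before the walk would answer the opposite question (lineage: where did this object's data come from?).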

Page 10: ETL with SSIS


Page 11: ETL with SSIS

Master Data Services

Data quality

Familiar tools

Performance

Data Quality Services

Profiling

Cleansing

Matching

Impact Analysis

Lineage tracking

with data sources

Integration Services

Easy data loading

MDS and DQS Integration

Page 12: ETL with SSIS

Page 14: ETL with SSIS

Map data reference schema data values to valid values

Run data cleanse

Page 15: ETL with SSIS

Map to DQS Knowledge Base

Map Automated Corrections

NEW SSIS DQS Cleansing Task

Page 16: ETL with SSIS

• http://projectbarcelona.cloudapp.net

Visual and Drill-Down Dependency Details

Page 17: ETL with SSIS

New in SQL Server 2008: a few SSIS and data warehousing improvements

SAP-BW Adapter

Teradata Adapter

Oracle Adapter

MERGE SQL Statement

Change Data Capture (CDC)

Persistent Lookups

Data Profiling

…………

Star Join Query Optimization

Parallel Query Enhancements

New - Report Builder 2.0

Enhanced Data Visualization

Rendering for Word & Excel

IIS Agnostic Report Deployments

Data Mining Engine Improvements

MDX Query, Writeback Optimizations

Best Practice Design Alerts

Scale-out AS engine & backup

…………

Data Compression

Backup Compression

Resource Governor

Policy Based Administration

Reference Architectures

Partition-Aligned Indexed Views

…………

Page 18: ETL with SSIS

SSIS Support for Data Warehousing


Page 19: ETL with SSIS
Page 20: ETL with SSIS

[Diagram labels: Work Pile (packages P1, P2, P3, P4, P5, Pn …); Scheduler / Supervisor; Work Horses (DTExec (1), DTExec (2), DTExec (n)); Shared Resources]
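The work pile / work horse labels in this diagram suggest a pattern in which a supervisor hands queued packages to several parallel DTExec processes. A minimal Python sketch of that pattern follows, assuming file-based packages run with `dtexec /F`. The function names, the worker count, and the `runner` parameter are illustrative assumptions, not part of SSIS.

```python
import queue
import subprocess
import threading


def run_with_dtexec(package_path):
    """Run one package file with the dtexec utility and return its exit code."""
    return subprocess.run(["dtexec", "/F", package_path]).returncode


def drain_work_pile(packages, workers=2, runner=run_with_dtexec):
    """Run every package once, using `workers` parallel work horses."""
    pile = queue.Queue()
    for pkg in packages:
        pile.put(pkg)
    results = {}
    lock = threading.Lock()

    def work_horse():
        while True:
            try:
                pkg = pile.get_nowait()
            except queue.Empty:
                return  # the pile is empty, so this worker is done
            code = runner(pkg)
            with lock:
                results[pkg] = code

    threads = [threading.Thread(target=work_horse) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Dry run with a stand-in runner so the sketch works without dtexec installed.
demo = drain_work_pile(["P1.dtsx", "P2.dtsx", "P3.dtsx"], workers=2,
                       runner=lambda pkg: 0)
print(demo)
```

Passing the runner as a parameter keeps the scheduling logic testable on a machine without SSIS; in real use the default `run_with_dtexec` would be left in place.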

Page 21: ETL with SSIS

• Configuring SSIS to manage multiple clustered instances

− Create an additional folder for each instance on the cluster

− Copy the configuration file in place on each node participating in the cluster, and restart SSIS on each node

− Use Management Studio to connect to the network name or the IP address of the clustered instance of SQL. Do not use the name or IP address of the cluster node. When you connect by using the network name or IP address of the SQL instance, you will always connect to the node that is hosting SQL. By having the same configuration file on each node of SQL, you are assured of seeing the same set of folders regardless of which node is hosting your instance of SQL, and you can therefore manage the packages in your instance

− Use checkpoints in packages for restartability and ETL continuation

• NOTE: We do not recommend clustering the SSIS service. Clustering the service does not provide high availability for the SSIS service, and it does not provide for automatic restart of packages after failover.
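Because failover does not restart packages automatically, the checkpoint recommendation above carries the restart burden. In SSIS itself, checkpoints are configured through package properties such as CheckpointFileName, CheckpointUsage, and SaveCheckpoints. The Python sketch below only illustrates the general idea (record completed steps, skip them on rerun); the file format and the function name are assumptions and do not mirror how SSIS stores checkpoint data.

```python
import json
import os
import tempfile


def run_with_checkpoint(tasks, checkpoint_path):
    """Run (name, callable) tasks in order, skipping ones already recorded."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)
    executed = []
    for name, action in tasks:
        if name in done:
            continue  # finished in an earlier run
        action()  # an exception here leaves the checkpoint file in place
        done.append(name)
        executed.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # full success: next run starts fresh
    return executed


# Simulate a failure in the second step, then a successful rerun.
state = {"fail": True}


def load():
    if state["fail"]:
        raise RuntimeError("simulated failure")


tasks = [("extract", lambda: None), ("load", load)]
path = os.path.join(tempfile.mkdtemp(), "etl.chk")
try:
    run_with_checkpoint(tasks, path)
except RuntimeError:
    state["fail"] = False  # pretend the underlying problem was fixed

second = run_with_checkpoint(tasks, path)
print(second)
```

On the rerun only the step that failed is executed again, which is the restart behavior the checkpoint bullet is after.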

Page 22: ETL with SSIS

Page 23: ETL with SSIS

New Integration Services Dashboard and Reports

Drill-Down reports for troubleshooting issues

New objects for troubleshooting:

• catalog.event_messages

• catalog.event_message_context

• catalog.executable_statistics

• catalog.execution_data_statistics

• catalog.dm_execution_performance_counters

New SSIS Management System Views and Queries
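As an illustration of how these catalog objects can be queried, the sketch below builds a parameterized T-SQL statement that lists error messages for one operation from catalog.event_messages in the SSISDB database. It assumes `?` parameter markers (as used by ODBC-style drivers) and filters on message_type 120, the value documented for errors. The function name and the sample operation ID are hypothetical.

```python
ERROR_MESSAGE_TYPE = 120  # message_type value documented for errors


def error_messages_query(operation_id):
    """Return (sql, params) listing error messages for one catalog operation."""
    sql = (
        "SELECT message_time, package_name, message_source_name, message "
        "FROM SSISDB.catalog.event_messages "
        "WHERE operation_id = ? AND message_type = ? "
        "ORDER BY message_time"
    )
    return sql, (operation_id, ERROR_MESSAGE_TYPE)


# 42 is a made-up operation ID; pass the pair to your driver's execute().
sql, params = error_messages_query(42)
print(sql)
print(params)
```

Keeping the values out of the SQL text lets the same statement be reused for any operation and avoids string concatenation of user input.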

Page 24: ETL with SSIS
Page 25: ETL with SSIS

Page 26: ETL with SSIS

SQL Agent: inside SQL Server Management Studio

Page 27: ETL with SSIS

BIDS Layout

Toolbox

Connection Managers

Control Flow

Data Flow

Solution Explorer

Properties

Menu Bar

Page 28: ETL with SSIS

Control Flow

Toolbox

• Control Flow Items

• Tasks

• Containers

• Maintenance Tasks Separated

Control Flow

• Central Location for Package Development

• Containers help control programmatic workflow

Page 29: ETL with SSIS

Data Flow

Toolbox

• Data Flow Sources

• Transformations

• Data Flow Destinations

Data Flow Tab

• Data Flow Development

• Multiple Data Flows Can Exist in a Package

• Connection Managers are required to define Data Source and Data Destination Connectors

Page 30: ETL with SSIS

Connection Managers

Page 31: ETL with SSIS

Variables

Page 32: ETL with SSIS

Configurations

Page 33: ETL with SSIS

Logging

Page 34: ETL with SSIS

Event Handlers

• Optional component of a package that fires additional tasks when certain events occur

• Executables are chosen from a list of existing tasks in the package, at the package, container, or task level

• The following event handlers can be used:

• OnError

• OnExecStatusChanged

• OnInformation

• OnPostExecute

• OnPostValidate

• OnPreExecute

• OnPreValidate

• OnProgress

• OnQueryCancel

• OnTaskFailed

• OnVariableValueChanged

• OnWarning
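The event-handler idea in the list above can be sketched in a few lines of Python: handlers are attached to an executable for a named event and fire when that event is raised. The `Executable` class is a hypothetical stand-in, not the SSIS object model, and the bubbling of events to the parent is included here as an assumption of the sketch.

```python
class Executable:
    """Hypothetical stand-in for a package, container, or task."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.handlers = {}  # event name -> list of callables

    def on(self, event, handler):
        """Attach a handler for one event name, e.g. "OnError"."""
        self.handlers.setdefault(event, []).append(handler)

    def raise_event(self, event, detail):
        """Fire handlers on this executable, then on each parent in turn."""
        node = self
        while node is not None:
            for handler in node.handlers.get(event, []):
                handler(self.name, detail)
            node = node.parent


log = []
package = Executable("Package")
task = Executable("Load Customers", parent=package)
package.on("OnError", lambda source, detail: log.append(("package", source, detail)))
task.on("OnError", lambda source, detail: log.append(("task", source, detail)))

task.raise_event("OnError", "row rejected")
task.raise_event("OnWarning", "slow source")  # no handler attached: ignored
print(log)
```

Attaching the handler at the package level gives one place to log or notify for every task underneath it, while a task-level handler can react to a single step.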

Page 35: ETL with SSIS

Package Explorer

• Provides a hierarchical view of your package design

• Allows the viewing of properties from the Properties window

• You can delete components from the Package Explorer view, but cannot add components.

• The Properties window is available for all objects, in all tabs, including the package itself.

Page 36: ETL with SSIS

BIDS

Task Color Coding

• When you run a package, BIDS depicts execution progress by displaying each task or container using a color that indicates execution status.

Gray = waiting to run

Yellow = currently running

Green = completed successfully

Red = ended unsuccessfully.

• After you stop package execution, the color-coding disappears.

Page 37: ETL with SSIS

Progress Tab


• Enabled when a package executes in BIDS and the Debug Progress Reporting option is enabled under the SSIS menu

• The Progress tab lists tasks and containers in execution order and includes the start and finish times, warnings, and error messages.

• After you stop package execution, the progress information remains available on the Execution Results tab.

Page 38: ETL with SSIS

SSIS Server and Job Activity Monitor

Page 39: ETL with SSIS

© 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.