jen-underwood
DESCRIPTION
This presentation gives an overview of ETL with SSIS: architecture, basics, sources and destinations, development features, reusability with the ETL Toolkit, metadata (impact and dependency analysis, lineage, the Metadata Toolkit), Pragmatic Works BI Documenter, Task Factory and BI xPress, Enterprise Information Management, data warehousing, and high availability and scalability.
Agenda
− http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx
SQL Server Integration Services
Key Components:
• Integration Services service
• Package
• Control Flow
− Tasks
− Containers
− Precedence Constraints
• Data Flow
− Data Sources
− Transformations
− Data Destinations
Refer to the SSIS 101 section for a walk-through of ETL basics
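To make the control flow components above concrete, here is a minimal Python sketch of tasks linked by precedence constraints. It is not the SSIS object model: the `ControlFlow` class, the task names, and the simplification that each task has at most one upstream constraint are all assumptions made for illustration. The three constraint kinds (success, failure, completion) mirror the values a precedence constraint can take.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ControlFlow:
    """Toy control flow: named tasks plus precedence constraints (illustrative only)."""
    tasks: Dict[str, Callable[[], bool]] = field(default_factory=dict)
    # Each constraint is (upstream, downstream, kind); kind is
    # "success", "failure", or "completion".
    constraints: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_task(self, name: str, action: Callable[[], bool]) -> None:
        # The action returns True on success and False on failure.
        self.tasks[name] = action

    def add_constraint(self, upstream: str, downstream: str, kind: str = "success") -> None:
        self.constraints.append((upstream, downstream, kind))

    def run(self, start: str) -> List[Tuple[str, bool]]:
        executed = []
        pending = [start]
        while pending:
            name = pending.pop(0)
            ok = self.tasks[name]()
            executed.append((name, ok))
            # Follow only the constraints that the task's outcome satisfies.
            for upstream, downstream, kind in self.constraints:
                if upstream != name:
                    continue
                if kind == "completion" or (kind == "success" and ok) or (kind == "failure" and not ok):
                    pending.append(downstream)
        return executed


flow = ControlFlow()
flow.add_task("extract", lambda: True)
flow.add_task("load", lambda: False)       # simulated failure
flow.add_task("archive", lambda: True)
flow.add_task("notify", lambda: True)
flow.add_constraint("extract", "load", "success")
flow.add_constraint("load", "archive", "success")
flow.add_constraint("load", "notify", "failure")

executed = flow.run("extract")
print(executed)
```

Because the hypothetical `load` task fails, the failure constraint routes execution to `notify` and `archive` never runs.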
Heterogeneous Sources and Targets
[Diagram labels: Unstructured data; Legacy data: Binary files; Application database; Change Tables; OLTP; DW]
Visual Development Environment
Drag and Drop Package Designer
Edit and Debug in Visual Studio Environment
Breakpoints, watches, variable inspection
Data viewers enable visualization of data flow
Full Source Control management
Build and deploy
Impact analysis with build and validate
Deployment utility moves packages from development and test to production
Custom code integration
Script objects using VB.NET
Custom components using C# .NET
Integrate existing code, or develop new solutions
Tool / feature: Description
Dependency Analyzer (DependencyAnalyzer.exe): Command-line application that parses your BI project and stores the metadata in a SQL Server database. It parses:
• Integration Services packages (finds data flows, captures metadata)
• SQL databases
• Analysis Services cubes (provides data lineage to source databases)
Dependency Viewer (DependencyViewer.exe): Shows a dynamic graph of dependencies for SSAS, SSIS, and SQL RDBMS objects (dependencies and lineage of objects in the lineage repository).
Data Source View: A DSV that connects to the lineage repository (SSIS_META database) and can be used by Reporting Services.
Lineage Repository: A database called SSIS_META that can be used to house metadata from nearly any system.
Reports: Standard reports for impact analysis studies. Two key reports are available out of the box, with several sub-reports.
Report Model: A report model that you can use with Report Builder to allow end users to create ad hoc reports.
Integration Services Samples: A few sample packages to start auditing and viewing lineage on.
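The impact and lineage questions these tools answer come down to walking a dependency graph in one direction or the other. The Python sketch below illustrates that idea only. It does not read the SSIS_META schema, and every object name in `LINEAGE` is hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: (source object, object that depends on it).
LINEAGE = [
    ("oltp.Orders", "ssis.LoadOrders"),
    ("ssis.LoadOrders", "dw.FactOrders"),
    ("dw.FactOrders", "ssas.SalesCube"),
    ("oltp.Customers", "ssis.LoadCustomers"),
    ("ssis.LoadCustomers", "dw.DimCustomer"),
    ("dw.DimCustomer", "ssas.SalesCube"),
]


def _walk(start, edges, reverse=False):
    """Breadth-first walk over the edges, downstream by default, upstream if reverse."""
    graph = {}
    for source, dependent in edges:
        if reverse:
            source, dependent = dependent, source
        graph.setdefault(source, []).append(dependent)
    seen = set()
    todo = deque([start])
    while todo:
        node = todo.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                todo.append(neighbor)
    return sorted(seen)


def impacted_by(obj, edges=LINEAGE):
    """Impact analysis: everything downstream of obj."""
    return _walk(obj, edges)


def lineage_of(obj, edges=LINEAGE):
    """Lineage: everything upstream of obj."""
    return _walk(obj, edges, reverse=True)


print(impacted_by("oltp.Orders"))
print(lineage_of("dw.DimCustomer"))
```

In this toy graph, a change to the hypothetical `oltp.Orders` table reaches the package, the fact table, and the cube, while the lineage of `dw.DimCustomer` traces back through its load package to the source table.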
MDS and DQS Integration
Master Data Services: data quality, familiar tools, performance
Data Quality Services: profiling, cleansing, matching
Impact Analysis: lineage tracking with data sources
Integration Services: easy data loading
• http://msdn.microsoft.com/en-us/sqlserver/hh323832.aspx
Map data reference schema data values to valid values
Run data cleanse
Map to DQS Knowledge Base
Map Automated Corrections
NEW SSIS DQS Cleansing Task
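The cleansing steps listed above (map incoming values to a knowledge base, apply automated corrections) can be pictured with a small Python sketch. This is not the DQS API: the `KNOWLEDGE_BASE` structure, the status labels, and the sample values are assumptions chosen only to show the shape of the mapping.

```python
# Hypothetical knowledge base for a single "State" domain.
KNOWLEDGE_BASE = {
    "valid": {"Washington", "Oregon"},
    "corrections": {"WA": "Washington", "Wash.": "Washington", "OR": "Oregon"},
}


def cleanse(value, kb=KNOWLEDGE_BASE):
    """Return (output value, status) for one incoming value.

    The status labels are illustrative, not the ones DQS emits.
    """
    candidate = value.strip()
    if candidate in kb["valid"]:
        return candidate, "correct"
    if candidate in kb["corrections"]:
        return kb["corrections"][candidate], "corrected"
    # Not known to the knowledge base: pass through for review.
    return candidate, "unknown"


rows = ["WA", " Oregon ", "Idaho"]
results = [cleanse(row) for row in rows]
print(results)
```

Values the knowledge base does not recognize are passed through with a status rather than dropped, so a downstream step can route them for review.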
• http://projectbarcelona.cloudapp.net
Visual and Drill-Down Dependency Details
New in SQL Server 2008
A few SSIS and data warehousing improvements
SAP-BW Adapter
Teradata Adapter
Oracle Adapter
MERGE SQL Statement
Change Data Capture (CDC)
Persistent Lookups
Data Profiling
…………
Star Join Query Optimization
Parallel Query Enhancements
New - Report Builder 2.0
Enhanced Data Visualization
Rendering for Word & Excel
IIS Agnostic Report Deployments
Data Mining Engine Improvements
MDX Query , Writeback Optimizations
Best Practice Design Alerts
Scale-out AS engine & backup
…………
Data Compression
Backup Compression
Resource Governor
Policy Based Administration
Reference Architectures
Partition-Aligned Indexed Views
…………
SSIS Support for Data Warehousing
[Diagram labels: Supervisor; Scheduler; Work Pile (packages P1, P2, P3, P4, P5, Pn …); Work Horses (DTExec (1), DTExec (2), DTExec (n)); Shared Resources]
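The work pile pattern named in the diagram can be sketched in Python: a supervisor fills a shared queue with packages, and a fixed number of workers pull from it until it is empty. This is a sketch under assumptions, not the deck's implementation. `run_work_pile` and `fake_run` are hypothetical names, and `fake_run` stands in for launching the dtexec utility against a package.

```python
import queue
import threading


def run_work_pile(packages, worker_count, run_package):
    """Run every package using a fixed pool of workers that share one queue."""
    pile = queue.Queue()
    for package in packages:
        pile.put(package)

    results = {}
    lock = threading.Lock()

    def work_horse(worker_id):
        while True:
            try:
                package = pile.get_nowait()
            except queue.Empty:
                return  # pile is exhausted, so this worker stops
            outcome = run_package(package)
            with lock:
                results[package] = (worker_id, outcome)

    threads = [
        threading.Thread(target=work_horse, args=(worker_id,))
        for worker_id in range(1, worker_count + 1)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def fake_run(package):
    # Stand-in for executing a package; a real supervisor would start dtexec here.
    return f"{package}: ok"


results = run_work_pile([f"P{i}" for i in range(1, 6)], worker_count=3, run_package=fake_run)
for package in sorted(results):
    print(package, results[package])
```

Which worker picks up which package varies from run to run; the pattern only guarantees that each package is executed exactly once.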
• Configuring SSIS to manage multiple clustered instances
− Create an additional folder for each instance on the cluster
− Copy the configuration file into place on each node participating in the cluster, and restart SSIS on each node
− Use Management Studio to connect to the network name or the IP address of the clustered instance of SQL Server. Do not use the name or IP address of the cluster node. When you connect by using the network name or IP address of the SQL Server instance, you always connect to the node that is currently hosting SQL Server. Because the same configuration file is on each node, you see the same set of folders regardless of which node is hosting your instance, and you can therefore manage the packages in that instance
− Use checkpoints in packages for restartability and ETL continuation
• NOTE: We do not recommend clustering the SSIS service. Clustering the service does not provide high availability for the SSIS service, and it does not provide for automatic restart of packages after failover.
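The checkpoint recommendation above rests on a simple idea: record which steps have completed, and on restart skip them. The Python sketch below imitates that behavior with a JSON file. It is an illustration only; SSIS manages its own checkpoint file format, and `run_with_checkpoint` and the step names are hypothetical.

```python
import json
import os
import tempfile


def run_with_checkpoint(steps, checkpoint_path):
    """Run steps in order, skipping any recorded as done by an earlier failed run."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            done = json.load(handle)

    ran = []
    for name, action in steps:
        if name in done:
            continue
        action()  # an exception here leaves the checkpoint file in place
        done.append(name)
        ran.append(name)
        with open(checkpoint_path, "w") as handle:
            json.dump(done, handle)

    # A fully successful run no longer needs the checkpoint.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return ran


attempts = {"load": 0}


def extract():
    pass


def load():
    attempts["load"] += 1
    if attempts["load"] == 1:
        raise RuntimeError("simulated failure on the first attempt")


def publish():
    pass


steps = [("extract", extract), ("load", load), ("publish", publish)]
path = os.path.join(tempfile.mkdtemp(), "etl.checkpoint.json")

try:
    run_with_checkpoint(steps, path)
except RuntimeError:
    pass  # the first run fails partway through

second_run = run_with_checkpoint(steps, path)
print(second_run)
```

The second run resumes at the failed step instead of repeating the extract, which is the restart behavior the slide is recommending after a failover.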
New Integration Services Dashboard and Reports
Drill-down reports for troubleshooting issues
New objects for troubleshooting:
• catalog.event_messages
• catalog.event_message_context
• catalog.executable_statistics
• catalog.execution_data_statistics
• catalog.dm_execution_performance_counters
New SSIS Management System Views and Queries
SQL Agent: inside SQL Server Management Studio
BIDS Layout
Toolbox
Connection Managers
Control Flow
Data Flow
Solution Explorer
Properties
Menu Bar
Control Flow
Toolbox
• Control Flow Items
• Tasks
• Containers
• Maintenance Tasks Separated
Control Flow
• Central Location for Package Development
• Containers help control programmatic workflow
Data Flow
Toolbox
• Data Flow Sources
• Transformations
• Data Flow Destinations
Data Flow Tab
• Data Flow Development
• Multiple Data Flows Can Exist in a Package
• Connection Managers are required to define Data Source and Data Destination
Connectors
Connection Managers
Variables
Configurations
Logging
Event Handlers
• Optional component of a package that fires additional tasks when certain events occur
• Executables are chosen from a list of existing tasks in the package, at the package, container, or task level
• The following event handlers can be used:
• OnError
• OnExecStatusChanged
• OnInformation
• OnPostExecute
• OnPostValidate
• OnPreExecute
• OnPreValidate
• OnProgress
• OnQueryCancel
• OnTaskFailed
• OnVariableValueChange
• OnWarning
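As a rough picture of how handlers attached at the package, container, or task level respond to the events listed above, here is a small Python sketch. The `Executable` class is hypothetical and is not the SSIS object model. The sketch also assumes that an event raised by a task is offered to its parent container and package as well, which is not stated on the slide.

```python
class Executable:
    """Toy package, container, or task that can hold event handlers (illustrative only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.handlers = {}

    def on(self, event, handler):
        """Attach a handler for an event name such as "OnError"."""
        self.handlers.setdefault(event, []).append(handler)

    def raise_event(self, event, message):
        # Assumption: the event is offered to this executable first,
        # then to each ancestor in turn.
        node = self
        while node is not None:
            for handler in node.handlers.get(event, []):
                handler(self.name, message)
            node = node.parent


log = []
package = Executable("Nightly ETL")
container = Executable("Load Sequence", parent=package)
task = Executable("Load Orders", parent=container)

task.on("OnPreExecute", lambda source, msg: log.append(("task", source, msg)))
package.on("OnError", lambda source, msg: log.append(("package", source, msg)))

task.raise_event("OnPreExecute", "starting")
task.raise_event("OnError", "row rejected")
print(log)
```

Attaching the `OnError` handler once at the package level lets a single handler cover every task beneath it, which is one reason to choose the level of a handler deliberately.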
Package Explorer
• Provides a hierarchical view of your package design
• Allows the viewing of properties from the Properties window
• You can delete components from the Package Explorer view, but cannot add components.
• The Properties window is available for all objects, in all tabs, including the package itself.
BIDS
Task Color Coding
• When you run a package, BIDS depicts execution progress by displaying each task or container using a color that indicates execution status.
Gray = waiting to run
Yellow = currently running
Green = completed successfully
Red = ended unsuccessfully
• After you stop package execution, the color coding disappears.
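The color legend above is a plain status-to-color lookup. A short Python sketch, with hypothetical names (`Status`, `color_for`), shows the mapping along with the rule that colors are shown only while the package is executing.

```python
from enum import Enum


class Status(Enum):
    WAITING = "waiting to run"
    RUNNING = "currently running"
    SUCCEEDED = "completed successfully"
    FAILED = "ended unsuccessfully"


STATUS_COLORS = {
    Status.WAITING: "gray",
    Status.RUNNING: "yellow",
    Status.SUCCEEDED: "green",
    Status.FAILED: "red",
}


def color_for(status, package_running=True):
    """Return the designer color for a status, or None once execution has stopped."""
    if not package_running:
        return None
    return STATUS_COLORS[status]


for status in Status:
    print(f"{status.value}: {color_for(status)}")
```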
Progress Tab
• Enabled when a package executes in BIDS and the Debug Progress Reporting option is enabled under the SSIS menu
• The Progress tab lists tasks and containers in execution order and includes the start and finish times, warnings, and error messages.
• After you stop package execution, the progress information remains available on the Execution Results tab.
SSIS Server and Job Activity Monitor
© 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.