Text of Managing etl with microsoft sql server integration services
Why ColdFusion isnt the best tool in your toolbelt
Manager of Data and Development for Data Diver Technologies in Chandler, AZ Managed the Phoenix CFUG Ive worked with ColdFusion and SQL Server since the late 90s Hates CCS front end work and CSS hates me How to find me on the Interwebs: Email: firstname.lastname@example.org Twitter: @cf_coder
Extract, Transform and Load Can be data from many types of sources Flat Text files, Excel, XML, Other Databases Destinations can also vary Cubes, Flat Text Files, Excel, XML, Other Databases Basic Process Get Data, Mess with it and Stuff it somewhere else
Familiarity with platform is tempting Inherently Slow Reads and Transforms data Line by line Line by line kills performance Doesnt take advantage of processes inherent in the DB platform for bulk data processing
Native database functionality for bulk data BCP : Bulk Copy Process Clunky Requires Command Text files Run from command line or Powershell Import Data Wizard (SQL Server) One off Imports, no transformations
SQL Server Integration Services (SSIS) Runs in BIDS Business Intelligence Development Studio Multiple inputs and outputs Workflow and process based Scriptable and dynamic inputs
Starting up the environment Project Based Can open just files, but projects allow for more options. Integration with source control SVN (Tortise and VisualSVN)/GIT/Others
Project Deployment Models Package Deployment Model Stores packages in a disk based folder structure Project Deployment Model Deploys packages to a special database Either model can be run outside of BIDS.
Connection Managers These will set up the input/output files and database connections. Can be set at project or package level. Create DB Connection Create Connections to any Files Can create new files for output or input
Workflow using Pre-Built Components Lots of options Get Files, run commands, loop over things Primarily Execute SQL and Data Flow Tasks Connectors Go/No Go, evaluation, data viewers Red connectors re-direct on error.
Variables Create at the package level Set at the component level Uses most Basic Types of variable classes Ints, Strings, Booleans, Dates and Objects Objects are typically Recordsets but can also be Arrays and COM objects.
File Connections Multi-File pulls from Directories QUERY Result sets Web Services OLEDB/ODBC Connections Getting Data Into Your Package
FTP Component Built in and very simple Does not do advanced FTP functions HTTP File Requests No built in component for this Use Script or Command Line (wget)
4/10/14 Data Flow Task Heavy lifting of getting data from A to B Utilizes the pre-defined connections Can create them on the fly if necessary Can transform data in the pipeline en-route to the destination tables/files
4/10/14 Change the data in the pipeline Can be helpful if youre not a SQL person Editors make choosing transformations easy Performance can be impacted Anything that impedes the pipeline slows the data transfer process down. Becomes a larger issue with larger datasets
4/10/14 Component Level Errors Can Redirect Flow to another path Allows for cleanup of temp tables/files, send Email to administrator Data Flow Errors Redirection of Bad Rows to an Error File Can trap General or Truncation Errors.
For and For Each Containers For Containers are basically count loopers For Each Containers Loops over files, arrays, and objects Must be used with Variables Great way to deal with sequential files
The Marketing Database of Choice! Not terribly difficult to work with Data is in UTF-8 format, Nvarchar Varchar Need special connection parameters to skip top level rows Make Sure project Debug Options set to 32-bit
Report Format (multi-row records) Most difficult format to work with Conditional Split Transformer Built in and works with the pipeline Cursor and String Parsing Allows for keeping rows grouped with IDs
XML You must have a DTD SSIS will create one for you Data is in UTF-8 format Either Transform to VARCHAR or use NVARCHAR for result
Cannot call packages directly from ColdFusion Run manually through BIDS (easiest) SQL Agent - Scheduler for SQL Server Options for the Agent Run the Agent on a schedule and look for work Set up the Agent task and then execute via TSQL
Executing a SQL Agent Job from TSQL You need to be in MSDB to run the Agent. Access to MSDB limited to SysAdmin roles. USE msdb ; GO EXEC dbo.sp_start_job N'AgentTaskName' ; GO
Data Validation Delay Validation YES SSIS will check for existence of Tables/Files. Create temp tables beforehand or turn off. Excel Files Must use 32-bit debug execution mode. Data Type conversions Try importing to VARCHARs and convert afterwards if needed.
Learn SQL String Manipulation! This is the toughest aspect of dealing with text files and textual data. Very limited number of functions to use Left(), Right, Substring(), Len() CharIndex() and PatIndex() String Length Math
Temp Tables Where is TempDB? #Table vs. ##Table vs. Temp_Table Usage as part of aggregate queries Make sure you have enough space for TempDB operations. When dealing with large data sets, TempDB can get very large very fast.