
Business Intelligence ETL


Intelijen bisnis (Business intelligence)

From Indonesian Wikipedia, the free encyclopedia

The term intelijen bisnis (English: business intelligence, BI) refers to the technologies, applications, and practices for collecting, integrating, analyzing, and presenting business information, and sometimes to the information itself. The purpose of business intelligence is to support business decision-making.

BI systems provide historical, current, and predictive views of business operations, mainly using data that has been gathered into a data warehouse and sometimes also drawing on operational data. Software supports the use of this information by assisting with extraction, analysis, and reporting. BI applications handle sales, production, finance, and various other sources of business data for these purposes, which chiefly include business performance management. Information can also be obtained from comparable companies to produce a benchmark.

 


Business intelligence

From Wikipedia, the free encyclopedia

Business intelligence (BI) mainly refers to computer-based techniques used in identifying, extracting, and analyzing business data, such as sales revenue by products and/or departments, or by associated costs and incomes.[1]

BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, business performance management, benchmarking, text mining and predictive analytics.

Business intelligence aims to support better business decision-making. Thus a BI system can be called a decision support system (DSS).[2] Though the term business intelligence is sometimes used as a synonym for competitive intelligence, because they both support decision making, BI uses technologies, processes, and applications to analyze mostly internal, structured data and business processes while competitive intelligence gathers, analyzes and disseminates information with a topical focus on company competitors. Business intelligence understood broadly can include the subset of competitive intelligence.[3]

Contents

1 History
2 Business intelligence and data warehousing
3 Business intelligence and business analytics
4 Applications in an enterprise
5 Requirements gathering
    5.1 Approach
    5.2 Preparation
    5.3 Identify the interview team
        5.3.1 Research the organization
            5.3.1.1 Select the interviewees
            5.3.1.2 Develop the interview questionnaires
        5.3.2 Schedule and sequence the interviews
            5.3.2.1 Prepare the interviewees
    5.4 Issues with requirements gathering and interviews
6 Prioritization of business intelligence projects
7 Success factors of implementation
8 User aspect
9 Marketplace
    9.1 Industry-specific
10 Semi-structured or unstructured data
    10.1 Unstructured data vs. Semi-structured data
    10.2 Problems with semi-structured or unstructured data
    10.3 The use of metadata
11 Future
12 See also
13 References

History

In a 1958 article, IBM researcher Hans Peter Luhn used the term business intelligence. He defined intelligence as: "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."[4]

Business intelligence as it is understood today is said to have evolved from the decision support systems which began in the 1960s and developed throughout the mid-80s. DSS originated in the computer-aided models created to assist with decision making and planning. From DSS, data warehouses, Executive Information Systems, OLAP and business intelligence came into focus beginning in the late 80s.

In 1989 Howard Dresner (later a Gartner Group analyst) proposed "business intelligence" as an umbrella term to describe "concepts and methods to improve business decision making by using fact-based support systems."[2] It was not until the late 1990s that this usage was widespread.[5]

Business intelligence and data warehousing

Often BI applications use data gathered from a data warehouse or a data mart. However, not all data warehouses are used for business intelligence, nor do all business intelligence applications require a data warehouse.

In order to distinguish between concepts of business intelligence and data warehouses, Forrester Research often defines business intelligence in one of two ways:

Using a broad definition: "Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making."[6]

When using this definition, business intelligence also includes technologies such as data integration, data quality, data warehousing, master data management, text and content analytics, and many others that the market sometimes lumps into the Information Management segment. Therefore, Forrester refers to data preparation and data usage as two separate, but closely linked segments of the business intelligence architectural stack.

Forrester defines the latter, narrower business intelligence market as "referring to just the top layers of the BI architectural stack such as reporting, analytics and dashboards."[7]

Business intelligence and business analytics

Thomas Davenport has argued that business intelligence should be divided into querying, reporting, OLAP, an "alerts" tool, and business analytics. In this definition, business analytics is the subset of BI based on statistics, prediction, and optimization.[8]

Applications in an enterprise

Business Intelligence can be applied to the following business purposes (MARCKM), in order to drive business value:[citation needed]

1. Measurement – program that creates a hierarchy of Performance metrics (see also Metrics Reference Model) and Benchmarking that informs business leaders about progress towards business goals (AKA Business process management).

2. Analytics – program that builds quantitative processes for a business to arrive at optimal decisions and to perform Business Knowledge Discovery. Frequently involves: data mining, process mining, statistical analysis, Predictive analytics, Predictive modeling, Business process modeling.

3. Reporting/Enterprise Reporting – program that builds infrastructure for Strategic Reporting to serve the Strategic management of a business, NOT Operational Reporting. Frequently involves: Data visualization, Executive information system, OLAP.

4. Collaboration/Collaboration platform – program that gets different areas (both inside and outside the business) to work together through Data sharing and Electronic Data Interchange.

5. Knowledge Management – program to make the company data driven through strategies and practices to identify, create, represent, distribute, and enable adoption of insights and experiences that are true business knowledge. Knowledge Management leads to Learning Management and Regulatory compliance/Compliance.

Requirements gathering

According to Ralph Kimball,[9] the requirements of business users impact nearly every decision made throughout the design and implementation of a DW/BI system. The business requirements relate to all aspects of the daily business processes and hence are critical to successful data warehousing. Business requirements analysis occurs at two distinct levels:

Macro level: understand business needs and priorities relative to the overall business strategy

Micro level: understand user needs and desires in the context of a single, relatively narrowly defined project.

Approach

There are two basic interactive techniques for gathering requirements:

Conducting interviews: Speaking with users about their jobs, their objectives, and their challenges. This is either done with individuals or small groups.

Facilitated sessions and seminars that encourage creative mind-mapping.

Preparation

Identify the interview team

Lead interviewer – directs the questioning.

Scribe – takes notes during the interview. A tape recorder may be used to supplement the scribe.

Observers (optional) – watch but do not contribute. This may be for the purpose of training the observers in the interview approach, or so that the observers can comment on the interview after the event.

Research the organization

Review reports on business operations and the relevant parts of the annual report to gain insight into the organizational structure. If applicable, obtain a copy of the documentation resulting from the latest internal business/IT strategy and planning meeting.

Select the interviewees

Select a cross section of representatives. Study the organization to get a good idea of all the stakeholders in the project. These include:

Business interviewees (to understand the key business processes)


IT and Compliance/Security Interviewees (to assess preliminary feasibility of the underlying operational source systems to support the requirements emerging from the business side of the house)

Develop the interview questionnaires

Multiple questionnaires should be developed because the questioning will vary by job function and level.

The questionnaires for the data audit sessions will differ from business requirements questionnaires

Be structured. This will help the interview flow and help organize your thoughts before the interview.

Schedule and sequence the interviews

Scheduling and rescheduling take time, so arrange the interviews well in advance. Sequence your interviews by beginning with the business driver, followed by the business sponsor. This is to understand the playing field from their perspective. The optimal sequence would be:

1. Business driver
2. Business sponsor
3. An interviewee from the middle of the organizational hierarchy
4. Bottom of the organizational hierarchy

The bottom is a disastrous place to begin because you have no idea where you are headed. The top is great for overall vision, but you need the business background, confidence, and credibility to converse at those levels. If you are not adequately prepared with in-depth business familiarity, the safest route is to begin in the middle of the organization.

Prepare the interviewees

Make sure the interviewees are appropriately briefed and prepared to participate. As a minimum, a letter should be emailed to all interview participants to inform them about the process and the importance of their participation and contribution. The letter should explain that the goal is to understand their job responsibilities and business objectives, which then translate into the information and analyses required to get their job done. In addition they should be asked to bring copies of frequently used reports or spreadsheet analyses.

The letter should be signed by a high-ranking sponsor, someone well respected by the interviewees. It is advisable not to attach a list of the fifty questions you might ask in hopes that the interviewees will come prepared with answers. The odds are that they won’t take the time to prepare responses and may even be intimidated by the volume of your questions.

Issues with requirements gathering and interviews

The process of conducting an interview may seem exhausting at first, but the ground rule is to be well prepared in all steps. It may be a good idea to study questioning techniques before conducting the interview. Ask open-ended questions such as why, how, what-if, and what-then questions. Ask unbiased questions.

Poorly phrased questions can lead to wrong answers and, in the worst case, to wrong requirements being gathered. The whole process is costly in time and resources, and wrong data can slow down the development of the whole BI installation. Be sure that everyone on the interview team is aware of their role so that everything goes as planned. The next part is to synthesize around the business processes.

Prioritization of business intelligence projects

It is often difficult to provide a positive business case for business intelligence (BI) initiatives and often the projects will need to be prioritized through strategic initiatives. Here are some hints to increase the benefits for a BI project.

As described by Kimball[10] you must determine the tangible benefits such as eliminated cost of producing legacy reports.

Enforce access to data for the entire organization. In this way even a small benefit, such as a few minutes saved, will make a difference when it is multiplied by the number of employees in the entire organization.

As described by Ross, Weill & Robertson for Enterprise Architecture,[11] consider letting the BI project be driven by other business initiatives with excellent business cases. To support this approach, the organization must have Enterprise Architects who will be able to detect suitable business projects.
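The second hint above, that a small per-employee saving adds up across the whole organization, can be made concrete with a short calculation. This is only an illustrative sketch: the function name `annual_benefit` and every input value (head count, minutes saved, working days, hourly cost) are hypothetical assumptions, not figures from the source.

```python
def annual_benefit(employees, minutes_saved_per_day, working_days, hourly_cost):
    """Estimate the yearly value of a small per-employee time saving.

    All inputs are hypothetical planning assumptions.
    """
    hours_saved = employees * minutes_saved_per_day * working_days / 60
    return hours_saved * hourly_cost


# Hypothetical organization: 2,000 employees each saving 3 minutes a day.
benefit = annual_benefit(employees=2000, minutes_saved_per_day=3,
                         working_days=220, hourly_cost=40.0)
print(f"Estimated annual benefit: {benefit:,.0f}")
```

With these assumed inputs, three minutes a day corresponds to 22,000 hours a year, which is why organization-wide access can matter even when the individual gain looks negligible.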

Success factors of implementation

Before implementing a BI solution, it is worth taking several factors into consideration. According to Kimball et al., these are the three critical areas that you need to assess within your organization before getting ready to do a BI project:[12]

1. The level of commitment and sponsorship of the project from senior management
2. The level of business need for creating a BI implementation
3. The amount and quality of business data available

Business Sponsorship

The commitment and sponsorship of senior management is, according to Kimball et al., the most important criterion for assessment.[13] This is because strong management backing helps overcome shortcomings elsewhere in the project. But as Kimball et al. state: “even the most elegantly designed DW/BI system cannot overcome a lack of business [management] sponsorship”.[14] It is very important that the management personnel who participate in the project have a vision and an idea of the benefits and drawbacks of implementing a BI system. The best business sponsor should have organizational clout and should be well connected within the organization. Ideally, the business sponsor is demanding but also able to be realistic and supportive if the implementation runs into delays or drawbacks. The management sponsor also needs to be able to assume accountability and to take responsibility for failures and setbacks on the project. It is imperative that there is support from multiple members of management so that the project will not fail if one person leaves the steering group. However, having many managers work together on the project can also mean that several different interests attempt to pull the project in different directions, for instance if different departments want to put more emphasis on their own usage of the implementation. This issue can be countered by an early and specific analysis of the business areas that will benefit the most from the implementation. All stakeholders in the project should participate in this analysis so that they feel ownership of the project and find common ground. Another management problem that should be addressed before the start of implementation is an overly aggressive business sponsor: a manager who gets carried away by the possibilities of BI and wants the DW or BI implementation to include several sets of data that were not part of the original planning phase. Since adding extra data will most likely add many months to the original plan, it is a good idea to make sure that the sponsor is aware of the consequences of such requests.

Implementation should be driven by clear business needs.

Because of the close relationship with senior management, another critical thing that needs to be assessed before the project is implemented is whether there actually is a business need and whether the implementation offers a clear business benefit.[15] The needs and benefits of the implementation are sometimes driven by competition and the need to gain an advantage in the market. Another reason for a business-driven approach to BI implementation is the acquisition of other organizations: when acquisitions enlarge the original organization, it can be beneficial to implement DW or BI in order to create more oversight.

The amount and quality of the available data.

This ought to be the most important factor, since without good data it does not really matter how good your management sponsorship or your business-driven motivation is. If you do not have the data, or the data is not of sufficient quality, any BI implementation will fail. Before implementation it is a very good idea to do data profiling. This analysis describes the “content, consistency and structure [..]”[15] of the data. It should be done as early as possible in the process, and if the analysis shows that your data is lacking, it is a good idea to put the project on the shelf temporarily while the IT department figures out how to do proper data collection.
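As a rough illustration of what data profiling can look like in practice, the sketch below summarizes each column of a small record set by missing values, distinct values, and observed types. It is a minimal example under stated assumptions: the `profile` function, the column names, and the sample records are all hypothetical, and real profiling tools report far more than this.

```python
from collections import Counter


def profile(rows):
    """Summarize each column of a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report


# Hypothetical sample records with typical quality problems.
sample = [
    {"customer_id": 1, "country": "DK", "revenue": 120.5},
    {"customer_id": 2, "country": "", "revenue": "n/a"},
    {"customer_id": 2, "country": "DK", "revenue": None},
]

for column, stats in profile(sample).items():
    print(column, stats)
```

In this assumed sample, the report would flag a duplicated `customer_id`, a missing `country`, and a `revenue` column that mixes numbers with text, which are the kinds of findings that suggest fixing data collection before the BI project continues.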

Other scholars have added more factors to the list than these three. In his thesis “Critical Success Factors of BI Implementation”,[16] Naveen Vodapalli researches different factors that can impact the final BI product. He lists seven crucial success factors for the implementation of a BI project:

1. Business-driven methodology and project management
2. Clear vision and planning
3. Committed management support & sponsorship
4. Data management and quality
5. Mapping solutions to user requirements
6. Performance considerations of the BI system
7. Robust and expandable framework

User aspect

Some considerations must be made in order to successfully integrate the usage of business intelligence systems in a company. Ultimately the BI system must be accepted and utilized by the users in order for it to add value to the organization.[17][18] If the usability of the system is poor, the users may become frustrated and spend a considerable amount of time figuring out how to use the system or may not be able to really use the system. If the system does not add value to the users' mission, they will simply not use it.[18]

In order to increase the user acceptance of a BI system, it may be advisable to consult the business users at an early stage of the DW/BI lifecycle, for example at the requirements gathering phase.[17] This can provide an insight into the business process and what the users need from the BI system. There are several methods for gathering this information, such as questionnaires and interview sessions.

When gathering the requirements from the business users, the local IT department should also be consulted in order to determine to which degree it is possible to fulfill the business's needs based on the available data.[17]

Taking on a user-centered approach throughout the design and development stage may further increase the chance of rapid user adoption of the BI system.[18]

Besides focusing on the user experience offered by the BI applications, it may also be possible to motivate the users to utilize the system by adding an element of competition. Kimball[17] suggests implementing a function on the Business Intelligence portal website where reports on system usage can be found. By doing so, managers can see how well their departments are doing and compare themselves to others, and this may spur them to encourage their staff to utilize the BI system even more.
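A usage report of the kind described above could be as simple as counting report views per department. The sketch below is one possible shape for it, not a description of any particular portal: the log layout, the department and user names, and the function `usage_by_department` are hypothetical.

```python
from collections import Counter

# Hypothetical usage log: (department, user, report opened).
usage_log = [
    ("Sales", "anna", "pipeline"),
    ("Sales", "bo", "pipeline"),
    ("Finance", "carl", "cash-flow"),
    ("Sales", "anna", "forecast"),
]


def usage_by_department(log):
    """Count report views per department, most active first."""
    counts = Counter(department for department, _user, _report in log)
    return counts.most_common()


for department, views in usage_by_department(usage_log):
    print(f"{department}: {views} report views")
```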

In a 2007 article, H. J. Watson gives an example of how the competitive element can act as an incentive.[19] Watson describes how a large call centre has implemented performance dashboards for all the call agents and that monthly incentive bonuses have been tied up to the performance metrics. Furthermore the agents can see how their own performance compares to the other team members. The implementation of this type of performance measurement and competition significantly improved the performance of the agents.

Other elements which may increase the success of BI can be by involving senior management in order to make BI a part of the organizational culture and also by providing the users with the necessary tools, training and support.[19] By offering user training, more people may actually use the BI application.[17]

Providing user support is necessary in order to maintain the BI system and assist users who run into problems.[18] User support can be incorporated in many ways, for example by creating a website. The website should contain useful content and tools for finding the necessary information. Furthermore, helpdesk support can be used. The helpdesk can be staffed by, for example, power users or the DW/BI project team.[17]

Marketplace

There are a number of business intelligence vendors, often categorized into the remaining independent "pure-play" vendors and the consolidated "megavendors" which have entered the market through a recent trend of acquisitions in the BI industry.[20]

Some companies adopting BI software decide to pick and choose from different product offerings (best-of-breed) rather than purchase one comprehensive integrated solution (full-service).[21]

Industry-specific

In some sectors, business intelligence systems must take specific considerations into account, such as governmental banking regulations. The information collected by banking institutions and analyzed with BI software must be protected from some groups or individuals, while being fully available to other groups or individuals. Therefore BI solutions must be sensitive to those needs and be flexible enough to adapt to new regulations and changes to existing laws.

Semi-structured or unstructured data

Businesses create a huge amount of valuable information in the form of e-mails, memos, notes from call-centers, news, user groups, chats, reports, web-pages, presentations, image-files, video-files, and marketing material. According to Merrill Lynch, more than 85 percent of all business information exists in these forms. These information types are called either semi-structured or unstructured data. However, organizations often only use these documents once.[22]

The management of semi-structured data is recognized as a major unsolved problem in the information technology industry.[23] According to projections from Gartner (2003), white collar workers will spend anywhere from 30 to 40 percent of their time searching, finding and assessing unstructured data. BI uses both structured and unstructured data, but the former is easy to search, and the latter contains a large quantity of the information needed for analysis and decision making.[23][24] Because of the difficulty of properly searching, finding and assessing unstructured or semi-structured data, organizations may not draw upon these vast reservoirs of information, which could influence a particular decision, task or project. This can ultimately lead to poorly-informed decision making.[22]

Therefore, when designing a Business Intelligence/DW-solution, the specific problems associated with semi-structured and unstructured data must be accommodated as well as those for the structured data.[24]

Unstructured data vs. Semi-structured data

Unstructured and semi-structured data have different meanings depending on their context. In the context of relational database systems, they refer to data that cannot be stored in columns and rows. Such data must be stored in a BLOB (binary large object), a catch-all data type available in most relational database management systems.

But many of these data types, like e-mails, word processing text files, PPTs, image-files, and video-files conform to a standard that offers the possibility of metadata. Metadata can include information such as author and time of creation, and this can be stored in a relational database. Therefore it may be more accurate to talk about this as semi-structured documents or data,[23] but no specific consensus seems to have been reached.
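The split described above, an opaque body plus metadata that fits in columns, can be sketched with Python's built-in `sqlite3` module. The table name `document`, its columns, and the sample files are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document (
        id INTEGER PRIMARY KEY,
        filename TEXT NOT NULL,
        author TEXT,
        created TEXT,      -- ISO 8601 timestamp
        content BLOB       -- the unstructured body, stored as-is
    )
    """
)

# Hypothetical documents; the body is opaque to the database.
docs = [
    ("q3-review.ppt", "A. Jensen", "2009-10-01T09:30:00", b"\x00binary slides"),
    ("complaint.eml", "B. Larsen", "2009-10-03T14:05:00", b"raw e-mail text"),
]
conn.executemany(
    "INSERT INTO document (filename, author, created, content) VALUES (?, ?, ?, ?)",
    docs,
)

# The metadata is queryable even though the content is not.
rows = conn.execute(
    "SELECT filename FROM document WHERE author = ? ORDER BY created",
    ("B. Larsen",),
).fetchall()
print(rows)
```

The query can filter and sort on author and creation time, while the `content` column is only stored and retrieved, which is the sense in which such documents are semi-structured rather than fully unstructured.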

Problems with semi-structured or unstructured data

There are several challenges to developing BI with semi-structured data. According to Inmon & Nesavich,[25] some of those are:

1. Physically accessing unstructured textual data – unstructured data is stored in a huge variety of formats.

2. Terminology – Among researchers and analysts, there is a need to develop a standardized terminology.

3. Volume of data – As stated earlier, more than 85 percent of all business information exists as semi-structured or unstructured data. Couple that with the need for word-to-word and semantic analysis.

4. Searchability of unstructured textual data – A simple search on some data, e.g. apple, results in links where there is a reference to that precise search term. Inmon & Nesavich (2008)[25] give an example: “a search is made on the term felony. In a simple search, the term felony is used, and everywhere there is a reference to felony, a hit to an unstructured document is made. But a simple search is crude. It does not find references to crime, arson, murder, embezzlement, vehicular homicide, and such, even though these crimes are types of felonies.”
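The felony example in the last item can be illustrated by comparing a literal term search with one that also matches narrower terms from a taxonomy. This is a toy sketch, not the method Inmon & Nesavich describe: the `TAXONOMY` mapping, the sample documents, and both function names are hypothetical.

```python
# Hypothetical taxonomy: each broad term maps to narrower terms.
TAXONOMY = {
    "felony": {"arson", "murder", "embezzlement", "vehicular homicide"},
}

# Hypothetical document collection.
DOCUMENTS = {
    "doc1": "The suspect was charged with a felony last year.",
    "doc2": "Investigators believe the fire was arson.",
    "doc3": "The audit uncovered embezzlement in the sales office.",
    "doc4": "Quarterly revenue rose in all departments.",
}


def simple_search(term, documents):
    """Return ids of documents containing the literal term."""
    term = term.lower()
    return sorted(doc_id for doc_id, text in documents.items()
                  if term in text.lower())


def expanded_search(term, documents, taxonomy):
    """Also match the narrower terms listed under the search term."""
    terms = {term.lower()} | taxonomy.get(term.lower(), set())
    return sorted(doc_id for doc_id, text in documents.items()
                  if any(t in text.lower() for t in terms))


print(simple_search("felony", DOCUMENTS))
print(expanded_search("felony", DOCUMENTS, TAXONOMY))
```

The simple search finds only the document that contains the word itself, while the expanded search also returns the documents about arson and embezzlement. Substring matching is used here for brevity; a real system would need proper tokenization and a maintained vocabulary.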

The use of metadata

To solve problems with searchability and assessment of data, it is necessary to know something about the content. This can be done by adding context through the use of metadata.[22] A lot of systems already capture some metadata (e.g. filename, author, size, etc.), but more useful would be metadata about the actual content – e.g. summaries, topics, people or companies mentioned. Two technologies designed for generating metadata about content are automatic categorization and information extraction.
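To give a feel for content metadata, the sketch below derives topics and a one-sentence summary from a piece of text using keyword rules. It is a deliberately naive stand-in for automatic categorization: the `CATEGORY_KEYWORDS` rules, the `generate_metadata` function, and the sample note are hypothetical, and production systems typically rely on trained classifiers and extraction models rather than fixed keyword lists.

```python
import re

# Hypothetical category rules: topic -> keywords that signal it.
CATEGORY_KEYWORDS = {
    "finance": {"invoice", "revenue", "budget"},
    "support": {"complaint", "refund", "outage"},
}


def generate_metadata(text):
    """Derive simple content metadata: topics and a one-sentence summary."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    topics = sorted(topic for topic, keywords in CATEGORY_KEYWORDS.items()
                    if words & keywords)
    # Naive summary: the first sentence of the text.
    summary = re.split(r"(?<=[.!?])\s+", text.strip())[0]
    return {"topics": topics, "summary": summary}


note = "Customer filed a complaint about a late invoice. A refund was issued."
print(generate_metadata(note))
```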

Future

A 2009 Gartner paper predicted[26] these developments in the business intelligence market:


Because of lack of information, processes, and tools, through 2012, more than 35 percent of the top 5,000 global companies will regularly fail to make insightful decisions about significant changes in their business and markets.

By 2012, business units will control at least 40 percent of the total budget for business intelligence.

By 2012, one-third of analytic applications applied to business processes will be delivered through coarse-grained application mashups.

A 2009 Information Management special report predicted the top BI trends: "green computing, social networking, data visualization, mobile BI, predictive analytics, composite applications, cloud computing and multitouch."[27]

According to a study by the Aberdeen Group, there has been increasing interest in Software-as-a-Service (SaaS) business intelligence over the past years, with twice as many organizations using this deployment approach as one year ago – 15% in 2009 compared to 7% in 2008.[citation needed]

An article by InfoWorld’s Chris Kanaracus points out similar growth data from research firm IDC, which predicts the SaaS BI market will grow 22 percent each year through 2013 thanks to increased product sophistication, strained IT budgets, and other factors.[28]

See also

Accounting intelligence
Analytic applications
Artificial intelligence marketing
Business Intelligence 2.0
Business process discovery
Business process management
Business activity monitoring
Business service management
Customer dynamics
Data Presentation Architecture
Data visualization
Decision engineering
Enterprise planning systems
Document intelligence
Integrated business planning
Location intelligence
Meteorological intelligence
Mobile business intelligence
Operational intelligence
Process mining
Runtime intelligence
Sales intelligence
Spend management
Test and learn

References

1. "BusinessDictionary.com definition". Retrieved 17 March 2010.
2. D. J. Power (2007-03-10). "A Brief History of Decision Support Systems, version 4.0". DSSResources.COM. Retrieved 2008-07-10.
3. Kobielus, James (April 30, 2010). "What’s Not BI? Oh, Don’t Get Me Started....Oops Too Late...Here Goes....". "“Business” intelligence is a non-domain-specific catchall for all the types of analytic data that can be delivered to users in reports, dashboards, and the like. When you specify the subject domain for this intelligence, then you can refer to “competitive intelligence,” “market intelligence,” “social intelligence,” “financial intelligence,” “HR intelligence,” “supply chain intelligence,” and the like."
4. "A Business Intelligence System" (PDF). IBM Journal. October 1958. Retrieved 2008-07-10.
5. Power, D. J. "A Brief History of Decision Support Systems". Retrieved November 1, 2010.
6. Evelson, Boris (November 21, 2008). "Topic Overview: Business Intelligence".
7. Evelson, Boris (April 29, 2010). "Want to know what Forrester's lead data analysts are thinking about BI and the data domain?".
8. Tom Davenport. Interview. Analytics at Work: Q&A with Tom Davenport. January 4, 2010.
9. Kimball et al., 2008: 63
10. Ralph Kimball et al. "The Data warehouse Lifecycle Toolkit" (2nd ed.), page 29
11. Jeanne W. Ross, Peter Weil, David C. Robertson (2006) "Enterprise Architecture As Strategy", page 117.
12. Kimball et al. 2008: pp. 298
13. Kimball et al., 2008: 16
14. Kimball et al., 2008: 18
15. Kimball et al., 2008: 17
16. Naveen K Vodapalli (2009-11-02). "Critical Success Factors of BI Implementation". IT University of Copenhagen. Retrieved 2009-11-12.
17. Ralph Kimball et al. "The Data warehouse Lifecycle Toolkit" (2nd ed.)
18. Swain Scheps "Business Intelligence For Dummies", 2008, ISBN 978-0-470-12726-0
19. H.J. Watson and B.H. Wixom "The Current State of Business Intelligence", Computer Volume 40 Issue 9, September 2007
20. Pendse, Nigel (March 7, 2008). "Consolidations in the BI industry". The OLAP Report.
21. Imhoff, Claudia (April 4, 2006). "Three Trends in Business Intelligence Technology".
22. R. Rao "From Unstructured Data to Actionable Information", IT Pro, November | December 2003, p. 14-16
23. Blumberg, R. & S. Atre "The Problem with Unstructured Data", DM Review November 2003b
24. Negash, S "Business Intelligence", Communications of the Association of Information Systems, vol. 13, 2004, p. 177-195.
25. Inmon, B. & A. Nesavich, "Unstructured Textual Data in the Organization" from "Managing Unstructured data in the organization", Prentice Hall 2008, p. 1-13


26. "Gartner Reveals Five Business Intelligence Predictions for 2009 and Beyond", http://www.gartner.com/it/page.jsp?id=856714
27. Campbell, Don (June 23, 2009). "10 Red Hot BI Trends". Information Management.
28. http://infoworld.com/d/cloud-computing/saas-bi-growth-will-soar-in-2010-511

Extract, transform, load

From Wikipedia, the free encyclopedia

Extract, transform and load (ETL) is a process in database usage and especially in data warehousing that involves:

Extracting data from outside sources
Transforming it to fit operational needs (which can include quality levels)
Loading it into the end target (database or data warehouse)


Extract

The first part of an ETL process involves extracting the data from the source systems.


[Figure: ETL Architecture Pattern]

Most data warehousing projects consolidate data from different source systems. Each separate system may also use a different data organization or format. Common data source formats are relational databases and flat files, but sources may also include non-relational database structures such as Information Management System (IMS), other data structures such as Virtual Storage Access Method (VSAM) or Indexed Sequential Access Method (ISAM), or even outside sources fetched through web spidering or screen-scraping. Streaming the extracted data and loading it on the fly into the destination database is another way of performing ETL when no intermediate data storage is required. In general, the goal of the extraction phase is to convert the data into a single format that is appropriate for transformation processing.

An intrinsic part of extraction is parsing the extracted data to check whether it meets an expected pattern or structure. If it does not, the data may be rejected entirely or in part.
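The parse-and-check step can be illustrated with a minimal Python sketch. It assumes a comma-separated flat file with the columns roll_no, age and salary; the function name `extract_rows`, the validation rule and the sample data are hypothetical and only serve the illustration. A mismatching header rejects the extract entirely, while a malformed row is rejected on its own.

```python
import csv
import io

EXPECTED_COLUMNS = ["roll_no", "age", "salary"]  # assumed source layout


def extract_rows(text):
    """Parse CSV text and return (accepted, rejected) lists of row dicts."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        # Full rejection: the extract does not have the expected structure.
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    accepted, rejected = [], []
    for row in reader:
        # Partial rejection: keep only rows whose roll_no and age are
        # whole numbers (an assumed rule for this sketch).
        if (row["roll_no"] or "").isdigit() and (row["age"] or "").isdigit():
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


sample = "roll_no,age,salary\n1,34,5000\nx2,41,6100\n3,29,\n"
accepted, rejected = extract_rows(sample)
print(len(accepted), "accepted;", len(rejected), "rejected")  # 2 accepted; 1 rejected
```

The row with an empty salary is accepted here on purpose; whether missing values are filtered is treated as a transformation decision rather than a parsing one.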

Transform


The transform stage applies a series of rules or functions to the extracted data from the source to derive the data for loading into the end target. Some data sources will require very little or even no manipulation of data. In other cases, one or more of the following transformation types may be required to meet the business and technical needs of the target database:

Selecting only certain columns to load (or selecting null columns not to load). For example, if the source data has three columns (also called attributes), say roll_no, age and salary, then the selection may take only roll_no and salary. Similarly, the selection mechanism may ignore all records where salary is not present (salary = null).
Translating coded values (e.g., if the source system stores 1 for male and 2 for female, but the warehouse stores M for male and F for female). This calls for automated data cleansing; no manual cleansing occurs during ETL.
Encoding free-form values (e.g., mapping "Male" to "1" and "Mr" to "M")
Deriving a new calculated value (e.g., sale_amount = qty * unit_price)
Sorting
Joining data from multiple sources (e.g., lookup, merge)
Aggregation (for example, rollup: summarizing multiple rows of data, such as total sales for each store and for each region)
Generating surrogate-key values
Transposing or pivoting (turning multiple columns into multiple rows or vice versa)
Splitting a column into multiple columns (e.g., putting a comma-separated list specified as a string in one column as individual values in different columns)
Disaggregation of repeating columns into a separate detail table (e.g., moving a series of addresses in one record into single addresses in a set of records in a linked address table)
Looking up and validating the relevant data from tables or referential files for slowly changing dimensions
Applying any form of simple or complex data validation. If validation fails, the result may be a full, partial or no rejection of the data, so that none, some or all of the data is handed over to the next step, depending on the rule design and exception handling. Many of the above transformations may result in exceptions, for example, when a code translation encounters an unknown code in the extracted data.
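A few of the transformation types above (column selection, code translation, a derived value, aggregation and rejection of rows with an unknown code) can be sketched in Python. The record layout, the store and clerk fields, and the function names are assumptions made for this illustration, not part of any particular ETL tool.

```python
GENDER_CODES = {"1": "M", "2": "F"}  # source code -> warehouse code


def transform(rows):
    """Return (loaded, exceptions) after applying the rules to each row."""
    loaded, exceptions = [], []
    for row in rows:
        code = GENDER_CODES.get(row["gender"])
        if code is None:
            # Unknown code: reject this row only (partial rejection).
            exceptions.append(row)
            continue
        loaded.append({
            "store": row["store"],                          # selected column
            "gender": code,                                 # translated code
            "sale_amount": row["qty"] * row["unit_price"],  # derived value
        })
    return loaded, exceptions


def total_sales_by_store(rows):
    """Aggregate (roll up) sale_amount per store."""
    totals = {}
    for row in rows:
        totals[row["store"]] = totals.get(row["store"], 0) + row["sale_amount"]
    return totals


source = [
    {"store": "A", "gender": "1", "qty": 2, "unit_price": 10, "clerk": "x"},
    {"store": "A", "gender": "2", "qty": 1, "unit_price": 5, "clerk": "y"},
    {"store": "B", "gender": "9", "qty": 4, "unit_price": 3, "clerk": "z"},
]
loaded, exceptions = transform(source)
print(total_sales_by_store(loaded))  # {'A': 25}
```

The clerk column is never copied, which is the column-selection rule at work, and the row with gender code 9 ends up in the exception list instead of stopping the whole run.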

Load

The load phase loads the data into the end target, usually the data warehouse (DW). Depending on the requirements of the organization, this process varies widely. Some data warehouses overwrite existing information with cumulative information; the extracted data is frequently refreshed on a daily, weekly or monthly basis. Other data warehouses (or even other parts of the same data warehouse) add new data in a historicized form, for example, hourly. To understand this, consider a DW that is required to maintain sales records of the last year. This DW overwrites any data older than a year with newer data, but within that one-year window, data is entered in a historicized manner. The timing and scope of replacing or appending are strategic design choices that depend on the time available and the business needs. More complex systems can maintain a history and audit trail of all changes to the data loaded in the DW.
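The one-year sales window can be sketched with Python's built-in sqlite3 module. This is only one possible reading of the example: "overwriting" is modeled as appending the new rows and then deleting rows older than 365 days. The table name `sales`, its columns and the function `load_sales` are assumptions for the illustration.

```python
import sqlite3
from datetime import date, timedelta


def load_sales(conn, new_rows, today):
    """Append new rows, then drop rows older than one year before `today`."""
    cutoff = (today - timedelta(days=365)).isoformat()
    with conn:  # commit both steps together, or roll back on error
        conn.executemany(
            "INSERT INTO sales (sale_date, amount) VALUES (?, ?)", new_rows
        )
        # ISO dates stored as text compare correctly as strings.
        conn.execute("DELETE FROM sales WHERE sale_date < ?", (cutoff,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount INTEGER)")
today = date(2010, 6, 30)
load_sales(conn, [("2009-01-15", 100), ("2010-06-29", 250)], today)
remaining = conn.execute("SELECT sale_date, amount FROM sales").fetchall()
print(remaining)  # [('2010-06-29', 250)]
```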


As the load phase interacts with a database, the constraints defined in the database schema — as well as in triggers activated upon data load — apply (for example, uniqueness, referential integrity, mandatory fields), which also contribute to the overall data quality performance of the ETL process.
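As a hedged illustration of schema constraints acting during a load, the sketch below uses SQLite through Python's sqlite3 module. The `store` and `sale` tables and the `load` helper are hypothetical; the point is only that uniqueness, referential integrity and mandatory-field violations surface as errors at load time, so they can be collected as rejects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE store (store_id INTEGER PRIMARY KEY)")
# sale_id must be unique, store_id must exist in store, amount is mandatory.
conn.execute(
    """CREATE TABLE sale (
           sale_id  INTEGER PRIMARY KEY,
           store_id INTEGER NOT NULL REFERENCES store(store_id),
           amount   INTEGER NOT NULL
       )"""
)
conn.execute("INSERT INTO store VALUES (1)")
conn.commit()


def load(conn, rows):
    """Insert rows one by one and return those the schema rejected."""
    rejected = []
    for row in rows:
        try:
            with conn:  # each row is its own small transaction
                conn.execute("INSERT INTO sale VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError as exc:
            rejected.append((row, str(exc)))
    return rejected


rows = [
    (1, 1, 50),    # valid
    (1, 1, 70),    # duplicate sale_id: uniqueness violation
    (2, 99, 10),   # store 99 does not exist: referential integrity violation
    (3, 1, None),  # missing amount: mandatory field violation
]
rejected = load(conn, rows)
print(len(rejected), "rows rejected")  # 3 rows rejected
```

Loading row by row keeps one bad record from discarding the rest; a bulk load inside a single transaction would trade that tolerance for speed.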

For example, a financial institution might have information on a customer in several departments and each department might have that customer's information listed in a different way. The membership department might list the customer by name, whereas the accounting department might list the customer by number. ETL can bundle all this data and consolidate it into a uniform presentation, such as for storing in a database or data warehouse.
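The consolidation in this example can be sketched as follows. The sketch assumes that a cross-reference from customer name to customer number is available, which the text above does not state; the field names and the function `consolidate` are likewise hypothetical.

```python
def consolidate(membership, accounting, name_to_number):
    """Merge per-department records into one record per customer number."""
    customers = {}
    for rec in accounting:
        # Accounting already identifies the customer by number.
        customers[rec["number"]] = {
            "number": rec["number"],
            "balance": rec["balance"],
        }
    for rec in membership:
        # Membership identifies the customer by name, so look the number up.
        number = name_to_number.get(rec["name"])
        if number is None:
            continue  # no known match; a real job would report this record
        merged = customers.setdefault(number, {"number": number})
        merged["name"] = rec["name"]
        merged["member_since"] = rec["member_since"]
    return list(customers.values())


membership = [{"name": "A. Jones", "member_since": 2004}]
accounting = [{"number": 1017, "balance": 250}]
unified = consolidate(membership, accounting, {"A. Jones": 1017})
print(unified)
```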

Another way that companies use ETL is to move information to another application permanently. For instance, the new application might use another database vendor and most likely a very different database schema. ETL can be used to transform the data into a format suitable for the new application to use.

An example of this would be an Expense and Cost Recovery System (ECRS) such as used by accountancies, consultancies and lawyers. The data usually ends up in the time and billing system, although some businesses may also utilize the raw data for employee productivity reports to Human Resources (personnel dept.) or equipment usage reports to Facilities Management.

Tools

Programmers can set up ETL processes using almost any programming language, but building such processes from scratch can become complex. Increasingly, companies are buying ETL tools to help in the creation of ETL processes.[6]

By using an established ETL framework, one may increase one's chances of ending up with better connectivity and scalability. A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization. ETL tools have started to migrate into Enterprise Application Integration, or even Enterprise Service Bus, systems that now cover much more than just the extraction, transformation, and loading of data. Many ETL vendors now have data profiling, data quality, and metadata capabilities.

Data Manipulation Language

From Wikipedia, the free encyclopedia


Data Manipulation Language (DML) is a family of computer languages used by computer programs and/or database users to insert, delete and update data in a database (compared to DDL, which allows users to modify tables). Read-only querying, i.e. SELECT, of this data may be considered to be either part of DML or outside it, depending on the context.

Currently the most popular data manipulation language is that of SQL, which is used to retrieve and manipulate data in a relational database.[1] Other forms of DML are those used by IMS/DLI, CODASYL databases (such as IDMS), and others.

Data Manipulation Language comprises the 'SQL-data change' statements,[2] which modify stored data but not the schema or database objects. Manipulation of persistent database objects (e.g. tables or stored procedures) via the 'SQL-schema' statements,[2] rather than the data stored within them, is considered to be part of a separate Data Definition Language. In SQL these two categories are similar in their detailed syntax, data types, expressions etc., but distinct in their overall function.[2]

Data Manipulation Languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are:

SELECT ... FROM ... WHERE ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...

The purely read-only SELECT query statement is classed with the 'SQL-data' statements[2] and so is considered by the standard to be outside of DML. The SELECT ... INTO form is considered to be DML because it manipulates (i.e. modifies) data. In common practice though, this distinction is not made and SELECT is widely considered to be part of DML.[3]
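The four verbs listed above can be tried out in a short Python sketch that uses the built-in sqlite3 module with an in-memory SQLite database. The `employee` table and its columns are assumptions for the illustration; the CREATE TABLE statement is included only to set the scene and belongs to the Data Definition Language, not to DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines a schema object; it does not manipulate stored data.
conn.execute("CREATE TABLE employee (roll_no INTEGER PRIMARY KEY, salary INTEGER)")

# DML: statements that change the stored data.
conn.execute("INSERT INTO employee (roll_no, salary) VALUES (1, 5000)")
conn.execute("INSERT INTO employee (roll_no, salary) VALUES (2, 6000)")
conn.execute("UPDATE employee SET salary = 6500 WHERE roll_no = 2")
conn.execute("DELETE FROM employee WHERE roll_no = 1")

# Read-only query: classed as DML in common practice, outside it in the standard.
rows = conn.execute(
    "SELECT roll_no, salary FROM employee WHERE salary > 1000"
).fetchall()
print(rows)  # [(2, 6500)]
```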

Most SQL database implementations extend their SQL capabilities by providing imperative, i.e., procedural, languages. Examples of these are Oracle's PL/SQL and DB2's SQL PL.

Data manipulation languages differ in flavor and capability from one database vendor to the next. A number of standards have been established for SQL by ANSI,[1] but vendors still provide their own extensions to the standard while not implementing the entire standard.

There are two types of data manipulation languages:

Procedural
Declarative

Each SQL DML statement is a declarative command: as opposed to an imperative one, it describes what the program should accomplish rather than how to go about accomplishing it.
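The contrast can be illustrated with a minimal Python sketch on an in-memory SQLite database. The `employee` table, the raise of 100 and the function names are assumptions for this illustration. The procedural variant is written as a host-language loop rather than in a vendor procedural language such as PL/SQL, so it only approximates that style.

```python
import sqlite3


def make_db():
    """Create a small in-memory table to run both variants against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (roll_no INTEGER PRIMARY KEY, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)", [(1, 1000), (2, 2000), (3, 3000)]
    )
    return conn


def raise_declarative(conn):
    # One statement states the desired result; the engine decides how to do it.
    conn.execute("UPDATE employee SET salary = salary + 100 WHERE salary < 2500")


def raise_procedural(conn):
    # The program spells out the steps: fetch, test, and write back row by row.
    rows = conn.execute("SELECT roll_no, salary FROM employee").fetchall()
    for roll_no, salary in rows:
        if salary < 2500:
            conn.execute(
                "UPDATE employee SET salary = ? WHERE roll_no = ?",
                (salary + 100, roll_no),
            )


def salaries(conn):
    return conn.execute("SELECT salary FROM employee ORDER BY roll_no").fetchall()


declarative_db, procedural_db = make_db(), make_db()
raise_declarative(declarative_db)
raise_procedural(procedural_db)
print(salaries(declarative_db) == salaries(procedural_db))  # True
```

Both variants end in the same state; the difference lies in who decides the access path, the database engine or the program.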


Data manipulation languages were initially only used by computer programs, but (with the advent of SQL) have come to be used by people as well.

See also

CRUD
Data Definition Language
Data Control Language
Query language

References

"The SQL92 standard".

1. SQL92
2. SQL92 4.22.2, SQL-statements classified by function
3. "Data Manipulation Language Statements". Oracle. "Data manipulation language (DML) statements query or manipulate data in existing schema objects."