Big Data Architecture & Business Value
Tahereh Saheb
PhD in STS from RPI, NY
Assistant Professor at Tarbiat Modares
[email protected]
Big data is an enterprise asset and needs to be managed as an integrated element of your current Enterprise Architecture.
Big data: discovering patterns, predicting the future, and delivering prescriptive analytics
Big data adds depth to our analysis of events through providing broader and deeper data
What factors make some roads safer than others?
How much snowfall makes interstate highways dangerous?
Regardless of the amount of precipitation, are some highways safer than others?
Are the causes of accidents in the last three years the same as those of the previous five years?
Since the maximum speed limit in a particular state changed four years ago, what effects has this had on road safety?
Since two interstate highways were built three years ago, how have overall accident patterns changed compared with the period before they opened?
Is there any relationship between the number of licensed drivers in the state and road safety? If so, did this change occur after the two new highways opened?
Is there any relationship between drivers who have had accidents in bad weather and their overall driving record?
Is there any relationship between the distance from home to work and the likelihood of an accident? If so, at what time of day? On what day of the week?
Before designing a big data architecture
1 - Envision
2 - Determine Business Drivers & Problems
3 - Draw Business Information Map or Data Flow Diagram
4 - Design Big Data Architecture
What kind of insight and business value will using big data give your organization?
Is it possible to upgrade the company's current data warehouse?
How do I assess the cost of expanding the current environment or of adopting new solutions?
What skills are required to understand, analyze, build, and maintain big data projects?
Does the current organizational data give us insight into the business?
To answer these questions, use a structured approach for evaluating the viability of a big data solution according to the dimensions shown in the following figure:
• Business value from the insight that might be gained from analyzing the data
• Governance considerations for the new sources of data and how the data will be used
• People with relevant skills available and commitment of sponsors
• Volume of the data being captured
• Variety of data sources, data types, and data formats
• Velocity at which the data is generated, the speed with which it needs to be acted upon, or the rate at which it is changing
• Veracity of the data, or rather, the uncertainty or trustworthiness of the data
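One way to make the dimensions above concrete is a simple scorecard. The sketch below is illustrative only: the 1 to 5 scale, the class name `ViabilityAssessment`, the threshold, and the sample scores are all assumptions made for this example, not part of the original method.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ViabilityAssessment:
    """Hypothetical scorecard: each dimension is rated 1 (weak) to 5 (strong)."""
    business_value: int
    governance: int
    people: int
    volume: int
    variety: int
    velocity: int
    veracity: int

    def weakest_dimensions(self, threshold: int = 2) -> List[str]:
        """Return the names of dimensions rated at or below the threshold."""
        return [f.name for f in fields(self) if getattr(self, f.name) <= threshold]

    def average(self) -> float:
        """Return the plain average of all dimension ratings."""
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(values) / len(values)


# Made-up ratings for a fictional organization.
assessment = ViabilityAssessment(
    business_value=4, governance=2, people=3,
    volume=5, variety=4, velocity=3, veracity=2,
)
print(assessment.weakest_dimensions())
print(round(assessment.average(), 2))
```

A scorecard like this does not decide anything by itself; it only highlights which dimensions (here governance and veracity) would need attention before committing to a solution.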
Phase 1: Vision
When we paint a vision of a future architecture, we start with a basic understanding of our current state and we begin to speculate on how it might evolve.
A challenge in many organizations is that at inception, the technical vision may not be aligned to the business vision.
In fact, business visionaries and potential sponsors must drive the technical vision.
So we must understand current business utilization of our current information architecture and how that could change in the future.
The vision phase is mostly about gathering requirements and exploring the art of the possible.
Phase 2: Determine Business Drivers & Problems
The next phase of the methodology has two steps:
1 - Take a much deeper look at the business drivers & problems
2 - Map the business problem to its big data type
[Figure: Business Problem, Big Data Type, Big Data Characteristics]
Big Data Business Drivers
Some of the key business drivers:
• Operational effectiveness
• Delivering a better customer service
• Improved innovation
• Improved company growth
Business Problem: Predicting power consumption
Mapping the business problem to its big data type
Business Problem: Utilities: Predict power consumption
Big Data Type: Machine-generated data
Description:
Utility companies have rolled out smart meters to measure the consumption of water, gas, and electricity at regular intervals of one hour or less. These smart meters generate huge volumes of interval data that needs to be analyzed.
Utilities also run big, expensive, and complicated systems to generate power. Each grid includes sophisticated sensors that monitor voltage, frequency, and other important operating characteristics.
A big data solution can analyze power generation (supply) and power consumption (demand) data using smart meters.
Smart meters and consumption measurement
Sensors for measuring voltage and other operating characteristics
Business Driver: Operating efficiency
Business Problem: Predicting power consumption
Big Data Type: Data delivered by the sensors
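To illustrate what analyzing interval data from smart meters can look like, here is a minimal sketch that rolls readings up into hourly demand and produces a naive forecast. The record layout, the meter names, the sample values, and the averaging forecast are assumptions for illustration; a real utility would use far richer models and data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical interval readings: (meter_id, timestamp, kWh used in the interval).
readings = [
    ("meter-1", datetime(2024, 1, 1, 8, 0), 0.4),
    ("meter-1", datetime(2024, 1, 1, 8, 30), 0.6),
    ("meter-2", datetime(2024, 1, 1, 8, 0), 1.0),
    ("meter-1", datetime(2024, 1, 2, 8, 0), 0.5),
    ("meter-2", datetime(2024, 1, 2, 8, 15), 1.5),
]


def hourly_demand(rows):
    """Sum consumption across all meters for each (date, hour) bucket."""
    totals = defaultdict(float)
    for _meter, ts, kwh in rows:
        totals[(ts.date(), ts.hour)] += kwh
    return dict(totals)


def forecast_for_hour(rows, hour):
    """Naive forecast: the average of past totals observed at that hour of day."""
    totals = hourly_demand(rows)
    history = [kwh for (_day, h), kwh in totals.items() if h == hour]
    return mean(history) if history else None


demand = hourly_demand(readings)
for bucket, kwh in sorted(demand.items()):
    print(bucket, kwh)
print("forecast for 08:00:", forecast_for_hour(readings, 8))
```

The same bucketing step applies whether the intervals are one hour or shorter; only the forecasting step would change as better models are introduced.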
Customer Churn!!!!
Telecommunication
Business Problem: Customer Churn
Big Data Type: Web and social data; Transaction data
Description:
Telecommunications operators need to build detailed customer churn models that include social media and transaction data.
The value of the churn models depends on the quality of customer attributes (customer master data such as date of birth, gender, location, and income) and the social behavior of customers.
Telecommunications providers who implement a predictive analytics strategy can manage and predict churn by analyzing the calling patterns of subscribers.
Building customer churn models based on social network and transaction data
The value of churn models depends on the quality of customer attributes (master data such as date of birth, gender, and income) and on customers' social behavior. Predictive analytics is used to manage and predict churn by analyzing customers' calling patterns.
Fraud Detection
Financial Services & Health Care
Business Problem: Fraud Detection
Big Data Type: Machine-generated data; Transaction data; Human-generated
Description:
Fraud management predicts the likelihood that a given transaction or customer account is experiencing fraud.
The solutions analyze transactions in real time and generate recommendations for immediate action.
These actions are critical to stopping fraud and the deliberate misuse of accounts.
Solutions are typically designed to detect and prevent myriad fraud and risk types across multiple industries, including:
• Credit and debit payment card fraud
• Deposit account fraud
• Technical fraud
• Bad debt
• Healthcare fraud
• Medicaid and Medicare fraud
• Property and casualty insurance fraud
• Worker compensation fraud
• Insurance fraud
• Telecommunications fraud
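As a rough illustration of analyzing transactions as they arrive and recommending an immediate action, the sketch below flags a transaction when it is far larger than the account's running average. The class name `FraudScreen`, the ratio limit, the minimum history, and the sample amounts are all hypothetical; production fraud solutions rely on much more sophisticated predictive models.

```python
from collections import defaultdict


class FraudScreen:
    """Hypothetical screen: compare each amount with the account's running average."""

    def __init__(self, ratio_limit=5.0, min_history=3):
        self.ratio_limit = ratio_limit    # assumed multiple of the average that triggers review
        self.min_history = min_history    # assumed number of transactions needed before judging
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def check(self, account, amount):
        """Return 'review' or 'allow' for one transaction as it arrives."""
        seen = self._count[account]
        decision = "allow"
        if seen >= self.min_history:
            average = self._total[account] / seen
            if amount > self.ratio_limit * average:
                decision = "review"
        # Only allowed transactions feed the running average, so a flagged
        # amount does not raise the bar for later checks.
        if decision == "allow":
            self._count[account] += 1
            self._total[account] += amount
        return decision


screen = FraudScreen()
amounts = [20.0, 25.0, 30.0, 500.0, 28.0]  # made-up transaction stream for one account
decisions = [screen.check("account-A", amount) for amount in amounts]
print(decisions)
```

The point of the sketch is the shape of the flow (one decision per incoming transaction), not the rule itself.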
Classifying business problems by big data type helps us see the characteristics of each kind of data.
These characteristics can help us understand:
1 - How was the data acquired?
2 - How was the data processed into the appropriate format?
3 - To what extent will new data become available?
[Figure: Business Problem, Big Data Type, Big Data Characteristics]
Analysis Type and Processing Methodology
• Analysis type: Whether the data is analyzed in real time or batched for later analysis. A mix of both types may be required by the use case. For fraud detection, analysis must be done in real time or near real time. For trend analysis supporting strategic business decisions, analysis can be in batch mode.
• Processing methodology: The type of technique to be applied for processing data (e.g., predictive, analytical, ad-hoc query, and reporting).
Data Frequency and Size
How much data is expected and at what frequency it arrives. Knowing frequency and size helps determine the storage mechanism, storage format, and the necessary preprocessing tools.
Data frequency and size depend on data sources:
• On demand, as with social media data
• Continuous feed, real-time (weather data, transactional data)
• Time series (time-based data)
Data Type and Content Format
• Data type: Type of data to be processed: transactional, historical, master data, and others. Knowing the data type helps segregate the data in storage.
• Content format: Format of incoming data: structured (CRM, for example), unstructured (audio, video, and images, for example), or semi-structured.
Format determines how the incoming data needs to be processed and is key to choosing tools and techniques and defining a solution from a business perspective.
Data Sources
• Data source: Sources of data (where the data is generated): web and social media, machine-generated, human-generated, etc.
Identifying all the data sources helps determine the scope from a business perspective. The figure shows the most widely used data sources.
Data Consumers and Hardware
• Data consumers: A list of all of the possible consumers of the processed data:
  • Business processes
  • Business users
  • Enterprise applications
  • Individual people in various business roles
  • Part of the process flows
  • Other data repositories or enterprise applications
• Hardware: The type of hardware on which the big data solution will be implemented: commodity hardware or state of the art. Understanding the limitations of hardware helps inform the choice of big data solution.
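The characteristics above can be collected into one profile per business problem. The following is a sketch under stated assumptions: the class and enum names, the field values chosen for the two sample profiles, and the `needs_preprocessing` rule are illustrative and not taken from the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AnalysisType(Enum):
    REAL_TIME = "real time"
    BATCH = "batch"


class Frequency(Enum):
    ON_DEMAND = "on demand"
    CONTINUOUS_FEED = "continuous feed"
    TIME_SERIES = "time series"


class ContentFormat(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass
class BigDataProfile:
    """Hypothetical record holding one business problem's data characteristics."""
    business_problem: str
    analysis_type: AnalysisType
    processing_methodology: str
    frequency: Frequency
    data_type: str
    content_formats: List[ContentFormat]
    data_sources: List[str]
    data_consumers: List[str]
    hardware: str

    def needs_preprocessing(self) -> bool:
        """Assumed rule: any content that is not structured needs preprocessing."""
        return any(f is not ContentFormat.STRUCTURED for f in self.content_formats)


# Sample values are illustrative guesses, not prescriptions.
fraud = BigDataProfile(
    "Fraud detection", AnalysisType.REAL_TIME, "predictive",
    Frequency.CONTINUOUS_FEED, "transactional", [ContentFormat.STRUCTURED],
    ["machine-generated", "human-generated"], ["business processes"], "commodity",
)
churn = BigDataProfile(
    "Customer churn", AnalysisType.BATCH, "predictive",
    Frequency.ON_DEMAND, "master data",
    [ContentFormat.STRUCTURED, ContentFormat.UNSTRUCTURED],
    ["web and social media"], ["business users"], "commodity",
)
print(fraud.needs_preprocessing(), churn.needs_preprocessing())
```

Filling in one such profile per business problem makes it easier to compare problems side by side before choosing storage and tooling.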
Phase 3: Business Information Maps (Data Flow)
Once we understand what data our business analysts need, we must figure out where we should get the data and which business processes need what kind of data.
Example: fictitious manufacturer of luxury cars, Lux Motor Cars (LMC)
Lease vehicles in the United States, the following information is uncovered:
• The variance between when a car is due for scheduled service and when it is actually serviced is quite high. In other words, drivers don’t seem to adhere to the service schedule as closely as other LMC owners do.
• The number of miles driven after a “Check Engine” light came on was dramatically higher for LMC Lease vehicles. It appears that lessees weren’t taking the “Check Engine” light very seriously.
LMC is currently not able to put such a program into place. Much of the data they need to run a program of this type is not currently available, though this is not fully understood in the business. So, we need to build a business information map describing the current state so that all can better understand how this part of the business operates with the data that it has today.
The good news is that LMC is ready to make a telematics investment.
Big Data Initiative: deploying a telematics system
First Step: Data Sources
Second Step: Data Flow
• Stakeholders will receive from the M&W system the key performance indicators (KPIs) that have been heretofore unavailable.
• Sensors will exchange data with the M&W system, including the interactive diagnostics that cover both driver alerts and on-demand logs (interactive diagnostics).
• Lessees and/or drivers will receive near real-time (RT) driver alerts and exchange messages to deal with service scheduling (near RT driver alerts and scheduling).
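The data flows listed above can also be written down as a small machine-readable map, which makes it easy to ask what each party receives. This is only a sketch: the `Flow` type, the node labels, and the `received_by` helper are assumptions for illustration, with the payload names taken from the flows described above.

```python
from typing import List, NamedTuple


class Flow(NamedTuple):
    """One arrow on a (hypothetical) business information map."""
    source: str
    target: str
    payload: str
    two_way: bool = False  # True when data is exchanged in both directions


FLOWS = [
    Flow("M&W system", "Stakeholders", "KPIs"),
    Flow("Sensors", "M&W system", "interactive diagnostics", two_way=True),
    Flow("M&W system", "Lessees / drivers",
         "near RT driver alerts and scheduling", two_way=True),
]


def received_by(node: str, flows: List[Flow] = FLOWS) -> List[str]:
    """List the payloads that reach a node, counting two-way flows in both directions."""
    payloads = []
    for flow in flows:
        if flow.target == node or (flow.two_way and flow.source == node):
            payloads.append(flow.payload)
    return payloads


for party in ("Stakeholders", "Sensors", "M&W system", "Lessees / drivers"):
    print(party, "<-", received_by(party))
```

Keeping the map as data means the current-state and future-state diagrams can be compared by diffing two lists of flows.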
Phase 4: Drawing future state Big Data architecture
We are ready to design our future state technical architecture, and the IT architecture team will engage extensively in this phase.
However, before a more detailed design is started, we might want to first understand the skills we have in our organization and the impact those skills (or the lack of them) might have on the architecture.
We’ll also want to clearly understand the good and bad things about our current state architecture and how we might extend it through the introduction of new software components and systems.
Current state of Information Architecture
In this example, the Enterprise Data Warehouse (EDW) provides the historic database of record. Data is extracted from multiple OLTP systems (the ERP and CRM systems are pictured). Only structured data!
Data marts surround the EDW. Business analysts access the marts and/or EDW using reporting and ad hoc query and analysis tools.
In the current state, the sources of data are limited to structured data!
But the business needs a deeper analysis of consumers, and therefore needs a new set of external streaming, unstructured, and semi-structured data!
A Business Problem: to better understand the success of promotions and sales efforts.
Data is gathered in Hadoop after being captured when shoppers 1) buy items, 2) browse the web site, and 3) enter the brick-and-mortar stores. 4) Sentiment data that expresses the shoppers’ opinion of doing business with the company is gathered in Hadoop from social media. The streaming data lands in clusters of NoSQL databases that can easily be scaled for high-ingestion demands and is then loaded into Hadoop for analysis.
We will next determine if there is a need to query and analyze data residing in our traditional data warehouse information architecture and the Hadoop cluster at the same time.
You want to be an on-demand Business?
Add a real-time recommendation engine and an event processing engine to guide the shopper by presenting specific products in the web store while they are engaged.
Example of event processing
For example, if sensors in the brick-and-mortar store begin to detect delays in reaching cashiers and dissatisfied customers abandoning the items they wanted to buy, predefined rules might trigger devices to signal cashiers who are engaged in other activities to open up additional cash registers and alleviate the backup.
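The checkout scenario above can be sketched as a predefined rule evaluated against a stream of events. Everything specific here is an assumption made for illustration: the event names, the thresholds, the class names, and the action text. A real event processing engine would offer windowing, persistence, and many rules at once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEvent:
    kind: str     # hypothetical event names: "queue_wait" or "abandoned_basket"
    value: float  # seconds waited for queue_wait; unused for abandoned_basket


class CheckoutRule:
    """Fires when a long queue wait and enough abandoned baskets have both been seen."""

    def __init__(self, max_wait_seconds: float = 300.0, max_abandoned: int = 2):
        self.max_wait_seconds = max_wait_seconds  # assumed threshold
        self.max_abandoned = max_abandoned        # assumed threshold
        self.long_wait_seen = False
        self.abandoned = 0

    def on_event(self, event: StoreEvent) -> Optional[str]:
        """Update rule state; return an action when the rule fires, else None."""
        if event.kind == "queue_wait" and event.value > self.max_wait_seconds:
            self.long_wait_seen = True
        elif event.kind == "abandoned_basket":
            self.abandoned += 1
        if self.long_wait_seen and self.abandoned >= self.max_abandoned:
            # Reset so the same backlog does not trigger repeated signals.
            self.long_wait_seen = False
            self.abandoned = 0
            return "signal cashiers to open additional registers"
        return None


rule = CheckoutRule()
stream = [
    StoreEvent("queue_wait", 120.0),
    StoreEvent("queue_wait", 360.0),
    StoreEvent("abandoned_basket", 1.0),
    StoreEvent("abandoned_basket", 1.0),
]
actions = [rule.on_event(event) for event in stream]
print(actions)
```

The rule fires only on the last event, once both conditions have been observed, which mirrors the idea of acting while the backup is still forming.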
Big Data Architecture