Be Certain. Be Trillium Certain.
The Bigger They Are The Harder They Fall: Big Data & the Data Quality Imperative
Nigel Turner, VP Strategic Information Management
Tuesday 19th June 2012
The bigger they are the harder they fall…
But big can pay off…
Big Data – what is it?
• Set of new concepts, practices & technologies to manage & exploit digital data
• OVUM defines it as:
– “A data computational problem that is large and varied enough to demand new approaches to traditional SQL & related practices”
• Key premise is that all data has potential value if it can be collected, analysed and used to generate actionable insight
Big Data – its characteristics: The 3Vs
Volume
• Reflects exponential growth of data – predicted 40-60% per annum
• Today 2.5 quintillion bytes of data are created every day
• 90% of all digital data was created in the last two years
Variety
• Data generated more varied and complex than before:
– Text, Audio, Images, Machine Generated etc.
• Much of this data is semi-structured or unstructured
• Traditional IT techniques ill equipped to process & analyse it
Velocity
• Data often generated in real time
• Analysis and response need to be rapid, often also real time
• Traditional BI / DW environments becoming obsolescent – new approaches are needed
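As a rough illustration of what the 40-60% per annum growth figure implies, the sketch below computes how long data volume takes to double. It assumes a constant compound growth rate, which the slides do not state, and the function name is made up for this example.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for a quantity to double at a constant compound annual growth rate."""
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


if __name__ == "__main__":
    # The two ends of the predicted range quoted on the slide.
    for rate in (0.40, 0.60):
        years = doubling_time_years(rate)
        print(f"{rate:.0%} per annum -> doubles in about {years:.1f} years")
```

Under that assumption, data volume doubles roughly every one and a half to two years.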
What’s different about Big Data?
• New technologies which enable distributed & highly scalable MPP (Massively Parallel Processing), e.g.
– Apache Hadoop
– MapReduce
– NoSQL databases
• Strong emphasis on analytical approaches
– Emergence of “data science”
– Predictive Analytics
– Data Mining
• The “democratisation” of data
– Data made available to all (cf Cloud Computing)
– Business and not IT led BI
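To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases using a word count. It is a teaching aid only: the function names are illustrative, and a real framework such as Hadoop would distribute the map and reduce tasks across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big problems", "data quality matters"]
    print(word_count(docs))
```

Because each map call touches one document and each reduce call touches one key, both phases can be run in parallel, which is what makes the model scale.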
Where does Big Data come from? Widely known sources
Where does Big Data come from? Social Media & Social Networks
Where does Big Data come from? Machine Generated data
Big Data – some vertical applications
• Retail: using point of sale & social media data to supplement & enrich traditional CRM / Marketing data
• Insurance & Banking: fraud detection
• Health: holistic patient analysis
• Utilities: consumption peaks & troughs & capacity planning
• Telcos: call routing optimisation & customer churn
• Manufacturing: predictive fault identification & supply chain optimisation
• Research: particle analysis, genomics etc.
Big Data in practice - Volvo
• Every Volvo vehicle has hundreds of microprocessors / sensors
• Data generated used within the car itself but also captured for analysis by Volvo and its dealers
• All data is loaded into a centralised data analysis hub & integrated with CRM, dealership & product data
• Used to optimise design & manufacturing, enhance customer interaction & improve safety
Big data in practice – fraud detection
Big Data – why invest?
• Better understanding of customer & market behaviour
• Improved knowledge of product & service performance
• Aids innovation in products & services
• Fact based and more rapid decision making
• Enhances revenue
• Reduces costs
• Stimulates economic growth
Big Data – the impact on individuals
• Employees
– Empower & devolve decision making
– Create new job & upskilling opportunities
• Consumers
– Better targeted offers
– Improved products & services that meet needs
Big Data – the privacy concern
Big Data – Foundations of Success
• Identifying the right data to solve the business problem or opportunity
• The ability to integrate & match varied data from multiple data sources
– structured, semi-structured, unstructured
• Building the right IT infrastructure to support Big Data applications
• Having the right capabilities & skills to exploit the data
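The integration point above can be sketched as follows: one structured source (CSV), one semi-structured source (JSON) and one unstructured source (free text) are mapped to a single record shape. All sample data, field names and patterns are hypothetical, invented for this illustration only.

```python
import csv
import io
import json
import re

# Hypothetical sample inputs, invented for illustration only.
CSV_TEXT = "customer_id,email\n101,ann@example.com\n"
JSON_TEXT = '{"user": {"id": 102, "contact": {"email": "bob@example.com"}}}'
FREE_TEXT = "Customer 103 wrote in from carol@example.com about billing."

# Deliberately simple patterns; real-world text needs far more robust rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ID_RE = re.compile(r"\bCustomer (\d+)\b")


def from_csv(text):
    """Structured: column names are known in advance."""
    return [
        {"customer_id": int(row["customer_id"]), "email": row["email"], "source": "csv"}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text):
    """Semi-structured: values sit inside a nested document."""
    user = json.loads(text)["user"]
    return [{"customer_id": int(user["id"]), "email": user["contact"]["email"], "source": "json"}]


def from_text(text):
    """Unstructured: values have to be extracted with patterns."""
    id_match = ID_RE.search(text)
    email_match = EMAIL_RE.search(text)
    if not (id_match and email_match):
        return []
    return [{"customer_id": int(id_match.group(1)), "email": email_match.group(0), "source": "text"}]


def integrate():
    """Combine all three sources into one list of uniform records."""
    return from_csv(CSV_TEXT) + from_json(JSON_TEXT) + from_text(FREE_TEXT)


if __name__ == "__main__":
    for record in integrate():
        print(record)
```

Keeping a `source` field on every record preserves lineage, which helps when a downstream result has to be traced back to where the data came from.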
Big Data – the data integration challenge
[Diagram labels: EXTERNAL DATA SOURCES (SOCIAL MEDIA, SENSORS, CS DATA, MOBILES); INTERNAL DATA SOURCES (CRM, BILLING, OPS, SALES, PRODS); ANALYTICS PLATFORMS 1, 2, 3 … n; ACTIONABLE INSIGHT & KNOWLEDGE]
Big Data – Barriers & Pitfalls
• The sheer volume of data – what’s worth using?
• Data extraction challenges
• The ability to match data from disparate sources / formats / media
• The time taken to integrate new data sources
• The risks of mismatching and incorrect identification of individuals
• Legal & regulatory pitfalls
• Security concerns – corporate & individual
• Lack of skills & expertise
• Making the case for investment
Big Data – the Data Quality Imperative (1)
• Need to profile external and internal data sources
• Need to classify data to define what data really matters
• Need to assure the quality of internal (and some external) data sources for accuracy, completeness, consistency
• Need to define & apply business rules & metadata management to how the data will be defined and used
• Need for a data governance framework to ensure consistency & control
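As a minimal sketch of what profiling a data source can mean in practice, the example below measures completeness and value distribution per field. The records and the `profile` function are hypothetical and are not a description of any Trillium Software product.

```python
from collections import Counter

# Hypothetical records, invented for illustration only.
RECORDS = [
    {"name": "Ann Lee", "country": "UK", "email": "ann@example.com"},
    {"name": "Bob Ray", "country": "United Kingdom", "email": ""},
    {"name": "", "country": "uk", "email": "cara@example.com"},
    {"name": "Dan Fox", "country": "UK", "email": None},
]


def profile(records):
    """Return completeness, distinct count and most common values per field."""
    fields = sorted({field for record in records for field in record})
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]
        # Treat None and the empty string as missing.
        filled = [value for value in values if value not in (None, "")]
        report[field] = {
            "completeness": len(filled) / len(records) if records else 0.0,
            "distinct": len(set(filled)),
            "top_values": Counter(filled).most_common(3),
        }
    return report


if __name__ == "__main__":
    for field, stats in profile(RECORDS).items():
        print(field, stats)
```

Here the `country` field is fully populated yet holds three spellings of the same country, so completeness alone would miss the consistency problem.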
Big Data – the Data Quality Imperative (2)
• Need processes & tools to enable:
– Source data profiling
– Data integration
– Data parsing
– Data standardisation
– Business rule creation & management
– Metadata management & a shared business / IT glossary
– Data de-duplication
– Data normalisation
– Data matching
– Data enrichment
– Data audit
• Many of these functions must be capable of being carried out in real time with zero lag
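Several of the functions listed above (parsing, standardisation, matching and de-duplication) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the lookup table, sample records and the 0.8 similarity threshold are invented, and the fuzzy comparison uses Python's standard `difflib`, not any vendor's matching engine.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup table and records, invented for illustration only.
COUNTRY_SYNONYMS = {"uk": "GB", "united kingdom": "GB", "gb": "GB"}

RAW = [
    {"name": "  Nigel   TURNER ", "country": "UK"},
    {"name": "nigel turner", "country": "United Kingdom"},
    {"name": "Nigel Turnor", "country": "gb"},
    {"name": "Ann Lee", "country": "UK"},
]


def parse_name(raw):
    """Parse: split a free-form name into lower-case tokens (ASCII letters only)."""
    return re.findall(r"[a-z]+", raw.lower())


def standardise(record):
    """Standardise: one canonical form for the name and the country."""
    country = record["country"].strip().lower()
    return {
        "name": " ".join(parse_name(record["name"])),
        "country": COUNTRY_SYNONYMS.get(country, country.upper()),
    }


def is_match(a, b, threshold=0.8):
    """Match: same country and names at least `threshold` similar."""
    if a["country"] != b["country"]:
        return False
    return SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold


def deduplicate(records, threshold=0.8):
    """De-duplicate: keep the first record of each group of matches."""
    kept = []
    for record in map(standardise, records):
        if not any(is_match(record, other, threshold) for other in kept):
            kept.append(record)
    return kept


if __name__ == "__main__":
    for record in deduplicate(RAW):
        print(record)
```

The threshold is the key design choice: lowering it merges more variants but raises the risk of mismatching two different individuals, and raising it does the opposite.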
Big Data – the key enabler
[Diagram labels: EXTERNAL DATA SOURCES; INTERNAL DATA SOURCES; DATA QUALITY PLATFORM (PROFILE, PARSE, STANDARDISE, MATCH, ENRICH); ANALYTICS PLATFORMS 1, 2, 3 … n; ACTIONABLE INSIGHT & KNOWLEDGE]
Big Data – some algorithms
1. BIG DATA + POOR DATA QUALITY = BIG PROBLEMS
2. DATA DEMOCRATISATION – DATA GOVERNANCE = ANARCHY
3. DATA MASH UPS – DATA QUALITY = DATA MESS
4. BIG DATA ANALYTICS + POOR DQ = WRONG RESULTS
5. BIG DATA – DATA ASSURANCE = JAIL
6. 3V + DATA QUALITY = 4V (VALIDITY)
Big Data – the future
• To date Big Data has been overhyped but now a tipping point has come
• It is here and will grow in volume, velocity & variety
• Immature concept & market so hard to plan – but consolidation is happening
• Big data in a business context reflects the emerging generation’s expectations & needs
• Data will increasingly be seen as an asset
• Data skills will become increasingly valued
Big Data – how Trillium Software can help
• Current Trillium Software products & services can help you succeed in your Big Data journey:
• Real time & batch data capabilities in:
o Data profiling
o Parsing
o Standardisation
o De-duplication
o Matching
o Enrichment
o Audit
• Strategic consulting services to prepare for and realise Big Data opportunities