An Introduction to BIG DATA, CUSO Seminar on Big Data, Prof. Dr. Philippe Cudré-Mauroux, http://exascale.info, May 22, 2014, Fribourg–Switzerland

An Introduction to Big Data


An Introduction to Big Data CUSO Seminar on Big Data, Switzerland Prof. Philippe Cudre-Mauroux eXascale Infolab http://exascale.info/


Page 1: An Introduction to Big Data

1

An Introduction to BIG DATA

CUSO Seminar on Big Data

Prof. Dr. Philippe Cudré-Mauroux

http://exascale.info

May 22, 2014

Fribourg–Switzerland

Page 2: An Introduction to Big Data

2

On the Menu Today

• Big Data: Context
• Big Data: Buzzwords
– 3 Vs of Big Data
• Big Data Landscape
• Hadoop
• Big Data in Switzerland

Page 3: An Introduction to Big Data

3

Instant Quiz

• 3 Vs of Big Data?
• CAP?
• Hadoop?
• Spark?

Page 4: An Introduction to Big Data

Exascale Data Deluge

• Science
– Biology
– Astronomy
– Remote Sensing
• Web companies
– eBay
– Yahoo
• Financial services, retail companies, governments, etc.

© Wired 2009

➡ New data formats
➡ New machines
➡ Peta- & exa-scale datasets
➡ Obsolescence of traditional information infrastructures

Page 5: An Introduction to Big Data

5

The Web as the Main Driver

© Qmee

Page 6: An Introduction to Big Data

6

Big Data Central Theorem

Data + Technology → Actionable Insight → $$

Page 7: An Introduction to Big Data

7

Big Data Buzz

Between now and 2015, the firm expects big data to create some 4.4 million IT jobs globally; of those, 1.9 million will be in the U.S. Applying an economic multiplier to that estimate, Gartner expects each new big-data-related IT job to create work for three more people outside the tech industry, for a total of almost 6 million more U.S. jobs.

Growth in the Asia Pacific Big Data market is expected to accelerate rapidly in two to three years' time, from a mere US$258.5 million last year to in excess of $1.76 billion in 2016, with highest growth in the storage segment.

Page 8: An Introduction to Big Data

8

Big Data as a New Class of Asset

• The Age of Big Data (NYTimes, Feb. 11, 2012)
http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html

“Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web — online searches, posts and messages — with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.”

Page 9: An Introduction to Big Data

9

Page 10: An Introduction to Big Data

10

The 3-Vs of Big Data

• Volume
– Amount of data
• Velocity
– Speed of data in and out
• Variety
– Range of data types and sources

• [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization"

Page 11: An Introduction to Big Data

11

What can you do with the data

• Reporting
– Post hoc
– Real time
• Monitoring (fine-grained)
• Exploration
• Finding Patterns
• Root Cause Analysis
• Closed-loop Control
• Model construction
• Prediction
• …

© Mike Franklin

Page 12: An Introduction to Big Data

12

10 ways big data changes everything

• Some concrete examples – http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/2/

1. Can gigabytes predict the next Lady Gaga?

2. How big data can curb the world’s energy consumption

3. Big data is now your company’s virtual assistant

4. The future of Foursquare is data-fueled recommendations

5. How Twitter data-tracked cholera in Haiti

6. Revolutionizing Web publishing with big data

7. Can cell phone data cure society’s ills?

8. How data can help predict and create video hits

9. The new face of data visualization

10. One hospital’s embrace of big data

Page 13: An Introduction to Big Data

13

Typical Big Data Success Story

• Modeling users through Big Data
– Online ads sale / placement [e.g., Facebook]
– Personalized Coupons [e.g., Target]
– Product Placement [e.g., Walmart]
– Content Generation [e.g., Netflix]
– Personalized learning [e.g., Duolingo]
– HR Recruiting [e.g., Gild]

Page 14: An Introduction to Big Data

14

More Data => Better Answers?

• Not that easy…
• More Rows: Algorithmic complexity kicks in
• More Columns: Exponentially more hypotheses

• Another formulation of the problem:
– Given an inferential goal and a fixed computational budget, provide a guarantee that the quality of inference will increase monotonically as data accrue (without bound)

• In other words:

=> Data should be a resource, not a load

© Mike Jordan
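The "More Columns" point above can be made concrete with a little arithmetic. The sketch below is only one possible reading, under the assumption that a "hypothesis" is a relationship among some non-empty subset of the columns; the function names are hypothetical and not taken from the slides.

```python
from math import comb


def num_pairwise_hypotheses(num_columns):
    """Number of column pairs one could test for a relationship: C(d, 2)."""
    return comb(num_columns, 2)


def num_subset_hypotheses(num_columns):
    """Number of non-empty column subsets (assumed hypothesis space): 2**d - 1."""
    return 2 ** num_columns - 1


if __name__ == "__main__":
    # Pairs grow quadratically with the column count; subsets grow exponentially.
    for d in (10, 20, 40):
        print(d, num_pairwise_hypotheses(d), num_subset_hypotheses(d))
```

Under this assumption, going from 10 to 20 columns multiplies the number of candidate subsets by roughly a thousand, which is why adding columns without adding rows makes spurious findings more likely.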

Page 15: An Introduction to Big Data

15

Big Data Infrastructures

Page 16: An Introduction to Big Data

16

A Concrete Example: Zynga

Page 17: An Introduction to Big Data

Leading the Pack of Wolves: Hadoop

• Google: Map/Reduce paper published in 2004
• Open source variant: Hadoop

• Map-reduce = high-level programming model and implementation for large-scale parallel data processing

• Right now, the most overhyped system in CS
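To illustrate the map-reduce programming model mentioned above, here is a minimal single-process word-count sketch. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real system would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is a resource"]))
```

The programmer writes only the map and reduce functions; in this model, the framework is responsible for distributing the input, grouping by key, and handling failures.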

17

Page 18: An Introduction to Big Data

18

What about Swiss Big Data?

• Competitive Research Groups

• Swiss Big Data User Group

• Swiss companies playing catch-up
– Productized Big Data systems at leading telcos & financial companies
– POCs in most banks, insurance companies, retailers

• Big Data is not a new technology: it's a fact; deal with it

Page 19: An Introduction to Big Data

19

Tasty Bites of Big Data (1)

Thursday afternoon

• 13:30-15:00: Big Data Profiling
Felix Naumann (Hasso Plattner Institute)

• 15:15-16:45: Realtime Analytics
Christoph Koch (EPFL)

• 16:45-17:45: Current Trends and Challenges in Big Data Benchmarking
Kais Sachs (SAP / SPEC)

Page 20: An Introduction to Big Data

20

Tasty Bites of Big Data (2)

Friday

• 9:00-10:30: Structured Data in Web Search
Alon Halevy (Google)

• 10:45-12:15: Human Computation for Big Data
Gianluca Demartini (UNIFR)

• 13:30-15:00: Analysing and Querying Big Scientific Data
Thomas Heinis (EPFL)

• 15:00-16:30: The Evolution of Big Data Frameworks
Carlo Curino (Microsoft Research)

Page 21: An Introduction to Big Data

Social Event, Friday – Beer Tasting!
Basse-Ville Fribourg / 15 CHF per Person

Everything You Always Wanted to Know About Beer* (*But Were Afraid to Ask!)

18:00 @ Café du Belvédère, Grand-Rue 36
19:00 @ Fri-Mousse, Rue de la Samaritaine 19

Limited places; registration is mandatory at:

http://xr.si