Page 1: Lambda Architecture and open source technology stack for real time big data

Lambda Architecture and Open Source Tools for Real-time Big Data

● Concepts & Techniques: “Thinking with Lambda”
● Case studies in Practice

Trieu Nguyen - http://nguyentantrieu.info or @tantrieuf31
Principal Engineer at eClick Data Analytics team, FPT Online
All contents and thoughts in these slides are my own subjective ideas, compiled from the community.

A little introduction
● 2008: Java developer; built a social trading network for a small startup (Yopco)
● 2011: software engineer at FPT Online on the Banbe project, building a RESTful API for the VnExpress mobile app
● 2012: joined Greengar Studios for 6 months, scaling backend APIs for mobile games (iOS, Android)
● 2013: back at FPT Online doing R&D on Big Data & Analytics, developing the new core Analytics Platform (on the JVM)

Contents for this talk

● The lessons from history
● Problems in practice
● What is the Lambda Architecture?
● Why the Lambda Architecture for real-time big data?
● Open source technology stack
● Lambda in practice (mobile data and web data)
● Lessons I have learned
● Questions & Answers

History?
Is the best way to predict the future to look at the past and the present?

Big data is a buzzword for old problems

Learning?

Working?

This is the most valuable thing!

This is Big DATA

“We can't solve problems by using the same kind of thinking we used when we created them.” - Albert Einstein

Think more with Lambda and Reactive

Where Big Data can be used

BBC Horizon (2013): The Age of Big Data

http://www.youtube.com/watch?v=RE0ITQ7XQjM

“Google’s mission is to organize the world’s information and make it universally accessible and useful.”

Organize the world’s information?

How did Google scale its search engine?
How does Hadoop really work?

http://stackoverflow.com/questions/6087834/how-scalable-is-mapreduce-in-the-original-functional-languages

Trends of Now and the Future

● MapReduce Programming
● Reactive Programming
● Functional Programming
● Streaming Computation

=> All are just special cases of Lambda
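To make the MapReduce-as-functional-programming claim a little more concrete, here is a minimal word-count sketch in plain Python (not Hadoop). The map step is a pure function applied to each record independently, and the reduce step folds the partial results together without mutating them. The names `map_phase`, `reduce_phase` and `word_count` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def map_phase(line: str) -> Counter:
    """Pure 'map' step: turn one input record into partial word counts."""
    return Counter(line.lower().split())


def reduce_phase(acc: Counter, partial: Counter) -> Counter:
    """Pure 'reduce' step: merge two partial counts into a new Counter."""
    return acc + partial  # Counter + Counter returns a new object; inputs are untouched


def word_count(lines: Iterable[str]) -> Counter:
    # Each record is mapped independently (the part a cluster can parallelize),
    # then all partial results are folded into one answer.
    return reduce(reduce_phase, map(map_phase, lines), Counter())


if __name__ == "__main__":
    print(word_count(["click impression click", "impression view", "click"]))
```

Because neither step depends on shared state, the same two functions could in principle be run on many machines and combined in any grouping, which is the property MapReduce frameworks exploit.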

So what is the λ (Lambda) Architecture?

The Lambda Architecture:

● applies the λ (Lambda) philosophy to designing big data systems
● is built on the equation “query = function(all data)”, which is the basis of all data systems
● was proposed by Nathan Marz (http://nathanmarz.com/), a software engineer from Twitter, in his “Big Data” book
● is based on three main design principles:
○ human fault-tolerance: the system is not susceptible to data loss or data corruption caused by human mistakes (BUGS?), because at scale such damage could be irreparable.
○ data immutability: store data in its rawest form, immutable and in perpetuity. (INSERT/SELECT/DELETE, but no UPDATE!)
○ recomputation: with the two principles above, it is always possible to (re)compute results by running a function on the raw data.
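The three principles above can be sketched in a few lines. This is a toy, single-process model under stated assumptions, not Marz's implementation and not any real framework: an append-only list stands in for the immutable master dataset, `batch_view` recomputes a result as a pure function of all the data, `realtime_view` covers only the events that arrived after the last batch run, and `query` merges both views. All names (`Event`, `MasterDataset`, `batch_view`, `realtime_view`, `query`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)  # frozen: an event cannot be updated after it is created
class Event:
    campaign: str
    action: str  # e.g. "click" or "impression"


class MasterDataset:
    """Append-only store of raw events: insert and read, but never update."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def snapshot(self) -> Tuple[Event, ...]:
        # an immutable copy, so callers cannot modify the raw data
        return tuple(self._events)


def batch_view(all_data: Iterable[Event]) -> Counter:
    """query = function(all data): clicks per campaign, recomputed from scratch."""
    return Counter(e.campaign for e in all_data if e.action == "click")


def realtime_view(recent_data: Iterable[Event]) -> Counter:
    """Speed layer: the same function, over events that arrived after the last batch run."""
    return batch_view(recent_data)


def query(batch: Counter, realtime: Counter, campaign: str) -> int:
    """Serving layer: merge the complete-but-stale batch view with the fresh realtime view."""
    return batch[campaign] + realtime[campaign]


if __name__ == "__main__":
    master = MasterDataset()
    for event in [Event("c1", "click"), Event("c1", "impression"), Event("c2", "click")]:
        master.append(event)

    batch = batch_view(master.snapshot())  # batch run at time T
    seen = len(master.snapshot())

    master.append(Event("c1", "click"))  # arrives after the batch run
    recent = master.snapshot()[seen:]
    print(query(batch, realtime_view(recent), "c1"))  # prints 2

    # Recomputation: after fixing a bug in batch_view, simply rerun it over all raw data.
    print(batch_view(master.snapshot())["c1"])  # prints 2
```

In a real deployment the batch view would be rebuilt periodically on a batch system and the realtime view kept in a fast store; here both are in-memory `Counter`s so the whole idea fits on one screen.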

Lambda in Practice
2 case studies from my experience

Case Study 1: Mobile Data

Monitor API Backend + System KPI

Problem: inside “mobile data”, what is the most valuable piece of information?

Backend System for mobile app

I applied “Lambda” here

Web vs Mobile App
● Web: Visitors, Visits, Pageviews, Events
● Mobile App: Users, Sessions, Events

Metrics: Cause and Effect
● Screen size => App design, UI/UX, usability
● App version => Deployment, marketing
● Connectivity => Code, user experience
● Location => Marketing, user behaviour
● OS => Marketing, cost, development
● Memory => User experience
● Feature session => How to engage app users

The data, and its size: not too big for a small startup!

Where is the lambda?
I used Groovy + GPars (Groovy Parallel Systems) + MongoDB for fast parallel computation (actor model) on statistical data.
http://gpars.codehaus.org/
The GPars framework offers Java developers intuitive and safe ways to handle Java or Groovy tasks concurrently. It supports:
● Dataflow concurrency
● Actor programming model
● CSP
● Agent - a thread-safe reference to mutable state
● Concurrent collection processing
● Composable asynchronous functions
● Fork/Join
● STM (Software Transactional Memory)
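An actor, in the sense GPars uses, is roughly an object with private state and a mailbox that it processes one message at a time. The sketch below imitates that idea in Python with `threading` and `queue.Queue` (this is a stand-in, not GPars and not its API): each hypothetical `CounterActor` owns its counts, so no locks are needed, and events are routed by key so every key is counted by exactly one actor before the results are merged.

```python
import queue
import threading
from collections import Counter
from typing import Iterable


class CounterActor:
    """A minimal actor: private state plus a mailbox, drained by a single thread."""

    _STOP = object()  # sentinel message telling the actor to finish

    def __init__(self) -> None:
        self.mailbox: queue.Queue = queue.Queue()
        self.counts: Counter = Counter()  # only the actor's own thread writes this
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message: str) -> None:
        """Asynchronous send: enqueue the message and return immediately."""
        self.mailbox.put(message)

    def _run(self) -> None:
        # process messages one at a time, so the state needs no lock
        while True:
            message = self.mailbox.get()
            if message is CounterActor._STOP:
                break
            self.counts[message] += 1

    def stop_and_join(self) -> Counter:
        self.mailbox.put(CounterActor._STOP)
        self._thread.join()
        return self.counts


def parallel_count(events: Iterable[str], n_actors: int = 4) -> Counter:
    """Route each event to one actor by key, then merge the actors' private counts."""
    actors = [CounterActor() for _ in range(n_actors)]
    for event in events:
        # the same key always goes to the same actor, so no two actors share a key
        actors[hash(event) % n_actors].send(event)
    total: Counter = Counter()
    for actor in actors:
        total.update(actor.stop_and_join())
    return total
```

For example, `parallel_count(["ios", "android", "ios"])` counts device types across four actors. Because CPython threads share the GIL, this shows the programming model rather than a speed-up; GPars schedules its actors on a JVM thread pool.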

Mobile Apps => Backend APIs => Statistics => Find the Trends & Insights?

Reactive Data Analytics for Mobile Apps

It means real-time recommendation by:
➔ context (location, time)
➔ user profile (preferences, level, ...)

Big Data on Small Devices: Data Science goes Mobile
http://strataconf.com/strata2013/public/schedule/detail/27605

Case Study 2: Web Data

● Real-time Data Analytics
● Monitoring Stream Data (Reactive)

http://eclick.vn

At eClick we have 30~40 GB of logs in the stream and 10~20 GB of bandwidth just for tracking user actions (clicks, impressions, ...) in ONE day!

At eClick we must check campaigns in near real time (within seconds)!

At eClick we have many types of logs (video, web, mobile, system logs, ad campaigns, articles, …)
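A quick back-of-the-envelope calculation puts the daily log volume above into perspective. Assuming decimal units (1 GB = 10^9 bytes) and a perfectly even spread over 24 hours, which real traffic will not follow, the average ingest rate for 30 GB and 40 GB per day is:

```latex
\frac{30 \times 10^{9}\,\text{B}}{86\,400\,\text{s}} \approx 3.5 \times 10^{5}\,\text{B/s} \approx 347\,\text{KB/s},
\qquad
\frac{40 \times 10^{9}\,\text{B}}{86\,400\,\text{s}} \approx 4.6 \times 10^{5}\,\text{B/s} \approx 463\,\text{KB/s}
```

The average alone is modest; the harder part is the near-real-time (seconds) requirement combined with peak traffic, which is likely well above this average.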

“lambda architecture” proposed by @nathanmarz

[Diagram: the open-source lambda architecture at eClick. Components: Internet, TCP Connection, Netty Http Server, Kafka, Storm, Akka Workers, Redis, Hadoop Tools, KPI Report]

The big-data technology stack
● Netty (http://netty.io/): a framework using the reactive programming pattern to scale HTTP systems more easily, by JBoss (http://www.jboss.org)
● Kafka (http://kafka.apache.org/): publish-subscribe messaging rethought as a distributed commit log, open-sourced by LinkedIn
● Storm (http://storm-project.net/): a distributed real-time computation system, by Twitter
● Redis (http://redis.io/): an advanced in-memory key-value NoSQL database, where all the fast statistical computations happen
● Groovy: the scripting layer on the JVM, for ad-hoc queries on Redis
● Hadoop ecosystem: HDFS, Hive, HBase for batch processing
● RxJava (https://github.com/Netflix/RxJava): a library for composing asynchronous and event-based programs
● Hystrix (https://github.com/Netflix/Hystrix): latency and fault tolerance for distributed systems
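Kafka is described above as publish-subscribe messaging rethought as a distributed commit log. The toy sketch below (single process, in memory, no partitions or replication, and not Kafka's client API) shows the core idea: producers only append, each consumer tracks its own offset, and any consumer can rewind to offset 0 to replay the full history, which is what makes recomputation possible. `CommitLog` and `Consumer` are hypothetical names, and a plain `Counter` stands in for the per-campaign counters the talk keeps in Redis.

```python
from collections import Counter
from typing import List, Tuple


class CommitLog:
    """Append-only log: records are never modified, only appended."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 100) -> Tuple[List[str], int]:
        """Return records starting at `offset` and the next offset to read from."""
        batch = self._records[offset:offset + max_records]
        return batch, offset + len(batch)


class Consumer:
    """Each consumer owns its offset, so several consumers can read independently."""

    def __init__(self, log: CommitLog, offset: int = 0) -> None:
        self.log = log
        self.offset = offset
        self.counts: Counter = Counter()  # stand-in for Redis counters (assumption)

    def poll(self) -> int:
        """Consume new records, update counters, return how many were read."""
        records, self.offset = self.log.read(self.offset)
        for campaign in records:
            self.counts[campaign] += 1
        return len(records)

    def rewind(self) -> None:
        """Replay from the start: the view is rebuilt from the raw log."""
        self.offset = 0
        self.counts = Counter()
```

A Storm-style worker would call `poll()` in a loop to keep counters fresh, while a batch job could create a second `Consumer` at offset 0 and rebuild the same counters from scratch without disturbing the first.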

My new ideas for the future

Connecting the active functor pattern + reactive programming + stream computation + in-memory computing to:
● make real-time data analytics easier
● build better recommendation systems
● make big data more profitable

More information:
● http://activefunctor.blogspot.com/ (a special case of Lambda that actively searches for the best connections to form an optimal topology), from ideas during my internship at DRD with my advisor
● Can a function be persistent (stored as data), distributed in a cluster (cloud), and reactive to the right data (the best value in the network)?
● http://www.reactivemanifesto.org/ (reactive pattern)

Lessons
What I have learned from Lambda and the Big Data world

What I have learned
● Study lambda and read some books
● Ask questions => analytics => profit & value
● Collect any data you can, and learn from it!
● Implement it! Just the right tools for the right jobs.
● Turn your data into things everyone can "look & feel"

read papers

Study the “lambda”
I studied Haskell in 2007 with Dr. Peter Gammie (http://peteg.org/) during my internship at DRD (a non-profit organization).
● Imperative programs will always be vulnerable to data races because they contain mutable variables.
● There are no data races in purely functional languages because they don't have mutable variables.
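The same discipline can be applied even in an imperative language by avoiding shared mutable variables. In the sketch below (Python standing in for the Haskell style described above), each worker runs a pure function over its own slice and returns a new value, and results are combined only after all workers finish. An imperative version in which every thread increments one shared counter would need a lock to be safe; this one has no shared counter to race on. `count_clicks`, `chunks` and `parallel_click_count` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def count_clicks(events: Sequence[str]) -> int:
    """Pure function: the result depends only on the input; nothing shared is mutated."""
    return sum(1 for e in events if e == "click")


def chunks(items: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split the input into independent slices, one per task."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_click_count(events: Sequence[str], chunk_size: int = 2) -> int:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # each worker gets its own slice and returns a new int
        partials = list(pool.map(count_clicks, chunks(events, chunk_size)))
    # combine only after all workers are done: no shared counter, no race
    return sum(partials)
```

The answer is the same whatever the chunk size or the order in which workers finish, which is exactly the property that makes pure functions easy to distribute.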

Reading some books

Improve your business knowledge!
=> Read books on behavioral economics

http://www.goodreads.com/shelf/show/behavioral-economics

Collect the data?

Use your imagination: it is more than just the knowledge you have

Think more about Butterfly Effect!

“Logic will get you from A to Z; imagination will get you everywhere.” - Albert Einstein

Use your imagination with data analytics, not just logic

Learn Data Visualization

