RFX - Full-Stack Technology for Real-time Big Data

Key questions
1. What is RFX ?
2. Why RFX ?
3. How to use RFX ?
4. The vision ...

by [email protected] on 27/01/2016
http://engineering.adsplay.net

History

● Applied the Lambda Architecture
  ○ https://en.wikipedia.org/wiki/Lambda_architecture
● In 2012, we used Apache Storm http://storm.apache.org (version 0.7)
● but we wanted to improve on it and turn it into a full-stack framework
● In 2013, I started RFX with “Reactive philosophy in Mind” for common Big Data problems
● From 2014 to now, RFX has been the main tool for our daily real-time big data tasks at FPT
● Core engineers:
  ○ [email protected]
  ○ [email protected]

What is RFX ?

● RFX is “Reactive Function X”
● “Function X” is a feature in a specific product
● “Reactive” means every function can “feel” and “react” to optimize the UX for a user in a specific context.
● The framework is built from open source projects:
  ○ Computing Unit with Akka Actor ( http://akka.io )
  ○ Network Communication with Netty ( http://netty.io )
  ○ Data Processing with Apache { Kafka, Hadoop, Spark }
  ○ Redis ( http://redis.io )
  ○ Front-end with MEAN stack (MongoDB, ExpressJS, AngularJS, NodeJS)

Projects and Products using RFX

1. http://vnexpress.net
   a. counting article pageview
   b. recommendation engine
2. https://eclick.vn
   a. click analytics
   b. impression analytics
3. http://itvad.vn
   a. Video PlayView Analytics
   b. User Behaviour Analytics
   c. Heatmap Analytics
   d. Device Analytics
   e. Revenue Ad Optimization
4. …

● Divide code into Micro-Services:
  ○ Analytical layer ( rfx-stream )
  ○ Business logic layer ( rfx-query )
  ○ Machine Learning layer (Apache Spark)
  ○ Database layer (Redis, Mongo, Hadoop)
  ○ Front-end layer (MEAN stack)
● Focus on best practices and reusability
● Foundation for scalability (system and business)
● Test-driven development for Real-Time Analytics
● Continuous integration & improvement

Why RFX ?

Reactive Function (X) Philosophy

Core elements of rfx-stream

Core backend modules

rfx-track:
● collecting all events from JavaScript delivery

rfx-stream:
● processing stream data (PipelineProcessing pattern)
● processing real-time analytics
● processing business logic (by reactive function)

rfx-cronjob:
● synchronizing real-time data to report database (copy data from Redis to MongoDB)
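
The flow through the backend modules above can be sketched roughly as follows. This is a minimal illustration of a pipeline-processing pattern, not RFX's actual API: the stage functions, the event fields (`url`, `metric`) and the key format are all assumptions, a `Counter` stands in for Redis, and a plain dict stands in for the MongoDB report database.

```python
from collections import Counter
from typing import Callable, List, Optional

Event = dict
Stage = Callable[[Event], Optional[Event]]


def parse(raw: Event) -> Optional[Event]:
    """Drop malformed events; keep only the fields later stages need."""
    if "url" not in raw or "metric" not in raw:
        return None
    return {"url": raw["url"], "metric": raw["metric"]}


def make_counter_stage(store: Counter) -> Stage:
    """Build a stage that increments a counter keyed by metric and URL."""
    def count(event: Event) -> Optional[Event]:
        store[f"{event['metric']}:{event['url']}"] += 1
        return event
    return count


def run_pipeline(stages: List[Stage], raw: Event) -> Optional[Event]:
    """Pass one event through each stage in order; stop when a stage returns None."""
    event: Optional[Event] = raw
    for stage in stages:
        event = stage(event)
        if event is None:
            return None
    return event


def sync_to_report_store(realtime: Counter, report: dict) -> None:
    """Copy a snapshot of the real-time counters into the report store."""
    report.update(realtime)


realtime_store = Counter()  # hypothetical stand-in for Redis
report_store = {}           # hypothetical stand-in for MongoDB
pipeline = [parse, make_counter_stage(realtime_store)]

for raw_event in [
    {"url": "/a", "metric": "pageview"},
    {"url": "/a", "metric": "pageview"},
    {"url": "/b", "metric": "click"},
    {"broken": True},  # malformed, dropped by the parse stage
]:
    run_pipeline(pipeline, raw_event)

sync_to_report_store(realtime_store, report_store)
```

Keeping each stage a small function that either returns the event or `None` makes stages easy to test in isolation and to reorder, which is the main appeal of the pattern.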

Core frontend modules

rfx-report:
● visualizing data in real-time
● monitoring real-time events

rfx-agent:
● tracking user activity: heatmap data, ...
● logging user activity to rfx-track (via network protocol: HTTP, TCP or UDP)
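
For the HTTP case, one common way an agent can log an activity is to encode it into the query string of a tracking URL that the collector then decodes. The sketch below only illustrates that idea; the endpoint, the parameter names (`metric`, `uid`, `x`, `y`) and both helper functions are hypothetical and are not taken from rfx-agent or rfx-track.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

TRACK_ENDPOINT = "http://track.example.com/log"  # hypothetical collector URL


def build_tracking_url(metric: str, user_id: str, x: int, y: int) -> str:
    """Agent side: encode one heatmap event as a tracking URL."""
    query = urlencode({"metric": metric, "uid": user_id, "x": x, "y": y})
    return f"{TRACK_ENDPOINT}?{query}"


def parse_tracking_url(url: str) -> dict:
    """Collector side: decode the query string back into an event."""
    params = parse_qs(urlsplit(url).query)
    return {
        "metric": params["metric"][0],
        "uid": params["uid"][0],
        "x": int(params["x"][0]),
        "y": int(params["y"][0]),
    }


url = build_tracking_url("heatmap-click", "user-42", 120, 340)
event = parse_tracking_url(url)
```

A TCP or UDP transport would carry the same fields in a different envelope; only the encoding and decoding steps would change.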

What problems could be solved with RFX

1. Processing logs:
   a. Pageview
   b. Ad Impression
   c. Click analytics
   d. Heatmap User Data
2. Real-time user segmentation
3. React to user behaviour
4. Auto UX optimization
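
As a rough illustration of real-time user segmentation, the sketch below updates a user's segments as each event arrives instead of recomputing them in a batch job. The segment names, the threshold and the `Segmenter` class are assumptions made for this example, not part of RFX.

```python
from collections import defaultdict

HEAVY_VIEWER_THRESHOLD = 3  # hypothetical rule: 3 or more pageviews


class Segmenter:
    """Keeps per-user state and re-evaluates segment rules on every event."""

    def __init__(self):
        self.pageviews = defaultdict(int)
        self.segments = defaultdict(set)

    def on_event(self, event: dict) -> set:
        """Update state for one event and return the user's current segments."""
        uid = event["uid"]
        if event["metric"] == "pageview":
            self.pageviews[uid] += 1
            if self.pageviews[uid] >= HEAVY_VIEWER_THRESHOLD:
                self.segments[uid].add("heavy_viewer")
        elif event["metric"] == "click":
            self.segments[uid].add("ad_clicker")
        return set(self.segments[uid])


segmenter = Segmenter()
stream = [
    {"uid": "u1", "metric": "pageview"},
    {"uid": "u1", "metric": "pageview"},
    {"uid": "u2", "metric": "click"},
    {"uid": "u1", "metric": "pageview"},
]
history = [segmenter.on_event(e) for e in stream]
```

Because the segments are available right after each event, a downstream function could react immediately, for example by changing what is shown to that user.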

Vision for RFX

To be a Fast Data Intelligence Platform

Quick demo for playview analytics

deployed at http://itvad.vn

Quick demo for device analytics

Research links

● Ad Click Prediction: http://research.google.com/pubs/pub41159.html
● Software Engineering for Machine Learning https://sites.google.com/site/software4ml/accepted-papers
● Fault-tolerant and Scalable Joining of Continuous Data Streams http://research.google.com/pubs/pub41318.html
● Dynamic Ad Layout Revenue Optimization for Display Advertising http://wan.poly.edu/KDD2012/forms/workshop/ADKDD12/doc/a2.pdf
● Behavioral analytics http://en.wikipedia.org/wiki/Behavioral_analytics
● Real-time User Segmentation http://www.slideshare.net/Hadoop_Summit/doctor-nguyen-june27425pmroom230av2
● Implementing a real-time data pipeline https://chimpler.wordpress.com/2014/07/01/implementing-a-real-time-data-pipeline-with-spark-streaming/
● Distributed Event Processing Rule Engine http://eugenedvorkin.com/distributed-event-processing-rule-engine-with-storm-spring-and-groovy/