Rethinking Scalable Real-time Cloud Service That Serves 100K QPS with Scala, Akka, and Actor Based Design
Austin Huang
High Concurrency / Low Latency / Scalable Ad Exchange System: Design and Evolution, Part 1
Vpon Inc., Austin Huang
2013/8/3
Topic shared at the last COSCUP
Outline
- Vpon? Who?
- New Challenges
- New Architecture
- Lessons Learned
- Q&A
Vpon Inc.
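- Founded in 2008
- Main business: mobile advertising
- Currently the largest home-grown mobile ad network in Taiwan
- R&D center: Taipei
- Offices: Taipei / Shanghai / Hong Kong / Tokyo
- Closed a Series B round in the ten-million-US-dollar range in July 2014
- Team size: 70+ (more than 50% in R&D; we are hiring!)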
Typical use case
[Diagram: the media, ADs, clicks, landing pages, conversions]
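Real-time ad exchange system
What is a real-time ad exchange system?
- Real-time matching and trading of advertising budget ("ad spend") against ad inventory ("ad slots")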
[Diagram: Media (mobile apps), real-time ad exchange system, advertisers]
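Every time a phone sends an ad request, the system must find, in real time, the most suitable ad to deliver based on that request's time / location / app in use / telecom carrier / user preferences, and so on.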
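The slides list the matching criteria but not the matching logic. As a hypothetical sketch, assuming each ad carries targeting sets and a bid, a request can be filtered against the hard criteria and the highest-scoring eligible ad chosen. The classes, fields, and scoring rule below are illustrative assumptions, not Vpon's implementation.

```python
from dataclasses import dataclass

# Hypothetical data model: the slides name the criteria
# (time, location, app, carrier, user preferences) but not a schema.

@dataclass(frozen=True)
class AdRequest:
    hour: int                            # local hour of day, 0-23
    location: str                        # e.g. a region code
    app: str                             # publisher app identifier
    carrier: str                         # telecom carrier
    interests: frozenset = frozenset()   # inferred user preferences

@dataclass(frozen=True)
class Ad:
    ad_id: str
    bid: float                           # what the advertiser pays per impression
    hours: frozenset                     # hours the campaign may run
    locations: frozenset
    carriers: frozenset
    excluded_apps: frozenset = frozenset()
    interests: frozenset = frozenset()

def eligible(ad: Ad, req: AdRequest) -> bool:
    """Hard targeting filters: every criterion must match."""
    return (
        req.hour in ad.hours
        and req.location in ad.locations
        and req.carrier in ad.carriers
        and req.app not in ad.excluded_apps
    )

def score(ad: Ad, req: AdRequest) -> float:
    """Illustrative ranking: bid boosted by overlap with user preferences."""
    return ad.bid * (1 + len(ad.interests & req.interests))

def pick_ad(ads, req: AdRequest):
    """Return the best eligible ad, or None if nothing matches."""
    candidates = [ad for ad in ads if eligible(ad, req)]
    return max(candidates, key=lambda ad: score(ad, req), default=None)
```

Filtering first keeps the (potentially expensive) scoring step to the few ads that can legally be shown for this request.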
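"What I want is simple: deliver ads to the largest number of the right people, as fast and as cheaply as possible." (the CEO)
So what are the requirements?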
Problems solved last year
Low latency
- < 50 ms for EVERY request
High concurrency
- 8k+ QPS, and increasing
Linear scalability
- High growth rate: 350% YoY
Targeting
- Big data / data mining …
Low (or NO) budget
- What The F …
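The ad-spend fields must be updated correctly in real time, or we lose money. PS: based on last year's figures, each second of delay in updating the system's spend fields cost roughly NT$300.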
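The slides do not say how the "< 50 ms for EVERY request" bound is enforced. One common approach, sketched here with Python's asyncio as a stand-in for Scala Futures, is to put a hard deadline on the ad lookup and answer with a fallback when it expires. `FALLBACK_AD`, `fast_lookup`, and `slow_lookup` are hypothetical.

```python
import asyncio

LATENCY_BUDGET_S = 0.050   # "< 50 ms for EVERY request"
FALLBACK_AD = "house-ad"   # hypothetical default creative

async def select_ad(request_id, lookup):
    """Await the ad lookup, but never longer than the latency budget."""
    try:
        return await asyncio.wait_for(lookup(request_id), timeout=LATENCY_BUDGET_S)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the slow lookup; answer on time instead.
        return FALLBACK_AD

async def fast_lookup(request_id):
    await asyncio.sleep(0.005)   # simulated 5 ms decision
    return f"ad-for-{request_id}"

async def slow_lookup(request_id):
    await asyncio.sleep(0.2)     # simulated 200 ms stall, e.g. a slow store read
    return f"ad-for-{request_id}"

async def main():
    return await asyncio.gather(
        select_ad("r1", fast_lookup),
        select_ad("r2", slow_lookup),
    )

if __name__ == "__main__":
    print(asyncio.run(main()))
```

A late answer is worth nothing to an ad exchange, so degrading to a default ad inside the budget is usually preferable to a correct answer that arrives too late.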
Architecture shared last year
[Diagram: haproxy, Application (Tomcat) nodes, Infinispan Dist. Cluster (Node A … Node N), Infinispan Repl. Cluster (Node A … Node N)]
- Application servers: 15 nodes
- Infinispan: 5 nodes, storing 100M K/V pairs
- Capacity: 8k+ QPS
New Challenges
Global business
- Buying and selling in different countries
- Multiple DCs / hybrid cloud infrastructure
More business coverage
- Self-operated ad network
- Turnkey solution
Much higher capacity requirements
- Data from all over the world
- 100k+ QPS, latency < 50 ms
[Diagram: Vpon AD services backend functions, with Advertisers and Ad Requests as inputs. Components: HA Proxy, Web Proxy + Cache, Message Routing / Streaming Processing, Bidding Engine, Pricing Engine, DSP / AdN Platform, User Profiles (Data Store), HBase, Hadoop Distributed File System (HDFS), MapReduce/Spark, Data Processing and Archiving, Data Analysis (Python + Pig, Hive, Hadoop Streaming, Spark), Recommender System, CDN (creatives and videos), AD management and Report UI (Django, RoR, SSH), Reporting system, Sales Support System, AD-hoc reporting, Operation, Ganglia, Solr, AD Operation, AD Monitoring System, Scenario modeling, and other ongoing topics. Links are labeled HTTP POST (ad requests), HTTP GET (ad videos, images), Avro, and Rsync.]
New Architecture
Asynchronous by design
Move computing to data
Cache in actors on every node
- Less data access
- No more cache-consistency problems
- Easier troubleshooting
- Lower maintenance cost
Remove hotspots by distributing tasks to actors
Flexible resource management
- Shut down server instances during off-peak hours
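Akka is not available in Python, so the following is only an analogy: a minimal actor built on asyncio, assuming "cache in actors" means each actor owns the cached data for the keys routed to it. Because a single worker task drains the mailbox one message at a time, the cache needs no locks and there is no shared cache cluster to keep consistent. `ProfileActor`, `fake_store`, and `demo` are hypothetical names.

```python
import asyncio

class ProfileActor:
    """Minimal actor: a private mailbox, a single worker task, private state.

    Only the worker task touches self._cache, so no locks are needed.
    `load_profile` is a hypothetical async data-store read.
    """

    def __init__(self, load_profile):
        self._mailbox = asyncio.Queue()
        self._cache = {}
        self._load_profile = load_profile
        self._worker = None
        self.store_reads = 0

    def start(self):
        self._worker = asyncio.create_task(self._run())

    async def ask(self, user_id):
        """Send a message and await the reply (similar in spirit to Akka's ask)."""
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((user_id, reply))
        return await reply

    async def _run(self):
        while True:
            user_id, reply = await self._mailbox.get()  # one message at a time
            try:
                if user_id not in self._cache:
                    self.store_reads += 1
                    self._cache[user_id] = await self._load_profile(user_id)
                result = self._cache[user_id]
            except Exception as exc:
                if not reply.done():
                    reply.set_exception(exc)
            else:
                if not reply.done():
                    reply.set_result(result)

    async def stop(self):
        self._worker.cancel()
        try:
            await self._worker
        except asyncio.CancelledError:
            pass

async def fake_store(user_id):
    await asyncio.sleep(0.001)  # stands in for a remote data-store round trip
    return {"user": user_id}

async def demo():
    actor = ProfileActor(fake_store)
    actor.start()
    results = await asyncio.gather(*(actor.ask(u) for u in ["u1", "u2", "u1", "u1"]))
    await actor.stop()
    return results, actor.store_reads
```

Four requests for two distinct users cost only two store reads. For simplicity this actor waits on the store inline; an Akka actor would more likely pipe the result back to itself as a message so it keeps serving cached keys meanwhile.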
Move Computing to Data
Moving data takes a tremendous amount of time
- Remember, we have less than 30 ms
Accessing AD decision data takes time
Heavy load on the data store / cache
[Diagram: the previous architecture: haproxy, App (Tomcat) nodes, Infinispan Dist. Cluster (Node A … Node N), Infinispan Repl. Cluster (Node A … Node N)]
User profile = 2 kB, QPS = 100k: at least 1.56 Gb/s just to move profiles
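A quick check of the bandwidth figure; profile size and QPS come from the slide, the rest is unit conversion. With SI units, 2 kB × 100,000 requests/s is 1.6 Gb/s; dividing by 1,024 once instead of 1,000 gives 1.5625 Gb/s, which may be where the 1.56 comes from (an assumption). Either way it exceeds a single 1 Gb/s link before any other traffic, which is the argument for moving computation to the data.

```python
# Back-of-the-envelope check of "User profile = 2 kB, QPS = 100k".
PROFILE_KB = 2        # size of one user profile, from the slide
QPS = 100_000         # target request rate, from the slide

def profile_traffic_gbps(profile_kb, qps, kb_per_mb=1_000):
    """Gb/s needed to pull one profile over the network per request.

    kb_per_mb selects decimal (1000) or binary (1024) conversion
    for the kilobit-to-megabit step.
    """
    kilobits_per_s = profile_kb * qps * 8
    megabits_per_s = kilobits_per_s / kb_per_mb
    return megabits_per_s / 1_000

si = profile_traffic_gbps(PROFILE_KB, QPS)
mixed = profile_traffic_gbps(PROFILE_KB, QPS, kb_per_mb=1_024)
print(f"{si:.2f} Gb/s (SI units), {mixed:.2f} Gb/s (one 1024 divisor)")
```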
Why Scala and Akka
Scala
- Functional
- Great for concurrency
- Reuses legacy Java code
Akka
- Actor pattern
- Asynchronous and distributed by design
- Forms clusters
New Architecture
New Architecture – cont.
Preliminary Benchmark
Agent cluster (not yet optimized) on AWS
- Akka nodes: r3.xlarge (8 vCPU, 61G RAM) × 3
- Data store: i2.xlarge (8 vCPU, 61G RAM, 1600GB SSD) × 3
Results
- 15,321 QPS (~10x the old architecture)
- 23 ms average processing time
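The slide does not say what the ~10x is measured against. Total throughput alone (15,321 vs. 8k+ QPS) is about 1.9x, so one plausible reading, which is only an assumption, is throughput per application node: 3 Akka nodes versus 15 Tomcat servers. Because the old figure is "8k+", this ratio is an upper-bound estimate. The sketch also applies Little's law (L = λW) to the 23 ms average to estimate requests in flight, and, assuming the linear scaling the design aims for, how many such nodes the 100k QPS target would need. Data-store nodes are ignored.

```python
import math

# Figures from the slides; comparing against the 15 app servers is an assumption.
NEW_QPS, NEW_AKKA_NODES = 15_321, 3
OLD_QPS, OLD_APP_NODES = 8_000, 15   # "8k+ QPS" served by 15 application servers
AVG_LATENCY_S = 0.023                # 23 ms average processing time

def in_flight(qps: float, latency_s: float) -> float:
    """Little's law: average requests in the system L = lambda * W."""
    return qps * latency_s

def per_node(qps: float, nodes: int) -> float:
    return qps / nodes

def nodes_needed(target_qps: float, qps_per_node: float) -> int:
    """Nodes required if throughput scales linearly (the stated design goal)."""
    return math.ceil(target_qps / qps_per_node)

concurrency = in_flight(NEW_QPS, AVG_LATENCY_S)
gain = per_node(NEW_QPS, NEW_AKKA_NODES) / per_node(OLD_QPS, OLD_APP_NODES)
akka_nodes_for_100k = nodes_needed(100_000, per_node(NEW_QPS, NEW_AKKA_NODES))
print(f"in flight ~{concurrency:.0f}, per-node gain ~{gain:.1f}x, "
      f"nodes for 100k QPS: {akka_nodes_for_100k}")
```

Under these assumptions the three nodes hold roughly 352 requests in flight on average, each node does about 9.6x the work of an old application server, and about 20 nodes would cover 100k QPS.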
Lessons Learned
Maximizing server capacity through asynchrony
- Asynchronous is never easy
- It takes time to learn the correct practices
Move computing to data
- Accessing data is expensive
Reduced hotspots / bottlenecks by separating tasks into different actors on different nodes
- Dispatch tasks by consistent hashing
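A sketch of dispatching by consistent hashing, assuming the routing key is a user ID so that all work for one user lands on the node whose actor caches that user's profile (the slides do not name the key). In Akka this role is typically filled by its consistent-hashing routers; the Python class below only illustrates the idea. `ConsistentHashRing` and its methods are hypothetical names, and MD5 is used purely for even key spreading, not security.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to nodes; removing a node only moves the keys it owned."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes          # virtual nodes per server, smooths the load
        self._ring = []                # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("no nodes in the ring")
        h = self._hash(key)
        # First virtual node clockwise from the key's hash, wrapping around.
        i = bisect.bisect_left(self._ring, (h, ""))
        return self._ring[i % len(self._ring)][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user-42"))
```

Compared with `hash(key) % n`, shutting down one node (for example off-peak) only reassigns the keys that node owned, so the other nodes' actor caches stay warm.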
We need
Scala, Java, Python, RoR, Hadoop, Spark, Docker, and operations experts
THANK YOU! QUESTIONS?