Rethinking Scalable Real-time Cloud Service That Serves 100K QPS with Scala, Akka, and Actor Based Design
Austin Huang

Vpon @ COSCUP 2014, Rethinking scalable real-time cloud service that serves 100k QPS with Scala, Akka, and Actor based design

Page 1

Rethinking Scalable Real-time Cloud Service That Serves 100K QPS with Scala, Akka, and Actor Based Design

Austin Huang

Page 2

High Concurrency / Low Latency / Scalable: Design and Evolution of an Ad Exchange System, Part 1

Vpon Inc.
Austin Huang

2013/8/3

(Topic shared at last year's COSCUP)

Page 3

Outline

Vpon? Who?
New Challenges
New Architecture
Lessons Learned
Q&A

Page 4

Vpon Inc.

Founded in 2008
Core business: mobile advertising
Currently the largest local mobile ad network in Taiwan
R&D center: Taipei
Offices: Taipei / Shanghai / Hong Kong / Tokyo
Closed a Series B round on the order of US$10 million in July 2014
Team size: 70+ (more than 50% R&D; we are hiring!)

Page 5

Typical use case

[Diagram labels: Ads, The media, Clicks, Landing pages, Conversions]

Page 6

Real-time Ad Exchange System

What is a real-time ad exchange system?
- Real-time matching and trading of ad spend against ad placements

[Diagram labels: Media (mobile apps), Real-time ad exchange system, Advertisers]

Every time a phone sends an ad request, the system must find and deliver the most suitable ad in real time, based on the time, location, app in use, carrier, user preferences, and so on.

Page 7

"What I want is simple: just deliver ads to the largest number of the right people, as fast and as cheaply as possible." by CEO

So what are the requirements?

Page 8

Problems solved last year

Low Latency
- < 50 ms for EVERY request

High Concurrency
- 8k+ QPS, increasing

Linear Scalable
- High growth rate: 350% YoY

Targeting
- Big Data / Data Mining …

Low (or NO) Budget
- What The F …

The ad-spend field must be accurate in real time, otherwise we take a loss.
PS: Based on last year's figures, each second of delay in the system's spend field costs roughly NT$300.

Page 9

Architecture shared last year

[Diagram labels: haproxy, Application / Tomcat instances, Infinispan Dist. Cluster (Node A … Node N), Infinispan Repl. Cluster (Node A … Node N)]

Application Server: 15 nodes
Infinispan: 5 nodes, stores 100M K/V
Capacity: 8k+ QPS

Page 10

New Challenges

Global business
- Buy and sell in different countries
- Multiple DC / hybrid cloud infrastructure

More business coverage
- Self-operating ad network
- + Turnkey solution

Much more capacity requirement
- Data from all over the world
- 100k+ QPS, latency < 50 ms

Page 11

Vpon AD services backend functions

[Architecture diagram. Recoverable labels:
- Ad Request (HTTP POST), HA Proxy, Web Proxy + Cache, DSP / AdN Platform, Bidding Engine, Pricing Engine, User Profiles (Data Store)
- CDN for creatives, ad videos, and images (HTTP GET)
- Data Processing and Archiving: Message Routing / Streaming Processing, Avro, Rsync, Hadoop Distributed File System (HDFS), HBase, MapReduce / Spark
- Data Analysis: Python + Pig, Hive, Hadoop Streaming, Spark; Recommender System; Scenario modeling; other ongoing topics
- Reporting system, Report UI (Django, RoR, SSH), AD management, Sales Support System, ad-hoc reporting
- Operation: AD Operation, AD Monitoring System, Ganglia, Solr
- Advertisers]

Page 12

New Architecture

Asynchronous by design
Move computing to data
Cache in actors on every node
- Less data access
- No more cache consistency problem
- Good for troubleshooting
- Lower maintenance cost

Remove hotspots by distributing tasks to actors
Flexible resource management
- Shut down server instances during off-peak hours
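
A minimal sketch of the "cache in actors" idea, written in Python with asyncio rather than Akka so it runs standalone. The names `ProfileActor`, `load_profile`, and `demo` are hypothetical and not from the slides. The point it illustrates: the cache is private to one actor that handles one message at a time, so no other code can read or write it and there is nothing to keep consistent.

```python
import asyncio


class ProfileActor:
    """Hypothetical actor: owns a private cache, handles one message at a time."""

    def __init__(self, load_profile):
        self._load_profile = load_profile   # coroutine standing in for a data-store read
        self._cache = {}                    # only touched inside run()
        self._mailbox = asyncio.Queue()
        self.loads = 0                      # number of data-store reads performed

    async def ask(self, user_id):
        """Send a message to the actor and wait for its reply."""
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((user_id, reply))
        return await reply

    async def run(self):
        """Message loop: messages are processed strictly in arrival order."""
        while True:
            user_id, reply = await self._mailbox.get()
            if user_id not in self._cache:
                self._cache[user_id] = await self._load_profile(user_id)
                self.loads += 1
            reply.set_result(self._cache[user_id])


async def demo():
    async def load_profile(user_id):
        await asyncio.sleep(0)              # placeholder for real I/O
        return {"user_id": user_id}

    actor = ProfileActor(load_profile)      # created inside the running event loop
    worker = asyncio.create_task(actor.run())
    results = await asyncio.gather(*(actor.ask("u1") for _ in range(100)))
    worker.cancel()
    return actor.loads, results


if __name__ == "__main__":
    loads, results = asyncio.run(demo())
    print(loads, len(results))
```

One hundred concurrent requests for the same user trigger a single data-store read; the other requests are answered from the actor's own cache.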

Page 13

Move Computing to Data

Moving data takes a tremendous amount of time
- Remember we have less than 30 ms

Accessing AD decision data takes time
Heavy load on DataStore / Cache

[Diagram labels: haproxy, App / Tomcat instances, Infinispan Dist. Cluster (Node A … Node N), Infinispan Repl. Cluster (Node A … Node N)]

User Profile = 2 kB
QPS = 100k
At least 1.56 Gb/s
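
The 1.56 Gb/s figure can be reproduced with a short calculation. The slide does not state its unit convention; the sketch below assumes 1 MB = 1024 kB and 1 Gb = 1000 Mb, which is the combination that yields 1.56. The function name `profile_traffic_gbps` is made up for illustration.

```python
PROFILE_KB = 2        # user profile size from the slide, in kB
QPS = 100_000         # target request rate from the slide


def profile_traffic_gbps(profile_kb, qps):
    """Network traffic needed to ship one profile per request, in Gb/s."""
    kb_per_s = profile_kb * qps      # 200,000 kB/s
    mb_per_s = kb_per_s / 1024       # assumption: 1 MB = 1024 kB
    mbit_per_s = mb_per_s * 8        # bytes to bits
    return mbit_per_s / 1000         # assumption: 1 Gb = 1000 Mb


if __name__ == "__main__":
    print(profile_traffic_gbps(PROFILE_KB, QPS))
```

With purely decimal units the same inputs give 1.6 Gb/s. Either way, fetching every profile over the network would saturate a 1 Gb/s link on profile traffic alone.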

Page 14

Why Scala and Akka

Scala
- Functional
- Great for concurrency
- Reuse legacy Java code

Akka
- Actor pattern
- Asynchronous and distributed by design
- Forms a cluster

Page 15

New Architecture

Page 16

New Architecture – cont.

Page 17

Preliminary Benchmark

Agent Cluster (not yet optimized)
AWS
- Akka Node: r3.xlarge (8 vCPU, 61G RAM) * 3
- Data Store: i2.xlarge (8 vCPU, 61G RAM, 1600GB SSD) * 3

Results
- 15,321 QPS (~10x compared to the old architecture)
- 23 ms average processing time
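
One possible reading of the "~10x" figure is throughput per application node. The slides do not spell out the comparison, so this is an assumption. The sketch takes the old system at 8,000 QPS (the slides say "8k+") on 15 application servers and the benchmark at 15,321 QPS on 3 Akka nodes, leaving data-store nodes out on both sides.

```python
OLD_QPS, OLD_APP_NODES = 8_000, 15      # "8k+ QPS" on 15 application servers
NEW_QPS, NEW_AKKA_NODES = 15_321, 3     # benchmark result on 3 Akka nodes


def per_node_qps(total_qps, nodes):
    """Average requests per second handled by each node."""
    return total_qps / nodes


old_per_node = per_node_qps(OLD_QPS, OLD_APP_NODES)
new_per_node = per_node_qps(NEW_QPS, NEW_AKKA_NODES)
speedup = new_per_node / old_per_node

if __name__ == "__main__":
    print(f"old: {old_per_node:.0f} QPS/node, new: {new_per_node:.0f} QPS/node, "
          f"ratio: {speedup:.1f}x")
```

Under these assumptions the per-node ratio comes out a little under 10, while total cluster throughput is roughly doubled.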

Page 18

Lessons Learned

Maximize server capacity with asynchronous processing
- Asynchronous is never easy
- It takes time to learn the correct practices

Move computing to data
- Accessing data is expensive

Reduce hotspots / bottlenecks by separating tasks into different actors on different nodes
- Dispatch tasks by consistent hash
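
A minimal sketch of dispatching tasks by consistent hash, in plain Python. The slides do not show the actual implementation; Akka itself ships consistent-hashing routers, and whether those were used is not stated. `ConsistentHashRing`, the node names, and the virtual-node count are all hypothetical.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hypothetical hash ring that maps task keys to node names."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes        # virtual nodes per physical node
        self._ring = []              # sorted list of (hash, node) entries
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(text):
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key):
        """Return the node owning the first ring position after the key's hash."""
        if not self._ring:
            raise ValueError("ring is empty")
        index = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("user-42"))
```

The same key always lands on the same node, so the actor holding that user's cached data receives every request for that user. When a node is removed, only the keys it owned are reassigned; all other keys stay where they were.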

Page 19

We need

Scala, Java, Python, RoR, Hadoop, Spark, Docker, Operation experts

Page 20

THANK YOU! QUESTIONS?