
Cassandra @ Yahoo Japan | Cassandra Summit 2016


Page 1: Cassandra @ Yahoo Japan | Cassandra Summit 2016

Cassandra @ Yahoo! JAPAN

Page 2

About me

Satoshi Konno (http://www.cybergarage.org)

• Engineering Manager of NoSQL Team @ Yahoo! Japan

• Open Source Software Developer for Virtual Reality, IoT and Cloud Computing

• Doctoral Course Student @ JAIST, Défago Lab: The φ accrual failure detector

Page 3

Agenda

• Company Profile

• Summary of C* Clusters

• Issues and Solutions of C*

• Next Generation Infrastructures for C*

Page 4

Company Profile

Page 5

Company Profile

Founded : January 31, 1996

Businesses : Internet Advertising, e-Commerce, Members Services, etc.

Web Services : 100+

Smartphone Apps : 50+ (iOS), 50+ (Android)

Employees : 5,800+ (as of June 30, 2016)

Head Office : Chiyoda-ku, Tokyo, Japan

Page 6

Shareholder Composition

An independent and public company in the Japanese Market

Shareholders : U.S. 35.5 %, Japan 42.9 %

Market Cap figures shown : $22 billion, $29 billion, $60 billion

Page 7

18th Largest Internet Company by Market Cap

[Chart: market value of the largest internet companies worldwide, in billion U.S. dollars]

Source: http://www.statista.com/statistics/277483/market-value-of-the-largest-internet-companies-worldwide/

Page 8

Continued Growth Sustained

[Growth chart: 19 years]

Revenue ¥652B, Operating Income ¥171B (FY2015)

Page 9

Revenue Portfolio (FY2015)

[Pie chart: Consumer 60 %; the remaining segments, Marketing Solutions and Others, account for 32 % and 8 %]

Page 10

Extensive Reach to a Wide Range of Users

80% of all Japanese Internet users use Yahoo! JAPAN

Source: Nielsen NetView June 2015: Data by Brands. Access from home and work using PCs (excl. internet applications)

Page 11

Many Strong Services

[Diagram: Yahoo! services in the US and in JP]

Media : Search, Video, Answer, Mail (US); Search, Knowledge search, Mail, News (JP)

Membership, C2C Payment, C2C EC, B2C EC, Local : JP services shown include YAHUOKU!, Premium, Wallet and Loco

Page 12

Summary of C* Clusters

Page 13

Yahoo! JAPAN Database Platforms

300+ Systems

NoSQL Team : 100+ Services

Page 14

OSS Database Platforms

Yahoo Japan : 300+ Systems

MySQL (RDB Team) : 180 Systems, 630 DBs

Cassandra (NoSQL Team) : 100 Systems, 130 DBs

Page 15

Cassandra @ Yahoo! JAPAN

[Timeline, 2010 to 2016: Cassandra versions in use]

Service Departments : 0.5, 0.8, 1.x

Our Team (NoSQL Team) : 0.8, 1.x, 2.x, 3.x

Page 16

Our Cassandra Clusters (2016)

30 Clusters / 30 TB Usage / 1000+ Nodes

300,000 Reads/sec / 100,000 Writes/sec

10 Nodes / Cluster … 160 Nodes / Cluster

1 Shared Cluster, 30 Special Clusters

30 Systems, 50 Systems

3 DCs

Page 17

Our Use Case Summary on Cassandra

100 Systems :

• Database Caching : 20

• Advertising Services : 10

• User Databases : 40

• Service Databases : 50

Data stored includes Browsing History, Impression Data, Session Data, Meta Data, Aggregated Data, Generated Data, Recommendation, Demographic Data, Life Log, Preference Data, Behavior History, etc.

Page 18

Our Issues and Solutions

Page 19

ISSUE #1 : C10k Problem – C* Proxy

6.8 Billion PV / month

• PC + Tablet : 3.36B PV

• Smart Device : 3.45B PV

Page 20

ISSUE #1 : C10k Problem – C* Proxy

Yahoo Japan Services : 10 to 200 Front-end Servers / Service

PHOTO:AFLO

Page 21

ISSUE #1 : C10k Problem – C* Proxy

• PROBLEM : 200 front-end servers × 128 processes × 2 (C* request + C* heartbeat) = 51,200 connections / node


Page 23

ISSUE #1 : C10k Problem – C* Proxy

• RESULT : Process down

Page 24

ISSUE #1 : C10k Problem – C* Proxy

• SOLUTION : Put 1 proxy on each front-end server, shared by its 128 processes: 200 front-end servers × 1 proxy × 2 (C* request + C* heartbeat) = 400 connections / node
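The counts on pages 21 and 24 are simple products. A minimal sketch, assuming (as the slides state) that every client keeps two connections to each C* node, one for requests and one for heartbeats; `connections_per_node` is a hypothetical helper, not part of any driver:

```python
def connections_per_node(front_end_servers: int,
                         processes_per_server: int,
                         conns_per_client: int = 2,
                         use_local_proxy: bool = False) -> int:
    """Estimate the client connections that land on a single C* node.

    A "client" is whatever opens sockets to C*: each process directly,
    or, with a local proxy, one proxy per front-end server. Each client
    is assumed to hold `conns_per_client` connections to every node.
    """
    # With a proxy, the 128 processes talk to the local proxy instead of C*.
    clients_per_server = 1 if use_local_proxy else processes_per_server
    return front_end_servers * clients_per_server * conns_per_client


print(connections_per_node(200, 128))                        # 51200
print(connections_per_node(200, 128, use_local_proxy=True))  # 400
```

The proxy removes the per-process factor from the count, so the number of connections per node grows with the number of front-end servers instead of the number of processes.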

Page 25

ISSUE #2 : Bootstrap Problem - Driver

• Heavy Services (over 3,000 qps/node) : C* clusters on real servers (SSD is recommended)

• Light Services (under 1,000 qps/node and under 3 GB/node) : C* clusters on virtual servers on OpenStack

Heavy Service : CPU = Good / Light Service : vCPU = Cheap

Page 26

ISSUE #2 : Bootstrap Problem - Driver

• PROBLEM : When a new C* node is added to the cluster, all processes on every front-end server try to connect to it at the same time ...

Page 27

ISSUE #2 : Bootstrap Problem - Driver

• PROBLEM : C* authentication is based on BCrypt, which is heavy processing for the vCPU nodes.

Page 28

ISSUE #2 : Bootstrap Problem - Driver

• PROBLEM : Most processes cannot connect to the C* clusters on OpenStack because of the authentication processing; they time out and retry without waiting, endlessly …

All vCPU Usages = 100% ! Timeout ! Retry !
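Why the retries never catch up can be seen with a rough capacity estimate. A minimal sketch under hypothetical numbers (a 100 ms BCrypt check per login, 8 vCPUs and a 5 s client timeout; none of these come from the slides), assuming logins are served first-come-first-served, one per vCPU at a time:

```python
def logins_within_timeout(clients: int, vcpus: int,
                          check_ms: int, timeout_ms: int) -> int:
    """Best-case number of logins that finish before the client timeout.

    Each login costs `check_ms` of CPU for the BCrypt check (hypothetical),
    and checks run one per vCPU at a time in arrival order.
    """
    capacity = vcpus * timeout_ms // check_ms
    return min(clients, capacity)


clients = 200 * 128  # every process connects to the new node at once
done = logins_within_timeout(clients, vcpus=8, check_ms=100, timeout_ms=5000)
print(done, clients - done)  # 400 25200
```

Every process that times out retries and pays for a fresh BCrypt check, so the work queued on the node is refilled by the retries instead of draining.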

Page 29

ISSUE #2 : Bootstrap Problem - Driver

• SOLUTION : Improve the C* drivers so that they do not reconnect simultaneously when a connection fails.
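The slides do not describe the driver change itself. One common technique for keeping many clients from reconnecting in lockstep is exponential backoff with full jitter; a minimal sketch of that technique (not necessarily what Yahoo! JAPAN implemented), where `reconnect_delays`, `base_s` and `cap_s` are hypothetical names and values:

```python
import random
from typing import List, Optional


def reconnect_delays(attempts: int, base_s: float = 0.1, cap_s: float = 30.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Return one sleep time (seconds) per failed reconnect attempt.

    The upper bound doubles with each attempt (base_s * 2**n, capped at
    cap_s) and the actual delay is drawn uniformly from [0, bound], so
    processes that failed at the same moment retry at different times.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        bound = min(cap_s, base_s * (2 ** attempt))
        delays.append(rng.uniform(0.0, bound))
    return delays


# Each process would use its own RNG, so their retry schedules diverge.
print(reconnect_delays(5, rng=random.Random(1)))
```

Drawing the delay from the whole interval, rather than sleeping a fixed time, is one way to spread reconnects over time; the cap keeps the worst-case wait bounded.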

Page 30

ISSUE #3 : Multi-tenancy – Slow Query

• Small Services (under 500 qps and under 10 GB per keyspace) : a shared C* cluster with real servers

Shared Cluster : 50 Services

Page 31

ISSUE #3 : Multi-tenancy – Slow Query

• PROBLEM : We couldn't find which service caused the high-load queries in the multi-tenant cluster.

Page 32

ISSUE #3 : Multi-tenancy – Slow Query

• SOLUTION : CASSANDRA-12403 - Slow query detecting

[Diagram: a slow query is detected in the Shared Cluster, and the responsible service is removed to a Special Cluster]
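In Apache Cassandra 3.11, for example, this appears as the `slow_query_log_timeout_in_ms` option in cassandra.yaml (default 500 ms), which logs SELECT queries that take longer than the timeout. On a shared cluster where each service has its own keyspace, the next step is attributing those queries to a service. A minimal sketch, assuming the slow queries have already been parsed into hypothetical `SlowQuery` records (the log format is not covered here):

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SlowQuery:
    keyspace: str      # on the shared cluster, one keyspace per service
    duration_ms: int


def top_slow_keyspaces(queries: Iterable[SlowQuery], threshold_ms: int = 500,
                       top: int = 3) -> List[Tuple[str, int]]:
    """Count queries slower than `threshold_ms` per keyspace, busiest first."""
    counts = Counter(q.keyspace for q in queries if q.duration_ms > threshold_ms)
    return counts.most_common(top)


log = [SlowQuery("svc_a", 120), SlowQuery("svc_b", 900),
       SlowQuery("svc_b", 1500), SlowQuery("svc_c", 700)]
print(top_slow_keyspaces(log))  # [('svc_b', 2), ('svc_c', 1)]
```

A keyspace that dominates this list is the candidate for moving out of the shared cluster into a special cluster.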

Page 33

ISSUE #4 : Multi-racking – Inbound Params

• PROBLEM : Our C* clusters are built in the same racks, or under the same core switches, as other services.

Page 34

ISSUE #4 : Multi-racking – Inbound Params

• PROBLEM : C* streaming occurs when a node is added or removed, either by our operations or by failure detection.

Page 35

ISSUE #4 : Multi-racking – Inbound Params

• PROBLEM : C* streaming generates heavy traffic, which disrupts the other services.

[Diagram: three Streaming flows, each limited only by stream_throughput_outbound; callout: "Stop C* streaming !"]

Page 36

ISSUE #4 : Multi-racking – Inbound Params

• SOLUTION : CASSANDRA-11303 - New inbound throughput parameters for streaming

[Diagram: three Streaming flows, each limited by stream_throughput_outbound and stream_throughput_inbound]
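An outbound limit alone cannot protect a receiving node: if N nodes stream to it at once, each within its own outbound limit, the receiver takes N times that limit (for example, three senders at 200 Mbps each add up to 600 Mbps). An inbound limit caps the total at the receiver. A minimal sketch of one way to enforce such a cap, a token bucket with an explicit clock for testability; this is a generic technique, not the CASSANDRA-11303 patch, and the class and parameter names are hypothetical:

```python
class TokenBucket:
    """Admit at most `rate` bytes per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate          # refill speed, bytes per second
        self.capacity = capacity  # largest burst, bytes
        self.tokens = capacity    # start full
        self.last = now           # time of the last refill, seconds

    def try_acquire(self, nbytes: int, now: float) -> bool:
        """Take `nbytes` tokens if available; otherwise take nothing."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


def aggregate_inbound(senders: int, outbound_limit: float) -> float:
    """Worst-case rate into one node when every sender streams at its limit."""
    return senders * outbound_limit


print(aggregate_inbound(3, 200))  # 600
bucket = TokenBucket(rate=100, capacity=100)
print(bucket.try_acquire(100, now=0.0), bucket.try_acquire(1, now=0.0))  # True False
```

A receiver would call `try_acquire` before reading each chunk and wait while it returns False, so the combined inbound rate stays near `rate` regardless of how many peers are sending.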

Page 37

Next Generation Infrastructures for C*

Page 38

OpenStack @ Yahoo! JAPAN

• PURPOSE : To abstract our data center resources using OpenStack.

[Diagram: Apps, Platforms and Infrastructures layers connected through APIs]

50,000+ instances

Page 39

Trial #1 : Special Hypervisor for C*

• PROBLEM : Our OpenStack hypervisors host both C* VMs and other services' VMs: Noisy Neighbours.

Page 40

Trial #1 : Special Hypervisor for C*

• SOLUTION : We are trying to offer special hypervisors that run only C* VMs.

Optimal Flavors for C* : vCPU : 8+, Mem : 16GiB+, SSD : 100GiB+

10Gbps x 2

Page 41

Trial #2 : Bare Metal Clusters for C*

• PROBLEM : OpenStack vCPUs are too cheap (low-powered) to run a C* node in our special service environments, such as those with many connections.

vCPU : Authentication (BCrypt) is heavy !

Page 42

Trial #2 : Bare Metal Clusters for C*

• SOLUTION : We are trying to offer special bare metal clusters that run only C*, using OpenStack Ironic.

Ironic node : Xeon D-1541 2.1GHz (1 CPU), 32GB MEM, SATA SSD 400GB, 10Gbps x 2
