Satoshi Konno
Cassandra: Now and the Future
@ Yahoo! JAPAN
10 October 2017
whoami
Satoshi Konno (http://www.cybergarage.org)
Engineering Manager of NoSQL and NewSQL Teams @ Yahoo! Japan
Open Source Software Developer for Virtual Reality, IoT and Cloud Computing
Doctor's Course Student @ JAIST, Défago Lab: The φ accrual failure detector
Agenda
• What is the NoSQL Team?
• Why did we choose Cassandra?
• What is NewSQL?
• NewSQL with Cassandra
Copyright © 2017 Yahoo Japan Corporation. All Rights Reserved.
NoSQL Team
What is Yahoo! JAPAN?
Many Strong Services
Media
US
Search Video Answer Mail
JP
US
JP
Membership C2C Payment C2C EC B2C EC Local
Search Knowledge search Mail News
YAHUOKU! Premium Wallet Loco
What is the NoSQL Team?
300+ Systems
100+ Services
NoSQL Team
Cassandra @ Yahoo! JAPAN
Timeline, 2010 to 2018:
• Service Departments: Cassandra 0.5, 0.8, 1.x
• NoSQL Team: Cassandra 0.8, 1.x, 2.x, 3.x
Cassandra @ Yahoo! JAPAN
2017
• 50 Clusters
• 50 TB Usage
• 2000+ Nodes
• 500,000 Reads/sec
• 500,000 Writes/sec
• 10 Nodes/Cluster
• 200 Nodes/Cluster
• …
• 1 Shared Cluster
• 50 Special Clusters
• 50 Systems
• 50 Systems
• 3 DCs
NoSQL Team with Cassandra
Key Value Store Team
Search Engine Team
Before Cassandra
• Problems: Inappropriate usage by internal platforms and new demands for more big data.
2012
Services: "We store data of our services on your platforms."
Services: "We want to store more big data of our new services easily."
Platform teams (Search Engine, Key Value Store): "Don't store any big key data!!" "Don't use it like a key value store!!"
NoSQL Team
• Launched NoSQL Team in 2012
2012: Key Value Store Team, Search Engine Team, and Service Departments join the new NoSQL Team.
NoSQL Team: "We should build a new centralized platform for more big data!!"
NoSQL Team: "However, many open source NoSQL databases have already been released, so we have to evaluate them."
NoSQL Team
• NoSQL team selected Cassandra as our first centralized NoSQL database.
Services
2012
• High Availability
• Performance
• Persistence
• Scalability
• …..
• Maintainability
• Appropriate Open Source License
• …..
Function Point Analysis
No. 1
NoSQL Team
State of NoSQL Databases
[Figure: version timelines of several NoSQL databases from 2012 to 2018, with major releases ranging from 0.x to 4.x]
NewSQL
NewSQL Trends
• NewSQL = NoSQL (Scalability) + RDBMS (SQL, ACID) = Scalable RDBMS like NoSQL
NoSQL
Team
Services
Requests for NewSQL
Public Cloud, OSS
Private Cloud, Knowledge and Experience
NoSQL Team: "We have big on-premises data centers, and we can't use the NewSQL platforms in our private cloud."
NoSQL Team: "We want to make use of our knowledge and experience with Cassandra."
NewSQL with Cassandra
Products compared: Google Spanner, Amazon Aurora, MariaDB, CockroachDB
Functions compared: Logging, Query Engine, Transaction, Schema, Store, Storage
NoSQL Team: "Could we use Cassandra for the storage layer of NewSQL databases?"
Trial for NewSQL with Cassandra
(OSS SQL Engines with Cassandra)
Trial Concept
• OSS SQL Engine + Distributed Storage = PostgreSQL + Cassandra or SQLite + Cassandra
Functions in the trial: Logging, Query Engine, Transaction, Schema, Store, Storage
NoSQL Team: "Could we replace the storage layer of SQL databases with Cassandra?"
Study Implementation
• Storage Manager: "PostgreSQL's storage is abstracted as the storage manager, but ….."
• Virtual File System: "SQLite's storage is abstracted as the virtual file system too, but ….."
NoSQL Team: "Implementing the abstract functions directly is hard to debug …."
POSIX Emulation
#define open(path, flags, mode) posix_vfs_cassandra_open(path, flags, mode)
#define close(fd) posix_vfs_cassandra_close(fd)
#define read(fd, buf, nbytes) posix_vfs_cassandra_read(fd, buf, nbytes)
#define write(fd, buf, nbytes) posix_vfs_cassandra_write(fd, buf, nbytes)
#define access(path, mode) posix_vfs_cassandra_access(path, mode)
#define unlink(path) posix_vfs_cassandra_unlink(path)
#define fstat(fd, buf) posix_vfs_cassandra_fstat(fd, buf)
#define fsync(fd) posix_vfs_cassandra_fsync(fd)
#define lseek(fd, offset, whence) posix_vfs_cassandra_lseek(fd, offset, whence)
Develop a library that implements the POSIX file I/O functions on top of Cassandra, and replace the POSIX functions with these Cassandra-backed compliant functions.
The storage layers of PostgreSQL and SQLite are implemented using POSIX file I/O functions.
• PostgreSQL: Storage Manager -> POSIX file I/O functions
• SQLite: Virtual File System -> POSIX file I/O functions
• Trial: POSIX file I/O functions replaced by compliant file I/O functions -> Cassandra
NoSQL Team: "This implementation method makes it easy to write unit tests, and it is easy to debug too."
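The macro redirection on this slide can be tried without a cluster. The sketch below is only an illustration: the demo_* functions are a hypothetical single-file, in-memory stand-in for the posix_vfs_cassandra_* functions, whose implementation the slides do not show. It shows that code written against the POSIX names reaches the replacement backend after preprocessing, which is also what makes the approach easy to unit test.

```c
#include <assert.h>
#include <stdio.h>      /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>
#include <sys/types.h>  /* ssize_t, off_t */

/* Hypothetical stand-in for the Cassandra-backed functions: a single
 * in-memory file, so the sketch runs without a cluster. */
#define DEMO_CAPACITY 4096
static char  demo_data[DEMO_CAPACITY];
static off_t demo_size = 0;  /* bytes stored so far */
static off_t demo_pos  = 0;  /* current file offset */

static int demo_open(const char *path, int flags, int mode) {
    (void)path; (void)flags; (void)mode;
    demo_pos = 0;
    return 3;  /* arbitrary fake descriptor */
}

static int demo_close(int fd) { (void)fd; return 0; }

static ssize_t demo_write(int fd, const void *buf, size_t nbytes) {
    (void)fd;
    if ((size_t)demo_pos + nbytes > DEMO_CAPACITY) return -1;
    memcpy(demo_data + demo_pos, buf, nbytes);
    demo_pos += (off_t)nbytes;
    if (demo_pos > demo_size) demo_size = demo_pos;
    return (ssize_t)nbytes;
}

static ssize_t demo_read(int fd, void *buf, size_t nbytes) {
    (void)fd;
    if (demo_pos >= demo_size) return 0;  /* at or past end of file */
    size_t avail = (size_t)(demo_size - demo_pos);
    if (nbytes > avail) nbytes = avail;
    memcpy(buf, demo_data + demo_pos, nbytes);
    demo_pos += (off_t)nbytes;
    return (ssize_t)nbytes;
}

static off_t demo_lseek(int fd, off_t offset, int whence) {
    (void)fd;
    off_t base = (whence == SEEK_SET) ? 0
               : (whence == SEEK_CUR) ? demo_pos : demo_size;
    if (base + offset < 0) return -1;
    demo_pos = base + offset;
    return demo_pos;
}

/* Same redirection pattern as on the slide, pointed at the stand-in. */
#define open(path, flags, mode)   demo_open(path, flags, mode)
#define close(fd)                 demo_close(fd)
#define read(fd, buf, nbytes)     demo_read(fd, buf, nbytes)
#define write(fd, buf, nbytes)    demo_write(fd, buf, nbytes)
#define lseek(fd, offset, whence) demo_lseek(fd, offset, whence)

/* Engine-style code written against the POSIX names; after preprocessing,
 * every call below goes to the demo_* functions. */
static ssize_t store_and_reload(const char *msg, char *out, size_t outlen) {
    int fd = open("/data/main.db", 0, 0644);
    if (write(fd, msg, strlen(msg)) < 0) { close(fd); return -1; }
    if (lseek(fd, 0, SEEK_SET) < 0) { close(fd); return -1; }
    ssize_t n = read(fd, out, outlen);
    close(fd);
    return n;
}
```

Calling store_and_reload("hello", buf, sizeof buf) writes through the macros and reads the same bytes back. The macros are defined after the system headers are included, so they do not rewrite the header declarations.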
File Management
CREATE TABLE IF NOT EXISTS posix.storage (
path varchar,
block_no bigint,
block blob,
PRIMARY KEY (path, block_no));
SQL Engines (Storage Manager, Virtual File System) store files: File A, File B, ….., File N
Each file is stored as blocks: Block 0, Block 1, Block 2, …..
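With the posix.storage table, a read or write at a byte offset has to be translated into rows keyed by (path, block_no). A minimal sketch of that translation follows, assuming fixed-size blocks numbered from 0. The block_span type and the split_into_blocks function are hypothetical names for illustration and are not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One piece of a file I/O request that falls inside a single block row. */
typedef struct {
    int64_t block_no;      /* value for the block_no column (bigint) */
    size_t  block_offset;  /* first byte touched inside the block */
    size_t  length;        /* number of bytes touched inside the block */
} block_span;

/* Split the byte range [offset, offset + nbytes) into per-block spans.
 * Returns the number of spans needed and fills at most max_spans of them.
 * block_size must be greater than zero. */
static size_t split_into_blocks(uint64_t offset, size_t nbytes,
                                size_t block_size,
                                block_span *spans, size_t max_spans) {
    size_t count = 0;
    while (nbytes > 0) {
        size_t in_block = (size_t)(offset % block_size);
        size_t room = block_size - in_block;       /* bytes left in this block */
        size_t len = nbytes < room ? nbytes : room;
        if (count < max_spans) {
            spans[count].block_no = (int64_t)(offset / block_size);
            spans[count].block_offset = in_block;
            spans[count].length = len;
        }
        count++;
        offset += len;
        nbytes -= len;
    }
    return count;
}
```

For example, a 4096-byte request at offset 4096 touches four rows (block_no 4 to 7) with 1024-byte blocks, but a single row with 4096-byte blocks. A request that covers only part of a block would need a read-modify-write of that row, which this sketch leaves out.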
Benchmark
[Figure: bar chart comparing SQLite (Disk), SQLite+C* (1KB), SQLite+C* (4KB), and SQLite+C* (8KB) on INSERT, SELECT, and UPDATE (v3.20.1 + speedtest.tcl)]
• Naive Implementation
X : Multi-threads
X : Async Requests
• Don’t care
X : Only Storage Layer
X : Access Conflict
NoSQL Team: "This is a very naive and rough implementation of a distributed database now, but …."
References