Standing on Digg's Corpse: A Talk About Cassandra. Wang Xu (王旭), Shanda Cloud Computing

Cassandra Technical and history overview

Page 1: Cassandra Technical and history overview

Standing on Digg's Corpse: A Talk About Cassandra

Wang Xu (王旭), Shanda Cloud Computing

Page 2: Cassandra Technical and history overview

About Me

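• Coder @ Shanda Cloud Computing / Cloud Disk (EBS)

• Former Hadoop developer

• Technical writer; translator of the Chinese edition of "Cassandra: The Definitive Guide" (said to read like machine translation)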

• http://wangxu.me/blog/

• @gnawux

Page 3: Cassandra Technical and history overview

Outline

• How Cassandra grew up

• Cassandra's key features

• Tools around Cassandra

• Cassandra's core technologies

Page 4: Cassandra Technical and history overview

Who is Cassandra

• A princess of Troy in Greek mythology

• Loved by the sun god Apollo, who gave her a gift

• A tragic prophetess

• ...

Page 5: Cassandra Technical and history overview

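The Cassandra we are talking about

• A decentralized, high-performance, scalable, distributed non-relational database

• Born into a good family:

• The NoSQL database open-sourced by Facebook

• Everyone's darling:

• Twitter, Digg, Rackspace...

• Sudden misfortune:

• Twitter, Digg, Facebook

• Reborn from the ashes:

• Twitter, Netflix, Rackspace, Reddit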

Page 6: Cassandra Technical and history overview

The Digg story

John Quinn

Page 7: Cassandra Technical and history overview

Digg's future

http://about.digg.com/blog/looking-future-cassandra/

Page 10: Cassandra Technical and history overview

Twitter in 2010

Friday, July 9, 2010

Page 11: Cassandra Technical and history overview

Facebook in 2010

Page 12: Cassandra Technical and history overview

Why Cassandra was not dependable

• Repair: reliability of data repair (CASSANDRA-1316)

• Scale: impact of adding a node on the cluster / load balancing (CASSANDRA-192)

• Compaction: impact on performance

• Memory (Key cache, OOM...)

• Maturity...

Page 13: Cassandra Technical and history overview

Cassandra today

Page 14: Cassandra Technical and history overview

Cassandra @ Twitter

Page 15: Cassandra Technical and history overview

Web Analytics of Twitter

Page 16: Cassandra Technical and history overview

SpiderDuck of Twitter

Page 17: Cassandra Technical and history overview

Rainbird of Twitter

Page 18: Cassandra Technical and history overview

Cassandra @ Netflix

Page 19: Cassandra Technical and history overview

Cassandra @ Netflix

Page 20: Cassandra Technical and history overview

Why Cassandra is cool

• Decentralized architecture, Column Family data model

• Fast, and writes are even faster

• Performance scales linearly (ref: Netflix, YCSB...)

• Counter data type

Page 21: Cassandra Technical and history overview

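Gartner's hype cycle

Gartner's technology maturity model: Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity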

Page 22: Cassandra Technical and history overview

Key features

Page 23: Cassandra Technical and history overview

About NoSQL

• NoSQL Databases and Polyglot Persistence (persistence that mixes several storage models)

• NoSQL and BigData

• NoSQL and Not Only SQL

Page 24: Cassandra Technical and history overview

Multiple data models

• Key-Value

• Graph

• Document (JSON...)

• Column Family: multidimensional hash table, sparse table

Page 25: Cassandra Technical and history overview

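Column Family

• Analogous to a table in an RDBMS

• Stores a series of columns

• Each column is a triple (name : value : timestamp)

• Different rows need not have the same columns

• Columns are sorted (indexed)

• You can fetch a single column or run a range query

• Typical use cases: timelines, records with varying attributes...

• Super Column Family, Composite Column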
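The column family model above can be sketched in a few lines of Python. This is only a toy in-memory illustration, not Cassandra's storage code: the class name `ColumnFamily` and the methods `insert`, `get`, and `get_slice` are hypothetical names chosen for this sketch. It shows rows with differing columns, columns kept sorted by name, (name, value, timestamp) triples, and a range query over column names.

```python
from bisect import bisect_left, bisect_right
import time


class ColumnFamily:
    """Toy column family: row key -> columns kept sorted by column name."""

    def __init__(self):
        # row_key -> {"names": sorted list of names, "cells": {name: (value, ts)}}
        self.rows = {}

    def insert(self, row_key, name, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        row = self.rows.setdefault(row_key, {"names": [], "cells": {}})
        current = row["cells"].get(name)
        if current is None:
            # Keep column names ordered so that range scans are cheap.
            row["names"].insert(bisect_left(row["names"], name), name)
        elif current[1] > ts:
            return  # the stored cell has a newer timestamp and wins
        row["cells"][name] = (value, ts)

    def get(self, row_key, name):
        """Fetch the value of a single column."""
        return self.rows[row_key]["cells"][name][0]

    def get_slice(self, row_key, start, finish):
        """Range query: all columns with start <= name <= finish, in order."""
        row = self.rows[row_key]
        lo = bisect_left(row["names"], start)
        hi = bisect_right(row["names"], finish)
        return [(n, row["cells"][n][0]) for n in row["names"][lo:hi]]
```

For a timeline, the column names could be timestamps or time-ordered IDs, so that `get_slice` returns one user's events for a time window in order.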

Page 26: Cassandra Technical and history overview

The CAP principle

Consistency, availability, and partition tolerance: you can only have two of the three

Page 27: Cassandra Technical and history overview

Cassandra's tunable consistency

• The read/write API lets each request specify the consistency level it needs

• CL.ZERO

• CL.ANY

• CL.ONE

• CL.QUORUM

• CL.ALL

• W+R>N means strong consistency
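The W+R>N rule can be checked with simple arithmetic. The sketch below is an illustration only: `quorum`, `acks_required`, and `is_strongly_consistent` are hypothetical helper names, N is the replication factor, and W and R are the number of replicas that must acknowledge a write or a read. ZERO and ANY are left out of the table on the assumption that neither guarantees that a replica holding the row has acknowledged the write.

```python
def quorum(n):
    """Majority of n replicas."""
    return n // 2 + 1


def acks_required(level, n):
    """Replica acknowledgements needed at a given consistency level."""
    table = {"ONE": 1, "QUORUM": quorum(n), "ALL": n}
    return table[level]


def is_strongly_consistent(w, r, n):
    """W + R > N: every read set overlaps every write set in some replica."""
    return w + r > n


if __name__ == "__main__":
    n = 3
    for write_cl, read_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        w, r = acks_required(write_cl, n), acks_required(read_cl, n)
        print(write_cl, read_cl, is_strongly_consistent(w, r, n))
```

With N = 3, writing and reading at QUORUM gives 2 + 2 > 3, so at least one replica in every read has seen the latest write; ONE with ONE gives 1 + 1 = 2, which does not.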

Page 28: Cassandra Technical and history overview

Interfaces

• Cassandra API

• Thrift API

• Clients: Hector, Pycassa...

• CQL

Page 30: Cassandra Technical and history overview

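Main use cases

• Workloads that need high performance (especially write performance), always-on availability, structured data, and massive data volumes

• Timelines

• Messages

• Ad tracking

• ...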

Page 31: Cassandra Technical and history overview

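Day-to-day operations

• Node repair

• gc_grace_seconds

• Avoid repairing several nodes at the same time

• Adding nodes

• Partition the token range, specify the seed nodes

• Balancing data

• One strategy: double the number of nodes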
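Why doubling is a convenient growth strategy can be seen from the token arithmetic. The sketch below assumes manually assigned, evenly spaced tokens on a ring of size 2**127 (the token space used by RandomPartitioner); `balanced_tokens` and `tokens_to_move` are hypothetical helper names for illustration, not Cassandra tooling.

```python
RING_SIZE = 2 ** 127  # assumed token space (RandomPartitioner-style)


def balanced_tokens(node_count):
    """Evenly spaced initial tokens for node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def tokens_to_move(old_count, new_count):
    """Existing nodes whose token must change to stay balanced after growth."""
    new_layout = set(balanced_tokens(new_count))
    return [t for t in balanced_tokens(old_count) if t not in new_layout]


if __name__ == "__main__":
    # Doubling: every old token is also a valid token in the new layout,
    # so each new node simply takes half of one existing range.
    print(len(tokens_to_move(4, 8)))
    # Adding a single node: most existing nodes would have to be moved
    # to restore an even split.
    print(len(tokens_to_move(4, 5)))
```

When the cluster doubles, no existing node needs a new token; when it grows by one node, rebalancing to an even split means moving most of the existing nodes.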

Page 32: Cassandra Technical and history overview

Related tools

Page 33: Cassandra Technical and history overview

Client libraries

• Java: Hector, Astyanax by Netflix

• Scala: Cassie by Twitter

• Python: Pycassa...

• Ruby: cassandra by Twitter

• ...

Page 34: Cassandra Technical and history overview

OpsCenter by DataStax

Page 35: Cassandra Technical and history overview

Priam by Netflix

• A helper tool that runs on every node, used for:

• Backup and recovery (to S3)

• Bootstrapping and automated token assignment.

• Centralized configuration management

• RESTful monitoring and metrics

Page 36: Cassandra Technical and history overview

Core technologies

Page 37: Cassandra Technical and history overview

DHT

• Dynamo by Amazon

• Structured P2P, consistent hashing

• Gossip

• Read repair

• Anti-Entropy, Merkle Tree
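Consistent hashing, the idea behind the DHT bullets above, can be sketched as a sorted ring of node tokens. This is a minimal illustration under assumptions: the class `Ring` and the helper `token_for` are hypothetical names, node tokens are derived by hashing the node name with MD5 (real clusters assign tokens explicitly), and replicas are simply the next distinct nodes clockwise on the ring.

```python
import hashlib
from bisect import bisect_left


def token_for(key):
    """Map a string to a position on the ring (MD5 is an assumption here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        # Each node owns the range that ends at its own token.
        self.entries = sorted((token_for(node), node) for node in nodes)
        self.tokens = [token for token, _ in self.entries]

    def replicas(self, key, n=1):
        """First node whose token is >= the key's token, then its successors."""
        start = bisect_left(self.tokens, token_for(key)) % len(self.entries)
        count = min(n, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(count)]


if __name__ == "__main__":
    before = Ring(["node-a", "node-b", "node-c"])
    after = Ring(["node-a", "node-b", "node-c", "node-d"])
    keys = [f"user:{i}" for i in range(1000)]
    moved = [k for k in keys if before.replicas(k) != after.replicas(k)]
    # Only keys that now belong to node-d change owner.
    print(len(moved), all(after.replicas(k) == ["node-d"] for k in moved))
```

The point of the design is that adding a node only takes keys from its neighbor on the ring; every other key keeps its owner, unlike `hash(key) % node_count`.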

Page 38: Cassandra Technical and history overview

Write path

• Commit Log

• MemTable

• SSTable
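The three stages above can be illustrated with a toy store. This is a sketch under assumptions, not Cassandra's implementation: `ToyStore` and its `memtable_limit` parameter are hypothetical, everything lives in memory, and the "commit log" is a Python list rather than a file. It shows the order of operations: append to the log, update the memtable, and flush the memtable as an immutable sorted table once it is full.

```python
class ToyStore:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of unflushed writes
        self.memtable = {}     # in-memory, most recent values
        self.sstables = []     # immutable sorted tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log first, for recovery
        self.memtable[key] = value            # 2. then update memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. spill to a sorted table

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed writes no longer need replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer tables shadow older ones, so search newest first.
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

Writes never touch existing tables, which is why they are cheap; the cost moves to reads, which may have to consult several tables, and to compaction, which merges them.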

Page 39: Cassandra Technical and history overview

Bloom Filter

• Bloom Filter vs. Hash

• Motivation: disk access is expensive

• False positives
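A Bloom filter answers "is this key possibly in this table?" without touching disk. The sketch below is a minimal illustration with assumed parameters: the class `BloomFilter`, the default sizes, and the use of seeded SHA-256 as the hash family are choices made for this example only.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, hash_count=3):
        self.size = size_bits
        self.hash_count = hash_count
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive hash_count positions by salting the key with a seed.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))
```

A `False` answer lets a read skip a table entirely. A `True` answer may be a false positive, in which case the only cost is one wasted disk lookup; a key that was added is never reported as absent.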

Page 40: Cassandra Technical and history overview

References

• Official site: http://cassandra.apache.org/

• Datastax: http://www.datastax.com/

• "Cassandra: The Definitive Guide" (Chinese edition): http://www.ituring.com.cn/book/9

Page 41: Cassandra Technical and history overview

Q & A

Thank you