Hadoop Scalability at Facebook Dmytro Molkov ([email protected] ) YaC, Moscow, September 19, 2011

Hadoop Scalability at Facebook. Dmytro Molkov, Facebook


DESCRIPTION

Dmytro Molkov, Facebook

Bachelor of Applied Mathematics, Taras Shevchenko National University of Kyiv (2007). Master of Computer Science, Stony Brook University (2009). Hadoop HDFS committer since 2011. Member of the Hadoop team at Facebook since 2009.

Talk topic

Hadoop Scalability at Facebook.

Abstract

Hadoop and Hive are excellent tools for storing and analyzing petabytes of data at Facebook. Working with such volumes of data, the Hadoop development team at Facebook faces Hadoop scalability and efficiency problems every day. The talk covers some details of optimizations in different parts of the Hadoop infrastructure at Facebook that make it possible to provide a high-quality service. Examples include optimizing the cost of storage in multi-petabyte HDFS clusters, increasing system throughput, and reducing system downtime with High Availability work for HDFS.


Page 1: Hadoop Scalability at Facebook. Dmytro Molkov, Facebook

Hadoop Scalability at Facebook

Dmytro Molkov ([email protected])
YaC, Moscow, September 19, 2011

Page 2

▪ How Facebook uses Hadoop
▪ Hadoop Scalability
▪ Hadoop High Availability
▪ HDFS Raid

Page 3

How Facebook uses Hadoop

Page 4

Usages of Hadoop at Facebook

▪ Warehouse
  ▪ Thousands of machines in the cluster
  ▪ Tens of petabytes of data
  ▪ Tens of thousands of jobs/queries a day
  ▪ Over a hundred million files
▪ Scribe-HDFS
  ▪ Dozens of small clusters
  ▪ Append support
  ▪ High availability
  ▪ High throughput

Page 5

Usages of Hadoop at Facebook (contd.)

▪ Realtime Analytics
  ▪ Medium sized HBase clusters
  ▪ High throughput/low latency
▪ FB Messages Storage
  ▪ Medium sized HBase clusters
  ▪ Low latency
  ▪ High data durability
  ▪ High Availability
▪ Misc Storage/Backup clusters
  ▪ Small to medium sized
  ▪ Various availability/performance requirements

Page 6

Hadoop Scalability

Page 7

Hadoop Scalability

▪ Warehouse Cluster - A “Single Cluster” approach
  ▪ Good data locality
  ▪ Ease of data access
  ▪ Operational Simplicity
▪ NameNode is the bottleneck
  ▪ Memory pressure - too many files and blocks
  ▪ CPU pressure - too many metadata operations against a single node
  ▪ Long Startup Time
▪ JobTracker is the bottleneck
  ▪ Memory pressure - too many jobs/tasks/counters in memory
  ▪ CPU pressure - scheduling computation is expensive

Page 8

HDFS Federation Wishlist

▪ Single Cluster

▪ Preserve Data Locality

▪ Keep Operations Simple

▪ Distribute both CPU and Memory Load

Page 9

Hadoop Federation Design

[Diagram: NameNode #1 through NameNode #N above a shared set of DataNodes]

Page 10

HDFS Federation Overview

▪ Each NameNode holds a part of the NameSpace

▪ Hive tables are distributed between namenodes

▪ Hive Metastore stores full locations of the tables (including the namenode) -> Hive clients know which cluster the data is stored in

▪ HDFS Clients have a mount table to know where the data is

▪ Each namespace uses all datanodes for storage -> the cluster load is fully balanced (Storage and I/O)

▪ Single Datanode process per node ensures good utilization of resources
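The client-side mount table mentioned above can be pictured as a longest-prefix lookup from a path to the NameNode that owns it. The following is a minimal sketch of that idea, not Facebook's implementation; the mount points, NameNode addresses, and the function name `resolve_namenode` are all hypothetical.

```python
# Illustrative client-side mount table: maps a path prefix to the NameNode
# that owns that part of the namespace. All names below are hypothetical.
MOUNT_TABLE = {
    "/warehouse/ads": "hdfs://namenode1.example.com:8020",
    "/warehouse/logs": "hdfs://namenode2.example.com:8020",
    "/user": "hdfs://namenode3.example.com:8020",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode URI for the longest mount point that covers path."""
    best = None
    for mount_point in mount_table:
        # Match only on whole path components, so "/user" does not
        # accidentally cover "/users".
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError("no mount point covers " + path)
    return mount_table[best]
```

A client would call `resolve_namenode("/warehouse/ads/2011/09/19")` before opening a file and then talk to the returned NameNode; the DataNodes are shared, so only the metadata request is routed this way.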

Page 11

Map-Reduce Federation

▪ Backward Compatibility with the old code

▪ Preserve data locality

▪ Make scheduling faster

▪ Ease the resource pressure on the JobTracker

Page 12

Map Reduce Federation

[Diagram: Cluster Resource Manager, Job Client, and TaskTrackers; arrows labeled Resource Request, Resource Heartbeats, and Job Communication]

Page 13

MapReduce Federation Overview

▪ Cluster Manager only allocates resources

▪ JobTracker per user -> few tasks per JobTracker -> more responsive scheduling

▪ ClusterManager is stateless -> shorter restart times -> better availability
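The split of responsibilities described above can be illustrated with a toy model: a cluster manager that only hands out free slots, and a per-user JobTracker that keeps all task state and does its own scheduling. This is a rough sketch under stated assumptions, not the actual design; the class names, the slot-counting model, and the grant order are all hypothetical.

```python
class ClusterManager:
    """Only tracks free slots; knows nothing about jobs or tasks."""

    def __init__(self):
        self.free_slots = {}  # TaskTracker name -> free slot count

    def heartbeat(self, tasktracker, free):
        # The view is rebuilt from heartbeats, so a restarted manager
        # recovers it without replaying any job state (assumption).
        self.free_slots[tasktracker] = free

    def request(self, count):
        """Grant up to count slots; return one TaskTracker name per slot."""
        granted = []
        for tracker in sorted(self.free_slots):
            while self.free_slots[tracker] > 0 and len(granted) < count:
                self.free_slots[tracker] -= 1
                granted.append(tracker)
        return granted


class JobTracker:
    """One instance per user: owns task state and schedules its own tasks."""

    def __init__(self, tasks):
        self.pending = list(tasks)
        self.assignments = {}  # task name -> TaskTracker name

    def run(self, cluster_manager):
        # Ask for as many slots as there are pending tasks, then place
        # one task on each granted slot.
        for tracker in cluster_manager.request(len(self.pending)):
            task = self.pending.pop(0)
            self.assignments[task] = tracker
        return self.assignments
```

Because each JobTracker only sees its own handful of tasks, its scheduling loop stays short, while the cluster manager's work per request is independent of how many jobs exist.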

Page 14

Hadoop High Availability

Page 15

Warehouse High Availability

▪ Full cluster restart takes 90-120 mins

▪ Software upgrade is 20-30 hrs of downtime/year

▪ Cluster crash is 5 hrs of downtime/year

▪ MapReduce tolerates failures
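To put the downtime figures above in perspective, they can be converted into a yearly availability percentage. The sketch below assumes a 365-day year (8760 hours) and simply adds the upgrade and crash downtime from the slide; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8760 hours


def availability(downtime_hours):
    """Fraction of the year the cluster is up, given yearly downtime."""
    return 1 - downtime_hours / HOURS_PER_YEAR


# Figures from the slide: 20-30 hrs of upgrades plus 5 hrs of crashes.
best_case = availability(20 + 5)
worst_case = availability(30 + 5)
print(f"{worst_case:.2%} to {best_case:.2%}")
```

Under these assumptions the warehouse cluster is available roughly 99.6% to 99.7% of the year.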

Page 16

HDFS High Availability Design

[Diagram: Primary NN and Standby NN, each linked to NFS by an "Edits Log" arrow and to the DataNodes by a "Block Reports/Block Received" arrow]

Page 17

Clients Design

▪ Using ZooKeeper as a method of name resolution

▪ Under normal conditions ZooKeeper contains the location of the primary node

▪ During a failover the ZooKeeper record is empty and the clients know to wait for the failover to complete

▪ On a network failure clients check if the ZooKeeper entry has changed and retry the command against the new Primary NameNode if the failover has occurred

▪ For large clusters, clients also cache the location of the primary on the local node to ease the load on the ZooKeeper cluster
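The client behavior described above can be sketched as a retry loop around a name lookup. The example below uses an in-memory stand-in instead of a real ZooKeeper client, so `FakeRegistry`, `call_with_failover`, and the retry parameters are all hypothetical; retrying against an unchanged record after a network failure is also an assumption, since the slide only covers the case where the record has changed.

```python
import time


class FakeRegistry:
    """In-memory stand-in for the ZooKeeper record (hypothetical)."""

    def __init__(self, primary):
        self.primary = primary  # None models the empty record during failover

    def get_primary(self):
        return self.primary


def call_with_failover(registry, send, retries=5, wait_seconds=0.0):
    """Run send(address) against the current primary, following failovers."""
    for _ in range(retries):
        address = registry.get_primary()
        if address is None:
            # Empty record: a failover is in progress, so wait and re-check.
            time.sleep(wait_seconds)
            continue
        try:
            return send(address)
        except ConnectionError:
            # Network failure: loop around, re-read the record, and retry
            # against whichever NameNode is registered as primary now.
            time.sleep(wait_seconds)
    raise TimeoutError("no primary NameNode reachable")
```

The local cache from the last bullet would sit in front of `get_primary` and be invalidated on a network failure, so that only failed calls reach the ZooKeeper cluster.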

Page 18

HDFS Raid

Page 19

HDFS Raid

▪ 3 way replication
  ▪ Data locality - necessary only for the new data
  ▪ Data availability - necessary for all kinds of data
▪ Erasure codes
  ▪ Data locality is worse than 3 way replication
  ▪ Data availability is at least as good as 3 way replication

Page 20

HDFS Raid Details

XOR

10 blocks replicated 3 times = 30 physical blocks. Effective replication factor 3.0

10 blocks replicated twice + checksum (XOR) block replicated twice = 22 physical blocks. Effective replication factor 2.2

Reed Solomon Encoding

10 blocks replicated 3 times = 30 physical blocks. Effective replication factor 3.0

10 blocks with replication factor 1 + erasure codes (RS) replicated once = 14 physical blocks. Effective replication factor 1.4
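The effective replication factors above are physical blocks divided by data blocks. A small sketch of that arithmetic follows; the function and parameter names are illustrative, and the figure of 4 Reed Solomon parity blocks is inferred from the slide's total of 14 physical blocks rather than stated in it.

```python
def effective_replication(data_blocks, data_replicas, parity_blocks, parity_replicas):
    """Return (physical block count, effective replication factor)."""
    physical = data_blocks * data_replicas + parity_blocks * parity_replicas
    return physical, physical / data_blocks


# Plain 3 way replication: no parity blocks at all.
print(effective_replication(10, 3, 0, 0))
# XOR: data replicated twice, one parity block replicated twice.
print(effective_replication(10, 2, 1, 2))
# Reed Solomon: single copy of the data plus 4 parity blocks
# (assumption: 4 is inferred from 14 - 10 in the slide).
print(effective_replication(10, 1, 4, 1))
```

The three calls reproduce the 30, 22, and 14 physical blocks quoted on the slide.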

Page 21

HDFS Raid Pros and Cons

▪ Saves a lot of space

▪ Provides same guarantees for data availability

▪ Worse data locality

▪ Need to reconstruct blocks instead of replicating (CPU + Network cost)

▪ Block location in the cluster is important and needs to be maintained
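The reconstruction cost mentioned above comes from having to read the surviving blocks and recompute the lost one, instead of copying an existing replica. For the XOR scheme this can be illustrated in a few lines. This is a toy example on tiny byte strings with a hypothetical helper name, not the HDFS Raid code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three tiny "blocks" stand in for HDFS blocks of equal size.
data = [b"blk0", b"blk1", b"blk2"]
parity = xor_blocks(data)

# Suppose data[1] is lost: XOR-ing the survivors with the parity block
# rebuilds it, at the price of reading every other block in the stripe.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```

A single XOR parity block can rebuild only one missing block per stripe, which is why the slide's XOR layout still keeps two replicas of everything.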

Page 22

facebook.com/[email protected]

Page 23

(c) 2007 Facebook, Inc. or its licensors.  "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0