Advanced Topics in Databases: Google File System, HDFS, BigTable and HBase



Contents: Introduction, GFS, HDFS, BigTable.

General. Functions of a distributed system: naming, sharing, caching, file replication. In addition, it must provide its users with:


Google File System, HDFS, BigTable and HBase

Introduction: Cloud

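GFS: Google File System

Design assumptions: the system is built from many inexpensive commodity machines that fail often. It stores a modest number of large files; datasets of many GB are common. There are 2 kinds of reads: large streaming reads and small random reads. Writes are mostly large sequential appends (append), and many clients may append to the same file concurrently. The typical workload is data analytics, so high sustained bandwidth matters more than low latency.

Interface: create/delete, open/close, read/write, plus snapshot and record append.

Architecture: a single master, multiple chunkservers and multiple clients. Files are split into fixed-size chunks of 64 MB, each replicated on several chunkservers and stored there as ordinary local files. Neither clients nor chunkservers cache file data (clients cache only metadata). Clients ask the master only for metadata and exchange data directly with the chunkservers, so file data never passes through the master. A small file that occupies a single chunk can turn the chunkservers holding it into hotspots when many clients read it.

[Figure: GFS architecture with a client, the master and three chunkservers]

Master: the master keeps all metadata in memory: the file and chunk namespace, the mapping from files to chunks, and the locations of each chunk's replicas. It also runs system-wide activities (lease management, garbage collection, chunk migration between chunkservers) and exchanges periodic heartbeat messages with the chunkservers to give instructions and collect their state. Metadata changes are recorded in the GFS operation log, and the master periodically writes checkpoints so that recovery only has to replay the log after the last checkpoint.

Consistency: after a mutation, a file region is consistent (all clients see the same data, whichever replica they read), defined (consistent, and clients see exactly what the mutation wrote) or inconsistent.

Leases and mutation order: a mutation is an operation that changes a chunk's contents or metadata, such as a write or a record append. The master grants a chunk lease to one replica, the primary, which picks a serial order for all mutations on that chunk; every replica applies them in that order. Data is pushed along a chain of chunkservers using pipelining, separately from the control flow.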
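Because every GFS chunk has the same fixed size (64 MB), a client can turn a byte offset in a file into a chunk index on its own and then ask the master only for that chunk's handle and replica locations. A minimal sketch of that translation; `chunk_index` and `chunks_for_range` are hypothetical helper names, not part of any GFS interface.

```python
# Hypothetical helpers for illustration only; not a GFS API.
CHUNK_SIZE = 64 * 1024 * 1024  # fixed GFS chunk size: 64 MB


def chunk_index(offset: int, chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Map a byte offset in a file to (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, chunk_size)


def chunks_for_range(offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Chunk indexes touched by a read of `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return list(range(first, last + 1))


if __name__ == "__main__":
    print(chunk_index(200 * 1024 * 1024))  # (3, 8388608): 8 MB into chunk 3
```

The client sends the file name and the chunk index to the master; a read that crosses a 64 MB boundary simply needs the locations of more than one chunk.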

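Write control and data flow:
1. The client asks the master which chunkserver holds the current lease for the chunk and where the other replicas are.
2. The master replies with the identity of the primary and the locations of the replicas; the client caches this information.
3. The client pushes the data to all replicas (3a, 3b, 3c).
4. The client sends the write request to the primary, which assigns serial numbers to the mutations and applies them.
5. The primary forwards the write request to the secondary replicas, which apply the mutations in the same order.
6. The secondaries report to the primary that the operation completed.
7. The primary replies to the client with "operation completed" or an error report.

[Figure: lease and mutation order between the client, the master and three chunkservers]

Namespace management and locking: the master locks pathnames. An operation takes a read lock on every ancestor directory name and a read or write lock on the full pathname. Because creating a file needs only a read lock on its directory, several files can be created concurrently in the same directory.

Replica placement aims to maximize data reliability and availability and to make full use of network bandwidth, so the replicas of a chunk are spread across racks. When a chunkserver fails and a chunk drops below its target number of replicas, the master re-replicates it; the master also rebalances replicas periodically.

Garbage collection: the storage of a deleted file is not reclaimed immediately; the file is renamed to a hidden name and removed lazily during the master's regular scans.

Stale replica detection: the master keeps a chunk version number for each chunk to tell up-to-date replicas from stale ones; stale replicas are removed during garbage collection.

Fault tolerance and data integrity: recovery is fast, chunks are replicated, and the master's state is replicated as well. Each chunkserver uses checksums to detect corrupted chunks, which are then restored from other replicas.

Hadoop HDFS

The Hadoop Distributed File System (HDFS) is the file system of Hadoop: an open-source file system modelled on GFS that stores files as blocks. Design targets: about 10,000 nodes, 100,000,000 files and 10 PB of storage. It runs on commodity hardware, so files are replicated to tolerate hardware failures, which are detected and recovered from. It is optimized for batch processing.
An HDFS cluster has a single namespace. Files follow a write-once-read-many model, and clients can only append to existing files. Files are split into blocks, typically of 128 MB, and each block is replicated on several DataNodes. The client buffers the data it writes. Clients can find the locations of blocks and read the data directly from the DataNodes.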
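For GFS's namespace locking, the following sketch computes which locks the master would take to create a file, following the scheme in the GFS paper: read locks on every ancestor directory name and a write lock on the full pathname. `ancestors`, `locks_for_create` and `conflicts` are hypothetical names; lock ordering and waiting are not modelled.

```python
# Hypothetical helpers for illustration only; not the GFS master's code.


def ancestors(path: str) -> list[str]:
    """Ancestor directory names of an absolute path: /d1/d2/leaf -> [/d1, /d1/d2]."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def locks_for_create(path: str) -> dict[str, str]:
    """Read locks on all ancestor names, write lock on the new file's full pathname."""
    locks = {directory: "read" for directory in ancestors(path)}
    locks[path] = "write"
    return locks


def conflicts(a: dict[str, str], b: dict[str, str]) -> bool:
    """Two operations conflict if they lock the same name and at least one lock is a write lock."""
    return any(name in b and "write" in (mode, b[name]) for name, mode in a.items())
```

Two creations inside /home/user only share read locks, so they run concurrently; an operation that write-locks /home/user itself (for example a snapshot of that directory) conflicts with both.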

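HDFS architecture

[Figure: HDFS architecture with a Client, the NameNode, a SecondaryNameNode and several DataNodes. Read path: 1. the client sends the filename to the NameNode; 2. the NameNode returns the block IDs and the DataNodes holding them; 3. the client reads the data from the DataNodes. "Cluster Membership" links connect the NameNode and the DataNodes.]

NameNode: maps a file to a file-id and a list of DataNodes.
DataNode: maps a block-id to a physical location on disk.
SecondaryNameNode: periodically merges the transaction log.

NameNode and DataNode

NameNode metadata: all metadata is kept in RAM, with no paging. It comprises the list of files, the list of blocks of each file, the list of DataNodes holding each block, and file attributes such as creation time and replication factor.

DataNode: a block server that stores block data in the local file system (e.g. ext3) together with per-block metadata (e.g. CRC), and serves both data and metadata to clients. It periodically sends a block report listing all the blocks it holds to the NameNode, and it supports pipelining by forwarding data to other DataNodes.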
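The HDFS read path (filename to the NameNode, block IDs and DataNodes back, data read directly from DataNodes) can be sketched with two in-memory maps that mirror the NameNode's file-to-blocks and block-to-DataNodes metadata. All class and function names below are hypothetical; this is not the Hadoop client API.

```python
# Hypothetical classes for illustration only; not the Hadoop API.
from dataclasses import dataclass, field


@dataclass
class NameNode:
    # Metadata kept in RAM: file -> ordered block IDs, block ID -> DataNodes holding a replica.
    files: dict[str, list[str]] = field(default_factory=dict)
    block_locations: dict[str, list[str]] = field(default_factory=dict)

    def get_block_locations(self, filename: str) -> list[tuple[str, list[str]]]:
        """Given a filename, return (block ID, DataNodes) for each block, in file order."""
        return [(block_id, self.block_locations[block_id]) for block_id in self.files[filename]]


@dataclass
class DataNode:
    name: str
    blocks: dict[str, bytes] = field(default_factory=dict)  # block ID -> block contents on local disk


def read_file(namenode: NameNode, datanodes: dict[str, DataNode], filename: str) -> bytes:
    """Fetch every block directly from a DataNode; file data never passes through the NameNode."""
    parts = []
    for block_id, holders in namenode.get_block_locations(filename):
        # A real client prefers the closest replica; this sketch simply takes the first one listed.
        parts.append(datanodes[holders[0]].blocks[block_id])
    return b"".join(parts)
```

The NameNode answers with metadata only, which is why a single NameNode can serve many clients while the bulk data traffic is spread over the DataNodes.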

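HDFS: write data pipelining

The client retrieves from the NameNode a list of DataNodes that will hold the replicas of a block. The client writes the block to the first DataNode, and the first DataNode forwards the data to the next DataNode in the pipeline. When all replicas of the block have been written, the client moves on to the next block of the file.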
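A sketch of the HDFS write pipeline, assuming the client already holds the ordered list of target DataNodes: the client sends the block only to the first DataNode, and each DataNode stores it and forwards it to the next. The names are hypothetical, and the real pipeline streams small packets with acknowledgements flowing back, which this sketch leaves out.

```python
# Hypothetical classes for illustration only; not the Hadoop API.
from typing import Optional


class PipelineDataNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.stored: dict[str, bytes] = {}
        self.next: Optional["PipelineDataNode"] = None  # downstream node in the pipeline

    def receive(self, block_id: str, data: bytes) -> list[str]:
        """Store the block locally, forward it downstream, return the names that stored it."""
        self.stored[block_id] = data
        downstream = self.next.receive(block_id, data) if self.next else []
        return [self.name] + downstream


def build_pipeline(nodes: list[PipelineDataNode]) -> PipelineDataNode:
    """Chain the DataNodes in the order chosen by the NameNode and return the head."""
    for upstream, downstream in zip(nodes, nodes[1:]):
        upstream.next = downstream
    return nodes[0]


def client_write_block(targets: list[PipelineDataNode], block_id: str, data: bytes) -> list[str]:
    """The client contacts only the first DataNode; replication happens along the chain."""
    head = build_pipeline(targets)
    written = head.receive(block_id, data)
    if len(written) != len(targets):
        raise IOError("pipeline did not reach every replica")
    return written
```

The client's outgoing bandwidth is used once per block regardless of the replication factor; each DataNode's link carries the data once inbound and once outbound.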

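NameNode: block placement

Current strategy: one replica on the local node, the second replica on a node in a remote rack, and the third on another node in the same remote rack; any additional replicas are placed randomly. Clients read from the nearest replica.

[Figure: block replication across DataNodes in Rack 1 and Rack 2]

Data correctness: checksums (CRC32) are used to validate data. On file creation, the client computes a checksum per 512 bytes and the DataNodes store the checksums. On file access, the client retrieves the data and the checksums from a DataNode; if validation fails, the client tries another replica.

NameNode failure: the NameNode is a single point of failure, so a high-availability (HA) solution is needed.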
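The HDFS checksum scheme can be sketched with `zlib.crc32` from Python's standard library: one CRC32 is computed per 512 bytes when the data is written, and the reader recomputes them and falls back to another replica on a mismatch. The function names and the representation of a replica as a (data, stored checksums) pair are assumptions for illustration.

```python
# Hypothetical helpers for illustration only; not the HDFS client code.
import zlib

BYTES_PER_CHECKSUM = 512  # one checksum per 512 bytes of data


def compute_checksums(data: bytes, chunk: int = BYTES_PER_CHECKSUM) -> list[int]:
    """One CRC32 value per `chunk` bytes (the last piece may be shorter)."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def verify(data: bytes, checksums: list[int], chunk: int = BYTES_PER_CHECKSUM) -> bool:
    return compute_checksums(data, chunk) == checksums


def read_with_fallback(replicas: list[tuple[bytes, list[int]]]) -> bytes:
    """Each replica is (data, stored checksums) from one DataNode; use the first that verifies."""
    for data, checksums in replicas:
        if verify(data, checksums):
            return data
    raise IOError("no replica passed checksum verification")
```

A flipped byte on one DataNode's disk changes the CRC of its 512-byte piece, so that replica fails verification while a healthy copy elsewhere is still accepted.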

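Rebalancer

Goal: the percentage of disk used should be similar across DataNodes. It is usually run when new DataNodes are added. The cluster stays online while the Rebalancer is active, and the Rebalancer is throttled to avoid network congestion. It is a command-line tool.

What HDFS does not provide

Transactional data? (e.g. concurrent reads and writes to the same data) HDFS files cannot be updated in place; they are written once and can only be appended to.
Structured data? (e.g. record-oriented views, columns) HDFS keeps only file metadata; a file is an uninterpreted stream of bytes.
Relational data? (e.g. indexes) Not supported.
For these needs, HBase (modelled on BigTable) runs on top of HDFS.

BigTable

Bigtable is Google's distributed storage system for structured data, designed for scalability: petabytes of data on thousands of commodity machines. It is used by Google Analytics, Google Earth, web indexing and other Google products, and was presented at OSDI '06. It works together with MapReduce for batch processing.

Data model: a sparse, distributed, persistent, multidimensional sorted map. Values are uninterpreted bytes indexed by (row, column, time) → value.

Rows: row keys are arbitrary strings, and data is kept in lexicographic order of row key. Only the row key is indexed; any other lookup in BigTable requires a full table scan.

Columns: columns are grouped into column families. Column families must be created before data is stored under them, and an application uses only a few column families (~100), while the number of columns is unbounded. A column is named family:qualifier.

Tablets: a table is dynamically partitioned by row range into tablets. A tablet holds the rows from a start key to an end key and is stored as SSTables.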
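The Bigtable data model can be sketched as a map from (row, "family:qualifier", timestamp) to bytes, with row keys kept sorted so that row-range scans need only a binary search. This toy in-memory version mirrors the Put, Get(row, column, timestamp) and Scan operations listed in the API slides; the class and method names are hypothetical, and a real Bigtable spreads the rows over tablets instead of one dictionary.

```python
# Hypothetical toy class for illustration only; not the Bigtable API.
import bisect
from collections import defaultdict
from typing import Optional


class ToyTable:
    """Toy Bigtable-style map: (row, "family:qualifier", timestamp) -> bytes."""

    def __init__(self) -> None:
        self.rows: list[str] = []  # row keys in lexicographic order: the only index
        self.cells: defaultdict = defaultdict(lambda: defaultdict(dict))  # row -> column -> {ts: value}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Put(row_key, column_key, timestamp, value)."""
        if row not in self.cells:
            bisect.insort(self.rows, row)
        self.cells[row][column][timestamp] = value

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[bytes]:
        """Newest version of the cell, or the newest one not later than `timestamp`."""
        versions = self.cells.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row: str, end_row: str) -> list[str]:
        """Row keys in [start_row, end_row), found by binary search on the sorted keys."""
        lo = bisect.bisect_left(self.rows, start_row)
        hi = bisect.bisect_left(self.rows, end_row)
        return self.rows[lo:hi]
```

Usage in the style of the Webtable example: `put("com.cnn.www", "contents:", 5, b"<html>...")` stores one version of the page, and `get` without a timestamp returns the newest version.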

SSTable: a persistent, ordered, immutable map from keys to values.

Timestamps: each cell can hold multiple versions of its value, distinguished by timestamp (timestamps).

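Example (Webtable): the row key is the page URL (in the Bigtable paper with the hostname reversed, e.g. com.cnn.www, so that pages of the same domain are stored next to each other).
Column families:
contents: a single column; the value is the HTML contents of the page (possibly several versions, by timestamp).
anchor: the column qualifier is the URL of the page that contains the link, and the value is the link text. Example: anchor:cnnsi.com.

API (1/2):
Put(row_key, column_key, timestamp, value): inserts a value.
Get(row_key): returns all values of the row.
Get(row_key, column_key, timestamp): returns a specific value.
Scan(start_row_key, end_row_key): returns the rows between start_key and end_key.

API (2/2):
No joins (they can be done with MapReduce).
No get(column_key) without a row_key.
No multi-row transactions.
Atomic single-row writes.
Optional atomic single-row reads.
Server-side scripts (Sawzall).

Architecture: one master server, many tablet servers and a client library, built on the Google File System, SSTable and Chubby.

[Figure: Client with its library, the Master, Chubby, and tablet servers storing SSTables in GFS]

Master: assigns tablets to tablet servers, detects added and expired tablet servers, balances the load across tablet servers, handles schema changes and garbage-collects files in GFS.

Tablet server: manages a set of tablets, serves read/write requests for them, splits tablets that grow too large and performs compactions. A tablet is about 100-200 MB.

SSTable format: Google's file format for Bigtable data. An SSTable is several MB in size (e.g. 128 MB) and consists of blocks of a few KB (e.g. 64 KB) plus a block index, which is loaded into RAM when the SSTable is opened. A lookup searches the index for the right block and then reads that block, so it needs at most 2 disk accesses.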
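An SSTable lookup can be sketched as follows, assuming the block index (the first key of each block) is already in RAM: a binary search over the index picks the single block that may contain the key, so only that one block is read from "disk". Block size is counted in entries rather than KB, and the names are hypothetical; Google's actual SSTable layout is not reproduced here.

```python
# Hypothetical toy class for illustration only; not Google's SSTable format.
import bisect
from typing import Optional


class ToySSTable:
    """Immutable sorted key/value file split into blocks, plus an in-memory block index."""

    def __init__(self, items: list[tuple[str, bytes]], block_size: int = 2) -> None:
        items = sorted(items)
        # Each "block" stands in for ~64 KB of consecutive sorted entries on disk.
        self.blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
        # Block index: first key of every block, loaded into RAM when the SSTable is opened.
        self.index = [block[0][0] for block in self.blocks]
        self.disk_reads = 0

    def get(self, key: str) -> Optional[bytes]:
        i = bisect.bisect_right(self.index, key) - 1  # last block whose first key <= key
        if i < 0:
            return None  # key sorts before every block: no disk access needed
        self.disk_reads += 1  # one disk access for the chosen block
        for k, v in self.blocks[i]:
            if k == key:
                return v
        return None
```

Because the file is immutable, the index never changes after the SSTable is written, so keeping it in memory is cheap and safe.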

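Chubby: a distributed lock service running on a set of quorum servers kept consistent with the Paxos algorithm. It provides locks and atomic operations (atomic transactions). Bigtable uses it to ensure there is at most one active master, to store the bootstrap location of the data and schema information (e.g. the schema), and to discover tablet servers.

Tablet location: a three-level hierarchy. A Chubby file stores the location of the root tablet; the root tablet stores the locations of all METADATA tablets (it is the 1st METADATA tablet and is never split); each METADATA tablet stores the locations of user tablets.

[Figure: Chubby file → ROOT tablet → METADATA tablets → user table tablets; each METADATA row points to one tablet]

With each METADATA row taking about 1 KB and METADATA tablets limited to 128 MB, the three levels can address 2^34 tablets, i.e. 2^61 bytes in 128 MB tablets. Clients cache tablet locations; if a cached location is stale, finding a tablet may take up to 6 network messages.

Tablet assignment: each tablet is assigned to one tablet server at a time. The master keeps track of the live tablet servers and the current assignment of tablets, and it uses Chubby to track the tablet servers.

Master startup: the master grabs the master lock in Chubby, lists the servers directory (ls dir) to find the live servers, asks each live server which tablets are assigned to it, scans the METADATA table to learn the full set of tablets, and assigns the unassigned tablets to servers.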
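The 2^34 and 2^61 figures follow from simple arithmetic, assuming (as in the Bigtable paper) about 1 KB per METADATA row: a 128 MB METADATA tablet holds 2^27 / 2^10 = 2^17 location rows, so the root tablet can point to 2^17 METADATA tablets, each pointing to 2^17 user tablets. The function name `addressable` is hypothetical.

```python
# Hypothetical helper reproducing the capacity arithmetic of the three-level scheme.
KB, MB = 2**10, 2**20


def addressable(metadata_tablet_bytes: int = 128 * MB, row_bytes: int = 1 * KB,
                user_tablet_bytes: int = 128 * MB) -> tuple[int, int]:
    """(number of user tablets, total bytes) reachable through ROOT -> METADATA -> user tablets."""
    entries_per_tablet = metadata_tablet_bytes // row_bytes  # 2**17 location rows per tablet
    # The root tablet indexes METADATA tablets; each METADATA tablet indexes user tablets.
    tablets = entries_per_tablet * entries_per_tablet
    return tablets, tablets * user_tablet_bytes


tablets, total_bytes = addressable()
print(tablets == 2**34, total_bytes == 2**61)  # True True
```

The root tablet is never split precisely so that this hierarchy stays at three levels.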

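Tablet serving: the persistent state of a tablet is stored in GFS. Updates are recorded in a commit log, and the most recent ones are kept in RAM in a sorted buffer called the memtable.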

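Write: a write is first recorded in the commit log and then inserted into the memtable.

[Figure: write path through the commit log and the memtable, in front of the SSTables]

Compactions: when the memtable grows past a threshold:
Minor compaction: the memtable is frozen and written to GFS as a new SSTable.
Major compaction: the SSTables produced by minor compactions are merged into a single SSTable.

[Figure: minor compaction turns the memtable into an SSTable; major compaction merges the SSTables]

Read: a read is executed on a merged view of the SSTables and the memtable.

[Figure: read path over the memtable and the SSTables]

Locality groups: several column families can be grouped into a locality group, and each locality group is stored in its own SSTable.

Compression: applied per SSTable block; it reaches a ratio of about 10:1, compared with about 3:1 for zip.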
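The write path, read path and both compactions fit in a small log-structured sketch: writes go to a commit log and a memtable; a minor compaction freezes the memtable into a new immutable SSTable; a major compaction merges all SSTables into one; reads consult the memtable first and then the SSTables from newest to oldest. The entry-count threshold and all names are hypothetical, and deletions and commit-log truncation are not modelled.

```python
# Hypothetical toy class for illustration only; not Bigtable's implementation.
from typing import Optional


class ToyTablet:
    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log: list[tuple[str, bytes]] = []  # stands in for the commit log in GFS
        self.memtable: dict[str, bytes] = {}            # recent writes kept in RAM
        self.sstables: list[dict[str, bytes]] = []      # immutable SSTables, oldest first
        self.memtable_limit = memtable_limit            # threshold counted in entries here

    def write(self, key: str, value: bytes) -> None:
        self.commit_log.append((key, value))  # 1. log record first, for recovery
        self.memtable[key] = value            # 2. then the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self) -> None:
        """Freeze the memtable and write it out as a new sorted SSTable."""
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def major_compaction(self) -> None:
        """Merge every SSTable into one; values from newer SSTables win."""
        merged: dict[str, bytes] = {}
        for table in self.sstables:  # oldest to newest
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]

    def read(self, key: str) -> Optional[bytes]:
        """Merged view: memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None
```

Minor compactions bound the memory used by the memtable, while major compactions bound how many SSTables a read has to consult.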

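Caching:
Scan Cache: a higher-level cache of the key-value pairs returned by the SSTable interface (useful for data that is read repeatedly).
Block Cache: a lower-level cache of SSTable blocks read from GFS (useful for reading data close to recently read data).

Bloom filters: a read may have to access every SSTable of a tablet. A Bloom filter per SSTable tells whether the SSTable might contain data for a given row/column pair, so reads can skip the SSTables that certainly do not. False positives are possible: an unnecessary SSTable is read. False negatives are not: existing data is never missed. Bloom filters are enabled per locality group.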
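A per-SSTable Bloom filter can be sketched with a bit array and k positions derived from `hashlib.sha256`: when `might_contain` returns False the SSTable certainly holds nothing for that row/column key (no false negatives), while True may be a false positive. The bit-array size, hash count, key encoding and class name are assumptions for illustration, not Bigtable's parameters.

```python
# Hypothetical class for illustration only; parameters are not Bigtable's.
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: str) -> list[int]:
        # Derive k positions by salting SHA-256 with the hash number.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % self.num_bits
            for i in range(self.num_hashes)
        ]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))
```

Usage: when an SSTable is written, call `add(f"{row}/{column}")` for every cell it stores; before reading that SSTable for a lookup, skip it if `might_contain` returns False.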

[Figure: Bloom filter]

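Commit log: there is one commit log per tablet server (not per tablet), shared by all of the server's tablets. When a server fails, its tablets are re-assigned to other servers, and each of them must replay the log entries of the tablets it received (REDO). To avoid every server reading the whole log, the log is first sorted by key (table, row, log sequence number), in parallel in 64 MB chunks, so the entries of each tablet are contiguous: recovering a tablet takes one disk seek followed by a sequential read.

HBase

HBase is a Bigtable clone, an Apache project within Hadoop. It uses HDFS in place of GFS, works with Hadoop MapReduce, and is written in Java.

BigTable        HBase
Master          HMaster
Tablet server   Region Server
SSTable         HFile
Tablet          Region
Chubby          ZooKeeper
GFS             HDFS

In HBase, a server means a running instance of HBase.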
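A sketch of why sorting the shared commit log speeds up recovery: once the log is sorted by (table, row, log sequence number), all entries of one tablet, described here as a row range [start, end), form a contiguous run that a binary search locates in one step and a slice then reads sequentially. The record layout and names are hypothetical, and the paper's parallel sort of 64 MB segments is done here as a single in-memory sort.

```python
# Hypothetical helpers for illustration only; not Bigtable's recovery code.
import bisect
from typing import NamedTuple


class LogEntry(NamedTuple):
    table: str
    row: str
    seq: int  # log sequence number: keeps per-row mutation order
    mutation: str


def sort_log(entries: list[LogEntry]) -> list[LogEntry]:
    """Sort by (table, row, sequence number) so each tablet's entries are contiguous and in order."""
    return sorted(entries, key=lambda e: (e.table, e.row, e.seq))


def entries_for_tablet(sorted_log: list[LogEntry], table: str,
                       start_row: str, end_row: str) -> list[LogEntry]:
    """Entries for rows in [start_row, end_row) of `table`: one search (seek), then one slice (read)."""
    keys = [(e.table, e.row) for e in sorted_log]  # built only so bisect can compare keys
    lo = bisect.bisect_left(keys, (table, start_row))
    hi = bisect.bisect_left(keys, (table, end_row))
    return sorted_log[lo:hi]
```

A server that takes over the tablet holding rows com.c to com.z of table "web" replays only that slice, in sequence-number order, instead of scanning the whole shared log.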