Column-Oriented Storage Techniques for MapReduce


Avrilia Floratou (University of Wisconsin-Madison)
Jignesh M. Patel (University of Wisconsin-Madison)
Eugene J. Shekita (while at IBM Almaden Research Center)
Sandeep Tata (IBM Almaden Research Center)

Presented by: Luyang Zhang and Yuguan Li

Motivation

Databases offer performance; MapReduce offers programmability and fault tolerance. Column-oriented storage is the piece of database technology this work carries over to the MapReduce side.

Column-Oriented Storage

Benefits:
- Column-oriented organizations are more efficient when an aggregate must be computed over many rows but only for a notably smaller subset of all columns.

- Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once.

- Column data is of uniform type, which creates opportunities for storage-size optimization (e.g., compression).

Questions

- How can columnar storage be incorporated into an existing MapReduce system (Hadoop) without changing its core parts?

- How can columnar storage operate efficiently on top of a distributed file system (HDFS)?

- Is it easy to apply well-studied techniques from the database field to the MapReduce framework, given that MapReduce:
  - processes one tuple at a time,
  - does not use a restricted set of operators, and
  - is used to process complex data types?

Challenges

In Hadoop it is often convenient to model data with complex types like arrays, maps, and nested records, which leads to a high deserialization cost and rules out many effective column-oriented compression techniques.

Serialization turns an in-memory data structure into bytes that can be transmitted; deserialization turns bytes back into an in-memory data structure. Since Hadoop is written in Java, both are more expensive than in C++. (A minimal Writable sketch follows below.)

Compression: Although column data is more self-similar and should compress well, the complex types mean some existing compression techniques cannot be applied in Hadoop.
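Returning to the serialization note above: Hadoop expresses this contract through its Writable interface. Below is a minimal sketch for a record with an age and a name; the class and field names are illustrative, not from the paper.

```java
// Hadoop's serialization contract: a Writable converts between an
// in-memory structure and bytes. Class/field names are illustrative.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PersonWritable implements Writable {
    private int age;
    private String name;

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialization: in-memory fields -> bytes on the wire.
        out.writeInt(age);
        out.writeUTF(name);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialization: bytes -> in-memory fields. This is the
        // per-record cost that lazy record construction later avoids
        // paying for columns the map function never touches.
        age = in.readInt();
        name = in.readUTF();
    }
}
```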

Programming API: Some techniques are not feasible for hand-coded map and reduce functions.

Outline
- Column-Oriented Storage
- Lazy Tuple Construction
- Compression
- Experimental Evaluation
- Conclusions

Column-Oriented Storage in Hadoop

Main idea: store each column of the dataset in a separate file.

Problems:
- How can we generate roughly equal-sized splits so that a job can be effectively parallelized over the cluster?

- How do we make sure that corresponding values from different columns of the dataset are co-located on the same node running the map task?

Running example (Name, Age, Info):

    Name    Age   Info
    Joe     23    hobbies: {tennis}, friends: {Ann, Nick}
    David   32    friends: {George}
    John    45    hobbies: {tennis, golf}
    Smith   65    hobbies: {swimming}, friends: {Helen}

The table is first horizontally partitioned into split directories (Joe and David on the 1st node, John and Smith on the 2nd); within each split directory, each of Name, Age, and Info is stored in its own file. A new InputFormat/OutputFormat pair handles this layout (job-setup sketch below):
- ColumnInputFormat (CIF)
- ColumnOutputFormat (COF)
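A minimal sketch of wiring a job to these formats. The setColumns call appears later in the deck; the CIF/COF package and exact signatures are assumptions here, not the paper's published API.

```java
// Sketch: read only the Age and Name columns via CIF and write
// columnar output via COF. ColumnInputFormat / ColumnOutputFormat
// are the paper's classes; their import path is assumed.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class ColumnarScanJob {
    public static Job create(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "columnar-scan");
        job.setJarByClass(ColumnarScanJob.class);

        // CIF opens one file per requested column inside each split
        // directory, so unread columns cost no I/O at all.
        job.setInputFormatClass(ColumnInputFormat.class);
        ColumnInputFormat.setColumns(job, "Age", "Name");
        ColumnInputFormat.addInputPath(job, new Path("/data/2013-03-26"));

        // COF mirrors the layout on output: one file per column
        // within each split directory.
        job.setOutputFormatClass(ColumnOutputFormat.class);
        return job;
    }
}
```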

On disk, a dataset is a dated directory of split directories, e.g. /data/2013-03-26/ containing /data/2013-03-26/s1 and /data/2013-03-26/s2.

ColumnInputFormat vs. the RCFile Format

RCFile avoids the replication and co-location problems entirely: it uses a PAX layout rather than a true column-oriented format, packing all columns into a single row group that serves as a split. The costs are that efficient I/O elimination becomes difficult and the per-row-group metadata adds space overhead.

CIF:
- must tackle replication and co-location itself,
- but achieves efficient I/O elimination,
- and handles schema growth gracefully: consider adding a column to a dataset, which with per-column files means only writing new files.

Replication and Co-location

(Figure: the default HDFS ReplicationPolicy scatters block replicas across nodes A, B, C, and D without regard to which column file they belong to.)

Under the default policy, the Name, Age, and Info files of the same split (Joe/David, 23/32, and their hobbies/friends entries) can be replicated onto different nodes, so a map task could not read all of its columns locally. The fix is a new column placement policy (CPP) that co-locates all the column files of a split; it can be assigned via dfs.block.replicator.classname, as sketched below.
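The property name comes from the slide; a sketch of pointing HDFS at CPP, where the policy class name is a placeholder for wherever CPP is deployed:

```java
// Sketch: selecting the column placement policy on the NameNode.
// "dfs.block.replicator.classname" is from the slide; the class
// name below is a hypothetical placeholder.
import org.apache.hadoop.conf.Configuration;

public class EnableCpp {
    public static Configuration configure() {
        Configuration conf = new Configuration();
        conf.set("dfs.block.replicator.classname",
                 "com.example.hdfs.ColumnPlacementPolicy");
        return conf;
    }
}
```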

Example

Query: if (age < 35) return name.

The Age column holds 23, 32, 45, 30, 50 and the Name column holds Joe, David, John, Mary, Ann; the map method sees records such as (23, Joe) and (32, David). What if age > 35? Can we avoid reading and deserializing the name field? The columns a job needs are declared up front:

    ColumnInputFormat.setColumns(job, "Age", "Name");

Outline
- Column-Oriented Storage
- Lazy Tuple Construction
- Compression
- Experiments
- Conclusions

Lazy Tuple Construction

Deserialization of each record field is deferred to the point where it is actually accessed, i.e., when its get() method is called.
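A minimal sketch of that idea: get() decodes a column's bytes only on first access. The Record/LazyRecord names echo the next slide; the internals here (per-column byte maps instead of the paper's curPos/lastPos file pointers) are simplified for illustration.

```java
// Sketch of lazy record construction: a column is deserialized the
// first time get() touches it, and never if the map function skips it.
import java.util.HashMap;
import java.util.Map;

interface Record {
    Object get(String column);
}

class LazyRecord implements Record {
    private final Map<String, byte[]> rawColumns;            // undecoded bytes
    private final Map<String, Object> decoded = new HashMap<>(); // cache

    LazyRecord(Map<String, byte[]> rawColumns) {
        this.rawColumns = rawColumns;
    }

    @Override
    public Object get(String column) {
        // Deserialize on first access only; later calls hit the cache.
        return decoded.computeIfAbsent(column,
                c -> deserialize(rawColumns.get(c)));
    }

    private Object deserialize(byte[] bytes) {
        // Placeholder: real code would decode ints, strings, maps, etc.
        return new String(bytes);
    }
}
```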

The goal is to deserialize only those columns that are actually accessed in the map function, and the same map body works whether the value is an eagerly built Record or a LazyRecord:

    map(NullWritable key, Record value) {
        String name;
        int age = value.get("age");
        if (age < 35)
            name = value.get("name");
    }

LazyRecord implements Record, so no user code changes; only the columns touched by get() are deserialized.

Why the curPos and lastPos pointers are needed: without the lastPos pointer, each nextRecord() call would have to deserialize every column just to extract the length information needed to advance its curPos pointer.

Skip List (logical behavior): each column file is laid out as a skip list over records R1, R2, ..., R10, ..., R100, so a reader can jump over runs of records it does not need.
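A sketch of the skipping logic this layout enables: take the coarsest jumps first, then finer ones. Real layouts only store skip entries at interval boundaries, which this sketch glosses over; all names and interfaces are illustrative.

```java
// Sketch: skipping n records with two-level skip pointers. skip100
// and skip10 entries store the byte length of the next 100 / 10
// serialized records, so a skip is a few seeks rather than n
// deserializations.
interface ColumnReader {
    long nextSkip100();       // byte length of the next 100 records
    long nextSkip10();        // byte length of the next 10 records
    long nextRecordLength();  // byte length of the next record
    void seekForward(long bytes);
}

final class SkipListScanner {
    static void skipRecords(ColumnReader col, int n) {
        while (n >= 100) { col.seekForward(col.nextSkip100()); n -= 100; }
        while (n >= 10)  { col.seekForward(col.nextSkip10());  n -= 10;  }
        while (n >= 1)   { col.seekForward(col.nextRecordLength()); n--; }
    }
}
```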

(Figure: skipping 100 records follows one coarse pointer from R1 straight to R100, while skipping 10 at a time steps R1 → R10 → R20 → ... → R90.)

Example (skipping bytes in the Name column): for if (age < 35) return name over ages 23, 39, 45, 30, ..., every row that fails the predicate skips its Name entry instead of deserializing it. Skip pointers store byte lengths, e.g. Skip100 = 9017 for a 100-row run and Skip10 = 1002 or 868 for 10-row runs.

Example (skipping bytes in the Info column): the same works for a complex column holding entries like hobbies: {tennis}, friends: {Ann, Nick}, null, friends: {George}, hobbies: {tennis, golf}, with Skip100 = 19400 and Skip10 = 2013 or 1246, under the query if (age < 35) return hobbies.

Outline
- Column-Oriented Storage
- Lazy Record Construction
- Compression
- Experiments
- Conclusions

Compression

Two compression schemes are used:

- Compressed Blocks: each block of column values is LZO/ZLIB-compressed, and a header records the number of records in the block and its RID range (e.g., B1 covers RIDs 0-9, B2 covers RIDs 10-35). A block must be decompressed before its records can be read or skipped.

- Dictionary Compressed Skip Lists: for complex columns, a dictionary maps repeated field names to small integer codes (hobbies: 0, friends: 1), so entries are stored as 0: {tennis}, 1: {Ann, Nick}, 1: {George}, 0: {tennis, golf}, and the skip-list byte pointers (e.g., Skip10 = 210 or 304, Skip100 = 1709) keep working over the compressed data. (A dictionary sketch follows after this outline.)

Outline
- Column-Oriented Storage
- Lazy Record Construction
- Compression
- Experiments
- Conclusions
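Returning to the dictionary scheme above, a minimal, purely illustrative sketch of the field-name dictionary: repeated keys like "hobbies" and "friends" are replaced by small integer codes assigned in first-seen order.

```java
// Sketch of lightweight dictionary compression for map-column field
// names: "hobbies" -> 0, "friends" -> 1, and so on, so each repeated
// key costs a small code instead of a full string.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FieldDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    int encode(String fieldName) {
        // Assign codes in first-seen order.
        return codes.computeIfAbsent(fieldName, n -> {
            names.add(n);
            return names.size() - 1;
        });
    }

    String decode(int code) {
        return names.get(code);
    }
}
```

Because the dictionary stays tiny (one entry per distinct field name), it can be stored once per column file and decoded entries can still be skipped via the byte pointers.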

RCFile

(Figure: in RCFile, the running example is stored as two row groups, each with its own metadata. Row Group 1 holds Joe, David / 23, 32 / {hobbies: {tennis}, friends: {Ann, Nick}}, {friends: {George}}; Row Group 2 holds John, Smith / 45, 65 / {hobbies: {tennis, golf}}, {hobbies: {swimming}, friends: {Helen}}.)

Experimental Setup
- 42-node cluster
- Each node: two quad-core 2.4 GHz sockets, 32 GB main memory, four 500 GB HDDs
- Network: 1 Gbit Ethernet switch

Overhead of Columnar Storage (single-node experiment)
- Synthetic dataset: 57 GB, 13 columns (6 integers, 6 strings, 1 map)
- Query: select *

Benefits of Column-Oriented Storage (single-node experiment)
- Query: projection of different columns

Workload

Schema:

    URLInfo {
        String url
        String srcUrl
        time   fetchTime
        String inlink[]
        Map    metadata
        Map    annotations
        byte[] content
    }

Query: if the url contains "ibm.com/jp", find all the distinct encodings reported by the page.
- Dataset: 6.4 TB
- Query selectivity: 6%

(Charts: Comparison of Column Layouts for the map phase, against the SEQ baseline of 754 sec, and for the total job, against the SEQ baseline of 806 sec.)

Conclusions
- Describes a new column-oriented binary storage format for MapReduce.
- Introduces the skip-list layout.
- Describes the implementation of lazy record construction.
- Shows that lightweight dictionary compression for complex columns can be beneficial.

Comparison of Column Layouts (Seq - custom is the 1.0x baseline):

    Layout          Data Read (GB)   Map Time (sec)   Map Time Ratio   Total Time (sec)   Total Time Ratio
    Seq - uncomp.        6400             1416               -              1482                 -
    Seq - record         3008              820               -               889                 -
    Seq - block          2848              806               -               886                 -
    Seq - custom         3040              754              1.0x             806                1.0x
    RCFile               1113              702              1.1x             761                1.1x
    RCFile - comp         102              202              3.7x             291                2.8x
    CIF - ZLIB             36             12.8             59.1x              77               10.4x
    CIF                    96             12.4             60.8x              78               10.3x
    CIF - LZO              54             12.4             61.0x              79               10.2x
    CIF - SL               75              9.2             81.9x              70               11.5x
    CIF - DCSL             61              7.0            107.8x              63               12.8x

CIF-DCSL yields the highest map-time speedup and improves the total job time by more than an order of magnitude (12.8x). (Backup charts, Comparison of Sequence Files, RCFile, and Comparison of Column Layouts, are all drawn against the SEQ baseline of 754 sec.)