Hadoop overview

  • 1. Hadoop Overview. Cecil (hyeonseok.c.choi@gmail.com), 13 10 21

  • 2. PIG
  • 3. What is Big Data?
  • 5. TDWI, the 3 elements of Big Data: Volume (TB or PB), Velocity, Variety. Source: What is Big Data?, Datameer
  • 7. ex) IBM vs. Google
  • 8. IBM
  • 9. IBM: 40
  • 10. Google
  • 11. Google: 58
  • 15. Failure?
  • 18. MapReduce
  • 19. Hadoop: created by Doug Cutting as an open-source implementation of GFS (Google File System) and MapReduce. http://hadoop.apache.org/ Hadoop = HDFS (GFS) + MapReduce
  • 20. Hadoop...
  • 21. HDFS (Hadoop Distributed File System)
  • 22. HDFS. Other file systems: FAT32: 4G, NTFS: 4~64G, EXT3: 2TB ~ 64TB. File operations: create / delete / append; modify (x: not supported)
  • 23. HDFS Architecture
  • 24. Name-node; Data-node; Client (HDFS API)
  • 25. Block: 64M. Replication: 3
  • 26. Metadata: Name-node (FSImage, Edit log)
  • 27. FSImage & EditLog: Name-node, Secondary Name-node. [Figure: Name-node memory and file-system storage; directory tree / with /user (A, B), /etc (C), /var (D, E); Edit Log entries 1. Add File, 2. Delete File, 3. Append File; FS Image + Edit Log -> New FS Image]
  • 28. Flow - Write Operation: FSDataOutputStream, DFSOutputStream, DataStreamer; Name-node, Data-node
  • 29. Flow - Read Operation: Name-node, Data-node. Best case: Data-node
  • 31. Map-Reduce Overview: Map-Reduce, Split
  • 32. Map & Reduce. Map: (k1, v1) -> list(k2, v2). Reduce: (k2, list(v2)) -> list(k3, v3). ex) input: "read a book", "write a book"; Reduce output: <a, 2>
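The Map and Reduce signatures on slide 32 can be tried out without a cluster. The following is only a plain-Java sketch of the model, not Hadoop code: the class and method names (MapReduceSketch, map, shuffle, reduce, wordCount) are hypothetical, and the shuffle step is simulated with an in-memory sorted map. It uses the slide's own input, "read a book" and "write a book".

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the Map-Reduce model; no Hadoop classes are used.
// All names in this class are hypothetical and exist only for illustration.
class MapReduceSketch {

    // Map: (k1, v1) -> list(k2, v2). k1 is the line offset, v1 the line text.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Shuffle: group every v2 under its k2 (keys kept in sorted order).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: (k2, list(v2)) -> v3, here the sum of the counts for one word.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map on every line, shuffles the pairs, then reduces each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            pairs.addAll(map(offset, line));
            offset += line.length() + 1;
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {a=2, book=2, read=1, write=1}
        System.out.println(wordCount(Arrays.asList("read a book", "write a book")));
    }
}
```

The <a, 2> pair from the slide appears as a=2 in the printed result. In a real job the framework performs the shuffle between the map and reduce phases; here it is an ordinary method call.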
  • 33. Example WordCount-Mapper. Map: (k1, v1) -> list(k2, v2)

        // Emits (word, 1) for every token of the input line.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

  • 34. Example WordCount-Reducer. Reduce: (k2, list(v2)) -> list(k3, v3)

        // Sums the counts received for one word and emits (word, sum).
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {

            private IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

  • 35. Example WordCount-Driver (main class)

        import java.io.IOException;
        import java.util.StringTokenizer;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import org.apache.hadoop.util.GenericOptionsParser;

        public class WordCount {

            // TokenizerMapper and IntSumReducer (slides 33 and 34) are nested here.

            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                String[] otherArgs = new GenericOptionsParser(conf, args)
                        .getRemainingArgs();
                if (otherArgs.length != 2) {
                    System.err.println("Usage: wordcount <in> <out>");
                    System.exit(2);
                }
                // Deprecated in Hadoop 2 and later in favor of Job.getInstance(conf, "word count").
                Job job = new Job(conf, "word count");
                job.setJarByClass(WordCount.class);
                job.setMapperClass(TokenizerMapper.class);
                job.setCombinerClass(IntSumReducer.class);
                job.setReducerClass(IntSumReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);
                FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
                FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }

  • 36. Map-Reduce Architecture
  • 37. Map-Reduce-I: JobClient (submitJob()); JobTracker; TaskTracker (Map/Reduce tasks, JVM); HDFS Data-node
  • 38. Map-Reduce-II (YARN): JobTracker; Resource Manager, Application Master, Node Manager
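The driver on slide 35 registers IntSumReducer as both the combiner and the reducer. That is possible because addition is associative and commutative, so partial sums computed next to each map task can be summed again by the reducer. The sketch below illustrates the idea in plain Java, without Hadoop; the class CombinerSketch, its methods, and the two sample splits are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of a sum combiner; all names are hypothetical.
class CombinerSketch {

    // Per-split partial sums: what a sum combiner would emit for one map
    // task's output, where each word stands for a (word, 1) pair.
    static Map<String, Integer> combine(List<String> mapOutput) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : mapOutput) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    // Final reduce: add up the partial counts that arrive for each word.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical input splits, already tokenized by the map step.
        List<String> split1 = Arrays.asList("read", "a", "book", "a", "book");
        List<String> split2 = Arrays.asList("write", "a", "book");

        Map<String, Integer> p1 = combine(split1);
        Map<String, Integer> p2 = combine(split2);

        // prints 8: every (word, 1) pair would cross the network
        System.out.println(split1.size() + split2.size());
        // prints 6: one record per distinct word per split
        System.out.println(p1.size() + p2.size());
        // prints {a=3, book=3, read=1, write=1}
        System.out.println(reduce(Arrays.asList(p1, p2)));
    }
}
```

The final counts match what a reducer alone would produce, while fewer records reach the reduce step. A reducer whose operation is not associative and commutative (an average, for example) could not be reused as a combiner this way.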
  • 40. Map-Reduce I vs YARN (2/2). Source: Hortonworks, http://hortonworks.com/hadoop/yarn
  • 41. Hadoop Ecosystem
  • 42. Apache HBase: NoSQL database modeled on Google Bigtable. MasterServer and RegionServer. HA via Apache Zookeeper
  • 43. Apache PIG

        A = load './input.txt';
        B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
        C = group B by word;
        D = foreach C generate COUNT(B), group;
        store D into './wordcount';

  • 44. Apache Hive: SQL on Hadoop. Hive clients; server started with: hive --service hiveserver
  • 45. Apache Sqoop: tool for Import / Export between an RDBMS and Hadoop. Source: siliconweek
  • 46. Apache Flume: collects data into HDFS. Node types: Logical, Physical (JVM). Master: Logical and Physical nodes. Agent: sends data to the Collector. Collector: writes to HDFS. Source: cloudera, http://archive.cloudera.com/cdh/3/flume-0.9.0+1/UserGuide.html
  • 47. Apache Zookeeper: open-source counterpart of Google Chubby. HA
  • 48. References
        Big data, Korean Wikipedia, http://ko.wikipedia.org/wiki/%EB%B9%85_%EB%8D%B0%EC%9D%B4%ED%84%B0
        Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. In 19th Symposium on Operating Systems Principles, pages 29-43, Lake George, New York, 2003.
        Apache Hadoop, http://hadoop.apache.org/
        Hadoop YARN: A next-generation framework for Hadoop data processing, Hortonworks, http://hortonworks.com/hadoop/yarn
        Flume User Guide, Cloudera, http://archive.cloudera.com/cdh/3/flume-0.9.0+1/UserGuide.html