kuo-chun-su
Hadoop
Hadoop, the Apple of Our Eyes
1/74
Java SE, Java EE, SOAP/RESTful Services, Design Patterns, EJB/JPA; Java EE open-source frameworks (Struts/Spring/Hibernate); application servers (JBoss AS, GlassFish)
Java, .NET; Hadoop platform, NoSQL, Big Data; Google App Engine, Microsoft Azure, CloudBees; Android, Windows Phone (smartphones)
PS. Google Search
Bio
2/74
https://www.google.com.tw/imghp?hl=zh-TW&tab=wi
Agenda
0.
1.Hadoop
2.Hadoop
3.Hadoop
4.Hadoop
5.Hadoop
6.
Hadoop
3/74
4/74
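Doug Cutting created Lucene and then Nutch.
2003/2004: Google published its GFS (2003) and MapReduce (2004) papers, and Nutch implemented these ideas.
2006: Hadoop was split out of Nutch as its own project, named after Doug Cutting's son's toy elephant.
2008-01: Hadoop became an Apache Top-Level Project.
2009-09: Doug Cutting joined Cloudera as Architect.
2011-06: Yahoo! spun off its Hadoop team as Hortonworks.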
Hadoop
5/74
http://www.cloudera.com/
http://www.hortonworks.com/
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Apache Hadoop
6/74
http://hadoop.apache.org/
...
...
Hadoop and Big Data
7/74
Hadoop + Big Data
8/74
Hadoop + Big Data
9/74
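1. The client submits a job to the JobTracker (JT).
2. The JT divides the job into tasks and assigns them to TaskTrackers (TT).
3. Each TT runs its assigned tasks.
4. Each TT reports task status back to the JT.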
Hadoop 1.x: MapReduce (MRv1)
JobTracker (master), TaskTracker (slave)
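Steps 2 to 4 can be sketched as a toy model. It assumes, as in MRv1, that each TaskTracker reports its number of free task slots in a periodic heartbeat and that the JobTracker replies with task assignments. This is not the Hadoop 1.x API. `JobTrackerSketch` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of MRv1 task assignment (hypothetical names, not Hadoop's API).
class JobTrackerSketch {
    // Tasks that have not yet been handed to any TaskTracker.
    private final Deque<String> pendingTasks = new ArrayDeque<>();

    JobTrackerSketch(List<String> tasks) {
        pendingTasks.addAll(tasks);
    }

    // A TaskTracker heartbeats in with its number of free slots; the reply
    // lists the tasks it should run, written as "task@tracker".
    List<String> heartbeat(String trackerName, int freeSlots) {
        List<String> assigned = new ArrayList<>();
        while (freeSlots > 0 && !pendingTasks.isEmpty()) {
            assigned.add(pendingTasks.poll() + "@" + trackerName);
            freeSlots--;
        }
        return assigned;
    }

    int remaining() {
        return pendingTasks.size();
    }
}
```

Because a single JobTracker handles every heartbeat and every task of every job, it becomes the scaling bottleneck described on the next slide.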
10/74
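Hadoop 1.x
Hadoop HDFS (storage) + Hadoop MapReduce (computing engine + resource management + job scheduling/monitoring + ...)
Limits: clusters top out at about 4,000-4,500 nodes; one JobTracker handles about 40,000 concurrent tasks; HDFS has a single namespace shared by everything (/sales, /accounting, ...); only MapReduce jobs can run ...
Cluster, Task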
11/74
Hadoop
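Batch jobs, interactive queries, real-time processing, graph processing, iterative modeling
Hadoop (batch processing only)
Multi-step work must be chained as batch job after batch job, and every job-to-job hand-off adds I/O overhead
Hadoop = storage (HDFS) + computation (MapReduce)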
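The MapReduce half of that pairing can be illustrated with the classic word count. This single-process sketch shows only the programming model: map, then shuffle/sort by key, then reduce. It does not use the Hadoop API, and `WordCountSketch` is a hypothetical name. On a real cluster the map calls run in parallel next to the HDFS blocks, and the shuffle moves data across the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the MapReduce model (hypothetical, not the Hadoop API).
class WordCountSketch {
    // Map: emit (word, 1) for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce: combine all values emitted for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle/sort: group intermediate values by key (TreeMap keeps keys sorted).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(counts)));
        return result;
    }
}
```

For example, `WordCountSketch.run(List.of("the cat", "The hat"))` yields `{cat=1, hat=1, the=2}`.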
12/74
13/74
Hadoop
14/74
MapReduce
Hadoop = storage (HDFS) + resource management (YARN)
Hadoop: from batch processing to a data operating system
MapReduce for batch processing, Hive + Tez for interactive SQL queries, ...
15/74
MapReduce, Hadoop, MapReduce Job, MapReduce
16/74
MapReduce Phase 1: resource management moves out of MapReduce into YARN, which opens the cluster to other YARN frameworks
17/74
MapReduce Phase 2: MapReduce becomes one batch-job computing framework on YARN, alongside others such as Tez, Storm, Giraph, Spark, OpenMPI, ...
18/74
MapReduce Phase 3: tools built on MapReduce (Hive, Pig) move to a newer computing framework (Tez)
19/74
HDFS
High availability, multiple namespaces (federation), snapshots, I/O improvements (2.5-5×), ...
HDFS -> HDFS2
20/74
Hadoop 2.x
Hadoop Common (core libraries), Hadoop HDFS (storage), Hadoop MapReduce (computing engine), Hadoop YARN (resource management + job scheduling/monitoring)
Hadoop 2.x ... offers backward compatibility, so existing MapReduce applications still run; Yahoo! runs Hadoop 2.x on 35,000+ nodes ...
21/74
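1. The client submits an application to the ResourceManager (RM).
2. The RM launches the application's ApplicationMaster (AM) in a container.
3. The AM registers with the RM.
4. The AM requests containers from the RM.
5. The AM launches the granted containers on NodeManagers.
6. The containers run and report status to the AM.
7. The client gets application status from the AM.
8. The AM unregisters from the RM when the application finishes.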
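Hadoop 2.x: MapReduce (MRv2)
ResourceManager and NodeManager: generic resource management
ApplicationMaster: framework-specific
The ResourceManager hands out resources, each NodeManager runs containers, and tasks are scheduled into the containers that hold those resources.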
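The division of labour in steps 4 and 5 can be sketched as a toy allocator. The ResourceManager owns cluster capacity, and an ApplicationMaster asks for as many containers as its tasks need. The sketch assumes that capacity is tracked per NodeManager as memory in MB plus virtual cores, and that containers are placed first-fit. The real YARN schedulers are far more sophisticated. `ResourceManagerSketch` is a hypothetical name, not YARN's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy ResourceManager (hypothetical, not the YARN API).
class ResourceManagerSketch {
    // Free capacity per NodeManager as {memoryMb, vcores}, in registration order.
    private final Map<String, int[]> free = new LinkedHashMap<>();

    // A NodeManager joins the cluster and reports its capacity.
    void registerNode(String node, int memoryMb, int vcores) {
        free.put(node, new int[] {memoryMb, vcores});
    }

    // An ApplicationMaster asks for `count` containers of a given size.
    // Returns the node hosting each granted container (first-fit placement).
    List<String> allocate(int count, int memoryMb, int vcores) {
        List<String> granted = new ArrayList<>();
        for (Map.Entry<String, int[]> e : free.entrySet()) {
            int[] cap = e.getValue();
            while (granted.size() < count && cap[0] >= memoryMb && cap[1] >= vcores) {
                cap[0] -= memoryMb;
                cap[1] -= vcores;
                granted.add(e.getKey());
            }
        }
        return granted;
    }
}
```

Granting fewer containers than requested is normal in YARN. The ApplicationMaster keeps asking on later heartbeats, so it only needs to cope with partial grants.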
22/74
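MapReduce (MRv2): the ResourceManager is a pure resource arbitrator; scheduling policies for capacity, fairness, and SLAs plug in through a pluggable interface.
The ApplicationMaster takes over MRv1's per-job work: in MRv2 it negotiates resources from the ResourceManager and works with NodeManagers to launch and monitor containers.
In MRv1 one JobTracker did everything; in MRv2 the ResourceManager only arbitrates resources, so it scales to 10,000+ nodes, with one ApplicationMaster per application.
The ApplicationMaster is framework-specific, so the ResourceManager does not need to know about any particular framework.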
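The "pluggable interface" idea can be illustrated with a small strategy interface: the ResourceManager only asks a policy which waiting application gets the next free container. The two policies below are simplified stand-ins for ideas behind YARN's schedulers, not the actual Capacity Scheduler or Fair Scheduler. `WaitingApp`, `SchedulingPolicy`, `FifoPolicy`, and `FewestContainersPolicy` are all hypothetical names.

```java
import java.util.Comparator;
import java.util.List;

// An application waiting for containers (hypothetical type).
final class WaitingApp {
    final String name;
    final int submitOrder;       // lower = submitted earlier
    final int runningContainers; // containers it already holds

    WaitingApp(String name, int submitOrder, int runningContainers) {
        this.name = name;
        this.submitOrder = submitOrder;
        this.runningContainers = runningContainers;
    }
}

// The only question the ResourceManager asks a pluggable policy.
interface SchedulingPolicy {
    // Returns the application that receives the next free container, or null if none wait.
    WaitingApp pickNext(List<WaitingApp> waiting);
}

// FIFO: always serve the earliest-submitted application.
class FifoPolicy implements SchedulingPolicy {
    public WaitingApp pickNext(List<WaitingApp> waiting) {
        return waiting.stream()
                .min(Comparator.comparingInt((WaitingApp a) -> a.submitOrder))
                .orElse(null);
    }
}

// Fair-share flavour: serve whichever application currently holds the fewest containers.
class FewestContainersPolicy implements SchedulingPolicy {
    public WaitingApp pickNext(List<WaitingApp> waiting) {
        return waiting.stream()
                .min(Comparator.comparingInt((WaitingApp a) -> a.runningContainers))
                .orElse(null);
    }
}
```

Swapping policies leaves the ResourceManager code untouched. In this sketch a large ETL job under FIFO starves a small ad hoc query, while the fewest-containers policy lets the query through.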
23/74
YARN: Yet Another Resource Negotiator
A general-purpose distributed application management framework
A data operating system for enterprise Hadoop
24/74
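Resource vs. Container
Resource Model
A resource request names a location (rack or host) and the resources needed, such as CPU cores and memory.
A Container is a concrete allocation of resources under this resource model.
To launch a container, a YARN application's ApplicationMaster gives the NodeManager a launch context: the command line, third-party JARs, and security tokens.
A container runs as an OS process.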
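A container can be modeled as a resource allocation plus a launch context that the NodeManager turns into an OS process. The sketch assumes that localized JARs are exposed through `CLASSPATH` and that anything else, such as where security tokens were written, is passed as environment variables. `ContainerSketch` is hypothetical, not the YARN API. It only builds a `java.lang.ProcessBuilder` and never starts it.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a container: resources + launch context (not the YARN API).
class ContainerSketch {
    final int memoryMb;            // resources granted by the ResourceManager
    final int vcores;
    final List<String> command;    // command line to execute
    final List<String> localJars;  // third-party JARs localized onto the node
    final Map<String, String> env; // extra environment, e.g. a token file location

    ContainerSketch(int memoryMb, int vcores, List<String> command,
                    List<String> localJars, Map<String, String> env) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
        this.command = command;
        this.localJars = localJars;
        this.env = env;
    }

    // Whether this container's resources fit in a node's free capacity.
    boolean fitsIn(int freeMemoryMb, int freeVcores) {
        return memoryMb <= freeMemoryMb && vcores <= freeVcores;
    }

    // Builds (but does not start) the OS process a NodeManager would launch.
    ProcessBuilder toProcess() {
        ProcessBuilder pb = new ProcessBuilder(new ArrayList<>(command));
        pb.environment().put("CLASSPATH", String.join(File.pathSeparator, localJars));
        pb.environment().putAll(env);
        return pb;
    }
}
```

Calling `start()` on the returned builder would create the OS process. A real NodeManager also enforces the memory and CPU limits on that process, which this sketch does not attempt.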
25/74
Hadoop Batch
26/74
Hadoop
27/74
Windows
28/74
Hadoop
29/74
HDFS: distributed file system
MapReduce: distributed data analysis engine
Avro: language-neutral data serialization system (Top-Level Project 2010-05)
Mahout: scalable library for machine learning
HBase: distributed data storage (Top-Level Project 2010-05)
Pig: high-level language for data analysis (Top-Level Project 2010-09)
Hive: data warehousing and SQL-like query (Top-Level Project 2010-09)
Sqoop: data migration tool between HDFS and RDBMS
Hadoop Ecosystem
30/74
HCatalog: Hadoop naming service
31/74
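Yahoo!: Pig, which compiles Pig Latin scripts into MapReduce jobs
Facebook: Hive, which compiles HiveQL queries into MapReduce jobs
Hive, Pig
Analyze big data on Hadoop without hand-writing MapReduce in Java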
32/74
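Stinger Initiative
Hortonworks: take Hadoop beyond MapReduce into a broader data processing platform, and make Hive fast enough for interactive queries at PB scale
Speed: Hive 10-100× faster; Scale: from TB to PB; SQL Compatibility: fuller SQL support
13 months, 44 companies, 145 developers, about 390,000 lines of code, 3 Hive releases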
33/74
http://hortonworks.com/labs/stinger/
Stinger Initiative: Hive
34/74
Hive: Speed
35/74
Hive: Scale. ORCFile (Optimized Row Columnar File); through HCatalog, ORCFile data can also be used from Pig and MapReduce
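Part of ORCFile's speed comes from columnar storage with lightweight statistics. A column's values are grouped into stripes that record their minimum and maximum, so a reader can skip stripes that cannot satisfy a predicate. The sketch below illustrates only that skipping idea for one integer column. It is not the ORC file format or its reader API, and `ColumnStripes` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of min/max-based stripe skipping (hypothetical, not the ORC format).
class ColumnStripes {
    static final class Stripe {
        final long[] values;
        final long min;
        final long max;

        Stripe(long[] values) {
            this.values = values;
            long lo = Long.MAX_VALUE, hi = Long.MIN_VALUE;
            for (long v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            this.min = lo;
            this.max = hi;
        }
    }

    final List<Stripe> stripes = new ArrayList<>();
    int stripesRead; // stripes actually scanned by the last query

    // Splits one column into stripes of at most stripeSize values.
    ColumnStripes(long[] column, int stripeSize) {
        for (int i = 0; i < column.length; i += stripeSize) {
            stripes.add(new Stripe(Arrays.copyOfRange(column, i, Math.min(i + stripeSize, column.length))));
        }
    }

    // WHERE col = target: only stripes whose [min, max] range contains target are read.
    long countEquals(long target) {
        stripesRead = 0;
        long count = 0;
        for (Stripe s : stripes) {
            if (target < s.min || target > s.max) {
                continue; // skipped using statistics alone
            }
            stripesRead++;
            for (long v : s.values) {
                if (v == target) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Skipping pays off most when the data is sorted or clustered on the filtered column, because then each stripe covers a narrow value range.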
36/74
http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/
Hive: SQL Compatibility (roles, privileges, GRANT/REVOKE)
37/74
Sqoop (from Cloudera): transfers data between Hadoop and RDBMSs over JDBC, using MapReduce
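Sqoop parallelizes an import by splitting the table on a column (the `--split-by` option, with `-m` setting the number of map tasks) and giving each map task its own range query over JDBC. The sketch below shows only the range arithmetic. It assumes an integer split column whose MIN and MAX have already been queried. Sqoop's own splitter code differs in detail, and `SqoopSplitSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of range-based import splits (hypothetical, not Sqoop's code).
class SqoopSplitSketch {
    // Cuts [min, max] into `mappers` half-open ranges, one WHERE clause per map task.
    static List<String> splits(String column, long min, long max, int mappers) {
        List<String> clauses = new ArrayList<>();
        long span = max - min + 1;
        for (int i = 0; i < mappers; i++) {
            long lo = min + span * i / mappers;
            long hi = min + span * (i + 1) / mappers; // exclusive upper bound
            if (lo >= hi) {
                continue; // more mappers than distinct values: skip empty ranges
            }
            clauses.add(column + " >= " + lo + " AND " + column + " < " + hi);
        }
        return clauses;
    }
}
```

For ids 1 to 100 and 4 mappers, this produces `id >= 1 AND id < 26` through `id >= 76 AND id < 101`. Even ranges only balance the load when the key values are evenly distributed, which is why choosing the split column matters.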
38/74
http://sqoop.apache.org/
Hadoop
39/74
Hadoop distributions: Doug Cutting's Hadoop World 2011 keynote
The similarity between Hadoop and the Linux kernel, and the corresponding similarity between the big stack of Hadoop (Hive, HBase, Pig, Avro, etc.) and the fully operational operating systems with their distributions (Red Hat, Ubuntu, Fedora, Debian, etc.)
Hadoop Distribution
Cloudera: Cloudera Distribution for Hadoop (CDH)
Oracle: Oracle Big Data Appliance
Intel: Intel Distribution for Hadoop (IDH), now folded into Cloudera
Hortonworks: Hortonworks Data Platform (HDP)
Microsoft: Microsoft HDInsight
MapR: MapR Distribution for Apache Hadoop (M3, M5, M7)
...
Make your own: Apache Bigtop
40/74
http://bigtop.apache.org/
Cloudera Distribution for Hadoop
2014: raised US$900M, including US$740M from Intel
41/74
http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html
http://www.theregister.co.uk/2014/06/02/cloudera_board_announcement/
Oracle Big Data Appliance
Oracle's big data platform is built on Cloudera Distribution for Hadoop (CDH)
42/74
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
Hortonworks Data Platform
Hortonworks led the Windows port of Hadoop and the development of YARN
2013: US$50M in funding; 2014: US$100M
43/74
http://hortonworks.com/hdp/
Microsoft HDInsight
HDInsight is built on Hortonworks Data Platform (HDP)
44/74
http://azure.microsoft.com/zh-tw/documentation/services/hdinsight/
MapR Distribution for Apache Hadoop
2013: US$35M in funding; 2014: US$110M, with Google and Qualcomm among the investors
45/74
http://www.mapr.com/products/hadoop-download
Hadoop
46/74
Parallel Processing
Tez, Spark, ...
User Interface
Hue
SQL on Hadoop
Impala, Presto, Drill/Dremel/BigQuery, ...
Data Collector
Flume, Chukwa, Scribe, ...
Machine Learning
Mahout, ...
Hadoop and Big Data
47/74
http://tez.apache.org/
https://spark.apache.org/
http://gethue.com/
http://impala.io/
http://prestodb.io/
http://incubator.apache.org/drill/
http://research.google.com/pubs/pub36632.html
https://developers.google.com/bigquery/?hl=zh-tw
http://flume.apache.org/
https://chukwa.apache.org/
https://github.com/facebookarchive/scribe
https://mahout.apache.org/
Tez (Hortonworks)
A framework for near real-time big data processing
Inspired by Microsoft Dryad
Part of the Stinger Initiative
Dataflow model on a directed acyclic graph (DAG) of nodes, used to express query plans
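In a DAG dataflow like Tez's, a vertex can run once all of its upstream vertices have produced output. Finding one valid execution order is a topological sort. The sketch below uses Kahn's algorithm and rejects plans that contain a cycle. It is not the Tez API. The vertex names and `DagOrder` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Topological order of a query-plan DAG via Kahn's algorithm (hypothetical, not Tez).
class DagOrder {
    // edges: vertex -> downstream vertices that consume its output.
    static List<String> order(Map<String, List<String>> edges) {
        // Count incoming edges per vertex (TreeMap gives a deterministic order).
        Map<String, Integer> inDegree = new TreeMap<>();
        edges.forEach((from, tos) -> {
            inDegree.putIfAbsent(from, 0);
            for (String to : tos) {
                inDegree.merge(to, 1, Integer::sum);
            }
        });
        // Vertices with no inputs (e.g. table scans) are ready immediately.
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((v, d) -> {
            if (d == 0) {
                ready.add(v);
            }
        });
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            result.add(v);
            // Finishing v satisfies one input of each downstream vertex.
            for (String to : edges.getOrDefault(v, List.of())) {
                if (inDegree.merge(to, -1, Integer::sum) == 0) {
                    ready.add(to);
                }
            }
        }
        if (result.size() != inDegree.size()) {
            throw new IllegalArgumentException("plan contains a cycle, so it is not a DAG");
        }
        return result;
    }
}
```

For a plan where `scanA` and `scanB` feed `join`, which feeds `agg`, this returns `[scanA, scanB, join, agg]`. A real engine would also run `scanA` and `scanB` concurrently because neither depends on the other.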
48/74
http://tez.apache.org/
http://hortonworks.com/hadoop/tez/
http://cs.brown.edu/~debrabant/cis570-website/papers/dryad.pdf
http://hortonworks.com/labs/stinger/
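Started at UC Berkeley AMPLab in 2009 and open-sourced in 2010; now led by Databricks
A general-purpose cluster computing system that runs on HDFS data
Up to 100× faster than Hadoop MapReduce in memory, 10× faster on disk
Runs on YARN; MLlib for machine learning; Mahout, Crunch, and Cascading are moving to Spark
Cloudera, Databricks, IBM, Intel, and MapR are bringing Hive, Pig, Sqoop, and Oozie to Spark
Spark: Lightning-Fast Cluster Computing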
49/74
https://amplab.cs.berkeley.edu/projects/spark-lightning-fast-cluster-computing/
http://databricks.com/
http://hortonworks.com/press-releases/hortonworks-announces-apache-spark-yarn-ready/
https://spark.apache.org/mllib/
https://mahout.apache.org/
http://spark.apache.org/
Hue: Hadoop User Experience (Cloudera)
Online demo: http://demo.gethue.com/
50/74
http://gethue.com/
http://demo.gethue.com/
Hue: Interactive SQL & Dashboard
51/74
http://gethue.com/
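Impala: Real-Time Queries in Hadoop
Released by Cloudera in 2012
A distributed, parallel SQL query engine that queries HDFS/HBase data in real time
Inspired by Google F1 (a fault-tolerant distributed RDBMS) and Dremel (an ad hoc query tool)
SQL on Hadoop without MapReduce, using in-memory processing
Compliant with the ANSI-92 SQL standard
Cloudera ODBC Driver for Impala connects BI/DW tools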
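Engines in the Dremel/Impala family answer aggregate queries with a scatter-gather plan. Each node aggregates its local data, and a coordinator (or, in Dremel, a tree of intermediate servers) merges the small partial results instead of moving raw rows. The sketch below shows that merge for COUNT, SUM, and AVG. It is not Impala's or Dremel's code, and `PartialAggregate` is a hypothetical name.

```java
import java.util.List;

// Partial aggregation with a coordinator-side merge (hypothetical, not Impala/Dremel code).
final class PartialAggregate {
    final long count;
    final long sum;

    PartialAggregate(long count, long sum) {
        this.count = count;
        this.sum = sum;
    }

    // Runs on each worker over its local partition.
    static PartialAggregate local(long[] partition) {
        long sum = 0;
        for (long v : partition) {
            sum += v;
        }
        return new PartialAggregate(partition.length, sum);
    }

    // Runs on the coordinator. The merge is associative, so it also works level by level in a tree.
    static PartialAggregate merge(List<PartialAggregate> parts) {
        long count = 0, sum = 0;
        for (PartialAggregate p : parts) {
            count += p.count;
            sum += p.sum;
        }
        return new PartialAggregate(count, sum);
    }

    // AVG is derived only at the end, from the merged count and sum.
    double avg() {
        return count == 0 ? Double.NaN : (double) sum / count;
    }
}
```

The key design choice is to carry (count, sum) rather than per-node averages. For partitions {1, 2, 3} and {10}, the correct average is 16 / 4 = 4.0, while averaging the two per-node averages (2 and 10) would give 6.0.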
52/74
http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html
http://research.google.com/pubs/pub38125.html
http://research.google.com/pubs/pub36632.html
http://www.cloudera.com/content/support/en/downloads/connectors/impala/impala-odbc-v2-5-15.html
Presto (Facebook)
Built at Facebook starting in 2012 and open-sourced in 2013; queries Facebook's data warehouse
Query execution engine, cache, ANSI-SQL-compatible queries
4-7× the CPU efficiency of Hive and 8-10× faster
"Most of Facebook is pictures of cats, updates about bodily functions, nihilistic ramblings, and the pingings of Zynga games feeding e-stims to folk, it bears noting that none of this really matters for designing massive data systems."
53/74
http://prestodb.io/
http://www.theregister.co.uk/2013/06/07/hey_presto_facebook_reveals_exabytescale_query_engine/
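Google (2010): Dremel: Interactive Analysis of Web-Scale Datasets
Apache Drill: an open-source implementation of Dremel; scales out to 10,000+ nodes, petabytes of data, and trillions of records
Google BigQuery: Dremel offered as a cloud service
Dremel, Drill, BigQuery: "What is BigQuery, Its Features and Some Successful Products Who Got Benefited from It?"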
54/74
http://research.google.com/pubs/pub36632.html
http://incubator.apache.org/drill/
https://developers.google.com/bigquery/
http://www.netsolutionsindia.com/blog/what-is-bigquery-its-features-and-some-successful-products-who-got-benefited-from-it/
Hadoop
55/74
Cloudera
Intel, Databricks (Spark), IBM, Oracle, MapR, ...
Hortonworks
Microsoft, ...
56/74
http://www.cloudera.com/content/cloudera/en/partners.html
http://databricks.com/
http://hortonworks.com/partners/
Hive vs. Impala
Did Cloudera Just Shoot Their Impala?
Cloudera had promoted Impala, not Hive, for real-time distributed SQL processing, yet it then proposed running Hive on Spark (HIVE-7292).
57/74
http://hortonworks.com/blog/cloudera-just-shoot-impala/
https://issues.apache.org/jira/browse/HIVE-7292
Hive
Impala
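Tez/YARN vs. Spark
Cloudera, MapR, IBM, and Intel bet on Spark as the new heart of Hadoop
Cloudera: Hive + Spark, with Spark as the engine for SQL on Hadoop through Hive on Spark rather than Shark. Shark was a fork of Hive, so Cloudera's Hive on Spark instead adds Spark to Hive's query planner as one more execution engine, alongside MapReduce and Tez.
Hortonworks: Hive on Tez, where the query planner moves from Hive on MapReduce to Hive on Tez on YARN.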
58/74
http://www.theregister.co.uk/2014/06/30/cloudera_and_co_spark/
https://issues.apache.org/jira/browse/HIVE-7292
Hadoop Machine Learning
Use cases: recommendation mining, clustering, classification
2014-04-25: "Goodbye MapReduce": Mahout stops accepting new MapReduce code and moves its codebase to a Scala DSL on Spark
Mahout
59/74
https://mahout.apache.org/
60/74
Python, Ruby, C/C++, C#, Perl, Bash, ...
Programming in Java
MapReduce, YARN: Hadoop MapReduce Examples; popcorny's word count example (built with Gradle)
Hadoop Streaming
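With Hadoop Streaming, mappers and reducers are ordinary programs that read lines from standard input and write `key<TAB>value` lines to standard output, and each reducer sees its input already sorted by key. The reducer below sums the values for each key under that contract. It is written in Java for consistency with this deck, although the point of Streaming is that any of the languages listed above would work just as well. `StreamingSumReducer` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;

// Streaming-style sum reducer: input "key\tvalue" lines sorted by key, output "key\ttotal".
class StreamingSumReducer {
    static void reduce(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out);
        String currentKey = null;
        long total = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip malformed lines
            }
            String key = line.substring(0, tab);
            long value = Long.parseLong(line.substring(tab + 1).trim());
            // Sorted input means a new key marks the end of the previous group.
            if (currentKey != null && !currentKey.equals(key)) {
                writer.print(currentKey + "\t" + total + "\n");
                total = 0;
            }
            currentKey = key;
            total += value;
        }
        if (currentKey != null) {
            writer.print(currentKey + "\t" + total + "\n"); // flush the last group
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        reduce(new InputStreamReader(System.in), new OutputStreamWriter(System.out));
    }
}
```

You can try it locally with a pipeline such as `cat input | ./mapper | sort | java StreamingSumReducer`, where `./mapper` stands for any word-count mapper. On a cluster, the same programs are passed to the Hadoop Streaming JAR through its `-mapper` and `-reducer` options.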
61/74
http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/
http://www.codedata.com.tw/author/popcorny
https://github.com/popcornylu/hadoop-wordcount
http://hadoop.apache.org/docs/r1.2.1/streaming.html
Dataflow: Google's successor to MapReduce, built on Flume (FlumeJava) and MillWheel; a Java Dataflow SDK; pipelines work with BigQuery; Google demoed it analyzing Twitter data
Dataflow (Java)
iThome: Google I/O 2014, Dataflow
62/74
http://googlecloudplatform.blogspot.tw/2014/06/sneak-peek-google-cloud-dataflow-a-cloud-native-data-processing-service.html
http://www.ithome.com.tw/news/89181
Data
SQL on Hadoop, NoSQL and Hadoop, ...
Hue: Hive SQL; HiveQL; Impala: Hive, ANSI-SQL; Sqoop: JDBC to RDBMS/BI/DW; HBase: NoSQL; ...
63/74
MySQL Hadoop Applier: reads MySQL binary log events and writes them into Hadoop through libhdfs, the HDFS C library, for real-time integration/backup between MySQL and Hadoop
64/74
http://dev.mysql.com/tech-resources/articles/mysql-hadoop-applier.html
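Phoenix CLI: Sqlline; Phoenix GUI: SQuirreL
Phoenix: "We put the SQL back in NoSQL" (from Salesforce). A client-side JDBC wrapper over HBase that compiles SQL queries into HBase scans and returns standard JDBC ResultSets; query latencies of milliseconds over millions of rows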
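When the WHERE clause constrains the primary key, Phoenix can answer a query with a bounded scan instead of a full table scan. This works because HBase stores rows sorted by row key and Phoenix uses the primary key as the row key. The sketch below imitates that with a `java.util.TreeMap` standing in for one HBase table. Real HBase keys are byte arrays, and Phoenix encodes values so that byte order matches value order. This is not Phoenix or HBase code, and `RowKeyScanSketch` is a hypothetical name.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Primary-key range query as a sorted-map range scan (hypothetical, not Phoenix/HBase).
class RowKeyScanSketch {
    // SELECT mycolumn FROM test WHERE mykey BETWEEN lo AND hi
    static NavigableMap<Integer, String> between(NavigableMap<Integer, String> table, int lo, int hi) {
        // A view over just the matching key range; rows outside it are never touched.
        return table.subMap(lo, true, hi, true);
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> test = new TreeMap<>();
        test.put(1, "Hello");
        test.put(2, "World!");
        test.put(3, "Foo");
        System.out.println(between(test, 2, 3)); // prints {2=World!, 3=Foo}
    }
}
```

A filter on a non-key column such as `mycolumn` gets no such shortcut in this model. It would need a full scan, or a secondary index in Phoenix.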
65/74
http://squirrel-sql.sourceforge.net/
http://phoenix.apache.org/
Accessing HBase through the JDBC API
import java.sql.*;

public class HelloPhoenix {
    public static void main(String[] args) throws SQLException {
        // [zookeeper] stands for the ZooKeeper quorum of the HBase cluster
        try (Connection con = DriverManager.getConnection("jdbc:phoenix:[zookeeper]");
             Statement stmt = con.createStatement()) {
            stmt.executeUpdate("create table test (mykey integer not null primary key, mycolumn varchar)");
            stmt.executeUpdate("upsert into test values (1,'Hello')");
            stmt.executeUpdate("upsert into test values (2,'World!')");
            con.commit(); // Phoenix connections do not auto-commit by default
            try (PreparedStatement statement = con.prepareStatement("select * from test");
                 ResultSet rs = statement.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("mycolumn"));
                }
            }
        }
    }
}
66/74
Windows
Windows Azure HDInsight Emulator
Linux
Cloudera QuickStart VMs for CDH, Hortonworks HDP Sandbox, BigSQL QuickStart VM
Browser
Cloudera Live
Hadoop as a Service
Microsoft Azure HDInsight Service, Amazon Elastic MapReduce (EMR)
Platform
67/74
http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT
http://www.cloudera.com/content/support/en/downloads.html
http://hortonworks.com/hdp/downloads/
http://www.bigsql.org/se/preDownload.jsp
http://go.cloudera.com/cloudera-live.html
http://manage.windowsazure.com/
http://aws.amazon.com/cn/elasticmapreduce/
1 Hadoop
68/74
5 Hadoop VM
69/74
10 Hadoop Cluster
70/74
71/74
72/74
73/74
The Possibilities of Hadoop for Big Data
74/74