Uploaded by: erhwen-kuo
Spark Survey 2015 infographic: http://go.databricks.com/hubfs/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf
JDK 8 download: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
winutils.exe (for Hadoop on Windows): http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe
Apache Maven: https://maven.apache.org/
Maven 3.3.9 binary: http://apache.stu.edu.tw/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.zip
Eclipse downloads: http://www.eclipse.org/downloads/
Eclipse JEE Mars 2 (Windows 64-bit): http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/mars/2/eclipse-jee-mars-2-win32-x86_64.zip
Java exercise files: http://192.168.0.2/apps/e2-spk-v01/present/e2-spk-s01/assets/files/e2-spk-s01_java.zip
Scala IDE: http://scala-ide.org/index.html
Scala IDE SDK download: http://scala-ide.org/download/sdk.html
Scala exercise files: http://192.168.0.2/apps/e2-spk-v01/present/e2-spk-s01/assets/files/e2-spk-s01_scala.zip
Course materials: http://192.168.0.2/apps/e2-spk-v01/present/e2-spk-s01/
Apache Spark: spark.apache.org
Spark 1.6.0 pre-built for Hadoop 2.6: http://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
val distData = sc.parallelize(Seq("eighty20", "spark", "training", "hello", "world"))
val result_count = distData.count()
println("Count result is: " + result_count)
Spark Standalone mode: http://spark.apache.org/docs/latest/spark-standalone.html
Spark on Mesos: http://spark.apache.org/docs/latest/running-on-mesos.html
Apache Mesos: http://mesos.apache.org/
Spark on YARN: http://spark.apache.org/docs/latest/running-on-yarn.html
spark-submit \
  --class cc.eighty20.spark.s01.sc_00_helloworld \
  --master local \
  e2spks01-0.0.1.jar
Submitting applications: http://spark.apache.org/docs/latest/submitting-applications.html
spark-submit \
  --class cc.eighty20.spark.s01.sc_00_helloworld \
  --master spark://192.168.0.2:7077 \
  e2spks01-0.0.1.jar
Directed acyclic graphs: http://www.csie.ntnu.edu.tw/~u91029/DirectedAcyclicGraph.html
val r00 = sc.parallelize(0 to 9)
val r01 = sc.parallelize(0 to 90 by 10)
val r10 = r00.cartesian(r01)
val r11 = r00.map(n => (n, n))
val r12 = r00.zip(r01)
val r13 = r01.keyBy(_ / 20)
val r20 = Seq(r11, r12, r13).foldLeft(r10)(_ union _)
package cc.eighty20.spark.s01;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class sc_01_anatomy_driver {
    public static void main(String[] args) {
        String masterURL = "local[*]"; // (1)

        SparkConf conf = new SparkConf() // (2)
                .setAppName("sc_01_anatomy_driver")
                .setMaster(masterURL);

        JavaSparkContext sc = new JavaSparkContext(conf); // (3)

        String fileName = "";
        if (args.length > 0 && args[0] != null && !args[0].isEmpty()) // (4)
            fileName = args[0];
        else
            fileName = "pom.xml";

        JavaRDD<String> lines_rdd = sc.textFile(fileName); // (5)

        long lines_count = lines_rdd.count(); // (6)
        System.out.printf("There are %s lines in %s\n", lines_count, fileName);

        sc.close();
    }
}
package cc.eighty20.spark.s01

import org.apache.spark.{SparkConf, SparkContext}

object sc_01_anatomy_driver {
  def main(args: Array[String]) {
    val masterURL = "local[*]" // (1)

    val conf = new SparkConf() // (2)
      .setAppName("sc_01_anatomy_driver")
      .setMaster(masterURL)

    val sc = new SparkContext(conf) // (3)

    val fileName = util.Try(args(0)).getOrElse("pom.xml") // (4)

    val lines_rdd = sc.textFile(fileName).cache() // (5)

    val lines_count = lines_rdd.count() // (6)
    println(s"\nThere are $lines_count lines in $fileName")
  }
}
ERROR	php: dying for unknown reasons
WARN	dave, are you angry at me?
ERROR	did mysql just barf?
WARN	xylons approaching
ERROR	mysql cluster: replace with spark cluster
// base RDD
val lines = sc.textFile("hdfs://sample_log_file_path/log.txt")

// transformed RDDs
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split("\t")).map(r => r(1)).cache()

// action 1
val mysql_errors = messages.filter(_.contains("mysql")).count()

// action 2
val php_errors = messages.filter(_.contains("php")).count()
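To make the log-mining flow concrete without a cluster, here is a hedged plain-Java analogue using streams (the class name and inlined sample log are mine; Spark's lazy evaluation and caching have no counterpart here, this only mirrors the filter/map/count logic):

```java
import java.util.Arrays;
import java.util.List;

public class LogMining {
    // Mirrors the slide's pipeline: keep ERROR lines, split on tab,
    // take the message field, then count messages containing the keyword.
    public static long countErrors(List<String> lines, String keyword) {
        return lines.stream()
                .filter(l -> l.startsWith("ERROR"))
                .map(l -> l.split("\t")[1])
                .filter(m -> m.contains(keyword))
                .count();
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "ERROR\tphp: dying for unknown reasons",
                "WARN\tdave, are you angry at me?",
                "ERROR\tdid mysql just barf?",
                "WARN\txylons approaching",
                "ERROR\tmysql cluster: replace with spark cluster");
        System.out.println(countErrors(log, "mysql")); // 2
        System.out.println(countErrors(log, "php"));   // 1
    }
}
```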
# Apache Spark

Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

## Online Documentation

You can find the latest Spark documentation, including a programming guide, on the [project web page](http://spark.apache.org/documentation.html) and [project wiki](https://cwiki.apache.org/confluence/display/SPARK). This README file only contains basic setup instructions.

## Building Spark
...
val topN = 10
val fileName = "hdfs://log_file_path/README.md"

// RDD creation from external data source
val docs = sc.textFile(fileName)

// Split lines into words
val lower = docs.map(line => line.toLowerCase())
val words = lower.flatMap(line => line.split("\\s+"))
val counts = words.map(word => (word, 1))

// Count all words (automatic combination)
val freq = counts.reduceByKey(_ + _)

// Swap tuples and get top results
val top = freq.map(_.swap).top(topN)
top.foreach(println)
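For comparison, a plain-Java sketch of the same word count using streams (class name and sample input are mine; `groupingBy` + `counting` plays the role of `reduceByKey`, and sorting by value replaces `top`):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCount {
    // Lower-case, split on whitespace, count occurrences, take top-N by frequency.
    public static List<Map.Entry<String, Long>> topWords(List<String> docs, int topN) {
        Map<String, Long> freq = docs.stream()
                .map(String::toLowerCase)
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        return freq.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(topN)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("to be or not to be", "To be");
        // "to" and "be" each appear 3 times across both lines
        topWords(docs, 2).forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```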