Spark and Scala Sheng QIAN 2015-06-17. The Berkeley Data Analytics Stack

Spark and Scala

Sheng QIAN

2015-06-17

The Berkeley Data Analytics Stack

The Goal of Spark

Compare between Spark and Hadoop

Spark supports …

• Scala (Best)

• Python (2.7.*)

• Java (…)

All based on RDD (Resilient Distributed Dataset)

• A list of partitions

• A function for computing each split

• A list of dependencies on other RDDs

• Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)

• Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
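The five properties above can be sketched as a small Scala trait. This is a simplified model for illustration only: `SimpleRDD`, `CollectionRDD`, `MappedRDD` and the round-robin partitioning are assumptions made here, not Spark's actual class definitions.

```scala
// Simplified model of the five RDD properties. All names are illustrative,
// not Spark's real classes.
final case class Partition(index: Int)

trait SimpleRDD[T] {
  def partitions: Seq[Partition]                   // a list of partitions
  def compute(split: Partition): Iterator[T]       // a function for computing each split
  def dependencies: Seq[SimpleRDD[_]]              // dependencies on other RDDs
  def partitioner: Option[Any => Int] = None       // optional, for key-value RDDs
  def preferredLocations(split: Partition): Seq[String] = Nil // optional locality hints
}

// A source RDD backed by an in-memory collection, dealt round-robin
// into `numSlices` partitions (an assumption made for this sketch).
class CollectionRDD[T](data: Seq[T], numSlices: Int) extends SimpleRDD[T] {
  val partitions: Seq[Partition] = (0 until numSlices).map(Partition(_))
  def compute(split: Partition): Iterator[T] =
    data.iterator.zipWithIndex.collect {
      case (x, i) if i % numSlices == split.index => x
    }
  val dependencies: Seq[SimpleRDD[_]] = Nil
}

// A derived RDD: same partitions as its parent, one dependency on it.
class MappedRDD[A, B](parent: SimpleRDD[A], f: A => B) extends SimpleRDD[B] {
  def partitions: Seq[Partition] = parent.partitions
  def compute(split: Partition): Iterator[B] = parent.compute(split).map(f)
  val dependencies: Seq[SimpleRDD[_]] = Seq(parent)
}
```

Each derived RDD only stores its parent and a function, so a partition can be computed on demand by asking the parent for the same split.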

The process

1. File System (HDFS/HBase) / Collections → RDD

2. Transformation (lazy, execution is delayed) * Part of why Spark is faster than MapReduce

3. Action (triggers execution)
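The three steps can be illustrated without a cluster. The sketch below uses a hypothetical `LazyDataset` class (not part of Spark) whose transformations only record work, while actions run it. A log shows that nothing executes until the action is called.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an RDD: it holds a recipe, not results.
final class LazyDataset[T](private val compute: () => Iterator[T]) {
  // Transformations: return a new dataset and run nothing.
  def map[U](f: T => U): LazyDataset[U] =
    new LazyDataset(() => compute().map(f))
  def filter(p: T => Boolean): LazyDataset[T] =
    new LazyDataset(() => compute().filter(p))
  // Actions: run the whole chain now.
  def collect(): List[T] = compute().toList
  def count(): Int = compute().size
}

object LazyDataset {
  // Step 1: build a dataset from a local collection.
  def fromCollection[T](data: Seq[T]): LazyDataset[T] =
    new LazyDataset(() => data.iterator)
}

object LazyDemo {
  val log = ArrayBuffer.empty[String]

  def run(): List[Int] = {
    log.clear()
    val source  = LazyDataset.fromCollection(1 to 5)               // step 1
    val squared = source.map { x => log += s"square $x"; x * x }   // step 2
    val evens   = squared.filter(_ % 2 == 0)                       // step 2
    log += "transformations declared"
    evens.collect()                                                // step 3
  }
}
```

After `LazyDemo.run()`, the first log entry is "transformations declared" and the "square" entries follow it, because the mapping function only runs once `collect()` is called.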

Transformations and actions

Fault tolerance

Every RDD records the RDDs it depends on (its lineage), so a lost partition can be recomputed from its parents
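A minimal sketch of recovery through recorded dependencies, assuming a toy lineage made of `Source` and `Mapped` nodes. These names and the `Cached` wrapper are hypothetical and exist only for this example.

```scala
// Hypothetical lineage node: either source data or a function applied to a parent.
sealed trait Lineage[T] { def recompute(): Vector[T] }

final case class Source[T](data: Vector[T]) extends Lineage[T] {
  def recompute(): Vector[T] = data
}

final case class Mapped[A, B](parent: Lineage[A], f: A => B) extends Lineage[B] {
  // Rebuild this node by first rebuilding the node it depends on.
  def recompute(): Vector[B] = parent.recompute().map(f)
}

// A materialized result that can be lost and rebuilt from its lineage.
final class Cached[T](lineage: Lineage[T]) {
  private var cache: Option[Vector[T]] = Some(lineage.recompute())

  def lose(): Unit = cache = None        // simulate losing the data
  def isCached: Boolean = cache.isDefined

  def get(): Vector[T] = cache.getOrElse {
    val rebuilt = lineage.recompute()    // walk back through the dependencies
    cache = Some(rebuilt)
    rebuilt
  }
}
```

No copy of the data is kept for recovery. Only the chain of dependencies and functions is needed to rebuild the result.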

Cluster Overview

Task Schedule

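DAG Scheduler

• Builds a DAG of stages and decides the best location for each task

• Tracks which RDDs or stage outputs have been materialized

• Passes each taskset to the underlying TaskScheduler

• Resubmits stages whose shuffle output was lost

Task Scheduler

• Submits each taskset (a group of tasks) to the cluster for execution and reports the results

• Reports a fetch failed error when shuffle output is lost

• Retries straggler tasks on other nodes

• Maintains a TaskSetManager for each TaskSet (tracking locality and error information)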
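To make the idea of stages concrete, here is a rough sketch that cuts a linear chain of operations into stages at shuffle boundaries. It is an assumption-laden simplification: `Op`, `StageSketch` and the `shuffle` flag are hypothetical, and a real scheduler works on a dependency graph rather than a flat list.

```scala
// Hypothetical operation descriptor: `shuffle = true` marks an operation
// that needs shuffled input and therefore starts a new stage.
final case class Op(name: String, shuffle: Boolean)

object StageSketch {
  // Split a linear chain of operations into stages. Operations without a
  // shuffle are pipelined into the current stage.
  def splitIntoStages(ops: List[Op]): List[List[String]] =
    ops.foldLeft(List.empty[List[String]]) {
      case (Nil, op)                  => List(List(op.name))
      case (stages, op) if op.shuffle => stages :+ List(op.name)
      case (stages, op)               => stages.init :+ (stages.last :+ op.name)
    }
}
```

For example, a chain of `textFile`, `flatMap`, `map`, then a shuffling `reduceByKey` followed by `mapValues` yields two stages in this model: the first three operations, then the last two.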

Job Schedule

Job Optimization

Why Scala

• Based on the JVM

• FP + OO (functional and object-oriented programming)
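As a small illustration of the FP + OO point, the sketch below mixes a class hierarchy with immutable data, higher-order functions, and pattern matching. The `Shape` types are made up for this example.

```scala
// OO side: a sealed trait implemented by concrete classes.
sealed trait Shape {
  def area: Double
}
final case class Rect(w: Double, h: Double) extends Shape {
  def area: Double = w * h
}
final case class Square(side: Double) extends Shape {
  def area: Double = side * side
}

object ShapeDemo {
  // FP side: immutable values and a higher-order function (map).
  def totalArea(shapes: List[Shape]): Double = shapes.map(_.area).sum

  // FP side: pattern matching on case classes.
  def label(s: Shape): String = s match {
    case Rect(w, h) => s"rect ${w}x$h"
    case Square(a)  => s"square $a"
  }
}
```

The code compiles to JVM bytecode, so it can call and be called from Java libraries.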

Scala - Grammar

On Evernote

Thank you
