Page 1: Spark Hadoop

DIFFERENCE BETWEEN SPARK AND HADOOP MAPREDUCE

Page 2: Spark Hadoop

SPARK IS MUCH FASTER

Spark keeps intermediate data in memory where it can, whereas MapReduce writes intermediate results to disk and reads them back between stages.

Page 3: Spark Hadoop

LOGISTIC REGRESSION PERFORMANCE

Page 4: Spark Hadoop

WORDCOUNT WITH HADOOP

Page 5: Spark Hadoop

WORDCOUNT WITH SPARK

It’s easier to develop for Spark.
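A minimal sketch of the word count as a flatMap, map, reduceByKey pipeline. It is plain Python rather than Spark code, so it runs without a cluster; the helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins that mirror the Spark operations of the same name, and the sample lines are made up.

```python
from operator import add


def flat_map(func, items):
    """Apply func to each item and flatten the resulting lists."""
    return [out for item in items for out in func(item)]


def reduce_by_key(func, pairs):
    """Combine the values of each key with func (stand-in for reduceByKey)."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


def word_count(lines):
    words = flat_map(lambda line: line.split(), lines)  # one record per word
    pairs = [(word, 1) for word in words]               # map each word to (word, 1)
    return reduce_by_key(add, pairs)                    # sum the counts per word


counts = word_count(["spark is fast", "hadoop is stable", "spark is easy"])
```

In PySpark the same three steps are usually chained on an RDD, for example one created with `sc.textFile(path)`, using `flatMap`, `map` and `reduceByKey`.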

Page 6: Spark Hadoop

Spark also ships libraries for machine learning (MLlib), streaming (Spark Streaming), graph processing (GraphX) and SQL (Spark SQL).

Page 7: Spark Hadoop

SPARK GENERAL FLOW

Page 8: Spark Hadoop

SOME ACTIONS AND TRANSFORMATIONS

Transformations: map(func), flatMap(func), groupByKey(), reduceByKey(func), mapValues(func), sample(…), union(other), distinct(), sortByKey(), …

Actions: reduce(func), collect(), count(), first(), take(n), saveAsTextFile(path), countByKey(), foreach(func), …
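The difference between the two groups is that transformations only describe a new dataset, while actions trigger the computation and return a result. The toy class below illustrates that laziness on a single machine; `MiniRDD` is a hypothetical stand-in written for this sketch, not Spark's RDD class, and it implements only a handful of the operations listed above.

```python
from itertools import chain, islice


class MiniRDD:
    """Toy, single-machine stand-in for an RDD (hypothetical)."""

    def __init__(self, source, steps=()):
        self._source = list(source)  # the input data
        self._steps = tuple(steps)   # recorded transformations, not yet executed

    # Transformations: record the step and return a new MiniRDD, doing no work.
    def map(self, func):
        return MiniRDD(self._source, self._steps + (("map", func),))

    def flatMap(self, func):
        return MiniRDD(self._source, self._steps + (("flatMap", func),))

    def _run(self):
        """Replay the recorded steps over the source as a lazy pipeline."""
        data = iter(self._source)
        for kind, func in self._steps:
            if kind == "map":
                data = map(func, data)
            else:  # flatMap
                data = chain.from_iterable(map(func, data))
        return data

    # Actions: execute the recorded plan and return a concrete result.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

    def take(self, n):
        return list(islice(self._run(), n))


calls = []


def tokenize(line):
    calls.append(line)  # record when the function actually runs
    return line.split()


words = MiniRDD(["a b", "c"]).flatMap(tokenize).map(str.upper)
calls_before_action = len(calls)  # nothing has run yet, only the plan exists
result = words.collect()          # the action triggers execution
```

Each action replays the whole plan from the source, which is also what Spark does unless the dataset has been cached.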

Page 9: Spark Hadoop

CREATE INPUT RDDs

Page 10: Spark Hadoop

SPLIT INTO TRAINING, VALIDATION AND TEST DATASETS
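A minimal sketch of a random three-way split, in plain Python. The 60/20/20 weights, the seed, the helper name `random_split` and the synthetic `(user, product, rating)` tuples are all assumptions made for illustration; on a Spark RDD the built-in `randomSplit(weights, seed)` serves the same purpose.

```python
import random


def random_split(records, weights, seed=42):
    """Assign each record to one split with probability proportional to weights."""
    rng = random.Random(seed)
    total = float(sum(weights))
    # Cumulative boundaries in (0, 1], e.g. roughly [0.6, 0.8, 1.0] for 6:2:2.
    bounds = []
    running = 0.0
    for weight in weights:
        running += weight / total
        bounds.append(running)
    splits = [[] for _ in weights]
    for record in records:
        draw = rng.random()
        for i, bound in enumerate(bounds):
            # The last split also catches any floating-point rounding gap.
            if draw < bound or i == len(bounds) - 1:
                splits[i].append(record)
                break
    return splits


# Synthetic ratings: 50 users x 20 products, all rated 1.0 (made-up data).
ratings = [(user, product, 1.0) for user in range(50) for product in range(20)]
training, validation, test = random_split(ratings, [6, 2, 2])
```

Every record lands in exactly one split, so the model is tuned on data it was not trained on and finally scored on data it has never seen.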

FIND OUT OPTIMAL RANK AND NUMBER OF ITERATIONS
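One common way to pick these two values is a small grid search: train one model per combination and keep the one with the lowest error on the validation set. The sketch below assumes two hypothetical callables, `train(rank, iterations)` and `validation_rmse(model)`; the stand-ins used in the demo are invented so the example runs on its own and do not train anything.

```python
import itertools


def grid_search(ranks, iteration_counts, train, validation_rmse):
    """Return (best_error, best_rank, best_iterations, best_model)."""
    best = None
    for rank, iterations in itertools.product(ranks, iteration_counts):
        model = train(rank, iterations)
        error = validation_rmse(model)
        if best is None or error < best[0]:
            best = (error, rank, iterations, model)
    return best


# Hypothetical stand-ins: the "model" is just its hyperparameters, and the
# error function is made up so that rank 12 with 20 iterations scores best.
def fake_train(rank, iterations):
    return (rank, iterations)


def fake_validation_rmse(model):
    rank, iterations = model
    return abs(rank - 12) / 10 + 1.0 / iterations


best_error, best_rank, best_iterations, best_model = grid_search(
    [8, 12], [10, 20], fake_train, fake_validation_rmse
)
```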

Page 11: Spark Hadoop

RMSE (ROOT MEAN SQUARE ERROR) CALCULATION METHOD
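RMSE is the square root of the mean of the squared differences between predicted and actual ratings. A minimal plain-Python sketch follows; the function name `rmse` and the sample pairs are assumptions made for illustration.

```python
from math import sqrt


def rmse(pairs):
    """Root mean square error over (predicted, actual) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, actual) pair")
    squared_error = sum((predicted - actual) ** 2 for predicted, actual in pairs)
    return sqrt(squared_error / len(pairs))


# Made-up predictions: one exact, two off by 2 rating points.
error = rmse([(3.0, 3.0), (4.0, 2.0), (1.0, 3.0)])
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.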

EVALUATE THE BEST MODEL ON THE TEST SET

Page 12: Spark Hadoop

CREATE A NAIVE BASELINE AND COMPARE IT WITH THE BEST MODEL
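A naive baseline can be as simple as predicting the mean training rating for every user and product, then measuring how much the best model improves on it. The sketch below assumes that choice of baseline; the ratings and the model's test RMSE of 0.5 are made-up numbers, not results from the slides.

```python
from math import sqrt


def baseline_rmse(train_ratings, test_ratings):
    """RMSE on the test set when every prediction is the mean training rating."""
    mean_rating = sum(train_ratings) / len(train_ratings)
    squared_error = sum((mean_rating - rating) ** 2 for rating in test_ratings)
    return mean_rating, sqrt(squared_error / len(test_ratings))


def improvement_percent(baseline, model):
    """Relative RMSE reduction of the model over the baseline, in percent."""
    return (baseline - model) / baseline * 100.0


mean_rating, baseline = baseline_rmse([1.0, 3.0, 5.0, 3.0], [2.0, 4.0])
model_rmse = 0.5  # hypothetical test RMSE of the best model
improvement = improvement_percent(baseline, model_rmse)
```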

OUTPUT

Page 13: Spark Hadoop

RECOMMEND SOME NEW PRODUCTS FOR USER WITH ID #150

AND SOME OUTPUT...

Page 14: Spark Hadoop

USER ALREADY REACTED ON SOME CAMPAIGNS

Page 15: Spark Hadoop

USE THIS INFORMATION FOR PREDICTION

AND SOME OUTPUT...

Page 16: Spark Hadoop

RDD FAULT TOLERANCE

Page 17: Spark Hadoop

SPARK DEPLOYMENT

Page 18: Spark Hadoop

MACHINE LEARNING

Types of Machine Learning

Page 19: Spark Hadoop
Page 20: Spark Hadoop

ALS Algorithm

Page 21: Spark Hadoop

ALS MODEL AND ALGORITHM

Model the ratings matrix R as the product of a user matrix A (size U×K) and the transpose of a movie feature matrix B (size M×K), so that R ≈ A·Bᵀ.
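Written out, the factorization and the objective it is usually fitted with look as follows. The regularization weight \lambda and the set \Omega of observed (user, movie) pairs are assumptions added here; the slide itself only names the two matrices.

```latex
\[
R \approx A B^{\top}, \qquad
r_{um} \approx a_u^{\top} b_m, \qquad
a_u,\, b_m \in \mathbb{R}^{K}
\]
\[
\min_{A,B} \; \sum_{(u,m) \in \Omega} \left( r_{um} - a_u^{\top} b_m \right)^2
+ \lambda \left( \sum_{u} \lVert a_u \rVert^2 + \sum_{m} \lVert b_m \rVert^2 \right)
\]
\[
\text{With } B \text{ fixed:} \qquad
a_u = \Big( \sum_{m:\,(u,m) \in \Omega} b_m b_m^{\top} + \lambda I \Big)^{-1}
\sum_{m:\,(u,m) \in \Omega} r_{um}\, b_m
\]
```

Here a_u is row u of A and b_m is row m of B. The problem is not convex in A and B jointly, but with one side fixed it becomes an ordinary regularized least-squares problem in the other, with the closed-form solution shown in the last line.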

Alternating Least Squares (ALS)

• Start with random A and B vectors

• Optimize user vectors (A) based on movies

• Optimize movie vectors (B) based on users

• Repeat until converged
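The four steps above can be sketched in plain Python as follows. This is a single-machine illustration, not Spark MLlib's implementation: the regularization term `reg`, the fixed iteration count standing in for "until converged", the small `solve` helper and the sample ratings are all assumptions.

```python
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def update(fixed, ratings_by_row, rank, reg):
    """Regularized least-squares update of one side, the other held fixed."""
    updated = []
    for observed in ratings_by_row:  # list of (index on the fixed side, rating)
        gram = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        rhs = [0.0] * rank
        for other, rating in observed:
            vec = fixed[other]
            for i in range(rank):
                rhs[i] += rating * vec[i]
                for j in range(rank):
                    gram[i][j] += vec[i] * vec[j]
        updated.append(solve(gram, rhs))
    return updated


def loss(ratings, A, B, reg):
    fit = sum((r - dot(A[u], B[m])) ** 2 for u, m, r in ratings)
    penalty = reg * (sum(dot(a, a) for a in A) + sum(dot(b, b) for b in B))
    return fit + penalty


def als(ratings, num_users, num_movies, rank=2, reg=0.1, iterations=10, seed=0):
    rng = random.Random(seed)
    # Step 1: start with random user (A) and movie (B) vectors.
    A = [[rng.random() for _ in range(rank)] for _ in range(num_users)]
    B = [[rng.random() for _ in range(rank)] for _ in range(num_movies)]
    by_user = [[] for _ in range(num_users)]
    by_movie = [[] for _ in range(num_movies)]
    for u, m, r in ratings:
        by_user[u].append((m, r))
        by_movie[m].append((u, r))
    history = [loss(ratings, A, B, reg)]
    for _ in range(iterations):  # fixed count stands in for "until converged"
        A = update(B, by_user, rank, reg)   # Step 2: optimize users given movies
        B = update(A, by_movie, rank, reg)  # Step 3: optimize movies given users
        history.append(loss(ratings, A, B, reg))
    return A, B, history


# Made-up (user, movie, rating) triples for 3 users and 3 movies.
sample = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
A, B, history = als(sample, num_users=3, num_movies=3)
```

Each half-step solves its least-squares problem exactly, so the regularized loss recorded in `history` never increases from one iteration to the next.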

Page 22: Spark Hadoop

ALS ALGORITHM

