Big Data Tech Stack Big Data 2015 by Abdullah Cetin CAVDAR


Page 1: Big Data Tech Stack

Big Data Tech Stack

Big Data 2015 by Abdullah Cetin CAVDAR

Page 2: Big Data Tech Stack

Me :)

Page 3: Big Data Tech Stack

Graduated from @HU

Page 4: Big Data Tech Stack

PhD Student @METU

Page 5: Big Data Tech Stack

Ex-Entrepreneur

I had 3 start-ups

Page 6: Big Data Tech Stack

Senior Software Engineer @Udemy

Page 7: Big Data Tech Stack

Founder and Organizer of

meetup.com/ankara-big-data-meetup

Page 8: Big Data Tech Stack

What's Big Data?

Big data is data that exceeds the processing capacity of conventional database systems.

Page 9: Big Data Tech Stack

What's Big Data?

Big data is when the data itself becomes part of the problem.

Page 10: Big Data Tech Stack

4V's of Big Data

Page 18: Big Data Tech Stack

Multitude of Data Types

Structured
Semi-structured
Unstructured

Page 19: Big Data Tech Stack

Data Data Data

Page 24: Big Data Tech Stack

What Do We Need?

Store
Join
Index
Analytics
Aggregate
Visualize

Page 25: Big Data Tech Stack

Challenge

The challenge in big data analytics is to dig deeply, quickly (real time?), and widely.

Page 26: Big Data Tech Stack

"-ilities" or NFRs (non-functional requirements)?

Availability
Scalability
Security
Performance
...

Page 27: Big Data Tech Stack

Solution?

Page 28: Big Data Tech Stack

Big Data Tech Stack

Page 29: Big Data Tech Stack

What are the essential components?

Page 30: Big Data Tech Stack

Data Sources

Page 31: Big Data Tech Stack

Multiple internal & external data sources

Page 32: Big Data Tech Stack

Creates a data lake

Page 36: Big Data Tech Stack

Different Volume, Variety, Velocity

Page 37: Big Data Tech Stack

The aim is to create a funnel after proper validation and cleaning

Page 38: Big Data Tech Stack

Ingestion Layer

Page 39: Big Data Tech Stack

Signal-to-Noise ratio

10:90

Page 40: Big Data Tech Stack

Separate the noise from relevant info

Page 41: Big Data Tech Stack

It has the capability to

Validate
Cleanse
Transform
Reduce
Integrate
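
The ingestion steps listed on this slide can be illustrated with a small pipeline. This is only a minimal sketch, not the tooling used in the talk: the event fields (`user`, `amount`) and all function names are hypothetical, and the final "integrate" step (merging with other sources) is left out to keep it short.

```python
from collections import defaultdict

# Hypothetical raw events; field names are illustrative, not from the talk.
RAW_EVENTS = [
    {"user": " Alice ", "amount": "10.5"},
    {"user": "bob", "amount": "abc"},   # noise: amount is not a number
    {"user": "", "amount": "3"},        # noise: missing user
    {"user": "ALICE", "amount": "4.5"},
    {"user": "Bob", "amount": "2"},
]


def validate(event):
    """Keep only events with a non-empty user and a numeric amount."""
    if not event.get("user", "").strip():
        return False
    try:
        float(event["amount"])
    except (KeyError, ValueError):
        return False
    return True


def cleanse(event):
    """Normalize whitespace and letter case in the user field."""
    return {"user": event["user"].strip().lower(), "amount": event["amount"]}


def transform(event):
    """Convert the amount from text to a number."""
    return {"user": event["user"], "amount": float(event["amount"])}


def reduce_by_user(events):
    """Reduce the stream to one total per user."""
    totals = defaultdict(float)
    for event in events:
        totals[event["user"]] += event["amount"]
    return dict(totals)


def ingest(raw_events):
    """Chain the steps: validate, cleanse, transform, reduce."""
    valid = (e for e in raw_events if validate(e))
    cleaned = (cleanse(e) for e in valid)
    typed = (transform(e) for e in cleaned)
    return reduce_by_user(typed)
```

Calling `ingest(RAW_EVENTS)` drops the two noisy records and returns one total per normalized user name.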

Page 43: Big Data Tech Stack

Distributed Storage Layer

Page 44: Big Data Tech Stack

Fault tolerance
Parallelization

Page 45: Big Data Tech Stack

HDFS

Massively scalable distributed file system

Page 46: Big Data Tech Stack

HDFS

Page 47: Big Data Tech Stack

HDFS Architecture
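
To make the architecture concrete: HDFS splits each file into fixed-size blocks and keeps several replicas of every block on different DataNodes, while the NameNode tracks which node holds which block. The sketch below imitates that bookkeeping. The 128 MB block size and replication factor 3 are the Hadoop 2 defaults; the round-robin placement and the function names are simplifying assumptions (real HDFS uses a rack-aware placement policy).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default block size in Hadoop 2
REPLICATION = 3                 # default HDFS replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block for a file of file_size bytes."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes.

    Round-robin placement is an assumption made for this sketch only.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + i) % len(datanodes)] for i in range(replication)
        ]
    return placement
```

A 300 MB file becomes three blocks (two full ones and a 44 MB remainder), and with four DataNodes every block still has three copies, so losing any single node loses no data.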

Page 48: Big Data Tech Stack

Non-relational, distributed data?

Page 49: Big Data Tech Stack

NoSQL

Page 50: Big Data Tech Stack

CAP theorem

Consistency, Availability, Partition Tolerance

Page 53: Big Data Tech Stack

Ingestion to DFS

Sqoop, Flume, MapReduce, ETL

Page 54: Big Data Tech Stack

Infrastructure & Platform Layer

Page 55: Big Data Tech Stack

Computing & Scalability

Page 56: Big Data Tech Stack

Hadoop?

Page 57: Big Data Tech Stack

Vertical Scaling

Page 58: Big Data Tech Stack

Vertical Scaling

Page 59: Big Data Tech Stack

Vertical Scaling

Page 60: Big Data Tech Stack

Horizontal Scaling

Page 61: Big Data Tech Stack

Horizontal Scaling

Page 62: Big Data Tech Stack

Horizontal Scaling
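
Horizontal scaling means spreading data and work across many machines instead of buying a bigger one. A minimal sketch of one common way to do this, hash partitioning, is shown here; the node names and helper functions are hypothetical and not taken from the talk.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node for a key using a stable hash modulo the node count."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def partition(keys, nodes):
    """Group keys by the node that would store them."""
    shards = {node: [] for node in nodes}
    for key in keys:
        shards[node_for(key, nodes)].append(key)
    return shards
```

With plain modulo hashing, adding a node changes the assignment of most keys; schemes such as consistent hashing are commonly used to limit that reshuffling.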

Page 64: Big Data Tech Stack

MapReduce is the main computation paradigm

Page 65: Big Data Tech Stack

MapReduce
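
The paradigm has three phases: map emits key/value pairs, shuffle groups the values by key, and reduce combines each group. The classic word count below is a single-process sketch of those phases in plain Python, not Hadoop's actual API; on a cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["big data", "Big data stack"])` counts "big" and "data" twice and "stack" once.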

Page 67: Big Data Tech Stack

Hadoop 2

Page 68: Big Data Tech Stack

What's new?

Page 69: Big Data Tech Stack

What's new?

Page 70: Big Data Tech Stack

H1 vs. H2

Page 71: Big Data Tech Stack

One cluster, distributed storage, distributed scheduler, many types of applications.

Page 72: Big Data Tech Stack

Blueprints

NoSQL with HBase
Stream Processing with Storm/Spark
Graph Processing with Giraph
SQL on Hadoop with Impala
Columnar Data Formats

Page 73: Big Data Tech Stack

Security Layer

Page 74: Big Data Tech Stack

Data needs to be protected
Meet compliance requirements
Protect individuals' privacy

Page 75: Big Data Tech Stack

Proper authorization and authentication needed

Page 76: Big Data Tech Stack

What can we do?

Use an authentication protocol like Kerberos
Enable file-layer encryption
Use SSL, certificates and trusted keys
Provision with tools like Chef, Puppet or Ansible
Log all communication to detect anomalies
Monitor the whole system

Page 77: Big Data Tech Stack

Monitoring Layer

Page 78: Big Data Tech Stack

Get a complete picture of our Big Data tech stack

Page 79: Big Data Tech Stack

Satisfy SLAs with minimal downtime

Page 80: Big Data Tech Stack

DataDog

Page 81: Big Data Tech Stack

New Relic (Overview)

Page 82: Big Data Tech Stack

New Relic (Databases)

Page 83: Big Data Tech Stack

Analytics Engine

Page 84: Big Data Tech Stack

Co-Existence with Traditional BI

Data warehouse in the traditional way
Distributed MR processing on big data stores

Page 85: Big Data Tech Stack

Mediate data in either direction, e.g. use Hive/HBase with Sqoop

Page 86: Big Data Tech Stack

Real-time analysis can leverage low-latency NoSQL and columnar stores, e.g. Cassandra, Vertica, ...

Page 87: Big Data Tech Stack

R may be used for complex statistical algorithms

Page 88: Big Data Tech Stack

Search Engines

Page 89: Big Data Tech Stack

Huge volume and variety of data

Page 90: Big Data Tech Stack

“needle in a haystack”

Page 92: Big Data Tech Stack

Need a blazing-fast search mechanism to index and search for big data analytics

Page 94: Big Data Tech Stack

Elasticsearch, Solr, ...
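
Both engines are built on Apache Lucene, whose core data structure is the inverted index: a map from each term to the documents that contain it. The toy version below is a sketch under simple assumptions (whitespace tokenization, lowercase matching, AND queries only) and does not reflect either product's API.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

A lookup touches only the posting sets of the query terms rather than scanning every document, which is what makes finding the "needle in a haystack" fast.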

Page 95: Big Data Tech Stack

Real-time Processing

Page 96: Big Data Tech Stack

In memory?

Page 97: Big Data Tech Stack

Apache Spark

Page 101: Big Data Tech Stack

Storm, Kinesis, Flink, ...

Page 102: Big Data Tech Stack

Visualization Layer

Page 103: Big Data Tech Stack

Gain insight faster
Look at different aspects of data visually

Page 105: Big Data Tech Stack

Tableau

Page 106: Big Data Tech Stack

ChartIO

Page 107: Big Data Tech Stack

Lambda Architecture
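
The Lambda Architecture combines a batch layer that periodically recomputes views from the full master dataset, a speed layer that covers the events that arrived since the last batch run, and a serving layer that merges both at query time. The page-view counter below is a toy, in-memory sketch of that idea; the class and method names are hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Toy page-view counter with batch, speed and serving layers."""

    def __init__(self):
        self.master_log = []         # append-only master dataset
        self.batch_view = Counter()  # recomputed from the full log
        self.speed_view = Counter()  # events seen since the last batch run

    def ingest(self, page):
        """New events go to both the master dataset and the speed layer."""
        self.master_log.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        """Recompute the batch view from scratch, then reset the speed view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, page):
        """Serving layer: merge the batch view with the real-time view."""
        return self.batch_view[page] + self.speed_view[page]
```

Queries stay correct between batch runs because the speed view fills the gap, and any error in the incremental path is wiped out the next time the batch view is rebuilt from the log.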

Page 109: Big Data Tech Stack

Don't forget

Page 110: Big Data Tech Stack

There is no "One Size Fits All" solution

Page 111: Big Data Tech Stack

We need Continuous Development

Page 114: Big Data Tech Stack

Thank You :)