Big Data Processing using Hadoop

Shadi Ibrahim, Inria, Rennes - Bretagne Atlantique Research Center


Apache Hadoop

Hadoop – INRIA – S. Ibrahim

Hadoop
• Hadoop is a top-level Apache project
  » Open source implementation of MapReduce
  » Developed in Java
• It was advocated by industry's premier Web players (Google, Yahoo!, Microsoft, and Facebook) as the engine to power the cloud.

Hadoop History

Who Uses Hadoop?
» Facebook
» IBM: Blue Cloud
» Joost
» Last.fm
» New York Times
» Microsoft
» Cloudera
» Yahoo!

Source: "The MapReduce programming model and implementations", H. Jin, S. Ibrahim, L. Qi, H. Cao, S. Wu, X. Shi. In: Cloud Computing: Principles and Paradigms. Wiley, 2011.

Hadoop: Where?
Batch data processing, not real-time
• Web search
• Log processing
• Document analysis and indexing
• Web graphs and crawling
• Highly parallel, data-intensive distributed applications

Public Hadoop Clouds
• Hadoop MapReduce on Amazon EC2
  » http://wiki.apache.org/hadoop/AmazonEC2
• IBM Blue Cloud
  » Partnering with Google to offer web-scale infrastructure
• Global Cloud Computing Testbed
  » Joint effort by Yahoo, HP and Intel
  » http://www.opencloudconsortium.org/testbed.html

Source: Wikibon 2013

Batch Big Data Processing

Adapted from presentations at: http://wiki.apache.org/hadoop/HadoopPresentations

Components
• Distributed File System
  » Name: HDFS
  » Inspired by GFS
• Distributed Processing Framework
  » Modelled on the MapReduce abstraction

HDFS
• Distributed storage system
  » Files are divided into large blocks (128 MB)
  » Blocks are distributed across the cluster
  » Blocks are replicated to protect against hardware failure
  » Data placement is exposed so that computation can be migrated to data
• Notable differences from mainstream DFS work
  » Single 'storage + compute' cluster vs. separate clusters
  » Simple I/O-centric API
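
The block splitting and replication described on this slide can be illustrated with a small sketch. It assumes the 128 MB block size from the slide; the round-robin placement, the replication factor of 3 and the node names are illustrative assumptions, not the actual HDFS placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size quoted on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    return [(offset, min(block_size, file_size - offset))
            for offset in range(0, file_size, block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Illustrative stand-in only: real HDFS placement is rack-aware.
    Requires replication <= len(nodes).
    """
    return {block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
            for block in range(num_blocks)}
```

For example, a 300 MB file yields three blocks (128 MB, 128 MB and 44 MB), each of which is then stored on three different nodes.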

HDFS Architecture: NameNode (1)
Master-Slave Architecture
• HDFS Master "NameNode"
  » Manages all file system metadata in memory
    • List of files
    • For each file name, a set of blocks
    • For each block, a set of DataNodes
    • File attributes (creation time, replication factor)
  » Controls read/write access to files
  » Manages block replication
  » Transaction log: registers file creation, deletion, etc.

HDFS Architecture: DataNodes (2)
HDFS Slaves "DataNodes"
• A DataNode is a block server
  » Stores data in the local file system (e.g. ext3)
  » Stores metadata of a block (e.g. CRC)
  » Serves data and metadata to clients
• Block report
  » Periodically sends a report of all existing blocks to the NameNode
• Pipelining of data
  » Forwards data to other specified DataNodes
• Performs replication tasks upon instruction by the NameNode
• Rack-aware
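
A minimal sketch of how the NameNode's in-memory metadata and the DataNodes' block reports fit together. The class and method names are hypothetical and chosen for illustration; they are not the Hadoop API.

```python
from collections import defaultdict


class NameNodeMetadata:
    """Toy model of the NameNode's in-memory maps (hypothetical names)."""

    def __init__(self):
        self.file_blocks = {}                    # file name -> list of block ids
        self.block_locations = defaultdict(set)  # block id -> DataNodes holding it
        self.edit_log = []                       # transaction log of namespace changes

    def create_file(self, name, block_ids):
        self.file_blocks[name] = list(block_ids)
        self.edit_log.append(("create", name))

    def block_report(self, datanode, block_ids):
        # A report lists ALL blocks on the DataNode, so forget the old view first.
        for locations in self.block_locations.values():
            locations.discard(datanode)
        for block in block_ids:
            self.block_locations[block].add(datanode)

    def under_replicated(self, replication=3):
        """Blocks with fewer live replicas than the target replication factor."""
        return sorted(block
                      for blocks in self.file_blocks.values()
                      for block in blocks
                      if len(self.block_locations[block]) < replication)
```

Blocks returned by `under_replicated` are the ones for which the NameNode would instruct DataNodes to create new replicas.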

HDFS Architecture (3)

Fault Tolerance in HDFS
• DataNodes send heartbeats to the NameNode
  » Once every 3 seconds
• NameNode uses heartbeats to detect DataNode failures
  » Chooses new DataNodes for new replicas
  » Balances disk usage
  » Balances communication traffic to DataNodes

NameNode Failures
• A single point of failure
• Transaction log stored in multiple directories
  » A directory on the local file system
  » A directory on a remote file system (NFS/CIFS)
• Need to develop a real HA solution

Data Pipelining
• Client retrieves a list of DataNodes on which to place replicas of a block
  » Client writes the block to the first DataNode
  » The first DataNode forwards the data to the next DataNode
  » The second DataNode forwards the data to the next DataNode in the pipeline
  » When all replicas are written, the client moves on to write the next block in the file
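
The write pipeline above can be sketched as a chain in which each DataNode stores the block and forwards it downstream. This is a simplified model under stated assumptions: whole blocks are forwarded at once (no packet streaming), `stores` is a plain dictionary standing in for local disks, and acknowledgements are modelled as travelling back from the last node to the first.

```python
def pipeline_write(block, pipeline, stores):
    """Write one block through a chain of DataNodes.

    pipeline: ordered list of DataNode names chosen for this block.
    stores:   dict mapping DataNode name -> list of blocks it holds.
    Returns the order in which nodes acknowledge (last node first).
    """
    if not pipeline:
        return []
    head, rest = pipeline[0], pipeline[1:]
    stores.setdefault(head, []).append(block)   # store the local replica
    acks = pipeline_write(block, rest, stores)   # forward to the next DataNode
    return acks + [head]                         # acknowledge after downstream nodes
```

The client only moves on to the next block once the returned list contains every node of the pipeline.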

User Interface
• Commands for HDFS users:
  » hadoop dfs -mkdir /foodir
  » hadoop dfs -cat /foodir/myfile.txt
  » hadoop dfs -rm /foodir/myfile.txt
• Commands for HDFS administrators:
  » hadoop dfsadmin -report
  » hadoop dfsadmin -decommission datanodename
• Web interface
  » http://host:port/dfshealth.jsp

Useful links
• HDFS Design:
  » http://hadoop.apache.org/core/docs/current/hdfs_design.html
• Hadoop API:
  » http://hadoop.apache.org/core/docs/current/api/

Hadoop MapReduce
Master-Slave architecture
• MapReduce Master "JobTracker"
  » Accepts MR jobs submitted by users
  » Assigns Map and Reduce tasks to TaskTrackers
  » Monitors task and TaskTracker status, re-executes tasks upon failure

Hadoop MapReduce
Master-Slave architecture
• MapReduce Slaves "TaskTrackers"
  » Run Map and Reduce tasks upon instruction from the JobTracker
  » Manage storage and transmission of intermediate output
• Generic reusable framework supporting pluggable user code (file system, I/O format)

Deployment: HDFS + MR

Hadoop MapReduce Client
• Define Mapper and Reducer classes and a "launching" program
• Language support
  » Java
  » C++
  » Python
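
To make the Mapper / Reducer / launcher split concrete, here is a word-count sketch in plain Python. It runs in a single process and only mimics the programming model: `run_job` is a hypothetical stand-in for the launching program and for the framework's shuffle, not a Hadoop API.

```python
from collections import defaultdict


def mapper(line):
    """Map: emit (word, 1) for every word of an input line."""
    for word in line.split():
        yield word.lower(), 1


def reducer(word, counts):
    """Reduce: sum all the counts emitted for one word."""
    yield word, sum(counts)


def run_job(lines):
    """Local stand-in for the launcher: map, group by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(pair for key in sorted(groups) for pair in reducer(key, groups[key]))
```

On a real cluster the grouping step is the shuffle, and map and reduce calls run as separate tasks on the TaskTrackers.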

Zoom on Map Phase

Source: "Handling partitioning skew in MapReduce using LEEN", S. Ibrahim, H. Jin, L. Lu, B. He, G. Antoniu, S. Wu. Peer-to-Peer Networking and Applications, 2013.

Zoom on Reduce Phase

Data Locality
• Data locality is exposed in the Map task scheduling
• Data are replicated:
  » Fault tolerance
  » Performance: divide the work among nodes
• JobTracker schedules map tasks considering:
  » Node-aware (input data on the same node)
  » Rack-aware (input data on the same rack)
  » Non-local map tasks
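
The three locality levels can be sketched as a preference order used when a node asks for work. This is an illustrative simplification, not the JobTracker's actual code; the function names and the `rack_of` mapping are assumptions.

```python
def locality_level(replica_nodes, node, rack_of):
    """0 = node-local, 1 = rack-local, 2 = non-local."""
    if node in replica_nodes:
        return 0
    if rack_of[node] in {rack_of[replica] for replica in replica_nodes}:
        return 1
    return 2


def pick_map_task(pending, node, rack_of):
    """Choose the pending task with the best locality for `node`.

    pending: dict task id -> list of nodes holding a replica of its input.
    Ties are broken by task id so the choice is deterministic.
    """
    if not pending:
        return None
    return min(sorted(pending),
               key=lambda task: locality_level(pending[task], node, rack_of))
```

With more replicas per block, more nodes can serve a task at level 0, which is why replication helps performance as well as fault tolerance.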

Fault Tolerance
• TaskTrackers send heartbeats to the JobTracker
  » Once every 3 seconds
• JobTracker uses heartbeats to detect TaskTracker failures
  » A node is labeled as failed if no heartbeat is received for a defined expiry time (default: 10 minutes)
• Re-execute all the ongoing and completed tasks of the failed node
• Need to develop a more efficient policy to prevent re-executing completed tasks (storing this data in HDFS)
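
A minimal sketch of heartbeat-based failure detection, using the 3-second interval and 10-minute expiry quoted on the slide. The class name and the explicit `now` timestamps (in seconds) are illustrative assumptions.

```python
HEARTBEAT_INTERVAL_S = 3   # heartbeat period from the slide
EXPIRY_S = 10 * 60         # default expiry time from the slide


class HeartbeatMonitor:
    """Tracks the last heartbeat of each TaskTracker (hypothetical helper)."""

    def __init__(self, expiry_s=EXPIRY_S):
        self.expiry_s = expiry_s
        self.last_seen = {}

    def heartbeat(self, tracker, now):
        self.last_seen[tracker] = now

    def failed(self, now):
        """Trackers whose last heartbeat is older than the expiry time."""
        return sorted(tracker for tracker, seen in self.last_seen.items()
                      if now - seen > self.expiry_s)
```

With these defaults a tracker has to miss roughly 200 consecutive heartbeats before it is declared failed.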

Speculation in Hadoop
• Nodes slow (stragglers) → run backup tasks

[Figure: Node 1, Node 2]

Adapted from a presentation by Matei Zaharia, "Improving MapReduce Performance in Heterogeneous Environments", OSDI 2008, San Diego, CA, December 2008.

Life cycle of Map/Reduce Task

Hadoop ecosystem

Hadoop ecosystem
• Core: Hadoop MapReduce and HDFS
• Data Access: HBase, Pig, Hive
• Algorithms: Mahout
• Data Import: Flume, Sqoop and Nutch

Hadoop ecosystem
• Database primitives
  • HBase
    » Wide-column data structure built on HDFS
• Processing
  • Pig
    » A high-level data-flow language and execution framework for parallel computation

Hadoop ecosystem
• Algorithm
  • Mahout
    » Scalable machine learning and data mining
    » Runs on top of Hadoop
• Processing
  • Hive
    » Data warehouse infrastructure
    » Support for custom mappers and reducers for more sophisticated analysis

Hadoop ecosystem
• Data Import
  • Sqoop
    » Structured data
    » Import and export between RDBMS and HDFS
• Data Import
  • Flume
    » Import streams
    » Text files and system logs

Hadoop ecosystem
• Coordination
  • ZooKeeper
    » A high-performance coordination service for distributed applications
• Scheduling
  • Oozie
    » A workflow scheduler system to manage Apache Hadoop jobs

Job Scheduling in Hadoop

Why Job Scheduling in Hadoop?
MapReduce / Hadoop was originally designed for high-throughput batch processing
• Today's workload is far more diverse:
  » Many users want to share a cluster
    • Engineering, marketing, business intelligence, etc.
  » Vast majority of jobs are short
    • Ad-hoc queries, sampling, periodic reports
  » Response time is critical
    • Interactive queries, deadline-driven reports
• How can we efficiently share MapReduce clusters between users?

Job Scheduling in Hadoop
• Assigning jobs to the Hadoop cluster
• Considerations
  » Job priority (e.g., deadline)
  » Capacity
    • Cluster resources available
    • Resources needed for the job

Job Scheduling in Hadoop
• FIFO
  » The first job to arrive at the JobTracker is processed first
• Capacity Scheduler
  » Organizes jobs into queues
  » Queue shares as percentages of the cluster
• Fair Scheduler
  » Groups jobs into "pools"
  » Divides excess capacity evenly between pools
  » Delay scheduling to expose data locality
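
The idea of dividing capacity evenly between pools can be sketched as a max-min fair split of task slots. This is a simplified illustration under stated assumptions: pools have no weights or guaranteed minimum shares, demands are whole slots, and `fair_shares` is a hypothetical helper rather than the Fair Scheduler's implementation.

```python
def fair_shares(total_slots, demands):
    """Split total_slots between pools so no pool gets more than it demands
    and leftover capacity is shared evenly among the pools that still want more.

    demands: dict pool name -> number of slots the pool could use.
    """
    shares = {pool: 0 for pool in demands}
    active = sorted(pool for pool, demand in demands.items() if demand > 0)
    remaining = total_slots
    while remaining > 0 and active:
        per_pool = max(remaining // len(active), 1)
        for pool in list(active):
            if remaining == 0:
                break
            give = min(per_pool, demands[pool] - shares[pool], remaining)
            shares[pool] += give
            remaining -= give
            if shares[pool] == demands[pool]:
                active.remove(pool)   # satisfied pools release their excess
    return shares
```

A FIFO scheduler, by contrast, would hand all slots to the earliest job until it finishes, which is what makes short jobs wait behind long ones.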

Hadoop Optimizations

Open Issues – Heavy access
• All the metadata are handled through one single master in HDFS (the NameNode)
• Performs poorly when
  » Handling many files
  » Heavy concurrency

The BlobSeer Approach to Concurrency-Optimized Data Management
• BlobSeer: software platform for scalable, distributed BLOB management
  » Decentralized data storage
  » Decentralized metadata management
  » Versioning-based concurrency control
  » Lock-free concurrent writes (enabled by versioning)
• A back-end for higher-level data management systems
  » Short term: highly scalable distributed file systems
  » Middle term: storage for cloud services
  » Long term: extremely large distributed databases

http://blobseer.gforge.inria.fr/

Highlight: BlobSeer Does Better Than Hadoop!

[Figure: Original Hadoop vs. Hadoop+BlobSeer for the distributed grep and sort applications]

• Execution time reduced by up to 38%
• MapReduce: a natural application class for BlobSeer
• Journal of Parallel and Distributed Computing (2011), IPDPS 2010, MAPREDUCE 2010 (HPDC)

Open Issues – Data locality
• Data locality is crucial for Hadoop's performance
• How can we expose the data locality of Hadoop in the Cloud efficiently?
• Hadoop in the Cloud
  » Unaware of network topology
  » Node-aware or non-local map tasks

Open Issues – Data locality
Data locality in the Cloud
• The simplicity of Map task scheduling leads to non-local map execution (25%)

[Figure: blocks 1 to 12 and their replicas spread over Node1 to Node6, with three nodes marked "Empty node"]

Side impacts:
• Increased execution time
• Increased number of useless speculative executions
• Slot occupying
  » Imbalance in the Map execution among nodes

Maestro: Replica-Aware Scheduling in Hadoop
Schedule the map tasks in two waves:
• First wave: fills the empty slots of each data node based on the number of hosted map tasks and on the replication scheme for their input data
• Second wave: runtime scheduling takes into account the probability of scheduling a map task on a given machine depending on the replicas of the task's input data
• Results: Maestro can achieve optimal data locality even if data are not uniformly distributed among nodes, and improves the performance of Hadoop applications

Results
• Sort application

Local maps:
                   Native Hadoop   Maestro
Virtual Cluster    83%             96%
Grid5000           84%             97%

S. Ibrahim, H. Jin, L. Lu, B. He, G. Antoniu, S. Wu, "Maestro: Replica-aware map scheduling for MapReduce", in: CCGrid 2012.

Open Issues – Data Skew
• Hadoop's current hash partitioning works well when the keys appear equally often and are uniformly stored across the data nodes
• In the presence of partitioning skew:
  » Variation in intermediate keys' frequencies
  » Variation in intermediate keys' distribution among different data nodes
• Native blind hash partitioning is inadequate and will lead to:
  » Network congestion
  » Unfairness in reducers' inputs → reduce computation skew
  » Performance degradation

Open Issues – Data Skew
hash (Hash code (Intermediate-key) Modulo ReduceID)

[Figure: intermediate keys K1 to K6 held by Data Node1, Data Node2 and Data Node3, before and after hash partitioning]

                       Data Node1   Data Node2   Data Node3
Total data transfer    11           15           18           Total: 44/54
Reduce input           29           17           8            cv: 58%
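
The figures on this slide can be reproduced with a short calculation. The per-node key counts are those of the example (3, 9 and 4 occurrences of K1 on the three nodes, and so on). Two things are assumptions made for illustration: key Ki is taken to hash to i, so that K1 and K4 go to the reducer on Data Node1, K2 and K5 to Data Node2, K3 and K6 to Data Node3; and cv is computed with the sample standard deviation, which is what matches the 58% on the slide.

```python
from statistics import mean, stdev

# Occurrences of each intermediate key on each data node (from the example).
COUNTS = {
    "Node1": {"K1": 3, "K2": 5, "K3": 4, "K4": 4, "K5": 1, "K6": 1},
    "Node2": {"K1": 9, "K2": 1, "K3": 0, "K4": 3, "K5": 2, "K6": 3},
    "Node3": {"K1": 4, "K2": 3, "K3": 0, "K4": 6, "K5": 5, "K6": 0},
}
NODES = list(COUNTS)


def reducer_node(key):
    """Assumed hash: key 'Ki' goes to reducer (i - 1) mod 3, hosted on that node."""
    return NODES[(int(key[1:]) - 1) % len(NODES)]


def shuffle_stats(counts):
    """Return (data sent by each node, input of each reducer, cv of reducer inputs)."""
    transfer = {node: 0 for node in counts}
    reduce_input = {node: 0 for node in counts}
    for node, keys in counts.items():
        for key, freq in keys.items():
            target = reducer_node(key)
            reduce_input[target] += freq
            if target != node:          # records leaving their node are shuffled
                transfer[node] += freq
    inputs = list(reduce_input.values())
    return transfer, reduce_input, stdev(inputs) / mean(inputs)
```

Blind hashing ignores where the keys are stored, so 44 of the 54 records cross the network and one reducer receives more than three times the input of another.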

Example: Wordcount
• 6 nodes, 2 GB data set
• Combine function is disabled

[Figure: data size (MB) of map output and reduce input on DataNode01 to DataNode06, split into transferred data, local data, and data during failed reduce]

• Transferred data is relatively large: 83% of the maps' output
• Data distribution is imbalanced: max-min ratio 20%, cv 42%

LEEN Partitioning Algorithm
• Extends the locality-aware concept to the Reduce tasks
• Considers fair distribution of reducers' inputs
• Results:
  » Balanced distribution of reducers' input
  » Minimized data transfer during the shuffle phase
  » Improved response time
• Close-to-optimal tradeoff between data locality and reducers' input fairness

S. Ibrahim, H. Jin, L. Lu, S. Wu, B. He, L. Qi, "LEEN: Locality/fairness-aware key partitioning for MapReduce in the cloud", in: CLOUDCOM'10.

LEEN details (Example)

Key frequencies per data node:
         K1     K2     K3     K4     K5     K6     Total
Node1    3      5      4      4      1      1      18
Node2    9      1      0      3      2      3      18
Node3    4      3      0      6      5      0      18
Total    16     9      4      13     8      4
FLK      4.66   2.93   1.88   2.70   2.71   1.66

For K1:
  if (to N2) → 15, 25, 14 → Fairness-Score = 4.9
  if (to N3) → 15, 9, 30 → Fairness-Score = 8.8

[Figure: keys held by Data Node1, Data Node2 and Data Node3 and their assignment to N1, N2, N3]

Result: Data Transfer = 24/54, cv = 14%
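
The FLK row and the two fairness scores of this example can be recomputed as follows. The formulas are assumptions inferred from the numbers on the slide rather than quoted from it: FLK is taken as the population standard deviation of a key's per-node frequencies divided by its best locality (largest per-node frequency over total frequency), and the fairness score as the population standard deviation of the three values listed for each candidate placement.

```python
from statistics import pstdev

# Per-node frequencies (Node1, Node2, Node3) of each key, from the table.
FREQ = {
    "K1": [3, 9, 4], "K2": [5, 1, 3], "K3": [4, 0, 0],
    "K4": [4, 3, 6], "K5": [1, 2, 5], "K6": [1, 3, 0],
}


def flk(freqs):
    """Assumed fairness/locality ratio of one key."""
    fairness = pstdev(freqs)               # spread of the key across nodes
    locality = max(freqs) / sum(freqs)     # share held by the best node
    return fairness / locality


def fairness_score(reducer_inputs):
    """Assumed score of a candidate placement: spread of the reducers' inputs."""
    return pstdev(reducer_inputs)
```

Under these assumptions sending K1 to N2 gives the lower (better) score, and the recomputed values agree with the slide up to truncation of the last digit.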

Results
                  3        4         5        6
Locality range    1-85%    15-35%    1-50%    1-30%

S. Ibrahim, H. Jin, L. Lu, S. Wu, B. He, L. Qi, "LEEN: Locality/fairness-aware key partitioning for MapReduce in the cloud", in: CLOUDCOM'10.

Open Issues – Energy Efficiency
• Data centers consume a large amount of energy
  » High monetary cost
    • Power bills become a substantial part of the total cost of ownership (TCO) of a data center (40% of the total cost)
  » High carbon emission

Why Energy-Efficient Hadoop?
• Hadoop: THE engine for Big Data processing in data centers
• Power bills become a substantial part of the total cost of ownership (TCO) of data centers
• It is essential to explore and optimize the energy efficiency when running Big Data applications in Hadoop

Energy efficiency in Hadoop

DVFS in Hadoop?
There is a significant potential for energy saving by scaling down the CPU frequency when peak CPU is not needed
• Diversity of MapReduce applications
  » CPU load is high (98%) during almost 75% of the job's running time
  » CPU load is high (80%) during only 15% of the job's running time
• Multiple phases of a MapReduce application
  » Disk I/O, CPU, Disk I/O, Network

DVFS governors

• Performance (static): maximizes performance
» Runs at the highest available frequency
» Drawback: high power consumption
• Powersave (static): maximizes power savings
» Runs at the lowest available frequency
» Drawback: long response time
• OnDemand (dynamic): power efficiency with reasonable performance
» Adjusts to the highest available frequency when the load is high
» Gradually lowers the CPU frequency when the load is low
» Drawback: low performance and power-saving benefits when the system often switches between idle states and heavy load
• Conservative (dynamic): power efficiency with reasonable performance
» Gradually raises the CPU frequency when the load is high
» Gradually lowers the CPU frequency when the load is low
» Drawback: worse performance than OnDemand
• Userspace (static): support for user-defined frequencies
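On Linux, the governors listed above are exposed through the cpufreq sysfs interface. The sketch below is a minimal illustration, not part of the original slides: it reads the active and available governors for one core. The helper names and the `root` parameter (added so the functions can be pointed at a test directory) are assumptions for illustration.

```python
from pathlib import Path

# Standard Linux sysfs location for per-CPU settings.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_governor(cpu: int = 0, root: Path = CPUFREQ_ROOT) -> str:
    """Return the name of the active cpufreq governor for one CPU core."""
    path = root / f"cpu{cpu}" / "cpufreq" / "scaling_governor"
    return path.read_text().strip()


def available_governors(cpu: int = 0, root: Path = CPUFREQ_ROOT) -> list:
    """Return the governors the kernel offers for one CPU core."""
    path = root / f"cpu{cpu}" / "cpufreq" / "scaling_available_governors"
    return path.read_text().split()


if __name__ == "__main__":
    # Only works on a Linux machine with cpufreq support.
    print("active:", read_governor(0))
    print("available:", available_governors(0))
```

Switching governors means writing one of the available names to `scaling_governor`, which normally requires root privileges.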


Power Consumption

[Figure: power consumption, Pi application; annotation: 40%]

“Towards Efficient Power Management in MapReduce: Investigation of CPU-Frequencies Scaling on Power Efficiency in Hadoop”, S. Ibrahim et al., ARMS CC 2014.


Performance vs Energy

[Figure: performance vs energy, Pi application]

“Towards Efficient Power Management in MapReduce: Investigation of CPU-Frequencies Scaling on Power Efficiency in Hadoop”, S. Ibrahim et al., ARMS CC 2014.


Open Issue: Hadoop in Virtualized Environments
• The applicability of MapReduce on virtualized clouds

[Figure: diagram relating MapReduce, virtual machines, and cloud computing. Labels: Various Implementations, Scalability, Load Balancing, Locality Optimization, Fault Tolerance, Simplicity, Security, Check Pointing / Migration, Optimize Resource Utilization, Power Saving, Resource Abstraction, Performance Isolation, Green Data Center, Pay as you go with no lock-in, Reliability and Robustness, Abstraction layer for the hardware, software, and configuration of systems, Flexible Migration and Restart Capabilities, Dynamic Hardware and Software Maintenance, Dynamic scaling]


Physical vs Virtual Hadoop Cluster
• HDFS performs better on physical machines than on VMs
• Separate the permanent data storage (DFS) from the virtual storage associated with the VMs

“Evaluating MapReduce on Virtual Machines: The Hadoop Case”, S. Ibrahim, H. Jin, L. Lu, L. Qi, S. Wu, X. Shi. Cloud Computing, 2009


Multiple VM Deployment
• Deploying more VMs can improve performance
• VMs on the same physical node compete for the node's I/O, causing poor performance

“Evaluating MapReduce on Virtual Machines: The Hadoop Case”, S. Ibrahim, H. Jin, L. Lu, L. Qi, S. Wu, X. Shi. Cloud Computing, 2009


CLOUDLET  

“CLOUDLET: Towards MapReduce Implementation on Virtual Machines”, S. Ibrahim, H. Jin, B. Cheng, H. Cao, S. Wu, L. Qi. HPDC 2009


Evaluation
• Three-node cluster
» Achieved up to 9% improvement on a very small cluster (WordCount)
» Achieved up to 50% improvement on a very small cluster (Sort)

[Figure: elapsed time (sec) for 512MB, 1GB, and 2GB with 1 VM; series: Native, Separate]


Meta Disk I/O Scheduler
• Different disk schedulers in Xen:
» CFQ, Anticipatory, Deadline, Noop
» Scheduling at two levels (VMM, VM)
• Selecting the appropriate scheduler pair can significantly affect application performance
• MapReduce is used to run different applications
• A typical MapReduce application consists of different interleaving stages, each requiring different I/O workloads and patterns

Is there an optimal scheduler pair for MapReduce applications?
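On Linux, the disk scheduler of a block device is visible in `/sys/block/<device>/queue/scheduler`, where the active one is shown in square brackets (for example `noop anticipatory deadline [cfq]`). The sketch below, which is an illustration rather than part of the original slides, parses that format. The function names, the default device name `sda`, and the `root` parameter are assumptions. In a virtualized setup, the same file would be read once in the hypervisor (VMM) and once inside each VM to obtain the scheduler pair.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def parse_scheduler_line(line: str) -> Tuple[Optional[str], List[str]]:
    """Split a sysfs scheduler line into (active scheduler, all schedulers)."""
    names = line.split()
    # The active scheduler is the entry wrapped in square brackets, if any.
    active = next((n.strip("[]") for n in names if n.startswith("[")), None)
    return active, [n.strip("[]") for n in names]


def read_scheduler(device: str = "sda", root: Path = Path("/sys/block")):
    """Read and parse the scheduler line of one block device."""
    text = (root / device / "queue" / "scheduler").read_text()
    return parse_scheduler_line(text)


if __name__ == "__main__":
    active, available = parse_scheduler_line("noop anticipatory deadline [cfq]")
    print(active, available)
```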


Motivation  

• Wordcount: (A, C) → 4.5%
• Wordcount w/o combiner: (A, N), (D, N) → 6%
• Sort: (A, A) → 9%

(A = Anticipatory, C = CFQ, D = Deadline, N = Noop)

Each disk scheduler pair is only sub-optimal across different MapReduce applications

“Adaptive Disk I/O Scheduling for MapReduce in Virtualized Environment”, S. Ibrahim, H. Jin, L. Lu, B. He, S. Wu. ICPP 2011


Motivation
• A typical Hadoop application consists of different interleaving stages, each requiring different I/O workloads and patterns.

The scheduler pairs are also sub-optimal for different sub-phases of the whole job.


Solution - Meta Scheduler
• A novel approach for adaptively tuning the disk scheduler pairs in both the hypervisor and the virtual machines during the execution of a single MapReduce job.
» First, the Hadoop program is manually divided into phases based on the characteristics of Hadoop applications.
» If there are P phases in an application and S possible scheduler pairs, there are S^P unique solutions, where a solution is an assignment of a disk scheduler pair to each phase.
» Switching schedulers incurs an overhead penalty.
• A novel heuristic searches the space of solutions.
» It executes a solution and evaluates its performance score, including the switch cost. Based on the evaluation, it selects the next solution to evaluate.

“Adaptive Disk I/O Scheduling for MapReduce in Virtualized Environment”, S. Ibrahim, H. Jin, L. Lu, B. He, S. Wu. ICPP 2011
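The size of the solution space and the role of the switch cost can be sketched as follows. This is only an illustration under stated assumptions: the slides do not describe the actual heuristic, so a plain greedy local search stands in for it, and `phase_cost` is a hypothetical callback replacing the real step of executing the job and measuring it. With four schedulers at each of the two levels there are 16 (VMM, VM) pairs.

```python
import itertools

SCHEDULERS = ["cfq", "anticipatory", "deadline", "noop"]
# Every (VMM scheduler, VM scheduler) combination: 4 * 4 = 16 pairs.
PAIRS = list(itertools.product(SCHEDULERS, repeat=2))


def count_solutions(num_phases):
    """One pair per phase, so the space holds len(PAIRS) ** num_phases solutions."""
    return len(PAIRS) ** num_phases


def score(solution, phase_cost, switch_cost):
    """Total cost of a solution: per-phase cost plus a penalty per scheduler switch."""
    run = sum(phase_cost(i, pair) for i, pair in enumerate(solution))
    switches = sum(1 for a, b in zip(solution, solution[1:]) if a != b)
    return run + switches * switch_cost


def local_search(start, phase_cost, switch_cost):
    """Greedy stand-in heuristic: change one phase's pair at a time while it helps."""
    best = tuple(start)
    best_score = score(best, phase_cost, switch_cost)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            for pair in PAIRS:
                candidate = best[:i] + (pair,) + best[i + 1:]
                s = score(candidate, phase_cost, switch_cost)
                if s < best_score:
                    best, best_score, improved = candidate, s, True
    return best, best_score
```

A greedy search like this can stop in a local optimum, for example when the switch cost is large, which is one reason the choice of heuristic matters.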


Evaluation  

“Adaptive Disk I/O Scheduling for MapReduce in Virtualized Environment”, S. Ibrahim, H. Jin, L. Lu, B. He, S. Wu. ICPP 2011


Acknowledgments

Dhruba Borthakur (Yahoo!)

Vinod  Kumar  Vavilapalli  (Hortonworks)  

Apache  Hadoop  Community      


Thank  you!  


Shadi  Ibrahim  

[email protected]  


YARN: Next Generation Hadoop

Adapted from a presentation by Vinod Kumar Vavilapalli (Hortonworks), “Hadoop Yarn”, SF Hadoop Users Meetup


Hadoop MapReduce Classic

JobTracker
– Manages cluster resources and job scheduling

TaskTracker
– Per-node agent
– Manages tasks

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup


Current Limitations

Scalability
– Maximum cluster size: 4,000 nodes
– Maximum concurrent tasks: 40,000
– Coarse synchronization in JobTracker

Single point of failure
– Failure kills all queued and running jobs
– Jobs need to be re-submitted by users

Restart is very tricky due to complex state

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup


Current Limitations - 2

Hard partition of resources into map and reduce slots
– Low resource utilization

Lacks support for alternate paradigms
– Iterative applications implemented using MapReduce are 10x slower
– Hacks for the likes of MPI/graph processing

Lack of wire-compatible protocols
– Client and cluster must be of the same version
– Applications and workflows cannot migrate to different clusters

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup


Requirements

Reliability
Availability
Utilization
Wire compatibility
Agility & evolution
– Ability for customers to control upgrades to the grid software stack
Scalability: clusters of 6,000-10,000 machines
– Each machine with 16 cores, 48G/96G RAM, 24TB/36TB disks
– 100,000+ concurrent tasks
– 10,000 concurrent jobs

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup


Design Centre

Split up the two major functions of JobTracker
– Cluster resource management
– Application life-cycle management

MapReduce becomes a user-land library

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup


Architecture

Application
– An application is a job submitted to the framework
– Example: MapReduce job

Container
– Basic unit of allocation
– Example: container A = 2GB, 1 CPU
– Replaces the fixed map/reduce slots

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup
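The difference between fixed slots and containers can be illustrated with a little arithmetic. The sketch below is an assumption-laden illustration, not YARN code: `Resource`, `containers_that_fit`, and `slot_utilization` are hypothetical names, and the node size (48 GB, 16 cores) is borrowed from the machine profile in the Requirements slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_gb: int
    cpus: int


def containers_that_fit(node: Resource, container: Resource) -> int:
    """How many containers of one size fit on a node (the scarcer resource wins)."""
    return min(node.memory_gb // container.memory_gb, node.cpus // container.cpus)


def slot_utilization(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Fraction of fixed slots in use; idle reduce slots cannot run map tasks."""
    used = min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)
    return used / (map_slots + reduce_slots)


if __name__ == "__main__":
    node = Resource(memory_gb=48, cpus=16)
    container_a = Resource(memory_gb=2, cpus=1)  # "container A" from the slide
    print(containers_that_fit(node, container_a))
    # Map-only workload on a node with 8 map slots and 4 reduce slots.
    print(slot_utilization(8, 4, pending_maps=20, pending_reduces=0))
```

With fixed slots, a map-only workload leaves the reduce slots idle; with containers, the same capacity can be handed to whichever task type is waiting.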


Architecture

Resource Manager
– Global resource scheduler
– Hierarchical queues

Node Manager
– Per-machine agent
– Manages the life-cycle of containers
– Container resource monitoring

Application Master
– Per-application
– Manages application scheduling and task execution
– E.g. MapReduce Application Master

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup
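The division of labor among the three components can be sketched as a toy model. This is not the YARN API: the classes below only borrow the component names from the slide, memory is the only resource tracked, and the first-fit placement policy is an assumption made for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-machine agent: tracks free memory and the containers it hosts."""
    name: str
    free_gb: int
    containers: list = field(default_factory=list)

    def launch(self, app_id: str, size_gb: int) -> bool:
        if size_gb > self.free_gb:
            return False
        self.free_gb -= size_gb
        self.containers.append((app_id, size_gb))
        return True


class ResourceManager:
    """Global scheduler: hands out containers and knows nothing about tasks."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, app_id: str, size_gb: int):
        for node in self.nodes:  # first-fit placement (an assumption)
            if node.launch(app_id, size_gb):
                return node.name
        return None  # cluster is full


class ApplicationMaster:
    """Per-application: decides which tasks to run and requests containers."""

    def __init__(self, app_id: str, rm: ResourceManager):
        self.app_id, self.rm = app_id, rm
        self.placements = {}

    def run_tasks(self, tasks, size_gb: int):
        for task in tasks:
            node = self.rm.allocate(self.app_id, size_gb)
            if node is None:
                break  # remaining tasks wait for free resources
            self.placements[task] = node
        return self.placements
```

The point of the split is visible in the model: the Resource Manager only sees container requests, while all task-level decisions stay inside the Application Master.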


Architecture  

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup


Improvements vs classic MapReduce

Utilization
– Generic resource model
• Data locality (node, rack, etc.)
• Memory
• CPU
• Disk b/w
• Network b/w
– Removes the fixed partition of map and reduce slots

Scalability
– Application life-cycle management is very expensive
– Partition resource management and application life-cycle management
– Application management is distributed
– Hardware trends: currently run clusters of 4,000 machines
• 6,000 2012 machines > 12,000 2009 machines
• <16+ cores, 48/96G, 24TB> vs <8 cores, 16G, 4TB>

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup
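The hardware-trend comparison can be checked with simple arithmetic. The sketch below uses the lower-bound 2012 figures from the slide (16 cores, 48G, 24TB); `cluster_totals` is a hypothetical helper introduced only for this illustration.

```python
def cluster_totals(machines, cores, ram_gb, disk_tb):
    """Aggregate capacity of a cluster of identical machines."""
    return {
        "cores": machines * cores,
        "ram_gb": machines * ram_gb,
        "disk_tb": machines * disk_tb,
    }


cluster_2012 = cluster_totals(6_000, cores=16, ram_gb=48, disk_tb=24)
cluster_2009 = cluster_totals(12_000, cores=8, ram_gb=16, disk_tb=4)

# Ratio of 2012 capacity to 2009 capacity for each resource.
ratios = {k: cluster_2012[k] / cluster_2009[k] for k in cluster_2012}
print(ratios)  # {'cores': 1.0, 'ram_gb': 1.5, 'disk_tb': 3.0}
```

Under these figures, half as many 2012 machines match the core count of the 2009 cluster while offering 1.5x the memory and 3x the disk capacity.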


Improvements vs classic MapReduce

Fault Tolerance and Availability
– Resource Manager
• No single point of failure: state saved in ZooKeeper (coming soon)
• Application Masters are restarted automatically on RM restart
– Application Master
• Optional failover via application-specific checkpoint
• MapReduce applications pick up where they left off via state saved in HDFS

Wire Compatibility
– Protocols are wire-compatible
– Old clients can talk to new servers
– Rolling upgrades

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup


Improvements vs classic MapReduce

Innovation and Agility
– MapReduce now becomes a user-land library
– Multiple versions of MapReduce can run in the same cluster (a la Apache Pig)
• Faster deployment cycles for improvements
– Customers upgrade MapReduce versions on their schedule
– Users can customize MapReduce

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup


Improvements vs classic MapReduce

Support for programming paradigms other than MapReduce
– MPI
– Master-Worker
– Machine Learning
– Iterative processing
– Enabled by allowing the use of a paradigm-specific application master
– Run all on the same Hadoop cluster

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup


Any Performance Gains?

Significant gains across the board, 2x in some cases.

• MapReduce
– Unlock lots of improvements from Terasort record (Owen/Arun, 2009)
– Shuffle 30%+
– Small Jobs – Uber AM
– Re-use task slots (containers)

“Hadoop Yarn”, Vinod Kumar Vavilapalli, SF Hadoop Users Meetup


Thank  you!  


Shadi  Ibrahim  

[email protected]  

Big  Data  Processing  in  the  Cloud–  INRIA          S.IBRAHIM