AMPLab Yahoo


  • 8/12/2019 AMPLab Yahoo

    1/12

    Yahoo & AMPLab
    Celebration of our partnership
    April 16, 2013


    Yahoo!'s core business
    • Make the world's daily habits inspiring and entertaining
    • Put brands in the center of people's daily habits

    Yahoo! Confidential & Proprietary. 2 4/18/2013

    [Diagram labels: Yahoo!, Users, Adv, Publ]


    What problems do we solve?
    • Matching content to user
        • Personalized
        • Responsive
    • Matching ads to users
        • Maximize yield
        • Maximize return on investment
        • Maintain positive user experience


    What do we need?
    • Science
    • Data
    • Platforms that analyze data at scale and close the loop
        • On-grid solution
        • Horizontally scalable and fault tolerant
        • Interactive
        • Easy to describe sophisticated data mining tasks
        • Quick to prototype, and easy to productionize
        • Few knobs to turn


    How do we do this today?


    How the AMPLab and Yahoo! relationship started
    • How to cut down ETL and query on the grid directly?
    • Inspired by Dremel; enhance with in-memory techniques
    • Matei's talk on Shark @ Hadoop Summit 2012
    • Shark Server
    • Further small enhancements and bug fixes
    • Meet with Ion and Mike at AMPLab


    Where are we headed?
    • End of Q2: Shark will be available on a 50-node cluster (100 GB RAM) for advertising analytics.
    • One customer-facing analytics optimization feature planned on top of Shark
    • Shark/Spark packaged and available to autodeploy on any cluster within Yahoo!
    • Mid Q2: start work on 4000-node cluster
    • Productionize YARN patch
    • Bug fixes, memory leak fixes, and features like column pruning, map join, etc. will be checked back into the Shark/Spark main branch.
    • Upcoming work includes further join optimization, query optimizations specific to analytic workloads, compression, etc.
    • Longer-term roadmap to enhance on-disk performance as well


    Future Architecture
