Jianfeng Zhan: BigDataBench, Benchmarking Big Data Systems


BigDataBench: Benchmarking Big Data Systems

Jianfeng Zhan
Computer Systems Research Center, ICT, CAS

CCF Big Data Technology Conference 2013-12-06

http://prof.ict.ac.cn/jfzhan

Why Big Data Benchmarking?

Measuring big data architecture and systems quantitatively

What is BigDataBench?

An open source project on big data benchmarking:
• http://prof.ict.ac.cn/BigDataBench/
• 6 real-world data sets and 19 workloads
  – To be extended in the near future
• 4V characteristics
  – Volume, Variety, Velocity, and Veracity

Comparison of Big Data Benchmarking Efforts

Possible Users

BigDataBench

• Architecture: processor, memory, networks, ...
• Systems: OS for big data, file systems for big data, ...
• Data management, ...
• Performance optimization
• Co-design
• Distributed systems: scheduling, programming systems

Research Publications

• Characterizing data analysis workloads in data centers. Zhen Jia, Lei Wang, Jianfeng Zhan, Lixing Zhang, and Chunjie Luo. IISWC 2013 (Best Paper Award).

• BigDataBench: a Big Data Benchmark Suite from Internet Services. Lei Wang, Jianfeng Zhan, et al. HPCA 2014, Industry Session.

Outline

1 Benchmarking Methodology and Decision

2 Case Study

3 How to Use

4 Future Work

BigDataBench Methodology

[Diagram: 4V of Big Data, BigDataBench]

Methodology (Cont’)

Investigate typical application domains

Representative data
• Data sources: text data, graph data, table data, extended ...
• Data types: structured, semi-structured, unstructured
• Big data sets preserving the 4V
• Data generation tool preserving data characteristics

Diverse workloads
• Application types: offline analytics, realtime analytics, online services
• Basic and important operations and algorithms, extended ...
• Representative software stacks, extended ...
• Big data workloads

BigDataBench

Methodology (Cont’)

[Diagram: 4V of Big Data; system and architecture characteristics; similarity analysis; BigDataBench]

Top Sites on the Web

More details at http://www.alexa.com/topsites/global;0

Search engines, social networks, and electronic commerce account for 80% of the page views of all Internet services.

Workloads Chosen

• Cover workloads in diverse and representative application scenarios
  – Search engine, e-commerce, social network
• Pay equal attention to different application types
  – Online service, real-time analytics, offline analytics
• Include different data sources
  – Text data, graph data, table data
• Cover representative software stacks

19 Chosen Workloads

Application scenarios:
• Micro benchmarks
• Basic datastore operations
• Relational queries
• Search engines
• Social networks
• E-commerce systems

Data Generation Tools

• Data sources: text, graph, and table
  – Six real raw data sets
• Synthetic data scale
  – From GB to PB
• Features
  – Preserve characteristics of real-world data

Naïve Text Generator

• Words follow a multinomial distribution
• Select words randomly to build documents
• Example words: architecture, system, mining, data, benchmarking, memory, evaluate, machine, learning

Models text only at the word level.
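The word-level scheme above can be sketched in a few lines of Python. This is only an illustration, not the BigDataBench generator itself: the function name, the `seed` parameter, and the word probabilities are assumptions made for the example; only the vocabulary comes from the slide.

```python
import random

# Vocabulary taken from the slide; the probabilities are hypothetical.
VOCAB = ["architecture", "system", "mining", "data", "benchmarking",
         "memory", "evaluate", "machine", "learning"]
WORD_PROBS = [0.20, 0.15, 0.10, 0.15, 0.10, 0.10, 0.05, 0.10, 0.05]


def naive_text_generator(vocab, probs, doc_len, seed=None):
    """Draw doc_len words independently from one multinomial over vocab."""
    rng = random.Random(seed)
    # random.choices samples with replacement according to the weights.
    return rng.choices(vocab, weights=probs, k=doc_len)


if __name__ == "__main__":
    print(" ".join(naive_text_generator(VOCAB, WORD_PROBS, doc_len=12, seed=1)))
```

Because every word is drawn from the same distribution, a document produced this way has realistic word frequencies but no topical structure.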

Improved Text Generator

• Topics follow a multinomial distribution
• Select a topic randomly (topic1, topic2, topic3)
• Words follow a multinomial distribution under the selected topic (topic2 in the slide's example)
• Select a word randomly to build the document
• Example words: architecture, benchmarking, mining, data, system, memory, evaluate, machine, learning

Models text at both the topic level and the word level.
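A minimal Python sketch of the two-level scheme above: pick a topic from a multinomial over topics, then pick a word from that topic's own multinomial. The topic names follow the slide; the topic probabilities, the grouping of words under each topic, and the function name are hypothetical choices for illustration, not the tool's actual model.

```python
import random

# Hypothetical topic distribution for one document.
TOPIC_PROBS = {"topic1": 0.5, "topic2": 0.3, "topic3": 0.2}

# Hypothetical per-topic word distributions (words are from the slide).
TOPIC_WORDS = {
    "topic1": {"architecture": 0.4, "system": 0.3, "memory": 0.3},
    "topic2": {"mining": 0.3, "data": 0.4, "benchmarking": 0.3},
    "topic3": {"evaluate": 0.2, "machine": 0.4, "learning": 0.4},
}


def topic_text_generator(topic_probs, topic_words, doc_len, seed=None):
    """Generate doc_len words: sample a topic, then a word under that topic."""
    rng = random.Random(seed)
    topics = list(topic_probs)
    topic_weights = [topic_probs[t] for t in topics]
    doc = []
    for _ in range(doc_len):
        # Topic level: topics follow a multinomial distribution.
        topic = rng.choices(topics, weights=topic_weights, k=1)[0]
        # Word level: words follow a multinomial distribution under the topic.
        words = list(topic_words[topic])
        word_weights = [topic_words[topic][w] for w in words]
        doc.append(rng.choices(words, weights=word_weights, k=1)[0])
    return doc


if __name__ == "__main__":
    print(" ".join(topic_text_generator(TOPIC_PROBS, TOPIC_WORDS, 12, seed=1)))
```

Compared with the word-level generator, changing the topic weights shifts which group of words dominates a document, which is the extra structure the improved generator is meant to preserve.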


BigDataBench Case Study

Uses of BigDataBench:
• Workload characterization
• Evaluating big data hardware systems
• Performance evaluation and diagnosis
• Networks for big data
• Energy efficiency of big data systems

User institutions: USTC and Florida International University; ICT, CAS; SIAT, CAS; CNCERT; OSU; SJTU and XJTU

http://prof.ict.ac.cn/BigDataBench/#users

Testbed

Workloads Analyzed

http://prof.ict.ac.cn/BigDataBench

Floating point operation intensity

Operation intensity: the total number of (floating point or integer) instructions divided by the total number of memory access bytes in a run of a workload.

Big data workloads have very low floating point operation intensities (0.009), two orders of magnitude lower than the theoretical value of a state-of-practice CPU (1.8).

[Figure: floating point operation intensity of data analytics and service workloads]
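As a worked check of the definition and of the "two orders of magnitude" claim, the short Python sketch below computes an operation intensity and compares the two figures quoted on the slide (0.009 and 1.8). The function names and the sample counter values are hypothetical; a real measurement would take both counts from hardware performance counters.

```python
import math


def operation_intensity(instructions, memory_access_bytes):
    """Instructions (floating point or integer) per byte of memory access."""
    if memory_access_bytes <= 0:
        raise ValueError("memory_access_bytes must be positive")
    return instructions / memory_access_bytes


def orders_of_magnitude_below(measured, reference):
    """How many powers of ten the measured value sits below the reference."""
    return math.log10(reference / measured)


if __name__ == "__main__":
    # Hypothetical counts: 9 million FP instructions over 1 GB of memory traffic.
    fp_intensity = operation_intensity(9_000_000, 1_000_000_000)
    print(fp_intensity)                                    # 0.009
    # 1.8 / 0.009 = 200, i.e. a little over two orders of magnitude.
    print(round(orders_of_magnitude_below(0.009, 1.8), 2))
```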

Instruction Breakdown

Fewer floating point operations, more integer operations

Ratio of Integer to Floating Point Operations

The average ratio for big data workloads is 100; for PARSEC, HPCC, and SPECFP it is 1.4, 1.0, and 0.67, respectively.

Integer operation intensity

The average integer operation intensity of big data workloads is 0.49

That of PARSEC, HPCC, and SPECFP is 1.5, 0.38, and 0.23, respectively.

Cache Behaviors

• Big data workloads have higher L1I misses than HPC workloads
• Data analysis workloads have better L2 cache behaviors than service workloads, except BFS
• Big data workloads have good L3 behaviors

TLB Behaviors

• ITLB misses of big data workloads are higher than those of HPC workloads
• DTLB misses of big data workloads are higher than those of HPC workloads


Evaluating Big Data Hardware Systems

Experimental Platforms: basic information and brief comparison

• Xeon (common processor)
• Atom (low-power processor)
• Tilera (many-core processor)

                Xeon                Atom                Tilera
CPU type        Intel Xeon E5310    Intel Atom D510     Tilera TilePro36
CPU cores       4 cores @ 1.6GHz    2 cores @ 1.66GHz   36 cores @ 500MHz
L1 I/D cache    32KB                24KB                16KB/8KB
L2 cache        4096KB              512KB               64KB

Experimental Platforms: Hadoop Cluster Information

Comparison (the same logical core number):
• Xeon vs. Atom: [1 Xeon master + 7 Xeon slaves] vs. [1 Atom master + 7 Atom slaves]
• Xeon vs. Tilera: [1 Xeon master + 7 Xeon slaves] vs. [1 Xeon master + 1 Tilera slave]

Hadoop setting: following the guidance on the Hadoop official website

Benchmark Selection: BigDataBench 1.0

Application     Time complexity    Characteristics
Sort            O(n log2 n)        Integer comparison
WordCount       O(n)               Integer comparison and calculation
Grep            O(n)               String comparison
Naïve Bayes     O(m*n)             Floating-point computation
SVM             O(n^3)             Floating-point computation

Metrics

• Performance: data processed per second (DPS)
• Energy efficiency: application performance power usage effectiveness, measured as data processed per joule (DPJ)
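The two metrics above reduce to simple ratios. The Python sketch below shows one way to compute them, reading DPJ as data processed per joule and assuming energy is obtained as average power multiplied by run time. The function names and all sample numbers are hypothetical and are not measurements from the study.

```python
def data_processed_per_second(data_bytes, seconds):
    """DPS: input data size divided by the run time of the application."""
    return data_bytes / seconds


def energy_joules(average_power_watts, seconds):
    """Energy consumed, assuming a roughly constant average power draw."""
    return average_power_watts * seconds


def data_processed_per_joule(data_bytes, joules):
    """DPJ: input data size divided by the energy consumed during the run."""
    return data_bytes / joules


if __name__ == "__main__":
    # Hypothetical run: 10 GB sorted in 500 s at an average of 200 W.
    size = 10 * 1024**3
    runtime = 500.0
    energy = energy_joules(200.0, runtime)            # 100000 J
    print(data_processed_per_second(size, runtime))   # bytes per second
    print(data_processed_per_joule(size, energy))     # bytes per joule
```

A platform can win on DPS yet lose on DPJ (or the reverse), which is why the Xeon, Atom, and Tilera comparisons report both.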

[Figure: Xeon vs. Atom, DPJ]

[Figure: Xeon vs. Tilera, DPJ]

Reference

Jing Quan (University of Science and Technology of China), Yingjie Shi (Chinese Academy of Sciences), Ming Zhao (Florida International University), and Wei Yang (University of Science and Technology of China). "The Implications from Benchmarking Three Different Data Center Platforms." The First Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE 2013), in conjunction with the 2013 IEEE International Conference on Big Data (IEEE Big Data 2013).


BigDataBench Class: Research Areas

For architecture: 19 of 19 workloads

For OS: 19 of 19 workloads

For runtime environment (Hadoop): 9 of 19 workloads

• Sort, Grep, WordCount, PageRank, Index, Kmeans, Connected Components, Collaborative Filtering and Naive Bayes

For data management: 6 of 19 workloads

• Read, Write, Scan, Select Query, Aggregate Query, Join Query

BigDataBench Class: Data Sources

Text related: 6 of 19 workloads

• Sort, Grep, WordCount, Index, Collaborative Filtering and Naive Bayes

Graph related: 4 of 19 workloads

• BFS, PageRank, Kmeans, and Connected Components

Table related: 9 of 19 workloads

• Read, Write, Scan, Select Query, Aggregate Query, Join Query, Nutch Server, Olio Server and Rubis Server

BigDataBench Class: Application Types

Online Services 6 of 19 workloads

• Read, Write, Scan, Nutch server, Olio Server and Rubis server

Offline Analytics 10 of 19 workloads

• Sort, Grep, WordCount, BFS, PageRank, Index, Kmeans, Connected Components, Collaborative Filtering and Naive Bayes.

Realtime Analytics 3 of 19 workloads

• Select Query, Aggregate Query and Join Query
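The groupings by data source and by application type listed above can be written down as plain Python dictionaries, which makes it easy to check that each classification covers the same 19 workloads exactly once. The dictionary and function names are hypothetical; the workload names and group sizes are the ones on the slides, with capitalization normalized.

```python
DATA_SOURCES = {
    "text": ["Sort", "Grep", "WordCount", "Index",
             "Collaborative Filtering", "Naive Bayes"],
    "graph": ["BFS", "PageRank", "Kmeans", "Connected Components"],
    "table": ["Read", "Write", "Scan", "Select Query", "Aggregate Query",
              "Join Query", "Nutch Server", "Olio Server", "Rubis Server"],
}

APPLICATION_TYPES = {
    "online services": ["Read", "Write", "Scan", "Nutch Server",
                        "Olio Server", "Rubis Server"],
    "offline analytics": ["Sort", "Grep", "WordCount", "BFS", "PageRank",
                          "Index", "Kmeans", "Connected Components",
                          "Collaborative Filtering", "Naive Bayes"],
    "realtime analytics": ["Select Query", "Aggregate Query", "Join Query"],
}


def all_workloads(classification):
    """Return the set of workloads, rejecting any that appear in two groups."""
    names = [w for group in classification.values() for w in group]
    if len(names) != len(set(names)):
        raise ValueError("a workload appears in more than one group")
    return set(names)


if __name__ == "__main__":
    by_source = all_workloads(DATA_SOURCES)
    by_type = all_workloads(APPLICATION_TYPES)
    print(len(by_source), len(by_type), by_source == by_type)
```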

BigDataBench Class: Application Domains

Search engine related (Basic Operations + Search Engine): 7 of 19 workloads

• Sort, Grep, WordCount, BFS, PageRank, Index and Nutch Server

Social network related (Basic Cloud OLTP + Basic Relational Query + Social Network): 9 of 19 workloads

• Read, Write, Scan, Select Query, Aggregate Query, Join Query, Olio Server, Kmeans and Connected Components

E-commerce related (Basic Cloud OLTP + Basic Relational Query + E-commerce): 9 of 19 workloads

• Read, Write, Scan, Select Query, Aggregate Query, Join Query, Rubis Server, Collaborative Filtering and Naive Bayes


Near Future Work

Multi-media data

Deep learning workloads

Refine BigDataBench

Related Resources

BigDataBench project: http://prof.ict.ac.cn/BigDataBench

BPOE workshop: http://prof.ict.ac.cn/bpoe

• A series of workshops on Big Data Benchmarks, Performance Optimization, and Emerging Hardware
• BPOE-4: interaction among OS, architecture, and data management
  – Co-located with ASPLOS 2014

BPOE-4 SC

• Christos Kozyrakis, Stanford
• Xiaofang Zhou, University of Queensland
• Dhabaleswar K Panda, Ohio State University
• Raghunath Nambiar, Cisco
• Lizy K John, University of Texas at Austin
• Xiaoyong Du, Renmin University of China
• H. Peter Hofstee, IBM Austin Research Laboratory
• Ippokratis Pandis, IBM Almaden Research Center
• Alexandros Labrinidis, University of Pittsburgh
• Bill Jia, Facebook
• Jianfeng Zhan, ICT, Chinese Academy of Sciences

THANKS
