Copyright © CELLANT Corp. All Rights Reserved. http://www.cellant.jp/
Cloudera Impala 0.6 beta Performance Evaluation (with Comparison to Hive)
Mar. 6, 2013
CELLANT Corp. R&D Strategy Division
Yukinori SUDA (@sudabon)
Cloudera Impala 0.6 beta

• Changes since 0.5 beta
  – Cloudera Manager 4.5 and CDH 4.2 support Impala 0.6.
  – Support for the RCFile file format.
  – Added support for Impala on SUSE and Debian/Ubuntu.
• Supported operating systems
  – RHEL 5.7/6.2 and CentOS 5.7/6.2
  – SUSE 11 with Service Pack 1 or later
  – Ubuntu 10.04/12.04 and Debian 6.0.3
System Environment

• Installed via Cloudera Manager Free Edition 4.5.0
• All servers are connected with 1 Gbps Ethernet through an L2 switch
• Master: 3 servers
  – Active NameNode
  – Standby NameNode
  – JobTracker, statestored
• Slave: 11 servers
  – Each runs DataNode, TaskTracker, and impalad
Server Specification

• CPU
  – Intel Core 2 Duo 2.13 GHz with Hyper-Threading
• Memory
  – 4 GB
• Disk
  – 7,200 rpm SATA mechanical hard disk drive
• OS
  – CentOS 6.2
Benchmark

• Used CDH 4.2.0 + Impala 0.6 beta
• Used hivebench from the open-source benchmark suite “HiBench”
  – https://github.com/hibench
• Scaled the datasets down to 1/10
  – The default configuration generates a table with 1 billion rows
• Modified the query
  – Removed “INSERT INTO TABLE …” to evaluate read-only performance
• Combined several Hive storage formats with several compression methods
  – TextFile, SequenceFile, RCFile
  – No compression, Gzip, Snappy
• Compared job query latency
• Reported the average job latency over 5 measurements
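The measurement procedure above (each storage/compression combination timed as the average of 5 runs) could be scripted roughly as follows. This is a minimal Python sketch under assumptions: `average_latency` and the `run_query` callable are hypothetical names, and the slides do not say how the jobs were actually timed. In practice `run_query` might wrap a CLI call such as `hive -e "<query>"` or `impala-shell -q "<query>"` through `subprocess.run`.

```python
import statistics
import time
from typing import Callable, List, Tuple


def average_latency(run_query: Callable[[], object], runs: int = 5) -> Tuple[float, List[float]]:
    """Time `run_query` `runs` times; return (mean latency, per-run latencies) in seconds.

    `run_query` is a hypothetical callable that submits one benchmark query
    and blocks until its result has been returned.
    """
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()  # wall-clock timer as seen by the client
        run_query()
        latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies), latencies


# Stand-in workload for illustration only: sleep 10 ms instead of running a query.
mean, per_run = average_latency(lambda: time.sleep(0.01))
print(f"average over {len(per_run)} runs: {mean:.3f} sec")
```

Timing on the client side measures end-to-end latency, which is what a job latency comparison needs; note that if `run_query` launches a CLI process, its start-up time is included in every measurement.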
Modified Datasets

• Uservisits table
  – 100 million rows
  – Table definition
    • sourceIP string
    • destURL string
    • visitDate string
    • adRevenue double
    • userAgent string
    • countryCode string
    • languageCode string
    • searchWord string
    • duration int
• Rankings table
  – 12 million rows
  – Table definition
    • pageURL string
    • pageRank int
    • avgDuration int
Modified Query

SELECT sourceIP, sum(adRevenue) AS totalRevenue, avg(pageRank)
FROM rankings_t R
JOIN (
  SELECT sourceIP, destURL, adRevenue
  FROM uservisits_t UV
  WHERE (datediff(UV.visitDate, '1999-01-01') >= 0
    AND datediff(UV.visitDate, '2000-01-01') <= 0)
) NUV
ON (R.pageURL = NUV.destURL)
GROUP BY sourceIP
ORDER BY totalRevenue DESC
LIMIT 1;
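To make the query's semantics concrete, here is a minimal Python sketch that applies the same steps (date filter, join on pageURL = destURL, group by sourceIP, order by total revenue, limit 1) to a few in-memory rows. The sample rows and the function name `top_source_ip` are hypothetical, and only the columns the query reads are kept; it shows what the query computes, not how Hive or Impala execute it.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data; only the columns the query touches are kept.
# rankings_t rows: (pageURL, pageRank)
rankings = [("a.com", 10), ("b.com", 20)]
# uservisits_t rows: (sourceIP, destURL, visitDate, adRevenue)
uservisits = [
    ("1.1.1.1", "a.com", "1999-06-01", 3.0),
    ("1.1.1.1", "b.com", "1999-12-31", 2.0),
    ("2.2.2.2", "a.com", "2000-01-02", 100.0),  # after 2000-01-01, filtered out
    ("3.3.3.3", "b.com", "1999-03-15", 4.0),
]


def top_source_ip(rankings, uservisits, start="1999-01-01", end="2000-01-01"):
    """Return (sourceIP, totalRevenue, avg(pageRank)) for the top-revenue IP, or None."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    # WHERE datediff(visitDate, start) >= 0 AND datediff(visitDate, end) <= 0,
    # i.e. start <= visitDate <= end with both bounds inclusive.
    nuv = [(ip, url, rev) for ip, url, day, rev in uservisits
           if lo <= date.fromisoformat(day) <= hi]
    # Index rankings by pageURL for the join R.pageURL = NUV.destURL.
    ranks_by_url = defaultdict(list)
    for url, rank in rankings:
        ranks_by_url[url].append(rank)
    # Inner join, then GROUP BY sourceIP with sum(adRevenue) and avg(pageRank).
    total_revenue = defaultdict(float)
    page_ranks = defaultdict(list)
    for ip, url, rev in nuv:
        for rank in ranks_by_url.get(url, []):
            total_revenue[ip] += rev
            page_ranks[ip].append(rank)
    if not total_revenue:
        return None
    # ORDER BY totalRevenue DESC LIMIT 1; on ties max() keeps the first IP seen,
    # whereas SQL leaves the order of tied rows unspecified.
    best = max(total_revenue, key=total_revenue.get)
    return best, total_revenue[best], sum(page_ranks[best]) / len(page_ranks[best])


print(top_source_ip(rankings, uservisits))  # ('1.1.1.1', 5.0, 15.0)
```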
Benchmark Result (Hive)

Storage format   Compression   Avg. job latency [sec]
TextFile         No comp.      235.843
SequenceFile     Gzip          227.883
SequenceFile     Snappy        213.616
RCFile           Gzip          234.289
RCFile           Snappy        197.894
Benchmark Result (Impala)

Storage format   Compression   Avg. job latency [sec]
TextFile         No comp.      32.776
SequenceFile     Gzip          21.250
SequenceFile     Snappy        17.725
RCFile           Gzip          17.030
RCFile           Snappy        16.059
Block Location Cache Effect?

Impala job latency per run [sec]
Job       TextFile   SequenceFile          RCFile
          No comp.   Gzip       Snappy     Gzip       Snappy
1st       50.256     23.692     22.085     18.475     20.042
2nd       34.905     20.710     19.733     16.690     18.859
3rd       30.752     20.604     15.608     16.620     16.642
4th       26.848     20.625     15.602     16.617     12.148
5th       21.121     20.620     15.597     16.747     12.606
Average   32.776     21.250     17.725     17.030     16.059

• In every configuration the 1st job is the slowest and all later runs are faster. Is this due to the block location cache?
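The Average row can be checked, and the first run compared with the later ones, using a short Python sketch over the five runs transcribed from the table above (the names `runs` and `summarize` are illustrative). It only restates the measured numbers; it cannot show whether the block location cache is really the cause.

```python
# Impala per-run job latencies [sec], transcribed from the per-run table.
runs = {
    "TextFile / No Comp.": [50.256, 34.905, 30.752, 26.848, 21.121],
    "SequenceFile / Gzip": [23.692, 20.710, 20.604, 20.625, 20.620],
    "SequenceFile / Snappy": [22.085, 19.733, 15.608, 15.602, 15.597],
    "RCFile / Gzip": [18.475, 16.690, 16.620, 16.617, 16.747],
    "RCFile / Snappy": [20.042, 18.859, 16.642, 12.148, 12.606],
}


def summarize(latencies):
    """Return (mean of all runs, 1st run, mean of runs 2-5) in seconds."""
    later = latencies[1:]
    return sum(latencies) / len(latencies), latencies[0], sum(later) / len(later)


for config, latencies in runs.items():
    mean_all, first, mean_later = summarize(latencies)
    slowest = "yes" if first == max(latencies) else "no"
    print(f"{config:<22} avg={mean_all:7.3f}  1st={first:7.3f}  "
          f"2nd-5th avg={mean_later:7.3f}  1st is slowest: {slowest}")
```

For every configuration the computed mean matches the Average row to three decimals, and the 1st run is the largest value in its column.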
Conclusion

• Impala is over 10 times faster than MRv1 + Hive on the compressed SequenceFile and RCFile tables (about 7 times faster on uncompressed TextFile).
• Specifically, for RCFile compressed with Snappy:
  – Impala 0.6 beta: 16.059 sec
  – MRv1 + Hive 0.10: 197.894 sec
• We hope that the Impala GA release included in CDH5 will be faster, with:
  – Support for the Trevni columnar format
  – An optimized query planner
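The speedup behind the “over 10 times” statement is the ratio of average job latencies, Hive over Impala, for the same storage format and compression. A minimal Python sketch (the names `hive`, `impala`, and `speedup` are illustrative) over the averages from the two result slides:

```python
# Average job latencies [sec] from the Hive and Impala result slides.
hive = {
    "TextFile / No Comp.": 235.843,
    "SequenceFile / Gzip": 227.883,
    "SequenceFile / Snappy": 213.616,
    "RCFile / Gzip": 234.289,
    "RCFile / Snappy": 197.894,
}
impala = {
    "TextFile / No Comp.": 32.776,
    "SequenceFile / Gzip": 21.250,
    "SequenceFile / Snappy": 17.725,
    "RCFile / Gzip": 17.030,
    "RCFile / Snappy": 16.059,
}


def speedup(slow_sec, fast_sec):
    """How many times faster the faster system is: ratio of average latencies."""
    return slow_sec / fast_sec


for config in hive:
    print(f"{config:<22} Hive/Impala = {speedup(hive[config], impala[config]):5.2f}x")
```

This gives about 12.3x for RCFile + Snappy (197.894 / 16.059), between about 10.7x and 13.8x for the other compressed formats, and about 7.2x for uncompressed TextFile.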
Thanks.