李國輝 (KH Li), Asia-Pacific Solution Architect, Intel Corp
Applying the Intel Distribution for Apache Hadoop (IDH) to Big Data
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE,
TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH
PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF
INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY
PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU
PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES,
SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND
EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR
DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE
DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any
features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or
incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or
go to: http://www.intel.com/design/literature.htm
Copyright © 2012 Intel Corporation.
Virtuous Cycle of Data-Driven Innovation
2.8 Zettabytes of data will be generated worldwide in 2012 (1)
Richer user experiences
Richer data from devices
40 Zettabytes of data will be generated worldwide in 2020 (1)
Richer data to analyze
Cloud
Clients
Intelligent Systems
(1) IDC Digital Universe 2020, (2) IDC
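To put the two IDC figures above in perspective, the following minimal sketch computes the constant yearly growth rate that would take 2.8 ZB in 2012 to 40 ZB in 2020. The function name `compound_annual_growth` is hypothetical, and the assumption of a steady year-over-year rate is ours, not IDC's.

```python
def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1.0 / years) - 1.0


# Figures from the slide: 2.8 ZB generated in 2012, 40 ZB in 2020.
rate = compound_annual_growth(2.8, 40.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the implied rate is a little under 40% per year, which means the yearly data volume roughly doubles every two years.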
Big Data Use Cases Across Industries
National, Public and Cyber Security; Education; Government; Healthcare; Retail; Manufacturing; Telecommunication; Financial Services
[Figure: big data flow from creation to consumption]
Create: Sensor/Machine Data, Logs, Social & Web, Legacy, Docs & Audio Files (STRUCTURED and UNSTRUCTURED)
DATA PLATFORMS: Enterprise Data Warehouse, RDBMS, No-SQL, In Memory DB, Hadoop (Node, Node, Node; Map / Reduce)
ANALYTICS and APPS
CONSUME: Spreadsheets, Visualize, Mobile Analysis, Consume/Review
Intel Optimized Hadoop Architecture
[Figure: Intel-optimized Hadoop architecture. Data enters through three IMPORT paths.]
Software: Intel Manager, Intel Distribution for Apache* Hadoop, Data Mining, Streaming Analytics, SQL, HiBench, HiTune
Security: AES-NI (TXT)
Storage: Intel SSD 910 & S3700 Series, Cache Acceleration Software 'CAS', Lustre File System
Network: Intel X520/540 10G NICs
*Other brands and names are the property of their respective owners
Intel® Distribution for Apache Hadoop (IDH)
• Focused on real-time analysis
• Value-added Manager for deployment & monitoring
• Add-on security & compliance controls
• Intel optimized total solution architecture: distro, storage, network, compute
• Vertical features
• Industry-leading performance
Hardware-enhanced | Enables partner analytics | Open platform
Intel® Manager for Apache Hadoop software Deployment, Configuration, Monitoring, Alerts, and Security
HDFS 2.0.3: Hadoop Distributed File System
YARN (MRv2): Distributed Processing Framework
HBase 0.94.1: Columnar Store
ZooKeeper 3.4.5: Coordination
Flume 1.3.0: Log Collector
Sqoop 1.4.1: Data Exchange
Pig 0.9.2: Scripting
Hive 0.9.0: SQL Query
Oozie 3.3.0: Workflow
Mahout 0.7: Machine Learning
R connectors: Statistics
Legend:
Intel proprietary
Intel enhancements contributed back to open source
Open source components included without change
Encryption in IDH with AES-NI Acceleration
CPU: E5-2680 (1/32 core is used); Memory: 48G; Disk: 2x160GB SSD; OS: CentOS 6.3; data file: 1GB text file
• Encryption is key for data protection but is compute intensive
• AES-NI in Intel Xeon processors effectively neutralizes the cost of encryption in Hadoop processing
http://hadoop.intel.com/pdfs/IntelEncryptionforHadoopSolutionBrief.pdf
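Before counting on hardware-accelerated encryption, it can help to confirm that a cluster node actually exposes AES-NI. The sketch below is not part of IDH; it assumes a Linux x86 host, where the kernel lists the `aes` flag in `/proc/cpuinfo` when the instructions are available. The helper names `has_aes_flag` and `host_has_aes_ni` are hypothetical.

```python
from pathlib import Path


def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in the cpuinfo text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match the whole token so that flags merely containing "aes" do not count.
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


def host_has_aes_ni(path: str = "/proc/cpuinfo") -> bool:
    """Check the local host; returns False if the file cannot be read."""
    try:
        return has_aes_flag(Path(path).read_text())
    except OSError:
        return False


if __name__ == "__main__":
    print("AES-NI flag present:", host_has_aes_ni())
```

Running this on each node gives a quick inventory of which machines can benefit from the acceleration described above; it says nothing about whether the installed encryption software actually uses the instructions.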
IDH on Lustre for Big Data Technical Computing
• Bringing Hadoop analytics to Lustre HPC deployments
─ Exploiting the superior performance, scalability and management simplicity of shared storage
─ Scaling storage and compute nodes separately
InfiniBand Interconnect
Hadoop Cluster/Compute Nodes
Lustre Storage
IDH Optimization on Dell PowerEdge Servers
White paper: http://en.community.dell.com/techcenter/extras/m/white_papers/20412222/download.aspx
Unleash the Power of Intel Architecture Platform
Baseline (Intel® Xeon® 5600, HDD, 1GbE): TeraSort for 1TB sort, >4 hour process time
• Upgrade processor: ~50% reduction
• Upgrade to SSD: ~80% reduction
• Upgrade to 10GbE: ~50% reduction
• Intel distribution: ~40% reduction
Result: Hadoop processing time <7 minutes with complete Intel-based solution
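The reductions above compound rather than add. As a rough check, the sketch below assumes that each percentage applies to the time remaining after the previous upgrade and that the baseline is exactly 4 hours; both are our assumptions, since the slide gives only approximate figures. The names `BASELINE_MINUTES`, `REDUCTIONS`, and `apply_reductions` are illustrative.

```python
# Assumed baseline: exactly 4 hours (the slide says ">4 hour").
BASELINE_MINUTES = 4 * 60

# Approximate reductions from the slide, applied in the order shown.
REDUCTIONS = [
    ("Upgrade processor", 0.50),
    ("Upgrade to SSD", 0.80),
    ("Upgrade to 10GbE", 0.50),
    ("Intel distribution", 0.40),
]


def apply_reductions(baseline, reductions):
    """Apply each fractional reduction to the time left after the previous step."""
    remaining = baseline
    timeline = []
    for name, cut in reductions:
        remaining *= 1.0 - cut
        timeline.append((name, remaining))
    return timeline


timeline = apply_reductions(BASELINE_MINUTES, REDUCTIONS)
for name, minutes in timeline:
    print(f"{name}: about {minutes:.1f} minutes remaining")
```

Under these assumptions the four steps leave about 3% of the original run time, roughly 7 minutes, which is in the same range as the slide's final figure. Because the slide's percentages are rounded, an exact match should not be expected.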
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product
when combined with other products.
Source: Intel Internal testing
For more information go to : intel.com/performance
Whitepaper
White Paper: http://www.intel.com/content/www/us/en/big-data/big-data-apache-hadoop-technologies-for-results-whitepaper.html
IDH Case Studies
Source: http://hadoop.intel.com/resources
• Smart City Video Analytics: China
• Telco Analytics: China Mobile
• Smart Energy Analytics: Pecan Street
• Genomic Analytics: Next Bio
• Healthcare Analytics: China
• Financial Service Analytics: Italy
Summary – Intel in Big Data
• Accelerating the adoption of big data analytics
• Engaging with ecosystem and end-customers to unlock the value of data
• Delivering Intel's full end-to-end capability, from intelligent systems at the edge to servers in the datacenter or cloud, together with the Intel Distribution for Apache Hadoop (IDH)
The pervasiveness of Intel Architecture democratizes the implementation
and performance of Big Data everywhere
For More Information
• Intel big data website: https://hadoop.intel.com/
• 李國輝 (KH Li): [email protected]
Legal Disclaimers
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel TXT-compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.s. For more information, visit http://www.intel.com/technology/security
Requires a system with Intel® Turbo Boost Technology capability. Consult your PC manufacturer. Performance varies depending on hardware, software and system configuration. For more information, visit http://www.intel.com/technology/turboboost
Intel® AES-NI requires a computer system with an AES-NI enabled processor, as well as non-Intel software to execute the instructions in the correct sequence. AES-NI is available on select Intel® processors. For availability, consult your reseller or system manufacturer. For more information, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/
Intel product is manufactured on a lead-free process. Lead is below 1000 PPM per EU RoHS directive (2002/95/EC, Annex A). No exemptions required
Halogen-free: Applies only to halogenated flame retardants and PVC in components. Halogens are below 900ppm bromine and 900ppm chlorine.
Intel, Intel Xeon, Intel Core microarchitecture, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Copyright © 2011, Intel Corporation. All rights reserved.
Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, Go to: http://www.intel.com/performance/resources/benchmark_limitations.htm.
Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.
Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported.
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. See http://www.spec.org for more information.
TPC Benchmark is a trademark of the Transaction Processing Performance Council. See http://www.tpc.org for more information.
SAP and SAP NetWeaver are the registered trademarks of SAP AG in Germany and in several other countries. See http://www.sap.com/benchmark for more information.
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804