Applying the Intel Distribution for Apache Hadoop (IDH) to Big Data
李國輝 (KH Li), Asia-Pacific Solution Architect, Intel Corp

Intel Big Data · Hadoop Sensor/ Machine Data Intel Manager Logs Social & Web Legacy UNSTRUCTURED Docs & Audio Files DATA PLATFORMS CONSUME ap REDUCE Intel Optimized Hadoop Architecture


Page 1:

李國輝 (KH Li), Asia-Pacific Solution Architect, Intel Corp

Applying the Intel Distribution for Apache Hadoop (IDH) to Big Data

Page 2:

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Copyright © 2012 Intel Corporation.

Page 3:

Virtuous Cycle of Data-Driven Innovation

2.8 zettabytes of data will be generated worldwide in 2012 (1)

Richer user experiences

Richer data from devices

40 zettabytes of data will be generated worldwide in 2020 (1)

Richer data to analyze

Cloud

Clients

Intelligent Systems

(1) IDC Digital Universe 2020, (2) IDC
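
The two IDC figures on this slide imply a growth rate that the slide does not state. As a rough worked example, the sketch below derives it, assuming the data volume grows at a constant yearly rate between 2012 and 2020. That constant-rate assumption and the function name are illustrative choices, not something taken from the IDC study.

```python
def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start_zb into end_zb."""
    return (end_zb / start_zb) ** (1.0 / years) - 1.0


# Figures quoted on the slide (source cited there: IDC Digital Universe 2020).
ZB_2012 = 2.8
ZB_2020 = 40.0

rate = implied_annual_growth(ZB_2012, ZB_2020, 2020 - 2012)
print(f"Overall growth 2012 to 2020: {ZB_2020 / ZB_2012:.1f}x")
print(f"Implied compound annual growth: {rate:.1%}")
```

Under that assumption the volume grows about 14x over eight years, which works out to roughly 39% per year.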

Page 4:

Big Data Use Cases Across Industries

• National, Public and Cyber Security
• Education
• Government
• Healthcare
• Retail
• Manufacturing
• Telecommunication
• Financial Services

Page 5:

Intel Optimized Hadoop Architecture (diagram)

• Data sources (IMPORT), structured and unstructured: Sensor/Machine Data, Logs, Social & Web, Legacy, Docs & Audio Files
• Data platforms: Enterprise Data Warehouse, RDBMS, No-SQL, In Memory DB, Hadoop (Node, Node, Node; Create, Map, Reduce)
• Consume: Spreadsheets, Visualize, Mobile Analysis, Consume/Review, Analytics, Apps
• Intel components: AES-NI (TXT), Intel Manager, Intel Dist for Apache* Hadoop, Data Mining, HiBench, HiTune, SQL, Intel SSDs 910 & S3700 Series, 10G NICs Intel X520/540, Lustre File System, Cache Acceleration Software 'CAS', Streaming Analytics

*Other brands and names are the property of their respective owners
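
The Map and Reduce stages named in the Hadoop part of the diagram can be illustrated with a small single-process sketch. This is not Hadoop API code and not how IDH implements the stages. It only shows the programming model (map, group by key, reduce) on a few hypothetical log lines, and every function name here is made up for the illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical log lines standing in for the "Logs" data source.
sample_logs = ["disk error on node1", "network error on node2", "disk ok on node3"]
print(word_count(sample_logs))
```

On a real cluster the map and reduce functions run in parallel across the nodes and the framework performs the grouping step; here all three stages run in one process purely for readability.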

Page 6:

Intel® Distribution for Apache Hadoop (IDH)

• Focused on real-time analysis
• Value-added Manager for deployment and monitoring
• Add-on security and compliance controls
• Intel-optimized total solution architecture: distro, storage, network, compute
• Vertical features
• Industry-leading performance

Hardware-enhanced · Enables partner analytics · Open platform

Intel® Manager for Apache Hadoop software: Deployment, Configuration, Monitoring, Alerts, and Security

• HDFS 2.0.3: Hadoop Distributed File System
• YARN (MRv2): Distributed Processing Framework
• HBase 0.94.1: Columnar Store
• ZooKeeper 3.4.5: Coordination
• Flume 1.3.0: Log Collector
• Sqoop 1.4.1: Data Exchange
• Pig 0.9.2: Scripting
• Hive 0.9.0: SQL Query
• Oozie 3.3.0: Workflow
• Mahout 0.7: Machine Learning
• R connectors: Statistics

Intel proprietary

Intel enhancements contributed back to open source

Open source components included without change

Page 7:

Encryption in IDH with AES-NI Acceleration

Test configuration: CPU: E5-2680 (1 of 32 cores used); Memory: 48 GB; Disk: 2 x 160 GB SSD; OS: CentOS 6.3; data file: 1 GB text file

• Encryption is key for data protection but is compute intensive
• AES-NI in Intel Xeon effectively neutralizes the cost of encryption in Hadoop processing

http://hadoop.intel.com/pdfs/IntelEncryptionforHadoopSolutionBrief.pdf
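
Before relying on hardware-accelerated encryption, it can help to confirm that a node's CPU advertises AES-NI at all. The sketch below is one way to check on Linux, where x86 processors with AES-NI list `aes` among the `flags` in `/proc/cpuinfo`. It says nothing about whether IDH or any particular Hadoop codec actually uses the instructions, and the function name is an illustrative choice.

```python
from pathlib import Path


def cpu_has_aes_ni(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in cpuinfo text lists the 'aes' flag."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        # Compare whole tokens so flags such as 'vaes' do not match by accident.
        if name.strip() == "flags" and "aes" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        text = Path("/proc/cpuinfo").read_text()
    except OSError:
        print("Could not read /proc/cpuinfo (is this a Linux system?)")
    else:
        print("AES-NI advertised by CPU:", cpu_has_aes_ni(text))
```

Taking the file contents as a string keeps the parsing testable on machines that have no `/proc/cpuinfo`.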

Page 8:

IDH on Lustre for Big Data Technical Computing

• Bringing Hadoop analytics to Lustre HPC deployments

─ Exploiting the superior performance, scalability and management simplicity of shared storage

─ Scaling storage and compute nodes separately

InfiniBand Interconnect

Hadoop Cluster/Compute Nodes

Lustre Storage

Page 9:

IDH Optimization on Dell PowerEdge Servers

White paper: http://en.community.dell.com/techcenter/extras/m/white_papers/20412222/download.aspx

Page 10:

Unleash the Power of Intel Architecture Platform

TeraSort for 1TB sort

Baseline (Intel® Xeon® 5600, HDD, 1GbE): >4 hour process time

• Upgrade processor: ~50% reduction
• Upgrade to SSD: ~80% reduction
• Upgrade to 10GbE: ~50% reduction
• Intel distribution: ~40% reduction

Hadoop processing time: <7 minutes with complete Intel-based solution
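
The arithmetic behind the TeraSort figures can be checked with a short calculation. This sketch assumes that each quoted reduction applies to the time remaining after the previous upgrade, and it takes 240 minutes as the starting point even though the slide only says ">4 hour". Both are assumptions made for illustration, and the percentages themselves are the approximate values from the slide.

```python
from typing import List, Tuple

# Slide says ">4 hour", so 240 minutes is a lower bound on the baseline.
BASELINE_MINUTES = 4 * 60

# (upgrade step, approximate fractional reduction quoted on the slide)
UPGRADES: List[Tuple[str, float]] = [
    ("Upgrade processor", 0.50),
    ("Upgrade to SSD", 0.80),
    ("Upgrade to 10GbE", 0.50),
    ("Intel distribution", 0.40),
]


def apply_reductions(
    start_minutes: float, upgrades: List[Tuple[str, float]]
) -> List[Tuple[str, float]]:
    """Apply each reduction to the time left after the previous step."""
    remaining = start_minutes
    timeline = []
    for name, reduction in upgrades:
        remaining *= 1.0 - reduction
        timeline.append((name, remaining))
    return timeline


for name, minutes in apply_reductions(BASELINE_MINUTES, UPGRADES):
    print(f"{name:<20} about {minutes:6.1f} min")
```

With these rounded inputs the steps land at about 120, 24, 12, and 7.2 minutes, which is in line with the "<7 minutes" result quoted for the complete Intel-based solution.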


Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Source: Intel Internal testing

For more information go to : intel.com/performance


White Paper: http://www.intel.com/content/www/us/en/big-data/big-data-apache-hadoop-technologies-for-results-whitepaper.html

Page 11:

IDH Case Studies

Source: http://hadoop.intel.com/resources

• Smart City Video Analytics – China
• Telco Analytics – China Mobile
• Smart Energy Analytics – Pecan Street
• Genomic Analytics – NextBio
• Healthcare Analytics – China
• Financial Service Analytics – Italy

Page 12:

Summary – Intel in Big Data

• Accelerating the adoption of big data analytics

• Engaging with ecosystem and end-customers to unlock the value of data

• Delivering Intel's full end-to-end capability, from intelligent systems at the edge to servers in the datacenter or cloud, together with the Intel Distribution for Apache Hadoop (IDH)

The pervasiveness of Intel Architecture democratizes the implementation and performance of Big Data everywhere

Page 13:

For More Information

• Intel big data website: https://hadoop.intel.com/

• 李國輝 (KH Li): [email protected]

Page 15:

Legal Disclaimers

All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization

No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel TXT-compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.2. For more information, visit http://www.intel.com/technology/security

Requires a system with Intel® Turbo Boost Technology capability. Consult your PC manufacturer. Performance varies depending on hardware, software and system configuration. For more information, visit http://www.intel.com/technology/turboboost

Intel® AES-NI requires a computer system with an AES-NI enabled processor, as well as non-Intel software to execute the instructions in the correct sequence. AES-NI is available on select Intel® processors. For availability, consult your reseller or system manufacturer. For more information, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/

Intel product is manufactured on a lead-free process. Lead is below 1000 PPM per EU RoHS directive (2002/95/EC, Annex A). No exemptions required.

Halogen-free: Applies only to halogenated flame retardants and PVC in components. Halogens are below 900ppm bromine and 900ppm chlorine.

Intel, Intel Xeon, Intel Core microarchitecture, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Copyright © 2011, Intel Corporation. All rights reserved.

Page 16:

Legal Disclaimers: Performance

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, Go to: http://www.intel.com/performance/resources/benchmark_limitations.htm.

Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.

Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported.

SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. See http://www.spec.org for more information.

TPC Benchmark is a trademark of the Transaction Processing Performance Council. See http://www.tpc.org for more information.

SAP and SAP NetWeaver are the registered trademarks of SAP AG in Germany and in several other countries. See http://www.sap.com/benchmark for more information.

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products.

Page 17:

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804