CON3365 Friedrich-oow15 BdIMExaRuledByDBEngine

  • 8/17/2019 CON3365 Friedrich-oow15 BdIMExaRuledByDBEngine

    Oracle Big Data, In-memory, and One Database Engine to Rule

    Dr.-Ing. Holger Friedrich

    03/2012© 2015 sumIT AG

    • Introduction

    • Old Times

    • Exadata

    • Big Data

    • Oracle In-Memory

    • Headquarters

    • Conclusions

    sumIT AG

    • Consulting and implementation services in Switzerland

    • Experts for

    – Data Warehousing,

    – Business Intelligence,

    – and Big Data solutions

    • Focussed on Oracle technology

    • ‘BI Foundation specialized’ partner

    • ‘Data Warehousing specialized’ partner

    • Our motto: Get Value From Data

    • Visit our web site: http://www.sumit.ch/ (in German)

    Holger Friedrich

    • Computer Science diploma from Karlsruhe Institute of Technology (KIT)

    • Ph.D. in Robotics and Machine Learning

    • More than 16 years of experience with Oracle technology

    • Expert for

    – Data Integration,

    – Data Warehousing,

    – Data Mining and

    – Business Intelligence

    • Technical Director of sumIT AG

    • First Oracle ACE for DWH/BI in Switzerland

    DB Architecture - Old Times

    • Old times = 1977 - 2008

    • SGA - System Global Area

    - Shared Pools (Library Cache etc.)

    - Redo Log Buffer

    - Buffer Cache

    • Persistent Storage

    - Disk & Tape

    - serve database blocks

    • PGA - Program Global Area

    - Query-specific processing and storage

    Query processing done in PGA by query specific serve

    Query Processing - Old Times

    Server Process

    Block Buffer

    2008 - Times Are a

    Exadata - Architecture

    • Databases and applications deployed and configured without any adaptations

    • Fast network via InfiniBand

    • Regular compute servers

    • Dedicated storage servers

    - organised in cells

    - disks & flash attached

    - run Exadata Storage Software

    Exadata - The Secr

    Three reasons for outstanding Exadata performance

    • Hardware engineering

    • Local query processing functionality in storage layer

    • Database engine ‘aware’ of ‘intelligent’ storage layer

    - extended optimizer costing model and transformations

    - extended SW to use Exadata Storage APIs

    Divide and conquer for query processing

    • not just with slave processes (PARALLEL)

    • not just between compute nodes (RAC)

    • but between compute and storage nodes

    Exadata - Storage Software E

    • Smart Scanning

    - execute sub-query in storage cells

    - project results in storage already

    • Keep hot data in Flash Cache

    • Storage Indexes

    - collect min/max column values

    - reduce disk access

    • Smart scanning directly on HCC data - no decompression required

    • Offload mining tasks like scoring

    • Additional data caching in columnar format in Flash Cache
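The smart-scanning idea above can be illustrated with a toy model. The Python sketch below is not Exadata code: `ship_all`, `smart_scan`, `values_shipped`, and the sample rows are hypothetical names invented for illustration. It only shows that filtering and projecting where the data lives reduces how much has to travel to the database server.

```python
from typing import Callable

Row = dict[str, object]


def ship_all(rows: list[Row]) -> list[Row]:
    """Classic path: the storage layer returns every row with every column."""
    return [dict(row) for row in rows]


def smart_scan(rows: list[Row],
               predicate: Callable[[Row], bool],
               columns: list[str]) -> list[Row]:
    """Offloaded path: filter rows and project columns inside the storage cell."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def values_shipped(rows: list[Row]) -> int:
    """Crude traffic measure: number of column values sent over the network."""
    return sum(len(row) for row in rows)


# Hypothetical contents of one storage cell.
cell_rows: list[Row] = [
    {"order_id": 1, "region": "CH", "amount": 120.0, "comment": "rush"},
    {"order_id": 2, "region": "DE", "amount": 80.0, "comment": ""},
    {"order_id": 3, "region": "CH", "amount": 45.5, "comment": "gift"},
    {"order_id": 4, "region": "FR", "amount": 10.0, "comment": ""},
]

classic = ship_all(cell_rows)
offloaded = smart_scan(cell_rows,
                       lambda row: row["region"] == "CH",
                       ["order_id", "amount"])

print(values_shipped(classic), values_shipped(offloaded))
```

In this toy data set the offloaded scan ships a quarter of the values, because both the row filter and the column projection are applied before anything leaves the cell.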

    Information Mgmt Reference Arc

    Big Data

    The HAD

    Information Management D

    Big Data - Ch

    • Dynamic ecosphere

    - Pre-packaged distributions

    - Oracle Big Data Appliance

    • Analytics

    - Tools of Hadoop ecosphere

    - Oracle Big Data Analytics

    • Data Integration

    - Ever changing Hadoop tool set

    - Oracle Data Integrator

    - Big Data SQL

    Big Data Appliance - The Secr

    Three reasons for outstanding BDA performance

    • Hardware engineering

    • Local query processing functionality in storage layer

    - Big Data SQL = Exadata Storage Software on HADOOP

    - Added as process engine to the HADOOP process layer

    - BDS agents run independently on HADOOP nodes

    • Database engine ‘aware’ of ‘intelligent’ big data layer

    - extended and enhanced External Table API

    - extended optimizer costing model and transformations

    • Exadata success and performance on Big Data

    • Big Data transparently available for DB queries

    Big Data SQL - Sm

    1. Read data from HDFS

    - Direct-path reads

    - C-based readers whe

    - native HADOOP class

    2. Translate bytes to Or

    3. Smart scan on Oracle

    - apply storage indexe

    - filtering

    - column projection

    - parsing JSON/XML

    - model scoringmodels

    • High compression ben(except cols with dist

    Big Data SQL 2.0 - Storage

    • New feature of Big Data SQL 2.0

    • Avoid unnecessary disk access on HADOOP nodes

    • Index built during first full scan

    • Granularity in HDFS blocks (256 MB)

    • Index application

    - receive filter predicate

    - check storage index for blocks where predicate between min and max

    - only smart scan matching blocks
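The index application steps above can be sketched in a few lines of Python. This is a simplified model, not the Big Data SQL implementation: blocks are plain lists of integers instead of 256 MB HDFS blocks, and `build_storage_index`, `candidate_blocks`, and `scan_between` are hypothetical helper names.

```python
def build_storage_index(blocks: list[list[int]]) -> list[tuple[int, int]]:
    """Record (min, max) of the column for every block, as a first full scan would."""
    return [(min(values), max(values)) for values in blocks]


def candidate_blocks(index: list[tuple[int, int]], low: int, high: int) -> list[int]:
    """Block numbers whose [min, max] range overlaps the predicate range [low, high]."""
    return [n for n, (lo, hi) in enumerate(index) if hi >= low and lo <= high]


def scan_between(blocks: list[list[int]], index: list[tuple[int, int]],
                 low: int, high: int) -> list[int]:
    """Read only the candidate blocks and apply the predicate to their values."""
    result: list[int] = []
    for n in candidate_blocks(index, low, high):
        result.extend(v for v in blocks[n] if low <= v <= high)
    return result


# Three toy blocks of one numeric column.
blocks = [[1, 5, 9], [20, 25, 30], [8, 12, 40]]
index = build_storage_index(blocks)

print(candidate_blocks(index, 10, 15))
print(scan_between(blocks, index, 10, 15))
```

A min/max index can only rule blocks out. A block that passes the check, such as the third one here, still has to be scanned because its range may contain values that do not match.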

    Big Data SQL - Query E

    Extended External Table

    -- new type ORACLE_HIVE; the ACCESS PARAMETERS are optional settings
    CREATE TABLE order (cust_num    VARCHAR2(10),
                        order_num   VARCHAR2(20),
                        order_date  DATE,
                        item_cnt    NUMBER,
                        description VARCHAR2(100),
                        order_total NUMBER(8,2))
    ORGANIZATION EXTERNAL
    (TYPE oracle_hive
     ACCESS PARAMETERS (
       com.oracle.bigdata.tablename: order_db.order_summary
       com.oracle.bigdata.colmap:    {"col":"ITEM_CNT", \
                                      "field":"order_line_item_count"}
       com.oracle.bigdata.overflow:  {"action":"TRUNCATE", \
                                      "col":"DESCRIPTION"}
       com.oracle.bigdata.erroropt:  [{"action":"replace", \
                                       "value":"INVALID_NUM", \
                                       "col":["CUST_NUM","ORDER_NUM"]}, \
                                      {"action":"reject", \
                                       "col":"ORDER_TOTAL"}]
     )) PARALLEL 4;

    Extended External Table

    CREATE TABLE order (cust_num    VARCHAR2(10),
                        order_num   VARCHAR2(20),
                        order_date  DATE,
                        item_cnt    NUMBER,
                        description VARCHAR2(100),
                        order_total NUMBER(8,2))
    ORGANIZATION EXTERNAL
    (TYPE oracle_hdfs
     ACCESS PARAMETERS (
       com.oracle.bigdata.rowformat: \
         SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
       com.oracle.bigdata.fileformat: \
         INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' \
         OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
       com.oracle.bigdata.colmap:   {"col":"item_cnt", \
                                     "field":"order_line_item_count"}
       com.oracle.bigdata.overflow: {"action":"TRUNCATE", \
                                     "col":"DESCRIPTION"}
     )
     LOCATION ('hdfs:/usr/cust/summary/*'));

    O

    Location on HDFS

    Columnar Stores - Oracle’

    • transparent column store managed next to the row store

    • not either/or

    • persistent storage row-based as before

    • column store DML-synched in real-time

    • the entire Oracle DB-ecosphere remains unchanged

    - security

    - backup

    - disaster recovery

    - RAC

    - … 

    • NO application changes required!

    Ad

    • Best for queries that

    - scan large quantities of data

    - on a rather small set of columns

    - compute aggregates on the results

    • High compression benefits on most columns (except ones containing distinct values)

    Well suited for OLAP/BI
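A small Python sketch can make both points concrete. It is an illustration only, not Oracle's in-memory format: the table, the column names, and the helpers `to_columns` and `dictionary_size` are hypothetical. The aggregate touches a single column list, and counting distinct values per column shows why a repetitive column compresses well while a column of unique values does not.

```python
# Hypothetical fact rows: (order_id, region, product, amount).
rows = [
    (101, "CH", "widget", 10.0),
    (102, "CH", "gadget", 20.0),
    (103, "DE", "widget", 5.0),
    (104, "CH", "widget", 7.5),
]
names = ("order_id", "region", "product", "amount")


def to_columns(rows, names):
    """Pivot row tuples into one list per column (the columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def dictionary_size(values):
    """Distinct values a dictionary encoding would need to store."""
    return len(set(values))


columns = to_columns(rows, names)

# The aggregate scans one column and never looks at the other three.
total_amount = sum(columns["amount"])

# Few distinct values: each row can be stored as a tiny code.
region_entries = dictionary_size(columns["region"])
# All values distinct: the dictionary is as large as the column itself.
order_id_entries = dictionary_size(columns["order_id"])

print(total_amount, region_entries, order_id_entries)
```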

    Technolo

    1. In-memory storage index

    2. Filtering on binary compressed data

    3. Columnar storage of selected columns

    4. Transparent querying across storage hierarchy

    5. Real-time background actualization of columnar store

    6. Parallel query execution on the columnar store

    7. SIMD vector processing

    8. In-memory fault tolerance on RAC

    9. In-memory aggregation
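Item 2, filtering on compressed data, is sketched below in Python under simplifying assumptions: the column is dictionary-encoded, the predicate is an equality test, and `encode` and `matching_positions` are hypothetical names. The literal is translated to its code once and the scan then compares small integers without decoding any value. SIMD processing (item 7) is not modelled here.

```python
def encode(values):
    """Dictionary-encode a column: sorted dictionary plus one integer code per row."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def matching_positions(dictionary, codes, wanted):
    """Row positions where the column equals `wanted`, found on the codes alone."""
    if wanted not in dictionary:
        return []                        # value never occurs, nothing to scan
    target = dictionary.index(wanted)    # translate the literal once
    return [pos for pos, code in enumerate(codes) if code == target]


# Two columns of the same hypothetical table.
regions = ["CH", "DE", "CH", "FR", "CH"]
amounts = [10.0, 4.0, 2.5, 8.0, 1.0]

dictionary, codes = encode(regions)
hits = matching_positions(dictionary, codes, "CH")
ch_total = sum(amounts[pos] for pos in hits)

print(hits, ch_total)
```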

    Example - In-Memory Aggregation

    • New optimizer transformation Vector Group By

    • Resembles well-known star transformation

    • Two phase, 6 step process

    • Phase 1 - preparation

    1. Scan dimensions

    2. Build key vectors

    3. Prepare accumulator

    4. Build tmp-tables for dim select attributes

    • Phase 2 - computation

    5. Scan facts w.r.t. key vectors

    6. Join filtered facts with tmp-tables
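The six steps can be traced on a tiny star schema. The Python sketch below follows the step numbering of the slide but is only a rough model of the idea, not Oracle's Vector Group By implementation. The tables, column names, and the functions `build_key_vector` and `vector_group_by` are hypothetical.

```python
def build_key_vector(dim_rows, key, attribute, keep=lambda row: True):
    """Steps 1, 2 and 4 for one dimension.

    Returns a key vector (join key -> dense group id, absent if filtered out)
    and a temp table (dense group id -> selected attribute value).
    """
    group_ids = {}
    key_vector = {}
    for row in dim_rows:                                   # step 1: scan dimension
        if keep(row):
            gid = group_ids.setdefault(row[attribute], len(group_ids))
            key_vector[row[key]] = gid                     # step 2: key vector
    temp_table = {gid: value for value, gid in group_ids.items()}   # step 4
    return key_vector, temp_table


def vector_group_by(facts, prod_kv, store_kv, prod_tmp, store_tmp):
    """Steps 3, 5 and 6: accumulate facts by group id, then attach attributes."""
    sums = [[0.0] * len(store_tmp) for _ in prod_tmp]      # step 3: accumulator
    counts = [[0] * len(store_tmp) for _ in prod_tmp]
    for fact in facts:                                     # step 5: scan facts
        p = prod_kv.get(fact["prod_id"])
        s = store_kv.get(fact["store_id"])
        if p is None or s is None:
            continue                                       # filtered out by a key vector
        sums[p][s] += fact["amount"]
        counts[p][s] += 1
    return {(prod_tmp[p], store_tmp[s]): sums[p][s]        # step 6: join with temp tables
            for p in prod_tmp for s in store_tmp if counts[p][s]}


products = [{"prod_id": 1, "category": "toys"},
            {"prod_id": 2, "category": "toys"},
            {"prod_id": 3, "category": "food"}]
stores = [{"store_id": 10, "country": "CH"},
          {"store_id": 11, "country": "DE"},
          {"store_id": 12, "country": "CH"}]
sales = [{"prod_id": 1, "store_id": 10, "amount": 5.0},
         {"prod_id": 2, "store_id": 12, "amount": 7.0},
         {"prod_id": 3, "store_id": 10, "amount": 2.0},
         {"prod_id": 1, "store_id": 11, "amount": 4.0},
         {"prod_id": 3, "store_id": 12, "amount": 1.0}]

# Total amount by category and country, restricted to Swiss stores.
prod_kv, prod_tmp = build_key_vector(products, "prod_id", "category")
store_kv, store_tmp = build_key_vector(stores, "store_id", "country",
                                       keep=lambda row: row["country"] == "CH")
totals = vector_group_by(sales, prod_kv, store_kv, prod_tmp, store_tmp)
print(totals)
```

The fact table is scanned exactly once, and each fact row costs two lookups plus one array update. The dimension attributes are attached only at the end, to the few aggregated cells.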

    In-Memory - The Secr

    Many reasons for outstanding In-Memory performance

    • Conceptual advantage of columnar format

    • Speed of processing in DRAM

    • Sum of technology gems (see earlier)

    • Database engine ‘aware’ of columnar store’s capabilities

    - extended optimizer costing model and transformations

    - extended SW to use columnar store’s APIs

    • Unprecedented performance for analytics

    • Transparently available for DB queries

    Headquarters

    Wikipedia: "Headquarters (HQ) denotes the location where most, if not all, of the important functions of an organization are coordinated."

    [Diagram labels: Query Process in DB (HQ), Columnar Store, Exadata Storage, Big Data Storage, Block Buffer, Disks]

    The Database Kernel Rules

    Query Franchising in action

    • optimizer generates execution plan

    • partial queries are sent out to other engines

    - Big Data (SQL)

    - Columnar in-memory store

    - Exadata storage

    • partial results are received & further processed

    • security policies are applied

    • final results are delivered

    Divide and conquer between data management technologies
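The franchising flow above can be mimicked with a short Python sketch. Everything in it is a stand-in: the three engines are plain in-memory lists, the "partial query" is a filter plus a pre-aggregation, and the security policy is a set of visible regions. `partial_sum_by_region` and `headquarters` are hypothetical names, not Oracle APIs.

```python
from collections import defaultdict


def partial_sum_by_region(rows, min_amount):
    """Work done locally by one engine: filter, then pre-aggregate."""
    sums = defaultdict(float)
    for region, amount in rows:
        if amount >= min_amount:
            sums[region] += amount
    return dict(sums)


def headquarters(engines, min_amount, visible_regions):
    """Coordinator: send out partial queries, merge results, apply the policy."""
    merged = defaultdict(float)
    for rows in engines.values():                 # partial query goes to each engine
        partial = partial_sum_by_region(rows, min_amount)
        for region, subtotal in partial.items():  # partial results are processed further
            merged[region] += subtotal
    # The security policy is applied centrally, just before delivery.
    return {region: total for region, total in sorted(merged.items())
            if region in visible_regions}


# Hypothetical (region, amount) rows held by each engine.
engines = {
    "big_data_sql": [("CH", 3.0), ("DE", 1.0), ("CH", 0.5)],
    "in_memory_store": [("CH", 2.0), ("FR", 4.0)],
    "exadata_storage": [("DE", 5.0)],
}

result = headquarters(engines, min_amount=1.0, visible_regions={"CH", "DE"})
print(result)
```

Keeping the policy check in the coordinator means every engine is covered by the same rule, which matches the slide's point that the database kernel stays in charge.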

    The Key Lies in Th

    Database optimizer and execution engine make it happen

    • Transformer:

    - new transformations

    • Estimator:

    - new cost estimation models

    • Execution engine:

    - extended calls and APIs

    Only possible because Oracle owns all implementations and APIs involved
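How a transformer and an estimator work together can be shown with a deliberately simple Python sketch. The cost model is invented for illustration and has nothing to do with Oracle's actual costing: `Plan`, `candidate_plans`, `estimated_cost`, `choose_plan`, and the two cost weights are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    rows_read: int      # rows the storage layer has to read
    rows_shipped: int   # rows sent on to the database server


def candidate_plans(table_rows: int, selectivity: float) -> list[Plan]:
    """Transformer: enumerate alternative ways to run the same filtered scan."""
    return [
        Plan("ship all rows, filter in database", table_rows, table_rows),
        Plan("filter in storage, ship matches", table_rows,
             round(table_rows * selectivity)),
    ]


def estimated_cost(plan: Plan, read_cost: float = 1.0, ship_cost: float = 4.0) -> float:
    """Estimator: made-up linear model, reading is cheap and shipping is expensive."""
    return plan.rows_read * read_cost + plan.rows_shipped * ship_cost


def choose_plan(table_rows: int, selectivity: float) -> Plan:
    """Pick the cheapest candidate according to the estimator."""
    return min(candidate_plans(table_rows, selectivity), key=estimated_cost)


best = choose_plan(table_rows=1_000_000, selectivity=0.01)
print(best.name, estimated_cost(best))
```

The offloaded plan only wins because the estimator knows that the storage layer can filter. That is the sense in which the engine has to be aware of the capabilities of the layers below it.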

    Crucial Part - The D

    • The optimizer’s estimates rely on

    - the data dictionary

    - statistics

    • Data Dictionary knows all objects

    - Exadata: regular db objects

    - In-memory: regular db objects

    - Big Data: defined through External Table declaration

    Estimating statistics about Big Data objects is challenging

    Conclusions

    • Exadata - boosts execution for traditional applications and analytics

    • Big Data - provides affordable data management for lots of and unstr

    • In-Memory - serves mighty fast scans, joins, and aggregations for ana

    • With other vendors these technologies are either

    - not available in the desired quality

    - or not tightly integrated, if at all

    • Data silos & isolated solutions are being built again

    • But: Oracle provides top solutions for each

    • In fact: Oracle provides the only portfolio with

    - all three technologies tightly integrated

    - and central data management through the Oracle Database