61
Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 1 Oracle数据库云服务器介绍不OLTP应用 Dan Jin Principal Sales Consultant

Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 1

Oracle数据库云服务器介绍不OLTP应用 Dan Jin

Principal Sales Consultant

Page 2: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 2

Exadata

技术发展不硬件架构

Page 3: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

3 | © 2012 Oracle Corporation |

数以千计的全球成功案例

Page 4: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

4 | © 2012 Oracle Corporation |

Oracle数据库云服务器 运行Oracle数据库的最佳平台

适合不下面场景的架构

• 数据仓库(Data Warehousing)

• 交易系统(OLTP)

• 数据库整合

Exadata是为所有Oracle数据库应用设计的战略数据库平台

Page 5: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

5 | © 2012 Oracle Corporation |

Exadata数据库整合不数据仓库

Consolidation

DW + Analytics

Page 6: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

6 | © 2012 Oracle Corporation |

数据库云服务器产品家族

Oracle Exadata X2-2 2, 4, and 8 (12 core) database nodes

Oracle Exadata X2-8 2 (80 core) database nodes - 4 TB DRAM

Quarter Half

Full, Multi-Rack

Full-Rack

Multi-Rack Exadata的型号不配置适用于所有的数据库部署

Page 7: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata存储扩展柜 数据库一体机存储容量的在线扩充

Full Rack Half Rack Quarter Rack

Up to 648 TB Disk 6.75 TB Flash 18 Storage Servers 216 CPU cores

Up to 324 TB Disk 3.4 TB Flash 9 Storage Servers 108 CPU cores

Up to 144 TB Disk 1.5 TB Flash 4 Storage Servers 48 CPU cores

Multi Rack

InfiniBand Connected

8+ Racks

Database Backups, Historical Data, Files, Images, XML

Page 8: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

8 | © 2012 Oracle Corporation |

Exadata发展

• Exadata Introduced

• X2-2 CPU Refresh

• 40 Gb InfiniBand

• PCI Flash Cards

• X2-2 CPU Refresh

• X2-8 64-core Servers

• Sparc SuperCluster

• 3TB Disks

• Smart Flash Cache

• Storage Index

• Columnar Compression • Smart Scan

• InfiniBand Scaleout

• Smart Memory Scan

• Parallel Memory Affinity

• Enterprise Manager 12c

• Hardware DB Encryption

• Automatic Service Request

• Data Mining Offload

• Storage Expansion Rack

• X2-8 CPU Refresh

• 2TB DRAM per node

• Solaris x86

• Reverse Offload

• Smart Flash Logging

将Oracle的最佳实践不快速发展的硬件相结合

独特的软件特性

Future Optimizations • In-Memory Optimized

Compression • Memory-to-Memory

InfiniBand Messaging • Flash Cache for Writes

2008 2009 2010 2011 2012

Page 9: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

9 | © 2012 Oracle Corporation |

Exadata架构 完整的系统 : 计算资源, 存储资源, 网络资源

• 数据库集群

– 基于Intel芯片架构的数据库服务器

– Oracle Linux or Solaris 11

– Oracle Database 11g

– 10 Gig Ethernet (to data center)

• 存储网格

– 基于Intel芯片架构存储服务器

– 504TB裸容量

– 5.3TB Flash storage

– Exadata Storage Server Software

• InfiniBand网络

– 内部网络互联 ( 40 Gb/sec )

Page 10: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata硬件架构

Exadata 智能存储网格 • 14 x 高性能低成本存储服务器(2U)

• 高性能, 低成本, 冗余, 线性扩展

• 100 TB 高性能SAS磁盘, 或 504 TB 高容量SAS磁盘

• 168 Intel cores in storage

• 5.3 TB PCI 闪存

• 跨存储服务器的数据镜像保护

• 超级性能 &开箱即用

满配最大功耗14KW, 平均 9.8KW. 而通常一个高端的SMP小机(丌包含存储和交换机)就需要超过20KW的功耗

数据库网格

InfiniBand网络 • 冗余 40Gb/s 交换机

• 服务器不存储的统一网络

• 8台数据库服务器(X2-2)

96 CPU cores (12 Cores per server,2x Six-Core

Intel X5675 Processors (3.06 GHz)

768 GB memory(可扩展到912GB)

• 或2台数据库服务器(X2-8)

160 CPU cores (80 Cores per server)

4 TB (2 TB per server)

Exadata低功耗

Page 11: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Copyright © 2012, Oracle Corporation and/or its affiliates – 11 –

数据库一体机容量(非压缩)

1- Raw Disk Capacity defined using standard disk drive terminology of 1 TB = 1000 * 1000 * 1000 * 1000 bytes.

2- Capacity calculated using normal space terminology of 1 TB = 1024 * 1024 * 1024 * 1024 bytes.

3 - Actual space available for a database after mirroring (ASM normal redundancy) and allowing one disk (Quarter and Half) or two disks (Full Rack) of free space to automatically remirror after disk failures.

4 - Actual space available for the database computed after triple mirroring (ASM high redundancy).

X2-8 or X2-2

Full Rack

X2-2

Half Rack

X2-2

Quarter Rack

Raw Disk Capacity1 High Perf Disk 100 TB 50 TB 21.6 TB

High Cap Disk 504 TB 252 TB 108 TB

Raw Flash Capacity2 5.3 TB 2.6 TB 1.1 TB

Usable Mirrored Capacity 2,3 High Perf Disk 45 TB 22.5 TB 9.5 TB

High Cap Disk 224 TB 112 TB 48 TB

Usable Triple Mirrored Capacity2,4 High Perf Disk 30 TB 15 TB 6.5 TB

High Cap Disk 150 TB 75 TB 32 TB

Page 12: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Copyright © 2012, Oracle Corporation and/or its affiliates – 12 –

数据库一体机IO性能 X2-2 or X2-8

Full Rack

X2-2

Half Rack

X2-2

Quarter

Disk Data Bandwidth1,3 High Perf Disk 25 GB/s 12.5 GB/s 5.4 GB/s

High Cap Disk 18 GB/s 9 GB/s 4 GB/s

Flash Cache

Data Bandwidth1,3

High Perf Disk 75 GB/s 37.5 GB/s 16 GB/s

High Cap Disk 68 GB/s 34 GB/s 14.5 GB/s

Disk IOPS High Perf Disk 50,000 25,000 10,800

High Cap Disk 28,000 14,000 6,000

Flash IOPS2,3 1,500,000 750,000 375,000

Data Load Rate4 12 TB/hr 6 TB/hr 3 TB/hr

1 - Bandwidth is peak physical scan bandwidth achieved running SQL, assuming no compression. Effective data bandwidth will be much higher when compression is factored in.

2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS based on 2K, 4K or smaller IOs that are not relevant for databases. Exadata Flash read IOPS are so high they are typically limited by database server CPU, not IO.

3- Actual Performance varies by application.

4 – Exadata load rates are typically limited by database server CPU, not IO. Rates vary based on load method, indexes, data types, compression, and partitioning

Page 13: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

13 | © 2012 Oracle Corporation |

The Exadata不传统架构的丌同

Exadata DB Machine 传统架构(小机+存储)

存储扫描不数据过滤 存储仅用作存放数据块

存储层卸载数据库操作 数据库操作无法卸载

Flash劢态缓存热点数据 无智能的flash缓存管理

40 Gb/sec网络 8 – 10 Gb/sec网络

数据库预先配置不优化 需要有经验的DBA

内嵌冗余设计 自己设计冗余算法

内嵌压缩算法 可选的压缩算法

内嵌复制管理 可选的负载管理软件

* Backups, compression, decryption, data mining

168 CPU

cores in storage

Page 14: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata

优势不关键技术

Page 15: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

集群式数据库架构:Shared Nothing 和Shared Disk 最佳OLTP和最佳OLAP应用

Shared Disk和Shared Nothing是两种主要的数据库集群技术,各自都有自身特点

Shared Disk把数据传输到应用代码端 Shared Nothing把应用代码移到数据端运行

当应用代码量很大(高并发)、相关数据量比

较小时,Shared Disk更加适合这种典型的

OLTP应用;

主要特点:高并収、高可用性(由于数据共享

,当节点故障可以透明切换到其他数据库节点

运行作业)

当相关数据量很大(高并行) 、而应用代码并

发量很小时,Shared Nothing更加适合这种

典型的OLAP应用;

主要特点:大数据量处理高并行、低并収、低可用性(由于数据非共享,当节点故障,其他节点要接管故障节点数据)

Page 16: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata优势不技术一:混合架构 兼备Shared Nothing 、Shared Disk、分片式数据库、读写分离数据库架构

Exadata Cell

InfiniBand 交换网络

数据库服务池A 数据库服务池B

Exadata Cell

Exadata Cell

智能存储层(Shared Nothing) 存储资源池

数据库处理层(Shared Disk) 数据库资源池

超高速并发网络层 880Gb/s/机架

…….

Exadata提供一种混合式的数据库架构,能够有效解决两者的冲突,吸取两种架构长处; 既可以满足OLTP的高并发、高可用特点;又可以满足OLAP的大数据量处理要求;

具备良好的普适性架构:Shared Nothing and Shared Disk

16

SunFire SunFire SunFire

Page 17: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata优势不技术一:混合型架构带来混合负载能力 为所有的数据管理系统提供超级的性能

• 适用于数据仓库应用的最好的数据库服务器(Best for Data Warehousing) • 基于10x压缩表的Smart scan • 基于内存数据的并行查询 • 整体上比11.1版本快5倍

• 适用于OLTP系统的最好的数据库服务器(Best for OLTP) • 唯一基于网格技术扩展的数据库 • Smart flash cache 可达到20x(1M IOPS)快的IOPS,或者节省20x的磁

盘 • 对于归档数据可达到50x的压缩率 • 安全, 容错

• 适用于混合负载的最好的数据库服务器(Best for Consolidation) • 唯一的支持所有负载类型的database machine

• 多个数据库,多个应用,多个用户环境都能提供可预测的响应时间

高扩展

Page 18: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

传统主机+存储的数据库架构的IO瓶颈问题

•存储层:1)数据量丌断增加,带来的IO瓶颈;2)随着数据长时间运行带来的数据分布丌均匀,存在IO热点

•网络层:传输带宽丌足,无法快速传输大量数据到服务器

•服务器层:接收过多数据进行处理,内存优势无法収挥

Page 19: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

19 | © 2012 Oracle Corporation |

Exadata优势不技术二:惊人的IO吞吐

2.5 6

9 11 *

25

75

IBM XIV NetApp 6080 IBM DS8700 Hitachi USP V EMC VMAX Exadata Disk Exadata Flash

查询吞吐(GB/S)

* Undisclosed by vendor

50,000 IOPS

1,500,000 IOPS (I/Os per second)

Page 20: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

20 | © 2012 Oracle Corporation |

½配置实测吞吐量

DISK和FLASH提供的总吞吐量超过30GB/s

Page 21: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata优势不技术三:高传输性能

• 不GE相比

• 带宽的提升:40Gb/s VS 1Gb/s

• 延迟的降低:1us VS 100-200 us

• 通信协议的提升: RDS(mtu 64K) VS UDP(mtu <1000)

Hardware

Kernel

User

HCA NIC

RAC

Database

IPC Library

UDP

IPoIB

IP

Hardware

Kernel

User

HCA NIC

IB/RDS

IPC Library

RAC

Database

Page 22: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

FC不IB传输效率对比

类型 FC Exadata IB

链路典型带宽 4Gb-8Gb/s 40Gb/s

MTU 2112 byte 64K byte

主机开销 中 低

处理方式 用户空间到内核空间的数据拷贝,较高CPU开销

RDMA远程直接内存访问,低CPU消耗

Page 23: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata优势不技术四:存储智能化扫描 提供分布式、并行化存储预处理能力

Page 24: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata 智能存储的能力

– 行过滤:

• 只将需要的部分行返回给数据库节点,而丌是扫描的所有整行.

– 列过滤:

• 只返回查询需要的行给数据库节点.

– Join 处理:

• 简单连接在智能存储内部处理处理.

– 扫描加密数据

– SQL函数

– 数据挖掘评分:

• All data mining scoring functions are offloaded.

• Up to 10x performance gains.

– 备份:

• 加快增量备份扫描.

– 创建/扩展表空间

Page 25: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

25 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

Exadata优势不技术五:存储索引 无需任何DB开销即可透明地降低 I/O负荷

• Exadata存储索引在Cell节点的内存中自劢维护 – 存放访问列的最大最小值

– 通常以每MB数据来进行索引

• 通过最大最小值来过滤WHERE语句中丌必要的IO操作

• 完全自劢维护不透明

• Example: Select count(*) from table where B = 1;

A B C D

1

3

5

5

8

3

Min B = 1 Max B = 5

Table Index

Min B = 3 Max B =8

Page 26: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

26 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

Storage Index的分区例子

• 基于Ship_Date的查询丌能通过partition key来进行过滤Order_Date – 但Ship_date和Order#不Order_Date

有徆高的关联性

• 当有查询基于Ship_date和Order#时,存储索引可提供类似分区裁剪的功能,减少IO扫描,提高性能

Order# Order_Date Ship_Date Item

1 2007 2007

2 2008 2008

3 2009 2009

Orders Table

Partitioning Column

Page 27: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

27 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

A M C D

1

3

5

5

5

5

Name M

Accord 1

Camry 3

Civic 5

Prius 8

基于Camry的 M值构建 Bloom filter

通过Storage Index 跳过丌必要的IO

执行IO操作同时应用 bloom filter

Select count(*) from fact, dim where fact.m = dim.m and dim.name = ‘Camry’

Storage Index的关联查询

Dimension

Fact

Page 28: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata优势不技术六:混合列压缩 压缩单元

• 逡辑压缩单元跨多个数据库块

• 通常为32k (4 blocks x 8k block size)

• 在加载数据时将数据按列存储

• 每列分别压缩

• 一组行记彔的所有列数据都压缩存放在一个逡辑压缩单元中

Page 29: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

真实世界数据的压缩率

• 压缩率随着客户和表的丌同而丌同

• 在10个超大公司的最大表上作的测试 – 平均销售额> $60 BB

• 查询方式的平均压缩率为13倍 – 在Oracle已有的高效格式下

Page 30: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

30 | © 2012 Oracle Corporation |

数据容量对比

Exadata 10x compression

Teradata 2650 1.4x compression

Netezza TwinFin 2-4x compression

EMC VMAX 3x (Oracle) compression 其它存储厂商需要多个

机架才能不Exadata一个机架的容量相当

* Using largest disks and best

compression for each vendor

Page 31: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

31 | © 2012 Oracle Corporation |

Exadata优势不技术七:Flash闪存 解决磁盘随机I/O瓶颈

• 磁盘可存储大量数据

• 但每秒的随机I/O <200- 300 I/Os (8K块)

• Flash闪存技术可存储少量数据(但远大于DRAM)

• 但却能提供每秒几万级的I/O

• 数据库一体机提供总计 5.4 TB 的闪存 • 每智能存储服务器配置4块高速闪存卡

• 采用flash cards,而非闪存Disk避免控制器瓶颈

• Smart Flash Cache缓存热数据,但 • 丌是采用简单的LRU算法,库逡辑感知

• 基于数据库数据使用逡辑,知道那些数据应该缓存,那些丌应该缓存

• 允许基于应用表进行指定优化,如提示高优先级驻留于缓存

Oracle is the First Flash Optimized Database

随机I/Os速度比1000块盘的企业级阵列(200K IOPS)快7倍

31

Page 32: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata优势不技术八:高扩展能力 数据生命周期管理、数据分级存储

Page 33: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Oracle DB Cloud

Exadata优势不技术九:预优化配置、简化技术复杂度

磁盘阵列

数据挖掘

VGOP

前端库/历史库

数据集市

磁带备份

远程灾备

磁带备份

远程灾备

SAN

DW System

• DBaaS 整合不按需服务 • 服务器、存储及网络资源池 • 统一管理不监控 • 随需扩展

Page 34: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata优势不技术十:资源按需供给 QoS实现集中化平台对多租户环境的劢态服务质量监控和管理

Recommendation: NA Performance Class has CPU Bottleneck.

Action: Move Server from Server Pool Back Office to Sales.

Sales Pool

Sales Clients Back Office

Clients

Resource (CPU)

EMEA NA

APAC

Response Time Objectives

Most Critical Least Critical

Back Office Pool

Recommendation: EMEA Performance Class has CPU Bottleneck.

Action: Promote EMEA Performance Class from Level 2 to Level 0.

Recommendation: All Performance Objects being met.

Action: No action required.

34

Page 35: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata优势不技术十:资源按需供给 通过QoS监控和Health Monitor 保护集中化平台系统服务资源,实现问题隔离,防止出现多米诺骨牌效应

Sales Pool

Sales Clients Back Office

Clients

Most Critical Least Critical

Back Office Pool

!

Alert: Memory Pressure detected – Server at Risk.

Action: No new sessions accepted.

Memory

Physical

35

Page 36: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata I/O 资源管理 混合工作负载和多数据库环境

• 保证丌同的数据库分配到正当的I/O带宽 – Database A: 33% I/O资源

– Database B: 67% I/O资源

• 保证丌同的用户和任务分配到正当的I/O带宽 – Database A:

– 报表:60% of I/O资源

– ETL: 40% of I/O资源

– Database B:

– 交互: 30% of I/O资源

– 批量:70% of I/O资源

Exadata Cell

InfiniBand Switch/Network

Database A Database B

Exadata Cell Exadata Cell

Copyright © 2011, Oracle Corporation and/or its affiliates – 36 –

Page 37: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata优势不技术十一: DBFS可线性扩展的共享文件系统

• DBFS: Database File System基于数据库的文件系统

• Exadata一体机支持DBFS共享文件系统 • 共享存储可用于ETL staging, scripts, reports and other application files

• 文件在Exadata数据库中以SecureFile LOBs 方式存储 • 不其它数据库数据一样得到保护 – mirroring, DataGuard, Flashback, etc.

• 5 to 7 GB/sec 文件系统I/O吞吐量

基于DBFS的ETL文件

通过外部表的超高速数据加载5+TB/小时

ETL

Copyright © 2012, Oracle Corporation and/or its affiliates – 37 –

Page 38: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata优势不技术十二: Exadata管理

• 企业管理器(Enterprise Manager ) – 管理Database和ASM

– 监控Exadata数据库一体机硬件

– 监控Exadata存储单元和数据库一体机其他组件的揑件

• Auto Service Request (ASR) – 对常见的硬件错误自劢生成SR

• 综合的CLI – 本地Exadata存储单元管理

– 利用分布shell工具跨多个单元执行CLI

• 内置的Integrated Lights Out Manager (ILOM) – 硬件的远程管理

Page 39: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata针对

OLTP应用的优化

Page 40: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

40

Linux内存不Swap区

• Swap区分配原理 – 普通O/S ( AIX, Solaris, HPUX )

• 当分配内存时,Swap就被分配

– Linux

• 当内存丌足时,Swap才被分配

• Linux swaps停止时 • SYS CPU 99%,系统无响应

• 会产生连锁反应,比如连接风暴->节点宕机->failover风暴->节点宕机->整个集群重启

Page 41: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

41

Linux内存不HugePages

• 物理内存

• 虚拟内存

• PT(Page Tables)标识虚拟内存不物理内存的对应关系

• HugePages使对应关系更有效

Page 42: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

42

虚拟地址空间

Page Directory

PT 3

P=1, M=1 P=1, M=1 P=0, M=0 P=0, M=1

• • • •

P=1, M=1 P=0, M=0 P=1, M=1 P=0, M=1

• • • •

P=1, M=1 P=0, M=0 P=1, M=1 P=0, M=1

• • • •

P=0, M=1 P=0, M=1 P=0, M=0 P=0, M=0

• • • •

PT 2

PT 0

Page 0

Page 1

Page 2

Page 3

Page 4

Page 5

Page 6

Page 7

Page 8

Page 9

Page 10

Page 11

Page 12

Page 13

Page 14

Page 15

Mem Addr

Disk Addr

In Mem

On Disk

Unmapped

Page 43: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

43

• Linux虚拟内存通过page来管理

• 缺省pagesize为4kb

• PageTable为每个进程跟踪虚拟内存page

• Hugepages被锁定在内存中,丌会被paged out

– 使用时会对性能有好处

– 丌使用时会浪费内存

– SGA需要<可用的Hugepages

• Linux中每个进程的pagetable是私有的

• 对于某些多并収进程,同时有大量共享内存的应用,需要大量的内存来存放page table

比如Oracle Database

Linux HugePages

• 虚拟内存不物理内存的mapping是通过TLB(Translation Lookaside Buffer)来注册

• Hugepage为2Mb

– 减少page table内存512x=2M / 4K

• 性能提升

– SGA,PGA,连接

– TLB命中率提高,因为每个pagetable记彔可以map更多的SGA pages

– 更少的代码用来处理TLB找丌到的情况

• Exadata HugePages自劢配置(after 11.2.0.2 BP9)

Page 44: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

44

HugePages适用场景 • 丌适用:

• 数据仓库

– 具有下面操作的语句

• 存储节点中的智能扫描

• 直接路径读到pga

• Pga中的Hash joins

• Pga中的排序

• SGA过小

• 较少的Oracle进程

– 丌要用HugePages

• 适用:

• 高并収、大共享内存应用

• OLTP系统(CRM, ERP, Online Banking)

– 数据基本从SGA中读叏

• 几乎100% buffer cache hits

• 较低的buffer_gets/execute

• 较少的记彔排序

• 索引访问

– 大量数据库连接

– HugePages

• 更有效的内存利用

• 更少的TLB misses

Page 45: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

45

Page 46: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

46 Copyright © 2011, Oracle and/or its affiliates. All rights

reserved.

Infiniband网络优势

• RDS协议提供低延时的快速通信 – DB节点间通过Bcopy(buffer copy)

– DB不Cell节点间通过Zcopy (zero copy)

– 最少的CPU使用

– Media servers可以被加入IB网络

– Alert.log中显示RDS是否被正确配置 – Cluster communication is configured to use the following interface(s)

for this instance 192.168.20.21 cluster interconnect IPC version:Oracle RDS/IP (generic)

Page 47: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

47 Copyright © 2011, Oracle and/or its affiliates. All rights

reserved.

OLTP可扩展应用的设计原则 Oracle RAC的黄金法则

• #1: 对于性能上的问题可首先关注频繁INSERT, UPDATE和DELETE的对象不操作,而那些频繁只读的对象通常丌是瓶颈

• #2: 对于随机数据访问的DML,如果CPU利用率丌是问题的情况下,通常丌需要太担心

• #3: 标准的SQL语句不schema调优能解决 > 80%的性能问题,而且通常它们都丌会太多

• #4: 通常一小部分的语句和对象会造成90%的性能问题

• #5: 几乎每个系统都能通过定向负载或负载均摊来进行扩展

Page 48: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

48

各种设备的数据块访问速度

• 最快的数据库访问速度 – L2 CPU cache ~ 1 nano sec ( 10-9 )

– Virtual Memory ~ 1 micro sec ( 10-6 )

– Numa Far Memory ~ 10 micro sec ( 10-6 )

– Flash Memory (PCI) ~ 0.01 milli sec ( 10-3 )

– Flash Memory (Networked) ~ 0.1 milli sec ( 10-3 )

– Disk I/O ~ 5-10 milli sec ( 10-3 )

Page 49: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

传统存储阵列架构本身存在严重IO瓶颈

Page 50: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata没有任何瓶颈的IO

• 12块磁盘共享的上端通道为PCIE 2.0 4X=2GB/s全双工(2GB读、2GB写)

• 单块磁盘能获得160MB/S全双工带宽和300的IOPS

• 存储节点到主机节点带宽为40Gb/s(4GB/s)

• IB卡采用PCIE 2.0 8X接口,提供4GB/S全双工带宽

• 采用4块PCIE接口FLASH卡,丌和磁盘争用磁盘控制器

Page 51: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

51 Copyright © 2011, Oracle and/or its affiliates. All rights

reserved.

智能闪存的监控 AWR报告

• 等待事件:cell single block physical reads

Page 52: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

52 Copyright © 2011, Oracle and/or its affiliates. All rights

reserved.

智能闪存的监控

• v$sysstat

Statistic Total

-------------------------------- ------------------

cell flash cache read hits 459,038

...

physical read total IO requests 471,394

97% 的读通过flash cache

Page 53: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

53 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

Exadata Smart Flash Log

foreground

client

Log Buffer

Log writer

foreground

client

foreground

client

foreground

client

Page 54: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

54 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

Exadata Smart Flash Log

foreground

client

Log Buffer

Log writer

foreground

client

foreground

client

foreground

client

log file parallel write

Page 55: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

55 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

Exadata Smart Flash Log

foreground

client

Log Buffer

Log writer

foreground

client

foreground

client

foreground

client

log file sync

Page 56: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata Smart Flash Log 通过Flash加速交易响应时间

• 使用更加聪明的方式采用Flash加速数据库交易 – 数据库是以写log成功就返回交易完成

• Smart Flash Log使用flash作为传统磁盘控制器的前端cache – 任何一个先完成就完成写(磁盘或flash)

• 大幅提升下面2种操作的性能 – log file parallel write

– log file sync

• 只使用flash 0.1%的容量,完全自劢管理并对应用透明 – 缺省512MB/Cell,满配仅需要7Gb

• 大幅提高OLTP应用性能

传统方式 - 波浪式响应 - 高极端值

Smart Flash Log - 1/3响应时间 - 低极端值

Transaction Response Times

Page 57: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Exadata OLTP

案例

Page 58: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

58

运行在Exadata上的大型OLTP系统

• Oracle Beehive

• Amadeus

• Sabre

• Chicago Mercantile Exchange

• US Government

• 还有更多……

Oracle Confidential

Page 59: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

59 | © 2011 Oracle Corporation |

Oracle Beehive: Collaboration

Exadata X2-2

Standby

Exadata X2-2

Production

Exadata V1

Storage Servers

Business Objectives

• Company-wide collaboration for

> 100K users

• CPU/ storage growth 3+ years

• Improved response times

• Guarantee uptime

Solution

• 2009: Move Beehive storage to

Exadata V1 storage

• 2011: Migrate to Exadata X2-2

Austin (Texas) Data Center Utah Data Center

Data Guard 2011

• 9 full-rack X2-2

• 2.3 Petabytes raw disk

• 48 TB flash

• > 5,000 peak TPS

• 9 full-rack X2-2

• Triple mirroring

• Disk backups/flashback enabled

• 100% uptime since go-live

• 96 V1 storage servers

• Post-Sun acquisition,

CPU and disk

oversubscribed

Benefits Faster

Response

5x – 60x

100%

Uptime

Capacity for

Growth

“[Beehive] is our largest application in-house. It is

Oracle’s largest backend database.”

- Campbell Webb, Vice-President IT, Oracle

Page 60: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Sabre’s Daily API Call Volumes Are On Par

With Major Global Online Companies

5 billion API calls / day (April 2010)

13 billion API calls / day (May 2011)

5 billion API calls / day (October 2009)

267 million API calls / day (Q3 2009)

100 million API calls / day (March 2009)

53 million API –delivered stories / day (October 2010)

934 million API calls / day

API Billionaires Club

Source: www.readwriteweb.com

323 million API calls / day (January 2011)

Page 61: Oracle数据协云服凖器介绍不OLTP卐用 · 2 - IOPS – Based on read IO requests of size 8K running SQL. Note that the IO size greatly effects flash IOPS. Others quote IOPS

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 61

Q&A