Oracle Loader for Hadoop and Oracle Direct Connector for HDFS

罗海雄, Senior Technical Consultant

Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

Title: PowerPoint Presentation. Author: Maricel Lennon. Pages: 17.



Page 2: Oracle Big Data Connectors

• Oracle Loader for Hadoop

• Oracle Direct Connector for HDFS

• Oracle Data Integrator Application Adapter for Hadoop

• Oracle R Connector for Hadoop

Page 3: Oracle Big Data Appliance

[Diagram labels: Oracle Big Data Appliance, Oracle Exadata, Oracle Exalytics, InfiniBand (two links), Oracle Big Data Connectors. Data flow stages: acquire, organize, analyze and present.]

Page 4: Oracle Big Data Connectors

• Oracle Loader for Hadoop

• Oracle Direct Connector for HDFS

• Oracle Data Integrator Application Adapter for Hadoop

• Oracle R Connector for Hadoop

Page 5: Oracle Loader for Hadoop

• An optimized MapReduce utility for loading data into Oracle Database

• Partitions, sorts, and converts the data into an Oracle-readable format on Hadoop before loading

– Offers a choice of online or offline load options
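
The pre-processing described above (convert, partition, sort) can be pictured with a small in-memory sketch. This is only an illustration of the idea, not how Oracle Loader for Hadoop is implemented: the column list `COLUMNS`, the choice of `region` as the partitioning column, and the function names `map_record` and `shuffle_and_sort` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical target-table definition: (column name, converter from text).
COLUMNS = [("order_id", int), ("region", str), ("order_date", date.fromisoformat)]

def map_record(line, delimiter=","):
    """Map step: parse one delimited line into a typed row, emit (partition key, row)."""
    fields = line.rstrip("\n").split(delimiter)
    row = tuple(convert(value) for (_, convert), value in zip(COLUMNS, fields))
    return row[1], row  # partition on region (an assumed partitioning column)

def shuffle_and_sort(lines):
    """Shuffle/sort step: group rows by partition, then sort each group by primary key."""
    partitions = defaultdict(list)
    for line in lines:
        key, row = map_record(line)
        partitions[key].append(row)
    return {key: sorted(rows, key=lambda r: r[0]) for key, rows in partitions.items()}

lines = ["3,EAST,2011-10-02", "1,WEST,2011-10-01", "2,EAST,2011-10-01"]
prepared = shuffle_and_sort(lines)
```

After this step each partition holds typed rows in primary-key order, which is the shape a loader would want before writing to the database.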

Page 6: Oracle Loader for Hadoop: diagram

[Diagram labels: Input 1, Input 2, MAP, SHUFFLE/SORT, REDUCE, Oracle Loader for Hadoop, Database.]

Page 7: Oracle Loader for Hadoop: online load

[Diagram: MAP, SHUFFLE/SORT, and REDUCE tasks under the label Oracle Loader for Hadoop, annotated with the steps below.]

1. Read the target table's definition from the database.

2. Read the data, then partition, sort, and convert its format.

3. Connect from the reducer nodes to the database through the JDBC/OCI driver and load in parallel.
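
Steps 1 and 3 above can be sketched with Python's built-in `sqlite3` module standing in for the Oracle connection. This is an assumption-heavy illustration: the slides name JDBC/OCI, not SQLite; the `sales` table, the helper names `read_table_definition` and `load_partition`, and the literal reducer output are all hypothetical, and real reducers would load concurrently rather than in a loop.

```python
import sqlite3

# sqlite3 is a stand-in for the database connection (JDBC/OCI in the slides).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

def read_table_definition(conn, table):
    """Step 1: fetch column names and declared types of the target table."""
    info = conn.execute(f"PRAGMA table_info({table})")
    return [(name, col_type) for _, name, col_type, *_ in info]

def load_partition(conn, table, columns, rows):
    """Step 3: one reducer inserts its already-sorted rows as a single batch."""
    names = ", ".join(name for name, _ in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} ({names}) VALUES ({placeholders})", rows)

columns = read_table_definition(db, "sales")

# Step 2 output (partitioned, sorted, converted rows), written out literally here.
reducer_output = {
    "EAST": [(2, "EAST", 3.25), (3, "EAST", 10.5)],
    "WEST": [(1, "WEST", 7.0)],
}
for rows in reducer_output.values():
    load_partition(db, "sales", columns, rows)
db.commit()

row_count = db.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Reading the table definition first is what lets the job build the insert statement and convert values without the column list being hard-coded.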

Page 8: Oracle Loader for Hadoop: offline load

[Diagram: MAP, SHUFFLE/SORT, and REDUCE tasks under the label Oracle Loader for Hadoop, with DATA files, annotated with the steps below.]

1. Read the target table's definition.

2. Read the data, then partition, sort, and convert its format.

3. Generate Oracle Data Pump format files on the reducer nodes.

4. Copy the generated files to the database server.

4.1 Access them through Oracle Direct Connector for HDFS.

5. Load the data in parallel through external tables during the database's idle periods.

Page 9: Oracle Loader for Hadoop: advantages

Compared with Sqoop and OraOop:

• Shifts load from the database server to the Hadoop cluster:

– Converts the data into database format

– Assigns the data to specific partitions

– Sorts by primary key

• Generates binary Data Pump format files

• Balances load across reducers by partition
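
The slides do not say how reducers are balanced by partition. As a rough illustration of the idea only, the sketch below uses a generic greedy heuristic (largest partition first, always onto the least-loaded reducer); the function name `balance_partitions` and the partition names and row counts are made up for the example.

```python
import heapq

def balance_partitions(partition_sizes, num_reducers):
    """Assign partitions to reducers, largest first, always to the least-loaded reducer."""
    heap = [(0, reducer) for reducer in range(num_reducers)]  # (rows assigned, reducer id)
    heapq.heapify(heap)
    assignment = {reducer: [] for reducer in range(num_reducers)}
    for name, size in sorted(partition_sizes.items(), key=lambda item: -item[1]):
        load, reducer = heapq.heappop(heap)
        assignment[reducer].append(name)
        heapq.heappush(heap, (load + size, reducer))
    return assignment

# Hypothetical row counts per table partition.
sizes = {"P_2011_Q1": 900, "P_2011_Q2": 400, "P_2011_Q3": 350, "P_2011_Q4": 150}
plan = balance_partitions(sizes, 2)
```

With these sizes one reducer receives the single large partition and the other receives the three smaller ones, so both handle 900 rows.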

Page 10: Oracle Loader for Hadoop: input data formats

• Delimited text files

• Hive tables

– Hive managed (internal) or external tables

– Hive native or non-native tables

• Custom formats

Page 11: Oracle Loader for Hadoop: summary

• Key advantages

- Most operations run in the Hadoop cluster, so the load on the database stays low

- Offers online and offline load options

• Oracle Loader for Hadoop is not limited to Oracle Big Data Appliance

Page 12: Oracle Direct Connector for HDFS

• Accesses data files on HDFS directly, as external tables

• No need to move the files

Page 13: Oracle Direct Connector for HDFS: diagram

[Diagram labels: MAP, SHUFFLE/SORT, REDUCE, DATA files, HDFS External Table, SQL QUERY, ODCH, annotated with the steps below.]

1. Create the external table.

2. Generate location files that point to the HDFS file or set of files.

3. Access the external table.
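
Step 2 can be pictured as grouping HDFS file URIs into a fixed number of location files. The slides do not show the location-file format or how files are assigned, so the sketch below is only a guess at the general shape: the round-robin rule, the function name `build_location_files`, and the example paths are all assumptions.

```python
def build_location_files(hdfs_uris, num_location_files):
    """Spread HDFS file URIs round-robin over location files, one list per file."""
    locations = [[] for _ in range(num_location_files)]
    for index, uri in enumerate(sorted(hdfs_uris)):
        locations[index % num_location_files].append(uri)
    return locations

# Hypothetical reducer output files on HDFS.
uris = [f"hdfs:///user/demo/sales/part-{n:05d}" for n in range(5)]
location_files = build_location_files(uris, 2)
```

Each inner list corresponds to one location file; spreading the data files evenly gives every reader of the external table a similar amount of work.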

Page 14: Oracle Direct Connector for HDFS: advantages

• Direct access to files on HDFS (no FUSE plug-in required)

– Create an external table pointing to the file location on HDFS

– Access HDFS data directly from the database through SQL

– Load the data into the database with INSERT ... SELECT or CREATE TABLE AS SELECT

• Fast data access: parallel, optimized, automatically load balanced

• Data files can be:

– Delimited text files

– Data Pump files produced by Oracle Loader for Hadoop
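
The two load statements named above can be tried out with Python's built-in `sqlite3` module. This is a stand-in, not Oracle: the ordinary table `sales_ext` plays the role of the external table that would point at HDFS, and all table and column names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Stand-in for the external table; in the slides it would read files on HDFS.
db.execute("CREATE TABLE sales_ext (order_id INTEGER, region TEXT, amount REAL)")
db.executemany("INSERT INTO sales_ext VALUES (?, ?, ?)",
               [(1, "WEST", 7.0), (2, "EAST", 3.25)])

# Option 1: CREATE TABLE ... AS SELECT builds a new table from the query result.
db.execute("CREATE TABLE sales_ctas AS SELECT * FROM sales_ext")

# Option 2: INSERT ... SELECT appends selected rows to an existing table.
db.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)")
db.execute("INSERT INTO sales SELECT * FROM sales_ext WHERE region = 'EAST'")

ctas_rows = db.execute("SELECT COUNT(*) FROM sales_ctas").fetchone()[0]
east_rows = db.execute("SELECT order_id FROM sales").fetchall()
```

Either statement treats the external table like any other query source, which is why no separate load utility appears in this path.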

Page 15: Oracle Direct Connector for HDFS: summary

• Key advantages

- Direct access, with no extra steps

- Supports parallelism, load balancing, and similar features

• Oracle Direct Connector for HDFS is not limited to Oracle Big Data Appliance
