Oedo Ruby Conference 04: Is It Wrong to Talk About SQL at a Ruby Conference?

Minero Aoki

Theme of this session

Akira Matsuda said, "I'd like a technically deep topic."

What is a "deep talk"?

Talking about the Ruby implementation is no longer deep, so I will speak about another theme.

Big Data Analytics in 25 Minutes: In Memory of MapReduce

Suppose you must analyze about 100 TB of text data in total

A single CPU cannot handle 100 TB…

[Diagram: one computer holding the program and the data]

So let's do distributed processing: you need more computers

[Diagram: Node 0, Node 1, Node 2, and Node 3, each with its own program and data]

But distributed processing is tedious and too difficult…

That is where a parallel RDB can help you

[Diagram: a parallel RDB on Node 0, Node 1, Node 2, and Node 3, with a Front End and a Back End]

Parallel RDB is great because…

1. It gets linearly faster as you add nodes (linear scalability)
2. You can use standard SQL
3. It looks like a single computer to clients
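The front end / back end split can be sketched in a single process. This is only a minimal sketch under stated assumptions, not how any real product works: the `Click` row type, the method names `countByPageType` and `query`, and the hash distribution on `userId` are all hypothetical, and threads stand in for nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

class ParallelRdbSketch {
    // One row of a hypothetical "clicks" table.
    static final class Click {
        final int userId;
        final String pageType;
        Click(int userId, String pageType) {
            this.userId = userId;
            this.pageType = pageType;
        }
    }

    // Back end: a node aggregates only the rows in its own partition.
    static Map<String, Long> countByPageType(List<Click> partition) {
        Map<String, Long> partial = new TreeMap<>();
        for (Click c : partition) {
            partial.merge(c.pageType, 1L, Long::sum);
        }
        return partial;
    }

    // Front end: distribute rows by hash, run every node in parallel,
    // then merge the partial results so the client sees one answer.
    static Map<String, Long> query(List<Click> table, int nodes) {
        List<List<Click>> partitions = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Click c : table) {
            partitions.get(Math.floorMod(c.userId, nodes)).add(c);
        }

        List<CompletableFuture<Map<String, Long>>> futures = new ArrayList<>();
        for (List<Click> p : partitions) {
            futures.add(CompletableFuture.supplyAsync(() -> countByPageType(p)));
        }

        Map<String, Long> total = new TreeMap<>();
        for (CompletableFuture<Map<String, Long>> f : futures) {
            f.join().forEach((k, v) -> total.merge(k, v, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<Click> table = List.of(
            new Click(1, "home"), new Click(2, "home"), new Click(3, "search"),
            new Click(4, "product"), new Click(5, "search"), new Click(6, "home"));
        System.out.println(query(table, 4)); // prints {home=3, product=1, search=2}
    }
}
```

The caller only ever talks to `query`, which is the sense in which the cluster looks like one computer. The answer is the same for any node count, because each partial count is merged by addition.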

Parallel RDB is really great

Various commercial parallel RDBs

Database                        Vendor      Since
Teradata                        Teradata    1983
Teradata Aster                  Teradata    2005
PureData System for Analytics   IBM         2000
Exadata                         Oracle      2008
Greenplum                       Pivotal     2003
SQL Server PDW                  Microsoft   around 2010
Redshift                        Amazon      2012

Here Comes a New Challenger

Hadoop (since 2005)

Hadoop Architecture

HDFS: distributed file system
MapReduce: compute framework
(Hive: SQL interface)

Hadoop features

1. Data is stored as plain files, not tables
2. Processing uses MapReduce (or rather, it used to)
3. With Hive on top, you can also write queries in an SQL-like language

Everyone and their dog used MapReduce: "Big Data" meant MapReduce a few years ago

MapReduce

[Diagram: MapReduce data flow; surviving labels: k1, v1, Reduce]

You just write the Map and Reduce functions; Hadoop takes care of distributing the work.
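To make the division of labor concrete, here is a minimal single-process sketch of the idea. It is not Hadoop's API: the class `MiniMapReduce` and its `run` and `wordCount` methods are hypothetical names, and the grouping step runs in memory instead of across nodes. The user supplies only the two lambdas; `run` plays the role of the framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

class MiniMapReduce {
    // The "framework": applies the mapper to every input record, groups the
    // emitted values by key (the shuffle), then calls the reducer once per key.
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs,
            BiConsumer<I, BiConsumer<K, V>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        Map<K, List<V>> groups = new TreeMap<>();
        for (I input : inputs) {
            mapper.accept(input,
                (k, v) -> groups.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        Map<K, R> result = new TreeMap<>();
        for (Map.Entry<K, List<V>> e : groups.entrySet()) {
            result.put(e.getKey(), reducer.apply(e.getKey(), e.getValue()));
        }
        return result;
    }

    // The "user code": only a map function and a reduce function.
    static Map<String, Integer> wordCount(List<String> lines) {
        return MiniMapReduce.<String, String, Integer, Integer>run(
            lines,
            // Map: emit (word, 1) for every word in the line.
            (line, emit) -> {
                for (String w : line.trim().split("\\s+")) {
                    if (!w.isEmpty()) {
                        emit.accept(w, 1);
                    }
                }
            },
            // Reduce: sum the ones collected for each word.
            (word, ones) -> ones.stream().mapToInt(Integer::intValue).sum());
    }

    public static void main(String[] args) {
        // prints {hadoop=1, ruby=2, sql=2}
        System.out.println(wordCount(List.of("ruby sql ruby", "sql hadoop")));
    }
}
```

Because the mapper sees one record at a time and the reducer sees one key at a time, a real framework is free to run both on many machines without the user code changing.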

Q1. Which is better, SQL or MapReduce?

The business answer: why SQL?

1. Many people can write SQL but not Java
2. Many existing applications rely on SQL and will not run without it
3. Writing MapReduce functions is costly and takes more time

How big is the cost difference?

WordCount in MapReduce:

    package org.myorg;

    import java.io.IOException;
    import java.util.*;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class WordCount {

        // Map: emit (word, 1) for every token in the input line.
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> output,
                            Reporter reporter) throws IOException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    output.collect(word, one);
                }
            }
        }

        // Reduce: sum the counts collected for each word.
        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> output,
                               Reporter reporter) throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                output.collect(key, new IntWritable(sum));
            }
        }

        // Job setup: args[0] is the input path, args[1] is the output path.
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class);
            conf.setReducerClass(Reduce.class);

            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }

WordCount in SQL:

    select count(*)
    from (
        select regexp_split_to_table(str, '\s+')
        from text_table
    ) t

And in practice, SQL has won: SQL now beats MapReduce

Q2. Which is better, Hadoop or a parallel RDB?

For speed, a parallel RDB; for flexible data structures, Hadoop

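A typical configuration today

[Diagram; surviving label: MapReduce]

The configuration going forward

[Diagram; surviving labels: Impala frontend, Impala backend]

Hadoop now resembles a parallel RDB

[Diagram; surviving labels: parser, planner, backend, DB filesystem, Impala fe, Impala be]

Hybrid DBs are coming in the near future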

Q3. Is MapReduce dead?

Not yet… it is not over yet!

MapReduce is more extensible: you can plug Java or C code into the parallel processing

Example: Aster's SQL-MapReduce

You can combine MapReduce with SQL by calling a MapReduce function from a query:

    select count(distinct user_id)
    from npath(
        on clicks
        partition by user_id
        order by timestamp
        mode(overlapping)
        pattern('H.S.P')
        symbols(
            page_type = 'home' AS H,
            page_type = 'search' AS S,
            page_type = 'product' AS P)
        result(first(user_id of H) as user_id)
    )

npath was recently added to Hive as well (MatchPath).
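What the npath query above computes can be sketched procedurally. This is a single-process illustration under stated assumptions, not Aster's or Hive's implementation: the `Click` class and the `symbol` and `countUsersWithPath` names are hypothetical, and the pattern is read as "home, then search, then product, on consecutive clicks".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PathMatchSketch {
    // One row of the hypothetical "clicks" table.
    static final class Click {
        final int userId;
        final long timestamp;
        final String pageType;
        Click(int userId, long timestamp, String pageType) {
            this.userId = userId;
            this.timestamp = timestamp;
            this.pageType = pageType;
        }
    }

    // symbols(...): map a page type to a one-letter symbol, '-' for anything else.
    static char symbol(String pageType) {
        switch (pageType) {
            case "home": return 'H';
            case "search": return 'S';
            case "product": return 'P';
            default: return '-';
        }
    }

    // Counts the distinct users whose click path contains the given symbol sequence.
    static long countUsersWithPath(List<Click> clicks, String sequence) {
        // partition by user_id
        Map<Integer, List<Click>> byUser = new TreeMap<>();
        for (Click c : clicks) {
            byUser.computeIfAbsent(c.userId, k -> new ArrayList<>()).add(c);
        }

        long matched = 0;
        for (List<Click> rows : byUser.values()) {
            // order by timestamp
            rows.sort(Comparator.comparingLong((Click c) -> c.timestamp));
            StringBuilder path = new StringBuilder();
            for (Click c : rows) {
                path.append(symbol(c.pageType));
            }
            // The user matches if the sequence occurs anywhere in the path.
            if (path.indexOf(sequence) >= 0) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<Click> clicks = List.of(
            new Click(1, 10, "home"), new Click(1, 20, "search"), new Click(1, 30, "product"),
            new Click(2, 10, "home"), new Click(2, 20, "product"),
            new Click(3, 30, "product"), new Click(3, 10, "home"), new Click(3, 20, "search"));
        System.out.println(countUsersWithPath(clicks, "HSP")); // prints 2
    }
}
```

Each user's partition is processed independently, which is why this kind of function parallelizes well: a partition can be handed to whichever node holds that user's rows.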

Easy & handy SQL + extensible MapReduce

Finally, a poem

Good things are good: great products exist everywhere,

but knowledge is unevenly distributed

OSS World      Enterprise World
Hadoop         Parallel RDB
Python         Excel
markdown       VB
               Windows

Cross the border
