8. Pig commands

   Load/store:       LOAD, STORE
   Data processing:  REGEX_EXTRACT, FILTER, FOREACH, GROUP, JOIN, UNION, SPLIT
   Aggregation:      AVG, COUNT, MAX, MIN, SIZE
   Math:             ABS, RANDOM, ROUND
   String:           INDEXOF, SUBSTRING, REGEX_EXTRACT
   Debug:            DUMP, DESCRIBE, EXPLAIN, ILLUSTRATE
   HDFS:             cat, ls, cp, mkdir

   $ pig -x local
   grunt> A = LOAD 'file1' AS (x, y, z);
   grunt> B = FILTER A BY y > 10000;
   grunt> STORE B INTO 'output';
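The three grunt statements above form a load, filter, store pipeline. As a rough illustration of what they compute, here is a minimal Python sketch. It is not Pig itself: the sample rows are hypothetical, and tab-separated input is assumed because PigStorage defaults to a tab delimiter when no USING clause is given.

```python
import csv
import io


def filter_rows(rows, threshold=10000):
    """Mirror of `B = FILTER A BY y > 10000;`: keep rows whose y exceeds the threshold."""
    return [row for row in rows if row[1] > threshold]


# Hypothetical stand-in for file1: tab-separated integer columns x, y, z.
SAMPLE = "1\t500\t7\n2\t20000\t8\n3\t15000\t9\n"

# LOAD: parse each line into an (x, y, z) tuple.
A = [tuple(int(v) for v in rec)
     for rec in csv.reader(io.StringIO(SAMPLE), delimiter="\t")]

# FILTER: keep only rows with y > 10000.
B = filter_rows(A)

# STORE: write the surviving rows back out as tab-separated text.
out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(B)
print(out.getvalue(), end="")
```

Only the second and third sample rows survive the filter, since the first has y = 500.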
10. Pig example

    Pig Latin:
      A = LOAD 'file1.txt' USING PigStorage(',') AS (nm, dp, id);
      B = LOAD 'file2.txt' USING PigStorage(',') AS (id, dt, hr);
      C = FILTER B BY hr > 8;
      D = JOIN C BY id, A BY id;
      E = GROUP D BY A::id;
      F = FOREACH E GENERATE $1.dp, group, $1.nm, AVG($1.hr);
      STORE F INTO '/tmp/pig_output/';

    Logical plan (operator and output schema):
      LOAD file1.txt   (nm, dp, id)
      LOAD file2.txt   (id, dt, hr)
      FILTER           (id, dt, hr)
      JOIN             (id, dt, hr, nm, dp, id)
      GROUP            (group, {(id, dt, hr, nm, dp, id)})
      FOREACH          (group, ..., AVG(hr))
      STORE            (dp, group, nm, hr)

    Sample data (the nm and dp values are not preserved in this transcript):
      file1.txt (nm, dp, id)     file2.txt (id, dt, hr)
      .., .., A1                 A1, 7/7, 13
      .., .., B1                 A1, 7/8, 12
      .., .., B2                 A1, 7/9, 4

    Result: A1 12.5

    Tips: run with pig -x local; inspect relations with DUMP or ILLUSTRATE.
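To make the data flow of this script easier to follow, the sketch below replays each Pig statement in plain Python on the slide's sample rows. It is an illustration under assumptions, not how Pig executes the plan: the nm and dp values ("name1", "dept1", and so on) are hypothetical placeholders because the originals are not preserved, and the sample table is read as file1 holding ids A1, B1, B2 and file2 holding three rows for A1.

```python
from collections import defaultdict

# file1.txt rows (nm, dp, id); nm and dp are hypothetical placeholders.
A = [("name1", "dept1", "A1"), ("name2", "dept2", "B1"), ("name3", "dept3", "B2")]
# file2.txt rows (id, dt, hr), taken from the sample table.
B = [("A1", "7/7", 13), ("A1", "7/8", 12), ("A1", "7/9", 4)]

# C = FILTER B BY hr > 8;
C = [row for row in B if row[2] > 8]

# D = JOIN C BY id, A BY id;  inner join, fields of C first:
# (id, dt, hr, nm, dp, id)
D = [c + a for c in C for a in A if c[0] == a[2]]

# E = GROUP D BY A::id;  one bag of joined rows per id.
E = defaultdict(list)
for row in D:
    E[row[5]].append(row)

# F = FOREACH E GENERATE $1.dp, group, $1.nm, AVG($1.hr);
F = [
    ([r[4] for r in bag], key, [r[3] for r in bag],
     sum(r[2] for r in bag) / len(bag))
    for key, bag in E.items()
]

for dp, key, nm, avg_hr in F:
    print(key, avg_hr)  # prints: A1 12.5
```

The row with hr = 4 is dropped by the filter, so the average is taken over 13 and 12, which gives the 12.5 shown on the slide. Ids B1 and B2 have no matching rows in file2 and disappear in the inner join.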
11. DB table, CSV, Hadoop HDFS; Pig, MapReduce, SQL DB; SQL Server, Hive
13. Hive

    Interfaces: CLI, Web UI, API (JDBC and ODBC)
    Thrift Server (hiveserver): client API
    Query language: HiveQL
    Metastore: DB, table, partition

    Figure source: http://blog.cloudera.com/blog/2013/07/how-hiveserver2-brings-security-and-concurrency-to-apache-hive
14. Hive commands

    $ hive
    hive> CREATE TABLE A (x INT, y INT, z INT);
    hive> LOAD DATA LOCAL INPATH 'file1' INTO TABLE A;
    hive> SELECT * FROM A WHERE y > 10000;
    hive> INSERT OVERWRITE TABLE B SELECT * FROM A WHERE y > 10000;

    Table B must already exist with a matching schema before the INSERT runs.

    Figure source: http://hortonworks.com/blog/stinger-phase-2-the-journey-to-100x-faster-hive/
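15. Hive example

    HiveQL:
      > CREATE TABLE A (nm STRING, dp STRING, id STRING)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
      > CREATE TABLE B (id STRING, dt STRING, hr INT)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
      > CREATE TABLE final (id STRING, dp ARRAY<STRING>, nm ARRAY<STRING>, avg DOUBLE);
      > LOAD DATA INPATH 'file1' INTO TABLE A;
      > LOAD DATA INPATH 'file2' INTO TABLE B;
      > INSERT OVERWRITE TABLE final
          SELECT a.id, collect_set(a.dp), collect_set(a.nm), avg(b.hr)
          FROM a, b
          WHERE b.hr > 8 AND b.id = a.id
          GROUP BY a.id;

    The comma delimiter assumes the same comma-separated input files as the Pig example. collect_set returns an array and avg returns a double, so the columns of final are declared in the SELECT order with those types. dt is kept as STRING because values such as 7/7 are not in Hive's yyyy-MM-dd DATE format.

    Sample data (the nm and dp values are not preserved in this transcript):
      A (nm, dp, id)     B (id, dt, hr)
      .., .., A1         A1, 7/7, 13
      .., .., B1         A1, 7/8, 12
      .., .., B2         A1, 7/9, 4

    Result: A1 12.5

    Tips: create table & load data tool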
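The join, filter, and group-by logic of this query can be tried without a Hive installation. The sketch below runs an equivalent query in SQLite through Python's sqlite3 module, purely as a stand-in. The assumptions are stated loudly: SQLite is not Hive, GROUP_CONCAT(DISTINCT ...) is used here in place of Hive's collect_set (it yields a comma-joined string rather than an array), and the nm and dp values are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (nm TEXT, dp TEXT, id TEXT);
    CREATE TABLE b (id TEXT, dt TEXT, hr INTEGER);
""")

# Hypothetical nm/dp values; ids and hours follow the sample table.
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [("name1", "dept1", "A1"), ("name2", "dept2", "B1"), ("name3", "dept3", "B2")],
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?, ?)",
    [("A1", "7/7", 13), ("A1", "7/8", 12), ("A1", "7/9", 4)],
)

# Same shape as the HiveQL SELECT; GROUP_CONCAT(DISTINCT ...) stands in
# for collect_set, which SQLite does not provide.
rows = conn.execute("""
    SELECT a.id, GROUP_CONCAT(DISTINCT a.dp), GROUP_CONCAT(DISTINCT a.nm), AVG(b.hr)
    FROM a, b
    WHERE b.hr > 8 AND b.id = a.id
    GROUP BY a.id
""").fetchall()

print(rows)  # [('A1', 'dept1', 'name1', 12.5)]
conn.close()
```

Only the rows with hr of 13 and 12 pass the filter, so the average for A1 is 12.5, matching the result on the slide.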
16. HiveQL vs. SQL

                               Hive         RDBMS
    Query language             HQL          SQL
    Data storage               HDFS         Raw device or local FS
    Execution                  MapReduce    Executor
    (label not preserved)      No           Yes
    Index                      Index, Bitmap index

    Source: http://sishuok.com/forum/blogPost/list/6220.html
17. Pig vs Hive

    Feature          Hive             Pig
    Language         SQL-like         Pig Latin
    Schemas/Types    Yes              Yes
    Partitions       Yes              No
    Thrift Server    Yes              No
    Web Interface    Yes              No
    JDBC/ODBC        Yes (limited)    No
    HDFS             No               Yes

    Hive, Pig, Big Data

    Source: http://f.dataguru.cn/thread-33553-1-1.html
37. Pig example result

    A = LOAD '/user/waue/pig_input/file1.txt' USING PigStorage(',') AS (nm, dp, id);
    B = LOAD '/user/waue/pig_input/file2.txt' USING PigStorage(',') AS (id, dt, hr);
    C = FILTER B BY hr > 8;
    D = JOIN C BY id, A BY id;
    E = GROUP D BY A::id;
    F = FOREACH E GENERATE $1.dp, group, $1.nm, AVG($1.hr);
    STORE F INTO '/tmp/pig_output/';
38. Hive example result

    INSERT OVERWRITE TABLE final
      SELECT a.id, collect_set(a.dp), collect_set(a.nm), avg(b.hr)
      FROM a, b
      WHERE b.hr > 8 AND b.id = a.id
      GROUP BY a.id;