

SIGMOD/PODS 2009, Providence, Rhode Island, USA

Generating Example Data for Dataflow Programs: Christopher Olston (Yahoo! Research), Shubham Chopra (Yahoo! Research), Utkarsh Srivastava (Yahoo! Research)
An Architecture for Recycling Intermediates in a Column-store: Milena G. Ivanova, Martin L. Kersten, Niels J. Nes, Romulo A. P. Gonçalves (Centrum Wiskunde en Informatica, Netherlands)
A Comparison of Approaches to Large-Scale Data Analysis: Andrew Pavlo (Brown U.), Erik Paulson (Wisconsin U.), Alexander Rasin (Brown U.), Daniel J. Abadi (Yale U.), David J. DeWitt (Microsoft), Samuel Madden (MIT), Michael Stonebraker (MIT)
Query Processing Techniques for Solid State Drives: Dimitris Tsirogiannis (Toronto U.), Stavros Harizopoulos (HP Labs), Mehul A. Shah (HP Labs), Janet L. Wiener (HP Labs), Goetz Graefe (HP Labs)
Cost Based Plan Selection for XPath: Haris Georgiadis, Minas Charalambides, Vasilis Vassalos (Athens University of Economics and Business, Greece)
3-HOP: A High-Compression Indexing Scheme for Reachability Query: Ruoming Jin, Yang Xiang, Ning Ruan, David Fuhry (Kent State U.)
40 Years of Relational Model Celebration: Philip A. Bernstein (Microsoft), Ron Fagin (IBM), Michael Stonebraker (MIT), Patricia Selinger (IBM), David J. DeWitt (Microsoft)
A Web of Concepts: Nilesh Dalvi, Ravi Kumar, Bo Pang, Raghu Ramakrishnan, Andrew Tomkins, Philip Bohannon, Sathiya Keerthi, Srujana Merugu
Enterprise Applications - OLTP and OLAP - Share One Database Architecture: Hasso Plattner (Hasso-Plattner-Institute for IT Systems Engineering)
Transforming Data Access Through Public Visualization: Fernanda B. Viegas (IBM), Martin Wattenberg (IBM)
Codd Innovations Award

SIGMOD-J 2009/9/15

SIGMOD/PODS 2009
ACM SIGMOD/PODS Conference: the 35th SIGMOD International Conference on Management of Data and the 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
2009/6/29-7/2, http://www.sigmod09.org/

Providence, Rhode Island, USA: 4,002 km / 4,017 km
State of Rhode Island and Providence Plantations (2006)

SIGMOD 2009 PC1783 PC 123
Research Track 5264303063 15.75%
SIGMOD

Benjamin Bercovitz, Filip Kaliszan, Georgia Koutrika, Henry Liou, Zahra Mohammadi Zadeh, Hector Garcia-Molina (Stanford U.): "CourseRank: a social system for course planning" (http://www.courserank.com/)
http://db.csail.mit.edu/sigmod09contest/
Main Memory Transactional Index: Clément Genzmer (École polytechnique, France): "Innovative, hash-based main-memory index" (http://ks36587.kimsufi.com/implementation.htm)
SIGMOD'08 Wiki: http://www.pubzone.org

Keynote Talks
SIGMOD: Hasso Plattner (Hasso-Plattner-Institute for IT Systems Engineering), "Enterprise Applications - OLTP and OLAP - Share One Database Architecture"
SIGMOD: Fernanda B. Viegas (IBM) and Martin Wattenberg (IBM), "Transforming Data Access Through Public Visualization"
PODS: "A Web of Concepts", Nilesh Dalvi, Ravi Kumar, Bo Pang, Raghu Ramakrishnan, Andrew Tomkins, Philip Bohannon, Sathiya Keerthi, Srujana Merugu

SIGMOD Awards
SIGMOD Edgar F. Codd Innovations Award: Masaru Kitsuregawa (Univ. of Tokyo)
SIGMOD Contributions Award: Beng Chin Ooi (National University of Singapore)
SIGMOD Test-of-Time Award: Jeffrey Vitter and Min Wang, "Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets"
SIGMOD Best Paper Award: Christopher Olston, Shubham Chopra, Utkarsh Srivastava (Yahoo! Research), "Generating Example Data for Dataflow Programs"
SIGMOD Best Paper Award Runner-up: Milena G. Ivanova, Martin L. Kersten, Niels J. Nes, Romulo A. P. Gonçalves (Centrum Wiskunde en Informatica, Netherlands), "An Architecture for Recycling Intermediates in a Column-store"

PODS Awards
PODS Alberto O. Mendelzon Test-of-Time Award: Georg Gottlob, Nicola Leone, Francesco Scarcello, "Hypertree Decompositions and Tractable Queries"
PODS Best Paper Award: Georg Gottlob (Oxford U.), Stephanie Tien Lee (Oxford U.), Gregory J. Valiant (UC Berkeley), "Size and Treewidth Bounds for Conjunctive Queries"
PODS Best Student Paper Award: Pawel Parys (Warsaw University), "XPath Evaluation in Linear Time with Polynomial Combined Complexity"

45: for
39: and
30: data
29: a
27: in
26: of
19: query
16: with, the
13: queries, on
10: search
9: databases
8: over
7: xml, to, efficient, database
6: web, probabilistic, optimization, dynamic, applications, an
5: using, uncertain, processing, memory, management, indexing, from, based, approach, analysis
4: system, social, skyline, privacy, join, interactive, information, extraction, exploration, entity, enterprise, distributed, comparison
3: xpath, top-k, systems, space, schemas, schema, querying, programs, plan, partitioning, neighbor, nearest, matching, maintenance, keyword, indexes, framework, flash, exploring, exploiting, design, data:, cost, computation, combining, by, bounds, architecture, answering
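The counts above appear to tally words across paper titles. Here is a minimal Python sketch of such a tally. It assumes lowercase whitespace tokenization with punctuation kept, which would explain why `data:` shows up as its own entry. The three titles are only a sample from the program.

```python
from collections import Counter

def title_word_counts(titles):
    """Count lowercase, whitespace-separated tokens across paper titles."""
    counts = Counter()
    for title in titles:
        counts.update(title.lower().split())  # punctuation is kept on purpose
    return counts

titles = [
    "Generating Example Data for Dataflow Programs",
    "An Architecture for Recycling Intermediates in a Column-store",
    "A Comparison of Approaches to Large-Scale Data Analysis",
]
for word, n in title_word_counts(titles).most_common(3):
    print(n, word)
```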

614

SIGMOD/PODS 2010: 2010/6/6-11

Panel: 40 Years of Relational Model Celebration
Philip A. Bernstein (Microsoft), Ronald Fagin (IBM), Michael Stonebraker (MIT), Patricia Selinger (IBM), David J. DeWitt (Microsoft)

Philip Bernstein (Microsoft Research)
Data Processing in 1969: COBOL
1950 / 1950 / COBOL / API

1960 / DB: Informatics Mark-IV (4); General Electric IDS (Integrated Data Store), Charlie Bachman; CODASYL
1970 / DB: IBM IMS (Information Management System); CODASYL DBTG: IDS, TDMS (System Development Corp.), MARS VI (CDC), MRI System 2000, ADABAS (Software AG)
Codd

"Evolution of Data-Base Management Systems", J. P. Fry and E. H. Sibley, ACM Computing Surveys, March 1976
"A Relational Model of Data for Large Shared Data Banks", Edgar F. Codd, CACM, June 1970
A Short Video

Sharon Codd, Bobri Roberts (Digital Born Productions)

Ted Codd: Ronald Fagin (IBM Research)
Codd / Codd / 1975 / IBM Watson / IBM San Jose / Ted Codd / Codd
W. W. Armstrong, "Dependency Structures of Data Base Relationships", IFIP Conf. Proc. 1974, p. 580.
Philip Bernstein: P. A. Bernstein, "Synthesizing Third Normal Form Relations from Functional Dependencies", ACM Trans. Database Syst. (1976).
DB: "Functional Dependencies in a Relational Database and Propositional Logic", IBM J. Research and Development, 1977 (one of the IBM Journals' 50 "significant papers"); Jeffrey D. Ullman
Codd / Codd / Null / 3 / NULL / Codd / Null / Frank King / Null / System R
Codd / Ted / Teddy
Codd in two words: Visionary, Fighter
Codd: Relational vs. CODASYL (1974), the Codd and Bachman "great debate"; Relational vs. IMS
Remembering Ted Codd: Michael Stonebraker (MIT)
1975 / RDBMS / CODASYL / CODASYL
The Great Debate: 1974 ACM SIGFIDET Workshop, E. F. Codd and C. W. Bachman
Codd / Bachman / DB / IDS / CODASYL / 1973 / CODASYL / 1970
CODASYL / CODASYL / 1978

CODASYL / Codd

Codd / Bachman / DB
IBM IMS (Information Management System): 1970 / 1980 / IBM / DB / IBM / IMS / MVS / IBM / OS

IMS / IMS / DB / IMS / BL/1

IMS / RDBMS / 1-n / Join / DB
IMS? / PL/1 / IBM / Christopher J. Date / SQL

SQL

Remembering System R: Patricia Selinger (IBM Research)

12 / DB

12 / 12...
System R: 1973, IBM San Jose Research / DBMS / SQL / DDL / System R / DB2 / DBMS
SEQUEL: "SEQUEL: A Structured English Query Language", IBM San Jose Research Report, 1974
Update / Group / View / Authorization
1977 SQL / SQL / System R / DB2
1986 ISO / 1989 / 1992 / 1999 / 2003

System R: Pat Selinger, Jim Gray / System R / DB2 / Raymond Lorie

25664GB / 15 / SQL / 32KB / 4 / SQL
1970-1985 / 15: David J. DeWitt (Microsoft)
DB / 100: Teradata, Greenplum, Vertica, DATAllegro (MS), Aster Data, IBM, ...
UGLY, Truly UGLY

1970: "A Relational Model of Data for Large Shared Data Banks", E. F. Codd
"Logic per Track Devices", D. L. Slotnick, Advances in Computers, pp. 291-296

1970
(1975)

Slotnick / CPU / DB / "Processor-per-track (PPT)" (diagram: one CPU per track)
PPT: 1973 CASSM (Florida), 1975 RAP (Toronto), 1976 RARES (Utah) / DB

PPT / 1970

25KB / 1000 / O(n²) / n
PPH-based DB: Processor-per-head (PPH) / DBC / SURE / RAID-5 Controller (diagram: one CPU per disk head)
1970: RAP2, DIRECT, Infoplex, RDBM, DBMAC

1985 / 15 / VLSI / 1985 / DB / CPU, (Ethernet, TCP/IP, RPC) / Grace hash join
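Grace hash join is only named on the slide. The following minimal in-memory Python sketch is a reminder of how it works. Both inputs are partitioned by a hash of the join key, and then each pair of matching partitions is joined with an ordinary in-memory hash table. A real implementation writes the partitions to disk. The partition count and tuple layout here are arbitrary choices for illustration.

```python
from collections import defaultdict

def grace_hash_join(left, right, left_key, right_key, num_partitions=4):
    """Equi-join two lists of tuples: partition both by hash(key), then join partition pairs."""
    # Phase 1 (partition): the same hash function routes matching keys
    # to the same partition number on both sides. A real system writes
    # each partition to disk; here partitions are in-memory lists.
    left_parts = [[] for _ in range(num_partitions)]
    right_parts = [[] for _ in range(num_partitions)]
    for row in left:
        left_parts[hash(row[left_key]) % num_partitions].append(row)
    for row in right:
        right_parts[hash(row[right_key]) % num_partitions].append(row)

    # Phase 2 (join): each partition pair is small enough to build an
    # in-memory hash table on one side and probe it with the other.
    result = []
    for lpart, rpart in zip(left_parts, right_parts):
        table = defaultdict(list)
        for lrow in lpart:
            table[lrow[left_key]].append(lrow)
        for rrow in rpart:
            for lrow in table.get(rrow[right_key], []):
                result.append(lrow + rrow)
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(grace_hash_join(r, s, left_key=0, right_key=0))
# [(2, 'b', 2, 'x'), (3, 'c', 3, 'y'), (3, 'c', 3, 'z')]
```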

Philip A. Bernstein: DB, Codd
Ron Fagin: Codd
Michael Stonebraker: CODASYL, IMS
Patricia Selinger: System R
David J. DeWitt: 15

Best Paper: Generating Example Data for Dataflow Programs
Christopher Olston (Yahoo! Research), Shubham Chopra (Yahoo! Research), Utkarsh Srivastava (Yahoo! Research)

Data Processing: TB/day at Yahoo! / MapReduce, Pig Latin, Dryad / Aurora, Tioga, River

Example Dataflow Program
  LOAD user, url
  LOAD url, pagerank
  TRANSFORM user, canonicalize(url)
  JOIN url = url
  GROUP by user
  TRANSFORM user, AVG(pagerank)
  FILTER avgPagerank > 0.5

Web / 1 / 2 UDF canonicalize / 3 FILTER
Example Data: Example data / TB / Example data

Example Data
  LOAD user, url: (Amy, cnn.com), (Amy, http://www.frogs.com), (Fred, www.snails.com/index.com)
  TRANSFORM user, canonicalize(url): (Amy, www.cnn.com), (Amy, www.frogs.com), (Fred, www.snails.com)
  LOAD url, pagerank: (www.cnn.com, 0.9), (www.frogs.com, 0.3), (www.snails.com, 0.4)
  JOIN url = url: (Amy, www.cnn.com, 0.9), (Amy, www.frogs.com, 0.3), (Fred, www.snails.com, 0.4)
  GROUP by user: (Amy, {(Amy, www.cnn.com, 0.9), (Amy, www.frogs.com, 0.3)}), (Fred, {(Fred, www.snails.com, 0.4)})
  TRANSFORM user, AVG(pagerank): (Amy, 0.6), (Fred, 0.4)
  FILTER avgPagerank > 0.5: (Amy, 0.6)
Example Example
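This Python sketch runs the same dataflow over the example tuples to make the example concrete. `canonicalize` is a hypothetical stand-in for the program's UDF: it drops the scheme and path and ensures a `www.` prefix. The JOIN assumes each URL appears once in the pagerank input. Neither detail comes from the slides.

```python
from collections import defaultdict

def canonicalize(url):
    """Hypothetical UDF: drop the scheme and any path, ensure a 'www.' host prefix."""
    if "://" in url:
        url = url.split("://", 1)[1]
    host = url.split("/", 1)[0]
    return host if host.startswith("www.") else "www." + host

visits = [("Amy", "cnn.com"), ("Amy", "http://www.frogs.com"),
          ("Fred", "www.snails.com/index.com")]
pages = [("www.cnn.com", 0.9), ("www.frogs.com", 0.3), ("www.snails.com", 0.4)]

# TRANSFORM user, canonicalize(url)
canon = [(user, canonicalize(url)) for user, url in visits]
# JOIN url = url (assumes URLs are unique in 'pages')
rank = dict(pages)
joined = [(user, url, rank[url]) for user, url in canon if url in rank]
# GROUP by user
groups = defaultdict(list)
for user, url, pr in joined:
    groups[user].append((user, url, pr))
# TRANSFORM user, AVG(pagerank)
averages = [(user, sum(t[2] for t in bag) / len(bag)) for user, bag in groups.items()]
# FILTER avgPagerank > 0.5
result = [(user, avg) for user, avg in averages if avg > 0.5]
print(canon)
print(result)
```

Only Amy's average survives the filter, which matches the last line of the example.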

Example Data: Realism (0 to 1)
  LOAD user, url: (Amy, cnn.com), (Amy, http://www.frogs.com), (Fred, www.snails.com/index.com)
  LOAD url, pagerank: (www.cnn.com, 0.9), (www.frogs.com, 0.3), (www.snails.com, 0.4)
Example Data: Completeness
  TRANSFORM user, AVG(pagerank): (Amy, 0.6), (Fred, 0.4)
  FILTER avgPagerank > 0.5: (Amy, 0.6)

FILTER: E0, E1 / JOIN: E0 / UNION: E1, E2 / 2 / Dangling tuple
Example Data: Completeness (0 to 1)
  TRANSFORM user, AVG(pagerank): (Amy, 0.6), (Fred, 0.4)
  FILTER avgPagerank > 0.5: (Amy, 0.6)

Example Data: Conciseness (0 to 1)
  LOAD user, url: (Amy, cnn.com), (Amy, http://www.frogs.com), (Fred, www.snails.com/index.com)
  TRANSFORM user, canonicalize(url): (Amy, www.cnn.com), (Amy, www.frogs.com), (Fred, www.snails.com)
  LOAD url, pagerank: (www.cnn.com, 0.9), (www.frogs.com, 0.3), (www.snails.com, 0.4)
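The slides score example data on realism, completeness and conciseness, each between 0 and 1. The Python sketch below uses simplified definitions that are assumed for illustration:

- Realism: the fraction of real tuples.
- Completeness: the average fraction of each operator's equivalence classes that are illustrated.
- Conciseness: the average of classes per example, capped at 1.

The paper's exact formulas may differ. The class labels follow the slide's FILTER E0/E1 and JOIN E0.

```python
from statistics import mean

def realism(examples):
    """Fraction of example tuples that come from real input data (not synthesized)."""
    return sum(1 for e in examples if e["real"]) / len(examples)

def completeness(classes_by_op, examples_by_op):
    """Per operator: fraction of its equivalence classes with at least one example; averaged."""
    scores = []
    for op, classes in classes_by_op.items():
        covered = {e["cls"] for e in examples_by_op.get(op, [])}
        scores.append(len(covered & set(classes)) / len(classes))
    return mean(scores)

def conciseness(classes_by_op, examples_by_op):
    """Per operator: min(1, #classes / #examples), treating 'no examples' as 1; averaged."""
    scores = []
    for op, classes in classes_by_op.items():
        n = len(examples_by_op.get(op, []))
        scores.append(1.0 if n == 0 else min(1.0, len(classes) / n))
    return mean(scores)

# FILTER avgPagerank > 0.5: class E0 (passes) and E1 (fails).
# JOIN: class E0 (tuples that find a join partner).
classes = {"FILTER": ["E0", "E1"], "JOIN": ["E0"]}
examples = {
    "FILTER": [{"cls": "E0", "real": True}, {"cls": "E1", "real": True}],  # (Amy, 0.6), (Fred, 0.4)
    "JOIN": [{"cls": "E0", "real": True}] * 3,  # the three joined tuples
}
all_examples = examples["FILTER"] + examples["JOIN"]
print(realism(all_examples), completeness(classes, examples), conciseness(classes, examples))
```

Under these definitions, the three JOIN tuples all illustrate the same class, so they pull conciseness below 1 even though realism and completeness are perfect.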

Example Example

1 Downstream Propagation

Realism

Completeness

Conciseness

2 Upstream Propagation

Realism

Completeness

Conciseness
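Downstream propagation pushes a sample of real input tuples through the program. Upstream propagation synthesizes tuples for cases the sample misses. Below is a minimal Python sketch for a single `age > 18` filter, borrowing the LOAD (user, age) example from the pruning slides. If every sampled tuple passes the filter, the sketch synthesizes one that fails and marks the free fields with `--`, as the slides do. It then fills those fields from a real tuple. The function names and the `threshold - 1` choice are our own, and the field is assumed to be an integer. The real algorithm handles general operators.

```python
UNCONSTRAINED = "--"  # the slides' notation for a field left free by synthesis

def downstream(sample, predicate):
    """Downstream propagation through one filter: split real tuples by outcome."""
    passing = [t for t in sample if predicate(t)]
    failing = [t for t in sample if not predicate(t)]
    return passing, failing

def upstream_for_filter(sample, field, threshold):
    """Upstream step for an integer filter 'field > threshold': if no real
    tuple fails, synthesize one that does, leaving other fields unconstrained."""
    _, failing = downstream(sample, lambda t: t[field] > threshold)
    if failing or not sample:
        return []  # failing class already illustrated, or tuple width unknown
    synthetic = [UNCONSTRAINED] * len(sample[0])
    synthetic[field] = threshold - 1  # any integer <= threshold fails the filter
    return [tuple(synthetic)]

def fill_unconstrained(t, real_tuple):
    """Replace '--' fields with values from a real tuple to keep the example realistic."""
    return tuple(r if v == UNCONSTRAINED else v for v, r in zip(t, real_tuple))

sample = [("Amy", 20), ("Fred", 25), ("Jack", 30)]
synthetic = upstream_for_filter(sample, field=1, threshold=18)
print(synthetic)                                              # [('--', 17)]
print([fill_unconstrained(t, sample[0]) for t in synthetic])  # [('Amy', 17)]
```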


Downstream Pruning / Upstream Pruning
  LOAD (user, age) / LOAD (user, age) / FILTER UDF(user) / UNION / FILTER age > 18
  (Amy, 20)(Fred, 25)(Jack, 30)(Amy, 20)(Fred, 25)(Amy, 20)(Fred, 25)(Jack, 30)(Amy, 20)(Fred, 25)(Jack, 30)
Pruning Example: NP

O(log n)
(Amy, 20)

(Jack, 30)

(Amy, 20)

(Amy, 20)(Jack, 30)

Downstream Pruning / Upstream Pruning: (Amy, 20)(Jack, 30)
  LOAD (user, age) / LOAD (user, age) / FILTER UDF(user) / UNION / FILTER age > 18
(Bob, 17)(--, 17)

(Bill, 17)(Bill, 17)(Bill, 17)(Bob, 17)(--, 17)

(--, 17)

(--, 17)

Downstream Pruning / Upstream Pruning Example: (Amy, 20)(Bill, 17)(Jack, 30)(Bob, 17)(Amy, 20)(Bill, 17)(Amy, 20)(Jack, 30)(Bill, 17)(Bob, 17)(Amy, 20)(Jack, 30)
  LOAD (user, age) / LOAD (user, age) / FILTER UDF(user) / UNION / FILTER age > 18
Pig ILLUSTRATE / Eclipse PigPen
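The pruning slides mention NP and an O(log n) bound. That combination is typical of set-cover-style problems, where a greedy choice gives a logarithmic approximation. This greedy Python sketch assumes pruning is framed as keeping a small subset of example tuples that still covers every equivalence class the full set covers. That framing is ours and is not spelled out on the slides, and the class labels in the example are hypothetical.

```python
def greedy_prune(coverage):
    """coverage maps each example tuple to the set of equivalence classes it
    illustrates. Repeatedly keep the tuple covering the most still-uncovered
    classes until every coverable class is covered (classic greedy set cover)."""
    uncovered = set().union(*coverage.values())
    kept = []
    while uncovered:
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        kept.append(best)
        uncovered -= coverage[best]
    return kept

# Hypothetical class labels for LOAD -> FILTER UDF -> UNION <- LOAD, then FILTER age > 18.
coverage = {
    ("Amy", 20): {"LOAD1", "UDF-pass", "UNION-left", "AGE-pass"},
    ("Fred", 25): {"LOAD1", "UDF-pass", "UNION-left", "AGE-pass"},
    ("Jack", 30): {"LOAD2", "UNION-right", "AGE-pass"},
    ("Bill", 17): {"LOAD2", "UNION-right", "AGE-fail"},
}
print(greedy_prune(coverage))  # [('Amy', 20), ('Bill', 17)]
```

Fred and Jack are dropped because Amy and Bill already illustrate every class they cover.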

Example Example

Program 3: Web / LOAD table C / FILTER by / GROUP / TRANSFORM using

Program 3
(Note: downstream run with 10,000 initial samples, the same as our algorithm.)
Program 8: Web / LOAD table A / FILTER A by / LOAD table B / JOIN A and B / TRANSFORM using 4

Program 8
(Note: investigate completeness with downstream.)
Upstream 3.5
example: Realism / Conciseness / Completeness / Realism


Best Paper Runner-up: An Architecture for Recycling Intermediates in a Column-store
Milena G. Ivanova, Martin L. Kersten, Niels J. Nes, Romulo A. P. Gonçalves (CWI)

Runner-up: Recycling Intermediates in a Column-store
Recycler / TPC-H / SkyServer

2: Tuple-at-a-time / Operator-at-a-time
MonetDB: Operator-at-a-time, Column-store / Recycler / MonetDB
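The slide contrasts tuple-at-a-time and operator-at-a-time execution. MonetDB evaluates operator-at-a-time over columns, so each operator leaves a complete intermediate result behind, and that result is what the Recycler can keep. Below is a toy Python sketch of the two styles. It is plain Python, not MonetDB code, and the inclusive range bounds are an assumption.

```python
# Tuple-at-a-time (iterator style): operators are generators that pass
# one row at a time to their parent; no full intermediate result exists.
def scan(rows):
    for row in rows:
        yield row

def select_iter(child, pred):
    for row in child:
        if pred(row):
            yield row

# Operator-at-a-time (column style): each operator consumes whole columns
# and materializes its complete result before the next operator starts.
def select_column(column, low, high):
    """Positions (oids) whose value lies in the inclusive range [low, high]."""
    return [oid for oid, v in enumerate(column) if low <= v <= high]

def fetch(column, oids):
    """Project another column onto the selected positions."""
    return [column[oid] for oid in oids]

dates = [5, 15, 30, 90]
names = ["a", "b", "c", "d"]
oids = select_column(dates, 10, 80)   # materialized intermediate: [1, 2]
print(fetch(names, oids))             # ['b', 'c']
rows = list(zip(names, dates))
print(list(select_iter(scan(rows), lambda r: 10 <= r[1] <= 80)))  # [('b', 15), ('c', 30)]
```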

MonetDB: SQL / Optimizer / MonetDB / MonetDB Server
  function user.s1_2(A0:date,...):void;
      X5 := sql.bind("sys","lineitem",...);
      X11 := algebra.uselect(X5,A3);
      X14 := algebra.markT(X11,0@0);
      X15 := bat.reverse(X14);
      X16 := sql.bindIdxbat("sys",...);
      X18 := algebra.join(X15,X16);
      X19 := sql.bind("sys","orders",...);
      X25 := mtime.addmonths(A1,A2);
MAL (MonetDB Assembly Language)

Recycle Optimizer: the same MAL plan as above


X1 := sql.bind("sys", "orders", "o_orderdate", 0);...

Name  Value     Data type            Size
X1    10        :bat[:oid, :date]
T1    "sys"     :str
T2    "orders"  :str
...
Y3 := sql.bind("sys", "orders", "o_orderdate", 0);

X1 / Y1 / X1

X3 := algebra.select(X1, 10, 80)...

Name  Value  Data type            Size
X1    10     :bat[:oid, :date]    1000
X3    130    :bat[:oid, :date]    700
X5    150    :bat[:oid, :date]    500
...
Y3 := algebra.select(X1, 20, 45)

10-80 / 20-45 / X3 / X1 / X3
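The following minimal Python sketch covers the two matching cases on these slides. The first is exact reuse of an identical instruction. The second is subsumption, where a new range selection is answered from a cached selection whose range covers it. Entry names and sizes mirror the slide's table (X1: 1000, X3: 700, X5: 500). Two rules are our assumptions: bounds are inclusive, and the smallest covering result is picked. The next slide does appear to pick X5 for the 20-45 range.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str    # variable holding the cached result, e.g. "X3"
    op: str      # MAL instruction, e.g. "algebra.select"
    args: tuple  # instruction arguments, e.g. ("X1", 10, 80)
    size: int    # number of tuples in the cached result

class RecyclePool:
    """Toy recycle pool: exact-match reuse plus range-select subsumption."""

    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def exact(self, op, args):
        """Return a cached entry for the identical instruction, if any."""
        for e in self.entries:
            if e.op == op and e.args == args:
                return e
        return None

    def subsuming_select(self, source, low, high):
        """Cached selects on the same source whose (inclusive) range covers
        [low, high]; pick the smallest, since re-filtering it is cheapest."""
        candidates = [e for e in self.entries
                      if e.op == "algebra.select" and e.args[0] == source
                      and e.args[1] <= low and high <= e.args[2]]
        return min(candidates, key=lambda e: e.size, default=None)

pool = RecyclePool()
pool.add(Entry("X1", "sql.bind", ("sys", "orders", "o_orderdate", 0), 1000))
pool.add(Entry("X3", "algebra.select", ("X1", 10, 80), 700))
pool.add(Entry("X5", "algebra.select", ("X1", 20, 60), 500))
print(pool.exact("sql.bind", ("sys", "orders", "o_orderdate", 0)).name)  # X1
print(pool.subsuming_select("X1", 20, 45).name)                          # X5
```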

X3 := algebra.select(X1, 10, 80)
X5 := algebra.select(X1, 20, 60)

Name  Value  Data type            Size
X1    10     :bat[:oid, :date]    1000
X3    130    :bat[:oid, :date]    700
X5    150    :bat[:oid, :date]    500
...
Y3 := algebra.select(X1, 20, 45)

X5 / X3 / X5 / X1 / X5 / X3 / X5 / Y3 20-45
Recycle Pool

sql.bind("C1") / sql.bind("C2") / algebra.select / algebra.join / ...
Query 1:
  X1 := sql.bind("C1")
  X2 := algebra.select(X1)
  X3 := sql.bind("C2")
  X4 := algebra.join(X2, X3)
Recycle Pool

sql.bind("C1") / sql.bind("C2") / algebra.select / algebra.join
Query 2: sql.bind("C3") / algebra.join / ...
  X1 := sql.bind("C1")
  X2 := algebra.select(X1)
  X3 := sql.bind("C2")
  X4 := algebra.join(X2, X3)
  X1 / X2 / X3 / X4
KEEPALL / CREDIT
  A B 2 (A:2, B:2)
  1 A B a b 1 (A:1, B:1)
  A' B' A' a B' b B' b' (A:1, B:0)
  0 A'' B'' a'' b''

A / B / Query plan
Least Recently Used (LRU)
Benefit Policy (BP): CPU, IO
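The slides name KEEPALL and CREDIT for admission, and LRU and a benefit policy (BP, weighing CPU and IO cost) for eviction. Below is a toy Python sketch of CREDIT admission with LRU eviction, following our reading of the credit diagram: each instruction starts with 2 credits, admitting a result spends one, and a reuse refunds one. It is not MonetDB's implementation, and sizes and capacity are in arbitrary units.

```python
from collections import OrderedDict

class CreditRecycler:
    """Toy recycler: CREDIT admission plus LRU eviction under a size budget."""

    def __init__(self, capacity, initial_credits=2):
        self.capacity = capacity                # max total size of cached results
        self.initial_credits = initial_credits  # credits per instruction signature
        self.credits = {}                       # signature -> remaining credits
        self.pool = OrderedDict()               # signature -> size, oldest use first
        self.used = 0

    def lookup(self, sig):
        """On a hit, mark the entry most recently used and refund one credit."""
        if sig not in self.pool:
            return False
        self.pool.move_to_end(sig)
        self.credits[sig] += 1
        return True

    def admit(self, sig, size):
        """Called after a miss: cache the result only if a credit remains."""
        credits = self.credits.setdefault(sig, self.initial_credits)
        if credits == 0 or size > self.capacity:
            return False
        self.credits[sig] = credits - 1
        while self.used + size > self.capacity:   # LRU eviction
            _, old_size = self.pool.popitem(last=False)
            self.used -= old_size
        self.pool[sig] = size
        self.used += size
        return True

r = CreditRecycler(capacity=1000)
r.admit("A", 400)
r.admit("B", 400)
r.lookup("A")        # A becomes most recently used
r.admit("C", 400)    # evicts B, the least recently used entry
print(list(r.pool))  # ['A', 'C']
```

A benefit policy would differ only in which victim is chosen: it would evict the entry with the lowest estimated benefit instead of the least recently used one.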

TPC-H Scale Factor 1: 1GB / KEEPALL / CREDIT / LRU / BP
Intra-query (local) / Inter-query (global)

CREDIT

Q19 (mix): Credit, Hit ratio / Q18 (inter-query): 100% / Q11 (intra-query): hit ratio local, Credit

4GB 42.7%521928%TPC-H 200

Naive / LRU / BP / CREDIT / KEEPALL / 40%
SkyServer Evaluation: SkyServer / Jim Gray / Data Release 4 / 100GB / 200811001502254

1100 / 1.5GB / 95.6% / 14296785
500 / NAIVE 4057 / KEEPALL/UNLIMITED 17 / CRD/LRU/1GB 1433
SkyServer / TPC-H / MapReduce




Newport


SIGMOD

A Web of Concepts
Nilesh Dalvi, Ravi Kumar, Bo Pang, Raghu Ramakrishnan, Andrew Tomkins, Philip Bohannon, Sathiya Keerthi, Srujana Merugu

PODS Keynote: A Web of Concepts
Web / Web / Dish-Dash / "index" / Dish-Dash

1 / yelp.com / 59% / URL / 11% / 19% / Web
URL 59% / URL 35% / URL / Web 42% / 11.5% / 9% / 1% / 10%

Y!

Google


W FIFA

Index

URL / Concept-centric

Web
Web / e.g. / e.g.
3 / e.g. / vs.

Concept-
Concepts / Web / e.g. ID / e.g. NFL ID / e.g.

ER
Loosely-structured Record (lrec): ID / DBMS / lrec / lrec

Lrec / Web / Lrec, e.g. Instance-of, is-part-of
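This Python sketch makes the lrec idea concrete. It models a record with a stable ID and open-ended, multi-valued attributes, plus typed links such as instance-of and is-part-of between records. The class names and the ID scheme are assumptions made for illustration, as is treating Dish-Dash (from the earlier slide) as a restaurant.

```python
from dataclasses import dataclass, field

@dataclass
class LRec:
    """Loosely-structured record: a stable ID plus open-ended attributes,
    each of which may hold several (possibly conflicting) values."""
    id: str
    attrs: dict = field(default_factory=dict)  # attribute name -> list of values

    def add(self, name, value):
        self.attrs.setdefault(name, []).append(value)

@dataclass
class ConceptWeb:
    records: dict = field(default_factory=dict)  # id -> LRec
    edges: set = field(default_factory=set)      # (source id, relation, target id)

    def put(self, rec):
        self.records[rec.id] = rec

    def relate(self, src, relation, dst):
        self.edges.add((src, relation, dst))

    def related(self, src, relation):
        return sorted(d for s, r, d in self.edges if s == src and r == relation)

web = ConceptWeb()
shop = LRec("restaurant/dish-dash")
shop.add("name", "Dish-Dash")
shop.add("name", "Dish Dash")  # alternate spelling kept side by side
web.put(shop)
web.relate("restaurant/dish-dash", "instance-of", "concept/restaurant")
web.relate("menu/dish-dash", "is-part-of", "restaurant/dish-dash")
print(web.related("restaurant/dish-dash", "instance-of"))  # ['concept/restaurant']
```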

4 / e.g. Semantic Web / Yahoo! SearchMonkey

RAW: For years, Microsoft Corporation CEO Bill Gates was against open source. But today he appears to have changed his mind. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access."
Richard Stallman, founder of the Free Software Foundation, countered saying

People
Name              Title    Organization
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  Founder  Free Software...

SELECT Name FROM People WHERE Organization = 'Microsoft'
Result: Bill Gates, Bill Veghte

Web: sanjose.com / citysearch / yelp / wikipedia
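The extraction example ends with an ordinary SQL query over the extracted table. This Python sketch uses the standard `sqlite3` module to load the three rows by hand and run the slide's query. The extraction step itself is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT, Title TEXT, Organization TEXT)")
# Rows as extracted on the slide; the last organization is elided there.
conn.executemany("INSERT INTO People VALUES (?, ?, ?)", [
    ("Bill Gates", "CEO", "Microsoft"),
    ("Bill Veghte", "VP", "Microsoft"),
    ("Richard Stallman", "Founder", "Free Software..."),
])
rows = conn.execute(
    "SELECT Name FROM People WHERE Organization = 'Microsoft'").fetchall()
print(sorted(name for (name,) in rows))
```

Without an ORDER BY, the row order is not guaranteed, so the result is sorted before printing.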

Avatar (IBM Almaden)
DeepWeb & WebTables (Google)
DBLife (U. Wisconsin)
KnowItAll & TextRunner (U. Washington)
Rexa (U. Mass)

vs. / e.g.
UNIVERSITY OF CHICAGO, Department of Mathematics
GROUP THEORY SEMINAR
May 1  J. Alperin  "MacKey conjecture and Hecke algebras"
May 8  G. Lehrer  "Endomorphism algebras of tensor powers of quantum group representations of at roots of unity"
vs.
e.g. HTML table, list / e.g.
Accepted papers / Web / DBLP
XPath
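The seminar page is the kind of list- or table-shaped HTML that a wrapper can extract from with XPath-like paths. The markup below is hypothetical, since the slide shows only the rendered page. Python's `xml.etree.ElementTree` supports only a small XPath subset and needs well-formed input. Real-world HTML usually needs an HTML parser such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup for the seminar listing.
PAGE = """<html><body>
<h2>GROUP THEORY SEMINAR</h2>
<table>
  <tr><td>May 1</td><td>J. Alperin</td><td>MacKey conjecture and Hecke algebras</td></tr>
  <tr><td>May 8</td><td>G. Lehrer</td><td>Endomorphism algebras of tensor powers of quantum group representations of at roots of unity</td></tr>
</table>
</body></html>"""

def extract_talks(page):
    """A simple wrapper: every 3-cell table row becomes (date, speaker, title)."""
    root = ET.fromstring(page)
    talks = []
    for row in root.findall(".//table/tr"):  # within ElementTree's XPath subset
        cells = [(td.text or "").strip() for td in row.findall("td")]
        if len(cells) == 3:
            talks.append(tuple(cells))
    return talks

for date, speaker, title in extract_talks(PAGE):
    print(date, "|", speaker, "|", title)
```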

CSS


Noisy label

e.g. blogs, news; e.g. [N. Dalvi, R. Kumar, B. Pang, A. Tomkins, EMNLP 09]

sanjose.com

Wikipedia, Yelp, LinkedIn, Freebase, KnowItAll, DBLife


A Comparison of Approaches to Large-Scale Data Analysis
Andrew Pavlo (Brown University), Erik Paulson (University of Wisconsin), Alexander Rasin (Brown University), Daniel J. Abadi (Yale University), David J. DeWitt (Microsoft Inc.), Samuel Madden (MIT), Michael Stonebraker (MIT)

MapReduce (MR) vs. parallel DBMSs (RDBMS); 20

[Figure: MapReduce dataflow: input files (FS), map tasks (M), shuffle, sort, filter, merge, reduce tasks (R), output files (FS)]

MapReduce vs. DBMS: data model (key/value vs. relations); indexes (hash, B-tree); CODASYL; optimizer; programming model (Map and Reduce vs. SQL); 3

Systems:
MapReduce: Hadoop (Apache, Yahoo!; Java); HDFS with replication factor 3 (default); 2 Map and 1 Reduce task per node
DBMS: DBMS-X (row store), Vertica (column store) (default)

Cluster: 100 nodes, each with a 2.4 GHz Intel Core 2 Duo, 64-bit Red Hat Linux 5 (kernel version 2.6.18), 4 GB RAM, and 2 × 250 GB SATA-I hard disks; hdparm: cache read 7 GB/sec, disk read 74 MB/sec; 160Gbps / 50

Tasks: (1) the Grep task from the original MR paper (data loading, task execution); (2) HTML analysis tasks (Selection, Aggregation, Join, UDF Aggregation); 4; Vertica, DBMS-X, Hadoop (Reduce, HDFS)

Grep task, data loading: Vertica uses COPY; DBMS-X uses LOAD (hash partitioning); Hadoop copies the files into HDFS.

Loading results: 5.6 million records (535 MB) per node; 1 TB total

Hadoop 25

Grep task, execution:
DBMS (SQL):
SELECT * FROM data WHERE field LIKE '%XYZ%';
Hadoop: a MapReduce program whose Map function matches the pattern against the value of each key/value record; no Reduce function is needed.
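A map-only job is enough for the Grep task. Below is a minimal, framework-free Python sketch of that structure (it is not Hadoop's Java API); `grep_map` and `run_map_only` are hypothetical names, and the default pattern "XYZ" stands in for the search string in the SQL above.

```python
def grep_map(key, value, pattern="XYZ"):
    """Map function: emit the record unchanged if its value contains the pattern."""
    if pattern in value:
        yield key, value


def run_map_only(records, mapper):
    """Apply the mapper to every (key, value) record; with no Reduce phase,
    the concatenated map output is the job output."""
    output = []
    for key, value in records:
        output.extend(mapper(key, value))
    return output


if __name__ == "__main__":
    records = [("k1", "aaXYZbb"), ("k2", "nothing here"), ("k3", "XYZ")]
    print(run_map_only(records, grep_map))  # [('k1', 'aaXYZbb'), ('k3', 'XYZ')]
```

Because there is no Reduce, nothing is shuffled between nodes; each map task writes its matches directly.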

Execution results: 5.6 million records (535 MB) per node; 1 TB total; 2; DBMS; start-up; 100

Schema for the HTML analysis tasks:
CREATE TABLE Documents (
  url VARCHAR(100) PRIMARY KEY,
  contents TEXT
);
CREATE TABLE Rankings (
  pageURL VARCHAR(100) PRIMARY KEY,
  pageRank INT,
  avgDuration INT
);
CREATE TABLE UserVisits (
  sourceIP VARCHAR(16),
  destURL VARCHAR(100),
  visitDate DATE,
  adRevenue FLOAT,
  userAgent VARCHAR(64),
  countryCode VARCHAR(3),
  languageCode VARCHAR(6),
  searchWord VARCHAR(32),
  duration INT
);

Data per node: Documents (HTML pages containing URLs): 600,000 (7 GB/node); Rankings (Web pages): 18 million (1 GB/node); UserVisits: 150 million (20 GB/node).
Hadoop: HDFS, API. DBMS-X: Rankings on pageURL, pageRank; UserVisits on destURL, visitDate. Vertica: pageRank, visitDate.

Selection Task
SQL:
SELECT pageURL, pageRank FROM Rankings WHERE pageRank > X;

MapReduce: the Map function outputs each Rankings record whose pageRank exceeds X; no Reduce function is needed.
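The two sides of the Selection task can be put next to each other in a small Python sketch: the SQL runs on an in-memory SQLite table, and the MR side is a single Map function over delimited text records. The '|' field delimiter, the sample rows, and the function names are assumptions made for illustration.

```python
import sqlite3


def selection_sql(rows, x):
    """DBMS side: the Selection task's SQL over an in-memory Rankings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Rankings (pageURL TEXT PRIMARY KEY, pageRank INT, avgDuration INT)")
    conn.executemany("INSERT INTO Rankings VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT pageURL, pageRank FROM Rankings WHERE pageRank > ? ORDER BY pageURL",
        (x,)).fetchall()
    conn.close()
    return result


def selection_map(line, x):
    """MR side: parse one '|'-delimited Rankings record (assumed format) and
    emit (pageURL, pageRank) if pageRank exceeds the threshold X."""
    page_url, page_rank, _avg_duration = line.split("|")
    if int(page_rank) > x:
        yield page_url, int(page_rank)


if __name__ == "__main__":
    rows = [("a.html", 5, 10), ("b.html", 50, 3), ("c.html", 12, 7)]
    lines = ["|".join(map(str, r)) for r in rows]
    mr = sorted(kv for line in lines for kv in selection_map(line, 10))
    print(selection_sql(rows, 10) == mr)  # True
```

The MR version has to parse every record on every run, whereas the DBMS loaded (and, in the benchmark, indexed) the data beforehand.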

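Selection results (DBMS vs. Hadoop): pageRank; start-up cost

Aggregation Task
Variant 1: GROUP BY sourceIP, about 2.5 million groups (53 MB)
Variant 2: GROUP BY the first 7 characters of sourceIP, 2,000 groups (24 KB)

MapReduce:
(1) Map emits (sourceIP, adRevenue); Reduce sums adRevenue per sourceIP.
(2) Map emits (first 7 characters of sourceIP, adRevenue); Reduce sums adRevenue per prefix.

DBMS: Vertica reads only the 2 columns it needs (roughly 20 bytes of each roughly 200-byte record).
SELECT sourceIP, SUM(adRevenue) FROM UserVisits GROUP BY sourceIP;
SELECT SUBSTR(sourceIP, 1, 7), SUM(adRevenue) FROM UserVisits GROUP BY SUBSTR(sourceIP, 1, 7);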
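Both MapReduce variants of the Aggregation task fit one small in-process sketch. The helper names, the sample visits, and the single-process "shuffle" (a dictionary grouping) are assumptions; Hadoop distributes this grouping across nodes. Note that `source_ip[:7]` matches `SUBSTR(sourceIP, 1, 7)` in the SQL.

```python
from collections import defaultdict


def agg_map(record, prefix_len=None):
    """Emit (sourceIP, adRevenue), or (first prefix_len chars of sourceIP, adRevenue)."""
    source_ip, ad_revenue = record
    key = source_ip if prefix_len is None else source_ip[:prefix_len]
    yield key, ad_revenue


def agg_reduce(key, values):
    """Sum adRevenue for one group."""
    yield key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Tiny in-process MapReduce: map, group by key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    result = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            result[out_key] = out_value
    return result


if __name__ == "__main__":
    visits = [("158.112.27.3", 1.5), ("158.112.27.3", 2.0),
              ("158.112.99.1", 0.5), ("10.0.0.1", 4.0)]
    print(run_mapreduce(visits, agg_map, agg_reduce))
    print(run_mapreduce(visits, lambda r: agg_map(r, 7), agg_reduce))
```

Grouping by the 7-character prefix collapses many sourceIPs into one key, which is why the second variant produces far fewer groups.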

[Figure: Aggregation task results for the full-sourceIP and 7-character-prefix variants]

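Join Task
SELECT INTO Temp sourceIP,
       AVG(pageRank) AS avgPageRank,
       SUM(adRevenue) AS totalRevenue
FROM Rankings AS R, UserVisits AS UV
WHERE R.pageURL = UV.destURL
  AND UV.visitDate BETWEEN Date('2000-01-15') AND Date('2000-01-22')
GROUP BY UV.sourceIP;

SELECT sourceIP, totalRevenue, avgPageRank
FROM Temp
ORDER BY totalRevenue DESC LIMIT 1;

SQL (2 statements): for visits between 2000-01-15 and 2000-01-22, compute each sourceIP's average pageRank and total adRevenue into Temp (3 columns), then return the sourceIP with the highest totalRevenue.
MapReduce (3 phases): (1) filter UserVisits to 2000-01-15 through 2000-01-22 and join the remaining records with Rankings on URL; (2) compute the total adRevenue and average pageRank per sourceIP; (3) return the sourceIP with the largest total adRevenue.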
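The two Join-task statements can be run as-is (apart from one syntax change) on tiny in-memory tables with Python's sqlite3 module. SQLite has no `SELECT INTO`, so this sketch uses `CREATE TEMP TABLE ... AS SELECT`; the table name `TempRevenue` (standing in for Temp), the trimmed-down schemas, and the sample rows are assumptions.

```python
import sqlite3


def top_source_ip(rankings, visits):
    """Run the Join task's two statements on small in-memory tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Rankings (pageURL TEXT PRIMARY KEY, pageRank INT)")
    conn.execute("CREATE TABLE UserVisits "
                 "(sourceIP TEXT, destURL TEXT, visitDate TEXT, adRevenue REAL)")
    conn.executemany("INSERT INTO Rankings VALUES (?, ?)", rankings)
    conn.executemany("INSERT INTO UserVisits VALUES (?, ?, ?, ?)", visits)
    # SQLite has no SELECT INTO; CREATE TEMP TABLE ... AS SELECT plays that role.
    # TempRevenue stands in for the slide's Temp table.
    conn.execute("""
        CREATE TEMP TABLE TempRevenue AS
        SELECT sourceIP, AVG(pageRank) AS avgPageRank, SUM(adRevenue) AS totalRevenue
        FROM Rankings AS R, UserVisits AS UV
        WHERE R.pageURL = UV.destURL
          AND UV.visitDate BETWEEN DATE('2000-01-15') AND DATE('2000-01-22')
        GROUP BY UV.sourceIP""")
    row = conn.execute("SELECT sourceIP, totalRevenue, avgPageRank FROM TempRevenue "
                       "ORDER BY totalRevenue DESC LIMIT 1").fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    rankings = [("a.html", 10), ("b.html", 30), ("c.html", 20)]
    visits = [("1.1.1.1", "a.html", "2000-01-16", 5.0),
              ("1.1.1.1", "b.html", "2000-01-20", 7.0),
              ("2.2.2.2", "c.html", "2000-01-18", 9.0),
              ("2.2.2.2", "c.html", "2000-02-01", 100.0)]  # outside the date range
    print(top_source_ip(rankings, visits))  # ('1.1.1.1', 12.0, 20.0)
```

Dates are stored as ISO 'YYYY-MM-DD' strings, so `BETWEEN` compares them correctly as text.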

Join results: Hadoop vs. DBMS (index vs. sequential scan): A) 1434.7 s, B) 24.3 s, C) 12.7 s; join: hash; Vertica

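UDF Aggregation Task: (url, value) pairs into Temp

DBMS-X: UDF, SQL, 1; Vertica: UDF, SQL, 1; UDF, SQL, 1
Hadoop: Map emits each URL found in a document as the key with 1 as the value; Reduce sums the 1s per URL, giving each URL's inlink count (as used by PageRank).
SELECT INTO Temp UDF(contents) FROM Documents;
SELECT url, SUM(value) FROM Temp GROUP BY url;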
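The Hadoop version of the UDF Aggregation task is a word-count-style job over links. A minimal Python sketch, assuming for illustration that links appear as `href="..."` attributes (the benchmark's actual link format is not specified on the slide); the names `udf_map` and `inlink_counts` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical link syntax for this sketch: href="..." attributes.
LINK_RE = re.compile(r'href="([^"]+)"')


def udf_map(url, contents):
    """Emit (linked URL, 1) for every link found in one document."""
    for target in LINK_RE.findall(contents):
        yield target, 1


def inlink_counts(documents):
    """Shuffle and reduce collapsed into a Counter: SUM(value) GROUP BY url."""
    counts = Counter()
    for url, contents in documents:
        for target, one in udf_map(url, contents):
            counts[target] += one
    return dict(counts)


if __name__ == "__main__":
    docs = [("a.html", '<a href="x.html">x</a> <a href="y.html">y</a>'),
            ("b.html", '<a href="x.html">again</a>')]
    print(inlink_counts(docs))  # {'x.html': 2, 'y.html': 1}
```

In the SQL formulation the same work is split between the UDF (which plays the role of `udf_map`) and the `GROUP BY url` aggregation.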

UDF results: Hadoop, DBMS-X; Hadoop 1; DBMS-X UDF 1; Vertica

Summary over the tasks (Hadoop, DBMS-X, Vertica): 3, 5; DBMS: I/O, CPU; pull vs. push; optimizer; Java vs. SQL

Conclusions: DBMS-X was 3.2 times faster than MR, and Vertica was 2.3 times faster than DBMS-X; UDF; Hadoop; 4; B-tree indexes, column-store; MapReduce

HadoopDB (VLDB 2009): A. Abouzeid, K. Bajda-Pawlikowski, D. J. Abadi, A. Silberschatz, A. Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. VLDB 2009. HadoopDB: SQL

Query Processing Techniques for Solid State Drives
Dimitris Tsirogiannis (University of Toronto), Stavros Harizopoulos (Hewlett-Packard Laboratories), Mehul A. Shah (Hewlett-Packard Laboratories), Janet L. Wiener (Hewlett-Packard Laboratories), Goetz Graefe (Hewlett-Packard Laboratories)

Solid State Drives (SSDs) vs. HDDs; IDC (2007): 2012, 50%

Device comparison:
Metric | SATA HDD | SSD 1 (Mtron) | SSD 2 (Stec Zeus IOPS) | SSD 3 (Fusion IO ioDrive)
Capacity (GB) | 500 | 32 | 146 | 320
Price ($) | 60 | 500 | 12,410 | 10,240
$/GB | $0.12 | $15.62 | $85 | $32
Power (W) | 13 | 2 | 8.4 | 6
Seq. read (MB/s) | 60 | 80 | 92 | 700
Seq. write (MB/s) | 55 | 100 | 108 | 500
Ran. read (IO/s) | 120 | 11,200 | 54,000 | 79,000
Ran. write (IO/s) | 120 | 9,600 | 15,000 | 60,000
IO/s/$ | 2 | 11.2 | 4.4 | 8.3
IO/s/W | 9.2 | 5,600 | 6,430 | 13,166

HDD vs. SSD

Contributions: flash-friendly query processing on SSDs with FlashScan and FlashJoin, implemented in PostgreSQL; flash-only system; BI workload (TPC-H); scans 3-4x and joins 2x faster.

Outline: flash-friendly page layout; FlashScan; FlashJoin
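The derived rows of the device table follow from the raw ones. A quick Python check, under the assumption that IO/s/W is computed from random-read IO/s (this reproduces the slide's per-watt figures to within 1%); IO/s/$ is left out because its basis could not be matched to the raw columns for every device.

```python
# Raw columns from the device table: (name, capacity GB, price $, power W, random-read IO/s)
DEVICES = [
    ("SATA HDD", 500, 60, 13, 120),
    ("Mtron SSD", 32, 500, 2, 11_200),
    ("Stec Zeus IOPS", 146, 12_410, 8.4, 54_000),
    ("Fusion IO ioDrive", 320, 10_240, 6, 79_000),
]


def derived(devices):
    """$/GB = price / capacity; IO/s/W = random-read IO/s / power (assumed basis)."""
    return {name: (price / gb, iops / watts) for name, gb, price, watts, iops in devices}


if __name__ == "__main__":
    for name, (per_gb, per_watt) in derived(DEVICES).items():
        print(f"{name:18s} ${per_gb:7.2f}/GB  {per_watt:9.1f} IO/s/W")
```

The HDD is cheapest per gigabyte by two orders of magnitude, while every SSD delivers hundreds of times more random reads per watt.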


Flash-friendly page layout [Ailamaki et al., 2001]: NSM (N-ary Storage Model) vs. PAX (Partition Attributes Across)
[Figure: a page of R (a INT, b VARCHAR(8), c INT, d INT) in NSM, where all fields of a record are stored together, and in PAX, where each attribute's values are grouped in their own minipage]
NSM scan for SELECT a FROM R: whole pages of R are read into the buffer pool even though only a is needed.

PAX: PAXScan vs. FlashScan for SELECT a FROM R
[Figure: pages 1-3 of R (a INT, b VARCHAR(8), c INT, d INT) and the buffer pool]

FlashScan reads only the a minipage of each page of R from the SSD.
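The byte-level difference between the two layouts can be sketched in Python with the `struct` module. Assumptions: fixed-width fields (b padded to 8 bytes), no page header or slot array, and illustrative row values loosely based on the slide's figure; the function names are hypothetical.

```python
import struct

# R (a INT, b VARCHAR(8), c INT, d INT); b is padded to 8 bytes for simplicity.
FIELDS = [("a", "i"), ("b", "8s"), ("c", "i"), ("d", "i")]
ROW_FMT = "<" + "".join(code for _, code in FIELDS)  # little-endian, no alignment padding


def nsm_page(rows):
    """NSM: records stored one after another, all fields of a record together."""
    return b"".join(struct.pack(ROW_FMT, *row) for row in rows)


def pax_page(rows):
    """PAX: one minipage per attribute, holding that attribute for all records."""
    return {
        name: b"".join(struct.pack("<" + code, row[i]) for row in rows)
        for i, (name, code) in enumerate(FIELDS)
    }


def scan_a_nsm(page):
    """Projecting a still requires bringing in the whole page; returns (values, bytes read)."""
    size = struct.calcsize(ROW_FMT)
    values = [struct.unpack_from(ROW_FMT, page, off)[0] for off in range(0, len(page), size)]
    return values, len(page)


def scan_a_pax(minipages):
    """FlashScan-style access: read only the minipage of a; returns (values, bytes read)."""
    mp = minipages["a"]
    return [v for (v,) in struct.iter_unpack("<i", mp)], len(mp)


ROWS = [(1, b"JKLMNO", 85_000, 7), (2, b"XYZ", 90_000, 4), (3, b"ABC", 100_000, 2)]

if __name__ == "__main__":
    print(scan_a_nsm(nsm_page(ROWS)))  # ([1, 2, 3], 60)
    print(scan_a_pax(pax_page(ROWS)))  # ([1, 2, 3], 12)
```

On an HDD, reading the small minipage alone would cost a seek per page; on an SSD, such small random reads are cheap, which is what makes this access pattern attractive.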

FlashScan on PAX with a predicate: SELECT a FROM R WHERE c > 90,000, with R (a INT, b VARCHAR(8), c INT, d INT)
[Figure: PAX pages of R; FlashScan reads the c minipage of each page and fetches the a minipage only for pages that contain a qualifying value, e.g. 130,000]
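The predicate variant can be sketched as reading the predicate attribute's minipage first and fetching the projected attribute's minipage only for pages with at least one match. The page contents below are illustrative (not the slide's exact data), and the sketch counts minipages read rather than modelling I/O time.

```python
# Each PAX page is modelled as a dict of minipages (attribute -> list of values).
# Values are illustrative, loosely following the slide's example.
PAGES = [
    {"a": [1, 2, 3], "c": [85_000, 90_000, 100_000]},
    {"a": [4, 5, 6], "c": [80_000, 65_000, 90_000]},
    {"a": [7, 8, 9], "c": [80_000, 60_000, 130_000]},
]


def flash_scan(pages, threshold):
    """SELECT a FROM R WHERE c > threshold, reading the c minipage first.

    Returns (result values, number of minipages read)."""
    result, minipages_read = [], 0
    for page in pages:
        c_values = page["c"]            # always read the predicate attribute
        minipages_read += 1
        hits = [i for i, c in enumerate(c_values) if c > threshold]
        if hits:                        # fetch a only when some tuple qualifies
            a_values = page["a"]
            minipages_read += 1
            result.extend(a_values[i] for i in hits)
    return result, minipages_read


if __name__ == "__main__":
    print(flash_scan(PAGES, 90_000))  # ([3, 9], 5)
```

Reading both minipages of every page would take 6 reads here; the page with no qualifying c value saves one.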

Hash join on NSM (memory > JS + PS): R (JR, PR, UR), S (JS, PS, US)
Query: SELECT JR, PR, JS, PS FROM R, S WHERE JR = JS
The unused attributes UR and US are read as well and removed by projection in memory.

Hash join on PAX (memory > JS + PS): only JR, PR, JS, and PS are read. When memory < JS + PS, JS and PS no longer fit in memory.

FlashJoin algorithm: related techniques are TID hash join [CIKM 94], semi-join reducers [ICDE 01], and Jive-Join [VLDB 99]. It targets equi-joins on SSDs and has 2 parts: a join kernel that produces a join index, and a fetch kernel.

FlashJoin join kernel (memory > JS): reads only the join attributes JR and JS and joins them with a hash, nested-loop, or sort-merge join, producing a join index.
Query: SELECT JR, PR, JS, PS FROM R, S WHERE JR = JS

FlashJoin fetch kernel: uses the join index to fetch the projected attributes PR and PS from their minipages on the SSD.
Example: R (id, JR) = 1 A, 2 D, 3 B, 4 C, 5 C, 6 E, 7 F; S (id, JS) = 1 A, 2 B, 3 C
Join index (JR/JS, idR, idS): (A, 1, 1), (B, 3, 2), (C, 4, 3), (C, 5, 3)

FlashJoin example of a 3-way join:
SELECT R1.B, R1.C, R2.E, R2.H, R3.F
FROM R1, R2, R3
WHERE R1.A = R2.D AND R2.G = R3.K;
R1 (A, B, C), R2 (D, E, G, H), R3 (F, K, L)
[Figure: a conventional plan (Scan R1, Scan R2, Join 1 on A=D, Scan R3, Join 2 on G=K) carries B, C, E, G, H through the joins; the FlashJoin plan runs join kernels on A, D and on G, K only, and fetch kernels retrieve the other needed attributes (B, C, E, G, H, F)]
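The join-kernel/fetch-kernel split can be sketched in Python with the example relations from the slides (R ids 1-7 with join values A, D, B, C, C, E, F; S ids 1-3 with A, B, C). The PR/PS values are hypothetical placeholders, hash join is just one of the kernel choices listed, and the sketch does not model the paper's memory management or fetch scheduling.

```python
from collections import defaultdict

# Join-attribute "minipages" from the slide's example: row id -> join value.
R_JOIN = {1: "A", 2: "D", 3: "B", 4: "C", 5: "C", 6: "E", 7: "F"}
S_JOIN = {1: "A", 2: "B", 3: "C"}
# Projected attributes PR and PS: hypothetical placeholder values, kept
# separate from the join attributes as if each were its own minipage.
R_PROJ = {i: f"pr{i}" for i in R_JOIN}
S_PROJ = {i: f"ps{i}" for i in S_JOIN}


def join_kernel(r_join, s_join):
    """Hash join on the join attributes only; returns a join index of (idR, idS)."""
    table = defaultdict(list)
    for id_s, key in s_join.items():      # build on the smaller input S
        table[key].append(id_s)
    index = []
    for id_r in sorted(r_join):           # probe with R in id order
        for id_s in table.get(r_join[id_r], []):
            index.append((id_r, id_s))
    return index


def fetch_kernel(index, r_proj, s_proj):
    """Read projected attributes only for ids that appear in the join index."""
    rows = [(r_proj[id_r], s_proj[id_s]) for id_r, id_s in index]
    touched = (sorted({id_r for id_r, _ in index}), sorted({id_s for _, id_s in index}))
    return rows, touched


if __name__ == "__main__":
    idx = join_kernel(R_JOIN, S_JOIN)
    print(idx)                                   # [(1, 1), (3, 2), (4, 3), (5, 3)]
    print(fetch_kernel(idx, R_PROJ, S_PROJ)[1])  # ([1, 3, 4, 5], [1, 2, 3])
```

The unmatched R rows (2, 6, 7) never have their projected attributes read, which is the point of deferring the fetch until after the join.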

Experimental setup (PostgreSQL):
Scan: NSMScan on NSM; FlashScan on PAX
Join: HNSM (NSMScan + hash join on NSM); HPAX (FlashScan + hash join on PAX); FlashJoin
Mtron 32GB SSD; page size 64KB

Scan experiment: relation size 10GB; attributes A1, ..., A11 (128 bytes)

Scan results: 3; 3-4

Join experiment: relation sizes R = 10GB, S = 1GB

Join cardinality 3

Star N-way join:
Relation | R0 | R1 | R2 | R3 | R4 | R5
Size (GB) | 7.5 | 3 | 2 | 1 | 0.5 | 0.3
Memory: 100MB; projectivity: 25%; join cardinality: 10%

Conclusions: the PAX-based flash-friendly scan (FlashScan) gives a 3-4x speedup for projections; FlashJoin gives a 2x speedup for joins with projection; SSD; NSM; column-store; FlashScan


3-HOP: A High-Compression Indexing Scheme for Reachability Query
Ruoming Jin, Yang Xiang, Ning Ruan, David Fuhry (Kent State University)

Reachability query: in a graph G, can vertex u reach vertex v?
[Figure: example DAG on vertices 1-15; Query(1,11): Yes; Query(3,9): No]
G is taken to be a DAG (directed acyclic graph): a general graph is first coalesced by collapsing each strongly connected component into a single vertex.
e.g. XML
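Collapsing a general graph into a DAG is usually done by contracting each strongly connected component (SCC) to one vertex; reachability between vertices then reduces to reachability between their components in the resulting DAG. A Python sketch using Kosaraju's algorithm (our choice; the slide only says the graph is coalesced/collapsed), written iteratively, with a hypothetical input graph.

```python
def condense(adj):
    """Collapse SCCs; returns (component_of, dag_edges) via Kosaraju's algorithm."""
    vertices = set(adj) | {w for ws in adj.values() for w in ws}
    radj = {v: [] for v in vertices}
    for u, ws in adj.items():
        for w in ws:
            radj[w].append(u)

    # Pass 1: iterative DFS recording vertices in order of finishing time.
    order, seen = [], set()
    for s in sorted(vertices):
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(adj.get(s, ())))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj.get(w, ()))))
                    break
            else:  # all successors explored: v is finished
                stack.pop()
                order.append(v)

    # Pass 2: DFS on the reversed graph in decreasing finishing time;
    # each tree found is one SCC, labelled by its root.
    comp = {}
    for s in reversed(order):
        if s in comp:
            continue
        comp[s] = s
        stack = [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if w not in comp:
                    comp[w] = s
                    stack.append(w)

    dag = {(comp[u], comp[w]) for u, ws in adj.items() for w in ws if comp[u] != comp[w]}
    return comp, dag


if __name__ == "__main__":
    graph = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}  # hypothetical
    print(condense(graph))
```

Here {1, 2, 3} and {4, 5} each become a single vertex, and the only DAG edge left is the one that crosses between them.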

Existing approaches (query time; construction time; index size):
DFS/BFS: O(n+m); O(n+m) / 0; O(n+m)
Transitive Closure: O(1); O(nm) or O(n^3); O(n^2)
Optimal Chain Cover (Jagadish, TODS90): O(k) / O(log k); O(nm) / O(n^3); O(nk)
Optimal Tree Cover (Agrawal et al., SIGMOD89): O(n) / O(log k); O(nm); O(n^2)
2-HOP (Cohen et al., SODA'02): O(n m^(1/2)) / O(m^(1/2)) (conjecture); O(n^3 |Tc|); O(m^(1/2)) / O(n m^(1/2)) (conjecture)
Dual-Labeling (Wang et al., ICDE06): O(1); O(n+m+t^3); O(n+t^2)
Labeling+SSPI (Chen et al., VLDB05): O(m-n); O(n+m); O(n+m)
GRIPP (Trißl et al., SIGMOD07): O(m-n); O(n+m); O(n+m)
Path-Tree (Jin et al., SIGMOD08): O(log^2 k); O(mk) or O(mn); O(nk)
n = |V|, m = |E|, |Tc| = size of the transitive closure; k, t, k' are method-specific parameters.
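The first two rows of the table are the two extremes that labeling schemes such as 2-HOP and 3-HOP sit between. A short Python sketch on a hypothetical DAG: online BFS (no index, O(n + m) per query) versus a precomputed transitive closure (one BFS per vertex at build time, an O(1) set lookup per query, and up to O(n^2) stored pairs).

```python
from collections import deque


def reach_set(adj, u):
    """All vertices reachable from u (including u) by BFS: O(n + m)."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def bfs_reaches(adj, u, v):
    """No index: answer each query with a fresh search, O(n + m) per query."""
    return v in reach_set(adj, u)


def transitive_closure(adj):
    """Full index: one BFS per vertex at build time, O(1) set lookup per query,
    and up to O(n^2) stored pairs."""
    vertices = set(adj) | {w for ws in adj.values() for w in ws}
    return {x: reach_set(adj, x) for x in vertices}


if __name__ == "__main__":
    dag = {1: [2, 3], 2: [4], 3: [4], 4: [5], 6: [5]}  # hypothetical DAG
    tc = transitive_closure(dag)
    print(bfs_reaches(dag, 1, 5), 5 in tc[1], bfs_reaches(dag, 6, 1), 1 in tc[6])
```

Both give the same answers; they differ only in where the cost is paid (per query vs. at build time and in space), which is the trade-off the compressed labelings try to improve.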

quadruple

Spanning structure, 2-hop, transitive closure

[Figure: 2-Hop vs. 3-Hop labeling of the same 12-vertex DAG, with Lout/Lin sets such as Lout:{5}, Lout:{6,7}, Lin:{7,8}]

[Figure: DAG on vertices 1-20 decomposed into chains C1-C4]

2-Hop: u reaches v through a single intermediate hop vertex (u → w → v).
3-Hop: u reaches v through a chain: u → x on some chain, x → y along that chain, y → v.
3-Hop Contour: for each vertex v of G, Lout(v) and Lin(v).

Chain decomposition of the DAG: {C1, C2, ..., Ck}
[Figure: reachability from chain C1 to chain C3, e.g. (1 → 10), (3 → 12), (5 → 15)]

Contour points:
[Figure: reachability matrix for chains C1-C4 over vertices 1-20; a 1-cell marks a reachable pair and a 0-cell an unreachable one; the contour points are the corner 1-cells of each staircase]
Computing the contour points: O(mn) (n = |V|, m = |E|), or O(mk log n) (k = number of chains)

[Figure: covering the contour points with labels such as o:{6,11,15} and i:{2,7,12}: label size 12]
[Figure: 3-Hop labels o:{6}, i:{7}, i:{7,12}: label size 4]

3-hop example: contour points (2,6), (2,11), (2,15), (7,11), (7,15), (12,15), covered by the labels o:{6}, i:{7}, i:{7}, i:{7,12}
[Figure: 2 contour points between chains C1 and C2]
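For two chains Ci and Cj, the 1-cells of their reachability matrix form a staircase, so the whole block is described by its corner cells, the contour points. A minimal Python sketch, assuming a toy DAG and taking a contour point to be a pair (x, y) where y is the first vertex of Cj that x reaches and the vertex after x on Ci no longer reaches y. It uses plain BFS rather than the O(mk log n) procedure on the slide.

```python
from collections import deque


def reachable(adj, src):
    """BFS from src; returns every vertex reachable from src (including src)."""
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def contour(adj, ci, cj):
    """Contour points from chain ci to chain cj (both lists in chain order)."""
    # first[p]: index of the first vertex of cj reached by ci[p], or None.
    # Each vertex reaches a suffix of cj, and earlier vertices on ci reach at
    # least as much as later ones, so this index never decreases along ci.
    first = []
    for x in ci:
        r = reachable(adj, x)
        first.append(next((q for q, y in enumerate(cj) if y in r), None))
    points = []
    for p, f in enumerate(first):
        if f is None:
            continue
        following = first[p + 1] if p + 1 < len(ci) else None
        if following != f:  # the next vertex on ci no longer reaches cj[f]
            points.append((ci[p], cj[f]))
    return points


def reaches_via_contour(points, ci, cj, u, v):
    """u in ci reaches v in cj iff some contour point (x, y) has x at or after u
    on ci and y at or before v on cj."""
    pu, pv = ci.index(u), cj.index(v)
    return any(ci.index(x) >= pu and cj.index(y) <= pv for x, y in points)


if __name__ == "__main__":
    # Hypothetical DAG: chains 1->2->3 and 4->5->6, cross edges 1->5 and 3->6.
    adj = {1: [2, 5], 2: [3], 3: [6], 4: [5], 5: [6]}
    print(contour(adj, [1, 2, 3], [4, 5, 6]))  # [(1, 5), (3, 6)]
```

Vertex 2 yields no contour point because vertex 3, after it on the chain, still reaches 6; two points are enough to answer all nine cross-chain queries.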

3-Hop Contour construction:
Step 1: compute the contour points.
Step 2: 2
Step 3: contour points; 2

3-Hop, 3-Hop Contour: O(log n)
Construction time: 2-Hop O(n^3 |Tc|); 3-Hop O(k n^2 |Contour|)

3-Hop Contour query example:
[Figure: chains C1-C4 over vertices 1-20 with labels o:{10}, o:{6}, o:{9}, o:{15}, i:{18}, i:{7}, o:{18}, o:{19}, i:{11}, i:{7,13}, o:{8,14}, i:{9}]
Can 2 reach 20?

2 out: {6, 9, 15}

20 in: {11, 7, 13, 9}

6 reaches 9 along C2: Yes.

O(n)
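The query above checks whether some vertex in Lout(2) and some vertex in Lin(20) lie on the same chain in the right order. A Python sketch of that check. The chain layout (C1 = 1-5, C2 = 6-11, C3 = 12-15, C4 = 16-20, numbered consecutively along each chain) is an assumption read off the figure, and the sketch ignores the cases where u or v itself lies on the shared chain.

```python
# Chain membership read off the figure; positions assume vertices are
# numbered consecutively along each chain (an assumption for this sketch).
CHAINS = {"C1": range(1, 6), "C2": range(6, 12), "C3": range(12, 16), "C4": range(16, 21)}
POS = {v: (c, i) for c, vs in CHAINS.items() for i, v in enumerate(vs)}


def three_hop_query(out_u, in_v, pos=POS):
    """u reaches v if some x in Lout(u) and y in Lin(v) share a chain with x at or before y."""
    for x in out_u:
        cx, px = pos[x]
        for y in in_v:
            cy, py = pos[y]
            if cx == cy and px <= py:
                return True
    return False


if __name__ == "__main__":
    # Lout(2) = {6, 9, 15}, Lin(20) = {11, 7, 13, 9}, as on the slide.
    print(three_hop_query([6, 9, 15], [11, 7, 13, 9]))  # True
```

The nested loop is O(|Lout(u)| · |Lin(v)|); grouping both label lists by chain first turns the check into a single merge-like pass.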

3-Hop Segment:
[Figure: the same chains C1-C4 over vertices 1-20 and the same labels as in the contour example]
C1 → C2: [1,3] o:{6}; [4,4] o:{9}

C2 → C4: [18,18] i:{7}; [19,20] i:{9}

O(nk)
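The segment labels can be sketched as sorted, disjoint vertex ranges, each carrying one label on another chain, with a binary search to find the range containing a vertex. The C1 → C2 and C2 → C4 segments are the ones on the slide; the chain layout (vertices numbered consecutively along each chain) and the reading that an out-label marks where a segment enters C2 and an in-label marks the last C2 vertex reaching the target are assumptions.

```python
import bisect

# Chains as read off the figure (assumption: consecutive numbering along each chain).
CHAINS = {"C1": range(1, 6), "C2": range(6, 12), "C3": range(12, 16), "C4": range(16, 21)}
POS = {v: (c, i) for c, vs in CHAINS.items() for i, v in enumerate(vs)}

# Segment labels from the slide: a range of vertices on one chain shares one label.
OUT_SEG = {("C1", "C2"): [((1, 3), 6), ((4, 4), 9)]}     # from C1 into C2
IN_SEG = {("C4", "C2"): [((18, 18), 7), ((19, 20), 9)]}  # into C4 from C2


def find_segment(segments, vertex):
    """Binary search over segment start points (segments are sorted and disjoint)."""
    starts = [lo for (lo, _hi), _label in segments]  # a real index would store this
    i = bisect.bisect_right(starts, vertex) - 1
    if i >= 0:
        (lo, hi), label = segments[i]
        if lo <= vertex <= hi:
            return label
    return None


def segment_query(u, v):
    """Try every intermediate chain: u's out-label and v's in-label must meet in order."""
    cu, cv = POS[u][0], POS[v][0]
    for chain in CHAINS:  # up to k intermediate chains
        x = find_segment(OUT_SEG.get((cu, chain), []), u)
        y = find_segment(IN_SEG.get((cv, chain), []), v)
        if x is not None and y is not None and POS[x][1] <= POS[y][1]:
            return True
    return False


if __name__ == "__main__":
    print(segment_query(2, 20))  # True: 2 in [1,3] -> 6, 20 in [19,20] -> 9, 6 before 9 on C2
```

With only these labels, `segment_query(4, 18)` is False, since 4 enters C2 at 9 but 18 is reached only from 7 or earlier. The binary search per chain is what gives the logarithmic term in the query cost.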

Can 2 reach 20? 2 ∈ [1,3] gives 6; 20 ∈ [19,20] gives 9.

6 reaches 9 along C2: Yes. Query time: O(log nk + k) = O(log n + k).

Experiments: C++; 125; DAGs with 2000 vertices (2, 4, 6, 8, 10, 12) and with 10000 vertices (2, 5, 10, 15, 20, 25); AMD Opteron ... GHz CPU, ... GB RAM, Linux 2.6

[Figure: results on DAGs with 2k vertices]

3HOP-Contour: Path-Tree 2.7, 2HOP 1.5; 3HOP-Segment: Path-Tree 2.0, 2HOP 1.1
[Figure: results on DAGs with 2k vertices]

[Figure: 3HOP-Contour, Path-Tree, 2HOP, and 3HOP-Segment on DAGs with 10k vertices]

2HOP; 3HOP-Contour, Path-Tree: 6.0, 3.9; 3HOP-Segment: 5.3, 3.1; 10; Path-Tree, 3HOP

2HOP (ms): 3HOP-Contour 1.7; 3HOP-Segment 1.2; 3HOP-Contour, Path-Tree: 5.2, 4.1
2HOP 721, 3HOP 171

2HOP, Path-Tree
3-Hop: Journal; Path-Tree [Jin et al., SIGMOD08]


Business Meeting: 330

Year | Number | Change from previous year
2003 | 2590 |
2004 | 2571 | -19
2005 | 2317 | -254
2006 | 2240 | -77
2007 | 2094 | -146
2008 | 2307 | +213
2009 | 2819 | +512


