
The Claremont Report on Database Research



2009-10-28, Tamkang University (淡江大學), 周清江


Background

• Senior database researchers have gathered every few years to assess the state of database research and to recommend problems and problem areas that deserve additional focus:
  • Laguna Beach, Calif., in 1989
  • Palo Alto, Calif. (“Lagunita”), in 1990 and 1995
  • Cambridge, Mass., in 1996
  • Asilomar, Calif., in 1998
  • Lowell, Mass., in 2003
  • Claremont, Calif., in 2008


New Focus Areas

• New database engine architectures
• Declarative programming languages
• Interplay of structured and unstructured data
• Cloud data services
• Mobile and virtual worlds


A Turning Point in Database Research

• Unusually rich opportunities for technical advances, intellectual achievement, entrepreneurship, and impact on science and society
• The sense of change stems from several factors:
  • Breadth of excitement about Big Data
  • Data analysis as a profit center
  • Ubiquity of structured and unstructured data
  • Expanded development demand
  • Architectural shift in computing


Research Portfolio Change

• Impact and breadth, evaluated by external measures:
  • Helping new classes of users
  • Powering new computing platforms
  • Making conceptual breakthroughs across computing


Two Promising Approaches

• Reformation
  • Deconstructing core data-centric ideas and systems
  • Re-forming them for new applications and architectural realities
• Synthesis
  • Leveraging good research ideas that have yet to develop identifiable, agreed-upon system architectures
  • Examples: data integration, information extraction, data privacy


Research Opportunities

• Revisiting Database Engines
• Declarative Programming for Emerging Platforms
• The Interplay of Structured and Unstructured Data
• Cloud Data Services
• Mobile Applications and Virtual Worlds


Research Opportunities

• Several issues cut across the topics above:
  • Management of uncertain information
  • Data privacy and security
  • E-science and other scholarly applications
  • Human-centric interactions with data
  • Social networks and Web 2.0
  • Personalization and contextualization of query- and search-related tasks
  • Streaming and networked data
  • Self-tuning and adaptive systems
  • Challenges raised by new hardware technologies and energy constraints


Revisiting Database Engines

• Data-intensive tasks for which relational DBs provide poor price/performance
  • Examples: text indexing, serving web pages, media delivery
• Room for significant innovation within traditional application domains
  • Analytics for business and science
  • The cost of software and management relative to hardware is exorbitant
  • OLTP: need to address data lifecycle issues such as data provenance, schema evolution, and versioning
• Good time to try radical ideas


Revisiting Database Engines

• Two directions of research projects toward revolutionary steps in DB system architecture:
  • Broadening the range of applicability
  • Radically improving performance by designing special-purpose DB systems for specific domains
• These efforts may be synergistic


Revisiting Database Engines

• Important research topics in the core DB engine:
  • Designing systems for clusters of many-core processors
  • Exploiting remote RAM and Flash as persistent media
  • Treating query optimization and physical data layout as a unified, adaptive, self-tuning task to be carried out continuously
  • Compressing and encrypting data at the storage layer, integrated with data layout and query optimization
  • Designing systems for non-relational data models
  • Trading off consistency and availability for better performance and scale-out to thousands of machines
  • Designing power-aware DBMSs that limit energy costs without sacrificing scalability
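As a minimal sketch of compression integrated with query processing, the Python below run-length encodes one column and answers an equality count directly on the encoded runs, without decompressing them. Run-length encoding is only one of many possible schemes and is not prescribed by the report; the function names and sample column are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

Run = Tuple[str, int]  # (value, run length)

def rle_encode(column: List[str]) -> List[Run]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]

def rle_decode(runs: List[Run]) -> List[str]:
    """Expand runs back into the original column (used here only for checking)."""
    return [value for value, length in runs for _ in range(length)]

def count_equal(runs: List[Run], target: str) -> int:
    """Answer COUNT(*) WHERE col = target by touching one entry per run."""
    return sum(length for value, length in runs if value == target)

column = ["TW", "TW", "TW", "US", "US", "TW"]
runs = rle_encode(column)
print(runs)                     # [('TW', 3), ('US', 2), ('TW', 1)]
print(count_equal(runs, "TW"))  # 4
```

The payoff depends on physical layout: sorting or clustering the column first yields longer runs and fewer entries to scan, which shows why compression decisions interact with data layout and query optimization.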


Declarative Programming for Emerging Platforms

• The urgency of programmer productivity is increasing exponentially as programmers target ever more complex environments

• Non-expert programmers need to be able to write robust code that scales out across processors in both loosely- and tightly-coupled architectures
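Map-Reduce, which the report cites as an example of declarative programming for such platforms, lets a programmer supply only a per-record map function and a per-key reduce function while the framework handles grouping and distribution. The sketch below simulates the three phases sequentially in a single Python process; the helper names are hypothetical, and a real framework would partition the work across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Sequentially simulate the map, shuffle (group by key), and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:                   # map phase
        for key, value in mapper(record):
            groups[key].append(value)        # shuffle: collect values per key
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce phase

def word_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1

def sum_reducer(word: str, counts: List[int]) -> int:
    """Add up the partial counts emitted for one word."""
    return sum(counts)

docs = ["the cloud", "The data cloud"]
print(map_reduce(docs, word_mapper, sum_reducer))  # {'the': 2, 'cloud': 2, 'data': 1}
```

Because `word_mapper` and `sum_reducer` share no state, a framework can run many copies of them in parallel without the programmer managing threads or machines.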


Declarative Programming for Emerging Platforms

• Example: Map-Reduce
• New declarative languages based on Datalog have been developed for a variety of domain-specific systems:
  • Network and distributed systems, computer games, machine learning and robotics, compilers, security protocols, and information extraction
• Enterprise application programming:
  • Ruby on Rails (http://www.ithome.com.tw/itadm/article.php?c=46863, http://en.wikipedia.org/wiki/Ruby_on_Rails)
  • LINQ (Language-Integrated Query, http://www.ithome.com.tw/itadm/article.php?c=44337, http://en.wikipedia.org/wiki/Language_Integrated_Query)
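To give a flavor of the Datalog-based languages used for network and distributed systems, a classic example is reachability over a link relation, written as the two rules in the docstring below. The Python is a minimal sketch of how an engine might evaluate those rules bottom-up with semi-naive iteration, which joins only the facts derived in the previous round; it is illustrative and not the evaluation strategy of any particular system mentioned in the report.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]

def reachable(links: Set[Edge]) -> Set[Edge]:
    """Semi-naive bottom-up evaluation of the Datalog program
         reachable(X, Y) :- link(X, Y).
         reachable(X, Z) :- link(X, Y), reachable(Y, Z).
    """
    preds: Dict[str, Set[str]] = defaultdict(set)  # index: Y -> {X | link(X, Y)}
    for x, y in links:
        preds[y].add(x)

    result: Set[Edge] = set(links)  # facts from the first rule
    delta: Set[Edge] = set(links)   # facts that were new in the last round
    while delta:                    # iterate until no new facts appear (fixpoint)
        # Second rule: join link only with the facts derived in the last round.
        derived = {(x, z) for (y, z) in delta for x in preds.get(y, ())}
        delta = derived - result
        result |= delta
    return result

links = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(reachable(links)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

The loop also terminates on cyclic graphs, because each round keeps only facts not already in `result`.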


Declarative Programming for Emerging Platforms

• Research questions
  • Language design
    • Fairly expressive
    • Attractive syntax, typing and modularity, development tools, smooth interactions with the rest of the computing ecosystem
  • Efficient compilers and runtimes
    • Techniques to optimize code automatically across both the horizontal distribution of parallel processors and the vertical distribution of tiers
    • Should extend techniques behind parallel and distributed DBs


The Interplay of Structured and Unstructured Data

• Within enterprises: heterogeneous collections of structured data linked with unstructured data
• On the Web, structured data comes from:
  • Millions of DBs hidden behind forms (the deep web)
  • High-quality data items in HTML tables on web pages
  • Mashups providing dynamic views on structured data
• Data contributed by Web 2.0 services:
  • Photo and video sites
  • Collaborative annotation services
  • Online structured data repositories


The Interplay of Structured and Unstructured Data

• Challenges of managing dataspaces: managing a rich collection of structured, semi-structured, and unstructured data
• Previous contributions
  • On the Web: techniques for domain-specific search engines, and domain-independent techniques for crawling through forms and surfacing the resulting HTML pages in a search-engine index
  • Within enterprises: enterprise search and discovery of relationships between structured and unstructured data


The Interplay of Structured and Unstructured Data

• Challenge 1: extract structure and meaning from unstructured and semi-structured data
  • Applying and managing predictions from large numbers of independently developed extractors
  • Need algorithms to introspect about the correctness of extractions
  • Better technology to manage data in context:
    • Discover data sources
    • Discover implicit relationships
    • Determine the weight of an object’s context when assigning it semantics
    • Maintain data provenance
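One simple way to picture applying predictions from many independent extractors while maintaining provenance is confidence-weighted voting: for each (entity, attribute) pair, keep the value with the highest total confidence and record which extractors support it. This is a hypothetical sketch, not a method from the report; it assumes every extractor reports a confidence score, and the `Extraction` record, the `resolve` function, and the sample facts are all invented for illustration. Real systems would also need to calibrate those scores, which is part of the introspection problem above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Extraction:
    extractor: str     # which extractor produced the fact (provenance)
    entity: str
    attribute: str
    value: str
    confidence: float  # extractor's self-reported score, assumed to lie in [0, 1]

Resolved = Tuple[str, float, List[str]]  # (winning value, total score, supporting extractors)

def resolve(extractions: List[Extraction]) -> Dict[Tuple[str, str], Resolved]:
    """Pick, per (entity, attribute), the value with the largest summed confidence."""
    scores: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    support: Dict[Tuple[str, str], Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for e in extractions:
        key = (e.entity, e.attribute)
        scores[key][e.value] += e.confidence
        support[key][e.value].append(e.extractor)
    resolved: Dict[Tuple[str, str], Resolved] = {}
    for key, by_value in scores.items():
        best = max(by_value, key=by_value.__getitem__)  # on ties, the first value seen wins
        resolved[key] = (best, by_value[best], support[key][best])
    return resolved

facts = [
    Extraction("table_extractor", "paper42", "year", "2008", 0.5),
    Extraction("regex_extractor", "paper42", "year", "2009", 0.5),
    Extraction("nlp_extractor", "paper42", "year", "2008", 0.25),
]
print(resolve(facts)[("paper42", "year")])
# ('2008', 0.75, ['table_extractor', 'nlp_extractor'])
```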


The Interplay of Structured and Unstructured Data

• Challenge 2: develop methods for effectively querying and deriving insight from the resulting sea of heterogeneous data
  • Analyze a keyword query to extract its intended semantics
  • Route the query to relevant sources
  • Do not assume we have semantic mappings for the data sources
  • Cannot assume that the domain of the query or of the data sources is known
  • The system should provide best-effort service and improve over time
  • Develop index structures to support querying hybrid data
  • Need new notions of correctness and consistency to provide metrics and to make cost/quality trade-offs
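As a deliberately naive sketch of routing a keyword query to relevant sources without semantic mappings, each source below is summarized by a set of terms (for example, labels harvested from a web form), and sources are ranked by the fraction of query terms they cover. The source names, term sets, and scoring rule are assumptions for illustration only; a best-effort system would refine such summaries over time.

```python
from typing import Dict, List, Set

def route(query: str, source_terms: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Return up to k sources ranked by the fraction of query terms they cover."""
    terms = set(query.lower().split())
    scored = []
    for source, vocabulary in source_terms.items():
        overlap = len(terms & vocabulary)
        if overlap:  # skip sources that share no terms with the query
            scored.append((overlap / len(terms), source))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best coverage first, then by name
    return [source for _, source in scored[:k]]

sources = {
    "used_cars_form": {"make", "model", "year", "price"},
    "flights_form": {"from", "to", "date", "price"},
    "recipes_site": {"ingredient", "cuisine", "dish"},
}
print(route("Toyota model year price", sources))  # ['used_cars_form', 'flights_form']
```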


The Interplay of Structured and Unstructured Data

• Challenge 3: innovation in creating data collections
  • Web 2.0: users join ad-hoc communities to create, collaborate, curate, and discuss data online
    • They rarely agree on schemata ahead of time
    • Schemata need to be inferred from the data and will be highly dynamic
    • Schemata will be used to guide users toward consensus
  • Need to incorporate visualizations effectively; they need to be easy to use
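Inferring a schema from data that users contributed without agreeing on one in advance can start as simply as recording, for each field, which value types occur and in what fraction of records the field appears. The sketch below is a hypothetical illustration over Python dictionaries, not a method from the report; because the data is highly dynamic, it would have to be re-run or maintained incrementally as records change.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Set

def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Summarize each field's observed value types and its coverage across records."""
    presence: Counter = Counter()
    types: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        for field, value in record.items():
            presence[field] += 1
            types[field].add(type(value).__name__)
    return {
        field: {"types": sorted(types[field]), "coverage": presence[field] / len(records)}
        for field in presence
    }

sightings = [
    {"species": "heron", "count": 3},
    {"species": "egret", "count": "many", "photo": "egret.jpg"},
]
for field, info in infer_schema(sightings).items():
    print(field, info)
# species {'types': ['str'], 'coverage': 1.0}
# count {'types': ['int', 'str'], 'coverage': 1.0}
# photo {'types': ['str'], 'coverage': 0.5}
```

Fields with mixed types or partial coverage, such as `count` and `photo` here, are where a community would need to reach consensus, which is the guiding role the slide assigns to inferred schemata.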


Cloud Data Services

• Infrastructure change: service-oriented cloud computing
  • Application services (salesforce.com)
  • Storage services (Amazon S3)
  • Compute services (Google App Engine, Amazon EC2)
  • Data services (Amazon SimpleDB, MS SQL Server Data Services, Google Datastore)
• Trade-off between functionality and operational costs
• Manageability is particularly important:
  • Limited human intervention
  • High-variance workloads: elastic provisioning
  • A variety of shared infrastructures: service tuning depends on how the shared infrastructure is virtualized
  • Urgency of self-managing DB technologies
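Elastic provisioning for high-variance workloads can be illustrated by a rule that sizes a cluster to the observed load plus some headroom, clamped to a minimum and maximum size. The function name, parameters, and default values below are assumptions for illustration; a self-managing service would also need to smooth its measurements and avoid oscillating between sizes.

```python
import math

def target_nodes(
    requests_per_sec: float,
    capacity_per_node: float,
    headroom: float = 0.2,
    min_nodes: int = 1,
    max_nodes: int = 100,
) -> int:
    """Nodes needed to serve the load with spare headroom, clamped to [min_nodes, max_nodes]."""
    needed = math.ceil(requests_per_sec * (1 + headroom) / capacity_per_node)
    return max(min_nodes, min(max_nodes, needed))

for load in (0, 900, 50_000):
    print(load, target_nodes(load, capacity_per_node=250))
# 0 1
# 900 5
# 50000 100
```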


Cloud Data Services

• Challenges from the scale of cloud computing:
  • SQL databases cannot scale to thousands of nodes
    • Different transactional implementation techniques? Different storage semantics?
  • More work is needed to synthesize ideas from the literature in cloud computing
  • Limitations on either the plan space or the search will be required
  • How will programmers express their programs in the cloud?
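One widely used technique for trading consistency against availability in replicated storage, not named in the report, is quorum replication: with N replicas, a write waits for W acknowledgments and a read contacts R replicas. The sketch below checks the standard overlap conditions; `QuorumConfig` and its methods are hypothetical names, and this is a textbook illustration rather than a technique the report proposes.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class QuorumConfig:
    n: int  # replicas per data item
    r: int  # replicas a read must contact
    w: int  # replicas that must acknowledge a write

    def __post_init__(self) -> None:
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("need 1 <= r <= n and 1 <= w <= n")

    def reads_see_latest_write(self) -> bool:
        """True when every read quorum overlaps every write quorum."""
        return self.r + self.w > self.n

    def writes_overlap(self) -> bool:
        """True when any two write quorums share at least one replica."""
        return 2 * self.w > self.n

    def tolerated_failures(self) -> Tuple[int, int]:
        """Replicas that may be down while reads and writes, respectively, still succeed."""
        return (self.n - self.r, self.n - self.w)

strict = QuorumConfig(n=3, r=2, w=2)
fast = QuorumConfig(n=3, r=1, w=1)
print(strict.reads_see_latest_write(), strict.tolerated_failures())  # True (1, 1)
print(fast.reads_see_latest_write(), fast.tolerated_failures())      # False (2, 2)
```

The `fast` setting survives more failures and answers with fewer round trips but may return stale data, which is the kind of consistency/availability trade-off the slides raise.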


Cloud Data Services

• Challenges from the scale of cloud computing: data security and privacy
• Key to success: target usage scenarios in the cloud
• New scenarios will emerge with their own challenges:
  • Specialized services pre-loaded with large data sets
  • Services that “mash up” data from public and private domains
  • Services reaching out across clouds
    • Prevalent in scientific data “grids”
    • Federated cloud architectures will intensify the challenges


Mobile Applications and Virtual Worlds

• This new class of applications needs to manage diverse user-created data, synthesize it intelligently, and provide real-time services
• Trends in the mobile space:
  • Platforms for building mobile applications are mature
  • The emergence of mobile search and social networks suggests a new set of mobile applications
• Virtual worlds, like Second Life, increasingly blur the distinction with the real world
  • This suggests a more data-rich mixture (co-space)
• Applications include rich social networking, massively multiplayer games, military training, edutainment, and knowledge sharing


Mobile Applications and Virtual Worlds

• New challenges:
  • The need to process heterogeneous data streams to materialize real-world events
  • The need to balance privacy against the collective benefit of sharing personal real-time information
  • The need for more intelligent processing to send interesting events in the co-space to someone in the physical world


Moving Forward

• Survey articles and tutorials are becoming an increasingly important contribution
• Risky or speculative papers are not championed effectively
• There is a need for approachable books on scalable data management algorithms and techniques
• The time is ripe for projects that stimulate collaboration and cross-fertilization of ideas, as information integration did
• Two areas are identified for competitions:
  • System components for cloud computing
  • Large-scale information extraction