
XML and Database


Hong Ki-Hyung (홍기형), Sungshin Women's University (성신여자대학교)

Contents

• Database, Web, and XML
• XML Database Systems
• Data Models
• Query Language and Processing
• Storage and Index
• Other issues

Database and Web, before XML

• DB: a back-end server for Web applications
  – CGI
  – JDBC
  – Embedded SQL
• Web
  – Information retrieval
  – Target to manage (Web DB)

[Figure: three-tier Web architecture (thin client, middle tier, back end). Labels: HTML; Web Server; Application Server; Template Engine; HTML templates; scripts; application code; mapping code.]
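
As a rough illustration of the pre-XML arrangement above, the following Python sketch plays the role of the middle tier: application code queries a back-end database, and hand-written mapping code turns the rows into HTML. The `book` table, the two templates, and the function names are assumptions made for this example, and `sqlite3` merely stands in for whatever back end a real site would reach through CGI, JDBC, or embedded SQL.

```python
import sqlite3
from html import escape
from string import Template

# Hypothetical HTML templates, as a template engine might hold them.
PAGE = Template("<HTML><BODY><TABLE>\n$rows\n</TABLE></BODY></HTML>")
ROW = Template("<TR><TD>$title</TD><TD>$keywords</TD></TR>")


def fetch_books(conn):
    """Application code: query the back-end database."""
    sql = "SELECT title, keywords FROM book ORDER BY title"
    return conn.execute(sql).fetchall()


def render(books):
    """Mapping code: turn relational rows into HTML for the thin client."""
    rows = "\n".join(
        ROW.substitute(title=escape(title), keywords=escape(keywords))
        for title, keywords in books
    )
    return PAGE.substitute(rows=rows)


# In-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, keywords TEXT)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [
        ("Java Programming", "Java, Web"),
        ("History of the Internet", "Internet, computers"),
    ],
)

page = render(fetch_books(conn))
print(page)
```

In this sketch the mapping from rows to markup is written by hand for each page; the later slides list lower maintenance costs as a benefit of the XML-based alternative.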

XML

• eXtensible Markup Language
• An emerging standard for data representation and exchange on the Internet
  – See the XML catalog, http://www.xml.org
• Separates content from presentation
  – Easy to provide multiple views of the same data
• Easily parsed and self-describing
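
The point about separating content from presentation can be sketched in a few lines of Python. The `<book>` document and the two view functions below are assumptions made for illustration; the sketch only shows one parsed document feeding two different presentations.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: content only, no layout information.
DOC = """<book>
  <title>Java Programming</title>
  <keywords>Java, Web</keywords>
</book>"""

book = ET.fromstring(DOC)


def as_html(elem):
    """View 1: render each child element as an HTML table cell."""
    cells = "".join(f"<TD>{child.text}</TD>" for child in elem)
    return f"<TR>{cells}</TR>"


def as_text(elem):
    """View 2: render the same data as 'tag: value' lines."""
    return "\n".join(f"{child.tag}: {child.text}" for child in elem)


print(as_html(book))
print(as_text(book))
```

Because the tags describe the data rather than its layout, neither view needs to know about the other.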

XML

• Extensible — a dynamic data model

• Simple — human-readable, easy to use

• Flexible — for handling complex data

• Portable — for cross-platform data exchange

• Standard — easy to integrate, widely adopted


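Comparing HTML and XML Documents

HTML documents:

<HTML> <BODY>
<TABLE> <TR>
  <TD> Java Programming </TD>
  <TD> Java, Web </TD>
  <TD> As the Internet becomes widespread, there is growing demand for a language that can provide diverse, sophisticated user environments on the Web. Java is a platform-independent programming language. </TD>
</TR> </TABLE>
</BODY> </HTML>

<HTML> <BODY>
<TABLE> <TR>
  <TD> History of the Internet </TD>
  <TD> Internet, computers </TD>
  <TD> The Internet is one of the key communication infrastructures tying the whole world together: an interactive, transaction-oriented networking infrastructure. </TD>
</TR> </TABLE>
</BODY> </HTML>

XML documents (단행본 = book, 도서명 = title, 주제어 = keywords, 요약 = summary):

<?xml version="1.0"?>
<단행본>
  <도서명> Java Programming </도서명>
  <주제어> Java, Web </주제어>
  <요약> As the Internet becomes widespread, there is growing demand for a language that can provide diverse, sophisticated user environments on the Web. Java is a platform-independent programming language. </요약>
</단행본>

<?xml version="1.0"?>
<단행본>
  <도서명> History of the Internet </도서명>
  <주제어> Internet, computers </주제어>
  <요약> The Internet is one of the key communication infrastructures tying the whole world together: an interactive, transaction-oriented networking infrastructure. </요약>
</단행본>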

XML Is All About Data

HTML example:

<heading1> Invoice </heading1>
<bold>To: Joe Bloggs <P>
From: J. Abrams <P>
Date: 2/1/1999<P>
Amount: $100 <P>
Tax: 21% <P>
Total $121 </bold>

Callout: data mixed with presentation

XML Is All About Data

XML example:

<Invoice>
  <Customer> Joe Bloggs </Customer>
  <From> J. Abrams </From>
  <Date year='1999' month='2' day='1' />
  <Amount unit='Dollars'> 100 </Amount>
  <TaxRate> 21 </TaxRate>
  <Total currency="Dollars"> 121 </Total>
</Invoice>

Callouts: human-readable; comes with tags
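
Because the invoice is tagged data rather than formatted text, a program can read the fields directly. The following Python sketch parses the invoice from the slide and checks that the stated total agrees with the amount and tax rate. Treating `TaxRate` as a percentage is an assumption inferred from the "Tax: 21%" line of the HTML version.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

INVOICE = """<Invoice>
  <Customer> Joe Bloggs </Customer>
  <From> J. Abrams </From>
  <Date year='1999' month='2' day='1' />
  <Amount unit='Dollars'> 100 </Amount>
  <TaxRate> 21 </TaxRate>
  <Total currency="Dollars"> 121 </Total>
</Invoice>"""

inv = ET.fromstring(INVOICE)

# Element content is addressed by tag name, not by position on the page.
customer = inv.findtext("Customer").strip()
amount = Decimal(inv.findtext("Amount").strip())
rate = Decimal(inv.findtext("TaxRate").strip())  # assumed to be a percentage
total = Decimal(inv.findtext("Total").strip())

# Attributes are available as a dictionary on the element.
date = inv.find("Date").attrib

computed = amount + amount * rate / 100
print(customer, date["year"], computed == total)
```

Doing the same with the HTML version would require scraping strings such as "Amount: $100" out of presentation markup.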

XML Is All About Data

XML example:

<Invoice>
  <Customer>
    <Name> Joe Bloggs </Name>
    <Address> 25 Mall Road </Address>
  </Customer>
  <From> J. Abrams </From>
  <Date year='1999' month='2' day='1' />
  <Amount unit='Dollars'> 100 </Amount>
  <TaxRate> 21 </TaxRate>
  <Total unit="Dollars"> 121 </Total>
</Invoice>

Callout (extensible): <Customer> now contains <Name> and <Address> subelements
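
One way to read the "extensible" callout: code that looks fields up by tag keeps working when new subelements appear. The sketch below is an illustration under that reading. `OLD` and `NEW` are abbreviated versions of the two invoice slides, and `customer_name` is a hypothetical helper that accepts either shape.

```python
import xml.etree.ElementTree as ET

# Abbreviated versions of the flat and the extended invoice.
OLD = ET.fromstring(
    "<Invoice><Customer> Joe Bloggs </Customer>"
    "<Amount unit='Dollars'> 100 </Amount></Invoice>"
)
NEW = ET.fromstring(
    "<Invoice><Customer><Name>Joe Bloggs </Name>"
    "<Address> 25 Mall Road </Address></Customer>"
    "<Amount unit='Dollars'> 100 </Amount></Invoice>"
)


def amount(invoice):
    """Unchanged by the extension: still found by tag name."""
    return int(invoice.findtext("Amount"))


def customer_name(invoice):
    """Hypothetical helper: read the name from either document shape."""
    customer = invoice.find("Customer")
    name = customer.find("Name")
    node = name if name is not None else customer
    return node.text.strip()


for doc in (OLD, NEW):
    print(customer_name(doc), amount(doc))
```

The `amount` function needs no change at all; only code that reads the restructured `<Customer>` element has to know about the new subelements.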

XML Family of Standards

• XML
• DOM (Document Object Model)
• XML Namespaces
• XSL (style language)
• XQL (XSL query language)
• XML Data / DCD / Schema
• XUL (updates, future)
• …many more

Building Web Applications with XML

[Figure: three-tier architecture (thin client, middle tier, back end). Labels: HTML; XML; scripts; application code; DOM; XSL; Web Server / App Server; XML Server.]

Standard API and template language

• Quickly react to changes
• Lower maintenance costs
• Does not depend on a single vendor

Legacy DBs for XML Applications

• XML as a new data-exchange format
  – for legacy DB applications
• DB2XML
  – Transforms the results of database queries, or complete databases, into XML documents, or into HTML documents using XSLT stylesheets
  – DB2XML can be used:
    • as a standalone tool (with GUI or command line)
    • as a servlet to dynamically generate XML documents
    • through the DB2XML API
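
The general idea of exporting query results as XML can be sketched as follows. This is not the DB2XML API. `query_to_xml`, the `result` and `row` tag names, and the `invoice` table are all assumptions made for this example, with `sqlite3` standing in for a legacy database.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_xml(conn, sql, root_tag="result", row_tag="row"):
    """Run a query and wrap each row in an element, one child per column.

    Assumes that every column name is also a valid XML element name.
    """
    cur = conn.execute(sql)
    columns = [desc[0] for desc in cur.description]
    root = ET.Element(root_tag)
    for record in cur:
        row = ET.SubElement(root, row_tag)
        for col, value in zip(columns, record):
            ET.SubElement(row, col).text = "" if value is None else str(value)
    return root


# In-memory stand-in for a legacy database, with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (customer TEXT, amount INTEGER)")
conn.execute("INSERT INTO invoice VALUES ('Joe Bloggs', 100)")

doc = query_to_xml(conn, "SELECT customer, amount FROM invoice")
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

A stylesheet step (XSLT, as the slide mentions) could then turn such output into HTML; that step is left out here.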

XML Database Systems

• 3 approaches
  – Build special-purpose systems
    • Lore, Strudel
    • Best performance for XML data
  – Use object-oriented database systems
    • eXcelon, Monet, Ozone
    • Object-oriented modeling
  – Use relational database systems
    • Oracle, Microsoft
    • Mature, large market

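Lore

[Figure: Lore system architecture. Applications, an HTML GUI, and a textual interface reach the system through an API, which passes queries to the Query Processor and non-query requests to the Data Engine. Query Processor: parsing, preprocessing (Lorel2OQL), query plan generator, query optimizer. Data Engine: object manager, query operators, utilities, external data manager, physical storage. The external data manager draws on external, read-only data sources.]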

eXcelon

Oracle 8i

Data Models for XML

• XML is not a data model
• Structure of an XML document
  – an ordered list of elements
  – each element
    • may have a set of attributes
    • may have (sub)elements (nested elements)
  – Structured data and full text mixed together
• DOM defines how to translate an XML document into a data structure for processing
• Need a true data model for XML data
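
To make the DOM bullet concrete, the sketch below parses a small document with Python's standard `xml.dom.minidom` module and walks the resulting tree. The sample document is an abbreviated form of the invoice used earlier. The sketch shows the structure described above: ordered child elements, attributes on an element, and text as leaf nodes.

```python
from xml.dom import minidom

dom = minidom.parseString(
    "<Invoice>"
    "<Customer>Joe Bloggs</Customer>"
    "<Date year='1999' month='2' day='1'/>"
    "</Invoice>"
)
root = dom.documentElement

# Child elements come back in document order.
children = [
    node.tagName
    for node in root.childNodes
    if node.nodeType == node.ELEMENT_NODE
]

# Attributes are looked up by name on the element.
date = root.getElementsByTagName("Date")[0]
year = date.getAttribute("year")

# Text content is itself a node in the tree.
customer_text = root.getElementsByTagName("Customer")[0].firstChild.data

print(children, year, customer_text)
```

The DOM gives programs a tree to navigate, but it says nothing about schemas, keys, or queries, which is why the slide still asks for a true data model.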

OEM: a Semi-structured Data Model

• Object Exchange Model (Lore)
• Semi-structured data
  – Self-describing structure, lacking a schema
  – The structure changes rapidly and unpredictably
• Labeled directed graph
  – Node: object (OID) or atomic value (leaf)
  – Labeled edge: object-subobject relationship

OEM, an example

<DBGroup>
  <Member Name="유" Advisor="m1">
    <Age>28</Age>
  </Member>
  <Member ID="m1" Project="p1">
    <Name> 박 </Name>
    <Advisor> 홍 </Advisor>
  </Member>
  <Project ID="p1" Member="m1">
    <Title>XML DB</Title>
  </Project>
</DBGroup>

[Figure: OEM graph of the document above. Object nodes &1 through &12; edge labels DBGroup, Member, Project, Name, Age, Advisor, Title, and Text; atomic values "28", "박", and "홍"; attribute sets drawn beside the Member and Project nodes.]
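
A labeled directed graph of this kind can be sketched with a plain Python dictionary. The encoding below is one possible reading of the example, not the figure's exact layout: the OIDs are hypothetical and do not match the figure's numbering, attributes and subelements are both treated as labeled edges, and an IDREF attribute becomes an edge to the referenced object.

```python
# Each OID maps either to an atomic value (a leaf) or to a list of
# (label, target OID) edges. OIDs here are made up for the sketch.
oem = {
    "&1": [("DBGroup", "&2")],
    "&2": [("Member", "&3"), ("Member", "&4"), ("Project", "&5")],
    "&3": [("Name", "&6"), ("Age", "&7"), ("Advisor", "&4")],      # Advisor="m1"
    "&4": [("Name", "&8"), ("Advisor", "&9"), ("Project", "&5")],  # ID="m1"
    "&5": [("Title", "&10"), ("Member", "&4")],                    # ID="p1"
    "&6": "유",
    "&7": "28",
    "&8": "박",
    "&9": "홍",
    "&10": "XML DB",
}


def follow(graph, start, labels):
    """Return the OIDs reached from start by following the labels in order."""
    frontier = [start]
    for label in labels:
        frontier = [
            target
            for oid in frontier
            if isinstance(graph[oid], list)  # leaves have no outgoing edges
            for edge_label, target in graph[oid]
            if edge_label == label
        ]
    return frontier


def values(graph, oids):
    """Keep only the atomic values among the given objects."""
    return [graph[oid] for oid in oids if not isinstance(graph[oid], list)]


member_names = values(oem, follow(oem, "&1", ["DBGroup", "Member", "Name"]))
advisor_names = values(
    oem, follow(oem, "&1", ["DBGroup", "Member", "Advisor", "Name"])
)
print(member_names, advisor_names)
```

Note that the graph is cyclic (the member and the project point at each other) and that the two Advisor edges lead to different kinds of targets, an object in one case and an atomic value in the other. Irregularity of this sort is what the semi-structured model is meant to tolerate.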

Issues in Data Modeling

• How to view XML information simultaneously as both
  – a set of documents
  – a single large database
• No loss of information in XML
  – How to represent the ordering of elements
  – External/internal entities, processing instructions
• XML DB design
  – When should attributes (subelements) be used?
  – Is a 1-to-1 relationship best represented using element nesting or IDREFs?
  – How to translate the conceptual model (OEM?) into an XML encoding?
• Need to identify the relationship between DTDs and traditional DB schemas

Query Languages for XML DB

• Requirements
  – Path expressions
  – Queries over
    • structured and semistructured data
    • full text
    • a mixture of data elements and full text
• W3C, Query Languages for the Web, 1998
• QLs for semistructured data
  – Lorel, UnQL
• XQL, XML-QL
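
The requirements above can be tried out on a small scale with the limited path syntax of Python's `xml.etree.ElementTree`. This is not XQL, XML-QL, or Lorel. The sample document is loosely based on the DBGroup example, and the `<Note>` element and the `contains` helper are assumptions added for the sketch.

```python
import xml.etree.ElementTree as ET

DOC = """<DBGroup>
  <Member ID="m1"><Name>박</Name><Advisor>홍</Advisor></Member>
  <Member ID="m2"><Name>유</Name><Age>28</Age></Member>
  <Project ID="p1">
    <Title>XML DB</Title>
    <Note>Storage and indexing for XML DB engines</Note>
  </Project>
</DBGroup>"""
root = ET.fromstring(DOC)

# Path expression over structure: names of all members.
member_names = [e.text for e in root.findall("./Member/Name")]

# Descendant step: titles at any depth.
all_titles = [e.text for e in root.findall(".//Title")]

# Path expression with an attribute predicate.
name_of_m2 = root.find("./Member[@ID='m2']/Name").text


def contains(elem, word):
    """Naive full-text test over all text beneath an element."""
    return word in "".join(elem.itertext())


# Mixed query: a structural step followed by a full-text condition.
projects_about_indexing = [
    p.get("ID") for p in root.findall("./Project") if contains(p, "indexing")
]

print(member_names, all_titles, name_of_m2, projects_about_indexing)
```

The last query is the interesting case for the slide: the structural part and the text part are evaluated by different mechanisms here, whereas a query language for XML would have to express both together.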

XML-QL

• Syntax

  select <variable-list> where <XML-pattern>+

• Example

  select $n, $h
  where <person> <age=$a>
          <name> $n </name>
          <address> 서울 성북구 동선동 3가 </address>
          [<hobby> $h </hobby>]
        </person>, $a > 18
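
What the example query asks for can be emulated in ordinary Python. The sketch rests on several assumptions that the slide does not spell out: `$a` is taken to bind to an `age` attribute of `<person>`, the square brackets are taken to mark an optional part of the pattern, and the sample people are made up. It illustrates pattern matching with variable binding, not an XML-QL implementation.

```python
import xml.etree.ElementTree as ET

ADDRESS = "서울 성북구 동선동 3가"

# Hypothetical data; only the address value comes from the slide.
PEOPLE = f"""<people>
  <person age="25"><name>Kim</name><address>{ADDRESS}</address><hobby>chess</hobby></person>
  <person age="30"><name>Lee</name><address>{ADDRESS}</address></person>
  <person age="17"><name>Choi</name><address>{ADDRESS}</address></person>
  <person age="40"><name>Han</name><address>Busan</address></person>
</people>"""


def select_name_hobby(root):
    """Emulate: select $n, $h where <person ...> matches and $a > 18."""
    results = []
    for person in root.iter("person"):
        # Constant in the pattern: the address must match exactly.
        if (person.findtext("address") or "").strip() != ADDRESS:
            continue
        # Condition on the bound variable $a (assumed to be an attribute).
        if int(person.get("age", "0")) <= 18:
            continue
        # Optional part: bind $h once per hobby, or to None if absent.
        hobbies = [h.text for h in person.findall("hobby")] or [None]
        for hobby in hobbies:
            results.append((person.findtext("name"), hobby))
    return results


rows = select_name_hobby(ET.fromstring(PEOPLE))
print(rows)
```

Each result row corresponds to one binding of the variables `$n` and `$h`; the person without a hobby still appears because that part of the pattern is treated as optional.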

Issues in Query Processing

• The true requirements for an XML query language are not yet known
• Need to review all facets of traditional query processing
• Need to develop a new IR model
  – proximity in XML documents
  – similarity measures between XML elements
• How to integrate
  – the traditional (DB) query processing model and
  – the information retrieval model
• Optimization schemes
  – for XML data that is not well structured
  – for queries that mix full-text retrieval with structured/semistructured search

Storage Structure and Indexing

• Clustering schemes for storing XML data
• New index types
  – for quickly finding certain elements, attributes, and more complex structural patterns
  – element orderings
• Determine the level of parsing for storing XML documents
• Based on analysis of encoding patterns
  – merging identical text strings (sub-patterns) by using appropriate IDREFs
  – compression based on regular patterns
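
As one small illustration of an index for structural patterns that also keeps element ordering, the sketch below maps each root-to-element label path to the elements found on it, tagging every element with a tuple of sibling ordinals. The function name and the position encoding are choices made for this sketch, not a scheme taken from the slides.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def build_path_index(root):
    """Map each label path to a list of (position, element) pairs.

    A position is a tuple of sibling ordinals from the root down, so
    comparing two positions reproduces document order.
    """
    index = defaultdict(list)

    def visit(elem, path, position):
        label_path = f"{path}/{elem.tag}"
        index[label_path].append((position, elem))
        for i, child in enumerate(elem):
            visit(child, label_path, position + (i,))

    visit(root, "", (0,))
    return index


doc = ET.fromstring(
    "<DBGroup>"
    "<Member><Name>유</Name><Age>28</Age></Member>"
    "<Member><Name>박</Name></Member>"
    "<Project><Title>XML DB</Title></Project>"
    "</DBGroup>"
)
index = build_path_index(doc)

# One lookup instead of a tree traversal.
names = [elem.text for _, elem in index["/DBGroup/Member/Name"]]
positions = [pos for pos, _ in index["/DBGroup/Member/Name"]]
print(names, positions)
```

A real system would keep such a structure on disk and maintain it under updates; the sketch only shows what the index is keyed on.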

Issues in Various DB Features

• Full view support for XML
  – both virtual and materialized views
  – incremental maintenance
  – XSL as a view definition language
• Data integrity issues
  – What are the constraints on XML data?
    • key, referential, domain
  – How to represent the constraints
  – How to check them when changes occur
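
Key and referential constraints on XML data can be checked with a short pass over the document. The sketch below assumes that an `ID` attribute acts as a key and that the `Advisor`, `Project`, and `Member` attributes act as references, as in the DBGroup example. In practice this information would come from a DTD or schema; it is hard-coded here for illustration.

```python
import xml.etree.ElementTree as ET


def check_constraints(root, id_attr="ID",
                      ref_attrs=("Advisor", "Project", "Member")):
    """Return a list of key and referential violations (empty if none)."""
    errors = []
    seen = set()
    # Key constraint: every ID value must be unique in the document.
    for elem in root.iter():
        key = elem.get(id_attr)
        if key is None:
            continue
        if key in seen:
            errors.append(f"duplicate key {key}")
        seen.add(key)
    # Referential constraint: every reference must name an existing ID.
    for elem in root.iter():
        for attr in ref_attrs:
            ref = elem.get(attr)
            if ref is not None and ref not in seen:
                errors.append(f"dangling reference {attr}={ref} on <{elem.tag}>")
    return errors


GOOD = ET.fromstring(
    '<DBGroup>'
    '<Member Name="유" Advisor="m1"><Age>28</Age></Member>'
    '<Member ID="m1" Project="p1"><Name>박</Name></Member>'
    '<Project ID="p1" Member="m1"><Title>XML DB</Title></Project>'
    '</DBGroup>'
)
BAD = ET.fromstring(
    '<DBGroup>'
    '<Member ID="m1" Project="p9"><Name>박</Name></Member>'
    '<Project ID="p1" Member="m1"><Title>XML DB</Title></Project>'
    '</DBGroup>'
)

print(check_constraints(GOOD))
print(check_constraints(BAD))
```

This check rescans the whole document; checking constraints when changes occur, as the slide asks, would call for an incremental version.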

Issues in Various DB Features (continued)

• Triggers
  – active database capabilities in XML
• Transaction control over XML databases
• Performance evaluation
  – need an appropriate benchmark for XML data
    • XML data set
    • query types
    • mix of queries and updates

References

• Research issues
  – Data Management for XML: Research Directions, http://www-db.stanford.edu/~widom/xml-whitepaper.html
  – More on Data Management for XML, http://www.cs.washington.edu/homes/alon/widom-response.html
• Storing XML data in RDBMSs
  – A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database, ercim.inria. publications/RR-3680
• XML database systems
  – http://www.xmlsoftware.com/database/