XML and Databases
Hong Ki-hyung (홍기형), Sungshin Women's University (성신여자대학교)
Contents
• Database, Web, and XML
• XML Database Systems
• Data Models
• Query Language and Processing
• Storage and Index
• Other issues
Database and Web, before XML
• DB: a back-end server for Web applications
  – CGI
  – JDBC
  – Embedded SQL
• Web
  – Information retrieval
  – Target to manage (Web DB)
[Figure: pre-XML Web application architecture in three tiers. Thin client: HTML. Middle tier: Web server and application server, with scripts, HTML templates, a template engine, application code, and mapping code. Back end.]
XML
• eXtensible Markup Language
• An emerging standard for data representation and exchange on the Internet
  – See the XML catalog, http://www.xml.org
• Separates content from presentation
  – Easy to provide multiple views of the same data
• Easily parsed and self-describing
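As a minimal sketch of "multiple views of the same data", the Python snippet below renders one XML record in two ways. The record and its element names (`Book`, `Title`, `Keywords`) are hypothetical sample data, and the standard-library `xml.etree.ElementTree` parser is just one convenient choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; the element names are illustrative only.
BOOK_XML = """<Book>
  <Title>Java Programming</Title>
  <Keywords>Java, Web</Keywords>
</Book>"""

def as_html_row(book):
    """View 1: render the record as an HTML table row."""
    cells = "".join(f"<TD>{child.text}</TD>" for child in book)
    return f"<TR>{cells}</TR>"

def as_plain_text(book):
    """View 2: render the same record as labeled plain text."""
    return "\n".join(f"{child.tag}: {child.text}" for child in book)

book = ET.fromstring(BOOK_XML)
print(as_html_row(book))
print(as_plain_text(book))
```

The content is stored once; each view is a separate function over it, so adding a third presentation does not touch the data.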
XML
• Extensible — a dynamic data model
• Simple — human-readable, easy to use
• Flexible — for handling complex data
• Portable — for cross-platform data exchange
• Standard — easy to integrate, widely adopted
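Comparing HTML and XML Documents
HTML documents:
<HTML> <BODY>
<TABLE> <TR>
  <TD> Java Programming </TD>
  <TD> Java, Web </TD>
  <TD> As the Internet becomes widespread, there is growing demand for a language that can provide diverse and sophisticated user environments on the Web. Java is a platform-independent programming language. </TD>
</TR> </TABLE>
</BODY> </HTML>
<HTML> <BODY>
<TABLE> <TR>
  <TD> History of the Internet </TD>
  <TD> Internet, Computers </TD>
  <TD> The Internet is one of the key communication infrastructures tying the whole world together; it is an interactive, transaction-oriented networking infrastructure. </TD>
</TR> </TABLE>
</BODY> </HTML>
XML documents:
<?xml version="1.0"?>
<Book>
  <Title> Java Programming </Title>
  <Keywords> Java, Web </Keywords>
  <Summary> As the Internet becomes widespread, there is growing demand for a language that can provide diverse and sophisticated user environments on the Web. Java is a platform-independent programming language. </Summary>
</Book>
<?xml version="1.0"?>
<Book>
  <Title> History of the Internet </Title>
  <Keywords> Internet, Computers </Keywords>
  <Summary> The Internet is one of the key communication infrastructures tying the whole world together; it is an interactive, transaction-oriented networking infrastructure. </Summary>
</Book>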
XML Is All About Data
HTML example:
<heading1> Invoice </heading1>
<bold>
To: Joe Bloggs <P>
From: J. Abrams <P>
Date: 2/1/1999 <P>
Amount: $100 <P>
Tax: 21% <P>
Total $121
</bold>
(Data mixed with presentation)
XML Is All About Data
XML example:
<Invoice>
  <Customer> Joe Bloggs </Customer>
  <From> J. Abrams </From>
  <Date year="1999" month="2" day="1" />
  <Amount unit="Dollars"> 100 </Amount>
  <TaxRate> 21 </TaxRate>
  <Total currency="Dollars"> 121 </Total>
</Invoice>
(Human readable; comes with tags)
XML Is All About Data
XML example:
<Invoice>
  <Customer>
    <Name> Joe Bloggs </Name>
    <Address> 25 Mall Road </Address>
  </Customer>
  <From> J. Abrams </From>
  <Date year="1999" month="2" day="1" />
  <Amount unit="Dollars"> 100 </Amount>
  <TaxRate> 21 </TaxRate>
  <Total unit="Dollars"> 121 </Total>
</Invoice>
(Extensible: the Customer element now nests Name and Address)
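One way to see what extensibility buys is to run the same consumer code against both versions of the invoice. The sketch below is an illustration in Python, not part of the original slides; the documents are trimmed copies of the invoices above, and the function name `total` is hypothetical.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<Invoice>
  <Customer>Joe Bloggs</Customer>
  <Amount unit="Dollars">100</Amount>
  <TaxRate>21</TaxRate>
</Invoice>"""

# Same invoice after Customer was extended with nested Name and Address.
EXTENDED = """<Invoice>
  <Customer>
    <Name>Joe Bloggs</Name>
    <Address>25 Mall Road</Address>
  </Customer>
  <Amount unit="Dollars">100</Amount>
  <TaxRate>21</TaxRate>
</Invoice>"""

def total(invoice_xml):
    """Compute amount plus tax, looking elements up by tag name only."""
    root = ET.fromstring(invoice_xml)
    amount = int(root.findtext("Amount"))
    rate = int(root.findtext("TaxRate"))
    return amount + amount * rate // 100

print(total(ORIGINAL), total(EXTENDED))
```

Because the code addresses elements by name rather than by position, the added Name and Address elements do not affect it.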
XML Family of Standards
• XML
• DOM (Document Object Model)
• XML Namespaces
• XSL (style language)
• XQL (XML Query Language)
• XML-Data / DCD / Schema
• XUL (updates, future)
• …many more
Building Web Applications with XML
Standard API and template language
• Quickly react to changes
• Lower maintenance costs
• Does not depend on a single vendor
[Figure: three-tier architecture with XML. Thin client: HTML, XML. Middle tier: Web server / app server with scripts and application code using DOM and XSL. Back end: XML server.]
Legacy DBs for XML Applications
• XML as a new data-exchange format
  – for legacy DB applications
• DB2XML
  – Transforms the results of database queries, or complete databases, into XML documents, or into HTML documents using XSLT stylesheets
  – DB2XML can be used:
    • as a standalone tool (with GUI or command line)
    • as a servlet to dynamically generate XML documents
    • through the DB2XML API
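The general idea of turning query results into XML can be sketched in a few lines of Python. This is not DB2XML's API; it is a stand-in built on the standard `sqlite3` and `xml.etree.ElementTree` modules, and the table, column, and function names are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(cursor, root_tag="table", row_tag="row"):
    """Wrap each result row in an element; one child element per column."""
    columns = [d[0] for d in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_el = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            ET.SubElement(row_el, name).text = str(value)
    return root

# Hypothetical legacy table, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (customer TEXT, amount INTEGER)")
conn.execute("INSERT INTO invoice VALUES ('Joe Bloggs', 100)")
cur = conn.execute("SELECT customer, amount FROM invoice")
doc = rows_to_xml(cur, root_tag="invoices", row_tag="invoice")
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Column names become element names here, which only works when they are valid XML names; a real tool has to handle the cases where they are not.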
XML Database Systems
• 3 approaches
  – Build special-purpose systems
    • Lore, Strudel
    • Best performance for XML data
  – Use object-oriented database systems
    • eXcelon, Monet, Ozone
    • Object-oriented modeling
  – Use relational database systems
    • Oracle, Microsoft
    • Mature, large market
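For the relational approach, one simple mapping (among several possible ones, and not necessarily what any listed vendor does) stores every element as a row in a single "edge" table. The Python sketch below assumes a hypothetical schema `edge(id, parent, pos, tag, content)` and uses an in-memory SQLite database.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

def shred(root, conn):
    """Store the element tree as rows (id, parent, pos, tag, content)."""
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER, "
        "pos INTEGER, tag TEXT, content TEXT)"
    )
    counter = itertools.count(1)

    def visit(el, parent, pos):
        node_id = next(counter)
        content = (el.text or "").strip() or None
        conn.execute("INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
                     (node_id, parent, pos, el.tag, content))
        for i, child in enumerate(el):   # pos records sibling order
            visit(child, node_id, i)

    visit(root, None, 0)

doc = ET.fromstring(
    "<Invoice><Customer>Joe Bloggs</Customer><Amount>100</Amount></Invoice>"
)
conn = sqlite3.connect(":memory:")
shred(doc, conn)
# The path Invoice/Customer becomes a self-join on the edge table.
customers = conn.execute(
    "SELECT c.content FROM edge p JOIN edge c ON c.parent = p.id "
    "WHERE p.tag = 'Invoice' AND c.tag = 'Customer'"
).fetchall()
print(customers)
```

Each additional path step costs another join, which is one reason the choice of mapping scheme matters for performance.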
Lore
[Figure: Lore system architecture. A textual interface, an HTML GUI, and applications send queries and non-query requests through the API. Query processor: parsing, preprocessing (Lorel2OQL), query plan generator, query optimizer. Data engine: query operators, utilities, object manager, external data manager, physical storage. External, read-only data sources feed the external data manager.]
eXcelon
Oracle 8i
Data Models for XML
• XML is not a data model
• Structure of an XML document
  – an ordered list of elements
  – each element
    • may have a set of attributes
    • may have (sub)elements (nested elements)
  – Structured data and full text mixed together
• DOM defines how to translate an XML document into a data structure for processing
• Need a true data model for XML data
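To make the DOM point concrete, the sketch below parses a small document with Python's standard `xml.dom.minidom` and walks the resulting tree. The sample document and the helper name `describe` are assumptions for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<Invoice><Date year="1999" month="2" day="1"/>'
    '<Amount unit="Dollars">100</Amount></Invoice>'
)

def describe(node, depth=0):
    """Return one line per element: tag name and attributes (sorted by name)."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        attrs = dict(sorted(node.attributes.items()))
        lines.append("  " * depth + f"{node.tagName} {attrs}")
        for child in node.childNodes:   # children come back in document order
            lines.extend(describe(child, depth + 1))
    return lines

outline = describe(doc.documentElement)
print("\n".join(outline))
```

The DOM gives a tree of nodes to program against, but it says nothing about schemas, identity, or querying, which is why a separate data model is still needed.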
OEM: a Semi-structured Data Model
• Object Exchange Model (Lore)
• Semi-structured data
  – Self-describing structure, lack of a schema
  – The structure changes rapidly and unpredictably
• Labeled directed graph
  – Node: object (OID) or atomic value (leaf)
  – Labeled edge: object-subobject relationship
OEM, an example
<DBGroup>
  <Member Name="Yu" Advisor="m1">
    <Age>28</Age>
  </Member>
  <Member ID="m1" Project="p1">
    <Name> Park </Name>
    <Advisor> Hong </Advisor>
  </Member>
  <Project ID="p1" Member="m1">
    <Title>XML DB</Title>
  </Project>
</DBGroup>
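[Figure: OEM graph of the document above. Objects &1 through &12 are connected by edges labeled DBGroup, Member, Project, Name, Advisor, Age, Title, and Text; attribute sets such as {ID="m1", Project="p1"} annotate the Member and Project objects; the leaves hold atomic values such as "28".]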
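A labeled directed graph of this kind can be sketched as a dictionary from OIDs to labeled edges. The Python code below is a simplified illustration, not Lore's implementation: attributes are kept as ordinary labeled edges to string values, IDREF attributes are not resolved into object edges, and the sample document and the name `to_oem` are assumptions.

```python
import itertools
import xml.etree.ElementTree as ET

DOC = """<DBGroup>
  <Member ID="m1"><Name>Park</Name><Age>28</Age></Member>
  <Project ID="p1" Member="m1"><Title>XML DB</Title></Project>
</DBGroup>"""

def to_oem(root):
    """Return (root_oid, graph) where graph maps OID -> [(label, target)].

    A target is either another OID (int) or an atomic value (str).
    """
    graph = {}
    oids = itertools.count(1)

    def visit(el):
        if len(el) == 0 and not el.attrib:
            return (el.text or "").strip()       # leaf: atomic value
        oid = next(oids)
        graph[oid] = [(name, value) for name, value in el.attrib.items()]
        for child in el:                          # document order is kept
            graph[oid].append((child.tag, visit(child)))
        return oid

    return visit(root), graph

root_oid, graph = to_oem(ET.fromstring(DOC))
for oid, edges in graph.items():
    print(f"&{oid}: {edges}")
```

Keeping the edge list ordered preserves element order, one of the things a plain set-based graph model would lose.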
Issues in Data Modeling
• How to view XML information simultaneously as both
  – a set of documents
  – a single large database
• No loss of information in XML
  – How to represent the ordering of elements
  – External/internal entities, processing instructions
• XML DB design
  – When should attributes be used, and when subelements?
  – Is a 1-to-1 relationship best represented using element nesting or IDREFs?
  – How to translate the conceptual model (OEM?) into an XML encoding?
• Need to identify the relationship between DTDs and traditional DB schemas
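The nesting-versus-IDREF question can be made concrete by encoding the same member-project relationship both ways and comparing the lookup code. The Python sketch below uses hypothetical documents and function names; treating the `Project` attribute as an IDREF is an assumption, since no DTD is given.

```python
import xml.etree.ElementTree as ET

# Encoding 1: the project is nested inside the member.
NESTED = ET.fromstring("""<DBGroup>
  <Member><Name>Park</Name><Project><Title>XML DB</Title></Project></Member>
</DBGroup>""")

# Encoding 2: the member points at the project through an IDREF-style attribute.
BY_IDREF = ET.fromstring("""<DBGroup>
  <Member Project="p1"><Name>Park</Name></Member>
  <Project ID="p1"><Title>XML DB</Title></Project>
</DBGroup>""")

def project_title_nested(root, member_name):
    for member in root.iter("Member"):
        if member.findtext("Name") == member_name:
            return member.findtext("Project/Title")
    return None

def project_title_idref(root, member_name):
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    for member in root.iter("Member"):
        if member.findtext("Name") == member_name:
            target = by_id.get(member.get("Project"))
            return target.findtext("Title") if target is not None else None
    return None

print(project_title_nested(NESTED, "Park"), project_title_idref(BY_IDREF, "Park"))
```

Nesting makes the lookup a plain path, while the IDREF form needs an extra dereference step but lets several members share one project.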
Query Languages for XML DB
• Requirements
  – Path expressions
  – Queries over
    • structured and semistructured data
    • full text
    • a mixture of data elements and full text
• W3C, Query Languages for the Web, 1998
• Query languages for semistructured data
  – Lorel, UnQL
• XQL, XML-QL
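Path expressions are the common core of these languages. As a rough illustration, the Python sketch below uses the limited path syntax of the standard `xml.etree.ElementTree` module, which is not XQL or XML-QL but shows the same idea; the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""<DBGroup>
  <Member><Name>Yu</Name><Age>28</Age></Member>
  <Member><Name>Park</Name></Member>
  <Project><Title>XML DB</Title></Project>
</DBGroup>""")

# Child steps: every Name directly under a Member under the root.
member_names = [e.text for e in DOC.findall("Member/Name")]
# Descendant step: every Title anywhere below the root.
titles = [e.text for e in DOC.findall(".//Title")]
# Structural predicate: Members that have an Age child.
with_age = [m.findtext("Name") for m in DOC.findall("Member[Age]")]

print(member_names, titles, with_age)
```

The third query already depends on irregular structure (only some members have an Age), which is the semistructured case the requirements list calls out.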
XML-QL
• Syntax
  select <variable-list> where <XML-pattern>+
• Example
  select $n, $h
  where <person age=$a>
          <name> $n </name>
          <address> Dongseon-dong 3-ga, Seongbuk-gu, Seoul </address>
          [<hobby> $h </hobby>]
        </person>, $a > 18
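To show what such a query computes, the sketch below evaluates an equivalent of the example by hand in Python. It assumes that `$a` binds to an `age` attribute of `person`, that the bracketed `hobby` pattern is optional, and that a person with no hobby yields one row with an empty `$h`; the data and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

PEOPLE = ET.fromstring("""<people>
  <person age="25"><name>Kim</name><address>Seoul</address>
    <hobby>chess</hobby></person>
  <person age="17"><name>Lee</name><address>Seoul</address></person>
  <person age="30"><name>Choi</name><address>Seoul</address></person>
  <person age="40"><name>Han</name><address>Busan</address></person>
</people>""")

def select_name_hobby(root, address, min_age):
    """Bind $n and $h for persons matching the pattern with $a > min_age."""
    results = []
    for person in root.iter("person"):
        age = person.get("age")
        if age is None or int(age) <= min_age:
            continue                      # pattern or condition fails
        if person.findtext("address") != address:
            continue
        name = person.findtext("name")
        # Optional sub-pattern: one row per hobby, or a single row without one.
        hobbies = [h.text for h in person.findall("hobby")] or [None]
        results.extend((name, hobby) for hobby in hobbies)
    return results

rows = select_name_hobby(PEOPLE, "Seoul", 18)
print(rows)
```

The pattern does the work of both selection and variable binding, which is what distinguishes this style from navigating the tree step by step.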
Issues in Query Processing
• The true requirements for an XML query language are not yet known
• Need to review all facets of traditional query processing
• Need to develop a new IR model
  – proximity in XML documents
  – similarity measures between XML elements
• How to integrate
  – the traditional (DB) query processing model and
  – the information retrieval model
• Optimization schemes
  – for XML data that is not well structured
  – for queries that mix full-text retrieval with structured/semistructured search
Storage Structure and Indexing
• Clustering schemes for storing XML data
• New index types
  – for quickly finding certain elements, attributes, and more complex structural patterns
  – element orderings
• Determine the level of parsing for storing XML documents
• Based on analysis of encoding patterns
  – merging identical text strings (sub-patterns) by using appropriate IDREFs
  – compression based on regular patterns
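One candidate for such an index type is a path index that maps each label path to the elements reachable by it. The Python sketch below is a minimal in-memory version under that assumption; the function name and the sample document are hypothetical, and a real system would store node identifiers on disk rather than element objects.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

def build_path_index(root):
    """Map each label path (e.g. 'DBGroup/Member/Name') to its elements,
    listed in document order."""
    index = defaultdict(list)

    def visit(el, prefix):
        path = f"{prefix}/{el.tag}" if prefix else el.tag
        index[path].append(el)
        for child in el:
            visit(child, path)

    visit(root, "")
    return index

DOC = ET.fromstring(
    "<DBGroup><Member><Name>Yu</Name></Member>"
    "<Member><Name>Park</Name></Member></DBGroup>"
)
index = build_path_index(DOC)
names = [e.text for e in index["DBGroup/Member/Name"]]
print(sorted(index), names)
```

A lookup by full path becomes a single dictionary access instead of a tree traversal, and the per-path lists retain element ordering.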
Issues in Various DB Features
• Full view support for XML
  – both virtual and materialized views
  – incremental maintenance
  – XSL as a view definition language
• Data integrity issues
  – What are the constraints on XML data?
    • key, referential, domain
  – How to represent the constraints
  – How to check them when changes occur
• Triggers
  – active database capabilities in XML
• Transaction control over XML databases
• Performance evaluation
  – need an appropriate benchmark for XML data
    • XML data set
    • query types
    • mix of queries and updates
References
• Research issues
  – Data Management for XML: Research Directions, http://www-db.stanford.edu/~widom/xml-whitepaper.html
  – More on Data Management for XML, http://www.cs.washington.edu/homes/alon/widom-response.html
• Storing XML data in RDBMSs
  – A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database, ercim.inria. publications/RR-3680
• XML database systems
  – http://www.xmlsoftware.com/database/