iiWAS2004484-491

8/8/2019 iiWAS2004484-491


STORING AND QUERYING XML DATA

USING RDBMS

Yesi Novaria Kunang

Computer Science Faculty

Bina Darma University, Palembang Indonesia

[email protected]

Ahmad Ashari

Faculty of Mathematics and Natural Sciences

Gadjah Mada University, Yogyakarta Indonesia

[email protected]

Abstract

XML (eXtensible Markup Language) is rapidly becoming a popular data format and an emerging standard for data exchange over the Internet. With a large amount of data represented as XML documents, it becomes necessary to store and query these documents. One way to do this is to use an RDBMS as the storage medium and SQL to query the XML documents.

There are two approaches to parsing an XML document into an RDBMS using middleware: SAX parsing and DOM parsing. This research studied both methods and compared their performance. It also studied the performance of several alternative ways of structuring and tagging data from one or more RDBMS tables as a hierarchical XML document. As a final result, we identify which of these alternatives gives the best performance for storing and querying XML data using an RDBMS.

1. Introduction

XML (eXtensible Markup Language) has become a popular format for representing and exchanging data over the Internet. The flexibility of the XML structure makes it suitable for data exchange and for modeling applications. However, when a large amount of data is represented as XML documents, storing and querying those documents becomes essential. One approach is to use a native XML database system. This approach has two weaknesses: first, a native XML database system is not adequate for storing the data and cannot accommodate the complex queries supported by a relational database system; second, users cannot directly query XML documents together with other data stored in a relational database system.

Techniques for storing and querying XML data using a relational database system are used to overcome the weaknesses described above. The steps of this approach are as follows: first, design relational tables to hold the data of an XML document; second, split the XML data into the columns of those tables; and third, process SQL queries to produce the required XML document from the RDBMS data.


2. Literature Review

Bourret [3] observes that XML and its related technologies can be regarded as a simple database, because XML documents can be used in environments with small amounts of data, few users, and modest performance requirements. XML also provides many of the things found in databases: storage (XML documents), schemas (DTD, XML schema languages), query languages, programming interfaces (SAX, DOM, JDOM), and so on. However, XML lacks many of the things found in real databases: efficient storage, indexes, security, transactions and data integrity, multi-user access, queries across multiple documents, and so on. For these reasons, XML is not suitable for environments that have many users, strict data integrity requirements, and the need for good performance.

Two mappings are commonly used to map an XML document schema to a database schema: the table-based mapping and the object-relational mapping [4]. These mappings model the data in the XML document rather than the document itself, so they are suitable for data-centric documents and not for document-centric documents [6].

Strategies for transferring data from XML to a database depend on the software used and on middleware support. There are two ways to parse an XML document in a middleware application (Java, Perl, PHP, Python, C/C++, Eiffel, Tcl, etc.): using a SAX (Simple API for XML) parser or a DOM (Document Object Model) parser [1].

Shanmugasundaram et al. [5] discuss several alternatives for publishing relational data as an XML document. The alternatives follow from a basic difference between relational tables and XML documents: an XML document is tagged and structured, while a relational table is neither. Therefore, to convert relational tables into an XML document, tagging and structuring must be added during processing. One approach is to do tagging as the final step of query processing (late tagging), while another is to do it earlier in the process (early tagging). Similarly, structuring can be done as the final step of query processing (late structuring) or earlier (early structuring).

Each alternative also depends on how much work is done inside the relational engine. Inside engine means that tagging and structuring are done completely inside the relational engine, whereas outside engine means that part, though not necessarily all, of that work is done outside the relational engine. The choice depends on the capabilities of the relational database engine in use. Early tagging with late structuring is not a viable alternative, because adding tags to an XML document without having its structure makes no sense.

2.1. Individual

Tagging and structuring are both done early in query processing. One of the simplest techniques for structuring relational data as an XML document is this early tagging, early structuring method, which is called the Stored Procedure approach [5] or the Individual table technique [2].

This strategy walks the database hierarchy in document order. The process starts by opening a result set on the table for the root element. For each row in that table, it creates a row element and then opens a result set on each child table in turn.
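The nested traversal described above can be sketched as follows. This is a minimal illustration, not the authors' PHP/MySQL implementation: Python's built-in SQLite module stands in for MySQL, the table names tabbook1 and tabauthor1 follow Section 3.2, and the column names, the element names (inventory, book, title, author) and the function publish_individual are assumptions made for the example.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

# Hypothetical two-table sample; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE tabauthor1 (idbook TEXT, first TEXT, last TEXT);
    INSERT INTO tabbook1 VALUES ('001', 'Professional PHP Programming');
    INSERT INTO tabauthor1 VALUES ('001', 'Jesus', 'Castagnetto');
    INSERT INTO tabauthor1 VALUES ('001', 'Harish', 'Rawat');
    INSERT INTO tabauthor1 VALUES ('001', 'Sascha', 'Schumann');
""")


def publish_individual(conn):
    """Walk the hierarchy in document order, tagging rows as they are read."""
    parts = ["<inventory>"]
    queries = 1  # the result set over the root table
    books = conn.execute("SELECT idbook, title FROM tabbook1 ORDER BY idbook")
    for idbook, title in books.fetchall():
        parts.append(f"<book id={quoteattr(idbook)}>")
        parts.append(f"<title>{escape(title)}</title>")
        # One additional query is issued for every parent row.
        authors = conn.execute(
            "SELECT first, last FROM tabauthor1 WHERE idbook = ?", (idbook,))
        queries += 1
        for first, last in authors:
            parts.append(f"<author><first>{escape(first)}</first>"
                         f"<last>{escape(last)}</last></author>")
        parts.append("</book>")
    parts.append("</inventory>")
    return "".join(parts), queries


xml_text, query_count = publish_individual(conn)
print(query_count)  # one root query plus one child query per book
print(xml_text)
```

The query counter makes the cost of the method visible: the number of SQL statements grows with the number of parent rows.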


2.2. Universal I table

In this approach, tagging and structuring are done as the final step in building the XML document. Building the document is divided into two phases: (a) content creation, in which the relational data is produced, and (b) tagging and structuring, in which the relational data is arranged and tagged to produce the XML document.

The required content is produced as a single result set (a universal table) containing all the data in the document. This result set is obtained by joining all the tables, using join predicates that relate each parent to its children. This is known as the Redundant Relation approach [5], also called Universal I [2].

Tagging and structuring the result takes two steps: (a) grouping all siblings in the XML document that belong to the same parent (and eliminating the duplicates caused by redundancy), and (b) extracting the information from each tuple and tagging it to produce the XML result.
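The two phases of the Universal I approach can be sketched as below. This is an illustrative sketch under stated assumptions, not the code used in the experiments: SQLite replaces MySQL, Python replaces PHP, the second book row and all column and element names are hypothetical, and publish_universal_1 is a name chosen for the example.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

# Hypothetical sample data; the second book exists only to show grouping.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE tabauthor1 (idbook TEXT, first TEXT, last TEXT);
    INSERT INTO tabbook1 VALUES ('001', 'Professional PHP Programming');
    INSERT INTO tabbook1 VALUES ('002', 'Second Sample Title');
    INSERT INTO tabauthor1 VALUES ('001', 'Jesus', 'Castagnetto');
    INSERT INTO tabauthor1 VALUES ('001', 'Harish', 'Rawat');
    INSERT INTO tabauthor1 VALUES ('001', 'Sascha', 'Schumann');
    INSERT INTO tabauthor1 VALUES ('002', 'Ann', 'Example');
""")


def publish_universal_1(conn):
    # Phase (a), content: a single join yields the universal table.
    # The book columns are repeated once per author (the redundancy).
    rows = conn.execute("""
        SELECT b.idbook, b.title, a.first, a.last
        FROM tabbook1 AS b
        LEFT JOIN tabauthor1 AS a ON a.idbook = b.idbook
    """).fetchall()

    # Phase (b), structuring in the middleware: group siblings under their
    # parent, which also collapses the duplicated parent columns.
    books = {}
    for idbook, title, first, last in rows:
        book = books.setdefault(idbook, {"title": title, "authors": []})
        if first is not None:
            book["authors"].append((first, last))

    # Phase (b), tagging: emit XML from the grouped data.
    parts = ["<inventory>"]
    for idbook in sorted(books):
        book = books[idbook]
        parts.append(f"<book id={quoteattr(idbook)}>")
        parts.append(f"<title>{escape(book['title'])}</title>")
        for first, last in book["authors"]:
            parts.append(f"<author><first>{escape(first)}</first>"
                         f"<last>{escape(last)}</last></author>")
        parts.append("</book>")
    parts.append("</inventory>")
    return "".join(parts), len(rows)


xml_text, row_count = publish_universal_1(conn)
print(row_count)
print(xml_text)
```

Only one SQL statement is issued regardless of the number of books, at the cost of holding the grouped rows in middleware memory until tagging is finished.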

2.3. Universal II table

The main problem with the late tagging, late structuring technique is memory management when forming the tags. To overcome this problem, the relational engine can be used to produce structured relational content. This strategy can be implemented with a Universal table of type II [2], or Sorted Outer Union [5], which uses the SQL UNION statement.
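A sorted outer union over the same kind of two-table schema might look like the sketch below. It is an assumption-laden illustration rather than the paper's query: SQLite stands in for MySQL, UNION ALL is used because the two branches cannot produce identical rows, the level column and all column and element names are hypothetical, and publish_universal_2 is a name chosen for the example.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

# Hypothetical sample data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE tabauthor1 (idbook TEXT, first TEXT, last TEXT);
    INSERT INTO tabbook1 VALUES ('001', 'Professional PHP Programming');
    INSERT INTO tabbook1 VALUES ('002', 'Second Sample Title');
    INSERT INTO tabauthor1 VALUES ('001', 'Jesus', 'Castagnetto');
    INSERT INTO tabauthor1 VALUES ('001', 'Harish', 'Rawat');
    INSERT INTO tabauthor1 VALUES ('001', 'Sascha', 'Schumann');
    INSERT INTO tabauthor1 VALUES ('002', 'Ann', 'Example');
""")


def publish_universal_2(conn):
    # Each branch fills only its own columns and pads the rest with NULL.
    # The ORDER BY makes the engine deliver rows in document order:
    # every book row (level 0) is followed by its author rows (level 1).
    cursor = conn.execute("""
        SELECT idbook, 0 AS level, title, NULL AS first, NULL AS last
        FROM tabbook1
        UNION ALL
        SELECT idbook, 1 AS level, NULL, first, last
        FROM tabauthor1
        ORDER BY idbook, level
    """)

    # The tagger streams over the sorted rows and keeps no per-book state
    # apart from whether a book element is currently open.
    parts = ["<inventory>"]
    open_book = False
    row_count = 0
    for idbook, level, title, first, last in cursor:
        row_count += 1
        if level == 0:
            if open_book:
                parts.append("</book>")
            parts.append(f"<book id={quoteattr(idbook)}>")
            parts.append(f"<title>{escape(title)}</title>")
            open_book = True
        else:
            parts.append(f"<author><first>{escape(first)}</first>"
                         f"<last>{escape(last)}</last></author>")
    if open_book:
        parts.append("</book>")
    parts.append("</inventory>")
    return "".join(parts), row_count


xml_text, row_count = publish_universal_2(conn)
print(row_count)
print(xml_text)
```

Because the engine has already structured the rows, the middleware only tags them; the result set has one row per book plus one row per author.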

3. Conducting the Research

This research focuses on strategies for transferring data from XML documents to a relational database and vice versa. The steps are as follows:

3.1. Designing an XML Document

To show how the elements, attributes, and sub-elements of an XML document are handled when transferring data to a database and back, an example XML document containing elements, attributes, and sub-elements is used. The example used in this research is shown in Figure 1.

Figure 1: Inventory Document sample

[XML listing; the markup is not recoverable. Surviving element values: Professional PHP Programming, paperback, 909, $25, Tutorial, Wrox Press Ltd.]


To test with this XML document, data files were generated with a PHP script by repeating the node, from 1 up to 100,000 nodes. The file sizes range from 583 bytes for the smallest file to 35.135 MB for the largest.

3.2. Object Relational - XML Document Mapping

Using the object-relational mapping, the preceding example XML document can be mapped to the objects shown in Figure 2.

Figure 2: Object based Mapping on Document Inventory

These objects can be mapped to a MySQL database with two tables. The database schema (table relation) is shown in Figure 3.

Figure 3: Table Relation

The idbook attribute in the tabbook1 table is the primary key, and the idbook attribute in the tabauthor1 table is the foreign key.
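A two-table schema of this shape can be sketched as follows. Since Figure 3 is not reproduced here, the column lists are assumptions derived from the object attributes of Figure 2, the column types are guesses, and SQLite is used in place of MySQL so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE tabbook1 (
        idbook    TEXT PRIMARY KEY,
        title     TEXT,
        year      INTEGER,
        binding   TEXT,
        pages     INTEGER,
        price     TEXT,
        genre     TEXT,
        publisher TEXT
    );
    CREATE TABLE tabauthor1 (
        idbook TEXT NOT NULL REFERENCES tabbook1(idbook),
        first  TEXT,
        middle TEXT,
        last   TEXT
    );
""")

# One Book object becomes one row; each Author object becomes a child row.
conn.execute(
    "INSERT INTO tabbook1 VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("001", "Professional PHP Programming", 2000, "paperback",
     909, "$25", "Tutorial", "Wrox Press Ltd."))
conn.executemany(
    "INSERT INTO tabauthor1 VALUES (?, ?, ?, ?)",
    [("001", "Jesus", "", "Castagnetto"),
     ("001", "Harish", "", "Rawat"),
     ("001", "Sascha", "", "Schumann")])

# An author row that points at a missing book violates the foreign key.
try:
    conn.execute("INSERT INTO tabauthor1 VALUES ('999', 'No', '', 'Book')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

author_count = conn.execute("SELECT COUNT(*) FROM tabauthor1").fetchone()[0]
print(orphan_rejected, author_count)
```

The foreign key is what lets the publishing strategies later relate each author row back to its parent book row.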

3.3. Transferring Data from XML to Database

In this research, data was transferred from XML to the database using PHP as middleware in two ways:

1. Parsing the file using the PHP SAX parser

2. Using PEAR's XML_Tree class

Object Book {
  Idbook = 001; Title = Professional PHP Programming;
  Year = 2000; Binding = paperback;
  Pages = 909; Price = $25;
  Genre = Tutorial; Publisher = Wrox Press Ltd.
}

Object Author { First = Jesus; Middle = ; Last = Castagnetto }

Object Author { First = Harish; Middle = ; Last = Rawat }

Object Author { First = Sascha; Middle = ; Last = Schumann }

Etc.
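The two loading routes listed in Section 3.3 can be contrasted with a short sketch. It is not the PHP code used in the research: Python's xml.sax and xml.dom.minidom stand in for the PHP SAX parser and the PEAR tree class, SQLite stands in for MySQL, and the sample document, its element names, and the helper names (BookHandler, load_with_dom, text_of, make_db) are assumptions made for the example.

```python
import sqlite3
import xml.sax
from xml.dom import minidom

# Hypothetical document shape; the paper's actual element names are unknown.
SAMPLE = """<inventory>
  <book id="001">
    <title>Professional PHP Programming</title>
    <author><first>Jesus</first><last>Castagnetto</last></author>
    <author><first>Harish</first><last>Rawat</last></author>
    <author><first>Sascha</first><last>Schumann</last></author>
  </book>
</inventory>"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE tabauthor1 (idbook TEXT, first TEXT, last TEXT);
    """)
    return conn


class BookHandler(xml.sax.ContentHandler):
    """SAX route: rows are inserted as soon as their element closes."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        self.idbook = None
        self.author = {}
        self.text = []

    def startElement(self, name, attrs):
        self.text = []  # start collecting character data for this element
        if name == "book":
            self.idbook = attrs.get("id")
        elif name == "author":
            self.author = {}

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        value = "".join(self.text).strip()
        if name == "title":
            self.conn.execute("INSERT INTO tabbook1 VALUES (?, ?)",
                              (self.idbook, value))
        elif name in ("first", "last"):
            self.author[name] = value
        elif name == "author":
            self.conn.execute(
                "INSERT INTO tabauthor1 VALUES (?, ?, ?)",
                (self.idbook, self.author.get("first"), self.author.get("last")))


def text_of(node, tag):
    """Return the text content of the first descendant element named tag."""
    child = node.getElementsByTagName(tag)[0]
    return "".join(n.data for n in child.childNodes
                   if n.nodeType == n.TEXT_NODE).strip()


def load_with_dom(xml_text, conn):
    """DOM route: the whole tree is built in memory before any insert."""
    doc = minidom.parseString(xml_text)
    for book in doc.getElementsByTagName("book"):
        idbook = book.getAttribute("id")
        conn.execute("INSERT INTO tabbook1 VALUES (?, ?)",
                     (idbook, text_of(book, "title")))
        for author in book.getElementsByTagName("author"):
            conn.execute("INSERT INTO tabauthor1 VALUES (?, ?, ?)",
                         (idbook, text_of(author, "first"), text_of(author, "last")))


sax_db, dom_db = make_db(), make_db()
xml.sax.parseString(SAMPLE, BookHandler(sax_db))
load_with_dom(SAMPLE, dom_db)
print(sax_db.execute("SELECT * FROM tabauthor1").fetchall())
```

Both routes end with the same table contents; the difference is that the SAX handler never holds more than the current element, while the DOM route keeps the full tree available for navigation.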


3.4. Transferring Data from Database into XML

XML documents were produced from the relational MySQL database using the alternative tagging and structuring strategies, with PHP scripts as middleware. All of the alternatives were implemented outside the engine, meaning that part of the tagging and structuring work was done outside the relational engine.

The alternatives for data transfer from the database into XML used in this research are:

1. Early tagging, early structuring: Stored Procedure/Individual table

2. Late tagging, late structuring: Universal table I/Redundant Relation

3. Late tagging, early structuring: Universal table II/Sorted Outer Union

3.5. Presenting and Searching XML Data

To compare the performance of an XML document and an RDBMS in terms of loading speed in the browser, the following tasks were carried out:

1. Searching data in the XML document using the DSO binding technique, driven by a script.

2. Presenting XML data from the RDBMS: the data of the XML document stored in the MySQL database is searched using the redundant method. The query result is saved as an XML document using the DOM Tree method, and the result file is then read and bound using an XSL file.

4. Results and Discussion

4.1. Comparison between SAX and DOM

SAX and DOM were compared in this research by inserting data from XML documents with varying numbers of nodes into the database tables. The result is shown in Figure 4.

[Chart: time (s), 0 to 2000, against number of nodes (1, 50, 100, 500, 1000, 5000) for DOM and SAX]

Figure 4: Comparison between SAX and DOM

As the graph shows, parsing an XML document is faster with the SAX (Simple API for XML) method than with the DOM (Document Object Model) method. There are two important reasons. First, SAX code uses less memory because it buffers only one row at a time, while DOM code buffers the whole document. Second, SAX code is faster because it does not spend time building a DOM tree.


The most important consequence of this difference in memory use is that the SAX method can be used for large documents, whereas DOM consumes a lot of memory. However, DOM is suitable for applications where a DOM tree is needed, for example to present an XML document with XSLT. Another reason to use DOM is that the hierarchy obtained by DOM parsing is more complete, covering tag names, tag attributes, data content, and nested tags.

To parse a larger XML document with a large number of nodes, this research suggests dividing the document into smaller parts so that parsing is faster with either the SAX or the DOM method.

4.2. The Comparison Result for transfer from Database into XML

To compare the strategies for data transfer from the database into an XML document, each method was tested with a varying number of records to observe the transfer speed. The comparison result is shown in Figure 5.

[Chart: time (s), 0 to 1500, against number of records (1, 10, 50, 100, 1000, 5000) for Individual, Universal I, and Universal II]

Figure 5: Data Transfer Strategies from Database into XML Document

The graph shows that the Individual table method is the least suitable method. It shows the worst performance, producing the XML document slowly compared to the Universal I and Universal II methods. The main cause is that this method consumes most of the database resources: one or more SQL queries must be issued for every tuple of each table that has a nested structure. Therefore, forming a larger document requires thousands of queries, which is inefficient and can even lead to deadlock.

The Universal table I/Redundant method shows very good performance compared to Universal table II. This is because its query is processed efficiently, even though the result contains redundancy, whereas the Universal II method must produce the result table in structured form. The tagging process is also faster because the result set has fewer rows than that of Universal table II. Memory use is also better for Universal table I than for Universal table II; this is evident at 50,000 records, where the Universal II method uses all of, or even more than, the default memory provided by MySQL.

4.3. Comparison of Searching Data in an XML Document versus RDBMS

The result of comparing data searching in an XML document file against an RDBMS is shown in Figure 6.


[Chart: time (s), 0 to 40, against number of nodes/records (1, 10, 50, 100, 500, 1000, 5000, 10000, 50000) for XML and RDBMS]

Figure 6: Comparison of query using RDBMS and using XML Document

The graph of searching XML data stored as an XML document and in an RDBMS shows that the RDBMS performs better for storing and querying the data than keeping the XML data in an XML document. When finding a particular record (the final record of the total number of records), the RDBMS is more stable, from the search itself through to the display of the web page. With an XML document, by contrast, the larger the number of nodes, the worse the performance of loading the data and reaching the final node. This is because, with DSO, before the data can be searched the browser must load (cache) the data from the XML document and then visit each node to find the required data. So the larger the amount of data in an XML document, the longer it takes to cache and search the data.
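The difference between scanning every node and an indexed lookup can be illustrated with a small sketch. It does not reproduce the browser-based DSO experiment: Python's ElementTree stands in for the browser's XML handling, SQLite stands in for MySQL, and the generated data, element names, and helper names (find_in_xml, find_in_db) are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Generate the same hypothetical records in both storage forms.
N = 1000
ids = [f"{i:06d}" for i in range(1, N + 1)]

xml_text = "<inventory>" + "".join(
    f'<book id="{i}"><title>Title {i}</title></book>' for i in ids
) + "</inventory>"

conn = sqlite3.connect(":memory:")
# The TEXT primary key gives SQLite an index on idbook.
conn.execute("CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO tabbook1 VALUES (?, ?)",
                 [(i, f"Title {i}") for i in ids])

target = ids[-1]  # the final record, as in the experiment described above


def find_in_xml(xml_text, target):
    """Load the whole document, then visit nodes until the target is found."""
    root = ET.fromstring(xml_text)
    visited = 0
    for book in root.iter("book"):
        visited += 1
        if book.get("id") == target:
            return book.findtext("title"), visited
    return None, visited


def find_in_db(conn, target):
    """Let the engine locate the row through the primary-key index."""
    row = conn.execute(
        "SELECT title FROM tabbook1 WHERE idbook = ?", (target,)).fetchone()
    return row[0] if row else None


xml_title, nodes_visited = find_in_xml(xml_text, target)
db_title = find_in_db(conn, target)
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM tabbook1 WHERE idbook = ?",
    (target,)).fetchall()
print(nodes_visited, xml_title, db_title)
print(plan)
```

In the XML case the work grows with the position of the record in the file, while the query plan printed for the database case shows a lookup through the index.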

In addition, data in an XML document needs more storage space than the same data stored in an RDBMS, as shown in Table 1. The use of an index in the RDBMS makes searching faster. Moreover, every item of data saved in an XML document must be saved together with its tags, which increases the space required.

Table 1: Comparison of Space, XML Document vs. RDBMS Table

Number of        XML         RDBMS       RDBMS      RDBMS
nodes/records    document    data        index      total
1                583 b       224 b       3 Kb       3,2 Kb
10               4 Kb        1,3 Kb      3 Kb       4,3 Kb
50               16 Kb       6,8 Kb      3 Kb       9,8 Kb
100              32 Kb       13,6 Kb     3 Kb       19,6 Kb
500              165 Kb      73,0 Kb     7 Kb       80,0 Kb
1.000            330 Kb      147,2 Kb    11,0 Kb    158,2 Kb
5.000            1,7 M       717,7 Kb    43,0 Kb    760,7 Kb
10.000           3,4 M       1,4 M       82,0 Kb    1,5 M
50.000           17,5 M      7,9 M       403 Kb     8,3 M
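As a quick check of the space overhead, the ratio of XML document size to RDBMS total size can be computed for the rows of Table 1 where both values are reported in the same unit. The sketch below only re-expresses numbers from the table (with decimal commas written as decimal points); the variable names are chosen for the example.

```python
# (number of records, XML document size, RDBMS total size); each pair of
# sizes uses the same unit as in Table 1 (Kb for 500 and 1,000; M for 50,000).
rows = [
    (500, 165.0, 80.0),
    (1000, 330.0, 158.2),
    (50000, 17.5, 8.3),
]

# Ratio of XML size to RDBMS size for each row.
ratios = {records: xml_size / db_size for records, xml_size, db_size in rows}

for records, ratio in ratios.items():
    print(f"{records} records: XML is {ratio:.2f} times the RDBMS size")
```

For these rows the XML document takes roughly twice the space of the RDBMS data plus index, which is consistent with the tag overhead described above.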

5. Conclusion and Future Work

The results indicate that, when storing XML documents in an RDBMS, the SAX parser gives better performance than the DOM tree technique. The Redundant/Universal I technique is the best alternative for querying XML documents in an RDBMS and for transferring data from the RDBMS into an XML document, since the Universal I table uses memory better than the other techniques. Querying XML data in an RDBMS, especially for large amounts of data, is faster than using an XML document flat file as storage. In general, storing XML data in an RDBMS is more efficient, since the RDBMS needs less space to store the data than an XML document does. This is because an XML document stores not only the content of the data but also the tags. Our future work will also compare the relative performance of relational databases and native XML databases, and integrate and compare other XML query languages.

6. References

[1] Asaduzzaman, A., Building XML Trees with PEAR's XML_Tree Class, Devshed article, 2003. http://www.devarticles.com/art/1/443

[2] Bourret, R., Data Transfer Strategies, 2001. http://www.rpbourret.com/xml/DataTransfer.htm

[3] Bourret, R., XML and Databases, 2003. http://www.informatik.tudarmstadt.de/DVS1/staff/bourret/xml/XMLAndDatabases.htm

[4] Florescu, D., Kossmann, D., Storing and Querying XML Data using an RDBMS, Bulletin of the Technical Committee on Data Engineering, 1999. http://www.research.microsoft.com/research/db/debull/99sept/we.ps

[5] Shanmugasundaram, J., Shekita, E., Carey, M., Lindsay, B., Pirahesh, H., Reinwald, B., Efficiently Publishing Relational Data as XML Document, 1999. http://www.acm.org/sigmod/vIdb/conf/2000/P065.pdf

[6] Tatarinov, I., Viglas, S.D., Beyer, K., Shanmugasundaram, J., Shekita, E., Zhang, C., Storing and Querying Ordered XML Using a Relational Database System, ACM SIGMOD, Madison, Wisconsin, USA, 2002.
