iiWAS2004484-491


  • 8/8/2019 iiWAS2004484-491

    1/8

    STORING AND QUERYING XML DATA

    USING RDBMS

    Yesi Novaria Kunang
    Computer Science Faculty
    Bina Darma University, Palembang Indonesia
    [email protected]

    Ahmad Ashari
    Faculty of Mathematics and Natural Sciences
    Gadjah Mada University, Yogyakarta Indonesia
    [email protected]

    Abstract

    XML (eXtensible Markup Language) is rapidly becoming a popular data format and an emerging
    standard for data exchange over the Internet. With a large amount of data represented as XML
    documents, it becomes necessary to store and query these documents. One way to do so is to use
    an RDBMS as the storage medium and SQL to query the XML data.

    There are two approaches to parsing an XML document into an RDBMS using middleware: SAX
    parsing and DOM parsing. This research studied both methods and compared their performance.
    It also studied the performance of several alternative ways of structuring and tagging data
    from one or more RDBMS tables as a hierarchical XML document. As a final result, we identify
    which of these alternatives gives the best performance for storing and querying XML data using
    an RDBMS.

    1. Introduction

    XML (eXtensible Markup Language) has become a popular data format for exchanging data over
    the Internet. The flexibility of the XML structure makes it suitable for data exchange and for
    modeling applications. However, when a large amount of data is represented as XML documents,
    the ability to store and query those documents becomes very important. One approach is to use a
    native XML database system. This approach has two weaknesses: first, a native XML database
    system is not adequate for storing data and cannot accommodate the complex queries supported by
    a relational database system; second, users cannot directly query XML documents together with
    other data that are stored in a relational database system.

    Techniques for storing and querying XML data using a relational database system are
    implemented to overcome the weaknesses presented above. The steps of this approach are as
    follows: first, design the relational tables that will hold the data of an XML document; second,
    split the XML data into the columns of those tables; and third, process an SQL query to obtain
    the required XML document format from the RDBMS data.


    2. Literature Review

    Bourret [3] argues that XML and its related technologies can be regarded as a simple database,
    because XML documents can be used in environments with small amounts of data, few users, and
    modest performance requirements. XML also provides many of the things found in databases:
    storage (XML documents), schemas (DTDs, XML schema languages), query languages, programming
    interfaces (SAX, DOM, JDOM), and so on. However, XML lacks many of the things found in real
    databases: efficient storage, indexes, security, transactions and data integrity, multi-user
    access, queries across multiple documents, and so on. For these reasons, XML is not suitable
    for environments that have many users, strict data integrity requirements, and the need for
    good performance.

    Two mappings are commonly used to map an XML document schema to a database schema: the
    table-based mapping and the object-relational mapping [4]. Both mappings model the data in an
    XML document rather than the document itself, so they are suitable for data-centric documents
    and not suitable for document-centric documents [6].

    Strategies for data transfer from XML to a database depend on the software used and on
    middleware support. There are two ways to parse an XML document in a middleware application
    (Java, Perl, PHP, Python, C/C++, Eiffel, Tcl, etc.): using SAX (Simple API for XML) or using
    DOM (Document Object Model) as the XML parser [1].
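
    The difference between the two parsing styles can be illustrated with a minimal sketch. It
    uses Python's standard xml.sax and xml.dom.minidom modules rather than the PHP middleware of
    this paper, and the sample document and its element names (inventory, book, title) are
    assumptions made only for illustration. The SAX handler reacts to events while the input is
    streamed; the DOM version builds the whole tree first and then looks elements up in it.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document; element and attribute names are assumptions.
SAMPLE = (
    b"<inventory>"
    b"<book idbook='001'><title>Professional PHP Programming</title></book>"
    b"<book idbook='002'><title>Another Book</title></book>"
    b"</inventory>"
)


class TitleHandler(xml.sax.ContentHandler):
    """SAX style: collect <title> text from parser events, one element at a time."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is accumulated.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


def titles_with_sax(data):
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles


def titles_with_dom(data):
    # DOM style: the entire document is held in memory as a tree before any lookup.
    doc = minidom.parseString(data)
    return [node.firstChild.data for node in doc.getElementsByTagName("title")]
```

    Both functions return the same titles; what differs is that the DOM version keeps every node
    of the document in memory, while the SAX handler keeps only the text it is currently reading.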

    Shanmugasundaram et al. [5] discuss several alternatives for publishing relational data as an
    XML document. The alternatives follow from the basic difference between a relational table and
    an XML document: an XML document is tagged and structured, whereas a relational table is
    neither. Therefore, to convert relational tables into an XML document, tagging and structuring
    must be added during processing. One approach is to do tagging as the final step of query
    processing (late tagging), while another approach is to do it earlier in the process (early
    tagging). Similarly, structuring can be done as the final step of query processing (late
    structuring) or it can be done earlier (early structuring).

    The alternatives also differ in how much work is done inside the relational engine. Inside
    engine means that tagging and structuring are done completely inside the relational engine,
    whereas outside engine means that part, though not necessarily all, of that work is done
    outside the relational engine. Which is possible depends on the capabilities of the relational
    database engine in use. Early tagging with late structuring is not a viable alternative,
    because adding tags to an XML document without having its structure makes no sense.

    2.1. Individual table

    Tagging and structuring are both done early in query processing. One of the simplest
    techniques for structuring relational data as an XML document is the early tagging, early
    structuring method, which is called the Stored Procedure approach [5] or the Individual table
    technique [2].

    This strategy traverses the table hierarchy in document order. A result set is opened over the
    table that corresponds to the root element. For each row of that result set, the row element
    is constructed, and result sets are then opened over the child tables in order.
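
    The traversal can be sketched as follows. This is an illustration only: it uses Python with
    an in-memory SQLite database in place of PHP and MySQL, and the reduced table layout (a title
    column and a last column) and the output element names are assumptions. The point to notice
    is that one child query is issued for every parent row.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr


def make_db():
    """Build a tiny sample database (assumed, simplified columns)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE tabauthor1 (idbook TEXT, last TEXT);
        INSERT INTO tabbook1 VALUES ('001', 'Professional PHP Programming');
        INSERT INTO tabbook1 VALUES ('002', 'Second Title');
        INSERT INTO tabauthor1 VALUES ('001', 'Castagnetto');
        INSERT INTO tabauthor1 VALUES ('001', 'Rawat');
        INSERT INTO tabauthor1 VALUES ('002', 'Schumann');
    """)
    return con


def individual_tables(con):
    """Return (number of queries issued, XML text)."""
    queries = 1  # the result set over the parent table
    parts = ["<inventory>"]
    parents = con.execute(
        "SELECT idbook, title FROM tabbook1 ORDER BY idbook").fetchall()
    for idbook, title in parents:
        # Tag and structure the parent row as soon as it is read (early/early).
        parts.append(f"<book idbook={quoteattr(idbook)}><title>{escape(title)}</title>")
        queries += 1  # one extra query per parent row for its children
        children = con.execute(
            "SELECT last FROM tabauthor1 WHERE idbook = ? ORDER BY rowid", (idbook,))
        for (last,) in children:
            parts.append(f"<author><last>{escape(last)}</last></author>")
        parts.append("</book>")
    parts.append("</inventory>")
    return queries, "".join(parts)
```

    With two books the sketch issues three queries; the count grows linearly with the number of
    parent rows, and faster still when the hierarchy is nested more deeply.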


    2.2. Universal I table

    In this approach, tagging and structuring are done as the final step in assembling the XML
    document. The construction of the XML document is divided into two phases: (a) content
    creation, where the relational data are produced, and (b) tagging and structuring, where the
    relational data are arranged and tagged to produce the XML document.

    To produce the desired content, a single result set (the universal table) is built that
    contains all of the data in the document. All tables are joined, using join predicates that
    relate each parent to its children. This is known as the Redundant Relation approach [5], also
    called Universal I [2].

    Tagging and structuring the result takes two steps: (a) grouping all siblings (XML elements)
    that belong together, eliminating the duplicates caused by the redundancy, and (b) extracting
    the information from each tuple and tagging it to produce the XML result.
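
    A minimal sketch of this technique, under the same caveats as any illustration: Python and
    SQLite stand in for PHP and MySQL, and the reduced columns and element names are assumptions.
    A single join produces the universal table, in which the parent columns are repeated on every
    child row; the middleware then groups the rows and removes that redundancy before tagging.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr


def make_db():
    """Build a tiny sample database (assumed, simplified columns)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE tabauthor1 (idbook TEXT, last TEXT);
        INSERT INTO tabbook1 VALUES ('001', 'Professional PHP Programming');
        INSERT INTO tabauthor1 VALUES ('001', 'Castagnetto');
        INSERT INTO tabauthor1 VALUES ('001', 'Rawat');
    """)
    return con


def universal_one(con):
    """Return (rows in the universal table, XML text)."""
    # Phase (a): content creation with one query; no ordering is requested.
    rows = con.execute(
        "SELECT b.idbook, b.title, a.last "
        "FROM tabbook1 AS b LEFT JOIN tabauthor1 AS a ON a.idbook = b.idbook"
    ).fetchall()

    # Phase (b), step 1: group siblings in the middleware; repeated parent
    # values collapse into a single entry per idbook.
    books = {}
    for idbook, title, last in rows:
        _, authors = books.setdefault(idbook, (title, []))
        if last is not None:
            authors.append(last)

    # Phase (b), step 2: tag the grouped data.
    parts = ["<inventory>"]
    for idbook, (title, authors) in books.items():
        parts.append(f"<book idbook={quoteattr(idbook)}><title>{escape(title)}</title>")
        parts.extend(f"<author><last>{escape(name)}</last></author>" for name in authors)
        parts.append("</book>")
    parts.append("</inventory>")
    return len(rows), "".join(parts)
```

    The grouping dictionary holds the whole result before any tag is written, which is the memory
    cost that late structuring pays for issuing only one query.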

    2.3. Universal II table

    The main problem with the late tagging, late structuring technique is memory management when
    the tags are formed. To overcome this problem, the relational engine can be used to structure
    the relational content. This strategy can be carried out using a Universal table type II [2],
    or Sorted Outer Union [5], which uses the UNION statement.
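
    The idea can be sketched as follows, again with Python and SQLite standing in for PHP and
    MySQL and with assumed, reduced columns. Each branch of the UNION contributes one kind of row,
    padded with NULLs, and the ORDER BY makes the engine deliver the rows in document order, so
    the tagger can stream them without grouping anything in memory. The kind column is a
    hypothetical discriminator added for this illustration.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

# Columns: 1 = idbook, 2 = kind (1 for book rows, 2 for author rows),
# 3 = title, 4 = last. Sorting on (idbook, kind, last) yields document order.
QUERY = """
    SELECT idbook, 1 AS kind, title, NULL AS last FROM tabbook1
    UNION ALL
    SELECT idbook, 2, NULL, last FROM tabauthor1
    ORDER BY 1, 2, 4
"""


def make_db():
    """Build a tiny sample database (assumed, simplified columns)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE tabauthor1 (idbook TEXT, last TEXT);
        INSERT INTO tabbook1 VALUES ('001', 'Professional PHP Programming');
        INSERT INTO tabbook1 VALUES ('002', 'Second Title');
        INSERT INTO tabauthor1 VALUES ('001', 'Rawat');
        INSERT INTO tabauthor1 VALUES ('001', 'Castagnetto');
    """)
    return con


def sorted_outer_union(con):
    parts = ["<inventory>"]
    book_open = False
    for idbook, kind, title, last in con.execute(QUERY):
        if kind == 1:
            # A book row starts a new element; close the previous one first.
            if book_open:
                parts.append("</book>")
            parts.append(f"<book idbook={quoteattr(idbook)}><title>{escape(title)}</title>")
            book_open = True
        else:
            # Author rows always follow their book row because of the sort.
            parts.append(f"<author><last>{escape(last)}</last></author>")
    if book_open:
        parts.append("</book>")
    parts.append("</inventory>")
    return "".join(parts)
```

    The result set has one row per book plus one row per author, so it is longer than the joined
    universal table, and the engine must sort it; in exchange the tagger needs only constant
    state.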

    3. Conducting the Research

    This research examines strategies for transferring data from XML documents to a relational
    database, and vice versa. The steps are as follows:

    3.1. Designing an XML Document

    To show how the elements, attributes, and sub-elements of an XML document are treated when
    they are transferred to a database, or vice versa, an example XML document that contains
    elements, attributes, and sub-elements is used. The example used in this research is shown in
    Figure 1.

    Figure 1: Inventory Document sample

    (Element values in the sample: Professional PHP Programming; paperback; 909; $25; Tutorial;
    Wrox Press Ltd.)


    To test with this XML document, the number of root nodes was varied from 1 to 100,000, using
    a PHP script to generate the data files. The file sizes range from the smallest, 583 bytes, up
    to 35,135 M.
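
    The PHP generator script is not reproduced in the paper. As a rough stand-in, the sketch below
    builds a test document with a chosen number of records in Python. The element names, the
    inventory wrapper element, and the six-digit idbook format are assumptions; only the field
    values come from the sample in Figure 1.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

# Field values taken from the sample record; the tag names are assumed.
RECORD = {
    "title": "Professional PHP Programming",
    "binding": "paperback",
    "pages": "909",
    "price": "$25",
    "genre": "Tutorial",
    "publisher": "Wrox Press Ltd.",
}


def build_inventory(n):
    """Return an XML string holding n copies of the sample record."""
    lines = ["<inventory>"]
    for i in range(1, n + 1):
        fields = "".join(f"<{tag}>{escape(text)}</{tag}>" for tag, text in RECORD.items())
        lines.append(f'  <book idbook="{i:06d}">{fields}</book>')
    lines.append("</inventory>")
    return "\n".join(lines)


def count_books(xml_text):
    """Parse the generated text and count its book elements (sanity check)."""
    return len(minidom.parseString(xml_text).getElementsByTagName("book"))
```

    Calling build_inventory with 1, 50, 100 and so on gives a series of files of increasing size
    comparable in spirit to the one used in the experiments, although the byte counts will not
    match those reported here.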

    3.2. Object Relational - XML Document Mapping

    Using the object-relational mapping, the XML document in the previous example can be mapped
    to the objects shown in Figure 2.

    Figure 2: Object based Mapping on Document Inventory

    These objects can be mapped to a MySQL database with two tables; the database schema (table
    relationships) is shown in Figure 3.

    Figure 3: Table Relation

    The idbook attribute in the tabbook1 table is the primary key, and the idbook attribute in the
    tabauthor1 table is the foreign key.
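
    A possible shape for the two tables is sketched below. The table names and the idbook key come
    from the text, and the remaining column names follow the object fields of Figure 2; the column
    types are assumptions, and SQLite (through Python's sqlite3 module) is used here in place of
    MySQL so that the example runs without a server.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tabbook1 (
    idbook    TEXT PRIMARY KEY,
    title     TEXT,
    year      INTEGER,
    binding   TEXT,
    pages     INTEGER,
    price     TEXT,
    genre     TEXT,
    publisher TEXT
);
CREATE TABLE tabauthor1 (
    idbook TEXT NOT NULL REFERENCES tabbook1(idbook),  -- foreign key to the book
    first  TEXT,
    middle TEXT,
    last   TEXT
);
"""


def create_schema():
    con = sqlite3.connect(":memory:")
    # SQLite checks foreign keys only when this pragma is enabled.
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(SCHEMA)
    return con


def try_insert_author(con, idbook, first, last):
    """Insert an author row; return False if the referenced book does not exist."""
    try:
        con.execute(
            "INSERT INTO tabauthor1 (idbook, first, middle, last) VALUES (?, ?, '', ?)",
            (idbook, first, last))
        return True
    except sqlite3.IntegrityError:
        return False
```

    Each book row can have any number of author rows, which is how the repeated author
    sub-elements of the document are represented relationally.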

    3.3. Transferring Data from XML to Database

    In this research, data are transferred from XML to the database using PHP as middleware, in
    one of two ways:

    1. Parsing the file using the PHP SAX parser

    2. Using PEAR's XML_Tree class
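
    The first of these two ways, event-driven loading, can be sketched as follows. This is not
    the PHP code used in the experiments: it uses Python's xml.sax and sqlite3 modules, and the
    element names (book, title, author, first, last) and the reduced table columns are
    assumptions. Rows are written to the database as soon as the closing tag of the relevant
    element is seen, so the document never has to be held in memory as a whole.

```python
import sqlite3
import xml.sax

# Hypothetical sample input; element and attribute names are assumptions.
SAMPLE = b"""<inventory>
  <book idbook="001">
    <title>Professional PHP Programming</title>
    <author><first>Jesus</first><last>Castagnetto</last></author>
    <author><first>Harish</first><last>Rawat</last></author>
  </book>
</inventory>"""


class BookLoader(xml.sax.ContentHandler):
    """Insert rows while the parser streams through the document."""

    def __init__(self, con):
        super().__init__()
        self.con = con
        self.idbook = None
        self.text = []
        self.author = {}

    def startElement(self, name, attrs):
        self.text = []
        if name == "book":
            self.idbook = attrs.get("idbook")
        elif name == "author":
            self.author = {}

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        value = "".join(self.text).strip()
        if name == "title":
            # Assumes <title> appears before the authors of the same book.
            self.con.execute(
                "INSERT INTO tabbook1 (idbook, title) VALUES (?, ?)",
                (self.idbook, value))
        elif name in ("first", "last"):
            self.author[name] = value
        elif name == "author":
            self.con.execute(
                "INSERT INTO tabauthor1 (idbook, first, last) VALUES (?, ?, ?)",
                (self.idbook, self.author.get("first"), self.author.get("last")))
        self.text = []


def load(xml_bytes):
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tabbook1 (idbook TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE tabauthor1 (idbook TEXT, first TEXT, last TEXT);
    """)
    xml.sax.parseString(xml_bytes, BookLoader(con))
    con.commit()
    return con
```

    A tree-based loader would instead parse the whole document first and then walk the resulting
    nodes, issuing the same INSERT statements.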

    Object Book {
      - Idbook = 001;      - Title = Professional PHP Programming;
      - Year = 2000;       - Binding = paperback;
      - Pages = 909;       - Price = $25;
      - Genre = Tutorial;  - Publisher = Wrox Press Ltd.
    }
    Object Author {
      - First = Jesus;   - Middle = ;  - Last = Castagnetto
    }
    Object Author {
      - First = Harish;  - Middle = ;  - Last = Rawat
    }
    Object Author {
      - First = Sascha;  - Middle = ;  - Last = Schumann
    }
    Etc.


    3.4. Transferring Data from Database into XML

    XML documents are produced from the relational MySQL database using the alternative tagging
    and structuring approaches, with a PHP script as middleware. For all of the alternatives,
    tagging and structuring are done outside the engine, meaning that part of the process is
    carried out outside the relational engine.

    The alternatives for data transfer from the database into XML that are used in this research
    are:

    1. Early tagging, early structuring: Stored Procedure / Individual table

    2. Late tagging, late structuring: Universal table I / Redundant Relation

    3. Late tagging, early structuring: Universal table II / Sorted Outer Union

    3.5. Presenting and Searching XML Data

    To compare the performance of an XML document and an RDBMS in terms of loading speed in the
    browser, the following tasks were carried out:

    1. Searching data in an XML document using the DSO binding technique, driven by a script.

    2. Presenting XML data from the RDBMS by searching the XML document data that have been stored
       in the MySQL database, using the redundant method. The query result is saved as an XML
       document using the DOM tree method, and the result file is then read and bound using an
       XSL file.

    4. Results and Discussion

    4.1. Comparison between SAX and DOM

    SAX and DOM were compared by inserting the data from XML documents with varying numbers of
    nodes into the database tables. The result is shown in Figure 4.

    Figure 4: Comparison between SAX and DOM (x-axis: number of nodes, 1 to 5000; y-axis: time
    in seconds; series: DOM and SAX)

    As the graph shows, parsing an XML document is faster with the SAX method (Simple API for
    XML) than with the DOM method (Document Object Model). There are two important reasons: first,
    SAX code uses less memory because it buffers only one row at a time, while DOM code buffers
    the whole document; second, SAX code is faster because it saves the time needed to build the
    DOM tree.


    The most important consequence of the difference in memory use is that SAX can be used for
    large documents, whereas DOM consumes a lot of memory. However, DOM is suitable for
    applications where a DOM tree is needed, for example when an XML document is to be presented
    using XSLT. Another reason to use DOM is that the hierarchy it produces when parsing an XML
    document is more complete, covering tag names, tag attributes, data content, and other nested
    tags.

    For larger XML documents with a large number of nodes, this research suggests dividing the
    documents so that parsing is faster with either the SAX or the DOM method.

    4.2. Comparison Results for Data Transfer from Database into XML

    To compare the strategies for data transfer from the database to an XML document, each method
    was tested with varying numbers of records to observe the speed of the transfer. The
    comparison result is shown in Figure 5.

    Figure 5: Data Transfer Strategies from Database into XML Document (x-axis: number of
    records, 1 to 5000; y-axis: time in seconds; series: Individual, Universal I, Universal II)

    The graph shows clearly that the individual table method is the least suitable. It has the
    worst performance, converting the data into an XML document slowly compared to the Universal I
    and Universal II methods. The main cause is that this method consumes most of the database
    resources: one or more SQL queries must be issued for every tuple of each table that has a
    nested structure. Therefore, to form a larger document, thousands of queries must be
    processed, which causes inefficiency or even deadlock.

    The Universal table I (redundant) method shows very good performance compared to Universal
    type II. This is because its query is processed efficiently, despite the redundancy, whereas
    the Universal II method must produce its result table in structured form. Tagging is also
    faster because the result set has fewer rows than that of Universal table II. The memory use
    of Universal table I is also better than that of Universal table II, as seen at 50.000
    records, where the Universal II method uses all of, or even more than, the default memory
    provided by MySQL.

    4.3. Comparison of Data Searching in an XML Document and in an RDBMS

    The result of comparing data searches on an XML document file with searches on the RDBMS is
    shown in Figure 6.


    Figure 6: Comparison of query using RDBMS and using XML Document (x-axis: number of
    nodes/records, 1 to 50000; y-axis: time in seconds; series: XML and RDBMS)

    From the graph of searching XML data stored as an XML document and stored in an RDBMS, it is
    concluded that the RDBMS performs better than an XML document for both storing and querying
    data. When searching for a particular record (the last record of the total), the RDBMS is more
    stable, measured from the start of the search until the result is displayed as a web page. For
    the XML document, by contrast, the larger the number of nodes, the longer it takes to form the
    data and reach the final node. This is because, with DSO, before the data can be searched the
    browser must first build (cache) the data from the XML document and then visit each node to
    find the requested data. So the larger the amount of data in an XML document, the longer it
    takes to cache and search the data.

    In addition, data in an XML document need more storage space than the same data in an RDBMS,
    as shown in Table 1. The use of an index in the RDBMS makes searching faster. Moreover, every
    piece of data saved into an XML document must be saved together with its tags, which increases
    the size.

    Table 1: Comparison of Space, XML Document vs. RDBMS Table

    Number of        XML          RDBMS
    nodes/records    document     Data        Index      Total
    1                583 b        224 b       3 Kb       3,2 Kb
    10               4 Kb         1,3 Kb      3 Kb       4,3 Kb
    50               16 Kb        6,8 Kb      3 Kb       9,8 Kb
    100              32 Kb        13,6 Kb     3 Kb       19,6 Kb
    500              165 Kb       73,0 Kb     7 Kb       80,0 Kb
    1.000            330 Kb       147,2 Kb    11,0 Kb    158,2 Kb
    5.000            1,7 M        717,7 Kb    43,0 Kb    760,7 Kb
    10.000           3,4 M        1,4 M       82,0 Kb    1,5 M
    50.000           17,5 M       7,9 M       403 Kb     8,3 M

    5. Conclusion and Future Work

    The results clearly indicate that, for storing XML documents in an RDBMS, the SAX parser
    gives better performance than the DOM tree technique. The Redundant/Universal I technique is
    the best alternative for querying XML documents in an RDBMS and for transferring data from the
    RDBMS into XML documents, since the memory use of the Universal I table is better than that of
    the other techniques. Using an RDBMS to query XML data, especially large amounts of data, is
    faster than using flat XML document files as storage. In general, storing XML data in an RDBMS
    is more efficient, since the RDBMS needs less space to store the data than an XML document
    does. This is because an XML document stores not only the content of the data but also the
    tags. Our future work will also compare the relative performance of relational databases and
    native XML databases, and will integrate and compare some other XML query languages.

    6. References

    [1] Asaduzzaman, A., Building XML Trees with PEAR's XML_Tree Class, Devshed article, 2003.
        http://www.devarticles.com/art/1/443

    [2] Bourret, R., Data Transfer Strategies, 2001. http://www.rpbourret.com/xml/DataTransfer.htm

    [3] Bourret, R., XML and Databases, 2003.
        http://www.informatik.tudarmstadt.de/DVS1/staff/bourret/xml/XMLAndDatabases.htm

    [4] Florescu, D., Kossmann, D., Storing and Querying XML Data using an RDBMS, Bulletin of the
        Technical Committee on Data Engineering, 1999.
        http://www.research.microsoft.com/research/db/debull/99sept/we.ps

    [5] Shanmugasundaram, J., Shekita, E., Carey, M., Lindsay, B., Pirahesh, H., Reinwald, B.,
        Efficiently Publishing Relational Data as XML Document, 1999.
        http://www.acm.org/sigmod/vIdb/conf/2000/P065.pdf

    [6] Tatarinov, I., Viglas, S.D., Beyer, K., Shanmugasundaram, J., Shekita, E., Zhang, C.,
        Storing and Querying Ordered XML Using a Relational Database System, ACM SIGMOD, Madison,
        Wisconsin, USA, 2002.
