XML query. introduction An XML document can represent almost anything, and users of an XML query...

Preview:

Citation preview

XML query

introduction

• An XML document can represent almost anything, and users of an XML query language expect it to perform useful queries on whatever they have stored in XML.

• XQuery is based on the structure of XML and leverages this structure to provide query capabilities for the same range of data that XML stores.

• XQuery is defined in terms of the XQuery 1.0 and XPath 2.0 Data Model [XQ-DM], which represents the parsed structure of an XML document as an ordered, labeled tree in which nodes have identity and may be associated with simple or complex types.

• XQuery is a functional language

XML vs. Relational Data

{ row: { name: “John”, phone: 3634 }, row: { name: “Sue”, phone: 6343 }, row: { name: “Dick”, phone: 6363 }}

name phone

John 3634

Sue 6343

Dick 6363

row row row

name name name

phone phone phone

“John” 3634 “Sue” “Dick”6343 6363

Relation … in XML

Relational to XML Data

• A relation instance is basically a tree with:– Unbounded fanout at level 1 (i.e., any # of rows)– Fixed fanout at level 2 (i.e., fixed # fields)

• XML data is essentially an arbitrary tree– Unbounded fanout at all nodes/levels– Any number of levels– Variable # of children at different nodes, variable

path lengths

Query Language for XML

• Must be high-level; “SQL for XML”• Must conform to XSchema

– But also work in absence of schema info• Support simple and complex/nested datatypes• Support universal and existential quantifiers,

aggregation• Operations on sequences and hierarchies of doc

structures• Capability to transform and create XML structures

XQuery

• Influenced by XML-QL, Lorel, Quilt, YATL– Also, XPath and XML Schema

• Reads a sequence of XML fragments or atomic values and returns a sequence of XML fragments or atomic values– Inputs/outputs are objects defined by XML-Query

data model, rather than strings in XML syntax

Overview of XQuery• Path expressions• Element constructors• FLWOR (“flower”) expressions

– Several other kinds of expressions as well, including conditional expressions, list expressions, quantified expressions, etc.

• Expressions evaluated w.r.t. a context:– Context item (current node)– Context position (in sequence being processed)– Context size (of the sequence being processed)– Context also includes namespaces, variables, functions,

date, etc.

Path Expressions

Examples:• Bib/paper• Bib/book/publisher• Bib/paper/author/lastname

Given an XML document, the value of a path expression p is a set of objects

Path Expression Examples

Doc =

&o1

&o12 &o24 &o29

&o43

&o70 &o71

&96

&243 &206

&25

“Serge”“Abiteboul”

1997

“Victor”“Vianu”

122 133

paper bookpaper

references

references references

authortitle

yearhttp

author

authorauthor

title publisherauthor

authortitle

page

firstname lastnamefirstname

lastnamefirst last

Bib

&o44 &o45 &o46

&o47 &o48 &o49 &o50 &o51

&o52

Bib/paper = <&o12,&o29>Bib/book/publisher = <&o51>Bib/paper/author/lastname = <&o71,&206>

Bib/paper = <&o12,&o29>Bib/book/publisher = <&o51>Bib/paper/author/lastname = <&o71,&206>

Note that order of elements matters!

Element Construction

• An XQuery expression can construct new values or structures

• Example: Consider the path expressions from the previous slide.– Each of them returns a newly constructed

sequence of elements– Key point is that we don’t just return existing

structures or atomic values; we can re-arrange them as we wish into new structures

Data Model

• In the XQuery data model, every document is represented as a tree of nodes. The kinds of nodes that may occur are: document, element, attribute, text, name-space, processing instruction, and comment.

• An item is a single node or atomic value. A series of items is known as a sequence. In XQuery, every value is a sequence

Literals and comments

• (: Hello World :)

Doc() function

• Returns entire document doc("books.xml")

Locating nodes

• A path expression consists of a series of one or more steps, separated by a slash, /, or double slash, //.

• doc("books.xml")/bib/book

• doc("books.xml")//book

Predicates

• Predicates are Boolean conditions that select a subset of the nodes computed by a step expression.

• XQuery uses square brackets around predicates.

• For instance, the following query returns only authors for which last="Stevens" is true:doc("books.xml")/bib/book/author[last="Stevens"]

• If a predicate contains a single numeric value, it is treated like a subscript. For instance, the following expression returns the first author of each book:

(doc("books.xml")/bib/book/author)[1]

Creating Nodes

document { <book year="1977"> <title>Harold and the Purple Crayon</title> <author><last>Johnson</last><first>Crockett </first></author> <publisher>HarperCollins Juvenile Books</publisher> <price>14.95</price> </book> }

<titles count="{ count(doc('books.xml')//title) }"> { doc("books.xml")//title } </titles>

FLWOR Expressions

• FOR-LET-WHERE-ORDERBY-RETURN = FLWOR

FOR / LET Clauses

WHERE Clause

ORDERBY/RETURN Clause

List of tuples

List of tuples

Instance of XQuery data model

FOR vs. LET

• FOR $x IN list-expr – Binds $x in turn to each value in the list expr

• LET $x = list-expr – Binds $x to the entire list expr– Useful for common sub-expressions and for

aggregations

FOR vs. LET: Example

FOR $x IN document("bib.xml")/bib/book

RETURN <result> $x </result>

FOR $x IN document("bib.xml")/bib/book

RETURN <result> $x </result>

Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ...

LET $x IN document("bib.xml")/bib/book

RETURN <result> $x </result>

LET $x IN document("bib.xml")/bib/book

RETURN <result> $x </result>

Returns:<result> <book>...</book> <book>...</book> <book>...</book> ...</result>

Notice that result hasseveral elements

Notice that result hasexactly one element

XQuery Example 1

Find all book titles published after 1995:

FOR $x IN document("bib.xml")/bib/book

WHERE $x/year > 1995

RETURN $x/title

FOR $x IN document("bib.xml")/bib/book

WHERE $x/year > 1995

RETURN $x/title

Result: <title> abc </title> <title> def </title> <title> ghi </title>

XQuery Example 2For each author of a book by Morgan

Kaufmann, list all books she published:

FOR $a IN distinct(document("bib.xml") /bib/book[publisher=“Morgan Kaufmann”]/author)

RETURN <result>

$a,

FOR $t IN /bib/book[author=$a]/title

RETURN $t

</result>

FOR $a IN distinct(document("bib.xml") /bib/book[publisher=“Morgan Kaufmann”]/author)

RETURN <result>

$a,

FOR $t IN /bib/book[author=$a]/title

RETURN $t

</result>

distinct = a function that eliminates duplicates (after converting inputs to atomic values)

Results for Example 2

<result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>

Observe how nested structure of result elements is determined by the nested structure of the query.

XQuery Example 3

count = (aggregate) function that returns the number of elements

<big_publishers>

FOR $p IN distinct(document("bib.xml")//publisher)

LET $b := document("bib.xml")/book[publisher = $p]

WHERE count($b) > 100

RETURN $p

</big_publishers>

<big_publishers>

FOR $p IN distinct(document("bib.xml")//publisher)

LET $b := document("bib.xml")/book[publisher = $p]

WHERE count($b) > 100

RETURN $p

</big_publishers>

For each publisher p- Let the list of books published by p be b

Count the # books in b, and return p if b > 100

XQuery Example 4

Find books whose price is larger than average:

LET $a=avg(document("bib.xml")/bib/book/price)

FOR $b in document("bib.xml")/bib/book

WHERE $b/price > $a

RETURN $b

LET $a=avg(document("bib.xml")/bib/book/price)

FOR $b in document("bib.xml")/bib/book

WHERE $b/price > $a

RETURN $b

FLWOER Expressions

for $b in doc("books.xml")//book where $b/@year = "2000" return $b/title

Recommended