SDPL 2002 Notes 3.2: Document Object Model 1
Materials related to XML access
The Misconceived Web
The original vision of the WWW was as a hyperlinked document-retrieval system.
It did not anticipate presentation, session, or interactivity.
If the WWW were still consistent with TBL's original vision, Yahoo would still be two guys in a trailer.
How We Got Here
Rule Breaking
Corporate Warfare
Extreme Time Pressure
The Miracle
It works!
Java didn't.
Nor did a lot of other stuff.
The Scripted Browser
Introduced in Netscape Navigator 2 (1995)
Eclipsed by Java Applets
Later Became the Frontline of the Browser War
Dynamic HTML
Document Object Model (DOM)
Viewing XML
XML is designed to be processed by computer programs, not to be displayed to humans
Nevertheless, almost all current Web browsers can display XML documents
– They do not all display it the same way
– They may not display it at all if it has errors
This is just an added value. Remember: HTML is designed to be viewed, XML is designed to be used
Stream Model
Stream seen by parser is a sequence of elements
As each XML element is seen, an event occurs
– Some code registered with the parser (the event handler) is executed
This approach is popularized by the Simple API for XML (SAX)
Problem:
– Hard to get a global view of the document
– Parsing state represented by global variables set by the event handlers
Data Model
The XML data is transformed into a navigable data structure in memory
– Because of the nesting of XML elements, a tree data structure is used
– The tree is navigated to discover the XML document
This approach is popularized by the Document Object Model (DOM)
Problem:
– May require large amounts of memory
– May not be as fast as stream approach
• Some DOM parsers use SAX to build the tree
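The data model can be sketched with the JAXP DOM API. This is a minimal illustration, not part of the original notes: the class name DomTreeDemo, the helper names and the sample XML string are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomTreeDemo {

    /** Parses XML text into an in-memory DOM tree (hypothetical helper). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    /** Jumps straight to the last child of the root element. */
    static String lastChildName(String xml) {
        Element root = parse(xml).getDocumentElement();
        return root.getLastChild().getNodeName();
    }

    public static void main(String[] args) {
        String xml = "<address><streetaddress>Pyynpolku 1</streetaddress>"
                + "<postoffice>70460 KUOPIO</postoffice></address>";
        // The whole tree is in memory, so any node can be visited in any order.
        System.out.println(lastChildName(xml));
        System.out.println(parse(xml).getDocumentElement().getFirstChild().getTextContent());
    }
}
```

Because the complete tree is built before any navigation starts, memory use grows with document size, which is the cost named above.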
SAX and DOM
SAX and DOM are standards for XML parsers
– DOM is a W3C standard
– SAX is an ad-hoc (but very popular) standard
There are various implementations available
Java implementations are provided as part of JAXP (Java API for XML Processing)
JAXP package is included in JDK starting from JDK 1.4
– Is available separately for Java 1.3
Difference between SAX and DOM
DOM reads the entire document into memory and stores it as a tree data structure
SAX reads the document and calls handler methods for each element or block of text that it encounters
Consequences:
– DOM provides "random access" into the document
– SAX provides only sequential access to the document
– DOM is comparatively slow and memory-hungry, so it is a poor fit for very large documents
– SAX is fast and requires very little memory, so it can be used for huge documents
• This makes SAX much more popular for web sites
Parsing with SAX
SAX uses the source-listener-delegate model for parsing XML documents
– Source is XML data consisting of XML elements
– A listener written in Java is attached to the document and listens for events
– When an event is raised, its handling is delegated to a method of the listener
Callbacks
SAX works through callbacks:
– The program calls the parser
– The parser calls methods provided by the program
[Figure: the program's main(...) calls parse(...) on the SAX parser; the parser calls back the program's startDocument(...), startElement(...), characters(...), endElement( ) and endDocument( ).]
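The callback flow can be sketched with the JAXP SAX API as follows. The class name SaxCallbackDemo, the trace helper and the sample input are assumptions made for this illustration; only the handler method names come from the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCallbackDemo {

    /** Parses the XML text and returns the callbacks in the order the parser made them. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() {
                events.add("startDocument");
            }
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                events.add("startElement " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text run in several chunks.
                events.add("characters " + new String(ch, start, length));
            }
            @Override public void endElement(String uri, String local, String qName) {
                events.add("endElement " + qName);
            }
            @Override public void endDocument() {
                events.add("endDocument");
            }
        };
        try {
            // The program calls the parser ...
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        // ... and by now the parser has called the handler methods above.
        return events;
    }

    public static void main(String[] args) {
        trace("<name>Leila Laskuprintti</name>").forEach(System.out::println);
    }
}
```

The handler never sees the document as a whole; it only receives one event at a time, in document order.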
Problems with SAX
SAX provides only sequential access to the document being processed
SAX has only a local view of the current element being processed
– Global knowledge of parsing must be stored in global variables
– A single startElement() method for all elements
• In startElement() there are many “if-then-else” tests for checking a specific element
• When an element is seen, a global flag is set
• When finished with the element, the global flag must be set to false
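The flag pattern described above might look like the following sketch. The class NameExtractor and its extractName helper are hypothetical names chosen for this example; the element name "name" is taken from the invoice document used elsewhere in these notes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NameExtractor extends DefaultHandler {
    private boolean inName = false;                  // the "global flag"
    private final StringBuilder name = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals("name")) {                  // one test per element of interest
            inName = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inName) {                                // the text is only meaningful via the flag
            name.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("name")) {
            inName = false;                          // reset when the element is finished
        }
    }

    /** Returns the text content of the name element(s) in the given XML text. */
    static String extractName(String xml) {
        NameExtractor handler = new NameExtractor();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.name.toString();
    }

    public static void main(String[] args) {
        System.out.println(extractName("<addressdata><name>Leila Laskuprintti</name>"
                + "<address>Pyynpolku 1</address></addressdata>"));
    }
}
```

Every further element of interest needs another flag and another branch in each handler method, which is why this style becomes hard to maintain.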
DOM
DOM represents the XML document as a tree
– Hierarchical nature of tree maps well to hierarchical nesting of XML elements
– Tree contains a global view of the document
• Makes navigation of document easy
• Allows any subtree to be modified
• Easier processing than SAX but memory intensive!
Like SAX, DOM is only an API
– Does not specify a parser
– Lists the API and requirements for the parser
DOM parsers typically use SAX parsing
Document Object Model (DOM)
How to provide uniform access to structured documents in diverse applications (parsers, browsers, editors, databases)?
Overview of W3C DOM Specification
– second one in the “XML-family” of recommendations
• Level 1, W3C Rec, Oct. 1998
• Level 2, W3C Rec, Nov. 2000
• Level 3, W3C Working Draft (January 2002)
What does DOM specify, and how to use it?
DOM: What is it?
An object-based, language-neutral API for XML and HTML documents
– allows programs and scripts to build documents, navigate their structure, add, modify or delete elements and content
– Provides a foundation for developing querying, filtering, transformation, rendering etc. applications on top of DOM implementations
In contrast to SAX, read as “Serial Access XML”, DOM could be read as “Directly Obtainable in Memory”
DOM structure model
Based on O-O concepts:
– methods (to access or change object’s state)
– interfaces (declaration of a set of methods)
– objects (encapsulation of data and methods)
Roughly similar to the XSLT/XPath data model (to be discussed later): a parse tree
– Tree-like structure implied by the abstract relationships defined by the programming interfaces; does not necessarily reflect data structures used by an implementation (but probably does)
DOM structure model
<invoice>
  <invoicepage form="00" type="estimatedbill">
    <addressee>
      <addressdata>
        <name>Leila Laskuprintti</name>
        <address>
          <streetaddress>Pyynpolku 1</streetaddress>
          <postoffice>70460 KUOPIO</postoffice>
        </address>
      </addressdata>
    </addressee> ...
[Figure: the corresponding DOM tree. A Document node contains the Element nodes invoice, invoicepage, addressee, addressdata, name, address, streetaddress and postoffice; the attributes form="00" and type="estimatedbill" of invoicepage are held in a NamedNodeMap; the strings Leila Laskuprintti, Pyynpolku 1 and 70460 KUOPIO are Text nodes.]
Structure of DOM Level 1
I: DOM Core Interfaces
– Fundamental interfaces
• basic interfaces to structured documents
– Extended interfaces
• XML specific: CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction
II: DOM HTML Interfaces
– more convenient to access HTML documents
– (we ignore these)
DOM Level 2
– Level 1: basic representation and manipulation of document structure and content (No access to the contents of a DTD)
DOM Level 2 adds
– support for namespaces
– accessing elements by ID attribute values
– optional features
• interfaces to document views and style sheets
• an event model (for, say, user actions on elements)
• methods for traversing the document tree and manipulating regions of document (e.g., selected by the user of an editor)
– Loading and writing of docs not specified (-> Level 3)
DOM Language Bindings
Language-independence:
– DOM interfaces are defined using OMG Interface Definition Language (IDL; defined in the CORBA Specification)
Language bindings (implementations of DOM interfaces) defined in the Recommendation for
– Java and
– ECMAScript (standardised JavaScript)
Document Tree Structure
<html>
  <body>
    <h1>Heading 1</h1>
    <p>Paragraph.</p>
    <h2>Heading 2</h2>
    <p>Paragraph.</p>
  </body>
</html>
[Figure: the resulting tree. #document (document) has the child HTML (document.documentElement), whose children are HEAD and BODY (document.body); BODY has the children H1, P, H2 and P, each with one #text child.]
child, sibling, parent
[Figure, built up over four slides: BODY with the children H1, P, H2 and P, each having one #text child. firstChild and lastChild pointers lead from each parent to its first and last child; nextSibling and previousSibling pointers link adjacent children; parentNode pointers lead from each child back to its parent. The last slide keeps only the firstChild and nextSibling pointers.]
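The firstChild and nextSibling pointers alone are enough to visit every node below a starting node. The following sketch shows such a traversal with the Java binding; the class name SiblingWalk, its helper names and the shortened sample document are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SiblingWalk {

    /** Collects node names in document order using only firstChild and nextSibling. */
    static void walk(Node node, List<String> names) {
        names.add(node.getNodeName());
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, names);
        }
    }

    /** Parses the XML text and returns the names of all nodes below and including the root. */
    static List<String> namesOf(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            List<String> names = new ArrayList<>();
            walk(root, names);
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesOf("<body><h1>Heading 1</h1><p>Paragraph.</p></body>"));
    }
}
```

An XML parser keeps element names as written, so the names come out in lower case here, unlike the upper-case names an HTML DOM reports in the figure.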
Core Interfaces: Node & its variants
[Figure: interface hierarchy. Node is specialised by Document, DocumentFragment, Element, Attr, CharacterData, ProcessingInstruction, DocumentType, Entity, EntityReference and Notation; CharacterData by Comment and Text; Text by CDATASection. CDATASection, ProcessingInstruction, DocumentType, Entity, EntityReference and Notation are marked as “extended interfaces”.]
DOM interfaces: Node
[Figure: the invoice example tree with its Document, Element, NamedNodeMap and Text nodes.]
Node
– getNodeType
– getNodeValue
– getOwnerDocument
– getParentNode
– hasChildNodes
– getChildNodes
– getFirstChild
– getLastChild
– getPreviousSibling
– getNextSibling
– hasAttributes
– getAttributes
– appendChild(newChild)
– insertBefore(newChild, refChild)
– replaceChild(newChild, oldChild)
– removeChild(oldChild)
Object Creation in DOM
Each DOM object X lives in the context of a Document: X.getOwnerDocument()
Objects implementing interface X are created by factory methods D.createX(…), where D is a Document object. E.g.:
– createElement("A"), createAttribute("href"), createTextNode("Hello!")
Creation and persistent saving of Documents left to be specified by implementations
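A minimal sketch of the factory methods in the Java binding follows. The class name FactoryDemo and the URL are placeholders invented for the example; the empty Document itself is obtained through JAXP, since DOM leaves document creation to the implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FactoryDemo {

    /** Builds <A href="...">Hello!</A> using only Document factory methods. */
    static Element buildLink() {
        Document doc;
        try {
            // Creating the Document is implementation-specific; JAXP is used here.
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element a = doc.createElement("A");
        Attr href = doc.createAttribute("href");
        href.setValue("http://example.org/");       // placeholder URL
        a.setAttributeNode(href);
        a.appendChild(doc.createTextNode("Hello!"));
        doc.appendChild(a);                         // a becomes the document element
        return a;
    }

    public static void main(String[] args) {
        Element a = buildLink();
        // Every node created by doc reports doc as its owner document.
        System.out.println(a.getOwnerDocument().getDocumentElement() == a);
    }
}
```

Nodes created by one Document cannot simply be appended to another; they belong to the document that created them.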
DOM interfaces: Document
[Figure: the invoice example tree with its Node, Document, Element, NamedNodeMap and Text nodes.]
Document
– getDocumentElement
– createAttribute(name)
– createElement(tagName)
– createTextNode(data)
– getDocType()
– getElementById(IdVal)
DOM interfaces: Element
[Figure: the invoice example tree with its Node, Document, Element, NamedNodeMap and Text nodes.]
Element
– getTagName
– getAttributeNode(name)
– setAttributeNode(attr)
– removeAttribute(name)
– getElementsByTagName(name)
– hasAttribute(name)
Accessing properties of a Node
– Node.getNodeName()
• for an Element = getTagName()
• for an Attr: the name of the attribute
• for Text = "#text" etc
– Node.getNodeValue()
• content of a text node, value of attribute, …; null for an Element (!!) (in XSLT/XPath: the full textual content)
– Node.getNodeType(): numeric constants (1, 2, 3, …, 12) for ELEMENT_NODE, ATTRIBUTE_NODE, TEXT_NODE, …, NOTATION_NODE
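The three accessors can be compared side by side with a small sketch. The class name NodeProperties, its helpers and the one-line sample document are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeProperties {

    /** Formats the three basic properties of a node as "type name value". */
    static String describe(Node n) {
        return n.getNodeType() + " " + n.getNodeName() + " " + n.getNodeValue();
    }

    /** Parses the XML text and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Element page = parseRoot("<invoicepage form=\"00\">Leila</invoicepage>");
        System.out.println(describe(page));                          // an Element
        System.out.println(describe(page.getAttributeNode("form"))); // an Attr
        System.out.println(describe(page.getFirstChild()));          // a Text node
    }
}
```

The element line ends in null: to read an element's text, the program has to descend to its Text children.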
Content and element manipulation
Manipulating CharacterData D:
– D.substringData(offset, count)
– D.appendData(string)
– D.insertData(offset, string)
– D.deleteData(offset, count)
– D.replaceData(offset, count, string) (= delete + insert)
Accessing attributes of an Element object E:
– E.getAttribute(name)
– E.setAttribute(name, value)
– E.removeAttribute(name)
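The CharacterData and attribute methods listed above can be tried out as follows. The class name ContentEdit, the helper names and the sample strings are assumptions made for this sketch.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

public class ContentEdit {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Applies each CharacterData editing method once and returns the final text. */
    static String editText() {
        Text t = newDocument().createTextNode("Hello world");
        t.appendData("!");              // "Hello world!"
        t.insertData(0, ">> ");         // ">> Hello world!"
        t.deleteData(0, 3);             // "Hello world!"
        t.replaceData(6, 5, "DOM");     // "Hello DOM!"
        return t.getData();
    }

    /** Sets, reads and removes an attribute; returns "value/stillPresent". */
    static String editAttribute() {
        Element e = newDocument().createElement("invoicepage");
        e.setAttribute("form", "00");
        String value = e.getAttribute("form");
        e.removeAttribute("form");
        return value + "/" + e.hasAttribute("form");
    }

    public static void main(String[] args) {
        System.out.println(editText());
        System.out.println(editAttribute());
    }
}
```

Offsets and counts are measured in 16-bit units, which matches Java's char indexing.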
Additional Core Interfaces (1)
NodeList for ordered lists of nodes
– e.g. from Node.getChildNodes() or Element.getElementsByTagName("name")
• all descendant elements of type "name" in document order (wild-card "*" matches any element type)
Accessing a specific node, or iterating over all nodes of a NodeList:
– E.g. Java code to process all children:
for (i=0; i<node.getChildNodes().getLength(); i++)
    process(node.getChildNodes().item(i));
Additional Core Interfaces (2)
NamedNodeMap for unordered sets of nodes accessed by their name:
– e.g. from Node.getAttributes()
NodeLists and NamedNodeMaps are "live":
– changes to the document structure are reflected in their contents
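What "live" means in practice can be seen in a short sketch. The class name LiveListDemo and the helper names are assumptions; the element names db and person are borrowed from the BuildXml example later in these notes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LiveListDemo {

    /** Returns the length of one NodeList before and after the tree is changed. */
    static int[] lengthsBeforeAndAfter() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element db = doc.createElement("db");
        doc.appendChild(db);
        db.appendChild(doc.createElement("person"));
        db.appendChild(doc.createElement("person"));

        NodeList children = db.getChildNodes();      // obtained once, never re-queried
        int before = children.getLength();
        db.appendChild(doc.createElement("person")); // change the document structure
        int after = children.getLength();            // the same list now sees the new child
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] lengths = lengthsBeforeAndAfter();
        System.out.println(lengths[0] + " -> " + lengths[1]);
    }
}
```

One consequence is that a loop which removes nodes while indexing forward through a live list will skip elements, because the remaining items shift down after each removal.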
DOM: Implementations
Java-based parsers e.g. IBM XML4J, Apache Xerces, Apache Crimson
MS IE5 browser: COM programming interfaces for C/C++ and MS Visual Basic, ActiveX object programming interfaces for script languages
XML::DOM (Perl implementation of DOM Level 1)
Others? Non-parser-implementations?
– (Participation of vendors of different kinds of systems in DOM WG has been active.)
A Java-DOM Example
A stand-alone toy application BuildXml
– either creates a new db document with two person elements, or adds them to an existing db document
– based on the example in Sect. 8.6 of Deitel et al: XML - How to program
Technical basis
– DOM support in Sun JAXP
– native XML document initialisation and storage methods of the JAXP 1.1 default parser (Apache Crimson)
Code of BuildXml (1)
Begin by importing necessary packages:
import java.io.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
// Native (parse and write) methods of the
// JAXP 1.1 default parser (Apache Crimson):
import org.apache.crimson.tree.XmlDocument;
Code of BuildXml (2)
Class for modifying the document in file fileName:
public class BuildXml {
    private Document document;

    public BuildXml(String fileName) {
        File docFile = new File(fileName);
        Element root = null; // doc root element

        // Obtain a factory for (SAX-based) DOM parsers:
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
Code of BuildXml (3)
        try { // to get a new DocumentBuilder:
            DocumentBuilder builder =
                factory.newDocumentBuilder();

            if (!docFile.exists()) { // create new doc
                document = builder.newDocument();

                // add a comment:
                Comment comment = document.createComment(
                    "A simple personnel list");
                document.appendChild(comment);

                // Create the root element:
                root = document.createElement("db");
                document.appendChild(root);
Code of BuildXml (4)
… or if docFile already exists:
            } else { // access an existing doc
                try { // to parse docFile
                    document = builder.parse(docFile);
                    root = document.getDocumentElement();
                } catch (SAXException se) {
                    System.err.println("Error: " + se.getMessage());
                    System.exit(1);
                } /* A similar catch for a possible IOException */
Code of BuildXml (5)
Create and add two child elements to root:
Node personNode = createPersonNode(document, "1234",
"Pekka", "Kilpeläinen");
root.appendChild(personNode);
personNode = createPersonNode(document, "5678",
"Irma", "Könönen");
root.appendChild(personNode);
Code of BuildXml (6)
Finally, store the result document:
try { // to write the XML document to file fileName
    ((XmlDocument) document).write(
        new FileOutputStream(fileName));
} catch (IOException ioe) {
    ioe.printStackTrace();
}
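The write() call above relies on the Crimson-specific XmlDocument class. Where that class is not available, one portable alternative (a substitute technique, not the method used in BuildXml) is an identity transformation through the JAXP javax.xml.transform API. The class name SaveWithTransformer and its helpers are assumptions made for this sketch, which writes to a string so that the result is easy to inspect.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class SaveWithTransformer {

    /** Serialises a DOM document to text with an identity transformation. */
    static String serialize(Document document) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialisation failed", e);
        }
    }

    /** Builds a document holding only an empty db root element. */
    static Document sampleDb() {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            d.appendChild(d.createElement("db"));
            return d;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDb()));
    }
}
```

To write to a file instead, the StreamResult can be constructed from a java.io.File in place of the StringWriter.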
Subroutine to create person elements
public Node createPersonNode(Document document,
        String idNum, String fName, String lName) {
    Element person = document.createElement("person");
    person.setAttribute("idnum", idNum);

    Element firstName = document.createElement("first");
    person.appendChild(firstName);
    firstName.appendChild(document.createTextNode(fName));

    /* … similarly for a lastName */
    return person;
}
The main routine for BuildXml
public static void main(String args[]) {
    if (args.length > 0) {
        String fileName = args[0];
        BuildXml buildXml = new BuildXml(fileName);
    } else {
        System.err.println("Give filename as argument");
    }
} // main
Summary of XML APIs
XML processors make the structure and contents of XML documents available to applications through APIs
Event-based APIs
– notify application through parsing events
– e.g., the SAX call-back interfaces
Object-model (or tree) based APIs
– provide a full parse tree
– e.g., DOM, W3C Recommendation
– more convenient, but may require too many resources with the largest documents
Major parsers support both SAX and DOM
Problems in real-world implementations
– Event handling
– Bubbling
– Memory leaks
Trickling and Bubbling
Trickling is an event capturing pattern which provides compatibility with the Netscape 4 model. Avoid it.
Bubbling means that the event is given to the target, and then its parent, and then its parent, and so on until the event is canceled.
Why Bubble?
Suppose you have 100 draggable objects.
You could attach 100 sets of event handlers to those objects.
Or you could attach one set of event handlers to the container of the 100 objects.
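The single-handler idea can be imitated in plain Java by walking the parentNode chain. This is only a simulation of bubbling under stated assumptions, not the browser event API: the class BubbleSketch, its on and dispatch methods and the item ids are all hypothetical, and cancelling an event is not modelled.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class BubbleSketch {
    // Hypothetical registry: at most one handler per node, looked up by identity.
    private final Map<Node, Consumer<Node>> handlers = new IdentityHashMap<>();

    void on(Node node, Consumer<Node> handler) {
        handlers.put(node, handler);
    }

    /** Offers the event to the target, then to its parent, and so on up to the document. */
    void dispatch(Node target) {
        for (Node n = target; n != null; n = n.getParentNode()) {
            Consumer<Node> handler = handlers.get(n);
            if (handler != null) {
                handler.accept(target);   // the handler learns which node was the target
            }
        }
    }

    /** One handler on the container serves all 100 items; returns the id it saw. */
    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        Element container = doc.createElement("container");
        doc.appendChild(container);
        for (int i = 0; i < 100; i++) {
            Element item = doc.createElement("item");
            item.setAttribute("id", "item" + i);
            container.appendChild(item);
        }
        BubbleSketch events = new BubbleSketch();
        StringBuilder seen = new StringBuilder();
        events.on(container, target -> seen.append(((Element) target).getAttribute("id")));
        events.dispatch(container.getChildNodes().item(42));
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The container's handler receives the original target, so it can tell the 100 objects apart without any of them carrying a handler of its own.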
Memory Leaks
Memory management is automatic.
It is possible to hang on to too much state, preventing it from being garbage collected.
Memory Leaks on IE 6
Explicitly remove all of your event handlers from nodes before you discard them.
The IE6 DOM uses a reference counting garbage collector.
Reference counting is not able to reclaim cyclical structures.
You must break the cycles yourself.
Memory Leaks on IE 6
That was not an issue for page view-driven applications.
It is a showstopper for Ajax applications.
It will be fixed in IE7.
Memory Leaks on IE 6
Remove all event handlers from deleted DOM nodes.
It must be done on nodes before removeChild or replaceChild.
It must be done on nodes before they are replaced by changing innerHTML.