OrientX 4.0 System Development Report. XML Group, July 25, 2009.
XML Keyword Search
<?xml version="1.0" encoding="GB2312"?>
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  ......
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
Keywords: Stevens, Addison-Wesley
Results:
<book year="1994"><title>TCP/IP Illustrated</title><author><last>Stevens</last><first>W.</first></author><publisher>Addison-Wesley</publisher><price>65.95</price></book>
<book year="1992"><title>Advanced Programming in the Unix environment</title><author><last>Stevens</last><first>W.</first></author><publisher>Addison-Wesley</publisher><price>65.95</price></book>
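OrientX 4.0 System Architecture
(Figure: architecture diagram. Modules: Data Manager, Schema Manager, Index Manager, Storage Manager, Coder, Keyword-search, and Execute Engine. Inputs: XML documents, queries (XPath, XQuery, keywords), updates, and data definitions; output: query results. Other labels: Schema, Address, Records, Element Node, Storage.)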
Parsing XML Documents
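SAX Overview
• SAX (Simple API for XML)
• SAX is a technique for parsing XML files.
  – It processes XML files in an event-driven way; most current XML parsers support SAX parsing in addition to DOM.
• SAX is a set of programming interfaces.
  – It treats an XML file as a stream of characters and fires a series of events as XML elements are read; you only write handlers for the events you need in order to analyze or extract XML elements.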
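SAX Illustrated
• After SAX parses an XML file, it produces a series of events; we can write event handlers to process these events.
(Figure: XML file → SAX parser, which parses the file and generates events → event handler.)
Example input: <book> <title>TCP/IP</title> <author> Stevens</author> <publisher>Addison</publisher> <price>65.95</price></book>
Events produced: startElement (book), startElement (title), characters (TCP/IP), . . . . . . .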
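The event sequence above can be reproduced with Python's standard `xml.sax` module. This is a sketch, not OrientX code: the handler class `EventTrace` is hypothetical, and it buffers `characters` calls and drops whitespace-only text so that each text run shows up as one event (a SAX parser is allowed to split text across several `characters` calls).

```python
import xml.sax


class EventTrace(xml.sax.ContentHandler):
    """Record SAX events in the order the parser fires them."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []  # characters() may arrive in several chunks; buffer them

    def _flush_text(self):
        # Report the buffered text as one event, skipping whitespace-only runs.
        text = "".join(self._text).strip()
        if text:
            self.events.append(("characters", text))
        self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(("startElement", name))

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.events.append(("endElement", name))


doc = (b"<book> <title>TCP/IP</title> <author> Stevens</author> "
       b"<publisher>Addison</publisher> <price>65.95</price></book>")
handler = EventTrace()
xml.sax.parseString(doc, handler)
for event in handler.events[:3]:
    print(event)
# ('startElement', 'book')
# ('startElement', 'title')
# ('characters', 'TCP/IP')
```

The rest of `handler.events` continues with `endElement (title)`, `startElement (author)`, and so on, ending with `endElement (book)`.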
SAX Main Events
• startDocument event
• startElement event
  – Attributes a;
  – a.getLength();
  – a.getQName(i);
  – a.getValue(i);
• characters event
• endElement event
• endDocument event
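A handler for the five main events might look like this in Python. Python's `xml.sax` attribute object is keyed by name, so the sketch walks `getQNames()` and `getValueByQName()` rather than the index-based `getQName(i)`/`getValue(i)` listed above. The class name `MainEvents` is hypothetical, and the XML declaration is left out of the sample to keep it independent of the GB2312 encoding, which Python's bundled Expat parser may not handle.

```python
import xml.sax


class MainEvents(xml.sax.ContentHandler):
    """Log startDocument, startElement (with attributes), characters,
    endElement and endDocument."""

    def __init__(self):
        super().__init__()
        self.log = []

    def startDocument(self):
        self.log.append("startDocument")

    def startElement(self, name, attrs):
        # Python's Attributes is keyed by name: iterate the qualified names
        # and look each value up, instead of getQName(i) / getValue(i).
        pairs = [(q, attrs.getValueByQName(q)) for q in attrs.getQNames()]
        self.log.append(("startElement", name, attrs.getLength(), pairs))

    def characters(self, content):
        if content.strip():  # ignore whitespace between tags
            self.log.append(("characters", content.strip()))

    def endElement(self, name):
        self.log.append(("endElement", name))

    def endDocument(self):
        self.log.append("endDocument")


doc = b"""<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
</bib>"""
events = MainEvents()
xml.sax.parseString(doc, events)
for entry in events.log[:3]:
    print(entry)
# startDocument
# ('startElement', 'bib', 0, [])
# ('startElement', 'book', 1, [('year', '1994')])
```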
SAX Parsing Example
(Figure: the document tree of the XML below: bib → book, whose children are year (1994), title (TCP/IP), author (last: Stevens, first: W.), publisher (Addison), and price (65.95).)
<?xml version="1.0" encoding="GB2312"?>
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
</bib>
Encoding XML Documents
Dewey Encoding
• int dewey[51]={0,-1,-1,…,-1}
• Elements, attributes, attribute values, and text are all encoded
• Example
(Figure: the same document tree annotated with Dewey codes: bib 0; book 0.0; year 0.0.0; 1994 0.0.0.0; title 0.0.1; TCP/IP 0.0.1.0; author 0.0.2; last 0.0.2.0; Stevens 0.0.2.0.0; first 0.0.2.1; W. 0.0.2.1.0; publisher 0.0.3; Addison 0.0.3.0; price 0.0.4; 65.95 0.0.4.0.)
Encoding example
Initial state: int dewey[51]={0,-1,-1,-1,…,-1}
start: bib {1,0,…}
start: book {2,0,0,…}
start: year {3,0,0,0,…}
start: 1994 {4,0,0,0,0,…}
end: 1994 {3,0,0,0,0,-1,…}
end: year {2,0,0,0,-1,-1}
start: title {3,0,0,1,…}
(Figure: the same document tree and XML document, showing the codes assigned up to this point: bib 0, book 0.0, year 0.0.0, 1994 0.0.0.0, title 0.0.1.)
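The trace above suggests how the array is used: `dewey[0]` holds the current depth and `dewey[1..depth]` holds the code of the node being visited. On a start event the depth grows by one and the counter at the new level is incremented; on an end event the slot one level below is reset to -1 and the depth shrinks. The Python sketch below follows that reading. Its assumptions, which the slides do not state, are: each attribute becomes a child of its element ahead of the element's content, the attribute value becomes a child of the attribute, and whitespace-only text is skipped. `DeweyEncoder` and `SLOTS` are hypothetical names.

```python
import xml.sax

SLOTS = 51  # int dewey[51]: slot 0 holds the depth, slots 1..50 the code


class DeweyEncoder(xml.sax.ContentHandler):
    """Assign Dewey codes to elements, attributes, attribute values and text."""

    def __init__(self):
        super().__init__()
        self.dewey = [0] + [-1] * (SLOTS - 1)  # initial state {0,-1,-1,...,-1}
        self.codes = []                         # (label, code) in document order
        self._text = []

    def _start(self, label):
        depth = self.dewey[0] + 1
        if depth >= SLOTS:
            raise ValueError("document is deeper than the dewey array allows")
        self.dewey[0] = depth
        self.dewey[depth] += 1  # next child number at this level
        self.codes.append((label, ".".join(map(str, self.dewey[1:depth + 1]))))

    def _end(self):
        depth = self.dewey[0]
        if depth + 1 < SLOTS:
            self.dewey[depth + 1] = -1  # children of the closed node restart from 0
        self.dewey[0] = depth - 1

    def _flush_text(self):
        text = "".join(self._text).strip()
        self._text = []
        if text:  # assumption: whitespace-only text is not encoded
            self._start(text)
            self._end()

    def startElement(self, name, attrs):
        self._flush_text()
        self._start(name)
        for q in attrs.getQNames():  # assumption: attributes come before content
            self._start(q)
            self._start(attrs.getValueByQName(q))  # the value is the attribute's child
            self._end()
            self._end()

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self._end()


doc = b"""<bib><book year="1994"><title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher><price>65.95</price></book></bib>"""
encoder = DeweyEncoder()
xml.sax.parseString(doc, encoder)
for label, code in encoder.codes:
    print(code, label)
```

Running it reproduces the codes in the figure, for example `0.0.1 title`, `0.0.2.0.0 Stevens`, and `0.0.3.0 Addison-Wesley`.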
Storing and Indexing Dewey Codes
An example
• key: Stevens; Dewey code: 0.0.2.0.0
• hash(Stevens) = ((S*31+t)*31+…) mod Nhash = 552
(Figure: bucket 552 points to the key node Stevens, whose code list holds 0.0.2.0.0 with length 5; the remaining pointers are NULL.)
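The hash formula is Horner's rule over the key's characters with multiplier 31. A minimal sketch, assuming 32-bit unsigned wrap-around and a modulo reduction into `nhash` buckets. The slides do not give Nhash, so the hypothetical function `bucket` takes it as a parameter, and this sketch does not claim to reproduce the value 552.

```python
def bucket(key: str, nhash: int) -> int:
    """((k0*31 + k1)*31 + ...) reduced modulo nhash.

    Assumes 32-bit unsigned wrap-around for the running hash value.
    """
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF  # Horner's rule, truncated to 32 bits
    return h % nhash


NHASH = 1024  # hypothetical bucket count; the slides do not give Nhash
print(bucket("ab", 1000))  # (97*31 + 98) % 1000 = 105
print(bucket("Stevens", NHASH))
```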
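Another example
• key: Stevens; Dewey code: 0.1.2.0.0
• hash(Stevens) = 552
(Figure: the code 0.1.2.0.0, length 5, is appended to Stevens's code list after 0.0.2.0.0, length 5; the remaining pointers are NULL.)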
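A third example
• key: xxx; Dewey code: 0.1.3.0
• hash(xxx) = 552, the same bucket as Stevens
(Figure: bucket 552 now holds two key nodes, Stevens and xxx. Stevens keeps its codes 0.0.2.0.0 (length 5) and 0.1.2.0.0 (length 5); xxx gets its own code list holding 0.1.3.0 (length 4); the remaining pointers are NULL.)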
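Taken together, the three examples describe a chained hash table: each bucket points to a chain of key nodes (keys whose hashes collide, such as Stevens and xxx, share a bucket), and each key node owns a list of the Dewey codes where that key occurs. The Python sketch below is one way to build it; `KeyNode`, `DeweyIndex`, and the default bucket count are hypothetical, and the code length that the slides store next to each code is simply the length of the Python list here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KeyNode:
    key: str
    codes: List[List[int]] = field(default_factory=list)  # one list per Dewey code; len() is its length
    next: Optional["KeyNode"] = None  # next key that landed in the same bucket


class DeweyIndex:
    """Chained hash table: bucket -> key nodes -> list of Dewey codes."""

    def __init__(self, nhash: int = 1024):  # hypothetical default bucket count
        self.nhash = nhash
        self.buckets: List[Optional[KeyNode]] = [None] * nhash

    def _bucket(self, key: str) -> int:
        h = 0
        for ch in key:
            h = (h * 31 + ord(ch)) & 0xFFFFFFFF  # same Horner-style hash, 32-bit
        return h % self.nhash

    def _find(self, key: str) -> Optional[KeyNode]:
        node = self.buckets[self._bucket(key)]
        while node is not None and node.key != key:
            node = node.next
        return node

    def insert(self, key: str, code: str) -> None:
        node = self._find(key)
        if node is None:
            # New key: append it to the end of its bucket's chain.
            node = KeyNode(key)
            b = self._bucket(key)
            if self.buckets[b] is None:
                self.buckets[b] = node
            else:
                tail = self.buckets[b]
                while tail.next is not None:
                    tail = tail.next
                tail.next = node
        node.codes.append([int(part) for part in code.split(".")])

    def lookup(self, key: str) -> List[str]:
        node = self._find(key)
        return [] if node is None else [".".join(map(str, c)) for c in node.codes]


index = DeweyIndex(nhash=1)  # one bucket, so every key collides, like Stevens and xxx
index.insert("Stevens", "0.0.2.0.0")
index.insert("Stevens", "0.1.2.0.0")
index.insert("xxx", "0.1.3.0")
print(index.lookup("Stevens"), index.lookup("xxx"))
# ['0.0.2.0.0', '0.1.2.0.0'] ['0.1.3.0']
```

Passing `nhash=1` forces every key into one bucket, which mirrors the Stevens/xxx collision in the third example; a real index would use a much larger table.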
Implementing the SLCA (Smallest Lowest Common Ancestor) Algorithm
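Implementing SLCA
• Naïve method
  – Compute the LCA of every combination of keyword nodes (one node per keyword)
  – Remove from the results every node that is an ancestor of another result
  – The remaining nodes are the SLCA nodes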
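Because a node's Dewey code is its path from the root, the LCA of two nodes is the longest common prefix of their codes, and an ancestor's code is a proper prefix of its descendant's. A Python sketch of the naïve method built on that observation; the function names are hypothetical. Its cost grows with the product of the keyword-list sizes, since every combination is visited.

```python
from itertools import product
from typing import List, Tuple

Code = Tuple[int, ...]


def parse(code: str) -> Code:
    return tuple(int(part) for part in code.split("."))


def lca(a: Code, b: Code) -> Code:
    """LCA of two nodes = longest common prefix of their Dewey codes."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def is_proper_ancestor(a: Code, b: Code) -> bool:
    """a is a proper ancestor of b iff a's code is a proper prefix of b's."""
    return len(a) < len(b) and b[:len(a)] == a


def naive_slca(keyword_lists: List[List[str]]) -> List[str]:
    # Step 1: the LCA of every combination (one node per keyword).
    lcas = set()
    for combo in product(*[[parse(c) for c in codes] for codes in keyword_lists]):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        lcas.add(node)
    # Steps 2-3: drop every LCA that is an ancestor of another; the rest are SLCAs.
    slcas = [a for a in lcas if not any(is_proper_ancestor(a, b) for b in lcas)]
    return [".".join(map(str, c)) for c in sorted(slcas)]


stevens = ["0.0.2.0.0", "0.1.2.0.0"]
addison = ["0.0.3.0", "0.1.3.0"]
print(naive_slca([stevens, addison]))  # ['0.0', '0.1']
```

With the Stevens and Addison-Wesley lists used in the stack example, the four combinations give the LCAs 0.0, 0, 0, and 0.1; node 0 is an ancestor of 0.0 and is dropped, leaving the two book elements 0.0 and 0.1.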
Stack Algorithm
Key S Stevens: 0.0.2.0.0, 0.1.2.0.0
Key A Addison-Wesley: 0.0.3.0, 0.1.3.0
(Figure: stack states as the keyword nodes are scanned in document order. Each stack entry is a Dewey component followed by T/F flags for keywords S and A, listed from the top of the stack down; after (a), for example, the stack is 0 T F, 0 F F, 2 F F, 0 F F, 0 F F. Panels: (a) node 0.0.2.0.0; (b) node 0.0.3.0, report 0.0 as an SLCA; (c) node 0.1.2.0.0; (d) node 0.1.3.0, report 0.1 as an SLCA. Beside the stacks, the document tree marks the keyword nodes 0.0.2.0.0 and 0.1.2.0.0 (Stevens) and 0.0.3.0 and 0.1.3.0 (Addison).)
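The panels follow a stack-based scan: merge the keyword lists in document order, keep the Dewey path of the current node on a stack with one flag per keyword, pop back to the common prefix when the next node arrives, and OR each popped entry's flags into its parent. A popped entry whose flags are all T is reported as an SLCA. The Python sketch below implements that scheme. One detail goes beyond the slides and is an assumption: each entry also carries a `blocked` mark so that, once an SLCA is reported, none of its ancestors can be reported later (the slides show the root's flags staying F F after 0.0 is reported, which is consistent with this). `stack_slca` is a hypothetical name.

```python
from typing import List


def stack_slca(keyword_lists: List[List[str]]) -> List[str]:
    """SLCA by one document-order scan with a stack of [component, flags, blocked]."""
    k = len(keyword_lists)
    # Merge every keyword list into one stream sorted by Dewey code (= document order).
    stream = sorted(
        (tuple(int(part) for part in code.split(".")), kw)
        for kw, codes in enumerate(keyword_lists)
        for code in codes
    )
    stack = []   # each entry: [component, flags per keyword, blocked]
    result = []

    def pop() -> None:
        here = ".".join(str(entry[0]) for entry in stack)  # code of the top entry
        _, flags, blocked = stack.pop()
        if not blocked and all(flags):
            result.append(here)  # all keywords occur in its subtree, no SLCA below it
            blocked = True       # so none of its ancestors may be reported
        if stack:
            parent = stack[-1]
            if blocked:
                parent[2] = True
            else:
                parent[1] = [p or f for p, f in zip(parent[1], flags)]

    for code, kw in stream:
        # Length of the common prefix of the stack path and the new node's code.
        p = 0
        while p < len(stack) and p < len(code) and stack[p][0] == code[p]:
            p += 1
        while len(stack) > p:
            pop()
        for component in code[p:]:
            stack.append([component, [False] * k, False])
        stack[-1][1][kw] = True  # the new node itself contains keyword kw
    while stack:
        pop()
    return result


stevens = ["0.0.2.0.0", "0.1.2.0.0"]
addison = ["0.0.3.0", "0.1.3.0"]
print(stack_slca([stevens, addison]))  # ['0.0', '0.1']
```

Unlike the naïve method, this scan visits each keyword node once, so its cost depends on the total length of the lists rather than on the product of their sizes.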