Source Description-based Approach for the Modeling of Spatial Information Integration
Yoshiharu Ishikawa and Hiroyuki Kitagawa, University of Tsukuba
{ishikawa,kitagawa}@is.tsukuba.ac.jp
Outline
- Background
- Our Objective and Approach
- Motivating Example
- Data Model
- Query Specification and Source Description
- Query Processing
- Conclusions and Future Work
Background: Spatial Information Sources (1)
- Spatial information sources: new information sources emerging on the Internet
  - information sources that provide region- or location-oriented information
  - some of them support mobile users equipped with GPS devices and hand-held devices
Background: Spatial Information Sources (2)
- Need for technology to integrate spatial information sources
  - description of spatial information sources that takes their contents into account
  - efficient and effective query planning and processing
[Figure: spatial information integration]
Background: Spatial Information Sources (3)
- Standardization efforts for spatial technologies
  - OpenGIS [5]: standardization of GIS systems
  - POIX [6]: language for exchanging location-oriented information
  - G-XML [7]: XML vocabulary for describing geographic information
  - RWML [8]: road information description language
- Spatial information services
  - Digital City [10], citysearch.com [11]: location-oriented information services
  - Ekimae Tanken Club [12]: provides local information near a specified rail station
  - MONET system [13]: provides information for car drivers
Background: Heterogeneous Information Integration (1)
- A popular approach to information integration is the well-known wrapper-mediator approach
- Wrapper
  - encapsulates the details of each information source
  - provides an abstract, uniform view of the source
- Mediator
  - selects appropriate information sources for a given query
  - performs query planning and processing
Background: Heterogeneous Information Integration (2)
[Figure: heterogeneous information integration system. Users get unified access to the integrated information through a mediator, which accesses Information Sources A, B, C, and D through their wrappers.]
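The division of labor between wrappers and the mediator described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' system: the `Wrapper` and `Mediator` classes, their `query` methods, and the in-memory sample rows are all hypothetical, and source selection here is naive (by relation name only).

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict  # one tuple of the global schema, keyed by attribute name


@dataclass
class Wrapper:
    """Hypothetical wrapper: hides a source's native access behind a uniform interface."""
    name: str
    relation: str                     # global relation this source (partially) provides
    fetch: Callable[[], list[Row]]    # stand-in for the source's native access method

    def query(self, predicate: Callable[[Row], bool]) -> list[Row]:
        # Every wrapper answers in terms of the global schema, whatever the source looks like.
        return [row for row in self.fetch() if predicate(row)]


@dataclass
class Mediator:
    """Hypothetical mediator: picks relevant wrappers and merges their answers."""
    wrappers: list[Wrapper] = field(default_factory=list)

    def query(self, relation: str, predicate: Callable[[Row], bool]) -> list[Row]:
        results: list[Row] = []
        for w in self.wrappers:
            if w.relation == relation:  # naive source selection by relation name only
                results.extend(w.query(predicate))
        return results


# Usage with made-up sample rows.
src_a = Wrapper("A", "Restaurant", lambda: [{"name": "Tokyo Sushi", "category": "Japanese"}])
src_b = Wrapper("B", "Restaurant", lambda: [{"name": "Peking House", "category": "Chinese"}])
src_d = Wrapper("D", "Evaluation", lambda: [{"name": "Tokyo Sushi", "score": 3.0}])
mediator = Mediator([src_a, src_b, src_d])
print(mediator.query("Restaurant", lambda r: r["category"] == "Chinese"))
# [{'name': 'Peking House', 'category': 'Chinese'}]
```

A real wrapper would translate the predicate into the source's own query interface instead of fetching everything and filtering afterwards; the post-fetch filter only keeps the sketch short.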
Our Objective
- Development of a spatial information integration framework for location-aware information services
  - integration of heterogeneous spatial information sources
    - heterogeneity of the contents of the sources
    - heterogeneity of the capabilities of the sources
  - provision of useful location-oriented information services to mobile users
    - selection of nearby geometric features
Our Approach
- Development of a description method to represent spatial information sources
  - based on the source description framework: describes the contents and the services of a source
  - introduction of spatial data types and spatial operators based on the OpenGIS standard
- Development of query planning and processing methods that effectively utilize source descriptions
  - selection of appropriate information sources for a given query
  - effective use of the query processing power of each information source
Motivating Example (1)
[Figure: heterogeneous information integration system in which a mediator accesses Information Sources A, B, C, and D through their wrappers]
Motivating Example (2)
Global schema
- based on the relational model
- represents a virtual database schema
- each information source is (partially) mapped to the global schema

  relation Restaurant {
    name string;
    category string;
    address string;
    location point;
  };

  relation Evaluation {
    name string;
    score real;
  };
Motivating Example (3)
Query issued by the user: show the top-20 nearest restaurants such that
- the restaurant is within 1000 meters of the current position p
- its score is greater than or equal to 2.5 stars
[Figure: map centered at the current position p with a 1000 m radius and restaurants numbered 1 to 7]
SQL representation:
  SELECT r.name, r.address
  FROM Restaurant AS r, Evaluation AS e
  WHERE r.name = e.name AND e.score >= 2.5
    AND Distance(r.location, p) <= 1000
  ORDER BY Distance(r.location, p)
  STOP AFTER 20
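To make the semantics of this query concrete, here is a minimal in-memory evaluation in Python: join Restaurant and Evaluation on name, keep rows within the radius and at or above the score threshold, sort by distance, and stop after k rows. It assumes planar (x, y) coordinates in meters so that Euclidean distance stands in for `Distance`; the function name `top_k_nearest`, the addresses, and the coordinates are hypothetical.

```python
import math
from typing import NamedTuple


class Restaurant(NamedTuple):
    name: str
    category: str
    address: str
    location: tuple[float, float]  # assumed planar (x, y) coordinates in meters


class Evaluation(NamedTuple):
    name: str
    score: float


def distance(a: tuple[float, float], b: tuple[float, float]) -> float:
    # Euclidean distance, standing in for Distance() in the SQL query.
    return math.dist(a, b)


def top_k_nearest(restaurants, evaluations, p, radius=1000.0, min_score=2.5, k=20):
    """Evaluate the example query in memory and return (name, address) pairs."""
    # Join on r.name = e.name; assumes at most one Evaluation row per name.
    scores = {e.name: e.score for e in evaluations}
    hits = [
        r for r in restaurants
        if r.name in scores
        and scores[r.name] >= min_score             # e.score >= 2.5
        and distance(r.location, p) <= radius       # Distance(r.location, p) <= 1000
    ]
    hits.sort(key=lambda r: distance(r.location, p))  # ORDER BY Distance(r.location, p)
    return [(r.name, r.address) for r in hits[:k]]    # STOP AFTER k


# Usage with made-up sample data (the two sushi scores follow the Source D example).
restaurants = [
    Restaurant("Tokyo Sushi", "Japanese", "1-1 Example St.", (300.0, 400.0)),  # 500 m from p
    Restaurant("Edo Sushi", "Japanese", "2-2 Example St.", (0.0, 200.0)),      # 200 m from p
    Restaurant("Budget Diner", "Western", "3-3 Example St.", (100.0, 0.0)),    # score too low
    Restaurant("Far Noodle", "Chinese", "4-4 Example St.", (1500.0, 0.0)),     # outside 1000 m
]
evaluations = [
    Evaluation("Tokyo Sushi", 3.0),
    Evaluation("Edo Sushi", 2.7),
    Evaluation("Budget Diner", 2.0),
    Evaluation("Far Noodle", 4.0),
]
print(top_k_nearest(restaurants, evaluations, p=(0.0, 0.0)))
# [('Edo Sushi', '2-2 Example St.'), ('Tokyo Sushi', '1-1 Example St.')]
```

In the integration setting no single source holds both relations, which is exactly why the mediator has to split this query into subqueries.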
Motivating Example (4)
Information Source A: provides restaurant information for a specific area
- Contents: information on restaurants within the rectangular area r
- Capability: given a name or an address, it returns the matching restaurants
[Figure: rectangular area r]
Motivating Example (5)
Information Source B: supports spatial conditions for querying restaurant information
- Contents: information about restaurants
- Capability:
  - returns restaurants within a specified circular area
  - accepts an additional condition on the restaurant category (e.g., category = "Chinese")
Motivating Example (6)
Information Source C: supports spatial conditions for querying restaurant information
- Contents: information about restaurants
- Capability:
  - returns restaurants that match the specified name (e.g., name like "%Sushi")
  - if an optional polygon is given, returns only restaurants within that polygonal region
Motivating Example (7)
Information Source D: provides restaurant evaluation scores
- given a restaurant name, it returns the evaluation score
Example:
  select * from Source-D where name like "%Sushi"

  name         score
  Tokyo Sushi  3.0
  Edo Sushi    2.7
Data Model for Integration
- The relational model enhanced with spatial data types and spatial operations
  - spatial data types and spatial operations are based on the OpenGIS proposal [5]
- A wrapper for each spatial information source wraps the operations of the source and provides OpenGIS-conformant operations
  - a wrapper provides a subset of the OpenGIS operations, depending on the capability of its source
Spatial Data Types
- Based on the OpenGIS proposal
- To simplify the problem, we consider only the Point, LineString, and Polygon types (our target)

  Geometry
    Point
    Curve
      LineString
    Surface
      Polygon
    GeometryCollection
      MultiPoint
      MultiCurve
      MultiSurface
Spatial Operations (1): Spatial Predicates of OpenGIS
  name                semantics
  intersects(g1, g2)  g1 and g2 have at least one point in common
  disjoint(g1, g2)    g1 and g2 have no points in common
  equals(g1, g2)      g1 and g2 are equal
  overlaps(g1, g2)    g1 and g2 partially overlap each other
  contains(g1, g2)    g1 contains g2
  within(g1, g2)      g1 is contained in g2
  crosses(g1, g2)     g1 and g2 cross each other
  touches(g1, g2)     g1 and g2 touch at one or more points
Spatial Operations (2): Spatial Functions of OpenGIS
  name                  return type  semantics
  intersection(g1, g2)  Geometry     intersection of g1 and g2
  distance(g1, g2)      Double       minimum distance between g1 and g2
  envelope(g)           Geometry     minimum bounding box (MBB) of g
  union(g1, g2)         Geometry     union of the regions of g1 and g2
  isempty(g)            Integer      whether g is empty
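A wrapper whose source lacks some of these operations has to emulate them. As a minimal pure-Python sketch (not a conforming OpenGIS implementation), here are the three operations the later examples rely on, restricted to points and simple polygons given as vertex lists: `envelope`, `distance` between two points, and `contains(polygon, point)` via even-odd ray casting. The function names mirror the tables; the tuple and list representations are assumptions.

```python
import math

Point = tuple[float, float]
Polygon = list[Point]  # vertices of a simple polygon in order, first vertex not repeated


def envelope(g: Polygon) -> Polygon:
    """Minimum bounding box (MBB) of g, returned as a rectangle polygon."""
    xs = [x for x, _ in g]
    ys = [y for _, y in g]
    return [(min(xs), min(ys)), (max(xs), min(ys)), (max(xs), max(ys)), (min(xs), max(ys))]


def distance(p: Point, q: Point) -> float:
    """Minimum distance between two points (the Point/Point case of distance(g1, g2))."""
    return math.dist(p, q)


def contains(g: Polygon, p: Point) -> bool:
    """True if point p lies inside polygon g, using even-odd ray casting.

    Points exactly on the boundary are not treated specially, so the result
    for them is unspecified in this sketch.
    """
    x, y = p
    inside = False
    n = len(g)
    for i in range(n):
        x1, y1 = g[i]
        x2, y2 = g[(i + 1) % n]
        # Does the horizontal ray from p towards +x cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):  # implies y1 != y2, so the division below is safe
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(envelope(triangle))                # [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(contains(triangle, (1.0, 1.0)))    # True
print(contains(triangle, (3.0, 3.0)))    # False
print(distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```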
Source Description Framework
- Source description framework: a formal framework for specifying meta-information about an information source, proposed in the Information Manifold system [3]
- A source description consists of:
  - Contents description: describes the contents of the source in terms of the global schema
  - Capability description: describes the types of queries that the source can support
- We extend the source description approach by considering OpenGIS data types and operations
Query Description (1)
- An extension of a conjunctive query; it can contain
  - spatial predicates (e.g., intersects, contains)
  - spatial functions (e.g., envelope, distance)
  - additional comparison operators (e.g., ≤)
- General form of a conjunctive query:
  $\mathrm{ans}(u) \leftarrow R_1(u_1), \ldots, R_n(u_n), c_1, \ldots, c_m$
  where
  - $R_1, \ldots, R_n$: global relations
  - $u, u_1, \ldots, u_n$: sequences of variables
  - $c_1, \ldots, c_m$ ($m \ge 0$): conditions
Query Description (2)
Example: show restaurants that are within 1000 meters of the current position and whose scores are greater than or equal to 2.5 stars
  $\mathrm{ans}(n, a) \leftarrow \mathrm{Restaurant}(n, c, a, l), \mathrm{Evaluation}(e, s), n = e, s \ge 2.5, \mathrm{distance}(l, p) \le 1000$
SQL representation:
  SELECT r.name, r.address
  FROM Restaurant AS r, Evaluation AS e
  WHERE r.name = e.name AND e.score >= 2.5
    AND Distance(r.location, p) <= 1000
Spatial Query Conditions
- For spatial query conditions, we allow the following spatial range restriction predicates (g is a geometric constant, x is a geometric variable):
  $\mathrm{equals}(x, g)$ and $\mathrm{equals}(g, x)$, $\mathrm{within}(x, g)$, $\mathrm{contains}(g, x)$
- We also allow distance-based range restriction conditions (g is a Geometry object, d is a real constant, $\theta$ is < or ≤):
  $\mathrm{distance}(x, g) \; \theta \; d$
Source Descriptions (1)
A source description consists of a contents description and a capability description.
- contents: $S(u) \leftarrow R(u), c_1, \ldots, c_n$
  example: $S(n, c, a, l) \leftarrow \mathrm{Restaurant}(n, c, a, l), c = \text{``Italian''}$
- filters: $\mathit{pat} \rightarrow \mathit{out}$
  - pat: mandatory input arguments (input pattern)
  - out: the condition issued to the underlying source when the input arguments in pat are given
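Source Description for A
- Source A provides restaurant information
  - provides information only within the rectangle r
  - also allows retrieval by restaurant name and by address
- contents: $S_A(n, c, a, l) \leftarrow \mathrm{Restaurant}(n, c, a, l), \mathrm{contains}(r, l)$
- filters: $\langle n' : \mathrm{string} \rangle \rightarrow n = n'$, $\langle a' : \mathrm{string} \rangle \rightarrow a = a'$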
Source Description for B
- Source B provides restaurant information
  - inputs are a query point (p′) and a distance threshold (d′)
  - allows an additional filtering condition on the restaurant category (c′)
- contents: $S_B(n, c, a, l) \leftarrow \mathrm{Restaurant}(n, c, a, l)$
- filters: $\langle p' : \mathrm{Point}, d' : \mathrm{real} \rangle \rightarrow \mathrm{distance}(l, p') \le d'$, $\langle c' : \mathrm{string} \rangle \rightarrow c = c'$
Source Description for C
- Source C provides restaurant information
  - returns restaurants that match the specified name (n′)
  - allows an additional filtering condition based on a polygonal region (g′)
- contents: $S_C(n, c, a, l) \leftarrow \mathrm{Restaurant}(n, c, a, l)$
- filters: $\langle n' : \mathrm{string} \rangle \rightarrow n = n'$, $\langle g' : \mathrm{Polygon} \rangle \rightarrow \mathrm{contains}(g', l)$
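Source Description for D
- Source D provides restaurant evaluation scores
  - allows retrieval by restaurant name and/or evaluation score
- contents: $S_D(n, s) \leftarrow \mathrm{Evaluation}(n, s)$
- filters: $\langle n' : \mathrm{string} \rangle \rightarrow n = n'$, $\langle s' : \mathrm{real} \rangle \rightarrow s \; \theta \; s'$ ($\theta \in \{=, \ne, <, >, \le, \ge\}$)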
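The four source descriptions can be written down as plain data, which is how a mediator would consult them. The Python sketch below is one hypothetical encoding: the `Filter` and `SourceDescription` classes and the `usable_filters` helper are not from the slides. Each filter records its input pattern (pat) and the condition template (out) sent to the source; `usable_filters` keeps the filters whose mandatory inputs can all be bound by constants available in a query. Matching inputs by type alone is a deliberate simplification; a real planner must also match each input to the right query condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    pat: tuple[tuple[str, str], ...]  # mandatory input arguments as (name, type) pairs
    out: str                          # condition issued to the source once pat is bound


@dataclass(frozen=True)
class SourceDescription:
    name: str
    contents: str                     # contents description over the global schema
    filters: tuple[Filter, ...]       # capability description


# Hypothetical encoding of Sources A to D; "theta" is one of =, !=, <, >, <=, >=.
SOURCES = [
    SourceDescription("A", "S_A(n, c, a, l) <- Restaurant(n, c, a, l), contains(r, l)",
                      (Filter((("n'", "string"),), "n = n'"),
                       Filter((("a'", "string"),), "a = a'"))),
    SourceDescription("B", "S_B(n, c, a, l) <- Restaurant(n, c, a, l)",
                      (Filter((("p'", "Point"), ("d'", "real")), "distance(l, p') <= d'"),
                       Filter((("c'", "string"),), "c = c'"))),
    SourceDescription("C", "S_C(n, c, a, l) <- Restaurant(n, c, a, l)",
                      (Filter((("n'", "string"),), "n = n'"),
                       Filter((("g'", "Polygon"),), "contains(g', l)"))),
    SourceDescription("D", "S_D(n, s) <- Evaluation(n, s)",
                      (Filter((("n'", "string"),), "n = n'"),
                       Filter((("s'", "real"),), "s theta s'"))),
]


def usable_filters(source: SourceDescription, available_types: set[str]) -> list[Filter]:
    """Filters whose mandatory inputs can all be bound with constants of the available types."""
    return [f for f in source.filters
            if all(t in available_types for _, t in f.pat)]


query_types = {"Point", "real"}  # constants in the example query: p, 1000, and 2.5
for src in SOURCES:
    print(src.name, [f.out for f in usable_filters(src, query_types)])
# A []
# B ["distance(l, p') <= d'"]
# C []
# D ["s theta s'"]
```

With only a point and two numbers available, Sources A and C cannot be called at all, which matches the intuition that they need a name, an address, or a polygon as input.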
Overview of Query Processing (1)
Query Plan Construction
1. Preprocessing
   - validation of the given query against the global schema
   - deletion of redundant variables
   - simplification of expressions
2. Selection of useful information sources based on contents descriptions
3. Pushing query conditions into the underlying information sources as far as possible
4. Generation of the integrated query plan
Overview of Query Processing (2)
[Figure: the mediator checks query validity and simplifies the query, selects sources based on contents descriptions, pushes subqueries to Sources A, B, C, and D through their wrappers, receives partial results, and integrates the subquery results into the query result]
Usage of Source Descriptions
- Contents description
  - used to select useful information sources for processing the given query
  - also used to eliminate redundant join conditions
- Capability description
  - used to decide whether the wrapper of a source can process a given query condition using its query processing capability
  - also used to generate a subquery for an information source
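Selection of Information Source (1)
- Unify the given query condition with the contents description of an information source
  - Query: $\mathrm{ans}(u) \leftarrow R_1(u_1), \ldots, R_n(u_n), c_1, \ldots, c_m$
  - Contents description: $S(v) \leftarrow R_i(v), e_1, \ldots, e_n$
- Possibility condition for an information source to fulfill the given query condition (the $x_i$ are the variables occurring in the conditions):
  $\exists x_1 \cdots x_n \, (c_1 \wedge \cdots \wedge c_m \wedge e_1 \wedge \cdots \wedge e_n) = \mathrm{true}$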
Selection of Information Source (2)
- Example query over the global schema:
  $\mathrm{ans}(n) \leftarrow \mathrm{Restaurant}(n, c, a, l), \mathrm{distance}(l, p) \le 1000$
- Contents description for source E:
  $S_E(n, c, a, l) \leftarrow \mathrm{Restaurant}(n, c, a, l), c = \text{``Italian''}, \mathrm{contains}(r, l)$
- Source E may satisfy the query if:
  $\exists c, l \, (c = \text{``Italian''} \wedge \mathrm{contains}(r, l) \wedge \mathrm{distance}(l, p) \le 1000) = \mathrm{true}$
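Selection of Information Source (3)
- Simplification of the possibility condition (the conjunct c = "Italian" involves only c and is always satisfiable, so it can be dropped):
  $\exists l \, (\mathrm{contains}(r, l) \wedge \mathrm{distance}(l, p) \le 1000) = \mathrm{true}$
  $\iff \mathrm{intersects}(r, \mathrm{circle}(p, 1000)) = \mathrm{true}$
[Figure: the query region, a circle of radius 1000 m around p, overlapping the area r supported by source E]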
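The simplified possibility condition turns source selection into a geometric test: does the source's area r intersect the query circle around p? A minimal Python sketch, assuming each supported area is an axis-aligned rectangle given as (xmin, ymin, xmax, ymax) in planar coordinates measured in meters; the helper names and the sample regions (including the extra source "X") are hypothetical. Clamping p into the rectangle yields the rectangle's closest point, whose distance to p is compared with the radius.

```python
import math

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), axis-aligned


def rect_point_distance(r: Rect, p: tuple[float, float]) -> float:
    """Minimum distance from point p to rectangle r (0 if p lies inside r)."""
    xmin, ymin, xmax, ymax = r
    # Clamp p into the rectangle to obtain the closest point of r.
    cx = min(max(p[0], xmin), xmax)
    cy = min(max(p[1], ymin), ymax)
    return math.dist(p, (cx, cy))


def intersects_circle(r: Rect, p: tuple[float, float], radius: float) -> bool:
    """intersects(r, circle(p, radius)): some point of r lies within radius of p."""
    return rect_point_distance(r, p) <= radius


def select_sources(regions: dict[str, Rect], p: tuple[float, float], radius: float) -> list[str]:
    """Keep only the sources whose supported area can contain a qualifying location."""
    return [name for name, r in regions.items() if intersects_circle(r, p, radius)]


regions = {
    "E": (500.0, 500.0, 2000.0, 2000.0),  # nearest corner is about 707 m from p
    "X": (1500.0, 0.0, 3000.0, 1000.0),   # nearest point is 1500 m from p
}
print(select_sources(regions, p=(0.0, 0.0), radius=1000.0))  # ['E']
```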
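Elimination of Redundant Joins
- Example query over the global schema:
  $\mathrm{ans}(n, m) \leftarrow \mathrm{Restaurant}(n, c, a, l), \mathrm{BusStop}(m, p), \mathrm{distance}(l, p) \le 200$
- Contents descriptions for sources F and G:
  $S_F(n, c, a, l) \leftarrow \mathrm{Restaurant}(n, c, a, l), \mathrm{contains}(r, l)$
  $S_G(m, p) \leftarrow \mathrm{BusStop}(m, p), \mathrm{contains}(s, p)$
- F and G may satisfy the query only if $\mathrm{distance}(r, s) \le 200$; otherwise the join between F and G is redundant and can be eliminated
[Figure: two source regions and the 200 m distance threshold between them]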
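The test $\mathrm{distance}(r, s) \le 200$ compares two regions rather than two points. For axis-aligned rectangles the minimum distance follows from the horizontal and vertical gaps, so source pairs that cannot contribute can be pruned before any subquery is sent. A minimal Python sketch, assuming rectangles given as (xmin, ymin, xmax, ymax) in meters; `rect_distance`, `useful_joins`, and the sample sources (including "G2") are hypothetical.

```python
import math
from itertools import product

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), axis-aligned


def rect_distance(r: Rect, s: Rect) -> float:
    """Minimum distance between two rectangles (0 if they touch or overlap)."""
    dx = max(s[0] - r[2], r[0] - s[2], 0.0)  # horizontal gap, 0 if x-ranges overlap
    dy = max(s[1] - r[3], r[1] - s[3], 0.0)  # vertical gap, 0 if y-ranges overlap
    return math.hypot(dx, dy)


def useful_joins(restaurant_sources: dict[str, Rect],
                 busstop_sources: dict[str, Rect],
                 max_dist: float) -> list[tuple[str, str]]:
    """Source pairs that may satisfy distance(l, p) <= max_dist; all other pairs are redundant."""
    return [(f, g)
            for (f, r), (g, s) in product(restaurant_sources.items(), busstop_sources.items())
            if rect_distance(r, s) <= max_dist]


restaurant_sources = {"F": (0.0, 0.0, 100.0, 100.0)}
busstop_sources = {
    "G": (250.0, 0.0, 400.0, 100.0),      # 150 m gap: the join is kept
    "G2": (500.0, 500.0, 600.0, 600.0),   # about 566 m away: the join is eliminated
}
print(useful_joins(restaurant_sources, busstop_sources, 200.0))  # [('F', 'G')]
```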
Pushing Query Conditions (1)
- Check whether the given query condition can be processed by the source
  - when the query condition and a filtering condition supported by the source are equivalent: push the condition directly
  - when there is no equivalent condition but the source supports a more general one: transform the query condition into the more general condition and push it to the source; an additional step is then needed to check that the retrieved results exactly satisfy the given query condition
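Pushing Query Conditions (2)
- Capability description of source C:
  contents: $S_C(n, c, a, l) \leftarrow \mathrm{Restaurant}(n, c, a, l)$
  filters: $\langle n' : \mathrm{string} \rangle \rightarrow n = n'$, $\langle g' : \mathrm{Polygon} \rangle \rightarrow \mathrm{contains}(g', l)$
- Query:
  $\mathrm{ans}(n) \leftarrow \mathrm{Restaurant}(n, c, a, l), \mathrm{contains}(r, l)$
- Push $\mathrm{contains}(r, l)$ directly to source C (binding g′ = r)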
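Pushing Query Conditions (3)
- Source description for source H:
  contents: $S_H(n, c, a, l) \leftarrow \mathrm{Restaurant}(n, c, a, l)$
  filters: $\langle n' : \mathrm{string} \rangle \rightarrow n = n'$, $\langle g' : \mathrm{Polygon} \rangle \rightarrow \mathrm{intersects}(l, g')$
- Query:
  $\mathrm{ans}(n) \leftarrow \mathrm{Restaurant}(n, c, a, l), \mathrm{distance}(l, p) \le 1000$
- Push the more general condition $\mathrm{intersects}(l, \mathrm{envelope}(\mathrm{circle}(p, 1000)))$, then check $\mathrm{distance}(l, p) \le 1000$ on the retrieved data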
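The two-step strategy for source H can be sketched as follows. The source only understands $\mathrm{intersects}(l, g')$ for a polygon $g'$, so the mediator pushes the envelope of the query circle (a square of side 2 × 1000 centered at p), receives a superset of the answer, and re-checks the exact distance condition locally. Source H is simulated in memory; `source_h_intersects`, the rectangle-as-tuple representation, planar coordinates in meters, and the sample rows are all hypothetical.

```python
import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def envelope_of_circle(p: Point, radius: float) -> Rect:
    """envelope(circle(p, radius)): the axis-aligned square enclosing the circle."""
    return (p[0] - radius, p[1] - radius, p[0] + radius, p[1] + radius)


def source_h_intersects(rows: list[tuple[str, Point]], g: Rect) -> list[tuple[str, Point]]:
    """Simulated source H: restaurants whose point location l satisfies intersects(l, g)."""
    xmin, ymin, xmax, ymax = g
    return [(n, l) for n, l in rows if xmin <= l[0] <= xmax and ymin <= l[1] <= ymax]


def restaurants_within(rows: list[tuple[str, Point]], p: Point, radius: float) -> list[str]:
    # Step 1: push the more general condition intersects(l, envelope(circle(p, radius))).
    candidates = source_h_intersects(rows, envelope_of_circle(p, radius))
    # Step 2: the square also covers its corners, so re-check distance(l, p) <= radius.
    return [n for n, l in candidates if math.dist(l, p) <= radius]


rows = [
    ("Tokyo Sushi", (300.0, 400.0)),  # 500 m from p: inside the circle
    ("Corner Cafe", (900.0, 900.0)),  # inside the square but about 1273 m from p
    ("Far Noodle", (1500.0, 0.0)),    # outside the square, never returned by the source
]
print(restaurants_within(rows, p=(0.0, 0.0), radius=1000.0))  # ['Tokyo Sushi']
```

The local re-check is what makes the generalized push safe: the source may over-approximate the answer, but the mediator never returns a row that violates the original query condition.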
Conclusions and Future Work
Conclusions
- Proposal of a framework for heterogeneous spatial information sources based on the source description framework
  - contents description and capability description
  - use of the data types and operations of the OpenGIS proposal
- Query processing strategies
  - source selection
  - pushing query conditions
Future Work
- investigation of source selection and query planning strategies
- a more formal framework (e.g., a constraint-based approach)