Semantic Data Extraction for B2B Integration: Syntactic-to-Semantic Middleware
Bruno Silva, Jorge Cardoso

DABI 2006

Presentation Plan
- B2B Integration
  - Issues
  - Approaches
- Syntactic-2-Semantic
  - Objectives
  - Architecture
  - Overview

B2B Integration Issues
- Data exchange and integration have proven to be a challenge
  - Systems distribution
  - Integration scalability
  - Data source heterogeneity
- Three types of data heterogeneity
  1. Syntactic heterogeneity: different data-supporting technologies
  2. Schematic heterogeneity: schemas with different structures
  3. Semantic heterogeneity: different meanings, vocabulary or units for the same concept

B2B Integration Approaches (1)
- Using translators between the terminologies of system pairs to resolve heterogeneity
  - Only suitable when a small number of systems need to interoperate
  - Low scalability and high cost
  - Development costs increase as more systems are added

B2B Integration Approaches (2)
- Resolving heterogeneity by relying on the technological foundations of the Semantic Web
  - Semantically define the meaning of the terminology of each data system
  - Clearly define the relationships and differences between concepts
  - Ability to conceptualize a domain in a machine-readable format

S2S Integration Objectives (1)
- Bridge syntactic, schematic and semantic gaps between data sources
  - Heterogeneity-enabled system
- Transparent data gathering
  - Thus focusing on data necessities rather than on location

S2S Integration Objectives (2)
- Combine integration results in a standard format
  - Facilitates further data integration by using integration standards
  - Supplies a homogeneous outcome
- Make use of Semantic Web benefits
  - Enables deductions from related data
  - Provides detailed answers to challenging questions

S2S Middleware Architecture
[Figure: S2S middleware architecture]

S2S Data Sources
- Define the scope of the integration system
- Data source diversity provides a wider integration range and data visibility
- Support for unstructured, semi-structured and structured data sources

S2S Ontology Schema (1)
- Conceptualizes a domain in a machine-readable format
- B2B perspective
  - Promotes and facilitates interoperability among systems
  - Enables intelligent processing
  - Allows knowledge to be shared and reused
- Data integration perspective
  - Provides a shared common understanding of a domain

S2S Ontology Schema (2)
- Ontology representation using the Web Ontology Language (OWL)
  - World Wide Web Consortium (W3C) recommendation for building ontologies

S2S Mapping Module (1)
- Mapping between remote data and the local ontology
  - Responsible for system integration
  - Enables extraction from distributed and heterogeneous sources
- Accomplished in two phases
  - Register data sources
  - Attribute registration

S2S Mapping Module (2): Phase I
- Phase I: Register data sources
  - Informs the mapping mechanism which data sources are available for mapping
  - Data sources need specific connection information
    - Generic: file path, URL, IP address, ...
    - Specific: login, driver, id key, ...

S2S Mapping Module (3): Phase II
- Phase II: Attribute registration
  - Merges information about data sources and extraction rules for an attribute
  - Achieved in three steps
    1. Attribute naming: provide a unique identifier for each attribute
    2. Extraction rules specification: a code segment that fills the attribute with data
    3. Attribute mapping: associate an attribute name with extraction rules and a data source

S2S Mapping Module (4): Phase II
- Phase II: Attribute registration
[Figure: attribute registration]

S2S Extractor Manager (1)
- Handles data sources to retrieve raw data
- Supports several extraction methods
  - Depending on the data source type
- Scalable design
  - Easily extended to support other extraction methods
  - New methods don't require changes to the mapping mechanism

S2S Extractor Manager (2)
[Figure: extractor manager]

S2S Query Handler (1)
- A query is the event that sets the S2S middleware into action
- Syntactic-to-Semantic Query Language (S2SQL) is based on SQL syntax
  - Thus a well-known and user-readable syntax
- Simple and small query specification
  - SELECT, WHERE, AND | OR, LIMIT

S2S Query Handler (2)
- Query example:
    SELECT product WHERE product.brand = seiko AND product.watch.case = stainless-steel LIMIT 1, 20
- Query response
  - Up to 20 product classes

S2S Instance Generator (1)
- Manages the serialization of the extracted data
- Easily integrated output format
  - Ontology (OWL)
  - Easily expanded to other formats
- Handles error events
  - Extraction errors
  - Query errors

S2S Instance Generator (2)
- Direct ontology instantiation problem
  - Difficult because of the semantic relations between classes and the need to maintain data consistency
  - Automatic algorithms don't always offer 100% accuracy
- The S2S instantiation approach creates ontology instances directly from extracted data
  - Using the OWL schema and mapping information
  - Output data respects the ontology data structure
  - Semantics are maintained, since instances are created using the ontology schema

S2S Instance Generator (3)
- S2S instantiation approach
[Figure: S2S instantiation approach]

S2S Middleware Overview
- Syntactic-to-Semantic approach
  - Provides homogeneous access to heterogeneous information sources
  - Allows users to focus on what data is needed, keeping details such as how to obtain and integrate it hidden from them
- Ontology-centric approach
- Scalable solution

Questions
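
The slides describe the mapping module, the extractor manager and the query handler only in outline. As a rough illustration of how those pieces could fit together, here is a minimal Python sketch. It is not the authors' implementation: every name in it (DataSource, AttributeMapping, MappingModule, ExtractorManager, build_instances, select), the in-memory "memory" source type and the sample catalog records are hypothetical. It also assumes that all attributes in a query come from a single data source, and it treats the query as plain equality conditions with a maximum result count rather than parsing S2SQL text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class DataSource:
    """Phase I record: a named source plus its connection information."""
    name: str
    kind: str                    # selects the extraction method (hypothetical: "memory")
    connection: Dict[str, Any]   # e.g. file path, URL, login, driver


@dataclass
class AttributeMapping:
    """Phase II record: attribute name + extraction rule + data source."""
    attribute: str
    rule: Callable[[Any], Any]   # code segment that fills the attribute from a raw record
    source: str


class MappingModule:
    def __init__(self) -> None:
        self.sources: Dict[str, DataSource] = {}
        self.attributes: Dict[str, AttributeMapping] = {}

    def register_source(self, source: DataSource) -> None:
        # Phase I: tell the mapping mechanism which sources are available.
        self.sources[source.name] = source

    def register_attribute(self, attribute: str, rule: Callable[[Any], Any],
                           source: str) -> None:
        # Phase II: name the attribute, give its rule, and bind both to a source.
        if source not in self.sources:
            raise KeyError(f"unknown data source: {source}")
        if attribute in self.attributes:
            raise ValueError(f"attribute already registered: {attribute}")
        self.attributes[attribute] = AttributeMapping(attribute, rule, source)


class ExtractorManager:
    """Dispatches on source kind, so new methods leave the mapping code untouched."""

    def __init__(self) -> None:
        self.methods: Dict[str, Callable[[DataSource], List[Any]]] = {}

    def add_method(self, kind: str, method: Callable[[DataSource], List[Any]]) -> None:
        self.methods[kind] = method

    def fetch(self, source: DataSource) -> List[Any]:
        return self.methods[source.kind](source)


def build_instances(mapping: MappingModule, extractors: ExtractorManager,
                    attributes: List[str]) -> List[Dict[str, Any]]:
    # Assumption of this sketch: all requested attributes share one source,
    # so each raw record yields exactly one instance.
    source_names = {mapping.attributes[a].source for a in attributes}
    if len(source_names) != 1:
        raise NotImplementedError("this sketch handles a single source per query")
    records = extractors.fetch(mapping.sources[source_names.pop()])
    return [{a: mapping.attributes[a].rule(r) for a in attributes} for r in records]


def select(instances: List[Dict[str, Any]], conditions: Dict[str, Any],
           limit: int) -> List[Dict[str, Any]]:
    # Equality conditions joined by AND, then a cap on the number of results.
    matches = [i for i in instances
               if all(i.get(k) == v for k, v in conditions.items())]
    return matches[:limit]


# Made-up sample data standing in for a remote product catalog.
catalog = [
    {"maker": "Seiko", "case_material": "stainless-steel"},
    {"maker": "Seiko", "case_material": "titanium"},
    {"maker": "Acme", "case_material": "stainless-steel"},
]

mapping = MappingModule()
mapping.register_source(DataSource("catalog", "memory", {"records": catalog}))
mapping.register_attribute("product.brand", lambda r: r["maker"].lower(), "catalog")
mapping.register_attribute("product.watch.case", lambda r: r["case_material"], "catalog")

extractors = ExtractorManager()
extractors.add_method("memory", lambda src: src.connection["records"])

instances = build_instances(mapping, extractors,
                            ["product.brand", "product.watch.case"])
result = select(instances,
                {"product.brand": "seiko", "product.watch.case": "stainless-steel"},
                limit=20)
print(result)
```

The final call mirrors the query example from the slides (brand seiko, stainless-steel case, at most 20 results). Supporting another kind of source would mean one more `add_method` call, which is the extension point the Extractor Manager slide describes. Serializing the resulting dictionaries as OWL instances is left out of the sketch.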