Ontological Warehousing on Semantically Indexed Data: Reusing Semantic Search Engine Ontologies to Develop Multidimensional Schemas
Filippo Sciarrone (f.sciarrone@openinformatica.org), Paolo Starace (p.starace@openinformatica.org)
Business Intelligence Division, Open Informatica srl, Via dei Castelli Romani 12/A, Pomezia, Italy
ABSTRACT: In this poster we present a first experimentation of a Business Intelligence solution to dynamically develop multidimensional OLAP schemas through a reuse of ontologies, stored in concept and relations dictionaries and used by semantic indexing engines. The particular aspect of the proposed solution consists of the integration of semantic indexing techniques of non-structured documents, based on ontologies, with dynamic management techniques of unbalanced hierarchies in a Data Warehouse.
The Two-Step Indexing Process: We adapted our solution to a pre-existing system, implemented to execute a two-step indexing process of non-structured documents. During the first step, from the unstructured docs layer to the terms set layer, the engine indexed each document, thus obtaining a set of index terms. In the second step, from the terms set layer to the ontologies layer, these terms were contextualized and associated with the concepts of predefined ontologies.
In order to integrate our module with the aforesaid system, the following assumptions were imposed:
• The concepts included in the dictionary were exclusively linked by hypernymy and hyponymy relations;
• Each ontology was based on a hierarchic structure.
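The two steps can be pictured with a minimal Python sketch. It is not the indexing engine described in the poster: the tokenizer, the `TERM_TO_CONCEPT` dictionary and the function names are hypothetical, and a plain dictionary lookup stands in for the contextualization step.

```python
import re

# Hypothetical dictionary: index term -> ontology concept (illustrative values only).
TERM_TO_CONCEPT = {"sedan": "car", "hatchback": "car", "lorry": "truck"}


def index_terms(text):
    """Step 1: reduce an unstructured document to a set of index terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def concepts_for(text):
    """Step 2: associate the index terms with concepts of a predefined ontology."""
    return {TERM_TO_CONCEPT[t] for t in index_terms(text) if t in TERM_TO_CONCEPT}


doc = "The customer asked about a sedan and a lorry."
doc_concepts = concepts_for(doc)  # concepts referenced by this document
```

Terms that have no entry in the dictionary are simply dropped here; a real engine would presumably disambiguate terms by context before assigning a concept.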
The Star Schema with Bridge Table: The adopted solution places a bridge table between the concept dimension and the facts. The goal of the bridge table is to help the OLAP engine aggregate data more quickly. The measure to be aggregated is the number of persons referring to a given concept. The resulting fact table is therefore a factless fact table, because it carries no additive measure.
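A minimal sketch of such a schema, using Python's built-in `sqlite3` module, may help. All table names, column names and rows are assumptions made for illustration; the poster does not give the actual schema. The count of persons per concept is obtained by joining the dimension to the facts through the bridge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_dim (concept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Bridge: one row per (ancestor, descendant) pair, including each concept with itself.
CREATE TABLE concept_bridge (
    ancestor_id   INTEGER NOT NULL REFERENCES concept_dim(concept_id),
    descendant_id INTEGER NOT NULL REFERENCES concept_dim(concept_id),
    distance      INTEGER NOT NULL,
    PRIMARY KEY (ancestor_id, descendant_id)
);
-- Factless fact table: one row per person/concept reference, no numeric measure.
CREATE TABLE person_concept_fact (
    person_id  INTEGER NOT NULL,
    concept_id INTEGER NOT NULL REFERENCES concept_dim(concept_id)
);
""")

# Hypothetical data: "vehicle" is the parent of "car" and "truck".
conn.executemany("INSERT INTO concept_dim VALUES (?, ?)",
                 [(1, "vehicle"), (2, "car"), (3, "truck")])
conn.executemany("INSERT INTO concept_bridge VALUES (?, ?, ?)",
                 [(1, 1, 0), (2, 2, 0), (3, 3, 0), (1, 2, 1), (1, 3, 1)])
conn.executemany("INSERT INTO person_concept_fact VALUES (?, ?)",
                 [(10, 2), (11, 2), (12, 3), (13, 1)])

# Persons referring to each concept or to any of its descendants.
rows = conn.execute("""
    SELECT d.name, COUNT(DISTINCT f.person_id)
    FROM concept_dim d
    JOIN concept_bridge b ON b.ancestor_id = d.concept_id
    JOIN person_concept_fact f ON f.concept_id = b.descendant_id
    GROUP BY d.concept_id, d.name
    ORDER BY d.concept_id
""").fetchall()
counts = dict(rows)
```

Because the bridge already stores every ancestor/descendant pair, the roll-up needs no recursive query, which is one plausible reading of "aggregate data more quickly".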
The Resulting Pivot Table: The Pentaho BI suite presents the data processed by the Mondrian OLAP engine through JSP libraries that generate pivot tables for navigating multidimensional cubes. The sum of the leaf values does not always equal the value of the top node, because some references may point only to the parent concept itself; these must be added to the sum of the leaves.
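The arithmetic behind this remark can be shown with a small Python sketch. The hierarchy and the counts are invented for illustration, and the sketch assumes each person refers to exactly one concept, so counts can be added without double counting.

```python
# Hypothetical hierarchy and direct reference counts (persons per concept).
children = {"vehicle": ["car", "truck"], "car": [], "truck": []}
direct_counts = {"vehicle": 1, "car": 2, "truck": 1}


def rolled_up(node):
    """Value shown for a node: its own references plus those of all descendants."""
    return direct_counts.get(node, 0) + sum(rolled_up(c) for c in children.get(node, []))


leaf_sum = sum(direct_counts[c] for c in children["vehicle"])  # car + truck
top_value = rolled_up("vehicle")                               # leaves + direct references
difference = top_value - leaf_sum                              # references to the parent only
```

Here the difference between the top node and the sum of its leaves is exactly the number of persons who refer to "vehicle" directly.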
An Example Ontology: In order to ensure hierarchic navigation, the ontology must be brought back to a tree structure. Navigable cycles in the structure are ruled out, even theoretically, by the type of relation existing between the concepts, i.e., the part-of relation. One must therefore consider the management of Directed Acyclic Graphs (DAGs).
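A short Python sketch can check both properties mentioned here: that a concept graph has no cycles, and which concepts have more than one parent and so keep the graph from being a plain tree. The concept names are hypothetical; the check relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical ontology: concept -> set of parent concepts.
parents = {
    "transport": set(),
    "car": {"transport"},
    "boat": {"transport"},
    "engine": {"car", "boat"},  # shared by two parents: a DAG, not a tree
}


def is_acyclic(graph):
    """Return True if the concept graph contains no cycle."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def multi_parent_nodes(graph):
    """Concepts with more than one parent, which prevent a plain tree layout."""
    return sorted(node for node, ps in graph.items() if len(ps) > 1)


acyclic = is_acyclic(parents)
shared = multi_parent_nodes(parents)
```

How the shared concepts are then handled (for example, by duplicating them under each parent) is a design decision the poster does not spell out.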
The Overall Process: We developed a custom ETL module in order to integrate semantically indexed data with operational data. The custom ETL process performs the following operations: (1) builds the ontological tree, extracting it from the dictionary; (2) defines the dimension table; (3) includes the ontology nodes; (4) defines the bridge table; (5) includes a record for each identifiable path on the tree, along with the distance between the relevant nodes (including the zero-length path from a concept to itself).
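Step (5) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' ETL code: the ontology is taken to be a tree given as a hypothetical child-to-parent map, and the function name `build_bridge` is invented.

```python
# Hypothetical ontological tree: concept -> parent (None marks the root).
parent_of = {
    "vehicle": None,
    "car": "vehicle",
    "truck": "vehicle",
    "pickup": "truck",
}


def build_bridge(parent_of):
    """Return (ancestor, descendant, distance) rows for every path in the tree.

    Each concept is walked up to the root, so the zero-length path from a
    concept to itself is always included.
    """
    rows = []
    for node in parent_of:
        ancestor, distance = node, 0
        while ancestor is not None:
            rows.append((ancestor, node, distance))
            ancestor = parent_of[ancestor]
            distance += 1
    return rows


bridge_rows = build_bridge(parent_of)
```

For the four concepts above this yields eight rows, including `("vehicle", "pickup", 2)` for the longest path. A concept with several parents would need a parent list per node rather than a single parent.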
Case Study
CONCLUSIONS and FUTURE WORK: The defined process is currently stable and yields positive results in a company environment. The fact-definition process can be improved by extending the logic to the join base of the data. In order to provide a complete BI service, the system must be able to perform several types of aggregation, not just the basic ones. For the future we plan to enhance indexed data management with the introduction of a cache-based engine, and to resolve problems related to the management of many-to-many relations.