Page 1: International Conference on Knowledge Discovery and Information Retrieval 2009

ONTOLOGICAL WAREHOUSING ON SEMANTICALLY INDEXED DATA

Reusing Semantic Search Engine Ontologies to Develop Multidimensional Schemas

Filippo Sciarrone (f.sciarrone@openinformatica.org), Paolo Starace (p.starace@openinformatica.org)

Business Intelligence Division, Open Informatica srl, Via dei Castelli Romani 12/A, Pomezia, Italy

ABSTRACT: In this poster we present a first experimentation of a Business Intelligence solution that dynamically develops multidimensional OLAP schemas by reusing ontologies stored in concept and relation dictionaries and used by semantic indexing engines. The distinctive aspect of the proposed solution is the integration of ontology-based semantic indexing techniques for non-structured documents with techniques for the dynamic management of unbalanced hierarchies in a Data Warehouse.

The Two-Step Indexing Process: We adapted our solution to a pre-existing system, implemented to execute a two-step indexing process of non-structured documents. During the first step, from the unstructured docs layer to the terms set layer, the engine indexed each document, thus obtaining a set of index terms. In the second step, from the terms set layer to the ontologies layer, these terms were contextualized and associated with the concepts of predefined ontologies. In order to integrate our module with the aforesaid system, the following assumptions were imposed:

• The concepts included in the dictionary were exclusively linked by hypernymy and hyponymy relations;

• Each ontology was based on a hierarchic structure.

The Star-Schema with Bridge Table: The adopted solution envisages the inclusion of a bridge table between the concept dimension and the facts. The goal of the bridge table is to help the OLAP engine aggregate data more quickly. The measure on which the aggregation is made is the number of persons referring to a single concept. The resulting table is therefore a factless fact table, because it holds no summative value.
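A minimal sketch of how such a bridge table can drive the aggregation, using Python's standard sqlite3 module rather than the OLAP engine described here. The table names (concept_dim, concept_bridge, person_concept_fact), the column names, and the sample rows are assumptions made for illustration, not the schema of the actual system.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_dim (concept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE concept_bridge (ancestor_id INTEGER, descendant_id INTEGER, distance INTEGER);
-- Factless fact table: it only records that a person refers to a concept.
CREATE TABLE person_concept_fact (person_id INTEGER, concept_id INTEGER);
""")
conn.executemany("INSERT INTO concept_dim VALUES (?, ?)",
                 [(1, "vehicle"), (2, "car"), (3, "truck")])
# One bridge row per ancestor/descendant pair, zero-length paths included.
conn.executemany("INSERT INTO concept_bridge VALUES (?, ?, ?)",
                 [(1, 1, 0), (2, 2, 0), (3, 3, 0), (1, 2, 1), (1, 3, 1)])
conn.executemany("INSERT INTO person_concept_fact VALUES (?, ?)",
                 [(10, 1), (11, 2), (12, 2), (13, 3)])

def persons_per_concept(connection):
    """Count distinct persons per concept, rolled up through the bridge."""
    rows = connection.execute("""
        SELECT d.name, COUNT(DISTINCT f.person_id)
        FROM concept_dim d
        JOIN concept_bridge b ON b.ancestor_id = d.concept_id
        JOIN person_concept_fact f ON f.concept_id = b.descendant_id
        GROUP BY d.name
    """).fetchall()
    return dict(rows)

counts = persons_per_concept(conn)
```

Because the bridge already lists every ancestor/descendant pair, the roll-up needs a single join at any depth of the hierarchy, and the measure is a count of fact rows instead of a sum over a stored value.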

The Resulting Pivot Table: The PENTAHO BI suite presents the data processed by the OLAP MONDRIAN engine through JSP libraries that generate pivot tables for the navigation of multidimensional cubes. The sum of the leaf values does not always correspond to the value of the top node, because there may be concepts that refer only to the parent node, and these must be added to the sum.
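The arithmetic behind this remark can be illustrated with a small worked example. The hierarchy and the counts below are invented, and the sketch assumes that each person refers to exactly one concept, so that counts can simply be added.

```python
# Hypothetical hierarchy: parent concept -> list of child concepts.
children = {"animal": ["dog", "cat"], "dog": [], "cat": []}
# Persons referring directly to each concept (invented figures).
direct = {"animal": 1, "dog": 2, "cat": 1}

def rolled_up(node):
    """Value shown for a node: its own persons plus those of all descendants."""
    return direct[node] + sum(rolled_up(child) for child in children[node])

def leaves(node):
    """All leaf concepts below (or equal to) the given node."""
    kids = children[node]
    if not kids:
        return [node]
    return [leaf for child in kids for leaf in leaves(child)]

leaf_sum = sum(direct[leaf] for leaf in leaves("animal"))
top_value = rolled_up("animal")
difference = top_value - leaf_sum  # persons attached only to the parent node
```

Here the leaves add up to 3 while the top node shows 4; the difference is the one person who refers to "animal" directly.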

An Example Ontology: In order to ensure a hierarchic navigation it is necessary to bring the ontology back to a tree structure. The presence of navigable cycles in the structures is ruled out, even theoretically, by the type of relation existing between the concepts, i.e., the part-of relation. One must therefore consider the management of Directed Acyclic Graphs (DAG).
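One common way to bring a DAG back to a tree is to unfold it, so that a concept with several parents appears once per path from the root. The sketch below illustrates that idea only; it is an assumption, not necessarily the strategy adopted in the system, and the example concepts are invented.

```python
def unfold(dag, root):
    """Return every root-to-node path of an acyclic graph as a tuple.

    dag maps a concept to the list of its children. A concept reachable
    through two parents yields two distinct paths, i.e. two tree nodes.
    Termination relies on the graph being acyclic.
    """
    paths = []
    stack = [(root,)]
    while stack:
        path = stack.pop()
        paths.append(path)
        for child in dag.get(path[-1], []):
            stack.append(path + (child,))
    return sorted(paths)

# Hypothetical DAG: "amphibious" has two parents.
dag = {"vehicle": ["car", "boat"], "car": ["amphibious"], "boat": ["amphibious"]}
paths = unfold(dag, "vehicle")
```

Each resulting path identifies one node of the unfolded tree, so the shared concept can be navigated under either parent.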

The Overall Process: We developed a custom ETL module in order to integrate semantically indexed data with operational ones. The custom ETL process performs the following operations: (1) Builds the ontological tree, extracting it from the dictionary; (2) Defines the dimension table; (3) Includes the ontology nodes; (4) Defines the bridge table; (5) Includes a record for each identifiable path on the tree, along with the distance between the relevant nodes (including the zero-length path from a concept to itself).
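Step (5) can be sketched as follows: for every node of the tree, walk up to the root and emit one record per ancestor together with the distance. The function name bridge_rows, the child-to-parent input format, and the sample concepts are assumptions made for illustration; the actual ETL module is not shown in the poster.

```python
def bridge_rows(parent_of):
    """Build (ancestor, descendant, distance) records for a tree.

    parent_of maps each concept to its parent (None for the root).
    The zero-length path from a concept to itself is included.
    The input is assumed to be a tree, so the upward walk terminates.
    """
    rows = []
    for node in parent_of:
        ancestor, distance = node, 0
        while ancestor is not None:
            rows.append((ancestor, node, distance))
            ancestor = parent_of[ancestor]
            distance += 1
    return sorted(rows)

# Hypothetical three-level ontology.
parent_of = {"entity": None, "animal": "entity", "dog": "animal"}
rows = bridge_rows(parent_of)
```

For this three-node chain the function yields six records: three zero-length paths, two paths of length one, and one path of length two from "entity" to "dog".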

Case Study

CONCLUSIONS and FUTURE WORKS: The defined process is currently stable and yields positive results in a company environment. The fact-definition process can be improved, extending the logic to the join base of the data. In order to provide a complete BI service, the system must be able to make several types of aggregations, not just the basic ones. For the future we plan the enhancement of indexed data management, with the introduction of a cache-based engine, and the resolution of problems related to the management of many-to-many relations.
