Using Advanced Data Mining and Integration in Environmental Risk Management

Ladislav Hluchy, Ondrej Habala, Martin Šeleng, Peter Krammer, Viet Tran
Institute of Informatics, Slovak Academy of Sciences

SAMI 2011, January 2011, Smolenice, Slovakia
ADMIRE (Advanced Data Mining and Integration Research for Europe), Framework 7 ICT 215024


...making data-mining easier

Contents
• EU FP7 project ADMIRE: overview
• Architecture of the DMI solution in ADMIRE
• New DMI process language DISPEL
• Pilot application scenarios ORAVA and RADAR
    goals, architecture, experimental results
• Tools in ADMIRE

ADMIRE: Advanced Data Mining and Integration Research for Europe
• 7th Framework Programme, ICT, Call 1.2.A
• Started in February 2008, running for 36 months
• €4.3 million in costs and €3 million in EC funding

Collaborators
• University of Edinburgh, UK (Coordinator)
    NeSC: National e-Science Centre
    EPCC: Edinburgh Parallel Computing Centre
• Fujitsu Labs of Europe, UK
• University of Vienna, Austria
    Institute of Scientific Computing
• Universidad Politécnica de Madrid, Spain
    Facultad de Informática
• Slovak Academy of Sciences, Slovakia
    Institute of Informatics
• ComArch S.A., Poland

ADMIRE Goals
• Accelerate access to and increase the benefits from data exploitation
• Deliver consistent and easy-to-use technology for extracting information and knowledge
• Cope with complexity, distribution, change and heterogeneity of services, data, and processes, through an abstract view of data mining and integration
• Provide power to users and developers of data mining and integration processes

ADMIRE Structure
• WP1: High-Level Model and Language Research
    Incremental development of models and languages with the goal of describing Data Mining and Integration (DMI) processes abstractly
• WP2: Architecture Research
    Incremental development of a flexible, scalable and open DMI architecture
• WP3: Platform Support & Delivery
    Deliver robust service platforms, support users and encapsulate knowledge in a book
• WP4: Service Infrastructure Development and Enhancement
    Develop technology and services to enhance the DMI service infrastructure based on Fujitsu's USMT
• WP5: Data Mining and Integration Tools Development
    Develop and integrate tools that make the technology easier to use and reduce the frequency of failures
• WP6: Integrated Applications
    Demonstration of validation and performance of architecture, language, platform and tools as an integrated environment for Data Mining and Integration
• WP7: Project Management
    Management and coordination of the project

ADMIRE Architecture: Separation of Concerns

Speaker note: domain experts: end user, CRM, flood.

ADMIRE Architecture

Speaker note: in the well-known hourglass of the ADMIRE conceptual architecture, the data mining and integration process is divided into two parts. Domain experts work at the top of the glass, composing DMI processes in the DMI language with their domain knowledge. The Gateway is the only interface that accepts the DMI language; it performs validation, optimization and translation, and forwards the result to the distributed enactment engine for execution.

ADMIRE Framework

DISPEL: Data-Intensive Systems Process-Engineering Language
• For data-intensive distributed systems
• Connection point between complex application requests and complex enactment systems
• Benefit: method development, engineering and evolution of supported practices can take place independently in each world
• Describes enactment requests for streaming-data workflow processes
• Process-engineering time: transform and optimize the process in preparation for the enactment period

DISPEL: Simple Example

Creating connections

    String sql1 = "SELECT * FROM some_table";
    String sql2 = "SELECT * FROM table2";
    String resource = "128.18.128.255";

    SQLQuery query = new SQLQuery;
    |- sql1, sql2 -| => query.expression;
    |- resource -| => query.resource;

    Tee tee = new Tee;
    query.result => tee.connectInput;
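As a rough analogy only (this is Python, not DISPEL, and it does not reproduce DISPEL semantics), the example above can be pictured as a small streaming pipeline: a stream of literals feeds a query element, whose result stream is duplicated by a tee. The in-memory SQLite database, the table contents and all function names below are assumptions made for illustration; they stand in for the remote resource named in the slide.

```python
import itertools
import sqlite3


def literal_stream(*values):
    """Emit a fixed sequence of values, similar in spirit to |- a, b -|."""
    yield from values


def sql_query(expressions, connection):
    """Consume a stream of SQL strings and emit one list of rows per query."""
    for sql in expressions:
        yield connection.execute(sql).fetchall()


def build_demo_database():
    """Hypothetical stand-in for the data resource; contents are invented."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE some_table (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE table2 (id INTEGER, value REAL)")
    conn.executemany("INSERT INTO some_table VALUES (?, ?)", [(1, "a"), (2, "b")])
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, 0.5)])
    return conn


def run_demo():
    conn = build_demo_database()
    # Stream of literals connected to the query's expression input.
    expressions = literal_stream("SELECT * FROM some_table", "SELECT * FROM table2")
    results = sql_query(expressions, conn)
    # The tee duplicates the result stream for two downstream consumers.
    branch_a, branch_b = itertools.tee(results, 2)
    return list(branch_a), list(branch_b)
```

Both branches returned by `run_demo()` carry the same two result sets, which is the behaviour the Tee element is used for in the slide.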

Creating streams of literals

DISPEL real use

Speaker note: Java with extensions and some ADMIRE-flavored syntax such as connectors, compiled into an OGSA-DAI workflow.

ADMIRE's High-Level Architecture

ADMIRE Gateways

USMT

Security
Framework built on top of formal Grid infrastructure; available security mechanisms include:
• Transport-level security: SSL, HTTPS (currently available)
• Message-level security: Web Services Security (SOAP Message Security)
• X.509 certificate authentication
• Multiple-stakeholder authorization
• Explicit Trust Delegation (ETD)

Pilot Applications
ADMIRE has 2 pilot applications:
• CRM
• FloodApp
    Orava
    Radar
    SVP

ACRM Application
• Large-scale, distributed churn scenario
• 4 database parts, distributed among ADMIRE partners
• Graphical UI for business analysts
• Using the ADMIRE workbench, DISPEL and framework to create predictions of customer churn
• Mining over distributed data

Flood Application
Data sets used in hydrological scenarios

Dataset | Domain | Description | Volume | Temporal coverage | Spatial coverage
HUSAV | Hydrology | Data from two probes, containing water saturation of soil | 10s of MB | 1998-2007 | Two distinct points
MARS | Meteorology | Historical meteorological data (temperature, rainfall, etc.) for Slovakia | 100s of MB | 1975-2007 | Slovakia (grid 50x50 km)
SVP | Hydrology | Data from waterworks in western Slovakia (mainly river Váh): outflows, water levels, temperature, rainfall | 100s of MB | 1998-2007 | 15 distinct waterworks
DAISY | Pedology | Various pedological parameters for one probe in southern Slovakia | 10s of MB | 1961-2000 | One point
WOFOST | Pedology | Crop data (with attached soil and meteorological data) for Slovakia, year 2006 | 10s of MB | 2006 | Slovakia (grid)
SHMU_CURR | Meteorology | On-line database of meteorological data copied from the SHMI web; including radar imagery | 10s of GB + | 2008- | Slovakia (about 100 distinct probes)
SHMU_HIST | Meteorology | Historical meteorological data from SHMI probes | 100s of MB | 1998-2007 | Slovakia (more than 100 distinct probes)
SHMU_GRIB | Meteorology | Historical temperatures and rainfall amounts in a gridded binary format | 100s of GB | 1998-2007 | Slovakia (grid, various sizes)
RADAR | Meteorology | Weather radar imagery | 100s of GB | 2005-2008 | Slovakia
SHMU_HYDRO | Hydrology | Historical data from hydrological measurement stations | 10s of MB | 1998-2007 | Orava and upper Vah river
SOIL_RET | Pedology | Water retention capacities of soil | 10s of MB | current (no time series applicable) | Vah river watershed area

Scenarios deployment in testbed
• Two scenarios (ORAVA, RADAR) completely deployed in the testbed
• Data of the other scenarios are partially deployed
• 5 nodes (1 real + 4 virtual nodes)
• Databases (MySQL + PostgreSQL), GRIB files in file storage
• USMT (Unified System Management Technology, Jetty container), OGSA-DAI (Apache Tomcat)

Orava scenario
Legend
• Green area: Orava (part of northern Slovakia)
• Blue: Orava reservoir and local rivers
• Red dots: hydrological measurement stations
Notes
• We are interested only in hydrological stations below the Orava reservoir
• In our tests we will use hydrological station 5830 (Tvrdosin)

ORAVA data mining concept
Predictors: rainfall amount (reservoir and station), air temperature (reservoir and station), reservoir discharge, reservoir temperature
Targets: water level and temperature at a station below the reservoir

Time | Water temp Orava | Rainfall Orava | Air temp Orava | Air temp Station | Rainfall Station | Outflow Orava | Water level Station | Water temp Station
T-4 | E-4 | R-4 | A-4 | B-4 | S-4 | D-4 | X-4 | Y-4
T-3 | E-3 | R-3 | A-3 | B-3 | S-3 | D-3 | X-3 | Y-3
T-2 | E-2 | R-2 | A-2 | B-2 | S-2 | D-2 | X-2 | Y-2
T-1 | E-1 | R-1 | A-1 | B-1 | S-1 | D-1 | X-1 | Y-1
T | E | R | A | B | S | D | X | Y
T+1 |  | R+1 | A+1 | B+1 | S+1 | D+1 | X+1 | Y+1
T+2 |  | R+2 | A+2 | B+2 | S+2 | D+2 | X+2 | Y+2
T+3 |  | R+3 | A+3 | B+3 | S+3 | D+3 | X+3 | Y+3
T+4 |  | R+4 | A+4 | B+4 | S+4 | D+4 | X+4 | Y+4
T+5 |  | R+5 | A+5 | B+5 | S+5 | D+5 | X+5 | Y+5
T+6 |  | R+6 | A+6 | B+6 | S+6 | D+6 | X+6 | Y+6

Legend (colour-coded on the slide): predicted by a meteo model; given in a schedule; targets of data mining

ORAVA data integration
Integration of data from
• GRIB files
• Reservoirs
Inputs
• Time period of experiment
• Reservoir ID
• List of hydro stations
• Geo coordinates
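The lagged layout of the ORAVA data mining concept (history rows T-4 to T, future rows T+1 onwards) can be turned into training examples roughly as sketched below. The function and constant names are hypothetical. The split of future columns into "known ahead" (R, A, B, S assumed to come from the meteo model, D from the discharge schedule) and targets (X, Y) is an assumption read from the predictor and target descriptions, since the slide's colour coding is not preserved.

```python
OBSERVED = ["E", "R", "A", "B", "S", "D", "X", "Y"]  # measured up to time T
KNOWN_AHEAD = ["R", "A", "B", "S", "D"]              # assumed forecast or scheduled
TARGETS = ["X", "Y"]                                 # water level and temperature at the station


def label(name, offset):
    """Build column labels such as 'E-4', 'E' or 'D+1'."""
    return name if offset == 0 else f"{name}{offset:+d}"


def make_example(records, t, history=4, horizon=1):
    """Build one (features, targets) pair for the time index t.

    records is a list of dicts, one per time step, keyed by the
    single-letter column codes used in the concept table.
    """
    if t - history < 0 or t + horizon >= len(records):
        raise IndexError("not enough history or future rows around t")
    features = {}
    # Past and present measurements: T-history .. T.
    for offset in range(-history, 1):
        for name in OBSERVED:
            features[label(name, offset)] = records[t + offset][name]
    # Future values assumed to be available in advance: T+1 .. T+horizon.
    for offset in range(1, horizon + 1):
        for name in KNOWN_AHEAD:
            features[label(name, offset)] = records[t + offset][name]
    targets = {label(name, horizon): records[t + horizon][name] for name in TARGETS}
    return features, targets
```

With the default `history=4` and `horizon=1` each example has 45 features (5 time steps of 8 measured columns plus 5 columns known one step ahead) and 2 targets.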

ORAVA data sets

Dataset | Domain | Description | Volume | Temporal coverage | Spatial coverage
SVP | Hydrology | Data from waterworks in western Slovakia (mainly river Váh): outflows, water levels, temperature, rainfall | 100s of MB | 1998-2007 | 15 distinct waterworks
SHMU_CURR | Meteorology | On-line database of meteorological data copied from the SHMI web; including radar imagery | 10s of GB + | 2008- | Slovakia (about 100 distinct probes)
SHMU_HIST | Meteorology | Historical meteorological data from SHMI probes | 100s of MB | 1998-2007 | Slovakia (more than 100 distinct probes)
SHMU_GRIB | Meteorology | Historical temperatures and rainfall amounts in a gridded binary format | 100s of GB | 1998-2007 | Slovakia (grid, various sizes)
SHMU_HYDRO | Hydrology | Historical data from hydrological measurement stations | 10s of MB | 1998-2007 | Orava and upper Vah river

ORAVA integrated and preprocessed data
Filters applied: ZeroEpsilon filter, LinearTrend filter, ReplaceMissingValues filter, Kelvin2Celsius filter

Integrated raw data (one row per time step)
Water_temp Orava | Air_temp Orava | Rainfall Orava | Outflow Orava | Rainfall Station | Air_temp Station | Flow/Height Station | Water_temp Station
 | -4 |  | 30 | -5.55E-20 | 269.0278 | 28 | 0.7
1 | -4 |  | 30 | -5.55E-20 | 269.0476 | 28.62 | 0.7
 | -5 |  | 30 | -4.24E-20 | 269.5059 | 28.62 | 0.7
 | -5 |  | 30 | -8.47E-20 | 270.2394 | 28.62 | 0.7
 | -5 |  | 30 | -8.47E-20 | 270.8507 | 28 | 0.7
 | -3 |  | 50 | -8.47E-20 | 271.2792 | 28 | 0.7
 | -3 |  | 50 | -8.47E-20 | 271.9238 | 28 | 0.8

Integrated preprocessed data (one row per time step)
Water_temp Orava | Air_temp Orava | Rainfall Orava | Outflow Orava | Rainfall Station | Air_temp Station | Flow/Height Station | Water_temp Station
1.0 | -4.0 | 0.0 | 30.0 | 0.0 | -3.12223 | 28.0 | 0.7
1.0 | -4.0 | 0.0 | 30.0 | 0.0 | -3.1024 | 28.62 | 0.7
0.995833 | -5.0 | 0.0 | 30.0 | 0.0 | -2.64408 | 28.62 | 0.7
0.991667 | -5.0 | 0.0 | 30.0 | 0.0 | -1.91062 | 28.62 | 0.7
0.9875 | -5.0 | 0.0 | 30.0 | 0.0 | -1.29926 | 28.0 | 0.7
0.983333 | -3.0 | 0.0 | 50.0 | 0.0 | -0.87076 | 28.0 | 0.7
0.979167 | -3.0 | 0.0 | 50.0 | 0.0 | -0.22617 | 28.0 | 0.8

ORAVA data mining
Input: integrated data
Data mining phases:
• Data understanding
    Data visualization
    Data quality exploration
• Data preparation
    Missing values substitution (ReplaceMissingValues filter)
    Noise reduction (ZeroEpsilon filter)
    Switching from one scale to another (Kelvin2Celsius filter)
    Data modifying (LinearTrend filter)
• Model training
    Training on historical data (8760 records)
    Linear regression model
    Neural network: multilayer perceptron without hidden layers
• Model evaluation
    Testing of the trained model
    N-fold cross-validation
    Using training sets
Output: prediction model
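The slides name the data preparation filters but do not show how they are implemented. The following is a minimal sketch of what such filters might do on a single column, under stated assumptions: missing values are represented as `None`, ZeroEpsilon snaps values with a tiny magnitude to zero, ReplaceMissingValues substitutes the column mean, LinearTrend interpolates linearly between known neighbours, and Kelvin2Celsius uses the standard offset of 273.15. The function names and the epsilon default are hypothetical, and the actual ADMIRE filters may behave differently.

```python
def zero_epsilon(values, epsilon=1e-9):
    """Noise reduction: replace values with magnitude below epsilon by 0.0."""
    return [0.0 if v is not None and abs(v) < epsilon else v for v in values]


def kelvin_to_celsius(values):
    """Scale change using the standard offset between kelvin and degrees Celsius."""
    return [None if v is None else v - 273.15 for v in values]


def replace_missing(values, fill=None):
    """Substitute missing entries with `fill`, or with the mean of the known values."""
    known = [v for v in values if v is not None]
    if fill is None:
        fill = sum(known) / len(known) if known else 0.0
    return [fill if v is None else v for v in values]


def linear_trend(values):
    """Fill gaps by linear interpolation between the nearest known neighbours.

    Gaps before the first or after the last known value copy that value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled
```

Keeping each filter as a pure function over one column makes it easy to chain them per attribute, which matches the way the slide lists one filter per preparation step.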

Orava data mining results: prediction of temperature
Linear regression model equation: (shown as an image on the slide; not preserved in this text)

Orava temperature prediction: model comparison

Property | Linear regression | Multilayer perceptron
Correlation coefficient | 0.9639 | 0.9821
Mean absolute error | 1.1791 | 0.7748
Root mean squared error | 1.4607 | 1.0386
Relative absolute error | 23.8739 % | 15.6884 %
Root relative squared error | 26.609 % | 18.9195 %
Total number of instances | 8760 | 8760

Validation data | Linear regression: predicted | Linear regression: error | Multilayer perceptron: predicted | Multilayer perceptron: error
11.6 | 13.071 | 1.471 | 12.446 | 0.846
15.2 | 14.335 | -0.865 | 14.494 | -0.706
6.4 | 7.614 | 1.214 | 5.766 | -0.634
0.7 | 2.284 | 1.584 | 0.926 | 0.226
11.7 | 10.948 | -0.752 | 10.266 | -1.434
14.3 | 16.526 | 2.226 | 13.671 | -0.629
15.6 | 12.891 | -2.709 | 14.502 | -1.098
15.7 | 12.838 | -2.862 | 13.353 | -2.347
0.8 | 1.752 | 0.952 | 0.826 | 0.026
15.8 | 15.188 | -0.612 | 14.005 | -1.795
15.4 | 16.553 | 1.153 | 13.129 | -2.271
14.9 | 12.795 | -2.105 | 14.599 | -0.301
15.4 | 15.660 | 0.260 | 13.696 | -1.704

Orava: prediction of water level
Neural network model: multilayer perceptron
• Input parameters (6)
    Rainfall ([S+1]), water level ([X])
    Outflows ([D], [D+1] - [D], ln([D]), sqrt([D]))
• Output
    Difference of water level ([X+1] - [X])
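The six input parameters and the difference target of the water level model can be assembled as in the sketch below. The function names are hypothetical and the feature order is an arbitrary choice; only the formulas ([D+1] - [D], ln([D]), sqrt([D]), [X+1] - [X]) come from the slide.

```python
import math


def water_level_features(s_next, x_now, d_now, d_next):
    """Return the six inputs: [S+1], [X], [D], [D+1]-[D], ln([D]), sqrt([D])."""
    if d_now <= 0:
        raise ValueError("outflow [D] must be positive for ln([D])")
    return [
        s_next,            # forecast rainfall at the station, [S+1]
        x_now,             # current water level, [X]
        d_now,             # current reservoir outflow, [D]
        d_next - d_now,    # scheduled change of outflow, [D+1] - [D]
        math.log(d_now),   # ln([D])
        math.sqrt(d_now),  # sqrt([D])
    ]


def water_level_target(x_now, x_next):
    """Training target: the change of water level, [X+1] - [X]."""
    return x_next - x_now


def predicted_level(x_now, predicted_difference):
    """Recover the absolute level [X+1] from a predicted difference."""
    return x_now + predicted_difference
```

Predicting the difference rather than the absolute level means the model output has to be added back to the current level, as `predicted_level` does.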

Time | Water temp Orava | Rainfall Orava | Air temp Orava | Air temp Station | Rainfall Station | Outflow Orava | Water level Station | Water temp Station
T-3 | E-3 | R-3 | A-3 | B-3 | S-3 | D-3 | X-3 | Y-3
T-2 | E-2 | R-2 | A-2 | B-2 | S-2 | D-2 | X-2 | Y-2
T-1 | E-1 | R-1 | A-1 | B-1 | S-1 | D-1 | X-1 | Y-1
T | E | R | A | B | S | D | X | Y
T+1 |  | R+1 | A+1 | B+1 | S+1 | D+1 | X+1 | Y+1
T+2 |  | R+2 | A+2 | B+2 | S+2 | D+2 | X+2 | Y+2

Orava water level prediction
• Data count: 8735 records
• Activation function of the feed-forward neural network: sigmoid
• Correlation coefficient: 0.9816
• Mean absolute error: 0.4105
• Root mean squared error: 0.9673
• Relative absolute error: 30.5869 % (from difference)
• Root relative squared error: 19.2384 % (from difference)
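The error measures reported for the Orava models can be computed as in the following sketch. It uses the common textbook definitions, with the mean of the actual values as the baseline predictor for the two relative errors; the slides do not state which baseline was used, so that choice is an assumption. The function name is hypothetical.

```python
import math


def evaluate(actual, predicted):
    """Return correlation, MAE, RMSE, RAE (%) and RRSE (%) for two equal-length lists."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean of the actual values.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": cov / math.sqrt(base_sq * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sq_err / base_sq),
    }
```

A relative error below 100 % means the model beats the constant-mean baseline, which is how the percentages in the comparison table can be read.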

RADAR
• Very short-term rainfall prediction from weather radar data
• Movement of areas with higher air moisture content, and thus also higher precipitation potential
• Mining of matrices of data

Time | Potential precipitation (RADAR) | Measured precipitation (STATION) | Temperature (MODEL) | Wind (MODEL)
T-3 | R-3 | S-3 | H-3 | W-3
T-2 | R-2 | S-2 | H-2 | W-2
T-1 | R-1 | S-1 | H-1 | W-1
T | R | S | H | W
T+1 | R+1 | S+1 |  |
T+2 | R+2 | S+2 |  |

Legend (colour-coded on the slide): targets of data mining

Meteorological data: network of synoptic stations in Slovakia
• 27 stations in Slovakia
• Data used from the years 2007 and 2008
• Rainfall, humidity, atmospheric pressure and temperature values for each hour

RADAR isotonic model
• Current model for rainfall prediction
• Isotonic regression model structure
• Training on historical data
• Correlation coefficient: 0.4593
• Mean absolute error: 0.1105
• Root mean squared error: 0.5490
• Total number of instances: 89700
• Validation: 10-fold cross-validation
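An isotonic regression model is a non-decreasing step function fitted to the data. One standard way to fit it is the pool-adjacent-violators algorithm, sketched below; the slides do not say which fitting procedure or library was used, so this is an illustration of the technique rather than the project's implementation. The input is assumed to be the target values already sorted by the predictor (here, radar reflectivity).

```python
def pool_adjacent_violators(y):
    """Return the non-decreasing least-squares fit to the sequence y."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted
```

For example, `pool_adjacent_violators([1, 3, 2, 4])` pools the violating pair 3, 2 into their mean and returns `[1.0, 2.5, 2.5, 4.0]`. The boundaries between the resulting constant blocks correspond to the cut points of a fitted model.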

Table of isotonic model

Index | Prediction (rainfall) | Cut point (reflective) | Index | Prediction (rainfall) | Cut point (reflective) | Index | Prediction (rainfall) | Cut point (reflective)
1 | 0.01 | 1.78 | 15 | 0.23 | 96.91 | 29 | 1.35 | 355.91
2 | 0.03 | 1.84 | 16 | 0.28 | 97.47 | 30 | 1.40 | 377.19
3 | 0.03 | 8.28 | 17 | 0.30 | 129.63 | 31 | 1.52 | 381.78
4 | 0.03 | 16.97 | 18 | 0.33 | 129.72 | 32 | 2.13 | 395.31
5 | 0.03 | 24.28 | 19 | 0.42 | 147.94 | 33 | 2.23 | 399.16
6 | 0.03 | 36.91 | 20 | 0.44 | 168.59 | 34 | 2.28 | 447.06
7 | 0.05 | 37.53 | 21 | 0.50 | 187.13 | 35 | 2.60 | 447.69
8 | 0.05 | 38.72 | 22 | 0.51 | 187.47 | 36 | 2.60 | 467.66
9 | 0.06 | 44.53 | 23 | 0.62 | 211.56 | 37 | 2.98 | 515.19
10 | 0.07 | 59.03 | 24 | 0.72 | 268.38 | 38 | 3.75 | 625.56
11 | 0.08 | 61.16 | 25 | 0.93 | 281.28 | 39 | 4.93 | 665.41
12 | 0.10 | 61.78 | 26 | 1.00 | 297.72 | 40 | 5.24 | 901.25
13 | 0.14 | 81.59 | 27 | 1.14 | 314.47 | 41 | 5.40 | 934.41
14 | 0.19 | 89.22 | 28 | 1.26 | 344.59 | 42 | 6.30 | 971.5

Hydrometeorological performance
• Probability of detection with thresholds of 0.3 and 0.6 mm rainfall per hour:
    POD(0.3) = 63.87 %
    POD(0.6) = 56.22 %
• Miss rate with thresholds of 0.3 and 0.6 mm rainfall per hour:
    MR(0.3) = 1.85 %
    MR(0.6) = 1.58 %
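The cut-point table of the isotonic model can be read as a step function from radar reflectivity to predicted rainfall. The sketch below uses only the first seven rows of the table as an excerpt. It assumes that a prediction applies from its cut point up to the next one, and that values below the first cut point receive the lowest prediction; the slides do not specify either convention, and the function name is hypothetical.

```python
from bisect import bisect_right

# Excerpt: rows 1 to 7 of the isotonic model table.
CUT_POINTS = [1.78, 1.84, 8.28, 16.97, 24.28, 36.91, 37.53]
PREDICTIONS = [0.01, 0.03, 0.03, 0.03, 0.03, 0.03, 0.05]


def predict_rainfall(reflectivity, cut_points=CUT_POINTS, predictions=PREDICTIONS):
    """Look up the rainfall prediction for a reflectivity value."""
    # Number of cut points that are less than or equal to the input value.
    position = bisect_right(cut_points, reflectivity)
    if position == 0:
        # Assumption: below the first cut point, use the lowest prediction.
        return predictions[0]
    return predictions[position - 1]
```

Because the cut points are sorted, a binary search finds the matching step in logarithmic time, which matters when the lookup is applied to every cell of a radar matrix.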

RADAR model
Other tested models
• Neural networks, SMOreg, linear regression, ...
• Reached correlation coefficients between 0.35 and 0.42
• Validation: 10-fold cross-validation
Problems in model creation:
• The process is significantly stochastic
• Some input variables are backwards dependent on the output
• The meteorological process is very sensitive
• The reflection matrix represents the quantity of water in the atmosphere, not the exact rainfall rate in a specified area, as opposed to data from synoptic stations

ADMIRE Tools

• Registry client GUI
• Process designer
• SKSA
• Gateway Process Manager
• DMI Model Visualizer

Registry client GUI

• Read-only access to the ADMIRE Registry
    list PEs (processing elements) and view their properties
    search, sort PEs
• Write access to the Registry is done via DISPEL documents

Process Designer

• Manage your DMI project (files, directories, project structure)
• Edit your DMI process graphically
• View the canonical (DISPEL) representation of your DMI process in real time
• Select elements from the Registry
• View the properties of your chosen elements

Semantic Knowledge Sharing Assistant

• Context the user works in
    several reservoirs, one settlement
• Knowledge that may be useful in this context
    previously entered by other users
• Provides access to existing users' knowledge, sorting and selecting it automatically according to the user's current working context

Gateway Process Manager

• Keep track of running processes
• Stop, pause or cancel a process
• View the process source (DISPEL)
• Access process results (if available) in several ways: raw or visualized

DMI Model Visualizer

Visualization of data mining models
• Read a Weka classifier object
• Produce a PMML (Predictive Model Markup Language) description of the model
• Show the PMML as a graphical tree

ADMIRE Project

Thank you for your attention.