Machine Learning and Bioinformatics 機器學習與生物資訊學

  • Upload
    matia

  • View
    48

  • Download
    3

Embed Size (px)

DESCRIPTION

Machine Learning and Bioinformatics 機器學習與生物資訊學. Enhancing instance-based classification using kernel density estimation. Outline. Kernel approximation Kernel a pproximation in O ( n ) Multi-dimensional kernel approximation Kernel d ensity e stimation - PowerPoint PPT Presentation

Citation preview

rvkde

Machine Learning and BioinformaticsMachine Learning & Bioinformatics1Molecular Biomedical Informatics2Enhancing instance-based classification usingkernel density estimationMachine Learning and BioinformaticsOutlineKernel approximationKernel approximation in O(n)Multi-dimensional kernel approximationKernel density estimationApplication in data classificationMachine Learning and Bioinformatics3Identifying class boundary

Machine Learning and Bioinformatics4Boundary identified

Machine Learning and Bioinformatics5Kernel approximation6Machine Learning and BioinformaticsKernel approximationProblem definitionMachine Learning and Bioinformatics7EquationsMachine Learning and Bioinformatics8Machine Learning and Bioinformatics9Kernel approximationAn 1-D example

Kernel approximationSpherical Gaussian functionsMachine Learning and Bioinformatics10EquationsMachine Learning and Bioinformatics11Kernel approximation in O(n)12Machine Learning and BioinformaticsKernel approximation in O(n)Machine Learning and Bioinformatics13Machine Learning and Bioinformatics14A 2-D example of uniform samplingThe O(n) kernel approximationMachine Learning and Bioinformatics15Machine Learning and Bioinformatics16The O(n) kernel approximationAn 1-D example

Machine Learning and Bioinformatics17EquationsMachine Learning and Bioinformatics18WhatsMachine Learning and Bioinformatics19the difference between equations (2) and (3)?Machine Learning and Bioinformatics20

Machine Learning and Bioinformatics21EquationsMachine Learning and Bioinformatics22The O(n) kernel approximationGeneralizationMachine Learning and Bioinformatics23EquationsMachine Learning and Bioinformatics24Machine Learning and Bioinformatics25

The smoothing effectMachine Learning and Bioinformatics26Machine Learning and Bioinformatics27An example of the smoothing effect

Multi-dimensional kernel approximation28Machine Learning and BioinformaticsMulti-dimensional kernel approximationThe basic conceptsMachine Learning and Bioinformatics29Machine Learning and Bioinformatics30Machine Learning and Bioinformatics31EquationsMachine Learning and Bioinformatics32Multi-dimensional kernel approximationGeneralizationMachine Learning and Bioinformatics33EquationsMachine Learning and Bioinformatics34Multi-dimensional kernel approximationSimplificationMachine Learning and Bioinformatics35EquationsMachine Learning and Bioinformatics36Machine Learning and Bioinformatics37Any Questions?Lack something?Machine Learning and Bioinformatics38Kernel density estimation39Machine Learning and BioinformaticsKernel density estimationProblem definitionAssume that we are given a set of samples taken from a probability distribution in a d-dimensional vector spaceThe problem definition of kernel density estimation is to find a linear combination of kernel functions that provides a good approximation of the probability density functionMachine Learning and Bioinformatics40Probability density estimationMachine Learning and Bioinformatics41The Gamma functionMachine Learning and Bioinformatics42Application in data classification43Machine Learning and BioinformaticsApplication in data classificationRecent development in data classification focuses on the support vector machines (SVM), due to accuracy concernRVKDEdelivers the same level of accuracy as the SVM and enjoys some advantagesconstructs one approximate probability density function for one class of objectsMachine Learning and Bioinformatics44Time complexityMachine Learning and Bioinformatics45Comparison of classification accuracyMachine Learning and Bioinformatics46Comparison of classification accuracy3 larger data setsMachine Learning and Bioinformatics47Data reductionAs the learning algorithm is instance-based, removal of redundant training samples will lower the complexity of the prediction phaseThe effect of a nave data reduction mechanism was studiedThe nave mechanism removes a training sample, if all of its 10 nearest samples belong to the same class as this particular sampleMachine Learning and Bioinformatics48With the KDE based statistical learning algorithm, each training sample is associated with a kernel function with typically a varying width

Machine Learning and Bioinformatics49In comparison with the decision function of SVM in which only support vectors are associated with a kernel function with a fixed width

Machine Learning and Bioinformatics50Effect of data reductionsatimagelettershuttle# of training samples in the original data set44351500043500# of training samples after data reduction is applied18157794627% of training samples remaining40.92%51.96%1.44%Classification accuracy after data reduction is applied92.15%96.18%99.32%Degradation of accuracy due to data reduction-0.15%-0.94%-0.62%Machine Learning and Bioinformatics51Effect of data reduction# of training samples after data reduction is applied# of support vectors identified by LIBSVMsatimage18151689letter77948931shuttle627287Machine Learning and Bioinformatics52Execution times (in seconds)Proposed algorithm without data reductionProposed algorithm with data reductionSVMCross validationsatimage67026564622letter28251724386814shuttle9679559.9467825Make classifiersatimage5.910.8521.66letter17.056.48282.05shuttle17450.69129.84Testsatimage21.37.411.53letter128.651.7494.91shuttle996.15.582.13Machine Learning and Bioinformatics53Todays exerciseMachine Learning & Bioinformatics54RegressionMachine Learning & Bioinformatics55Use the regression mode of RVKDE to generate transaction decisions. Upload, test and commit them in our stock system. Send TA Lin a report before 23:59 11/20 (Wed).