Using data mining methods to identify college freshmen who need special assistance in their academic performance

Advisor: Professor 呂學毅 Student: 陳彥璋



  • Slide 1

Using data mining methods to identify college freshmen who need special assistance in their academic performance.

  • Slide 2

Introduction | Literature review | Method
Background. Diagram labels: colleges, instruction, grade.

  • Slide 3

Background (cont.). Diagram labels: students, factors, grade.

  • Slide 4

Background (cont.). Lower grade.

  • Slide 5

Background (cont.)
College freshmen: have more adjustment problems than students in higher years.
Lower grades: affect physical and mental health, and affect grades over the following several semesters.

  • Slide 6

Background (cont.)
Example: Taiwan's National Yunlin University of Science and Technology. Starting in academic year 94, the university built an achievement warning policy for students' academic achievement.
Guidance in college for low academic achievement: learning, career planning, emotional and life problems.

  • Slide 7

Motivation (cont.)
Timeline diagram. The general practice: the final exam is held, the test scores come out, and only then is the list of students who need special assistance produced. This study: the list of students who need special assistance is produced at the time the freshmen enter.

  • Slide 8

Motivation
Factors related to grades: family, intelligence, sex, emotion, personality, learning motivation, learning engagement, ...

  • Slide 9

Objective
The aim of this study is to construct a model with data mining tools that predicts which college freshmen will have low academic achievement. The model identifies students who need special assistance in their academic performance, so that guidance can help them improve their academic performance as early as possible.

  • Slide 10

The negative effects of low academic achievement.

  • Slide 11

The problems of college freshmen with low academic achievement.
Author | Year | Finding
[author missing] | 2007 | Because the college environment is more complex than high school, freshmen entering the new environment encounter many adjustment problems.
[author missing] | 2007 | College freshmen with poor academic achievement may be affected for several subsequent semesters.
[author missing] | 1999 | College freshmen entering the new environment encounter more adjustment problems than students in higher years.

  • Slide 12

The relationship between personality (emotion) and grade.

Author | Year | Finding
McIlroy & Bunting | 2002 | Good personality and behavior in students contribute to their academic performance.
Busato, Prins, Elshout, Hamaker | 2000 | Students' intelligence, personality, motivation, and academic achievement are positively correlated.
Yeh et al. | 2007 | Students who have anxiety or depression have their academic achievement affected.
Parker et al. | 2004 | Students' emotions and academic achievement are correlated.

  • Slide 13

Forecasting model construction process

  • Slide 14

Coding: data attributes of the primary data
[Table: attribute names lost in extraction. Surviving values: 2, 4, twelve rows of "0 20 -", 4, 2.]

  • Slide 15

Feature selection
Some attributes are noisy or redundant. This noise makes it more difficult to discover meaningful patterns from the data (Dash, 1997).
Sequential Backward Selection: uses Shannon's entropy as the identification rule to find the attributes that have more explanatory capability.

Sequential Backward Selection
T = Original Variable Set
For k = 1 to M - 1 {              /* Iteratively remove variables one at a time */
    For every variable v in T {   /* Determine which variable to be removed */
        Tv = T - v
        Calculate E(Tv) on D using eqn. 1
    }
    Let vk be the variable that minimizes E(Tv)
    T = T - vk                    /* Remove vk as the least important variable */
    Output vk
}

  • Slide 16

Feature selection (cont.)

  • Slide 17

Data mining

  • Slide 18

Data mining: K-fold cross-validation
K-fold cross-validation is mainly used in settings where the goal is prediction, to estimate how accurately a predictive model will perform in practice.
One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the testing set).
To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds.

  • Slide 19

Data mining: C4.5 decision trees
C4.5 is an extension of Quinlan's earlier ID3 algorithm (Quinlan, 1993). The decision trees generated by C4.5 can be used for classification.
Diagram labels: internal node (attribute), branches, leaf node (class).

  • Slide 20

Data mining: C4.5 decision trees (cont.)

  • Slide 21

Data mining: Naïve Bayes classifier

  • Slide 22

Data mining: MLP artificial neural network
An ANN model is a class of functions whose members are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity.
Diagram labels: neurons, weight, adder, activation function, output y.

  • Slide 23

Data mining: MLP artificial neural network (cont.)
A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one.
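The MLP description on Slide 23 can be made concrete with a small forward-pass sketch. This is only an illustration under stated assumptions: the sigmoid activation, the layer sizes, and every weight and bias value below are made up for the example and do not come from the slides, and no training step is shown.

```python
import math


def sigmoid(x):
    """Logistic activation function (an assumed choice; the slides do not name one)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: every input feeds every neuron in the layer."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Adder: weighted sum of all inputs plus the neuron's bias.
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        # Activation function turns the sum into the neuron's output y.
        outputs.append(sigmoid(total))
    return outputs


def mlp_forward(inputs, layers):
    """Feed the input through each layer in order (input -> hidden -> output)."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs, 2 hidden neurons, 1 output neuron.
# All weights and biases are invented for illustration.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                               # output layer
]

y = mlp_forward([1.0, 0.0, 1.0], layers)
print(y)  # a single value between 0 and 1
```

With a sigmoid output, the single value could be read as a score for "needs special assistance", but how the study actually encodes its output is not stated in the slides.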
Diagram labels: input layer, hidden layer, output layer.

  • Slide 24

Data mining: MLP artificial neural network (cont.)

  • Slide 25

Model evaluation: confusion matrix

                 | Predicted True | Predicted False
Actual Positive  | TP             | FN
Actual Negative  | FP             | TN

Accuracy = (TP + TN) / (TP + FN + FP + TN)
Sensitivity (true positive rate) = TP / (TP + FN)
Specificity = TN / (FP + TN)
False positive rate = FP / (FP + TN)

  • Slide 26

Model evaluation: Receiver Operating Characteristic

  • Slide 27

Expected result
This study is expected to construct a forecasting model from college freshmen's data. The forecasting model is built with data mining methods and is selected from three classifiers (C4.5 decision trees, naïve Bayes classifier, MLP artificial neural network).
The forecasting model can identify college freshmen who need special assistance in their academic performance, and colleges can use the model to help students improve their academic performance through guidance as early as possible.

  • Slide 28

Gantt Chart

  • Slide 29

Q & A
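The evaluation measures named on Slide 25 can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard definitions of the four measures and a binary label where 1 marks a student who needs special assistance. The function names and the sample labels are hypothetical and are not taken from the study's data.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FN, FP, TN for a binary classification result."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # actual positive, predicted positive
            else:
                fn += 1  # actual positive, predicted negative
        else:
            if p == positive:
                fp += 1  # actual negative, predicted positive
            else:
                tn += 1  # actual negative, predicted negative
    return tp, fn, fp, tn


def evaluation_metrics(tp, fn, fp, tn):
    """Compute the four measures; assumes both classes occur in the data."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (fp + tn),
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }


# Made-up labels for eight students (1 = needs special assistance).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(actual, predicted)
metrics = evaluation_metrics(*counts)
print(counts)   # (3, 1, 1, 3)
print(metrics)
```

For a task like this one, sensitivity is the measure that reflects how many of the students who truly need assistance the model catches, so it is worth reading alongside accuracy rather than relying on accuracy alone.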