Outline
• Problem description
• General approach
• ML algorithms
• Important concepts
• Assignments
• What’s next?
Two types of problems
• Classification problem
• Sequence Labeling problem
• In both cases:
– A predefined set of labels: C = {c1, c2, …, cn}
– Training data: {(xi, yi)}, where yi ∈ C; yi is known (supervised) or unknown (unsupervised)
– Test data
NLP tasks
• Classification problems:
– Document classification
– Spam detection
– Sentiment analysis
– …
• Sequence labeling problems:
– POS tagging
– Word segmentation
– Sentence segmentation
– NE detection
– Parsing
– IGT detection
– …
Step 1: Preprocessing
• Converting the NLP task to a classification or sequence labeling problem
• Creating the attribute-value table:
– Define feature templates
– Instantiate feature templates and select features
– Decide what kind of feature values to use (e.g., binarizing features or not)
– Convert a multi-class problem to a binary problem (optional)
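To make the attribute-value table concrete, here is a minimal Python sketch (the function name is hypothetical) that instantiates a word-unigram feature template and optionally binarizes the values:

from collections import Counter

def instantiate_unigram_template(doc, binarize=False):
    # One row of the attribute-value table: feature name -> feature value
    counts = Counter(doc.split())
    if binarize:
        return {"word=" + w: 1 for w in counts}          # presence/absence
    return {"word=" + w: c for w, c in counts.items()}   # raw frequency

print(instantiate_unigram_template("the dog chased the cat"))
# {'word=the': 2, 'word=dog': 1, 'word=chased': 1, 'word=cat': 1}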
Feature selection
• Dimensionality reduction
– Feature selection:
• Wrapper methods
• Filtering methods: mutual information, χ², information gain, …
– Feature extraction:
• Term clustering
• Latent semantic indexing (LSI)
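As a sketch of one filtering method, the χ² score for a single feature/class pair, computed from a 2×2 contingency table (toy counts; the function name is hypothetical):

def chi2_score(n11, n10, n01, n00):
    # n11: feature present & in class; n10: present & not in class;
    # n01: absent & in class;          n00: absent & not in class
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return num / den if den else 0.0

# Filtering: rank every feature by its score and keep the top k.
print(chi2_score(40, 10, 60, 890))  # high score = feature and class look dependent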
Step 2: Training and decoding
• Choose a ML learner
• Train on the training data and evaluate on the development set, with different settings of the non-model parameters
• Choose the setting that works best on the development set (see the tuning sketch below)
• Run the learner on the test data with the best setting
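A minimal sketch of this tuning loop, with hypothetical train/evaluate functions:

def tune(train_data, dev_data, settings, train, evaluate):
    # Try each setting of the non-model parameters (e.g., k for kNN,
    # beam size) and keep the one that scores best on the dev set.
    best_setting, best_score = None, float("-inf")
    for setting in settings:
        model = train(train_data, **setting)
        score = evaluate(model, dev_data)
        if score > best_score:
            best_setting, best_score = setting, score
    return best_setting  # then run once on the test data with this setting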
Step 3: Post-processing
• Convert the label sequence into the output we want
• System combination:
– Voting: majority voting, weighted voting (sketched below)
– More sophisticated models
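A sketch of voting-based system combination (equal weights reduce it to plain majority voting):

from collections import defaultdict

def weighted_vote(predictions, weights=None):
    # predictions: one label per classifier; weights: one weight per classifier
    if weights is None:
        weights = [1.0] * len(predictions)      # plain majority voting
    score = defaultdict(float)
    for label, weight in zip(predictions, weights):
        score[label] += weight
    return max(score, key=score.get)

print(weighted_vote(["pos", "neg", "pos"]))                   # pos
print(weighted_vote(["pos", "neg", "neg"], [0.9, 0.3, 0.3]))  # pos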
Main ideas
• kNN and Rocchio: finding the nearest neighbors / prototypes
• DT and DL: finding the right group
• NB, MaxEnt: calculating P(y | x)
• Bagging: reducing instability
• Boosting: forming a committee
• TBL: improving the current guess
Training
• kNN: no training
• Rocchio: calculate prototypes
• DT: build a decision tree:
– Choose a feature and then split the data
• DL: build a decision list:
– Choose a decision rule and then split the data
• TBL: build a transformation list:
– Choose a transformation and then update the current label field
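For example, Rocchio training reduces to computing one prototype (centroid) per class; a sketch assuming instances are sparse feature dicts:

from collections import defaultdict

def rocchio_prototypes(X, y):
    # The prototype of a class is the mean vector of its training instances.
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for x, label in zip(X, y):
        counts[label] += 1
        for feat, val in x.items():
            sums[label][feat] += val
    return {c: {f: total / counts[c] for f, total in feats.items()}
            for c, feats in sums.items()}

print(rocchio_prototypes([{"a": 2}, {"a": 4}], ["c1", "c1"]))  # {'c1': {'a': 3.0}}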
Training (cont)
• NB: calculate P(ci) and P(fj | ci) by simple counting.
• MaxEnt: calculate the weights of feature functions by iteration.
• Bagging: create bootstrap samples and learn base classifiers.
• Boosting: learn base classifiers and their weights.
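A sketch of the NB training above, by simple counting; the add-alpha smoothing is an assumption the slides do not specify:

from collections import defaultdict
import math

def train_nb(docs, labels, vocab, alpha=1.0):
    # docs: lists of features; returns log P(c) and smoothed log P(f|c)
    class_count = defaultdict(int)
    feat_count = defaultdict(lambda: defaultdict(int))
    for doc, c in zip(docs, labels):
        class_count[c] += 1
        for f in doc:
            feat_count[c][f] += 1
    n = len(docs)
    log_prior = {c: math.log(k / n) for c, k in class_count.items()}
    log_cond = {}
    for c in class_count:
        denom = sum(feat_count[c].values()) + alpha * len(vocab)
        log_cond[c] = {f: math.log((feat_count[c][f] + alpha) / denom)
                       for f in vocab}
    return log_prior, log_cond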
Testing
• kNN: calculate the distance between x and each xi, and find the k nearest neighbors.
• Rocchio: calculate the distance between x and each prototype, and pick the closest one.
• DT: traverse the tree
• DL: find the first matched decision rule.
• TBL: apply transformations one by one.
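A sketch of kNN testing as described above (Euclidean distance is one choice of measure; any distance or similarity works):

import math
from collections import Counter

def knn_classify(x, train, k=3):
    # train: list of (vector, label); vote among the k nearest neighbors
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    nearest = sorted(train, key=lambda pair: dist(x, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [([0, 0], "A"), ([0, 1], "A"), ([5, 5], "B"), ([6, 5], "B")]
print(knn_classify([1, 0], train))  # A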
Testing (cont)
• NB: calculate P(ci) ∏j P(fj | ci) for each class and pick the best class.
• MaxEnt: calculate P(y | x) from the learned feature weights and pick the best class.
• Bagging: run the base classifiers and choose the class with the highest number of votes.
• Boosting: run the base classifiers and calculate the weighted sum of their votes.
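And a matching sketch of NB decoding, using the model trained above (the floor score for unseen features is a hypothetical choice):

def classify_nb(doc, log_prior, log_cond, unseen=-20.0):
    # Pick argmax_c [ log P(c) + sum_j log P(fj | c) ]
    def score(c):
        return log_prior[c] + sum(log_cond[c].get(f, unseen) for f in doc)
    return max(log_prior, key=score)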
Sequence labeling problems
• With classification algorithms:
– Having features that refer to previous tags
– Using beam search to find good sequences (sketched below)
• With sequence labeling algorithms:
– HMM
– TBL
– MEMM
– CRF
– …
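A sketch of beam search over tag sequences; score() is a hypothetical classifier call that may condition on the previous tags:

def beam_search(words, labels, score, beam_size=3):
    # score(words, i, prev_tags, t) -> log P(tag t at position i | context)
    beam = [([], 0.0)]                      # (partial tag sequence, log prob)
    for i in range(len(words)):
        expanded = [(tags + [t], logp + score(words, i, tags, t))
                    for tags, logp in beam for t in labels]
        beam = sorted(expanded, key=lambda h: h[1], reverse=True)[:beam_size]
    return beam[0][0]                       # best-scoring full tag sequence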
Semi-supervised algorithms
• Self-training
• Co-training
• …
• Main idea: adding some unlabeled data to the labeled data
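A sketch of the self-training loop (train/classify are hypothetical functions; classify returns a label and a confidence):

def self_train(labeled, unlabeled, train, classify, threshold=0.9, rounds=5):
    # Repeatedly train, label the unlabeled pool, and move confident
    # predictions into the labeled set.
    for _ in range(rounds):
        model = train(labeled)
        scored = [(x, classify(model, x)) for x in unlabeled]
        confident = [(x, lab) for x, (lab, conf) in scored if conf >= threshold]
        if not confident:
            break
        labeled = labeled + confident
        moved = {id(x) for x, _ in confident}
        unlabeled = [x for x in unlabeled if id(x) not in moved]
    return train(labeled)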
Unsupervised algorithms
• MLE
• EM:
– General algorithm: E-step, M-step
– EM for PM models:
• Forward-backward for HMM
• Inside-outside for PCFG
• IBM models for MT
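As a toy illustration of the E-step/M-step cycle (a two-coin mixture, not one of the PM models above; equal mixing weights are assumed):

from math import comb

def em_two_coins(heads, n=10, pa=0.6, pb=0.5, iters=20):
    # heads[i] = number of heads in session i of n flips from coin A or B
    for _ in range(iters):
        # E-step: soft assignment P(coin A | session) under current pa, pb
        resp = []
        for h in heads:
            la = comb(n, h) * pa ** h * (1 - pa) ** (n - h)
            lb = comb(n, h) * pb ** h * (1 - pb) ** (n - h)
            resp.append(la / (la + lb))
        # M-step: re-estimate each coin's bias from the soft counts
        pa = sum(r * h for r, h in zip(resp, heads)) / (n * sum(resp))
        pb = sum((1 - r) * h for r, h in zip(resp, heads)) / (n * sum(1 - r for r in resp))
    return pa, pb

print(em_two_coins([5, 9, 8, 4, 7]))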
Concepts
• Attribute-value table
• Feature templates vs. features
• Weights:
– Feature weights
– Classifier weights
– Instance weights
– Feature values
Concepts (cont)
• Maximum entropy vs. Maximum likelihood
• Maximize likelihood vs. minimize training error
• Training time vs. test time
• Training error vs. test error
• Greedy algorithm vs. iterative approach
Concepts (cont)
• Local optima vs. global optima
• Beam search vs. Viterbi algorithm
• Sample vs. resample
• Model parameters vs. non-model parameters
Assignments
• Read code:
– NB: binary features?
– DT: difference between DT and C4.5
– Boosting: AdaBoost and AdaBoost.M2
– MaxEnt: binary features?
• Write code:
– Info2Vectors
– BinVectors
– χ²
• Complete two projects
Projects
• Steps:
– Preprocessing
– Training and testing
– Postprocessing
• Two projects:
– Project 1: Document classification
– Project 2: IGT detection
Project 1: Document classification
• A typical classification problem
• Data are prepared already:
– Feature template: words appearing in the document
– Feature value: word frequency
Project 2: IGT detection
• Can be framed as a sequence labeling problem:
– Preprocessing: define the label set
– Postprocessing: convert the tag sequence into spans
• Here: a sequence labeling problem solved with a classification algorithm plus beam search
• To use classification algorithms:
– Preprocessing:
• Define features
• Choose feature values
• …
Project 2 (cont)
• Preprocessing:
– Define the label set
– Define feature templates
– Decide on feature values
• Training and decoding:
– Write beam search
• Postprocessing:
– Convert the label sequence into spans (see the sketch below)
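A sketch of the span-conversion step, assuming BIO labels (the actual label set is whatever you define in preprocessing):

def labels_to_spans(tags):
    # Convert e.g. ["O", "B-IGT", "I-IGT", "O"] into (type, start, end) spans
    spans, start, cur = [], None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes last span
        if tag.startswith("B-") or tag == "O" or (cur and tag[2:] != cur):
            if cur is not None:
                spans.append((cur, start, i))
            start, cur = (i, tag[2:]) if tag != "O" else (None, None)
        elif tag.startswith("I-") and cur is None:  # stray I- opens a span
            start, cur = i, tag[2:]
    return spans

print(labels_to_spans(["O", "B-IGT", "I-IGT", "O", "B-IGT"]))
# [('IGT', 1, 3), ('IGT', 4, 5)]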
Project 2 (cont)
• Presentation
• Final report
• A typical conference paper:
– Introduction
– Previous work
– Methodology
– Experiments
– Discussion
– Conclusion
Using Mallet
• Difficulties:
– Java
– A large package
• Benefits:
– Java
– A large package
– Many learning algorithms: comparing the implementation with “standard” algorithms
Course summary
• 9 weeks: 18 sessions
• 2 kinds of problems
• 9 supervised algorithms
• 1 semi-supervised algorithm
• 1 unsupervised algorithm
• 4 related issues: feature selection, multiclass → binary conversion, system combination, beam search
• 2 projects
• 1 well-known package
• 9 assignments, including 1 presentation and 1 final report
• N papers
What’s next?
• Learn more about the algorithms covered in class.
• Learn new algorithms:
– SVM, CRF, regression algorithms, graphical models, …
• Try new tasks:
– Parsing, spam filtering, reference resolution, …
Misc
• Hw7: due tomorrow 11pm
• Hw8: due Thursday 11pm
• Hw9: due 3/13 11pm
• Presentation: No more than 15+5 minutes
What must be included in the presentation?
• Label set
• Feature templates
• Effect of beam search
• 3+ ways to improve the system and results on dev data (test_data/)
• Best system: results on dev data and the setting
• Results on test data (more_test_data/)