
Active Evaluation of Classifiers on Large Datasets

My presentation of our ICDM 2012 paper: http://www.it.iitb.ac.in/~sunita/papers/icdm2012.pdf

Namit Katariya, Arun Iyer, Sunita Sarawagi
IIT Bombay
International Conference on Data Mining, December 13, 2012

Problem setup

• Setup
  • A classifier C(x) deployed on a large unlabeled dataset D
  • Estimate the true accuracy µ of C(x) on D
• Given
  • A labeled set L, small or unrepresentative of D
  • Measured accuracy on the labeled set ≠ true accuracy on the data
  • A human labeler
• A compelling problem in many real-life applications

Our goal

1 Handle an arbitrary classifier, e.g. a user script
  • Existing work assumes that C(x) is probabilistic and can output well-calibrated Pr(y|x) values.
2 Provide interactive speed to the user in the loop, even when D is very large
  • Even a single sequential scan of D may not be practical.
  • D can be accessed only via an index.
3 Require the user to label as few additional instances as possible
  • Similar to active learning, but different: active learning is usually used in the context of learning classifiers, and learning a classifier is a different task from evaluating a given one.

Outline

The problem has two aspects:
1 Accuracy estimation (what this talk is about)
  • Given a fixed L, what is the best estimator of µ?
  • How can this be computed scalably on a large D?
2 Instance selection (not covered in this talk; details in the paper)
  • Select instances from D to be labeled by a human and added to L. Performed in a loop.

Accuracy estimation

• The simple averaged estimate µ̂_R = (1/n) ∑_{i∈L} a_i is poor when L is small or biased
• A classical solution: the stratified estimate
  • Stratify L and D into B buckets (L_1, D_1), ..., (L_B, D_B)
  • Measure the accuracy µ̂_b of L_b in each bucket b
  • Estimate the weight w_b as the fraction of D instances in bucket b: w_b = |D_b| / |D|
  • Stratified estimate: µ̂_S = ∑_b w_b µ̂_b
• Error of µ̂_S ≪ error of µ̂_R if instances within a bucket are homogeneous

Two challenges
1 Selecting a stratification strategy that puts instances with the same error in the same bucket
2 Finding w_b scalably when D is large ⇒ we cannot stratify the whole of D
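To make the two estimators concrete, here is a minimal Python sketch (ours, not the paper's code; the array names `acc`, `bucket_L`, and `bucket_counts_D` are illustrative). It computes µ̂_R and µ̂_S from the 0/1 accuracies of L and the bucket sizes of D.

```python
import numpy as np

def simple_estimate(acc):
    """mu_hat_R: plain average of the 0/1 accuracies a_i over the labeled set L."""
    return float(acc.mean())

def stratified_estimate(acc, bucket_L, bucket_counts_D):
    """mu_hat_S = sum_b w_b * mu_hat_b, with w_b = |D_b| / |D|."""
    B = len(bucket_counts_D)
    w = bucket_counts_D / bucket_counts_D.sum()          # bucket weights from D
    mu_b = np.array([acc[bucket_L == b].mean() if np.any(bucket_L == b) else 0.0
                     for b in range(B)])                 # per-bucket accuracy from L
    return float((w * mu_b).sum())

# Toy usage: 3 buckets, a small biased L, bucket sizes taken from a large D.
acc = np.array([1, 1, 0, 1, 0, 0, 1, 1])                 # a_i: was C(x) correct?
bucket_L = np.array([0, 0, 1, 1, 1, 2, 2, 2])            # h(f_i) for each i in L
bucket_counts_D = np.array([7e6, 4e6, 1e6])              # |D_b| from the unlabeled data
print(simple_estimate(acc), stratified_estimate(acc, bucket_L, bucket_counts_D))
```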

Stratification strategy (h)

• Existing approach: bin the Pr(y|x) values assumed to be provided by the classifier (Bennett and Carvalho, 2010; Druck and McCallum, 2011)
  • Not applicable here, since we wish to handle arbitrary classifiers.
• Our proposal:
  • F(x, C): a feature representation of the input x and of the result of deploying C on x
  • Learn a hash function h on these features using the labeled data L
  • The stratification evolves as more labeled instances are added to L (it is fixed in existing approaches)
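As one illustration of what F(x, C) could look like (our own construction, with a hypothetical classifier interface; the slide does not specify one), the representation might simply concatenate the raw features of x with whatever the black-box classifier returns on x:

```python
import numpy as np

def make_F(x, classifier):
    """Build F(x, C): raw input features plus the result of deploying C on x.

    `classifier` is treated as a black box; here we assume (hypothetically)
    that it returns a predicted label and an uncalibrated score per row of x.
    """
    label, score = classifier(x)                     # arbitrary user script C(x)
    return np.hstack([x, label[:, None], score[:, None]])
```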

Learning hyperplanes: Background

• Hash function: a concatenation of r bits
  • Bit k = sign(w_k · F(x)), i.e. the side of hyperplane w_k on which x lies
  • w_k: parameters to be learnt
• The learning problem is non-smooth and non-convex
• Existing work learns distance-preserving hash functions
  • The w_k parameters are learnt sequentially, smoothing the sign function through various tricks
• Our problem can be expressed as a special case of learning a distance-preserving hash function
  • We exploit the 0/1 nature of accuracy values rather than optimizing a black-box distance measure
  • This allows a more efficient algorithm
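Concretely, the bucketing can be computed as in the following sketch (illustrative, not the paper's code): each of the r hyperplanes contributes one sign bit, and the r bits are read as an integer bucket id.

```python
import numpy as np

def bucket_ids(F, W):
    """Hash each row of F into one of 2**r buckets.

    F : (n, d) feature matrix F(x, C) -- input features plus classifier outputs.
    W : (r, d) matrix whose rows are the learned hyperplanes w_k.
    Bit k of the bucket id is 1 iff w_k . F(x) >= 0.
    """
    bits = (F @ W.T >= 0).astype(int)        # (n, r) sign bits
    powers = 2 ** np.arange(W.shape[0])      # weight of each bit position
    return bits @ powers                     # integer bucket id per instance

# Toy usage: 5 instances, 3 features, r = 2 hyperplanes -> 4 buckets.
rng = np.random.default_rng(0)
F = rng.normal(size=(5, 3))
W = rng.normal(size=(2, 3))
print(bucket_ids(F, W))
```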

Learning hyperplanes: Our approach

• Main step: find the best value of w_k for a bit k, assuming the hyperplanes of the other bits are fixed
  • For each bucket formed from the remaining hyperplanes, arbitrarily choose which side of hyperplane w_k to call positive and which negative
  • Use a logistic loss to find the optimal w_k so that, within each bucket, w_k puts points on the "right" side
    • This can be solved optimally using a standard logistic classifier
  • Use an EM-like iteration to refine the side chosen as positive in each bucket, until convergence

Our experiments show that this method provides much better results than existing algorithms.
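A minimal sketch of this alternating step as we read it from the slide (illustrative names; scikit-learn's LogisticRegression stands in for the "standard logistic classifier"): fix the buckets induced by the other bits, pick a side per bucket, fit w_k under logistic loss, then let each bucket re-pick the side that w_k mostly agrees with, and iterate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_bit(F, acc, other_buckets, n_iter=10, seed=0):
    """Learn one hyperplane w_k, holding the other r-1 bits fixed.

    F             : (n, d) features F(x, C) of the labeled set L.
    acc           : (n,) 0/1 accuracy a_i of the classifier on each instance.
    other_buckets : (n,) bucket id induced by the remaining hyperplanes.
    Each bucket b chooses a side s_b; instance i's target is s_b XOR a_i,
    so within a bucket the correct and incorrect instances land on
    opposite sides of w_k.  Assumes acc contains both 0s and 1s.
    """
    rng = np.random.default_rng(seed)
    sides = {b: int(rng.integers(2)) for b in np.unique(other_buckets)}
    clf, w = LogisticRegression(), np.zeros(F.shape[1])
    for _ in range(n_iter):
        y = np.array([sides[b] ^ a for b, a in zip(other_buckets, acc)])
        if len(np.unique(y)) < 2:            # degenerate labeling; stop early
            break
        clf.fit(F, y)                        # logistic loss for the current sides
        w = clf.coef_.ravel()
        pred = clf.predict(F)
        # EM-like refinement: each bucket re-picks the side it mostly prefers.
        new = {b: int(np.mean(pred[other_buckets == b] ^ acc[other_buckets == b]) > 0.5)
               for b in sides}
        if new == sides:                     # converged
            break
        sides = new
    return w                                 # the learned w_k
```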

Estimating bucket weights without sequentially hashing D

• µ̂_S = ∑_b w_b µ̂_b = (1/N) ∑_{i∈D} µ̂_{h(f_i)}
• Proposal sampling: µ̂_Sq = (1/N) (1/m) ∑_{i∈Q} µ̂_{h(f_i)} / q(i)
• The optimal q(i) ∝ µ̂_{h(f_i)} is impossible without assigning each i ∈ D to a bucket of h(·)
• The allowed q(i) are those that assign the same probability to all instances i within an index partition D_u of D
• Claim: under the above restriction, q_u ∝ √(∑_b µ̂_b² p(b|u)), where p(b|u) is the fraction of i ∈ D_u with h(f_i) = b
• Estimating p(b|u): initially depends on the labeled data and a small static sample; the estimate is refined as more instances are sampled from D_u
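A minimal sketch of the two formulas above (ours, not the paper's code; it assumes each instance drawn from partition D_u carries sampling probability q(i) = q_u / |D_u|, so that the estimator stays unbiased for µ̂_S):

```python
import numpy as np

def partition_probs(mu_b, p_b_given_u):
    """q_u proportional to sqrt(sum_b mu_b^2 * p(b|u)), one value per partition u.

    mu_b        : (B,) per-bucket accuracy estimates.
    p_b_given_u : (U, B) estimated fraction of each partition D_u in bucket b.
    """
    q = np.sqrt(p_b_given_u @ (mu_b ** 2))
    return q / q.sum()

def proposal_estimate(mu_b, sampled_buckets, q_of_sample, N):
    """mu_hat_Sq = (1/N) * (1/m) * sum_i mu_hat_{h(f_i)} / q(i).

    sampled_buckets : (m,) int bucket id h(f_i) of each sampled instance.
    q_of_sample     : (m,) probability q(i) under which each sample was drawn,
                      i.e. q_u / |D_u| for its partition u.
    N               : |D|, the size of the unlabeled dataset.
    """
    m = len(sampled_buckets)
    return float(np.sum(mu_b[sampled_buckets] / q_of_sample) / (N * m))
```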

Results: Summary of datasets used

• TableAnnote: annotate columns of Web tables with type nodes of an ontology
• Spam: classify web pages as spam or not
• DNA: a binary DNA classification task
• HomeGround, HomePredicted: datasets of (entity, web page) instances; decide whether the web page is a homepage for the entity

Dataset        #Features  Size: Seed (L)  Size: Unlabeled (D)  Accuracy: Seed (L)  Accuracy: True (D)
TableAnnote    42         541             11,954,983           56.4%               16.5%
Spam           1000       5000            350,000              86.4%               93.2%
DNA            800        100,000         50,000,000           72.2%               77.9%
HomeGround     66         514             1060                 50.4%               32.8%
HomePredicted  66         8658            13,951,053           83.2%               93.9%

Results: Comparison of estimation strategies on the TableAnnote dataset

[Figure omitted: absolute error of each estimation strategy on TableAnnote against an increasing number of labeled instances.]

Random      Random sampling
PropSample  Sampling from a proposal distribution (Sawade et al., 2010)
ScoreBins   Stratified sampling with classifier scores

Results: Comparison of estimation strategies on the remaining datasets

[Figure omitted: absolute error (Y axis) of the ScoreBins, Random, Slogistic, and PropSample estimators against an increasing number of labeled instances (X axis), one panel each for Spam, DNA, HomeGround, and HomePredicted.]

Results: Comparison of different stratification methods on the TableAnnote dataset

[Figure omitted: square error of each stratification method on TableAnnote for 2, 5, and 7 hash bits at training sizes 250 and 541.]

NoStratify      Simple averaging (no stratification)
ScoreBins       Stratify using classifier scores
SeqProj_hash    Learn hyperplanes using the method of Wang et al. (2010)
BRE_hash        Learn hyperplanes using the method of Kulis and Darrell (2009)
Slogistic_hash  Learn hyperplanes using our method

Results: Comparison of different stratification methods on the remaining datasets

[Figure omitted: square error of NoStratify, ScoreBins, SeqProj_hash, BRE_hash, and Slogistic_hash against increasing training sizes and 2, 5, and 7 hash bits, one panel each for Spam, DNA, HomeGround, and HomePredicted.]

Results: Comparison of methods of sampling from indexed data for estimating bucket weights

[Figure omitted: difference from a full scan (Y axis) of our sampling method versus uniform sampling at increasing sample sizes, on TableAnnote, HomePredicted, and DNA.]

Summary

1 Addressed the challenge of calibrating a classifier's accuracy on large unlabeled datasets, given a small amount of labeled data and a human labeler
2 Proposed a stratified sampling-based method for accuracy estimation that provides better estimates than simple averaging and better selection of instances for labeling than random sampling
3 Achieved between 15% and 62% relative reduction in error compared to existing approaches
4 Made the algorithm scalable through optimal sampling strategies for directly accessing indexed unlabeled data
5 Obtained close-to-optimal performance while reading three orders of magnitude fewer instances on large datasets

Thank You

References

Bennett, P. N. and Carvalho, V. R. (2010). Online stratified sampling: evaluating classifiers at web-scale. In CIKM.
Druck, G. and McCallum, A. (2011). Toward interactive training and evaluation. In CIKM.
Kulis, B. and Darrell, T. (2009). Learning to hash with binary reconstructive embeddings. In NIPS.
Sawade, C., Landwehr, N., Bickel, S., and Scheffer, T. (2010). Active risk estimation. In ICML.
Wang, J., Kumar, S., and Chang, S. (2010). Sequential projection learning for hashing with compact codes. In ICML.