Page 1: Memories of Bug Fixes

Memories of Bug-Fixes

Sunghun Kim, Kai Pan, Jim Whitehead
{hunkim, pankai, ejw}@cs.ucsc.edu

University of California, Santa Cruz

Page 2: Memories of Bug Fixes

What is a bug (Zeller 2006)?

• This pointer, being null, is a bug
  ► An incorrect program state

• This software crashes; this is a bug
  ► An incorrect program execution

• This line 11 is buggy
  ► An incorrect program code

Page 3: Memories of Bug Fixes

Bugs?

// null dereference
public void nullDeref() {
    MyObject o = null;
    if (isGoodDay) {
        o = new MyObject("Hi");
    }
    System.out.println(o.toString());
}

Page 5: Memories of Bug Fixes

Bugs?

// stack buffer overrun for sizes greater than 14
void stack_buffer(void* src, int size) {
    char buffer[14];
    memcpy(buffer, src, size);
}

Page 7: Memories of Bug Fixes

Bugs?

if (…) {

setSelectedText("\t");

}

Page 8: Memories of Bug Fixes

Project-Specific Bug Fix Patterns

• There are many bug fix patterns that are specific to an individual project and may not match one of the static patterns

• Example from the jEdit project:

JEditTextArea.java at transaction 114
- setSelectedText("\t");
+ insertTab();

JEditTextArea.java at transaction 86
- setSelectedText("\t");
+ insertTab();

Page 9: Memories of Bug Fixes

Bug?

if (requiredProjectRsc.exists() &&

requiredProjectRsc.isOpen()) {

}

Page 10: Memories of Bug Fixes

Project-Specific Bug Fix Patterns

• Example from the Eclipse project:

JavaProject.java, transaction 2024 (“Fix for bug 28434”)
- if (requiredProjectRsc.exists() &&
-     requiredProjectRsc.isOpen()) {
+ if (JavaProject.hasJavaNature(requiredProjectRsc)) {

DeltaProcessor.java, transaction 1945 (“Fix for bug 27499”)
- boolean isOpened=proj.isOpen();
- if (isOpened && this.hasJavaNature(proj))
+ if (JavaProject.hasJavaNature(proj))

Page 11: Memories of Bug Fixes

Horizontal and Vertical Bug Patterns

• Horizontal: general bugs
  ► Buffer overrun
  ► Null dereference

• Vertical: project specific
  ► jEdit example
  ► Eclipse example

Page 13: Memories of Bug Fixes

Bug-Fix Memories – Basic Idea

• Extract patterns in bug fix change history
  ► Bug fix changes in revision 1 .. n-1 are stored in the Memory

• Search for patterns against Memory
  ► Code to examine is matched against the Memory

Page 14: Memories of Bug Fixes

Talk Overview

• Detection of bug fix changes
• Mining vertical bugs
  ► Abstracting code
• Evaluation
• Conclusions
• Future Work

Page 15: Memories of Bug Fixes

Retrieving Bug Fix Changes

• Software projects today record their development history using Software Configuration Management tools

• As developers make changes, they record a reason along with the change
  ► In the change log message

• When developers fix a bug in the software, they tend to record log messages with some variation of the words “fixed” or “bug”
  ► “Fixed null pointer bug”

• It is possible to mine the change history of a software project to uncover these bug-fix changes

• That is, we retrospectively recover those changes that developers have marked as containing a bug fix
  ► We assume they are not lying
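The keyword-based retrieval described above could be sketched roughly as follows. This is a minimal illustration, not the authors' tool: the regular expression, the function names, and the sample history (with made-up transaction numbers) are all assumptions; the slides only say that log messages contain "some variation of" the words "fixed" or "bug".

```python
import re

# Assumed keyword pattern; the slides do not give the exact expression.
BUG_FIX_PATTERN = re.compile(r"\b(fix(e[sd]|ing)?|bugs?)\b", re.IGNORECASE)


def is_bug_fix(log_message):
    """Return True if the change log message looks like a bug fix."""
    return BUG_FIX_PATTERN.search(log_message) is not None


def bug_fix_changes(history):
    """Keep the (transaction, message) pairs whose message marks a bug fix."""
    return [(txn, msg) for txn, msg in history if is_bug_fix(msg)]


# Hypothetical change history; transaction numbers are invented.
history = [
    (1, "Added tab insertion option"),
    (2, "Fixed null pointer bug"),
    (3, "Fix for bug 28434"),
    (4, "Renamed prefix field"),
]
fixes = bug_fix_changes(history)
```

Word boundaries keep words such as "prefix" from matching. A real miner would likely also consult bug tracker identifiers, which this sketch ignores.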

Page 16: Memories of Bug Fixes

Hunks, and Hunk Pairs

Revision n-1 (has bug hunks), Revision n (has fix hunks)

Hunk pair types:
• modification: deleted hunk and added hunk
• addition: empty deleted hunk and added hunk
• deletion: deleted hunk and empty added hunk
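One way to obtain such hunk pairs is a line-based diff between the two revisions. The sketch below uses Python's standard `difflib` as a stand-in for whatever diff tool the authors used (an assumption); the function name `hunk_pairs` is hypothetical.

```python
import difflib


def hunk_pairs(old_lines, new_lines):
    """Pair deleted (bug) hunks of revision n-1 with added (fix) hunks of revision n."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    kinds = {"replace": "modification", "insert": "addition", "delete": "deletion"}
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines belong to no hunk
        # For an addition the deleted hunk is empty; for a deletion the added hunk is empty.
        pairs.append((kinds[tag], old_lines[i1:i2], new_lines[j1:j2]))
    return pairs


old = ["if (…) {", '    setSelectedText("\\t");', "}"]
new = ["if (…) {", "    insertTab();", "}"]
pairs = hunk_pairs(old, new)
```

For the jEdit change this yields a single modification pair: the `setSelectedText` line as the bug hunk and the `insertTab` line as the fix hunk.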

Page 17: Memories of Bug Fixes

Detecting Vertical Bugs (Patterns)

• Detecting bug patterns
  ► Saving exact code in bug and fix hunks doesn’t work, since there is rarely an exact match.
  ► Need a method for abstracting changes to find patterns

• Approach
  ► Abstract code in each bug fix change
  ► Save abstracted bug and fix code in a database (the “bug fix memory”)
  ► Can search existing code to see if it matches a bug fix pattern
  ► Can suggest code to fix the bug

Page 18: Memories of Bug Fixes

Process for Abstracting Code

• Four step process
  ► Raw component extraction
    • Parse source code, and burst out individual syntactic elements
  ► Normalization
    • Substitute type names for variables, string literals, constants (abstract to types)
  ► Information filtering
    • Remove elements that are too common to yield project-specific patterns
  ► Diff filtering
    • Remove code components that are common in bug and fix hunks, yielding only code unique to the change

Page 19: Memories of Bug Fixes

Raw Component Extraction

• Step 1: Convert statements inside change hunks so they lie on a single line
  ► Eliminate whitespace
  ► Concatenate multi-line statements to one line
  ► Concatenate conditionals for complex statements (if, while, etc.) to one line

• Step 2: Extract raw components
  ► Component is a non-leaf node in the syntax tree of a single line
  ► Bursts out complex statements into constituent parts
    • Each portion of a complex conditional is a separate component
  ► Additionally, separate out a method call and its parameters

Page 20: Memories of Bug Fixes

Raw Component Extraction Example

• Initial code

if (foo.flag > 5 && foo.ready()) {
    i=1;
    foo.create("example");
    initiate(6,bar);
}

• Extracted Raw Components

foo.flag
foo.flag > 5
foo.ready()
ready()
foo.flag > 5 && foo.ready()
if (foo.flag > 5 && foo.ready())
i=1
"example"
foo.create(.) "example"
create(.) "example"
initiate(,) 6, bar

[Figure: syntax tree of the if condition, with nodes if, &&, >, ., foo, flag, 5, ready()]
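The idea of taking every non-leaf syntax tree node as a component can be illustrated with Python's own `ast` module. This is only an analogy under stated assumptions: the authors parse Java, whereas the sketch parses a Python rendering of the same condition (`and` in place of `&&`), and it does not reproduce the separate method-call and parameter components listed above. The function name `raw_components` is hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def raw_components(expression):
    """Return the source text of every non-leaf expression node."""
    tree = ast.parse(expression, mode="eval")
    components = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.expr):
            continue  # skip operators, contexts, and the Expression wrapper
        has_subexpression = any(
            isinstance(child, ast.expr) for child in ast.iter_child_nodes(node)
        )
        if has_subexpression:  # leaves such as `foo` and `5` are not components
            components.append(ast.unparse(node))
    return components


# Python rendering of the slide's condition: foo.flag > 5 && foo.ready()
components = raw_components("foo.flag > 5 and foo.ready()")
```

The result contains the whole condition, each side of the conjunction, and the member accesses, while bare leaves such as `foo` and `5` are left out.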

Page 21: Memories of Bug Fixes

Normalization

• To further improve the ability to match code, perform abstraction of instances to types
  ► Replace variable instance with its type
    • Permits matching on type, rather than instance
    • foo.flag >= 5 → Foo.flag >= 5 (type of foo is Foo)
  ► For literals, insert new component with type
    • i=1 yields int=1 and int=int
  ► For method calls, replace each parameter with type of parameter
    • Use “*” for unknown types (we only do one-pass parse)
    • initiate(,) 6, bar → initiate(,) int,* (type of bar is unknown)
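The normalization rules above might be approximated on component strings like this. It is a rough token-level sketch, not the authors' parser-based implementation: the `normalize` function, its `var_types` symbol table, and the `String` type name for string literals are assumptions, and language keywords are not handled.

```python
import re

# String literals first, so identifiers inside quotes are never matched.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_]\w*|\d+')


def normalize(component, var_types, literals=True):
    """Replace variables (and optionally literals) in a component by type names.

    var_types maps variable names to types; unknown variables become "*".
    Names after "." or before "(" are field or method names and are kept.
    """
    def replace(match):
        token = match.group(0)
        if token[0].isdigit():
            return "int" if literals else token
        if token[0] == '"':
            return "String" if literals else token
        before = component[:match.start()].rstrip()
        after = component[match.end():].lstrip()
        if before.endswith(".") or after.startswith("("):
            return token
        return var_types.get(token, "*")

    return TOKEN.sub(replace, component)


by_type = normalize("foo.flag >= 5", {"foo": "Foo"}, literals=False)
typed_assignment = normalize("i=1", {"i": "int"})
call = normalize("initiate(,) 6, bar", {})
```

With `literals=False` only variables are replaced, which gives the intermediate `int=1` form for `i=1`; with the default both steps are applied and the result is `int=int`.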

Page 22: Memories of Bug Fixes

Information Filtering Goal

• After normalization, resulting components are candidates for insertion into database

► Problem: many commonly occurring statement types
  • int=int

► Want to eliminate these, and others that don’t contribute unique information about bug fixes
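The slides do not say how "too common" is decided. One plausible reading, shown here purely as an assumption, is a frequency cutoff: drop any component that occurs in more than a given share of the bug fix changes. The function name `filter_common` and the `max_share` parameter are hypothetical.

```python
from collections import Counter


def filter_common(changes, max_share=0.5):
    """Drop components that occur in more than max_share of all changes.

    changes is a list of component sets, one set per bug fix change.
    The cutoff is an assumed heuristic, not taken from the slides.
    """
    counts = Counter(component for change in changes for component in change)
    limit = max_share * len(changes)
    return [
        {component for component in change if counts[component] <= limit}
        for change in changes
    ]


changes = [
    {"int=int", "Foo.flag >= int"},
    {"int=int", "insertTab()"},
    {"int=int"},
]
filtered = filter_common(changes)
```

Here `int=int` appears in every change and is removed, while the rarer components survive.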

Page 23: Memories of Bug Fixes

Diff Filtering and Storing Memories

• As a final filtering step, keep only those components that are unique to either bug or fix hunks

► Duplicate components are eliminated, since they do not represent the bug or its fix

• After diff filtering step, store all components into the database (“memory”)

► Components record their transaction, file name, bug or fix hunk, etc.

► Also store initial source code of bug and fix hunks
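Diff filtering amounts to a set difference in both directions. The sketch below assumes components are plain strings and uses an in-memory list in place of the database; `MemoryRecord`, `diff_filter`, and `store` are hypothetical names, the component strings are illustrative, and storing the original hunk source code is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRecord:
    transaction: int
    file_name: str
    hunk: str        # "bug" or "fix"
    component: str


def diff_filter(bug_components, fix_components):
    """Keep only components unique to the bug hunk or to the fix hunk."""
    bug, fix = set(bug_components), set(fix_components)
    return bug - fix, fix - bug


def store(memory, transaction, file_name, bug_components, fix_components):
    """Append the diff-filtered components of one change to the memory."""
    bug_only, fix_only = diff_filter(bug_components, fix_components)
    for component in sorted(bug_only):
        memory.append(MemoryRecord(transaction, file_name, "bug", component))
    for component in sorted(fix_only):
        memory.append(MemoryRecord(transaction, file_name, "fix", component))
    return memory


# Illustrative components for the jEdit change at transaction 114.
memory = store(
    [], 114, "JEditTextArea.java",
    ["if (*)", "setSelectedText(.) String"],
    ["if (*)", "insertTab()"],
)
```

The shared `if (*)` component is dropped because it appears on both sides and so says nothing about the bug or its fix.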

Page 24: Memories of Bug Fixes

Searching the Memory

• The memory database contains extracted adaptive bug and fix patterns for a given project

• Can use this memory to find code that matches bug code in the memory

• Use scenario
  ► Developer working in their favorite development environment
  ► Receives feedback when code they are developing matches a stored bug pattern
  ► Can also suggest potential fixes from stored bug fix code
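A lookup of this kind could work as sketched below. The matching rule (at least one shared bug component) is an assumption, since the slides do not define when code "matches" a stored pattern; the entry layout and the name `suggest_fixes` are likewise hypothetical.

```python
def suggest_fixes(memory, components):
    """Return stored fix code for each memory entry whose bug components overlap.

    Assumed matching rule: at least one component in common.
    """
    found = set(components)
    return [entry["fix_source"] for entry in memory if entry["bug"] & found]


# Hypothetical memory with one entry derived from the jEdit example.
memory = [
    {"bug": {"setSelectedText(.) String"}, "fix_source": "insertTab();"},
]

# Components abstracted from the code the developer is editing.
suggestions = suggest_fixes(memory, ["setSelectedText(.) String", "int=int"])
```

Code that still calls `setSelectedText` with a string matches the stored bug pattern, so the stored fix `insertTab();` is offered as a suggestion.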

Page 25: Memories of Bug Fixes

IDE Integration

• Bug detection

• Fix suggestion

Page 26: Memories of Bug Fixes

Evaluation

• We evaluated the memory to determine how well it captures new bug fix changes

► Online learning approach
► Specifically, we create a memory for transactions 1 to n-1
► At transaction n, for bug fix changes we examine whether the bug hunks are found in the memory
  • This is a “half hit”
► If found, we also examine whether the fix hunk is found too
  • This is a “full hit”
► Examined same 5 project histories
  • ArgoUML, Columba, Eclipse, jEdit, Scarab

• This can be viewed as a proxy for how well the approach might work for bug and fix prediction
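The online evaluation loop can be sketched as follows. Several details are assumptions rather than facts from the slides: a hunk counts as "found" when it shares at least one component with a memory entry, a full hit is also counted as a half hit, and the component strings in the example are invented for illustration.

```python
def classify(memory, bug, fix):
    """Return "full", "half", or None for one fix change against the memory."""
    result = None
    for mem_bug, mem_fix in memory:
        if mem_bug & bug:          # bug hunk found: at least a half hit
            if mem_fix & fix:      # fix hunk found too: full hit
                return "full"
            result = "half"
    return result


def online_hit_rates(fix_changes):
    """Check each fix change against the memory built from all earlier ones."""
    memory, half, full = [], 0, 0
    for bug, fix in fix_changes:
        hit = classify(memory, bug, fix)
        if hit is not None:
            half += 1              # assumed: a full hit also counts as a half hit
        if hit == "full":
            full += 1
        memory.append((bug, fix))  # transaction n joins the memory afterwards
    total = len(fix_changes)
    return (half / total, full / total) if total else (0.0, 0.0)


fix_changes = [
    ({"setSelectedText(.) String"}, {"insertTab()"}),
    ({"setSelectedText(.) String"}, {"insertTab()"}),
    ({"setSelectedText(.) String"}, {"other()"}),
    ({"x"}, {"y"}),
]
half_rate, full_rate = online_hit_rates(fix_changes)
```

In this toy history the second change is a full hit and the third only a half hit, so two of four changes are half hits and one of four is a full hit.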

Page 27: Memories of Bug Fixes

Half and Full Hit

Build memories based on transaction 1 .. n-1

[Figure: memories (bug | fix) built from transactions 1 .. n-1; a fix change case at transaction n is matched against them for a half hit (bug hunk found) or a full hit (fix hunk found too)]

Page 28: Memories of Bug Fixes

True and False Positives

Build memories based on transaction 1 .. n-1

• Fix change case at transaction n: true positive half hit, if found

• Non-fix change case at transaction n: false positive half hit, if found

Page 29: Memories of Bug Fixes

True Positive Hit Rates

[Chart: True Positive Hit Rate. Y-axis: Hit Rate; x-axis: Projects (ArgoUML, Columba, Eclipse, jEdit, Scarab); series: Full hit, Half hit]

Page 30: Memories of Bug Fixes

False Positive Hit Rates

[Chart: False Positive Hit Rate. Y-axis: Hit Rate; x-axis: Projects (ArgoUML, Columba, Eclipse, jEdit, Scarab); series: Full hit, Half hit]

Page 31: Memories of Bug Fixes

True Positive and False Positive Full Hit Rates

[Chart: Y-axis: Hit Rate; x-axis: Projects (ArgoUML, Columba, Eclipse, jEdit, Scarab); series: TP full hit, FP full hit]

Page 32: Memories of Bug Fixes

True Positive and False Positive Full Hit Rates

• Bug fix memories work well
  ► Captures 19.3%-40.3% of bugs (half-hits)
  ► But, also captures a lot of non-bug changes (20.8%-32.5%)

Page 35: Memories of Bug Fixes

PMD VS Fix Memories

• PMD is a bug finding tool based on a static syntax checker

Bug

PMD

Fix Memories

Page 37: Memories of Bug Fixes

PMD VS Fix Memories

• PMD is a bug finding tool based on a static syntax checker

• Bugs found by PMD and Fix Memories are largely exclusive

[Figure: overlap of bugs found by PMD and by Fix Memories. ArgoUML: 40.3%, 6.5%, 3%. Eclipse: 38.7%, 6.5%, 2.3%]

Page 38: Memories of Bug Fixes

Conclusions

• It is now possible to reliably extract bug fix memories from software project evolution data

• Bug fix memories work well
  ► Captures 19.3%-40.3% of bugs (half-hits)
  ► But, also captures a lot of non-bug changes (20.8%-32.5%)

• Bugs found using fix memories and PMD are mostly exclusive
  ► Our approach complements other bug finding tools

Page 39: Memories of Bug Fixes

Future Work

• Developing other pattern extracting algorithms
  ► To remove false positives
  ► AST, Slicing, Control flow, etc.

• Comparing fix memories with more bug finding tools
  ► FindBugs, JLint, etc.