
Semi-automatic Incompatibility Localization for Re-engineered Industrial Software





Susumu Tokumoto†1, Kazunori Sakamoto†2, Kiyofumi Shimojo†3, Tadahiro Uehara†1, Hironori Washizaki†3

Fujitsu Laboratories Limited (†1), National Institute of Informatics (†2), Waseda University (†3)

April 1, 2014


How to Test the Compatibility of Re-engineered Software


Reengineering a Legacy System

A legacy system evolves through many fixes and new features:
↓ Efficiency, ↓ Reusability

⇒ Reengineering ⇒

↑ Efficiency, ↑ Reusability

Can the new system keep the specifications of the old one?

⇒ Compatibility testing!


How to Test the Compatibility of the Reengineered System

Basic idea: record inputs and outputs on the old system, then feed the same inputs to the new system and check that its outputs match (a minimal sketch of such a harness follows the diagram below).

[Diagram: the same inputs are replayed on both systems and the outputs are compared. For in=1, both the old and new systems produce out=1 (compatible); for in=2, the old system produces out=4 while the new system produces out=5 (incompatible).]
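Below is a minimal record-and-compare harness sketch in C (not from the original slides); old_ffs and new_ffs stand for the old and new implementations of the component, assumed to be linked in from the respective code bases, and the input list is illustrative.

    #include <stdio.h>

    /* Old and new implementations of the component under test,
     * assumed to be linked in from the old and new code bases. */
    int old_ffs(int word);
    int new_ffs(int word);

    int main(void) {
        /* Inputs recorded while exercising the old system (illustrative values). */
        int inputs[] = { 0, 1, 2, 16, -1 };
        int n = sizeof(inputs) / sizeof(inputs[0]);
        int incompatible = 0;

        for (int i = 0; i < n; i++) {
            int expected = old_ffs(inputs[i]);  /* output recorded on the old system */
            int actual   = new_ffs(inputs[i]);  /* output produced by the new system */
            if (expected != actual) {
                printf("Incompatible: in=%d, old out=%d, new out=%d\n",
                       inputs[i], expected, actual);
                incompatible++;
            }
        }
        return incompatible ? 1 : 0;
    }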

Automation with Symbolic Execution


Compatibility Testing using KLEE

Run a symbolic executor to generate test cases for a component, then replay these test cases against the new version of the component.

Original source code:

    int ffs(int word) {
        int i = 0;
        if (!word)
            return 0;
        for (;;)
            if (((1 << i++) & word) != 0)
                return i;
    }

    int main() {
        int x, r;
        klee_make_symbolic(&x, sizeof(x), "x");
        r = ffs(x);
        klee_expected(&r, sizeof(r), "r");
        return 0;
    }

New source code (contains a fault):

    int ffs(int i) {
        char n = 1;
        if (!(i & 0xfffe)) { n += 16; i >>= 16; }  /* fault: mask should be 0xffff */
        if (!(i & 0xff))   { n += 8;  i >>= 8;  }
        if (!(i & 0x0f))   { n += 4;  i >>= 4;  }
        if (!(i & 0x03))   { n += 2;  i >>= 2;  }
        return (i) ? (n + ((i + 1) & 0x01)) : 0;
    }

    int main() {
        int x, r;
        klee_make_symbolic(&x, sizeof(x), "x");
        r = ffs(x);
        klee_expected(&r, sizeof(r), "r");
        return 0;
    }

Generate test cases from the original code → Replay the test cases against the new code

Test results:

    Running test: klee-last/test000001.ktest
    KLEE: r is valid as expected.
    Running test: klee-last/test000002.ktest
    KLEE: ERROR: invalid expected value.
          (name=r, input=0x00000000, expected=0x00000001)

Test cases and expected values generated by KLEE from the original code, and the replay result on the new code:

    Test case   Expected value   Replay result
    x=0         r=0              compatible
    x=1         r=1              incompatible


As-Is: The source code of the server products' monitor is different from that of the storage systems; however, their SMTP libraries have similar features.

To-Be: Both SMTP libraries are unified.

Overview of Re-engineering Project

[Diagram: As-Is, the server products and the storage systems (one running on Linux, the other on VxWorks) each have their own HW Monitor Agent, HW Monitor Library, and SMTP library (written in C, 19 KLOC). To-Be, after re-engineering, both products share a Common SMTP Library (written in C, 13 KLOC) together with a Compatibility Layer and a Product Specific Layer.]
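As an illustration of the compatibility-layer idea, here is a minimal sketch (not the project's actual code) of a wrapper that hides an OS difference between Linux and VxWorks from the common library; the function name compat_sleep_ms is hypothetical, and the preprocessor macro used to detect VxWorks may vary by toolchain.

    /* Hypothetical compatibility-layer function: the common SMTP library calls
     * compat_sleep_ms() instead of an OS-specific delay API. */
    #ifdef __vxworks
    #include <taskLib.h>
    #include <sysLib.h>
    #else
    #include <unistd.h>
    #endif

    void compat_sleep_ms(int ms) {
    #ifdef __vxworks
        /* VxWorks: taskDelay() is expressed in clock ticks. */
        taskDelay((sysClkRateGet() * ms) / 1000);
    #else
        /* Linux: usleep() takes microseconds. */
        usleep((useconds_t)ms * 1000);
    #endif
    }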


How efficient is our approach compared with traditional testing?

Results of Automated Compatibility Testing

Results of traditional testing and our approach:

                           Traditional testing   Our approach
    Man-months             1.5                   4
    # of test cases        545                   10846
    # of detected bugs     27                    +5

We spent most of the 4 man-months attacking the path explosion problem. One example of reducing redundant paths is adding constraints on the symbolic values (see the sketch below).

Our approach found 5 more bugs that could not be detected by the traditional testing.

The test cases that found these bugs are characterized by combinations of parameters and SMTP sequences, which are corner cases that are hard to recognize manually.
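As an illustration, here is a minimal sketch (not the project's actual test driver) of adding constraints on a symbolic value with KLEE's klee_assume(); the parameter name smtp_port and the 0–65535 range are illustrative assumptions.

    #include <klee/klee.h>

    int main(void) {
        int smtp_port;   /* hypothetical parameter of the component under test */
        klee_make_symbolic(&smtp_port, sizeof(smtp_port), "smtp_port");

        /* Constrain the symbolic value to the range the library accepts, so that
         * KLEE does not enumerate redundant paths for out-of-range inputs. */
        klee_assume(smtp_port >= 0);
        klee_assume(smtp_port <= 65535);

        /* ... call the component under test with smtp_port ... */
        return 0;
    }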


Depending on the situation, developers sometimes trade some incompatibility for improved usability, performance, and quality.

Issue: Investigating the cause of failures

Examples of allowable incompatibilities:

Original:

• TCP port can be specified as any non-negative number.

• Uses the blocking socket API.

• Checks the partial size even if partial messages are disabled.

Re-engineered:

• TCP port can be specified from 0 to 65535.

• Uses the non-blocking socket API and select().

• Doesn't check the partial size if partial messages are disabled.


Issue: Investigating the cause of failures

Erroneous incompatibilities:

• can be fixed

• all failed test cases that have the same cause turn into successes accordingly

Allowable incompatibilities:

• cannot be fixed

• failed test cases that have the same cause cannot be changed to successes

[Diagram: failed test cases are grouped by cause; the group whose shared cause is erroneous can be turned into successes by a fix, while the group whose shared cause is allowable cannot be changed to successes.]

We should check whether each failed test case is allowable without fixing it. In our case there are 4002 failed test cases to be checked.

Process of debugging:

1. Investigate the cause of the failure.
2. If it is an error, fix it.


How to Detect Incompatibilities


    int mid(int x, int y, int z) {
    S1      int m = z;
    S2      if (y < z) {
    S3          if (x < y)
    S4              m = y;
    S5          else if (x < z)
    S6              m = y;      /* fault: should be m = x */
    S7      } else {
    S8          if (x > y)
    S9              m = y;
    S10         else if (x > z)
    S11             m = x;
            }
    S12     return m;
    }

Spectrum-based Bug Localization

Uses the coverage and execution results of the test cases to compute the suspiciousness of each statement.

Tarantula's suspiciousness:

    $Susp(S_n) = \dfrac{Fail(S_n)/Fail(All)}{Fail(S_n)/Fail(All) + Pass(S_n)/Pass(All)}$

Ochiai's suspiciousness:

    $Susp(S_n) = \dfrac{Fail(S_n)}{\sqrt{Fail(All) \times (Fail(S_n) + Pass(S_n))}}$

Example (Tarantula): $Susp(S_6) = \dfrac{1/1}{1/1 + 1/5} = 0.833$

[Table: six test cases for mid() with inputs (x, y, z) = (3,3,5), (1,2,3), (3,2,1), (5,5,5), (5,3,4), (2,1,3); the first five pass (P) and the last fails (F). From the per-statement coverage of these tests, the Tarantula suspiciousness values are: S1 50.0%, S2 50.0%, S3 62.5%, S4 0%, S5 71.4%, S6 83.3%, S7–S11 0%, S12 50.0%; the faulty statement S6 ranks highest.]
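For concreteness, here is a small self-contained C sketch (not the authors' OCCF tool) that computes the Tarantula suspiciousness values for the mid() example above; the coverage matrix is hard-coded from the six test cases shown in the table.

    #include <stdio.h>

    #define NUM_STMTS 12
    #define NUM_TESTS 6

    int main(void) {
        /* covered[s][t] == 1 iff statement S(s+1) is executed by test case t. */
        const int covered[NUM_STMTS][NUM_TESTS] = {
            {1,1,1,1,1,1},  /* S1  int m = z;        */
            {1,1,1,1,1,1},  /* S2  if (y < z)        */
            {1,1,0,0,1,1},  /* S3  if (x < y)        */
            {0,1,0,0,0,0},  /* S4  m = y;            */
            {1,0,0,0,1,1},  /* S5  else if (x < z)   */
            {1,0,0,0,0,1},  /* S6  m = y;  (fault)   */
            {0,0,1,1,0,0},  /* S7  else              */
            {0,0,1,1,0,0},  /* S8  if (x > y)        */
            {0,0,1,0,0,0},  /* S9  m = y;            */
            {0,0,0,1,0,0},  /* S10 else if (x > z)   */
            {0,0,0,0,0,0},  /* S11 m = x;            */
            {1,1,1,1,1,1},  /* S12 return m;         */
        };
        const int failed[NUM_TESTS] = {0, 0, 0, 0, 0, 1};  /* only the last test fails */

        int fail_all = 0, pass_all = 0;
        for (int t = 0; t < NUM_TESTS; t++) {
            if (failed[t]) fail_all++; else pass_all++;
        }

        for (int s = 0; s < NUM_STMTS; s++) {
            int fail_s = 0, pass_s = 0;
            for (int t = 0; t < NUM_TESTS; t++) {
                if (covered[s][t]) {
                    if (failed[t]) fail_s++; else pass_s++;
                }
            }
            double fr = (double)fail_s / fail_all;   /* ratio of failing tests covering S */
            double pr = (double)pass_s / pass_all;   /* ratio of passing tests covering S */
            double susp = (fr + pr > 0.0) ? fr / (fr + pr) : 0.0;
            printf("S%-2d susp = %.3f\n", s + 1, susp);
        }
        return 0;
    }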


Details of Applying Bug Localization

Target source code:

• Written in C

• 13 KLOC (3560 executable statements)

• 4003 failed test cases out of 10876 test cases

• Statement coverage: 86.3%

• 9 causes of incompatibility were identified beforehand by a combination of other methods

Bug localization tool:

• OCCF (Open Code Coverage Framework) by Sakamoto et al.

• Easy to switch the suspiciousness formula (Tarantula, Ochiai, Jaccard, Russell, etc.)

Method of detecting the causes of incompatibility:

• Search for causes starting from the most suspicious lines


Histogram of Statements (Tarantula)

[Histogram: statements binned by Tarantula suspiciousness from 0 to 1 in steps of 0.1; for each bin, the number of statements (0–1000), together with the cumulative % of statements and the cumulative % of detected causes.]

90% of the causes are detected in the top 10% of statements.


Histogram of Statements (Ochiai)

[Histogram: statements binned by Ochiai suspiciousness from 0 to 1 in steps of 0.1; for each bin, the number of statements (0–1100), together with the cumulative % of statements and the cumulative % of detected causes.]

44% of the causes are detected in the top 10% of statements.


Comparison of Tarantula and Ochiai

In this application, Tarantula is superior to Ochiai.

This application contained multiple causes, so for most statements Fail(Sn) is much smaller than Fail(All).

Tarantula's suspiciousness is calculated from ratios of passing/failing test cases:

• If Pass(Sn) is 0, Susp(Sn) is 1 even though Fail(Sn) is much smaller than Fail(All).

Ochiai's suspiciousness is calculated from absolute (non-ratio) values:

• Even if Pass(Sn) is 0, the suspiciousness is not 1, because Fail(Sn) is much smaller than Fail(All).

Tarantula's suspiciousness (calculated from ratios):

    $Susp(S_n) = \dfrac{Fail(S_n)/Fail(All)}{Fail(S_n)/Fail(All) + Pass(S_n)/Pass(All)}$

    (ratio of failed test cases covering $S_n$ and ratio of passed test cases covering $S_n$)

Ochiai's suspiciousness (calculated from absolute counts):

    $Susp(S_n) = \dfrac{Fail(S_n)}{\sqrt{Fail(All) \times (Fail(S_n) + Pass(S_n))}}$

    If $Fail(S_n) \ll Fail(All)$, the suspiciousness becomes small.
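A small worked example (with illustrative numbers, not taken from the slides) of how the two formulas diverge when $Pass(S_n) = 0$ but $Fail(S_n)$ is much smaller than $Fail(All)$:

    Suppose $Fail(S_n) = 2$, $Pass(S_n) = 0$, $Fail(All) = 100$, $Pass(All) = 400$.

    Tarantula: $Susp(S_n) = \dfrac{2/100}{2/100 + 0/400} = 1.0$

    Ochiai: $Susp(S_n) = \dfrac{2}{\sqrt{100 \times (2 + 0)}} = \dfrac{2}{\sqrt{200}} \approx 0.14$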


Why are there low Susp. values?

List of incompatible points:

    ID  File name       Line      Susp.
    1   dir_c/src01.c   524-535   0.978
    2   dir_c/src01.c   443-445   0.527
    3   dir_c/src01.c   507-513   1.000
    4   dir_r/src07.c   292-296   0.538
    5   dir_r/src03.c   312       0.686
    6   dir_r/src06.c   216       0.477
    7   dir_r/src06.c   197       0.962
    8   dir_r/src07.c   216-234   1.000
    9   dir_r/src10.c   266       1.000

Example of a cause with a low suspiciousness value:

Before reengineering:

    if (Partial_size >= PART_SIZE_MIN && Partial_size <= PART_SIZE_MAX) {
        /* ... */
    } else {
        return -1;
    }

After reengineering:

    switch (isPart) {
    case 0:
        /* no partial size check */
        break;
    case 1:
        if (!((Partial_size == 0) ||
              (Partial_size >= PART_SIZE_MIN && Partial_size <= PART_SIZE_MAX))) {
            /* partial size error */
            return -1;
        }
        break;
    default:
        /* invalid isPart value */
        return -1;
    }

Because the relevant code is missing in the re-engineered version, no suspicious lines of code can be found for this cause.


Discussion

Does symbolic execution really help compatibility testing?

• We obtained five more bugs but spent 4 man-months.

• The most important effect of automated compatibility testing is detecting edge-case bugs, which would cost more to find by manual testing.

• The original project didn't have an automated test suite, so the generated test cases become a testing asset.

Is the spectrum-based bug localization technique appropriate for finding incompatibilities?

• In our case, the localized 10% of the code equals about 200 executable statements, which is a realistic amount to inspect manually.

• Trying other techniques, such as model-based debugging, is a candidate for future work.


Conclusion

We presented compatibility testing using symbolic execution and showed how to apply it to a re-engineering project.

• 5 more bugs were detected by our testing method.

We applied bug localization to search for the causes of incompatibility in system re-engineering.

• 90% of the incompatibilities were localized in 10% of the code by Tarantula.
