Patrick Lam Lectures


Software Testing, Quality Assurance and Maintenance Winter 2015

Lecture 2, January 7, 2015. Patrick Lam, version 1

Faults, Errors, and Failures

For this course, we are going to define the following terminology.

Fault (also known as a bug): A static defect in software: incorrect lines of code.

Error: An incorrect internal state, not necessarily observed yet.

Failure: External, incorrect behaviour with respect to the expected behaviour; it must be visible (e.g. EPIC FAIL).

These terms are not used consistently in the literature. Don't get stuck on memorizing them.

Motivating Example. Here's a train-tracks analogy.

(all railroad pictures inspired by: Bernd Bruegge & Allen H. Dutoit, Object-Oriented Software Engineering: Using UML, Patterns and Java.)

Is it a failure? An error? A fault? Clearly, it's not right. But no failure has occurred yet; there is no behaviour. I'd also say that nothing analogous to execution has occurred yet either. If there were a train on the tracks, pre-derailment, then there would be an error. That picture most closely corresponds to a fault.


Perhaps it was caused by mechanical stresses.

Or maybe it was caused by poor design.

Software-related Example. Let's get back to software and consider the code from last time:

    public static int numZero(int[] x) {
        // effects: if x is null, throw NullPointerException
        //          otherwise, return number of occurrences of 0 in x.
        int count = 0;
        for (int i = 1; i < x.length; i++) {
            // program point (*)
            if (x[i] == 0) count++;
        }
        return count;
    }

As we saw, it has a fault (independent of whether it is executed or not): it's supposed to return the number of 0s, but it doesn't always do so. We define the state for this method to be the variables x, i, count, and the Program Counter (PC).

Feeding this numZero the input {2, 7, 0} shows a wrong state.

The wrong state is as follows: x = {2, 7, 0}, i = 1, count = 0, PC = (*), on the first time around the loop.

The expected state is: x = {2, 7, 0}, i = 0, count = 0, PC = (*).

However, running numZero on {2, 7, 0} executes the fault and causes a (transient) error state, but doesn't result in a failure, as the output value count is 1 as expected.

On the other hand, running numZero on {0, 2, 7} causes an error state with count = 0 on return, hence leading to a failure.
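To see the two runs side by side, here is a minimal sketch: a Python port of the faulty numZero (the port and the helper expected_num_zero are illustrative assumptions, not course code). The fault is kept on purpose.

```python
# Python port of the faulty numZero; the fault (loop starts at 1) is kept.
def num_zero(x):
    if x is None:
        # stands in for Java's NullPointerException
        raise TypeError("x is null")
    count = 0
    for i in range(1, len(x)):  # fault: should be range(0, len(x))
        if x[i] == 0:
            count += 1
    return count


# Hypothetical oracle: counts every 0, including the one at index 0.
def expected_num_zero(x):
    return sum(1 for v in x if v == 0)


for xs in ([2, 7, 0], [0, 2, 7]):
    actual, expected = num_zero(xs), expected_num_zero(xs)
    verdict = "failure" if actual != expected else "no failure"
    print(xs, "actual:", actual, "expected:", expected, "->", verdict)
```

Both inputs execute the fault and skip index 0, but only the second one lets the wrong state reach the return value.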


RIP Fault Model

To get from a fault to a failure:

1. Fault must be reachable;

2. Program state subsequent to reaching fault must be incorrect: infection; and

3. Infected state must propagate to output to cause a visible failure.

Applications of the RIP model: automatic generation of test data, mutation testing.

Dealing with Faults, Errors and Failures

Three strategies for dealing with faults are avoidance, detection and tolerance. Or, you can just try to declare that the fault is not a bug, if the specification is ambiguous.

Fault Avoidance. Certain faults can just be avoided by not programming in vulnerable languages; buffer overflows, for instance, are impossible in Java. Better system design can also help avoid faults, for instance by making an error state unreachable.

Fault Detection. Testing (construed broadly) is the primary means of fault detection. Software verification also qualifies. Once you have detected a fault, if it is economically viable, you might repair it.

Fault Tolerance. You are never going to remove all of the bugs, and some errors arise from conditions beyond your control (such as hardware faults). It's worthwhile to tolerate faults too. Strategies include redundancy and isolation. An example of redundancy is provisioning extra hardware in case a server goes down. Isolation includes things as simple as checking preconditions.


Testing vs Debugging

Recall from last time:

Testing: evaluating software by observing its execution.

Debugging: finding (and fixing) a fault given a failure.

I said that you really need to automate your tests. But even so, testing is still hard: only certain inputs expose the fault in the form of a failure. As you've experienced, debugging is hard too: you have the failure, but you have to find the fault.

Contrived example. Consider the following code:if ( x - 100





(We saw the numZero example today. That example is in the L02 lecture notes.)

Exercise: findLast. Here's a faulty program from last time, again.

    1 static public int findLast(int[] x, int y) {
    2   /* bug: loop test should be i >= 0 */
    3   for (int i = x.length - 1; i > 0; i--) {
    4     if (x[i] == y) {
    5       return i;
    6     }
    7   }
    8   return -1;
    9 }

On the assignment, you will have a question like this:

(a) Identify the fault, and fix it.

(b) If possible, identify a test case that does not execute the fault.

(c) If possible, identify a test case that executes the fault, but does not result in an error state.

(d) If possible, identify a test case that results in an error, but not a failure. (Hint: PC)

(e) For the given test case, identify the first error state. Be sure to describe the complete state.

I asked you to work on that question in class. (I noticed that most people were on-topic, thanks!) Here are some answers:

(a) The loop condition must include index 0: i >= 0.

(b) This is a bit of a trick question. To avoid the loop condition, you must never reach the loop test. The only way to do that is to pass in x = null, so that evaluating x.length throws before the test runs. You should also state, for instance, y = 3. The expected output is a NullPointerException, which is also the actual output.

(c) Inputs where y appears in the second or later position execute the fault but do not result in the error state; nor do inputs where x is an empty array. (There may be other inputs as well.) So, a concrete input: x = {2, 3, 5}; y = 3. Expected output = actual output = 1.

(d) One error input that does not lead to a failure is where y does not occur in x. That results in an incorrect PC after the final executed iteration of the loop.


(e) After running line 6 with i = 1, the decrement occurs, followed by the evaluation of i > 0, causing the PC to exit the loop (statement 8) instead of returning to statement 4. The faulty state is x = {2, 3, 5}; y = 3; i = 0; PC = 8, while the correct state would be PC = 4.

Someone asked about distinguishing errors from failures. These questions were about failures at the method level, and so a wrong return value would be a failure when we're asking about methods. In the context of a bigger program, that return value might not be visible to the user, and so it might not constitute a failure, just an error.
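The answers to (c) and (d) can be checked at the method level with a small sketch. This is a Python port of findLast (an illustration, not course code); find_last_fixed applies the fix from (a), and the last input, with y at index 0, is an assumed example of an input that does fail.

```python
def find_last(x, y):
    # Faulty port: the loop test should be i >= 0.
    i = len(x) - 1  # raises TypeError for None, like the NullPointerException in Java
    while i > 0:
        if x[i] == y:
            return i
        i -= 1
    return -1


def find_last_fixed(x, y):
    # Same loop with the corrected test from answer (a).
    i = len(x) - 1
    while i >= 0:
        if x[i] == y:
            return i
        i -= 1
    return -1


cases = [
    ([2, 3, 5], 3),  # (c): fault executed, no error state
    ([2, 3, 5], 7),  # (d): wrong PC at loop exit, but same return value
    ([2, 3, 5], 2),  # assumed example: y only at index 0, so the outputs differ
]
for x, y in cases:
    print(x, y, "faulty:", find_last(x, y), "fixed:", find_last_fixed(x, y))
```

Only the return values are compared here; the error state in (d) lives in the program counter and is invisible to this kind of check, which is exactly why it is an error and not a failure.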

Line Intersections

We then talked about different ways of validating a test suite. Consider the following Python code, found by Michael Thiessen on Stack Overflow (http://stackoverflow.com/questions/306316/determine-if-two-rectangles-overlap-each-other).

    class LineSegment:
        def __init__(self, x1, x2):
            self.x1 = x1
            self.x2 = x2

    def intersect(a, b):
        return (a.x1 < b.x2) and (a.x2 > b.x1)

We could construct test suites that:

• execute every statement in intersect (node coverage, we'll see later). Well, that's not very useful; any old test case will do that. There are no branches, so what we'll call edge coverage doesn't help either.

• feed random inputs to intersect; unfortunately, interesting behaviours are not random, so it won't help much in general.

• check all outputs of intersect (i.e. a test case with lines that intersect and one with lines that don't intersect): we're getting somewhere. That will certify that the method works in some cases, but it's easy to think of situations that we missed.

• check different values of clauses a.x1 < b.x2 and a.x2 > b.x1 (logic coverage): better than the above coverage criteria, but still misses interesting behaviours;

• analyze possible inputs and cover all interesting combinations of inputs (input space coverage): can create an exhaustive test suite, if your analysis is sound.

Let's try to prove correctness of intersect. There's an old saying about testing: supposedly, it can only find the presence of bugs, not their absence. This is not completely true, especially if you have reliable software that automatically constructs an exhaustive test suite. But that is beyond the state of the practice in 2015, for the most part.

[Below is a cleaner presentation than the one in class. It also has the advantage of being not wrong.]


Inputs to intersect. There are essentially four inputs to this function. Rename them a, A, b, B, for a.x1, a.x2, b.x1, b.x2.

Let's first assume that all points are distinct. We should make a note to ourselves to check violations of this, as well: we may have a = A, a = b, a = B, and symmetrically for b: b = a, b = A, b = B.

For the purpose of the analysis, let's assume that a < b; when constructing test cases, we can swap a and b around. That's why there are duplicate assert statements below.

Without loss of generality, we can assume that a < A and b < B. (We ought to update the constructor if we want to make that assumption.)
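Under the assumptions made so far (distinct endpoints, a < A, b < B, a < b), a short enumeration shows which orderings of the four endpoints remain possible. This is only a sanity-check sketch; the single-letter names follow the renaming above.

```python
from itertools import permutations

orderings = []
for perm in permutations("aAbB"):
    # pos[name] is the position of that endpoint on the number line, left to right
    pos = {name: k for k, name in enumerate(perm)}
    if pos["a"] < pos["A"] and pos["b"] < pos["B"] and pos["a"] < pos["b"]:
        orderings.append("".join(perm))

print(orderings)
```

Of the 24 orderings of four distinct points, only those with a leftmost and b before B survive the filter.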

With these assumptions, we have to test the three possible permutations aAbB, abAB, and abBA. It is simple to construct test cases for these permutations, using Python's unittest framework:

    import unittest

    class TestIntersect(unittest.TestCase):
        # a and b are disjoint: a lies entirely to the left of b
        def test_aAbB(self):
            a = LineSegment(0, 2)
            b = LineSegment(3, 7)
            self.assertFalse(intersect(a, b))
            self.assertFalse(intersect(b, a))

        # a and b partially overlap
        def test_abAB(self):
            a = LineSegment(0, 4)
            b = LineSegment(3, 7)
            self.assertTrue(intersect(a, b))
            self.assertTrue(intersect(b, a))

        # b lies entirely inside a
        def test_abBA(self):
            a = LineSegment(0, 4)
            b = LineSegment(1, 2)
            self.assertTrue(intersect(a, b))
            self.assertTrue(intersect(b, a))

Those test cases pass. However, if you construct test cases for equality (as I've committed to the repository), you see that the given intersect function fails on line segments that intersect only at a point. Replacing < with <= and > with >= fixes the code.
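The equality case can be sketched as follows. The repository's actual tests are not reproduced here; the name intersect_fixed and the concrete segments are assumptions chosen for illustration.

```python
class LineSegment:
    def __init__(self, x1, x2):
        self.x1 = x1
        self.x2 = x2


def intersect(a, b):
    # comparison from the lecture: strict inequalities
    return (a.x1 < b.x2) and (a.x2 > b.x1)


def intersect_fixed(a, b):
    # non-strict inequalities, so segments sharing one endpoint count as intersecting
    return (a.x1 <= b.x2) and (a.x2 >= b.x1)


# Two segments that meet only at the point 2.
touching = (LineSegment(0, 2), LineSegment(2, 5))
print("strict:", intersect(*touching), "non-strict:", intersect_fixed(*touching))
```

The three permutation tests give the same verdicts with either version; only the boundary inputs tell the two apart.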




About Testing

We can look at testing statically or dynamically.

Static Testing (ahead-of-time): this includes static analysis, which is typically automated and runs at compile time (or, say, nightly), as well as human-driven static testing: walk-throughs (informal) and code inspection (formal).

Dynamic Testing (at run-time): observe program behaviour by executing it; includes black-box testing and white-box testing.

Usually the word "testing" means "dynamic testing".

Naughty words. People like to talk about "complete testing", "exhaustive testing", and "full coverage". However, for many systems, the number of potential inputs is infinite. It's therefore impossible to completely test a nontrivial system, i.e. run it on all possible inputs. There are both practical limitations (time and cost) and theoretical limitations (e.g. the halting problem).

In the absence of complete testing, we will define testing criteria and evaluate test suites with them.

Test cases

Informally, a test case contains:

• what you feed to software; and

• what the software should output in response.

Here are two definitions to help evaluate how hard it might be to create test cases.

Definition 1 Observability is how easy it is to observe the system's behaviour, e.g. its outputs, effects on the environment, hardware and software.

Definition 2 Controllability is how easy it is to provide the system with needed inputs and to get the system into the right state.


Anatomy of a Test Case

Consider testing a cellphone from the off state:

    input             role
    on                prefix values
    1 519 888 4567    test case values
    talk              verification values
    end               exit codes

(talk and end together form the postfix values.)

Definition 3

Test Case Values: input values necessary to complete some execution of the software (often called the test case itself).

Expected Results: result to be produced iff program satisfies intended behaviour on a test case.

Prefix Values: inputs to prepare software for test case values.

Postfix Values: inputs for software after test case values:

• verification values: inputs to show results of test case values;

• exit commands: inputs to terminate program or to return it to initial state.

Definition 4

Test Case: test case values, expected results, prefix values, and postfix values necessary to evaluate software under test.

Test Set: set of test cases.

Executable Test Script: test case prepared in a form to be executable automatically and which generates a report.

On Coverage

Ideally, we'd run the program on the whole input space and find bugs. Unfortunately, such a plan is usually infeasible: there are too many potential inputs.

Key Idea: Coverage. Find a reduced space and cover that space.

We hope that covering the reduced space is going to be more exhaustive than arbitrarily creating test cases. It at least tells us when we can plausibly stop testing.

The following definition helps us evaluate coverage.

Definition 5 A test requirement is a specific element of a (software) artifact that a test case must satisfy or cover.


We write TR for a set of test requirements; a test set may cover a set of TRs.

For instance, consider three ice cream cone flavours: vanilla, chocolate and mint. A possible test requirement would be to test one chocolate cone. (Volunteers?)

Two software examples:

• cover all decisions in a program (branch coverage); each decision gives two test requirements: branch is true; branch is false.

• each method must be called at least once; each method gives one test requirement.

Definition 6 A coverage criterion is a rule or collection of rules that impose test requirements on a test set.

A test set may or may not satisfy a coverage criterion. The coverage criterion gives a recipe for generating TRs systematically.

Returning to the ice cream example, a flavour criterion might be "cover all flavours", and that would generate three TRs: {flavour: chocolate, flavour: vanilla, flavour: mint}.

We can test an ice cream stand by running two test sets on it, for instance: test set 1 includes 3 chocolate cones and 1 vanilla cone, while test set 2 includes 1 chocolate cone, 1 vanilla cone, and 1 mint cone.

Definition 7 (Coverage). Given a set of test requirements TR for a coverage criterion C, a test set T satisfies C iff for every test requirement $tr \in TR$, at least one $t \in T$ satisfies $tr$.

Infeasible Test Requirements. Sometimes, no test case will satisfy a test requirement. For instance, dead code can make statement coverage infeasible, e.g.:

    if (false)
        unreachableCall();

or, a real example from the Linux kernel:

    while (0)
        {local_irq_disable();}

Hence, a criterion which says "test every statement" is going to be infeasible for many programs.

Quantifying Coverage. How good is a test set? It's great if it covers everything, but sometimes that's impossible. We can instead assign a number.

Definition 8 (Coverage Level). Given a set of test requirements TR and a test set T, the coverage level is the ratio of the number of test requirements satisfied by T to the size of TR.

Returning to our example, say TR = {flavour: chocolate, flavour: vanilla, flavour: mint}, and test set T1 contains {3 chocolate, 1 vanilla}, then the coverage level is 2/3 or about 67%.
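The coverage-level arithmetic for the ice cream example can be written out as a small sketch. The function names and the choice to model a cone by its flavour string are assumptions made for illustration.

```python
from fractions import Fraction


def coverage_level(tr, test_set):
    # A flavour requirement is satisfied when some cone in the test set has that flavour.
    satisfied = tr & set(test_set)
    return Fraction(len(satisfied), len(tr))


def satisfies_criterion(tr, test_set):
    # Definition of coverage: every requirement is satisfied by at least one test.
    return coverage_level(tr, test_set) == 1


TR = {"chocolate", "vanilla", "mint"}
T1 = ["chocolate", "chocolate", "chocolate", "vanilla"]
T2 = ["chocolate", "vanilla", "mint"]

print("T1:", coverage_level(TR, T1), satisfies_criterion(TR, T1))
print("T2:", coverage_level(TR, T2), satisfies_criterion(TR, T2))
```

Note that the three chocolate cones in T1 count once: the level measures requirements met, not tests run.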


Subsumption

Sometimes one coverage criterion is strictly more powerful than another one: any test set that satisfies $C_1$ might automatically satisfy $C_2$.

Definition 9 Criteria subsumption: coverage criterion $C_1$ subsumes $C_2$ iff every test set that satisfies $C_1$ also satisfies $C_2$.

Software example: branch coverage (Edge Coverage) subsumes statement coverage (Node Coverage). Which is stronger?

Evaluating coverage criteria. Subsumption is a rough guide for comparing criteria, but it's hard to use in practice. Consider also:

1. difficulty of generating test requirements;

2. difficulty of generating tests;

3. how well tests reveal faults.




Graph Coverage

We will discuss graph coverage in great detail. Many forms of software testing reduce to graph coverage, so once you understand how graph coverage works, you will have a good understanding of a key software testing topic.

Definition of Graphs. (You should already be familiar with this material, but let's establish the terms we'll use in this class.) Let's start with an example graph.

[Figure: example graph with nodes A, B, C, D, E.]

• $N$: set of nodes, $\{A, B, C, D, E\}$

• $N_0$: set of initial nodes, $\{A\}$

• $N_f$: set of final nodes, $\{E\}$

• $E \subseteq N \times N$: edges, e.g. $(A, B)$ and $(C, E)$; $C$ is the predecessor and $E$ is the successor in $(C, E)$.

Subgraph: Let $G'$ be a subgraph of $G$; then the nodes of $G'$ must be a subset $N_{sub}$ of $N$. Then the initial nodes of $G'$ are $N_0 \cap N_{sub}$ and its final nodes are $N_f \cap N_{sub}$. The edges of $G'$ are $E \cap (N_{sub} \times N_{sub})$.

For example, consider the case where we set $N_{sub} = \{A, B, E\}$. This induces the subgraph:

[Figure: the subgraph induced on nodes A, B, E.]

Note that graphs need not be connected.

Paths. The most important thing about a graph, for testing purposes, is the path through the graph. Here is a graph G# and some example paths through the graph.


[Figure: graph G# with nodes 1 to 7.]

• path 1: [2, 3, 5], with length 2.

• path 2: [1, 2, 3, 5, 6, 2], with length 5.

• not a path: [1, 2, 5].

We say that path 1 is from node 2 to node 5. We can also say that path 1 is from edge (2, 3) to (3, 5).

Definition 1 A path is a sequence of nodes from a graph G whose adjacent pairs all belong to the set of edges E of G.

Note that length 0 paths are still paths.

Definition 2 A subpath is a subsequence of a path.

This textbook definition is ambiguous: is [1, 2, 5] a subpath of [1, 2, 3, 5, 6, 2]?

Definition 3 A subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.

Yes, [1, 2, 5] is a subsequence of [1, 2, 3, 5, 6, 2]. However, [1, 2, 5] is not a path, so we are going to say that it is not a subpath.
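The three notions can be told apart mechanically. The sketch below uses only the edges that the example paths above reveal (the full edge set of G# is not reproduced here), and it assumes that "subpath" means a contiguous run of the longer path, which is stricter than the subsequence wording and agrees with the verdict on [1, 2, 5].

```python
# Edges of G# that appear in path 1 and path 2 above; the rest of G# is unknown here.
EDGES = {(1, 2), (2, 3), (3, 5), (5, 6), (6, 2)}


def is_path(nodes, edges=EDGES):
    # Every adjacent pair must be an edge; a single node is a path of length 0.
    return len(nodes) >= 1 and all((u, v) in edges for u, v in zip(nodes, nodes[1:]))


def is_subsequence(small, big):
    # Delete elements from big without reordering: each item must be found in turn.
    it = iter(big)
    return all(x in it for x in small)


def is_subpath(small, big):
    # Assumed reading: small occurs as a contiguous slice of big.
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


long_path = [1, 2, 3, 5, 6, 2]
print(is_path([1, 2, 5]), is_subsequence([1, 2, 5], long_path), is_subpath([1, 2, 5], long_path))
```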

Test cases and test paths.

Some paths are also test paths. Here is a test path from G#:

    [1, 2, 3, 5, 6, 7];

here is another one:

    [1, 2, 3, 5, 6, 2, 3, 5, 6, 7].

You can easily come up with more paths. Now, test paths are linked to test cases. First, let's define the notion of a test path.


Definition 4 A test path is a path $p$ (possibly of length 0) that starts at some node in $N_0$ and ends at some node in $N_f$.

Running the test case on the program or method yields one or more test paths. A test path may represent many test cases (for instance, if a program takes the same branches on all of those test cases); or a test path may represent no test cases (if it is infeasible).

Paths and semantics. When a graph is a program's control-flow graph, some of the paths in the graph may not correspond to program semantics. Consider the following graph.

[Figure: control-flow graph with a branch on if (false).]

Clearly, the node guarded by the if (false) branch will never execute.

In this course, we will generally only talk about the syntax of a graph (its nodes and edges) and not its semantics.

However, in the following definition, we'll talk about both notions.

Definition 5 A node $n$ (or edge $e$) is syntactically reachable from $n_i$ if there exists a path from $n_i$ to $n$ (or $e$). A node $n$ (or edge $e$) is semantically reachable if one of the paths from $n_i$ to $n$ can be reached on some input.

Standard graph algorithms, like breadth-first search and depth-first search, can compute syntactic reachability. (Semantic reachability is undecidable; no algorithm can precisely compute semantic reachability for all programs.)

We define $\mathrm{reach}_G(x)$ as the portion of the graph syntactically reachable from $x$. ($x$ might be a node, an edge, a set of nodes, or a set of edges.) For example:

• $\mathrm{reach}_G(N_0)$ is the set of nodes and edges reachable from the initial node(s);

• $\mathrm{reach}_{G_\#}(2)$ in the above graph G# is $\{2, 3, 4, 5, 6, 7\}$;

• $\mathrm{reach}_{G_\#}(7)$ is $\{7\}$.

Note that we include $x$ in the set of nodes reachable from $x$, because paths of length 0 are paths.

When we talk about the nodes or edges in a graph $G$ in a coverage criterion, we'll generally mean $\mathrm{reach}_G(N_0)$; the unreachable nodes tend to (1) be uninteresting; and (2) frustrate coverage criteria.
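Syntactic reachability is a plain breadth-first search. The sketch below computes only the node part of reach (edges are omitted for brevity), and the graph GRAPH is a hypothetical example, not one from the lecture; its node X is deliberately unreachable from the initial node.

```python
from collections import deque


def reach(successors, start_nodes):
    # Breadth-first search; start nodes are included because length-0 paths are paths.
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical graph: A is the initial node, X has no incoming edges.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": [], "X": ["D"]}

print(sorted(reach(GRAPH, ["A"])))
print(sorted(reach(GRAPH, ["D"])))
```

A coverage criterion stated over reach from the initial nodes would simply never mention X.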




Some binary distinctions

Let's digress for a bit and define some older terms which we won't use much in this course, but which we should discuss briefly.

Black-box testing. Deriving tests from external descriptions of software: specifications, requirements, designs; anything but the code.

White-box testing. Deriving tests from the source code, e.g. branches, conditions, statements.

Our model-based approach makes this distinction less important.

Test paths and cases

We resume our graph coverage content with the following definition:

Definition 1 A graph is single-entry/single-exit (SESE) if $N_0$ and $N_f$ have exactly one element each. $N_f$ must be reachable from every node in $N$, and no node in $N \setminus N_f$ may be reachable from $N_f$, unless $N_0 = N_f$.

The graphs that we'll be talking about in this course will almost always be SESE.

Here's another example of a graph, which happens to be SESE, and test paths in that graph. We'll call this graph D, for double-diamond, and it'll come up a few times.

[Figure: the double-diamond graph D, with nodes $n_0$ to $n_6$.]


Here are the four test paths in D:

    [n0, n1, n3, n4, n6]
    [n0, n1, n3, n5, n6]
    [n0, n2, n3, n4, n6]
    [n0, n2, n3, n5, n6]

We next focus on the path $p = [n_0, n_1, n_3, n_4, n_6]$ and use it to explain several path-related definitions. We can say that $p$ visits node $n_3$ and edge $(n_0, n_1)$; we can write $n_3 \in p$ and $(n_0, n_1) \in p$ respectively.

Let $p' = [n_1, n_3, n_4]$. Then $p'$ is a subpath of $p$.

Test cases and test paths. We connect test cases and test paths with a mapping $\mathrm{path}_G$ from test cases to test paths; e.g. $\mathrm{path}_G(t)$ is the set of test paths corresponding to test case $t$.

• usually we just write path since G is obvious from the context.

• we can lift the definition of path to test sets $T$ by defining $\mathrm{path}(T) = \{\mathrm{path}(t) \mid t \in T\}$.

• each test case gives at least one test path. If the software is deterministic, then each test case gives exactly one test path; otherwise, multiple test paths may arise from one test case.

Here's an example of deterministic and nondeterministic control-flow graphs:

[Figure: two control-flow graphs. The deterministic one branches on if (x > 3); the nondeterministic one branches on if (x.hashCode() > 3). The figure's other nodes are labelled x = 5 and x = 0.]

Causes of nondeterminism include dependence on inputs; on the thread scheduler; and on memory addresses, for instance as seen in calls to the default Java hashCode() implementation.

Nondeterminism makes it hard to check test case output, since more than one output might be a valid result of a single test input.

Indirection. Note that we will describe coverage criteria with respect to test paths, but we always run test cases.


Example. Here is a short method, the associated control-flow graph, and some test cases and test paths.

    int foo(int x) {
        if (x < 5) {
            x++;
        } else {
            x--;
        }
        return x;
    }

[Figure: control-flow graph with nodes (1) if (x < 5), (2) x++, (3) x--, (4) return x. The T edge goes from (1) to (2) and the F edge from (1) to (3); both (2) and (3) lead to (4).]

• Test case: x = 5; test path: [(1), (3), (4)].

• Test case: x = 2; test path: [(1), (2), (4)].

Note that (1) we can deduce properties of the test case from the test path; and (2) in this example, since our method is deterministic, the test case determines the test path.
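One way to obtain the test path for a test case is to instrument the method so that it records each CFG node it visits. The following is a sketch in Python rather than Java; the function name foo_with_trace and the node numbering (taken from the figure) are the only additions.

```python
def foo_with_trace(x):
    path = [1]              # (1) if (x < 5)
    if x < 5:
        path.append(2)      # (2) x++
        x += 1
    else:
        path.append(3)      # (3) x--
        x -= 1
    path.append(4)          # (4) return x
    return x, path


for test_case in (5, 2):
    result, test_path = foo_with_trace(test_case)
    print("x =", test_case, "returns", result, "test path", test_path)
```

Because foo is deterministic, calling it twice with the same input always records the same path.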

Graph Coverage

Having defined all of the graph notions we'll need for now, we apply them to graphs. Recall our previous definition of coverage:

Definition 2 Given a set of test requirements TR for a coverage criterion C, a test set T satisfies C iff for every test requirement tr in TR, at least one t in T exists such that t satisfies tr.

We apply this definition to graph coverage:

Definition 3 Given a set of test requirements TR for a graph criterion C, a test set T satisfies C on graph G iff for every test requirement tr in TR, at least one test path p in path(T) exists such that p satisfies tr.

We'll use this notion to define a number of standard testing coverage criteria. (At this point, the textbook defines predicates, but mostly ignores them afterwards. I'll just ignore them right away.)

Recall the double-diamond graph D which we saw earlier. For the node coverage criterion, we get the following test requirements:

    {n0, n1, n2, n3, n4, n5, n6}

That is, any test set T which satisfies node coverage on D must include test cases t; the cases t give rise to test paths path(t), and some path must include each node from $n_0$ to $n_6$. (No single path must include all of these nodes; the requirement applies to the set of test paths.)

Let's formally define node coverage.


Definition 4 Node coverage: For each node $n \in \mathrm{reach}_G(N_0)$, TR contains a requirement to visit node $n$.

We will state all of the coverage criteria in the following form:

Criterion 1 Node Coverage (NC): TR contains each reachable node in G.

We can then write TR = {n0, n1, n2, n3, n4, n5, n6}.

Let's consider an example of a test set which satisfies node coverage on D, the double-diamond graph from last time.

Start with a test case $t_1$; assume that executing $t_1$ gives the test path

    path(t1) = p1 = [n0, n1, n3, n4, n6].

Then test set $\{t_1\}$ does not give node coverage on D, because no test case covers node $n_2$ or $n_5$. If we can find a test case $t_2$ with test path

    path(t2) = p2 = [n0, n2, n3, n5, n6],

then the test set $T = \{t_1, t_2\}$ satisfies node coverage on D.

What is another test set which satisfies node coverage on D?

Here is a more verbose definition of node coverage.

Definition 5 Test set T satisfies node coverage on graph G if and only if for every syntactically reachable node $n \in N$, there is some path p in path(T) such that p visits n.
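The verbose definition translates almost word for word into a check. In this sketch the edge set of D is read off the four test paths listed earlier, every node of D is reachable so the node set itself serves as TR, and the function name is an assumption.

```python
# Double-diamond graph D, reconstructed from its four test paths.
D_EDGES = {
    ("n0", "n1"), ("n0", "n2"), ("n1", "n3"), ("n2", "n3"),
    ("n3", "n4"), ("n3", "n5"), ("n4", "n6"), ("n5", "n6"),
}
D_NODES = {node for edge in D_EDGES for node in edge}


def satisfies_node_coverage(test_paths, nodes):
    # Every required node must be visited by at least one test path.
    visited = {node for path in test_paths for node in path}
    return nodes <= visited


p1 = ["n0", "n1", "n3", "n4", "n6"]
p2 = ["n0", "n2", "n3", "n5", "n6"]

print(satisfies_node_coverage([p1], D_NODES))
print(satisfies_node_coverage([p1, p2], D_NODES))
```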

A second standard criterion is that of edge coverage.

Criterion 2 Edge Coverage (EC). TR contains each reachable path of length up to 1, inclusive, in G.

We describe edge coverage this way so that, as far as possible, new criteria in a series will subsume previous criteria.

Here are some examples of paths of length ≤ 1:

Note that since we're not talking about test paths, these reachable paths need not start in $N_0$.

In general, paths of length ≤ 1 consist of nodes and edges. (Why not just say edges?)

Saying "edges" on the above graph would not be the same as saying "paths of length ≤ 1".


Here is a more involved example:

[Figure: graph with nodes $n_0$, $n_1$, $n_2$; the edges out of $n_0$ are labelled x < y and x ≥ y.]

Let's define

    path(t1) = [n0, n1, n2]
    path(t2) = [n0, n2]

Then

    T1 =            satisfies node coverage
    T2 =            satisfies edge coverage
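Candidate test sets for this example can be checked mechanically. The sketch assumes the graph has exactly the three edges that the two test paths traverse, and treats edge coverage as covering every node and every edge (paths of length up to 1); the helper names are illustrative.

```python
NODES = {"n0", "n1", "n2"}
# Assumed edge set: the edges used by path(t1) and path(t2).
EDGES = {("n0", "n1"), ("n1", "n2"), ("n0", "n2")}


def edges_of(path):
    return set(zip(path, path[1:]))


def satisfies_nc(paths):
    return NODES <= {node for p in paths for node in p}


def satisfies_ec(paths):
    covered = set()
    for p in paths:
        covered |= edges_of(p)
    # Paths of length up to 1: all nodes and all edges.
    return satisfies_nc(paths) and EDGES <= covered


path_t1 = ["n0", "n1", "n2"]
path_t2 = ["n0", "n2"]

print("{t1}:     NC", satisfies_nc([path_t1]), "EC", satisfies_ec([path_t1]))
print("{t1, t2}: NC", satisfies_nc([path_t1, path_t2]), "EC", satisfies_ec([path_t1, path_t2]))
```

The single path through n1 visits every node yet misses the direct edge from n0 to n2, which is the gap between the two criteria.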

Going beyond 1. So far we've seen length 0 (node coverage) and length 1 (edge coverage). Of course, we can go to lengths 2, etc., but we quickly get diminishing returns. Here is the criterion for length 2.

Criterion 3 Edge-Pair Coverage (EPC). TR contains each reachable path of length up to 2, inclusive, in G.

Here's an example.

[Figure: graph with nodes 1 to 6.]

• nodes:

• edges:

• paths of length 2:




Further properties of paths

We've seen so far length-0 (NC), length-1 (EC) and length-2 (EPC) paths. We could keep on going, but that gets us to Complete Path Coverage (CPC), which requires an infinite number of test requirements whenever the graph contains a loop.

Criterion 1 Complete Path Coverage (CPC). TR contains all paths in G.

Note that CPC is impossible to achieve for graphs with loops.

Instead of going to CPC, we would like to capture the essence of all loops. To do so, we first set up a few definitions:

Definition 1 A path is simple if no node appears more than once in the path, except that the first and last nodes may be the same.

In the graphs above, some simple paths are:

but not:

Some properties of simple paths:

• no internal loops;

• can bound their length;

• can create any path by composing simple paths; and

• many simple paths exist (too many!)

Because there are so many simple paths, let's instead consider prime paths, which are simple paths of maximal length.


For instance, in the following graph:

[Figure: graph with nodes 1 to 6.]

Simple paths:

Prime paths:

Definition 2 A path is prime if it is simple and does not appear as a proper subpath of any other simple path.

Criterion 2 Prime Path Coverage (PPC). TR contains each prime path in G.

There is a problem with using PPC as a coverage criterion: a prime path may be infeasible but contain feasible simple paths.

Example:

One could replace infeasible prime paths in TR with feasible subpaths, but we won't bother.

Beyond CFGs. Let's now move beyond control-flow graphs and think about a different type of graph. The next few graphs represent finite state machines rather than control-flow graphs. Our motivation will be to set up criteria that visit round trips in cyclic graphs.

[Figure: finite state machine with states $n_1$, $n_2$, $n_3$ and transitions open, close, read.]

or perhaps

[Figure: finite state machine with states $q_1$ to $q_7$ and transitions socket, bind, listen, accept, send, closeS, closeC.]


Prime paths apply to both CFGs and other graphs. The next criteria are mostly not for CFGs.

Definition 3 A round trip path is a prime path of nonzero length that starts and ends at the same node.

Criterion 3 Simple Round Trip Coverage (SRTC). TR contains at least one round-trip path for each reachable node in G that begins and ends a round-trip path.

Criterion 4 Complete Round Trip Coverage (CRTC). TR contains all round-trip paths for each reachable node in G.

Exercise: Computing Prime Paths. Last time, we saw this graph:

[Figure: the graph with nodes 1 to 7 seen earlier.]

I recommend computing all of the simple paths for this example. (There are 53 simple paths, with length up to 5, and 12 prime paths.) Hint: write out simple paths of length up to N. To get the simple paths of length N + 1, start with the paths of length N and extend the ones that are extendable; the non-extendable paths are prime, so mark them as you go along.

Here's another graph:

[Figure: graph with nodes 0 to 6.]

This graph has 38 simple paths and 9 prime paths:

    [0, 1, 2, 3, 6], [0, 1, 2, 4, 6], [0, 2, 3, 6], [0, 2, 4, 6], [0, 1, 2, 3, 5],
    [0, 2, 3, 5], [5, 3, 6], [3, 5, 3], [5, 3, 5].

To compute the prime paths, we could also enumerate all of the simple paths and note non-extendable paths and cycles (which are both prime). In doing so, one would also get all of the information necessary to write out the test requirements for NC, EC and EPC. Since there is a loop, it is impossible to explicitly write out all test requirements for CPC, but one could write out some of them.
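The enumeration can be sketched in a few lines. The successor map SUCC below is an assumption: it contains exactly the edges that occur in the nine prime paths listed above, since the figure itself is not reproduced here. Simple paths are grown one edge at a time, as in the hint, and prime paths are then filtered by the definition (not a proper subpath of any other simple path).

```python
# Edges inferred from the nine listed prime paths (assumed to be the whole graph).
SUCC = {0: [1, 2], 1: [2], 2: [3, 4], 3: [5, 6], 4: [6], 5: [3], 6: []}


def simple_paths(succ):
    paths = []
    frontier = [(n,) for n in succ]  # length-0 paths
    while frontier:
        paths.extend(frontier)
        longer = []
        for p in frontier:
            if len(p) > 1 and p[0] == p[-1]:
                continue  # a cycle cannot be extended and stay simple
            for s in succ[p[-1]]:
                # allow a new node, or a return to the first node (closing a cycle)
                if s not in p or s == p[0]:
                    longer.append(p + (s,))
        frontier = longer
    return paths


def is_proper_subpath(small, big):
    n = len(small)
    return n < len(big) and any(big[i:i + n] == small for i in range(len(big) - n + 1))


def prime_paths(succ):
    simple = simple_paths(succ)
    return [p for p in simple if not any(is_proper_subpath(p, q) for q in simple)]


print(len(simple_paths(SUCC)), "simple paths")
for p in prime_paths(SUCC):
    print(list(p))
```

The quadratic subpath filter is fine for graphs of this size; the mark-as-you-go approach from the hint avoids it on larger graphs.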




Another coverage criterion that I've mentioned, but is not in the notes:

Criterion 1 Specified Path Coverage (SPC). TR contains a specified set S of paths.

Specified path coverage might be useful for encoding a set of usage scenarios.

Prime Path Coverage versus Complete Path Coverage.

[Figure: graph with nodes $n_0$, $n_1$, $n_2$, $n_3$.]

• Prime paths:

• path(t1) =

• path(t2) =

$T_1 = \{t_1, t_2\}$ satisfies both PPC and CPC.

[Figure: graph with nodes $q_0$, $q_1$, $q_2$, $q_3$, $q_4$.]

• Prime paths:

• path(t3) =

• path(t4) =

$T_2 = \{t_3, t_4\}$ satisfies PPC but not CPC.

Now, we'll see another graph theory-inspired coverage criterion. First, some definitions:


Definition 1 A graph G is connected if every node in G is reachable from the set of initial nodes $N_0$.

Definition 2 An edge e is a bridge for a graph G if G is connected, but removing e from G results in a disconnected graph.

(This is similar to an articulation point in a graph, except that articulation points are nodes, not edges.)

Criterion 2 Bridge Coverage (BC). TR contains all bridges.

Assume that a graph contains at least two nodes, and all nodes in a graph are reachable from the initial nodes. Does bridge coverage subsume node coverage? Justify your answer, providing a counterexample if appropriate.

Specifying versus meeting test requirements. Consider this graph.

[Figure: graph with nodes $n_0$, $n_1$, $n_2$, $n_3$.]

The following simple (and loop-free) path is, in fact, prime:

    p =

PPC includes this path as a test requirement. The test path

meets the test requirement induced by p even though it is not prime. Note that a test path may satisfy the prime path test requirement even though it is not prime.

Graph Coverage Exercise. Consider the graph defined by the following sets.

• $N = \{1, 2, 3, 4, 5, 6, 7\}$

• $N_0 = \{1\}$

• $N_f = \{7\}$

• $E = \{[1, 2], [1, 7], [2, 3], [2, 4], [3, 2], [4, 5], [4, 6], [5, 6], [6, 1]\}$

Also consider the following test paths:

• t0 = [1, 2, 4, 5, 6, 1, 7]

• t1 = [1, 2, 3, 2, 4, 6, 1, 7]

Answer the following questions.

Answer the following questions.


(a) Draw the graph.

(b) List the test requirements for EPC. [hint: 12 requirements of length 2]

(c) Does the given set of test paths satisfy EPC? If not, identify what is missing.

(d) Consider the simple path [3, 2, 4, 5, 6] and test path [1, 2, 3, 2, 4, 6, 1, 2, 4, 5, 6, 1, 7]. Is the simple path a subpath of the test path?

(e) List the test requirements for NC, EC, and PPC on this graph.

(f) List a test path that achieves NC but not EC on the graph.

(g) List a test path that achieves EC but not PPC on the graph.

(Note: We've talked about test sets meeting criteria in the past. For (f) and (g), we are simply talking about a test set with one test case that induces the given test paths.)
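If you want to check your answers to (b) through (d) mechanically, a sketch along the following lines may help. It uses the edge set E from the exercise; the helper names are assumptions, and it only handles the length-2 requirements of EPC (the shorter requirements are the individual nodes and edges).

```python
# Edge set E from the exercise above.
EDGES = [(1, 2), (1, 7), (2, 3), (2, 4), (3, 2), (4, 5), (4, 6), (5, 6), (6, 1)]


def edge_pairs(edges):
    """All paths of length 2, i.e. pairs of consecutive edges."""
    return sorted((a, b, c) for (a, b) in edges for (b2, c) in edges if b == b2)


def is_subpath(sub, path):
    """True if sub occurs as a contiguous run inside path."""
    sub, path = tuple(sub), tuple(path)
    return any(path[i:i + len(sub)] == sub for i in range(len(path) - len(sub) + 1))


def untoured(requirements, test_paths):
    """Requirements that no test path tours directly."""
    return [r for r in requirements if not any(is_subpath(r, t) for t in test_paths)]


requirements = edge_pairs(EDGES)
print(len(requirements))  # matches the hint in (b)
print(untoured(requirements, [[1, 2, 4, 5, 6, 1, 7], [1, 2, 3, 2, 4, 6, 1, 7]]))
```

Work the exercise by hand first; the script is only meant as a cross-check.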




Graph Coverage for Source Code

So far, we've seen a number of coverage criteria for graphs, but I've been vague about how to actually construct graphs. For the most part, it's fairly obvious.

Structural Graph Coverage for Source Code

Fundamental graph for source code: Control-Flow Graph (CFG).

CFG nodes: zero or more statements;

CFG edges: an edge (s1, s2) indicates that s1 may be followed by s2 in an execution.

Basic Blocks. We can simplify a CFG by grouping together statements which always execute together (in sequential programs):

1 x = 5
2 z = 2
3 q0: if (z < 17) goto q1
4 z = z + 1
5 print(x)
6 goto q0
7 q1: nop

[Figure: CFG with basic blocks 1 (L1-2), 2 (L3), 3 (L4-6) and 4 (L7); block 2 has outgoing edges labelled T and F.]

We use the following definition:

Definition 1 A basic block has one entry point and one exit point.

Note that a basic block may have multiple successors. However, there may not be any jumps into the middle of a basic block (which is why the statement labelled q0 has its own basic block).
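One common way to find basic blocks is to mark "leaders": the first statement, every jump target, and every statement following a jump. The Python sketch below applies that idea to the seven-line program above. The tuple encoding of statements and the function name are assumptions made for this illustration.

```python
def basic_blocks(program):
    """Group 1-based line numbers into basic blocks.

    program is a list of (label, kind, target) tuples, where kind is
    "stmt", "goto" or "if_goto" (a hypothetical encoding for this sketch).
    """
    label_line = {label: i for i, (label, _, _) in enumerate(program, 1) if label}
    leaders = {1}  # the first statement always starts a block
    for i, (_, kind, target) in enumerate(program, 1):
        if kind in ("goto", "if_goto"):
            leaders.add(label_line[target])  # a jump target starts a block
            if i < len(program):
                leaders.add(i + 1)  # so does the statement after a jump
    blocks = []
    for i in range(1, len(program) + 1):
        if i in leaders:
            blocks.append([])
        blocks[-1].append(i)
    return blocks


PROGRAM = [
    (None, "stmt", None),     # 1 x = 5
    (None, "stmt", None),     # 2 z = 2
    ("q0", "if_goto", "q1"),  # 3 q0: if (z < 17) goto q1
    (None, "stmt", None),     # 4 z = z + 1
    (None, "stmt", None),     # 5 print(x)
    (None, "goto", "q0"),     # 6 goto q0
    ("q1", "stmt", None),     # 7 q1: nop
]
print(basic_blocks(PROGRAM))
# prints [[1, 2], [3], [4, 5, 6], [7]]
```

The result matches the four blocks in the figure: line 3 is a jump target, so it cannot be merged with lines 1-2.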

Some Examples

We'll now see how to construct control-flow graph fragments for various program constructs.


if statements: The book puts the conditions (and hence uses) on the control-flow edges, rather than in the if node. I prefer putting the condition in the node.

1 if (z < 17)
2   print(x);
3 else
4   print(y);

[Figure: CFG with nodes 1 (L1), 2 (L2), 3 (L4) and 4; node 1 has outgoing edges labelled T and F.]

1 if (z < 17)
2   print(x);

[Figure: CFG with nodes 1 (L1), 2 (L2) and 3; node 1 has outgoing edges labelled T and F.]

Short-circuit if evaluation is more complicated; I recommend working it out yourself.

(Recall that node coverage does not imply edge coverage.)

case / switch statements:

1 switch (n) {
2   case I: ...; break;
3   case J: ...; /* fall thru */
4   case K: ...; break;
5 }
6 // ...

[Figure: CFG with nodes 1 (L1), 2 (L2), 3 (L3), 4 (L4), 5 (L2), 6 (L3), 7 (L4) and 8 (L6); the edges out of node 1 are labelled i, j and k.]

while statements:

1 x = 0; y = 20;
2 while (x < y) {
3   x++; y--;
4 }

[Figure: CFG with nodes 1 (x = 0; y = 20), 2 (L2), 3 (L3) and 4; the edges out of node 2 are labelled x < y and !(x < y).]

Note that arbitrarily complicated structures may occur inside the loop body.

for statements:

1 for (int i = 0; i < 57; i++) {
2   if (i % 3 == 0) {
3     print(i);
4   }
5 }

(an exercise for the reader!)


This example uses Java's enhanced for loop, which iterates over all of the elements in the widgetList:

1 for (Widget w : widgetList) {
2   decorate(w);
3 }
4 // ...

I will accept the simplified CFG or the more useful one on the right:

[Figure, left: simplified CFG with nodes 1 (L1), 2 (L2) and 3 (L4); the edge into the loop body is labelled "w in widgetList". Right: detailed CFG with nodes it = wL.iterator(), it.hasNext(), w = it.next(); decorate(w); and 3 (L4); it.hasNext() has outgoing edges labelled T and F.]

All of these graphs admit the notions of node coverage (statement coverage, basic block coverage) and edge coverage (branch coverage).

Larger example. You can draw a 7-node CFG for this program:

1 /** Binary search for target in sorted subarray a[low..high] */
2 int binary_search(int[] a, int low, int high, int target) {
3   while (low ... a[middle])
8       low = middle + 1;
9     else
10      return middle;
11  }
12  return -1; /* not found in a[low..high] */
13 }

1. if (low



Exercises

Here are more exercise programs that you can draw CFGs for.

/* effects: if x == null, throw NullPointerException
   otherwise, return number of elements in x that are odd, positive or both. */
int oddOrPos(int[] x) {
  int count = 0;
  for (int i = 0; i < x.length; i++) {
    if (x[i]%2 == 1 || x[i] > 0) {
      count++;
    }
  }
  return count;
}

// example test case: input: x = [-3, -2, 0, 1, 4]; output: 3

Next, we have a really poorly-designed API (I'd give it a D at most, maybe an F) because it's impossible to succinctly describe what it does. Do not design functions with interfaces like this. But we can still draw a CFG, no matter how bad the code is.

1 /** Returns the mean of the first maxSize numbers in the array,
2     if they are between min and max. Otherwise, skip the numbers. */
3 double computeMean(int[] value, int maxSize, int min, int max) {
4   int i, ti, tv, sum;
5
6   i = 0; ti = 0; tv = 0; sum = 0;
7   while (ti < maxSize) {
8     ti++;
9     if (value[i] >= min && value[i] ... 0)
16    return (double)sum/tv;
17  else
18    throw new IllegalArgumentException();
19 }




We are going to completely switch gears and talk about testing concurrent programs next. For those of you in 3A, SE 350 is the first real exposure to these concepts; you'll see them more in CS 343. ECE 459 also teaches you how to leverage parallelism.

Context: Multicores are everywhere today! For the past 10 years, chips have not been getting more GHz. We still have more transistors though. Hardware manufacturers have been sharing this bounty with us, through the magic of multicore processors!

If you want performance today, then you need parallelism.

The dark side is that concurrency bugs will bite you.

More often than not, printing a page on my dual-G5 crashes the application. The funny thing is, printing almost never crashes on my (single-core) G4 PowerBook.

http://archive.oreilly.com/pub/post/dreaded_concurrency.html

The most famous kind of concurrency bug is the race condition. Let's look at this code.

1 #include <iostream>
2 #include <thread>
3
4 int counter = 0;
5
6 void func() {
7   int tmp;
8   tmp = counter;
9   tmp++;
10  counter = tmp;
11 }
12
13 int main() {
14   std::thread t1(func);
15   std::thread t2(func);
16   t1.join();
17   t2.join();
18   std::cout << counter << std::endl;
19 }

plam@polya /tmp> ./a.out
2
plam@polya /tmp> ./a.out
2
plam@polya /tmp> ./a.out
1
plam@polya /tmp> ./a.out
1
plam@polya /tmp> ./a.out
2
plam@polya /tmp> ./a.out
2

Yes, that's a race condition.

A race occurs when you have two concurrent accesses to the same memory location, at least one of which is a write.


Race conditions arise between variables which are shared between threads. Note that when there's a race, the final state may not be the same as running one access to completion and then the other.
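To see why the final state can differ, one can enumerate the interleavings by hand or with a small simulation. The Python sketch below models each thread of the counter example as three steps (read, increment, write) and tries every schedule. It is a deterministic model written for illustration, with assumed names; it does not use real threads.

```python
from itertools import permutations


def run(schedule):
    """Execute one interleaving; schedule lists which thread takes the next step."""
    counter = 0
    tmp = {}
    step = {0: 0, 1: 0}
    for thread in schedule:
        if step[thread] == 0:
            tmp[thread] = counter   # tmp = counter;
        elif step[thread] == 1:
            tmp[thread] += 1        # tmp++;
        else:
            counter = tmp[thread]   # counter = tmp;
        step[thread] += 1
    return counter


def all_final_values():
    """Final counter values over every interleaving of two three-step threads."""
    schedules = set(permutations([0, 0, 0, 1, 1, 1]))
    return {run(schedule) for schedule in schedules}


print(sorted(all_final_values()))
# prints [1, 2]
```

Running one thread to completion and then the other always gives 2; the value 1 appears only when both threads read the counter before either writes it back.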

Tools to the rescue. While races may be entertaining, race conditions are never good. You have several dynamic analysis tools at your disposal to eradicate races, including:

Helgrind (part of Valgrind)

lockdep (Linux kernel)

Thread Analyzer (Oracle Solaris Studio)

Thread Analyzer (Coverity)

Intel Inspector XE 2011 (formerly Intel Thread Checker)

We can run the race condition shown above under Helgrind:

plam@polya /tmp> g++ -std=c++11 race.C -g -pthread -o race
plam@polya /tmp> valgrind --tool=helgrind ./race
[...]
==6486== Possible data race during read of size 4 at 0x603E1C by thread #3
==6486== Locks held: none
==6486==    at 0x400EA1: func() (race.C:8)
==6486==    by 0x402254: void std::_Bind_simple::_M_invoke(std::_Index_tuple) (functional:1732)
==6486==    by 0x4021AE: std::_Bind_simple::operator()() (functional:1720)
==6486==    by 0x402147: std::thread::_Impl::_M_run() (thread:115)
==6486==    by 0x4EF196F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==6486==    by 0x4C2F056: mythread_wrapper (hg_intercepts.c:234)
==6486==    by 0x56650A3: start_thread (pthread_create.c:309)
==6486==    by 0x595FCCC: clone (clone.S:111)
==6486==
==6486== This conflicts with a previous write of size 4 by thread #2
==6486== Locks held: none
==6486==    at 0x400EB1: func() (race.C:10)
==6486==    by 0x402254: void std::_Bind_simple::_M_invoke(std::_Index_tuple) (functional:1732)
==6486==    by 0x4021AE: std::_Bind_simple::operator()() (functional:1720)
==6486==    by 0x402147: std::thread::_Impl::_M_run() (thread:115)
==6486==    by 0x4EF196F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==6486==    by 0x4C2F056: mythread_wrapper (hg_intercepts.c:234)
==6486==    by 0x56650A3: start_thread (pthread_create.c:309)
==6486==    by 0x595FCCC: clone (clone.S:111)
==6486== Address 0x603e1c is 0 bytes inside data symbol "counter"

Not enough. OK, great. Now you've eliminated all races (as required by specification). Of course, there are still lots of bugs that your program might contain. Some of them are even concurrency bugs. In the following code, there are no longer any contended accesses, but we cheated by caching the value in tmp, so an update can still be lost. Atomic operations would be safe.

#include <iostream>
#include <thread>
#include <mutex>

int counter = 0;
std::mutex m;

void func() {
  int tmp;

  m.lock();
  tmp = counter;
  m.unlock();
  tmp++;
  m.lock();
  counter = tmp;
  m.unlock();
}

We can test our code for additional concurrency bugs:

run the code multiple times

add noise (sleep, more system load, etc)

Helgrind and friends

force scheduling (e.g. Java PathFinder)

static approaches: lock-set, happens-before, state-of-the-art techniques

Reentrant/recursive Locks. What happens if you have two requests for a POSIX/C++11 lock?

If the requests are in different threads, the second thread waits for the first thread to unlock. But, if the requests are in the same thread, that thread waits for itself to unlock... forever!

To avoid this unhappy situation, we can use recursive locks. Each lock knows how many times its owner has locked it. The owner must then unlock the same number of times to liberate it.

Java locks work this way, e.g.

class SynchronizedIsRecursive {
  int x;

  synchronized void f() {
    x--;
    g(); // does not hang!
  }

  synchronized void g() {
    x++;
  }
}
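The difference between a plain lock and a recursive lock can be observed safely, without hanging, by making the second request non-blocking. The sketch below uses Python's threading.Lock and threading.RLock as stand-ins for the POSIX/C++11 and Java locks discussed above; the helper name is an assumption.

```python
import threading


def second_acquire_succeeds(lock):
    """Acquire the lock, then try to acquire it again from the same thread."""
    lock.acquire()
    try:
        # Non-blocking, so a plain lock reports failure instead of deadlocking.
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()  # a recursive lock must be released once per acquire
        return got_it
    finally:
        lock.release()


print(second_acquire_succeeds(threading.Lock()))   # prints False
print(second_acquire_succeeds(threading.RLock()))  # prints True
```

With a blocking second acquire, the plain lock would wait for itself forever, which is the situation described above.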

Although every Java object is a lock, and we can use synchronized() on any object, ReentrantLocks are more special:

we can explicitly lock() & unlock() them (or even tryLock())!

However, inexpertly-written Java programs might hog the lock. To avoid that, use a try / finally construct:

Lock lock = new ReentrantLock();
lock.lock();
try {
  // you got the lock! workworkwork
} finally {
  // might have thrown an exception
  lock.unlock();
}

Tools for detecting lock usage issues

The reference for the next example is Engler et al. [ECCH00]. This example falls short of excellence:

/* 2.4.0:drivers/sound/cmpci.c:cm_midi_release: */
lock_kernel(); // [PL: GRAB THE LOCK]
if (file->f_mode & FMODE_WRITE) {
  add_wait_queue(&s->midi.owait, &wait);
  ...
  if (file->f_flags & O_NONBLOCK) {
    remove_wait_queue(&s->midi.owait, &wait);
    set_current_state(TASK_RUNNING);
    return -EBUSY; // [PL: OH NOES!!1]
  }
  ...
}
unlock_kernel();

The problem: lock() and unlock() calls must be paired! They are on the happy path, but not on the -EBUSY path. [ECCH00] describes a tool that allows developers to describe calls that must be paired.
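The pairing rule can be illustrated with a toy path checker. The sketch below is not the tool from [ECCH00]; it is a hand-written Python model in which the control-flow graph of the cmpci.c fragment is encoded as a dictionary (node names and encoding are assumptions), and every entry-to-exit path is checked for a lock_kernel() without a later unlock_kernel().

```python
def unpaired_paths(cfg, entry, open_call, close_call):
    """Paths from entry to an exit on which open_call is never closed.

    cfg maps node -> (calls made in the node, successor nodes); the graph
    is assumed to be acyclic for this sketch.
    """
    bad = []

    def walk(node, path, held):
        calls, successors = cfg[node]
        for call in calls:
            if call == open_call:
                held = True
            elif call == close_call:
                held = False
        path = path + [node]
        if not successors:
            if held:
                bad.append(path)  # reached an exit while still holding the lock
            return
        for nxt in successors:
            walk(nxt, path, held)

    walk(entry, [], False)
    return bad


# Hypothetical abstraction of the cmpci.c fragment above.
CFG = {
    "lock": (["lock_kernel"], ["check_write"]),
    "check_write": ([], ["wait", "unlock"]),
    "wait": ([], ["return_ebusy", "unlock"]),
    "return_ebusy": ([], []),
    "unlock": (["unlock_kernel"], []),
}
print(unpaired_paths(CFG, "lock", "lock_kernel", "unlock_kernel"))
# prints [['lock', 'check_write', 'wait', 'return_ebusy']]
```

A real checker also has to cope with loops and infeasible paths, which is where false positives come from.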

Another example:

foo(p, ...)
bar(p, ...);

foo(p, ...)
bar(p, ...);

foo(p, ...)
// ERROR: foo, no bar!

Our tool might then give the following results: 23 errors, 11 false positives. A false positive is something where the tool reports an error, but, for instance, the error is in an infeasible path.

The next challenge is: how do we find such rules? In particular, we want to find rules of the form "A() must be followed by B()", or a(); ... b();, which denotes a MAY-belief that a() is followed by b(). iComment, by Lin Tan et al., proposes mining comments to find them [TYKZ07].

Here is some OpenSolaris code that demonstrates expectations. Also different locking primitives.


/* opensolaris/common/os/taskq.c: */
/* Assumes: tq->tq_lock is held. */
/* consistent */
static void taskq_ent_free(...) { ... }

static taskq_t
*taskq_create_common(...) { ...
  // [different lock primitives below:]
  mutex_enter(...);
  taskq_ent_free(...); /* consistent */
  ...
}

Getting back to actual bugs, here is a bad comment automatically detected by iComment:

1 /* mozilla/security/nss/lib/ssl/sslsnce.c: */
2 /* Caller must hold cache lock when calling this. */
3 static sslSessionID * ConvertToSID(...) { ... }
4
5 static sslSessionID *ServerSessionIDLookup(...)
6 {
7   ...
8   UnlockSet(cache, set); ...
9   sid = ConvertToSID(...);
10  ...
11 }

We observe a specification in the comment at line 2, and then a usage error at line 9, where we unlock the cache and then call ConvertToSID. The badness of the comment was confirmed by Mozilla developers.

Issue: Comments are not updated alongside code. Bad comments can and do cause bugs.

Here's another bad comment in the Linux kernel, also automatically detected.

1 // linux/drivers/ata/libata-core.c:
2
3 /* LOCKING: caller. */
4 void ata_dev_select(...) { ...}
5
6 int ata_dev_read_id(...) {
7   ...
8   ata_dev_select(...);
9   ...
10 }

Once again, the specification at line 3 states that the caller is to lock. But line 8 calls ata_dev_select() without holding the lock. The badness of this comment was confirmed by Linux developers.


Deadlocks

Another concurrency problem is deadlocks. We focus on a particular form of deadlock here, which occurs when code may get interrupted by interrupt handlers, and the code shares locks with the interrupt handler. This problem inspired aComment [TZP11].

In particular, if the spinlock is taken by code that runs in interrupt context (either a hardware or software interrupt), then code requesting the lock must use the spin_lock form that disables interrupts. Otherwise, sooner or later, the code will deadlock.

Here's an example of the right way to do things:

spinlock_t mr_lock = SPIN_LOCK_UNLOCKED;
unsigned long flags;
spin_lock_irqsave(&mr_lock, flags);
/* critical section... */
spin_unlock_irqrestore(&mr_lock, flags);


spin_lock_irqsave() disables interrupts locally and provides a spinlock on symmetric multiprocessors (SMPs).

spin_unlock_irqrestore() releases the lock and restores interrupts to the state they were in when the lock was acquired.

This covers both interrupt and SMP concurrency issues.

References

[ECCH00] Dawson R. Engler, Benjamin Chelf, Andy Chou, and Seth Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In Michael B. Jones and M. Frans Kaashoek, editors, OSDI, pages 1-16. USENIX Association, 2000.

[TYKZ07] Lin Tan, Ding Yuan, Gopal Krishna, and Yuanyuan Zhou. /*icomment: bugs or bad comments?*/. In Thomas C. Bressoud and M. Frans Kaashoek, editors, SOSP, pages 145-158. ACM, 2007.

[TZP11] Lin Tan, Yuanyuan Zhou, and Yoann Padioleau. acomment: mining annotations from comments and code to detect interrupt related concurrency bugs. In Richard N. Taylor, Harald Gall, and Nenad Medvidovic, editors, ICSE, pages 11-20. ACM, 2011.




Assertions are a key ingredient in writing tests, so I thought I would explicitly describe them.

Definition 1 An assertion contains an expression that is supposed to evaluate to true.

For instance, if we have a linked list, then the doubly-linked list property states that prev is the inverse of next, i.e. for nodes n, we have n.next.prev == n. After inserting a node, one might expect this property to be true of that node (possibly with caveats, for instance about the end of the list).

We use assertions in unit tests to say what's supposed to be true.
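As a small illustration of the doubly-linked list property, the Python sketch below inserts nodes and then asserts that prev is the inverse of next, with the end-of-list caveat handled explicitly. The Node class and helper names are assumptions made for this example.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def insert_after(node, new_node):
    """Splice new_node into the list immediately after node."""
    new_node.prev = node
    new_node.next = node.next
    if node.next is not None:
        node.next.prev = new_node
    node.next = new_node


def check_links(node):
    """Assert the doubly-linked list property around one node."""
    if node.next is not None:  # caveat: the last node has no successor
        assert node.next.prev is node, "next.prev must point back to the node"
    if node.prev is not None:  # caveat: the first node has no predecessor
        assert node.prev.next is node, "prev.next must point back to the node"
    return True


first, last, middle = Node(1), Node(3), Node(2)
insert_after(first, last)
insert_after(first, middle)
for n in (first, middle, last):
    check_links(n)
```

In a unit test, the loop at the end would be the assertion step: it states what is supposed to be true after the insertions.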

Preconditions and postconditions. More generally, we can express what is supposed to be true upon entry to and exit from a method.

We saw this code in Linux:

/* LOCKING: caller. */
void ata_dev_select(...) { ...}

This expresses an assertion (although not written as a program statement) that the lock is held upon entry.

Assume/Guarantee Reasoning. Why would you use preconditions and postconditions? Reasoning about programs is difficult, and preconditions and postconditions simplify reasoning.

When reasoning about the callee, you get to assume that the precondition holds upon entry.

When reasoning about the caller, you have to guarantee that the precondition holds before the call.

The reverse holds for the postcondition.

aComment

We talked about the aComment approach for locking-related annotations. In particular, it:

extracts locking-related annotations from code;

extracts locking-related annotations from comments; and

propagates annotations to callers.


Tools

To provide some context, consider the OS X Mavericks goto fail bug:

if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
    goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail; /* MISTAKE! THIS LINE SHOULD NOT BE HERE */
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    goto fail;
err = sslRawVerify(...);

fail:
    return err;

The bug: opensource.apple.com/source/Security/Security-55471/libsecurity_ssl/lib/sslKeyExchange.c

No bug: www.opensource.apple.com/source/Security/Security-55179.13/libsecurity_ssl/lib/sslKeyExchange.c

A writeup: nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-goto-fail-apples-ssl-bug-explained-plus-an-unofficial-patch/

Detecting goto fail. In retrospect, a number of tools could've found this bug:

compiler -Wunreachable-code option

PC-Lint: warning 539: Did not expect positive indentation

PVS-Studio: V640: Logic does not match formatting

The problem is that these tools also report many other issues, and live issues can be buried among those other issues.

The Landscape of Testing and Static Analysis Tools

Here's a survey of your options:

manual testing;

running a JUnit test suite, manually generated;

running automatically-generated tests;

running static analysis tools.

We'll examine several points on this continuum today. More on this later (Lecture 23), thanks to a guest lecturer. Some examples:

Coverity: a static analysis tool used by 900+ companies, including BlackBerry, Mozilla, etc.

Microsoft requires Windows device drivers to pass their Static Driver Verifier for certification.


Tools for Java that you can download

FindBugs. An open-source static bytecode analyzer for Java out of the University of Maryland.

findbugs.sourceforge.net

It finds bug patterns:

off-by-one;

null pointer dereference;

ignored read() return value;

ignored return value (immutable classes);

uninitialized read in constructor;

and more...

FindBugs gives some false positives. Here are some techniques to help avoid them:

patricklam.ca/papers/14.msr.saa.pdf

Java Path Finder (JPF), NASA. The key idea: Implement a Java Virtual Machine, but explore many thread interleavings, looking for concurrency bugs.

JPF is an explicit state software model checker for Java bytecode.

JPF can also search for:

deadlocks and unhandled exceptions (NullPointerException, AssertionError);

race conditions;

missing heap bounds checks;

and more.

javapathfinder.sourceforge.net

Korat (University of Illinois). Key Idea: Generate Java objects from a representation invariant specification written as a Java method.

For instance, here's a binary tree.

[Figure: a binary tree]

One characteristic of a binary tree:

left & right pointers don't refer to same node.


We can express that characteristic in Java as follows:

boolean repOk() {
  if (root == null) return size == 0; // empty tree has size 0
  Set visited = new HashSet(); visited.add(root);
  LinkedList workList = new LinkedList(); workList.add(root);
  while (!workList.isEmpty()) {
    Node current = (Node)workList.removeFirst();
    if (current.left != null) {
      if (!visited.add(current.left)) return false; // acyclicity
      workList.add(current.left);
    }
    if (current.right != null) {
      if (!visited.add(current.right)) return false; // acyclicity
      workList.add(current.right);
    }
  }
  if (visited.size() != size) return false; // consistency of size
  return true;
}

Korat then generates all distinct (non-isomorphic) trees, up to a given size (say 3). It uses these trees as inputs for testing the add() method of the tree (or for any other methods).
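To get a feel for what "generate everything that satisfies repOk" means, here is a deliberately naive Python sketch. It enumerates every assignment of left/right pointers over n nodes, keeps the assignments accepted by a Python port of repOk, and collapses isomorphic trees by comparing shapes. This brute force is an assumption-laden illustration only; Korat itself prunes the search far more cleverly, and all names below are invented for the example.

```python
from itertools import product


def rep_ok(root, left, right, size):
    """Python port of repOk: left/right map node index -> child index or None."""
    if root is None:
        return size == 0
    visited = {root}
    work = [root]
    while work:
        current = work.pop(0)
        for child in (left[current], right[current]):
            if child is not None:
                if child in visited:
                    return False  # sharing or a cycle
                visited.add(child)
                work.append(child)
    return len(visited) == size  # consistency of size


def shape(node, left, right):
    """Label-free structure of the tree, used to drop isomorphic duplicates."""
    if node is None:
        return None
    return (shape(left[node], left, right), shape(right[node], left, right))


def tree_shapes(n):
    """All distinct binary tree shapes with exactly n >= 1 nodes, rooted at node 0."""
    targets = [None] + list(range(n))
    shapes = set()
    for fields in product(targets, repeat=2 * n):
        left, right = fields[:n], fields[n:]
        if rep_ok(0, left, right, n):
            shapes.add(shape(0, left, right))
    return shapes


print([len(tree_shapes(n)) for n in (1, 2, 3)])
# prints [1, 2, 5]
```

Each shape in the result could then be turned into a concrete tree and fed to the method under test.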

korat.sourceforge.net/index.html

Randoop (MIT). Key Idea: Writing tests is a difficult and time-consuming activity, and yet it is a crucial part of good software engineering. Randoop automatically generates unit tests for Java classes.

Randoop generates random sequences of method calls, looking for object contract violations.

To use it, simply point it at a program & let it run.

Randoop discards bad method sequences (e.g. illegal argument exceptions). It remembers method sequences that create complex objects, and sequences that result in object contract violations.

code.google.com/p/randoop/

Here is an example generated by Randoop:

public static void test1() {
  LinkedList list = new LinkedList();
  Object o1 = new Object();
  list.addFirst(o1);

  TreeSet t1 = new TreeSet(list);
  Set s1 = Collections.synchronizedSet(t1);

  // violated in the Java standard library!
  Assert.assertTrue(s1.equals(s1));
}



Lecture 14, February 4, 2015. Patrick Lam, version 1

Recall that we've been discussing beliefs. Here are a couple of beliefs that are worthwhile to check (examples courtesy of Dawson Engler).

Redundancy Checking. 1) Code ought to do something. So, when you have code that doesn't do anything, that's suspicious. Look for identity operations, e.g.

x = x, 1 * y, x & x, x | x.

Or, a longer example:

/* 2.4.5-ac8/net/appletalk/aarp.c */
da.s_node = sa.s_node;
da.s_net = da.s_net;

Also, look for unread writes:

for (entry=priv->lec_arp_tables[i];
     entry != NULL; entry=next) {
  next = entry->next; // never read!
  ...
}

Redundancy suggests conceptual confusion.

So far, we've talked about MUST-beliefs; violations are clearly wrong (in some sense). Let's examine MAY-beliefs next. For such beliefs, we need more evidence to convict the program.

Process for verifying MAY beliefs. We proceed as follows:

1. Record every successful MAY-belief check as "check".

2. Record every unsuccessful belief check as "error".

3. Rank errors based on check : error ratio.

Most likely errors occur when check is large and error is small.

Example. One example of a belief is use-after-free:

free(p);
print(*p);

That particular case is a MUST-belief. However, other resources are freed by custom (undocumented) free functions. It's hard to get a list of what is a free function and what isn't. So, let's derive them behaviourally.


Inferring beliefs: finding custom free functions. The key idea is: if pointer p is not used after calling foo(p), then derive a MAY belief that foo(p) frees p.

OK, so which functions are free functions? Well, just assume all functions free all arguments:

emit "check" at every call site;

emit "error" at every use.

(in reality, filter functions with suggestive names).

Putting that into practice, we might observe:

foo(p);
*p = x;

foo(p);
*p = x;

foo(p);
*p = x;

bar(p);
p = 0;

bar(p);
p = 0;

bar(p);
*p = x;

We would then rank bar's error first. Plausible results might be: 23 free errors, 11 false positives.
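The ranking step can be sketched as follows. The observations encode the six call sites above: a use of the pointer after the call counts as an "error", and no use counts as a "check". Ranking by the fraction of checks (rather than the raw check:error ratio) is a choice made here to avoid dividing by zero; the function name and data encoding are assumptions.

```python
from collections import Counter


def rank_free_beliefs(observations):
    """Rank functions with errors; those with the strongest 'frees p' evidence first.

    observations is a list of (function name, pointer_used_after_call) pairs.
    """
    checks, errors = Counter(), Counter()
    for func, used_afterwards in observations:
        if used_afterwards:
            errors[func] += 1
        else:
            checks[func] += 1
    ranked = []
    for func in errors:
        total = checks[func] + errors[func]
        ranked.append((checks[func] / total, func, errors[func]))
    ranked.sort(reverse=True)
    return [(func, fraction, n_errors) for fraction, func, n_errors in ranked]


OBSERVATIONS = [
    ("foo", True), ("foo", True), ("foo", True),    # *p = x after foo(p)
    ("bar", False), ("bar", False), ("bar", True),  # p = 0 twice, *p = x once
]
for func, fraction, n_errors in rank_free_beliefs(OBSERVATIONS):
    print(func, round(fraction, 2), n_errors)
# prints "bar 0.67 1" and then "foo 0.0 3"
```

The single use after bar(p) is ranked first because bar usually behaves like a free function, whereas foo never does.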

Inferring beliefs: finding routines that may return NULL. The situation: we want to know which routines may return NULL. Can we use static analysis to find out?

sadly, this is difficult to know statically (return p->next;?), and

we get false positives: some functions return NULL under special cases only.

Instead, let's observe what the programmer does. Again, rank errors based on checks vs non-checks. As a first approximation, assume all functions can return NULL.

if pointer checked before use: emit "check";

if pointer used before check: emit "error".

This time, we might observe:

p = bar(...);
*p = x;

p = bar(...);
if (!p) return;
*p = x;



Again, sort errors based on the check:error ratio.

Plausible results: 152 free errors, 16 false positives.


General statistical technique

When we write "a(); ... b();", we mean a MAY-belief that a() is followed by b(). We don't actually know that this is a valid belief. It's a hypothesis, and we'll try it out. Algorithm:

assume every a-b is a valid pair;

emit "check" for each path with a() and then b();

emit "error" for each path with a() and no b().

(actually, prefilter functions that look paired).

Consider:

foo(p, ...);
bar(p, ...); // check

foo(p, ...);
bar(p, ...); // check

foo(p, ...);
// error: foo, no bar!

This applies to the course project as well.

void scope1() {
  A(); B(); C(); D();
}

void scope2() {
  A(); C(); D();
}

void scope3() {
  A(); B();
}

void scope4() {
  B(); D(); scope1();
}

void scope5() {
  B(); D(); A();
}

void scope6() {
  B(); D();
}

A() and B() must be paired: either A() then B() or B() then A().

Support = # times a pair of functions appears together.
support({A,B}) = 3

Confidence({A,B}, {A}) = support({A,B}) / support({A}) = 3/4

Sample output for support threshold 3, confidence threshold 65% (intra-procedural analysis):

bug: A in scope2, pair: (A B), support: 3, confidence: 75.00%
bug: A in scope3, pair: (A D), support: 3, confidence: 75.00%
bug: B in scope3, pair: (B D), support: 4, confidence: 80.00%
bug: D in scope2, pair: (B D), support: 4, confidence: 80.00%
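The sample output above can be reproduced with a short script. The Python sketch below hard-codes the call sets of the six scopes (an assumption: each scope is represented only by the set of functions it calls, with no call ordering and no inter-procedural expansion) and applies the support and confidence definitions directly.

```python
from itertools import combinations

# Functions called directly in each scope (intra-procedural view).
SCOPES = {
    "scope1": {"A", "B", "C", "D"},
    "scope2": {"A", "C", "D"},
    "scope3": {"A", "B"},
    "scope4": {"B", "D", "scope1"},
    "scope5": {"B", "D", "A"},
    "scope6": {"B", "D"},
}


def find_bugs(scopes, min_support=3, min_confidence=0.65):
    """Report scopes where one function of a likely pair appears without the other."""
    functions = sorted(set().union(*scopes.values()))
    support = {f: sum(f in calls for calls in scopes.values()) for f in functions}
    bugs = []
    for a, b in combinations(functions, 2):
        pair_support = sum(a in calls and b in calls for calls in scopes.values())
        if pair_support < min_support:
            continue
        for present, missing in ((a, b), (b, a)):
            confidence = pair_support / support[present]
            if confidence < min_confidence:
                continue
            for scope in sorted(scopes):
                calls = scopes[scope]
                if present in calls and missing not in calls:
                    bugs.append((present, scope, (a, b), pair_support,
                                 round(100 * confidence, 2)))
    return bugs


for func, scope, pair, supp, conf in find_bugs(SCOPES):
    print(f"bug: {func} in {scope}, pair: ({pair[0]} {pair[1]}), "
          f"support: {supp}, confidence: {conf:.2f}%")
```

Note that B without A is not reported: support({A,B})/support({B}) = 3/5 = 60%, which is below the 65% threshold.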

The point is to find examples like the one from cmpci.c where there's a lock_kernel() call, but, on an exceptional path, no unlock_kernel() call.

Summary: Belief Analysis. We don't know what the right spec is. So, look for contradictions.

MUST-beliefs: contradictions = errors!

MAY-beliefs: pretend they're MUST, rank by confidence.

(A key assumption behind this belief analysis technique: most of the code is correct.)


Further references. Dawson R. Engler, David Yu Chen, Seth Hallem, Andy Chou and Benjamin Chelf. Bugs as Deviant Behaviors: A general approach to inferring errors in systems code. In SOSP '01.

Dawson R. Engler, Benjamin Chelf, Andy Chou, and Seth Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In OSDI '00 (best paper). www.stanford.edu/~engler/mc-osdi.pdf

Junfeng Yang, Can Sar and Dawson Engler. eXplode: a Lightweight, General System for Finding Serious Storage System Errors. In OSDI '06. www.stanford.edu/~engler/explode-osdi06.pdf

Using Linters

We will also talk about linters in this lecture, based on Jamie Wong's blog post jamie-wong.com/2015/02/02/linters-as-invariants/.

First there was C. In statically-typed languages, like C,

#include <stdio.h>

int main() {
  printf("%d\n", num);
  return 0;
}

the compiler saves you from yourself. The guaranteed invariant:

if code compiles, all symbols resolve.

Less-nice languages. OK, so you try to run that in JavaScript and it crashes right away. Invariant?

if code runs, all symbols resolve?

But what about this:

function main(x) {
  if (x) {
    console.log("Yay");
  } else {
    console.log(num);
  }
}

main(true);


Nope! The above invariant doesn't work.

OK, what about this invariant:

if code runs without crashing, all symbols referenced in the code path executed resolve?

Nope!

function main() {
  try {
    console.log(num);
  } catch (err) {
    console.log("nothing to see here");
  }
}

main();

So, when you're working in JavaScript and maintaining old code, you always have to deduce:

is this variable defined?

is this variable always defined?

do I need to load a script to define that variable?

We have computers. They're powerful. Why is this the developer's problem?!

function main(x) {
  if (x) {
    console.log("Yay");
  } else {
    console.log(num);
  }
}

main(true);

Now:

$ nodejs /usr/local/lib/node_modules/jshint/bin/jshint --config jshintrc foo.js
foo.js: line 5, col 17, num is not defined.

1 error

Invariant:

If code passes JSHint, all top-level symbols resolve.


Strengthening the Invariant. Can we do better? How about adding a pre-commit hook?

If code is checked in and the commit hook ran, all top-level symbols resolve.

Of course, sometimes the commit hook didn't run. Better yet:

Block deploys on test failures.

Better invariant.

If code is deployed, all top-level symbols resolve.

Even better yet. It is hard to tell whether code is deployed or not. Use git feature branches, merge when deployed.

If code is in master, all top-level symbols resolve.




Dataflow Criteria

So far we've seen structure-based criteria which imposed test requirements solely based on the nodes and edges of a graph. These criteria have been oblivious to the contents of the nodes.

However, programs mostly move data around, so it makes sense to propose some criteria based on the flow of data around a program. We'll be talking about du-pairs, which connect definitions and uses of variables.

du-pair

x = 5 // def(x)
...
print(x) // use(x)

Let's look at some graphs.

[Figure: node n0: x = 5, followed by node n1: print(x)]

We write

def(n0) = use(n1) = {x}.

Note that edges can also have defs and uses, for instance in a graph corresponding to a finite state machine. In that case, we could write:

use(n0, n1) = {}.

Here's another example.

[Figure: graph with nodes q0: z = x - y; q1: if (z < w); q2: print(z); q3: print(w); q4]


A particular def d of variable x may (or may not) reach a particular use u. If a def may reach a particular use, then there exists a path from d to u which is free of redefinitions of x. In the following graph, the def at n2 does not reach the use at n3, since no path goes from n2 to n3.

[Figure: nodes n2: x = 5 and n3: use(x), with no path from n2 to n3]

Another example of a definition which does not reach:

[Figure: n0: x = 2, then n1: x = 17, then n2: use(x)]

We say that the definition at n1 kills the definition at n0, so that def(n0) does not reach n2. We are therefore looking for def-clear paths.

Definition 1 A path p from $\ell_1$ to $\ell_m$ is def-clear with respect to variable v if, for every node $n_k$ and every edge $e_k$ on p from $\ell_1$ to $\ell_m$, where $k \neq 1$ and $k \neq m$, v is not in def($n_k$) or in def($e_k$).

That is, nothing on the path p from location $\ell_1$ to location $\ell_m$ redefines v. (Locations are edges or nodes.)

Definition 2 A def of v at $\ell_i$ reaches a use of v at $\ell_j$ if there exists a def-clear path from $\ell_i$ to $\ell_j$ with respect to v.
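Definitions 1 and 2 translate into a simple graph search: follow edges from the def, but never continue through a node that redefines the variable. The Python sketch below does this for defs and uses on nodes only (edge defs are ignored, which is an assumption of this sketch), using the three-node "kills" example above. The function name and graph encoding are illustrative.

```python
def reaches(edges, defs, def_node, use_node, var):
    """True if some def-clear path w.r.t. var leads from def_node to use_node.

    defs maps node -> set of variables defined at that node.
    """
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    seen = set()
    stack = list(successors.get(def_node, []))
    while stack:
        node = stack.pop()
        if node == use_node:
            return True  # the endpoints themselves are exempt from the check
        if node in seen or var in defs.get(node, set()):
            continue  # already explored, or a redefinition kills the def
        seen.add(node)
        stack.extend(successors.get(node, []))
    return False


# n0: x = 2, n1: x = 17, n2: use(x)
EDGES = [("n0", "n1"), ("n1", "n2")]
DEFS = {"n0": {"x"}, "n1": {"x"}}
print(reaches(EDGES, DEFS, "n0", "n2", "x"))  # prints False
print(reaches(EDGES, DEFS, "n1", "n2", "x"))  # prints True
```

The same function can be used to think through the quick poll that follows, once the edges of that graph are written down.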

Quick poll: does the def at n 0 reach the use at n 5 ?

[Figure: graph with nodes n0: x = 1729, n1, n2: x = 84, n3, n4, and n5: use(x)]

Building on the notion of a def-clear path:

Definition 3 A du-path with respect to v is a simple path that is def-clear with respect to v from a node $n_i$, such that v is in def($n_i$), to a node $n_j$, such that v is in use($n_j$).


(This definition could be easily modified to use edges $e_i$ and $e_j$.)

Note the following three points about du-paths:

associated with a variable

simple (otherwise there are too many)

may be any number of uses in a du-path

Coverage criteria using du-paths

We next create groups of du-paths. Consider again the following double-diamond graph D:

[Figure: double-diamond graph D with nodes n5: x = 5, u0: use(x), n9: x = 9, u1, n3: x = 3, n: use(x), and u2: use(x)]

We will define two sets of du-paths:

def-path sets: fix a def and a variable, e.g.

du(n5, x) =

du(n3, x) =

def-pair sets: fix a def, a use, and a variable, e.g.

du(n5, n, x) =

These sets will give the notions of all-defs coverage (tour at least one du-path from each def-path set; a weak criterion) and all-uses coverage (tour at least one du-path from each def-pair set).

How can there be multiple elements in a def-pair set?


Here's an example with two du-paths in a def-pair set.

[Figure: graph from n17: x = 17 to n: use(x)]

We then have

du(n17, n, x) =

Note the general relation

$du(n_i, v) = \bigcup_{n_j} du(n_i, n_j, v)$

There are more def-pair sets than def-path sets. Cycles are always allowed as du-paths, as long as the du-path is simple; you can always tour a du-path with a non-simple path, of course.

Useful exercise. Create an example where one def-path set splits into several def-pair sets; you can get a smaller example than the one in the book.

We can use the above definitions to provide coverage criteria.

Criterion 1 All-Defs Coverage (ADC). For each def-path set $S = du(n, v)$, TR contains at least one path d in S.

Criterion 2 All-Uses Coverage (AUC). For each def-pair set $S = du(n_i, n_j, v)$, TR contains at least one path d in S.
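The two criteria can be compared on a tiny example. The Python sketch below computes def-pair sets by enumerating simple def-clear paths from each def, merges them into def-path sets, and checks whether a set of test paths tours at least one du-path from each set. The four-node graph, the function names, and the use of direct touring (no sidetrips or detours) are all assumptions made for illustration.

```python
def du_paths(edges, defs, uses, var):
    """Def-pair sets for var: map (def node, use node) -> list of du-paths."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    pair_sets = {}

    def extend(path):
        last = path[-1]
        if len(path) > 1 and var in uses.get(last, set()):
            pair_sets.setdefault((path[0], last), []).append(tuple(path))
        if len(path) > 1 and (last == path[0] or var in defs.get(last, set())):
            return  # closed cycle, or var is redefined here
        for nxt in successors.get(last, []):
            if nxt not in path or nxt == path[0]:
                extend(path + [nxt])

    for node in defs:
        if var in defs[node]:
            extend([node])
    return pair_sets


def def_path_sets(pair_sets):
    """Merge def-pair sets that share a def node into def-path sets."""
    merged = {}
    for (def_node, _use_node), paths in pair_sets.items():
        merged.setdefault(def_node, []).extend(paths)
    return merged


def tours(test_path, du_path):
    n = len(du_path)
    return any(tuple(test_path[i:i + n]) == du_path
               for i in range(len(test_path) - n + 1))


def satisfies(test_paths, path_sets):
    """Every set must have at least one du-path toured by some test path."""
    return all(any(tours(t, p) for t in test_paths for p in paths)
               for paths in path_sets.values())


# Hypothetical graph: node 0 defines x; nodes 1 and 2 both use it.
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3)]
pairs = du_paths(EDGES, {0: {"x"}}, {1: {"x"}, 2: {"x"}}, "x")
print(satisfies([[0, 1, 3]], def_path_sets(pairs)))  # ADC: prints True
print(satisfies([[0, 1, 3]], pairs))                 # AUC: prints False
```

Here one def-path set splits into two def-pair sets, so a single test path is enough for ADC but not for AUC.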


What do these criteria mean? For each def,

ADC: reach at least one use;

AUC: reach every use somehow;

In the context of the earlier example,

ADC requires:

AUC requires:

Nodes versus edges. So far, we've assumed definitions and uses occur on nodes.

uses on edges (p-uses) work as well;

defs on edges are trickier, because a du-path from an edge to an edge may not be simple. (We could make things work out with more work.)

Another example.

[Figure: graph with nodes n_0: def(x), n_1: use(x), n_2, n_3, n_4, n_5: use(x), and n_6.]

Some test sets that meet these criteria:

ADC:

AUC:




Subsumption Chart

Here are the subsumption relationships between our graph criteria.

[Figure: subsumption chart over the following criteria.]

Complete Path Coverage (CPC)
Prime Path Coverage (PPC)
All-du-Paths Coverage (ADUPC)
All-Uses Coverage (AUC)
All-Defs Coverage (ADC)
Edge-Pair Coverage (EPC)
Edge Coverage (EC)
Node Coverage (NC)
Complete Round-Trip Coverage (CRTC)
Simple Round-Trip Coverage (SRTC)

We know EPC subsumes EC subsumes NC from before, and clearly CRTC subsumes SRTC. Also, CPC subsumes PPC subsumes EPC, and PPC subsumes CRTC, because simple paths include round trips.

In this offering, we didn't talk about ADUPC at all, and we only touched briefly on CRTC and SRTC.

Assumptions for dataflow criteria:

1. every use preceded by a def (guaranteed by Java)

2. every def reaches at least one use


3. for every node with multiple outgoing edges, at least one variable is used on each out edge, and the same variables are used on each out edge.

Then:

AUC subsumes ADC.

Each edge has at least one use, so AUC subsumes EC.

Finally, each du-path is also a simple path, so PPC subsumes ADUPC (not discussed this term, but ADUPC subsumes AUC). (Note that prime paths are simpler to compute than dataflow relationships, especially in the presence of pointers.)

Dataflow Graph Coverage for Source Code

Last time, we saw how to construct graphs which summarized a control-flow graph's structure. Let's enrich our CFGs with definitions and uses to enable the use of our dataflow criteria.

Definitions. Here are some Java statements which correspond to definitions.

x = 5: x occurs on the left-hand side of an assignment statement;

foo(T x) { ... }: implicit definition for x at the start of a method;

bar(x): during a call to bar, x might be defined if x is a C++ reference parameter.

(subsumed by others): x is an input to the program.

Examples:

Uses. The book lists a number of cases of uses, but it boils down to "x occurs in an expression that the program evaluates." Examples: RHS of an assignment, or as part of a method parameter, or in a conditional.
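As a small illustration of the cases listed above, the following sketch annotates each statement of one method with its defs and uses. The method scaledSum and its variable names are made up for this example.

```java
final class DefUseExample {
    // Parameters x and values: implicit definitions at the start of the method.
    static int scaledSum(int x, int[] values) {
        int total = 0;               // def(total)
        for (int v : values) {       // use(values); def(v) on each iteration
            if (v > x) {             // use(v), use(x) in a conditional
                total = total + v;   // use(total), use(v) on the RHS; def(total) on the LHS
            }
        }
        x = total * 2;               // use(total) on the RHS; def(x) on the LHS
        return x;                    // use(x)
    }

    public static void main(String[] args) {
        System.out.println(scaledSum(2, new int[] {1, 3, 5}));
    }
}
```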

Complications. As I said before, the situation in real life is more complicated: we've assumed that x is a local variable with scalar type.

What if x is a static field, an array, an object, or an instance field?

How do we deal with aliasing?


One answer is to be conservative and note that we've said that a definition d reaches a use u if it is possible that the address defined at d refers to the same address used at u. For instance:

class C { int f; }
void foo(C q) { use(q.f); }

x = new C(); x.f = 5;
y = new C(); y.f = 2;

foo(x);
foo(y);

Our definition says that both definitions reach the use.

Exercise. Consider the following graph:

N = {0, 1, 2, 3, 4, 5, 6, 7}

N_0 = {0}

N_f = {7}

E = {(0, 1), (1, 2), (1, 7), (2, 3), (2, 4), (3, 2), (4, 5), (4, 6), (5, 6), (6, 1)}

test paths:

t1 = [0, 1, 7]
t2 = [0, 1, 2, 4, 6, 1, 7]
t3 = [0, 1, 2, 4, 5, 6, 1, 7]
t4 = [0, 1, 2, 3, 2, 4, 6, 1, 7]

def(0) = def(3) = use(5) = use(7) = {x}

(a) Draw the graph.

(b) List all du-paths with respect to x. Include all du-paths, even those that are subpaths of others.

(c) List a minimal test set (using the given paths) satisfying all-defs coverage with respect to x.

(d) List a minimal test set (using the given paths) satisfying all-uses coverage with respect to x.

Compiler tidbit. In a compiler, we use intermediate representations to simplify expressions, including definitions and uses. For instance, we would simplify:

x = foo(y + 1 , z * 2)
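One plausible simplified form splits the nested expression into one operation per statement, so each statement has a single def and explicit uses. The sketch below writes that out in Java; the temporaries t1 and t2 and the body of foo are assumptions for illustration, not the output of any particular compiler.

```java
final class ThreeAddressSketch {
    static int foo(int a, int b) { return a - b; }   // hypothetical callee

    static int original(int y, int z) {
        int x = foo(y + 1, z * 2);   // uses of y and z are buried inside the call expression
        return x;
    }

    static int simplified(int y, int z) {
        int t1 = y + 1;              // use(y), def(t1)
        int t2 = z * 2;              // use(z), def(t2)
        int x = foo(t1, t2);         // use(t1), use(t2), def(x)
        return x;
    }

    public static void main(String[] args) {
        System.out.println(original(3, 4) == simplified(3, 4));
    }
}
```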

Basic blocks and defs/uses. Basic blocks can rule out some definitions and uses as irrelevant.

Defs: consider the last definition of a variable in a basic block. (If we're not sure whether x and y are aliased, leave both of them.)

Uses: consider only uses that aren't dominated by a definition of the same variable in the same basic block, e.g. y = 5; use(y) is not interesting.
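The two filtering rules above can be sketched as a single pass over a basic block. The Stmt representation (one optional def plus a list of uses) and the example block are assumptions for this illustration, and the sketch ignores aliasing.

```java
import java.util.*;

final class BasicBlockDefUse {
    /** One statement: the variable it defines (or null) and the variables it uses. */
    static final class Stmt {
        final String def;
        final List<String> uses;
        Stmt(String def, String... uses) { this.def = def; this.uses = Arrays.asList(uses); }
    }

    /** Uses that are not preceded by a def of the same variable inside the block. */
    static Set<String> interestingUses(List<Stmt> block) {
        Set<String> defined = new HashSet<>();
        Set<String> exposed = new LinkedHashSet<>();
        for (Stmt s : block) {
            for (String u : s.uses) if (!defined.contains(u)) exposed.add(u);
            if (s.def != null) defined.add(s.def);
        }
        return exposed;
    }

    /** For each variable, the index of its last definition in the block. */
    static Map<String, Integer> lastDefs(List<Stmt> block) {
        Map<String, Integer> last = new LinkedHashMap<>();
        for (int i = 0; i < block.size(); i++)
            if (block.get(i).def != null) last.put(block.get(i).def, i);
        return last;
    }

    /** Hypothetical block: y = 5; z = y + x; y = z; */
    static List<Stmt> example() {
        return List.of(new Stmt("y"), new Stmt("z", "y", "x"), new Stmt("y", "z"));
    }

    public static void main(String[] args) {
        System.out.println(interestingUses(example()));
        System.out.println(lastDefs(example()));
    }
}
```

In the example block only the use of x survives, because the uses of y and z are both preceded by defs in the same block, and only the second def of y (index 2) matters outside the block.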


Graph Coverage for Design Elements

We next move beyond single methods to design elements, which include multiple methods, classes, modules, packages, etc. Usually people refer to such testing as integration testing.

Structural Graph Coverage.

We want to create graphs that represent relationships (couplings) between design elements.

Call Graphs. Perhaps the most common interprocedural graph is the call graph.

design elements, or nodes, are methods (or larger program subsystems)

couplings, or edges, are method calls.

Consider the following example.

[Figure: call graph over the methods A, B, C, D, E, and F.]

For method coverage, must call each method.

For edge coverage, must visit each edge; in this particular case, must get to both A and C from both of their respective callees.

Like any other type of testing, call graph-based testing may require test harnesses. Imagine, for instance, a library that implements a stack; it exposes push and pop methods, but needs a test driver to exercise these methods.
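A minimal sketch of such a driver follows. IntStack stands in for the library and IntStackDriver for the harness; both names and the choice of ArrayDeque as the backing store are assumptions for this example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical library under test: it exposes only push and pop and has no entry point. */
final class IntStack {
    private final Deque<Integer> items = new ArrayDeque<>();
    void push(int v) { items.push(v); }
    int pop() { return items.pop(); }   // ArrayDeque.pop throws NoSuchElementException when empty
}

/** Test driver: supplies the calls that the library cannot make on itself. */
final class IntStackDriver {
    static int[] pushThenPopAll(int... values) {
        IntStack s = new IntStack();
        for (int v : values) s.push(v);                                 // covers push
        int[] popped = new int[values.length];
        for (int i = 0; i < values.length; i++) popped[i] = s.pop();    // covers pop
        return popped;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pushThenPopAll(1, 2, 3)));
    }
}
```

In call-graph terms the driver is the extra caller node: its two outgoing edges to push and pop are what method coverage and edge coverage require here.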

Data Flow Graph Coverage for Design Elements

The structural coverage criteria for design elements were not very satisfying: basically we only had call graphs. Let's instead talk about data-bound relationships between design elements.

caller: unit that invokes the callee;

actual parameter: value passed to the callee by the caller; and

formal parameter: placeholder for the incoming variable.

Illustration.

caller: foo(actual1, actual2);
callee: void foo(int formal1, int formal2) { }




We want to define du-pairs between callers and callees. Here's an example.

1: x=5

2: x=4

3: x=3

4: B(x)

11:B(int y)

12: z=y

13: t=y

14:print(y)

We'll say that the last-defs are 2, 3; the first-use is 12. We formally define these notions:

Definition 1 (Last-def). The set of nodes N that define a variable x for which there is a def-clear path from a node n ∈ N through a call to a use in the other unit.

Definition 2 (First-use). The set of nodes N that have uses of y and for which there exists a path that is def-clear and use-clear from the entry point (if the use is in the callee) or the callsite (if the use is in the caller) to the nodes N.

We need the following side definition, analogous to that of def-clear:

Definition 3 A path $p = [n_1, \ldots, n_j]$ is use-clear with respect to v if for every $n_k \in p$, where $k \neq 1$ and $k \neq j$, v is not in use($n_k$).

In other words, the last-def is the definition that goes through the call or return, and the first-use picks up that definition.


Here are two more examples:

x = 14;          // last-def
y = g(x);
print(y);        // first-use

int g(a) {
  print(a);      // first-use
  b = 24;        // last-def
  return b;
}

main() {
  x = 1;
  x = 2;         // last-def
  if (...) { x = 3; /* last-def */ }
  B(x);
}

B(int y) {
  if (...) {
    Z = y;       // first-use
  } else {
    T = y;       // first-use
  }
  print(y);
}

One can create pairs of triples linking last-defs and first-uses; characterize a last-def or a first-use by the method name, the variable name, and the statement, and then link them.
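The linking step can be sketched as a cross product over the two sets of triples. The Site class and the link method are hypothetical names for this illustration, and the data is taken from the main/B example above; whether each resulting pair is feasible depends on the elided branch conditions.

```java
import java.util.ArrayList;
import java.util.List;

final class CouplingPairs {
    /** (method, variable, statement) triple identifying a last-def or a first-use. */
    static final class Site {
        final String method, variable, statement;
        Site(String method, String variable, String statement) {
            this.method = method; this.variable = variable; this.statement = statement;
        }
        @Override public String toString() {
            return "(" + method + ", " + variable + ", " + statement + ")";
        }
    }

    /** Link every last-def to every first-use across one call site. */
    static List<String> link(List<Site> lastDefs, List<Site> firstUses) {
        List<String> pairs = new ArrayList<>();
        for (Site d : lastDefs)
            for (Site u : firstUses)
                pairs.add(d + " -> " + u);
        return pairs;
    }

    /** Candidate coupling du-pairs for the call B(x) in main. */
    static List<String> example() {
        List<Site> lastDefs = List.of(new Site("main", "x", "x = 2"), new Site("main", "x", "x = 3"));
        List<Site> firstUses = List.of(new Site("B", "y", "Z = y"), new Site("B", "y", "T = y"));
        return link(lastDefs, firstUses);
    }

    public static void main(String[] args) {
        example().forEach(System.out::println);
    }
}
```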

Tests can, of course, go beyond just testing the first-use. Our first-use and last-def definitions, however, make testing slightly more tractable. We could, of course, carry out full inter-procedural data-flow, i.e. covering all du-pairs, but this would be more expensive.

Syntax-Based Testing

We are going to completely switch gears now. We will see two applications of context-free grammars:

1. input-space grammars: create inputs (both valid and invalid)

2. program-based grammars: modify programs (mutation testing)

Mutation testing. The basic idea behind mutation testing is to improve your test suites by creating modified programs (mutants) which force your test suites to include test cases which verify certain specific behaviours of the original programs by killing the mutants.

Generating Inputs: Regular Expressions and Grammars

Consider the following Perl regular expression for Visa numbers:

^4[0-9]{12}(?:[0-9]{3})?$

Idea: generate "valid" tests from regexps and invalid tests by mutating the grammar/regexp. (Why did I put "valid" in quotes? What is the fundamental limitation of regexps?)
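As a sketch of what the pattern does and does not check, the code below compiles the same regexp with java.util.regex and pairs it with a Luhn checksum. The Luhn check is an assumed example of a property this pattern does not express, not something the notes prescribe, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

final class VisaRegexSketch {
    static final Pattern VISA = Pattern.compile("^4[0-9]{12}(?:[0-9]{3})?$");

    /** True if s has the shape the regexp describes: a 4 followed by 12 or 15 more digits. */
    static boolean matchesSyntax(String s) {
        return VISA.matcher(s).matches();
    }

    /** Luhn checksum over a string of decimal digits (assumed extra validity check). */
    static boolean luhn(String digits) {
        int sum = 0;
        boolean doubleIt = false;                // the rightmost digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) { d *= 2; if (d > 9) d -= 9; }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        String ok = "4" + "1".repeat(15);        // well-known test number
        String shapeOnly = "4" + "1".repeat(14) + "2";
        System.out.println(matchesSyntax(ok) + " " + luhn(ok));
        System.out.println(matchesSyntax(shapeOnly) + " " + luhn(shapeOnly));
    }
}
```

The second string matches the regexp but fails the checksum, which is one sense in which a regexp-generated test is only "valid".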


Instead, we can use grammars to generate inputs (including sequences of input events).

Typical grammar fragment:

mult_exp = unary_exp | mult_exp STAR unary_arith_exp | mult_exp DIV unary_arith_exp;
unary_exp = quant_exp | unary_exp DOT INT | unary_exp up;
...
start = header? declaration

Using Grammars. Two ways you can use input grammars for software testing and maintenance:

recognizer: can include them in a program to validate inputs;

generator: can create program inputs for testing.

Generators start with the start production and replace nonterminals with their right-hand sides to get (eventually) strings belonging to the input languages.

We specify three coverage criteria for inputs with respect to a grammar G.

Criterion 1 Terminal Symbol Coverage (TSC). TR contains each terminal of grammar G.

Criterion 2 Production Coverage (PDC). TR contains each production of grammar G.

Criterion 3 Derivation Coverage (DC). TR contains every possible string derivable from G.

PDC subsumes TSC. DC often generates infinite test sets, and even if you limit to fixed-length strings, you still have huge numbers of inputs.

Another Grammar.

roll = action
action = dep | deb
dep = "deposit" account amount
deb = "debit" account amount
account = digit{3}
amount = "$" digit+ "." digit{2}
digit = [0-9]


Examples of valid strings.
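One way to obtain valid strings is to write a generator with one method per production of the grammar above. The sketch below does that; the class name, the single-space separator between tokens, and the cap of four digits on digit+ are assumptions made for this illustration.

```java
import java.util.Random;

final class RollGenerator {
    private final Random rng;

    RollGenerator(long seed) { rng = new Random(seed); }

    // One method per production; each replaces a nonterminal with its right-hand side.
    String roll()    { return action(); }
    String action()  { return rng.nextBoolean() ? dep() : deb(); }
    String dep()     { return "deposit " + account() + " " + amount(); }
    String deb()     { return "debit " + account() + " " + amount(); }
    String account() { return digits(3); }
    String amount()  { return "$" + digits(1 + rng.nextInt(4)) + "." + digits(2); }  // digit+ capped at 4
    String digit()   { return String.valueOf(rng.nextInt(10)); }

    private String digits(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(digit());
        return sb.toString();
    }

    public static void main(String[] args) {
        RollGenerator g = new RollGenerator(1L);
        for (int i = 0; i < 5; i++) System.out.println(g.roll());
    }
}
```

Each call to roll() exercises the roll and action productions, one of dep or deb, and the account, amount, and digit productions, so production coverage needs at least one deposit string and one debit string.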

Note: creating a grammar for a system that doesn't have one, but should, is a useful QA exercise. Using this grammar at runtime to validate inputs can improve software reliability, although it makes tests generated from the grammar less useful.

Some Grammar Mutation Operators.

Nonterminal Replacement; e.g.
dep = "deposit" account amount ⇒ dep = "deposit" amount amount

(Use your judgement to replace nonterminals with similar nonterminals.)

Terminal Replacement; e.g.
amount = "$" digit+ "." digit{2} ⇒ amount = "$" digit+ "$" digit{2}

Terminal and Nonterminal Deletion; e.g.
dep = "deposit" account amount ⇒ dep = "deposit" amount

Terminal and Nonterminal Duplication; e.g.
dep = "deposit" account amount ⇒ dep = "deposit" account account amount

Using grammar mutation operators.

1. mutate grammar, generate (invalid) inputs; or,

2. use correct grammar, but mis-derive a rule once; this gives "closer" inputs (since you only miss once).
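The second option can be sketched by deriving one deposit string and applying a single mutation operator at one step of the derivation. The enum, the fixed digit choices, and the recognizer regexp (which assumes single spaces between tokens) are all assumptions for this illustration.

```java
final class MutatedRollSketch {
    enum Mutation { NONE, TERMINAL_REPLACEMENT, NONTERMINAL_DELETION, NONTERMINAL_DUPLICATION }

    /** Derive a deposit string with fixed digit choices, mis-deriving at most one rule. */
    static String deposit(Mutation m) {
        String account = "739";
        // Terminal replacement: amount = "$" digit+ "$" digit{2}
        String separator = (m == Mutation.TERMINAL_REPLACEMENT) ? "$" : ".";
        String amount = "$" + "12" + separator + "35";
        switch (m) {
            case NONTERMINAL_DELETION:       // dep = "deposit" amount
                return "deposit " + amount;
            case NONTERMINAL_DUPLICATION:    // dep = "deposit" account account amount
                return "deposit " + account + " " + account + " " + amount;
            default:                         // dep = "deposit" account amount
                return "deposit " + account + " " + amount;
        }
    }

    /** Recognizer for the unmutated grammar, written as a regexp. */
    static boolean valid(String s) {
        return s.matches("(deposit|debit) [0-9]{3} \\$[0-9]+\\.[0-9]{2}");
    }

    public static void main(String[] args) {
        for (Mutation m : Mutation.values())
            System.out.println(m + ": " + deposit(m) + " valid=" + valid(deposit(m)));
    }
}
```

Each mutated string differs from a valid one in exactly one place, which is what makes such inputs "closer" than strings generated from a wholly mutated grammar.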


Why test invalid inputs? Bill Sempf (@sempf), on Twitter: "QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv."

If you're lucky, your program accepts strings (or events) described by a regexp or a grammar. But you might not use a parser or regexp engine. Generating using the regexp or grammar helps detect deviations, both right now and in the future.

As you saw in Assignment 1, it's the easiest thing to overlook invalid inputs. Yet they may lead to undefined behaviour.

What we'll do is to mutate the grammars and generate test strings from the mutated grammars.

Some notes:

Book claims we don't have much experience using grammar-based operators.

Can generate strings still in the grammar even after mutation.

Recall that we aren't talking about semantic checks.

Some programs accept only a subset of a specified larger language, e.g. Blogger HTML comments. Then testing intersection is useful.




Mutation Testing

The second major way to use grammars in testing is mutation testing. Strings are usually programs, but could also be inputs, especially for testing based on invalid strings.

Definition 1 Ground string: a (valid) string belonging to the language of the grammar.

Definition 2 Mutation Operator: a rule that specifies syntactic variations of strings generated from a grammar.

Definition 3 Mutant: the result of one application of a mutation operator to a ground string.

Mutants may be generated either by modifying existing strings or by changing a string while it is being generated.

It is generally difficult to find good mutation operators. One example of a bad mutation operator might be to change all predicates to "true".

Note that mutation is hard to apply by hand, and automation is complicated. The testing community generally considers mutation to be a "gold standard" that serves as a benchmark against which to compare other testing criteria.

More on ground strings. Mutation manipulates ground strings to produce variants on these strings. Here are some examples:

the program that we are testing; or

valid inputs to a program.

If we are testing invalid inputs, we might not care about ground strings.

Credit card number examples.

Valid strings:

Invalid strings:

We can also create mutants by applying a mutation operator during generation, which is useful when you don't need the ground string.

NB: There are many ways for a string to be invalid but still be generated by the grammar.
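To see how a mutant can stay inside the language, the sketch below applies two operators to a credit card ground string, one application per mutant, and counts how many mutants the Visa regexp from earlier still accepts. Both operators and all names are hypothetical choices for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class GroundStringMutants {
    static final Pattern VISA = Pattern.compile("^4[0-9]{12}(?:[0-9]{3})?$");

    /** Hypothetical operator: delete the character at one position (one mutant per position). */
    static List<String> deleteOneChar(String ground) {
        List<String> mutants = new ArrayList<>();
        for (int i = 0; i < ground.length(); i++)
            mutants.add(ground.substring(0, i) + ground.substring(i + 1));
        return mutants;
    }

    /** Hypothetical operator: replace one digit with the next digit, wrapping 9 to 0. */
    static List<String> bumpOneDigit(String ground) {
        List<String> mutants = new ArrayList<>();
        for (int i = 0; i < ground.length(); i++) {
            char bumped = (char) ('0' + (ground.charAt(i) - '0' + 1) % 10);
            mutants.add(ground.substring(0, i) + bumped + ground.substring(i + 1));
        }
        return mutants;
    }

    /** Number of mutants that the grammar (here, the regexp) still accepts. */
    static long stillInLanguage(List<String> mutants) {
        return mutants.stream().filter(m -> VISA.matcher(m).matches()).count();
    }

    public static void main(String[] args) {
        String ground = "4" + "1".repeat(15);   // 16-digit ground string
        System.out.println(stillInLanguage(deleteOneChar(ground)));
        System.out.println(stillInLanguage(bumpOneDigit(ground)));
    }
}
```

Deleting a character always leaves 15 digits, so no deletion mutant matches; bumping a digit keeps the length, so every mutant except the one that changes the leading 4 is still generated by the regexp even though it is no longer the original number.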


Some points:

How many mutation operators should you apply to get mutants? One.

Should you apply every mutation operator everywhere it might apply? More work, but can subsume other criteria.

Killing Mutants

Generate a mutant m for an original ground string m_0.

Definition 4
