Patrick Lam Lectures

  • 7/18/2019 Patrick Lam Lectures


    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 2, January 7, 2015. Patrick Lam, version 1

    Faults, Errors, and Failures

    For this course, we are going to define the following terminology.

    Fault (also known as a bug): A static defect in software: incorrect lines of code.

    Error: An incorrect internal state, not necessarily observed yet.

    Failure: External, incorrect behaviour with respect to the expected behaviour; it must be visible (e.g. EPIC FAIL).

    These terms are not used consistently in the literature. Don't get stuck on memorizing them.

    Motivating Example. Here's a train-tracks analogy.

    (all railroad pictures inspired by: Bernd Bruegge & Allen H. Dutoit, Object Oriented Software Engineering: Using UML, Patterns and Java.)

    Is it a failure? An error? A fault? Clearly, it's not right. But no failure has occurred yet; there is no behaviour. I'd also say that nothing analogous to execution has occurred yet either. If there was a train on the tracks, pre-derailment, then there would be an error. That picture most closely corresponds to a fault.


    Perhaps it was caused by mechanical stresses.

    Or maybe it was caused by poor design.

    Software-related Example. Let's get back to software and consider the code from last time:

        public static int numZero(int[] x) {
            // effects: if x is null, throw NullPointerException
            // otherwise, return number of occurrences of 0 in x.
            int count = 0;
            for (int i = 1; i < x.length; i++) {
                // program point (*)
                if (x[i] == 0) count++;
            }
            return count;
        }

    As we saw, it has a fault (independent of whether it is executed or not): it's supposed to return the number of 0s, but it doesn't always do so. We define the state for this method to be the variables x, i, count, and the Program Counter (PC).

    Feeding this numZero the input {2, 7, 0} shows a wrong state.

    The wrong state is as follows: x = {2, 7, 0}, i = 1, count = 0, PC = (*), on the first time around the loop.

    The expected state is: x = {2, 7, 0}, i = 0, count = 0, PC = (*).

    However, running numZero on {2, 7, 0} executes the fault and causes a (transient) error state, but doesn't result in a failure, as the output value count is 1 as expected.

    On the other hand, running numZero on {0, 2, 7} causes an error state with count = 0 on return, hence leading to a failure.
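The two runs described above can be reproduced mechanically. What follows is a minimal sketch, assuming a direct Python transliteration of the Java method; the names `num_zero` and `trace` are illustrative and not part of the lecture. The fault (the loop starts at index 1) is kept on purpose, and the state at program point (*) is recorded so that the error state can be inspected.

```python
def num_zero(x, trace=None):
    """Faulty transliteration of numZero: the loop starts at 1, not 0."""
    count = 0
    for i in range(1, len(x)):  # fault: should be range(0, len(x))
        if trace is not None:
            trace.append((i, count))  # (i, count) at program point (*)
        if x[i] == 0:
            count += 1
    return count


trace = []
# Fault executed and error state reached (first i is 1), but no failure.
print(num_zero([2, 7, 0], trace), trace[0])
# The error propagates to the output, so this run is a failure.
print(num_zero([0, 2, 7]))
```

The first call returns the expected count even though its first recorded state already differs from the expected one; only the second call makes the fault visible.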


    RIP Fault Model

    To get from a fault to a failure:

    1. Fault must be reachable;

    2. Program state subsequent to reaching fault must be incorrect: infection; and

    3. Infected state must propagate to output to cause a visible failure.

    Applications of the RIP model: automatic generation of test data, mutation testing.

    Dealing with Faults, Errors and Failures

    Three strategies for dealing with faults are avoidance, detection and tolerance. Or, you can just try to declare that the fault is not a bug, if the specification is ambiguous.

    Fault Avoidance. Certain faults can just be avoided by not programming in vulnerable languages; buffer overflows, for instance, are impossible in Java. Better system design can also help avoid faults, for instance by making an error state unreachable.

    Fault Detection. Testing (construed broadly) is the primary means of fault detection. Software verification also qualifies. Once you have detected a fault, if it is economically viable, you might repair it.

    Fault Tolerance. You are never going to remove all of the bugs, and some errors arise from conditions beyond your control (such as hardware faults). It's worthwhile to tolerate faults too. Strategies include redundancy and isolation. An example of redundancy is provisioning extra hardware in case a server goes down. Isolation includes things as simple as checking preconditions.


    Testing vs Debugging

    Recall from last time:

    Testing: evaluating software by observing its execution.

    Debugging: finding (and fixing) a fault given a failure.

    I said that you really need to automate your tests. But even so, testing is still hard: only certain inputs expose the fault in the form of a failure. As you've experienced, debugging is hard too: you have the failure, but you have to find the fault.

    Contrived example. Consider the following code:if ( x - 100


    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 3, January 9, 2015. Patrick Lam, version 2

    (We saw the numZero example today. That example is in the L02 lecture notes.)

    Exercise: findLast. Here's a faulty program from last time, again.

        1 static public int findLast(int[] x, int y) {
        2     /* bug: loop test should be i >= 0 */
        3     for (int i = x.length - 1; i > 0; i--) {
        4         if (x[i] == y) {
        5             return i;
        6         }
        7     }
        8     return -1;
        9 }

    On the assignment, you will have a question like this:

    (a) Identify the fault, and fix it.

    (b) If possible, identify a test case that does not execute the fault.

    (c) If possible, identify a test case that executes the fault, but does not result in an error state.

    (d) If possible, identify a test case that results in an error, but not a failure. (Hint: PC)

    (e) For the given test case, identify the first error state. Be sure to describe the complete state.

    I asked you to work on that question in class. (I noticed that most people were on-topic, thanks!) Here are some answers:

    (a) The loop condition must include index 0: i >= 0 .

    (b) This is a bit of a trick question. To avoid the loop condition, you must not enter the loop body. The only way to do that is to pass in x = null. You should also state, for instance, y = 3. The expected output is a NullPointerException, which is also the actual output.

    (c) Inputs where y appears in the second or later position execute the fault but do not result in the error state; nor do inputs where x is an empty array. (There may be other inputs as well.) So, a concrete input: x = {2, 3, 5}; y = 3. Expected output = actual output = 1.

    (d) One error input that does not lead to a failure is where y does not occur in x. That results in an incorrect PC after the final executed iteration of the loop.


    (e) After running line 6 with i = 1, the decrement occurs, followed by the evaluation of i > 0, causing the PC to exit the loop (statement 8) instead of returning to statement 4. The faulty state is x = {2, 3, 5}; y = 3; i = 0; PC = 8, while the correct state would be PC = 4.

    Someone asked about distinguishing errors from failures. These questions were about failures at the method level, and so a wrong return value would be a failure when we're asking about methods. In the context of a bigger program, that return value might not be visible to the user, and so it might not constitute a failure, just an error.
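The answers to (c) and (d), plus a failing input, can be checked by running the faulty method next to a repaired one. This is a sketch only, assuming a Python transliteration of findLast; `find_last` and `find_last_fixed` are illustrative names, and the inputs other than x = {2, 3, 5}, y = 3 are chosen here for illustration.

```python
def find_last(x, y):
    """Faulty transliteration: the loop test is i > 0 instead of i >= 0."""
    i = len(x) - 1
    while i > 0:  # fault: index 0 is never examined
        if x[i] == y:
            return i
        i -= 1
    return -1


def find_last_fixed(x, y):
    """Repaired version: the loop test includes index 0."""
    i = len(x) - 1
    while i >= 0:
        if x[i] == y:
            return i
        i -= 1
    return -1


cases = [
    ([2, 3, 5], 3),  # (c): fault executed, no error state
    ([2, 3, 5], 7),  # (d): error state (wrong PC on loop exit), no failure
    ([2, 3, 5], 2),  # y only at index 0: the error becomes a failure
]
for x, y in cases:
    print(x, y, find_last(x, y), find_last_fixed(x, y))
```

Only the last case produces different return values for the two versions, which is what makes it a failure at the method level.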

    Line Intersections

    We then talked about different ways of validating a test suite. Consider the following Python code, found by Michael Thiessen on Stack Overflow (http://stackoverflow.com/questions/306316/determine-if-two-rectangles-overlap-each-other).

        class LineSegment:
            def __init__(self, x1, x2):
                self.x1 = x1
                self.x2 = x2

        def intersect(a, b):
            return (a.x1 < b.x2) and (a.x2 > b.x1)

    We could construct test suites that:

    • execute every statement in intersect (node coverage, we'll see later). Well, that's not very useful; any old test case will do that. There are no branches, so what we'll call edge coverage doesn't help either.

    • feed random inputs to intersect; unfortunately, interesting behaviours are not random, so it won't help much in general.

    • check all outputs of intersect (i.e. a test case with lines that intersect and one with lines that don't intersect): we're getting somewhere. That will certify that the method works in some cases, but it's easy to think of situations that we missed.

    • check different values of clauses a.x1 < b.x2 and a.x2 > b.x1 (logic coverage): better than the above coverage criteria, but still misses interesting behaviours;

    • analyze possible inputs and cover all interesting combinations of inputs (input space coverage): can create an exhaustive test suite, if your analysis is sound.

    Let's try to prove correctness of intersect. There's an old saying about testing: supposedly, it can only find the presence of bugs, not their absence. This is not completely true, especially if you have reliable software that automatically constructs an exhaustive test suite. But that is beyond the state of the practice in 2015, for the most part.

    [Below is a cleaner presentation than the one in class. It also has the advantage of being not wrong.]


    Inputs to intersect. There are essentially four inputs to this function. Rename them a, A, b, B, for a.x1, a.x2, b.x1 and b.x2 respectively.

    Let's first assume that all points are distinct. We should make a note to ourselves to check violations of this as well: we may have a = A, a = b, a = B, and symmetrically for b: b = a, b = A, b = B.

    For the purpose of the analysis, let's assume that a < b; when constructing test cases, we can swap a and b around. That's why there are duplicate assert statements below.

    Without loss of generality, we can assume that a < A and b < B. (We ought to update the constructor if we want to make that assumption.)

    With these assumptions, we have to test the three possible permutations aAbB, abAB, and abBA. It is simple to construct test cases for these permutations, using Python's unittest framework:

        def test_aAbB(self):
            a = LineSegment(0, 2)
            b = LineSegment(3, 7)
            self.assertFalse(intersect(a, b))
            self.assertFalse(intersect(b, a))

        def test_abAB(self):
            a = LineSegment(0, 4)
            b = LineSegment(3, 7)
            self.assertTrue(intersect(a, b))
            self.assertTrue(intersect(b, a))

        def test_abBA(self):
            a = LineSegment(0, 4)
            b = LineSegment(1, 2)
            self.assertTrue(intersect(a, b))
            self.assertTrue(intersect(b, a))

    Those test cases pass. However, if you construct test cases for equality (as I've committed to the repository), you see that the given intersect function fails on line segments that intersect only at a point. Replacing < with <= and > with >= fixes the code.
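The equality case is easy to exercise directly. The sketch below is not the test file from the course repository; it assumes that two segments sharing exactly one endpoint count as intersecting, and `intersect_fixed` is a hypothetical name for the variant with non-strict comparisons.

```python
class LineSegment:
    def __init__(self, x1, x2):
        self.x1 = x1
        self.x2 = x2


def intersect(a, b):
    # Original version: strict comparisons miss segments touching at a point.
    return (a.x1 < b.x2) and (a.x2 > b.x1)


def intersect_fixed(a, b):
    # Assumed repair: non-strict comparisons accept a shared endpoint.
    return (a.x1 <= b.x2) and (a.x2 >= b.x1)


a = LineSegment(0, 3)
b = LineSegment(3, 7)  # touches a only at the point 3
for f in (intersect, intersect_fixed):
    print(f.__name__, f(a, b), f(b, a))
```

The disjoint and overlapping cases from the three permutation tests behave the same under both versions; only the touching case distinguishes them.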


    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 4, January 12, 2015. Patrick Lam, version 1

    About Testing

    We can look at testing statically or dynamically.

    Static Testing (ahead-of-time): this includes static analysis, which is typically automated and runs at compile time (or, say, nightly), as well as human-driven static testing: walk-throughs (informal) and code inspection (formal).

    Dynamic Testing (at run-time): observe program behaviour by executing it; includes black-box testing and white-box testing.

    Usually the word "testing" means "dynamic testing".

    Naughty words. People like to talk about "complete testing", "exhaustive testing", and "full coverage". However, for many systems, the number of potential inputs is infinite. It's therefore impossible to completely test a nontrivial system, i.e. run it on all possible inputs. There are both practical limitations (time and cost) and theoretical limitations (e.g. the halting problem).

    In the absence of complete testing, we will define testing criteria and evaluate test suites with them.

    Test cases

    Informally, a test case contains:

    • what you feed to software; and

    • what the software should output in response.

    Here are two definitions to help evaluate how hard it might be to create test cases.

    Definition 1 Observability is how easy it is to observe the system's behaviour, e.g. its outputs, effects on the environment, hardware and software.

    Definition 2 Controllability is how easy it is to provide the system with needed inputs and to get the system into the right state.


    Anatomy of a Test Case

    Consider testing a cellphone from the off state:

        on (prefix values), 1 519 888 4567 (test case values), talk (verification values), end (exit codes)

    Here "talk" and "end" together make up the postfix values.

    Definition 3

    • Test Case Values: input values necessary to complete some execution of the software (often called the test case itself).

    • Expected Results: result to be produced iff program satisfies intended behaviour on a test case.

    • Prefix Values: inputs to prepare software for test case values.

    • Postfix Values: inputs for software after test case values. These include verification values (inputs to show results of test case values) and exit commands (inputs to terminate program or to return it to initial state).

    Definition 4

    • Test Case: test case values, expected results, prefix values, and postfix values necessary to evaluate software under test.

    • Test Set: set of test cases.

    • Executable Test Script: test case prepared in a form to be executable automatically and which generates a report.

    On Coverage

    Ideally, we'd run the program on the whole input space and find bugs. Unfortunately, such a plan is usually infeasible: there are too many potential inputs.

    Key Idea: Coverage. Find a reduced space and cover that space.

    We hope that covering the reduced space is going to be more exhaustive than arbitrarily creating test cases. It at least tells us when we can plausibly stop testing.

    The following definition helps us evaluate coverage.

    Definition 5 A test requirement is a specific element of a (software) artifact that a test case must satisfy or cover.


    We write TR for a set of test requirements; a test set may cover a set of TRs.

    For instance, consider three ice cream cone flavours: vanilla, chocolate and mint. A possible test requirement would be to test one chocolate cone. (Volunteers?)

    Two software examples:

    • cover all decisions in a program (branch coverage); each decision gives two test requirements: branch is true; branch is false.

    • each method must be called at least once; each method gives one test requirement.

    Definition 6 A coverage criterion is a rule or collection of rules that impose test requirements on a test set.

    A test set may or may not satisfy a coverage criterion. The coverage criterion gives a recipe for generating TRs systematically.

    Returning to the ice cream example, a flavour criterion might be "cover all flavours", and that would generate three TRs: {flavour: chocolate, flavour: vanilla, flavour: mint}.

    We can test an ice cream stand by running two test sets on it, for instance: test set 1 includes 3 chocolate cones and 1 vanilla cone, while test set 2 includes 1 chocolate cone, 1 vanilla cone, and 1 mint cone.

    Definition 7 (Coverage). Given a set of test requirements TR for a coverage criterion C, a test set T satisfies C iff for every test requirement $tr \in TR$, at least one $t \in T$ satisfies $tr$.

    Infeasible Test Requirements. Sometimes, no test case will satisfy a test requirement. For instance, dead code can make statement coverage infeasible, e.g.:

        if (false)
            unreachableCall();

    or, a real example from the Linux kernel:

        while (0) {
            local_irq_disable();
        }

    Hence, a criterion which says "test every statement" is going to be infeasible for many programs.

    Quantifying Coverage. How good is a test set? It's great if it covers everything, but sometimes that's impossible. We can instead assign a number.

    Definition 8 (Coverage Level). Given a set of test requirements TR and a test set T, the coverage level is the ratio of the number of test requirements satisfied by T to the size of TR.

    Returning to our example, say TR = {flavour: chocolate, flavour: vanilla, flavour: mint}, and test set T1 contains {3 chocolate, 1 vanilla}, then the coverage level is 2/3 or about 67%.
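The coverage level is a simple ratio, so it can be computed directly. The following is a minimal sketch for the ice cream example; the function `coverage_level` and the `satisfies` callback are illustrative names, and representing each cone by its flavour string is an assumption made here for brevity.

```python
from fractions import Fraction


def coverage_level(test_requirements, test_set, satisfies):
    """Ratio of requirements satisfied by at least one test to all requirements."""
    covered = {
        tr for tr in test_requirements
        if any(satisfies(t, tr) for t in test_set)
    }
    return Fraction(len(covered), len(test_requirements))


def same_flavour(cone, flavour):
    # A cone satisfies a flavour requirement if it has that flavour.
    return cone == flavour


TR = {"chocolate", "vanilla", "mint"}
T1 = ["chocolate", "chocolate", "chocolate", "vanilla"]
T2 = ["chocolate", "vanilla", "mint"]

print(coverage_level(TR, T1, same_flavour))
print(coverage_level(TR, T2, same_flavour))
```

A test set satisfies the criterion exactly when its coverage level is 1, which is the case for test set 2 but not for test set 1.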


    Subsumption

    Sometimes one coverage criterion is strictly more powerful than another one: any test set that satisfies $C_1$ automatically satisfies $C_2$ as well.

    Definition 9 Criteria subsumption: coverage criterion $C_1$ subsumes $C_2$ iff every test set that satisfies $C_1$ also satisfies $C_2$.

    Software example: branch coverage (Edge Coverage) subsumes statement coverage (Node Coverage). Which is stronger?

    Evaluating coverage criteria. Subsumption is a rough guide for comparing criteria, but it's hard to use in practice. Consider also:

    1. difficulty of generating test requirements;

    2. difficulty of generating tests;

    3. how well tests reveal faults.


    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 5, January 14, 2015. Patrick Lam, version 0

    Graph Coverage

    We will discuss graph coverage in great detail. Many forms of software testing reduce to graph coverage, so once you understand how graph coverage works, you will have a good understanding of a key software testing topic.

    Definition of Graphs. (You should already be familiar with this material, but let's establish the terms we'll use in this class). Let's start with an example graph.

    [Figure: example graph with nodes A, B, C, D and E.]

    • $N$: set of nodes, {A, B, C, D, E}

    • $N_0$: set of initial nodes, {A}

    • $N_f$: set of final nodes, {E}

    • $E \subseteq N \times N$: edges, e.g. (A, B) and (C, E); C is the predecessor and E is the successor in (C, E).

    Subgraph: Let $G'$ be a subgraph of $G$; then the nodes of $G'$ must be a subset $N_{sub}$ of $N$. Then the initial nodes of $G'$ are $N_0 \cap N_{sub}$ and its final nodes are $N_f \cap N_{sub}$. The edges of $G'$ are $E \cap (N_{sub} \times N_{sub})$.

    For example, consider the case where we set $N_{sub}$ = {A, B, E}. This induces the subgraph:

    [Figure: induced subgraph on nodes A, B and E.]

    Note that graphs need not be connected.

    Paths. The most important thing about a graph, for testing purposes, is the path through the graph. Here is a graph G# and some example paths through the graph.


    [Figure: graph G# with nodes 1 through 7.]

    • path 1: [2, 3, 5], with length 2.

    • path 2: [1, 2, 3, 5, 6, 2], with length 5.

    • not a path: [1, 2, 5].

    We say that path 1 is from node 2 to node 5. We can also say that path 1 is from edge (2, 3) to (3, 5).

    Definition 1 A path is a sequence of nodes from a graph G whose adjacent pairs all belong to the set of edges E of G.

    Note that length 0 paths are still paths.

    Definition 2 A subpath is a subsequence of a path.

    This textbook definition is ambiguous: is [1, 2, 5] a subpath of [1, 2, 3, 5, 6, 2]?

    Definition 3 A subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.

    Yes, [1, 2, 5] is a subsequence of [1, 2, 3, 5, 6, 2]. However, [1, 2, 5] is not a path, so we are going to say that it is not a subpath.
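The difference between "subsequence" and "subpath" can be made concrete. The sketch below makes two assumptions: `EDGES` contains only the edges that the example paths in these notes actually use (the edges touching node 4 are not recoverable here and are left out), and "subpath" is taken to mean a contiguous run of nodes, which is one reading that agrees with the conclusion above. All function names are illustrative.

```python
# Partial edge set of G#: only edges evidenced by the example paths.
EDGES = {(1, 2), (2, 3), (3, 5), (5, 6), (6, 2), (6, 7)}


def is_path(nodes, edges):
    """Non-empty node sequence whose adjacent pairs are all edges."""
    return len(nodes) > 0 and all(
        (a, b) in edges for a, b in zip(nodes, nodes[1:])
    )


def path_length(nodes):
    """Length of a path is its number of edges."""
    return len(nodes) - 1


def is_subsequence(short, long):
    """True if short can be obtained from long by deleting elements."""
    remaining = iter(long)
    return all(item in remaining for item in short)


def is_subpath(short, long):
    """Assumed reading: short occurs as consecutive nodes of long."""
    n = len(short)
    return n > 0 and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


path2 = [1, 2, 3, 5, 6, 2]
print(is_path([2, 3, 5], EDGES), path_length([2, 3, 5]))
print(is_path(path2, EDGES), path_length(path2))
print(is_subsequence([1, 2, 5], path2), is_path([1, 2, 5], EDGES))
print(is_subpath([1, 2, 5], path2), is_subpath([2, 3, 5], path2))
```

Under the contiguous reading, every subpath of a path is automatically a path itself, so the ambiguity in the textbook definition disappears.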

    Test cases and test paths.

    Some paths are also test paths. Here is a test path from G#:

        [1, 2, 3, 5, 6, 7];

    here is another one:

        [1, 2, 3, 5, 6, 2, 3, 5, 6, 7].

    You can easily come up with more paths. Now, test paths are linked to test cases. First, let's define the notion of a test path.


    Definition 4 A test path is a path p (possibly of length 0) that starts at some node in $N_0$ and ends at some node in $N_f$.

    Running the test case on the program or method yields one or more test paths. A test path may represent many test cases (for instance, if a program takes the same branches on all of those test cases); or a test path may represent no test cases (if it is infeasible).

    Paths and semantics. When a graph is a program's control-flow graph, some of the paths in the graph may not correspond to program semantics. Consider the following graph.

    [Figure: control-flow graph containing a branch on if (false).]

    Clearly, the code guarded by if (false) will never execute.

    In this course, we will generally only talk about the syntax of a graph (its nodes and edges) and not its semantics.

    However, in the following definition, we'll talk about both notions.

    Definition 5 A node n (or edge e) is syntactically reachable from $n_i$ if there exists a path from $n_i$ to n (or e). A node n (or edge e) is semantically reachable if one of the paths from $n_i$ to n can be reached on some input.

    Standard graph algorithms, like breadth-first search and depth-first search, can compute syntactic reachability. (Semantic reachability is undecidable; no algorithm can precisely compute semantic reachability for all programs.)

    We define $\mathrm{reach}_G(x)$ as the portion of the graph syntactically reachable from $x$. ($x$ might be a node, an edge, a set of nodes, or a set of edges.) For example:

    • $\mathrm{reach}_G(N_0)$ is the set of nodes and edges reachable from the initial node(s);

    • $\mathrm{reach}_{G\#}(2)$ in the above graph G# is {2, 3, 4, 5, 6, 7};

    • $\mathrm{reach}_{G\#}(7)$ is {7}.

    Note that we include $x$ in the set of nodes reachable from $x$, because paths of length 0 are paths.

    When we talk about the nodes or edges in a graph G in a coverage criterion, we'll generally mean $\mathrm{reach}_G(N_0)$; the unreachable nodes tend to (1) be uninteresting; and (2) frustrate coverage criteria.
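Syntactic reachability is an ordinary graph search. Here is a minimal breadth-first sketch that returns only the reachable nodes (not the edges); the graph `GRAPH` and the function name `reach` are hypothetical examples chosen for this illustration, not one of the lecture's graphs.

```python
from collections import deque


def reach(successors, start_nodes):
    """Nodes syntactically reachable from start_nodes (length-0 paths count)."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical graph: "x" has an outgoing edge but no incoming path from "a".
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "x": ["d"]}

print(sorted(reach(GRAPH, ["a"])))
print(sorted(reach(GRAPH, ["d"])))
```

Node "x" is syntactically unreachable from "a", so a coverage criterion stated over the reachable part of the graph would not generate a requirement for it.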


    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 6, January 16, 2015. Patrick Lam, version 1

    Some binary distinctions

    Let's digress for a bit and define some older terms which we won't use much in this course, but which we should discuss briefly.

    Black-box testing. Deriving tests from external descriptions of software: specifications, requirements, designs; anything but the code.

    White-box testing. Deriving tests from the source code, e.g. branches, conditions, statements.

    Our model-based approach makes this distinction less important.

    Test paths and cases

    We resume our graph coverage content with the following definition:

    Definition 1 A graph is single-entry/single-exit (SESE) if $N_0$ and $N_f$ have exactly one element each. $N_f$ must be reachable from every node in $N$, and no node in $N \setminus N_f$ may be reachable from $N_f$, unless $N_0 = N_f$.

    The graphs that we'll be talking about in this course will almost always be SESE.

    Here's another example of a graph, which happens to be SESE, and test paths in that graph. We'll call this graph D, for double-diamond, and it'll come up a few times.

    [Figure: the double-diamond graph D, with nodes $n_0$ through $n_6$.]


    Here are the four test paths in D:

        $[n_0, n_1, n_3, n_4, n_6]$
        $[n_0, n_1, n_3, n_5, n_6]$
        $[n_0, n_2, n_3, n_4, n_6]$
        $[n_0, n_2, n_3, n_5, n_6]$

    We next focus on the path $p = [n_0, n_1, n_3, n_4, n_6]$ and use it to explain several path-related definitions. We can say that $p$ visits node $n_3$ and edge $(n_0, n_1)$; we can write $n_3 \in p$ and $(n_0, n_1) \in p$ respectively.

    Let $p' = [n_1, n_3, n_4]$. Then $p'$ is a subpath of $p$.

    Test cases and test paths. We connect test cases and test paths with a mapping $\mathrm{path}_G$ from test cases to test paths; e.g. $\mathrm{path}_G(t)$ is the set of test paths corresponding to test case $t$.

    • usually we just write path since G is obvious from the context.

    • we can lift the definition of path to test sets $T$ by defining $\mathrm{path}(T) = \{\mathrm{path}(t) \mid t \in T\}$.

    • each test case gives at least one test path. If the software is deterministic, then each test case gives exactly one test path; otherwise, multiple test paths may arise from one test case.

    Here's an example of deterministic and nondeterministic control-flow graphs:

    [Figure: two control-flow graphs. The deterministic one branches on if (x > 3), with branch targets x = 5 and x = 0; the nondeterministic one branches on if (x.hashCode() > 3).]

    Causes of nondeterminism include dependence on inputs; on the thread scheduler; and on memory addresses, for instance as seen in calls to the default Java hashCode() implementation.

    Nondeterminism makes it hard to check test case output, since more than one output might be a valid result of a single test input.

    Indirection. Note that we will describe coverage criteria with respect to test paths, but we always run test cases.


    Example. Here is a short method, the associated control-flow graph, and some test cases and test paths.

        int foo(int x) {
            if (x < 5) {
                x++;
            } else {
                x--;
            }
            return x;
        }

    [Control-flow graph: node (1) if (x < 5) has a true (T) edge to node (2) x++ and a false (F) edge to node (3) x--; both (2) and (3) lead to node (4) return x.]

    • Test case: x = 5; test path: [(1), (3), (4)].

    • Test case: x = 2; test path: [(1), (2), (4)].

    Note that (1) we can deduce properties of the test case from the test path; and (2) in this example, since our method is deterministic, the test case determines the test path.
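One way to see the mapping from test cases to test paths is to instrument the method so that it records the control-flow nodes it visits. This is a sketch only, assuming a Python transliteration of foo; `foo_with_path` is an illustrative name and the node numbers follow the control-flow graph above.

```python
def foo_with_path(x):
    """Return (result, test path) for the method foo."""
    path = [1]              # (1) if (x < 5)
    if x < 5:
        path.append(2)      # (2) x++
        x += 1
    else:
        path.append(3)      # (3) x--
        x -= 1
    path.append(4)          # (4) return x
    return x, path


for test_case in (5, 2):
    print(test_case, foo_with_path(test_case))
```

Because the method is deterministic, running the same test case twice always records the same test path.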

    Graph Coverage

    Having defined all of the graph notions we'll need for now, we apply them to graphs. Recall our previous definition of coverage:

    Definition 2 Given a set of test requirements TR for a coverage criterion C, a test set T satisfies C iff for every test requirement tr in TR, at least one t in T exists such that t satisfies tr.

    We apply this definition to graph coverage:

    Definition 3 Given a set of test requirements TR for a graph criterion C, a test set T satisfies C on graph G iff for every test requirement tr in TR, at least one test path p in path(T) exists such that p satisfies tr.

    We'll use this notion to define a number of standard testing coverage criteria. (At this point, the textbook defines predicates, but mostly ignores them afterwards. I'll just ignore them right away.)

    Recall the double-diamond graph D which we saw earlier. For the node coverage criterion, we get the following test requirements:

        $\{n_0, n_1, n_2, n_3, n_4, n_5, n_6\}$

    That is, any test set T which satisfies node coverage on D must include test cases t; the cases t give rise to test paths path(t), and some path must include each node from $n_0$ to $n_6$. (No single path must include all of these nodes; the requirement applies to the set of test paths.)

    Let's formally define node coverage.


    Definition 4 Node coverage: For each node $n \in \mathrm{reach}_G(N_0)$, TR contains a requirement to visit node $n$.

    We will state all of the coverage criteria in the following form:

    Criterion 1 Node Coverage (NC): TR contains each reachable node in G.

    We can then write

        $TR = \{n_0, n_1, n_2, n_3, n_4, n_5, n_6\}$.

    Let's consider an example of a test set which satisfies node coverage on D, the double-diamond graph from last time.

    Start with a test case $t_1$; assume that executing $t_1$ gives the test path

        $\mathrm{path}(t_1) = p_1 = [n_0, n_1, n_3, n_4, n_6]$.

    Then test set $\{t_1\}$ does not give node coverage on D, because no test case covers node $n_2$ or $n_5$. If we can find a test case $t_2$ with test path

        $\mathrm{path}(t_2) = p_2 = [n_0, n_2, n_3, n_5, n_6]$,

    then the test set $T = \{t_1, t_2\}$ satisfies node coverage on D.

    What is another test set which satisfies node coverage on D?

    Here is a more verbose definition of node coverage.

    Definition 5 Test set T satisfies node coverage on graph G if and only if for every syntactically reachable node $n \in N$, there is some path $p$ in path(T) such that $p$ visits $n$.
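The node coverage check on D is small enough to automate. In the sketch below, the edge set `D_EDGES` is inferred from the four test paths of D listed earlier in these notes; the function names are illustrative, and the sketch assumes that every node of D appears in some edge and is reachable from the initial node.

```python
# Edges of the double-diamond graph D, inferred from its four test paths.
D_EDGES = {
    ("n0", "n1"), ("n0", "n2"), ("n1", "n3"), ("n2", "n3"),
    ("n3", "n4"), ("n3", "n5"), ("n4", "n6"), ("n5", "n6"),
}


def node_requirements(edges):
    """TR for node coverage, assuming every node appears in some edge."""
    return {node for edge in edges for node in edge}


def satisfies_node_coverage(test_paths, edges):
    """True if the test paths together visit every required node."""
    visited = {node for path in test_paths for node in path}
    return node_requirements(edges) <= visited


p1 = ["n0", "n1", "n3", "n4", "n6"]
p2 = ["n0", "n2", "n3", "n5", "n6"]

print(satisfies_node_coverage([p1], D_EDGES))
print(satisfies_node_coverage([p1, p2], D_EDGES))
```

The pair of paths through the other two diagonals of the diamonds is another test set that passes the same check.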

    A second standard criterion is that of edge coverage.

    Criterion 2 Edge Coverage (EC). TR contains each reachable path of length up to 1, inclusive, in G.

    We describe edge coverage this way so that, as far as possible, new criteria in a series will subsume previous criteria.

    Here are some examples of paths of length at most 1:

    Note that since we're not talking about test paths, these reachable paths need not start in $N_0$.

    In general, paths of length at most 1 consist of nodes and edges. (Why not just say edges?)

    Saying "edges" on the above graph would not be the same as saying "paths of length at most 1".


    Here is a more involved example:

    [Figure: graph with nodes $n_0$, $n_1$ and $n_2$ and edges $(n_0, n_1)$, $(n_1, n_2)$ and $(n_0, n_2)$; the two edges leaving $n_0$ carry the labels $x < y$ and $x \ge y$.]

    Let's define

        path($t_1$) = $[n_0, n_1, n_2]$
        path($t_2$) = $[n_0, n_2]$

    Then

        $T_1 = \{t_1\}$ satisfies node coverage
        $T_2 = \{t_1, t_2\}$ satisfies edge coverage

    Going beyond 1. So far we've seen length 0 (node coverage) and length 1 (edge coverage). Of course, we can go to lengths 2, etc., but we quickly get diminishing returns. Here is the criterion for length 2.

    Criterion 3 Edge-Pair Coverage. (EPC) TR contains each reachable path of length up to 2, inclusive, in G.
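Since NC, EC and EPC all have the form "each path of length up to k", one enumeration routine can generate the test requirements for all three. The following is a sketch on the double-diamond graph D, whose successor lists are inferred from the four test paths listed earlier; it assumes that a test path satisfies a requirement when the requirement occurs in it as a contiguous run of nodes. All names are illustrative.

```python
# Successor lists of D, inferred from the four test paths in the notes.
D = {
    "n0": ["n1", "n2"], "n1": ["n3"], "n2": ["n3"],
    "n3": ["n4", "n5"], "n4": ["n6"], "n5": ["n6"], "n6": [],
}


def paths_up_to(successors, max_length):
    """All paths with at most max_length edges, starting from any node."""
    frontier = [[n] for n in successors]
    paths = list(frontier)
    for _ in range(max_length):
        frontier = [p + [nxt] for p in frontier for nxt in successors[p[-1]]]
        paths.extend(frontier)
    return paths


def tours(test_path, requirement):
    """True if requirement occurs as a contiguous run inside test_path."""
    n = len(requirement)
    return any(
        test_path[i:i + n] == requirement
        for i in range(len(test_path) - n + 1)
    )


def satisfies(test_paths, requirements):
    return all(any(tours(p, r) for p in test_paths) for r in requirements)


two_paths = [["n0", "n1", "n3", "n4", "n6"], ["n0", "n2", "n3", "n5", "n6"]]
all_four = two_paths + [
    ["n0", "n1", "n3", "n5", "n6"], ["n0", "n2", "n3", "n4", "n6"],
]

for name, k in (("NC", 0), ("EC", 1), ("EPC", 2)):
    tr = paths_up_to(D, k)
    print(name, len(tr), satisfies(two_paths, tr), satisfies(all_four, tr))
```

On D, two test paths are enough for node and edge coverage, but edge-pair coverage also needs the paths that switch sides at $n_3$.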

    Here's an example.

    [Figure: example graph with nodes 1 through 6.]

    nodes:

    edges:

    paths of length 2:


    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 7, January 19, 2015. Patrick Lam, version 2

    Further properties of paths

    We've seen so far length-0 (NC), length-1 (EC) and length-2 (EPC) paths. We could keep on going, but that gets us to Complete Path Coverage (CPC), which requires an infinite number of test requirements.

    Criterion 1 Complete Path Coverage. (CPC) TR contains all paths in G.

    Note that CPC is impossible to achieve for graphs with loops.

    Instead of going to CPC, we would like to capture the essence of all loops. To do so, we first set up a few definitions:

    Definition 1 A path is simple if no node appears more than once in the path, except that the first and last nodes may be the same.

    In the graphs above, some simple paths are:

    but not:

    Some properties of simple paths:

    • no internal loops;

    • can bound their length;

    • can create any path by composing simple paths; and

    • many simple paths exist (too many!)

    Because there are so many simple paths, let's instead consider prime paths, which are simple paths of maximal length.


    For instance, in the following graph:

    [Figure: example graph with nodes 1 through 6.]

    Simple paths:

    Prime paths:

    Definition 2 A path is prime if it is simple and does not appear as a proper subpath of any other simple path.

    Criterion 2 Prime Path Coverage. (PPC) TR contains each prime path in G.

    There is a problem with using PPC as a coverage criterion: a prime path may be infeasible but contain feasible simple paths.

    Example:

    One could replace infeasible prime paths in TR with feasible subpaths, but we won't bother.

    Beyond CFGs. Let's now move beyond control-flow graphs and think about a different type of graph. The next few graphs represent finite state machines rather than control-flow graphs. Our motivation will be to set up criteria that visit round trips in cyclic graphs.

    [Figure: finite state machine with states $n_1$, $n_2$ and $n_3$ and transitions open, close and read.]

    or perhaps

    [Figure: finite state machine with states $q_1$ through $q_7$ and transitions socket, bind, listen, accept, send, closeS and closeC.]


    Prime paths apply to both CFGs and other graphs. The next criteria are mostly not for CFGs.

    Denition 3 A round trip path is a prime path of nonzero length that starts and ends at the same node.

    Criterion 3 Simple Round Trip Coverage . (SRTC) TR contains at least one round-trip path for each reachable node in G that begins and ends a round-trip path.

    Criterion 4 Complete Round Trip Coverage . (CRTC) TR contains all round-trip paths for each reachable node in G .

    Exercise: Computing Prime Paths. Last time, we saw this graph:

    1 2 3

    4

    5 6 7

    I recommend computing all of the simple paths for this example. (There are 53 simple paths, with length up to 5, and 12 prime paths.) Hint: write out simple paths of length up to N. To get the simple paths of length N + 1, start with the paths of length N and extend the ones that are extendable; the non-extendable paths are candidates for prime paths (a candidate is prime unless it is a proper subpath of a longer simple path), so mark them as you go along.

    Here's another graph:

    [Figure: graph with nodes 0 through 6.]

    This graph has 38 simple paths and 9 prime paths:

    [0, 1, 2, 3, 6], [0, 1, 2, 4, 6], [0, 2, 3, 6], [0, 2, 4, 6], [0, 1, 2, 3, 5], [0, 2, 3, 5], [5, 3, 6], [3, 5, 3], [5, 3, 5].

    To compute the prime paths, we could also enumerate all of the simple paths and note cycles (which are prime) and non-extendable paths (which are prime unless they are proper subpaths of a longer simple path). In doing so, one would also get all of the information necessary to write out the test requirements for NC, EC and EPC. Since there is a loop, it is impossible to explicitly write out all test requirements for CPC, but one could write out some of them.
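The enumeration described above can be sketched in Python. This is only a sketch under stated assumptions: the edge list is inferred from the nine prime paths listed above (the figure itself is not reproduced here), and the function names are hypothetical. It grows simple paths one edge at a time and then keeps the ones that are not proper subpaths of any other simple path.

```python
# ASSUMPTION: edges inferred from the prime paths listed in the notes,
# because the original figure is not available.
GRAPH = {0: [1, 2], 1: [2], 2: [3, 4], 3: [5, 6], 4: [6], 5: [3], 6: []}


def simple_paths(graph):
    """All simple paths: no node repeats, except that first == last is allowed."""
    paths = [[n] for n in graph]              # paths of length 0
    frontier = list(paths)
    while frontier:
        extended = []
        for path in frontier:
            if len(path) > 1 and path[0] == path[-1]:
                continue                      # a cycle cannot be extended
            for succ in graph[path[-1]]:
                if succ not in path or succ == path[0]:
                    extended.append(path + [succ])
        paths.extend(extended)
        frontier = extended
    return paths


def is_proper_subpath(p, q):
    """True if p occurs contiguously inside the strictly longer path q."""
    if len(p) >= len(q):
        return False
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


def prime_paths(graph):
    """Simple paths that are not proper subpaths of any other simple path."""
    paths = simple_paths(graph)
    return [p for p in paths if not any(is_proper_subpath(p, q) for q in paths)]


print(len(simple_paths(GRAPH)), len(prime_paths(GRAPH)))  # 38 9
```

The quadratic subpath check is fine for graphs of this size; a larger graph would want something smarter.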


    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 8 January 21, 2015 Patrick Lam version 1

    Another coverage criterion that I've mentioned, but is not in the notes:

    Criterion 1. Specified Path Coverage (SPC). TR contains a specified set S of paths.

    Specified path coverage might be useful for encoding a set of usage scenarios.

    Prime Path Coverage versus Complete Path Coverage.

    [Figure: graph with nodes n0, n1, n2, n3.]

    Prime paths:

    path(t1) =

    path(t2) =

    T1 = {t1, t2} satisfies both PPC and CPC.

    [Figure: graph with nodes q0, q1, q2, q3, q4.]

    Prime paths:

    path(t3) =

    path(t4) =

    T1 = {t3, t4} satisfies PPC but not CPC.

    Now, we'll see another graph theory-inspired coverage criterion. First, some definitions:


    Definition 1. A graph G is connected if every node in G is reachable from the set of initial nodes N0.

    Definition 2. An edge e is a bridge for a graph G if G is connected, but removing e from G results in a disconnected graph.

    (This is similar to an articulation point in a graph, except that articulation points are nodes, not edges.)

    Criterion 2. Bridge Coverage (BC). TR contains all bridges.

    Assume that a graph contains at least two nodes, and all nodes in a graph are reachable from the initial nodes. Does bridge coverage subsume node coverage? Justify your answer, providing a counterexample if appropriate.
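To make the bridge definition concrete, here is a small Python sketch that applies it literally: remove each edge in turn and check whether every node is still reachable from the initial nodes. The example graph and the function names are made up for illustration; they do not come from the notes.

```python
def reachable(edges, initial):
    """Nodes reachable from the initial nodes by following directed edges."""
    seen = set(initial)
    worklist = list(initial)
    while worklist:
        node = worklist.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                worklist.append(dst)
    return seen


def bridges(nodes, edges, initial):
    """Edges whose removal leaves some node unreachable from the initial nodes."""
    all_nodes = set(nodes)
    if reachable(edges, initial) != all_nodes:
        return []                 # G is not connected, so it has no bridges
    result = []
    for i, edge in enumerate(edges):
        remaining = edges[:i] + edges[i + 1:]
        if reachable(remaining, initial) != all_nodes:
            result.append(edge)
    return result


# Hypothetical diamond-shaped graph with initial node "a".
NODES = ["a", "b", "c", "d"]
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(bridges(NODES, EDGES, ["a"]))  # [('a', 'b'), ('a', 'c')]
```

Trying it on a few small graphs of your own is a reasonable way to build intuition before attempting the subsumption question.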

    Specifying versus meeting test requirements. Consider this graph.

    [Figure: graph with nodes n0, n1, n2, n3.]

    The following simple (and loop-free) path is, in fact, prime:

    p =

    PPC includes this path as a test requirement. The test path

    meets the test requirement induced by p even though it is not prime. Note that a test path may satisfy the prime path test requirement even though it is not prime.

    Graph Coverage Exercise. Consider the graph defined by the following sets.

    N = {1, 2, 3, 4, 5, 6, 7}
    N0 = {1}
    Nf = {7}
    E = {[1, 2], [1, 7], [2, 3], [2, 4], [3, 2], [4, 5], [4, 6], [5, 6], [6, 1]}

    Also consider the following test paths:

    t0 = [1, 2, 4, 5, 6, 1, 7]
    t1 = [1, 2, 3, 2, 4, 6, 1, 7]

    Answer the following questions.


    (a) Draw the graph.

    (b) List the test requirements for EPC. [hint: 12 requirements of length 2]

    (c) Does the given set of test paths satisfy EPC? If not, identify what is missing.

    (d) Consider the simple path [3, 2, 4, 5, 6] and test path [1, 2, 3, 2, 4, 6, 1, 2, 4, 5, 6, 1, 7]. Is the simple path a subpath of the test path?

    (e) List the test requirements for NC, EC, and PPC on this graph.

    (f) List a test path that achieves NC but not EC on the graph.

    (g) List a test path that achieves EC but not PPC on the graph.

    (Note: We've talked about test sets meeting criteria in the past. For (f) and (g), we are simply talking about a test set with one test case that induces the given test paths.)
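If you want to check your hand-computed answers mechanically, something like the following Python sketch may help. It enumerates the length-2 edge-pair requirements for the exercise graph and reports which requirements a set of test paths fails to tour directly (no sidetrips or detours). The helper names are hypothetical.

```python
# Edge set E from the exercise above.
EXERCISE_EDGES = [(1, 2), (1, 7), (2, 3), (2, 4), (3, 2),
                  (4, 5), (4, 6), (5, 6), (6, 1)]


def edge_pairs(edges):
    """All paths of length 2, i.e. consecutive edges (a, b) and (b, c)."""
    return sorted({(a, b, c) for (a, b) in edges for (b2, c) in edges if b == b2})


def toured_subpaths(test_path, length):
    """Contiguous subpaths with `length` edges that a test path tours directly."""
    k = length + 1
    return {tuple(test_path[i:i + k]) for i in range(len(test_path) - k + 1)}


def uncovered(requirements, test_paths):
    """Requirements (all with the same number of edges) toured by no test path."""
    length = len(requirements[0]) - 1
    covered = set()
    for path in test_paths:
        covered |= toured_subpaths(path, length)
    return [r for r in requirements if r not in covered]


print(len(edge_pairs(EXERCISE_EDGES)))  # 12, matching the hint in (b)
```

Calling uncovered(edge_pairs(EXERCISE_EDGES), [t0, t1]) with the two test paths as Python lists gives you something to compare against your answer to (c).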


    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 9 January 23, 2015 Patrick Lam version 1

    Graph Coverage for Source Code

    So far, we've seen a number of coverage criteria for graphs, but I've been vague about how to actually construct graphs. For the most part, it's fairly obvious.

    Structural Graph Coverage for Source Code

    Fundamental graph for source code: Control-Flow Graph (CFG).

    CFG nodes: zero or more statements;

    CFG edges: an edge (s1, s2) indicates that s1 may be followed by s2 in an execution.

    Basic Blocks. We can simplify a CFG by grouping together statements which always execute together (in sequential programs):

    1     x = 5
    2     z = 2
    3 q0: if (z < 17) goto q1
    4     z = z + 1
    5     print(x)
    6     goto q0
    7 q1: nop

    [Figure: CFG with four basic blocks: 1 (L1-2), 2 (L3), 3 (L4-6), 4 (L7); the edges out of block 2 are labelled T and F.]

    We use the following definition:

    Definition 1. A basic block has one entry point and one exit point.

    Note that a basic block may have multiple successors. However, there may not be any jumps into the middle of a basic block (which is why the statement labelled q0 has its own basic block).
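One common way to compute basic blocks, which these notes do not spell out, is to mark "leaders": the first statement, every jump target, and every statement that follows a jump. Each leader starts a block that runs up to the next leader. The Python sketch below applies that idea to the seven-line program above; the tuple encoding of the program is an assumption made for illustration.

```python
# Each entry is (label, text, jump_target); this encoding is hypothetical.
PROGRAM = [
    (None, "x = 5", None),
    (None, "z = 2", None),
    ("q0", "if (z < 17) goto q1", "q1"),
    (None, "z = z + 1", None),
    (None, "print(x)", None),
    (None, "goto q0", "q0"),
    ("q1", "nop", None),
]


def basic_blocks(program):
    """Return basic blocks as (first_line, last_line) pairs, 1-based inclusive."""
    label_index = {label: i for i, (label, _, _) in enumerate(program) if label}
    leaders = {0}                                  # the first statement
    for i, (_, _, target) in enumerate(program):
        if target is not None:
            leaders.add(label_index[target])       # the jump target
            if i + 1 < len(program):
                leaders.add(i + 1)                 # the statement after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(program)]
    return [(start + 1, end) for start, end in zip(starts, ends)]


print(basic_blocks(PROGRAM))  # [(1, 2), (3, 3), (4, 6), (7, 7)]
```

The result matches the four blocks in the figure: L1-2, L3, L4-6 and L7.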

    Some Examples

    We'll now see how to construct control-flow graph fragments for various program constructs.


    if statements: The book puts the conditions (and hence uses) on the control-flow edges, rather than in the if node. I prefer putting the condition in the node.

    1 if (z < 17)
    2     print(x);
    3 else
    4     print(y);

    [Figure: CFG with nodes 1 (L1), 2 (L2), 3 (L4), 4; edge labels T and F.]

    1 if (z < 17)
    2     print(x);

    [Figure: CFG with nodes 1 (L1), 2 (L2), 3; edge labels T and F.]

    Short-circuit if evaluation is more complicated; I recommend working it out yourself.

    (Recall that node coverage does not imply edge coverage.)

    case / switch statements:

    1 switch (n) {
    2     case I: ...; break;
    3     case J: ...; /* fall thru */
    4     case K: ...; break;
    5 }
    6 // ...

    [Figure: CFG with nodes 1 (L1), 2 (L2), 3 (L3), 4 (L4), 5 (L2), 6 (L3), 7 (L4), 8 (L6); edge labels i, j, k.]

    while statements:

    1 x = 0; y = 20;
    2 while (x < y) {
    3     x++; y--;
    4 }

    [Figure: CFG with nodes 1 (x = 0; y = 20), 2 (L2), 3 (L3), 4; the edges out of node 2 are labelled with the condition x < y and its negation.]

    Note that arbitrarily complicated structures may occur inside the loop body.

    for statements:

    1 for (int i = 0; i < 57; i++) {
    2     if (i % 3 == 0) {
    3         print(i);
    4     }
    5 }

    (an exercise for the reader!)


    This example uses Java's enhanced for loop, which iterates over all of the elements in the widgetList:

    1 for (Widget w : widgetList) {
    2     decorate(w);
    3 }
    4 // ...

    I will accept the simplified CFG or the more useful one on the right:

    [Figure: a simplified CFG with nodes 1 (L1), 2 (L2), 3 (L4) and an edge labelled "w ∈ widgetList"; and a more detailed CFG with nodes "it = wL.iterator()", "it.hasNext()", "w = it.next(); decorate(w);" and 3 (L4), with edge labels T and F.]

    All of these graphs admit the notions of node coverage (statement coverage, basic block coverage) and edge coverage (branch coverage).

    Larger example. You can draw a 7-node CFG for this program:

     1 /** Binary search for target in sorted subarray a[low..high] */
     2 int binary_search(int[] a, int low, int high, int target) {
     3     while (low [...]
           [lines 3 to 7 are incomplete in the source]
     7         [...] a[middle])
     8             low = middle + 1;
     9         else
    10             return middle;
    11     }
    12     return -1; /* not found in a[low..high] */
    13 }

    [Figure: CFG for binary_search; only the fragment "1. if (low" survives in the source.]


    Exercises

    Here are more exercise programs that you can draw CFGs for.

    /* effects: if x == null, throw NullPointerException
       otherwise, return number of elements in x that are odd, positive or both. */
    int oddOrPos(int[] x) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i]%2 == 1 || x[i] > 0) {
                count++;
            }
        }
        return count;
    }

    // example test case: input: x = [-3, -2, 0, 1, 4]; output: 3

    Next, we have a really poorly-designed API (I'd give it a D at most, maybe an F) because it's impossible to succinctly describe what it does. Do not design functions with interfaces like this. But we can still draw a CFG, no matter how bad the code is.

    /** Returns the mean of the first maxSize numbers in the array,
        if they are between min and max. Otherwise, skip the numbers. */
    double computeMean(int[] value, int maxSize, int min, int max) {
        int i, ti, tv, sum;

        i = 0; ti = 0; tv = 0; sum = 0;
        while (ti < maxSize) {
            ti++;
            if (value[i] >= min && value[i] [...]
            [the rest of the loop body and the start of the following if are missing in the source]
        [...] 0)
            return (double)sum / tv;
        else
            throw new IllegalArgumentException();
    }


    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 10 January 26, 2015 Patrick Lam version 0

    We are going to completely switch gears and talk about testing concurrent programs next. For those of you in 3A, SE 350 is the first real exposure to these concepts; you'll see them more in CS 343. ECE 459 also teaches you how to leverage parallelism.

    Context: Multicores are everywhere today! For the past 10 years, chips have not been getting more GHz. We still have more transistors though. Hardware manufacturers have been sharing this bounty with us, through the magic of multicore processors!

    If you want performance today, then you need parallelism.

    The dark side is that concurrency bugs will bite you.

    More often than not, printing a page on my dual-G5 crashes the application. The funny thing is, printing almost never crashes on my (single-core) G4 PowerBook.

    http://archive.oreilly.com/pub/post/dreaded_concurrency.html

    The most famous kind of concurrency bug is the race condition. Let's look at this code.

     1 #include <iostream>
     2 #include <thread>
     3
     4 int counter = 0;
     5
     6 void func() {
     7     int tmp;
     8     tmp = counter;
     9     tmp++;
    10     counter = tmp;
    11 }
    12
    13 int main() {
    14     std::thread t1(func);
    15     std::thread t2(func);
    16     t1.join();
    17     t2.join();
    18     std::cout [...]
       [the rest of main() and the first shell prompt are missing in the source]

    [...] ./a.out
    2
    plam@polya /tmp> ./a.out
    2
    plam@polya /tmp> ./a.out
    1
    plam@polya /tmp> ./a.out
    1
    plam@polya /tmp> ./a.out
    2
    plam@polya /tmp> ./a.out
    2

    Yes, that's a race condition.

    A race occurs when you have two concurrent accesses to the same memory location, at least one of which is a write.


    Race conditions arise between variables which are shared between threads. Note that when there's a race, the final state may not be the same as running one access to completion and then the other.

    Tools to the rescue. While races may be entertaining, race conditions are never good. You have several dynamic analysis tools at your disposal to eradicate races, including:

    Helgrind (part of Valgrind)
    lockdep (Linux kernel)
    Thread Analyzer (Oracle Solaris Studio)
    Thread Analyzer (Coverity)
    Intel Inspector XE 2011 (formerly Intel Thread Checker)

    We can run the race condition shown above under Helgrind:

    plam@polya /tmp> g++ -std=c++11 race.C -g -pthread -o race
    plam@polya /tmp> valgrind --tool=helgrind ./race
    [...]
    ==6486== Possible data race during read of size 4 at 0x603E1C by thread #3
    ==6486== Locks held: none
    ==6486==    at 0x400EA1: func() (race.C:8)
    ==6486==    by 0x402254: void std::_Bind_simple::_M_invoke(std::_Index_tuple) (functional:1732)
    ==6486==    by 0x4021AE: std::_Bind_simple::operator()() (functional:1720)
    ==6486==    by 0x402147: std::thread::_Impl::_M_run() (thread:115)
    ==6486==    by 0x4EF196F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
    ==6486==    by 0x4C2F056: mythread_wrapper (hg_intercepts.c:234)
    ==6486==    by 0x56650A3: start_thread (pthread_create.c:309)
    ==6486==    by 0x595FCCC: clone (clone.S:111)
    ==6486==
    ==6486== This conflicts with a previous write of size 4 by thread #2
    ==6486== Locks held: none
    ==6486==    at 0x400EB1: func() (race.C:10)
    ==6486==    by 0x402254: void std::_Bind_simple::_M_invoke(std::_Index_tuple) (functional:1732)
    ==6486==    by 0x4021AE: std::_Bind_simple::operator()() (functional:1720)
    ==6486==    by 0x402147: std::thread::_Impl::_M_run() (thread:115)
    ==6486==    by 0x4EF196F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
    ==6486==    by 0x4C2F056: mythread_wrapper (hg_intercepts.c:234)
    ==6486==    by 0x56650A3: start_thread (pthread_create.c:309)
    ==6486==    by 0x595FCCC: clone (clone.S:111)
    ==6486== Address 0x603e1c is 0 bytes inside data symbol "counter"

    Not enough. OK, great. Now you've eliminated all races (as required by specification). Of course, there are still lots of bugs that your program might contain. Some of them are even concurrency bugs. Here, there are no longer any contended accesses, but we cheated by caching the value. Atomic operations would be safe.

    #include <iostream>
    #include <mutex>
    #include <thread>

    int counter = 0;
    std::mutex m;

    void func() {
        int tmp;

        m.lock();
        tmp = counter;
        m.unlock();
        tmp++;
        m.lock();
        counter = tmp;
        m.unlock();
    }
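For contrast, here is one way to close that gap, sketched in Python rather than C++ so that it is easy to run: hold the lock across the whole read-modify-write instead of releasing it in the middle. The names (increment, run) and the iteration counts are made up for illustration.

```python
import threading

counter = 0
lock = threading.Lock()


def increment(times):
    """Add `times` to the shared counter, one locked read-modify-write at a time."""
    global counter
    for _ in range(times):
        with lock:                # held for the entire read-modify-write
            tmp = counter
            tmp += 1
            counter = tmp


def run(num_threads=2, times=10_000):
    """Reset the counter, run the workers to completion, return the final value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=increment, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run())  # 20000
```

Because no other thread can observe the counter between the read and the write, no increments are lost, whatever the schedule.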

    We can test our code for additional concurrency bugs:

    run the code multiple times

    add noise (sleep, more system load, etc)

    Helgrind and friends

    force scheduling (e.g. Java PathFinder)

    static approaches: lock-set, happens-before, state-of-the-art techniques

    Reentrant/recursive Locks. What happens if you have two requests for a POSIX/C++11 lock?

    If the requests are in different threads, the second thread waits for the first thread to unlock. But, if the requests are in the same thread, that thread waits for itself to unlock... forever!

    To avoid this unhappy situation, we can use recursive locks. Each lock knows how many times its owner has locked it. The owner must then unlock the same number of times to release it.

    Java locks work this way, e.g.

    class SynchronizedIsRecursive {
        int x;

        synchronized void f() {
            x--;
            g(); // does not hang!
        }

        synchronized void g() {
            x++;
        }
    }
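The same distinction exists in Python's standard library, which makes for an easy experiment. In this sketch, threading.Lock plays the role of the non-recursive lock and threading.RLock the recursive one; the Counter class is a hypothetical analogue of SynchronizedIsRecursive.

```python
import threading

plain = threading.Lock()
plain.acquire()
# A second blocking request from the same thread would wait forever, so we
# ask without blocking in order to observe the failure safely.
second_attempt = plain.acquire(blocking=False)
plain.release()


class Counter:
    """Hypothetical Python analogue of SynchronizedIsRecursive."""

    def __init__(self):
        self.x = 0
        self._lock = threading.RLock()

    def f(self):
        with self._lock:
            self.x -= 1
            self.g()              # re-acquires a lock we already own: no hang

    def g(self):
        with self._lock:
            self.x += 1


c = Counter()
c.f()
print(second_attempt, c.x)  # False 0
```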

    Although every Java object is a lock, and we can use synchronized() on any object, ReentrantLocks are more special:

    we can explicitly lock() & unlock() them,

    (or even tryLock())!

    However, inexpertly-written Java programs might hog the lock. To avoid that, use a try / finally construct:

    Lock lock = new ReentrantLock();
    lock.lock();
    try {
        // you got the lock! workworkwork
    } finally {
        // might have thrown an exception
        lock.unlock();
    }

    Tools for detecting lock usage issues

    The reference for the next example is Engler et al. [ECCH00]. This example falls short of excellence:

    /* 2.4.0:drivers/sound/cmpci.c:cm_midi_release: */
    lock_kernel(); // [PL: GRAB THE LOCK]
    if (file->f_mode & FMODE_WRITE) {
        add_wait_queue(&s->midi.owait, &wait);
        ...
        if (file->f_flags & O_NONBLOCK) {
            remove_wait_queue(&s->midi.owait, &wait);
            set_current_state(TASK_RUNNING);
            return -EBUSY; // [PL: OH NOES!!1]
        }
        ...
    }
    unlock_kernel();

    The problem: lock() and unlock() calls must be paired! They are on the happy path, but not on the -EBUSY path. [ECCH00] describes a tool that allows developers to describe calls that must be paired.

    Another example:

    foo(p, ...)
    bar(p, ...);

    foo(p, ...)
    bar(p, ...);

    foo(p, ...)
    // ERROR: foo, no bar!

    Our tool might then give the following results: 23 errors, 11 false positives. A false positive is something where the tool reports an error, but, for instance, the error is on an infeasible path.

    The next challenge is: how do we find such rules? In particular, we want to find rules of the form "A() must be followed by B()", or a(); ... b();, which denotes a MAY-belief that a() is followed by b(). iComment, by Lin Tan et al., proposes mining comments to find them [TYKZ07].

    Here is some OpenSolaris code that demonstrates expectations. Also different locking primitives.


    /* opensolaris/common/os/taskq.c: */
    /* Assumes: tq->tq_lock is held. */
    /* consistent */
    static void taskq_ent_free(...) { ... }

    static taskq_t
    *taskq_create_common(...) { ...
        // [different lock primitives below:]
        mutex_enter(...);
        taskq_ent_free(...); /* consistent */
        ...
    }

    Getting back to actual bugs, here is a bad comment automatically detected by iComment:

     1 /* mozilla/security/nss/lib/ssl/sslsnce.c: */
     2 /* Caller must hold cache lock when calling this. */
     3 static sslSessionID * ConvertToSID(...) { ... }
     4
     5 static sslSessionID *ServerSessionIDLookup(...)
     6 {
     7     ...
     8     UnlockSet(cache, set); ...
     9     sid = ConvertToSID(...);
    10     ...
    11 }

    We observe a specification in the comment at line 2, and then a usage error at line 9 where we unlock the cache and then call ConvertToSID. The badness of the comment was confirmed by Mozilla developers.

    Issue: Comments are not updated alongside code. Bad comments can and do cause bugs.

    Here's another bad comment in the Linux kernel, also automatically detected.

     1 // linux/drivers/ata/libata-core.c:
     2
     3 /* LOCKING: caller. */
     4 void ata_dev_select(...) { ...}
     5
     6 int ata_dev_read_id(...) {
     7     ...
     8     ata_dev_select(...);
     9     ...
    10 }

    Once again, the specification at line 3 states that the caller is to lock. But line 8 calls ata_dev_select() without holding the lock. The badness of this comment was confirmed by Linux developers.


    Deadlocks

    Another concurrency problem is deadlocks. We focus on a particular form of deadlock here, which occurs when code may get interrupted by interrupt handlers, and the code shares locks with the interrupt handler. This problem inspired aComment [TZP11].

    In particular, if the spinlock is taken by code that runs in interrupt context (either a hardware or software interrupt), then code requesting the lock must use the spin_lock form that disables interrupts. Otherwise, sooner or later, the code will deadlock.

    Here's an example of the right way to do things:

    spinlock_t mr_lock = SPIN_LOCK_UNLOCKED;
    unsigned long flags;
    spin_lock_irqsave(&mr_lock, flags);
    /* critical section... */
    spin_unlock_irqrestore(&mr_lock, flags);

    spin_lock_irqsave() disables interrupts locally and provides spinlock on symmetric multiprocessors (SMPs).

    spin_unlock_irqrestore() restores interrupts to the state they were in when the lock was acquired.

    This covers both interrupt and SMP concurrency issues.

    References

    [ECCH00] Dawson R. Engler, Benjamin Chelf, Andy Chou, and Seth Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In Michael B. Jones and M. Frans Kaashoek, editors, OSDI, pages 1-16. USENIX Association, 2000.

    [TYKZ07] Lin Tan, Ding Yuan, Gopal Krishna, and Yuanyuan Zhou. /*icomment: bugs or bad comments?*/. In Thomas C. Bressoud and M. Frans Kaashoek, editors, SOSP, pages 145-158. ACM, 2007.

    [TZP11] Lin Tan, Yuanyuan Zhou, and Yoann Padioleau. acomment: mining annotations from comments and code to detect interrupt related concurrency bugs. In Richard N. Taylor, Harald Gall, and Nenad Medvidovic, editors, ICSE, pages 11-20. ACM, 2011.


    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 11 January 28, 2015 Patrick Lam version 1

    Assertions are a key ingredient in writing tests, so I thought I would explicitly describe them.

    Definition 1. An assertion contains an expression that is supposed to evaluate to true.

    For instance, if we have a linked list, then the doubly-linked list property states that prev is the inverse of next, i.e. for nodes n, we have n.next.prev == n. After inserting a node, one might expect this property to be true of that node (possibly with caveats, for instance about the end of the list).

    We use assertions in unit tests to say whats supposed to be true.
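As a concrete illustration, here is a minimal Python sketch of a unit test that asserts the doubly-linked list property after insertions. The Node class and the helpers are hypothetical; the caveat about the ends of the list shows up as the None checks.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def insert_after(node, new_node):
    """Insert new_node immediately after node in a doubly-linked list."""
    new_node.prev = node
    new_node.next = node.next
    if node.next is not None:
        node.next.prev = new_node
    node.next = new_node


def check_links(node):
    """Assert that prev is the inverse of next around node (ends excepted)."""
    if node.next is not None:
        assert node.next.prev is node
    if node.prev is not None:
        assert node.prev.next is node


def test_insert_after():
    a, b, c = Node("a"), Node("b"), Node("c")
    insert_after(a, c)
    insert_after(a, b)            # the list is now a <-> b <-> c
    for n in (a, b, c):
        check_links(n)
    assert [a.next.value, b.next.value, c.next] == ["b", "c", None]


test_insert_after()
```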

    Preconditions and postconditions. More generally, we can express what is supposed to be true upon entry & exit from a method.

    We saw this code in Linux:

    /* LOCKING: caller. */
    void ata_dev_select(...) { ...}

    This expresses an assertion (although not written as a program statement) that the lock is held upon entry.

    Assume/Guarantee Reasoning. Why would you use preconditions and postconditions? Reasoning about programs is difficult, and preconditions and postconditions simplify reasoning.

    When reasoning about the callee, you get to assume that the precondition holds upon entry.

    When reasoning about the caller, you get to guarantee the precondition holds before the call.

    The reverse holds about the postcondition.
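Here is what a "LOCKING: caller" precondition might look like if it were written as a program statement. This is a Python sketch with hypothetical names (device_lock, select_device, read_id), not the kernel code. Note that Lock.locked() only reports that somebody holds the lock, which is a weaker check than "the caller holds it" but is enough to show the idea.

```python
import threading

device_lock = threading.Lock()
selected = []


def select_device(dev):
    # Precondition ("LOCKING: caller"): the callee gets to assume this.
    assert device_lock.locked(), "caller must hold device_lock"
    selected.append(dev)
    # Postcondition: the callee must establish this before returning.
    assert selected[-1] == dev


def read_id(dev):
    with device_lock:             # the caller establishes the precondition
        select_device(dev)
    return dev


print(read_id("disk0"))  # disk0
```

Calling select_device() directly, without taking the lock first, trips the precondition assertion.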

    aComment

    We talked about the aComment approach for locking-related annotations. In particular, it:

    extracts locking-related annotations from code;
    extracts locking-related annotations from comments; and
    propagates annotations to callers.


    Tools

    To provide some context, consider the OS X Mavericks goto fail bug:

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail; /* MISTAKE! THIS LINE SHOULD NOT BE HERE */
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    err = sslRawVerify(...);

    fail:
    return err;

    The bug: opensource.apple.com/source/Security/Security-55471/libsecurity_ssl/lib/sslKeyExchange.c

    No bug: www.opensource.apple.com/source/Security/Security-55179.13/libsecurity_ssl/lib/sslKeyExchange.c

    A writeup: nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-goto-fail-apples-ssl-bug-explained-plus-an-unofficial-patch/

    Detecting goto fail. In retrospect, a number of tools could've found this bug:

    compiler -Wunreachable-code option
    PC-Lint: warning 539: Did not expect positive indentation
    PVS-Studio: V640: Logic does not match formatting

    The problem is that these tools also report many other issues, and live issues can be buried among those other issues.

    The Landscape of Testing and Static Analysis Tools

    Heres a survey of your options:

    manual testing;
    running a JUnit test suite, manually generated;
    running automatically-generated tests;
    running static analysis tools.

    We'll examine several points on this continuum today. More on this later (Lecture 23), thanks to a guest lecturer. Some examples:

    Coverity: a static analysis tool used by 900+ companies, including BlackBerry, Mozilla, etc.
    Microsoft requires Windows device drivers to pass their Static Driver Verifier for certification.


    Tools for Java that you can download

    FindBugs. An open-source static bytecode analyzer for Java out of the University of Maryland.

    findbugs.sourceforge.net

    It finds bug patterns:

    off-by-one;
    null pointer dereference;
    ignored read() return value;
    ignored return value (immutable classes);
    uninitialized read in constructor;
    and more...

    FindBugs gives some false positives. Here are some techniques to help avoid them:

    patricklam.ca/papers/14.msr.saa.pdf

    Java Path Finder (JPF), NASA. The key idea: Implement a Java Virtual Machine, but explore many thread interleavings, looking for concurrency bugs.

    JPF is an explicit state software model checker for Java™ bytecode.

    JPF can also search for

    deadlocks and unhandled exceptions (NullPointerException, AssertionError);
    race conditions;
    missing heap bounds checks;
    and more.

    javapathfinder.sourceforge.net

    Korat (University of Illinois). Key Idea: Generate Java objects from a representation invariant specification written as a Java method.

    For instance, here's a binary tree.

    [Figure: a binary tree.]

    One characteristic of a binary tree:

    left & right pointers don't refer to same node.

    We can express that characteristic in Java as follows:

    boolean repOk() {
        if (root == null) return size == 0; // empty tree has size 0
        Set visited = new HashSet(); visited.add(root);
        LinkedList workList = new LinkedList(); workList.add(root);
        while (!workList.isEmpty()) {
            Node current = (Node)workList.removeFirst();
            if (current.left != null) {
                if (!visited.add(current.left)) return false; // acyclicity
                workList.add(current.left);
            }
            if (current.right != null) {
                if (!visited.add(current.right)) return false; // acyclicity
                workList.add(current.right);
            }
        }
        if (visited.size() != size) return false; // consistency of size
        return true;
    }

    Korat then generates all distinct (non-isomorphic) trees, up to a given size (say 3). It uses these trees as inputs for testing the add() method of the tree (or for any other methods).

    korat.sourceforge.net/index.html
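To get a feel for bounded-exhaustive generation, here is a simplified Python sketch. It is not Korat's actual search, which explores candidate structures and uses repOk() to prune; this sketch enumerates tree shapes directly and then uses a Python rendering of repOk() as a sanity check. All names are hypothetical.

```python
import copy


class Node:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def rep_ok(root, size):
    """Python rendering of repOk(): no node reached twice, and size consistent."""
    if root is None:
        return size == 0
    visited = {id(root)}
    worklist = [root]
    while worklist:
        current = worklist.pop(0)
        for child in (current.left, current.right):
            if child is not None:
                if id(child) in visited:
                    return False          # sharing or a cycle
                visited.add(id(child))
                worklist.append(child)
    return len(visited) == size


def shapes(size):
    """Yield one tree per distinct binary-tree shape with `size` nodes."""
    if size == 0:
        yield None
        return
    for left_size in range(size):
        for left in shapes(left_size):
            for right in shapes(size - 1 - left_size):
                # Copy the subtrees so that yielded trees share no nodes.
                yield Node(copy.deepcopy(left), copy.deepcopy(right))


def valid_trees(max_size):
    """Map each size up to max_size to the generated trees that pass rep_ok."""
    return {n: [t for t in shapes(n) if rep_ok(t, n)] for n in range(max_size + 1)}


print([len(trees) for trees in valid_trees(3).values()])  # [1, 1, 2, 5]
```

Each generated tree could then be fed to the method under test, which is the part of the workflow that Korat automates.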

    Randoop (MIT). Key Idea: Writing tests is a difficult and time-consuming activity, and yet it is a crucial part of good software engineering. Randoop automatically generates unit tests for Java classes.

    Randoop generates random sequences of method calls, looking for object contract violations.

    To use it, simply point it at a program & let it run.

    Randoop discards bad method sequences (e.g. illegal argument exceptions). It remembers method sequences that create complex objects, and sequences that result in object contract violations.

    code.google.com/p/randoop/

    Here is an example generated by Randoop:

    public static void test1() {
        LinkedList list = new LinkedList();
        Object o1 = new Object();
        list.addFirst(o1);

        TreeSet t1 = new TreeSet(list);
        Set s1 = Collections.synchronizedSet(t1);

        // violated in the Java standard library!
        Assert.assertTrue(s1.equals(s1));
    }


    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 14 February 4, 2015 Patrick Lam version 1

    Recall that we've been discussing beliefs. Here are a couple of beliefs that are worthwhile to check. (Examples courtesy Dawson Engler.)

    Redundancy Checking. 1) Code ought to do something. So, when you have code that doesn't do anything, that's suspicious. Look for identity operations, e.g.

    x = x, 1 * y, x & x, x | x.

    Or, a longer example:

    /* 2.4.5-ac8/net/appletalk/aarp.c */
    da.s_node = sa.s_node;
    da.s_net = da.s_net;

    Also, look for unread writes:

    for (entry=priv->lec_arp_tables[i];
         entry != NULL; entry=next) {
        next = entry->next; // never read!
        ...
    }

    Redundancy suggests conceptual confusion.

    So far, we've talked about MUST-beliefs; violations are clearly wrong (in some sense). Let's examine MAY beliefs next. For such beliefs, we need more evidence to convict the program.

    Process for verifying MAY beliefs. We proceed as follows:

    1. Record every successful MAY-belief check as check.

    2. Record every unsuccessful belief check as error.

    3. Rank errors based on check : error ratio.

    Most likely errors occur when check is large, error small.

    Example. One example of a belief is use-after-free:

    free(p);
    print(*p);

    That particular case is a MUST-belief. However, other resources are freed by custom (undocumented) free functions. It's hard to get a list of what is a free function and what isn't. So, let's derive them behaviourally.


    Inferring beliefs: finding custom free functions. The key idea is: if pointer p is not used after calling foo(p), then derive a MAY belief that foo(p) frees p.

    OK, so which functions are free functions? Well, just assume all functions free all arguments:

    emit check at every call site;

    emit error at every use.

    (in reality, filter functions with suggestive names).

    Putting that into practice, we might observe:

    foo(p); *p = x;

    foo(p); *p = x;

    foo(p); *p = x;

    bar(p); p = 0;

    bar(p); p = 0;

    bar(p); *p = x;

    We would then rank bar's error first. Plausible results might be: 23 free errors, 11 false positives.
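The ranking step can be sketched in a few lines of Python. The observation list below mirrors the six snippets above (a use of p after the call counts as an error, no use counts as a check), and the plain check:error ratio is used as the ranking key. The data encoding and function name are made up for illustration.

```python
from collections import Counter

# One entry per call site: after foo(p), p is used all three times;
# after bar(p), p is used once and left alone twice.
OBSERVATIONS = [
    ("foo", "error"), ("foo", "error"), ("foo", "error"),
    ("bar", "check"), ("bar", "check"), ("bar", "error"),
]


def rank_errors(observations):
    """Functions with at least one error, highest check:error ratio first."""
    checks, errors = Counter(), Counter()
    for function, outcome in observations:
        if outcome == "check":
            checks[function] += 1
        else:
            errors[function] += 1
    return sorted(errors, key=lambda f: checks[f] / errors[f], reverse=True)


print(rank_errors(OBSERVATIONS))  # ['bar', 'foo']
```

bar comes first because two of its three call sites are consistent with "bar frees p", which makes the remaining one look like a genuine use-after-free; foo has no supporting evidence at all.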

    Inferring beliefs: finding routines that may return NULL. The situation: we want to know which routines may return NULL. Can we use static analysis to find out?

    sadly, this is difficult to know statically (return p->next;?) and,
    we get false positives: some functions return NULL under special cases only.

    Instead, let's observe what the programmer does. Again, rank errors based on checks vs non-checks. As a first approximation, assume all functions can return NULL.

    if pointer checked before use: emit check;

    if pointer used before check: emit error.

    This time, we might observe:

    p = bar(...); *p = x;

    p = bar(...); if (!p) return; *p = x;

    p = bar(...); if (!p) return; *p = x;

    p = bar(...); if (!p) return; *p = x;

    Again, sort errors based on the check:error ratio.

    Plausible results: 152 free errors, 16 false positives.


    General statistical technique

    When we write a(); ... b();, we mean a MAY-belief that a() is followed by b(). We don't actually know that this is a valid belief. It's a hypothesis, and we'll try it out. Algorithm: assume every a-b is a valid pair;

    emit check for each path with a() and then b();
    emit error for each path with a() and no b().

    (actually, prefilter functions that look paired).

    Consider:

    foo(p, ...); bar(p, ...); // check

    foo(p, ...); bar(p, ...); // check

    foo(p, ...); // error: foo, no bar!

    This applies to the course project as well.

    void scope1() {
        A(); B(); C(); D();
    }

    void scope2() {
        A(); C(); D();
    }

    void scope3() {
        A(); B();
    }

    void scope4() {
        B(); D(); scope1();
    }

    void scope5() {
        B(); D(); A();
    }

    void scope6() {
        B(); D();
    }

    A() and B() must be paired: either A() then B() or B() then A().

    Support = # times a pair of functions appears together.
    support({A,B}) = 3

    Confidence({A,B}, {A}) = support({A,B}) / support({A}) = 3/4

    Sample output for support threshold 3, confidence threshold 65% (intra-procedural analysis):

    bug: A in scope2, pair: (A B), support: 3, confidence: 75.00%
    bug: A in scope3, pair: (A D), support: 3, confidence: 75.00%
    bug: B in scope3, pair: (B D), support: 4, confidence: 80.00%
    bug: D in scope2, pair: (B D), support: 4, confidence: 80.00%
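The support and confidence calculation can be sketched as follows. This is an illustrative Python sketch, not the course project's reference solution: it hard-codes the call sets of the six scopes above instead of extracting them from code, and the function names are hypothetical.

```python
from itertools import combinations

# Functions called directly in each scope. Intra-procedural: scope4's call to
# scope1 counts as a call to "scope1", not to A, B, C and D.
SCOPES = {
    "scope1": {"A", "B", "C", "D"},
    "scope2": {"A", "C", "D"},
    "scope3": {"A", "B"},
    "scope4": {"B", "D", "scope1"},
    "scope5": {"B", "D", "A"},
    "scope6": {"B", "D"},
}


def support(functions, scopes):
    """Number of scopes that call every function in `functions`."""
    return sum(1 for calls in scopes.values() if functions <= calls)


def find_bugs(scopes, min_support=3, min_confidence=0.65):
    """Report scopes that use one function of a likely pair without the other."""
    all_functions = sorted(set().union(*scopes.values()))
    bugs = []
    for a, b in combinations(all_functions, 2):
        pair_support = support({a, b}, scopes)
        if pair_support < min_support:
            continue
        for present, missing in ((a, b), (b, a)):
            confidence = pair_support / support({present}, scopes)
            if confidence < min_confidence:
                continue
            for scope, calls in sorted(scopes.items()):
                if present in calls and missing not in calls:
                    bugs.append((present, scope, (a, b), pair_support, confidence))
    return bugs


for function, scope, pair, sup, conf in find_bugs(SCOPES):
    print(f"bug: {function} in {scope}, pair: ({pair[0]} {pair[1]}), "
          f"support: {sup}, confidence: {conf:.2%}")
```

With the thresholds from the notes (support 3, confidence 65%), this prints the same four bug lines as the sample output above.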

    The point is to find examples like the one from cmpci.c where there's a lock_kernel() call, but, on an exceptional path, no unlock_kernel() call.

    Summary: Belief Analysis. We don't know what the right spec is. So, look for contradictions.

    MUST-beliefs: contradictions = errors!
    MAY-beliefs: pretend they're MUST, rank by confidence.

    (A key assumption behind this belief analysis technique: most of the code is correct.)


    Further references. Dawson R. Engler, David Yu Chen, Seth Hallem, Andy Chou and Benjamin Chelf. Bugs as Deviant Behaviors: A general approach to inferring errors in systems code. In SOSP '01.

    Dawson R. Engler, Benjamin Chelf, Andy Chou, and Seth Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In OSDI '00 (best paper). www.stanford.edu/~engler/mc-osdi.pdf

    Junfeng Yang, Can Sar and Dawson Engler. eXplode: a Lightweight, General system for Finding Serious Storage System Errors. In OSDI '06. www.stanford.edu/~engler/explode-osdi06.pdf

    Using Linters

    We will also talk about linters in this lecture, based on Jamie Wong's blog post jamie-wong.com/2015/02/02/linters-as-invariants/.

    First there was C. In statically-typed languages, like C,

    #include <stdio.h>

    int main() {
        printf("%d\n", num);
        return 0;
    }

    the compiler saves you from yourself. The guaranteed invariant:

    if code compiles, all symbols resolve.

    Less-nice languages. OK, so you try to run that in JavaScript and it crashes right away. Invariant?

    if code runs, all symbols resolve?

    But what about this:

    function main(x) {
      if (x) {
        console.log("Yay");
      } else {
        console.log(num);
      }
    }

    main(true);

    Nope! The above invariant doesn't work.

    OK, what about this invariant:

    if code runs without crashing, all symbols referenced in the code path executed resolve?

    Nope!

    function main() {
      try {
        console.log(num);
      } catch (err) {
        console.log("nothing to see here");
      }
    }

    main();

    So, when you're working in JavaScript and maintaining old code, you always have to deduce:

    • is this variable defined?
    • is this variable always defined?
    • do I need to load a script to define that variable?

    We have computers. They're powerful. Why is this the developer's problem?!

    function main(x) {
      if (x) {
        console.log("Yay");
      } else {
        console.log(num);
      }
    }

    main(true);

    Now:

    $ nodejs /usr/local/lib/node_modules/jshint/bin/jshint --config jshintrc foo.js
    foo.js: line 5, col 17, 'num' is not defined.

    1 error

    Invariant:

    If code passes JSHint, all top-level symbols resolve.
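    The same idea carries over to other dynamic languages. As a rough sketch (a Python analogue, not how JSHint is implemented), the standard ast module can flag names that are read but never bound anywhere in a file. The helper undefined_names is hypothetical and deliberately ignores scoping and imports.

```python
import ast
import builtins

# Python analogue of the JavaScript example: the else branch reads an
# undefined name, but main(True) never executes that branch.
SOURCE = """\
def main(x):
    if x:
        print("Yay")
    else:
        print(num)

main(True)
"""


def undefined_names(source):
    """Return (line, name) for every name that is read but never bound."""
    tree = ast.parse(source)
    bound = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
    return sorted(
        (node.lineno, node.id)
        for node in ast.walk(tree)
        if isinstance(node, ast.Name)
        and isinstance(node.ctx, ast.Load)
        and node.id not in bound
    )


exec(SOURCE, {})                 # runs without error: the bad branch is skipped
print(undefined_names(SOURCE))   # the static check still reports it
```

    Running the program tells us nothing about the else branch, while the static check reports the unresolved symbol without executing anything.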

    Strengthening the Invariant. Can we do better? How about adding a pre-commit hook?

    If code is checked-in and commit hook ran, all top-level symbols resolve.

    Of course, sometimes the commit hook didn't run. Better yet:

    Block deploys on test failures.

    Better invariant.

    If code is deployed, all top-level symbols resolve.

    Even better yet. It is hard to tell whether code is deployed or not. Use git feature branches, merge when deployed.

    If code is in master, all top-level symbols resolve.

    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 15, February 6, 2015
    Patrick Lam, version 1

    Dataflow Criteria

    So far we've seen structure-based criteria, which imposed test requirements solely based on the nodes and edges of a graph. These criteria have been oblivious to the contents of the nodes.

    However, programs mostly move data around, so it makes sense to propose some criteria based on the flow of data around a program. We'll be talking about du-pairs, which connect definitions and uses of variables.

    du-pair

    x = 5      // def(x)
    ...
    print(x)   // use(x)

    Let's look at some graphs.

    n_0: x = 5  -->  n_1: print(x)

    We write def(n_0) = use(n_1) = {x}.

    Note that edges can also have defs and uses, for instance in a graph corresponding to a finite state machine. In that case, we could write:

    use(n_0, n_1) = {}.

    Here's another example.

    [Figure: control-flow graph with nodes q_0: z = x - y; q_1: if (z < w); q_2: print(z); q_3: print(w); q_4.]

    A particular def d of variable x may (or may not) reach a particular use u. If a def may reach a particular use, then there exists a path from d to u which is free of redefinitions of x. In the following graph, the def at n_2 does not reach the use at n_3, since no path goes from n_2 to n_3.

    [Figure: nodes n_2: x = 5 and n_3: use(x), with no path from n_2 to n_3.]

    Another example of a definition which does not reach:

    n_0: x = 2  -->  n_1: x = 17  -->  n_2: use(x)

    We say that the definition at n_1 kills the definition at n_0, so that def(n_0) does not reach n_2. We are therefore looking for def-clear paths.

    Definition 1 A path p from l_1 to l_m is def-clear with respect to variable v if for every node n_k and every edge e_k on p from l_1 to l_m, where k ≠ 1 and k ≠ m, v is not in def(n_k) or in def(e_k).

    That is, nothing on the path p from location l_1 to location l_m redefines v. (Locations are edges or nodes.)

    Definition 2 A def of v at l_i reaches a use of v at l_j if there exists a def-clear path from l_i to l_j with respect to v.
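    The reaching check can be sketched as a graph search that refuses to pass through any intermediate node redefining the variable. This is a minimal sketch assuming defs occur only on nodes; the function name reaches and the dictionary encoding of the graph are illustrative choices.

```python
from collections import deque


def reaches(edges, defs, d, u, v):
    """True if some def-clear path w.r.t. v leads from node d to node u.

    edges: node -> list of successor nodes
    defs:  node -> set of variables defined at that node
    The endpoints d and u are exempt from the redefinition check.
    """
    seen = set()
    queue = deque(edges.get(d, ()))
    while queue:
        n = queue.popleft()
        if n == u:
            return True
        if n in seen or v in defs.get(n, ()):
            continue  # already explored, or n redefines v (kills the def)
        seen.add(n)
        queue.extend(edges.get(n, ()))
    return False


# The chain n_0: x = 2 --> n_1: x = 17 --> n_2: use(x).
EDGES = {"n0": ["n1"], "n1": ["n2"]}
DEFS = {"n0": {"x"}, "n1": {"x"}}

print(reaches(EDGES, DEFS, "n0", "n2", "x"))  # def at n_0 is killed at n_1
print(reaches(EDGES, DEFS, "n1", "n2", "x"))  # def at n_1 reaches the use
```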

    Quick poll: does the def at n_0 reach the use at n_5?

    [Figure: graph with nodes n_0: x = 1729; n_1; n_2: x = 84; n_3; n_4; n_5: use(x).]

    Building on the notion of a def-clear path:

    Definition 3 A du-path with respect to v is a simple path that is def-clear with respect to v from a node n_i, such that v is in def(n_i), to a node n_j, such that v is in use(n_j).

    (This definition could be easily modified to use edges e_i and e_j.)

    Note the following three points about du-paths:

    • associated with a variable
    • simple (otherwise there are too many)
    • may be any number of uses in a du-path

    Coverage criteria using du-paths

    We next create groups of du-paths. Consider again the following double-diamond graph D:

    [Figure: double-diamond graph D with nodes n_5: x = 5; u_0: use(x); n_9: x = 9; u_1; n_3: x = 3; n: use(x); u_2: use(x).]

    We will define two sets of du-paths:

    • def-path sets: fix a def and a variable, e.g.
        du(n_5, x) =
        du(n_3, x) =

    • def-pair sets: fix a def, a use, and a variable, e.g.
        du(n_5, n, x) =

    These sets will give the notions of all-defs coverage (tour at least one du-path from each def-path set, a weak criterion) and all-uses coverage (tour at least one du-path from each def-pair set).

    How can there be multiple elements in a def-pair set?

    Here's an example with two du-paths in a def-pair set.

    [Figure: graph with nodes n_17: x = 17 and n: use(x).]

    We then have du(n_17, n, x) =

    Note the general relation

    du(n_i, v) = \bigcup_{n_j} du(n_i, n_j, v)

    There are more def-pair sets than def-path sets. Cycles are always allowed as du-paths, as long as the du-path is simple; you can always tour a du-path with a non-simple path, of course.

    Useful exercise. Create an example where one def-path set splits into several def-pair sets; you can get a smaller example than the one in the book.

    We can use the above definitions to provide coverage criteria.

    Criterion 1 All-Defs Coverage (ADC). For each def-path set S = du(n, v), TR contains at least one path d in S.

    Criterion 2 All-Uses Coverage (AUC). For each def-pair set S = du(n_i, n_j, v), TR contains at least one path d in S.
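    To make the two kinds of sets concrete, here is a small sketch that enumerates du-paths, groups them into def-path and def-pair sets, and checks ADC and AUC for a set of test paths. The diamond graph is hypothetical (it is not one of the figures in these notes), all function names are illustrative, defs and uses are assumed to sit on nodes only, and "tour" is simplified to mean "contains as a contiguous subpath".

```python
def du_paths(edges, defs, uses, v):
    """Enumerate du-paths w.r.t. v: simple, def-clear, from a def to a use."""
    found = []

    def extend(path):
        for nxt in edges.get(path[-1], ()):
            closes_cycle = nxt == path[0]
            if nxt in path and not closes_cycle:
                continue  # repeating an interior node would not be simple
            new = path + [nxt]
            if v in uses.get(nxt, ()):
                found.append(new)
            if not closes_cycle and v not in defs.get(nxt, ()):
                extend(new)  # nxt may be interior: it does not redefine v

    for node in sorted(defs):
        if v in defs[node]:
            extend([node])
    return found


def group(paths, key):
    sets = {}
    for p in paths:
        sets.setdefault(key(p), []).append(p)
    return sets


def tours(test_path, sub):
    n = len(sub)
    return any(test_path[i:i + n] == sub for i in range(len(test_path) - n + 1))


def satisfied(sets, test_paths):
    """Every set has at least one du-path toured by some test path."""
    return all(
        any(tours(t, p) for p in s for t in test_paths) for s in sets.values()
    )


# Hypothetical diamond: 0 -> 1 -> 3 and 0 -> 2 -> 3; x defined at 0, used at 1 and 3.
EDGES = {0: [1, 2], 1: [3], 2: [3]}
DEFS = {0: {"x"}}
USES = {1: {"x"}, 3: {"x"}}

paths = du_paths(EDGES, DEFS, USES, "x")
def_path_sets = group(paths, lambda p: p[0])            # du(n_i, x)
def_pair_sets = group(paths, lambda p: (p[0], p[-1]))   # du(n_i, n_j, x)

print(def_path_sets)
print(def_pair_sets)
print(satisfied(def_path_sets, [[0, 2, 3]]))   # ADC with a single test path
print(satisfied(def_pair_sets, [[0, 2, 3]]))   # AUC misses the use at node 1
```

    In this graph the single def-path set du(0, x) splits into two def-pair sets, and du(0, 3, x) contains two du-paths.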

    What do these criteria mean? For each def,

    • ADC: reach at least one use;
    • AUC: reach every use somehow.

    In the context of the earlier example,

    ADC requires:

    AUC requires:

    Nodes versus edges. So far, we've assumed definitions and uses occur on nodes.

    • uses on edges (p-uses) work as well;
    • defs on edges are trickier, because a du-path from an edge to an edge may not be simple. (We could make things work out with more work.)

    Another example.

    [Figure: graph with nodes n_0: def(x); n_1: use(x); n_2; n_3; n_4; n_5: use(x); n_6.]

    Some test sets that meet these criteria:

    ADC:

    AUC:

    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 16, February 9, 2015
    Patrick Lam, version 1

    Subsumption Chart

    Here are the subsumption relationships between our graph criteria.

    [Figure: subsumption chart over the following criteria.]

    • Complete Path Coverage (CPC)
    • Prime Path Coverage (PPC)
    • All-du-Paths Coverage (ADUPC)
    • All-Uses Coverage (AUC)
    • All-Defs Coverage (ADC)
    • Edge-Pair Coverage (EPC)
    • Edge Coverage (EC)
    • Node Coverage (NC)
    • Complete Round-Trip Coverage (CRTC)
    • Simple Round-Trip Coverage (SRTC)

    We know EPC subsumes EC subsumes NC from before, and clearly CRTC subsumes SRTC. Also, CPC subsumes PPC subsumes EPC, and PPC subsumes CRTC, because simple paths include round trips.

    In this offering, we didn't talk about ADUPC at all, and we only touched briefly on CRTC and SRTC.

    Assumptions for dataflow criteria:

    1. every use preceded by a def (guaranteed by Java)

    2. every def reaches at least one use

    3. for every node with multiple outgoing edges, at least one variable is used on each out edge, and the same variables are used on each out edge.

    Then:

    • AUC subsumes ADC.
    • Each edge has at least one use, so AUC subsumes EC.

    Finally, each du-path is also a simple path, so PPC subsumes ADUPC (not discussed this term, but ADUPC subsumes AUC). (Note that prime paths are simpler to compute than dataflow relationships, especially in the presence of pointers.)
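    Since subsumption is transitive, the relationships stated in this lecture can be encoded as a directed graph and queried by reachability. The sketch below records only the direct relationships named in the text; the dictionary layout and the function name subsumes are illustrative.

```python
# Direct subsumption relationships stated in the text: key subsumes each value.
SUBSUMES = {
    "CPC": ["PPC"],
    "PPC": ["EPC", "CRTC", "ADUPC"],
    "EPC": ["EC"],
    "EC": ["NC"],
    "CRTC": ["SRTC"],
    "ADUPC": ["AUC"],
    "AUC": ["ADC", "EC"],  # AUC -> EC relies on the three assumptions above
}


def subsumes(a, b):
    """True if criterion a subsumes criterion b, directly or transitively."""
    stack, seen = [a], set()
    while stack:
        c = stack.pop()
        if c == b:
            return True
        if c in seen:
            continue
        seen.add(c)
        stack.extend(SUBSUMES.get(c, ()))
    return False


print(subsumes("CPC", "NC"))   # follows CPC -> PPC -> EPC -> EC -> NC
print(subsumes("ADC", "EC"))   # no chain from ADC to EC is recorded
```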

    Dataflow Graph Coverage for Source Code

    Last time, we saw how to construct graphs which summarized a control-flow graph's structure. Let's enrich our CFGs with definitions and uses to enable the use of our dataflow criteria.

    Definitions. Here are some Java statements which correspond to definitions.

    • x = 5: x occurs on the left-hand side of an assignment statement;
    • foo(T x) { ... }: implicit definition for x at the start of a method;
    • bar(x): during a call to bar, x might be defined if x is a C++ reference parameter.
    • (subsumed by others): x is an input to the program.

    Examples:

    Uses. The book lists a number of cases of uses, but it boils down to "x occurs in an expression that the program evaluates." Examples: RHS of an assignment, or as part of a method parameter, or in a conditional.

    Complications. As I said before, the situation in real life is more complicated: we've assumed that x is a local variable with scalar type.

    • What if x is a static field, an array, an object, or an instance field?
    • How do we deal with aliasing?

    One answer is to be conservative and note that we've said that a definition d reaches a use u if it is possible that the address defined at d refers to the same address used at u. For instance:

    class C { int f; }
    void foo(C q) { use(q.f); }

    x = new C(); x.f = 5;
    y = new C(); y.f = 2;

    foo(x);
    foo(y);

    Our definition says that both definitions reach the use.

    Exercise. Consider the following graph:

    N = {0, 1, 2, 3, 4, 5, 6, 7}
    N_0 = {0}
    N_f = {7}
    E = {(0, 1), (1, 2), (1, 7), (2, 3), (2, 4), (3, 2), (4, 5), (4, 6), (5, 6), (6, 1)}

    test paths:

    t1 = [0, 1, 7]
    t2 = [0, 1, 2, 4, 6, 1, 7]
    t3 = [0, 1, 2, 4, 5, 6, 1, 7]
    t4 = [0, 1, 2, 3, 2, 4, 6, 1, 7]

    def(0) = def(3) = use(5) = use(7) = {x}

    (a) Draw the graph.

    (b) List all du-paths with respect to x. Include all du-paths, even those that are subpaths of others.

    (c) List a minimal test set (using the given paths) satisfying all-defs coverage with respect to x.

    (d) List a minimal test set (using the given paths) satisfying all-uses coverage with respect to x.

    Compiler tidbit. In a compiler, we use intermediate representations to simplify expressions, including definitions and uses. For instance, we would simplify:

    x = foo(y + 1, z * 2)
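    One common simplified form (assumed here; the notes leave the result blank) is three-address code, where each statement contains at most one operator or call, so each statement has one def and a short, explicit list of uses. The temporaries t1 and t2 and the body of foo are illustrative assumptions.

```python
def foo(a, b):
    # Stand-in for the call in the text; its body is an arbitrary assumption.
    return a - b


y, z = 4, 5

# Original form: one statement with nested subexpressions.
x_nested = foo(y + 1, z * 2)

# Simplified form: each statement has at most one operator or call.
t1 = y + 1        # def(t1), use(y)
t2 = z * 2        # def(t2), use(z)
x = foo(t1, t2)   # def(x),  use(t1), use(t2)

print(x == x_nested)  # the two forms compute the same value
```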

    Basic blocks and defs/uses. Basic blocks can rule out some definitions and uses as irrelevant.

    • Defs: consider the last definition of a variable in a basic block. (If we're not sure whether x and y are aliased, leave both of them.)
    • Uses: consider only uses that aren't dominated by a definition of the same variable in the same basic block, e.g. y = 5; use(y) is not interesting.
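    The two rules above can be sketched as a single pass over the statements of a basic block. This sketch assumes no aliasing and represents each statement as a (defined variable or None, set of used variables) pair; both the representation and the name summarize_block are illustrative.

```python
def summarize_block(stmts):
    """Return (last_def, exposed_uses) for one basic block.

    stmts: list of (target, used) in execution order, where target is the
    variable defined by the statement (or None) and used is a set of variables.
    last_def maps each defined variable to the index of its last definition;
    exposed_uses holds variables used before any definition in the block.
    """
    last_def = {}
    exposed_uses = set()
    for index, (target, used) in enumerate(stmts):
        # A use matters outside the block only if no earlier statement
        # in the same block has already defined the variable.
        exposed_uses |= {v for v in used if v not in last_def}
        if target is not None:
            last_def[target] = index
    return last_def, exposed_uses


BLOCK = [
    ("y", set()),     # y = 5
    (None, {"y"}),    # use(y): dominated by the def above, not interesting
    ("x", {"z"}),     # x = z: z is used before any def in this block
    ("x", {"x"}),     # x = x + 1: only this last def of x is visible outside
]

print(summarize_block(BLOCK))
```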

    Graph Coverage for Design Elements

    We next move beyond single methods to design elements, which include multiple methods, classes, modules, packages, etc. Usually people refer to such testing as integration testing.

    Structural Graph Coverage.

    We want to create graphs that represent relationships (couplings) between design elements.

    Call Graphs. Perhaps the most common interprocedural graph is the call graph.

    • design elements, or nodes, are methods (or larger program subsystems)
    • couplings, or edges, are method calls.

    Consider the following example.

    [Figure: call graph with nodes A, B, C, D, E, F.]

    For method coverage, must call each method.

    For edge coverage, must visit each edge; in this particular case, must get to both A and C from both of their respective callees.
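    As a rough illustration of the difference between the two criteria, the sketch below measures method and edge coverage from a log of observed calls. The call graph here is hypothetical (the edges of the figure are not reproduced), and call_coverage is an illustrative name.

```python
# Hypothetical call graph: caller -> list of callees.
CALLS = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def call_coverage(calls, observed):
    """Return (method coverage, edge coverage) as fractions.

    observed: (caller, callee) pairs seen while running the tests;
    caller is None when the test driver itself makes the call.
    """
    edges = {(c, d) for c, ds in calls.items() for d in ds}
    hit_methods = {callee for _, callee in observed} & set(calls)
    hit_edges = set(observed) & edges
    return len(hit_methods) / len(calls), len(hit_edges) / len(edges)


run1 = [(None, "A"), ("A", "B"), ("B", "D")]
run2 = run1 + [("A", "C")]

print(call_coverage(CALLS, run1))
print(call_coverage(CALLS, run2))  # every method called, edge C -> D still unvisited
```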

    Like any other type of testing, call graph-based testing may require test harnesses. Imagine, for instance, a library that implements a stack; it exposes push and pop methods, but needs a test driver to exercise these methods.

    Data Flow Graph Coverage for Design Elements

    The structural coverage criteria for design elements were not very satisfying: basically we only had call graphs. Let's instead talk about data-bound relationships between design elements.

    • caller: unit that invokes the callee;
    • actual parameter: value passed to the callee by the caller; and
    • formal parameter: placeholder for the incoming variable.

    Illustration.

    caller: foo(actual1, actual2);
    callee: void foo(int formal1, int formal2) { }

    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 17, February 11, 2015
    Patrick Lam, version 1

    We want to define du-pairs between callers and callees. Here's an example.

    [Figure: caller nodes 1: x = 5; 2: x = 4; 3: x = 3; 4: B(x); callee nodes 11: B(int y); 12: z = y; 13: t = y; 14: print(y).]

    We'll say that the last-defs are 2, 3; the first-use is 12. We formally define these notions:

    Definition 1 (Last-def). The set of nodes N that define a variable x for which there is a def-clear path from a node n ∈ N through a call to a use in the other unit.

    Definition 2 (First-use). The set of nodes N that have uses of y and for which there exists a path that is def-clear and use-clear from the entry point (if the use is in the callee) or the callsite (if the use is in the caller) to the nodes N.

    We need the following side definition, analogous to that of def-clear:

    Definition 3 A path p = [n_1, ..., n_j] is use-clear with respect to v if for every n_k ∈ p, where k ≠ 1 and k ≠ j, v is not in use(n_k).

    In other words, the last-def is the definition that goes through the call or return, and the first-use picks up that definition.

    Here are two more examples:

    x = 14;           // last-def
    y = g(x);
    print(y);         // first-use

    int g(a) {
      print(a);       // first-use
      b = 24;         // last-def
      return b;
    }

    main() {
      x = 1;
      x = 2;          // last-def
      if (...) { x = 3; /* last-def */ }
      B(x);
    }

    B(int y) {
      if (...) {
        Z = y;        // first-use
      } else {
        T = y;        // first-use
      }
      print(y);
    }

    One can create pairs of triples linking last-defs and first-uses; characterize a last-def or a first-use by the method name, the variable name, and the statement, and then link them.

    Tests can, of course, go beyond just testing the first-use. Our first-use and last-def definitions, however, make testing slightly more tractable. We could, of course, carry out full inter-procedural data-flow, i.e. covering all du-pairs, but this would be more expensive.
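    Last-defs and first-uses can be computed with two small graph searches: backwards from the callsite for last-defs, and forwards from the callee's entry for first-uses. The sketch below applies them to the main/B example above; the node numbers, graph encoding, and function names are assumptions made for illustration, and nodes that both use and define the variable are not handled.

```python
def last_defs(edges, defs, callsite, v):
    """Defs of v with a def-clear path to the callsite (backward search)."""
    preds = {}
    for src, dsts in edges.items():
        for dst in dsts:
            preds.setdefault(dst, []).append(src)
    found, seen, stack = set(), set(), list(preds.get(callsite, ()))
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if v in defs.get(n, ()):
            found.add(n)  # this def reaches the call; earlier defs are killed
        else:
            stack.extend(preds.get(n, ()))
    return found


def first_uses(edges, defs, uses, entry, v):
    """Uses of v reachable from entry on a def-clear, use-clear path."""
    found, seen, stack = set(), set(), list(edges.get(entry, ()))
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if v in uses.get(n, ()):
            found.add(n)  # first use on this path; do not look past it
        elif v not in defs.get(n, ()):
            stack.extend(edges.get(n, ()))
    return found


# Caller main(): 1: x = 1; 2: x = 2; 3: x = 3 (conditional); 4: B(x).
MAIN_EDGES = {1: [2], 2: [3, 4], 3: [4]}
MAIN_DEFS = {1: {"x"}, 2: {"x"}, 3: {"x"}}

# Callee B(int y): 11: entry; 12: Z = y; 13: T = y; 14: print(y).
B_EDGES = {11: [12, 13], 12: [14], 13: [14]}
B_USES = {12: {"y"}, 13: {"y"}, 14: {"y"}}

print(last_defs(MAIN_EDGES, MAIN_DEFS, 4, "x"))
print(first_uses(B_EDGES, {}, B_USES, 11, "y"))
```

    The def x = 1 is excluded because x = 2 kills it on every path to the call, and print(y) is excluded because a use of y precedes it on every path from the entry.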

    Syntax-Based Testing

    We are going to completely switch gears now. We will see two applications of context-free grammars:

    1. input-space grammars: create inputs (both valid and invalid)

    2. program-based grammars: modify programs (mutation testing)

    Mutation testing. The basic idea behind mutation testing is to improve your test suites by creating modified programs (mutants) which force your test suites to include test cases which verify certain specific behaviours of the original programs by killing the mutants.

    Generating Inputs: Regular Expressions and Grammars

    Consider the following Perl regular expression for Visa numbers:

    ^4[0-9]{12}(?:[0-9]{3})?$

    Idea: generate "valid" tests from regexps and invalid tests by mutating the grammar/regexp. (Why did I put valid in quotes? What is the fundamental limitation of regexps?)
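    A minimal sketch of this idea using Python's re module, which accepts the same pattern. The generator functions and the three mutations are illustrative choices, not a prescribed set; "valid" here only means "matches the regexp".

```python
import random
import re

VISA = re.compile(r"^4[0-9]{12}(?:[0-9]{3})?$")
DIGITS = "0123456789"


def gen_valid(rng):
    """Generate a string matching the regexp: 13 or 16 digits starting with 4."""
    length = rng.choice([13, 16])
    return "4" + "".join(rng.choice(DIGITS) for _ in range(length - 1))


def gen_invalid(rng):
    """Mutate a matching string in one place so that it no longer matches."""
    s = gen_valid(rng)
    mutation = rng.choice(["first_digit", "drop_digit", "letter"])
    if mutation == "first_digit":
        return rng.choice("012356789") + s[1:]   # leading digit is not 4
    if mutation == "drop_digit":
        return s[:-1]                            # 12 or 15 digits
    return s[:-1] + "x"                          # non-digit character


rng = random.Random(2015)  # fixed seed so the run is repeatable
valid = [gen_valid(rng) for _ in range(5)]
invalid = [gen_invalid(rng) for _ in range(5)]

print(all(VISA.match(s) for s in valid))
print(any(VISA.match(s) for s in invalid))
```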

    Instead, we can use grammars to generate inputs (including sequences of input events).

    Typical grammar fragment:

    mult_exp = unary_exp | mult_exp STAR unary_arith_exp | mult_exp DIV unary_arith_exp;
    unary_exp = quant_exp | unary_exp DOT INT | unary_exp up;
    ...
    start = header? declaration

    Using Grammars. Two ways you can use input grammars for software testing and maintenance:

    • recognizer: can include them in a program to validate inputs;
    • generator: can create program inputs for testing.

    Generators start with the start production and replace nonterminals with their right-hand sides to get (eventually) strings belonging to the input languages.

    We specify three coverage criteria for inputs with respect to a grammar G.

    Criterion 1 Terminal Symbol Coverage (TSC). TR contains each terminal of grammar G.

    Criterion 2 Production Coverage (PDC). TR contains each production of grammar G.

    Criterion 3 Derivation Coverage (DC). TR contains every possible string derivable from G.

    PDC subsumes TSC. DC often generates infinite test sets, and even if you limit to fixed-length strings, you still have huge numbers of inputs.

    Another Grammar.

    roll = action
    action = dep | deb
    dep = "deposit" account amount
    deb = "debit" account amount
    account = digit{3}
    amount = "$" digit+ "." digit{2}
    digit = [0-9]

    Examples of valid strings.

    Note: creating a grammar for a system that doesn't have one, but should, is a useful QA exercise. Using this grammar at runtime to validate inputs can improve software reliability, although it makes tests generated from the grammar less useful.
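    Both uses of the grammar above, generator and recognizer, can be sketched in a few lines. The sketch assumes that tokens are separated by single spaces and that the whole part of an amount has at most four digits; neither detail is specified by the grammar. One function per production keeps the correspondence visible.

```python
import random
import re


def digit(rng):
    return rng.choice("0123456789")


def account(rng):                       # account = digit{3}
    return "".join(digit(rng) for _ in range(3))


def amount(rng):                        # amount = "$" digit+ "." digit{2}
    whole = "".join(digit(rng) for _ in range(rng.randint(1, 4)))
    return "$" + whole + "." + digit(rng) + digit(rng)


def dep(rng):                           # dep = "deposit" account amount
    return f"deposit {account(rng)} {amount(rng)}"


def deb(rng):                           # deb = "debit" account amount
    return f"debit {account(rng)} {amount(rng)}"


def action(rng):                        # action = dep | deb
    return rng.choice([dep, deb])(rng)


# Recognizer for one action, written as a regexp (possible because this
# grammar has no recursion).
ACTION = re.compile(r"(deposit|debit) [0-9]{3} \$[0-9]+\.[0-9]{2}")

rng = random.Random(17)
for _ in range(3):
    s = action(rng)
    print(s, bool(ACTION.fullmatch(s)))
```

    Choosing dep and deb at random does not guarantee production coverage; a test generator aiming for PDC would make sure each alternative is taken at least once.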

    Some Grammar Mutation Operators.

    • Nonterminal Replacement; e.g.
      dep = "deposit" account amount  ==>  dep = "deposit" amount amount
      (Use your judgement to replace nonterminals with similar nonterminals.)

    • Terminal Replacement; e.g.
      amount = "$" digit+ "." digit{2}  ==>  amount = "$" digit+ "$" digit{2}

    • Terminal and Nonterminal Deletion; e.g.
      dep = "deposit" account amount  ==>  dep = "deposit" amount

    • Terminal and Nonterminal Duplication; e.g.
      dep = "deposit" account amount  ==>  dep = "deposit" account account amount

    Using grammar mutation operators.

    1. mutate grammar, generate (invalid) inputs; or,

    2. use correct grammar, but mis-derive a rule once; this gives "closer" inputs (since you only miss once).
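    The first approach can be sketched by treating a production as a list of symbols, applying an operator to the list, and deriving a string from the mutated rule. Only the dep production is modelled; the helper names, the space-separated output, and the three-digit cap on the whole part of an amount are assumptions for illustration.

```python
import random
import re

rng = random.Random(0)


def account():
    return "".join(rng.choice("0123456789") for _ in range(3))


def amount():
    return "$" + str(rng.randint(0, 999)) + "." + f"{rng.randint(0, 99):02d}"


LEAVES = {"account": account, "amount": amount}
DEP = ['"deposit"', "account", "amount"]   # dep = "deposit" account amount


def replace(rule, i, symbol):
    return rule[:i] + [symbol] + rule[i + 1:]


def delete(rule, i):
    return rule[:i] + rule[i + 1:]


def duplicate(rule, i):
    return rule[:i + 1] + rule[i:]


def derive(rule):
    """Expand nonterminals via LEAVES; quoted terminals are emitted literally."""
    return " ".join(LEAVES[s]() if s in LEAVES else s.strip('"') for s in rule)


MUTANTS = {
    "nonterminal replacement": replace(DEP, 1, "amount"),
    "deletion": delete(DEP, 1),
    "duplication": duplicate(DEP, 1),
}

VALID_DEP = re.compile(r"deposit [0-9]{3} \$[0-9]+\.[0-9]{2}")

for name, rule in MUTANTS.items():
    s = derive(rule)
    print(f"{name}: {s!r} valid={bool(VALID_DEP.fullmatch(s))}")
```

    All three mutants here happen to produce invalid strings; in general a mutated grammar can still generate strings that belong to the original language, so generated inputs need to be checked rather than assumed invalid.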

    Why test invalid inputs? Bill Sempf (@sempf), on Twitter: "QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv."

    If you're lucky, your program accepts strings (or events) described by a regexp or a grammar. But you might not use a parser or regexp engine. Generating using the regexp or grammar helps detect deviations, both right now and in the future.

    As you saw in Assignment 1, it's the easiest thing to overlook invalid inputs. Yet they may lead to undefined behaviour.

    What we'll do is to mutate the grammars and generate test strings from the mutated grammars.

    Some notes:

    • Book claims we don't have much experience using grammar-based operators.
    • Can generate strings still in the grammar even after mutation.
    • Recall that we aren't talking about semantic checks.
    • Some programs accept only a subset of a specified larger language, e.g. Blogger HTML comments. Then testing intersection is useful.

    Software Testing, Quality Assurance and Maintenance Winter 2015

    Lecture 18, February 13, 2015
    Patrick Lam, version 1

    Mutation Testing

    The second major way to use grammars in testing is mutation testing. Strings are usually programs, but could also be inputs, especially for testing based on invalid strings.

    Definition 1 Ground string: a (valid) string belonging to the language of the grammar.

    Definition 2 Mutation Operator: a rule that specifies syntactic variations of strings generated from a grammar.

    Definition 3 Mutant: the result of one application of a mutation operator to a ground string.

    Mutants may be generated either by modifying existing strings or by changing a string while it is being generated.
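    These three definitions can be illustrated with a program as the ground string. In the sketch below the ground string is a tiny Python function, the mutation operator swaps the first ">" for "<", and the mutant is the result of one application. The function largest and the textual operator are made up for illustration; real mutation tools work on the syntax tree rather than on raw text.

```python
# Ground string: a valid program in the language.
GROUND = "def largest(a, b):\n    return a if a > b else b\n"


def relational_swap(source):
    """Mutation operator (illustrative): replace the first '>' with '<'."""
    return source.replace(">", "<", 1)


def load(source):
    """Compile a program string and return its 'largest' function."""
    namespace = {}
    exec(source, namespace)
    return namespace["largest"]


original = load(GROUND)
mutant = load(relational_swap(GROUND))   # one application of the operator

# A test input distinguishes the mutant only if the outputs differ.
print(original(1, 2), mutant(1, 2))   # outputs differ: this input exposes the mutant
print(original(3, 3), mutant(3, 3))   # outputs agree: this input does not
```

    A test suite containing only inputs like (3, 3) would pass on both programs, which is the weakness that generating mutants is meant to reveal.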

    It is generally difficult to find good mutation operators. One example of a bad mutation operator might be to change all predicates to "true".

    Note that mutation is hard to apply by hand, and automation is complicated. The testing community generally considers mutation to be a "gold standard" that serves as a benchmark against which to compare other testing criteria.

    More on ground strings. Mutation manipulates ground strings to produce variants on these strings. Here are some examples:

    • the program that we are testing; or
    • valid inputs to a program.

    If we are testing invalid inputs, we might not care about ground strings.

    Credit card number examples.
    Valid strings:
    Invalid strings:

    We can also create mutants by applying a mutation operator during generation, which is useful when you don't need the ground string.

    NB: There are many ways for a string to be invalid but still be generated by the grammar.

    Some points:

    • How many mutation operators should you apply to get mutants? One.
    • Should you apply every mutation operator everywhere it might apply? More work, but can subsume other criteria.

    Killing Mutants

    Generate a mutant m for an original ground string m_0.

    Definition 4