Combinatorial Test Coverage Measurement
User manual for coverage measurement tool can be found here.
Test coverage is one of the most important topics in software assurance. Users would like some quantitative measure to judge the risk in using a product. For a given test set, what can we say about the combinatorial coverage it provides? With physical products, such as light bulbs or motors, reliability engineers can provide a probability of failure within a particular time frame. This is possible because the failures in physical products are typically the result of physics, such as metal fatigue.
With software the situation is more complex, and many different approaches have been devised for determining software test coverage. With millions of lines of code, or only with a few thousand, the number of paths through a program is so large that it is impossible to test all paths. For each if statement, there are two possible branches, so nif statements in sequence lead to 2^{n} possible paths, and with loops (while statements) the number of paths is infinite. Thus a variety of measures have been developed to gauge the degree of test coverage. The following are some of the betterknown coverage metrics:
 Statement coverage: This is the simplest of coverage criteria – the percentage of statements exercised by the test set. While it may seem at first that 100% statement coverage should provide good confidence in the program, in practice, statement coverage is often considered nearly worthless. At best, less than nearly full statement coverage indicates that a test set is inadequate.
 Decision or branch coverage: The percentage of branches that have been evaluated to both true and false by the test set.
 Condition coverage: The percentage of conditions within decision expressions that have been evaluated to both true and false. Note that 100% condition coverage does not guarantee 100% decision coverage. For example, “if (A  B) {do something} else {do something else}” is tested with [0 1], [1 0], then A and B will both have been evaluated to 0 and 1, but the else branch will not be taken because neither test leaves both A and B false.
 Modified condition decision coverage (MCDC): This is a strong coverage criterion that is required by the US Federal Aviation Administration for Level A (catastrophic failure consequence) software; i.e., software whose failure could lead to complete loss of life. It requires that every condition in a decision in the program has taken on all possible outcomes at least once, and each condition has been shown to independently affect the decision outcome, and that each entry and exit point have been invoked at least once.
Note that the coverage measures above depend on access to program source code. Combinatorial testing, in contrast, is a black box technique. Inputs are specified and expected results determined from some form of specification. The program is then treated as simply a processor that accepts inputs and produces outputs, with no knowledge expected of its inner workings.
Even in the absence of knowledge about a program’s inner structure, we can apply combinatorial methods to produce precise and useful measures. In this case, we measure the state space of inputs. Suppose we have a program that accepts two inputs, x and y, with 10 values each. Then the input state space consists of the 10^{2} = 100 pairs of x and y values, which can be pictured as a checkerboard square of 10 rows by 10 columns. With three inputs, x, y, and z, we would have a cube with 10^{3} = 1,000 points in its input state space. Extending the example to n inputs we would have a (hard to visualize) hypercube of ndimensions with 10^{n} points. Exhaustive testing would require inputs of all 10^{n}combinations, but combinatorial testing could be used to reduce the size of the test set.
Since it is nearly always impossible to test all possible combinations, combinatorial testing is a reasonable alternative. For some value of t, testing alltway interactions will detect nearly all errors. It is possible that t = n, but recalling the empirical data on failures, we would expect t to be relatively small. Determining the level of level of input or configuration state space coverage can help in understanding the degree of risk that remains after testing. If 90%  100% of the state space has been covered, then presumably the risk is small, but if coverage is much smaller, then the risk may be substantial. But how should state space coverage be measured? Looking closely at the nature of combinatorial testing leads to several measures that are useful. We begin by introducing what will be called a variablevalue configuration.
Definition. For a set of t variables, a variablevalue configuration is a set of tvalid values for the variables.
Example. Given four binary variables, a, b, c, and d, a=0, c=1, d=0 is a variablevalue configuration, and a=1, c=1, d=0 is a different variablevalue configuration for the same three variables a, c, and d.
Simple tway combination coverage
Of the total number of tway combinations for a given collection of variables, what percentage will be covered by the test set? If the test set is a covering array, then coverage is 100%, by definition, but many test sets not based on covering arrays may still provide significant tway coverage. If the test set is large, but not designed as a covering array, it is very possible that it provides 2way coverage or better. For example, random input generation may have been used to produce the tests, and good branch or condition coverage may have been achieved. In addition to the structural coverage figure, it would be helpful to know what percentage of 2way, 3way, etc. coverage has been obtained.
Definition. For a given test set for n variables, simple tway combination coverage is the proportion of tway combinations of n variables for which all variablevalues configurations are fully covered.
Example. Figure 1 shows an example with four binary variables, a, b, c, and d, where each row represents a test. Of the six 2way combinations, ab, ac, ad, bc,bd, cd, only bd and cd have all four binary values covered, so simple 2way coverage for the four tests in Figure 1 is 1/3 = 33.3%. There are four 3way combinations, abc, abd, acd, bcd, each with eight possible configurations: 000, 001, 010, 011, 100, 101, 110, 111. Of the four combinations, none has all eight configurations covered, so 3way coverage for this test set is 0%.
a

b

c

d

0

0

0

0

0

1

1

0

1

0

0

1

0

1

1

1

Figure 1. An example test array for a system with four binary components
t+kway combination coverage
A test set that provides some percentage of coverage for tway combinations will also provide some degree of coverage for t+1way combinations, t+2way combinations, etc. This statistic may be useful for comparing two combinatorial test sets. For example, different algorithms may be used to generate 3way covering arrays. They both achieve 100% 3way coverage, but if one provides better 4way and 5way coverage, then it can be considered a more thorough set.
Definition. For a given test set for n variables, t+kway combination coverage is the proportion of t+kway combinations of n variables for which all variablevalues configurations are fully covered.
Example. If the test set in Figure 1 is extended as shown in Figure 2, we can extend 3way coverage. For this test set, bcd is covered, so 2way coverage is 100%, and 2+1way = 3way coverage is 25%.
a

b

c

d

0

0

0

0

0

1

1

0

1

0

0

1

0

1

1

1

0

1

0

1

1

0

1

1

1

0

1

0

0

1

0

0

Figure 2. Eight tests for four binary variables
Configuration coverage
So far we have only considered measures of the proportion of combinations for which all configurations of t variables are fully covered. But when t variables withv values each are considered, each ttuple has v ^{t} configurations. For example, in pairwise (2way) coverage of binary variables, every 2way combination has four configurations: 00, 01, 10, 11. We can define two measures with respect to configurations:
Definition. For a given set of t variables, configuration coverage is the proportion of variablevalue configurations that are covered.
Definition. For a given set of n variables, (p, t)completeness is the proportion of the C(n, t) combinations that have configuration coverage of at least p.
Example. For Figure 1 above, there are C(4,2) = 6 possible variable combinations and C(4,2) * 2^{2} = 24 possible variablevalue configurations. Of these, 19 variablevalue configurations are covered and the only ones missing are ab=11, ac=11, ad=10, bc=01, bc=10. But only two, bd and cd, are covered with all 4 value pairs. So for the basic definition of simple tway coverage, we have only 33% (2/6) coverage, but 79% (19/24) for the configuration coverage metric. For a better understanding of this test set, we can compute the configuration coverage for each of the six variable combinations, as shown in Figure 3. So for this test set, 17% of the variablevalue configurations are covered at the 50% level, 50% are covered at the 75% level, and 33% are covered at the 100% level. And, as noted above, for the whole set of tests, 79% of variablevalue configurations are covered. All 2way combinations have at least 50% configuration coverage, so (.50, 2)completeness for this set of tests is 100%.
Figure 3. The test array covers the six possible 2way combinations of a, b, c, and d to different levels, as shown in Table 1.
Table 1

Vars

Configurations

Config coverage

ab

00, 01, 10

.75

ac

ac: 00, 01, 10

.75

ad

ad: 00, 01, 11

.75

bc

bc: 00, 11

.50

bd

bd: 00, 01, 10, 11

1.0

cd

cd: 00, 01, 10, 11

1.0

 total 2way coverage = 19/24 = .79167
 (.50, 2)completeness = 6/6 = 1.0
 (.75, 2)completeness = 5/6 = 0.83333
 (1.0, 2)completeness = 2/6 = 0.33333
Figure 4 is an example of coverage for a 2^{87}3^{2}4^{5} set (87 binary, two 3value, and five 4value) of input variables (blue=2way, pink=3way, yellow=4way). This particular test set was not a covering array, but pairwise coverage is still quite good, with about 95% of the variables having all possible 2way configurations covered. Even for 4way combinations we see that all variables have at least 28% of their configurations covered, and about 25% of them have about 98% or more of 4way configurations covered.
Figure 4. Configuration coverage for 2^{87}3^{2}4^{5} inputs.
Figure 5. Configuration coverage for 2^{79}3^{1}4^{1}6^{1}9^{1}
Implication for Testing
Even if combinatorial testing is not used, measuring configurationspanning coverage can be helpful in understanding state space coverage. If we do use combinatorial testing, then configurationspanning coverage will be 100% for the level of t that was selected, but we may still want to investigate the coverage our test set provides for t+1 or t+2. Calculating this statistic can help in choosing between tway covering arrays generated by different algorithms. As seen in the examples above, it may be relatively easy to produce tests that provide a high degree of spanning coverage, even if not 100%. In many cases it may be possible to generate additional tests to boost the coverage of a test set.
J.R. Maximoff, M.D. Trela, D.R. Kuhn, R. Kacker, "A Method for Analyzing System Statespace Coverage within a tWise Testing Framework", IEEE International Systems Conference 2010, Apr. 411, 2010, San Diego.