Q: How can testing be "effectively exhaustive"?
A: All or nearly all failures involve only 1 to 6 factors
The key insight underlying combinatorial testing’s effectiveness resulted from a series of studies by NIST from 1999 to 2004. NIST research showed that most software bugs and failures are caused by one or two parameters, with progressively fewer caused by three or more. Failures that involve two or more parameters are interaction faults: they are revealed only when multiple conditions are true at the same time. For example, a 2-way interaction fault could be triggered by "altitude = 0 AND volume < 2.2". Testing all 2-way combinations of parameter values would therefore detect this problem. A method called "pairwise testing" has been popular for decades as a way of detecting such interactions.
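To make the example concrete, here is a minimal Python sketch. The fault condition "altitude = 0 AND volume < 2.2" comes from the text; the parameter names, the value classes chosen for each parameter, and the extra `mode` parameter are hypothetical assumptions for illustration. Nine tests cover every pair of values (exhaustive testing of this model needs 18), and one of them triggers the fault.

```python
from itertools import combinations, product

# Hypothetical discretized inputs: the fault condition is from the text,
# but these parameter names and value classes are illustrative assumptions.
PARAMS = {
    "altitude": [0, 1000, 10000],
    "volume": [1.0, 2.2, 5.0],
    "mode": ["auto", "manual"],
}

def fault(test):
    """2-way interaction fault: fails only when both conditions hold."""
    return test["altitude"] == 0 and test["volume"] < 2.2

def pairwise_tests():
    """Nine tests covering every pair of values (vs. 18 exhaustive tests).

    Every (altitude, volume) pair appears once; mode alternates so that
    every (altitude, mode) and (volume, mode) pair also appears.
    """
    alts, vols, modes = PARAMS["altitude"], PARAMS["volume"], PARAMS["mode"]
    return [
        {"altitude": a, "volume": v, "mode": modes[(i + j) % 2]}
        for i, a in enumerate(alts)
        for j, v in enumerate(vols)
    ]

def covers_all_pairs(tests):
    """Check that each 2-way combination of values appears in some test."""
    for p1, p2 in combinations(PARAMS, 2):
        for v1, v2 in product(PARAMS[p1], PARAMS[p2]):
            if not any(t[p1] == v1 and t[p2] == v2 for t in tests):
                return False
    return True

tests = pairwise_tests()
print(len(tests), covers_all_pairs(tests))  # 9 True
print([t for t in tests if fault(t)])       # the test(s) that expose the fault
```

Because every (altitude, volume) pair must appear somewhere, any test set that covers all pairs of these value classes is forced to include a test with altitude 0 and a volume below 2.2, whatever values the other parameters take in that test.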
But it is not enough to test all pairs of values, because many failures are revealed only when more than two conditions are true. Surprisingly, no one had investigated the distribution of interactions involving more than two factors before the 1999 research. Studies of many other applications in different domains revealed similar patterns of failure-triggering interactions.
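Why pairs alone are not enough can be shown with three hypothetical boolean parameters. In this sketch, a 4-row array covers every pair of values, yet a hypothetical fault that needs all three conditions to be true (a = b = c = 1) never fires; only a test set with 3-way coverage is guaranteed to reach it. The names `covers` and `fault` are illustrative, not taken from any tool.

```python
from itertools import combinations, product

# Three hypothetical boolean parameters (a, b, c).
# This 4-row array covers every pair of values for every pair of columns.
PAIRWISE = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
# All 8 rows: trivially covers every 3-way combination.
EXHAUSTIVE = list(product([0, 1], repeat=3))

def covers(rows, t, values=(0, 1)):
    """True if every t-way combination of values appears in some row."""
    k = len(rows[0])
    return all(
        any(tuple(r[c] for c in cols) == combo for r in rows)
        for cols in combinations(range(k), t)
        for combo in product(values, repeat=t)
    )

def fault(row):
    """Hypothetical 3-way fault: triggered only when a, b and c are all 1."""
    return row == (1, 1, 1)

print(covers(PAIRWISE, 2), any(fault(r) for r in PAIRWISE))      # True False
print(covers(EXHAUSTIVE, 3), any(fault(r) for r in EXHAUSTIVE))  # True True
```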
Implications for software testing are significant
As can be seen in the graph, most failures were caused by one or two parameters, with progressively fewer caused by three or more. This finding, referred to as the interaction rule, has important implications for software testing because
- exhaustive testing is nearly always impossible, but
- we don't have to test all possible combinations of inputs;
- we only have to test all of the combinations that trigger faults.
If all failures involve t or fewer factors, then testing all t-way combinations of factor values is in some sense equivalent to exhaustive testing
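Some rough arithmetic, offered as a sketch, shows why this is tractable when exhaustive testing is not. Among n parameters with v values each, there are C(n, t)·v^t distinct t-way value combinations to cover, compared with v^n tests for exhaustive testing. The model below (20 parameters, 3 values each) is a hypothetical example, and `t_way_combinations` is an illustrative helper name.

```python
from math import comb

def t_way_combinations(n, v, t):
    """Number of distinct t-way value combinations among n parameters
    that each take v values: choose the t parameters, then their values."""
    return comb(n, t) * v ** t

n, v = 20, 3  # hypothetical model: 20 parameters with 3 values each
print(f"exhaustive tests: {v ** n:,}")
for t in range(2, 7):
    # For any fixed set of t parameters there are v**t value combinations,
    # and each test contains only one of them, so at least v**t tests are needed.
    print(f"t={t}: {t_way_combinations(n, v, t):,} combinations, "
          f"at least {v ** t:,} tests")
```

Each test row contains exactly C(n, t) of these combinations (one per choice of t columns), so a single test covers many at once. For a fixed t, the number of combinations grows only polynomially in n, while the exhaustive test count grows exponentially.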
As noted, empirical data show that only a small number of factors are involved in software failures. We can't do exhaustive testing, but the interaction rule says we don't have to: we can still provide very strong assurance by testing all 4-way to 6-way combinations. Obviously we don't know in advance which combinations trigger faults, but we can include all t-way combinations in a mathematical structure called a covering array: a matrix in which each row is a test, each column is a factor, and every t-way combination of factor values appears in at least one row. There is, of course, still no guarantee of finding all defects, but multiple studies have found that 4-way to 6-way combination coverage detected all faults found with exhaustive testing (see Case Studies section). Thus we can refer to this type of testing as "effectively exhaustive" (within reason!).
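The covering array idea can be sketched in a few lines of Python. This is a naive greedy construction assumed here purely for illustration: it is not the algorithm used by ACTS, and because it scores every row of the full Cartesian product as a candidate, it only works for small models. The function names `greedy_covering_array` and `is_covering_array` are hypothetical. The verifier checks the defining property directly: every t-way combination of values appears in at least one row.

```python
from itertools import combinations, product

def t_way_combos(row, t):
    """All t-way (column positions, values) combinations contained in one test row."""
    return {(cols, tuple(row[c] for c in cols))
            for cols in combinations(range(len(row)), t)}

def greedy_covering_array(domains, t):
    """Naive greedy covering array of strength t (illustrative only).

    Uses the full Cartesian product as the candidate pool, so it is only
    practical for small models.
    """
    candidates = list(product(*domains))
    uncovered = set()
    for row in candidates:
        uncovered |= t_way_combos(row, t)
    rows = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered combinations.
        best = max(candidates, key=lambda r: len(t_way_combos(r, t) & uncovered))
        rows.append(best)
        uncovered -= t_way_combos(best, t)
    return rows

def is_covering_array(rows, domains, t):
    """Verify that every t-way combination of values appears in at least one row."""
    needed = {(cols, vals)
              for cols in combinations(range(len(domains)), t)
              for vals in product(*(domains[c] for c in cols))}
    covered = set().union(*(t_way_combos(r, t) for r in rows))
    return needed <= covered

domains = [[0, 1]] * 10  # hypothetical model: 10 boolean parameters
ca = greedy_covering_array(domains, 3)
print(len(ca), "tests instead of", 2 ** 10)
print(is_covering_array(ca, domains, 3))
```

With 10 boolean parameters and t = 3, the greedy array needs far fewer rows than the 1,024 exhaustive tests while still covering every 3-way combination. Practical tools avoid enumerating the Cartesian product; ACTS, for example, is based on the IPOG family of algorithms, which grow the array one parameter at a time.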
The ACTS tool can generate all 2-way to 6-way combinations in test sets that are practical for most applications
References
- Kuhn, D. R., Kacker, R. N., & Lei, Y. (2016, June). Estimating t-way fault profile evolution during testing. In Computer Software and Applications Conference (COMPSAC), 2016 IEEE 40th Annual (Vol. 2, pp. 596-597). IEEE.
- Wallace, D. R., & Kuhn, D. R. (2001). Failure modes in medical device software: An analysis of 15 years of recall data. International Journal of Reliability, Quality and Safety Engineering, 8(4).
- Kuhn, D. R., & Reilly, M. J. (2002). An investigation of the applicability of design of experiments to software testing. In 27th Annual NASA Software Engineering Workshop (pp. 91-95). IEEE.
- Kuhn, D. R., Wallace, D. R., & Gallo Jr, A. M. (2004). Software fault interactions and implications for software testing. IEEE Transactions on Software Engineering, 30(6), 418-421.
- Cotroneo, D., Pietrantuono, R., Russo, S., & Trivedi, K. (2016). How do bugs surface? A comprehensive study on the characteristics of software bugs manifestation. Journal of Systems and Software, 113, 27-43.
- Ratliff, Z., Kuhn, R., Kacker, R., Lei, Y., & Trivedi, K. (2016). The relationship between software bug type and number of factors involved in failures. Submitted to International Workshop on Combinatorial Testing.