The Other Key to Software Quality

The primary key to software quality assurance (SQA) is testing against requirements. The more rigorous and exhaustive the testing, the better the quality is assured.
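As a minimal, hypothetical sketch of what testing against requirements looks like, consider a stated requirement and unit tests derived directly from its clauses (the `clamp` function and its requirement are invented here for illustration):

```python
# Hypothetical requirement: clamp(x, lo, hi) shall return lo when x < lo,
# hi when x > hi, and x itself otherwise.
def clamp(x, lo, hi):
    return max(lo, min(x, hi))

# Each test traces back to one clause of the requirement.
assert clamp(5, 0, 10) == 5     # x within range
assert clamp(-3, 0, 10) == 0    # x below the lower bound
assert clamp(42, 0, 10) == 10   # x above the upper bound
```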

But most software cannot be exhaustively or "heroically" tested.
  • The software may be too complicated to test its requirements against all possible combinations of internal program logic.
  • The range of software inputs or outputs may be continuous or unbounded.
  • The software may model or simulate some process only approximately. (This is almost always true for physical processes.)
  • The software may model some physical process for which the software's intended range of usage is not completely matched by experimental data.
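A rough back-of-the-envelope calculation illustrates why exhaustive testing is hopeless even for trivial interfaces. Assuming an optimistic rate of one billion test executions per second (an assumption chosen purely for illustration), enumerating every bit pattern of a single 64-bit input would take centuries:

```python
# One double-precision (64-bit) argument admits 2**64 distinct bit patterns.
patterns = 2 ** 64
tests_per_second = 10 ** 9              # assumed, optimistic testing rate
seconds_per_year = 60 * 60 * 24 * 365
years = patterns / (tests_per_second * seconds_per_year)
# years is roughly 585 -- for one input, before considering combinations
```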

Examples of an important class of software that cannot be heroically tested are the GCMs (both general circulation models and global climate models). Such software is used in support of potentially high consequence decisions (and the timing of those decisions). This requires a correspondingly high level of SQA. How can this be accomplished if exhaustive testing is impossible?

A common approach for low or medium consequence software is selective (limited) testing by an accredited authority (person, group, organization, etc.). The SQA authority determines the scope and emphasis of software testing, performs the appropriate tests, and then simply avers the quality of the software. This approach is also sometimes attempted for high consequence software, such as the GCMs.

The efficacy of this approach depends on the level of trust that can be placed in the SQA authority. Logically, the quality of software can be assured only to the level of trust in the SQA authority itself. Thus, for high consequence software, an even higher level of trust in the accredited authority is required. There are currently few examples of an SQA authority that has established and maintained a very high level of trust. Unfortunately, partly because of "climategate," the GCMs currently lack such a highly trusted SQA authority.

But another approach is possible and it will lead us to the other key to software quality. This other approach is to control and document the organizational processes used to plan, develop, and maintain the software. All of the software engineering consensus standards acknowledge the importance of these software management processes to software quality assurance.

The key process in this other approach, and thus the other key to software quality besides testing, is an iterative process for continuous quality improvement. Regardless of the initial trust in the software organization, instituting a consensus-acknowledged improvement process will, over time, assure the ultimate quality of the software. It seems likely that all the GCM development organizations will soon adopt such a process.

Is Improving IV&V for Scientific/Engineering Software Worth the Effort?

IMHO, a significant number of scientists and engineers would opine that imposing extensive independent verification and validation (IV&V) methodology on scientific/engineering applications would not be worth the effort.

But this would not be the consensus view. In May of 2006, a National Science Foundation (NSF) Blue Ribbon Panel issued a report on its findings and recommendations for Simulation-Based Engineering Science. Section 3.2 of the document discusses the verification, validation, and uncertainty quantification of computer-based simulations. The section addresses the question: "What level of confidence can one assign [to] a predicted [simulation] outcome in light of what may be known about the physical system and the model used to describe it?"

To quote from the Panel's findings:
While verification and validation and uncertainty quantification have been subjects of concern for many years, their further development will have a profound impact on the reliability and utility of simulation methods in the future. New theory and methods are needed for handling stochastic models and for developing meaningful and efficient approaches to the quantification of uncertainties. As they stand now, verification, validation, and uncertainty quantification are challenging and necessary research areas that must be actively pursued.


About verification and validation (V&V), the report stated:
The entire field of V&V is in the early stage of development. Basic definitions and principles have been the subject of much debate in recent years, and many aspects of the V&V remain in the gray area between the philosophy of science, subjective decision theory, and hard mathematics and physics.


On the subject of validation, the report states:
The twentieth century philosopher of science Karl Popper asserted that a scientific theory could not be validated; it could only be invalidated. Inasmuch as the mathematical model of a physical event is an expression of a theory, such models can never actually be validated in the strictest sense; they can only be invalidated. To some degree, therefore, all validation processes rely on prescribed acceptance criteria and metrics. Accordingly, the analyst judges whether the model is invalid in light of physical observations, experiments, and criteria based on experience and judgement.


And about verification the report states:
Verification processes, on the other hand, are mathematical and computational enterprises. They involve software engineering protocols, bug detection and control, scientific programming methods, and, importantly, a posteriori error estimation.
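One concrete, widely used technique for the a posteriori error estimation the report mentions is checking a code's observed order of convergence against the theoretical order of its numerical method. The sketch below (Python, with an invented example problem) verifies a composite trapezoidal integrator, whose error should shrink by roughly a factor of four each time the step size is halved:

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

# The integral of sin(x) over [0, pi] is exactly 2, so the error in the
# computed result is directly observable.
exact = 2.0
errors = [abs(trapezoid(math.sin, 0.0, math.pi, n) - exact) for n in (16, 32, 64)]

# Observed order of convergence: should approach the theoretical value of 2.
orders = [math.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
```

If the observed orders drift away from 2, either the implementation has a bug or the theoretical analysis does not apply; either way, the discrepancy is a verification finding.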


A more recent consensus view is the 2009 WTEC Report, which also has a section on validation, verification, and uncertainty quantification. The WTEC Report contains much more detail than the NSF Blue Ribbon Panel Report. However, not much has changed. The later report notes that: "There are currently no funded U.S. national initiatives for fostering collaboration between researchers who work on new mathematical algorithms for V&V/UQ frameworks and design guidelines for stochastic systems."

So the answer is -- yes. Improving IV&V for scientific/engineering software would be worth the effort.

Verification and Validation of Scientific Software

On his blog, Jon Pipitone recently commented on the validity and soundness of scientific software. He found it curious that the terms verification and validation are not in more common use. He also mentioned that there does not seem to be a standard term or adjective applied to software that has been verified or validated.

I commented that:
In the nuclear safety software area you will often encounter the term "qualified software." The adjective refers to software that has been subjected to a defined level of verification and validation appropriate to the software's level of usage. This is called "qualification" of the software. Qualification is a part of overall software risk management and helps to ensure the "quality" (e.g., "soundness", "validity", "reliability", "security", etc.) of the software.

I would like to expand on my comment here. The scope and emphasis of software risk management processes (of which software verification and validation are subprocesses) depend on the nature and use of the software. What may be adequate for one type of software may not be adequate for another. However, much scientific software, as well as much engineering design and analysis software, is similar in nature and use: both model some physical process, system, structure, or component about which information is needed. Thus, a uniform software verification and validation approach may be appropriate.

The goal of software verification and validation is to ensure the quality of the software. For scientific and engineering programs this often means ensuring the accuracy and precision of the output.



[Figure: target diagrams illustrating accuracy (closeness to the true value) versus precision (repeatability)]

The above figure is from the Wikipedia article on accuracy and precision. Although all models of nature or engineered systems are simplified abstractions, they must be relevant. That is, the abstractions must be inherently able to provide adequate accuracy and precision for the intended use. The process that ensures this required applicability is validation.

Additionally, the software code must faithfully embody these simplified abstractions. That is, the code must be adequately bug free. The process that ensures this required reliability is verification.
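To make the accuracy/precision distinction concrete, here is a minimal sketch (all numbers invented for illustration) that quantifies each for the repeated outputs of two hypothetical model configurations:

```python
import statistics

reference = 9.81  # assumed "true" value, for illustration

# Invented repeated outputs from two hypothetical model configurations.
model_a = [9.80, 9.82, 9.81, 9.79, 9.83]       # accurate and precise
model_b = [10.30, 10.32, 10.31, 10.29, 10.33]  # equally precise, but biased

def accuracy_error(samples, ref):
    """Closeness of the sample mean to the reference (smaller is better)."""
    return abs(statistics.mean(samples) - ref)

def precision_spread(samples):
    """Scatter of the samples about their own mean (smaller is better)."""
    return statistics.stdev(samples)
```

A model like `model_b` can look perfectly repeatable yet still fail validation against the reference: precision alone does not establish accuracy.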

Are Real Numbers "Real"?

As mentioned in my previous post, the scientific method is just about all the philosophy an engineer needs to know. In this post, I describe how this attitude can be applied to all the mathematics that engineers use.

The scope of the scientific method is defined by the assumption that the sole test of knowledge is experiment. That is, that which is outside the scope of experimental confirmation or falsification is not knowledge. Thus, to count mathematics as knowledge means that its axioms, operations, and results must be experimentally confirmed.

But there are inherent limits to the precision of experimental measurements in the real world. No quantity or phenomenon can be measured with absolute precision, and no process can be performed without at least an infinitesimal possibility of error. For example, there is no experiment that can exactly measure any irrational, rational, or natural number. And since these numbers are defined to be exact, they do not exist in reality. There is no zero because absolute zero cannot be observed. Just the infinitesimal. There is no infinity for the same reason. Just the very huge.
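This is an analogy rather than the point about physical measurement, but a computational counterpart of such precision limits shows up in floating-point arithmetic, where no nonzero increment smaller than the machine epsilon (relative to 1.0) is representable:

```python
import sys

eps = sys.float_info.epsilon  # 2**-52 for IEEE 754 double precision

# Adding the machine epsilon to 1.0 is observable...
assert 1.0 + eps > 1.0
# ...but a sufficiently smaller increment vanishes entirely.
assert 1.0 + eps / 4 == 1.0
```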

Does this affect the mathematical techniques useful to engineers -- such as the calculus? Are these techniques to be abandoned or viewed with suspicion? Not at all. Abraham Robinson was able to incorporate infinitesimals and huge numbers into what is called nonstandard analysis. Thus, the techniques remain the same, just the nature of the proofs change.

Interestingly, there is a number that is totally incommensurate with any attempt at direct measurement or observation, yet is inferred from the observed laws of Nature. It is the square root of minus one. In this sense it is the only number that is purely "imaginary."

The Scientific Method

The scientific method is just about all the philosophy an engineer needs to know. The scientific method refers to a framework of techniques for acquiring, correcting, and integrating knowledge. Although there are disagreements on the practical details, the philosophy behind the basic cyclical process enjoys a large consensus. The process can be illustrated as follows:

[Figure: the cyclical process of the scientific method: Theory/Model, Deduction/Simulation, Prediction/Forecast, Experiment, Observations/Results, Abduction, with IV&V applied throughout]

Theory/Model - Prior to new experimental evidence, this can be viewed as the initial state of the cycle and represents the relevant knowledge (hypotheses, theories, models, etc.) that is to be tested. After the experiment has been performed, this can be viewed as the final state that represents the (possibly conditioned) relevant knowledge as modified by the new experimental evidence.

Deduction/Simulation - This action represents the process of exercising the logical/mathematical consequences of the theory/model in order to yield a consequence capable of being experimentally tested.

Prediction/Forecast - Based on the prior theory/model, this state represents some sort of explicit statement of what the outcomes (observations/results) of the experiment are predicted or forecast to be.

Experiment - This action represents the actual setup and performance of the experiment in order to test the predictions/forecasts.

Observations/Results - This state represents the evidence (facts/data) acquired by the experiment.

Abduction - This action represents the process (logical induction) of modifying or replacing (if necessary) the prior theories/models to make them more consistent with the latest experimental evidence.

IV&V - These actions performed by independent stakeholders (Independent Verification and Validation) take place throughout the cycle. These quality assurance processes (such as peer review) are necessary to reduce the risk of error to an acceptable level.

The fundamental assumptions upon which the scientific method rests (that is, that which is more-or-less undefined and simply taken on faith by all stakeholders) are approximately:
  1. Theory shall be logically consistent. (For example, the interpretation of experimental evidence as Bayesian.)
  2. Theory and experiment shall be parsimonious. (For example, Occam's Razor.)
  3. The sole test of theory shall be experiment. (Feynman's 'almost' definition of science.)
  4. All experimental processes and evidence shall be independently verified and validated.
The above assumptions are necessary and sufficient for scientific objectivity within the realms that perform IV&V. I use the word 'shall' above to reinforce the concept that these are rules taken on faith.

Why these assumptions? The first assumption is necessary to promote rational discourse about science. Otherwise, consensus is unobtainable. The second rule is 'merely' practical and helps control error and makes IV&V easier. The third rule is the most fundamental one and key to any process being labeled as 'scientific'. The fourth rule is needed only to the extent that fallible beings are used to conduct science.

Note that even this most bare form of the scientific method contains two logical fallacies. The first is the use of abduction (affirming the consequent). The second is the partial reliance on IV&V for error management (appeal to authority). The use of abduction eliminates logical certainty from the scientific method and introduces the possibility of error. The logical shortcoming of IV&V means that finding and eliminating error is never certain.

Also available is a Bayesian version of this approach to the scientific method.
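As a minimal sketch of the Bayesian version (the numbers and notation are invented here), abduction becomes an application of Bayes' theorem, updating confidence in a theory in light of new experimental evidence:

```python
def bayes_update(prior, likelihood_h, likelihood_not_h):
    """Posterior P(H | E) given:
    prior            -- P(H), confidence in the theory before the experiment
    likelihood_h     -- P(E | H), probability of the evidence if the theory holds
    likelihood_not_h -- P(E | not H), probability of the evidence otherwise
    """
    evidence = likelihood_h * prior + likelihood_not_h * (1.0 - prior)
    return likelihood_h * prior / evidence

# A theory predicts the observed outcome with probability 0.9; rival
# explanations give it only 0.2. Starting from an even prior, the
# posterior rises to 0.45 / 0.55, about 0.82.
posterior = bayes_update(0.5, 0.9, 0.2)
```

Note that a prior of exactly 0 or 1 can never be moved by evidence, which is one reason the first assumption (logical consistency of interpretation) matters.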

Knowledge, Creativity, Rationality, and Reliability

Having the wealth, power, opportunity, and the will to engineer a solution to a problem is not enough. Also needed are:
  • Knowledge - The engineer may be ignorant of that which is necessary to engineer a solution to the problem. This may be due in part to society (engineering organization, etc.) not having built or maintained the required infrastructure -- the schools, universities, and other institutions necessary for knowledge to be created, kept, and taught.
  • Creativity - The engineer may not express the imagination/intelligence necessary to engineer a solution. This may be due in part to society (engineering organization, etc.) not creating or maintaining a nurturing artistic environment for the engineer.
  • Rationality - The engineer may be locked into mental modes that are irrational/irresponsible/insane or otherwise disordered. This may be due in part to societal/political pressures or prejudices.
  • Reliability - The engineer is fallible. Engineering processes and organizations must provide the engineer with a self-correcting mechanism for finding and fixing errors.