Failures During Runtime

Researchers recently published online a PDF entitled: A Characteristic Study on Failures of Production Distributed Data-Parallel Programs. The data they used was provided by Microsoft. The programs were all MapReduce-like in structure, composed of "declarative SQL-like queries and imperative C# user-defined functions."

Interesting to me was that the authors collected some actual statistics about the failures encountered during runtime. I love real-world numbers. However, I found their results a bit hard to follow. Here is what I got out of the paper.

They ignored operating system and hardware failures. Of the run-time errors considered: 15% were "logic" errors, 23% were "table" errors and 62% were "row" errors.

They gave examples of logic errors such as: cannot find DLLs or scripts for execution, accessing array elements with an out-of-range index, accessing dictionary items with a non-existent key, or other data-unrelated failures.

Table errors such as: accessing a column with an incorrect name or index, a mismatch between data schema and table schema, or other table-level failures.

And row errors such as: corrupt rows with exceptional data, using illegal arguments, dereferencing null column values, exceptions in user-defined functions, allowing out-of-memory errors, or other row-level failures.

Also interesting was that, according to the authors, 8% of runtime errors could not have been caught by programmers during testing using their current debugging toolset.

It seems possible to create check-in and checkout documentation processes for a development organization's SCM system that could automatically generate statistics similar to the above. I think this would have a positive effect on software quality. For example, the researchers suggest that many failures have a common fix pattern that could be automated. Whether the cost would be worth the effort--I don't know. But it does seem obvious that SCM should be a prime source of quality-related data.

    No comments:

    Post a Comment