Failures During Runtime

Researchers recently published online a PDF entitled: A Characteristic Study on Failures of Production Distributed Data-Parallel Programs. The data they used was provided by Microsoft. The programs were all MapReduce-like in structure, composed of "declarative SQL-like queries and imperative C# user-defined functions."

Interesting to me was that the authors collected some actual statistics about the failures encountered during runtime. I love real-world numbers. However, I found their results a bit hard to follow. Here is what I got out of the paper.

They ignored operating system and hardware failures. Of the run-time errors considered: 15% were "logic" errors, 23% were "table" errors and 62% were "row" errors.

They gave examples of each. Logic errors include failing to find DLLs or scripts needed for execution, accessing array elements with an out-of-range index, accessing dictionary items with a non-existent key, and other data-unrelated failures.

Table errors include accessing a column with an incorrect name or index, mismatches between the data schema and the table schema, and other table-level failures.

Row errors include corrupt rows with exceptional data, illegal arguments, dereferenced null column values, exceptions in user-defined functions, out-of-memory errors, and other row-level failures.

Also interesting was that, according to the authors, 8% of runtime errors could not have been caught by programmers during testing using their current debugging toolset.

It seems possible to create check-in and checkout documentation processes for a development organization's SCM system that could automatically generate statistics similar to the above. I think this would have a positive effect on software quality. For example, the researchers suggest that many failures have a common fix pattern that could be automated. Whether the cost would be worth the effort--I don't know. But it does seem obvious that SCM should be a prime source of quality-related data.
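A minimal sketch of what such automated statistics-gathering might look like, assuming a hypothetical convention where programmers tag each fix check-in with its failure category (the tag format and function name here are my invention, not from the paper):

```javascript
// Tally fix categories from SCM commit messages, assuming a hypothetical
// "[fix:<category>]" tagging convention enforced at check-in time.
function tallyFixCategories(commitMessages) {
  const counts = { logic: 0, table: 0, row: 0 };
  for (const msg of commitMessages) {
    // Look for a tag like "[fix:row]" anywhere in the message.
    const m = msg.match(/\[fix:(logic|table|row)\]/);
    if (m) counts[m[1]] += 1;
  }
  return counts;
}
```

An SCM hook could run something like this over the commit log periodically and publish percentages similar to the paper's 15/23/62 split.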

    Programmers Must Consider Risk

    There is a thoughtful programmer-oriented blog called The Codist written by Andrew Wulf. In a recent posting he starts off:
    I need to step outside my usual persona of writing about programming to comment on the happenings of the past few days. 
    In Boston two brothers decided to blow up the Marathon, and an hour from my house half the city of West, Texas was blown to pieces in a massive explosion.
    However, he goes on to discuss these events in a way that I don't think is actually outside the realm of programming. Why? He talks about risk.

    As I have written before, there is always the risk that software may not perform correctly. The risks, with their benefits and consequences, will be different for different stakeholders. (I wrote that software should be independently verified and validated.) The software development effort must understand, communicate, and help deal with the risks the software poses for all of its stakeholders.

    In this post I simply note that software is an integral part of modern society. Thus, risk is an integral part of software engineering. In fact, risk is a fundamental concept to engineering in general.

    Software may have played an important part in both of the events Andrew mentions. For the Boston Marathon, I am pretty sure that data mining software, such as face recognition algorithms, was used in identifying the suspects. No details about the Texas event are publicly available yet, but with SCADA systems being common in plants nowadays, I can easily imagine software being important there too.

    Why Are PC Sales Declining?

    The data are solid: PC sales are declining rapidly. Shocking! Everyone has an opinion about why this is happening. Mine? I think a lot of it has to do with a self-fulfilling belief currently held by many people influential in the PC industry. For example:
    “In a sense, these devices [smartphone, tablet, PC] are kind of blurring together,” Andreessen says. “A lot of the killer apps these days – and I would say this is true of Facebook, Twitter, Pinterest, and Gmail – you can use them on whatever device you want, or use them on all the devices at the same time.

    “I use the laptop at work, I use the phone when I am walking around – it’s the marrying of the smart device and the user interface back to the cloud that makes these things magical.”
    This has become a meme. A type of meme where believing in it makes it come true. And the PC business believes it. Look at the interface for Windows 8, formerly known as Metro. Look at the Unity interface for Ubuntu. Both obviously terrible interfaces for doing things only a PC is powerful enough to do. Yet the idea of one interface tied back to the cloud for all devices seems to be a truism among the PC leadership. So that's what it's going to be.

    The PC will be the equivalent of a big-screen TV.

    On the other hand, PCs can do so much more than smartphones and tablets can do. Is there anything else PCs should be doing?

    In a recent Time magazine interview with Alan Kay, the interviewer (David Greelish) asked:
    "What do you think about the trend that these devices are becoming purely communication and social tools? What do you see as good or bad about that? Is current technology improving or harming the social skills of children and especially teens? How about adults?"
    To which Alan Kay replied:
    "Social thinking requires very exacting thresholds to be powerful. For example, we’ve had social thinking for 200,000 years and hardly anything happened that could be considered progress over most of that time. This is because what is most pervasive about social thinking is “how to get along and mutually cope.” Modern science was only invented 400 years ago, and it is a good example of what social thinking can do with a high threshold. Science requires a society because even people who are trying to be good thinkers love their own thoughts and theories — much of the debugging has to be done by others. But the whole system has to rise above our genetic approaches to being social to much more principled methods in order to make social thinking work.

    "By contrast, it is not a huge exaggeration to point out that electronic media over the last 100+ years have actually removed some of day to day needs for reading and writing, and have allowed much of the civilized world to lapse back into oral societal forms (and this is not a good thing at all for systems that require most of the citizenry to think in modern forms).

    "For most people, what is going on is quite harmful."
    Kay thinks PCs could be doing more. So why don't they? Do we lack the knowledge, wisdom, and skill to make it so?

    I've been working on making the PC a "personal and family web assistant": software that helps us, as Kay put it, "make social thinking work." A "device" that acts as our agent and family protector, working to optimize the relationship between our private lives and the WWW. The main component of the software can only run on a PC.

    The Solution is Not the Problem

    A great thing about agile software development is that it encourages problem decomposition. The decomposition may be functional, structural, or object oriented. Decomposition is my workhorse technique for handling software complexity. And complexity is my number one development problem.

    Since an agile goal is working code each iteration, decomposition is usually intended to produce real and useful (if incomplete) solutions. This means testing and data gathering are an integral part of the agile design process.

    Thus, agile development has some ideas that I think are appealing to every type of engineer. However, the agile programming methodology also has its hard-to-do parts.

    One weakness is that agile development tends to focus on the solution and not the problem. For example, people needing custom software will often submit their "requirements" in the form of a description of what they want their new GUI to look like. (This has happened to me many times.) In other words, they implicitly define the requirements by explicitly defining what they think the solution should be. Unfortunately, these kinds of customers tend to fit in well on an agile team. My experience is that such solutions tend to be mediocre at best. (I call these "Every Program Looks Like a Facebook Web Page" solutions.)

    Better is to define the requirements independent of what the eventual implementation might look like. As the British poet Edward Hodnett once said: “If you do not ask the right questions, you do not get the right answers.” The solution is not the problem. So keep them separate.

    But obviously, this is hard to do in an iterative development environment.

    Also, the software requirements are just information. And there is a big difference between information, knowledge, and wisdom. Time is required to become knowledgeable about the requirements (and to gain domain expertise and experience), and even more time is required to determine the wisest solution.

    Interesting aside. I just googled:
    • "software information" (> 9 million hits)
    • "software knowledge" (> 1 million hits)
    • "software wisdom" (< 5000 hits).
    I guess "software wisdom" basically doesn't exist.

    Agile methods can also sometimes be an impediment in using my other workhorse technique for attacking complex problems: paradigm shifts. I talked about paradigms in my previous post.

    And the reason why is -- innovation. Coming up with an innovative solution to a hard software problem depends on coming up with a new analogy or a new paradigm shift. Agile methods commit to paradigms too early, and at too low a level, for much chance of real innovation happening. So using agile methods for such problems is difficult.

    Software Meta Development Note - Paradigms

    Implementing complex software solutions to user requirements is what makes programming so difficult. That is, programming is a lot harder than just implementing algorithms. There are two common ways of handling this complexity--decomposition into multiple interfaces and "paradigm shifts" at interfaces. (These interfaces include functional, class, and data-structural APIs. They include user interfaces.)

    First let me clear up what I mean by a paradigm shift at an interface. It's where a more complex internal problem-solving procedure (the paradigm) is presented (at the interface) as something simpler and easier for the user of the procedure to understand. For example, Newton's Method can be used to find the square root of a number. But it is more practical to implement a special function called sqrt rather than expose the complexity of a function called newton_raphson to the programmer and let her figure out how to get a square root from it. There is a paradigm shift from Newton's Method to square root at the sqrt interface.
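    To make that concrete, here is a minimal sketch (my own illustration, not from any particular library) of Newton's Method hidden behind a simple sqrt interface:

```javascript
// Internal paradigm: a general Newton-Raphson root finder for f(x) = 0.
function newtonRaphson(f, fPrime, guess, tolerance = 1e-10) {
  let x = guess;
  for (let i = 0; i < 100; i++) {
    const next = x - f(x) / fPrime(x);
    if (Math.abs(next - x) < tolerance) return next;
    x = next;
  }
  return x;
}

// Interface paradigm: the caller asks for a square root and never sees
// the iterative root-finder running underneath.
function sqrt(n) {
  // Solve x^2 - n = 0, whose positive root is the square root of n.
  return newtonRaphson(x => x * x - n, x => 2 * x, n / 2 || 1);
}
```

The caller of sqrt never needs to know an iterative root-finder exists; that shift, from newton_raphson to sqrt, is the paradigm shift at the interface.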

    The subject of this post is that there are three different kinds of simple paradigms that are useful to think about: natural, artificial, and what I call synthetic.

    A natural paradigm is encountered a lot in object oriented programming. We can have a logical car object, checkbook object, or screen object. Using such real-life analogies makes our complex problems easier to understand and work with. Even without further decomposition, we already understand these natural, complex things. I was once able to implement a robot/facility message handling system at an automated factory by simulating the US postal system. In the digital simulacrum, electronic messages became letters, junk mail, priority mail, etc. Every robot had its logical mail box complete with flag. So did the cranes and other facility equipment. Some computers turned into post offices. Servers became mail centers. Every message had its sender and return zip codes. Etc. We knew the design would work -- the mail does get delivered in real life! And I could even explain what we were doing to the project's managers. :-)
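    As a hedged sketch of that design (the class and field names here are my reconstruction, not the original factory code), the core of the postal analogy might look like:

```javascript
// Each robot or crane gets a mailbox with its own "zip code" and a flag
// that is raised when mail is waiting -- just like a real-life mailbox.
class Mailbox {
  constructor(zip) {
    this.zip = zip;
    this.letters = [];
    this.flag = false;
  }
  deliver(letter) {
    this.letters.push(letter);
    this.flag = true;
  }
  collect() {
    this.flag = false;
    return this.letters.splice(0); // take everything, lower the flag
  }
}

// A post office routes letters by destination zip, like the real one does.
class PostOffice {
  constructor() {
    this.boxes = new Map();
  }
  register(zip) {
    const box = new Mailbox(zip);
    this.boxes.set(zip, box);
    return box;
  }
  send(letter) {
    this.boxes.get(letter.toZip).deliver(letter);
  }
}
```

The point of the natural paradigm is that everyone already knows how this system behaves before a line of code is written.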

    Natural paradigms have carried over to skeuomorphic graphical user interfaces (GUIs) that emulate objects in the physical world. Steve Jobs and Apple were famous for this. The concept of a file folder is a classic natural paradigm for users.

    An artificial paradigm is where the problem is expressed, at least in part, in terms of abstract, domain specific entities. A C.S. degree requires learning a lot of artificial computer science paradigms. Mathematicians, scientists, and engineers have their own artificial paradigms. An example of an artificial paradigm shift is mapping data from a hierarchical file structure to a relational database. In this case from an artificial paradigm to a different artificial paradigm.

    Perhaps the archetypal example of an artificial GUI paradigm is the QWERTY keyboard. Before the typewriter, no one would have had a clue what a keyboard was for. After the typewriter--see next.

    A synthetic paradigm is where you take a common natural paradigm or concept and mix it with an artificial paradigm. A good example is a spreadsheet. Naturally, it represents a sheet of paper with rows and columns where you can write numbers. The concept is fundamental to all accounting. But artificially, we add the ability to put live mathematical formulas and scripts in the cells. Something we can't do in nature, but a snap in the abstract computer world. It changes everything about what spreadsheets can do. The way accountants did their jobs changed in a fundamental way with the invention of spreadsheets.
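    A toy sketch of that mix (purely illustrative; no real spreadsheet engine works this simply): natural rows and columns of numbers, plus the artificial ability to store a live formula in a cell:

```javascript
// A toy spreadsheet: a cell holds either a number (the natural paradigm)
// or a formula function over the sheet (the artificial addition).
function makeSheet() {
  const cells = new Map(); // e.g. "A1" -> number or formula function
  return {
    set(ref, value) {
      cells.set(ref, value);
    },
    get(ref) {
      const v = cells.get(ref);
      // A formula recomputes from the current sheet, so changing an input
      // cell changes every cell that depends on it -- "live" formulas.
      return typeof v === "function" ? v(this) : v;
    },
  };
}
```

Usage might look like: set A1 and A2 to numbers, then set A3 to a formula such as `s => s.get("A1") + s.get("A2")`; reading A3 always reflects the current inputs, which is the part nature's paper sheet cannot do.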

    The most popular GUI oriented synthetic paradigm is the mouse. The mouse has artificial components such as multi-clickable buttons and a clickable, rotatable wheel. But it is also a natural extension of hand pointing.

    What's the takeaway from all this? If you want to write software that changes the way things are accomplished in a fundamental way, invent a new synthetic paradigm for a user domain and present it in an API or GUI.

    NASA Science

    Big science is making big announcements (for example, see here) about the Alpha Magnetic Spectrometer (AMS) device attached to the International Space Station.

    Another type of dark matter signature has now been found, in addition to the gravity signatures. There is now good data suggesting that dark matter particles collide with each other and produce "ordinary" collision decay particles. So even though dark matter does not interact with light, these decay particles do, and that is what we are seeing in the AMS.

    OK. So to continue to be skeptical of dark matter, I must explain data involving a whole new type of phenomenon, in addition to coming up with a better explanation of the dark matter gravity observations.

    Dark matter just became more likely to actually exist, I think. This makes the AMS device a great bit of science.

    But do I think the $2 billion spent on the device was worth that kind of information? Off the top of my head, that's about $5 for every person in the USA.

    Truthfully, I can't say. I think it doubtful that a consensus argument can be made that society as a whole (each person, on average) benefits from such an experiment to the tune of $2 billion ($5 each). How can the numbers possibly work out? How can we come to a consensus on a dollar amount to assign to the results of this experiment?

    And that applies to any dollar amount. What if the next dark matter experiment will cost $20 billion, $200 billion, ...? Is there any way we can avoid being arbitrary, capricious and whimsical in our spending on science? To have an engineering discipline in our spending on science?

    This may be a more difficult question than the one on dark matter.

    The Guided Cantilever Method - Part 4

    The Guided Cantilever Method for Quickly Checking Piping Flexibility -- Part 4

    This is a continuation of my last post. I couldn't help myself; I just had to do some coding! :-) Here in Part 4, the final part in this series, I describe a simple web application that allows you to perform a quick flexibility check using the guided cantilever method without having to memorize anything.

    The web page is here. And here is a screenshot:

    The expansion of the long vertical leg causes the short horizontal leg to deflect, which is modeled as a guided cantilever. The application calculates the minimum length required to guarantee the system is adequately flexible. There are a number of assumptions required, so be sure to read my previous posts here, here, and here.

    To use the program, enter into the listboxes:
    • Nominal pipe size (NPS) in inches 
    • Maximum design temperature (Tmax) in degrees F
    • Length of the longer leg (the vertical leg) in feet.
    And indicate the pipe material using the radio buttons, either carbon steel or stainless steel.

    The minimum length of the shorter leg will be automatically calculated and displayed as shown above in red.

    As covered in previous posts, these are the equations used:
    L = 7.77 * sqrt(y * NPS)
    // Where:
    //    L = minimum length required for shorter, horizontal leg (ft)
    //    NPS = nominal pipe size (in)
    //    y = (Tmax - 100) * LL / 10000
    //           y = thermal expansion to be absorbed (in)
    //           Tmax = maximum design temperature (F)
    //           LL = length of longer, vertical leg (ft)
    Everything is done on the client in JavaScript. So feel free to download the web page and use it however you like.
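    The equations above reduce to a one-line calculation. Here is a sketch of the same computation the page performs (the function and variable names are mine; note that the y approximation shown above does not distinguish between carbon and stainless steel, so material selection is not modeled in this sketch):

```javascript
// Minimum length (ft) of the shorter, horizontal leg, per the equations above.
//   nps     = nominal pipe size (in)
//   tmax    = maximum design temperature (F)
//   longLeg = length of the longer, vertical leg (ft)
function minimumShortLegLength(nps, tmax, longLeg) {
  // y = thermal expansion to be absorbed (in)
  const y = (tmax - 100) * longLeg / 10000;
  // L = 7.77 * sqrt(y * NPS)
  return 7.77 * Math.sqrt(y * nps);
}
```

For example, a 6 in NPS line at 400 F with a 50 ft vertical leg gives y = 1.5 in and a minimum short leg of about 23.3 ft.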