The Guided Cantilever Method - Part 3


The Guided Cantilever Method for Quickly Checking Piping Flexibility -- Part 3

This is a continuation of my last post. Here in Part 3, I show how to manually check more complicated piping systems for adequate thermal flexibility. There are two tricks for doing this: 1) fictitious anchors and 2) the-tail-does-not-wag-the-dog.

Fictitious Anchors

The idea here is that any point or points on a complicated piping network can be fixed against rotation and/or displacement (or a fixed displacement or rotation can be imposed at any point) and the flexibility of the resulting subsystems analyzed independently. (The fixed points isolate one part of the system from the others.) The importance of this is that if each subsystem has adequate flexibility with the fixed points in place, then the overall system will have adequate flexibility with the fixed points removed. (This applies to equal-sized pipe only. For unequal pipe sizes, see the next trick.)

An example should make this clear:
Notice that, for analysis purposes, a fictitious anchor has been added to the system illustrated in Figure 2. This allows us to use the guided cantilever method, since the system has been broken up into two simple subsystems similar to the one we solved in Part 2: a 20'X20' subsystem and a 10'X10' subsystem.

Note that the required length of a guided cantilever goes up with only the square root of the displacement to be absorbed. Doubling the legs from 10' to 20' doubles the expansion to be absorbed, but increases the required cantilever length by only a factor of about 1.4 (the square root of 2), while the available leg length doubles. This means that if the 10'X10' subsystem is adequately flexible, then the 20'X20' subsystem will be even more so. And so if the 10'X10' subsystem is OK, then the entire system will be OK when the fictitious anchor is "removed".

So is the 10'X10' subsystem adequately flexible?
from math import sqrt

# The thermal displacement to be absorbed (y) is:
#     Material = stainless steel
#     Length = 10 (ft)
#     Temperature = 350 (F)
#     Formula from last post:
#         y_per_100ft = 1.33 * (T - 100.) / 100.
y_per_100ft = 1.33 * (350. - 100.) / 100.
y = y_per_100ft * (10. / 100.)
# y = .3325 (in)

# So the required leg length to absorb this expansion is:
NPS = 4
L = 7.77 * sqrt(y * NPS)
# L = 9.0 (ft)
Since the required length of 9.0 ft is less than the available 10 ft, the 10'X10' subsystem is adequately flexible. Thus, the entire system is adequately flexible.

Although it takes some practice to use fictitious anchors effectively, once you've gotten the hang of them, I have found they are a very effective tool.

The Tail Does Not Wag the Dog

Recall that resistance to bending moment is proportional to section modulus, and that the section modulus of pipe is roughly proportional to its radius to the third power. Thus, all other things being equal, larger diameter pipe will dominate the thermal displacement reactions of a complex piping system. For example, the section modulus of 6" NPS sch. 40 pipe is 8.50 (in3) while the section modulus of 3" NPS sch. 40 pipe is 1.72 (in3). That is a strength ratio of almost 5 to 1, even though the size ratio is less than 2 to 1. Thus, a 3" NPS "tail" will not wag a 6" NPS "dog".
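
Those values can be checked with the standard section modulus formula for a pipe, Z = pi * (OD**4 - ID**4) / (32 * OD). A quick sketch, using standard sch. 40 outside diameters and wall thicknesses:
from math import pi

def section_modulus(od, wall):
    # Z = pi * (OD**4 - ID**4) / (32 * OD), in (in3)
    id_ = od - 2 * wall
    return pi * (od**4 - id_**4) / (32 * od)

Z6 = section_modulus(6.625, 0.280)  # 6" NPS sch. 40
Z3 = section_modulus(3.500, 0.216)  # 3" NPS sch. 40
# Z6 = 8.50 (in3), Z3 = 1.72 (in3), Z6/Z3 = 4.9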

This means that in complicated piping systems with varying pipe sizes, it is appropriate to place fictitious anchors so that they impose the movements that the larger pipe sizes would undergo if unrestrained.

For example, in Figure 2 above, if the 20'X20' subsystem were 8" NPS pipe instead of 4", and there were an 8X4 reducer at the fictitious anchor, it would be appropriate to assume the fictitious anchor would impose 20' worth of thermal expansion on the 10'X10' subsystem instead of being fixed in place. Clearly, the 10'X10' subsystem would then fail a manual check and the entire system would have to be computer analyzed.
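
To see why it fails, run the same check as before, but with 20' worth of 350F stainless steel expansion imposed on the 10' leg of 4" NPS pipe:
from math import sqrt

# 20 ft worth of 350F stainless expansion imposed on the 10 ft leg:
y = (1.33 * (350. - 100.) / 100.) * (20. / 100.)
# y = 0.665 (in)

# Required leg length for 4" NPS:
L = 7.77 * sqrt(y * 4)
# L = 12.7 (ft), more than the 10 ft available, so the check fails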

Summary

In practice, it is possible to manually verify the flexibility of a majority of piping systems in less time than it would take to perform computer analyses on them. All piping engineers and designers should be familiar with the guided cantilever method.

The Guided Cantilever Method - Part 2


The Guided Cantilever Method for Quickly Checking Piping Flexibility -- Part 2

This is a continuation of my last post. Here in Part 2, let's go through a specific example of how to do a quick manual check that a piping system has adequate flexibility to absorb required thermal expansion displacements.
The system consists of carbon steel pipe with a shorter leg of 20', an elbow, and a longer leg of 25'. The pipe's outside diameter is 6.625" (NPS 6"). And the fluid's operating temperature is 500F. As the pipe warms, the thermal expansion of both legs will generate enormous forces unless the system contains adequate flexibility to relieve the imposed displacements. Assuming the end anchors are rigid and as strong as the pipe (definitely not the case if attached to rotating equipment), the controlling case will be the expansion of the longer leg being absorbed by bending of the shorter leg. If the bending stresses in the shorter leg are below code allowables, then the system will be adequately flexible.

The first step is to conservatively estimate the free expansion of the longer leg.

The thermal expansion of pipe (y) in inches per 100 feet (in/100ft), between 70 degrees F and some temperature (T) can be approximated by:
y_per_100ft = (T - 100.) / 100.  # for carbon steel
y_per_100ft = 1.33 * (T - 100.) / 100.  # for stainless steel
Or in words: For carbon steel, take the design temperature and subtract 100 then divide by 100. That's the expansion in inches per hundred feet. For stainless steel, add a third more. For example, carbon steel at 500F expands about (500 - 100) / 100 = 4 inches per hundred feet.

The formulas will break down for temperatures around 200F or less. But for the range of temperatures commonly encountered, the equations overestimate by less than 10 percent. And the formulas are easy to remember and work out in our heads. (This is important if we want to be able to do quick manual checks.)

The next step is to conservatively calculate the length of the shorter leg needed to adequately absorb the expansion of the longer leg. For this we use a convenient version of the guided cantilever formula:
from math import sqrt

L = sqrt(E * y * D / S / 48)

# Where:
#    L = Length of cantilever beam (ft)
#    E = Young's modulus (psi)
#    y = displacement of the guided end (in)
#    D = beam (pipe) outside diameter (in)
#    S = max. nominal bending stress (psi)
Modeling the shorter pipe leg as a cantilever whose guided end is displaced by the free thermal expansion of the longer leg is a very conservative model for the maximum bending stresses in the system of Figure 1. For use with a calculator (remember those?) or an IPython session, this formula can be made even simpler. Let:
  • E=29E6. This is a typical maximum value for carbon steel. Stainless is less. So a conservative value to use for all materials.
  • D=NPS. This is not a conservative assumption, but who can remember the outside diameters for all different nominal pipe sizes? I never could. This introduces an error of less than about 5%, which is more than made up for in the following assumption.
  • S=10E3. Why such a conservatively low number? (In the Peng link I gave in Part 1, the author used 20,000 psi for his guided cantilever method.) My justification for using 10,000 psi is:
    • If the system lacks adequate flexibility, fatigue cracking will occur in the elbow, not the pipe. That's because the elbow has a "stress intensification factor" (SIF) associated with it. SIFs are multiplied by the nominal stress in the adjacent pipe to estimate the actual stress in the elbow.
    • It is true that SIFs are mitigated by an elbow's "flexibility factor" (FF). (An elbow goes out-of-round under stress and so will strain more than simple beam theory would predict.) However, the stress reduction of FFs does not completely offset the stress intensification caused by SIFs. So, conservatively, elbow FFs are ignored.
    • All pipe fittings, not just elbows, have SIFs associated with them. Piping fatigue almost always occurs in the system's fittings.
    • SIFs are usually between 2.0 and 3.0.
    • Finally, since thermal cycling causes fatigue failure, allowable fatigue stresses are about 1.5 times yield stress, which is typically between 20,000 and 30,000 psi. So using 10,000 psi, assuming a 2.0 to 3.0 SIF, and ignoring FFs will keep me 50% below a 1.5 times yield stress number.
    • Therefore, using S=10000 for both carbon and stainless pretty much guarantees a conservative result. I can't recall ever encountering a pathological case where using this assumption gave an unconservative result.
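
Substituting these values into the guided cantilever formula collapses everything except y and NPS into a single constant:
from math import sqrt

# L = sqrt(E * y * D / S / 48) with E = 29E6, S = 10E3, and D = NPS:
constant = sqrt(29E6 / (10E3 * 48))
# constant = 7.77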
The resulting formula is:
L = 7.77 * sqrt(y * NPS)  # 7.77 is easy to remember!

# Where:
#    y = thermal expansion to be absorbed (in)
#    NPS = nominal pipe size (in)
Using this formula for the system in Figure 1 we have:
# Given:
# Expansion of longer leg which is 25 ft long:
y = ((500. - 100.) / 100.) * (25. / 100.)
# y = 1. (in)
# And nominal pipe size:
NPS = 6

# Required length of shorter leg:
L = 7.77 * sqrt(y * NPS)
# L = 19. (ft)
Since the shorter leg is 20 ft in length, the piping system is adequate.

The Guided Cantilever Method - Part 1

The Guided Cantilever Method for Quickly Checking Piping Flexibility - Part 1

When I started out in engineering, as a twenty-one year old back in the 1970's, I was a pipe stress analyst. My mentor was a very senior, very experienced engineer named Norman Blair. Computerized analyses of piping systems were common in those days, but it was time consuming to prepare the input (yes, generating punched cards) and computers were expensive to use. (We only had the big mainframes back then!) So we still did a lot of manual piping stress analyses.

There were two types of pipe stress that we had to analyze: primary stresses due to sustained loadings such as weight, and secondary stresses due to the pipe undergoing temperature changes that caused it to cyclically expand and contract. (Usually because of the hot/cold fluids running through them.) Keeping secondary stresses within code allowables required ensuring the piping had adequate flexibility to absorb the expansions/contractions without over-stressing.

Supporting a pipe's weight was easy. For thermal analyses, Norm gave me a copy of Spielvogel's Piping Stress Calculations Simplified to read and to help me develop a good understanding of what piping flexibility analysis was all about. It turns out I didn't actually employ very many of the manual techniques in the book. That would have been more time consuming than preparing computer analyses of them. But Norman did introduce me to a quick, conservative way to manually eliminate piping systems from having to undergo computer analysis at all.

Most piping designers were pretty good at guessing how much flexibility might be needed. I became adept at quickly verifying the acceptable flexibility of perhaps over 80% of the systems I encountered in typical oil refineries or chemical process plants without resorting to using a computer at all. This translated into significant savings of time and material.

The method I used for most of these manual verifications is called the Guided Cantilever Method. The best basic description of the method that I could google is Quick Check on Piping Flexibility by L. C. Peng. However, Peng doesn't mention the crucial trick that makes the method extremely practical -- fictitious anchors. In fact, I could not google a relevant reference to "fictitious anchors" at all. So allow me to document the trick here.

The main reason I want to document the Guided Cantilever Method is because, IMHO, there is a tendency among pipe stress engineers nowadays to go ahead and run a computerized analysis on almost every piping system they encounter. However, I think that even in this age of desktop computers with great GUIs, manual analyses still have their place. Perhaps I can encourage some pipe stress engineers to do more manual analyses where cost effective.

I will go through the method in some detail, in upcoming Part 2 of this post, because if engineers actually want to use the method, they will have to understand what makes the method conservative so they will be able to identify when the assumptions being made might not be applicable in some unusual case. And it is the unusual cases that cause the biggest failures in analyses.

Simple PHP Data Storage

The subject of my post today is a database schema I often use for websites. To make things a bit more practical, I present some PHP code snippets detailing how I often store and retrieve form data, which is a very common task when developing custom websites.

But first, let's talk databases. There are exceptions to every rule, but my default data storage/retrieval library is SQLite. I've used it for desktop applications, embedded smartphone applications, as well as for websites. I've accessed it from a number of languages. In case you are not familiar with this RDBMS:
SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite is the most widely deployed SQL database engine in the world. The source code for SQLite is in the public domain.
From SQLite's When To Use page:
The basic rule of thumb for when it is appropriate to use SQLite is this: Use SQLite in situations where simplicity of administration, implementation, and maintenance are more important than the countless complex features that enterprise database engines provide. As it turns out, situations where simplicity is the better choice are more common than many people realize.
Another way to look at SQLite is this: SQLite is not designed to replace Oracle. It is designed to replace fopen().
And later from the same link:
SQLite usually will work great as the database engine for low to medium traffic websites (which is to say, 99.9% of all websites). The amount of web traffic that SQLite can handle depends, of course, on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.
Second, let's talk frameworks. There are a myriad of web application frameworks we can use to help us develop our websites. In the link above, the list just for PHP-based ones has 17 entries. (The funny thing is, I consider PHP itself to be a "web application framework". A very general one. It is certainly not much of a general-purpose programming language.)

IMHO, the reason there are so many frameworks to choose from is because each is only appropriate for its own set of requirements which, apparently, don't overlap as much as one might assume. There may be more sinister reasons. So we need to be very careful and skeptical when choosing a framework.

No one person can be familiar with more than a few of these frameworks, but I'm sure most of the frameworks provide some help in mapping form data to the database and back. But what if we decide not to use a framework?

Finally, the interface between forms and their data storage. The most obvious way is to map a form to a database table. Each column is a field in the form. (Multiple select listboxes are a complication.) Each row in the table is a different user. The only drawback to this, but it's a big one, is that every time one of a form's fields changes, the database schema has to change.

So I've gotten into the habit of modeling the form data as associative arrays. It's almost as simple as a one-to-one mapping, yet a much more robust design. All the forms and all the users can go into one table if needed. The schema?
CREATE TABLE ArrayData (  -- all form data can be stored here
    ownerId TEXT,     -- form user
    arrayId TEXT,     -- form Id
    arrayKey TEXT,    -- form field name
    arrayValue TEXT,  -- form field value
    uTime NUMBER      -- date record last modified, PHP microtime(TRUE)
);   

Notice the flexibility. The table's schema never changes no matter how any form changes. The timestamp means old data doesn't have to be deleted from the table unless you want it gone.
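
To make the idea concrete, here is a minimal sketch of the round trip in Python (an in-memory SQLite database and made-up form fields; the PHP functions I actually used follow below):
import sqlite3
import time

db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE ArrayData (ownerId TEXT, arrayId TEXT, "
           "arrayKey TEXT, arrayValue TEXT, uTime NUMBER)")

# Save a form as key/value rows; the schema never changes with the form.
form = {'name': 'Norm', 'color': 'blue'}
for key, value in form.items():
    db.execute("INSERT INTO ArrayData VALUES (?, ?, ?, ?, ?)",
               ('user1', 'test_form', key, value, time.time()))

# Fetch it back into an associative array (a dict).
rows = db.execute("SELECT arrayKey, arrayValue FROM ArrayData "
                  "WHERE ownerId = ? AND arrayId = ?",
                  ('user1', 'test_form')).fetchall()
print(dict(rows))
# {'name': 'Norm', 'color': 'blue'}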

Here is a PHP code snippet I used for interfacing to the database for the last website I built. (A simple, small, internal website.) I cleaned it up a little bit. Notice its schema does not have columns for ownerId or uTime as the more general example above did. They weren't needed for this particular web application. However, there is code for handling multi-valued form items.

/*************************************************************************
 * Simple SQLite DB access functions.
 *
 * Our web pages will need some sort of db, for example, storing form 
 * data. In this case, nothing complex, so set up something easy, 
 * convenient, but not absurdly inefficient or with obvious injection 
 * issues.
 *
 * Example usage:
 *
 *     $form_id = 'test_form';
 *
 *     $form_data = array(
 *         'one' => 'first',
 *         'two' => 'second',
 *         'three' => 'third',
 *         'four' => array(
 *             'fourth',
 *             '4th',
 *             'one less than fifth'
 *         )
 *     );
 *     $multi_valued_items = array('four');
 *     save_form_data($form_id, $form_data);
 *     $new_form_data = fetch_form_data($form_id, $multi_valued_items);
 *     // Note: fetch_form_data($form_id) would return only last value in 
 *     // 'four' array.
 *
 *
 * NOTE: If the highest level functions (save_form_data and 
 * fetch_form_data) are not appropriate, we can often use the lower level 
 * functions to help access any SQLite 3 database.
 * See open_db, exec_db, fetch_db, and fetchall_db below.
 *
 ***************************************************************************/

// A form's data will be persisted to/from an SQLite database modeled as an 
// associative array.

// Insert/replace the form's data in the database.
function save_form_data($form_id, $form_data, $db_filename='info.sqlite') {
    $db = open_db($db_filename);
    exec_db($db, "CREATE TABLE IF NOT EXISTS ArrayData (" .
        "arrayId TEXT, arrayKey TEXT, arrayValue TEXT)");
    foreach ($form_data as $key => $value) {
        exec_db($db, "DELETE FROM ArrayData " .
            "WHERE arrayId = ? AND arrayKey = ?", array($form_id, $key));
        if (is_array($value)) {
            foreach ($value as $val) {
                exec_db($db, "INSERT INTO ArrayData " .
                    "VALUES (?, ?, ?)", array($form_id, $key, $val));
            }
        } else {
            exec_db($db, "INSERT INTO ArrayData " .
                "VALUES (?, ?, ?)", array($form_id, $key, $value));
        }
    }
}

// If some keys can have multiple values (like a list box) then they must 
// be identified in the $multi_values array.
function fetch_form_data($form_id, $multi_values=NULL, 
    $db_filename='info.sqlite') {
    
    $form_data = array();
    $db = open_db($db_filename);
    $results = fetchall_db($db, "SELECT * FROM ArrayData " .
        "WHERE arrayId = ?", array($form_id));
    if (!$results) return $form_data;
    foreach ($results as $value) {
        $key = $value['arrayKey'];
        if (is_array($multi_values) AND in_array($key, $multi_values)) {
            $form_data[$key][] = $value['arrayValue'];
        } else {
            $form_data[$key] = $value['arrayValue'];
        }
    }
    return $form_data;
}

function open_db($db_filename = 'info.sqlite') {
    try {
        $db = new PDO('sqlite:' . $db_filename);
        $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    } catch (Exception $e) {
        // Hard to see how we can continue in a way meaningful to the user.
        // Log and let user know bad news.
        $e_msg = $e->getMessage();
        $msg = "$db_filename: $e_msg";
        //log_error($msg);
        exit();
    }
    return $db;
}

function exec_db($db, $sql, $parameters=false) {
    try {
        if (!$parameters) {
            $result = $db->exec($sql);
        } else {
            $query = $db->prepare($sql);
            $result = $query->execute($parameters);
        }
    } catch (Exception $e) {
        $msg = $e->getMessage();
        //log_error("EXEC EXCEPTION: $msg"); // try to keep plugging along
        return false;
    }
    return $result;
}

// Return result (one row) of $sql query.
// Returns false rather than an assoc array on failure.
// Since an exception is likely just bad SQL, also returns false on exception.
function fetch_db($db, $sql, $parameters=false) {
    try {
        if ($parameters) {
            $query = $db->prepare($sql);
            if ($query) {
                $query->execute($parameters);
            };
        } else {
            $query = $db->query($sql);
        }
        if ($query) {
            return $query->fetch(PDO::FETCH_ASSOC);
        } else {
            return false;
        }
    } catch (Exception $e) {
        //log_error("SQL: $sql");
        if ($parameters) {
            foreach ($parameters as $param) {
                //log_error("PARAMETER: $param");
            }
        }
        $msg = $e->getMessage();
        //log_error("FETCH EXCEPTION: $msg");
        return false;
    }
}

// Return results (all rows) of $sql query.
function fetchall_db($db, $sql, $parameters=false) {
    try {
        if (!$parameters) {
            $query_results = $db->query($sql);
            if ($query_results) {
                return $query_results->fetchall(PDO::FETCH_ASSOC);
            }
            return false;  // non-exception failure also returns false
        } else {
            $query = $db->prepare($sql);
            $query->execute($parameters);
            return $query->fetchall(PDO::FETCH_ASSOC);
        }
    } catch (Exception $e) {
        $msg = $e->getMessage();
        //log_error("FETCHALL EXCEPTION: $msg");
        return false;
    }
}

Printf Debugging in PHP

I've been debugging programs since the 1970's. I have fixed a few bugs over the years! Although nothing beats a symbolic debugger that lets me set breakpoints and single step through code, I don't think I'm just being an old dinosaur programmer by saying that "printf" debugging and assertions still have practical uses. That's because most of the time I don't need anything more sophisticated to understand what my code is doing wrong. (I'm more apt to need an IDE-based debugger when trying to figure out someone else's code.)

Below is the code snippet that I use to help me do printf debugging in PHP. If you want to use it, place it in a utility file that gets included at the top of your other PHP files. Of course, using these printing functions will certainly mess up any fancy formatting you may be trying to accomplish on your web page. But inserting an assert() or checkpoint() call in your code is only intended as a quick and easy way to probe buggy behavior anyway.

I would be remiss not to mention that during development (only!) you should make sure the Apache web server's php.ini configuration file has the display_errors parameter set to on. I usually catch more than half my PHP scripting bugs this way. Very in-your-face, quick and easy debugging. An alternative to editing the PHP configuration file is to open up a (Ubuntu) terminal window and enter:

    tail -f /var/log/apache2/error.log

This will display the last ten lines of the error log and then any new PHP errors as they are generated.

Here is the code:

// ***************************************************************************
// Development Debugging Utilities.
// (For those of us who like to debug with printing to output and/or log 
// files.)
//
// Note debugging flag below so we don't have to worry too much about leaving
// embedded debugging/assertion dev/test code in source.
//
// Typical Usages:
//
//   assert('$sanity_state == "SANE" /* This comment will be visible too! */');
//
//   checkpoint(); // Will print 'GOT HERE!' and checkpoint's location
//   checkpoint($var); // Will print $var object and checkpoint's location
//   checkpoint($var, $tag); // Will print $tag string and $var object and 
//                           // checkpoint's location
//
// ***************************************************************************
$DEBUGGING_FLAG = true; // set to false to turn these utilities off!

// Setup for assertions:
assert_options(ASSERT_ACTIVE, $DEBUGGING_FLAG);
if (assert_options(ASSERT_ACTIVE)) {
    error_reporting(E_ALL | E_NOTICE);
}
assert_options(ASSERT_WARNING, true);
assert_options(ASSERT_QUIET_EVAL, true);
function assert_handler($file, $line, $code) {
    echo "<hr><p>
    <b>Assert Triggered</b><br>
    Trigger Code: $code<br>
       &nbsp;File: $file<br>
       &nbsp;Line: $line<br>
    </p><hr>";
}
assert_options(ASSERT_CALLBACK, 'assert_handler');

// Setup for embedded (in the output) checkpoints:
function checkpoint($var='GOT HERE!', $id='CHECKPOINT') {
    global $DEBUGGING_FLAG;
    if ($DEBUGGING_FLAG) {
        $btr = debug_backtrace();
        $line = $btr[0]['line'];
        $file = $btr[0]['file'];
        echo "<hr><p>";
        echo "<b>Checkpoint:</b> $id<br>";
        echo "File: $file<br>";
        echo "Line: $line<br>";
        if (is_array($var)) {
            echo "Value:\n<pre>";
            print_r($var);
            echo "</pre>";
        } elseif(is_object($var)) {
            echo "Value:\n<pre>";
            var_dump($var);
            echo "</pre>";
        } elseif (isset($var)) {
            echo "Value: $var";
        }
        echo "</p><hr>";
    }
}

// Setup for logging errors to the server's log file:
function log_error($msg) {
    global $DEBUGGING_FLAG;
    if ($DEBUGGING_FLAG) {
        $dbg_btr = debug_backtrace();
        $line = $dbg_btr[0]['line'];
        $file = $dbg_btr[0]['file'];
        error_log("function log_error::\"$msg\"::$file::$line");
    }
}

Fixing Bugs in Complex Software

In my previous post I gave a list of Akin's Laws. I noted things I felt might be similarly true for both software design and spacecraft design. One of these was:
  • When in doubt, estimate. In an emergency, guess. But be sure to go back and clean up the mess when the real numbers come along. 
In what way do I think this applies to software engineering? When testing complex software.

The blog posting A Fork in the Road by Matt Youell contains a relevant quote:
Modern software systems contain so much abstraction and layering that it is really hard to judge the level of effort that will be involved in addressing any one problem.
Youell goes on to describe two very different ways of trying to find a bug in a "quite tangled system".

A really tough bug is often occurring at a level of abstraction or layer well below the application level. This is what can make it hard to find. This is code we most likely did not write. There may be a misunderstanding about how a layer behaves. The bug may actually be in the application's operating system or in one of the libraries the application is using. It may even be showing up due to the way libraries interact.

The blog posting points out that for such complex bugs, it may be better to just refactor a portion of the code rather than try to track down the exact location of the bug. Refactoring often means that different abstractions and layers are used and used in different ways. (And, obviously, it should get rid of a possibly flawed application layer algorithm.)

But the point of the law given above is that we must be aware that this leaves a real mess as far as testing the refactored code is concerned. How do you test that a bug has been fixed if you did not understand the nature of the bug in the first place? This is over and above having to revalidate and reverify the refactored code from scratch.

What Else Do We Know?

In a previous post, I reproduced a list of some of the general things we know about software development practices. But what about good engineering practices in general? Specifically, what are practices that software engineering shares with the other engineering disciplines?

An interesting list I recently stumbled across is Akin's Laws of Spacecraft Design. As both a software developer and a (former) registered mechanical engineer, some of the items on the list struck me as having some meaning to software engineers as well as aeronautical engineers. Here is what I culled:
  • Engineering is done with numbers. Analysis without numbers is only an opinion.
  • Design is an iterative process. The necessary number of iterations is one more than the number you have currently done. This is true at any point in time.
  • Your best design efforts will inevitably wind up being useless in the final design. Learn to live with the disappointment.
  • At the start of any design effort, the person who most wants to be team leader is least likely to be capable of it.
  • In nature, the optimum is almost always in the middle somewhere. Distrust assertions that the optimum is at an extreme point.
  • Not having all the information you need is never a satisfactory excuse for not starting the analysis.
  • When in doubt, estimate. In an emergency, guess. But be sure to go back and clean up the mess when the real numbers come along.
  • Sometimes, the fastest way to get to the end is to throw everything out and start over.
  • There is never a single right solution. There are always multiple wrong ones, though.
  • Design is based on requirements. There's no justification for designing something one bit "better" than the requirements dictate.
  • (Edison's Law) "Better" is the enemy of "good".
  • (Shea's Law) The ability to improve a design occurs primarily at the interfaces. This is also the prime location for screwing it up.
  • The previous people who did a similar analysis did not have a direct pipeline to the wisdom of the ages. There is therefore no reason to believe their analysis over yours. There is especially no reason to present their analysis as yours.
  • The fact that an analysis appears in print has no relationship to the likelihood of its being correct.
  • Past experience is excellent for providing a reality check. Too much reality can doom an otherwise worthwhile design, though.
  • The odds are greatly against you being immensely smarter than everyone else in the field. If your analysis says your terminal velocity is twice the speed of light, you may have invented warp drive, but the chances are a lot better that you've screwed up.
  • A bad design with a good presentation is doomed eventually. A good design with a bad presentation is doomed immediately.
  • (Larrabee's Law) Half of everything you hear in a classroom is crap. Education is figuring out which half is which.
  • When in doubt, document. (Documentation requirements will reach a maximum shortly after the termination of a program.)
  • The schedule you develop will seem like a complete work of fiction up until the time your customer fires you for not meeting it.
  • It's called a "Work Breakdown Structure" because the Work remaining will grow until you have a Breakdown, unless you enforce some Structure on it.
  • (Montemerlo's Law) Don't do nuthin' dumb.
  • (Varsi's Law) Schedules only move in one direction.
  • (Ranger's Law) There ain't no such thing as a free launch.
  • (von Tiesenhausen's Law of Program Management) To get an accurate estimate of final program requirements, multiply the initial time estimates by pi, and slide the decimal point on the cost estimates one place to the right.
  • (von Tiesenhausen's Law of Engineering Design) If you want to have a maximum effect on the design of a new engineering system, learn to draw. Engineers always wind up designing the vehicle to look like the initial artist's concept.
  • (Mo's Law of Evolutionary Development) You can't get to the moon by climbing successively taller trees.
  • (Atkin's Law of Demonstrations) When the hardware is working perfectly, the really important visitors don't show up.
  • (Patton's Law of Program Planning) A good plan violently executed now is better than a perfect plan next week.
  • (Roosevelt's Law of Task Planning) Do what you can, where you are, with what you have.
  • (de Saint-Exupery's Law of Design) A designer knows that he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.
  • Any run-of-the-mill engineer can design something which is elegant. A good engineer designs systems to be efficient. A great engineer designs them to be effective.
  • (Henshaw's Law) One key to success in a mission is establishing clear lines of blame.
  • Capabilities drive requirements, regardless of what the systems engineering textbooks say.
  • The three keys to keeping a new manned space program affordable and on schedule:
    •        No new launch vehicles.
    •        No new launch vehicles.
    •        Whatever you do, don't decide to develop any new launch vehicles.

Why Politely Avoid Sophistry?

I've always been rather polite in my writings and comments on the Internet. (OK, I'm human. I'm sure there's been an exception or two.) The reason is that I've always felt decorum fosters consensus.

Whether this is true or not is debatable. But the article Crude Comments and Concern: Online Incivility's Effect on Risk Perceptions of Emerging Technologies by Ashley A. Anderson, et al., gives some evidence that supports the converse.

From the abstract:
Uncivil discourse is a growing concern in American rhetoric, and this trend has expanded beyond traditional media to online sources, such as audience comments. Using an experiment given to a sample representative of the U.S. population, we examine the effects of online incivility on perceptions toward a particular issue—namely, an emerging technology, nanotechnology. We found that exposure to uncivil blog comments can polarize risk perceptions of nanotechnology along the lines of religiosity and issue support. [Emphasis added.]
And, from the conclusion:
The effects of online, user-to-user incivility on perceptions towards emerging technologies may prove especially troublesome for science experts and communicators that rely on public acceptance of their information. The effects of online incivility may be even stronger for more well-known and contentious science issues such as the evolution vs. intelligent design debate or climate change. Future research may explore these issues to gain a better understanding of the formation of risk perceptions for controversial political or science topics in the context of user-generated online comments. [Emphasis added.]
Obviously, if my goal is to learn -- to change my mind or understanding about a topic, such as climate change for example -- then becoming polarized on issues about the topic is a bad thing. Those readers with the same goal will agree. And those with other goals? I don't care too much about them.

What We Know About Software Development Practices

Almost sixty years after the invention of FORTRAN, we are beginning to build a consensus about some basic software development best practices. Here are two recently assembled lists of best practices for scientific software. The lists are mostly complementary. I hope to be able to discuss some of the various items individually in later posts.

Note that since all software is written by people, for their own benefit, these rules should apply to most any type of software development.

The first list is from an Arxiv preprint titled Best Practices for Scientific Computing by Greg Wilson, et al.
  1. Write programs for people, not computers.
    • A program should not require its readers to hold more than a handful of facts in memory at once.
    • Names should be consistent, distinctive, and meaningful.
    • All aspects of software development should be broken down into tasks roughly an hour long. [...] total productivity is maximized when people work roughly 40 hours a week.
  2. Automate repetitive tasks.
    • Rely on the computer to repeat tasks.
    • Save recent commands to a file for re-use.
    • Scientists should use a build tool to automate their scientific workflows.
  3. Use the computer to record history.
    • Software tools should be used to track computational work automatically.
  4. Make incremental changes.
    • Work in small steps with frequent feedback and course correction.
  5. Use version control.
    • Use a version control system.
    • Everything that has been created manually should be put in version control.
  6. Don't repeat yourself (or others).
    • Every piece of data must have a single authoritative representation in the system.
    • Code should be modularized rather than copied and pasted.
    • Re-use code instead of rewriting it.
  7. Plan for mistakes.
    • Add assertions to programs to check their operation.
    • Use an off-the-shelf unit testing library.
    • Use all available oracles [that is, something which tells a developer how a program should behave or what its output should be] when testing programs.
    • Turn bugs into test cases.
    • Use a symbolic debugger.
  8. Optimize software only after it runs correctly.
    • Use a profiler to identify bottlenecks.
    • Write code in the highest-level language possible.
  9.  Document design and purpose, not mechanics.
    • Document interfaces and reasons, not implementations.
    • Refactor code instead of explaining how it works.
    • Embed the documentation for a piece of software in that software.
  10. Collaborate.
    • Use pre-merge code reviews.
    • Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems.
    • Use an issue tracking tool.

The article is impressive if only for its 67 references.

From the conclusion:
Research suggests that the time cost of implementing these kinds of tools and approaches in scientific computing is almost immediately offset by the gains in productivity of the programmers involved. Even so, the recommendations described above may seem intimidating to implement. Fortunately, the different practices reinforce and support one another, so the effort required is less than the sum of adding each component separately. Nevertheless, we do not recommend that research groups attempt to implement all of these recommendations at once, but instead suggest that these tools be introduced incrementally over a period of time.
The second list is Ten Simple Rules for the Open Development of Scientific Software by Andreas Prlic and James B. Procter.
  1. Don't Reinvent the Wheel
    •  As in any other field, you should do some research before starting a new programming project to find out if aspects of your problem have already been solved.
  2. Code Well
    •  Study other people's code and learn by practice. Join an existing open-source project.
  3. Be Your Own User
    •  One of the more graphic mottos in the open-source community is "eat your own dog food".
  4. Be Transparent
    •  People with similar or related research interests who discover the project will find that they have more to gain from collaborating than from competing with the original developers.
    •  One consequence of transparent, open development is that it allows many eyes to evaluate the code and recognize and fix any issues, which reduces the likelihood of serious errors in the final product.
  5. Be Simple
    •  If your software is too complex to obtain and operate or can only run on one platform, then few people will bother to try it out, and even fewer will use it successfully (particularly your reviewers!).
  6. Don't Be a Perfectionist
    •  Don't wait too long with getting the first version of your source code out into the public and don't worry too much if your first prototypes still have critical features missing. If your idea is innovative, others will understand the concept.
  7. Nurture and Grow Your Community
    •  The biggest advantage of open development is that it allows users and developers to freely interact and form communities, and if your software is useful, your user base will grow.
  8. Promote Your Project
    •  Appearance matters, and a clean, well-organized website that will help your cause is not hard to achieve.
    •  Create personae for your project on social networks that people can connect to, and increase your presence in online discussion forums.
    •  Finally, remember about more traditional ways of communicating your work.
  9. Find Sponsors
    •  No matter how large the community around your project and how efficiently it is developed and managed, some level of funding is essential.
  10. Science Counts
    •  Whilst the development of software for the consumption of others aligns well with other processes of scientific advancement, it is the science that ultimately counts. Scientific software development fulfils an immediate need, but maintenance of code that is no longer relevant to your own research is a serious time sink, and will rarely lead to your next paper, or secure your next grant or position.
From the article:
The sustainability of software after publication is probably the biggest problem faced by researchers who develop it, and it is here that participating in open development from the outset can make the biggest impact. Grant-based funding is often exhausted shortly after new software is released, and without support, in-house maintenance of the software and the systems it depends on becomes a struggle. As a consequence, the software will cease to work or become unavailable for download fairly quickly, which may contravene archival policies stipulated by your journal or funding body. A collaborative and open project allows you to spread the resource and maintenance load to minimize these risks, and significantly contributes to the sustainability of your software.

Better Than Fizz Buzz?

Last post I criticized Fizz Buzz. (Actually, job interviewers that use Fizz Buzz.) It would be constructive for me to offer an alternative. My choice would be the Monty Hall Problem (MHP). Here is a quick description from Wikipedia:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

The MHP has been one of the first in-class assignments in the introduction to programming classes I have taught. (I believe the best way to learn programming is to read other people's code and to program yourself. So I have students do both in class.)

Over 50 implementations in various languages are available here.

An interesting thing about the MHP is that an implementation of it can be either an analysis or a simulation. Consider these two versions written in Python.


'''Monty Hall Problem

This program "solves" the Monty Hall Problem.
However, it is not a simulation, it's an analysis!

So anybody unconvinced by this analysis of the problem will likely remain
unconvinced about what a contestant should do in real life!

'''

import random

# Initial conditions:
iterations = 10000000  # law of large numbers
win_if_switch = 0
win_if_dont = 0

# Perform the random experiments
for i in range(iterations):

    # Random (1 in 3) chance of car being behind doors 1, 2, or 3
    door_with_car = random.randint(1, 3)

    # Without loss of generality, contestant always picks door 1
    if door_with_car == 1:
        win_if_dont += 1
    else:
        # Only two doors left and Monty narrows that down to one door
        # by exposing a goat door to the contestant, so
        win_if_switch += 1


print("%d: %d to %d" % (iterations, win_if_switch, win_if_dont))

# Example output:
#    10000000: 6666110 to 3333890
#
# Thus, expect about 2 to 1 favorite odds if contestant switches doors.
# Versus 1 in 3 chance of winning a car if contestant doesn't switch.


'''Monty Hall Simulation

Based on the theory that the best way to understand a problem is to
simulate it! Repeatedly!

'''

import random

# Initial conditions:
iterations = 1000000  # law of large numbers
win_if_switch = 0
win_if_dont = 0

# Before the contestant makes her choice, the prizes are set up:
doors = {1: "goat", 2: "car", 3: "goat"}
# Note that Monty knows the car is behind door #2.
# The contestant does not.

# Perform the random experiments
for i in range(iterations):

    # First, the contestant picks a door.
    # Since she doesn't know the car is behind door #2,
    # she has to guess.
    picked_door = random.randint(1, 3)

    # Monty offers a switch.
    # Show either door 1 or 3. Obviously, can't show door 2, that's the car!
    if picked_door == 3:
        shown_door = 1
    else:
        shown_door = 3

    # Next, contestant confirms her choice, or switches:
    # Suppose, in one universe, she switches
    if picked_door == 1:
        # Monty showed door 3:
        final_choice = 2
    elif picked_door == 2:
        # Monty showed door 3
        final_choice = 1
    else: # her pick was door 3
        # Monty forced to show door 1
        final_choice = 2

    # So the result if switching
    if final_choice == 2:
        win_if_switch += 1

    # Suppose, in another universe, she does not switch
    # Contestant collects her prize
    if picked_door == 2:
        win_if_dont += 1

print("%d: %d to %d" % (iterations, win_if_switch, win_if_dont))

# Example output:
#    1000000: 666693 to 333307
#
# Thus, expect about 2 to 1 odds if contestant switches doors.
# Versus 1 in 3 chance if don't switch.
# Totals sum to iterations, since in one universe or other,
# someone always wins the car.

The solution to the MHP, that switching doors doubles the contestant's chances of winning, is unintuitive. Most people think it doesn't matter. So, for most programmers, the process of implementing a correct solution should force them to an unintuitive conclusion. Process over bias! So you would think this a perfect test for competency.

However, this has not been my classroom experience. Before programming the MHP, I would ask the students what they thought the answer should be. As expected, the majority got it wrong. The surprise was that even after implementing a solution, many students (of those who could implement one at all) implemented the wrong solution. It was the solution they expected.

I suppose there are a lot things that could be said about this.

Fizz Buzz Implementations

Fizz Buzz is commonly used as an interview screening test for computer programmers. It is claimed in the blogosphere (see the references cited in the previous Wikipedia link) that demonstrating the ability to implement Fizz Buzz live is a trivial exercise for any competent developer. This is in spite of the fact that such a claim is often accompanied by anecdotal evidence that over half of experienced programmers applying for typical programming jobs need more than a trivial amount of time (say, more than five minutes) to implement a Fizz Buzz solution.

Why evidence about the difficulty of implementing Fizz Buzz can't be taken at face value is baffling to me. Instead, the blogosphere is full of arguments for why Fizz Buzz shows that most experienced programmers really can't program. That there are few truly competent programmers. And the ability to actually code anything is a skill reserved for the talented few.

But I see confirmation bias in this argument. I see elitism. I see a knee-jerk reluctance to question authority opinion or consensus. This is the kind of thinking that casts programming as the province of great artists rather than great engineers. And importantly, even if it is true that not many people can actually program, what does Fizz Buzz have to do with anything? Answer -- Nothing.

Just as there is no test that demonstrates programming competence, there is no test that demonstrates programming incompetence. Also, any trivial test is of trivial usefulness.

Let's demonstrate. For Fizz Buzz, the usual requirements are:

  • For a range of numbers (say, 1 to 100)
  • If the current number is divisible by 3, then print "Fizz"
  • If the current number is divisible by 5, then print "Buzz"
  • If the current number is divisible by both 3 and 5, then print "FizzBuzz"
  • Else, print the current number.

The most straightforward solution I can think of that satisfies these requirements would be (in, say, Python):


'''Straightforward Fizz Buzz implementation.'''
print(1)
print(2)
print("Fizz")
print(4)
print("Buzz")
# You get the idea...
print(13)
print(14)
print("FizzBuzz")
print(16)
# No need to go on...

I could type, cut and paste, and save the whole thing out in under five minutes. So the task is, in fact, trivial. Asking for a "standard code review" afterward would catch any typos.

However, I am almost certain this solution would be deemed by interviewers as trivial using a derogatory meaning of the word. They would ask me about the unstated requirements. There are always unstated requirements. It's called domain knowledge. I must know that the purpose of the Fizz Buzz test is to demonstrate being a competent programmer. Every competent programmer knows the tricks of the trade. I must use loops and conditional branching. I must show fluency in a particular computer language.

So next I would try implementing a meta-solution to Fizz Buzz (in, say, PHP):


<?php
// Fizz Buzz Meta-Implementation.
// Notice output is a Python program that implements Fizz Buzz
// as stupidly as possible.

for ($i = 1; $i <= 100; $i++) {
    $s = (string)$i;
    $outs = array($s, 'Fizz', 'Buzz', 'FizzBuzz'); 
      // efficiency deliberately shown to not be a requirement
    $token = 0; 
      // its final value will tell us to print Fizz, Buzz, 
      // FizzBuzz, or the loop index
    if (is_int($i/3)) {
        $token += 1;
    }
    if (is_int($i/5)) {
        $token += 2;
    }
    echo("print('" . $outs[$token] . "')\n");
}
?>

This actually took me more than five minutes, but it might be sufficiently original to impress an interviewer. Notice, among other things, that testing this solution is gratuitously difficult: first the PHP program has to be run, then the Python program it generates.

Does the above solution show programming competence in any meaningful way? Of course not.

How about an implicit requirement that I implement a solution like I would implement a "real" program? In that case, I would immediately google a solution--probably trying StackOverflow first. In under five minutes, this yielded a beautiful solution that's potentially even a one-liner. However, it took more than five minutes just to type up and fix the typos. (Which, for this particular case, were all spelling errors.)


'''Fizz Buzz implementation. See Wikipedia entry for requirements.

Input: None
Output is one of: { loop_counter_value | Fizz | Buzz | FizzBuzz }

Tried to make the program as simple to understand as possible.
But did not try to hide from Python idiom.
So simple-to-understand != easy-to-understand necessarily.
Depends on familiarity with Python.

From snippet found on StackOverflow.

Basic idea is that divisibility can be calculated using modulo arithmetic.
That is, if x is divisible by y, then (not x % y) is true, else false.
Everything else is just Python tricks.

Validation and Verification: Inspected that results met requirements.

'''

# Range of numbers to determine FizzBuzz for:
smallest = 1
largest = 100

for i in range(smallest, largest + 1):
    print((not i % 3) * 'Fizz' + (not i % 5) * 'Buzz' or str(i))

All the comments are because real code is documented and my implicit requirement was to imitate real code. So does this demonstrate competence? No. What about the fact that I took more than a trivial amount of time, more than five minutes? Is the above code outside the abilities of many programmers given sufficient time and knowledge of the basic idea described in the code's comments? No.

A crucial point is the trick about modulo arithmetic. A competent programmer could go their entire career, depending on their domain expertise, without ever having to take the remainder of any number. Much less test it.

Having googled Fizz Buzz and seen various solutions in various languages, could I now implement a typical solution from memory in under five minutes in a language I haven't seen it implemented in? Here is the implementation I came up with in Go (golang).


package main

import (
	"fmt"
)

func main() {
	smallest := 1
	largest := 100
	for i := smallest; i < largest+1; i++ {
		fizz := i%3 == 0
		buzz := i%5 == 0
		if !(fizz || buzz) {
			fmt.Print(i)
		} else {
			if fizz {
				fmt.Print("Fizz")
			}
			if buzz {
				fmt.Print("Buzz")
			}
		}
		fmt.Println()
	}
}

No. The above took more than five minutes. Quite a bit more. I was not as familiar with the details of Go syntax as I thought I was. I cannot write more than a few lines of code without making some kind of mistake. I had to look up the command for automatic reformatting.

Finally, what if I reimplemented the above solution in a language I am very comfortable with? Say, C++?


#include <iostream>
using namespace std;

int main() {
    int smallest = 1;
    int largest = 100;

    bool fizz;
    bool buzz;

    for (int i = smallest; i < largest+1; i++) {
        fizz = i % 3 == 0;
        buzz = i % 5 == 0;
        if (!(fizz || buzz)) {
            cout << i;
        } else {
            if (fizz) {
                cout << "Fizz";
            }
            if (buzz) {
                cout << "Buzz";
            }
        }
        cout << endl;
    }
}

Yes. I could essentially copy the solution from Go to C++ in under five minutes. Does this demonstrate I am a competent programmer? No.

In conclusion, despite the seeming consensus of expert programmer/bloggers, Fizz Buzz is NOT a trivial exercise for any competent programmer. It is a tenable belief that there are no trivial tests for either competency or incompetency. The evidence about Fizz Buzz can be taken at face value. I would never require an interviewee to implement Fizz Buzz.