What We Know About Software Development Practices

Almost sixty years after the invention of FORTRAN, we are beginning to build a consensus about some basic software development best practices. Here are two recently compiled lists of best practices for scientific software. The lists mostly complement each other. I hope to discuss some of the individual items in later posts.

Note that since all software is written by people, for their own benefit, these rules should apply to almost any type of software development.

The first list is from an arXiv preprint titled Best Practices for Scientific Computing by Greg Wilson et al.:
  1. Write programs for people, not computers.
    • A program should not require its readers to hold more than a handful of facts in memory at once.
    • Names should be consistent, distinctive, and meaningful.
    • All aspects of software development should be broken down into tasks roughly an hour long. [...] total productivity is maximized when people work roughly 40 hours a week.
  2. Automate repetitive tasks.
    • Rely on the computer to repeat tasks.
    • Save recent commands to a file for re-use.
    • Scientists should use a build tool to automate their scientific workflows.
  3. Use the computer to record history.
    • Software tools should be used to track computational work automatically.
  4. Make incremental changes.
    • Work in small steps with frequent feedback and course correction.
  5. Use version control.
    • Use a version control system.
    • Everything that has been created manually should be put in version control.
  6. Don't repeat yourself (or others).
    • Every piece of data must have a single authoritative representation in the system.
    • Code should be modularized rather than copied and pasted.
    • Re-use code instead of rewriting it.
  7. Plan for mistakes.
    • Add assertions to programs to check their operation.
    • Use an off-the-shelf unit testing library.
    • Use all available oracles [that is, something which tells a developer how a program should behave or what its output should be] when testing programs.
    • Turn bugs into test cases.
    • Use a symbolic debugger.
  8. Optimize software only after it runs correctly.
    • Use a profiler to identify bottlenecks.
    • Write code in the highest-level language possible.
  9. Document design and purpose, not mechanics.
    • Document interfaces and reasons, not implementations.
    • Refactor code instead of explaining how it works.
    • Embed the documentation for a piece of software in that software.
  10. Collaborate.
    • Use pre-merge code reviews.
    • Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems.
    • Use an issue tracking tool.
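To make rule 7 (and the "embed the documentation" point from rule 9) a bit more concrete, here is a minimal Python sketch of my own; it is not taken from the paper. The function `normalize` and its tests are hypothetical examples: the docstring documents the interface and the reasons behind it, an assertion checks the function's own result, and Python's standard `unittest` library holds a known-answer test (a hand-worked oracle) plus a test that pins down a hypothetical past bug.

```python
import math
import unittest


def normalize(weights):
    """Scale non-negative weights so that they sum to 1.

    Takes a sequence of non-negative numbers and returns a list of floats
    in the same order. Raises ValueError for negative weights, and for
    empty or all-zero input, because no meaningful normalization exists.
    (Hypothetical example function for illustration.)
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        # Hypothetical past bug: all-zero input used to fail with an
        # unhelpful ZeroDivisionError; now it is rejected explicitly.
        raise ValueError("weights must not be empty or all zero")
    result = [w / total for w in weights]
    # Assertion: check the function's own postcondition at run time.
    assert math.isclose(sum(result), 1.0), "normalized weights must sum to 1"
    return result


class TestNormalize(unittest.TestCase):
    def test_known_answer(self):
        # Oracle: a small case whose answer is easy to work out by hand.
        self.assertEqual(normalize([1, 1, 2]), [0.25, 0.25, 0.5])

    def test_all_zero_input_regression(self):
        # The (hypothetical) bug above, turned into a permanent test case.
        with self.assertRaises(ValueError):
            normalize([0, 0, 0])

    def test_empty_input(self):
        with self.assertRaises(ValueError):
            normalize([])

    def test_negative_weight_rejected(self):
        with self.assertRaises(ValueError):
            normalize([1, -1])
```

Assuming the code is saved as `normalize_example.py` (a hypothetical filename), `python -m unittest normalize_example.py` runs the tests. Python skips `assert` statements when run with `-O`, so the assertion complements the tests rather than replacing them.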
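Rule 8 is easier to follow with a concrete workflow in mind. Below is a minimal sketch, again my own illustration, using Python's built-in `cProfile` and `pstats` modules. The functions are hypothetical stand-ins: `pairwise_product_sum` is a deliberately slow O(n^2) reference version, and the faster closed-form version is only trusted after it agrees with the reference.

```python
import cProfile
import io
import pstats


def load_data(n):
    """Hypothetical stand-in for a cheap setup step."""
    return list(range(n))


def pairwise_product_sum(data):
    """Naive O(n^2) reference version: sum of x * y over all ordered pairs."""
    total = 0
    for x in data:
        for y in data:
            total += x * y
    return total


def pairwise_product_sum_fast(data):
    """Optimized version: the double sum factors into (sum of data) squared."""
    s = sum(data)
    return s * s


def workflow(n=300):
    """Hypothetical analysis pipeline: setup followed by the slow step."""
    data = load_data(n)
    return pairwise_product_sum(data)


def profile_call(func, top=5):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
    stats.print_stats(top)  # keep only the `top` most expensive entries
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_call(workflow)
    print(report)  # the report should point at pairwise_product_sum
    # Only swap in the faster version once it agrees with the trusted one.
    data = load_data(300)
    assert pairwise_product_sum_fast(data) == pairwise_product_sum(data)
```

The ordering is the point: the slow but obviously correct version serves as the oracle for the fast one, and the profiler's report, rather than intuition, decides which function is worth optimizing.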
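Rule 3 says to let the computer record history. One lightweight way to do that, sketched below under my own assumptions, is to write a small "provenance" file next to each result recording the parameters, hashes of the input files, the Python version and, if available, the current git commit. The sidecar naming scheme and field names are hypothetical, not a standard format; a real project might instead rely on a workflow or build tool that records this automatically.

```python
import datetime
import hashlib
import json
import platform
import subprocess
import sys
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def current_git_commit():
    """Return the git commit of the current working directory, or None."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or we are not inside a repository.
        return None
    return out.stdout.strip()


def write_provenance(output_path, inputs, params):
    """Write a hypothetical '<output>.provenance.json' sidecar file.

    The sidecar name and the field names are illustrative assumptions,
    not a standard format.
    """
    output_path = Path(output_path)
    record = {
        "output": output_path.name,
        "created_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "command": sys.argv,
        "git_commit": current_git_commit(),
        "parameters": params,
        # Hashes let you check later that the inputs have not changed.
        "inputs": {str(p): sha256_of(p) for p in inputs},
    }
    sidecar = output_path.with_name(output_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

For example, `write_provenance("result.csv", ["input.csv"], {"alpha": 0.5})` would write `result.csv.provenance.json` alongside the result. Because the record includes the commit hash, it pairs naturally with rule 5 (use version control).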

The article is impressive, if only for its 67 references.

From the conclusion:
Research suggests that the time cost of implementing these kinds of tools and approaches in scientific computing is almost immediately offset by the gains in productivity of the programmers involved. Even so, the recommendations described above may seem intimidating to implement. Fortunately, the different practices reinforce and support one another, so the effort required is less than the sum of adding each component separately. Nevertheless, we do not recommend that research groups attempt to implement all of these recommendations at once, but instead suggest that these tools be introduced incrementally over a period of time.

The second list is Ten Simple Rules for the Open Development of Scientific Software by Andreas Prlic and James B. Procter.
  1. Don't Reinvent the Wheel
    •  As in any other field, you should do some research before starting a new programming project to find out if aspects of your problem have already been solved.
  2. Code Well
    •  Study other people's code and learn by practice. Join an existing open-source project.
  3. Be Your Own User
    •  One of the more graphic mottos in the open-source community is "eat your own dog food".
  4. Be Transparent
    •  People with similar or related research interests who discover the project will find that they have more to gain from collaborating than from competing with the original developers.
    •  One consequence of transparent, open development is that it allows many eyes to evaluate the code and recognize and fix any issues, which reduces the likelihood of serious errors in the final product.
  5. Be Simple
    •  If your software is too complex to obtain and operate or can only run on one platform, then few people will bother to try it out, and even fewer will use it successfully (particularly your reviewers!).
  6. Don't Be a Perfectionist
    •  Don't wait too long with getting the first version of your source code out into the public and don't worry too much if your first prototypes still have critical features missing. If your idea is innovative, others will understand the concept.
  7. Nurture and Grow Your Community
    •  The biggest advantage of open development is that it allows users and developers to freely interact and form communities, and if your software is useful, your user base will grow.
  8. Promote Your Project
    •  Appearance matters, and a clean, well-organized website that will help your cause is not hard to achieve.
    •  Create personae for your project on social networks that people can connect to, and increase your presence in online discussion forums.
    •  Finally, remember about more traditional ways of communicating your work.
  9. Find Sponsors
    •  No matter how large the community around your project and how efficiently it is developed and managed, some level of funding is essential.
  10. Science Counts
    •  Whilst the development of software for the consumption of others aligns well with other processes of scientific advancement, it is the science that ultimately counts. Scientific software development fulfils an immediate need, but maintenance of code that is no longer relevant to your own research is a serious time sink, and will rarely lead to your next paper, or secure your next grant or position.

From the article:
The sustainability of software after publication is probably the biggest problem faced by researchers who develop it, and it is here that participating in open development from the outset can make the biggest impact. Grant-based funding is often exhausted shortly after new software is released, and without support, in-house maintenance of the software and the systems it depends on becomes a struggle. As a consequence, the software will cease to work or become unavailable for download fairly quickly, which may contravene archival policies stipulated by your journal or funding body. A collaborative and open project allows you to spread the resource and maintenance load to minimize these risks, and significantly contributes to the sustainability of your software.
