Modelers HATE Python!?

I recently ran across the following comment by a person involved in mesoscale weather modeling and in graduating meteorology majors:
Fortran is the language of choice and the reason has nothing to do with legacy code. Nearly all modelers that I know are fluent not only in Fortran, but C, C++, and Perl as well. Fortran is the language used because it allows you to express the mathematics and physics in a very clear succinct fashion. The idea here is that [while] a craftsman has many tools in his tool chest, the amateur believes everything is a nail. The only common feature in terms of programming tools amongst modelers is a universal HATRED of object-oriented programming languages, particularly Python.
Object-oriented programming is the answer to a question that nobody has ever felt the need to ask. Programming in an object-oriented language is like kicking a dead whale down the beach.


I have no doubt that this is a sincere and knowledgeable comment. And I am not saying that just because this blog observes proper decorum and thus always assumes the Principle of Charity. I think I know why such an attitude may be prevalent. (But not universal. I have modeled things and I do not hate Python.) Let me explain by way of a toy example.

Principles of Planetary Climate is an online working draft of a book on the physics of climate. The author is Raymond T. Pierrehumbert who does research and teaches at the University of Chicago. Dr. Pierrehumbert also frequently posts (as "raypierre") on the popular blog RealClimate.

There is a computational skills web page that accompanies the book. On this page is a tutorial with links to basic and advanced Python scripts for solving a simple example ordinary differential equation with one dependent variable. There is also a script that uses a numerical/graphical library called ClimateUtilities.

The basic Python script implements three different ODE integration methods (Euler, midpoint, and Runge-Kutta) for the example differential equation and then compares their errors against the exact analytical solution. (Let me note, since this blog is very concerned about IV&V issues, that the only discussion of validation and verification is a brief reference to a numerical stability problem with the midpoint method. The opportunity to discuss convergence and performance issues is also missed, as is the practicality of using multiple algorithms to solve the same problem as a verification technique. Obviously the author felt such IV&V issues to be of less than fundamental importance.)
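
To make the comparison concrete, here is a minimal sketch of my own (not the book's script, and the function names are mine) of what the basic script does, using just the Euler method against the exact solution of dy/dt = -t * y:

 import math

 def slope(y, t):
     '''Right-hand side of the example ODE: dy/dt = -t * y.'''
     return -t * y

 def euler_march(y, t, t_end, dt):
     '''Fixed-step forward-Euler march from t to t_end.'''
     while t < t_end:
         y = y + dt * slope(y, t)
         t = t + dt
     return y

 y_exact = math.exp(-5.**2 / 2.)           # analytical solution at t = 5
 y_euler = euler_march(1., 0., 5., 0.001)  # y(0) = 1, march to t = 5
 print("Euler error: %g %%" % ((y_euler - y_exact) / y_exact * 100.))

The midpoint and Runge-Kutta variants differ only in how the step from y(t) to y(t + dt) is computed; the comparison against the exact solution is the same.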

The advanced Python script implements the three methods using an object-oriented approach. IMHO, this script clearly demonstrates the unsuitability of an "everything is a nail" object-oriented approach to numerical programming, and it perfectly illustrates the point of the comment I quoted above. The "advanced" script has many more lines of code, is much more complex in design, and I am certain would execute much more slowly than the "basic" implementation.
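
To give a flavor of what I mean, here is my own caricature (not the actual "advanced" script; the class names are mine) of what an object-oriented version of the same integration tends to look like:

 class Integrator(object):
     '''Base class holding the ODE right-hand side and the step size.'''
     def __init__(self, slope_func, dt):
         self.slope = slope_func
         self.dt = dt

     def step(self, y, t):
         '''Subclasses override this with a particular method.'''
         raise NotImplementedError

     def integrate(self, y, t, t_end):
         '''Fixed-step march from t to t_end using step().'''
         while t < t_end:
             y = self.step(y, t)
             t = t + self.dt
         return y

 class EulerIntegrator(Integrator):
     '''Forward Euler, expressed as a subclass override of step().'''
     def step(self, y, t):
         return y + self.dt * self.slope(y, t)

 # All this machinery to do what a four-line loop does:
 integrator = EulerIntegrator(lambda y, t: -t * y, 0.001)
 y_end = integrator.integrate(1., 0., 5.)

The class hierarchy adds indirection (an extra method call on every step) without adding any numerical capability, which is exactly the complexity and speed penalty I am objecting to.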

But a forceful reply to the comment quoted above is that neither the "advanced" nor the "basic" approach is appropriate. Sure, pure Python is numerically slow, but Python comes with batteries included. Every Pythonista knows that a key feature of Python is its "extensive standard libraries and third party modules for virtually every task". Python is a great way to glue libraries and third party modules together.

But here I found out that some libraries are better than others. I was unable to install the ClimateUtilities library successfully on my version of Ubuntu Linux (9.10). So I wrote a script that uses the SciPy library instead (along with my own implementation of Runge-Kutta), as shown below. Note how short and straightforward the implementation is and, if you run it yourself, how much faster the library call is. It is practically as fast as a compiled-language implementation, Fortran or whatever, because the heavy lifting happens in compiled code. And don't forget, in Python, everything is an object. (E.g., do a dir(1) in Python. Even the number one is an object!)
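
For example, a quick interactive session shows that even an integer literal carries a type, attributes, and methods, just like any other object:

 >>> isinstance(1, object)
 True
 >>> len(dir(1)) > 0      # the number one has a full set of attributes and methods
 True
 >>> (1).__add__(2)       # even + is just a method call on the int object
 3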

(I ran into some interesting numerical features. See the dt value I used below for RK4. Maybe a subject for a later post?)


 '''
 Numerically solve an ODE using RK4 or scipy's odeint.
 
 See gmcrewsBlog post for details.
 
 ODE: dy/dt = slope(y, t)
 Where: slope(y, t) = -t * y
 And: y(0.) = 1.0
 Stopping at: y(5.)
 
 Note that analytical solution is: y(t) = y(0) * exp(-t**2 / 2)
 So error at y(5.) may be calculated.
 '''
 
 import math
 import time
 from scipy.integrate import odeint
 
 
 def slope(y, t):
     '''Function to use for testing the numerical methods.'''
     return -t * y
 
 
 # Parameters:
 t_start = 0.
 y_start = 1.
 t_end = 5.
 
 # Analytical solution:
 y_exact = y_start * math.exp(-t_end**2 / 2.)
 
 print "ODE: dy/dt = -t * y"
 print "Initial condition: y(%g) = %g" % (t_start, y_start)
 print "Analytical solution: y(t) = y(0) * exp(-t**2 / 2)"
 print "Analytical solution at y(%g) = %g" % (t_end, y_exact)
 print
 
 # Do a Runge-Kutta (RK4) march and see how good a job it does:
 
 dt = 0.000044 # chosen so that approx. same error as scipy
 # However: try dt = .04 which gives even lower error!
 dt = .04
 runtime_start = time.time() # keep track of computer's run time
 t = t_start
 y = y_start
 h = dt / 2.
 while t < t_end:
     k1 = dt * slope(y, t)
     th = t + h
     k2 = dt * slope(y + k1 / 2., th)
     k3 = dt * slope(y + k2 / 2., th)
     t = t + dt
     k4 = dt * slope(y + k3, t)
     y = y + (k1 + (2. * k2) + (2. * k3) + k4) / 6.
 runtime = time.time() - runtime_start
 
 err = (y - y_exact) / y_exact * 100. # percent
 err_rate = err / runtime # error accumulation rate over time
 
 print "RK4 Results:"
 print "dt = %g" % dt
 print "Runtime = %g seconds" % runtime
 print "Solution: y(%g) = %g" % (t_end, y)
 print "Error: %g %%" % err
 print
 
 # What does scipy's ode solver return?
 runtime_start = time.time() #keep track of computer's run time
 results = odeint(slope, y_start, [t_start, t_end])
 runtime = time.time() - runtime_start
 
 y = results[1][0]
 err = (y - y_exact) / y_exact * 100. # percent
 err_rate = err / runtime # error accumulation rate over time
 
 print "scipy.integrate.odeint Results:"
 print "Runtime = %g seconds" % runtime
 print "Solution: y(%g) = %s" % (t_end, y)
 print "Error: %g %%" % err
 print
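
As for the dt remark above, here is the kind of quick, self-contained sweep (my own addition, not part of the script; rk4_march just wraps the same fixed-step march in a function) that shows how the RK4 error changes with the step size:

 import math

 def slope(y, t):
     return -t * y

 def rk4_march(y, t, t_end, dt):
     '''Same fixed-step RK4 march as above, wrapped in a function.'''
     h = dt / 2.
     while t < t_end:
         k1 = dt * slope(y, t)
         k2 = dt * slope(y + k1 / 2., t + h)
         k3 = dt * slope(y + k2 / 2., t + h)
         k4 = dt * slope(y + k3, t + dt)
         y = y + (k1 + 2. * k2 + 2. * k3 + k4) / 6.
         t = t + dt
     return y

 y_exact = math.exp(-5.**2 / 2.)
 for dt in (0.1, 0.04, 0.01, 0.001, 0.000044):
     err = (rk4_march(1., 0., 5., dt) - y_exact) / y_exact * 100.
     print("dt = %-9g error = %g %%" % (dt, err))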

3 comments:

  1. It looks like odeint does adaptive time-steps, so it's hard to compare to a constant step-size RK scheme.

    I think maybe the small step size is running into subtractive cancellation problems; there's an optimal step size that makes the truncation error and the error due to subtractive cancellation equal. It'd be interesting to see how the wall-clock time for your custom RK scheme at the optimal step size compares to the adaptive scheme in SciPy's odeint.

  2. ...wall-clock time for your custom RK scheme...
    I guess you'd need to put it into a compiled language and wrap it for a more apples-to-apples comparison to odeint (Python loops are pretty slow, though not as bad as Octave).

    I just stumbled across the issue; it's not as though I intimately understand SciPy's odeint implementation and deliberately set things up this way. And that's a good point about wall-clock time; it didn't even dawn on me when I was coding this up.

    In SQA, it is often emphasized that software by itself is never qualified; it is always qualified for an intended usage. I think even this toy program shows why that is the case: it has issues that some other ODE would not have.

    But to reiterate the basic point: unless it's something exceptional (such as using a Python dictionary for a hash table -- Python's hashing algorithm is world class), assume Python is slow and call a compiled library when numerical performance is needed. Python's strengths lie elsewhere. And so do the strengths of object-oriented programming.
