Weasel Program - Python + C Version

This is a continuation of recent posts on the Weasel Program. There are various claims about what the program demonstrates about the processes that drive evolution. And I am trying to sort it out for myself.

Computer models are great for studying the relative importance of the factors affecting the thing being modeled. One aspect of this is sensitivity analysis. In order to do a sensitivity study of the parameters affecting the Weasel Program, it would be convenient to use Python. This would allow me to use tools such as matplotlib and numpy. However, Python is relatively slow. C is much faster. In this post I show how to use SWIG (Simplified Wrapper and Interface Generator) to get a combination that is just about the best of both languages. I've used SWIG since the 90's and it remains popular today. It has been much improved over the years and it works not only with C, but C++ and about 20 other languages as well. So it's a tool made for polyglots.

The first thing I did was put all source code in the same folder (weasel.c and weasel-driver.py). Then I refactored the C program I orginally wrote. Now that I understand the problem better I improved its design a bit. (All software designs evolve, no pun intended.) But most importantly, I decided which of the model's defining parameters and functions need to be externally available. (I used the Go convention that what I thought needed to be a publicly available name begin with a capital letter.) Actual implementation details remain hidden. Here is the refactored weasel.c file.

#include < stdio.h >
#include < stdlib.h >
#include < string.h >
#include < time.h >
#include < sys/time.h >

/* Weasel Program -- evolutionay simulation demonstration

A simple computer simulation demonstrating
that the process that drives natural selection
(based on random variation combined with non-random selection)
is qualitatively different from results based on pure random selection.

See Wikipedia for more info on Weasel Program.

Program has been made more general by allowing a gene to be any integer.

Program allows selection process to be modified.
That is, does not have to be number of matching genes.

Basic algorithm:
    Generate children from single parent keeping most fit as next parent.
    Single parent = no crossover

    Target genome does not drift (change, shrink, enlarge).
    Target genome is array of integers.
    So a gene is just an integer value.(from 0 to RAND_MAX)

    Stop when child genome fitness >= target fitness

    Note: to save space, child is discarded immediately if not most fit
    in generation so far.

Note, used following command lines for speed:
    gcc -O3 weasel.c -o weasel
    ./weasel
or debugging:
    gcc -Wall -g weasel.c -o weasel.dbg
    gdb weasel.dbg
*/

// Avoids any malloc/free issues:
#define MAX_GENOME_LEN 10000
#define MAX_NUM_GENES 10000
int PARENT[MAX_GENOME_LEN] = {0};
int CHILD[MAX_GENOME_LEN] = {0};
int FITTEST[MAX_GENOME_LEN] = {0};

// Define these to control simulation:
// (Give some defaults to get started with a sensitivity analysis.)

// Represents simulation parameters.
int Genome_len = 100;  // <= MAX_GENOME_LEN
int Num_genes = 30;  // <= MAX_NUM_GENES
int Generation_size = 100;
double Mutation_probability = 0.02;
double Fitness_err = 0.01;
double Target_fitness = 30;

// Control simulation.
int Max_generations = 10000;

// Represents evolutionary model.
double calc_child_fitness() {
    // CHANGE MODEL AS NEEDED.
    // For this particular model:
    // Fitness model is completely 'internalized'.
    // Our strategy is centered only on the genome itself.
    // No external world or creature to simulate.
    // No effect on child creation mechanism.
    // Actual fixed target gene value does not matter (so set = 0).
    // Only need match particular value at particular position
    // on the target genome.
    const int target[MAX_GENOME_LEN] = {0};
    int i;
    double fitness = 0.;
    for (i = 0; i < Genome_len; i++) {
        fitness += (double)(CHILD[i] == target[i]);
    }
    return fitness;
}

// Represents reproduction and gene semantics.
void create_child() {
    int i;
    for (i = 0; i < Genome_len; i++) {
        // drand48() = 48 bit uniform 0.0 to 1.0
        if (drand48() <= Mutation_probability) {
            CHILD[i] = rand() % Num_genes;
        } else {
            CHILD[i] = PARENT[i];
        }
    }
}
void create_random_parent() {
    // Setup first generation.
    int i;
    for (i = 0; i < Genome_len; i++) {
        PARENT[i] = rand() % Num_genes;
    }
}

/*
Starting with a parent,
generate children that have chance of mutated genes.
Most fit (only) child survives to become next generation's parent.
(A parent never survives from one generation to the next.)
*/
int Evolve() {  // generations needed

    int i, j;
    double fitness, max_fitness;

    // Assume first generation parent fitness is low.
    int generation = 0;
    create_random_parent();
    max_fitness = 0.0;

    // Iterate until a child reaches target fitness or to many generations.
    for (generation = 1; generation < Max_generations; generation++) {
        if (max_fitness >= Target_fitness - Fitness_err) break;
        max_fitness = 0;  // yes, a generation can go downhill!
        for (i = 0; i < Generation_size; i++) {
            create_child();
            fitness = calc_child_fitness();
            if (fitness >= max_fitness) {
                max_fitness = fitness;
                for (j = 0; j < Genome_len; j++) {
                    FITTEST[j] = CHILD[j];
                }
            }
        }
        for (j = 0; j < Genome_len; j++) {
            PARENT[j] = FITTEST[j];
        }
    }
    return generation;
}

int main() {

    struct timeval start_time, end_time;

    Genome_len = 100;
    Num_genes = 30;
    Generation_size = 100;
    Mutation_probability = 0.02;
    Fitness_err = .01;
    Target_fitness = Genome_len;

    Max_generations = 10000;
    
    srand(time(NULL));

    printf("LEN: %4d, NUM: %4d, SIZE: %4d, MUTATION: %7.3f\n",
           Genome_len, Num_genes, Generation_size, Mutation_probability);

    int i;
    int num_trials = 10;
    double total_time = 0.0;
    int total_generations = 0;;
    for (i = 0; i < num_trials; i++) {
        gettimeofday(&start_time, NULL);

        total_generations += Evolve();

        gettimeofday(&end_time, NULL);
        long microseconds =
            (end_time.tv_sec * 1000000 + end_time.tv_usec) -
            (start_time.tv_sec * 1000000 + start_time.tv_usec);
        total_time += microseconds / 1000000.;
    }

    double time_per_trial = total_time / num_trials;
    int generations_per_trial = total_generations / num_trials;

    printf("TRIALS: %4d, AVE. GEN: %4d/trial, AVE. TIME: %7.3f sec/trial\n",
           num_trials, generations_per_trial, time_per_trial);

    return 0;
}

Some things to notice. I went from characters to integers so I could use any number of genes I wanted. I increased the size of the static arrays so I wouldn't need to worry about accidental overflow. I defined some defaults for the model parameters. And I made corresponding changes to the main function so I could test all the other changes. (One of the issues with SWIG is that sometimes it makes testing harder. And it is a general best practice anyway to do as much testing before integration as practical.) There was no need to remove main before making it a python callable module using SWIG.

Here is the corresponding SWIG interface file.

/* SWIG Interface File: weasel.i */
%module weasel
%{
extern int Genome_len;
extern int Num_genes;
extern int Generation_size;
extern double Mutation_probability;
extern double Fitness_err;
extern double Target_fitness;
extern int Max_generations;

extern int Evolve();
%}

extern int Genome_len;
extern int Num_genes;
extern int Generation_size;
extern double Mutation_probability;
extern double Fitness_err;
extern double Target_fitness;
extern int Max_generations;

extern int Evolve();

The thing to note here is how this makes it much more difficult to add or change things. It's like defining a class. Something I usually try and avoid when trying to understand some new idea or prototype some unfamiliar algorithm. But this is evidence for the argument that I should have used C++ from the start rather than C.

And these are the commands I used to create the Python callable weasel module. (I'm using Ubuntu 12.04, and installed SWIG at ~/apps/swig. So your commands might need to be a little different.)

#/usr/bin/env bash
~/apps/swig/bin/swig -python weasel.i
gcc -c -fpic weasel.c weasel_wrap.c -I/usr/include/python2.7
gcc -shared weasel.o weasel_wrap.o -o _weasel.so

Finally, here is how I refactored the Python version I originally posted.

'''Weasel Program -- evolutionay simulation demonstration

A simple computer simulation demonstrating
that the process that drives natural selection
(based on random variation combined with non-random selection)
is qualitatively different from results based on pure random selection.

See Wikipedia for more info on Weasel Program.

This is a driver accessing via SWIG a C module that does the evolutionary stuff.
This was done for performance. See the weasel.i interface file for what's accessible.

'''
import random
import time
import weasel as w  # this is the C module accessible via SWIG

random.seed()

# Adjust the following to see how they affect number of generations
# needed to reach fitness goal and time to convergence.

# Set simulation parameters:
wv = w.cvar
wv.Genome_len = 30;
wv.Num_genes = 30;
wv.Generation_size = 100;
wv.Mutation_probability = 0.02;
wv.Fitness_err = .01;
wv.Target_fitness = wv.Genome_len;

# Control the simulation:
wv.Max_generations = 10000;

fmt_str = "LEN: %4d, NUM: %4d, SIZE: %4d, MUTATION: %7.3f\n"
fmt_str_vars = (wv.Genome_len, wv.Num_genes,
    wv.Generation_size, wv.Mutation_probability)
print fmt_str % fmt_str_vars

# To keep track of performance.

num_trials = 1
total_time = 0.
total_generations = 0
for i in range(num_trials):

    start_time = time.time()
    generations = w.Evolve()
    dt = time.time() - start_time

    total_generations += generations
    total_time += dt

    print "TRIAL: %3d" % (i+1), "GEN: %4d" % generations, "TIME: %7.2f sec" % dt

fmt_str = "   AVERAGE GEN: %4d TIME: %7.2f sec/trial"
print fmt_str % (total_generations / num_trials, total_time / num_trials)

I had to change it's name to weasel-driver.py because SWIG generates its own weasel.py file. The result is is a Python program that is only slightly slower than if written in pure C.