Weasel Program -- Python Version

I recently stumbled across a type of thought experiment that is both very simple to simulate on a computer and yet very interesting to study and think about. That's too good to pass up!

It's called The Weasel Program. Its "aim is to demonstrate that the process that drives evolutionary systems -- random variation combined with non-random cumulative selection -- is different from pure chance." It's very a very basic example of a genetic algorithm.

(I've had a lot of fun in the past with another class of computer optimization techniques based on simulated annealing, but I have to save that story for another day.)

It also turns out that Dawkin's Weasel has created a bit of a controversy between certain teleological argument believers, particularly Intelligent Design types, and certain outspoken scientists. See The War of the Weasels. I see the Weasel controversy as a very simplified (and so perhaps instructive) version of the difficulties encountered in the Global Climate Model verification and validation debate.

But before getting into that and possibly going off into the deep end of the pool, I know I need to start off carefully and come up to speed by actually implementing a version (or two or three) of The Weasel Program. And then playing around until I get a feel for the numbers.

Of course, there are many examples of Weasel programs in various computer languages on the web. I could just download something. But to really understand code, I have to write code.

I've been coding a lot lately in Java and PHP. Practical languages -- but personally not very exciting or pleasing to write in. So as a chance to keep fresh in other languages I know and like to code in a lot more, I decided to implement a Weasel in Python. Here it is:

'''Weasel Program -- evolutionary simulation demonstration

A simple computer simulation demonstrating that the process that drives natural selection
(based on random variation combined with non-random selection)
is qualitatively different from results based on pure random selection.

See Wikipedia for more info on Weasel Program.

Given:
    Genome modelled as string representing list of genes modelled as chars.
    Target genome is a fixed string of chars.
    Fixed mutation probability.
    Fixed number of children per generation.
    
Basic algorithm:
    Generate children from single parent keeping most fit as new parent
    for next generation.
        
    Stop when child genome = target genome
    
    Note: to save space, child is discarded immediately if not most fit
    in generation so far.

'''
import random
import time

# Also implementing this in other languages, so keep track of performance.
start_time = time.time()

# Adjust the following to see how they affect 
# number of generations and time to convergence:

TARGET_GENOME = list("METHINKS IT IS LIKE A WEASEL!")
GENES = " ABCDEFGHIJKLMNOPQRSTUVWXYZ!"  # each char is a gene
MUTATION_PROBABILITY = 0.05
NUM_CHILDREN_PER_GENERATION = 100

# Convenient to define here:

GENOME_LENGTH = len(TARGET_GENOME)  # how many genes can match?
PERFECT_FITNESS_SCORE = GENOME_LENGTH  # all genes match target

# The mechanism of evolutionary selection.

def calc_fitness(child):
    '''How many positions match the target genome?'''
    fitness = 0
    for i in range(GENOME_LENGTH):
        if child[i] == TARGET_GENOME[i]:
            fitness += 1
    return fitness

def create_child(parent):
    '''Allow chance of mutation.'''
    return [(random.choice(GENES) if random.random() < MUTATION_PROBABILITY 
               else gene) for gene in parent]

# Initialization (the first generation is just a random string of genes)
generation = 0
surviving_child = [random.choice(GENES) for _ in range(GENOME_LENGTH)]
max_fitness = calc_fitness(surviving_child)
print "%3d" % generation, "%3d" % max_fitness, "".join(surviving_child)

# Then evolution proceeds as series of successive generations,
# using random mutations and survival of the fittest.
while max_fitness < PERFECT_FITNESS_SCORE:
    parent = surviving_child[:]
    generation += 1
    max_fitness = 0
    for _ in range(NUM_CHILDREN_PER_GENERATION):
        child = create_child(parent)
        child_fitness = calc_fitness(child)
        if child_fitness > max_fitness:
            max_fitness = child_fitness
            surviving_child = child

print "%3d" % generation, "%3d" % max_fitness, "".join(surviving_child)

print "Elapsed time: %7.2fms" % ((time.time() - start_time) * 1000.)
 
The program runs in about 100ms +/-50% on my desktop machine depending on how lucky the mutations are for that particular run.

One of the things I like about Python is how easy it is to read. Even for someone who is not that familiar with programming in general or Python in particular. But there are a couple of places in the above program where the syntax might seem confusing. One is where generators are used, for example:

surviving_child = [random.choice(GENES) for _ in range(GENOME_LENGTH)]

surviving_child is a list of random genes of length GENOME_LENGTH. The underscore (_) is just a common way of indicating that the for loop's index is not being used. And the generator's syntax can seem strange since it seems to put the body of the loop before the for keyword.

Another place where Python may be confusing is its form of what C-like languages call the ternary operator.

random.choice(GENES) if random.random() < MUTATION_PROBABILITY else gene

In a C-like language, this expression would be:

random.random() < MUTATION_PROBABILITY ? random.choice(GENES) : gene

No comments:

Post a Comment