Friday, December 18, 2009

A Genetic Algorithm in Google's Go Language

Like many programmers, I tracked the announcement of the Go Programming Language by Google last month, and like many programmers, I've been having a play around with it. Because of the newness of the language, there is very little code to look at to get a feeling for the language. I spent what time I could find looking through the code for the golang API in an attempt to gleam useful idioms. I've started a learning-go project for myself on github to collect links and code from tutorials and examples I come across.

Beyond the syntax, I'm interested in exploring how to make effective use of goroutines and channels in program design. I've spent some free time over the last week or so writing a simple genetic algorithm in go, and now that it compiles and looks ok-enough to talk about, I think that making good use of this language requires a slight shift in thinking about an application toward flows of information.

Having spent so many years researching computational intelligence and evolutionary computation, the genetic algorithm is burned into my brain. It's one of the first simple non-trivial programs I like to build to play-test a new language or platform (for example, see a genetic algorithm in lua from earlier this month).

The algorithm implemented is the classical Genetic Algorithm (GA) applied to the onemax problem, where the system must induce a bitstring of unity (all ones) from a cost function that only specifies the number of ones located anywhere in a candidate solution without specifying where the errors are located. The algorithm uses a bitstring representation, one-point crossover, point mutations, and tournament selection. The file is located in my learning go project, or you can access the go file for the genetic algorithm directly. See below for an embedded gist version of the file (sorry to those in RSS feed readers that don't like javascript).



I started out by trying to build an object-oriented version of the algorithm, but shifted to a procedural implementation for simplicity. Go does have objects, but they involve defining a struct for data and associating functions with the struct. Inheritance is not provided in favor for a compositional pattern. The language is young but it already has (what initially feels like) inconsistent rules for things like semi-colons and multiple ways for doing the simplest things like variable declarations. Nevertheless, the script-language like feel quickly becomes comfortable (compared to some ANSI C which I've been using at work recently).

My algorithm implementation is not optimal or efficient, and it's not event that pretty. What it does show is basically a port of a procedural implementation of a genetic algorithm in go language syntax. The code has examples of objects (solution), multiple return values, some library usage (data structures and random numbers) and some of the concurrent features. A genetic algorithm is embarrassingly parallel, so there are numerous ways to implement the algorithm to exploit this. I chose a simple step-wise multiplexing of operations like fitness evaluation, selection, and reproduction that used goroutnes to issue the work to execute in parallel and used a channel in a barrier pattern to wait for and collect the aggregation of these tasks.

I really like the ease with which one can manage tasks in parallel without resorting to explicit synchronisation blocks and patterns or booting threads. What became apparent while lashing my GA together was that this ease of parallelism promotes a different structure to programming. Rather than writing a linear procedural application and marking-up the parallelisable parts of the code after the fact, one should better design a program to exploit the parallel constructs (no shit!). This is obvious in retrospect, and would have been if I'd sat down and actually designed a parallel simple genetic algorithm.

This leads me to thoughts about the flow of information in the program. These observations should resonate if you've worked with message passing or event-driven infrastructure. Consider the prime number filter provided on the go website (or my implementation), it provides a great example of this type of thinking where goroutines manage the branching in the flow of data down through the filters, and channels bring the results back to where they are needed. The same pattern may be adopted for the GA, where each iteration of the algorithm may be represented as a pipeline of channels and data reservoirs where goroutnes consume, compute, and produce moving candidate solutions through to the next iteration. If I find some time, I'll wire up an example.

Enough from me, go play!

4 comments:

defdac said...

How come you have big letters in the beginning of function names? (I'm new to Go)

Jason said...

Caps are used for function names that are exported from the package. Lower case functions are not exported. I believe the caps at the beginning of function names are used as an access modifier (public/private).

In the case of my GA, its an example that is not intended to be used as a library, so I'm betting convention would be to use lower case - like main().

Patrick said...

Jason,
Wonderful explanation and example. Thanks for putting this together. It was a fun read and I am tinkering with your code. This language has a lot of potential for our work (large scale simulation).

JJ said...

Man, you seem to follow the same I algorithm I do for learning new languages: program a genetic algorithm in it. Will Scala be next?