For the past two weeks or so I've been hacking on a chess rating machine learning competition at Kaggle. The objective of the competition is to come up with a rating+prediction scheme that can outperform the standard system used for this type of thing, the Elo rating system. I've been having a good time, hacking together a codebase with a bunch of highly optimized rating systems as well as a bunch of machine learning techniques. I even managed to reach the top of the leaderboard earlier this week and hold it for a few days. The competition finishes on Monday 15 November 2010, so there is a long way to go yet.
There are a bunch of chess rating systems, and I've had a crack at Elo and Glicko, although there are others, like TrueSkill, to have a play with. The big wins for me have come from denormalizing the data, combining it with the output of highly tuned rating systems, and running the result through elaborate machine learning techniques.
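For anyone new to rating systems, here is a minimal sketch of the textbook Elo update, the baseline the competition asks entrants to beat. It is not my competition code: the 400-point logistic scale and the K-factor of 32 are common conventional choices, assumed here for illustration, and the function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (textbook Elo, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return new (rating_a, rating_b) after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k = 32 is an assumed, conventional K-factor, not a tuned value.
    """
    e_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - e_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two 1500-rated players, A wins: A gains 16 points, B loses 16.
print(update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

The expected score doubles as the prediction, which is why "tuning" Elo mostly means tuning K and the rating scale against held-out games.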
I've got all my code and data analysis on GitHub and intend to release it after the competition ends (or when I get bored of it). I've already released a cut-down version of Glicko as a demonstration for budding chess rating system hackers.
Some lessons I've re-learned so far in this competition:
- Unit testing is critical for trust in the code (maths!).
- A robust test harness is critical for offline evaluation (cross-validation of ideas).
- An audit trail of experiments and changes is critical (experiment log and source control).
- Keep code modular and extensible.
- Read, read, read: papers, Wikipedia, your own notes, graphs of the dataset.
Good times. Kaggle is a great idea (and a Melbourne company!)


