I've been busy over the last month, spending the majority of my spare time on the Kaggle chess rating machine learning competition. Most of my effort has gone into basic data analysis, implementing a number of additional techniques, and being very systematic about tuning models. I'm recording all experiments in a workbook, which is checked into version control with the source code. I've managed to claw my way back into the top 10 and am currently sitting in 6th place (out of >190 teams).

The following graph summarizes the RMSE scores of all the submissions I have made to date. You can see that the last 10-15 submissions have focused on tuning my best performing method, and that the small 4-submission bump before that final stretch is where I was perfecting the implementation of that method.

The competition recently passed the half-way mark and, interestingly, the #1 submission has remained unchanged and unchallenged for nearly a month. There are some efforts in the forums to model the relationship between results on various offline test sets and the leaderboard scores. I have participated in these mildly, contributing numbers and calculating Spearman's and Pearson's correlation coefficients.
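As a rough illustration of that kind of correlation check, here is a minimal sketch in plain Java. It is not the code I used for the forum numbers: the class name `CorrelationSketch`, its methods, and the sample scores in `main` are all made up for the example. Spearman's coefficient is computed as Pearson's coefficient on the ranks, with ties sharing an average rank.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch only: correlation between offline test scores and leaderboard scores. */
final class CorrelationSketch {

    /** Pearson's product-moment correlation of two equal-length samples. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Spearman's rank correlation: Pearson's correlation applied to the ranks. */
    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }

    /** 1-based ranks; tied values share the average of the ranks they span. */
    static double[] ranks(double[] values) {
        Integer[] order = new Integer[values.length];
        for (int n = 0; n < order.length; n++) {
            order[n] = n;
        }
        Arrays.sort(order, Comparator.comparingDouble(idx -> values[idx]));
        double[] result = new double[values.length];
        int start = 0;
        while (start < order.length) {
            int end = start;
            // Extend the run while the next value ties with the current one.
            while (end + 1 < order.length && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            double averageRank = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                result[order[k]] = averageRank;
            }
            start = end + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder numbers, NOT real competition scores.
        double[] offlineScores = {0.70, 0.68, 0.66, 0.65};
        double[] leaderboardScores = {0.72, 0.71, 0.69, 0.70};
        System.out.println("Pearson:  " + pearson(offlineScores, leaderboardScores));
        System.out.println("Spearman: " + spearman(offlineScores, leaderboardScores));
    }
}
```

Pearson's coefficient measures how linear the relationship is, while Spearman's only asks whether the two sets of scores put the models in the same order, which is arguably what matters most when choosing what to submit.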

My general competition strategy has been:

  • Data analysis to learn about the domain and specifics of the datasets
  • Implement a range of off-the-shelf (described in papers) rating systems
  • Systematically tune each rating system in turn, taking each to a logical conclusion on the dataset
  • Apply meta-models (ensemble methods) using contributions from the better performing models

I'm using a hand-rolled Java framework, JUnit for unit testing all the maths I can, Jakarta Commons-Math for basic stats, WEKA for regression models, and OAT for stochastic global optimization methods.

In the past week I have been thinking and reading about ensemble methods, testing various bagging and boosting methods, and preparing some ground work (analysis) for some larger blending experiments. Ultimately, the success of any of these tangents is contingent on the correlation between offline test harness scores and the submission RMSE. I have also done some more advanced modelling of my own test harness and found its relationship to the submission RMSE scores worrying.
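To make the blending idea concrete, here is a minimal sketch of the simplest form: a weighted average of several models' predictions, scored with a plain RMSE. This is an illustration under my own assumptions rather than my actual framework code. The class `BlendSketch`, its methods, and the numbers in `main` are made up, and the competition's exact scoring may differ from the plain RMSE shown here.

```java
/** Sketch only: weighted blending of model predictions, scored by plain RMSE. */
final class BlendSketch {

    /** Root mean squared error between predictions and actual outcomes. */
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double sumSquaredError = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double error = predicted[i] - actual[i];
            sumSquaredError += error * error;
        }
        return Math.sqrt(sumSquaredError / predicted.length);
    }

    /**
     * Weighted average of per-model predictions.
     * predictions[m][i] is model m's prediction for game i; weights are
     * normalised here, so they do not need to sum to one.
     */
    static double[] blend(double[][] predictions, double[] weights) {
        if (predictions.length == 0 || predictions.length != weights.length) {
            throw new IllegalArgumentException("need one weight per model");
        }
        double totalWeight = 0.0;
        for (double w : weights) {
            totalWeight += w;
        }
        double[] blended = new double[predictions[0].length];
        for (int m = 0; m < predictions.length; m++) {
            if (predictions[m].length != blended.length) {
                throw new IllegalArgumentException("models must predict the same games");
            }
            double share = weights[m] / totalWeight;
            for (int i = 0; i < blended.length; i++) {
                blended[i] += share * predictions[m][i];
            }
        }
        return blended;
    }

    public static void main(String[] args) {
        // Placeholder data, NOT real competition predictions or results.
        double[] actual = {1.0, 0.5, 0.0, 1.0};
        double[][] modelPredictions = {
            {0.8, 0.6, 0.3, 0.7},
            {0.6, 0.4, 0.1, 0.9}
        };
        double[] blended = blend(modelPredictions, new double[] {1.0, 1.0});
        System.out.println("Model 1 RMSE: " + rmse(modelPredictions[0], actual));
        System.out.println("Model 2 RMSE: " + rmse(modelPredictions[1], actual));
        System.out.println("Blend RMSE:   " + rmse(blended, actual));
    }
}
```

In a real blending experiment the weights would be fitted on held-out data, which is exactly why the trustworthiness of the offline test harness matters so much.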

Nevertheless, I'm having a lot of fun, and have even started to replicate some of the results of Jeff Sonas's methods contributed in the forum (Jeff is the guy behind the competition). My workbook is becoming so large that it is turning into a website or a dissertation - which is kind of cool. I'm learning more than I ever wanted to know about rating systems, but I'm also refining my methodology for doing systematic data analysis and managing/mitigating the uncertainty of my offline testing.

I'm looking forward to participating in future Kaggle competitions.