The following graph summarizes the RMSE scores of all the submissions I have made to date. The last 10-15 submissions focused on tuning my best-performing method, and the small 4-submission bump just before that final stretch came while I was perfecting the implementation of my current best method.
The competition recently passed the halfway mark and, interestingly, the #1 submission has remained unchanged and unchallenged for nearly a month. There is some effort in the forums to model the relationship between results on various offline test sets and the leaderboard scores. I have participated in this mildly, contributing numbers and calculating Spearman's and Pearson's correlation coefficients.
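As a rough illustration of the kind of check being discussed in the forums, here is a minimal sketch of computing both coefficients for paired offline and leaderboard scores. The function names and the numbers are made up for illustration; they are not my actual submission scores or the code used in the forum.

```python
import math

def pearson(xs, ys):
    """Pearson's r: linear correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical paired scores (illustrative only, not real results).
offline_rmse = [0.712, 0.705, 0.698, 0.701, 0.694]
leaderboard_rmse = [0.731, 0.722, 0.719, 0.716, 0.709]

print("Pearson: ", round(pearson(offline_rmse, leaderboard_rmse), 3))
print("Spearman:", round(spearman(offline_rmse, leaderboard_rmse), 3))
```

Pearson's coefficient measures how linear the relationship is, while Spearman's only asks whether the two sets of scores rank the submissions in the same order, which is arguably the more useful question when choosing what to submit.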
My general competition strategy has been:
- Data analysis to learn about the domain and specifics of the datasets
- Implement a range of off-the-shelf (described in papers) rating systems
- Systematically tune each rating system in turn, taking each to a logical conclusion on the dataset
- Apply meta-models (ensemble methods) using contributions from the better performing models
Over the past week I have been reading and thinking about ensemble methods, testing various bagging and boosting methods, and preparing some groundwork (analysis) for some larger blending experiments. Ultimately, the success of any of these tangents is contingent on the correlation between offline test harness scores and the submission RMSE. I have also done some more advanced modelling of my own test harness and found its relationship to the submission RMSE scores worrying.
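For readers unfamiliar with blending, the simplest form is a weighted average of two models' predictions, with the weight chosen to minimize RMSE on a held-out set. The sketch below assumes exactly that setup; the function names and data are hypothetical and it is not the ensemble approach I am actually testing.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predictions and outcomes."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def blend(preds_a, preds_b, weight):
    """Weighted average: weight on model A, (1 - weight) on model B."""
    return [weight * a + (1 - weight) * b for a, b in zip(preds_a, preds_b)]

def best_weight(preds_a, preds_b, actual, steps=20):
    """Grid-search the blend weight in [0, 1] that minimizes RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(blend(preds_a, preds_b, w), actual))

# Hypothetical held-out game outcomes (1 = win, 0.5 = draw, 0 = loss)
# and predictions from two made-up rating systems.
outcomes = [1.0, 0.5, 0.0, 1.0, 0.5]
model_a = [0.8, 0.6, 0.3, 0.7, 0.4]
model_b = [0.6, 0.5, 0.1, 0.9, 0.6]

w = best_weight(model_a, model_b, outcomes)
print("weight on model A:", w)
print("blended RMSE:", round(rmse(blend(model_a, model_b, w), outcomes), 4))
```

The catch is the one described above: a weight tuned against an offline held-out set only helps on the leaderboard if the offline scores track the submission RMSE.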
Nevertheless, I'm having a lot of fun, and have even started to replicate some of the results of Jeff Sonas's methods contributed in the forum (Jeff is the guy behind the competition). My workbook is becoming so large that it is turning into a website or a dissertation - which is kind of cool. I'm learning more than I ever wanted to know about rating systems, but I'm also refining my methodology for doing systematic data analysis and for managing and mitigating the uncertainty in my offline testing.
I'm looking forward to participating in future Kaggle competitions.
