Tuesday, May 17, 2011

House Price Regression: Vermont South, Melbourne

While looking for a house I maintained statistics on the main suburbs we were visiting, and more specifically, on the houses we looked at.

Each house we visited had about four bedrooms and generally the same kinds of attributes - attributes we thought we wanted in a house. For each house we visited, I recorded the address, size of land in square meters, date of sale, sale type (auction, private), asking price, sale price, and other assorted details. An additional contrived measure was the driving distance to the main shops, in kilometers, as reported by Google Maps.

I also supplemented the dataset with additional matching houses in the area when data was available. I found prices were sometimes available from the auction results, although in other cases I had to call to find out, and/or scour the web.

Rather than let this information go to waste, I thought I would share some of the collected data. This post provides data I collected for the Melbourne suburb of Vermont South.

The following graph simply shows sale price by date, quite boring.

The following graph shows the sale price by land size in square meters.

The following graph shows the sale price by distance to a specific set of shops in kilometers.

I found the data generally useful for plugging in new places and using simple linear regression to help answer questions about expected price at auction or private sale.
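
As a rough illustration of that kind of use, here is a minimal sketch of a one-variable least squares fit in Ruby. The function names (fit_line, predict) and the numbers are made up for this example. They are not the data or the code behind the graphs above, and the real analysis may well have used a spreadsheet or a statistics package instead.

```ruby
# Ordinary least squares fit of y = intercept + slope * x for one predictor.
# Returns [slope, intercept].
def fit_line(xs, ys)
  raise ArgumentError, "need at least two paired points" if xs.size != ys.size || xs.size < 2

  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n

  # Sum of cross-deviations and sum of squared x-deviations.
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = xs.sum { |x| (x - mean_x)**2 }
  raise ArgumentError, "x values must not all be equal" if sxx.zero?

  slope = sxy / sxx
  [slope, mean_y - slope * mean_x]
end

# Expected y for a new x under the fitted line.
def predict(slope, intercept, x)
  intercept + slope * x
end

# Placeholder values for illustration only (NOT the collected data).
land_sizes  = [600.0, 650.0, 700.0, 750.0] # square meters
sale_prices = [600.0, 640.0, 680.0, 720.0] # thousands of dollars

slope, intercept = fit_line(land_sizes, sale_prices)
puts format("slope=%.3f intercept=%.3f", slope, intercept)
puts format("estimate for 720 sqm: %.1f", predict(slope, intercept, 720.0))
```

Plugging in a new place then amounts to calling predict with its land size (or its distance to the shops, with a line fitted on that column instead).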

Some of this data may be available for purchase from various retail data providers, but I found collecting and entering the data myself made it a lot more personal and gave me some additional focus when inspecting properties and talking to agents about trends.

Monday, May 16, 2011

So, we bought a house

So we finally bought a house. We've been looking on and off for about 12 months although things got serious about 3 months ago.

We first looked at the place we bought last Saturday, and walking in the door I knew it was a strong contender. We looked at three other places that day, and they all paled in comparison.

The place had been passed in at auction nearly three months before, and we were told that initially the vendor's expectations were too high. We saw this as a good opportunity to negotiate and try to broker a good deal. The market had been in a slump for a few months and the early figures for the quarter had indicated a ~2.5% drop in median house price for the city.

We did another inspection on the following Tuesday and enlisted all of the troops (extended family) to give the place a good once over. We then sat down and signed a formal offer. It was rejected. We upped the offer by $5K and it was accepted, although at the insistence of my wife we made the offer contingent on the outcome of a builder's inspection.

I found a company in the yellow pages and had the inspection done on the last day of the 3-day cooling-off period. The building and pest report was incredibly detailed, providing photos and a room-by-room summary, inside and out. We learned a lot about the types of preventative maintenance the place will need over the next 5-10 years, and more importantly, we learned that the upstairs balcony had some major structural problems.

The report said that the wood used was popular in the decade that the balcony was built and was known to rot unless properly treated. Rather than expecting the vendor to return the balcony to new condition, we made an offer to split the difference, deducting half of the cost of the repair from our offer.

All of the negotiating occurred on the last day of the cooling-off period, a Friday. I had my wife on one hand, adamant that she didn't want to pay a thing to have the balcony fixed, and the agent on the other hand, threatening to open the property for inspection the next day. I really liked the place and I was feeling totally strung out (to say the least).

We managed to broker a deal in the end and initialled the final amendment on the Saturday, one week from our first inspection. With previous auctions and negotiations I had tried to remain emotionless: time was on our side and we could wait for a deal. This time I really liked the place, and it was beginning to dawn on me that our remaining time to find a place (before the baby came) had shrunk to a matter of a few months. We're both happy we finally got there and have high hopes for turning the property into our home.

We learned a lot throughout the process. My analyses of median house prices, suburb selection, crime rates, and even travel time studies from months ago were interesting, although in the end they did not directly affect the outcome. Even the detailed suburb house price regressions I was building up were not used, as we ended up buying in a completely different suburb, inspecting the house on a whim.

I was told early on that buying a home is different from buying an investment, and it bit me in the end, because it's emotional. If/when there is a next time, at least our expectations - that it is a long, hard emotional roller coaster - will mean we're better prepared. Hopefully.

Rather than letting them go to waste, I'll post some regression analysis for a selected suburb soon.

Tuesday, May 3, 2011

Quake AI Programming Book

I intend to write a follow-up book to the Nature-Inspired Clever Algorithms book on Machine Learning. I have a lot going on this year, so I was thinking of postponing it until 2012. If I do decide to go down this road, I was thinking of taking on a different project in 2011 that would be smaller in scope, less taxing, although still interesting and rewarding.

I have been thinking of writing a book about the AI in the Quake series of computer games. I was thinking of either writing a book that analysed the Artificial Intelligence architecture in each game in the series, or one that analysed the AI in the bot modifications. Perhaps both. The book would walk through monster or bot case studies and describe how they fit together, think, and behave, perhaps with small experiments and demonstrations along the way. The kind of book that would have captured me as a game programming hacker 15 years ago.

In pondering this idea, I thought it prudent to explore other books written on or related to this idea. The following is a list of books that I found:

Quake Series Programming Books

Related Programming Books
These are by no means the cream of the crop of game AI programming, and there are in fact many level design books in there as well.

All of these books are focused on teaching some form of programming or game development using an existing game as a medium. The advantage of the Quake series is that the source code is released under the GPL. The Unreal series and the Half-Life (Source Engine) series are not released as open source, although they do provide access to some aspects of the source under restricted licence for the modding community.

It is clear that there is interest/demand for books on game development based on the Unreal series, which makes a lot of sense given their general success in licencing the technology.

Some concerns about tackling such a project include:
  • Interest: The games in the Quake series are old (10-15 years). The methods may be outdated, they may not be relevant to modern computer games, and it is more than likely that no one will care.
  • Low Barrier: It is more than likely that no one has undertaken such a project because the barrier is so low. One can simply read the code and understand what is happening, no analysis is necessary. 
  • Copyright: Although the source code is released under the GPL, the game assets are not. One may have to acquire a licensed copy of the game to do any meaningful development. Additionally, my use of game screenshots may be restricted (fair use!?).
There is some effort required to produce such a work. Getting each project set up may be involved, especially across the three main platforms (Windows, Mac, Linux). The work would be primarily analysis: reading source code, experimenting, and communicating what is happening with diagrams and descriptions. This tinker-write cycle is slightly more relaxed than the deep research needed for each algorithm in a machine learning text.

Is there interest in the market? Would you read or skim such a book?
Let me know what you think in a comment or email.

Monday, May 2, 2011

The Little Taxonomist

We are expecting our first child in a bunch of months and I have been thinking about all kinds of science experiments to perform on/with the bub. I had an idea last week for what I think is a cool little web app that allows a parent and their child to catalogue the native species around their home and learn more about their local environment. I am referring to this idea as "Little Taxonomist".

I'll present the idea in the context of some stories:

Story #1
A father and son are curious about the plants and animals in and around their house and neighbourhood. In an effort to learn more, they decide to begin to catalogue the things they see in their backyard. They select a subject (say a flower, tree, or insect), photograph it, and note down a few descriptive phrases. They enter this information into a web application. The web application accepts the image and structured description and makes informed guesses (based on location, time of year, and subjects collected by others) as to the exact species of the subject. The web application also suggests interesting subjects that are known or expected to exist in their area (probabilistically, based on the entries of other users in the area), and some information about where they might be found, creating a context-sensitive scavenger hunt. Slowly, a subject every few days, more on weekends, over weeks, they build up a catalogue of plants and animals in and around their home. Together they have learned more about the specifics of the suburban flora and fauna.

Story #2
A primary school class have a week or month long assignment that is a scavenger hunt of genera and/or species in the school ground. The class is split into pairs or small groups and allocated a flip camera (or equivalent single button point-and-shoot). They have a list of hints or requirements as well as blank pages, structured to capture basic taxonomic descriptions. Students return to the classroom and use the computer to copy the photos from the camera, drag them into the web application and add their descriptions. Students are awarded badges and points for the breadth and depth of species described, teams are ranked, and some indication of what other groups are finding is provided.

Vision
A web application for children to be completed in groups or with a guardian. The objective is to describe subjects in the local area (home or neighbourhood) and in so doing learn more about the local flora and fauna. The system encourages the collection of species by intelligently guessing the actual known species from brief descriptions and photos. A gamification layer is provided that includes badges, points, leaderboards, and similar extrinsic motivators. Additionally, the system uses the localized information in aggregate to suggest subjects to look for (to "collect"), and a probabilistic expectation that they can be spotted (you have a 90% chance of seeing a fruit bat between 5pm and 8pm by looking up). This probabilistic understanding of what can be seen in the local area would be coupled with the gamification system, highlighting rare finds. The system would provide all data in aggregate (anonymized), allowing kids to explore what others are finding in their neighbourhood and the types of descriptions being used.
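
One naive way to get the "chance of seeing it" number is to count, among outings logged in a time window, the share on which the species was recorded. The sketch below assumes a made-up record shape (an hour of day plus a list of species names) and a hypothetical function name, sighting_chance. Nothing here is a design for the real app, and a proper version would also need to account for location and season.

```ruby
# Estimate the chance of spotting a species in an hour window as the share of
# logged outings in that window on which it was recorded.
# Each outing is a hash: { hour: Integer (0-23), species: [String, ...] }.
def sighting_chance(outings, species, from_hour, to_hour)
  in_window = outings.select { |o| o[:hour] >= from_hour && o[:hour] < to_hour }
  return nil if in_window.empty? # no data for this window, so no estimate

  hits = in_window.count { |o| o[:species].include?(species) }
  hits.to_f / in_window.size
end

# Made-up outings for illustration only.
outings = [
  { hour: 17, species: ["fruit bat", "magpie"] },
  { hour: 18, species: ["fruit bat"] },
  { hour: 19, species: ["possum"] },
  { hour: 19, species: ["fruit bat", "possum"] },
  { hour: 9,  species: ["magpie"] }
]

chance = sighting_chance(outings, "fruit bat", 17, 20)
puts format("fruit bat, 5pm-8pm: %.0f%%", chance * 100)
```

Species with a low share across the neighbourhood are the natural candidates for "rare find" badges.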

The following are some mock screens I hacked together in Google Docs:

Mock: Add Subject

Mock: List of Subjects


Forget kids, I want to use this. There might be a general case for adults with smart phones.

I am not sure whether I will build it yet. I figure it needs a 5-10 year old child to make it fun. I figure it could make money by selling some cheap cameras with a website subscription, or maybe through targeted advertising (kids+science).

A friend pointed me to Project Noah, which is a similar idea, but not the same.

I'm eager to hear what people think. Have you seen anything like this? Would you use it yourself or with a child?

Sunday, May 1, 2011

May Challenge: Touch-type Faster!

We learned touch-typing in high-school. I think we did it for two or three years. Nevertheless, I suck at touch-typing, and as a programmer it may be considered embarrassing.

I can type without looking at the keyboard, so technically, I touch-type all day long. What I don't do is type as I was taught was "correct". I think that as a consequence I am starting to get some wrist pain.

Anyway, in the constant battlefield of self-improvement, I thought I would take the month of May as an opportunity to improve my touch typing and hopefully start typing "correctly" from the end of May onward in day-to-day computer interaction.

The May Challenge is to perform one lesson in touch-typing each day and to measure a standard typing word count each day. Hopefully this word count and/or accuracy will improve by the end of the month and my confidence in correctly touch typing will also improve.

At this stage, I believe I will use the free online service www.typingweb.com because it provides lessons and word speed tests, and of course it's free. I just took a test using the "correct" home-row based method and scored: 18 WPM average, 23 WPM gross, and 96% accuracy. It would be interesting to see what my score would be with my unorthodox method - a bad idea, as I suspect it may reinforce the bad method.

Again, as with April, I will adopt a penalty-based approach to the challenge. For each day that I miss, I will have to donate $20 to an open source project of choice (not a .NET project as with last month). This is less relaxed (I support open source, generally), because I am concerned that the commitment required may mean that I miss a day or two here and there. We'll see. $20 a day is still a decent disincentive.

I'll be sure to summarize progress at the end of the month.

Image copyright Wouter Verhelst.

Saturday, April 30, 2011

April Challenge Over: Watch One Tech Video Every Day

I set myself a challenge for April to watch one technical video every day throughout the month. My rationale was that if I could apply the same discipline that I use to discriminate what I eat to what media I consume, it would have a beneficial effect. Sure, fluffy, but the challenge was measurable and the penalty for missing one day was a donation of $20 to a .NET open source project.

The month has come to an end and I did manage to watch one technical video each day, so no donations needed.

I intended to spend most of the month watching university lectures and Google Tech Talks, which I mostly did. It became harder and harder towards the end of the month to find an hour+ to watch a tech talk. I ended up catching a quick (15 minute) TED talk instead. TED talks are good (some can be great), but they are so brief that my retention is poor, and so high-level that I finish thinking that I have not learned very much at all.

The following image provides a breakdown of the sources for videos I watched. The "other" category includes random tech videos on youtube that do not fit into one of the other broader and popular categories.

I enjoyed almost all videos. I ranked each with a real-valued scoring between 0 and 5 and provided my own description in a spreadsheet. Five real highlight videos (in no particular order) were as follows:
I found that I read a lot fewer RSS feeds in my Google Reader. I also found that I consumed a lot less 'acquired' media in the form of US TV shows (selling the family media player helped here, no doubt).

I had a good time with this challenge and will attempt to stick to it and record my progress in a spreadsheet.

Friday, April 29, 2011

AIFeeds Part 4: Automating an AIFeed aggregator

In Part1, we prepared a large list of RSS feeds and filtered them down to something workable. In Part2, we processed all of the articles in the feeds and presented posts from the last five days as a static RSS reader. In Part3 we used a number of JSON APIs for social networking websites to gauge the popularity of articles and highlight those popular posts at the top of the page on our static reader.

In this 4th and final post in the series we will explore different ways to disseminate the results of our filtering.

Step 1: Send via Email
Last time in Part3 we ended up with a result that was quite passable. The scripts generated an HTML page that promoted popular AI, Data Mining, Machine Learning, etc. articles from the last 7 days, followed by a listing, organized by day, of all the other articles that, though deemed less popular, may still be of interest. A simple approach to disseminating the results of this script is to send an email. The objective here is to receive the equivalent of an AI-themed version of the most excellent Hacker Newsletter.

I have a Google Gmail account and I assume most programmers do. The first step is to prepare a script that can generate an HTML email message and use the Gmail SMTP server to send the email. We are not focused on mass distribution here, just emailing the results to ourselves, at the moment, on demand.

The built-in SMTP handling in the Ruby standard library more than meets our needs here. The Gmail SMTP details are also easily obtained. The result is a script with two simple functions: the first for building a standard SMTP message with support for text/html content (mimetype), and the second for connecting to the Gmail SMTP server and posting the email. The script provides a spot test that will ask for your Gmail credentials and use them to send you a hello world email. Easy as pie.
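
For readers who want the shape of those two functions without opening the file, here is a minimal sketch. It is not the contents of sendemail.rb: the function names are my own placeholders, and the host and port (smtp.gmail.com, 587 with STARTTLS) are Gmail's commonly documented settings rather than values taken from the script. The interactive spot test is left out and shown only as commented usage, since running it sends a real email.

```ruby
require "net/smtp" # stdlib; ships as a bundled gem from Ruby 3.1 onward
require "time"     # adds Time#rfc2822

# Build a raw SMTP message with a text/html body.
def build_html_message(from, to, subject, html)
  headers = [
    "From: #{from}",
    "To: #{to}",
    "Subject: #{subject}",
    "Date: #{Time.now.rfc2822}",
    "MIME-Version: 1.0",
    "Content-Type: text/html; charset=UTF-8"
  ]
  # Headers and body are separated by one blank line.
  headers.join("\r\n") + "\r\n\r\n" + html
end

# Connect to the Gmail SMTP server and post the message to ourselves.
# Assumed settings: smtp.gmail.com on port 587, upgraded with STARTTLS.
def send_via_gmail(address, password, message)
  smtp = Net::SMTP.new("smtp.gmail.com", 587)
  smtp.enable_starttls
  smtp.start("localhost", address, password, :login) do |session|
    session.send_message(message, address, address)
  end
end

# Example use (commented out because it sends a real email):
# msg = build_html_message(address, address, "Hello", "<h1>Hello World</h1>")
# send_via_gmail(address, password, msg)
```

Keeping message construction separate from delivery means the HTML can be checked on its own before any credentials are involved.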

See sendemail.rb
See below for a screenshot of my Gmail inbox with the resulting test email.

Step 2: Cron Send Email
The next step is to prepare a script that can be executed each day, generate a summary of the AIFeed output and email it to you. The easiest way to do this on any Linux or Mac machine is with cron.

A variation of the listpopulardayarticles.rb script from Part3 is used as the basis for the email. A new function is defined that generates the html content of the email and sends it. The script accepts two parameters on the command line: a gmail email address and a gmail password. These credentials are then used to send the email using the script prepared above.
See dailyfeed.rb
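
The HTML-building half of that function might look something like the following sketch. It assumes each article is a hash with :title and :url keys, which is a hypothetical shape chosen for the example and not necessarily what dailyfeed.rb uses. The function name articles_to_html is likewise made up.

```ruby
require "cgi" # for CGI.escapeHTML

# Render a list of articles as an HTML fragment for the email body.
# Each article is assumed to be a hash with :title and :url keys.
def articles_to_html(articles, heading)
  items = articles.map do |a|
    # Escape both values so odd characters in feeds cannot break the markup.
    %(<li><a href="#{CGI.escapeHTML(a[:url])}">#{CGI.escapeHTML(a[:title])}</a></li>)
  end
  "<h2>#{CGI.escapeHTML(heading)}</h2>\n<ul>\n#{items.join("\n")}\n</ul>"
end

# Placeholder articles for illustration only.
sample = [
  { title: "Example post one", url: "http://example.com/one" },
  { title: "Example post two", url: "http://example.com/two" }
]
puts articles_to_html(sample, "Popular this week")
```

The resulting string is what gets handed to the email-sending code as the message body, with the address and password read from the two command-line parameters.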
To execute the script we can create a shell script that contains the call to the script and the Gmail credentials used to send the email. For example, the shell script may be called run_dailyfeed.sh and look as follows:

#!/bin/sh
cd /path/AIFeeds/part4/
ruby dailyfeed.rb [email] [password] >> /path/AIFeeds/part4/dailyfeed.log 2>&1

The script is three lines: a shebang, a change of directory to the script location, and a call to the Ruby script with its parameters. Replace /path/ with the path to your AIFeed directory, and replace [email] and [password] with your Gmail login details. The output of the script (and any errors, via 2>&1) is appended to the log file dailyfeed.log, which is created on the first run.

The crontab for the current user can be opened as follows:

crontab -e

Add an entry that looks something like the following:

00    5    *    *    *    /path/AIFeeds/part4/run_dailyfeed.sh

This is all one line with tabs in between the fields. Again replace /path/ with the path to your AIFeed directory. Cron will execute the shell script once each day at 5am local time.

The following is an example of a resulting email sent to my email account.

Improvements and Extensions
This section summarizes possible improvements and extensions to this part in the series.
  • Cron is an easy way to schedule a task on your machine. A better approach would be to set this up on a server (such as Heroku, AppEngine, or AWS) and send an email to an email list (via something like MailChimp).
  • An interesting extension to this project would be to turn the output into a webpage that is re-generated every hour or so. This might provide a useful diversion to reddit and hacker news, with a targeted corpus of links to scan over.
In this fourth and final part in the series we have hacked together a simple script to send email via Google Gmail and scheduled the script to email ourselves a list of popular AI articles each morning at 5am. Not bad for a few hours' work. Sure, there are some rough edges, but the result is entirely functional and, I think, useful.

If you would like to see this as a service or perhaps a website, drop me a comment or an email. I'd be happy to clean it up further and automate it for a broader crowd if I knew that others as passionate as me about AI and Machine Learning were interested!

Don't forget all code and data for this series is available on the AIFeeds github project.