Monday, June 23, 2008

Sprucing the OAT Software Page

One of the many projects I maintain is the open source Optimization Algorithm Toolkit (OAT). I developed this codebase over the course of my PhD (2005-2008) in the field of computational intelligence (metaheuristics), and as such the focus of the problems and algorithms codified in the project is biologically inspired optimization.

I spent today sprucing up the software page for the project. I created a new PHP version of the site, transplanted the content from the old static HTML page to the new version, and made the landing page attractive (in the 37signals sense).

The following screen shots provide a rough comparison, old on the left, new on the right:

I am definitely not a master of the interface, although I'm satisfied with the upgrade. While putting the site together, I got to thinking about how I could improve the OAT project. I have a soft spot for the library given my huge time investment and my reliance on the software for my empirical-result focused dissertation.

By far the biggest improvement I can think of is documentation, specifically tutorials for using and extending the platform. As such, I intend to spend spare time over the coming months writing algorithm-centric and problem-centric tutorials for OAT in an effort to improve the usability and raise the profile of the project.

Monday, June 16, 2008

Re-emerging Online Life

My PhD is wrapping up and over the last few weeks I have been slowly re-integrating myself into a social life. A big part of this has involved moving work flows and processes, typically relegated to my work PC, online. I am really happy with the current state of this migration, so I thought I would capture exactly what has been migrated and/or reawakened, and my thoughts.

  • Email and Calendar: I have a gmail account, and have started forwarding all email to this single account. Previously I used outlook on my locked-down work machine, which retrieved email from my work account and gmail, as well as managing my calendar. The centralised web-based email has been great: I check it less often, have desktop notification, can access all accounts easily from home, and most importantly google is catching all the spam I used to have to delete manually on my work account. The calendar is also extremely convenient, specifically home access, email notifications, and integration of third party sources.
  • Instant Messaging: I excluded myself from IM whilst studying for the PhD. I have re-entered the arena, firstly with google talk which was functional and archived my messages in gmail (nice). I moved to Trillian for a while to get cross-platform capability (MSN and Jabber), and am now with Digsby. Digsby is by far the best solution (so far), integrating all platforms, as well as twitter, facebook, myspace, and handy notifications. I still use Adium on my mac at home (no Mac Digsby yet), and MSN support in Digsby does not work through my work proxy (although Digsby tech support assure me that this feature will be pushed out in the next release).
  • Sharehouse Management: I have lived in a sharehouse with three others since the start of my degree. I've always managed house finances and used a spreadsheet, firstly on my desktop, and later on google docs. I recently signed the house up with a free account on mysharehouse.com.au. So far, this webapp has been awesome. It facilitates easy bill, expense, and shared task management and automatically emails/SMSes members when it's their turn or when they owe money. They require SMS credits to be purchased and have an occasional ad, although it is not clear how the site makes money. I'd also really love it if I could integrate the sharehouse calendar with my google calendar.
  • Project Management: I am looking to collaborate with a team remotely, and have been seeking a lightweight project management web application. We went with basecamp, and so far at this early stage of collaborating on ideas and schedules it is meeting our needs. It's a bare bones tool, and it seems that it will meet our future needs nicely. Thankfully, this app does facilitate integration of the calendar into google calendar. Previously project management and brainstorming have been stuck in email and shared google docs, both poor excuses for this task.
There are still many tasks which are tied to my work PC that could be moved to my laptop, although I'd prefer to move them into the cloud. Some items in this list may be a little idealistic, which is good as they may motivate some budding innovator.
  • Note Taking: I still take notes in word documents. I have tried google notebook, although I didn't really like the feel or flow (I could try again). I also use pen and paper for daily to-do's. I like the tactile acts of writing and crossing off, although this too could be moved to one of the many to-do list solutions.
  • Music Management: My MP3's are all still stored on an external HDD. Further, I use winamp with a plug-in to manage music on my iPod, a device I use daily for podcasts. Collectively this solution sucks, and although winamp is faster than iTunes, it is really ugly and clunky. I want my music in the cloud, and I want to listen to it through a browser. I also want to manage my iPod and podcasts through a browser from any PC (iTunes online? podcast manager online?).
  • Paper Archive: Over the last 4-5 years I have amassed a huge multi-gigabyte research paper archive. I basically store every paper (PDF/TXT/Word) I obtain (and sometimes read) either through the web or via correspondence (see my report on PhD practices for more information about this archive). The archive organisation is pretty haphazard, although I have found it invaluable for jump starting basic research. An ideal solution would be to automatically source bibliographic information of each paper and archive them as a list on citeULike or equivalent.
  • Backups: I still archive my local files using winzip and/or winrar and store them on network drives, locally, and on external media (CD/DVD/HDD). I need to get around to backing up to the cloud. I know there is an emerging industry for this backed by services like AWS.
The state of software development, whether it is for research, a side project, or for a larger team project, has been solved for a while with the use of off-site source control. I'm happy with this aspect of my desktop work flow, and use services such as sourceforge (and more recently google code) for open source projects and potentially many subscription-based services for private work. I also like the emerging trend of moving IDEs online as well, such as in Heroku and others.

That is about the extent of the big-ticket items. I'm sure there are many more less frequent tasks and work flows tied to my work desktop that I have left out.

Thursday, June 12, 2008

Lightweight Project Management WebApps

Some colleagues and I are considering working on some projects together. We are in the same city, although expect to collaborate on the projects mostly remotely. As such we need a lightweight and cheap web-based project management application. No doubt, a classical situation and a classical manifest need.

We don't need many features other than cross & per-project collaboration on discussion, tasks, time sheets, calendar, and milestones (maybe some more?). At this stage there are a handful of us, and we have an expectation of at least four core projects for at least 4 months, minor needs by any standards.

Email is the first-pass solution, providing persistence and threaded conversations. It sucks because the structure of conversations is so limited. The natural next step would be to employ a host of free services, likely Google docs+calendar (or equivalent) where spreadsheets are used for time management, tickets, to-do's, etc, and core information is persisted in documents. This is obviously a better solution given the variation in granularity and structure, and it's free.

I'd rather have a specialised and integrated solution than a hodgepodge of online ad hoc documents, so the next natural step is a specialised solution. The first batch of lightweight project management web applications our team came up with based on experience/exposure were as follows (priced for our needs):

  • Basecamp: ($24 USD/M) Slick, simple, and lots of press/fanboys.
  • Goplan: ($10 USD/M) Really simple, basic features are all there, not as pretty as some other options.
  • Harvest: ($40 USD/M) Looks good, same old set of features + invoicing
  • Zoho: ($8 USD/M) Feels more like a forum with additional features, still good
  • Liquid Planner: ($70 USD/M) Really slick, heavy, feels more like a desktop application
  • FogBugz: ($100 USD/M) (on demand) Project management with a software engineer focus
In terms of features, they all provide a reasonable fit, some heavier than others. I like the basecamp solution (maybe even backpack), although I suspect only because I'm interested in assessing it against its press. I also like FogBugz, but not their on demand pricing.

While checking out each solution, it was clear that a "demo" account is the best way IMHO to promote your application. Sign-ups with a trial may be better for customer retention (my assumption), although for drive-by assessment, demo accounts (provided by goplan and zoho) were really useful. I also found very few 'round-ups' for this class of application (see this ask YC). I'm sure they're out there, so where are they?

The final step would be to take on the responsibility of the application ourselves. The best course in this case would be to select an open source project or buy an application and take on the hosting concerns (maybe AWS EC2). I really don't like this final case. Although it has the potential to be cheap, I have been burned by this approach of self-managed webapps in the past (environments, patching, and eventual disaster).

Tuesday, June 10, 2008

LibraryThing: A WebApp Inspiration

I remember reading about and signing up to LibraryThing sometime in 2006. The site was simple, functional, and I liked it. At the time I equated it to del.icio.us for books (it is typically referenced as the flickr for books), taking on a specific task for which I generally used Amazon and wish lists. With my recent thoughts on building web apps, I had some notions about books and recommendations, and came back to the site.

LibraryThing is a virtual bookshelf management application, with natural social extensions for organisation, discussion, and recommendation. Basically, if you read and have a modest personal library, then it is definitely an application that you want to try, and most likely subscribe to.

As a case study web application, LibraryThing is cool (interesting) for a number of reasons:

  • It was hacked together by a lone developer in a month (circa 2005)
  • It was hacked together to 'scratch an itch'
  • It was designed to make money from day one
  • It is narrow
  • It is successful
Check out an interview with Tim Spalding (creator), the Wikipedia entry and generic TechCrunch, CNN Money, and NYT.

Success undoubtedly came from the real need (for both the creator and users) that the narrow application fills. The fact that the inception and creation of the application are attributed to a lone web developer is inspiring. The point that I really latched onto was the smarts of Spalding to withhold some features and promote a simple subscription service.

Unlike the unbounded and useful bookmarking service delicious, the LibraryThing service imposes a simple limit on the number of books and displays ads, both of which are removed when the user buys a yearly or lifetime subscription. This is important, firstly because without this simple mechanism the site would likely have gone under (how did delicious survive? VC?), and secondly because people want to pay for a good service. As such, LibraryThing clearly is the flickr for books as it uses the same freemium model that flickr uses on top of a great product.

LibraryThing, more than flickr, is a strong inspiration as a subscription-based web application for me. It's the kind of narrow problem I would like to solve, it is the model I would like to use to pay my rent, and it can be (was!) achieved by 'a guy like me'.

Wednesday, June 4, 2008

Value of Ideas

As a developer/software engineer/hacker you feel powerful with the knowledge that given time you can build anything you can think up. Regardless of the extent of truth in this belief, you take pride in learning and finding new ways for translating thought into systems. As such, you think that your only limiting factor is ideas, and hence you begin to value them, guarding your own and potentially borrowing others. Well, that's my theory...

I have considered this topic before in the context of ideas in academic research, although it naturally applies to business. When you start reading about the retrospective stories from entrepreneurs, the common theme is that the initial premise is irrelevant, that directions change, ideas shift and rapidly iterate. You are quickly educated that ideas alone are worthless, that there is no market for them, and worse still smart people can have really bad ones.

The lesson obviously is that just thinking stuff up and talking about it doesn't get you very far, that the real ideas occur and/or the real value is in the instantiation of ideas. The lesson for the self-believing all-powerful hyper-productive hacker may be to keep building things and continually refine an understanding of problem identification and solving.

About 18 months ago I brainstormed an idea for an ideas website with a mate. It was pretty lame, although it stuck with me because in the back of my brain I still value ideas. Anyway, I was pondering these things, and came across some interesting things. I learned about the general notion of an idea bank and ideation, and a host of public idea bank-style websites. Two standouts include halfbakery (see Wikipedia), and Cambrian House (see RWW). Also see this great list of idea bank based websites.

Halfbakery has a nice minimal design, clean per-idea layout, and lots of ideas. Cambrian House are more commercial, and make a business from funding good ideas. Reading over the ideas on these sites and those like them highlights firstly some really interesting and innovative thinking, although more generally it reinforces the premise that ideas alone lack value.

More Lessons Regarding Crowdsourcing

Given a deeper consideration of my trial crowdsourcing projects, I did a little more research on the topic and came across a post from the start of 2007 titled The "Dumbness of Crowds" by Kathy Sierra. The critical take-away for me was the need to differentiate consensus towards convergence from the collection of discrete and potentially useful information packets.

This point was driven home by a series of simple although poignant examples differentiating wisdom/dumbness, which I want to paraphrase so they stick:

  • Collection of book reviews at Amazon / Wiki-based collaboration on a book
  • Commented and tagged photos on Flickr / Collaborative editing of a photo
  • Input and ideas from varied perspectives / blindly averaging multiple inputs
  • Designing, voting and commenting on shirts at Threadless / community designed shirt
The premise is that individual contributions should be captured and preserved, not aggregated, converged or averaged towards consensus. This point is reiterated again and again in the post: "Art isn't made by committee", "Great design isn't made by consensus", "True wisdom isn't captured from a crowd".

The second point is that discrete contributions and convergence are two techniques used to address different problems. Crowd-based consensus may or may not be a good way of achieving a converged solution, but aggregation of contributions towards a converged solution is simply different to constructing a database of discrete human contributions.

Pigment clearly captures and maintains individual user contributions in terms of named colors, whereas the humanTSPsolver does not, aggregating used edges into an averaged incidence adjacency list. Pigment provides a system for building a human-powered database of discrete user contributions, whereas the humanTSPsolver seeks a crowdsourced converged solution.
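To make the distinction concrete, here is a minimal sketch of what aggregating tours into edge incidences might look like. It is not the humanTSPsolver code; the function names, the tour representation (a list of city ids), and the sample data are all assumptions for illustration.

```python
from collections import Counter

def tour_edges(tour):
    """Yield the undirected edges of a closed tour as sorted city pairs."""
    for i, city in enumerate(tour):
        nxt = tour[(i + 1) % len(tour)]
        yield (min(city, nxt), max(city, nxt))

def aggregate_edges(tours):
    """Collapse all tours into edge -> fraction of tours that used the edge."""
    counts = Counter()
    for tour in tours:
        counts.update(set(tour_edges(tour)))
    return {edge: n / len(tours) for edge, n in counts.items()}

# Hypothetical contributions: three users each submit a tour of four cities.
tours = [[0, 1, 2, 3], [0, 2, 1, 3], [0, 1, 2, 3]]
edge_freq = aggregate_edges(tours)
print(edge_freq)
```

Once the tours are collapsed this way, the individual contributions cannot be recovered from the edge frequencies, which is exactly the aggregation the post warns about. Keeping the raw list of tours alongside the aggregate would preserve the discrete contributions.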

This realisation (mistake?) suggests that it may be possible to phrase the TSP solving problem in such a way that the collection of discrete contributions from users can be used to solve instances. The current implementation jumps this question by assuming that an aggregate representation of contributions is the answer.

Tuesday, June 3, 2008

Human Powered Databases: Some Lessons Learned

A post on O'Reilly Radar today prompted me to think hard about databases of discrete human contributions. The post by Tim O'Reilly made connections between ubiquitous computing and web 2.0. It was not the general topic that got me thinking, but rather the concise reiteration of the attributes of collective intelligence.

Tim clarified the breakthroughs of web 2.0 as the added meaning to existing data via algorithms, statistics, and meta-information rather than the addition of new data. He touched on the different ways of creating databases (classical Cornucopia of the Commons), focused on the shared-collective approach, and the methods of an architecture of participation where contributions are implicit and driven by the design of the system.

I have visited these concerns recently, and their discussion has motivated recent web application projects (humanTSPsolver and Pigment). Although these applications are small and exercises in learning specific frameworks and technologies, two things struck me:

  1. Both applications required explicit user contribution.
  2. Both applications focus on data collection only.
Importantly, both applications clearly demonstrate that construction of a human-powered database, even on esoteric topics such as the travelling salesman problem and color names, is easy to do, although it represents only a primitive first step towards an application imbued with collective intelligence.

Considering the first limitation, implicit user contributions were considered although the explicit contribution mechanism was constructed first, and thus remains. This highlights that if such automatic user contribution mechanisms are desired, they must be designed and implemented first, sidelining the easier explicit contribution mechanisms.

An excellent automatic contribution mechanism for the humanTSPsolver application is games. Therefore, the web site should have been designed around the notion of small addictive games, user scores, and generally user experience. The aggregate contributions and any derived scientific value should have been relegated to a small corner of the site. A good automatic contribution mechanism for the Pigment colour naming site would be to allow users to define and prepare colour profiles, perhaps for their own websites. Tagging of prepared colours would provide the colour name contributions. Again, this is a complete shift in the focus of the website.

I initially believed that the second limitation was an artifact of the early stage in development for both projects. I now think that claims of "it is unclear what use such data will have until after we collect it" are bogus. I think that the full extent of insights and implications are unknown a priori (naturally), but to not think about and build first-pass tools for harnessing the data in aggregate is simply lazy.

Early experiments for the humanTSPsolver showed that feasible and complete tours can be constructed from aggregate contributions, and that such information can be used to seed probabilistic methods. The current site does not provide such primitive capability, and rather focuses on a simple (although pretty) visualisation of contribution data. Now, after many thousands of explicit user contributions have been made, and relevant background research has been considered, there is scope for more interesting applications, such as testing hypotheses about integrating multiple convex hull based sub-tours. An awesome research starting point for someone starting an Honors or Masters project.
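As an illustration of how a complete tour could be built from aggregate contributions, the sketch below uses a greedy edge heuristic: rank city pairs by how often users chose them, then add pairs that keep the partial solution a set of paths. This is one plausible technique, not necessarily the one used in those early experiments; `greedy_tour` and the `edge_freq` mapping (sorted city pair to usage fraction) are hypothetical names.

```python
from itertools import combinations

def greedy_tour(n_cities, edge_freq):
    """Build a closed tour over cities 0..n-1, preferring frequently used edges."""
    if n_cities < 3:
        return list(range(n_cities))
    parent = list(range(n_cities))

    def find(x):
        # Union-find root lookup, used to reject edges that would close a cycle early.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Rank every possible pair; pairs nobody used get frequency 0.
    pairs = sorted(combinations(range(n_cities), 2),
                   key=lambda e: -edge_freq.get(e, 0.0))
    neighbours = {c: [] for c in range(n_cities)}
    chosen = 0
    for a, b in pairs:
        if chosen == n_cities - 1:
            break  # we have a path through every city
        if len(neighbours[a]) >= 2 or len(neighbours[b]) >= 2:
            continue  # a city can have at most two tour neighbours
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # would form a premature cycle
        parent[root_a] = root_b
        neighbours[a].append(b)
        neighbours[b].append(a)
        chosen += 1

    # Walk the path from one endpoint; the tour closes back to the start.
    start = next(c for c in range(n_cities) if len(neighbours[c]) == 1)
    tour, prev = [start], None
    while len(tour) < n_cities:
        cur = tour[-1]
        nxt = next(c for c in neighbours[cur] if c != prev)
        tour.append(nxt)
        prev = cur
    return tour

# Hypothetical aggregate for four cities.
edge_freq = {(0, 1): 0.67, (1, 2): 1.0, (2, 3): 0.67,
             (0, 3): 1.0, (0, 2): 0.33, (1, 3): 0.33}
print(greedy_tour(4, edge_freq))
```

The resulting tour is always feasible because unused pairs are still available with frequency zero. It could then serve as a seed for a probabilistic method, for example as a starting solution for a local search.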

The Pigment application does provide primitive first-pass tools for exploiting the contributions in aggregate in the form of searches for mapping human colour names to computer values, and computer colour values to human names. Naturally, these services should be promoted as webservices, and demonstration applications provided.
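The two search directions can be sketched as follows. This is not Pigment's implementation: the contribution format (a name paired with an RGB tuple), the function names, and the use of squared RGB distance for "nearest" are all assumptions made for the example.

```python
def colours_for_name(contributions, name):
    """Return every RGB value users have attached to a colour name."""
    wanted = name.strip().lower()
    return [rgb for n, rgb in contributions if n.strip().lower() == wanted]

def names_for_colour(contributions, rgb, limit=3):
    """Return distinct names of the contributions nearest to rgb."""
    def dist(other):
        # Squared Euclidean distance in RGB space (a simplifying assumption).
        return sum((a - b) ** 2 for a, b in zip(rgb, other))

    names = []
    for n, _ in sorted(contributions, key=lambda c: dist(c[1])):
        key = n.strip().lower()
        if key not in names:
            names.append(key)
        if len(names) == limit:
            break
    return names

# Hypothetical discrete user contributions, kept individually rather than averaged.
contributions = [
    ("Sky Blue", (135, 206, 235)),
    ("sky blue", (120, 200, 240)),
    ("Tomato", (255, 99, 71)),
    ("Forest", (34, 139, 34)),
]
print(colours_for_name(contributions, "sky blue"))
print(names_for_colour(contributions, (250, 100, 70), limit=2))
```

Note that both lookups work directly on the stored contributions, so each user's entry survives intact, in keeping with the human-powered database idea.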

I would argue that seeking value in the data is the paramount concern of human-powered database applications. The application of statistics, development of algorithms, and linking of meta-data are the required tasks for firstly thinking about, and secondly figuring out what value and use the collected data has. Collection mechanisms can start out rudimentary and explicit, and can be shrouded in user experience and marketed once it is clear what that value is and how it will be used.

Maybe, although there is also the school of thought that suggests opening everything up and allowing your users to define the use and value of your data.

Finally, it occurs to me that there are many models for such applications, not limited to O'Reilly's constraints. For example: (1) the classical selfish user experience where aggregate implicit contributions are provided as an additional related or unrelated user service (GWAP, social photos, and social bookmarks), and (2) the less selfish user experience where users still have a context (get attribution) although all contributions are made explicitly to a core consumable service (social news).

Small degrees of difference, although they importantly highlight the trade-offs in the continuums of im/explicit contributions, in/visibility of aggregation, ir/relevance of the aggregate-powered service to the primary interaction, and so on.

Monday, June 2, 2008

Priming with Google App Engine

Last week Google opened the doors to their AppEngine beta, their new platform for entry level web application development. I signed up and spent some time getting familiar with the space.

I haven't built anything of substance in python (a few scripts and some maintenance work), so the learning curve for me includes the language and API, relevant web frameworks like Django templates, and the Google platform and practices.

I started out by getting familiar with what AppEngine is all about by watching the videos from the launch of the preview release (closed beta) of the product in early April (2008): Part1, Part2, Part3, Part4, Part5, and Part6. Part2 was the best, providing an example of writing a simple guest book application and deploying it on the platform. Some of the later parts were good for demonstration applications and overview of the application dashboard.

Signing up was trivial, although it requires a mobile phone to which Google SMSes a security code for application verification (I had to use my sister's). I had a little bit of trouble with unique sub-domain names (*.appspot) for my project, even at this early stage. The free package provides 500MB of storage (BigTable and GFS it seems), approximately 5 million page views per month, and 3 distinct applications. Quotas are applied to each application and are assessed each day.

The AppEngine home provides a wealth of documentation for getting off the ground quickly. The overview is nice, and I took the time to write out all the code by hand in the getting started tutorial that involves building and deploying a similar simple guest book application. The application dashboard looks slick, although I need to build and launch something (get some data in there) before I can really comment on its features.

The docs section of the site provides detailed information on the tools and development environment, which includes a simple webapp framework, the API for the datastore, and a local development server that mimics the features of the production environment (among other things). There is also a good and growing set of articles for specific application and development features, and a very active user group hammering away with Q/A.

It's early days for me, and I intend to spend a week or two learning the language, the relevant frameworks, and the platform, and somewhere in there prototype a toy application. At this early stage I really like the whole package. I get the same feeling from App Engine as I did nearly 10 years ago learning Java coming from a C/C++ background. It feels as though the platform is clamped down to promote best practices and to focus effort on business logic.

Java hid all the mess of a myriad of platforms, promoting a focus on the language and what you wanted to do, and App Engine is doing the same, although shifting the focus to web site infrastructure. You still need to know what you want to build and how to use the tools, but the best practices for data IO and organising your code for a scalable and maintainable web application are locked in from the get-go. With Java, when I wrote a library or a GUI, I felt more productive given that my code would work the same any/every-where (whether this was completely true or not). I get the same kind of feeling when hammering out python for App Engine, that a well written application that is found to be useful will easily scale to the moon. Time will tell just how true/costly that implicit promise will be.

The platform looks competitive with Amazon Web Services now, and the loss in granular control will likely be worth the trade-off for simplicity for most small to medium web applications, presumably their core market.