Friday, May 30, 2008

Dolores Labs and their Color Naming Survey

Given my recent interest in crowdsourcing the names of colors with Pigment (while trying out Heroku), I thought I would review the inspiration for the application: The Delores Labs Color Survey.

Delores Labs based in San Fransisco and founded by Lukas Biewald and Chris Van Pelt is a six month old business built around a service for crowdsourcing market research problems, predominantly using the Amazon Mechanical Turk platform. They provide a number of examples of their work including tagging the sentiment of comments, classification of documents, events, online price extraction, search relevance, and image retrieval. It seems their approach generally involves using the human click workers on MTurk to do the grunt of difficult (for computers) work, and cleaning it up for application using statistics and relevant data mining or machine learning processes.

The site provides a blog that highlights interesting and relevant work to their core service, including what seems to be ad hoc internal projects as marketing and promotion for their services. Some recent examples promoted by the blog include: facestat for collecting statistics on human faces (advanced hotornot), media bias for classifying of articles for US presidential candidates, classification of the covers of sports illustrated, and a color naming survey.

The initial findings from the survey were released in mid March this year (2008) with the title Where does blue end and red begin? (a which made me think of the Chemical Brothers song: "Where Do I Begin"). The post provided a brief about the nature of the project, some screen shots, a link to their in-browser color explorer (very slow), and references the World Color Survey at Berkeley. Follow-up posts that month included the release of a color dataset with 10,000 color-name pairs, and reference to a cloud view of the dataset by an IBM researcher. A follow-up post was made more than a month later (late April) highlighting additional interesting visualisations of the original dataset including a color flower, a series of network and cluster visualisations, a 3D fly-through of the cloud view, and the application of names to images.

The release of the color data with 10K instances revealed further detail about the experiment, including sampling colors based on a biased HSV color space, rather than RGB (all rendered as sRGB) and the collection of multiple colors on each submission (biased by human comparison information).

The color survey is great idea for an crowdsourced experiment, and generally an interesting and fun idea for a company.

I remember when I read about the survey and first learned about the company, the story was very popular, hitting the front page on most social news websites. As a concept, I think a day of brainstoring would highlight many applications for such services as well as gimmick-based experiments, although as a business I would focus on web and technology centric market research.

This would be the kind of business I would like (may still) start. As such I can see a few areas where the site (the company's interface to the humans) needs improvement, as follows:

  1. Organisation of Promotion Studies: One of the best marketing tools this group has is the development and promotion of in-house studies. This is because they clearly show the power and potential of the service, which may be difficult to grasp for those not familiar with the model. I would rename "examples" to "recent happy customers" or something, and create a whole new "examples" section of the site. Each project would have its own sub-section (well defined container) to house all relevant information. At the moment, these in-hose studies are presented ad hoc in blog posts, and feel very messy and inaccessible.
  2. Rigor: Elaborating on the first point, these guys need some (visible) scientific rigor. Defining and presenting each in hose project as an experiment means that a clear aim, methods, results and analysis procedure should be defined and made available. Analysis may be highlighted in blog posts, but should be well written in a scientific manner and made available somewhere on the site. This would promote professionalism regarding offered services (trust), and provide a high-class (manages and engineers would love it) of promotional material in addition to the lower-class stuff like blog posts. Rigor may also allow the use of the data in scientific publications, providing additional high-class promotion and the potential for collaborating with research groups directly (additional revenue streams).
  3. Community: The popularity of the color survey and resultant comments and additional visualisations highlights that a community must be built around each project. Importantly it will drive traffic and interest in the services of the company. Also as important, this would provide a different-level of crowdsourcing, where analysis is performed by the community, freeing staff (founders) to complete paid work. For example, a forum or comments system and base visualisations that can be manipulated and commented on (like IBM Many Eyes).
I would go so far as to also promote a general "crowdsourcing problems" community, offer open source tools and tutorials, and let the community refine the procedures, tools, and ideas that bring money into the site.

I am unsure of the direction of the company, but a quick assessment of their recent FaceStat service highlights that perhaps the company is trying to create and cash-in on a single community (like a hotornot), rather than focusing on their core service. Nevertheless, it will be interesting to see what these guys come up with next.

Wednesday, May 28, 2008

A Day With Heroku: Building a Simple Color Crowdsourcing App

I recently signed up for the Heroku private beta and decided to build a simple application to test out this free Ruby on Rails Hosting solution.

Firstly, Heroku is an awesome idea and very well executed from what I can tell. The service is perfect for prototyping webapp ideas. I only used the minimal feature set, so I cant comment regarding the more advanced (premium) features (domain names, no ad banner?, scaling). I will continue to use it to prototype ideas for web applications even if/when there is a small fee after the end of the private beta. I hope the guys go the freemium route as they will be a powerful force and rails-centric alternative to Google's App Engine.

From a technical perspective I'd like to know a little more about what automation occurs under the covers, like session cleanup, database/environment versions, etc. If this information is already available it wasn't obvious to me during development.

I chose to hack together a simple rails application called Pigment to try out the in-browser development interface (I went with the American spelling on the site: 'color'). The idea for the application is to crowdsource the mapping for color names to their decimal values and provide services to exploit this information in aggregate, specifically: names-to-numbers and numbers-to-names. It was was inspired by and elaborated on the Mechanical Turk driven Dolores Labs Color Survey. The application is raw, simple, and took me about a day (mostly cowboy-style generate and test UI iterations). Natural extensions include probabilist name/number assignment, cool visualisations of the data, and an API (REST-friendly). Try it out at pigment.heroku.com

Anyway, the development interface is pretty slick, and it generally did all the things I wanted to do in terms of script execution and source file manipulation. I loved the simplicity of the editor - it would be very tempting to overload the interface with a suite of developer-friendly features, an option the guys have thankfully resisted.

I quickly got over the difference in response-speed with the online editor, and started to enjoy the idea of in-browser development. The site is awesome, and as a user I'm sold, but I did have some problems, and in my usual style I documented them as I experienced them:

  • Crashes: I'm using Firefox (2.0.0.14), and I experienced five crashes over the course of the day's work. I am unsure of the underlying cause, although I think it is related to rapid context switching between applications on my desktop and the influence this has with initialising/sleeping the Heroku development environment (AJAX in rails would be my first guess). The first crash occurred whilst flipping between tabs (code/view) to see changes, other crashes occurred while flicking between Firefox instances, and the last couple occurred whilst flicking between IE and Firefox. A "crash" involved 100% CPU utilisation, and required me to kill the process. Some crashes also seemed to correlate with changes to the routes and (I presume) subsequent automatic restarts to the mongrel instance.
  • Editor is Sometimes Screwy: Lots of control-c/v/x/z/etc caused me problems. The evens would occur at random places in the code, or random stuff would happen - like large slabs of code disappearing. Not sure if it was a focus thing or a speed of commands thing. The biggest offender was the command not occurring at the location I wanted - such as not picking up my mouse clicks and repeated re-clicks to code locations. I typically resorted to moving with arrows and retyping code.
  • Lost file: As mentioned, sometimes a control-x or control-v would do crazy stuff. Sometimes I would loose a lot of code or even the entire file (thank god for control-z). Anyway, onetime I switched to my other browser and the view was blank. Checking back in the editor showed my view file (50 or so lines of html/rails code) gone. More messing around, and the file no longer showed in the file tree. Logging out, then back in re-showed the file, but empty. I had to retype the lot - I was pissed, but wrote it off as my fault in doing a control-v control-s without looking at the editor before the context switch. I mention it because of the pain it caused, and because my assumption may have been wrong, and some other bug caused the data loss.
  • Indenting and Formatting: I like clean looking code. Selecting slabs of code should indent and inverse-indent when I tab/inverse-tab. Instead, they are deleted (again, thank god for control-z). I want a control-shift-f format command or a format code button, or at the very least tab-handing on selected code.
  • Syntax Error: The code editor has syntax error checking functionality. Nice. I noticed with some of my views and models with regex validation, the syntax error notification would stay on even though my code was valid syntactically. I only want to know when my code is wrong, which typically involves bad calls or names (hard to check in ruby) rather than a missing brace or something.
  • Javascript Error: Using IE6 for a view-checker (independent for the crashing problem) highlighted a persistent javascript error with the Heroku banner. The result was an indication to the user on the browser, and the lack of the Heroku banner being displayed. I can understand the user needs a bleeding-edge browser to use the development center, but to view my page (the Heroku ad) the user should be able to hit it anything, including an antiquated Microsoft browser.
Some additional features I would like include:
  • Preview Mode: The current preview mechanism is modal, meaning I loose my code view, and any current organisations of the directory tree (a pain!). When I develop for the web, I iterate the view A LOT. As such I need both code and preview mode active at the same time. I resorted to making my application public (too early) and using two tabs, then instances, then browsers. The solution could be a pop-up option, and must be light and fast (not know about my Heroku status). I'm thinking something light like a post preview in a blog or forum.
  • Recently Changed: I want a list of recently changed files or "open" files. I flick between model/view/controller when adding functionality, and I want to flick between these contexts faster if I worked on them recently. Like control-tab functionality. A "recent document" list would be a big improvement!
  • Query Tester: The data view for the database is nice and simple, but I longed for a query tester. I had to mess around with an inner query and a fussy 'like' query (ruby was being a bitch) and this feature would have saved a lot of screwing around with trial and error queries in the view. (I could have used a test if i thought about it a little harder)
Again, Heroku is awesome, all love and praise!

As I mentioned, the service is in private beta, and so will naturally iterate features, changing a lot between now and being open to the wild web. If I were learning rails or wanted a free (for now) platform for prototyping and small apps, I couldn't advocate Heroku enough.

Monday, May 26, 2008

A Human TSP Solver in Rails, One Month On

Well, it has been about a month since I proposed the idea for a web-based human TSP solver application, and I thought I would reflect a little.

  • Week1: I spent the first week discussing the idea with friends and colleagues both locally in person and remotely (email and IM). I proposed the idea, considered user experience, and chose ruby on rails in which to develop. I also run some preliminary experiments (in OAT) that suggested aggregating large numbers of partial solutions using a heuristic may produce a final result close to a holistic solution with a heuristic (ideas was feasible).
  • Week2: The second week was spent learning ruby on rails and refining the premise for the application. Three technical reports were written: the first statistically highlighting the benefit of spatial-based sub-problems over random sub-problems, the second statistically highlighting the benefit of a nearest neighbor heuristic as a viable approximate human solver over random and probabilistic (back up the first report), and the third qualitatively assessed the visualisation of solutions to spatial sub-problem in aggregate. A presentation of the premise and these findings was made to my research group, resulting in a beneficial discussion, and suggestions of additional contribution mechanism (like a maze game).
  • Week3: The core development occurred in the third week, where a complete database schema and application schema were completed, a viable contribution mechanism devised (Java applet), and the site as whole was styled. The first version of the site was deployed and launched on HostingRails and made available to semi-private alpha testing on Friday the 16th of May 2008.
  • Week4: The last week has involved monitoring the server logs, setting up source control, bug tracker, forum, and furiously adding features based on user feedback. The core user feedback centered around the complexity of the home page, the lack of instructions, and usability issues with the contribution applet. The end of the forth week saw the release of an update that included these changes, and the promotion of the status of the application to perpetual beta.
I am happy with the state of the application and my acquired rails knowledge at the end of this month. The majority of the originally envisaged features were realised, which is good given the general low complexity of the application. I spent a majority of the overall time discussing the idea and doing basic research (doing homework). I spent the majority of the development time on user interface tweaking, a task I find tedious given preference for more technical problem solving. The mix was good, although I would have liked the reports as blog posts instead of word documents, I suspect they will stay locked up forever now (lower overall quality and interestingness - imagine that).

Regarding contributions, I left the door open (some what) to allow users to define what might be useful to the holistic solution. Regarding holistic solutions, I chose not to explicitly construct a tour from the crowdsourced data, and instead left it at an XML feed of contributions and a slick visualisation - again leaving it to the users to interpret the results. This second choice was made all the more easier given that the majority of the solutions I loaded into the database from TSPLIB already had known optimal solutions, which I chose to visualise in the application.

I spent the last weekend reading up on related cognitive and computational psychology related to how and why humans are so good at solving small TSP instances, highlighting the convex hull hypothesis. I have since contacted a number of researchers in the field to gauge their thoughts regarding the premise and future for the application.

I had fun climbing into rails. I lived in the API, got most of the answers I needed from Google searches, and watched railscasts when I felt like procrastinating. Next time (say learning django or something), I would spend more time reading forums rather than tutorials, and abstract more functionality into the base language to make development and testing easier.

There is plenty of room for additional features in the app, especially alternative contribution mechanisms (games!). There is also a lot of room for application optimisation (queries and caching), if and when it becomes popular or useful. In the short-term I am going to leave the application alone, at least for a week (both development and production) while I have a play with Heroku.

Saturday, May 24, 2008

Human TSP Solvers and the Convex Hull Hypothesis

I released some changes for the Human TSP Solver application yesterday, and I thought I would spend today reading up on theories as to how humans are able to effortlessly find solutions to small TSP instances. These human-based heuristics are generally mixed up into the concerns of spatial cognition, which have been investigated over the last twelve years or so by Ormerod, MacGregor, and colleagues.

The fact that humans are good TSP solvers was established perhaps in the 1970's in a study by Krolak, et. al. and has been confirmed in studies since, both for the TSP and related NP-hard problems. Krolak, et al.'s study titled A man-machine approach toward solving the traveling salesman problem (1971) uses a man-machine heuristic where the computer clustered cities for the human operator to connect, then offered sub-tours to the user to connect until a complete and feasible solution emerged. I like the paper, because it refers to the training of users on solving the TSP, and suggests at the writing a training manual (seems funny).

Ormerod and MacGregor's work started in 1996 with the publication of Human performance on the traveling salesman problem. The study involved two experiments with samples of people on 10 and 20 node TSP instances, where participants were given a booklet of connect-the-dots to complete. The human-based results were assessed against three heuristics (nearest neighbour, largest interior angle, and convex hull), and found to be superior. Human results were produced quickly (low cognitive load) and consistently were of a high quality (within a few percent of optimal). The core of the study tested the hypothesis that the difficulty of a TSP is a function of the number of interior cities with regard to a convex polygon, rather than the total number of cities, which seemed to be supported.

Follow-up work by the pair involved elaborating on the finding, and considering the underlying heuristic employed by human subjects to explain the results. Global perceptual processing in problem solving: The case of the traveling salesperson in 1999 considered a top-down explanation where the subject firstly perceived the overall structure of the problem, then made local step-by-step decisions that aligned with the global perception. The paper suggested that the global properties involved the identification of the convex hull (polygon) in the perceived data points, which guided tour construction. The experiments considered the measurement of this perception as well as aspects that may effect such global perception such as rotation. Similar concerns were also investigated in another 1999 paper titled Spatial and contextual factors in human performance on the travelling salesperson problem, where a larger instance (48 cities) was considered under a variety of contexts.

The convex hull explanation was pursued in the 2000 paper A model of human performance on the traveling salesperson problem, where the heuristic model was elaborated and assessed. The Convex Hull Hypothesis for human TSP solving proposes that the boundary cities are connected based on adjacency in a single direction, from which internal cities are linked. The implementation of this cognitive model was demonstrated to locate high quality solutions to TSP instances in a range between 10 and 100 cities.

The model was defended against an alternative explanation involving tour crossing avoidance in Convex-Hull or Crossing-Avoidance? Solution Heuristics in the Traveling Salesperson Problem, 2004. Most recently, the model was compared to a set of standard heuristic methods and found to provide the best fit with human-based observations: A Comparison of Heuristic and Human Performance on Open Versions of the Traveling Salesperson Problem, 2006.

I cannot comment on this variation of the convex hull TSP heuristic as I have not read up on the base method, although the hypothesis that humans use some variation of this approach seems quite reasonable. I know from observing my own behaviour and of others during user testing that variations on a directional bounding-polygon approach is taken. Unfortunately, this granularity of information is not recorded by the application, so I cannot do my own assessment (yet).

With regard to the Human TSP Solver application, this body of work mearly confirms that harnessing human spatial intelligence is a good idea, and Krolak's early work suggests that effective integration will result in meaningful solutions (also confirmed by my early experiments). Further, knowing that the majority of contributions in the database were collected under the convex hull assumption, means that exploiting the information in aggregate may be doe in a meaningful way. I'm not sure how yet, but my gut tells me there is an opportunity for a hull-sub-tour-based integration algorithm.

Finally, the premise of Ormerod and MacGregor's work highlights to me that humans will see spatial patterns, even if none exist, such as in random TSP instances. This is interesting because these spatial patterns strongly influence local decision making, which in the context of this research positively affects holistic TSP tours.

Monday, May 19, 2008

TSPLIB: A Library of Standard TSP Instances

A classic resources for anyone interested in the Travelling Salesman Problem is TSPLIB. I thought I would capture a few words about TSPLIB, and how I intend to use it.

TSPLIB is comprised of more than 100 instances of the TSP collected by Professor Gerhard Reinelt in the mid 1990's. Reinelt is currently associated with the University of Heidelberg, and as such, the TSPLIB is housed on the website of the Discrete Optimization Research Group at that University (see an old mirror).

The library is composed of a set of problem instances for the TSP and related problems. I am specifically interested in the Symmetric TSP instances, although the other problems in the library include: the Hamiltonian cycle problem (HCP), the Asymmetric traveling salesman problem (ATSP), the Sequential ordering problem (SOP), and the Capacitated vehicle routing problem (CVRP).

The site provides documentation (postscript format) that describes the standardised file format for the problem instances, as well as a FAQ that relates to common questions about the library.

I had used instances of the TSPLIB before in my OAT library, although I was unaware of the number of symmetrical instances in the library, and the array of "types" for this class of problem. Specifically, the file format describes a number of distance measure used to assess tours in TSP instances (that I refer to as types). As such, symmetrical instances may be one of the following types: EUC_2D (Euclidean distance rounded to an integer), GEO (Geographical for assessing longitude and latitude coordinates), ATT (Euclidean distance with scaling and rounding), MATRIX (pre-defined distances in a matrix), and CEIL_2D (Euclidean distance rounded up).

The majority of the TSP instances are of the EUC_2D type, many of which have known solutions. Considering this, I believe I will include all types except MATRIX in the the human TSP solver application, and provide optimal solutions for visualisation on the problems that have been solved. Importantly, William Cook's additional problem instances are also available in TSPLIB format, meaning that I can use a generic importing and representation approaches in the application.

One thing I may need to consider in the future is the license under which these instances are provided. I suspect they have been released under public domain license, although this will become a concern if I chose to monetise the application.

William Cook and the TSP @ Georgia Tech

Given my new found interest in the Travelling Salesman Problem, I thought it prudent to start reading up on the problem from the perspective of the prominent resources on the web. An excellent starting point is William Cook's Travelling Salesman Problem page, hosted at Georgia Tech.

I was familiar with Cook and his resource given its high ranking in Google searches (American Spelling), mention TSP Wikipedia article, and my colleagues mention of Cook, et al's CONCORDE TSP Solver any time I talked about the general problem. Cook's page is detailed in its coverage of the problem and his involvement with solving specific instances. I thought I would provide a general overview of the offerings of his TSP portal.

An overview of the general problem is provided that includes a set of links to related resources around the web. The history of the problem is covered that, interesting to me, highlights a TSP board game proposed in the mid 1800's requiring the player to solve a 20-city instance. The history seems to highlight that Cook has been involved in the leading charge for finding solutions to the hardest instances for about 15 years.

An interesting area of the site are the pages that describe the methods for solving the TSP. These pages suggest that finding good solutions, even optimal solution to large TSP problems is hard, although the methods for verifying the optimality of a solution are harder (at least more time consuming). The approachers described include: control zones, sub-tour elimination, cutting planes, local cuts, branching, and accuracy, the last few of which have yet to be written. This section also provides in depth information and demonstration Java applets are provided on the historic cutting plane method proposed in the 1954 paper Solution of a Large-Scale Traveling-Salesman Problem.

Naturally, there is a section dedicated to the CONCORDE solver, providing a downloadable version of the solver with source code for academic and general research use. Versions of this solver have been used by Cook and colleagues to find some of the largest TSP solutions known to computer science. In fact, there is a page dedicated to world records for TSP instances, which is held by Cook and collaborators for the solution of a 85,900 node VLSI instance in 2006.

The remainder of the site provides information on applications for TSP research (such as the CONCORDE solver), a gallery of TSP problems and solutions, and a nice Google Maps TSP mashup for solving for solving ad hoc instances defined by users. Interestingly, the site provides two
in-browser flash games for solving the TSP. The first is a single-player game that involves a player finding a minimal tour called Tour Finder, and the second is a two-player game that faces the players off against each other called Tour Race. Finally, the site provides a set of test TSP instances in addition to TSPLIB including a nearly 2 million node world map, 25 instances of cities around the world, and 102 VLSI instances.

Reading over the material from Cook's site consolidated a lot of ideas I had for the next iteration of the human based TSP solver application. I am thinking about ranking problems by their general difficulty (number of cities, discretised on a log scale), and offering users to contribute toward one of a set of different instance difficulty levels (easy, medium, hard, insane). I was also thinking of assigning users "experience levels" based on the number and type of contributions.

Regarding the contribution mechanism, I was thinking of allowing the user to select the difficultly of the sub-problem (number of cities), allow help in the sub-problems (nearest neighbour solutions, common solutions, broader context), and allowing sub-problems to be resized. Regarding the aggregate solutions, I was thinking of showing the nearest neighbour solution, the probabilistic-best crowdsourced solution as well as the optimal solution (if known).

At the very least, reading Cook's page has motivated me to ensuring the application can support a wide array of large (up to 100K) and small (<100) instances of the TSP, and to perhaps focus on the less regular instances available (such as geographic-based rather than VLSI instances).

Games with a Purpose: GWAP.com

Checking over my feeds revealed that Luis von Ahn's group has released their anticipated Games With A Purpose website gwap.com (cheers O’Reilly Radar). The site provides an umbrella for web-based games, from which the solutions contribute to solving AI-complete computer vision problems. The current suite of games include:

  • The ESP Game: Match tags for an image with a partner.
  • Tag a Tune: Describe a tune and guess whether your partner is listening to the same tune.
  • Verbosity: A word game, where you describe a secret word by giving clues.
  • Squigl: Outline parts of an image and match with a partner.
  • Matchin: Select favorite images and match with a partner.
Two of von Ahn's products not released under the umbrella (for whatever reason) include Peekaboom (reveal parts of an image to a partner) and Phetch (find images on the web that match a description). Related products include the implementation of ESP for Google called the Google Image Labeler, and reCAPTCHA.

The tag line for the site is: You play the games, Computers get smarter, Everyone benefits! There is also a promotional video for the site on YouTube. All games are clear, simple, fun and centered around computer vision problems that the current state of algorithms find difficult. Signing up provides a general GAWP account with a user profile that includes an aggregate score, a level that reflects experience, and a break down of contributions to each game in the suite. The site is pretty in that is very graphics heavy.

All in all, the portal is impressive, and the games are great, although as a technical kind of guy I am more interested in how each game contributes to their image database (beyond my wild assumptions), and how they plan on exploiting (harnessing) this information. They site is branded Carnegie Mellon University like von Ahn's other products, suggesting ownership and financial interests beyond von Ahn's affiliation.

I suspect that all of these products have an academic driving force rather than commercial, which raises the question as to how such products could be exploited toward commercial means (basic monetisation). I have little doubt that such database or perhaps the data collection frameworks would have a cash value to someone outside of academia. Interestingly, a similar human computation based computer vision project called Polar Rose (that grew out of a research project from two Sweden universities) have $5.1M in venture capital.

I will continue to study von Ahn's work, towards improving my notions of a human-powered NP-complete problem solving service.

Sunday, May 18, 2008

Dual License a LGPL Project

I have recently been approached with a request for a commercial license for one of my open source projects. The project is called OAT (the Optimization Algorithm Toolkit), and provides a set of computational intelligence algorithms as an API, a snappy explorer interface, and an interface for designing, executing and analysing experiments. The project was released under the GNU GPL, and more recently is licensed under the GNU LGPL.

Firstly, my research reveals that this is a valid thing to do as long as I am the sole owner of the project, which I am. It is referred to as Commercial Open Source, and licensing scheme is referred to as dual-licensing. The Economics of Commercial Open Source was an interesting read, and I found this article on LGPL and Java particularly useful.

The library/software is written in Java and the current version (1.4) is available under the LGPL. I changed to the LGPL from the GPL with version 1.4 as it permits commercial use (linking) and distribution without having to release proprietary code under the same license, unlike the GPL.

Ripped from the LGPL Wikipedia entry:

A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License.
One must only release the source to proprietary code in the case of derivative works, which involves changing code in the library, not dynamic linking (such as inheritence in Java).

A quote from The LGPL and Java:
The LGPL contains no special provisions for inheritance, because none are needed. Inheritance creates derivative works in the same way as traditional linking, and the LGPL permits this type of derivative work in the same way as it permits ordinary function calls.
An important considerations (as I understand it), is that the license is only activated when the product is distributed, meaning that usage in proprietary web service applications is legal, even if they are derivative works! (?). I need to verify this!

An important consideration in taking this step are the open source and non-commercial libraries used the project. My approach involves releasing the next version (1.5) as three modules, each with their own LGPL license. The core module would be standalone, permitting the dual licensing scheme for commercial use. The remaining two modules would provide the explorer and experimenter functionality, drawing on graphing and statistics libraries that could not be released under a commercial license.

Saturday, May 17, 2008

Human TSP Solver

Well, it has been approximately three weeks since the inception of the (crazy) idea for a human-driven TSP solving web application. I've you've haven't been tracking the progress see the inception, the consideration of user experience, getting up and running, and the discovery of a related project.

I am happy to reveal that I have released a version of the web application to private alpha testing! Given that the audience of this blog is about 50 people, (private by anyones standards) type in the title of this post .com to have a look, but only if you promise to give me some feedback!

The current state of the application is stable and offers the core functionality outlined in the projects inception, specifically: the capturing of user contributions toward a crowdsourced solution to the travelling salesman problem. I used Ruby on Rails as the web framework and Java Applets to capture and display user contributions.

After having the site up for a day, it is clear to me that I am primarily interested in the aggregate data, what it means, and the ways in which it may be harnessed or applied. Naturally, there is plenty of room for additional features, specifically focused around how contributions are provided by users (various games), although additional features concerning the aggregate contributions are harder to brainstorm. I think this is simply because I don't know what to expect. I don't know what it means to have tens or hundreds of thousands of partial solutions to TSP's. What I am sure of is that such information is far more useful than simply solving small instances of the general problem.

While pondering this question and preparing for a presentation of the project to my research group last week, I came across a series of papers from the Journal of Problem Solving (vol 1, iss 1) dedicated to the investigation of human-based TSP solving (credit to the Wikipedia TSP article). I plan on reading these in depth, although a first pass highlights some interesting and relevant findings regarding the low cognitive load and high performance of humans on small instances.

Anyway, it feels good to have a usable alpha version out the door, and exciting to consider the array of potential features that may be integrated into the next iteration.

Thanks to my my early alpha testers, your feedback has been invaluable!

Friday, May 9, 2008

Crowdsourcing Protein Folding with fold.it

I've been working hard on the human-based TSP solver application. I've done some experiments, written some technical reports, and gave a presentation to my research group. The idea appears viable. The rails application is starting to take form, users can make contributions, and I'm thinking about effective sub-problem presentation schemes (an applet for now).

I came across a similar project in the popular news today called fold.it which crowdsources protein folding as a computer game. It was covered on unews, slashdot, technology review, and hacker news among many others. The project appears to be about a year old with some serious backing, stemming fro UW's Baker Lab and Rosetta@home distributed protein folding project. Like Rosetta, the fold.it game is a desktop application that communicates with the internet to download problems and upload results (same core codebase I believe).

The game is pretty damn cool. The interface is simple and slick, and the game itself quickly increases in complexity which I think translates into fun. Each "puzzle" involves manipulating the side chains and backbones of proteins toward minimising the Gibbs energy whilst ensuring the structure is viable. I really like the idea of user manipulation involving both manual and automated tools (jitter, etc.) Problems do not have known solutions, rather an estimation or minimum score (in the training rounds) which has to be obtained before progressing. There are competition puzzles beyond training that I assume do not have this limitation, translating into a free for all optimisation like programming competitions of old (think DrDobbs).

The structure of the site is very similar to my vision for the TSP application, with the important difference that the fold.it game is desktop driven rather than in the browser. The site and game itself are all about the puzzles, promoting both user scores as well as groups of affiliated users. At a minimum, the game provides further verification on the approach toward making difficult problems accessible to be solved by non-domain experts.

The project claims their aim is to locate protein folding savants and exploit them. This is an important general consideration in crowdsourcing applications: locating and rewarding experts. I think a better aim would be to collect statistics from the humans and use the provided solutions as a dataset toward developing better automated techniques for churning much larger and hard proteins. I would also think about how to represent the problem in various different ways to exploit different spatial information and make it less complex for the user on bigger problems. No doubt these are considerations for future work.

It's a cool application, go and download it!

Friday, May 2, 2008

Why Self-Funded Web Development?

I have been chatting to a lot of people recently given my soon to be submitted dissertation. I've been describing my desire to design and build some web applications and to investigate new new wave of technologies and models that have emerged in my absence (while studying). My intention is to do this off my own bat (finances), and hopefully realise some modest revenue streams to pay the rent once my saving evaporate.

Interestingly, reactions have generally been supportive with a fair sprinkling of negativity. For example, rather than receiving pitched ideas, I have received a lot of commentary of how and why some business models are not viable, and why some popular technologies and frameworks are fads.

All quite reasonable, but where is the excitement, the optimism? Where are the advocates? I thought I would capture things from my perspective.

Naivety (or ignorance) is a powerful force when mixed with ambition, it can make you do things or take on problems (risks) that other "informed" people would never dream. Success is measured not in terms of cash but rather through merely getting something out the door. Success is pride and its reflection, recognition. Recognition alone may not pay the rent, but it will get you through the six months of contacting that will (gris-gris and/or resume padding).

The web is a perfect match for such notions. Like broadcast television, the medium has the potential for reaching and effecting anyone with a receiver. Although, importantly unlike TV the potential audience is global, and conversations are bidirectional (more engaging by default). This potential provides a key coefficient on the previously outlined notion of "success", amplified beyond locality (peer group/city) to the long tail of global attention.

Therefore, (my) self-funded web development may be reasoned using some mixture of the potentials for: innovation, audience, and recognition.

A key element left out of the above equation is creativity. Building things is fun, and being a "producer" rather than a consumer feeds into pride of accomplishment. Competency with tools, especially good tools is powerful because the barrier to the transition from ideas-to-systems lowers. There have been times when I have been immersed in a domain and felt as though my ability to transfer thought to code was only limited by my thoughts. Perhaps the is male arrogance, but I recall that it is an extremely powerful feeling.

If creation (producing) is powerful then it makes sense to explore the ways to create efficiently, to play with the thought-to-code transfer function. This motivates the investigation of the array of languages, frameworks, and models in general, and those that pertain to the web specifically. The same motivation applies when seeking knowledge (researching/reading broadly): the more you are aware of, the better your thinking and you ability to consider useful abstractions.

As such, investigating varied web technologies is motivated by the potentials for creativity, and more specifically the creative power that comes through the reduced barrier between thought and code.

So why self-funded? The source of cash is a controlling factor. Awards (research) and seed funding provide short term solutions, although ultimately self-perpetuating revenue streams are required to maintain control what you realise and how you realise it. Having no income and the unsustainability of the situation focuses attention like a laser.

That kind of captures my feels regarding self-funded web developement, as well as broader notions of my passion for software engineering and continuous learning. Looking at such activity through a "risk mitigation" lens, especially financial or technology risk misses all the interesting stuff.