Wednesday, April 23, 2008

reCAPTCHA Makes Me Want To Work On Machine Vision Problems

One of the problems I perceived in my field (computational intelligence) was how the algorithms we played with related to real-world problems. The majority of the "optimisation", "pattern recognition", and "classification" problems were sanitised versions of so-called real-world problems, with known optimal solutions. Such problems are exactly what you need when investigating and demonstrating experimental algorithms and systems, but what I felt was missing were motivating real-world examples beyond lofty and loose relations (likely an artifact of my narrow perspective of the space).

At the moment, many sites that require you to sign up present a CAPTCHA, and more interestingly, they use the reCAPTCHA service. I sign up to services frequently (I like to try things out), and each time I complete a CAPTCHA challenge I feel like dropping whatever project I have on the go and getting stuck into computer vision research.

Some background. A CAPTCHA is a test to make sure you are a human, and typically involves reading, interpreting, and typing visually messed-up words. reCAPTCHA is a project that exploits such human computation toward solving a computer vision problem (complete digitisation of books) that is too difficult for today's (off the shelf?) optical character recognition software. A reCAPTCHA challenge involves the human-based transcription of two words, one that is known to the system and one that is not, and the human is not told which is which. The system-known word is used for the test, whereas the results for the second word are aggregated into a probabilistic model that eventually solves the second word (to a desired level of confidence).
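To make the two-word idea concrete, here is a minimal sketch in Python. It is not reCAPTCHA's implementation or API: the class name, the parameters, and the use of a simple vote share in place of the probabilistic model are all my own assumptions for illustration.

```python
from collections import Counter


class WordPairChallenge:
    """Toy two-word challenge: one known control word, one unknown word."""

    def __init__(self, control_answer, min_votes=3, min_share=0.75):
        self.control_answer = control_answer  # transcription the system already knows
        self.min_votes = min_votes            # assumed: votes needed before deciding
        self.min_share = min_share            # assumed: vote share standing in for "confidence"
        self.votes = Counter()                # guesses collected for the unknown word

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passes; keep their unknown-word guess only if so."""
        passed = control_guess.strip().lower() == self.control_answer.lower()
        if passed:
            self.votes[unknown_guess.strip().lower()] += 1
        return passed

    def resolved_word(self):
        """Return the agreed transcription, or None while agreement is too low."""
        total = sum(self.votes.values())
        if total < self.min_votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count / total >= self.min_share else None


# Hypothetical usage: "morning" is the control word, the second word is unknown.
challenge = WordPairChallenge("morning")
challenge.submit("morning", "overlooks")   # passes, guess recorded
challenge.submit("mourning", "overlocks")  # fails the control word, guess ignored
challenge.submit("morning", "overlooks")
challenge.submit("morning", "overlooks")
print(challenge.resolved_word())
```

The design point is that only the control word gates access, while guesses for the unknown word are trusted in aggregate and only from users who passed the control. A real system would presumably also randomise the order in which the two words are shown, so the human cannot tell which is which.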

The reCAPTCHA project provides widgets to embed the service in your website (in only 4 lines of code), lots of APIs and plug-ins, and a variation called mailhide for protecting email addresses listed on websites from automated harvesters. The use of the service appears (qualitatively, to me) pervasive among web applications, and it is even used by Facebook with their massive user base. Harnessing that amount of human computation is inspiring (I want that!).

Frankly, I've never approached any computer vision problems other than toy datasets for demonstration purposes, although the prospect of automating the apparently effective "human approach" to the problem makes my mouth water with anticipation. I would build inductive models layered from sub-symbolic to symbolic representations of the domain. Lots of training, lots of tweaking, lots of fun. What holds me back is the knowledge that the devil is in the detail. Serious math-heads have been hitting this problem for >30 years (78?), and as such I'm sure there are no quick fixes (remember, it's hard) and I'm sure there is a lifetime of methods (dogma) to consider (lots of book time). Nevertheless, this OCR problem is in everyone's face, advertising the need for effective computational intelligence algorithms. The more AI-Hard problems that are out there in the open like this one, the more likely it is that substantial (interesting) work is completed toward addressing them.

The human computation + blind trials + probabilistic modeling approach toward problem solving is a pattern promoted by Luis von Ahn, whom I've posted about before. The pattern is cool, and is being followed in academic circles beyond von Ahn's work, where it has been, for example, both popularized and elaborated (distributed human computation). I really like the discrete contributions made by users as a service (security, game, etc.) that feed into a broader model used for a different purpose (digitize books, image search corpus, etc.). It is a different way of thinking about collective intelligence, where the emergent service is disjoint from the primary discrete contributions, and the parts of the model are exposed although slyly represented. For example, you do not see "come help us digitize books" or "label images" (boring), but instead "solve this so you can do what you came to the site to do" (alternate intent).

I relate it to the explicit construction of your social graph on Facebook, firstly satisfying your own interest, and secondly satisfying the site's (once they figure out how to exploit it). Same deal with del.icio.us and flickr (user->site or discrete->emergent). Interestingly, AideRSS uses a reverse of this pattern, where the user gets their selfish service from the aggregation of others' contributions, requiring the contributions before the aggregate service (emergent->user). The same can be said for Google's search.

Beyond the collective intelligence patterns, I want to do problem solving the von Ahn way. I have been racking my brain as to how to represent an optimisation or classification problem in such a way as to exploit discrete human computation (nothing substantial so far). An obvious path is to exploit human pattern recognition and approximation capabilities (almost reflexive, they are so fast/good), although it is clear that the difficulty does not lie in the selection of a problem, but rather in its perception. For example, citizen science and crowdsourcing a football team are nice gimmicks (direct perception of emergent effects), but the ESP game is a popular game and reCAPTCHA is a popular security service (indirect perception/interest). Interactive EC is a good start, but one must think broader regarding inductive methods, and narrower regarding application. Any thoughts?

1 comment:

Jason said...

A jibe from KK suggesting that CAPTCHA spammers may come up with AI before the computer scientists.