Monday, April 7, 2008

Toward Engineering Cumulative Advantage

Looking over my notes, I remember that around this time last year a popular meme in the 'sphere was cumulative advantage (the rich get richer) and its effect on collective intelligence, the then heavily promoted idea at the heart of prevailing Web 2.0 definitions. The spiral was kicked off by an article by Watts in the NYT titled "Is Justin Timberlake a Product of Cumulative Advantage?"

The core of the article was the findings of an experiment by Watts and colleagues on the effect of social indication (popularity) on decision making. Specifically, they found that including a social indication (an aggregate of the decisions made by others) affected individual decision making and made the popular choices more popular (as expected). The interesting caveat was that which selections became popular was seemingly random: it did not correlate with unbiased quality. The latter point is what the 'sphere latched onto, promoting the notion that the Web 2.0 giants were driven by the amplification of selections made by early adopters. The observation that fashion and popularity played a role in the front page of digg or the uptake of MySpace and Facebook is not really a revelation.
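To make the rich-get-richer dynamic concrete, here is a minimal simulation sketch. It is not the design of the Watts experiment; it assumes a hypothetical market of songs with fixed intrinsic appeal, where each listener downloads one song with a probability that mixes appeal with current popularity (a Laplace-smoothed Pólya urn). The names `simulate_market`, `social_weight`, `gini` and the appeal values are all illustrative.

```python
import random

def simulate_market(appeal, n_listeners, social_weight, seed=0):
    """Hypothetical market: each listener downloads exactly one song.

    social_weight = 0: choices depend only on intrinsic appeal.
    social_weight = 1: choices depend only on current popularity
    (a Polya urn), so early picks get amplified.
    """
    rng = random.Random(seed)
    n = len(appeal)
    total_appeal = sum(appeal)
    quality = [a / total_appeal for a in appeal]  # appeal as a probability distribution
    downloads = [0] * n
    for _ in range(n_listeners):
        total = sum(downloads)
        # Mix unbiased quality with the social indication (smoothed download share).
        weights = [
            (1 - social_weight) * q + social_weight * (d + 1) / (total + n)
            for q, d in zip(quality, downloads)
        ]
        choice = rng.choices(range(n), weights=weights)[0]
        downloads[choice] += 1
    return downloads

def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly equal)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

if __name__ == "__main__":
    appeal = [1.0, 1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7]  # hypothetical song quality
    for w in (0.0, 0.9):
        for seed in range(3):  # independent "worlds" with identical songs
            d = simulate_market(appeal, 2000, social_weight=w, seed=seed)
            top = max(range(len(d)), key=d.__getitem__)
            print(f"social_weight={w} world={seed} gini={gini(d):.2f} top song={top}")
```

With a high `social_weight`, inequality across worlds tends to be higher and the identity of the top song tends to change from world to world, echoing the unpredictability described above; exact numbers depend on the seed.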

The effect is familiar, as it relates to notions of winner-take-all (competitive learning) and convergence dynamics in the probabilistic systems used in machine learning for optimisation and function approximation. Importantly, it matters when there is a payoff for winning or for reaching a quasi-stable state (a goal). Endorsing 'the popular' based on social indicators likely relates to individual payoff for social reasons, such as 'fitting in' by having a common shared experience through which to relate to others. Karp and similar web commentators made this observation in a roundabout way. The implication, of course, is the question of whether you can have a collective intelligence website without the cumulative advantage effect, along with related questions of divorcing the two or manipulating the effect.

A careful consideration of social news sites like reddit points to the answer: manipulation is mandatory and implicit in the design. For example, I'm sure that a critical indicator for a given news item is its relative age. This provides an environmental negative pressure that I suspect totally dominates other, more explicit negative feedbacks such as down mods, which likely only affect an item on the rise. I used news decay as an example because it relates directly to the abstraction of ants in Ant Colony Optimization, where the decay coefficient for pheromone affects the stigmergy guiding structures created (decisions made) in the future.
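The parallel between news decay and pheromone evaporation can be sketched in a few lines. This is a minimal illustration under stated assumptions: `decayed_score` is a hypothetical exponential age discount with an assumed 12-hour half-life (not reddit's actual ranking formula), and `update_pheromone` is the generic ACO-style update where a fraction `rho` of the trail evaporates before new pheromone is deposited.

```python
def decayed_score(votes, age_hours, half_life_hours=12.0):
    """Hypothetical news score: votes discounted exponentially by item age.

    Not any real site's formula; the half-life is an assumed parameter.
    """
    return votes * 0.5 ** (age_hours / half_life_hours)

def update_pheromone(tau, rho, deposit):
    """ACO-style update: evaporate a fraction rho of the trail, then add new deposit."""
    return (1.0 - rho) * tau + deposit

if __name__ == "__main__":
    # Hypothetical front page: (title, votes, age in hours)
    items = [("old favourite", 900, 48.0), ("rising story", 150, 2.0), ("fresh post", 20, 0.5)]
    ranked = sorted(items, key=lambda it: decayed_score(it[1], it[2]), reverse=True)
    for title, votes, age in ranked:
        print(f"{title:<14} votes={votes:>4} age={age:>5.1f}h score={decayed_score(votes, age):7.2f}")

    # An unreinforced trail fades: with rho=0.1 it keeps 0.9**10 of its strength after 10 steps.
    tau = 1.0
    for _ in range(10):
        tau = update_pheromone(tau, rho=0.1, deposit=0.0)
    print(f"pheromone after 10 steps without deposit: {tau:.3f}")
```

Here the two-day-old item with 900 votes (score 56.25) ranks below the two-hour-old item with 150 votes (about 133.6), showing how age alone can outweigh accumulated popularity, just as evaporation erases trails that are no longer reinforced.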

Another interesting consideration is the removal of social indicators, which promotes completely autonomous action by the users. This is private by default (think of your email account) and does not fit into notions of collective intelligence (there is no collective), although Google can still mine your inbox and serve you targeted advertisements based on the keywords it finds. A likely more effective, although ethically more dubious, case would be for Google to use your email corpus to relate you to other users like you and serve ads accordingly. Surely this is collective intelligence. This is an excellent example because it decouples autonomous user activity from the application of the collective intelligence: there is no direct social indication and no winning by the user (goal state).
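Relating you to "other users like you" is essentially user-based collaborative filtering. A minimal sketch, assuming each user's mail has already been reduced to keyword counts; the profiles, user names and the `neighbour_keywords` helper are hypothetical, and this illustrates the general technique rather than anything Google is known to do.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two keyword-count profiles."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def neighbour_keywords(target, others, k=2, top=3):
    """Suggest (ad) keywords the target has not used, weighted by similar users.

    Hypothetical helper: picks the k most similar profiles and scores their
    unseen keywords by similarity * count.
    """
    nearest = sorted(others.values(), key=lambda p: cosine(target, p), reverse=True)[:k]
    scores = Counter()
    for profile in nearest:
        sim = cosine(target, profile)
        for word, count in profile.items():
            if word not in target:
                scores[word] += sim * count
    return [word for word, _ in scores.most_common(top)]

if __name__ == "__main__":
    # Hypothetical keyword profiles mined from private inboxes.
    me = Counter({"python": 5, "ants": 3, "optimisation": 2})
    others = {
        "alice": Counter({"python": 4, "ants": 2, "swarm": 3}),
        "bob": Counter({"football": 6, "beer": 4}),
        "carol": Counter({"optimisation": 3, "python": 2, "gpu": 2}),
    }
    print(neighbour_keywords(me, others))  # ['swarm', 'gpu']
```

No user ever sees another's profile or a popularity count, yet the suggestions come from the collective, which is the decoupling described above.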

One more interesting case is that of del.icio.us, which is a flagship of selfish user action (bookmarking) with resultant emergent intelligence. The number of social indicators on an individual user profile is low, although, given the socially starved nature of the site, these few indications are used with abandon. I remember that when I was a huge del.icio.us user (I blew my account away in 2007), I would read all of the comments other users made on a URL, intently scan my URLs to see how popular they were, and scan the profiles of others with similarly obscure links. It is clear that although the vision for the site is to limit social bias in bookmarking decisions, positive feedback still (I strongly suspect) affects the average user, especially through the new and hot feeds. Even so, the effects are far smaller than on a popular social news site. The semantic information in the tags alone makes del.icio.us a very valuable (collectively intelligent) resource, irrespective of any bias in URL coverage.

The difficulty of predicting the popular reminds me of complex adaptive systems (throw away top-down reductionist methods and focus on bottom-up ones) and, more recently (and popularly), Taleb's The Black Swan (check the Long Now seminar; it is totally worth the ~1 hour investment). When engineering social systems that acknowledge cumulative advantage, one may offer the level of social indication that suits the vision of the project, as highlighted in the examples. Purely social websites are, by their very nature, bound by the whims of cumulative advantage. For example, the goal of the front page of digg is not to locate the highest-quality or most interesting news from around the web; rather, it is well designed to capture and promote whatever is popular. Any successful social news site must have the same goal, even if it is restricted to a specific niche.

1 comment:

Jason said...

Another commentary post, titled Cumulative Dis-Advantage, makes stronger connections between Watts's findings and the effects of emergence as a service.