Friday, April 25, 2008

Patterning Interestingness: Artefact, Context, and Relevance

Yesterday, I posted about my document interestingness side project and highlighted some related research and software libraries. The domain was constrained to academic research, addressing the specific question: how interesting is a given paper in the context of my research? I received a fair bit of personal correspondence, so I thought I would capture and integrate some of the ideas and related abstractions.

Firstly, it is important to highlight that the post didn't give anything of value away. As we know, ideas are worth nothing except for their inspirational quality, and only success is worth ripping off. The approach was a crude version of a 30+ year old method, and I can only assume that the specific application has already been addressed in broader academic software (perhaps bibliographic management software). Success with this approach will come from effective execution against a specific (real) user pain point, with domain-specific special sauce (technique customisation).

Anyway, I had some interest regarding the academic application, some interesting elaborations for news and feed filtering, and some suggestions for related domains like job applications. The following provides a summary of motivating questions (problems):

  • Academic:
    • How interesting is a given paper?
      • Should I read this paper?
    • From the papers referenced, what is the relative relevance of each?
      • Who may be most interested in my paper?
    • How does my paper relate to my field? (in what ways...)
      • What fields are most likely interested in my paper?
    • How can I optimize my position (for review/acceptance/examination)?
      • Who will receive my work favourably?
    • How relevant is my work for submission to a given journal/conference?
      • Will they be interested in my paper?
  • Blogs/Feeds:
    • How interesting is a post in the context of what I read?
      • Should I read this post?
    • How interesting is this feed in the context of the feeds I read?
      • Will I read posts from this feed?
    • What is the relevance of outbound links?
      • Should I click a link?
  • Resume:
    • How interesting is a given applicant?
      • Should we interview this applicant?
    • How interesting is a given job?
      • Should I apply for a given job?
I have phrased many of the suggestions in terms of questions regarding interestingness/relevance as well as domain-specific decisions. The problems cover a range of motifs including search, filtering, and recommendation, which may be abstracted into a general pattern. This general pattern involves three core concerns:
  1. An artefact to assess, which may be as small as a query or as large as a set of documents.
  2. A context to assess against, which may be as small as a document or as large as the Internet.
  3. The relevance between the two, which translates into interestingness for the user (some kind of investment, such as time to read).
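The three concerns can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the method from the earlier post: the artefact and context are plain strings, relevance is cosine similarity over raw word counts, and the names `term_counts`, `relevance`, `my_research`, and `candidates` are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens (a crude bag of words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def relevance(artefact, context):
    """Cosine similarity between an artefact and a context (a list of texts)."""
    a = term_counts(artefact)
    c = term_counts(" ".join(context))
    dot = sum(a[t] * c[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in c.values())))
    return dot / norm if norm else 0.0


# Assumption: a couple of titles stand in for the context ("my research").
my_research = [
    "term weighting for document retrieval",
    "ranking documents against a user profile",
]

# Assumption: two made-up artefacts to assess against that context.
candidates = [
    "weighting terms to rank retrieved documents",
    "sourdough bread recipes",
]

# Interestingness here is just the relevance score, highest first.
ranked = sorted(candidates, key=lambda doc: relevance(doc, my_research), reverse=True)
```

Swapping what goes into `artefact` and `context` (a query against a corpus, a post against a feed list, a resume against a job description) covers the motivating questions above without changing the scoring function.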
What was interesting for me from this experience was the flip in concerns. I was focused on the personalisation of the process: the assessment of foreign artefacts against a user model. The suggestions were free of that constraint, highlighting concerns such as modelling others in order to address a potentially unrelated user concern. For example, in the academic setting: modelling the interests of high-reputation individuals or organisations toward a user's economic payoff (attention, acceptance, etc.).

This trend may be extrapolated. For example, the equivalent in the blogosphere would be modelling the readers of your blog based on their own blogs (and related content) and suggesting articles of yours for them to read. A crazy but intriguing notion of pushing per-user recommendation onto content producers: show me (not my group) your best goods. Alternatively, you could use such information to campaign other bloggers into becoming readers/subscribers/advocates of your own work product.

The abstraction has a clear relationship with internet search (query as the artefact against the internet corpus) as well as context-sensitive advertisement (mapping posts to an ad corpus or the other way around). In the latter case, and in many of the cases listed above, classical problems of keyword spamming are not a concern, as there is no payoff in doing it. That does not mean the system cannot be gamed; it means the goal posts have moved. For example, if a system were devised on the premise of automatically relating your content to other people's content (read posts related to what you write about), you may be able to achieve a broader readership (through recommendations) by having generic content (broader but shallow keyword frequency). Similarly, in the academic domain, a conference may cultivate a general interest in its corpus to drive higher submission/attendance numbers.
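The generic-content point can be shown with a toy example. Everything here is an assumption made for illustration: reader profiles and posts are hand-written keyword counts, matching is cosine similarity, and a recommendation fires when the score reaches an arbitrary threshold of 0.4. The names `cosine`, `reach`, and `readers` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def reach(post, readers, threshold):
    """Names of readers whose profile matches the post at or above the threshold."""
    return [name for name, profile in readers.items()
            if cosine(post, profile) >= threshold]


# Assumption: four readers, each writing about a single topic.
readers = {
    "ann": Counter({"search": 5}),
    "bob": Counter({"python": 5}),
    "cat": Counter({"startups": 5}),
    "dan": Counter({"design": 5}),
}

focused_post = Counter({"search": 4})  # deep on one topic
generic_post = Counter({"search": 1, "python": 1, "startups": 1, "design": 1})  # broad, shallow

focused_reach = reach(focused_post, readers, threshold=0.4)
generic_reach = reach(generic_post, readers, threshold=0.4)
```

In this setup the focused post is recommended only to the one reader who shares its topic, while the generic post scores 0.5 against every reader and is recommended to all four. The effect depends entirely on the threshold: raise it above 0.5 and the generic post reaches nobody.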

Finally, this experience has confirmed to me that short development (hack) cycles followed by broader disclosure are a powerful force. The old me would have been satisfied with progress captured in a private Word document, and moved on to the next self-delimited cycle or idea. Cheers for the email and IM!

1 comment:

Jason said...

Cam had some additional good notions on this, specifically with regard to the deployment of the service as a Firefox plug-in:
- Automatic assessment of linked pages for interestingness (should I click this link?)
- Automatic assessment of content such as paragraphs (should I read this part?)
- Exploitation of metadata like Delicious (is this link, as defined by social metadata, interesting?)

Cheers Cam.