April 10, 2005
Standout phrases on Amazon

Amazon has an interesting new feature based on the full-text data they have because of the Search Inside feature. They show you the phrases from a particular book that are statistically improbable, i.e. standout phrases: phrases that turn up far more often in that book than in other books. This is very useful. I'm not sure it's surprisingly useful, but it's certainly useful.
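Amazon's actual method isn't described here, so the following is only a rough sketch of the general idea, under my own assumptions: phrases are two-word sequences, and a phrase "stands out" when its smoothed relative frequency in one book is much higher than in a set of background texts. The names `bigrams` and `standout_phrases` and the scoring formula are made up for illustration.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text and return its list of adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def standout_phrases(book, background, top_n=3):
    """Rank the bigrams of `book` by how much more often they occur there
    than in the `background` texts (hypothetical scoring, not Amazon's)."""
    book_counts = Counter(bigrams(book))
    bg_counts = Counter()
    for text in background:
        bg_counts.update(bigrams(text))

    book_total = sum(book_counts.values())
    bg_total = sum(bg_counts.values())
    vocab = len(set(book_counts) | set(bg_counts))

    scores = {}
    for phrase, count in book_counts.items():
        # Add-one smoothing so phrases absent from the background get a finite score.
        p_book = (count + 1) / (book_total + vocab)
        p_bg = (bg_counts[phrase] + 1) / (bg_total + vocab)
        # Weight the log ratio by the count so one-off phrases do not dominate.
        scores[phrase] = count * math.log(p_book / p_bg)

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [" ".join(p) for p in ranked[:top_n]]


if __name__ == "__main__":
    book = "the white whale sank the ship and the white whale swam away from the ship"
    background = [
        "the ship sailed away from the harbor",
        "the captain left the ship and the crew",
    ]
    print(standout_phrases(book, background, top_n=2))
```

Common phrases such as "the ship" appear in the background too, so their ratio is close to one and they drop out. Phrases repeated in the book but absent elsewhere float to the top.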
I am reminded of an IBM research paper on hierarchical Bayesian categorization which used similar ideas to obtain useful hierarchical categories of documents. Since I read that paper, I've been wondering when we would see this applied in the real world, but no search engine seems to have emerged from the IBM project.
Oddly related projects: Technorati "related" tags and, by extension, applications of Yahoo's term extraction service - this is like open sourcing the context algorithms underlying e.g. AdSense.

Posted by Claus at April 10, 2005 07:32 PM