September 20, 2003
Test before you claim

A new search engine will do keyword weighting, as reported on Yahoo News. The idea is an old one: when you're searching for "free downloads", "free" is a qualifier for "downloads" that you're not willing to live without. One search engine strategy, therefore, is to cluster words hierarchically, so that pages that match "free downloads" are favoured over pages that just list "downloads". It's only a proxy for genuine understanding of language, of course, since you really need to determine whether "free" or "downloads" is the key word of the search phrase.
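
A minimal sketch of the naive version of this idea, assuming a page is just its text and a query is a short phrase. The names `tokenize`, `score`, `rank` and the `phrase_bonus` weight are hypothetical; this is a simple phrase-matching proxy, not the new engine's actual algorithm, and it makes no attempt to decide which word is the key one.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page: str, query: str, phrase_bonus: float = 2.0) -> float:
    """Hypothetical scoring: 1 point per distinct query term found on the
    page, plus phrase_bonus if the whole query appears as a contiguous phrase."""
    words = tokenize(page)
    terms = tokenize(query)
    if not terms:
        return 0.0
    matched = sum(1 for t in set(terms) if t in words)
    n = len(terms)
    # Slide a window of len(terms) words across the page looking for the phrase.
    has_phrase = any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    return matched + (phrase_bonus if has_phrase else 0.0)


def rank(pages: dict[str, str], query: str) -> list[str]:
    # Highest score first; sorted() is stable, so ties keep insertion order.
    return sorted(pages, key=lambda name: score(pages[name], query), reverse=True)


pages = {
    "a": "Free downloads of open source software",
    "b": "Downloads page for drivers",
    "c": "Downloads are free on weekends",
}
print(rank(pages, "free downloads"))  # ['a', 'c', 'b']
```

Page "a" wins because it contains the exact phrase; "c" has both words but not together; "b" only lists "downloads". Note that nothing here knows that "downloads" is the word that matters, which is exactly the gap described above.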

Claims are being made about a new search engine capable of this, but the researchers making the claims haven't properly rehearsed what they want to say:

"Clever ranking algorithms, such as Google PageRank, are becoming misused (spammed) by techniques like 'Google bombing,'" Schaale explained. Vox Populi can help remove this kind of spamming by identifying so-called "artificial" link clusters, he explained.

To spam Google, wily Web masters create "domain clusters" that consist of hundreds of homogeneous dummy sites optimized for keywords. The word "women" might appear in thousands of dummy sites that contain pornography, for instance. A person seeking information on "women's studies" might have to wade through page after page of spam before hitting a university women's studies department.

Let's test the claim that Google is broken by searching it for "women's studies". There is absolutely no porn in sight. It's a good thing these people are researchers. If this were a commercial demonstration, everybody would now be walking away from it laughing.

