July 20, 2004
msnbot - The dumbest bot in town [UPDATE: Not actually dumbest...]

or: How a search engine in beta transformed the internet as we know it!

[UPDATE: The other bots are equally stupid]

In June a rather stupid service here on classy.dk was a surprise hit, and I have Microsoft to thank for the experience.
Some time ago I made a web page transformation engine that converts the text on pages ti lingiige liki this - i.e. it replaces every vowel with the vowel i. It is inspired by a Danish children's song in which you repeat the same verse once for each vowel, first using only a's, then only e's, and so on. As a nice (but fatal) touch, the service also rewrites hyperlinks so that they too are routed through the service, si yii cin livi iiir intiri lifi briwsing inli thi wirld widi wib.
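
For the curious, here is a rough sketch of what such a transformation could look like. It is not the actual classy.dk code: the endpoint in `SERVICE_URL`, the function names, and the choice of which letters count as vowels (just a, e, o, u here) are all assumptions made for illustration, and a real version would want a proper HTML parser rather than a regular expression.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical endpoint; the real service URL is not given in the post.
SERVICE_URL = "http://example.invalid/classylizer?url="

# Assumption: only the plain vowels a, e, o, u are mapped to i.
_VOWELS = str.maketrans("aeouAEOU", "iiiiIIII")

# Naive matcher for double-quoted href attributes only.
_HREF = re.compile(r'href="([^"]*)"')


def classylize_text(text: str) -> str:
    """Replace every vowel in the text with i (or I)."""
    return text.translate(_VOWELS)


def rewrite_links(html: str, base_url: str) -> str:
    """Point every href back at the service, with the real target as a parameter."""

    def repl(match: re.Match) -> str:
        # Resolve relative links against the page they came from.
        target = urljoin(base_url, match.group(1))
        return 'href="' + SERVICE_URL + quote(target, safe="") + '"'

    return _HREF.sub(repl, html)
```

The link rewriting is the fatal part: every transformed page links only to more transformed pages, so a crawler that follows them never finds its way out.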

All of a sudden this service had 34,000 hits - all from msnbot
All of a sudden this service had 34,000 hits in a month, making it by far the most popular link on classy.dk. So I wondered how that came about and looked at my log files. Great was my surprise when I found that almost all the hits (99%) were from msnbot, Microsoft's Google-killer-to-be. The MSN bot is the only bot that does not figure out that the URLs of the classylizer are automated dead ends. (The turnitin.com bot was also briefly blindsided, but it ceased crawling thi wirld widi wib a long time ago.)
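
A log check of this kind can be sketched in a few lines. The snippet below is an illustration, not what was actually run: it assumes an access log in the common "combined" format, where the user agent is the last double-quoted field on each line, and the function name `bot_share` and its parameters are made up for the example.

```python
import re

# Assumption: the user agent is the last double-quoted field on the line.
_LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def bot_share(lines, needle="msnbot"):
    """Return the fraction of parseable log lines whose user agent contains needle."""
    total = 0
    hits = 0
    for line in lines:
        match = _LAST_QUOTED.search(line)
        if match is None:
            continue  # skip lines that do not look like combined-format entries
        total += 1
        if needle in match.group(1).lower():
            hits += 1
    return hits / total if total else 0.0
```

Feeding it only the log lines for the service's own URLs gives a share comparable to the 99% figure above.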

It is still going on. As of this writing msnbot has crawled some 65,000 URLs transformed through my service. And boy, has it gone far! The Wayback Machine, MIT, even competitor Giigli got a visit.
Naturally I had to check if Thi Intirnit had made it into MSN search's sandbox index. It had. A lot. And then some.

Posted by Claus at July 20, 2004 01:18 AM | TrackBack (0)
Comments

Really Amazing!
You've put the bot to work ;-)

Posted by: dalager on July 20, 2004 11:08 AM