July 03, 2004
More problems should be solved like this

Here's an example of the kind of thinking required to do software well, as opposed to just doing it. As it turns out, this particular example is also becoming quite fashionable as the XML backlash (aka the "XML as programming language" backlash) continues and terms like "domain-specific languages" and "little languages" get thrown around more often.
The problem at hand is word stemming, and the solution is the Snowball language. Stemming is the act of reducing words to a common root (e.g. "words" -> "word"), so that a search query matches the different forms of the same word. More than 20 years ago Martin Porter created what became the standard algorithm for English-language stemming, now known simply as the Porter Stemmer. Over the following years a number of implementations appeared, and most of them were in fact faulty. People simply weren't capable of implementing the stemming algorithm correctly. To solve this problem once and for all, Porter designed a little language specifically suited to defining stemming algorithms. Along with the language he wrote a Snowball-to-C compiler, so that Snowball stemmers would be useful in common programming environments. The story is told in Porter's account of the creation of Snowball.
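To make the idea concrete, here is a minimal sketch in Python of step 1a of the Porter algorithm, the step that strips plural endings. It is only one small step, not the whole stemmer, and it is written in Python rather than Snowball; the function name `porter_step_1a` is my own choice for illustration.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter algorithm (plural endings).

    The rules are tried longest suffix first, and only the first
    matching rule is applied. This is a sketch of a single step,
    not a complete stemmer.
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress (left alone)
    if word.endswith("s"):
        return word[:-1]   # words -> word
    return word


for w in ("words", "caresses", "ponies", "caress"):
    print(w, "->", porter_step_1a(w))
```

Even in this tiny step the order of the tests matters: checking for a plain "s" before "ss" would turn "caress" into "cares". The full algorithm has many more rules with conditions attached, which gives some idea of why so many hand-written implementations went wrong.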
Since Snowball appeared, stemmers for 11 additional languages have been submitted to the project. The brevity of the Snowball stemming algorithms is testament to the usefulness of this particular little language, and the page that derives the Snowball implementation of the Porter stemmer from the original algorithm is good evidence as well.

So what has this got to do with how software should be done as opposed to how it is done? Simply this: even relatively small, self-contained problems like word stemming take an enormous effort to do correctly. And note: by "correctly" I don't even mean "perfectly", since no algorithmic word stemmer is perfect; I just mean "as intended by design". Only a very limited part of all software is written with that level of attention to detail, or with enough upfront design to give it a decent chance of success.
It also demonstrates quite precisely the promise of dynamic, extensible languages: good extensible languages afford the construction of little languages for specific tasks within their own programming environment, and little languages afford a clarity of implementation you can't get without them.
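As a rough illustration of a little language living inside a host language, here is a sketch in Python. The rule notation ("suffix -> replacement", one rule per line) is something I made up for this example and is not Snowball syntax; `RULES`, `parse_rules` and `apply_rules` are likewise hypothetical names. The rules themselves are the plural-stripping rules from step 1a of the Porter algorithm.

```python
# A made-up rule notation (NOT Snowball syntax): "suffix -> replacement".
# Rules are tried top to bottom and the first match wins.
RULES = """
sses -> ss
ies  -> i
ss   -> ss
s    ->
"""


def parse_rules(text):
    """Turn the rule text into a list of (suffix, replacement) pairs."""
    rules = []
    for line in text.strip().splitlines():
        suffix, _, replacement = line.partition("->")
        rules.append((suffix.strip(), replacement.strip()))
    return rules


def apply_rules(word, rules):
    """Replace the first matching suffix; leave the word alone otherwise."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


rules = parse_rules(RULES)
for w in ("words", "ponies", "caress"):
    print(w, "->", apply_rules(w, rules))
```

The point is not the dozen lines of interpreter but the rule table: it reads almost like the published description of the algorithm, so a mistake in it is much easier to spot than one buried in a chain of conditionals.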

Posted by Claus at July 03, 2004 02:45 PM