June 13, 2003
Not every idea deserves the name invention. Not every reflection the name analysis.

The web is infested with hyperbole. The dotcom bubble was, of course, hyperbole through and through: nothing but me-too companies and ridiculous 'bleeding-edge' e-business portals, platforms and technologies. But just because we're in a downturn doesn't mean the hyperbole is gone. In particular, the celebrity bloggers have gone into much too high a gear when it comes to evaluating the quality of blog journalism and the depth of blogging technology. It's as if the act of publishing on the web turns nice ideas into Deep Intellectual Property.
A case in point: Blog Post Analysis. The simple insight: most blogs display multiple unrelated posts on their main page (that is sorta the concept), so we have to extract individual posts, either from RSS permalinks or through some kind of parsing heuristic. Nice idea. It is even well executed. What is annoying about this simple concept, and the simple tools built to support it, is that it then gets turned into Blog Post Analysis (tm probably pending), aka 'BPA technology'. Give us a break, would you?
The auto RSS maker is very nicely done, but 'BPA Technology'?
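The RSS route really is the easy half of the idea. As a minimal sketch (in Python rather than Perl, and assuming a plain RSS 2.0 feed without XML namespaces), pulling each post's title and permalink out of a feed takes a handful of lines. The function name `permalinks_from_rss` is hypothetical and not part of any BPA tool.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper; not the API of any existing BPA tool.
def permalinks_from_rss(rss_text: str) -> list[tuple[str, str]]:
    """Return (title, permalink) pairs for every <item> in an RSS 2.0 feed.

    Assumes un-namespaced RSS 2.0; RSS 1.0 (RDF) and Atom put their
    elements in XML namespaces and would need namespace-aware lookups.
    """
    root = ET.fromstring(rss_text)
    posts = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        # Prefer <link>; fall back to <guid>, which often holds the permalink.
        link = (item.findtext("link") or item.findtext("guid") or "").strip()
        posts.append((title, link))
    return posts


if __name__ == "__main__":
    feed = """<rss version="2.0"><channel><title>Example</title>
      <item><title>First post</title><link>http://example.com/a.html</link></item>
      <item><title>Second post</title><guid>http://example.com/b.html</guid></item>
    </channel></rss>"""
    for title, link in permalinks_from_rss(feed):
        print(title, "->", link)
```

Once the permalinks are known, each post can be fetched on its own page, which sidesteps the front-page parsing problem entirely for blogs that publish a feed.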

To check how well done the script was, I wrote a simple version myself using Perl's standard toolkit for the job:
Classy's nonfunctional three-hour Blog Post Analyser. Source of said analyser. Some text gets lost and there are plenty of other kinks.
It does reasonably well on my own weblog. A previous version did much better on Doc Searls' weblog than the current one. Somewhere along the way my heuristic for associating text with links went haywire.
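For comparison, here is one way such a text-to-link heuristic could look. This is a hypothetical sketch in Python, not the Perl analyser linked above: it assumes every post ends with a permalink whose URL matches a pattern like `/archives/000123.html` (the `PERMALINK_RE` pattern, `PostSplitter` class and `split_posts` function are illustrative names), and it assigns all text seen since the previous permalink to the next one.

```python
import re
from html.parser import HTMLParser

# Hypothetical permalink pattern: Movable Type-style archive pages such as
# ".../archives/000123.html", or in-page anchors such as "#p123".
PERMALINK_RE = re.compile(r"/archives/\d+\.html$|#p\d+$")


class PostSplitter(HTMLParser):
    """Split a blog front page into posts, ending a post at each permalink."""

    def __init__(self, permalink_re=PERMALINK_RE):
        super().__init__()
        self.permalink_re = permalink_re
        self.buffer = []            # text fragments since the last permalink
        self.in_permalink = False   # currently inside a permalink's <a>...</a>
        self.posts = []             # collected (permalink, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if self.permalink_re.search(href):
            # Everything buffered so far belongs to this permalink's post.
            text = " ".join(" ".join(self.buffer).split())
            if text:
                self.posts.append((href, text))
            self.buffer = []
            self.in_permalink = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_permalink = False

    def handle_data(self, data):
        # The permalink's own label ("Permalink", a timestamp...) is skipped.
        if not self.in_permalink:
            self.buffer.append(data)


def split_posts(html: str) -> list[tuple[str, str]]:
    """Return (permalink, text) pairs; text after the last permalink is dropped."""
    splitter = PostSplitter()
    splitter.feed(html)
    splitter.close()
    return splitter.posts
```

Text after the last permalink (sidebars, footers) is discarded, and a blog whose permalinks follow some other URL scheme yields nothing until the pattern is adjusted. Layout differences like that are one of the places where a heuristic of this kind can quietly go wrong from one weblog to the next.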

The next step is to break it up: first write one or two parsers that condense web pages into very simplified XML markup, preserving only the tree structure of the original page.
This should then be processed not by a Perl script but by an XSL transform.
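A minimal sketch of the condensing step, again in Python with only the standard library and under stated assumptions: the hypothetical `Condenser` parser keeps element names, nesting, text and the `href` on links, and throws away every other attribute, all void elements such as `<br>` and `<img>`, and the contents of `script` and `style`. The output is plain XML that an XSL stylesheet could then work on. The XSL stage itself is not shown, since Python's standard library has no XSLT processor (the third-party lxml package provides one as `lxml.etree.XSLT`).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML void elements never have an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}
SKIP_TEXT = {"script", "style"}  # elements whose text content is discarded


class Condenser(HTMLParser):
    """Hypothetical parser: rebuild HTML as bare XML (tags, nesting, text, hrefs)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("page")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return  # dropped entirely in this sketch
        href = dict(attrs).get("href")
        attrib = {"href": href} if tag == "a" and href else {}
        self.stack.append(ET.SubElement(self.stack[-1], tag, attrib))

    def handle_endtag(self, tag):
        # Close the innermost open element with this name (and anything
        # left open inside it); stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        text = " ".join(data.split())  # collapse whitespace runs
        if not text or parent.tag in SKIP_TEXT:
            return
        if len(parent):  # text after a child element goes in that child's tail
            last = parent[-1]
            last.tail = f"{last.tail} {text}" if last.tail else text
        else:
            parent.text = f"{parent.text} {text}" if parent.text else text


def condense(html: str) -> str:
    """Return the condensed XML for an HTML document as a string."""
    parser = Condenser()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")
```

Because whitespace is collapsed, spacing around inline tags is not preserved exactly, and sloppy HTML with unclosed tags ends up nested more deeply than its author intended; both are tolerable if the goal is only a clean tree for the XSL stage to walk.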

The direct opposite of this overstating of results and ideas may be found in the world of mathematics that I was trained in. There, the goal is to present much while saying very little, so results are more likely to be understated than overstated (at least in work by good mathematicians).

Posted by Claus at June 13, 2003 10:16 PM