Today I was blogspammed. That is, my Movable Type blog was hit by a Movable Type capable spambot. Nothing major - in fact rather cleverly done. It seems a Google search for 'ass' led to the references to Cory Doctorow's praise of Google on oreilly.com. In response, a couple of nondescript messages from 'holly' and 'hannah' were left (one an 'Eliza' style reworking of my blog entry title, one mildly sexual), both by authors identified by anonymous-sounding addresses in Australia.
Since I posted this story in May 2002, one has to bet that these references were crawled, not found by hand. Hence I think I was hit by an MT posting bot of some kind. A nice trick, really.
Just attended a good performance of Sjostakovitj's 4th symphony with the Danish Radio Symphony Orchestra. A tough but rewarding hour of uncompromising modern music. Leaving the concert hall, one can't help but admire the intensity of this music and the enormous scale of Sjostakovitj's ambition in writing it. Of course the world still sees huge new achievements, and several schools of composition have formed and faltered between the 30's when this was written and today, but that kind of intensity in a composer and his work is rare and may just be impossible now.
My workstation here at home has been upgraded to a supermarket bargain buy top of the line machine.
What a difference. The machine is less noisy and more capable than the one I had before - and to boot equipped with interfaces to everything. Drawback: Windows XP. I am a notorious late adopter, and I have to say I really hate what they've done with Windows. They've dumbed it down. And in addition the windows themselves (the standard window frames, the titlebar and the icons) now take up more screen real estate than before (or maybe I just need to find a skin with the right terseness).
To the extent that I can reclaim win2k look and feel (which was an improvement and simplification in most respects on NT 4) I'm doing ok - but a little bothered I am....
I have to take issue with my esteemed colleague Just's opinion on the new design of berlingske.dk. Just likes it. I don't. My parameters for judgement: 1. No interesting news in the content. 2. The new site is S*L*O*W compared to the old one. And due to the way it's done, the page hasn't loaded properly until ALL the crap is loaded, so you can't start browsing the news until you have waited for a very long time.
I haven't seen such a poor relaunch in terms of performance since krak.dk decided to destroy their website a while back by making it a ten-click instead of a 1-click operation to locate a map to some location. Among other much used but very sucking services: TDC's phone directory. First of all - even when you're explicitly looking for the directory service you are redirected to the yellow pages. THAT SUCKS VERY MUCH! It's just about as annoying as spam. Second of all, the page has the same problems as berlingske.dk. The embedded script necessary to navigate from the yellow pages to the directory listing fails to load if you don't wait for all possible crap to load, and again you have to wait a lot. Sucks, sucks, sucks, sucks, sucks!
BUT, I know that all you graphics ponies like the information I don't use, conveyed in the 'nice look' of these sites, and that you take issue with Jakob Nielsen's strict 'the text is only the words on the page' view of webpages. You all feel that the look conveys information as well, and of course you have a point (heck, even Google is using the rendered form of hyperlinks to enhance its page ranking algorithms, increasing the score for boldfaced links that stand out on the page). It's just that there is absolutely no way anything but the name and address entry fields of the phone directory is helping me find that phone number in any way, shape or form. That particular action is information redux - a pure memory prosthetic. Just the digits, please.
Google - as we all know - gets the idea of the memory prosthetic. Speed and simplicity of application are of the essence when Google is used to look up a lot of stuff we actually already know. Since Google runs so fast, it is a viable replacement for keeping your own favorite links around; in some cases it is even a viable alternative to DNS lookups. The operative concept here is Michael Polanyi's tacit knowledge. You really don't want to spend time thinking about how you recover information. Latency in recovery completely kills the value of the information, since information is hardly ever the end but always the means towards an end, and if you have high latency along some information recovery path you're just not going to use that path.
The end of BookWait settled an old score. Even though they are largely sponsored by the dark forces of Seattle, the DevelopMentor XML reference is just what I wanted. Paperback sized, comprehensive, up to date (not the usual collection of DTDs that ought to be retired from all the books on XML) and it even covers Java (by court order, perhaps?). It manages to make it from tags to SOAP (sadly no WSDL) in 350 pages. No mean feat when there are even plenty of examples and room to cover SAX and DOM as well.
Classy.dk - the finest unread news source I write - is of course not alien to the requirements of the media. They are unavoidable. Scans of my referrer logs reveal that the strangest porn searches - searches for which I crop up as the umpteenth possible link - are still generating hits. And of course the default style sheet of Movable Type looks positively ugly when the text doesn't flow ad infinitum down the screen. That must be the blog version of dead air.
I think I've posted this before - but let me just briefly remind you that the sculptor Hein Heinsen has developed a beautiful response to the constantly escalating attention-grabbing nature of the culture of ideas, namely to produce only very few of them. In fact he tries to avoid ideas entirely - making only hard to define, constantly shifting sculptures meant to be simple being as opposed to being about. A beautiful sentiment - even if it is a complete failure inasmuch as he is doing this in a museum as a proxy for actual being outside the museum, and in that respect merely being about being instead of being being (all with me so far?)
As a darwinistic strategy for the attention or idea economy (same thing) that clearly sucks - which of course is why we don't see it much in the media. The people getting your attention are most definitely trying to do so.
This observation about the media is also the first warning against believing in the 1-1 society promised to us by the cluetrain people. Attention and networks constructed from it just don't work that way.
An indication of how skewed the world of attention is can be found in the aforementioned referrer logs. Basically the demand curve for information is so incredibly skewed that I get as many hits from being at the absolute periphery of some of the big ideas (like porn) as I get from being at the center of whatever classy.dk is really about.
In fact it might be worthwhile to write down the math of search according to some decent model of site popularity distribution and search term occurrence statistics. If the demand curve for information were flat, I would get all my hits from searches where my site was basically the single most important match (as evidenced by the surrender monkeys incident). If the curve were skewed excessively compared to the 'supply curve' of page indexes, then mismatches from failed searches should outrank the relevant matches.
The end result is probably a slight skew towards relevancy, indicating that the web is more polarized than the demand statistic.
This seat of the pants math may be all wrong (it's late) but it should be good fun to examine this in more detail.
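A toy version of that seat of the pants math might look like the sketch below. Everything in it is an assumption made for illustration: demand for search terms follows a Zipf-style power law, the share of clicks falls off with the square of the result position, and the function names and numbers are invented, not measured from any log.

```python
def zipf_demand(term_rank, exponent):
    """Relative search volume of the term_rank-th most popular query.

    exponent = 0 gives a flat demand curve; larger values skew it.
    """
    return 1.0 / term_rank ** exponent


def click_share(result_position):
    """Assumed share of searchers clicking the result at this position."""
    return 1.0 / result_position ** 2


def expected_hits(term_rank, result_position, exponent):
    """Relative hits from one query: demand times click share."""
    return zipf_demand(term_rank, exponent) * click_share(result_position)


for exponent in (0.0, 1.0, 2.0):
    # A niche term (the 10,000th most popular) where the site is the top match.
    relevant = expected_hits(10_000, 1, exponent)
    # The single most popular term, where the site is the 50th result.
    peripheral = expected_hits(1, 50, exponent)
    print(exponent, relevant, peripheral)
```

With a flat demand curve the relevant match dominates. As the exponent grows, the peripheral mismatch takes over, which is the qualitative behaviour guessed at above.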
If you were wondering about the legitimacy of the news-coverage of the war in Iraq you need look no further than the banner ads for CNN's war tracker that I found here. A convenient little desktop application giving you access to the latest and greatest in blood and mayhem - and served to you with the light hearted slogan "Click! It's quick!". I'm sure that slogan was also in the hopeful minds of President Bush and his staff when they pushed the button and decided to go to war, but really... Even trashy news sources like CNN should be able to do better than 'a convenient desktop tracker' in covering this very real and deadly crisis.
Amazon.co.uk ordering has gotten a lot worse lately. Clearly somebody in the logistics supply chain has decided to stretch their contractual obligations to the limit by delivering as slowly as possible. It used to be the case that if you ordered books that could be dispatched the same day or the day after, you could have them 3 days after ordering if you were lucky. That number has gone up to 5 - and sometimes you wait a full week. That sucks! Part of it seems to be Amazon buying a less expensive postal service. The published estimates are of the 5-7 day variety. Another part of it is crappy local postal service (I live in the second worst postal district in Denmark wrt. delivery quality), but the end result is that Amazon is simply not as appealing a place to buy books as it used to be. A 3 day wait is a breeze, but avoiding a week-long wait is almost worth the higher prices in Danish bookstores.
UPDATE : We're now on day 8 of BookWait - this clearly sucks!
UPDATE 20030321 : BookWait day 9...finally brings some books!
A study has been made on the influence of spell checkers on writing quality. The result - a little unsurprising I think - is that using spelling and grammar checking can actually impede performance.
The way this probably works is that you change your work mode from a 'creative' mode to a rules-based fact checking mode. This has two problems: First of all, the spell checking software is far from perfect. Generally speaking the checker will not catch all errors, so the rules you're checking against to see if you're done are incomplete.
Second, you rarely want to be in a rules-based mode during writing. Working rules-based (with present day technology at least) invariably means that you're working from closed-world, severely bounded models of the problem - i.e. the proverbial hammer that nails your English to the floor. A new book that's coming out just now makes the same point as the experiment, only about child creativity and interactive computer games as opposed to old-fashioned creative self-made games.
Writing is more of a search problem in that respect. You're scanning your memory and the situation for appropriate phrases to apply when continuing the text. While I am an AI optimist, computers are unlikely to sensibly support that process in any known near term future.
p.s. I know that classy.dk is a living counterexample to the experiment discussed. Browsers are possibly the worst text entry interface possible.
The bogus use of 'Italian' also got me wondering, by the way, why food in particular is marked by systematic lying in product descriptions.
All products are sold - and all products are also sold by adding (or simply inventing) values that are not necessarily attached to the product. It is just done in very different ways: With technical goods (cars, computers, mobile phones) the trick is usually to sell very hard on a few technical specs that can be made to look favorable, or simply to sell lifestyle. Fashion goods and cultural goods (books, films, etc.) are the cultural goods they are, and hence lifestyle.
But with something as basic as food, the most garbled ideas suddenly arise about what the goods can do. On closer reflection, this is really something that happens to all the goods in our most immediate everyday life. Laundry detergent commercials are notorious. And therefore I will hasten to formulate Classy's law of inverse exaggeration in advertising:
The simpler, more familiar and more ordinary a product is, the more humbug and nonsense its advertising will be based on. The fewer exciting surprises a purchase can give you, the more the advance publicity will guarantee.
It is as if there is some requirement that the information content of our everyday goods must be constant, so when there is really nothing more to say about a product than that it is detergent or milk, one has to make something up. And if one thing is certain, it is that your world does not change with a change of toothpaste, shampoo or detergent.
Today I bought an "italiensk grov ciabatta" (Italian coarse ciabatta) at the baker's, a piece of matchless product nonsense that was exactly as devoid of quality as the good old white loaf with poppy seeds is at an ordinary baker's: A crumb like cotton wool and a crust like crumbs. With a garbled name like that one should of course expect no more, and I didn't, but it reminded me how peculiar it is that 'Italian' has slipped into the language alongside 'organic' as a completely meaningless plus-word you can put in front of anything - including bad bread from Vesterbrogade.
It seems extraordinarily peculiar, because the meaning of the word ought to be unmistakable and, besides, very specific.
One can just about understand that 'organic' gets watered down to cover arbitrary industrial products where precisely the part of production that involves animals or plants in 'Agriculture' is treated in a special magical way that may have an environmental effect - even though the good taste, which is the only thing the organic treatment can demonstrably give your carrots or roast beef, is diluted enough to be imperceptible by the time the product ends up as "organic Kærgården" or "organic bread from Schulstad".
Calling something homemade - oh well, that is at least a well-established lie about industrial products. And that a camembert can be from Høng, well, that is at least a type designation.
But Italian - that means nothing other than that the bread, in the producer's opinion, is in contact with the very gods of food. In other words, nothing whatsoever.
Incidentally, the actual Italian type designations, especially for bread, are also severely threatened. At Rådhuskonditoret in Ribe (once a good baker), where I shopped two days ago, you could get ciabatta rolls that were unpleasantly reminiscent of an old-fashioned Danish sigtebrød, and as mentioned the miserable bread I bought today also carried this grossly abused type designation.
The latest Wired - a 10th anniversary issue - seems to be overflowing with old-style futurism. They're even bringing out that chestnut: Underground cities.
Having lived near Toronto I can testify to the fact (mentioned in the article) that this is already a reality there, which makes a lot of sense when it is unbearably cold on the upside.
There's only one little problem: Sunlight is still a much, much better source of light than any artificial source (in fact, here in Denmark it is illegal for a company to put their employees in offices without access to natural light) so either we have to dispense with that particular quality of life or somebody has to develop a good full-spectrum light source or these cities underground must be built with some kind of sunlight distribution system. Think parabolic ray collectors and fiber optic conduits - or maybe a collector up top and then a shaft with almost transparent mirrors reflecting low percentages of the intense light collected at every sub surface living layer it passes.
Another interesting thing about this is that real estate will become a three-dimensional, not a two-dimensional property. As far as I understand in Tokyo this is already the case. Due to the inflated prices of the eighties, legislation was passed limiting the right of property to some level underground, so that subways and other utilities could still be built during the boom.
The other important bit of futurism in today's Wired News is the old dream of radio active physical objects. I'm not talking plutonium enriched milk, but rather products equipped with radio transmitters for identification. Ever since people first thought of the intelligent refrigerator (able to say "I'm out of cheese" to the supermarket service on its own) this has been a dream of many including Yours Truly - and of course an object of ridicule for even more people.
The first large scale application in the everyday world (i.e. outside industrial applications) is set to be Benetton clothing tags with RFID chips.
What's not so good about that is that the proximity of the clothes to your body means that they are effectively equipping you with a Radio Frequency ID chip - and that means that your Benetton store could soon be that annoying hyper personalized world that John Anderton tries to escape from in Minority Report.
However, until the IT department at Benetton screws up and publishes their customer database, only Benetton will be able to do that, since the tag only identifies the garment. Indeed, if well done it could simply provide opaque serial numbers and rely on separate services to identify the specific garment a serial number refers to.
I think I'll coin a phrase to describe reactions to that experience : Identification Allergy.
This could soon be a very real and unpleasant experience (as failures in CRM systems and direct marketing have already made us intensely aware).
But of course there are other problems to ponder first: What do washing instructions for silicon look like?
Wired news is awash with old style futurism (some of it near term) today. First off is an article from Wired 11.04: How Hydrogen Can Save America. This is an outline of how fuel cells could obsolete fossil fuels (think: Mass poverty on the Arabian Peninsula. No war with Iraq). There are some terrible costs of deployment of fuel-cell energy to replace the current fossil-energy infrastructure, but the prices are coming down, and the technology is (as promised so many times) almost there.
The article hasn't got much news. In fact I think the most significant news here is that the mere possibility of this technology has made ultra-green politicians in the world's most technologically advanced state (that's California) force the major automakers to actually invest in this technology if they want a slice of this very rich market.
At long last there's a new perl Apocalypse out. What a wait. But what an Apocalypse! It is long (64 page printout) and dense as you would expect, but it outlines some of the most needed features for programming perl in the large, which is exactly what perl needs, since programming in the large is definitely the thing that is most difficult in perl. There's a solid type system (optional of course), function prototypes (optional of course) and a general consolidation of the model of ALL structure, which at the same time cleans up the many ways in which control can flow and then promptly fills up the conceptual space made available by the cleanup with yet more arcana.
I stand by earlier statements that the redesign is doing more harm than good by the enormous addition of features, but clearly there are vast improvements also.
The syntax in this apocalypse is presented using spiffy new Apocalypse 5 regexes, and they are beautiful and very readable. The new type system and prototypes look very promising indeed.
Also, a programming style replacing line noise with method calls on builtin abstract data types and a few universal operators is emerging in the examples, which is very welcome. I find the many discussions on super powerful operators sad and completely beside the point when the point isn't writing one page programs doing amazing stuff.
In short, there is hope still for a simple to use but extremely powerful new perl. The pace of development is highly unpredictable, however. This Apocalypse has been 9 months in the making. From mailing lists I gather there have been plenty of external problems delaying it, and the subject of course promises to be the second to last whopping big one (the last one being objects). With a little luck we're halfway there, so we'll have perl6 around 2005.
There's a separate problem with the runtime though. It seems to be bogged down in some pretty arcane discussions without solid use-case discussions, but I'm just a bystander in the process, so I might just be missing some solid thinking.
The story about Reggae on Ice - the Siberian, motor-racing adventures of Jamaican reggae-star Lenky Roy - reads like a William Gibson novel. In fact it sounds very much like a side plot removed from Gibson's latest novel, Pattern Recognition, and the whole reggae thing is of course to be found in Neuromancer.
How odd that Lenky Roy's marketing people should try to emulate Gibson's sense of story, environment and characters, at the very same time that Gibson is trying to transcend the boundaries of fiction by emulating reality with greater and greater precision.
Oh goody, the tried and true, 'fast but inflexible' idea of using just serialized objects instead of relational datastores has a new name: Object Prevalence. To be fair, there's a consistent API to define the way the serialization is used, but it IS just serialization.
The tradeoffs are the usual ones: No cross object queries. No real transaction support (well, you can use a transactional datastore that is non-relational, i.e. what people do using Berkeley DB as an object store from perl via one of the many 'persist to hash' modules on CPAN). And the two tier nature of this way of hacking storage means that it is good and fast if you don't need to scale it or query it in as yet unthought of ways.
What WOULD be nice would be an upgrade path to relational data and replicated objects and transactions - but wait, that's what EJB's are supposed to do, isn't it?
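For the curious, the 'it IS just serialization' point can be made concrete with a minimal sketch. This is not the API of any actual prevalence library. The `Bank` and `PrevalentStore` names are hypothetical, and the design (keep the live object in RAM, append every command to a journal with `pickle`, replay the journal on startup) is only my assumption about the general shape of the idea.

```python
import os
import pickle
import tempfile


class Bank:
    """Plain in-memory business object (hypothetical example domain)."""

    def __init__(self):
        self.balances = {}

    def deposit(self, account, amount):
        self.balances[account] = self.balances.get(account, 0) + amount


class PrevalentStore:
    """Keeps the object in RAM and journals every command to disk."""

    def __init__(self, factory, journal_path):
        self.journal_path = journal_path
        self.system = factory()
        # Recover state by replaying every journaled command in order.
        if os.path.exists(journal_path):
            with open(journal_path, "rb") as journal:
                while True:
                    try:
                        method, args = pickle.load(journal)
                    except EOFError:
                        break
                    getattr(self.system, method)(*args)

    def execute(self, method, *args):
        # Serialize the command first, then apply it to the live object.
        with open(self.journal_path, "ab") as journal:
            pickle.dump((method, args), journal)
        return getattr(self.system, method)(*args)


journal_path = os.path.join(tempfile.mkdtemp(), "journal.bin")
store = PrevalentStore(Bank, journal_path)
store.execute("deposit", "alice", 10)
store.execute("deposit", "alice", 5)

# Simulate a restart: a fresh store rebuilds its state from the journal.
restored = PrevalentStore(Bank, journal_path)
print(restored.system.balances)
```

Note what is missing: there is no query language and no multi-object transaction, only whatever methods the in-memory object happens to offer. That is the tradeoff in a nutshell.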
The position of the Danish association of newspapers suing Newsbooster is ridiculous for so many reasons. I just realised a new one. According to "Newsbooster - the case seen by an eye witness" the Newsbooster people are even nice enough to respect the Robots Exclusion Protocol. That is, if the newspapers had bothered to ask - and implement a robots.txt file - they would not get picked up by Newsbooster. It is extremely sad that the courts failed to take this into account.
That means that the case is not only about fair use, but also about whether or not the newspapers have an obligation to make a clear statement of what use they consider fair.
The robots exclusion standard, while developed as a purely technical feature to limit the resource use by robots, really covers some of the same ground as e.g. Creative Commons licenses.
Wouldn't everyone then start to publish very restrictive licenses? Maybe - but the commons would have the ability to fight back by using the published, machine-readable license (e.g. dropping content from search engines), so the restrictive license would tend to get in the way of the purpose of publishing in the first place.
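To show how little it takes, here is a small sketch of the robots.txt mechanism using Python's standard `urllib.robotparser` module. The robots.txt content, the user agent token "Newsbooster" and the newspaper.example URL are all assumptions made up for the example. I have no idea what token the real crawler announces.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a newspaper could publish. The agent name
# "Newsbooster" is an assumption, not the crawler's documented token.
ROBOTS_TXT = """\
User-agent: Newsbooster
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ARTICLE = "http://newspaper.example/news/article.html"
print(may_fetch("Newsbooster", ARTICLE))   # the named robot
print(may_fetch("SomeOtherBot", ARTICLE))  # everybody else
```

Four lines of text in a file on the server, and a well-behaved robot stays away while everybody else is still welcome.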
It's ironic that the Google search page for spam is one of the most junk-filled search pages I've seen on Google. Who's messing it up: Spam busters of course....
February was another blowout month on classy.dk with 40% traffic growth. This time the growth was caused by one very specific reference to a joke from "The Simpsons" about the French. The phrase in question was hard to find on the net and therefore my lowly weblog turned up as number 4 on Google for this high profile search term. That single weblog entry accounted for almost 20% of all hits on classy.dk.
What is interesting about this is first of all Google's efficiency, even for rarely changing, low-rank sites like mine. 4 days after I posted the entry, the traffic doubled because of this log entry.
It would appear that a viable Google marketing strategy is very specific multi-word text which uses infrequent words. I.e. if you can find a phrase, some meme that is flying around currently, then you can harness that as a cheap traffic boosting scam by publishing the exact phrase. It is more difficult to get a good ranking for very general searches or single word importance - but for very specific searches you can get lucky. This is truly targeted advertising.
It is not surprising that it works of course and taking advantage of the effect doesn't even have to be malicious, in fact it can be hard to be malicious about it. You have to define a very specific audience that you're targeting and if you're being specific enough people might not even mind, nor consider what you're doing advertising. On the other hand, one can easily envision a new kind of link farm generating entries for search terms crawled automatically from Google Zeitgeist, like e.g. Norah Jones 'Come Away With Me'.
It would be interesting to study how possible this is considering the distribution of PageRank on the internet. The simpler your search term, the more popular it will be. But the simple term will also mean that the subset of URLs meeting the search criteria grows, and this of course means that the PageRank of the most popular URL in the subset is higher.
Incidentally, from the case of Norah Jones - one of the top 5 searches for the week ending Feb 25 - one can see that Google is ordering by, but not displaying on the Google toolbar (which I use), alternatives to the pure PageRank of the URLs displayed. As an example the Amazon.com page for Norah Jones' album is listed in the top ten. Obviously a good choice - but of course that particular Amazon search URL for that particular album doesn't have a significant PageRank (it is 2).
I'm not sure if a simple page equivalence based on similarity of content fixes that or if it takes something more sophisticated than that to correct the Rank for this effect, but it might just be really simple to do: If you do a site specific search on Norah Jones you get approximately 23000 Amazon pages. The top twenty on that list are different URLs showing essentially the same data - either a Norah Jones search page or the product page for her debut album.
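As a rough illustration of what 'page equivalence based on similarity of content' could mean, here is a sketch using Jaccard similarity over word shingles. This is a textbook near-duplicate technique that I am swapping in for the sake of the example, not a description of what Google actually does. The page texts are made up.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items over all items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(text_a, text_b, size=3):
    """Content similarity of two page texts, between 0.0 and 1.0."""
    return jaccard(shingles(text_a, size), shingles(text_b, size))


# Made-up page texts standing in for two URLs showing the same product.
page_a = "norah jones come away with me cd"
page_b = "norah jones come away with me album"
page_c = "david hasselhoff greatest hits of all time"

print(similarity(page_a, page_b))
print(similarity(page_a, page_c))
```

Pages scoring above some threshold could then be treated as one page when their rank is added up. Picking that threshold is where the 'something more sophisticated' might come in.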
I first have to say very clearly that I was not looking for this title, but you too should check out Amazon's page for David Hasselhoff's Best-of CD. A fan cult has sprung up in praise of this masterpiece of trash. The album has over 500 reviews at Amazon and they universally give it a 5-star rating (Did I miss the email campaign to vote?). And to top it off, the most ingenious of the reviewers even added hilarious recommendations for books to get instead of La Hasselhoff:
German for Singers: A Textbook of Diction and Phonetics , and
Sodomy and the Pirate Tradition: English Sea Rovers in the Seventeenth-Century Caribbean
Who said Amazon book reviews are unusable...
The campaign is ongoing - the CD is set to break 600 reviews (since the first edition of this post approx. 60 reviews have been added).