Notes from Classy's Kitchen: Text, Reading, Interpretation Archives

July 18, 2011

Information is different

It's a chestnut of interface critique: Embodiment is good, the concrete beats the abstract, nobody reads online. It drives interfaces towards the tangible, and I'll be the first to agree that good physical design (and design that *feels* physical) is pleasurable and restful on the mind.
None of these facts are, however, easy to reconcile with the fact that every day 15% of the queries seen by Google are queries Google has never seen before. Put differently, the information space Google presents to the world grows by 15% every day. Imagine a startup experiencing this kind of uptake. You'd consider yourself very lucky - even if a lot of those 15% will be spelling mistakes etc.
The 15% number sounds staggering, but it's when you compound it a little that it becomes truly mind-blowing - and in fact hard to believe entirely: 15% daily discovery means that in a month, the entire current history of Google searches fades to about 1% of all queries seen. Obviously this isn't a description of typical use, but it is a description of use, nonetheless. This is complete rubbish and I'm embarrassed to have written it; read on below.

Now, try to imagine building a physical interface where all the uses it has been put to, since the beginning of time, fade to 1% in a month. That's very hard to do. The thing is that thinking is different, language is different, information is different. The concrete approach breaks down when confronted with the full power of language.

This is also why we'll always have command lines.

COMPLETE RUBBISH ALERT

So, above I make a really embarrassing Probability 101 error when I try to compound the "every day we see 15% new queries" statistic. This isn't a coin toss, but something else entirely. Chances are that "every day we see 15% new queries" compounds on a monthly basis to .... 15% new queries. To see why, I'm going to make a contrived draw of numbers that matches the "every day we see 15% new queries" statistic.

Let's suppose we wanted to produce a string of numbers, 100 every day, so that we can say that "every day we see 15 numbers we haven't seen before". The easiest way to do that is to just start counting from 1, so the first day we see the numbers 1..100. Of course on the first day we can't match the statistic, since we haven't seen any numbers before.
On the second day however we draw 85 times from the numbers we have already seen - we just run the numbers 1..85 - and for the remaining 15 we continue counting where we left off on day 1, so on day 2 we would have the numbers 1..85,101..115. On day 3 we run 1..85,116..130 and so on.
This way, it's still true that "every day we see 15 numbers we haven't seen before", but at the end of the first month (30 days) you will have seen 100+29*15 = 535 numbers in total.
In month 2 (let's say that's 30 days also) we change things a little. Instead of just running through 1..85 we continue upwards until we have cycled through all the numbers we saw in month 1. There were 535 of those, so that'll only take 7 days. Doing this, you'll see 30*15 = 450 new numbers and 535 old ones, so 450/985 = 46% of all the numbers you see in month 2 are numbers you've never seen before.
In month 3 (still 30 days) we do the same thing as we did in month 2, but this time there are 535+450 = 985 old ones, so the 450 new ones only amount to 450/1435 = 31% of all the numbers we see in month 3.
We continue like this. The most already-seen numbers we have time to run through, doing 85 a day for 30 days, is 30*85 = 2550, and we'll still have 30*15 = 450 new ones, so lo and behold, when we continue this process we end up seeing 15*30/(15*30+85*30) = 15*30/((15+85)*30) = 15/100 = 15% numbers we have never seen before.
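
The contrived draw above is easy to check by simulation. What follows is a minimal Python sketch, not a model of real query traffic: the name `monthly_new_share` and its parameters are made up for illustration, and it assumes 30-day months, 100 numbers a day of which 15 are new, and that from month 2 onwards the 85 old numbers a day cycle through everything seen before the month began.

```python
def monthly_new_share(later_months=8, days=30, per_day=100, new_per_day=15):
    """Share of never-before-seen numbers among the distinct numbers
    seen in each month, starting with month 2."""
    old_per_day = per_day - new_per_day

    # Month 1 as described above: 1..100 on day 1, then 1..85 plus
    # 15 fresh numbers on each of the remaining 29 days.
    seen_count = per_day + (days - 1) * new_per_day  # 100 + 29*15 = 535
    next_new = seen_count + 1

    shares = []
    for _ in range(later_months):
        pool_size = seen_count   # the numbers 1..pool_size are "old"
        cursor = 0               # position in the cycle through old numbers
        month_numbers = set()    # distinct numbers seen this month
        new_count = 0
        for _day in range(days):
            # 85 old numbers, continuing the cycle where yesterday stopped
            for _ in range(old_per_day):
                month_numbers.add(cursor % pool_size + 1)
                cursor += 1
            # 15 numbers never seen before
            for _ in range(new_per_day):
                month_numbers.add(next_new)
                next_new += 1
                new_count += 1
        seen_count += new_count
        shares.append(new_count / len(month_numbers))
    return shares


if __name__ == "__main__":
    for month, share in enumerate(monthly_new_share(), start=2):
        print(f"month {month}: {share:.1%} new")
```

Under these assumptions the share of new numbers starts at roughly 46% in month 2 and 31% in month 3, and settles at 15% once the pool of old numbers is larger than the 2550 that can be replayed in a month.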

Posted by Claus at 10:21 AM

April 21, 2011

E-books, experience and memory

Having an interesting discussion (FB link) with Christian Dalager (and also Kenneth Auchenberg) about the book experience vs the e-book experience, based on my recent purchase of a Kindle and my experiences with it.
It is unambiguously the most book-like non-paper medium I have used, and it really makes you want to read. You don't get tired of reading on it, and you can see enough at a time for the reading to feel anchored. Until now I have mostly read Kindle books on my Android phone, and that works fine as far as it goes, but the anchoring in the text goes missing; you just float through the words a little, without quite leaving your mark on the material, or vice versa, so that your memory of what you read simply doesn't take on quite the same character as with paper reading - regardless of whether you were reading a pdf you had just printed or whatever it might be.
That is what the discussion is about. Christian says

[...]there is still something about losing the tactile/visual coupling between work and object. E.g. if you chain-read.[...]sometimes there is also a mnemonic breakdown for me, where my memory doesn't have an "anchor", if that makes sense?

And it makes complete sense. Books are different. I can't be the only one who has discarded a particular physical edition of a work to switch to another with better typography, paper, page size, margins and therefore a better reading experience. You get less of that kind of sensuous memory of the writing with an e-book reader, and if the anecdotal testimony from me and Dalager can be taken at face value, it hurts the recall and the joy that the reading leaves you with afterwards.
From the description above you can tell that I imagine I have hold of objective concepts here, that it really is differences in the media that make a difference, but the anecdotal study so far has the weakness of being based on a sample of 2 men of roughly the same age, who knew each other beforehand and to a large extent work with the same things.
One must assume that the long-term value of the book will be dramatically affected by whether there is anything to this. If reading becomes flow instead of stocking up (roughly; I know everything reminds me of that distinction this year), as it has otherwise seemed to for both me and Dalager, then part of the willingness to pay of course goes out the window too.
One could also imagine that publishers (and Amazon) conclude that surface still matters and put a bit more effort into the micro-structure of the e-book, so that we get a few more bumps in the surface to remember by.

[In other stock & flow news: FB's (new?) option to download your entire conversation store as a zip file is a thing of beauty. Many years of conversational memory regained]

Posted by Claus at 12:25 PM

April 15, 2011

Sufficiently indistinguishable from magic

A few days ago my brother asked me if I knew of a good way to record what was going on on the screen of his Windows laptop. The following conversation then occurred:

Me: There are a number of good options. The good word to search for is 'screencast'

Brother: Alright, I'll search then.

Me: I think it's easier that way.

Brother: Agree. But the search word is important.

Me: Yup.

Arthur C. Clarke has this famous quote that "any sufficiently advanced technology is indistinguishable from magic", and here it is, realized almost to a T: How do you operate Google? Through magical incantation. You can find anything - if you know the magic spell. In this case, channeling Harry Potter, findus screencastius!

Posted by Claus at 11:28 AM

July 14, 2010

And the Word became flesh

OK, I'll gladly admit that I am a sentimental person, but I damn well get tears in my eyes when I see beautiful technology like this navigation wristband from 1920s England. It is an ingenious little device with interchangeable scrolls that show the way when you're out driving your truck on the route.

And what is so beautiful about it? Well, you see. The nice thing about the wristband is that it is made exactly like a watch. The blog post has it right: A GPS from the 1920s. The directions could for that matter have been printed in a book, but that was not the effect the manufacturer was after. He wanted to make a device, a kind of compass, except that here you navigate by the symbols we have poured over the grey graph of asphalt we have poured over the landscape. The roads, and the map of them and the system of road numbers, have turned the big complicated 3-dimensional world into language. Because the world has become language, thanks to human ingenuity, it was suddenly possible in the 20s to make a navigation device, almost a kind of sensor, entirely out of words and numbers.

I think that is powerful stuff.

Both that language can do that, and the wristband maker's implicit acceptance that this is so. Of course you can make a wrist compass out of words. Of course language is a machine that actually works.

Now, maps and GPSes are of course in themselves linguistic, simplifying models of the world, but here it gets one extra turn of the screw. I actually clearly remember an otherwise dreadful bus trip to London, from before air travel got really cheap, where some stout Jutland drivers had to find their way through Germany, Holland and England. It was evident that they hadn't really made the trip before, and what they had brought as a guide was exactly an overview in the style of the 1920s device, just made by hand: a bunch of stapled A4 sheets with road numbers, notes on which side roads to expect to see, and then route choices along the way. At no point did they use a map. A single major roadworks or traffic accident with a diversion and we would have been lost.

Ever since I lived at Regensen and had the opportunity to study the mechanics of the courtyard clock in detail - an entirely classic tower clock mechanism, as they have looked since Huygens invented the pendulum clock, but with a beautiful and very easy-to-follow mechanism - I have been fascinated by the interplay between the clock, the mechanics and the passing of time.

From the ritual wait on New Year's Eve we all know the feeling that the clock doesn't just show the time, but is the time. We look at the rest of the world and how it moves relative to the clock in order to understand things. It is the clock that is the basic pulse. And yet, as you can ascertain by looking at an old-fashioned pendulum clock mechanism, a clock is just a weight falling to the ground, in a complicated way.
A weight falling to the ground. That is what drives the sun across the sky. To me that is a fantastic thought, and the navigation wristband very much reminds me of it. The map becomes the landscape.

Posted by Claus at 09:59 AM

May 17, 2010

Armadillo, arm's length and Frank Jensen

I am looking forward to seeing the documentary about the war in Afghanistan. It is sure to be violent and exciting, and gruesome and life-changing both for those who are hit and those who shoot.
When I see the advance buzz for the docu, though, it is something else that sticks, namely a Danish soldier's words in a newspaper a while ago about how disappointing journalists' visits to the camp always were. The only thing they want is to come along and fight. There has to be action, because that is where the good stories are. There are no good stories in being bored in the camp, or in guard duty in towns where children go to school, or in similar more peaceful efforts.
Carsten Jensen has sung himself sky-high on a "war is gruesome" rush on the occasion of the film and talks a lot about how "un-Danish" it is to take part in something gruesome like this, but one cannot shake the suspicion that something gruesome damn well had to be found, because it is simply "stronger" to be on the other side of something truly gruesome than to be on the other side of ... children's schooling.

I am glad I am not in Afghanistan having to make some of the rock-hard choices that the Danish soldiers have to make; I simply don't think I would be very good at it, but that is - for me - also that. I am somewhat doubtful about the actual depth of the critical gaze, particularly as obtained through strongly narrated neo-documentary. The war seen from thousands of kilometres away does not thereby come to be about a ditch in Afghanistan in any honest way. Here it has to be about the Taliban, freedom, and Danish soldiers' life and limb. You are not more involved in the war because you sit in your armchair and become more and more troubled.

I have a suspicion, perhaps even a hope, that in a few years the "for" in "fortalt" (Danish for "told") will be read the same way as it is in forædt, forvitret, forødet (overfed, weathered away, squandered): as the destruction of the thing talked about through some action. You tell the world to pieces. Frank Jensen's municipal administration has by now hired over 100 tellers of the municipality's mission, a completely absurd number, unless they are used for something other than what the title "communications officer" suggests. If they did internal communication, and thereby turned the municipality into an effective network organisation where the small units have the responsibility and freedom to make a difference, particularly if they can do it here and now for small means without having to lay five-year plans and report on statutory requirements, then it might make an actual difference to someone - but one does doubt that it is that kind of communication. It is probably more about explaining why bugger all is happening. Or why the nothing that is happening is actually a whole lot!
It is the same strange arm's-length principle towards reality as participation-by-being-troubled in the Afghanistan missions. It doesn't change anything at all. Insight and understanding are crucial, but they should preferably be the kind that gets us out of the armchair, not just a stronger and stronger sense that the armchair is a fantastic place to witness it all from.

How I wish that were the point for one Jensen and the other.

Posted by Claus at 08:51 AM

October 06, 2009

Orwell on writing (and politics)

The word Fascism has now no meaning except in so far as it signifies "something not desirable." The words democracy, socialism, freedom, patriotic, realistic, justice have each of them several different meanings which cannot be reconciled with one another. In the case of a word like democracy, not only is there no agreed definition, but the attempt to make one is resisted from all sides. It is almost universally felt that when we call a country democratic we are praising it: consequently the defenders of every kind of regime claim that it is a democracy, and fear that they might have to stop using that word if it were tied down to any one meaning. Words of this kind are often used in a consciously dishonest way. That is, the person who uses them has his own private definition, but allows his hearer to think he means something quite different. Statements like Marshal Petain was a true patriot, The Soviet press is the freest in the world, The Catholic Church is opposed to persecution, are almost always made with intent to deceive. Other words used in variable meanings, in most cases more or less dishonestly, are: class, totalitarian, science, progressive, reactionary, bourgeois, equality.

From Politics and the English Language (pdf).

Posted by Claus at 02:18 PM

September 08, 2008

It smells like future in here

Esquire has published the world's first paper magazine with an active e-ink cover. The package includes a magazine-sized battery and a moving-picture ad for the sponsor, Ford. It sounds like a medium from the future.

Posted by Claus at 01:38 PM

May 28, 2008

Where on a journalist does the truth sit?

The sense of reality is completely at sea in a rejoinder from Dagen doku producer Lars Seidelin. He is unable to recognise a difference between e.g. EB's currently running paedophile sting operation and his own invented TV reality.
It is true that in both cases the journalists have created the news, but in the EB case the journalists have created news that is actual news. It is not fiction that a number of men took the bait. In the Dagen doku case the viewers are being lied to.
You probably shouldn't work with current affairs if you don't know that journalism's commandment of truth concerns the relation between journalist and viewer, and not any of the other relations the journalist has with whoever it may be. On the contrary, one might say. If the journalism is, for example, shaped by a relationship of loyalty to a particular politician, then you are committing a kind of lie of omission towards the reader/viewer.

Posted by Claus at 10:38 PM

January 21, 2008

Word-mosaic

Images representing words, averaged and arranged by semantic connections at 80 Million Tiny Images.

Posted by Claus at 02:13 PM

April 20, 2006

New language: I'm googled out

"I'm googled out" is 2006-ese for "I'm out of ideas".

Posted by Claus at 07:17 PM

June 18, 2005

Weinberger's taxonomy blockbuster

David Weinberger goes looking for a "Da Vinci Code"-esque opening chapter for his book on the tree and leaves of knowledge and finds the beautiful and story full Linnean Society. If Weinberger buries those beautiful images later in the book, his agent is not going to be happy...

(yes, this is the same project that his reboot talk was about)

Posted by Claus at 02:08 PM

May 26, 2005

Fun with words in eastern Europe

Inspired by a recent trip to Eastern Europe I would like to point out the nice, but sadly expired, pun in that the capital of the former communist dictatorship of Romania is called "Book/Arrest".
And a pun near miss: Why on earth didn't the Hungarians call their airline MagLev instead of just Malev?

Posted by Claus at 02:47 PM

February 06, 2005

...loved combing his hair...

NaNoWriMo is a write-a-novel-in-a-month movement. A gigantic free writing exercise. Don't miss the East Bay Express feature article by the way.

Erik Benson, of All Consuming Fame (sounds rather grand doesn't it?), took part in 2002 and 2003.
He then went on to self-publish his first attempt, and at the book website he had about 9 months of fun running all kinds of interesting stats on the book text as part of the process.
Among the fun had, a random sentence generator, based on the book text. The rules for generating the sentences are simple:

You supply the starting word
Each word is followed by a word that also appears after it somewhere in the original text
The more times a word pair appears in the original text, the more likely it is to be used in the sentence generator

Yes, mathematical readers, that's a discrete time Markov Chain. I tried seeding the generator with 'Remember' and got:

Remember not only did we discover that a handsome man loved dressing, loved combing his hair, loved preparing the angles and planes of his life, falling apart, trying to do so if necessary.

That's so good it's hard to believe Benson isn't cheating.
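
The three rules listed above are simple enough to sketch in a few lines of Python. This is not Benson's actual code; `build_followers` and `generate` are hypothetical names, and splitting on whitespace is an assumption about what counts as a word. Keeping every occurrence of a follower in a list means a plain uniform choice already favours the more frequent word pairs.

```python
import random
from collections import defaultdict


def build_followers(text):
    """Map each word to the list of words that follow it in the text.

    Duplicates are kept on purpose: a pair that occurs three times
    contributes three entries, so it is three times as likely to be picked.
    """
    words = text.split()
    followers = defaultdict(list)
    for current, following in zip(words, words[1:]):
        followers[current].append(following)
    return followers


def generate(followers, start, max_words=30, rng=None):
    """Walk the chain from `start` until a dead end or `max_words`."""
    rng = rng or random.Random()
    sentence = [start]
    while len(sentence) < max_words and sentence[-1] in followers:
        sentence.append(rng.choice(followers[sentence[-1]]))
    return " ".join(sentence)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat slept"
    print(generate(build_followers(sample), "the"))
```

The next word depends only on the current one, which is what makes this a discrete-time Markov chain; a generator that looked at the last two or three words would produce more coherent but less surprising sentences.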

All of these fun text tools deserve to be made generally available, and I think I'll add implementations here at classy.dk. Watch this space. If you know of a good existing resource (i.e. a reason not to build these tools) post a comment.

Posted by Claus at 04:11 PM

February 04, 2005

Weapons of Mass *duction

Here's a short comparative study of various lame WOMD jokes:

Weapons of Mass Seduction (8810 hits) - First place because some of the uses are about the notion of WOMD's as a means of political seduction, rather than romantic seduction. But even that use gets some play
Weapons of Mass Production (2050 hits) - about work or performance
Weapons of Mass Reduction (149 hits) - about dieting or physical exercise

Posted by Claus at 02:11 PM

January 10, 2005

Hacking french

Great (if very long) K5 post on learning French. Through a self defined cultural immersion course (involving among other things Harry Potter, Schaum's outlines, flip-cards and Russian winter nights) Konstantin Ryabitsev taught himself French. The self made style of learning exudes pure hacker ethos to me, both good and bad. The level of involvement you get from that kind of self learning is much deeper than any pre-rolled educational program can give you - a good thing. The unwillingness to just embrace what has worked for millions before is not always a good thing.

Posted by Claus at 01:24 AM

January 04, 2005

Loosely coupled metadata

Here's an interesting essay on loose sharing of metadata on the web. It describes communities that just happen to obey some metadata standard for no good reason, which turns out to be independently useful after a while. If and when the semantic web comes about in a more cohesive form, it will be as accumulation and condensation of this kind of metadata vapour.
This further supports what I wrote about metadata a year and a half ago.

Posted by Claus at 01:28 AM

November 29, 2003

More language logging

There's Language Log and there's Lambda the Ultimate for natural and artificial languages respectively...

Posted by Claus at 06:10 PM

November 13, 2003

Data/Ink Ratio

I was unfamiliar with the notion of the Data/Ink ratio of a piece of infographics. It's exactly what it says: How much information is presented compared to how much ink you put on the page to present it. When the d/i is low the page is overdecorated and you're not really communicating the data as much as presenting the ink.
I read about it on ongoing. Which even has a very clear illustration of the kind of thinking that accompanies the concept.
Must find the time to buy the book of "The Jakob Nielsen of quantitative visuals".

Meanwhile, for the Danish speakers: if you want to know more about this business of form and content, I can warmly recommend Holger's excellent article on applied graphics from UCMag #1 (pdf 3MB). It is the story of Meaning, Organisation and Beauty.

Posted by Claus at 10:27 PM

October 26, 2003

Language drift and cultural volatility

It's a shame I don't keep a tagline archive here on classy.dk. But I don't. I was unhappy with the self-important 'Claus and Effect' joke, and decided that a better (and now self-deprecating) tagline for this particular pollution of url space was the new one "The tragedy of the commons". The idea of the reference is both to point to an important problem that is on my mind (in a strange reverted way, since the main threat to the commons these days is the false claims of the Intellectual Property Mafia) and to say that a world where everyone gets to speak all the time has its own problems in drowning out any central focus of discussion. The benefits outweigh the disadvantages, but the problems are real.
The inspiration for adopting the phrase was a notice of a search engine referral for my site for the term 'Crackulating', that I learned of from a youth ministry teen lingo resource. There is however also the much richer urbandictionary.com. Much richer in that they also found room for my new tagline The tragedy of the commons. Which hardly qualifies as particularly urban. The dictionary is user edited, which explains why the reach is broader. The user editing also helps as an illustration of the lameness of average knowledge of net users, and how this breeds urban legends like there was no tomorrow. Take for example the term All your base - a reference to the particularly bad English language texts in an old computer game. The correct explanation of the term is given - alongside an impressive list of incorrect explanations. You get to vote on the quality of explanations, so the right one floats to the top, but the range of explanations gives one an idea of the enormous drift there is in colloquial English, and of course the same thing applies to all other languages.
When it comes to definitions of amusing terms, that is less of a problem, but one tends to feel that this complete lack of historical sensibility goes deeper than language drift and cultural drift. I don't think I'm being "older, hence slower" or turning my back on the future in any way, when I say that it is a weakness of online culture that nothing is ever fixed or corrected or allowed to stay true, or condensed and solidified and accepted as The Best Current Opinion on some matter.
It's as if the online culture never turns off to dream and reconsider. That makes for a volatile culture and the volatility and incessant storytelling going on is easily abused by those that actually have an agenda or idea they need to sell for some period of time. It's been said better by others, notably in a brilliant talk by Douglas Rushkoff at reboot.

Posted by Claus at 06:43 PM

October 09, 2003

My vehicle of the pneumatic shock absorber is full with Aalen

We all know the fun of feeding a phrase into Babelfish repeatedly, to see it slowly transform into nonsense. Now the process has been automated. I tried with the Monty Python classic "My hovercraft is full of eels" and ended with the beautiful gibberish of the title.

Posted by Claus at 12:49 AM

September 20, 2003

Test before you claim

A new search engine will do keyword weighting as reported on Yahoo News. The idea is an old one: When you're searching for "free downloads", "free" is a qualifier for "downloads" that you're not willing to live without. Therefore a search engine strategy is to cluster words, hierarchically, so that pages that match "free downloads" are favoured over pages that just list "downloads". It's a proxy for genuine understanding of language of course, since you really need to determine whether "free" or "downloads" is the key word of the search phrase.
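
To make the idea concrete, here is a toy Python sketch of favouring pages that match the whole phrase over pages that match only single words. It has nothing to do with how the search engine in the news story (or Google) actually ranks pages; `score`, `rank` and the `phrase_bonus` weight are made up for illustration.

```python
def score(page_text, query, phrase_bonus=2.0):
    """One point per query word found on the page, plus a bonus
    if the words appear together, in order, as a phrase."""
    words = page_text.lower().split()
    terms = query.lower().split()
    term_hits = sum(1 for term in terms if term in words)
    phrase_hit = any(
        words[i:i + len(terms)] == terms
        for i in range(len(words) - len(terms) + 1)
    )
    return term_hits + (phrase_bonus if phrase_hit else 0.0)


def rank(pages, query):
    """Return the pages ordered from best to worst match."""
    return sorted(pages, key=lambda page: score(page, query), reverse=True)


if __name__ == "__main__":
    pages = [
        "downloads of all kinds",
        "free downloads here",
        "free beer and some downloads",
    ]
    for page in rank(pages, "free downloads"):
        print(score(page, "free downloads"), page)
```

A fixed bonus is the crudest possible weighting, and it says nothing about which word is the key word of the phrase, which is exactly the understanding-of-language problem mentioned above.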

Claims are being made about a new search engine capable of this, but the researchers making the claims haven't rehearsed what they want to say properly:

"Clever ranking algorithms, such as Google PageRank, are becoming misused (spammed) by techniques like 'Google bombing,'" Schaale explained. Vox Populi can help remove this kind of spamming by identifying so-called "artificial" link clusters, he explained.

To spam Google, wily Web masters create "domain clusters" that consist of hundreds of homogeneous dummy sites optimized for keywords. The word "women" might appear in thousands of dummy sites that contain pornography, for instance. A person seeking information on "women's studies" might have to wade through page after page of spam before hitting a university women's studies department.

Let's test the claim that Google is broken, like so. There is absolutely no porn in sight. It's a good thing these people are researchers. If this was a commercial demonstration everybody would now be walking away from it laughing.

Posted by Claus at 04:47 PM

September 02, 2003

Are you looking for "1+1" or '1+1'?

Google has added yet another 'smart feature' to Google search, namely a calculator. If you search for 1+1 Google will now show you a page with the result 2. Very nifty, but personally I must say that I consider this feature to be a nail in Google's coffin, not a milestone on Google's path to supremacy. I realize there maybe aren't that many of us - but what if I was actually looking for a website that contained the phrase '1+1'? You read something and you remember that the expression was on the page, and you search for that, based on the notion that 1+1 is a pretty rare thing to write on a webpage, hence a good filter to apply. There's an escape hatch - a link that asks if I would rather have searched for the expression than the result, and that is good information economy (on average, very few additional clicks are added to searches), but when I'm looking for the expression I don't really care about the average; all I know is that I now have further to go.
Similarly with searches for phrases like 'amazon.com'. For a while that simply took me to amazon.com - now there's a complicated escape hatch with 5 options - dangerously close to the maximum number of options that one can comprehend easily.
All of these things just underline two basic problems:

The more blanket assumptions Google makes about the context of search as it existed before I turn to Google, the closer Google gets to the annoying ways of the Microsoft Office applications: "Since people aren't that smart on average - we're going to assume that you're an idiot". It's true that it works a lot of the time, but I would hate to have to do the same half hour configuration tango to turn off "smart" features for Google that I have to do for MS Word

Trying to be everything to everybody is IMO a direct violation of the 2nd and 3rd Google Commandments (see the left bar of this page).

A couple of features more and there'll be an opening again for "just the search, please" companies. It's classic "Innovator's Dilemma" stuff.

Posted by Claus at 12:24 PM

July 19, 2003

We are biomass for the processing of information

And some more Jakob Nielsen, this time on Information Foraging:

Information foraging's most famous concept is information scent: users estimate a given hunt's likely success from the spoor: assessing whether their path exhibits cues related to the desired outcome. Informavores will keep clicking as long as they sense (to mix metaphors) that they're "getting warmer" -- the scent must keep getting stronger and stronger, or people give up. Progress must seem rapid enough to be worth the predicted effort required to reach the destination.

Like it. Concept adopted.

Posted by Claus at 01:20 PM

July 15, 2003

This is not a blog - it is a bliki

Here I thought I was blogging, and then it turns out I have been secretly bliki'ing instead:

So I decided I wanted something that was a cross between a wiki and a blog - which Ward Cunningham immediately dubbed a bliki. Like a blog, it allows me to post short thoughts when I have them. Like a wiki it will build up a body of cross-linked pieces that I hope will still be interesting in a year's time.

That is exactly the goal of classy.dk. Sometimes posts are short things unrelated to anything. Sometimes they are 'part of a series' and fold nicely into a more coherent structure of text and meaning (I hope).

By that definition, Ongoing is certainly a bliki. Sam Ruby maintains a distinction between 'essays' and his weblog, but the structure of comments on his weblog is intricate enough that he is really authoring a completely different medium. I think we need ONE tool that will let you publish hypertext, thread comments, and blog in one package. But wait, isn't that just a CMS? Not necessarily - Thinking in a webservices vein, I should be able to use MoveableType for blogging, use another plug-in comment and trackback engine (because blog-comment engines aren't naturally threaded; a bad mistake) and then run an accompanying Wiki using the same comment engine.
How hard can that be? Let's first publish a trackbackable threaded comment engine. Once that's done

Posted by Claus at 12:59 AM

May 24, 2003

Gibson blog

The preceding link was suggested by avid (if recent) blogger William Gibson. He writes a lot on his blog and I think his commentary on 'real' writing versus blogging is right on the money.
The purpose of blogging is to not be real writing. There are no compositional principles to employ. In fact I think many blog readers would quickly scan past any real writing occurring on their favourite blog.
That still makes for an interesting new kind of prose, but it is not essays or any other 'slow but careful' kind of writing.

I've found blogging to be excellent for flashes of ideas and random linkage, but poor for synthesizing. I've been working with ideas like that since approx 1988, writing most of the ideas I write down in no more than a page or two of longhand. While very few single entries stand the test of time, even the poor ones end up as important support material for the better ones. The publication of ideas - apart from the obvious vanity factor - adds just a little accountability, and hyperlinks are an obvious opportunity if one likes that kind of thinking.
If one is to be serious about any of the ideas written down, there is no shortcut around writing it down properly in a longer coherent system.
That's why I hope to reorganize some of the longer running themes here on classy.dk around some more focused essays in the next month or so. These will be versioned slow texts, but on the other hand with a little luck they'll hold some interesting and useful thinking.

Posted by Claus at 04:04 AM

Get the hook up on Teen Lingo

Trust your local church to keep you posted on TEEN LINGO - a brilliant resource explaining old timers like '5 0' and 'booty' and containing brilliant entries like

Skeeza (ski -zah)
An unattractive, yet promiscuous female. See "skank"

or lookups with colorful sample phrases like

Whoop
v. 1. To beat up. "You mad doggin me? I'll whoop you so bad your cousin will cry!" 2. To beat someone in a sport. "We whooped their team 126 to 57!"

I even learned some new words I'll definitely start to use

Whas' crackulatin'
(derived from "What's crackulating?") What is going on? How is it going? Good to see you. When greeting someone say "Whas' crackulatin'?"

Posted by Claus at 03:41 AM

February 11, 2003

Just Read : Pattern Recognition

William Gibson's new book looked to be really great from the press coverage, and the topic of the book seemed completely in tune with the times, so I hurried up and read it. It was less fantastic than I had hoped, although I did enjoy myself while reading it. There are plenty of nice touches in the cultural observations and language, but in the end the story is too trivial to be interesting and comes apart at the end, like in other Gibson novels.
And in fact even some of the language seems stale. The mentions of Google are so frequent you'd think he was getting paid by the word.
The notion of brand allergy is great fun though, and of course the invention of 'Naomi Klein Clothing' - that particular brand of clothing that is brand-less - completely captures the trap that Naomi Klein and cohorts simply can't get out of: It's not the brands that are making us idea-conscious. It's our idea consciousness that makes brands a possibility. Or, to paraphrase Frank Zappa playing a room full of a previous generation of left-wing anti-establishment types: "Don't kid yourself - everyone in this room is wearing a uniform".
If you want a second opinion - read the Wired review.

Seeing as steganography (the hiding of information invisibly within another information source) is a theme in the book, one can't help but notice the many similarities to Neuromancer, Gibson's first book.

Our protagonist is called Case (albeit spelled differently this time). The plot takes the form of a global search for information involving strange and shady characters. Behind the scenes mighty conspiracies of vast power jockey for supremacy. Our hero is physically insignificant but is provided by the powers running the action with a physically superior watchdog. And in both novels the powers running the show emerge successful in the end and continue to run the show, even stronger than before.

So much for narrative pattern recognition. As for actual steganography, i.e. formal differences between the novels revealing new meaning, I doubt there is any.

Posted by Claus at 02:41 AM

January 16, 2003

HTTP + XML != SOAP

Jon Udell has a nice piece on the power of combining simple things via Services and Links. The absence of SOAP is a quality in itself in this discussion of web service infrastructure. Pure publishing and open link space are the original selling points of the web and they remain what it does best.
As Udell shows, this does not mean that the web becomes non-machineable, even if the lack of typing of most of URL space may make the services fail a lot. But that's a feature not a bug for some kinds of information.

Working out from that link via this link there was sort of a webbed thread about the semantic web effort and how it relates to the current vogue of piggybacking infrastructure on top of the HTML web via clever parsing and abuse of semantic elements in HTML. My ability to read the thread appears to be derived directly from the two-way functions in MT that we don't use at classy.dk, since the embarrassing lack of readers would then be readily on display, so that's a nice indication of where URL space could go as a mechanism.

The conflict between heavy, standards-based, strongly typed, hard-to-deploy solutions and easy-to-do makeshift solutions is a classic, and the many weblog formats clearly have been able to build tremendous momentum from good-enough interfaces. The main threat to the semantic web efforts remains the need to build infrastructure before attaining value and the unclear value semantic web efforts provide to first-mover implementers. And of course any successful application would have to expect the data to be broken a lot and still be valuable.

Posted by Claus at 01:03 AM

December 16, 2002

Promiscuous interfaces

After a particularly humbling fight with the Xerces parser - trying to find a simple way of outputting the text of a tag and all tags contained therein - I had to think a little while about the ease of use of natural language interfaces and the difficulty in using artificial language interfaces.

Natural language interfaces are extremely simple to use. We know so many ways in which to derive meaning from a particular statement. Use of the ambient context of the statement implicitly. Standard rhetorical figures that form naturally to create new meanings around old ones. Rephrasing of the statement in similar forms. Just the many ways in which we can derive meaning from a word, just because it is a word of a particular kind. From nouns we can create an amazing number of different parts of speech through derivation from the noun. Our ability to derive meaning from letters, tokens, words, sentences, and conversations is extreme. The pipe dream of productivity tools is to create as rich a conversation using artificial languages.

Creating such a rich environment has so many aspects and for each of these aspects there seems to be a programming technology devoted to making that particular aspect appealing and straightforward.

Object oriented programming is an attempt to reformulate imperative program instruction as world modeling.

Intentional programming and aspect oriented programming try to address the openness of natural world models in ways object oriented programming fails to.

Template programming tries to add inflection and token derivation to the type system.

The notion of mixins resembles aspects in the way it tries to open up class hierarchies, but has a data component, not just a program flow component.

Real metalanguages would probably claim to attempt 'all of the above'

What I've yet to see native language support for is the promiscuity of natural language, when it comes to describing objects. When you talk about some object you adapt the understanding of the object on the fly. In a conversation it is you not the provider of the object who defines the properties and interface to the object, simply by describing them. Maybe you have to start with the description you've been given - but from that you can rapidly adapt your own description and work in terms of that instead.
The point compared to the mechanisms available is that the object remains of the original class, you're just creating a temporary interface appropriate to your situation.

Another related candidate for language innovation is 'code-free delegation' i.e. how to use stock implementations in interfaces.

The true metalanguages, and the low level text-based ones like Perl, allow for something like this, but either they are not languages with rich libraries - which really defeats the purpose of something like this - or the feature is only available by accident and certainly not in a strongly typed fashion. The idea with an extension like this would be exactly to hang on to type strength but make crossing of type boundaries possible when needed.

The object way of doing that is to write an adapter class presenting your own interface which is then implemented in terms of the existing interface, but to write your adapter you typically have to aggregate the old object, and that then means you have to actually construct a new object which can be a time-consuming and not very on-the-fly experience.

What I'm looking for is the ability to specify - for a class already in existence - an interpretation which I guess would be the keyword used to describe it. It is basically an implicit adaptor created on the fly for another class. It cannot add data to the original class (that would just be aggregation) but only implement a new interface in terms of the old one.
This is sort of a backdoor into generics. You'd specify an interface for e.g. a container class, and then you would create an interpretation of an existing class to make the container hold the existing object.
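
One way to picture such an 'interpretation' is the following minimal Python sketch. It is only an illustration under stated assumptions: `interpret` and the stack-style names `push` and `pop` are hypothetical, not features of any existing language or library. The view holds no data of its own and is implemented purely in terms of the original object's interface, while the object itself stays of its original class.

```python
from typing import Any, Callable, Dict


def interpret(target: Any, mapping: Dict[str, Callable[[Any], Any]]) -> Any:
    """Return a data-free view of `target` exposing only the names in `mapping`.

    Hypothetical helper: each entry maps a new interface name to a function
    of the original object, so the view is built from the old interface only.
    """

    class Interpretation:
        __slots__ = ()  # the view cannot carry any data of its own

        def __getattr__(self, name: str) -> Any:
            try:
                return mapping[name](target)
            except KeyError:
                raise AttributeError(name) from None

    return Interpretation()


# Interpret a plain list as a stack, on the fly.
items = [1, 2]
stack = interpret(items, {
    "push": lambda xs: xs.append,  # new name, old implementation
    "pop": lambda xs: xs.pop,
})
stack.push(3)
print(items)         # the original list has changed
print(stack.pop())   # delegated to the list's own pop
print(type(items))   # the object is still an ordinary list
```

Python does not offer the type strength asked for here, so the sketch covers only the on-the-fly, data-free part of the idea; the strongly typed part would need support in the language itself.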

An adaptor refactoring tool in a modern IDE could be the solution if done in proper two-way fashion.

It's clear that the only languages with any shot at resembling natural languages are the self-morphing text based or just very richly reflective languages. The problem with most of these is that keeping languages that open also means contracts can be hard to enforce, and you can't prove any kind of behaviour even with the sloppiest understanding of program proof without contracts of some kind. The type-system is of course the weakest type of contract and that is only enforced by a few of these languages.

Posted by Claus at 11:23 PM

November 22, 2002

Survey on human language technology

Found the Survey of the State of the Art in Human Language Technology. Looking forward to reading it.

Posted by Claus at 03:23 PM

November 20, 2002

Aspect oriented programming

I've been reading a little about aspect oriented programming - which is reminiscent of intentional programming as evidenced by the recent company formation by intentional pioneer Simonyi and Aspect pioneer Kiczales.

Other than being a cool concept it also fits very well into the pragmatic vein being almost precisely a technology for making good use of the discursive state of language.

The basic notion is overloading the program flow with 'stacked' pre-method and post-method calls packaged as aspects. The typing comes about by specifying target predicates for the method signatures a particular aspect modifies.
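
The stacking of pre- and post-method calls can be sketched in a few lines of Python. This is a rough illustration, not AspectJ's model or syntax: the `Aspect` class and the `weave` function are hypothetical names, and the target predicate here looks only at the function name rather than at a full method signature.

```python
import functools
from typing import Any, Callable, List, Optional


class Aspect:
    """Hypothetical aspect: a target predicate plus optional before/after advice."""

    def __init__(
        self,
        matches: Callable[[str], bool],
        before: Optional[Callable[[str, tuple], None]] = None,
        after: Optional[Callable[[str, Any], None]] = None,
    ) -> None:
        self.matches = matches
        self.before = before
        self.after = after


def weave(aspects: List[Aspect]) -> Callable:
    """Wrap a function with every aspect whose predicate accepts its name."""

    def decorator(func: Callable) -> Callable:
        # The predicate is evaluated once, when the function is defined.
        applicable = [a for a in aspects if a.matches(func.__name__)]

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for aspect in applicable:            # stacked pre-method calls
                if aspect.before:
                    aspect.before(func.__name__, args)
            result = func(*args, **kwargs)
            for aspect in reversed(applicable):  # post-method calls, unwound in reverse
                if aspect.after:
                    aspect.after(func.__name__, result)
            return result

        return wrapper

    return decorator


# A tracing aspect that targets every function whose name starts with "get_".
trace: List[str] = []
tracing = Aspect(
    matches=lambda name: name.startswith("get_"),
    before=lambda name, args: trace.append("enter " + name),
    after=lambda name, result: trace.append("leave " + name),
)


@weave([tracing])
def get_answer() -> int:
    return 42


get_answer()
print(trace)
```

A 'design by contract' aspect would have the same shape, with the before advice checking preconditions and the after advice checking postconditions.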

Once that idea has settled in as defined on top of an OO language like Java, the extension to a pure aspect language where ALL function calls are basically invoked by predicate comes to mind. Clearly the execution order of very large sets of predicated computations becomes an issue - especially since aspects explicitly allow side effects (a pet example of an aspect is 'design by contract') - but the final execution model has other advantages, like an implicit simple language for multitasking of operations. It is easy to consider aspect invocation events - certainly if you are used to the Delphi class libraries where implementation by delegate uses a pre- and post-method pattern all the time.

Continuing that thinking you end up with a notion reminiscent of Linda tuple-space, only the tuples are now method signatures where program state has been reified (fancy word for 'stored as data' - with some theoretical sauce added), so that the dynamic state of a particular computation is available inside the tuple space to any available processor. I'm not too sure about the reification part though. Methinks I should hack something like that together. Of course the real beauty of something like AspectJ is that all the decisions on 'pattern matching', i.e. typing, make sense at compile time, so that dynamic complexity and type inferencing complexity are not multiplied, whereas the fully dynamic model does NOT do this, but clearly it doesn't have to be that bad at all. If desired one could dispense with the ability to compute predicates dynamically and do the same compile time optimization for each definable pre- and post-state available.

Coming up with a viable language to express such a complicated flow of execution is certainly an admirable achievement, and the formulation given in AspectJ seems very elegant. The equivalence with pure Java is a nice selling point. As long as you're in doubt you have a perfect code generator at your disposal and can work in generated code. The original code is a good design medium though - and the developers of AspectJ even took pains to debug-enable aspects. That is very close to being the complete list of requirements for good tooling.

Posted by Claus at 04:14 AM

November 17, 2002

Reiteration : Google is the best semantic web

The notion of the semantic web and the schema efforts to enable it are worthwhile, and by their openness on the right track, but since language and hence knowledge is a game of incompleteness and ambiguity, the schema efforts are likely to fail due to their grand scheme nature. People will not comply. Reading some comments about XML support in the most common client data tools on the planet (apart from the browser), namely MS Office, it is comforting to know that they are at least getting some of it right, working up from data instead of down from metadata. This is the only thing that could possibly work. And this is the reason why Google is such a huge success. The ambient contradiction is that the story is about the XML-enabling of Office (i.e. a huge push down from metadata, not up from data).
From a client perspective, however, there is no question that the direction I indicate is the important one. The really interesting thing is that one will expect to be able to import old non-standard data to XML (proprietary of course - they are still not the good guys, just less shady).

Next step up: RDF actively deployed and used by e.g. Google. The first application of this is already out there of course - being the many interlinked weblogs about web services and their many cross-subscriptions and structured cross linkage.

Posted by Claus at 01:59 AM

October 16, 2002

Tool or terror

Further analysis of my web stats indicates that the notes on the Mythical Man Month is the most read article on the site. This is also consistent with the most wanted search results (except the helpmeleavemyhusband ones).

I'm certainly not done with the concepts I talk about in this log entry, and most recently I've thought a little about the irony of the fact that I always, without fail, disable auto completion features in 'Office-Style' applications, like mail programs, the browser (the form handling, not the history) and the word processor. This seems to contradict the happiness with which I embrace IDE enhancements like code completion. Why the difference? First of all, the standard auto-completion apps are a nuisance. They are quite simply badly done, and very intrusive. Second, I don't trust Microsoft for a second to not send my keystrokes to shady businessmen. (So password suggestion is a very specific Thing That I Do Not Use.) Thirdly, the lack of a model theory underlying the suggestions makes them much less valuable. They are rarely the proper suggestion. In a word processing application you always need a specific grammatical form of a word, and the suggestion mechanism does not typically include grammar in the algorithm to guess words to suggest. Artificial languages like the ones used in computers have the quality that tokens are never modified to alter meaning, so suggestion mechanisms need not take this into account.

Of course word form is not the only interplay between language and code completion. The Delphi IDE has a particularly obnoxious model clash in its code completion. When you're writing a sentence, and are mid-sentence at an 'active' point, i.e. the next token you write will be the first token of an expression identifying a value missing from the entire statement, Delphi has the good idea to require completion to return a value of a compatible type, since the sentence will not compile if there isn't one. However, the Delphi IDE is not able to accept the fact that you could write a sentence that computes a reference of the correct type, e.g. by descent through properties of objects nested within one another. This is particularly annoying: code completion starts up, but does not allow you to navigate to the accessor you need, since it has the wrong type.

The Internet Explorer link history is another case in point. If you have used a site with many implicitly loaded resources, the suggestion mechanism cannot do what my weblog analyzer can: distinguish between navigable pages and implicitly loaded resources. So the history record is polluted with all kinds of GIFs etc. that hide the link suggestions that make any sense.

So to do a proper suggestion, you need an exact or a good statistical suggestion mechanism, and you need to allow for the grammar of expression that is currently in focus.
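
A minimal Python sketch of those two requirements combined, under stated assumptions: `Symbol`, `can_reach` and `suggest` are hypothetical names, the usage counts stand in for 'a good statistical mechanism', and the example symbols are made up for illustration. The point of `can_reach` is the case Delphi gets wrong: a candidate of the wrong type is still offered if descent through its members can lead to the type the expression needs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Symbol:
    """Hypothetical completion candidate with a type and a usage count."""
    name: str
    type_name: str
    uses: int = 0
    members: List["Symbol"] = field(default_factory=list)


def can_reach(symbol: Symbol, wanted: str, depth: int = 3) -> bool:
    """True if the symbol has the wanted type or can reach it via its members."""
    if symbol.type_name == wanted:
        return True
    if depth == 0:
        return False
    return any(can_reach(member, wanted, depth - 1) for member in symbol.members)


def suggest(prefix: str, symbols: List[Symbol], wanted: str) -> List[str]:
    """Names matching the prefix that fit the expected type, most used first."""
    hits = [s for s in symbols if s.name.startswith(prefix) and can_reach(s, wanted)]
    return [s.name for s in sorted(hits, key=lambda s: -s.uses)]


# Made-up symbols: a form with a string caption, an integer and a string.
form = Symbol("Form1", "TForm", uses=20, members=[Symbol("Caption", "string")])
count = Symbol("FileCount", "integer", uses=50)
file_name = Symbol("FileName", "string", uses=3)

print(suggest("F", [form, count, file_name], "string"))
```

The depth limit is an arbitrary choice to keep the search bounded; a real IDE would presumably work from the compiler's own type information instead.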

Posted by Claus at 01:30 AM

October 08, 2002

In Fact You Can Buy It Today

From IBM. Runs on your Pocket PC - translates between major languages.

Posted by Claus at 09:43 PM

But Does It Work?

Of course one should never make bold claims about the wonders of mechanical translation, or indeed any claim within the broad area of artificial intelligence (if there is such a discipline), without providing proof that these claims could be reasonable.

As a simple test I used the latest web demo of Systran's translation software on the log entry about machine translation in practice. Here are the results.
I took the paragraph:

The problem of correctly converting speech to meaning is hard enough, but if you can limit the responder to use a limited functional grammar - that could even be made an unambiguous closed subset of the open ambiguous grammar of the local language, you could conceivable have a valuable interchange with the speakers of this other language.

In Systran-French this is :

Le problème de convertir correctement la parole en signification est assez dur, mais si vous pouvez limiter le répondeur pour utiliser une grammaire fonctionnelle limitée que pourrait même être fait un sous-ensemble fermé non ambigu de la grammaire ambiguë ouverte du langage local, vous pourriez imaginable avoir un échange valable avec les parleurs de ce autre langage.

Which again (the true test) back-translates to this:

The problem to correctly convert the word into significance is rather hard, but if you can limit the responder to use a limited functional grammar which a nonambiguous closed subset of the open ambiguous grammar of the local language could be even made, you could conceivable have a valid exchange with the loudspeakers of this other language.

Clearly, this is not English spoken by a speaker of English - but for situations where conversational context can supply a lot of the meaning it isn't really all that bad. And certainly with the ability to further interrogate, speakers at both ends of the system could identify the typical style of error and adjust both their own utterances and their interpretation of the counterpart, to make some sense of the text.

As a further test, I iterated once more English->French->English and arrived at:

Problem to convert correctly word into significance is rather hard, but if you can limit the responder to use a limited functional grammar that a subset closed nonambiguous of the open ambiguous grammar of the local language could be even made, you could conceivable have a valid exchange with the loudspeakers of this other language.

This is really quite OK for degradation after four translations. Knowing the context of the conversation, I think you would get at least some of the meaning of the sentence.

Posted by Claus at 09:26 PM

Humanistic computing in the battlefield

It is old hat to state that the killer app - quite literally - for augmented sensory reality through technology is the battlefield. The severity of the contest and the willingness to pay are unrivalled. So it will come as no surprise that the US Army is investigating the use of a Star Trek style Universal Translator. The solution to the comprehension problem is years off, so basically the idea is to use a limited form of universal speech generation app.

Essentially these are just digital phrase books that actually speak out the required phrase in a number of languages, but using an approach not too unlike what I would think a child is doing to learn language you can do better.
You would work your way out from a sort of functional language model - using simple substitution rules for templated sentences to specify a large number of sentences that are precise enough. In the same fashion you could have some comprehension, by using these sentences in an interrogation style conversation, taking not much more than affirmative or negative cues from the responses you're given in a native tongue.
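
The substitution rules can be sketched in Python as follows. This is only an illustration: `expand`, the template and the slot values are made up, and a real phrase book would presumably hold one such table of templates and slot values per supported language.

```python
import string
from itertools import product
from typing import Dict, Iterator, List


def expand(template: str, slots: Dict[str, List[str]]) -> Iterator[str]:
    """Yield every sentence obtained by filling the template's named slots."""
    # Collect the slot names used in the template, in order, without repeats.
    names = list(dict.fromkeys(
        field for _, field, _, _ in string.Formatter().parse(template) if field
    ))
    for values in product(*(slots[name] for name in names)):
        yield template.format(**dict(zip(names, values)))


# Yes/no questions suit the interrogation style: the answer only needs
# to be recognized as affirmative or negative.
slots = {
    "place": ["hotel", "train station"],
    "state": ["open", "closed"],
}
for sentence in expand("Is the {place} {state}?", slots):
    print(sentence)
```

A handful of templates with a few slot values each already multiplies into a large closed set of unambiguous sentences, which is the property the approach relies on.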

The problem of correctly converting speech to meaning is hard enough, but if you can limit the responder to use a limited functional grammar - that could even be made an unambiguous closed subset of the open ambiguous grammar of the local language, you could conceivable have a valuable interchange with the speakers of this other language.

This is not too unlike the approach in artificial languages like Esperanto, except that the use of technology affords additional simplicity in that one could specify a functional subset of any natural language of interest, instead of forcing everybody to adapt to a shared functional language subset.

This seems to me to be a case for humanistic intelligence. The downconversion to a subset of your own language is close to effortless. The translation of the downconverted language to a subset of another natural language could be feasible for a computer, and would make communication a lot simpler.

We all do this when we try to speak some language of last resort in a foreign country. Working out from stock phrases using a simple subset of techniques for constructing meaning in the foreign language, we try to make ourselves understood. This may be a personal experience of mine, but I always find the approach stumbles when I'm answered, in the signal processing phase - when I receive auditory input in response that is alien to me.
If I could have just a babelfish quality translation of words said to me in French with me when traveling I would be infinitely better off than I am today if forced to understand French.
A device to offer that kind of information should be possible today - even in a mobile device - certainly a personal computer of fairly recent model is sufficient, and the most powerful palmtops are only something like 6-7 years behind stationary devices in computing power.

So in short what I want on my PocketPC is MS Talk - It is a personal dictation program in any of the supported languages, and it offers translation of the quality Systran has been able to offer since the mid '90s of the dictated text. Very few sentences would make any sense at all - but just a fast translation of stock phrases, and common terms would help me book hotel rooms, find the train station etc.

Posted by Claus at 10:54 AM

October 02, 2002

Reflection and natural language

In the ongoing series on quips about natural and artificial languages I sat and thought a while about the purpose of reflection in artificial languages.

At first glance reflection looks like a feature of language, but in reality it usually serves another purpose: It provides information about the model the language is speaking about.
In natural languages, the ambient knowledge about the world modeled by language is so strong that explicit reflection is rarely needed. In artificial languages this is not the case - and reflection provides the model knowledge that enables brevity of expression. It is not quite the same as meta linguistic expressiveness of natural languages, since metalinguistic abstraction of natural languages exactly relies on heavy ambient model data and deep conversational state. Or maybe more precisely, reflection in natural language is not as explicit, since reference to entities can be made indirectly, relying on deep state.

So we see now that the level of reflection of the language is paramount for productive brief expression in a language.
This again tells us something about where wizards fail, and what the purpose of IDEs really is at the very abstract level.
IDEs provide rich conversational state - adding to the ambient knowledge about the world modeled by the software. The better the IDE the more straightforward the conversation leading to code. Intentional and aspect oriented software initiatives are looking to dramatically raise the ambient level of conversational state.
Wizards also provide rich conversational state. But the second the conversation is done, the wizard forgets it ever took place. So the conversational state is gone and you are left with a very explicit, very heavy uphill struggle to keep the momentum going. Often it is simpler to redo the conversation with variations. Repetitive actions are error prone, so the conclusion is that wizards are bad.

This then is the clear cut but sufficiently abstract reasoning behind previous notes on the requirements for good development methods.

Posted by Claus at 09:51 PM

September 29, 2002

Debugging for the blind

Every hacker watching The Matrix would know this: While the greenish glyphs streaming down the screen in the hacker submarine look really cool they do not represent in any significant way the use of visual information when hacking.
The reason: Our perception of visual information is geared for an enormous ability to orchestrate information spatially and this is done at the cost of a very poor visual resolution for temporal information.
We all know from the cinema what the approximate maximal resolution of visual information is : Approx 24 Hz, the rate of display for standard film. If it were better, movies would not look to us like fluent motion.

Our shape recognition ability on the other hand is almost unlimited and the brain even has some amazing computing related tricks where we have very high spatial resolution in the focus area of vision, which comes at the expense of general sensitivity (amateur's guess: since you need a certain number of photons for a difference over space to be present, you need a higher level of lighting to realize good spatial resolution). Our peripheral vision on the other hand is extremely sensitive, but has less resolution.

So a better way to construct a new age visual hacking device would be to keep the complicated glyphs - which we can easily learn to recognize - for focal vision and add peripheral information that is important but only as background information that may require us to shift our attention.

An idea for debugging could be glyphs representing various levels of function from the highest to the lowest - all visible at the same time - and then use the peripheral information for auxiliary windows. In the case of a debugger you could have variable watches etc. in the peripheral view and they would only flicker if some unexpected value was met.

I think complex glyphs would be a workable model for representing aspect oriented programming. In linguistic terms we would be moving from the standard Indo-European model of language form to some of the standard cases of completely different grammars (polysynthetic languages would be the technical term, I believe) where meanings that are entire sentences in Indo-European languages are represented as complex words through a complicated system of prefixing, postfixing and inflection. Matrix-like complex glyphs would be good carriers for this model of language.

Aspect oriented programming is reminiscent of this way of thinking of meaning, in that you add other aspects of meaning and interpretation of programming as modifiers to the standard imperative flow of programming languages. Design By Contract is another case in point. Every direct complex statement has a prefix and a postfix of contract material.

What would still be missing from the debugging process would be some sense of purpose of the code. And that's where the temporal aspects of hacking that the glyph flows in The Matrix represent come into play. A group of scientists have experimented with turning code into music. The ear, in contrast to the eye, has excellent temporal resolution in particular for temporal patterns, i.e. music. That's a nice concept. You want your code to have a certain flow. You want nested parentheses for instance and that could easily be represented as notes on a scale. While you need to adopt coding conventions to absorb this visually, failure to return to the base of the scale would be very clear to a human listener.
In fact, while our visual senses can consume a lot more information than our aural senses, the aural senses are much more emotional and through that emotion - known to us everyday in e.g. musical tension, the aural senses can be much more goal oriented than the visual. This would be a beautiful vision for sound as a programming resource.
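
The parenthesis example is easy to sketch in Python. This is a toy illustration, not the scientists' system: the mapping of nesting depth to degrees of a C major scale, and the names `depths`, `melody` and `resolves`, are assumptions made here. Each parenthesis moves one step up or down the scale, so balanced code ends on the base note and unbalanced code is left hanging.

```python
from typing import List

SCALE = ["C", "D", "E", "F", "G", "A", "B"]  # assumed: one scale degree per nesting level
BASE_OCTAVE = 4


def depths(code: str) -> List[int]:
    """Nesting depth after each parenthesis in the code."""
    depth = 0
    out = []
    for ch in code:
        if ch == "(":
            depth += 1
            out.append(depth)
        elif ch == ")":
            depth -= 1
            out.append(depth)
    return out


def note(depth: int) -> str:
    """Name of the note for a nesting depth; depth 0 is the base of the scale."""
    return f"{SCALE[depth % 7]}{BASE_OCTAVE + depth // 7}"


def melody(code: str) -> List[str]:
    """One note per parenthesis, rising on '(' and falling on ')'."""
    return [note(d) for d in depths(code)]


def resolves(code: str) -> bool:
    """True if the melody never dips below the base note and ends on it."""
    ds = depths(code)
    return all(d >= 0 for d in ds) and (not ds or ds[-1] == 0)


print(melody("f(g(x))"), resolves("f(g(x))"))
print(melody("f(g(x)"), resolves("f(g(x)"))
```

Turning the note names into actual sound is left out; the sketch only shows that the 'failure to return to the base of the scale' is a property a listener could in principle hear.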

They should make some changes in The Matrix Reloaded. The perfect futurist hacker's workbench would consist of a largish number of screens. The center screens would present relatively static, slowly changing, beautiful complex images representing the state of the computing system at present. The periphery would have images more resembling static noise, with specific color flares representing notable changes in state away from the immediate focus, i.e. changes that require us to shift our attention.
While working, this code-immersed hacker would listen to delicate code-induced electronica and the development and tension in the code/music would of course be the tension in the film as well, and this then would tie the emotions of the hacker as observer of The Matrix - i.e. the software world within the world of the film - neatly to the emotions of the moviegoer.

Posted by Claus at 12:47 PM

September 20, 2002

The mould rider on the breaking dyke

Kierkegaard once remarked that luck is a perfectly legitimate route to genius. Therefore I must say that I am quite happy with the new proverb that forms the title of this entry. It is a machine translation from German of Die Schimmelreiter auf dem brechenden Deich - a sentence used in Die Zeit in a commentary on this Sunday's election for chancellor. The translation was from Google's translation beta. I wanted to see if they had improved on Babelfish in any significant way, but it is hard to judge from a few experiments. Services such as this always look impressive at first sight.

For instance the translation "Pitch for Hans's acorn" for "Pech für Hans Eichel" is funny but not so meaningful.

Posted by Claus at 02:57 PM

September 19, 2002

Google news without the fuss

I hadn't noticed this new Google beta, but now you can get your news headlines in glorious Google quality. News sources are scanned and news is arranged by topic - and I don't mean crummy general topics but a specific headline capturing the story.
We are now awaiting another stupid lawsuit from the less than intelligent association of Danish newspapers who, as previously reported, fail to understand the nature and benefits of the internet in general and the hypertext of the web in particular.

Thinking about the news categorization offered, this is probably less of a feat than it could have been. First of all your typical news story will name a person or geographical location, providing rarely occurring words to correlate stories from different sources. Secondly the presence of a few global newswire services read by everyone in the industry probably correlates news copy more than you'd care to recognize if you like the free press and the free world.
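
The first point can be made concrete with a small Python sketch. It is a guess at the general idea, not a description of how Google actually groups stories: `correlated_pairs` is a hypothetical function, the headlines are invented, and 'rare' is crudely taken to mean a word occurring in at most `max_df` of the headlines.

```python
import re
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased set of words in a headline."""
    return set(re.findall(r"[a-z]+", text.lower()))


def correlated_pairs(stories: List[str], max_df: int) -> List[Tuple[int, int, List[str]]]:
    """Pairs of stories sharing a word that occurs in at most max_df stories."""
    bags = [tokens(story) for story in stories]
    # Document frequency: in how many stories does each word occur?
    df = Counter(word for bag in bags for word in bag)
    pairs = []
    for i, j in combinations(range(len(stories)), 2):
        shared = sorted(w for w in bags[i] & bags[j] if df[w] <= max_df)
        if shared:
            pairs.append((i, j, shared))
    return pairs


# Invented headlines, for illustration only.
headlines = [
    "Hansen leads in the final poll",
    "Poll puts Hansen ahead in the vote",
    "Floods hit towns in the south",
    "Markets fall in the wake of a warning",
]
print(correlated_pairs(headlines, max_df=2))
```

Common words like 'in' and 'the' occur everywhere and so correlate nothing, while a name shared by two headlines is enough to tie them together.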

With only five minutes of experience with the service here's another scary observation: In the minutes it took me to write this log entry the news listed on the site had completely changed! The lead Headline News story was moved four slots down, the headline chosen as representative for the group of stories was changed, and another story was removed from the Headlines category and demoted to a subcategory.

That's news just a little too hot for me. Clearly there is a desire to keep updated. The usual dead link experience encountered when jumping from web indexes is not really acceptable either, so the balance may be hard to strike. Andy Warhol's 15 minutes of fame are turning into fractions of a second as we speak.

Posted by Claus at 12:23 AM

September 14, 2002

Lynch Mob

Finally got to see Mulholland Drive, and just as with Paul Auster one has to note that David Lynch's technique in this film also seems very familiar and at times somewhat repeated (silent camera moves through earth-coloured apartments could be outtakes from Lost Highway; the Log Lady is very nearly present as a neighbour of our main characters; as with the ear in Blue Velvet there are noisy camera moves 'in under the surface' of magical objects; our main characters receive visits from fantasy figures bearing fateful messages). Unlike in Auster's case, however, Lynch manages to create both the expectation and the surprise that make the film interesting.

Stylistically the film resembles Lost Highway more than any of Lynch's other films, at least to begin with. There is the same static mystery. Unlike Lost Highway, however, a largely 'natural' explanation is built in for the incredible number of mysterious occurrences, and in that sense the film is closer to something like Blue Velvet.

That way one gets both the special Lynch experience - bizarre tableaux of unclear meaning, but beautifully and eerily staged - and yet also a comprehensible and successful story to enjoy.

Posted by Claus at 12:52 AM

Austerian automatic writing

Have just read Paul Auster's "The Book of Illusions" and am actually rather disappointed.

Auster's technique is all too visible in the book. Its subject is, quite banally, not very interesting, and worst of all one has seen most of the devices and ideas before.
The marked repetitions from earlier books are:

The extended reference to and quotation from fictional works - As in 'The Locked Room' and 'Leviathan', quotation from and discussion of fictional works of art play a huge role. This time, though, one is mostly reminded of Borges' famous remark about preferring to summarise a good idea for a 500-page book in 5 pages rather than actually writing the book. Auster's quotations from the output of silent film star Hector Mann are nearly endless.

Coincidence as plot twist - An Auster classic: the characters are constantly prevented from carrying out what they intended, and the narrative from reaching the goal that was set out at the start, by coincidences that move in from the periphery and unfold as the real point.

The fallible first-person narrator - Closely tied to the coincidences is Auster's standard narrative situation: the backward-looking first-person narrator, who constantly reminds the reader that he is recounting the background to the present, and so keeps the reader believing that what is being recounted is fully illuminated and clearly concluded. This narrative situation is then constantly punctured because the narrator, in the moment of telling, misjudges the fateful weight of events. An event occurs and our narrator places it in a context with the present, which is then punctured by one of the endless coincidental twists in the plot.

It is fascinating - but less so than one had hoped, simply because the narrative technique is so visible and has been seen so many times before. So the experience becomes a kind of highbrow detective novel. Incidentally, a description Auster would probably appreciate. For the magic trick to succeed, though, expectation and surprise have to be established, and for that the book's technique is quite simply too visible and too much of a repetition.

Caveat: I actually read "Illusionernes bog", the Danish translation, and that may of course be to blame for some of the disappointment, but Auster's style is very 'translatable' - not very flowery and very direct. The tone of the language is tied more to narrative form than to the poetic aspects of language that translate badly.

Posted by Claus at 12:38 AM

August 02, 2002

Open OpenCola?

The rapid adoption of weblogs as a primary publication medium must be a challenge to would-be vendors of peer-to-peer information accumulators and a boon to proponents of the semantic web, as addressed (indirectly) here.

The characteristics of the web log:

Pure text

Short focused message bursts

Reliance on essentially nothing but hyperlink as metadata provider

Open directory standards (of sorts - aggregators can tap into RSS as a publish-subscribe directory builder)

have proven to be enough to make it an efficient builder of semantic structure when combined with large-scale indexing like Google's. This only goes to show how efficient natural language is as a knowledge interface.

It is interesting that this technique, which is essentially statistical, is looking more and more like a viable competitor to more classical "linguistic" approaches to knowledge representation - at least according to the linked article.

This is completely in sync with the experiences with natural language processing reported around the net. The more deterministic techniques based on grammars and linguistic structure are losing to statistical, information-theoretic techniques.

The classic guess as to why is that the process of interpretation has no known condensation point. Without semantic hints, there is simply too much room for interpretation for language processing to be practical.

The interesting thing is that the semantic hints turn out to be efficient almost on their own, in providing a useful interpretation.

This observation makes it an open question, to me at least, whether more explicit schemes for accumulating semantics on the web, like the soon-to-be-available OpenCola, offer any real advantage over the extremely lightweight (for the client anyway) Google.

Needless to say, there are plenty of advances to be made in the space between more formal interpretation of knowledge, like relationally stored business data, and the weakly organized semantic web, but it sounds implausible to me that an explicit semantic application could beat the simplicity of hypertext when hypertext is augmented with something like Google.

Some background: In formal logic, i.e. the study of formal languages, there is a notion of a model theory, which is basically the idea that language utterances can be mapped - through a well-defined mapping - to assertions about a reality through the act of interpretation.

In the theory of signs the notion of interpretation remains, so the essential understanding of what language does remains, but the idea of a well-defined, comprehensible mapping fails for all practical purposes, because the model of language is enhanced with information about intent (as evidenced by Peirce's famous sentence that "a sign is something that stands for something (else) in some respect or other for somebody"). The problem with the mapping is that the act of interpretation itself can be construed as an utterance to be interpreted, and there is no natural stopping point. The process of interpretation is endless, an idea known as infinite semiosis.

This idea is also present in formal logic, where it appears in the form of meta languages. Without getting too technical, it appears as a conflict between utterances in a formal language and utterances about a formal language, made in what is called a meta language. What happens is that assertions that cannot be proven true in the formal language can, through an act of interpretation in the meta language, be proven correct. But of course there is then the question of a meta meta language of utterances about the meta language, and so on. (Incidentally, this is all related to the famous and celebrated type theory of Russell and Whitehead and the famous incompleteness theorem of Gödel.)

These abstract notions become very concrete when attempting to map concrete utterances to a model.

The beauty of natural language is that it is a closed, if imperfect, system. In this system hyperlinks become weakly typed semantic assertions, and this turns out to be enough to condense the web semantically.

Posted by Claus at 11:39 PM

July 30, 2002

Vigilante internet

No, this is not an article about a failed 2M Invest company... Actual legislation is being proposed in the US Congress to allow any copyright holder to hack the hackers, as reported on K5. In short, the proposed bill provides immunity from a number of possible liabilities caused by interfering with another party's computer, if the intent was explicitly - and upfront - to foil illegal use of copyrighted material.

This is the old "If guns are outlawed, only outlaws will have guns" idea. Let the good guys give the bad guys a taste of their own medicine. Only, in the virtual world, where boundaries of location (especially in a P2P world) are abstract and hard to define, it seems to me that this bill extends the right to self-defence and the right to protect the sanctity of the home into actually allowing aggressive vigilante incursions on other people's property, when those people are accused of copyright infringement.

It goes right to the core of current intellectual property debates, and raises in a very clear way the civil rights issues involved in the constant and rapidly increasing attempts at limiting right-of-use for lawfully purchased intellectual property. Whose property IS intellectual property anyway?

UPDATED 20020731

In the olden days - when intellectual property was securely tied to some kind of totem, a physical stand-in for the intellectual property in the form of the carrier of the information, i.e. a book or an LP or similar - there was a simple way to settle the issue. Possession of the totem constituted a perpetual right of use of the intellectual property. The only intellectual property available on a per-use basis was the movies. Live performance does not count in this regard, since live performance is tied to the presence of the performer, and the consumption of live performance is therefore not a transfer of intellectual property to the consumer, in that it is neither copyable nor transferable nor repeatable.
It is of course the gestural similarity with live performance that has led to the rental model for film.

As the importance of the totem began to degrade, so began the attacks on the physical interpretation of intellectual property. We have seen these attacks and reinterpretations of purchase with the introduction of cassette tapes, video tape, paper copiers, copyable CD-ROM media, and now just the pure digital file.

At each of these turning points attempts are made to limit the right-of-use to film-like terms. Use of intellectual property is really just witnessing of a performance. So you pay per impression, and not per possession.
What is interesting of late, and in relation to the lawsuit, is first of all the question of whether this 'artistic' pricing model is slowly being extended from the entertainment culture to all cultural interaction. Modern software licenses are moving towards a service model with annual subscription fees. This could be seen as a step towards pure per-use fees for all consumable culture - an idea that is at least metaphorically consistent with the notion of the information grid. Information service (including the ability to interact) is an infrastructure service of modern society, provided by information utilities, and priced in the same way as electrical power.
In practice you do not own the utility endpoints in your home - the gas meter and the electrical power connection to the grid. And ownership of any power-carrying or power-consuming device does not constitute ownership of the power/energy carried or consumed. The content companies would have us think of hardware in the same way. And Microsoft would like you to think of Windows as content in this respect.

Secondly, there is the important question of how this interpretation of information and culture relates copyright to civil rights.
The sanctity of physical space (i.e. the right of property) is a very clear and therefore very practical measure of freedom. Actions within the physical space are automatically protected through the protection of the physical space. There are very real and important differences between what is legal in the commons and what is legal in private space. And of course the most important additional freedom is the basic premise of total behavioural and mental freedom.

The content company view of intellectual property is a challenge to this basic notion of freedom. There is a fundamental distinction between the clear cut sanctity of a certain physical space, and the blurry concept of "use".
The act of use itself can be difficult to define, as property debates over "deep-linking" make clear.
In more practical terms, any use of digital data involves numerous acts of copying the data. Which ones are the ones that are purchased, and which ones are merely technical circumstances of use? The proposed legislation enters this debate at the extreme content-provider-biased end of the scale. Ownership of anything other than the intellectual rights to content is of lesser importance than the intellectual ownership.

The difficulty of these questions compromises the notion of single use and use-based pricing. And ultimately - as evidenced by the deep-link discussions - the later behaviour of the property user is, according to the content sellers, also affected by the purchase of intellectual property. This is a fundamental and important difference between the electrical grid and live performance on one hand, and intellectual property on the other. Intellectual property simply is not perishable, and, as if by magic, it appears when you talk about it.

Interestingly, a person with a semiotics background would probably be able to make the concept of "use" seem even more dubious, since the act of comprehending any text or other intellectual content is in fact a long-running, never-ending and many-faceted process. In the simplest form, you would skirt an issue such as this and go with something simple like "hours of direct personal exposure to content via some digital device". That works for simple kinds of use, but not for complicated use. And it should be clear from endless "fair use" discussions that content owners are very aware of the presence of ideas made available in their content in later acts of expression.

A wild, far-fetched guess would be that as we digitize our personal space more and more, expression will be carried to a greater and greater extent over digital devices, so that the act of thought is actually external, published and visible (witness the weblog phenomenon). In such a world, the notion that reference is use becomes quite oppressive.

Ultimately the concept of free thought and free expression is challenged by these notions of property. It is basically impossible to have free thought and free expression without free reference or at least some freedom of use of intellectual materials.

Posted by Claus at 12:46 AM

July 26, 2002