Information is different - Notes from Classy's Kitchen

July 18, 2011

Information is different

It's an old chestnut of interface critique: embodiment is good, the concrete beats the abstract, nobody reads online. It pushes interfaces towards the tangible, and I'll be the first to agree that good physical design (and design that *feels* physical) is pleasurable and restful on the mind.
None of these claims is, however, easy to reconcile with the fact that every day 15% of the queries Google sees are queries it has never seen before. Put differently, the information space Google presents to the world grows by 15% every day. Imagine a startup experiencing that kind of uptake. You'd consider yourself very lucky, even if a lot of those 15% are spelling mistakes and the like.
The 15% number sounds staggering, but it's when you compound it a little that it becomes truly mind-blowing, and in fact hard to believe entirely: 15% daily discovery means that in a month, the entire current history of Google searches fades to about 1% of all queries seen. Obviously this isn't a description of typical use, but it is a description of use nonetheless. This is complete rubbish and I'm embarrassed to have written it; read on below.
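For reference, here is the naive compounding the paragraph above describes, written out as a minimal Python sketch. It assumes the paragraph's own (retracted) model: the pool of distinct queries grows by 15% a day over a 30-day month. The constant names are illustrative only.

```python
# Naive (and, as the correction below explains, wrong) model:
# treat "15% of today's queries are new" as "the pool of distinct
# queries grows by 15% per day", compounded over a 30-day month.
DAILY_GROWTH = 0.15  # assumption taken from the paragraph above
DAYS = 30            # assumption: a 30-day month

growth_factor = (1 + DAILY_GROWTH) ** DAYS
old_share = 1 / growth_factor  # share of the pool that existed on day 0

print(f"pool grows by a factor of {growth_factor:.1f}")
print(f"the starting history ends up as {old_share:.1%} of the pool")
```

Under this model the pool grows roughly 66-fold and the starting history shrinks to about 1.5% of it, in the same ballpark as the "about 1%" above. The correction section below explains why the growth model itself is the mistake.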

Now, try to imagine building a physical interface where all the uses it has been put to since the beginning of time fade to 1% in a month. That's very hard to do. The thing is that thinking is different, language is different, information is different. The concrete approach breaks down when confronted with the full power of language.

This is also why we'll always have command lines.

COMPLETE RUBBISH ALERT

So, above I made a really embarrassing Probability 101 error when I tried to compound the "every day we see 15% new queries" statistic. This isn't a coin toss, but something else entirely. Chances are that "every day we see 15% new queries" compounds, on a monthly basis, to... 15% new queries. To see why, I'm going to construct a contrived sequence of numbers that matches the "every day we see 15% new queries" statistic.

Let's suppose we want to produce a sequence of numbers, 100 every day, such that we can say "every day we see 15 numbers we haven't seen before". The easiest way to do that is to just start counting from 1, so on the first day we see the numbers 1..100. Of course, on the first day we can't match the statistic, since we haven't seen any numbers before.
On the second day, however, we draw 85 times from the numbers we have already seen (we just run the numbers 1..85), and for the remaining 15 we continue counting where we left off on day 1, so on day 2 we have the numbers 1..85, 101..115. On day 3 we run 1..85, 116..130, and so on.
This way, it's still true that "every day we see 15 numbers we haven't seen before", but at the end of the first month (30 days) you will have seen 100 + 29*15 = 535 distinct numbers in total.
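The first month of the construction can be replayed directly. This is a minimal sketch assuming a 30-day month; the helper name `day_draws` is hypothetical, chosen for illustration. It checks that every day after the first contains exactly 15 unseen numbers and that the month ends with 535 distinct numbers.

```python
def day_draws(day: int) -> list[int]:
    """Numbers drawn on a given day (1-indexed) under the contrived scheme:
    day 1 is 1..100; every later day reruns 1..85 and then continues the
    count with 15 fresh numbers."""
    if day == 1:
        return list(range(1, 101))
    start = 100 + 15 * (day - 2) + 1  # first fresh number for this day
    return list(range(1, 86)) + list(range(start, start + 15))


seen: set[int] = set()
new_per_day: list[int] = []
for day in range(1, 31):  # a 30-day month
    draws = day_draws(day)
    # Count how many of today's draws have never appeared before.
    new_per_day.append(sum(1 for n in draws if n not in seen))
    seen.update(draws)

print(new_per_day[:5])
print(len(seen))
```

Day 1 contributes 100 unseen numbers, every later day exactly 15, and the set of seen numbers ends at 535, matching 100 + 29*15.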
In month 2 (let's say that's 30 days too) we change things a little. Instead of just running through 1..85 every day, we continue upwards until we have cycled through all the numbers we saw in month 1. There were 535 of those, and at 85 a day that takes only 7 days (535/85 ≈ 6.3). Over the month you'll see 30*15 = 450 new numbers and 535 old ones, so numbers you've never seen before make up 450/(450+535), or about 46%, of all the distinct numbers you see in month 2.
In month 3 (still 30 days) we do the same thing as in month 2, but this time there are 535 + 450 = 985 old numbers, so the 450 new ones amount to only 450/(450+985), or about 31%, of all the numbers we see in month 3.
We continue like this. The most already-seen numbers we have time to run through, at 85 a day for 30 days, is 30*85 = 2550, and we'll still see 30*15 = 450 new ones. The pool of old numbers grows by 450 a month, so from month 7 on (when it reaches 535 + 5*450 = 2785) it is larger than 2550, and, lo and behold, from then on the share of numbers we have never seen before is 15*30/(15*30 + 85*30) = 15*30/((15+85)*30) = 15/100 = 15%.
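The month-by-month argument can be checked with a small simulation. This is a minimal sketch under the post's assumptions (30-day months, 85 old and 15 new numbers a day), plus one assumption of my own: each month's old draws cycle from the start of the pool of everything seen before that month. The name `simulate_months` is hypothetical.

```python
DAYS = 30
OLD_PER_DAY, NEW_PER_DAY = 85, 15


def simulate_months(months: int) -> dict[int, float]:
    """Share of never-before-seen numbers among the distinct numbers seen
    in each month, for months 2..months, under the cycling scheme."""
    old_pool = list(range(1, 536))  # everything seen in month 1 (1..535)
    next_new = 536                  # next number nobody has seen yet
    shares: dict[int, float] = {}
    for month in range(2, months + 1):
        seen_this_month: set[int] = set()
        fresh: list[int] = []
        cursor = 0  # walks through old_pool, wrapping around when exhausted
        for _day in range(DAYS):
            for _ in range(OLD_PER_DAY):
                seen_this_month.add(old_pool[cursor % len(old_pool)])
                cursor += 1
            for _ in range(NEW_PER_DAY):
                fresh.append(next_new)
                next_new += 1
        seen_this_month.update(fresh)
        shares[month] = len(fresh) / len(seen_this_month)
        old_pool.extend(fresh)  # this month's new numbers are old next month
    return shares


for month, share in simulate_months(8).items():
    print(month, f"{share:.1%}")
```

Running it for eight months gives about 45.7% for month 2 and 31.4% for month 3, then 23.9%, 19.3% and 16.2%, settling at 15.0% from month 7 on, once the old pool (2785 numbers) exceeds the 2550 old draws a month allows.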

Posted by Claus at July 18, 2011 10:21 AM
