January 20, 2003
Newsbooster respect!

Infinite respect to the Newsbooster people for demonstrating how shallow the position of the Danish newspapers on the pseudo-concept of deep links is.

Tim Berners-Lee summarises what the web is all about here. And the message (mine, not necessarily his) to newspapers is simple:

  • The use of links falls under 'fair use' - it's just citation

  • If you don't want people to link - do not publish links

  • Putting a link on a published web page constitutes publishing

  • It is not only a bad idea to try to legislate one's way out of this - it is not technologically feasible

I'll rephrase my position from a previous post.
Hypertext consists of text with functionality added in the form of links. It is the full thing that is the text, not just the words. The links are part of the meaning of the text, not somehow 'a delivery device' or 'functionality' that is auxiliary to the text but not really text - hence speech - itself. On the other hand, the links are machineable, i.e. easily accessible to software. The machineability of the links is what makes them interesting.

There is an interesting and fundamental fact in the theory of computation: There is no fundamental distinction between program and data. What that means is that if you are interested in producing a particular result by entering data into a computer program, you can, by reworking the program and the data, move basically any of the information involved from what is considered program to what is considered data and vice versa.

The idea is simple: To add two numbers a and b, you can either use your general plus program to compute plus(a,b), or you can use your plus_a program on b: plus_a(b), or your plus_b program on a: plus_b(a).
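A minimal Python sketch of the same idea. The names plus, plus_a and plus_b come from the text; the concrete values 3 and 4 are only assumptions chosen for illustration.

```python
from functools import partial

def plus(a, b):
    """The general 'plus program': both numbers arrive as data."""
    return a + b

a, b = 3, 4  # illustrative values, not from the original post

# Move a out of the data and into the program: plus_a only needs b.
plus_a = partial(plus, a)

# Or move b into the program instead: plus_b only needs a.
def plus_b(x):
    return plus(x, b)

# All three ways of dividing program and data give the same result.
results = (plus(a, b), plus_a(b), plus_b(a))
```

Which of the two numbers counts as 'program' and which as 'data' is purely a matter of how the work was packaged.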
The example may look silly but of course this underlies all that we do with software, and the browser is a case in point:
Accessing web pages involves numerous formats of data, parsed and interpreted by a stack of processors. At the very least this stack contains at the bottom IP packets, on top of that TCP connections, on top of that the HTTP protocol, and on top of that an HTML renderer. Conversely, each layer provides data to the next layer: the HTML data is packaged inside an HTTP interaction, which is packaged inside a TCP socket connection, which is packaged inside IP packets. The important thing is that each layer provides a computed result as if it was just data to the layer above. So when we say that we retrieve an HTML document from berlingske.dk we are actually computing an HTML document from a large number of IP packets we have received from the Berlingske server (well, actually I received the packets from my ISP who in turn received them from ... (insert arbitrary number of links here) ... who retrieved them from a server at berlingske.dk).
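The layering can be sketched as a toy in Python. This is nothing like a real network stack: the packet format (a sequence number plus a payload), the function names and the sample page are all assumptions made up for illustration. The point is only that each layer computes something that the layer above treats as plain data.

```python
def reassemble(packets):
    """Toy 'TCP' layer: order packets by sequence number, join the payloads."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)

def http_body(stream):
    """Toy 'HTTP' layer: everything after the first blank line is the body."""
    _headers, _separator, body = stream.partition(b"\r\n\r\n")
    return body

def html_document(packets):
    """What the 'HTML' layer sees: a document computed from raw packets."""
    return http_body(reassemble(packets)).decode("utf-8")

# Hypothetical packets, deliberately out of order as they might arrive.
packets = [
    (2, b"\r\n\r\n<html><a href="),
    (1, b"HTTP/1.1 200 OK\r\nContent-Type: text/html"),
    (3, b"'/story/42'>Story</a></html>"),
]

document = html_document(packets)
```

The renderer at the top never sees packets, only the document that the lower layers computed for it.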

This is of legal interest since you would generally consider the program 'active', i.e. the executor of the program is legally responsible for its use or misuse, whereas the data is 'passive', i.e. just used. And the fact that you can move any specific bit of information from the active part to the passive part and vice versa is of course essential to the problems with digital technology and intellectual rights and legal responsibility. It should be clear from the above that my reading of pages at berlingske.dk involves a largish number of actions by many people. The actions span from writing the Windows TCP/IP stack to typing the words I end up reading to actually clicking the URL. There are many intermediaries (machines or people) that are responsible for assembling some of the meaning presented to me as data and/or software at various levels. Exactly which of these many intermediaries should be considered to play a direct part in my ability to access the information is almost impossible to say.

Clearly each of the actors involved has the ability to move responsibility around by repackaging what used to be 'passive' data as 'active' software. Newsbooster's latest idea does exactly that. There are tons of other cases. I use an adblocker plugin for my browser, so when I look at berlingske.dk I don't have to wait for all the silly GIFs and - even more important - I don't have to look at them, which is the real annoyance of ads. Clearly this is automated use of data published on berlingske.dk in a way berlingske.dk did not originally endorse. But it should be equally clear that it is entirely legal. The use of published material is clearly not controllable by the web site providers, and it is impossible to establish a boundary between what constitutes "the published work" and what constitutes "illegal derivation from the published work" when the work is made available in a machineable format and therefore can be decomposed through layers of automation/software.
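To make the adblocker case concrete, here is a rough Python sketch of that kind of filter. It is not how any particular plugin works; the "/ads/" path convention and the sample page are assumptions invented for the example.

```python
import re

# Assumption for illustration: ad images are served from a path containing "/ads/".
AD_IMAGE = re.compile(r'<img\b[^>]*\bsrc="[^"]*/ads/[^"]*"[^>]*>', re.IGNORECASE)

def strip_ads(html):
    """Return the page with image tags that point at the ad path removed."""
    return AD_IMAGE.sub("", html)

# Hypothetical published page: one ad banner, one editorial photo.
page = '<p>News</p><img src="/ads/banner.gif"><img src="/photos/pm.jpg">'
cleaned = strip_ads(page)
```

The published page is untouched on the server; the filtering is just one more layer of software running on the reader's own machine.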

So the 'no to deep links' position makes absolutely no sense, as long as I am able to run software on my own computer. Should the newspapers manage to get an injunction against the new 'active' link provision, the responsibility for finding the links can be moved to other places in the software.
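As a sketch of how easily the link finding can move, here is a small Python example that collects the links from a page the reader has already fetched. It uses only the standard library; the class name, helper name and sample page are hypothetical, and this is not a description of what Newsbooster actually runs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, href))

def find_links(base_url, html):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

# Hypothetical front page; the story path is made up for the example.
front_page = '<a href="/story/42">Story</a> <a href="http://example.org/">Elsewhere</a>'
links = find_links("http://berlingske.dk/", front_page)
```

Whether this code runs on a service's server or in a program on my own computer, the page it reads is the same published page.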

What does make sense, then, and how do people get paid? Clearly that is a problem that needs to be solved, but not in this heavy-handed manner. I think that is the job for another post that may be considered 'in progress'.

Posted by Claus at January 20, 2003 09:31 AM