August 02, 2002
Open OpenCola?

The rapid adoption of weblogs as a primary publication medium must be a challenge to would-be vendors of peer-to-peer information accumulators and a boon to proponents of the semantic web, as addressed (indirectly) here.

The characteristics of the web log:

  • Pure text

  • Short focused message bursts

  • Reliance on essentially nothing but the hyperlink as a metadata provider

  • Open directory standards (of sorts - aggregators can tap into RSS as a publish-subscribe directory builder)

have proven to be enough to make it an efficient builder of semantic structure when combined with large-scale indexing like Google's. This only goes to show how efficient natural language is as a knowledge interface.
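As a rough illustration of how bare hyperlinks can act as metadata, here is a minimal Python sketch that describes a page purely by the words other pages use when linking to it. It is not how Google or any real aggregator works; the names LinkCollector and describe_targets, the example.org URLs and the sample posts are all made up for the example.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (target URL, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, anchor text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def describe_targets(pages):
    """Map each link target to a Counter of the words used to link to it."""
    words_by_target = defaultdict(Counter)
    for html in pages:
        collector = LinkCollector()
        collector.feed(html)
        for href, text in collector.links:
            words_by_target[href].update(text.lower().split())
    return words_by_target


# Hypothetical weblog posts; the URLs and wording are invented.
posts = [
    '<p>Nice piece on the <a href="http://example.org/sw">semantic web</a>.</p>',
    '<p>More <a href="http://example.org/sw">semantic web</a> skepticism.</p>',
    '<p>See <a href="http://example.org/sw">this web article</a>.</p>',
]
index = describe_targets(posts)
print(index["http://example.org/sw"].most_common(2))
# → [('web', 3), ('semantic', 2)]
```

Nobody declared what the target page is about; the description falls out of counting how independent writers chose to refer to it.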

It is interesting that this technique, which is essentially statistical, is looking more and more like a viable competitor to more classical "linguistic" approaches to knowledge representation - at least according to the linked article.

This is completely in sync with the experiences reported on natural language processing around the net. The more deterministic techniques based on grammars and linguistic structure are losing to statistical, information-theoretic techniques.

The classic guess as to why is that the process of interpretation has no known condensation point. Without semantic hints, there is simply too much room for interpretation for language processing to be practical.

The interesting thing is that the semantic hints turn out to be efficient almost on their own in providing a useful interpretation.

This observation makes it an open question, to me at least, whether more explicit schemes for accumulating semantics on the web, like the soon-to-be-available OpenCola, offer any real advantage over the extremely lightweight (for the client anyway) Google.

Needless to say, there are plenty of advances to be made in the space between more formal representations of knowledge, like relationally stored business data, and the weakly organized semantic web, but it sounds implausible to me that an explicit semantic application could beat the simplicity of hypertext when hypertext is augmented with something like Google.

Some background: In formal logic, i.e. the study of formal languages, there is a notion of a model theory, which is basically the idea that language utterances can be mapped - through a well-defined mapping - to assertions about a reality through the act of interpretation.

In the theory of signs, the notion of interpretation remains, and with it the essential understanding of what language does. But the idea of a well-defined, comprehensible mapping fails for all practical purposes, because the model of language is enhanced with information about intent (as evidenced by Peirce's famous sentence that "a sign is something that stands for something (else) in some respect or other for somebody"). The problem with the mapping is that the act of interpretation itself can be construed as an utterance to be interpreted, and there is no natural stopping point. The process of interpretation is endless, an idea known as infinite semiosis.

This idea is also present in formal logic, where it appears in the form of meta languages. Without getting too technical, it appears as a conflict between utterances in a formal language and utterances about a formal language, made in what is called a meta language. What happens is that assertions that cannot be proven true in the formal language can, through an act of interpretation in the meta language, be proven correct. But of course there is then the question of a meta meta language of utterances about the meta language, and so on. (Incidentally, this is all related to the celebrated type theory of Russell and Whitehead and the famous incompleteness theorem of Gödel.)

These abstract notions become very concrete when attempting to map concrete utterances to a model.

The beauty of natural language is that it is a closed, if imperfect, system. In this system hypertext links become weakly typed semantic assertions, and this turns out to be enough to condense the web semantically.

Posted by Claus at August 02, 2002 11:39 PM