Data modeling is not important - Notes from Classy's Kitchen

June 17, 2003

Data modeling is not important

How many people in the world know the date the World Trade Center was attacked and collapsed? It is probably no exaggeration to say billions. If we think of society as a data processing device, that constitutes enormous redundancy, and the same redundancy applies to all human knowledge. The most prized information is not the rarest or most exotic, but the most common and most widely known.
Within a single brain, on the other hand, it is an open question what the processing/memory ratio really is.

In data modeling texts (the entity-relationship kind), the ideal is a coherent, global data model, and the models emphasize qualities such as consistency and lack of redundancy. The knowledge engineering approach to the semantic web adheres to the same ideals. Much work goes into establishing semantic validation and proof systems.
I think that for practical applications this approach is wrong. We should expect this model to break down completely, and expect the web to start replicating the redundancy of human knowledge.
This is inspired by some slides on RDF and the ongoing RDF debate. I think it is completely wrong to expect the publication of RDF by everybody to matter at all. The notions built on top of RDF involving provability and consistency verification also seem to me unlikely to matter until some later time. The hope that deduction will suddenly work directly off RDF seems entirely unbelievable to me.
Indexing of RDF, on the other hand, could be useful on limited vocabularies published for very specific purposes. But RDF should be thought of as nothing more than a technology for distributed publication of hierarchical content. That is, RDF should only be seen as constructing a tree of information in the same way XML documents do, with the only important extension being that the RDF tree can be distributed. The notion of proof should be condensed down to not much more than XPath-like matching against the information tree. But by indexing RDF, i.e. actually caching particularly valuable instances of the virtual documents RDF enables, this kind of search could be made efficient.
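
To make the idea of an indexed, XPath-like match a little more concrete, here is a minimal sketch. It is only an illustration under stated assumptions: the triples, the `post:`/`person:` identifiers, and the functions `build_index` and `match_path` are all hypothetical and do not come from any RDF library. The index stands in for the "cache" of a distributed tree, and a query is just a list of predicates followed from a root node.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object), imagined as gathered
# from several independently published sources. The vocabulary is made up.
TRIPLES = [
    ("post:1", "author", "person:claus"),
    ("post:1", "title", "Data modeling is not important"),
    ("person:claus", "name", "Claus"),
    ("post:2", "author", "person:claus"),
    ("post:2", "title", "Another post"),
]


def build_index(triples):
    """Cache the triples keyed by (subject, predicate) for direct lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def match_path(index, root, path):
    """Follow a list of predicates from root, like a very small XPath.

    Returns every value reachable by walking the predicates in order.
    No inference is attempted; this is plain matching against the tree.
    """
    nodes = [root]
    for predicate in path:
        nodes = [obj for node in nodes for obj in index.get((node, predicate), [])]
    return nodes


INDEX = build_index(TRIPLES)
print(match_path(INDEX, "post:1", ["author", "name"]))  # ['Claus']
```

The lookup does nothing resembling proof or deduction: each step is a dictionary hit on the cached index, which is the whole point of the argument above.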

What does "data modeling is not important" mean, then? Only that the usefulness of the data will be built dynamically by indexers, and the indexers will just extract the information they can and handle the consistent presentation of data on their own. And also that all information on the network will be present with a great deal of redundancy, cached for every purpose for which it is useful.
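
One way an indexer might "handle the consistent presentation of data on its own" is sketched below. This is an assumption on my part, not a method described anywhere above: the indexer simply takes redundant, possibly conflicting copies of a fact and presents the most commonly published value. The source names, the `wtc_attack_date` key, and the `reconcile` function are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical copies of the same fact published by three independent sites.
# One site disagrees (day and month swapped); no global model enforces consistency.
SOURCES = {
    "site-a": {"wtc_attack_date": "2001-09-11"},
    "site-b": {"wtc_attack_date": "2001-09-11"},
    "site-c": {"wtc_attack_date": "2001-11-09"},
}


def reconcile(sources):
    """Pick the most frequently published value for each key.

    Majority voting is an illustrative choice; a real indexer could
    weight sources or keep all variants instead.
    """
    votes = defaultdict(Counter)
    for facts in sources.values():
        for key, value in facts.items():
            votes[key][value] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


print(reconcile(SOURCES))  # {'wtc_attack_date': '2001-09-11'}
```

Here redundancy is what makes the answer reliable: the indexer needs no shared schema or proof system, only enough overlapping copies to outvote the odd one out.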

Posted by Claus at June 17, 2003 12:17 AM
