July 28, 2002
Types: Strong or weak/Static or Dynamic/Explicit or Implicit

Just found an extremely interesting thread about DTDs, W3C schemas and RELAX NG. The subject, which may at first appear rather esoteric, concerns the nature of type systems and how typing relates to XML. Ultimately this seems important for the role of XML as an open, productivity-enhancing technology, rather than just a new, inefficient means of consuming bandwidth and clock cycles.

The thread starts out as a discussion of the relative merits of the schema language RELAX NG vs. XML Schema.

Proponents of RELAX NG claim a number of advantages over XML Schema:

  1. RELAX NG is simpler to express (no argument there; you have to agree).
  2. No interpretation of data in the schema language (XML Schema has a notion of default element values, i.e. data to be understood by a document processor is communicated through the schema language).
  3. RELAX NG is purely document oriented, i.e. there are no application/implementation layering conflicts with XML (a corollary to 2, really). This concern is even more clearly an issue with XSLT, which is rapidly degrading from an open rewrite/rendering technology to a clunky, processor-specific but XML-branded language through the addition of processor escapes (to scripting languages etc.).

A corollary to the above: RELAX NG does not stipulate any information about the data other than what is present in the document; in particular, there is no explicit type information. Typing is reduced to "data shape", i.e. constraints that can be verified on the data by processing it, but not by statically querying a declared type.
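To make "data shape" concrete, here is a minimal sketch in Python. It is not RELAX NG, just a hand-written check in the same spirit: the document is accepted because of what it contains, not because it declares a type name. The `order`, `id`, `item` and `qty` names and the `has_order_shape` function are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def has_order_shape(doc: ET.Element) -> bool:
    """Check the shape of a (hypothetical) order document.

    Accepted shape: an <order> root with an <id> child and at least
    one <item> child, each item carrying a numeric qty attribute.
    No type name is consulted anywhere; only the data is inspected.
    """
    if doc.tag != "order":
        return False
    if doc.find("id") is None:
        return False
    items = doc.findall("item")
    if not items:
        return False
    # A missing qty becomes "" and "".isdigit() is False, so it is rejected.
    return all(item.get("qty", "").isdigit() for item in items)


good = ET.fromstring('<order><id>7</id><item qty="2">pen</item></order>')
bad = ET.fromstring('<order><id>7</id><item qty="two">pen</item></order>')

print(has_order_shape(good))
print(has_order_shape(bad))
```

The check has to run over the data to give an answer, which is exactly the "verifiable through processing" part of the description above.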

In contrast, the key XML Schema proponent in the conversation, Don Box, claims explicit named typing as an advantage of XML Schema over RELAX NG. In short, a war of religion is looming over XML typing.

There are at least three notions of type to consider in order to form an opinion on this issue. (By the way, I am not an expert or even a computer science graduate, so if these distinctions are at an odd angle to the standard descriptions, let me know in a comment.)

There is the notion of strong or weak types, meaning whether assertions of type about data are implicitly enforced, so that explicit action by the programmer is required before a value can be reinterpreted as another type (this is strong typing, as in C++), or whether such reinterpretation is allowed silently (weak typing).
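A small sketch of the strong end of this axis, using Python rather than C++ purely for brevity. The `add_one` function is invented for the example. The point is only that the string is not silently reinterpreted as a number; the programmer has to ask for the conversion.

```python
def add_one(value):
    """Add 1 to value; relies on the language to police the operand types."""
    return value + 1


# Passing a string is refused: the language will not quietly reinterpret
# "41" as the number 41.
try:
    add_one("41")
    outcome = "reinterpreted silently"
except TypeError:
    outcome = "rejected"

# With an explicit conversion by the programmer the call goes through.
converted = add_one(int("41"))

print(outcome, converted)
```

A weakly typed language would instead pick an interpretation on its own, for example by treating the string as a number or the number as a string.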

There is the notion of static or dynamic type, which is somewhat similar to strong/weak typing, but concerns when type assertions are enforced: before run time (static) or at run time (dynamic).

Finally there is the notion of explicit or implicit type, i.e. basically whether the type name is part of the type signature, or whether the type is just the concrete interface it services. In the latter case only the ability to access the interface counts; whether the interface was available for the right reasons (i.e. through the proper type) is not important.
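The explicit/implicit distinction can be sketched as two versions of the same function. All names here (`Invoice`, `Receipt`, `total_explicit`, `total_implicit`) are invented for the illustration. The first version insists on the type name, the second only on the interface.

```python
class Invoice:
    def total(self) -> float:
        return 100.0


class Receipt:
    # Unrelated to Invoice, but happens to offer the same interface.
    def total(self) -> float:
        return 40.0


def total_explicit(doc) -> float:
    """Explicit (named) typing: the type name is part of the contract."""
    if not isinstance(doc, Invoice):
        raise TypeError("expected an Invoice")
    return doc.total()


def total_implicit(doc) -> float:
    """Implicit typing: anything that can answer total() is accepted."""
    return doc.total()


print(total_implicit(Receipt()))      # accepted, the interface is there
try:
    total_explicit(Receipt())
except TypeError as err:
    print("refused:", err)            # refused, the name is wrong
```

The named version couples the caller to one particular class; the interface version couples it only to what it actually uses.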

A language can make choices along each one of these axes independently of the others. But since all of the above properties address the balance between constraints on algorithms and processing instructions (i.e. between the predicative and imperative aspects of an algorithm), the bias of a language usually tends toward either the predicative (strong/static/explicit) or the imperative (weak/dynamic/implicit).

Note, however, that a well-thought-out language need not sacrifice any predicative accuracy by going (weak/dynamic/explicit). There is a sacrifice in processing time, since the satisfaction of constraints must be computed at run time, but it is very possible, if not very common, to do heavily constrained programming in highly dynamic languages.
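As a rough illustration of constrained programming in a dynamic language, the sketch below attaches its constraints to a value and checks them when the value is created. The `Percentage` class is hypothetical. The cost mentioned above is visible: the checks run every time, at run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percentage:
    """A number constrained to the range 0..100, checked at run time."""
    value: float

    def __post_init__(self):
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(self.value, bool) or not isinstance(self.value, (int, float)):
            raise TypeError("value must be a number")
        if not 0 <= self.value <= 100:
            raise ValueError("value must be between 0 and 100")


ok = Percentage(50)
print(ok.value)

for candidate in (150, "50"):
    try:
        Percentage(candidate)
    except (TypeError, ValueError) as err:
        print("refused:", err)
```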

What makes the XML thread interesting in this context is James Clark's speculation about the use of named types:
However, I still have my doubts that named typing is appropriate for XML. I would speculate that named typing is part of what makes use of DCOM and CORBA lead to the kind of relatively tight coupling that is exactly what I thought we were all trying to avoid by moving to XML (from this post).

I think this is a very valid point, and one that applies even more strongly to SOAP and WSDL, which have a particularly bad structure in the way type information is mixed with other service data.

In bad SOAP implementations (e.g. the one available in Borland's Delphi environment) this means that the client side of the SOAP request is bound at compile time to the server implementation. The client interface is in fact published by the server, so the server metadata is used once, at compile time.
So instead of requiring a particular input data set, and accepting whatever the server sends that happens to match the requirements, there is now an assumption about exactly what the server sends.

I think this is in contrast to some of the usual protocol design maxims about specifying only what one end of an interface must accept, which usually also state that non-accepted content must either be passed through for later processing or ignored.
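A minimal sketch of a client written along those lines, assuming a made-up `<product>` response with `name` and `price` elements. The `read_product` function and the element names are hypothetical. The client states only what it requires and ignores anything else the server happens to send.

```python
import xml.etree.ElementTree as ET

# The only elements this client depends on.
REQUIRED = ("name", "price")


def read_product(xml_text: str) -> dict:
    """Extract the required fields; ignore every other element."""
    root = ET.fromstring(xml_text)
    result = {}
    for field in REQUIRED:
        node = root.find(field)
        if node is None or node.text is None:
            raise ValueError(f"missing required element: {field}")
        result[field] = node.text.strip()
    return result


# The server has added a <warehouse> element the client knows nothing about.
response = (
    "<product><name>Lamp</name><price>12.50</price>"
    "<warehouse>B7</warehouse></product>"
)
print(read_product(response))
```

The server can add elements without breaking this client, which is not the case when the client is compiled against the server's full published interface.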

The flip side of the coin is whether there is a viable alternative to named types if XML is to be used predominantly as a data-centric language.
Clearly the possibility of naming types is economical, as is the possibility of default interpretation. And on the other hand, the true need for openness is often in question.


(TO BE CONTINUED)

Posted by Claus at July 28, 2002 01:56 PM