Everybody wants to use XML. Every vendor wants to have "native XML support". But everybody keeps reinventing XML so that it suits their very specific business needs. Everybody who has tried to put a digital signature on an XML payload has run into problems that even the W3C guys did not manage to fix, and has fallen back on tricks like "fixed prefix" or "don't use non-ASCII characters". Some hard-coded their own limited XML parser for CPU-performance reasons. Some mandate the use of UTF-8 encoding, etc…
I was wondering whether, within five years, XML will still exist as a standard, or whether everybody will have a custom supported XML subset or proprietary extensions, as is the case today with CSS and HTML.
Is XML interoperability stillborn?
Q: What does "put[ting] a digital signature on an XML payload" mean?
A: For instance, computing a SHA-1 or an HMAC digest. You need to do it on some canonicalized form. Unfortunately, the W3C canonicalized form is not enough for this, as they recognize themselves. As soon as you use namespaces, you're doomed because the prefixes aren't normalized. You therefore must be ready to accept any prefix, validate the signature using that prefix, and when you export the document you must use the same prefix again. No issue until you have to export a batch of orders, each using its own prefix. You'll end up declaring the same namespace with different prefixes again and again. It is much easier to say "ok, let's only use that prefix", but then it is not XML anymore. Nevertheless, I saw business solutions where the latter approach was used.
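To make the canonicalization point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree.canonicalize` (Python 3.8+, which implements C14N 2.0, a later canonical form than the one discussed in this thread). The namespace URI and the sample documents are made up for illustration. With default options, canonicalization removes differences in tag whitespace and attribute order, but it keeps whatever prefix the document used.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:orders"  # hypothetical namespace URI, for illustration only

# Pairs of documents that an XML parser treats as equivalent.
plain = '<Tag>order</Tag>'
spaced = '<Tag >order</Tag>'
attrs_ab = '<Tag a="1" b="2">order</Tag>'
attrs_ba = '<Tag b="2" a="1">order</Tag>'
abc = f'<Abc:Tag xmlns:Abc="{NS}">order</Abc:Tag>'
xyz = f'<Xyz:Tag xmlns:Xyz="{NS}">order</Xyz:Tag>'


def c14n(xml_text: str) -> str:
    """Return the canonical serialization using the default options."""
    return ET.canonicalize(xml_text)


# Whitespace inside the start tag disappears in canonical form.
whitespace_normalized = c14n(plain) == c14n(spaced)

# Attributes are written in a fixed order, whatever the input order was.
attribute_order_normalized = c14n(attrs_ab) == c14n(attrs_ba)

# Prefixes are kept as written, so these two canonical forms still differ.
prefixes_normalized = c14n(abc) == c14n(xyz)

print(whitespace_normalized, attribute_order_normalized, prefixes_normalized)
```

Under these assumptions, a digest computed over the canonical form survives re-serialization with different spacing or attribute order, but not a change of prefix.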
Q: I still don't get it. Why the need for a canonicalized form? I sign messages containing all kinds of non-canonical data (like images with different levels of compression or different compression algorithms). The signature indicates that the message is authentic, not that another independently created message is identical. Why do you care about the contents of the payloads you sign?
A: I do because the signature does not apply to the data, but to the XML fragment itself. If you SHA-1 <Tag> or <Tag >, the results will be different, but neither your DOM nor your SAX parser will let you know that there is a space just there! How are you going to validate the signature? Same question with <Abc:Tag> and <Xyz:Tag>. Abc and Xyz are supposed to be interchangeable as long as they point to the same URI, but once again the SHA-1 will give you different results.
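The mismatch described here can be sketched in a few lines of Python, assuming the digest is taken over the raw bytes of the fragment. The helper names `sha1_hex` and `parsed_view` and the namespace URI are hypothetical, chosen only for this example. The parser reports the same element for each pair, while the SHA-1 digests differ.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = b"urn:example:orders"  # hypothetical namespace URI, for illustration only


def sha1_hex(data: bytes) -> str:
    """SHA-1 digest of the raw bytes, as a naive signer would compute it."""
    return hashlib.sha1(data).hexdigest()


def parsed_view(data: bytes):
    """What the application sees after parsing: qualified name and text."""
    root = ET.fromstring(data)
    return (root.tag, root.text)


plain = b'<Tag>order</Tag>'
spaced = b'<Tag >order</Tag>'
abc = b'<Abc:Tag xmlns:Abc="' + NS + b'">order</Abc:Tag>'
xyz = b'<Xyz:Tag xmlns:Xyz="' + NS + b'">order</Xyz:Tag>'

for left, right in [(plain, spaced), (abc, xyz)]:
    same_after_parsing = parsed_view(left) == parsed_view(right)
    same_digest = sha1_hex(left) == sha1_hex(right)
    print(same_after_parsing, same_digest)
```

The parsed view carries no trace of the extra space or of the prefix that was used, which is why the original bytes cannot be rebuilt from the parsed tree alone.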
"Interchangeable" doesn't mean diddly to digital signing. They have to be identical. I still don't understand how you think XML is to blame for any of that.
I don't blame XML, I blame (mis)uses of XML. HTML is not to blame on its own...
Q: Where are these custom XML-subsets and proprietary extensions? All the XML I've seen has been handled by Xerces or MSXML, and it all seemed perfectly interoperable.
A: Using Xerces or MSXML is not at all a guarantee of interoperability. This is what I see in my day-to-day business, but I'm not allowed to go further about it. Of course, there isn't much publicity about it, and I was wondering if anybody else had the same experience.
Q: You can't give a single example?
A: I'm not allowed to give you named examples that I know of. But creating a non XML-compliant application is so easy! Any non-trivial business application (I mean really non-trivial, forget about RSS or SOAP, I mean TRUE business logic) will run into problems. An application is not compliant if:
- it assumes anything about the attribute ordering
- it assumes anything about the prefix that is used
- it mandates a specific encoding
- it mandates the use of CDATA sections
- it forbids you to use the full range of allowed characters for tag names
- it forbids you to declare the same namespace through 150 prefixes and use them randomly
- it forbids you to interpret escape sequences
- etc... put your favorite here.
I saw them all... Is this really a taboo in the XML community or am I the only one that ever ran into those kinds of situations?
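For contrast, here is a rough sketch of a consumer that avoids two of the assumptions in the list above: it matches elements by namespace URI rather than by prefix, and it reads attributes by name rather than by position. The namespace URI, the `batch` document, and the `read_orders` helper are all invented for this illustration.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:orders"  # hypothetical namespace URI, for illustration only

# The same namespace is declared under two prefixes and both are used,
# and the two orders list their attributes in different orders.
batch = f'''<batch xmlns:a="{NS}" xmlns:b="{NS}">
  <a:order id="1" currency="EUR"/>
  <b:order currency="USD" id="2"/>
</batch>'''


def read_orders(xml_text: str) -> list:
    """Collect the attributes of every order element, whatever its prefix."""
    root = ET.fromstring(xml_text)
    # ElementTree expands prefixes to "{uri}localname", so the lookup
    # depends only on the namespace URI, never on the prefix text.
    return [dict(order.attrib) for order in root.iter(f"{{{NS}}}order")]


orders = read_orders(batch)
print(orders)
```

Because the lookup key is the expanded name and the attributes land in a mapping, renaming the prefixes or shuffling the attributes in `batch` leaves the result unchanged.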
Several of those are explicitly forbidden by the XML spec (assuming attribute order, for example). I've written many non-trivial business apps that use XML and never had any of the problems you can't name. I think. -- EricHodges
Of course they are forbidden. I'm talking about non-XML-compliant applications. You never had the case? I did.
No, like I said above, all of the XML I've worked with has been handled by Xerces or MSXML (well, plus a bit of JDOM here and there). XML interoperability seems as alive as it ever did (in other words, mostly a dream based on faulty understanding of "meaning" in data.)
The question remains: am I alone?