The following post was written by Manu Sporny, Digital Bazaar’s Founder and CEO.
Recently a few XML experts have been claiming that the decision made by large Web Service providers, like Twitter and Foursquare, to drop XML from their Web Services infrastructure is not very interesting news. They also assert that the claim that JSON is more useful than XML for the majority of Web Services is wishful thinking by a “cadre of Web API designers” who have yet to provide “richer APIs”. As the rest of this post will attempt to explain, some of these folks may be missing the bigger sea change that is happening.
This blog post started out as a Twitter exchange with Norman Walsh, who is a big name in the XML world:
Manu Sporny: Big name in XML says “Meh” to JSON: http://ht.ly/3bBMb – what about JSON-LD: http://ht.ly/3bBNy #rdf #jsonld /via @roessler @ndw
Norm Walsh: @manusporny Wow. By the time you start doing that, you’re sure you wouldn’t be better with a richer markup vocabulary?
Manu Sporny: @ndw creating a new markup mechanism incurs enormous costs and is not necessary for many, many applications. #rdfa #jsonld
Manu Sporny: @ndw re: “richer markup” – we have seen success w/ #rdfa by layering meaning on top of successful formats (HTML), JSON-LD -> same strategy
Robin Berjon: @manusporny I don’t think @ndw said “Meh” to JSON, it sounds more like he’s saying “Meh” to the XML vs JSON non-debate. I certainly agree.
Norm Walsh: @robinberjon @manusporny Exactly, Robin.
Manu Sporny: @robinberjon @ndw The XML vs JSON “non-debate” (for Web Services) has legs, will elaborate more in an upcoming opinionated blog post. =P
Robin Berjon: @manusporny Please do, and let’s have a strong debate about the non-debate! cc @ndw
The XML vs. JSON “non-debate”
So, here we go: XML vs. JSON isn’t a “non-debate” and people are choosing to drop support for XML in their Web Services for very good, very important reasons. Few developers find themselves more productive working with XML Web Services than with JSON Web Services. Certainly, Norm points out many of the benefits of JSON in his blog post:
If all you want to pass around are atomic values or lists or hashes of atomic values, JSON has many of the advantages of XML: it’s straightforwardly usable over the Internet, supports a wide variety of applications, it’s easy to write programs to process JSON, it has few optional features, it’s human-legible and reasonably clear, its design is formal and concise, JSON documents are easy to create, and it uses Unicode.
It is during the last sentence that Norm starts to lose those of us who have used both XML and JSON on a day-to-day basis for Web Services. Norm goes on to suggest JSON shortcomings vs. XML:
XML deals remarkably well with the full richness of unstructured data. I’m not worried about the future of XML at all even if its death is gleefully celebrated by a cadre of web API designers.
And I can’t resist tucking an “I told you so” token away in my desk. I look forward to seeing what the JSON folks do when they are asked to develop richer APIs. When they want to exchange less well structured data, will they shoehorn it into JSON? I see occasional mentions of a schema language for JSON, will other languages follow?
I predict there will come a day when someone wants to federate JSON data across several application domains. I wonder, when they discover that the key “width” means different things to different constituencies, will they invent namespaces too?
As a disclaimer, I should note that our company has historically deployed all of its Web Services using XML and SOAP. We have also recently dropped XML support for all of our most popular Web Services for many of the reasons mentioned by Twitter and Foursquare. We did this after years of struggling with the complexity of SOAP and delivering XML data over the Web.
The problem is not that XML doesn’t work for Web Services; it’s that it is far too complex a solution for most of the Web Service problems that people are solving. Interfacing with JSON Web Services is typically much easier as well, since JSON maps directly onto the associative arrays and lists built into most programming languages. Even setting aside the nightmare that is SOAP, which wasn’t entirely XML’s fault, most people are still choosing JSON over XML when it comes to serializing and transmitting their data. Why is that?
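One concrete part of the answer is the fit with associative arrays mentioned above. Here is a minimal sketch in Python, using only the standard library. The check-in record and its field names are invented for illustration and are not taken from any real Twitter or Foursquare API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical check-in record, expressed once as JSON and once as XML.
json_payload = '{"user": "alice", "venue": "Cafe Uno", "checkins": 3}'
xml_payload = (
    "<checkin><user>alice</user><venue>Cafe Uno</venue>"
    "<checkins>3</checkins></checkin>"
)

# JSON: one call yields a native dict, and the number is already an int.
record = json.loads(json_payload)
total = record["checkins"] + 1

# XML: parse a tree, walk its children, and convert text to types by hand.
root = ET.fromstring(xml_payload)
xml_record = {child.tag: child.text for child in root}
xml_total = int(xml_record["checkins"]) + 1

print(total, xml_total)
```

Both paths arrive at the same value, but the XML path needs an extra decision at every step: which nodes are data, and what type each piece of text should become.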
The Intrinsic Simplicity of JSON
Norm asserts that JSON isn’t intrinsically simpler than XML, citing that doing mixed content in JSON is difficult. That’s true, but I would be willing to bet that the majority of us building next generation web services don’t care about mixed content in our Web Services, or if we do – we use HTML or HTML+RDFa. We’re not exchanging documents via Web Services, we’re exchanging data. Most of that data doesn’t need to be namespaced for us to get useful work done, and this is an example of what is at the core of the XML vs. JSON debate for Web Services:
XML is more complex than necessary for Web Services. By default, XML requires you to use complex features that many Web Services do not need to be successful.
That’s not to say that XML doesn’t have its place on the Web – it certainly does, but its value for implementing Web Services is being actively challenged. In some ways, claims that XML vs. JSON is a “non-debate” these days are true. Many are outright rejecting XML in favor of JSON as their data serialization format for Web Services – the debate is over, just pick JSON and save yourself a ton of headache.
Namespaces, Documents, Well-formedness Rules, and Schemas
One would be mistaken to take the arguments in this post as general arguments against XML. JSON will inevitably see growing pains as we try to move it into new areas. This is what happens with new technologies. When they’re successful in the problem domain that they were designed for, you try to see what other problem areas could benefit from the new-found simplicity.
XML is great in its problem domain – namespaced, well-formed, mixed content documents. However, Web Services are often not the problem domain of XML. When you apply a technology that is not intended for a particular problem domain, you will inevitably feel like you are attempting to accomplish your task with a rabid lemur firmly planted on your keyboard (the analogy being that the lemur is more than you need, and is going to create side-effects that are not desirable).
For example, XML lets you namespace every element and attribute, but many of the most popular Web Services have not needed namespaces for their data attributes. XML focuses on documents; JSON focuses on data. There are a slew of well-formedness rules for XML, but only a few for JSON, resulting in fewer surprises when exchanging data between sites. You can’t get very far when discussing XML without also talking about document schemas, entities and DTDs, whereas JSON is largely schema-less and entity-free. These may not sound like big issues, but they creep into the APIs that you use to work with XML – adding unnecessary complexity along the way.
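To make the “creep into the APIs” point concrete, here is a small sketch using Python’s standard `xml.etree.ElementTree` module. The `http://example.org/shapes` namespace and the `box`/`width` names are hypothetical, chosen only to echo the “width” example quoted earlier.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical document that declares a default namespace.
xml_doc = '<box xmlns="http://example.org/shapes"><width>10</width></box>'
root = ET.fromstring(xml_doc)

# The unqualified name does not match once a namespace is in play.
naive = root.find("width")

# The lookup only succeeds with the namespace URI spelled out.
qualified = root.find("{http://example.org/shapes}width")
xml_width = int(qualified.text)

# The JSON equivalent has no namespace machinery to get wrong.
json_width = json.loads('{"width": 10}')["width"]

print(naive, xml_width, json_width)
```

The namespace is useful when you need it, but here the consumer pays for it on every lookup whether or not the data ever leaves a single application.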
All Business Up Front, Party in the Back
We made the change because JSON was fundamentally easier to work with than XML. We had been translating from XML to our databases before, but found JSON so easy to work with that we just made the objects persist from the front-end, through the application layer, all the way down to the databases. We have been systematically removing XML support from our systems for a few years now. We do consume XHTML from various websites, but we store anything extracted as JSON.
This is what those who don’t understand the big fuss over JSON should take note of: where good developers can simplify, they do – and when it comes to JSON vs. XML, XML ends up on the chopping block in many technology companies these days. Not only for the front-end Web Services systems, but for the back-end application and storage systems as well.
Namespaces and Validation Rear Their Ugly Head
This is not to say that Norm wasn’t right in his post about eventually needing namespaces and some of the other features of XML. Some companies may eventually need these mechanisms, but many of us are doing just fine without them. Specifically, Digital Bazaar consumes quite a bit of XHTML+RDFa, but we don’t store that data as XML – we translate what we need into JSON-LD and store that. We continue to use JSON by layering URLs on top of its simple data model. For namespaces, our company decided to build on top of RDF; specifically, JSON-LD provides a layer on top of JSON that is easy to work with and also provides a global Web-based naming convention using URLs.
Validation was also an issue, but not one that RELAX NG would solve for us. Validation tends to be very application specific, so we do validation by providing prototype JSON structures with acceptable ranges via the application layer. Some of the data validation we need to do is dynamic, not static, so there were not many XML solutions that worked for us. It’s not ideal, but our homegrown solution is also not as complicated as writing RELAX NG or any other XML schema language.
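As a rough illustration of the prototype idea, here is a sketch in Python. It is not Digital Bazaar’s actual validator: the `make_prototype` and `validate` helpers, the field names, and the ranges are all assumptions made up for this example.

```python
import json

def make_prototype(max_quantity):
    """Build a prototype at request time, so ranges can be dynamic (hypothetical)."""
    return {
        "title": {"type": str},
        "price": {"type": (int, float), "min": 0, "max": 1000},
        "quantity": {"type": int, "min": 1, "max": max_quantity},
    }

def validate(obj, prototype):
    """Return a list of problems; an empty list means the object passed."""
    errors = []
    for key, rule in prototype.items():
        if key not in obj:
            errors.append(f"missing key: {key}")
            continue
        value = obj[key]
        if not isinstance(value, rule["type"]):
            errors.append(f"wrong type for {key}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{key} below minimum")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{key} above maximum")
    return errors

prototype = make_prototype(max_quantity=5)
good = json.loads('{"title": "Song", "price": 0.99, "quantity": 2}')
bad = json.loads('{"title": "Song", "price": -5, "quantity": "two"}')

print(validate(good, prototype))
print(validate(bad, prototype))
```

Because the prototype is an ordinary data structure, the application can compute a limit such as `max_quantity` per request, which is the kind of dynamic rule that a static schema file handles poorly.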
Why not XML?
The response from Norm about JSON-LD was interesting: “Wow. By the time you start doing that, you’re sure you wouldn’t be better with a richer markup vocabulary?” The answer is not as straightforward as it may seem. A richer markup vocabulary, like XML, might be better for expressing RDF in XML. However, we should not forget that RDF/XML has been around for over a decade with very little uptake outside of academic circles. We should also note that in a very short time, XHTML+RDFa has had enormous uptake – every search engine company supports it – Google, Yahoo!, and Bing. Just last month 25,000 sites went live with XHTML+RDFa support thanks to native Drupal 7 support. RDFa was easy for people to adopt because it layered on top of an already successful format – XHTML.
The same strategy is employed for JSON-LD, which layers Linked Data concepts on top of JSON. Many developers don’t need to change anything about their JSON structures, but those that do want the added benefit of globally identifiable names need only write a tiny bit of JSON to bring their namespace-less objects into a namespaced world. There is much more about JSON-LD in last month’s post about Linked Data for JSON on this blog. To summarize, for Web Services it’s better to build on top of successful technologies on their way up (JSON), than ones that are actively being rejected (XML).
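The “tiny bit of JSON” can be sketched as follows. This example uses the `@context` keyword from the JSON-LD syntax as it is standardized today, which may differ from the draft syntax current when this post was written. The `expand_keys` helper is a deliberately simplified stand-in written for this post, not the real JSON-LD expansion algorithm, and the `example.org` URLs are hypothetical.

```python
def expand_keys(doc):
    """Replace each short key with the URL its context maps it to (simplified)."""
    context = doc.get("@context", {})
    return {
        context.get(key, key): value
        for key, value in doc.items()
        if key != "@context"
    }

# Two constituencies that both use the key "width" to mean different things.
shape = {
    "@context": {"width": "http://example.org/shapes#width"},
    "width": 10,
}
font = {
    "@context": {"width": "http://example.org/fonts#width"},
    "width": "condensed",
}

# Plain JSON with no context passes through untouched.
plain = expand_keys({"width": 10})

# Once keys are expanded to URLs, the two meanings no longer collide.
federated = {**expand_keys(shape), **expand_keys(font)}

print(plain)
print(federated)
```

Developers who never federate their data can ignore the context entirely, while those who do get globally unique names without changing the shape of their objects.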
“Meh” and Sea Changes
Perhaps XML experts are trapped by the innovator’s dilemma, not seeing the struggles that the rest of us face when building Web Services as XML non-experts. XML tends to be surrounded by complicated tool-chains and processing rules, which probably seem simple if you’ve been heavily involved in the XML world for years.
However, the world moves on and when young programmers compare XML and JSON side-by-side, they almost inevitably gravitate towards JSON. JSON is easier for those yearning for a simple data serialization format that works seamlessly with the Web. JSON is certainly scratching an itch that many of us Web developer types have and for that we can be thankful. XML will continue to survive for a very long time because it is good in its problem domain. However, we should not forget that XML tried to play in the Web Services game and it fell short for all of the reasons mentioned in this post.