Web Services: JSON vs. XML

The following post was written by Manu Sporny, Digital Bazaar’s Founder and CEO.

This page is also available in Spanish. Thanks to Maria Ramos from Web Hosting Hub for doing the translation.

Recently a few XML experts have been claiming that the decision made by large Web Service providers, like Twitter and Foursquare, to drop XML from their Web Services infrastructure is not very interesting news. They also assert that the claim that JSON is more useful than XML for the majority of Web Services is wishful thinking by a “cadre of Web API designers” who have yet to provide “richer APIs”. As the rest of this post will attempt to explain, some of these folks may be missing the bigger sea change that is happening.

This blog post started out as a Twitter exchange with Norman Walsh, who is a big name in the XML world:

Manu Sporny: Big name in XML says “Meh” to JSON: http://ht.ly/3bBMb – what about JSON-LD: http://ht.ly/3bBNy #rdf #jsonld /via @roessler @ndw
Norm Walsh: @manusporny Wow. By the time you start doing that, you’re sure you wouldn’t be better with a richer markup vocabulary?
Manu Sporny: @ndw creating a new markup mechanism incurs enormous costs and is not necessary for many, many applications. #rdfa #jsonld
Manu Sporny: @ndw re: “richer markup” – we have seen success w/ #rdfa by layering meaning on top of successful formats (HTML), JSON-LD -> same strategy
Robin Berjon: @manusporny I don’t think @ndw said “Meh” to JSON, it sounds more like he’s saying “Meh” to the XML vs JSON non-debate. I certainly agree.
Manu Sporny: @robinberjon @manusporny Exactly, Robin.
Manu Sporny: @robinberjon @ndw The XML vs JSON “non-debate” (for Web Services) has legs, will elaborate more in an upcoming opinionated blog post. =P
Robin Berjon: @manusporny Please do, and let’s have a strong debate about the non-debate! cc @ndw

The XML vs. JSON “non-debate”

So, here we go: XML vs. JSON isn’t a “non-debate” and people are choosing to drop support for XML in their Web Services for very good, very important reasons. Few find themselves being more productive while working with XML Web Services vs. JSON Web Services. Certainly, Norm points out many of the benefits of JSON in his blog post:

If all you want to pass around are atomic values or lists or hashes of atomic values, JSON has many of the advantages of XML: it’s straightforwardly usable over the Internet, supports a wide variety of applications, it’s easy to write programs to process JSON, it has few optional features, it’s human-legible and reasonably clear, its design is formal and concise, JSON documents are easy to create, and it uses Unicode.

If you’re writing JavaScript in a web browser, JSON is a natural fit. The XML APIs in the browser are comparatively clumsy and the natural mapping from JavaScript objects to JSON eliminates the serialization issues that arise if you’re careless with XML.

One line of argument for JSON over XML is simplicity. If you mean it’s simpler to have a single data interchange format instead of two, that’s incontrovertibly the case. If you mean JSON is intrinsically simpler than XML, well, I’m not sure that’s so obvious. For bundles of atomic values, it’s a little simpler. And the JavaScript APIs are definitely simpler. But I’ve seen attempts to represent mixed content in JSON and simple they aren’t.

It is in the last sentence that Norm starts to lose those of us who have used both XML and JSON on a day-to-day basis for Web Services. Norm goes on to suggest JSON shortcomings vs. XML:

XML deals remarkably well with the full richness of unstructured data. I’m not worried about the future of XML at all even if its death is gleefully celebrated by a cadre of web API designers.

And I can’t resist tucking an “I told you so” token away in my desk. I look forward to seeing what the JSON folks do when they are asked to develop richer APIs. When they want to exchange less well structured data, will they shoehorn it into JSON? I see occasional mentions of a schema language for JSON, will other languages follow?

I predict there will come a day when someone wants to federate JSON data across several application domains. I wonder, when they discover that the key “width” means different things to different constituencies, will they invent namespaces too?

As a disclaimer, I should note that our company has historically deployed all of its Web Services using XML and SOAP. We have also recently dropped XML support for all of our most popular Web Services for many of the reasons mentioned by Twitter and Foursquare. We did this after years of struggling with the complexity of SOAP and delivering XML data over the Web.

The problem is not that XML doesn’t work for Web Services; it’s that it is far too complex a solution for most of the Web Service problems that people are solving. Interfacing with JSON Web Services is typically much easier as well, since JSON maps directly onto the associative arrays and lists found in most programming languages. Even setting aside the nightmare that is SOAP, which wasn’t entirely XML’s fault, most people are still choosing JSON over XML when it comes to serializing and transmitting their data. Why is that?
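To make the associative-array point concrete, here is a minimal sketch in Python using only the standard library. The payloads and field names are invented for illustration and do not come from any real service. The same record is read once from JSON and once from XML:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads carrying the same record in both formats.
json_payload = '{"user": "alice", "followers": 42, "tags": ["rdf", "jsonld"]}'
xml_payload = (
    "<record><user>alice</user><followers>42</followers>"
    "<tags><tag>rdf</tag><tag>jsonld</tag></tags></record>"
)

# JSON maps directly onto native dicts, lists, strings and numbers.
from_json = json.loads(json_payload)

# XML yields a tree of elements; the number type and the list structure
# have to be recovered by hand.
root = ET.fromstring(xml_payload)
from_xml = {
    "user": root.findtext("user"),
    "followers": int(root.findtext("followers")),
    "tags": [tag.text for tag in root.find("tags")],
}

print(from_json == from_xml)
```

Both paths end at the same dictionary, but only the XML path needs code that knows which fields are numbers and which are lists.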

The Intrinsic Simplicity of JSON

Norm asserts that JSON isn’t intrinsically simpler than XML, citing that doing mixed content in JSON is difficult. That’s true, but I would be willing to bet that the majority of us building next generation web services don’t care about mixed content in our Web Services, or if we do – we use HTML or HTML+RDFa. We’re not exchanging documents via Web Services, we’re exchanging data. Most of that data doesn’t need to be namespaced for us to get useful work done, and this is an example of what is at the core of the XML vs. JSON debate for Web Services:

XML is more complex than necessary for Web Services. By default, XML requires you to use complex features that many Web Services do not need to be successful.

JSON not only fits in well with the Web’s JavaScript programming model, but it provides a simpler, easier to understand data model for Web programmers. XML is more complex than JSON – many people don’t need namespaces and they don’t need mixed content documents. They need simple data structures and a simple, compact, data exchange format.
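For readers unfamiliar with the term, mixed content is text interleaved with child elements inside one parent. The sketch below, in Python, shows one possible way to squeeze such content into JSON. The encoding is a hypothetical convention made up for this example (it also ignores attributes), which is rather the point: JSON has no native answer here, while typical Web Service records never need one.

```python
import json
import xml.etree.ElementTree as ET

# Mixed content: text and child elements interleaved in one parent.
xml_mixed = "<p>JSON is <em>simple</em> until you need this.</p>"


def to_jsonable(element):
    """Encode an element as [tag, child, child, ...], where each child
    is either a string or another encoded element (hypothetical scheme)."""
    node = [element.tag]
    if element.text:
        node.append(element.text)
    for child in element:
        node.append(to_jsonable(child))
        if child.tail:
            node.append(child.tail)
    return node


encoded = to_jsonable(ET.fromstring(xml_mixed))
print(json.dumps(encoded))

# Typical Web Service data has no interleaved text, so a flat object
# is all that is required.
record = {"title": "JSON vs. XML", "comments": 6}
print(json.dumps(record))
```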

That’s not to say that XML doesn’t have its place on the Web – it certainly does, but its value for implementing Web Services is being actively challenged. In some ways, claims that XML vs. JSON is a “non-debate” these days are true. Many are outright rejecting XML in favor of JSON as their data serialization format for Web Services – the debate is over, just pick JSON and save yourself a ton of headaches.

Namespaces, Documents, Well-formedness Rules, and Schemas

One would be mistaken to take the arguments in this post as general arguments against XML. JSON will inevitably see growing pains as we try to move it into new areas. This is what happens with new technologies. When they’re successful in the problem domain that they were designed for, you try to see what other problem areas could benefit from the new-found simplicity.

XML is great in its problem domain – namespaced, well-formed, mixed content documents. However, Web Services are often not the problem domain of XML. When you apply a technology that is not intended for a particular problem domain, you will inevitably feel like you are attempting to accomplish your task with a rabid lemur firmly planted on your keyboard (the analogy being that the lemur is more than you need, and is going to create side-effects that are not desirable).

[Image: Rabid Lemur. Caption: Not amused by Web Services, and rabid.]

For example, XML defines namespaces for elements and attributes, but many of the most popular Web Services have not needed namespaces for their data attributes. XML focuses on documents; JSON focuses on data. There are a slew of well-formedness rules for XML, but only a few for JSON, resulting in fewer surprises when exchanging data between sites. You can’t get very far when discussing XML without also talking about document schemas, entities and DTDs, whereas JSON is largely schema-less and entity-free. These may not sound like big issues, but they creep into the APIs that you use to work with XML – adding unnecessary complexity along the way.
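As one illustration of how namespaces creep into an API, consider this small Python sketch using the standard library's ElementTree. The namespace URI and the documents are made up for the example:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical namespaced XML document and its JSON counterpart.
xml_doc = (
    '<shape xmlns="http://example.org/geometry">'
    "<width>10</width></shape>"
)
json_doc = '{"width": 10}'

root = ET.fromstring(xml_doc)

# A plain tag lookup misses, because the element's real name includes
# the namespace URI.
plain_lookup = root.find("width")

# The lookup only succeeds once the namespace is spelled out.
ns = {"g": "http://example.org/geometry"}
qualified_lookup = root.find("g:width", ns)

width_from_xml = int(qualified_lookup.text)
width_from_json = json.loads(json_doc)["width"]

print(plain_lookup is None, width_from_xml, width_from_json)
```

The document author added a single `xmlns` attribute, and every consumer now has to carry that URI around in its lookup code, whether or not it ever needed the disambiguation.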

All Business Up Front, Party in the Back

Here is what many of the people who claim that there is no XML vs. JSON debate are missing. When you replace your front-end service with a particular technology, it tends to affect the back-end. At our company, we’ve started to use JSON-like objects for not only our Web Services, but for all of our system data. We build our systems in C++, so the change had nothing to do with running in a JavaScript environment. You will find the same at companies that are building their infrastructure on databases like MongoDB or CouchDB.

We made the change because JSON was fundamentally easier to work with than XML. We had been translating from XML to our databases before, but found JSON so easy to work with that we just made the objects persist from the front-end, through the application layer, all the way down to the databases. We have been systematically removing XML support from our systems for a few years now. We do consume XHTML from various websites, but we store anything extracted as JSON.

This is what those who don’t understand the big fuss over JSON should take note of: where good developers can simplify, they do – and when it comes to JSON vs. XML, XML ends up on the chopping block in many technology companies these days. Not only for the front-end Web Services systems, but for the back-end application and storage systems as well.

Namespaces and Validation Rear Their Ugly Heads

This is not to say that Norm wasn’t right in his post about eventually needing namespaces and some of the other features of XML. Some companies may eventually need these mechanisms, but many of us are doing just fine without them. Specifically, Digital Bazaar consumes quite a bit of XHTML+RDFa, but we don’t store that data as XML – we translate what we need into JSON-LD and store that. We continue to use JSON by layering URLs on top of its simple data model. For namespaces, our company decided to build on top of RDF, specifically JSON-LD, which provides a layer on top of JSON that is easy to work with and also provides a global Web-based naming convention using URLs.
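The idea of layering URLs on top of plain JSON can be sketched as follows. Two caveats: the `@context` keyword is taken from the later W3C JSON-LD Recommendations and may differ from the draft syntax current when this post was written, and the `expand_keys` helper is a toy written for this example, not a conforming JSON-LD processor. The record values are hypothetical.

```python
import json

# A plain JSON object, as an existing service might already return it.
plain = {"name": "Example Person", "homepage": "http://example.org/~person"}

# Adding an "@context" maps each short key to a URL, so "name" from one
# publisher can be told apart from "name" from another.
context = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
}
linked = {"@context": context, **plain}


def expand_keys(doc):
    """Toy expansion: replace short keys with the URLs from @context.

    This ignores nesting, value typing, and every other part of the
    JSON-LD specification; it only illustrates the naming idea.
    """
    ctx = doc.get("@context", {})
    return {ctx.get(k, k): v for k, v in doc.items() if k != "@context"}


print(json.dumps(expand_keys(linked), indent=2))
```

Consumers that do not care about global names can ignore `@context` entirely and keep reading `name` and `homepage` as before, which is what makes the layering cheap to adopt.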

Validation was also an issue, but not one that RELAX NG would solve for us. Validation tends to be very application-specific, so we do validation by providing prototype JSON structures with acceptable ranges via the application layer. Some of the data validation we need to do is dynamic, not static, so few XML solutions would have worked for us. It’s not ideal, but our homegrown solution is also not as complicated as writing RELAX NG or any other XML schema or DTD language.
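The post does not show what these prototype structures look like, so the following Python sketch is only a guess at the general shape of the technique. The `PROTOTYPE` layout, the field names, and the `validate` function are all hypothetical:

```python
# Hypothetical prototype: each key maps to (expected type, min, max).
# A bound of None means "no limit".
PROTOTYPE = {
    "amount": (float, 0.01, 10000.0),
    "currency": (str, None, None),
    "quantity": (int, 1, 100),
}


def validate(obj, prototype):
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    for key, (expected_type, low, high) in prototype.items():
        if key not in obj:
            problems.append(f"missing key: {key}")
            continue
        value = obj[key]
        if not isinstance(value, expected_type):
            problems.append(f"{key}: expected {expected_type.__name__}")
            continue
        if low is not None and value < low:
            problems.append(f"{key}: {value} is below {low}")
        if high is not None and value > high:
            problems.append(f"{key}: {value} is above {high}")
    return problems


good = {"amount": 5.0, "currency": "USD", "quantity": 2}
bad = {"amount": 5.0, "quantity": 500}
print(validate(good, PROTOTYPE))
print(validate(bad, PROTOTYPE))
```

Because the prototype is ordinary data, the application can build or adjust it at run time (for example, tightening a range per customer), which is one way to read the "dynamic, not static" requirement above.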

Why not XML?

The response from Norm about JSON-LD was interesting: “Wow. By the time you start doing that, you’re sure you wouldn’t be better with a richer markup vocabulary?” The answer is not as straightforward as it may seem. A richer markup vocabulary, like XML, might be better for expressing RDF in XML. However, we should not forget that RDF/XML has been around for over a decade with very little uptake outside of academic circles. We should also note that in a very short time, XHTML+RDFa has had enormous uptake – every search engine company supports it – Google, Yahoo!, and Bing. Just last month 25,000 sites went live with XHTML+RDFa support thanks to native Drupal 7 support. This happened because RDFa was easy for people to adopt: it layered on top of an already successful format – XHTML.

The same strategy is employed for JSON-LD, which layers Linked Data concepts on top of JSON. Many developers don’t need to change anything about their JSON structures, but those that do want the added benefit of globally identifiable names need only write a tiny bit of JSON to bring their namespace-less objects into a namespaced world. There is much more about JSON-LD in last month’s post about Linked Data for JSON on this blog. To summarize, for Web Services it’s better to build on top of successful technologies on their way up (JSON), than ones that are actively being rejected (XML).

“Meh” and Sea Changes

Perhaps XML experts are trapped by the innovator’s dilemma, not seeing the struggles that the rest of us face when building Web Services as XML non-experts. XML tends to be surrounded by complicated tool-chains and processing rules, which probably seem simple if you’ve been heavily involved in the XML world for years.

However, the world moves on and when young programmers compare XML and JSON side-by-side, they almost inevitably gravitate towards JSON. JSON is easier for those yearning for a simple data serialization format that works seamlessly with the Web. JSON is certainly scratching an itch that many of us Web developer types have and for that we can be thankful. XML will continue to survive for a very long time; it is good in its problem domain. However, we should not forget that XML tried to play in the Web Services game and it fell short for all of the reasons mentioned in this post.

At least, that was our experience and is the experience of many of our colleagues across the Web. You’ll find that JSON developers tend to feel that the JSON libraries that they use hardly ever surprise them or fight against what they’re attempting to do. The sea change isn’t just the data serialization format of JSON, it is its simplicity – the schema-less, dynamic-programming-friendly, JavaScript-compatible nature of the data model. From a data structure perspective, it has made our company far more productive and focused than we ever were when using XML. So much so that JSON is now replacing back-end services where we had traditionally used XML, and that is worthy of far more than a “Meh”.

6 Responses to "Web Services: JSON vs. XML"

  1. I don’t see how anything you’ve said is at odds with what I said. I gather that you were using XML to store and exchange a bunch of atomic values and using JSON instead has made your life easier. Ok. As an XML guy, the fact that Twitter and Foursquare and I presume whatever APIs Digital Bazaar builds or will build are going to expose JSON and not XML is still “meh” to me.
    I turn it into XML when I get it, and I turn my XML into JSON when I send it to you. I have a hugely scalable, extraordinarily high performance XML database on my backend. None of the problems you encountered with namespaces and validation ever even slowed me down, and I get significant benefit from developing applications that are built, top to bottom, on XML technologies. I get XPath to extract data, XQuery to sort through arbitrarily large amounts of data in next to no time, and XSLT to translate it into HTML or whatever format I need. These are largely declarative languages so they are easy to understand and optimize well. I can write meta-stylesheets to generate stylesheets to make my most complex data manipulation needs ever easier.
    I live and breathe mixed content. That’s where I think the real valuable information lives. That doesn’t mean you can’t build a successful business processing something else, of course.

  2. I’ll attempt to explain where our thinking may be at odds. It seems as if your post, and your follow-up comment on here, is implying the following:

    These recent XML-dropping events aren’t really that interesting.

    Whereas I am proclaiming:

    These recent XML-dropping events are very significant!

    The selection of JSON over XML in many of these fresh new companies is mirroring new thinking in how to build Web-based start-ups.

    You are an XML guy, you know XML inside-and-out, you have a high performance XML database, XPath, XQuery, and XSLT. At this point, I’m trying to think of the number of web developers I know that are proficient with the required XML database, XPath, XQuery and XSLT technologies to be useful in a startup versus the number of web developers that can write valid JSON, JavaScript, use HTML/CSS/jQuery and perhaps store their data in a SQL back-end.

    The latter camp is far larger in my mind, which would seem to indicate that start-ups would naturally use the latter technologies rather than the former – that is, if you want your hiring pool to be larger as well as your Web Services partner pool. This thinking naturally drives these companies to expose APIs in the same set of technologies, which drives the growth of the technology (JSON Web Services).

    As a Web Service provider, you want to provide simple mechanisms to use your service – you try to provide services using familiar, popular interfaces. Twitter and Foursquare had already spent the development effort to build out their XML Web Services, but people weren’t using them, so they decided to remove them. They chose to write off the cost of building those services in the first place because the services were too expensive to keep developing and maintaining. More people were using the JSON services and the XML services were becoming a financial liability.

    This also may mean that even though developers know that the XML stack exists, they’re choosing not to use it. They are doing this either from ignorance, or they’re making a conscious decision not to work with a technology stack that they don’t want to work with anymore. In our case, it was the latter.

    One of the implications of your post is absolutely right, in that this is new territory and that there will be bumps in the road and fires along the way. The fact is that people are choosing the bumps and fires over the existing XML technology stack, which should lead one to ask, Why? That is what I think is significant. It’s not that XML is going away any time soon, but that people are looking at the entire XML stack and coming to the conclusion that they don’t want it. We’ve seen it happen in XHTML2 vs. HTML5 and we’re seeing it happen in XML vs. JSON Web Services.

    So, to perhaps overgeneralize, the point is not that you personally find the XML technology stack easy – it’s that others do not. The significance is that they seem to be doing just fine without the XML technology stack. This is where our thinking seems to differ: I think that is very significant for the XML community and it seems that you don’t.

    Is that a valid statement?

  3. I’m pretty fond of JSON and I think it’s really good for the kind of APIs that people are building with JSON right now — single service provider APIs with well defined points of authority.

    I’m not going to speak for Norman, but the moment you start trying to define semantics across multiple points of authority (which is what it seems like you’re trying to do with JSON-LD), you’re going to find yourself either missing or reinventing XML namespaces.

    Let JSON do what JSON does well. It keeps the armies of complexity at bay. You want that.
