JSON vs. XML (corecursive.com)
> Adam: [...] He also wanted people to use JavaScript properly – use semicolons, use a functional style, don’t use a vowel, use JSLint and so on.
They could have done the same with XML, i.e. define a simple-XML subset without schema, CDATA, entities, etc. Instead they built it on top of another language that is so infamous that they felt the need to write JSLint.
> Adam: The thing they came up with, Doug’s idea for sending JavaScript data back and forth, they didn’t even give it a name. It just seemed like the easiest way to talk between the client side and the backend, a way to skip having to build XML parser in JavaScript.
So the original reason was that they could use eval(jsonstr)? Because of the security implications, they would have been better off writing a JSON parser. At that point, is it any better than writing a simple-XML parser? At least that would have saved them from the "it's not a standard" discussions.
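To make the eval-versus-parser point concrete, here is a minimal sketch in Python (Python's eval standing in for JavaScript's; the payload strings are made up for illustration). A real parser refuses anything that is not data, while eval runs whatever it is handed.

```python
import json

# Hypothetical payloads: the first is code pretending to be a message.
malicious = '__import__("os").getcwd()'
well_formed = '{"user": "doug", "admin": false}'


def parse_untrusted(text):
    """Parse text as JSON; return None instead of executing anything."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# eval executes the expression (a harmless call here, but it could be anything).
evaluated = eval(malicious)

# The parser treats the same string as invalid input and refuses it.
rejected = parse_untrusted(malicious)
accepted = parse_untrusted(well_formed)
```

The parser version is a few more lines of library code, but the sender can no longer choose what runs on the receiving side.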
Not so different from today. That quote is about HyperCard, not JS, by the way.
The current state of JSON generation/validation is simpler than the XML ecosystem, but a bit hackish.
We can have a much better stack.
Seems politeness goes a long way when you're facing federal charges
Douglas: The first time I saw JavaScript when it was first announced in 1995, I thought it was the stupidest thing I’d ever seen. And partly why I thought that was because they were lying about what it was.
A bigger more interesting thing though is how his company failed, in part, because they used hand-rolled JSON for messaging. Douglas: And some of our customers were confused and said, “Well, where’s the enormous tool stack that you need in order to manage all of that?”
“There isn’t one, because it’s not necessary”, and they just could not understand that. They assumed there wasn’t one because we hadn’t gotten around to writing it. They couldn’t accept that it wasn’t necessary.
Adam: It’s like you had an electric car and they were like, “Well, where do we put the gas in?”
Douglas: It was very much like that, very much like that. There were some people who said, “Oh, we just committed to XML, sorry, we can’t do anything that isn’t XML.”
I started my career during peak XML crazy and while I liked parts of it at the time, the number of things it was used for was quite insane. I had to maintain a system once where a major part of it was XSLT, when it could have just been a simple imperative algo with some config settings.
Anyhow, hope you like the episode!
Every time the topic comes up I feel the need to say that I loved XSLT. It was so nice. XML frankly was kind of simple, too. It had elements and attributes and that was it. And it had xpath, which offered, among other things, a parent axis, so you could walk the node tree upwards.
In JSON you can't get to the parent from the child. And walking down a tree is unintuitive, because nodes can be of different types, and if you want to maintain the order, or use successive instances of the same things (that would have the same name) you need to use arrays, and arrays of arrays of arrays look bad. Schemas are an afterthought.
JavaScript is cool -- it has mostly eaten the world anyway. But JSON is not so good IMHO.
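A rough illustration of the parent-axis point, using only Python's standard library (the sample document and the helper name walk_with_parent are assumptions for this sketch). ElementTree's limited XPath understands "..", whereas parsed JSON has no back-pointers, so the parent has to be tracked by hand during the walk.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<meal type="breakfast"><eggs count="3"><topping>cheese</topping></eggs></meal>'
)

# XML side: ".." selects the parent of every <topping> element.
parents = [el.tag for el in doc.findall(".//topping/..")]


def walk_with_parent(node, parent=None, key=None):
    """Yield (parent, key, value) for every value in parsed JSON data."""
    yield parent, key, node
    if isinstance(node, dict):
        for k, v in node.items():
            yield from walk_with_parent(v, node, k)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from walk_with_parent(v, node, i)


data = {"meal": {"eggs": {"count": 3, "topping": "cheese"}}}

# JSON side: the walker has to carry the parent along explicitly.
topping_parent = next(p for p, k, v in walk_with_parent(data) if k == "topping")
```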
XSLT was (and still is) great for transforming documents. Want that recipe collection as HTML? Easy.
You can still do XSLT in the browser. You can serve arbitrary XML and transform it. As an example, Atom feeds on my website (such as <https://chrismorgan.info/blog/tags/meta/feed.xml>) render just fine in all mainstream browsers, thanks to this processing instruction at the start of the file:
<?xml-stylesheet type="text/xsl" href="/atom.xsl"?>
But working with it is not particularly fun, because XML support in browsers has been only minimally maintained for the last twenty or so years. Error handling is atrocious (e.g. largely not giving you any stack trace or equivalent, or emitting errors only to stdout), documentation is lousy, some features you’d have expected from what the specs say are simply unsupported (and not consistently across engines), and there are behavioural bugs all over the place, e.g. in Firefox loading any of my feeds that also fetch resources from other origins will occasionally just hang, and you’ll have to reload the page to get it to render; and if you reload the page, you’ll have to close and reopen the dev tools for them to continue working.
JSON only competes with XML. XSLT, XPath, and XSD are just as much an afterthought in that they are completely separate from XML and are entirely optional. The engines written around those are where the powers to walk the tree and validate come from, not XML itself. There's a wide range of tools to get the same benefits for JSON sources, and they usually handle XML and other data sources too, because it shouldn't matter. The reason the X* tools have fallen out of favor is that they're unnecessarily tied to a single type of source data.
Same here. XML was going to save the world! Remember XML data islands with data embedded in page source and displayed via XSLT?
The craziest thing I had to build was a tool to manage the dozens to hundreds of XML configuration files that powered our product. The tool allowed editing and deploying the files, complete with validation and even input suggestion based on associated XSD for each XML file.
I was sad to hear that Crockford is not aiming to be the author of "the next language" anymore, but I wonder how sincere that really is. His thoughts on actor-based languages are interesting.
Crockford's thoughts on actors are really interesting. I tried to pull them apart but I didn't get very far and ended up not including them in the episode.
What he is envisioning is not exactly like Erlang but not exactly like Scheme. He said that Carl Hewitt had a lot of ideas and they were hard to unpack.
If you're interested though, I would reach out to him. He is very approachable and excited to talk to people with ideas for new ways of making things simple.
More innocent times.
Apparently Philip Wadler was the person who told them they needed it, because the future was XML.
(Wadler is a big Haskell/PL person)
> Like with the original J2EE spec, which sought to complicate the basic mechanics of connecting databases via HTML to the internet, this new avalanche of specifications under the WS-* umbrella sought to complicate the basic mechanics of making applications talk to each other over the internet. With such riveting names as WS-SecurityPolicy, WS-Trust, WS-Federation, WS-SecureConversation, and on and on ad nauseam, this monstrosity of complexity mushroomed into a cloud of impenetrable specifications in no time. All seemingly written by and for the same holders of those advanced degrees in enterprisey gibberish.
https://world.hey.com/dhh/they-re-rebuilding-the-death-star-...
It sounds a bit like someone paved a garden path for you by that point. One of the reasons for the "enormous tool stack" wasn't just depth of tools needed ("tool X feeds tool Y which needs tool Z to process namespace A, but tool B to process namespace C, …"), but also the breadth. I recall there were at least six types of parsers to choose from with all sorts of trade-offs in memory utilization, speed, programming API: a complicated spectrum from forward-only parsers that read a node at a time very quickly but had the memory of a goldfish through to HTML DOM-like parsers that would slowly read an entire XML document all at once and take up a huge amount of memory for their XML DOM but you could query through the DOM beautifully and succinctly. (ETA: Plus or minus if you needed XSD validation at parsing time, and if you wanted the type hints from XSD to build type-safe DOMs, etc.)
A lot of XML history was standards proliferation in the xkcd 927 way: https://xkcd.com/927/
XPath tried to unify a lot of mini-DSLs defined for different DOM-style XML parsers.
XSLT tried to unify a bunch of XML transformation/ETL DSLs.
The things XPath and XSLT were designed to replace lingered for a while after those standards were accepted.
Eventually quite a few garden paths were paved from best practices and accepted "best recommended" standards, and greenfield projects started to look easy, with a small number of well-coordinated tools. But do enough legacy Enterprise work and you can find all sorts of wild, brownfield gardens full of multiple competing XML parsers using all sorts of slightly different navigation and transformation tools.
However I think by now we've seen that a lot of that "unnecessary" XML complexity was not, in fact, entirely unnecessary. These days we use JSON for everything, but now we've got JSON Schema, Swagger/OpenAPI, Zod, etc etc. It's not really simpler and there's a lot of manual work - we might as well be using XML, XSD & SOAP/WSDL.
> The good thing about reinventing the wheel is that you can get a round one.
I mean... with charity I can see the context and get it. But. What!?
Overall fun read through history, even if definitely from Doug's perspective only. (As evidenced by JavaScript being an originator of lambdas...) I do find the idea that JSON was as novel as history says it was kind of odd. I remember inlining javascript objects years before "JSON" was a thing. Making it a subset of what javascript could already do seems straightforward and a good execution. Getting rid of comments feels asinine to me. (I'll also note that the plethora of behaviors you get from JSON parsers shows that it is effectively CSV. Sure, there may be a "standard" out there, but by and large it is a duck typed one.)
I'm also a bit in the camp that XML is better than JSON. Being able to have better datatypes, for a start. Schemas that allow autocompletion. It is also easier to see as a markup language (per the name). That said, they clearly went too far with entities and, despite making sense for markup, attributes versus children are more than a touch awkward.
I also recall that what killed XML and WSDL files in general, was the complete shit show that was getting a single document to work with both MS and non-MS clients.
>The best thing we can do today to JavaScript is to retire it. Twenty years ago, I was one of the few advocates for JavaScript. Its cobbling together of nested functions and dynamic objects was brilliant. I spent a decade trying to correct its flaws. I had a minor success with ES5. But since then, there has been strong interest in further bloating the language instead of making it better. So JavaScript, like the other dinosaur languages, has become a barrier to progress. We should be focused on the next language, which should look more like E than like JavaScript.
- https://evrone.com/douglas-crockford-interview
One of the traits that makes Douglas great is being willing to say the obvious even if it is politically unpopular.
E had some really cool ideas, it's sad that it doesn't seem to be that well known!
1. You've got to keep JS around for backwards compatibility for the billions of websites already using it.
2. You will need two engine teams, one to maintain JS and one for the new language.
3. Now you have a whole new vector for security issues. You've made the threat surface much broader. So, you will probably need to hire additional people.
4. You need to coordinate with all the other browser makers so everyone rolls out their new engines more or less concurrently. Other than experiments, nobody is going to start using it unless it works on all the major browsers and platforms.
If we went to a scheme dialect as originally intended, we could have just ONE language for all the things.
Legacy JS? Just compile it into Scheme and run it.
HTML? Use S-expressions and support legacy HTML syntax by compiling it into them. Now you get all the power people want from template languages, but baked right into the main language itself.
CSS? No more weirdness like adding sin() or calc() to make up for shortcomings. Once again, you get the power of the full Scheme language right there.
What makes XML so unergonomic to ingest is 1) attributes, which don't map cleanly to a basic data structure that you might find in a programming language, and 2) namespaces, which are extremely, extremely tedious to program against.
Programmers are going to use the format that's the easiest to ingest and manipulate. JSON wins in that regard, hands down. Every time I need to write logic to ingest a namespaced XML document I heave a deep sigh and brace myself for another long week of fighting with LXML. But with JSON it's as easy as `json_decode($str)` and move on with your life.
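For a feel of the difference, here is a small sketch using Python's standard library rather than PHP or LXML (the two sample documents and the "atom" prefix mapping are assumptions made for this example). The JSON side is one call; the XML side needs the namespace spelled out on every lookup.

```python
import json
import xml.etree.ElementTree as ET

json_text = '{"feed": {"title": "Example"}}'
xml_text = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<title>Example</title>"
    "</feed>"
)

# JSON: one call, then plain dict access.
json_title = json.loads(json_text)["feed"]["title"]

# XML: the un-prefixed name finds nothing, because the element's real name
# includes the namespace URI.
root = ET.fromstring(xml_text)
naive = root.find("title")

# Every lookup has to carry a prefix-to-URI mapping.
ns = {"atom": "http://www.w3.org/2005/Atom"}
xml_title = root.find("atom:title", ns).text
```

With one namespace this is only a nuisance; documents that mix several are where the tedium described above really shows.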
Abandoning XML was the web's biggest mistake.
Very unfortunately for everyone, XML came up at the same time as peak "Enterprise" moat building. No design pattern went unused, and everything was built with mind-numbing "configuration". XML got used heavily in that space because it allowed massive "Enterprise Objects" (local branding varies) to be serialized in a way another system might have a chance to read.
Meanwhile the features you mention got thrown out with the bath water because everyone hated Enterprise style architectures. While I don't love, for instance, everything about XSLT it's built directly into browsers as native code. How many person hours, megabytes of JavaScript, and wasted CPU cycles have been spent reinventing client side templating using JSON? XSLT is already right there and will happily convert serialized data to your presentation format. You also get the ability to have comments in the data and a built in schema validation.
On my current project I'd much rather be emitting and consuming XML than JSON. But alas, everyone hated Enterprise XML, so we're stuck with JSON and the inability of some parsers to handle trailing commas, and ambiguous definitions of numerics, and not a comment to be found.
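Those three complaints are easy to reproduce with Python's json module (a sketch; the sample strings are made up, and parse_int=float is used here only to imitate a parser that stores every number as a double).

```python
import json


def accepts(text):
    """Return True if Python's json module parses the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Strict parsers reject both trailing commas and comments.
trailing_comma_ok = accepts('{"a": [1, 2,]}')
comment_ok = accepts('{"a": 1} // note')

# Numbers: the grammar allows any number of digits, but parsers disagree on storage.
big = "9007199254740993"  # 2**53 + 1
as_int = json.loads(big)                      # Python keeps an exact integer
as_double = json.loads(big, parse_int=float)  # a double-based parser rounds it
```

Here as_int and as_double come from the same text and no longer agree, which is the kind of ambiguity the comment is pointing at.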
Have we though? Earlier, the article even has Douglas saying:
> It turns out it, well, it’s a multi paradigm language, but the important paradigm that it had was functional. We still haven’t, as an industry, caught up to functional programming yet. We’re slowly approaching it, but there is a lot of value there that we haven’t picked up yet.
I do love the very ending:
Adam: What do you think is the XML of today?
Douglas: I don’t know. It’s probably the JavaScript frameworks.
They have gotten so big and so weird. People seem to love them. I don’t understand why.
For a long time I was a big advocate of using some kind of JavaScript library, because the browsers were so unreliable, and the web interfaces were so incompetent, and make someone else do that work for you. But since then, the browsers have actually gotten pretty good. The web standards thing have finally worked, and the web API is stable pretty much. Some of it’s still pretty stupid, but it works and it’s reliable.
And so, when I’m writing interactive stuff in browsers now, I’m just using plain old JavaScript. I’m not using any kind of library, and it’s working for me.
And I think it could work for everybody.
------
Earlier in the interview where they were talking about how people behind XML and SOAP wanted complexity and were upset by the simplicity of JSON, I was thinking that this was resonating with me and how I feel about how complex web development has become with babel/webpack, transpiling, react/vue, etc. It feels like complexity for complexity's sake.
If only this were true.
https://medium.com/r3d-buck3t/insecure-deserialization-with-...
On the other hand, XML External Entity is part of the XML standard, so any standard-compliant XML implementation has to support it. This is why the XXE attack applies to many languages.
JSON is simpler and easier for many cases, but then you lose the interoperability. Go try to make an app right now dealing with Federal government systems or finance, you're going to end up translating JSON<->XML which isn't fun.
There's not going to be a silver bullet solution to this problem, it's not completely solvable.
Not fun? It's not even possible in the general sense.
If you have XML that looks like:
<meal type="breakfast">
  <eggs count="3">
    <topping>cheese</topping>
  </eggs>
</meal>
How would you convert that to JSON without knowing how the JSON consuming application expects it to be formatted? Where do you put the "breakfast" and "count" attributes?
You'd need to manually write a translator for each potential translation.
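A sketch of why there is no single answer: two equally defensible conventions, written with Python's standard library, give different output for the same XML. The function names and the "@"/"#text" convention are assumptions chosen for illustration, and both helpers ignore repeated child elements to stay short.

```python
import xml.etree.ElementTree as ET

meal = ET.fromstring(
    '<meal type="breakfast"><eggs count="3"><topping>cheese</topping></eggs></meal>'
)


def attrs_prefixed(el):
    """Convention A: attributes become '@name' keys, mixed text becomes '#text'."""
    out = {"@" + k: v for k, v in el.attrib.items()}
    for child in el:
        out[child.tag] = attrs_prefixed(child)
    text = (el.text or "").strip()
    if text:
        if not out:
            return text
        out["#text"] = text
    return out


def attrs_merged(el):
    """Convention B: attributes and children share one flat set of keys."""
    out = dict(el.attrib)
    for child in el:
        out[child.tag] = attrs_merged(child)
    text = (el.text or "").strip()
    # Lossy on purpose: text is kept only for leaf elements without attributes.
    return out if out else text


a = {meal.tag: attrs_prefixed(meal)}
b = {meal.tag: attrs_merged(meal)}
```

Neither result is wrong, but a consumer written against one will not read the other, so the mapping has to be agreed per integration.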
Yep, therein lies the “not fun”. You write a bunch of super complex, brittle code.
Unfortunately because XML is entrenched in certain domains, you have to decide between writing these converters or doing everything in XML which also sucks, especially if you’re trying to write a modern app with a modern stack.
I'm leaving it here because it will never be used for anything, but at least it may inspire somebody to design a better format with simplicity in mind.
Other problems to ponder: Is 0 different from 00? Is "1, 2, 3, 4" different from "1,2,3,4"? Is "a: b" different from "a : b" and "a:b"?
It's like the man never tried. Try a Java enabled browser: https://www.wikihow.com/Enable-Java-in-Firefox
Just as a reminder Minecraft (the most sold game in history) started out as an Applet.
Applets were not horrible because of the underlying technology, they were horrible because people made bad things with it, just like J2EE was a bad thing people made with J2SE.
But sometimes, rarely, people would make beautiful things with J2SE and J2ME and those are now removed from history forever under the banner of security like everything else that is good in life.
> Douglas: For me, the most difficult thing was raising money. You’re constantly going to Sandhill and calling on people who don’t understand what you’re doing, and are looking to take advantage of you if you can, and they’re going to do that, but you have to go on your knees anyway.
> I found that stuff to be really hard, although some of them I really liked. And sometimes I’d be sitting in those meetings and I’d be thinking, “I wish I was rich enough to sit on the other side of the table, because what they’re doing right now looks like a lot more fun than what I’m doing right now.” And it was even more difficult raising money then, because at this point, the .com bubble had popped and all VCs had been hurt really badly by that. So they were only funding sure things at that time, in late 2001, early 2002.
> And I thought we were a fairly sure thing, because we had already implemented our technology. And by this point, Chip and I understood the problem really well. And we had a new server and JavaScript libraries done in just a few months. And we had demonstrations. We could show the actual stuff. So it wasn’t like we were raising money so that we could do a thing. We had already done the thing, we needed the money so that we could roll it out. And that wasn’t enough for them. They wanted to see that we were already successfully selling it. And I was like, “If we could do that, we wouldn’t need you.”
Only they hadn't. They had built a demo of what we would later call a web 2.0 app. It wasn't even an application that solved a business problem or did anything specific. It was just showing the concept. That's not a product and that's not a business. The VC's point was: Show us proof that this idea has tangible benefits people will pay for.
The biggest misconception of VC's is that you raise money to "successfully sell" something you've built. You don't. You raise VC money to scale something that has value. So you need to communicate the business value, and ideally have proof-points (either in the form of sales, or data) that prove the value.
Of course Douglas found raising money difficult. But he doesn't seem to have the self awareness that this was probably due to him, and not the rich suits on the other side of the table.
1. Parsing JSON doesn't require adding new firewall rules
2. There are no comments, so nobody will try to invent their own meta format or annotations in comments and instead they will put data in the JSON as they should
3. (When compared to JS) someone finally had the balls and picked one type of quotes, which makes writing a parser so much simpler.
XML supports comments and I have not seen a single use of comment directives in it ever.
I have seen plenty of comment directives in programming languages, HDLs and so on. But they are usually used as hints, e.g. to linters or to control compiler warnings, and they work perfectly well and cause no problems at all in my experience.
You might say that Crockford didn't anticipate JSON being used for config files. Fair enough. But now that it is, it should support comments.
My recommendation is to use JSON5 since it has a distinct file extension and fixes some other things about JSON too (e.g. trailing commas, hex constants) without being full on YAML insane.
Even C# just punts on this issue and won't emit valid XML if a string you serialize happens to have a null character in it.
A human won't be able to read it (Unless you're crazy and have learned to read Base64), but the application still can easily. You'll just have to add a Base64 translation step before/after serialization/deserialization.
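The extra translation step is small. A minimal sketch in Python (the element name "value" and the "encoding" attribute are made up for this example, not part of any standard):

```python
import base64
import xml.etree.ElementTree as ET

payload = "before\x00after"  # a string with an embedded null character

# Before serialization: Base64-encode so the XML only ever carries plain ASCII.
el = ET.Element("value", {"encoding": "base64"})
el.text = base64.b64encode(payload.encode("utf-8")).decode("ascii")
wire = ET.tostring(el, encoding="unicode")

# After deserialization: decode to get the original string back.
parsed = ET.fromstring(wire)
restored = base64.b64decode(parsed.text).decode("utf-8")
```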
The other two premier XML use cases I can think of are
1. RSS: Last time I did this, ironically I built the payload with a JSON-API'd lib that deals with the XML drama for me. Worked fine.
2. Configs. Rarely are these done in XML anymore. Human readability matters for configs. But there are also better options than JSON for this.
Then I had to live through the whole SOAP-drama, and Java EE; and ended up promising myself to never touch it again.
It has too many degrees of freedom for its own good, the C++ of data formats.
JSON is in many ways the other end of the spectrum; simple but underspecified and painful to deal with in anything but JS.
I often dream of something in-between.
- This message brought to you by TOML gang
I’ll take edn over any of ’em. https://github.com/edn-format/edn
Comments and time stamps allowed, arbitrary nesting of data structures, make your own tagged literals if you need them. And commas are whitespace, mostly unnecessary.
Come join the dark side where we enjoy the wonders of binary formats such as avro and protobuf.
Though for something where you want human readability it's hard to beat TOML in my opinion.
Somebody should add a json entry to "the ascent of ward" [0]. Of course, it will be longer than all the previous versions combined, and the fields will appear in random order because dictionary.
Choose the right tool for the job at hand. Sometimes json is the right choice, sometimes xml is. Not everything is a webapp.
Are you saying you think JSON shouldn't exist and everyone should use XML for everything?
Tooling around XML was certainly more established, but man there was a lot of complexity built up around it.
I use both extensively, and for bigger objects and definitions, XML is a very clear winner.
I'm a big believer in a horses-for-courses type of approach, and my personal gripe is the push to replace one thing with another. These data types can coexist, and can be used where they shine. XML can be read and written stupidly fast, so it's way better as an on-disk file format if people are going to touch that file.
YAML and JSON are not the best fit for configuration files. JSON is good as an on-disk serialization format if humans are not going to touch it. XML is the best format for carrying complex and big data around. TOML is the best format for human-readable, human-editable config files.
Even if XHTML died on the wayside, HTML is imho a stereotypical example where XML is a good fit. Most of the complexity has valid use cases, and it's mostly obvious what should be an attribute and what should be content of the tag. And at least in HTML 4 you even had a doctype tag filling the role of specifying the schema used. Of course SVG is a better showcase for some other aspects of XML, with every editor putting their own metadata in, nicely partitioned into separate namespaces.
Indeed, this was what XML was created for. From W3C's XML specification:
> The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.
Honestly, what's absurd is GP comment's cluelessness.
It wasn't until about a decade later when I finally got to use XML "for real". At my academic publishing job. One of my first real projects was having a set of academics analyze documents in a web application I built. Prior to that they were analyzing them by hand, were converted to SGML somewhere in Korea, and we would use omnimark to move them to XML and eventually a library application.
The XML community, the ones who haven't retired or passed on, have been more welcoming of the competition too. They went from XML is everywhere, to being able to return JSON from an XSLT. I am in a small shop, and so I wear many hats. But I am always satisfied when I get to work with XML, or craft an xsl/xq script that does exactly what I need. Additionally, the community as a whole is very helpful, and a bit more grey. Meaning, they are less likely to fall for trends and bullshit.
A bit disjointed, but, in short, XML is awesome. Now if only they would move Balisage back to Montreal. I'm no fan of DC or virtual conferences.
Such a document is essentially as simple as the equivalent JSON.
As a human in a repl, I appreciate the balance of readability between XML, which uses a larger set of syntactical characters, and YAML, which uses fewer.
I also appreciate JSON's ontological simplicity over XML. This primarily boils down to the lack of attribute nodes and explicit difference between objects (lists of key-values) and arrays (lists of values).
Very well put. And we could lower the baseline substantially towards simplicity, even from JSON.
It's pretty clear that a lot of people think this way. Some even seriously try to figure out what such a baseline of simplicity would look like.
There are lots of simple indentation-based designs (similar to YAML) such as NestedText[0], Tree Notation[1], StrictYAML[2], or even @Kuyawa's Dixy[3] linked in this thread.
There seem to be fewer new ideas based around nested brackets, the way S-expressions are. Over the years, I have developed a few in this space, most notably Jevko[4]. If there ever will be another lowering of the simplicity baseline, I believe something like Jevko is the most sensible next step.
[0] https://nestedtext.org/en/stable/ [1] https://treenotation.org/ [2] https://hitchdev.com/strictyaml/ [3] https://news.ycombinator.com/item?id=35469643 [4] https://jevko.org/
All the optional complexity that can go on top, though, is probably better specified for XML. Transformation is well defined for XML (XSLT) but not at all for JSON (I guess, you write your own code to manipulate native objects).
Schemas are basically a native feature for XML. Not so much for JSON.
All sorts of specialised vocabularies are defined for XML. A few are defined for JSON, too.
JSON is so much more ergonomic than XML as the lingua franca because I can actually read it. That being said I still have my share of problems with JSON.
Me? Schemas are a requirement in areas where you need to integrate over different technology / with different implementations. JSON Schema is in those contexts a bit of a kids toy compared to what XML can do.
First, it really depends what you're deserializing with. There is a lot of code out there that just does JSON.parse and then starts accessing the data and then you have an "undefined" get passed deep into the call stack where maybe it explodes or maybe the program just misbehaves. So if you're using a language like JavaScript or Python, then a JSON schema can be used to validate input right away. Think of it like enforcing a pre-condition.
It's also useful in cases where JSON is being used for configuration files. At my company we have quite a few places where JSON files checked-in to a git repo are our source-of-truth which then get POST'ed to an API. We can enforce the schema of those files using pre-commit hooks so no one even wastes time opening a PR that will fail to POST to the API. The same JSON schema is also used by the API to ensure the POST'ed data is correct.
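The "enforce a pre-condition" idea can be sketched without any schema library. The following is a hand-rolled stand-in, not JSON Schema itself; the field table ORDER_FIELDS and the function load_order are hypothetical names for this example.

```python
import json

# Hypothetical expectations for an "order" message: field name -> required type.
ORDER_FIELDS = {"id": int, "customer": str, "part_numbers": list}


def load_order(text):
    """Parse and check an order up front, so bad input fails here
    instead of surfacing as a missing value deep in the call stack."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("order must be a JSON object")
    for name, expected in ORDER_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name} should be {expected.__name__}")
    return data


good = load_order('{"id": 7, "customer": "acme", "part_numbers": [1, 2, 3]}')

try:
    load_order('{"id": 7, "customer": "acme"}')
    error = None
except ValueError as exc:
    error = str(exc)
```

A real JSON Schema validator does the same job declaratively, which is what lets the pre-commit hook and the API share one definition.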
It is simpler than XML/XSD. Without the schema, you never know if a certain element should be treated as being part of a list or not. When interoperating with anything other than XML, that matters.
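The list ambiguity shows up as soon as you convert without a schema. A sketch with Python's standard library (naive_to_dict is a made-up helper, deliberately simplistic):

```python
import xml.etree.ElementTree as ET


def naive_to_dict(el):
    """Schema-less conversion: repeated child tags become a list, single ones do not."""
    if len(el) == 0:
        return el.text
    out = {}
    for child in el:
        value = naive_to_dict(child)
        if child.tag in out:
            # Second occurrence of the same tag: promote the entry to a list.
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out


many = naive_to_dict(ET.fromstring(
    "<order><part_number>1</part_number><part_number>2</part_number></order>"
))
one = naive_to_dict(ET.fromstring(
    "<order><part_number>1</part_number></order>"
))
```

The same field comes out as a list in one case and as a bare string in the other, so downstream code breaks on orders with a single part unless a schema says "this is always a list".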
I can remember hardcoding and manipulating a bunch of non-sense legacy fields just to get a ticket created via their SOAP enterprise service bus. Not to mention all the operations that made no clear sense.
Consuming SOAP/WSDL from languages other than the one it's published in isn't fun. Man, some of the PHP implementations were beyond horrible... well-defined REST/RPC + JSON is generally much easier in the end.
I disagree. I think personal hygiene is very important for in-office coworking.
Well, I'm about to take a shower now, and shame on you.
- generic concepts like arrays and maps
- lack of opportunity to invent names
Every XML schema is a potential DSL that reinvents things they might know.
Other than that, it's true that the XML era was just addressing a lot of important stuff early. I guess it was only compatible with a big-corp mindset and not early web dynamic / fluid / small-scale apps (a bit like how PHP started to write PSRs to avoid dynamic code / effects in libs: formalization etc.).
For this JSON:
{
  "part_numbers": [1, 2, 3, 4, 5]
}
You have two main ways to represent these in XML:
<!-- repetition = array -->
<order>
  <part_number>1</part_number>
  <part_number>2</part_number>
  <part_number>3</part_number>
  <part_number>4</part_number>
  <part_number>5</part_number>
</order>
<!-- wrapped repetition -->
<order>
  <part_numbers>
    <part_number>1</part_number>
    <part_number>2</part_number>
    <part_number>3</part_number>
    <part_number>4</part_number>
    <part_number>5</part_number>
  </part_numbers>
</order>
Is this better than JSON? No, not particularly. But it’s no less clear than the JSON, and it compresses pretty well (it compresses better for larger documents, obviously).
The larger problem with XML is that the tooling is often lacking outside of Java and C#/.NET and none of the tooling is well-built for the sort of streaming manipulation that `jq` does (it exists, but IMO one of the least usable ideas from the XML camp is XSLT), and JSON support is pretty universal everywhere, even if the advanced things like JSONPath and JSON Schema aren’t.
I also think that there’s a problem when you have to choose between SAX and DOM parsing early in your process. Most JSON usage is the equivalent of using a DOM parser because the objects are expected to be relatively small, but many XML systems are built for much larger documents, and therefore need to parse the stream because the memory use otherwise would be unacceptable. The use of a JSON streaming parser is much rarer, IME.
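The two styles look like this in Python's standard library. This is a sketch: iterparse is a pull-style streaming API rather than classic SAX callbacks, and the generated document is just sample data.

```python
import io
import xml.etree.ElementTree as ET

xml_text = "<order>" + "".join(
    f"<part_number>{n}</part_number>" for n in range(1, 6)
) + "</order>"

# DOM style: build the whole tree in memory, then query it.
tree = ET.fromstring(xml_text)
dom_total = sum(int(el.text) for el in tree.iter("part_number"))

# Streaming style: handle each element as its end tag arrives.
stream_total = 0
source = io.BytesIO(xml_text.encode("utf-8"))
for event, el in ET.iterparse(source, events=("end",)):
    if el.tag == "part_number":
        stream_total += int(el.text)
        el.clear()  # drop the element's text and children once handled
```

For five elements the difference is invisible; for a multi-gigabyte export the first version has to hold the whole tree before the sum even starts.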
The hate I have for XML is the high markup overhead. Anybody who has configured a turn-of-the-century product with XML config files knows what I mean; the screen is usually 2/3 XML tags, which means 1/3 closing tags, which add nothing semantically.
Uh... do we? I've never used any of those. Plain JSON has always worked fine for me.
You don't have to use any of those.
And I don't make any real defense of some of the darker corners of XML. In particular, I already criticized entities being a bit too much. Namespaces are also something that, while I can see the desire, the implementation is way too much for most of us.
JSON schema is going to be cursed for a long time. Just the odd treatment of it will be a problem. (In particular, that it is a subset of the numbers that javascript itself supports is... awkward.)
I also confess, though; that I'm not clear why I would want a null in the middle of a string? That feels like a gun loaded and aimed squarely at a foot.
It also means it's worse format for configs where you sometimes need to annotate a few nodes with comments.
"comment": gets littered across the JSON... or temporary changes are copied and the original property name is invalidated with a prefix. The simple structure is gone, replaced with ad hoc workarounds.
Similarly when you want to use a type not supported by JSON such as datetime or binary data, you might end up with "type":"binary" and use base64 or whatever in the value (shoehorning attribs) - when it really needs a schema to follow during parse and stringify. Or OpenAPI, which is hardly lightweight and really doesn't match the simplicity of JSON.
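To make the "type":"binary" shoehorning above concrete, here is a minimal sketch using the `default` and `object_hook` hooks of Python's standard `json` module. The tag names (`"type"`, `"value"`) and the helper names `encode_extra`/`decode_extra` are assumptions for illustration, not any agreed convention.

```python
import base64
import json
from datetime import datetime, timezone

def encode_extra(obj):
    """json.dumps `default` hook: wrap unsupported types in a tagged object."""
    if isinstance(obj, (bytes, bytearray)):
        return {"type": "binary", "value": base64.b64encode(obj).decode("ascii")}
    if isinstance(obj, datetime):
        return {"type": "datetime", "value": obj.isoformat()}
    raise TypeError(f"cannot serialize {type(obj).__name__}")

def decode_extra(d):
    """json.loads `object_hook`: unwrap the tagged objects again."""
    if d.get("type") == "binary" and "value" in d:
        return base64.b64decode(d["value"])
    if d.get("type") == "datetime" and "value" in d:
        return datetime.fromisoformat(d["value"])
    return d

original = {
    "blob": b"\x00\x01\xff",
    "when": datetime(2023, 4, 6, 12, 0, tzinfo=timezone.utc),
}
text = json.dumps(original, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)
print(text)
print(restored == original)
```

The weakness is visible right away: any ordinary object that happens to have a "type" key of "binary" gets misread, which is the kind of ambiguity a schema would remove.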
Local schemas, not crazy remote schemas.
Or some sort of way to bless an "official" schema format.
apiVersion = "v1"
current-context = ""
kind = "Config"
[[clusters]]
name = "my-cluster"
[clusters.cluster]
certificate-authority-data = "LS0tL..."
server = "https://example.com"
[[contexts]]
name = "context0"
[contexts.context]
cluster = "my-cluster"
user = "my-user"
[[contexts]]
name = "context1"
[contexts.context]
cluster = "my-cluster"
user = "my-user"
[[users]]
name = "my-user"
[users.user]
[users.user.exec]
apiVersion = "client.authentication.k8s.io/v1beta1"
args = ["eks", "get-token"]
command = "aws"
In 1996 I was at some of the initial XML meetings.
The participants' anger at HTML for "corrupting"
content with layout was intense. Some of the initial
backers of XML were frustrated SGML folks who wanted
a better cleaner world in which data was pristinely
separated from presentation. In short, they disliked
one of the great success stories of software history,
one that succeeded because of its limitations, not
despite them. I very much doubt that an HTML that had
initially shipped as a clean layered set of content
(XML, Layout rules – XSLT, and Formatting- CSS) would
have had anything like the explosive uptake.
https://adambosworth.net/2004/11/18/iscoc04-talk/
Also, it's way better in transferring/storing big, complex intricate data like 3D objects.
Curious how come?
Not so with XML: all the parsers were insanely complex, with support for namespacing and whatnot, possible external URLs, and everything else... and yet no XML library was ever adequate to interface with anything. On multiple occasions the best way to build XML for something was to take a working copy and glue text together so you would exactly replicate whatever that specific application wanted, rather than trying to use anyone's library for it.
I think it's not so much about readability but about complexity. XML is meant to represent complex data, like complex rich text or nested vector graphics. That makes XML complex, conceptually, visually, and in implementation. If you use it to represent something that could have been a csv you're going to have a bad time (as everyone had in the 90s).
Even perfunctory probing shows fairly serious problems in Firefox (where Chromium is consistently much better, in this specific area). I could file quite a few bugs in short order (e.g. these mentioned, bad document.contentType values, <template> not working properly), but I don’t think there’s any interest in fixing things.
(I wrote this comment as much for my own future reference as anything else. XML/HTML polyglot stuff makes things decidedly messy at times.)
[0]: https://dev-docs.kicad.org/en/file-formats/sexpr-intro/
ie, the benefits of simplicity have a limit.
Writing a conformant XML parser is a HUGE undertaking by comparison.
I could get most places to give me the time to write a JSON parser in whatever language if it didn’t have one. I couldn’t do that with XML.
Because of this, every common language (and most uncommon ones) has a JSON parser while XML parsers are less common (and fully conformant ones are even more rare).
In reality people insert those meta-processing instructions in other ways.
But you still should have the option to at least ignore them while reading. That would make JSON config files so much better to work with.
At least use a native toml file as an example.
Also if I was handwriting that I would probably make more use of dotted property names implying dictionaries like so, which, though it has a little bit more repetition in property names, seems easier to read:
apiVersion = "v1"
current-context = ""
kind = "Config"
[[clusters]]
name = "my-cluster"
cluster.certificate-authority-data = "LS0tL..."
cluster.server = "https://example.com"
[[contexts]]
name = "context0"
context.cluster = "my-cluster"
context.user = "my-user"
[[contexts]]
name = "context1"
context.cluster = "my-cluster"
context.user = "my-user"
[[users]]
name = "my-user"
user.exec.apiVersion = "client.authentication.k8s.io/v1beta1"
user.exec.args = ["eks", "get-token"]
user.exec.command = "aws"
If k8s was designed with TOML in mind, it probably would have been structured differently, such that "Contexts" for example might be just a dictionary mapping names to an object that has the values from the "context" property. (The existing pattern of an array of objects where each object has a name, but stores most of its properties in a property whose name matches the object's type, is already weird, but doesn't look terrible in yaml.)
Such a schema, redesigned to be more TOML friendly, would then look like this:
apiVersion = "v1"
current-context = ""
kind = "Config"
[clusters.my-cluster]
certificate-authority-data = "LS0tL..."
server = "https://example.com"
[contexts.context0]
cluster = "my-cluster"
user = "my-user"
[contexts.context1]
cluster = "my-cluster"
user = "my-user"
[users.my-user.exec]
apiVersion = "client.authentication.k8s.io/v1beta1"
args = ["eks", "get-token"]
command = "aws"
I disagree, this example is just sloppy programming. Passing unvalidated data deep into a program is bad, I'm not arguing for that. What I'm saying is that you should be converting your unvalidated serialized data into a structured type right on the edge. Your data type/type system should __be__ your schema/validator.
> So if you're using a language like JavaScript or Python, then a JSON schema can be used to validate input right away. Think of it like enforcing a pre-condition.
This is what I do with python+pydantic:
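import json
from pydantic.dataclasses import dataclass
@dataclass
class Foo:
    bar: int
# json_buff holds the raw JSON text, e.g. '{"bar": 1}'
foo = Foo(**json.loads(json_buff))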
I'm not the biggest fan of pydantic here because you'll have to handle an exception for invalid data instead of an Option or Result in a better type system. But w/e.
> It's also useful in cases where JSON is being used for configuration files. At my company we have quite a few places where JSON files checked-in to a git repo are our source-of-truth which then get POST'ed to an API. We can enforce the schema of those files using pre-commit hooks so no one even wastes time opening a PR that will fail to POST to the API. The same JSON schema is also used by the API to ensure the POST'ed data is correct.
You can easily do that with serdes and a type library as well.
---
I guess schemas may be useful for crossing language boundaries, but you're going to need language specific types/objects at some point so why use schemas directly even then? (I think gRPC may have code gen tools for this purpose).
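The "convert at the edge" idea from this comment can be sketched with only the standard library, returning `None` instead of raising, as a stand-in for the Option/Result style mentioned above. `parse_foo` is a hypothetical helper written for this example, and hand-written checks like these are exactly what a library such as pydantic would normally generate.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Foo:
    bar: int

def parse_foo(raw: str) -> Optional[Foo]:
    """Return a Foo if `raw` is a JSON object with only an integer `bar`, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != {"bar"}:
        return None
    bar = data["bar"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(bar, bool) or not isinstance(bar, int):
        return None
    return Foo(bar=bar)

print(parse_foo('{"bar": 1}'))
print(parse_foo('{"bar": "one"}'))
```

Everything past `parse_foo` only ever sees a `Foo`, so the type itself plays the role of the schema.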
At first XML namespacing sounds simple. Each tag and attribute will have an optional uri attached to it, no big deal right?
From reading through the specification one could be forgiven for assuming that the prefixes are just arbitrary mappings that a processor can ignore, or automatically remap to alternate prefixes.
For example, it is true that <abc:a xmlns:abc="https://example.com/xyz" xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a> (notice both namespaces are the same url) is equivalent to: <a xmlns="https://example.com/xyz"><b>5</b></a>.
Unfortunately, the data model also allows for content to reference the namespaces by prefix, and therefore every general xml processor that supports namespaces must keep around an application accessible mapping from the prefixes to namespaces, as the application may need to be able to access that information to interpret attributes or content. The only exception to this would be if the general XML processor insisted on having schema information for every namespace it might come across. In that scenario it would be able to tell if an attribute value of "abc:b" is really a string literal, or a reference to a namespace identifier (QNAME data type), where the namespace is whatever the current "abc" prefix is bound to, and the identifier is "b".
But obviously we don't want to add full schema support for a simple implementation, so we need to keep the mapping information around, just in case the application needs it. We also cannot easily offer nice features like changing a document to use preferred prefixes for certain namespaces, unless we also keep any prefixes that are used in values that could be interpreted as QNAMES, just in case they actually are, but our processor does not know, because it has omitted schema support for simplicity (or perhaps it included schema support, but does not have a schema available for some namespace).
And that is just the complexity that stems from one fairly small quirk in how XML works.
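The prefix problem described above can be seen with Python's standard `xml.etree.ElementTree`, used here only as one example of a namespace-aware parser. The third document, with its `ref` attribute, is made up for the illustration.

```python
import xml.etree.ElementTree as ET

NS = "https://example.com/xyz"

prefixed = ET.fromstring(
    f'<abc:a xmlns:abc="{NS}" xmlns:def="{NS}"><def:b>5</def:b></abc:a>'
)
default = ET.fromstring(f'<a xmlns="{NS}"><b>5</b></a>')

# Element names are resolved to {namespace-uri}local-name, so both match.
print(prefixed.tag, prefixed[0].tag)
print(prefixed.tag == default.tag and prefixed[0].tag == default[0].tag)

# A prefix inside an attribute value is NOT resolved: the parser cannot know
# whether "abc:b" is a QName or an ordinary string without schema knowledge.
doc = ET.fromstring(f'<a xmlns:abc="{NS}" ref="abc:b"/>')
print(doc.get("ref"))
```

The parsed elements here carry no record of what `abc` was bound to, so an application that needs to interpret `abc:b` has to capture the prefix mapping itself during parsing, which is the bookkeeping burden the comment is pointing at.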
You also have no idea if an element's content needs to preserve whitespace or not if you don't know the schema, and don't happen to have an xml:space attribute present. Thus if you want to safely re-indent arbitrary xml for readability you could end up with something like this:
<abc
><def
>5</def
></abc
>
I'm just not buying the argument that XML's complexity is somehow remediated in JSON. JSON becomes as horrible as XML when you bring it up to feature parity. And that's when there's a way to match features. Whatever people say about XSLT, it is powerful, reasonably well defined, and generic over all documents (even though complex). There's nothing like it for JSON I know of.
Around that time it was pretty nice passing around XML, as I was forced to work with VB.Net which also had an XML literal syntax on the backend and Flash/AS3 on the UI.
I had built a POC with E4X that was VERY similar to React/Redux over a decade before React, but the other browser vendors didn't have it... At the time IE and Chrome were shifting towards JSON.
An XML literal in Scala would have been:
// XML literals (to be dropped)
val mails1 = for (from, to, heading, body) <- todoList yield
  <message>
    <from>{from}</from><to>{to}</to>
    <heading>{heading}</heading><body>{body}</body>
  </message>
println(mails1)
This was replaced with "XML string interpolation":
// XML string interpolation
val mails2 = for (from, to, heading, body) <- todoList yield xml"""
  <message>
    <from>${from}</from><to>${to}</to>
    <heading>${heading}</heading><body>${body}</body>
  </message>"""
println(mails2)
The closest thing we have right now I think is Spritely Goblins, though that is Scheme. (Not coincidentally, one of the other Electric Communities co-founders is also a Spritely Institute co-founder: https://spritely.institute/about/)
http://www.crockford.com/misty/
There are a few talks from a few months ago on Youtube that detail some of the rationale behind it.
We’re not using anything else from Prisma, but if we had to implement something else in JS to talk to a database, that would be a contender for our database interface layer (there are only a couple of others that are even remotely usable, having suffered through the disaster of a Sequelize implementation). We’re more likely to use Elixir and Ecto.
I don't know that I can lay the blame on either one of them directly, mind. But the industry definitely suffered from the bad faith cooperation of those companies.
I'm not clear on how JSON as a format has helped interaction. I'm reminded of like efforts to standardize how information is stored on pages. By and large, that ship sailed and sites that have remained somewhat stable have driven how we look for information on them. All without having to add new schema languages or tools.
And again, if this is your line in the sand, how do you serialize NaN and Infinity in JSON?
Edit: Playing with this a bit, I'd actually assume that allowing \0 would be a vulnerability. I was curious how browsers treat it, so I see that parsing to an html document seems to just drop the characters? Fun little rabbit hole to jump in!
{
"someSetting": true,
"comment": "TODO change to false when ready"
}
Though really text-based protobufs are better for config.
<cds>
<cd><title>Led Zeppelin II</title><artist>Led Zeppelin</artist><price>999</price></cd>
<cd><title>La Brise</title><artist>Arax</artist><price>999</price></cd>
</cds>
or
<cds>
<cd>
<title>Led Zeppelin II</title>
<artist>Led Zeppelin</artist>
<price>999</price>
</cd>
<cd>
<title>La Brise</title>
<artist>Arax</artist>
<price>999</price>
</cd>
</cds>
vs something like
[
{"title": "Led Zeppelin II", "artist": "Led Zeppelin", "price": 999},
{"title": "La Brise", "artist": "Arax", "price": 999}
]
You can probably do better using XML attributes. But then you're using more features.
- If you are describing hierarchal data, JSON is great
- If you are describing text with markup, especially extensible markup, for machine generation and consumption, XML is great.
- If you are describing a graph, neither have broadly accepted standards so you are kinda on your own.
Depending on your requirements, a recipe collection might be better in XML or in a flavor of markdown. A comprehensive data schema and software support for recipes could be challenging/limiting, compared to marked-up text.
You can pretty easily translate XML to Markdown using XSLT, though.
I don't think hierarchal structure is the differentiator; recipes and web pages are hierarchical and they'd still be hell in JSON. XML handles hierarchy just fine. I think the differentiator is whether your content is a document, that is, composed significantly of multiline text. Multiline text in a JSON file tends to be human-hostile, but we're all comfortable editing eg html.
Relational databases can describe most graphs, but they rarely ever have a great text format.
Suggesting that msec timestamps resolves timezone issues is naïve at best, because anytime you are passing something that refers to a real time (that is, it is significant to humans) rather than an instant time (that is, it is something like an event log timestamp), you are dealing with time in a particular place, which has human impact — cultural, legal, linguistic.
Passing around timestamps as RFC3339 UTC strings with timezone names and offsets (much like one should be doing in databases) is what would be recommended for real (human) times.
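One possible shape for such a record, sketched with Python's `datetime` module. The field names (`instant_utc`, `local`, `zone`) are assumptions, and the zone name is carried purely as data here: the -04:00 offset is hard-coded for the example date rather than looked up from a timezone database.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumed example: a meeting at 10:30 local time in a zone that is UTC-04:00
# on that date. The zone name is carried as data; it is not looked up here.
zone_name = "America/New_York"
local_zone = timezone(timedelta(hours=-4))
local = datetime(2023, 4, 6, 10, 30, tzinfo=local_zone)

record = {
    # The instant, normalized to UTC with the RFC 3339 "Z" suffix.
    "instant_utc": local.astimezone(timezone.utc).isoformat().replace("+00:00", "Z"),
    # The wall-clock time with its offset, as the humans involved saw it.
    "local": local.isoformat(),
    # The named zone, so future rule changes can be re-applied.
    "zone": zone_name,
}
print(json.dumps(record, indent=2))
```

A bare millisecond count would preserve only the first of these three fields.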
Having the same tags many times means the file can be nicely compressed, and its being XML means it can be verified independently with a schema (and the schema can be defined as a remote location over HTTP if need be), too.
You can always store data more efficiently with binary formats, but XML DOM parsers allow access to arbitrary parts of the tree instantly, so working with it is both easy and fast at the same time.
At this level they are both about equal in complexity: JSON has data types that XML doesn't, and XML has attributes and CDATA that JSON doesn't. JSON syntax is more succinct, but XML syntax is more regular.
JSON is good for structured data/records such as serialized data structures found in RPC protocols.
They both have their own pros and cons that make them suited to different use cases. Choose the one that best suits your data model and use cases.
If you are just worried about data loss, having null allowed in text segments is already begging for failure, as C programs will almost certainly get them wrong.
If you are transferring binary, base64 or similar will already cover you.
And again, if this is a strike on xml, how do you represent NaN in a JSON document? Do what DynamoDB does and wrap all numbers in quotes?
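For what it's worth, here is how the NaN question plays out in Python's standard `json` module, along with a sketch of the numbers-in-quotes workaround. `strict_dumps` and `quoted_number` are hypothetical helpers for this example and are not how any particular database does it.

```python
import json
import math

# Python's json module emits the non-standard tokens NaN / Infinity by default.
print(json.dumps({"x": float("nan"), "y": float("inf")}))

def strict_dumps(value) -> str:
    """Refuse values that standard JSON cannot represent."""
    return json.dumps(value, allow_nan=False)  # raises ValueError on NaN/inf

def quoted_number(x: float) -> str:
    """Workaround sketch: carry every number as a string."""
    return json.dumps({"n": repr(x)})

try:
    strict_dumps({"x": float("nan")})
except ValueError as exc:
    print("rejected:", exc)

# The quoted form survives a round trip, at the cost of manual conversion.
roundtrip = float(json.loads(quoted_number(float("nan")))["n"])
print(math.isnan(roundtrip))
```

So the choices are roughly: emit tokens other parsers may reject, refuse the value, or give up on JSON numbers and convert strings by hand.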
I don't get these types of arguments.
There's zero reason you can't write code that parses a date in an expected format (and throws an error if the date is formatted incorrectly) and then checks that the date is in the past.
Yes, it does mean you'll spend time writing more code (You know, the job you're being paid to do?), and it would be nice if your data format supported such automatic checking functionality out of the box, but to say "It can't be done!" is just plain silly.
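The hand-written check being described is only a few lines. A minimal sketch in Python, assuming a `YYYY-MM-DD` format (the format and the `parse_past_date` name are choices made for this example):

```python
from datetime import date, datetime

def parse_past_date(text: str, today: date) -> date:
    """Parse YYYY-MM-DD and require the result to be before `today`."""
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError as exc:
        raise ValueError(f"badly formatted date: {text!r}") from exc
    if parsed >= today:
        raise ValueError(f"date is not in the past: {text}")
    return parsed

today = date(2023, 4, 6)  # fixed so the example is reproducible
print(parse_past_date("2001-09-09", today))

for bad in ("09/09/2001", "2999-01-01"):
    try:
        parse_past_date(bad, today)
    except ValueError as exc:
        print("rejected:", exc)
```

The catch is that every consumer of the API has to write and maintain the same check, since the rule lives in code rather than in the published description.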
It's a good thing I didn't say that.
>Yes, it does mean you'll spend time writing more code
The whole point of WSDLs and OpenAPI is to minimize the amount of time it takes to consume your API. Saying you have to write more code is highlighting the shortcomings of OpenAPI at doing the only thing it's built to do. Which is why companies have largely punted on providing OpenAPI specs in favor of maintaining libraries in a handful of popular languages.
Every parser and editor I have used for XML has always shown the correct place where a caret is missing or the XML is broken in any way, so I have never had to hunt anything down inside a big XML file.
However, this doesn't invalidate your experience about unreadable XML files, which are most definitely present in the wild.
However, I agree that none of them are good config file formats, but storing data, I'll take XML all day, every day (except when I really need a binary file format, e.g.: for compressing data).
It's still a text bound serialization format, you still have to parse a tree for it.
Is it just particularly mature libraries?
By "stupidly fast", I mean I can read a 120K XML file, parse it, and create the objects generated from that file definition in under 2ms. The library I use (RapidXML [0]) can parse the file at almost the same time cost as running strlen() on the same file. That's insane.
IME the XML spec is so complex that you either end up with a slow but compliant parser or a fast one that doesn't implement the spec completely.
JSON, unlike XML, is minimal enough that writing an entire compliant parser with SIMD intrinsics [1] is actually practically feasible. That library claims 3 GBps parsing speed, which could theoretically process your 120kb of data in 1/25000th of a second instead of 2/1000ths of a second.
I would wager that JSON is faster to parse, on balance.
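The arithmetic behind that estimate, spelled out. The constants are the figures quoted in this thread, with "120K" taken as 120,000 bytes and the throughput claim taken at face value:

```python
FILE_BYTES = 120_000          # "120K" file, taken as 120,000 bytes
SIMD_BYTES_PER_SEC = 3e9      # the claimed 3 GB/s
RAPIDXML_SECONDS = 2e-3       # the reported "under 2ms"

json_seconds = FILE_BYTES / SIMD_BYTES_PER_SEC
print(json_seconds)                      # seconds for the JSON parse estimate
print(1 / json_seconds)                  # i.e. one N-th of a second
print(RAPIDXML_SECONDS / json_seconds)   # ratio between the two figures
```

This is a back-of-the-envelope comparison only: the 2ms figure includes building application objects, while the throughput figure is a parser benchmark claim, so the two numbers are not measuring the same work.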
[0] https://web.archive.org/web/20080209172554/https://rapidxml....
What broke me were: plain string and empty node handling.
Here is a fun quiz. Which of these two documents or both or neither are valid. With explanation ofc.
Yaml#1
:
Yaml#2 :
Or it is a question that people who are not familiar with technology will ask frequently and makes no sense to people who are.
I had a family member ask "will bing work with my yahoo?"
So... I know those words individually, but... man... together, in that order... I don't know how to respond. I think I said something like "don't worry about it - it's not that big a deal" and left it at that.
The hype around Windows 95 at the time was... incredible. Midnight store openings, hours-long line waits... insanity.
I left one job for another in 1994 simply because the new company had access to the Windows Chicago beta program (and it was a 20% salary bump). But the main reason for me choosing that org was the beta, because I had other offers with similar salary bumps at the same time.
Which is to say, this feels a touch of a non-issue. Yes, writing it by hand can get tedious, but that is true of any and every format. Is why you will almost certainly reach for other formats if doing a long list of data. And each and every one of them will fail for some form of input in ways that is frustrating.
If you absolutely don't care about human interface, no reason to use XML either. It's meant to be more verbose. The XML tags will often dominate the size of the payload with things like `<question>Who</question>`, so you have to start thinking about shorter names. Yes JSON has a similar problem, but at least it's halved and you don't have to instruct everyone to call each list element "e". If you super care about size, you'll use protobufs or something.
To me, this does not seem like a win that's worth much, especially since it's likely to shrink considerably even with naive fast compression.
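A small experiment along those lines, using made-up records in the shape of the CD example and `zlib` at its fastest level as the "naive fast compression". Exact numbers will vary with the data, so none are claimed here.

```python
import json
import zlib

# Made-up records in the shape of the CD example from this thread.
records = [
    {"title": f"Album {i}", "artist": f"Artist {i % 10}", "price": 999}
    for i in range(1000)
]

xml_doc = "<cds>" + "".join(
    f"<cd><title>{r['title']}</title><artist>{r['artist']}</artist>"
    f"<price>{r['price']}</price></cd>"
    for r in records
) + "</cds>"
json_doc = json.dumps(records)

def sizes(text: str) -> tuple[int, int]:
    """Return (raw bytes, bytes after zlib at its fastest level)."""
    raw = text.encode("utf-8")
    return len(raw), len(zlib.compress(raw, 1))

xml_raw, xml_packed = sizes(xml_doc)
json_raw, json_packed = sizes(json_doc)
print("xml ", xml_raw, xml_packed)
print("json", json_raw, json_packed)
```

On repetitive data like this, the repeated tag names are exactly what a general-purpose compressor removes, so the raw size gap is the more misleading of the two comparisons.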
Furthermore, as messages grow in size, the explicitly named closing tag actually kind of starts helping.
Both of these syntaxes have their annoying quirks, for sure, and I understand you really dislike the closing tag; that clearly doesn't bother many people.
But regardless of personal preference, I'm really skeptical any of this really explains json's relentless path to replace (most) xml. Other reasons, such as the extreme wordiness some xml apis chose, the poor implementation of namespaces, the problems with embedding arbitrary data (in particular control characters), the inconsistency between attributes and elements, the lack of support for numbers, the lack of (conventional) support for key-value pairs - all of these surely played a much greater role than a fairly limited syntax issue.
And it's not even like json is without impractical quirks; lack of comments, the ban on trailing commas, and the need for quotes in object-keys spring to mind. Yet those don't mean json is likely to die out soon - even though even javascript itself from which it is derived doesn't suffer from those (anymore)!
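Those quirks are easy to confirm against a strict parser. A short sketch using Python's standard `json` module; the sample strings are invented for the example:

```python
import json

SAMPLES = {
    "trailing comma": '[1, 2, 3,]',
    "comment": '{"a": 1} // note',
    "unquoted key": '{a: 1}',
    "single quotes": "{'a': 'b'}",
    "plain JSON": '{"a": [1, 2, 3]}',
}

def accepted(text: str) -> bool:
    """True if Python's standard json module parses `text`."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

results = {name: accepted(text) for name, text in SAMPLES.items()}
for name, ok in results.items():
    print(f"{name:15} {'accepted' if ok else 'rejected'}")
```

All four of the rejected samples are legal in a JavaScript object or array literal (with the comment treated as a comment), which is the irony being pointed out.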
Also, I see you took it to be a full on defense of XML. I did not really intend it that way. I think both can be fine. And insisting on either is likely a mistake.
I do find your nitpicks here amusing, still. Size of tag is just as obnoxious as size of key. And, though it can dominate the textual representation, there are clear ways to reduce that. Even knowing that BSON and Binary XML exist, though, I'd be hard pressed to name any project that failed because they weren't using them.
<div>
<p>JSON example:</p>
<pre>
[
{"title": "Led Zeppelin II", "artist": "Led Zeppelin", "price": 999},
{"title": "La Brise", "artist": "Arax", "price": 999},
]
</pre>
<p>Source: <a href="https://news.ycombinator.com/item?id=35472014">click here</a>!</p>
</div>
JSON is great for certain domains, but there are other domains where it is a nightmare and XML shines. Use the right tool for the job.
something like duplicating info in closing tags in XML (which applies to every element) isn't really comparable to stuff like having to escape certain characters in JSON strings (which applies only to the values that use those things)
perfect is the enemy of the good, and the good is the metric
As for "looks horrible"... well yeah, I always feel that xml looks "spikey" somehow. But I've been programming in curly-brace languages for 30+ years and I still find json harder to read than xml: I think my brain tries to interpret it as code, not data. I find xml easier to read (even when it's unformatted) precisely because the close-tokens kind of document what element they're closing.
Each to their own I guess. At least we're not stuck using ASN1.
Prepare for trouble, and make it double: http://xml.coverpages.org/dstc-xer2.html
Like, if I were there when XMPP was created, yes I would have insisted on JSON. XML was a plainly bad choice. Edit: Oh, JSON didn't exist until a little later. Maybe something similar did.
XML does have the "benefit" of being a bit more extensible than JSON. Specifically, being able to have namespaced elements in there does make some sense on paper. For example, you could have two extensions both add in data using the same keys, but different namespace. Can't really do that with JSON.
In practice, I think it just fell flat due to way too much "forethought" in things they anticipated people wanting.
new XMLSerializer().serializeToString(new DOMParser().parseFromString("<a>hello < </a>", "text/html"))
The above in my console does as expected there. And again, entities are a very dangerous part of XML and friends.
You are correct that if you tell it that that is xml, the browser will throw it back at you. Just as the JSON parser will barf on JSON.parse("{'test':'value'}").
Some items can even be other forms of xml that have their own schemas dictating what is valid. (Thinking SVG here.)
I'll also note that even there, I can see why HTML went with the odd parsing they do. XHTML tried going with "well formed" documents, but that falls flat for the authors. Is why "sections" of a document are essentially just collecting all of the "h" tags and making an implied tree out of that. As opposed to making the tree directly. To that end, my markup language of choice for Jupyter style things is org-mode in emacs. Yes, it has some warts; but again, all formats that I have ever seen have warts.
Edit: I want to add that I don't intend this as a "correction." I should say that I agree with your post. Complicated field where I doubt I'd have done better than most others. :)
any parser that behaves otherwise is pretty clearly buggy
json has many problems but parsing ambiguity is not really one of them
json is not
show me an application that parses `{'a':'b'}` as valid JSON, i'm actually interested, probably there are some which exist, but there is no ambiguity about those applications being wrong
html is not precisely defined
HTML5 is not XHTML.
<!DOCTYPE html> <title>Title <h1>Heading
expands to
<!DOCTYPE html> <head><title>Title</title></head> <body><h1>Heading</h1></body>
afaict none of them permit keys or value strings to be expressed with single quotes
I did find https://github.com/json5/json5 on a quick search that basically says what I asserted about people just jumping to another standard for things that you hand write. I was probably also thinking heavily about python's dict syntax. (And I confess, I still don't know when to use single versus double quotes in python...)