Introducing TJSON: Tagged JSON with Rich Types (tjson.org)

36 points by alexdean 9 years ago | 52 comments

Surprised no one here has mentioned Transit. It's an extensible typed data format with both JSON and binary representations. In other words, you can configure custom types, such as Immutable data structures, and they'll be automatically serialized and restored for you.

Good intro: http://cognitect.github.io/transit-tour/

GitHub: https://github.com/cognitect/transit-js

Note that in the introduction they provide a simple benchmark where Transit is both more compact and faster to parse than JSON with custom hydration.

keithwhor 9 years ago |

Biggest complaint:

Unreadable format, as mentioned in this thread.

{"key:A<A<s>>":[["values"],["here"]]}

This doesn't mean anything to me as a developer, unless I've seen the spec. It's kludgy. It's not reverse-compatible if you don't install a TJSON parser.

Two solutions immediately strike me as better, one has been mentioned here.

(1) Not optimal, but actually spell out words in key names. There's no reason "A" has to mean Array. That doesn't mean anything to me. If I'm seeing it for the first time and have no idea what TJSON is, the very next value could be "key2:B<B<t>>".

(2) Far more optimal: as an example has been provided with "date", just nest objects as values for any extended types. Then this spec is completely reverse compatible and compliant, and as a developer I don't have to worry about parsing key names.

e.g.

  {
    "some_nested_array": {
      "type": "array.array.string",
      "value": [
        ["values"],
        ["here"]
      ]
    }
  }

Extremely easy to implement and not reliant on a governing body.
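A rough sketch of how a consumer might handle the nested-object layout from option (2), in Python. The dotted type path and the `matches` helper are only illustrations of the idea in this comment, not part of TJSON or any published spec, and only "array" and "string" are handled here.

```python
import json

# The example document from the comment above.
doc = json.loads('''
{
  "some_nested_array": {
    "type": "array.array.string",
    "value": [["values"], ["here"]]
  }
}
''')

def matches(type_path, value):
    """Check a value against a dotted type path such as "array.array.string".

    Hypothetical helper: only "array" and "string" are understood.
    """
    head, _, rest = type_path.partition(".")
    if head == "array":
        return isinstance(value, list) and all(matches(rest, v) for v in value)
    if head == "string":
        return rest == "" and isinstance(value, str)
    return False  # unknown type names are rejected in this sketch

field = doc["some_nested_array"]

# A consumer that does not care about types just reads "value".
print(field["value"])

# A consumer that does care can check the declared type first.
print(matches(field["type"], field["value"]))
```

The key names stay untouched, so any stock JSON parser still finds `some_nested_array` where it expects it; the cost is one extra level of nesting per typed value.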

kafkaesq 9 years ago | |

And that was the beauty of JSON: There was no "format", per se. And you certainly didn't need to think or care about some friggen "spec". All you had to do was sort of take a look at it, say "OK", and get moving with the business of writing your application.

tobltobs 9 years ago |

Dear JS Hipsters, even if you all suffer from NIHS, could you please take a look at XML before you invent another format. I am sure you will get used to those angle brackets.

bascule 9 years ago | |

Hi, TJSON creator here.

I have certainly studied XML and think XML Schema did fantastic work specifying datatypes:

https://www.w3.org/TR/xmlschema11-2/#built-in-datatypes

I briefly considered adopting this work wholesale:

https://github.com/tjson/tjson-spec/issues/37

If you'd like to see that happen, please make a note of it in the issue. Thanks!

Also note: I'm not a JS hipster, I'm part of the Rust Evangelism Strike Force.

pvg 9 years ago | | |

I'm curious what drove decisions like no top-level arrays and strict conditions on set members. It's not mentioned explicitly in the spec but if the object syntax is the same as JSON, multiple field names would be allowed in that case.

seagreen 9 years ago | |

First: XML is a markup language for encoding documents, JSON is a data-interchange language. Each can be twisted to do the job of the other, but they don't naturally do the same job.

Second, XML is extraordinarily complicated. Flipping around the XML 1.0 spec (https://www.w3.org/TR/xml/) isn't really encouraging me that all of this is there for a reason. I'd love to be proved wrong though!

In contrast, RFC 7159 is incredibly short and readable: https://tools.ietf.org/html/rfc7159. The TJSON spec isn't bad either: https://www.tjson.org/spec/. Even combining both the result is still far shorter and more clear than XML.

tootie 9 years ago | | |

First, either XML or JSON is suitable for encoding documents or data interchange. Second, XML is also very _sophisticated_ and has an array of useful features that JSON developers are suddenly realizing to be pretty valuable sometimes. XSD is verbose, but it's rock solid. XPath and XInclude are also pretty awesome.

tobltobs 9 years ago | | |

If you want something lightweight and readable JSON is fine. If you need a solution which covers 99% of all possible requirements there is XML. Everything in between will converge to the feature set of XML over time, if it lives long enough.

zeveb 9 years ago | |

Even better, research things like ASN.1 and canonical S-expressions. Re. the latter, here are some examples:

    {"hello-world:s": "Hello, world!"} → (hello-world "Hello, world!")
    {"hello-base-sixteen:d16": "48656c6c6f2c20776f726c6421"} → (hello-base-sixteen #48656c6c6f2c20776f726c6421#)
    {"base-sixty-four-is-default:d": "SGVsbG8sIHdvcmxkIQ"} → (base-sixty-four |SGVsbG8sIHdvcmxkIQ|)
    {"hello-signed-int:i": "42"} → (some-int 42)
    Ø → (some-big-int [bigint]|GY0+kwq94p4QRs2j4rHisQLgEN3zsFSZNJrgK+ZFcV0s1ShyMkMFOHip0oRuG7v+TAC7qmDaYSojFbZjNV5dSA==|)
    {"hello-timestamp:t": "2016-10-02T07:31:51Z"}  → (hello-timestamp [timestamp]2016-10-02T07:31:51Z)

Seriously, this is IMHO so clearly good I'm surprised more folks don't agree.

bascule 9 years ago | | |

TJSON is ultimately being written in service of a credential format I'm working on (however it will be using a compact binary format isomorphic with TJSON)

The main inspiration for this format is SPKI/SDSI, which was based on S-expressions. As beautiful as you think the S-expression version may be over the (T)JSON, I personally blame the use of S-expressions as one of many reasons SPKI/SDSI failed to gain more widespread traction, and personally think something like TJSON is a lot more likely to gain traction than the second coming of S-expressions. This is, of course, a debatable point, but you won't find me working on Sexp-based formats any time soon.

ASN.1 of course has a sordid history in the credential space as well, often reviled by security experts as the source of frequent vulnerabilities, particularly problematic encodings like BER. I will admit OER is nice, but nobody uses OER and the IETF prefers things be standardized in terms of DER.

"Research things", yes been there, done that.

ungzd 9 years ago | |

This thing is designed for representing common data structures (array, set, date) in JSON. XML does not have that either. It's not a validator like XML Schema (JSON has JSON Schema, which is quite popular); it does not check anything.

keithwhor 9 years ago | | |

If it's not a validator, then what's the value? Why would I use this?

I assume the TJSON libraries throw errors if invalid types or formats are provided --- which is good, but that makes this a validator. Developers have been representing non-standard formats in JSON for years.

Google's response to JSON's limitations was the Protocol Buffer [1], and as I understand it, it's used internally relatively extensively, but there hasn't been much adoption outside of Google. JSON is just the right mix of simple + robust for the majority of use cases.

[1] https://developers.google.com/protocol-buffers/

zepolen 9 years ago | | |

XML provides an extensible way to mark up your data though - TJSON is just a hack upon JSON...and it's full of potential problems that XML solved years ago.

laurent123456 9 years ago |

I'm curious what would be the use case for this? JSON is a human readable/writable format, however this kind of syntax is not anymore: "{"nested-array:A<A<s>>": [["Nested"], ["Array!"]]}"

So it feels more like a machine format, but in that case why not use a more efficient one, like a binary format?

zmanian 9 years ago | |

The point of this format is to push data over the wire in a format that is both semantically richer and authenticatable using techniques like object-hash. This gets us to one true unambiguous representation of the data, which you need for redactable signatures and rich credentials.

bascule 9 years ago | |

Hello, I created TJSON. The answer to your question can be found in the second sentence on the page:

> TJSON documents are amenable to "content-aware hashing" where different encodings of the same data (including both TJSON and binary formats like Protocol Buffers, MessagePack, BSON, etc) can share the same content hash and therefore the same cryptographic signature.

TJSON is designed to facilitate documents that retain the same content hash when transcoded to/from binary formats.

seagreen 9 years ago | | |

Could you help clarify this? My guess is that you're saying that you have some data type with (eg) strings and timestamps. When encoded to binary these are encoded differently, resulting in hash A. But if you roundtrip the data through JSON first both come back as strings, which when encoded to binary gives hash B. Am I on the right track?

laurent123456 9 years ago | | |

If hashing is the main concern, wouldn't a "strict" spec for JSON do the job? eg. "all keys must be sorted", "all dates must be ISO-xxx", etc.?
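For what it's worth, the "strict JSON" idea can be sketched in a few lines of Python. This is only an illustration of the question above, not the object-hash scheme or anything TJSON specifies; `canonical_hash` is a made-up name, and the canonical form here is simply "sorted keys, no insignificant whitespace, UTF-8".

```python
import hashlib
import json

def canonical_hash(obj):
    """Hash a parsed JSON value via a strict serialization (illustrative only)."""
    canonical = json.dumps(
        obj,
        sort_keys=True,          # "all keys must be sorted"
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same data, different key order and whitespace on the wire.
a = json.loads('{"b": 1, "a": "x"}')
b = json.loads('{ "a": "x",   "b": 1 }')
print(canonical_hash(a) == canonical_hash(b))

# Same number, different spelling: the hashes no longer agree.
print(canonical_hash({"n": 1}) == canonical_hash({"n": 1.0}))
```

Key order and whitespace are easy to pin down this way. The second comparison shows where it gets harder: once a value has passed through JSON, nothing records whether `1` was meant as an integer or a float, or whether a string was a timestamp, so a binary encoding of the "same" data may not hash identically without extra type information.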

adambrod 9 years ago | |

It's very readable if you've used a typed language before. The <> brackets are like generics.

dolmen 9 years ago | | |

JSON is a subset of JavaScript. <> brackets are not in JavaScript.

hajile 9 years ago |

If you're making it unreadable with types, you might as well switch to a statically typed binary JSON format like bson or ubjson instead. You get smaller files, faster parsing, partial parsing (skip what you don't need), and (in some implementations) streaming of large files.

http://ubjson.org/

http://bsonspec.org/

escherize 9 years ago |

Edn seems like a better solution here. Not only is the tagging more straightforward (wow, not embedded in a string?), but you can write your own tags for custom types.

user5994461 9 years ago |

If you want JSON with type checking, use a json schema.

http://json-schema.org/examples.html

Been there for almost a decade. Already supported by all the major json libraries in all the major languages.

ungzd 9 years ago |

This is literally hungarian notation for JSON.

dolmen 9 years ago |

As any normal JSON document is not a valid TJSON document (and worse, some JSON documents may be valid TJSON documents but TJSON imposes a different interpretation), using the "JSON" suffix is just misleading.

pfooti 9 years ago |

literally the only use case I see here is dates. Like everything else I can infer the type of the field based on its contents. "boolean": false, no kidding. "event_ts": 1223349483, is that an index number or milliseconds since epoch or what? Well, probably ms since epoch, but my one gripe about json is that there's no good way to push dates without domain knowledge (anything whose property name ends in _at or _ts gets converted? all numbers in a certain range get converted?)

geezerjay 9 years ago | |

> literally the only use case I see here is dates.

{ "date": "1937-01-01T12:00:27.87+00:20" }

As you can see, JSON doesn't stop anyone from using RFC3339 to encode dates.

pfooti 9 years ago | | |

Sure, and then on the other end of the connection you need to say: newThing.date = new Date(newThing.date), or else it'll deserialize as a string.

What I'm getting at is that a date gets serialized into JSON as either a string or a number, depending on who wrote the toJSON method, and that the consumer of that JSON needs knowledge about the schema of the data in order to properly deserialize it.
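The same round trip can be sketched in Python instead of JavaScript. The field name `created_at` and the `DATE_FIELDS` set are hypothetical; they stand in for whatever out-of-band schema knowledge the receiver happens to have.

```python
import json
from datetime import datetime, timezone

# Sender: the datetime has to become a string (or a number) before it fits in JSON.
payload = json.dumps({
    "name": "launch",
    "created_at": datetime(2016, 10, 2, 7, 31, 51, tzinfo=timezone.utc).isoformat(),
})

# Receiver without schema knowledge: the timestamp is just another string.
naive = json.loads(payload)

# Receiver with schema knowledge: it must already know which fields to convert.
DATE_FIELDS = {"created_at"}  # hypothetical, application-specific

def revive(pairs):
    """object_pairs_hook that parses known date fields into datetime objects."""
    return {
        key: datetime.fromisoformat(value) if key in DATE_FIELDS else value
        for key, value in pairs
    }

aware = json.loads(payload, object_pairs_hook=revive)

print(type(naive["created_at"]).__name__)
print(type(aware["created_at"]).__name__)
```

Nothing in the payload itself says which strings are dates; `DATE_FIELDS` is exactly the schema knowledge the consumer has to bring along.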

dep_b 9 years ago |

Backward compatibility seems terrible to me. A regular JSON parser will produce garbage from this since variable names are changed while an XML parser parsing without any context how to parse specific fields will still provide correct data. Your dates might remain strings for example but the string is still correct.
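To make the compatibility concern concrete, here is a small Python sketch using only the standard `json` module. The document reuses tagged names that appear elsewhere in this thread; `strip_tags` is a hypothetical helper, and it assumes the tag is whatever follows the last ":" in a name, which is how the thread's examples look rather than a reading of the spec.

```python
import json

# A TJSON-style document read by a parser that knows nothing about TJSON.
doc = json.loads('{"foo:O": {"hello-world:s": "Hello, world!"}}')

# The values survive, but the names the application expects are gone.
print("foo" in doc)
print(list(doc))

def strip_tags(node):
    """Recursively drop a trailing ":tag" from every object key (illustrative only)."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            name, sep, _tag = key.rpartition(":")
            cleaned[name if sep else key] = strip_tags(value)
        return cleaned
    if isinstance(node, list):
        return [strip_tags(item) for item in node]
    return node

print(strip_tags(doc))
```

So a non-TJSON consumer is not completely stuck, but it does need this extra pass, and it loses the ability to use ":" in its own key names.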

TazeTSchnitzel 9 years ago |

I question the value of the tags where they state the obvious. Does this

  {"foo:O":{}}

really tell you more than

  {"foo":{}}

The ability to encode sets, integers, binary data and time stamps is useful. But why tag things which are what they look like? It's a waste of space.

bascule 9 years ago | |

Domain separation. Unless everything is tagged, an attacker can trick the parser into misinterpreting the type of an object.

Or, a more mundane explanation: the parser will silently clobber the name because it contains a ":"

Leaving any names untagged is ambiguous.

zwerdlds 9 years ago |

Anyone care to give a first-blush comparison vs protobufs/JSON Schema?

dolmen 9 years ago | |

JSON Schema is a much more complete language for validation. And every type in the limited TJSON can be described with JSON Schema.

Besides that, in JSON Schema the schema is not bundled with the data. This is a feature for input validation: the receiver must know what it allows, not just what is received. This is a feature for readability (which is a great feature of JSON) as the data is not encumbered with the schema. A receiver is free to use a schema or not. TJSON, by contrast, forces the receiver to recognize its dirty format.

So TJSON brings nothing new, except interoperability problems.

panic 9 years ago |

Some previous discussion: https://news.ycombinator.com/item?id=12856968

keredson 9 years ago |

Why would you define the type of something in the PARENT object?

cartercole 9 years ago | |

so you can be cute and have a reason to enforce no top level arrays with the added benefit of ajax security... but really why would you dirty your key that allows fast lookup with type information and keep yourself from using : in keys

Entangled 9 years ago |

I prefer YAML no matter how much lipstick you put on JSON.

dolmen 9 years ago | |

A major problem with YAML is that the spec is so complex that no existing parser implements it fully. And each implementation supports a different subset. YAML is just not interoperable.
