
This is very interesting, though the limitations for 'security' reasons seem somewhat surprising to me compared to the claim "Anything JSON can do, it can do. Anything JSON can't do, it can't do.".

Simplest example, "a\u0000b" is a perfectly valid and in-bounds JSON string that valid JSON data sets may have in it. Doesn't it end up falling short of 'Anything JSON can do, it can do" to refuse to serialize that string?





"a\u0000b" ("a" followed by a vertical tabulation control code) is also a perfectly valid and in-bounds BONJSON string. What BONJSON rejects is any invalid UTF-8 sequences, which shouldn't even be present in the data to begin with.

You're thinking of "a\u000b". "a\u0000b" is the three-character string also written "a\x00b".
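To make the distinction concrete, here's a quick Python sketch (Python's string escapes match JSON's here):

```python
import json

s1 = "a\u0000b"  # three characters: 'a', NUL (U+0000), 'b'
s2 = "a\u000b"   # two characters: 'a', vertical tab (U+000B)

print(len(s1))        # 3
print(len(s2))        # 2
print(json.dumps(s1)) # json escapes the control char: "a\u0000b"
```

Both are legal JSON string contents; the escape sequences just look confusingly similar in source form.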

Bleh... This is why my text formats use \[10c0de] to escape unicode codepoints. Much easier for humans to parse.

My example was a three character string where the second one is \u0000, which is the NUL character in the middle of the string.

The spec on GitHub says that including NUL is banned as a security stance: after parsing, someone might call strlen and accidentally truncate the string to a shorter one in C.

Which I think has some merit, but it's valid string content in JSON (and in UTF-8), so it is deliberately breaking 1:1 parity with JSON in the name of a security hypothetical.
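The strlen hazard the spec is worried about can be demonstrated from Python via ctypes (a minimal sketch; `c_char_p.value` performs a NUL-terminated read, exactly like strlen-based C code would):

```python
import ctypes

# The UTF-8 encoding of "a\u0000b" is three bytes with an embedded NUL.
data = "a\u0000b".encode("utf-8")  # b"a\x00b"

# A C-style NUL-terminated read silently stops at the first NUL byte:
truncated = ctypes.c_char_p(data).value
print(truncated)  # b"a" - the "b" is lost
```

Length-prefixed handling (like Python's own bytes, or explicit-length APIs in C) avoids the truncation entirely.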


The spec says that implementations must disable NUL by default (as in, the default configuration must disallow). https://github.com/kstenerud/bonjson/blob/main/bonjson.md#nu...

Users can of course enable NUL in the rare cases where they need it, but I want safe defaults.

Actually, I'll make that section clearer.


So I think it's a very neat format, but my feedback as a random person on the Internet is that I don't think it does uphold the claimed vision in the end of being 1:1 to JSON (the security parts, but also you do end up adding extra types too) and that's a bit of a shame compared to the top line deliverable.

Just focusing narrowly on the \0 part to explain why I say so: the spec proposes that implementations either hard-ban embedded \0 or disallow it by default with an opt-in. So if someone comes with a dataset that has it, they can get support only by configuring both the serializer and the parser to allow it. But if you're willing to exert that level of special-case control, I think all of the other preexisting binary-JSON implementations meet the top-line definition you are setting as well. For some binary-JSON implementation that has additional types, if someone is in full end-to-end control to special-case things, they could just choose not to use those types too; the mere existence of extra types in the binary format is no more of a "problem" for 1:1 than this choice.

IMO the deliverable that a 1:1 mapping would give us is "there is no BONJSON data that won't losslessly round trip to JSON and vice versa". The benefit applies to all future data that you haven't seen yet; the downside of using something that is not bijective is that you can run for a long time and then suddenly hit data-dependent failures in your system because you can't 1:1 map legal data.
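As an illustration of the round-trip property (using Python's json module as a stand-in, since a truly 1:1 binary encoding would have to preserve the same invariant):

```python
import json

# A legal JSON document with an embedded NUL in a string value.
doc = {"name": "a\u0000b"}

# JSON itself round-trips this losslessly:
encoded = json.dumps(doc)
assert json.loads(encoded) == doc
```

An encoding that rejects or mangles the NUL fails this property check for data that plain JSON handles fine.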

And especially with this guarantee, what will inevitably happen is that some downstream handling will also take it as a given that it can strlen(), since it "knew" the BONJSON spec banned embedded NUL. So when you later need it as in-bounds data, you won't be able to trivially flip the switch; instead you are stuck with legal JSON that you can't ingest into your system without an expensive audit, because the reduction from 1:1 gets entrenched as an invariant in the handling code.

Note that my vantage point might be a bit skewed here: I work on Protobuf, and these ecosystem-interoperability topics are top of mind for me in ways they don't necessarily need to be for small projects. I also recognize that "what even is legal JSON" itself is not completely clear, so take all this with a grain of salt (and again, I do think it looks like a very nice encoding in general).


Oh yes, I do understand what you're getting at. I'm willing to go a little off-script in order to make things safer. The NUL thing can be configured away if needed, but requires a conscious decision to do so.

Friction? Yeah, but that's just how it's gonna be.

For the invalid Unicode and duplicate key handling, I'll offer no quarter. The needs of the many outweigh the needs of the few.

But I'll still say it's 1:1 because marketing.


> But I'll still say it's 1:1 because marketing.

Isn't that lying? Marketing is when you help connect people who require a product or service (the market) with a provider of that product or service.


Did you read "Parsing JSON is a minefield"?


