Almost, except the way Excel-style quoting works with newlines sucks - you end up with rows that span multiple lines, so you can't split on newline to get individual rows.
With JSON those newlines are escaped as \n characters, which are much easier to work with.
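A minimal sketch of the difference (made-up data):

    # A quoted CSV cell containing a newline makes the record span two physical
    # lines, so splitting on "\n" breaks the row in half.
    csv_text = 'id,comment\n1,"first line\nsecond line"\n'
    print(csv_text.split("\n"))   # ['id,comment', '1,"first line', 'second line"', '']

    # The JSONL equivalent keeps the newline escaped as \n inside one physical line.
    import json
    print(json.dumps([1, "first line\nsecond line"]))   # [1, "first line\nsecond line"]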
I ended up parsing the XML format instead of the CSV format when handling paste from Excel due to the newlines issue.
CSV seemed so simple, but after numerous issues, a cell containing both a newline and a " made me realize I should keep the little hair I had left and put in the work to parse the XML.
It's not great either, with all its weird tags, but at least it's possible to parse reliably.
This is the way: JSONL where each row is a JSON list. It has well-defined, standard quoting.
Just like with CSV, you don't actually need the header row either, as long as there's a convention about field ordering. Similar to protobufs, where the field names are not included in the file itself.
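A minimal sketch of that, with a made-up column order of [id, name, version] agreed out of band:

    import json

    # Field order by convention -- no header row needed.
    rows = [[1, "alpha", "1.1"], [2, "beta", "7.4"]]
    jsonl = "\n".join(json.dumps(r) for r in rows)

    # Any consumer can split on newline and json.loads each line.
    parsed = [json.loads(line) for line in jsonl.splitlines()]
    assert parsed == rows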
This misses the point of standardization imo, because it's not possible to know a priori that the first line represents the variable names, that all the rows are supposed to have the same number of elements, and in general that this is supposed to represent a table. An arbitrary parser or person wouldn't know to guess, since it's not standard or expected. Of course it would parse fine, but the default result would be a generic nested structure or array of arrays rather than a table.
Types at the type layer are not the same as types at the semantic layer. Sure, every value at the JSON level has a "strong type", but the semantic meaning of the contents of e.g. a string is usually not expressible in pure JSON. So it is with CSV: you can think of every cell in CSV as containing a string (a series of bytes), with it being up to you to enforce the semantics atop those bytes. JSON gives you a couple of extra types, and if you can fit things into those types well, then that's great, but for most concrete, semantically meaningful data you won't be able to, and you'll end up in a similar world to CSVs.
I see an array of arrays. The first and second arrays have two strings each, the last one has a float and a string. All those types are concrete.
Let's say those "1.1" and 7.4 values are supposed to be version strings. If your code is only sometimes putting quotes around the version string, the bug is in your code. You're outputting a float sometimes, but a string in others. Fix your shit. It's not your serialization format that's the problem.
If you have "7.4" as a string, and your serialization library is saying "Huh, that looks like a float, I'm going to make it a float", then get a new library, because it has a bug.
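To make that concrete, a small sketch with made-up values: the serializer emits exactly the type it's given.

    import json

    print(json.dumps(["app", "7.4"]))   # ["app", "7.4"] -- string in, string out
    print(json.dumps(["app", 7.4]))     # ["app", 7.4]   -- float in, float out

    # A sane library never promotes the string "7.4" to the number 7.4;
    # if yours does, replace it.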
You're missing my point: basically nothing spits out data in that format because it's not ergonomic to do so. JSON is designed to represent object hierarchies, not tabular data.
JSON is lists of lists of any length and groups of key/value pairs (basically lisp S-expressions with lots of unnecessary syntax). This makes it a superset of CSV's capabilities.
JSON fundamentally IS made to represent tabular data, but it's made to represent key-value groups too.
Why make it able to represent tabular data if that's not an intended use?
> JSON is lists of lists of any length and groups of key/value pairs
The "top-level" structure of JSON is usually an object, but it can be a list.
> JSON fundamentally IS made to represent tabular data
No, it's really not. It's made to represent objects consisting of a few primitive types and exactly two aggregate types: lists and objects. It's a textual representation of the JavaScript data model and even has "Object" in the name.
> Why make it able to represent tabular data if that's not an intended use?
It's mostly a question of specialization and ergonomics, which was my original point. You can represent tabular data using JSON (as you can in JavaScript), but it was not made for it. Anything that can represent """data""" and at least 2 nesting levels of arbitrary-length sequences can represent tabular data, which is basically every data format ever regardless of how awkward actually working with it may be.
The fact that json can represent a superset of the tabular data structures that csv is specifically designed to represent can be rephrased as: csv is more specialised than json at representing tabular data. The fact that json can also represent tabular data does not mean it is a better or more efficient way to represent that data than a format like csv.
In the same way, there are hierarchically structured datasets that can be represented both by json in hierarchical form and by csv in tabular form by repeating certain variables, but if using csv would require repeating them too many times, it would be a bad idea to choose it over json. The fact that you can do something does not always make it a good idea to do it. The question imo is about which way would be more natural, easy or efficient.
> The fact that json can represent a superset of the tabular data structures that csv is specifically designed to represent can be rephrased as: csv is more specialised than json at representing tabular data. The fact that json can also represent tabular data does not mean it is a better or more efficient way to represent that data than a format like csv.
The reverse is true as well: being more specialized is a description of goals, not advantages.
> This header will contain names corresponding to the fields in the file and should contain the same number of fields as the records in the rest of the file
But of course, CSV is the wild west and there's no guarantee that any two encoders will do the same thing (sometimes, there's not even a guarantee that the same encoder will do the same thing with two different inputs).
Ideally the header has an entry for every column that carries data, and every data item in a row has a corresponding header column, but real CSV files should be assumed to have incomplete or variable-length lines.
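For illustration, a defensive reading sketch (made-up ragged input) that pads short rows and drops extras instead of trusting every line to match the header:

    import csv, io

    raw = "a,b,c\n1,2,3\n4,5\n6,7,8,9\n"   # ragged rows, as found in the wild
    reader = csv.reader(io.StringIO(raw))
    header = next(reader)
    for row in reader:
        row = (row + [""] * len(header))[:len(header)]   # pad short rows, drop extras
        print(dict(zip(header, row)))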
"Any JSON primitive" does add a few requirements not semantically comparable to CSV, like numbers that are numbers, and keywords true, false, none.
When these syntaxes are parsed into objects, either the type info has to be retained, or some kind of attribute tag added, so they can be written back out in the same form.
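A small sketch of the difference: a CSV reader hands everything back as strings, while a JSON parser keeps number/bool/null/string distinct, so the types survive a round trip.

    import csv, io, json

    print(next(csv.reader(io.StringIO('1,true,null,"1"'))))   # ['1', 'true', 'null', '1']
    print(json.loads('[1, true, null, "1"]'))                 # [1, True, None, '1']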
> make it so any consumer can parse it by splitting on newline and then ...
There is something like that, called JSON Lines. It has a .org domain 'n' everything: https://jsonlines.org/
JSON was designed to represent any data. There are plenty of systems that spit out data in exactly that format, because it's the natural way to represent tabular data using JSON serialization. And clearly, if you're the one building the system, you can choose to use it.
JSON is designed to represent JavaScript objects with literal notation. Guess what, an array of strings or an array of numbers or even an array of mixed strings and numbers is a commonly encountered format in JavaScript.