
You can easily represent it as an array:

    ["foo","bar",123]
That’s as tabular as CSV but you now have optional types. You can even have lists of lists. Lists of objects. Lists of lists of objects…


Right - the JSON-newline equivalent of CSV can look like this:

    ["id", "species", "nickname"]
    [1, "Chicken", "Chunky cheesecakes"]
    [2, "Dog", "Wagging wonders"]
    [3, "Bunny", "Hopping heroes"]
    [4, "Bat", "Soaring shadows"]
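Such a file parses line by line with any JSON library; a minimal Python sketch (treating the first line as the header, per the convention above):

```python
import json

# A small JSONL-of-arrays table: first line holds the column names,
# each following line is one row.
raw = """["id", "species", "nickname"]
[1, "Chicken", "Chunky cheesecakes"]
[2, "Dog", "Wagging wonders"]"""

lines = raw.splitlines()
header = json.loads(lines[0])
rows = [dict(zip(header, json.loads(line))) for line in lines[1:]]
print(rows[0])  # {'id': 1, 'species': 'Chicken', 'nickname': 'Chunky cheesecakes'}
```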


Remove the [] characters and you've invented CSV with Excel-style quoting.


Almost, except the way Excel-style quoting works with newlines sucks - you end up with rows that span multiple lines, so you can't split on newline to get individual rows.

With JSON those new lines are \n characters which are much easier to work with.
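A quick illustration with Python's standard csv and json modules (a sketch of the difference, not a full parser):

```python
import csv
import io
import json

cell = "line one\nline two"  # a value containing a real newline

# Excel-style CSV keeps the raw newline inside the quotes, so the record
# spans two physical lines and naive splitting on newline breaks:
buf = io.StringIO()
csv.writer(buf).writerow(["a", cell])
print(buf.getvalue().count("\n"))  # 2: one embedded, one terminating

# JSON escapes it as \n, so one record is always one physical line:
print(json.dumps(["a", cell]))  # ["a", "line one\nline two"]
```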


I ended up parsing the XML format instead of the CSV format when handling paste from Excel due to the newlines issue.

CSV seemed so simple but after numerous issues, a cell with both newline and " made me realize I should keep the little hair I had left and put in the work to parse the XML.

It's not great either, with all its weird tags, but at least it's possible to parse reliably.


This is the way. jsonl where each row is a json list. It has well-defined standard quoting.

Just like csv you don't actually need the header row either, as long as there's convention about field ordering. Similar to proto bufs, where the field names are not included in the file itself.


This misses the point of standardization, imo, because it's not possible to know a priori that the first line holds the variable names, that every row is supposed to have the same number of elements, or in general that this is supposed to represent a table. An arbitrary parser or person wouldn't know to guess that, since it's not standard or expected. Of course it would parse fine, but the default result would be a nested structure or array-of-arrays rather than a table.


application/jsonl+table


Typing isn't optional in JSON, every value has a concrete type, always.


Types at the type layer are not the same as types at the semantic layer. Sure, every value at the JSON level has a "strong type", but the semantic meaning of the contents of e.g. a string is usually not expressible in pure JSON. So it is with CSV; you can think of every cell in CSV as containing a string (series of bytes), with it being up to you to enforce the semantics atop those bytes. JSON gives you a couple of extra types, and if you can fit things into those types well, then that's great, but for most concrete, semantically meaningful data you won't be able to, and you'll end up in a similar world to CSVs.
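For example, a date serialized to JSON is still just a string at the JSON layer; the semantic parsing remains the caller's job, same as with a CSV cell (Python sketch):

```python
import json
from datetime import date

# JSON has no date type; "2022-01-04" arrives as a plain string, and
# interpreting it as a date is still up to the application.
cell = json.loads('"2022-01-04"')
print(type(cell).__name__)       # str
print(date.fromisoformat(cell))  # 2022-01-04
```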


  [
   ["header1","header2"],
   ["1.1", ""],
   [7.4, "2022-01-04"]
  ]


...and?

I see an array of arrays. The first and second arrays have two strings each, the last one has a float and a string. All those types are concrete.

Let's say those "1.1" and 7.4 values are supposed to be version strings. If your code is only sometimes putting quotes around the version string, the bug is in your code. You're outputting a float sometimes, but a string in others. Fix your shit. It's not your serialization format that's the problem.

If you have "7.4" as a string, and your serialization library is saying "Huh, that looks like a float, I'm going to make it a float", then get a new library, because it has a bug.
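A quick check with Python's json module shows the round-trip preserving each concrete type: a quoted "1.1" stays a string, a bare 7.4 stays a float, and the serializer never guesses.

```python
import json

# Parse a row mixing a quoted "version string" and a bare float; the
# types survive the round-trip unchanged.
row = json.loads('["1.1", 7.4]')
print(type(row[0]).__name__, type(row[1]).__name__)  # str float
print(json.dumps(row))  # ["1.1", 7.4]
```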


You're missing my point: basically nothing spits out data in that format because it's not ergonomic to do so. JSON is designed to represent object hierarchies, not tabular data.


CSV is lists of lists of fixed length.

JSON is lists of lists of any length and groups of key/value pairs (basically lisp S-expressions with lots of unnecessary syntax). This makes it a superset of CSV's capabilities.

JSON fundamentally IS made to represent tabular data, but it's made to represent key-value groups too.

Why make it able to represent tabular data if that's not an intended use?


> JSON is lists of lists of any length and groups of key/value pairs

The "top-level" structure of JSON is usually an object, but it can be a list.

> JSON fundamentally IS made to represent tabular data

No, it's really not. It's made to represent objects consisting of a few primitive types and exactly two aggregate types: lists and objects. It's a textual representation of the JavaScript data model and even has "Object" in the name.

> Why make it able to represent tabular data if that's not an intended use?

It's mostly a question of specialization and ergonomics, which was my original point. You can represent tabular data using JSON (as you can in JavaScript), but it was not made for it. Anything that can represent """data""" and at least 2 nesting levels of arbitrary-length sequences can represent tabular data, which is basically every data format ever regardless of how awkward actually working with it may be.


The fact that json can represent a superset of the tabular data structures that csv is specifically designed to represent can be rephrased as: csv is more specialised than json at representing tabular data. That json can also represent tabular data does not mean it is a better or more efficient way to represent that data than a format like csv.

In the same way, there are hierarchically structured datasets that can be represented both by json in hierarchical form and by csv in tabular form by repeating certain variables, but if csv required repeating them too many times, it would be a bad idea to choose it over json. The fact that you can do something does not always make it a good idea to do it. The question imo is which way is more natural, easy or efficient.
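A minimal sketch of that repetition (the field names order_id, customer and sku are made up for illustration): flattening one hierarchical record into rows duplicates the parent fields on every child row.

```python
# One hierarchical record with two nested items...
order = {"order_id": 1, "customer": "Ann",
         "items": [{"sku": "A"}, {"sku": "B"}]}

# ...becomes two tabular rows, repeating order_id and customer on each.
rows = [[order["order_id"], order["customer"], item["sku"]]
        for item in order["items"]]
print(rows)  # [[1, 'Ann', 'A'], [1, 'Ann', 'B']]
```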


> The fact that json can represent a superset of the tabular data structures that csv is specifically designed to represent can be rephrased as: csv is more specialised than json at representing tabular data. That json can also represent tabular data does not mean it is a better or more efficient way to represent that data than a format like csv.

The reverse is true as well: being more specialized is a description of goals, not advantages.


It's hardly a bad idea to do a list of lists in JSON...

The big advantage of JSON is that it's standardized and you can reuse the JSON infrastructure for more than just tabular data.


> CSV is lists of lists of fixed length.

I'd definitely put that in my list of falsehoods programmers believe about CSV files.


It seems to be indicated by RFC 4180 [0], which says

> This header will contain names corresponding to the fields in the file and should contain the same number of fields as the records in the rest of the file

But of course, CSV is the wild west and there's no guarantee that any two encoders will do the same thing (sometimes, there's not even a guarantee that the same encoder will do the same thing with two different inputs).

[0] https://www.ietf.org/rfc/rfc4180.txt


You should know that "should" isn't very binding.

Ideally every data row has an item for each header column and every item has a corresponding header, but real CSV files should be assumed to have incomplete or variable-length lines.
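A defensive reader along those lines might pad or truncate ragged rows to the header width; a minimal Python sketch:

```python
import csv
import io

# A ragged CSV: one row too short, one too long.
data = "a,b,c\n1,2\n3,4,5,6\n"
reader = csv.reader(io.StringIO(data))
header = next(reader)
n = len(header)

# Pad short rows with empty cells, truncate long ones.
rows = [(row + [""] * n)[:n] for row in reader]
print(rows)  # [['1', '2', ''], ['3', '4', '5']]
```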


NOTHING is very binding about the CSV spec and that's the biggest problem with CSV.


CSV is a text file that might have commas in it


yeah if I had a cookie for every time I've had to deal with this I'd have maybe 10 cookies - it's not a lot but it's more than it should be.


A format consisting of newline-terminated records, each containing comma-separated JSON strings would be superior to CSV.

It could use backslash escapes to denote control characters and Unicode points.

Everyone would agree exactly on what the format is, in contrast to the zoo of CSV variants.

It wouldn't have pitfalls in it, like spaces that defeat quotes:

  RFC CSV         JSON strings
  a,"b c"         "a", "b c"
  a, "b c"        "a", " \"b c\""
oops; add an innocuous-looking space, and the quotes are now literal.
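Python's csv module, with its RFC-like defaults, reproduces the pitfall; JSON strings have no such mode switch, since whitespace between tokens is simply ignored. A quick sketch:

```python
import csv
import io
import json

# Quotes only count when they start the field; a leading space makes
# them literal characters in the parsed value.
print(next(csv.reader(io.StringIO('a,"b c"'))))   # ['a', 'b c']
print(next(csv.reader(io.StringIO('a, "b c"'))))  # ['a', ' "b c"']

# In JSON, whitespace between tokens never changes how quoting works.
print(json.loads('["a", "b c"]'))  # ['a', 'b c']
```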


[deleted]


"Any JSON primitive" does add a few requirements not semantically comparable to CSV, like numbers that are actual numbers, and the keywords true, false, and null.

When these syntaxes are parsed into objects, either the type info has to be retained, or some kind of attribute tag, so they can be output back in the same form.

> make it so any consumer can parse it by splitting on newline and then ...

There is something like that called JSON-lines. It has a .org domain 'n' everything:

https://jsonlines.org/


JSON was designed to represent any data. There are plenty of systems that spit out data in exactly that format, because it's the natural way to represent tabular data using JSON serialization. And clearly, if you're the one building the system, you can choose to use it.


JSON is designed to represent JavaScript objects with literal notation. Guess what, an array of strings or an array of numbers or even an array of mixed strings and numbers is a commonly encountered format in JavaScript.



