I think the reference to retirement funds relates to the fact that many of them are under-collateralized: they need to take new investors' money to pay out old investors.
You might enjoy "There Is No Antimemetics Division", a story (told as a series of short stories) about the idea of "antimemes" (if memes are ideas that spread like viruses and can't be forgotten, antimemes are the opposite: they're not invisible, but you can't remember them).
This calculator is outdated -- the current price of storage oscillates around $0.01/MB. The maximum single chunk size is also 10MB now, and you can send as many chunks as you like.
I think the HN crowd has always been interested in this sort of stuff? It certainly has since I started reading it (~7 years ago). One of the things that I like about HN is that we are not completely inward looking.
From the guidelines:
> On-Topic: Anything that good hackers would find interesting. That includes more than hacking and startups. If you had to reduce it to a sentence, the answer might be: anything that gratifies one's intellectual curiosity.
> Off-Topic: Most stories about politics, or crime, or sports, unless they're evidence of some interesting new phenomenon. Videos of pratfalls or disasters, or cute animal pictures. If they'd cover it on TV news, it's probably off-topic.
You could certainly say that a BBC group being forced to sign a confession for attempting to conduct an interview 'gratifies one's intellectual curiosity' about the world, but it could certainly fall under 'Most stories about politics, or crime', too.
Either way, as you point out, once a post has over 100 points you probably have your answer!
> once a post has over 100 points you probably have your answer!
It seems that we both agree that it's an Off-Topic article at face value and yet it is On-Topic based on the votes. The votes in effect override the second criterion of relevance, not unlike the final clause in Asimov's Laws of Robotics.
Hence I'm genuinely curious about any shifts in the audience demographics that are occurring.
EDIT: I know this version won't support escaped separator/newline characters, but I made it for a specific use case in which I knew that would not occur. Adding that functionality would make it a little messier, but still not too bad.
EDIT2: Thanks for the interesting comments! Not so trivial after all!
Perhaps a more accurate version of what I was attempting to say above is that 'it is often (not always) easy to build a CSV parser to interact with one specific program'. The four line version above works perfectly for reading the type of files I designed it for. If you want to work with human created, or more complex variants of CSV, all bets are off.
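For context, the original four lines aren't reproduced here, but a comparably minimal parser under the same assumptions (a tightly specified format: no quoting, no escaped separators, no embedded newlines) might look like this Python sketch:

```python
# Minimal CSV reader for a tightly specified, machine-generated format.
# Assumes: no quoted fields, no escaped separators, no embedded newlines.
# Deliberately NOT a general-purpose parser.
def parse_simple_csv(text, sep=","):
    return [line.split(sep) for line in text.splitlines() if line]

rows = parse_simple_csv("a,b,c\n1,2,3\n")
# rows == [["a", "b", "c"], ["1", "2", "3"]]
```

As soon as a field can contain the separator or a newline, this breaks, which is exactly the point being conceded above.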
You need a lot more than that to handle CSV in the wild (quoting, Unicode, line termination, etc.) but the real killer I see is when it's edited by humans. The special cases for errors and inconsistencies will add up quickly; in some cases you may be able to reject invalid data but you may not have that option or an easy way to tell whether any particular value is wrong.
Excel takes all that and adds some fun wrinkles of its own: people using color and formatting to store data, and Excel auto-corrupting values that look like dates, which may go unnoticed until you try to do something with the data.
I know of at least one company whose entire business is handling this stuff. They find growing companies as they hit critical mass and need to move their Excel data into a real database. The product is just "Your data is hideous and was entered by hand without validation or formatting; it'll never convert and it'll be wrong when it does. We can help."
They handle all kinds of theory and technical stuff, like normalization and processing Excel-corrupted dates. But they also handle a lot of easy-but-agonizing tasks like regularizing single quotes into apostrophes, which crop up as soon as you let humans enter free-form data.
I used to use Google Refine (now OpenRefine [0]) for this. It lets you load up the data and then apply rules to check whether it is mostly correct. It doesn't get you all the way, but it beats blindly revising a huge Excel "database" by hand.
I'll try to remember. I ran into them at a career fair a few years ago, so it's not leaping to mind, but it seemed like they had good software and a great market niche.
Let's not forget Japan Post's CSV for all the Japanese Address data that contains some lines that are line-wrapped, that is, one record spans two or more lines in the CSV file. A line-wrapped CSV... I just can't even.
That is very interesting, thanks! I hadn't thought about Unicode or tolerating human error. Although the times I have worked with it have been when it is a transport medium between two computer programs.
That's definitely a less-aggravating situation by far. I've had a lot of cases where a significant amount of specialist human time was invested in a spreadsheet, and it really made me wish there were an Excel-for-data that acknowledged how many people use it for semi-structured data like this.
A relational database can be expressed in tabular form, but tabular data is not necessarily relational.
> (A "relation" in relational theory is a table with a name, fields with names and types, and the data in the table.)
A relation is a system of one or more functions (in the mathematical sense) each of which has a domain that is a candidate key of the relation and a range that is the composite of the non-key attributes.
Interesting definition. Do you have a source for it? It seems ambiguous.
From the Wikipedia article on relational databases, subsection relational model.
"This model organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. Rows are also called records or tuples."
I submitted that edit before the post had any replies.
Also, I did try to make clear that the given code was created 'for a specific use case in which I knew' that the format of the input files was tightly defined.
You can define a narrow subset or version of CSV that is trivial, but that doesn't reflect what one finds in the wild as "CSV", which was not systematically defined or described until well after many mutually incompatible things by that name were well established.
Thanks for your thoughts. As I have stated elsewhere, the code handles all of the cases I needed it to handle, due to the stability of the input file format (which was emitted from another program). I don't see that this should be too hard to believe.
I also said in my second edit, on the top line, 'Not so trivial after all!'. If I was putting on some kind of act, wouldn't that have been dropping it? Further, I noted in my first edit, before I had received any replies, that I 'know this version won't support escaped separator/newline characters', so I am not sure what you were trying to add with your example?
I think that my central point (and I totally accept that I didn't express this well) is that depending on the specifications of your program, the required CSV parser /can be/ very short. When one compares this to other data exchange formats, for example JSON, it is clear that the barrier to /entry/ is much lower. The shortest JSON parser I could find with a cursory look was 200 lines of C.
I totally appreciate that writing a CSV parser that works for all cases would be extremely long-winded. It has been interesting to hear other people's experiences and opinions about that. But the fact remains that /in some cases/, depending on the requirements of the program, the parser can be very short.
> We all recognize the classic developer I-could-build-that-in-a-weekend hubris when we see it. :)
It is funny you should say this. I needed the CSV parser because I thought it would be fun and interesting to see if I could build an anti-malware tool in a week (I am taking a malware detection class at the moment, and I wanted it done before the next lecture). I did not expect to have anything good working in that time, but by the early hours of the next morning I had a perfectly functional anti-malware tool. It can use ClamAV signatures (so it can detect everything(?) that ClamAV can), runs in parallel, has a nice text console with a DSL, and is fast enough (processing 210k small files in ~5 minutes, checking against ~60k sigs). It is about 650 lines of Erlang (including comments). I am saying this not to boast(!), but to make the point that I greatly underestimated how productive I could be and beat my expectations manyfold, and then people comment about my hubris online the next day. It is funny how life goes!
https://www.sciencealert.com/hoyle-wickramasinghe-thesis-com...