I think the reference to retirement funds relates to the fact that many of them are under-collateralized: they need to take new investors' money to pay out old investors.
You might enjoy "There Is No Antimemetics Division", a story (told as a series of short stories) about the idea of "antimemes" (if memes are ideas that spread like viruses and can't be forgotten, antimemes are the opposite: they're not invisible, but you can't remember them).
This calculator is outdated -- the current price of storage oscillates around $0.01/MB. The maximum single chunk size is also 10MB now, and you can send as many chunks as you like.
I think the HN crowd has always been interested in this sort of stuff? It certainly has since I started reading it (~7 years ago). One of the things that I like about HN is that we are not completely inward looking.
From the guidelines:
> On-Topic: Anything that good hackers would find interesting. That includes more than hacking and startups. If you had to reduce it to a sentence, the answer might be: anything that gratifies one's intellectual curiosity.
> Off-Topic: Most stories about politics, or crime, or sports, unless they're evidence of some interesting new phenomenon. Videos of pratfalls or disasters, or cute animal pictures. If they'd cover it on TV news, it's probably off-topic.
You could certainly say that a BBC group being forced to sign a confession for attempting to conduct an interview 'gratifies one's intellectual curiosity' about the world, but it could certainly fall under 'Most stories about politics, or crime', too.
Either way, as you point out, once a post has over 100 points you probably have your answer!
> once a post has over 100 points you probably have your answer!
It seems that we both agree that it's an Off-Topic article at face value and yet it is On-Topic based on the votes. The votes in effect override the second criterion of relevance, not unlike the final clause in Asimov's Laws of Robotics.
Hence I'm genuinely curious about any shifts in the audience demographics that are occurring.
EDIT: I know this version won't support escaped separator/newline characters, but I made it for a specific use case in which I knew that would not occur. Adding that functionality would make it a little messier, but still not too bad.
EDIT2: Thanks for the interesting comments! Not so trivial after all!
Perhaps a more accurate version of what I was attempting to say above is that 'it is often (not always) easy to build a CSV parser to interact with one specific program'. The four line version above works perfectly for reading the type of files I designed it for. If you want to work with human created, or more complex variants of CSV, all bets are off.
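For context, the original four lines aren't reproduced here, but a comparably minimal parser under the same assumptions (a tightly specified format: no quoting, no escaped separators, no embedded newlines) might look like this Python sketch:

```python
# Minimal CSV reader for a tightly specified, machine-generated format.
# Assumes: no quoted fields, no escaped separators, no embedded newlines.
# Deliberately NOT a general-purpose parser.
def parse_simple_csv(text, sep=","):
    return [line.split(sep) for line in text.splitlines() if line]

rows = parse_simple_csv("a,b,c\n1,2,3\n")
# rows == [["a", "b", "c"], ["1", "2", "3"]]
```

As soon as a field can contain the separator or a newline, this breaks, which is exactly the point being conceded above.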
You need a lot more than that to handle CSV in the wild (quoting, Unicode, line termination, etc.) but the real killer I see is when it's edited by humans. The special cases for errors and inconsistencies will add up quickly; in some cases you may be able to reject invalid data but you may not have that option or an easy way to tell whether any particular value is wrong.
Excel takes all that and adds some fun wrinkles of its own: people using color and formatting to store data, and Excel auto-corrupting values that look like dates, which may go unnoticed until you try to do something with the data.
I know of at least one company whose entire business is handling this stuff. They find growing companies as they hit critical mass and need to move their Excel data into a real database. The product is just "Your data is hideous and was entered by hand without validation or formatting; it'll never convert and it'll be wrong when it does. We can help."
They handle all kinds of theory and technical stuff, like normalization and processing Excel-corrupted dates. But they also handle a lot of easy-but-agonizing tasks like regularizing single quotes into apostrophes, which crop up as soon as you let humans enter free-form data.
I used to use Google Refine (now OpenRefine [0]) for this. It lets you load up the data and then apply rules to check whether it is mostly correct. It doesn't get you all the way, but it beats blindly revising a huge Excel "database" by hand.
I'll try to remember. I ran into them at a career fair a few years ago, so it's not leaping to mind, but it seemed like they had good software and a great market niche.
Let's not forget Japan Post's CSV for all the Japanese Address data that contains some lines that are line-wrapped, that is, one record spans two or more lines in the CSV file. A line-wrapped CSV... I just can't even.
That is very interesting, thanks! I hadn't thought about Unicode or tolerating human error. Although the times I have worked with it have been when it is a transport medium between two computer programs.
That's definitely a less-aggravating situation by far. I've had a lot of cases where a significant amount of specialist human time was invested in a spreadsheet, and it really made me wish there were an Excel-for-data that acknowledged how many people use it for semi-structured data like this.
A relational database can be expressed in tabular form, but tabular data is not necessarily relational.
> (A "relation" in relational theory is a table with a name, fields with names and types, and the data in the table.)
A relation is a system of one or more functions (in the mathematical sense) each of which has a domain that is a candidate key of the relation and a range that is the composite of the non-key attributes.
Interesting definition. Do you have a source for it? It seems ambiguous.
From the Wikipedia article on relational databases, subsection relational model.
"This model organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. Rows are also called records or tuples."
I submitted that edit before the post had any replies.
Also, I did try to make clear that the given code was created 'for a specific use case in which I knew' that the format of the input files was tightly defined.
You can define a narrow subset or version of CSV that is trivial, but that doesn't reflect what one finds in the wild as "CSV", which was not systematically defined or described until well after many mutually incompatible things by that name were well established.
Thanks for your thoughts. As I have stated elsewhere, the code handles all of the cases I needed it to handle, due to the stability of the input file format (which was emitted from another program). I don't see that this should be too hard to believe.
I also said in my second edit, on the top line, 'Not so trivial after all!'. If I was putting on some kind of act, wouldn't that have been dropping it? Further, I noted in my first edit, before I had received any replies, that I 'know this version won't support escaped separator/newline characters', so I am not sure what you were trying to add with your example?
I think that my central point (and I totally accept that I didn't express this well) is that depending on the specifications of your program, the required CSV parser /can be/ very short. When one compares this to other data exchange formats, for example JSON, it is clear that the barrier to /entry/ is much lower. The shortest JSON parser I could find with a cursory look was 200 lines of C.
I totally appreciate that writing a CSV parser that works for all cases would be extremely long-winded. It has been interesting to hear other people's experiences and opinions about that. But the fact remains that /in some cases/, depending on the requirements of the program, the parser can be very short.
> We all recognize the classic developer I-could-build-that-in-a-weekend hubris when we see it. :)
It is funny you should say this. I needed the CSV parser because I thought it would be fun and interesting to see if I could build an anti-malware tool in a week (I am taking a malware detection class at the moment, and I wanted it done before the next lecture). I did not expect to have anything good working in that time, but by the early hours of the next morning I had a perfectly functional anti-malware tool. It can use ClamAV signatures (so it can detect everything(?) that ClamAV can), runs in parallel, has a nice text console with a DSL, and is fast enough (processing 210k small files in ~5 minutes, checking against ~60k sigs). It is about 650 lines of Erlang (including comments). I am saying this not to boast(!), but to make the point that I greatly underestimated how productive I could be and beat my expectations manyfold, and then people comment about my hubris online the next day. It is funny how life goes!
https://www.sciencealert.com/hoyle-wickramasinghe-thesis-com...