Ask HN: How do you choose a checksum algorithm for serialized data structures?
4 points by packetlost on May 24, 2023 | 4 comments
I'm building a library that serializes blocks of data to disk in roughly 4 MB increments. What would be a sufficient number of bits to allocate for checksums (e.g. CRC32, CRC64, MD5, etc.) on those blocks such that corruption, torn writes, etc. can be detected?
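For concreteness, here's a minimal sketch (Python, with a hypothetical "CRC prefix" block layout, not anything from the library in question) of stamping each block with a CRC-32 on write and verifying it on read:

    import struct
    import zlib

    HEADER = struct.Struct("<I")  # hypothetical layout: 4-byte little-endian CRC-32 prefix

    def seal_block(payload: bytes) -> bytes:
        # Prepend the CRC-32 of the payload so corruption/torn writes are detectable.
        return HEADER.pack(zlib.crc32(payload)) + payload

    def open_block(block: bytes) -> bytes:
        # Recompute the CRC-32 and compare against the stored value.
        (stored,) = HEADER.unpack_from(block)
        payload = block[HEADER.size:]
        if zlib.crc32(payload) != stored:
            raise ValueError("block checksum mismatch (corruption or torn write)")
        return payload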


I would use SHA-256 or SHA-512. As far as I know, MD5 and SHA-1 can be considered broken from a security standpoint.
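If you go the cryptographic-hash route, the standard library already covers it; a sketch (the 16-byte truncation is just an illustration of trading header space against strength, not a recommendation from the thread):

    import hashlib

    def block_digest(payload: bytes, size: int = 16) -> bytes:
        # Full SHA-256 is 32 bytes; truncating keeps per-block overhead down
        # while remaining far stronger than needed for accidental corruption.
        return hashlib.sha256(payload).digest()[:size]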

I think more information is needed. If you're storing files, PAR2 can help. If you're hashing passwords, bcrypt and scrypt should be investigated.

Securing the database against bitrot, etc. would be another question entirely.


Right, I'm asking purely from a data corruption and bitrot standpoint. There are no security requirements; the blocks are part of a larger data structure (so a subset of a file, or maybe of a raw block device). CRC is common for smaller data structures, but 4 MB is "large" compared to, say, a Postgres data page (usually 8 kB), which uses a 16-bit checksum (a modified FNV-1a hash rather than a true CRC).
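As a back-of-envelope figure (assuming corruption behaves like random bit flips, which is an assumption, not a guarantee CRCs make for multi-megabyte inputs), an n-bit checksum misses roughly 2^-n of random corruptions regardless of block size:

    # Rough chance that a random corruption slips past an n-bit check.
    for bits in (16, 32, 64):
        print(f"{bits}-bit checksum: ~1 in {2**bits:,} undetected")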


You could also delegate this task to ZFS.

https://en.m.wikipedia.org/wiki/ZFS

ZFS is a file system that is available on FreeBSD, Linux, and more!


Offloading onto the filesystem isn't really applicable here; this is for a "page" or block of data within a larger data structure that may or may not even live in a file (on a filesystem).



