I wish there were a "zip" standard for storing a million small files in one pack...

Xamayon · on April 1, 2023

I do exactly that to store the result thumbnails for some of the databases in my reverse image search engine (SauceNAO). Uncompressed zip files allow quick and easy seeking to, and access of, component files without extraction. A few tens to hundreds of thousands of files per zip works great. Millions would probably not be too different, but loading the zip file index would use more resources and take more time.
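
To make the "seek without extraction" point concrete, here is a minimal Python sketch using the standard `zipfile` module. The helper names (`build_pack`, `read_member`, `stored_member_span`) and the sample file names are made up for illustration, and this is not SauceNAO's actual code. It assumes members are written with `ZIP_STORED`, so each file's bytes sit verbatim in the archive and can be sliced out by offset.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIZE = 30  # fixed-size part of a zip local file header


def build_pack(files: dict[str, bytes]) -> bytes:
    """Pack files into an uncompressed (stored) zip held in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_member(pack: bytes, name: str) -> bytes:
    """Read one member; only the central directory and that member are touched."""
    with zipfile.ZipFile(io.BytesIO(pack)) as zf:
        return zf.read(name)


def stored_member_span(pack: bytes, name: str) -> tuple[int, int]:
    """Return (offset, size) of a stored member's raw bytes inside the pack."""
    with zipfile.ZipFile(io.BytesIO(pack)) as zf:
        info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError("member is compressed; raw slicing would not work")
    header = pack[info.header_offset:info.header_offset + LOCAL_HEADER_SIZE]
    # Bytes 26..29 of the local header hold the name and extra-field lengths.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    offset = info.header_offset + LOCAL_HEADER_SIZE + name_len + extra_len
    return offset, info.compress_size


pack = build_pack({"thumbs/a.jpg": b"AAAA", "thumbs/b.jpg": b"BBBBBB"})
offset, size = stored_member_span(pack, "thumbs/b.jpg")
thumb = pack[offset:offset + size]  # same bytes as read_member(pack, "thumbs/b.jpg")
```

With the offset and size known, a server could in principle hand the byte range straight to something like `sendfile` or an HTTP range response, which is one reason stored zips suit this kind of workload.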

mthoms · on April 1, 2023

Interesting. Have you ever considered SQLite file storage? I'm wondering how it would compare.

https://www.sqlite.org/sqlar.html
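
For comparison, a rough Python sketch of the same idea on top of the `sqlar` table layout described on that page. The helper names (`open_archive`, `put`, `get`) and the sample file names are hypothetical. It follows the documented convention that `data` holds zlib-compressed content unless compression does not help, in which case the raw bytes are stored and `sz` equals the blob length.

```python
import sqlite3
import time
import zlib

SCHEMA = """
CREATE TABLE IF NOT EXISTS sqlar(
  name  TEXT PRIMARY KEY,  -- name of the file
  mode  INT,               -- access permissions
  mtime INT,               -- last modification time
  sz    INT,               -- original file size
  data  BLOB               -- compressed (or raw) content
)
"""


def open_archive(path: str) -> sqlite3.Connection:
    """Open (or create) an archive database with the sqlar table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def put(conn: sqlite3.Connection, name: str, content: bytes) -> None:
    """Store one file, compressing only when that actually saves space."""
    compressed = zlib.compress(content)
    data = compressed if len(compressed) < len(content) else content
    conn.execute(
        "INSERT OR REPLACE INTO sqlar(name, mode, mtime, sz, data) "
        "VALUES (?, ?, ?, ?, ?)",
        (name, 0o100644, int(time.time()), len(content), data),
    )


def get(conn: sqlite3.Connection, name: str) -> bytes:
    """Fetch one file by name via the primary-key index."""
    row = conn.execute(
        "SELECT sz, data FROM sqlar WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    sz, data = row
    # Blob length equal to sz means the content was stored uncompressed.
    return data if len(data) == sz else zlib.decompress(data)


conn = open_archive(":memory:")
put(conn, "thumbs/a.jpg", b"x" * 1000)   # compresses well, stored deflated
put(conn, "thumbs/b.jpg", b"\x01\x02")   # too small to benefit, stored raw
conn.commit()
```

Lookups go through the primary-key index on `name`, so fetching one thumbnail does not require reading an index for the whole archive up front. How that compares with a zip central directory under heavy random access would need measuring.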

Xamayon · on April 1, 2023

Haven't looked into it, but it sounds like it would work similarly (with some nice benefits such as also being able to easily store other metadata/etc). Feasibility would depend on how quickly the indexes and such load, and the resource consumption associated with opening/closing dozens of them at a time 24/7. In my screwy case there are hundreds of thousands of zip files which are randomly accessed on the fly to grab one or two thumbnails at a time. The random access speed on unloaded files is critical, and for zip files it's extremely quick.

ElectricalUnion · on April 1, 2023

.jar/.apk (internally a zip archive) comes to mind.

AppImage (internally either an ISO 9660 Rock Ridge or a SquashFS filesystem), .deb (internally an ar archive), and .rpm (internally a cpio archive) are, I think, relevant examples too.

eviks · on April 1, 2023

Exactly. Though yarn has this feature, and it works with zip files as a mini-fs, so you (generally) don't need to unpack them to disk.

Gigachad · on April 1, 2023

Isn’t that just tar?