
The filesystem is a database.

Databases like Postgres implement such transactions not through a lock, but by keeping multiple versions of the data/file until the transactions using them have closed.

Yet other databases operate on the stream of changes and the current state of data is merely the application of all changes until a certain time, allowing multiple transactions to use different snapshots without blocking each other (you can parse a file while somebody else edits it and you won’t be interrupted).
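A minimal sketch of that snapshot idea, using Python's stdlib sqlite3 in WAL mode (file name and table are made up for illustration; Postgres gets the same effect via MVCC): a reader keeps seeing its snapshot even while another connection commits new data.

    import sqlite3

    db = "demo.db"  # hypothetical path
    w = sqlite3.connect(db, isolation_level=None)  # writer, autocommit mode
    r = sqlite3.connect(db, isolation_level=None)  # reader
    w.execute("PRAGMA journal_mode=WAL")           # readers and the writer stop blocking each other
    w.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    w.execute("INSERT OR REPLACE INTO kv VALUES ('a', 'old')")

    r.execute("BEGIN")                                           # reader starts a transaction...
    print(r.execute("SELECT v FROM kv WHERE k='a'").fetchone())  # ...and takes its snapshot: ('old',)

    w.execute("UPDATE kv SET v='new' WHERE k='a'")               # writer commits without waiting for the reader

    print(r.execute("SELECT v FROM kv WHERE k='a'").fetchone())  # still ('old',) inside the old snapshot
    r.execute("COMMIT")
    print(r.execute("SELECT v FROM kv WHERE k='a'").fetchone())  # now ('new',)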

I've read about various filesystems offering some of these features, but they aren't exposed through the I/O APIs.



Then your user wonders why they see 40 GB of files but 70 GB of used space.

> Yet other databases operate on the stream of changes and the current state of data is merely the application of all changes until a certain time, allowing multiple transactions to use different snapshots without blocking each other (you can parse a file while somebody else edits it and you won’t be interrupted).

Databases have features in place to merge those parallel streams of changes and, at worst, abort the transaction. Apps built on them are written to handle that.
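"Handle that" usually just means wrapping the transaction in a retry loop. A rough sketch with Python's sqlite3 (the accounts table and file name are hypothetical):

    import sqlite3, time

    def transfer(db_path, src, dst, amount, retries=5):
        # Re-run the whole transaction if the database aborts it or the write lock is busy.
        for attempt in range(retries):
            conn = sqlite3.connect(db_path, isolation_level=None, timeout=1.0)
            try:
                conn.execute("BEGIN IMMEDIATE")  # grab the write lock up front
                conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
                conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
                conn.execute("COMMIT")
                return True
            except sqlite3.OperationalError:      # e.g. "database is locked"
                time.sleep(0.05 * (attempt + 1))  # back off, then try again
            finally:
                conn.close()                      # closing rolls back anything uncommitted
        return False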

Now try to explain to a user why their data evaporated because they had it open in two apps...


> The filesystem is a database.

Nah. The file system is a slightly improved block storage management tool that isolates your database implementation from dealing directly with mass storage devices.

Thinking that the filesystem is viable as a database when it isn't — that's exactly the problem here :D


> The filesystem is a database

A totally shit database designed before we knew anything about databases. Well past time to retire them.


They are OK databases, optimized for very different use cases than normal databases. If you treat files as blobs that can only be read or written atomically, then SQLite will outperform your filesystem. But lots of applications treat files as more than that: multiple processes appending on the same log file, while another program runs the equivalent of `tail -f` on the file to get the newest changes; software changing small parts in the middle of files that don't fit in memory, even mmapping them to treat them like random-access memory; using files as a poor man's named pipe; single files that span multiple terabytes; etc.


None of those other uses are outside the scope of a real database:

> multiple processes appending on the same log file, while another program runs the equivalent of `tail -f` on the file to get the newest changes

Not a problem with SQLite. In fact it ensures the atomicity needed to avoid torn reads or writes.
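A sketch of that pattern with the stdlib sqlite3 module (the `log` table and app.db path are made up): any number of processes can append, and a follower polls for rows it hasn't seen yet, which is the `tail -f` equivalent.

    import sqlite3, time

    DB = "app.db"  # hypothetical path
    SCHEMA = ("CREATE TABLE IF NOT EXISTS log "
              "(id INTEGER PRIMARY KEY AUTOINCREMENT, msg TEXT)")

    def append(msg):
        # Any number of processes can call this; SQLite serializes the commits.
        conn = sqlite3.connect(DB, timeout=5.0)
        try:
            conn.execute(SCHEMA)
            with conn:  # commits on success
                conn.execute("INSERT INTO log (msg) VALUES (?)", (msg,))
        finally:
            conn.close()

    def follow(poll=0.5):
        # The `tail -f` equivalent: print rows this reader hasn't seen yet.
        conn = sqlite3.connect(DB, timeout=5.0)
        conn.execute(SCHEMA)
        last = 0
        while True:
            for rid, msg in conn.execute(
                    "SELECT id, msg FROM log WHERE id > ? ORDER BY id", (last,)).fetchall():
                print(msg)
                last = rid
            time.sleep(poll)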

> software changing small parts in the middle of files that don't fit in memory

This is exactly an example of something you don't need to worry about if you're using a database, it handles that transparently for any application, instead of every application having to replicate that logic when it's needed. Just do your reads and writes to your structured data and the database will keep only the live set in memory based on current resource constraints.
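For example, patching a few bytes in the middle of a blob far larger than you'd want in memory, via incremental blob I/O (Connection.blobopen needs Python 3.11+; the table and file names here are made up):

    import sqlite3

    conn = sqlite3.connect("big.db")  # hypothetical path
    conn.execute("CREATE TABLE IF NOT EXISTS chunks (id INTEGER PRIMARY KEY, data BLOB)")
    # Reserve a 256 MiB blob without materializing it in memory.
    conn.execute("INSERT OR REPLACE INTO chunks (id, data) VALUES (1, zeroblob(268435456))")
    conn.commit()

    # Patch a few bytes in the middle; only the touched pages are read and written.
    with conn.blobopen("chunks", "data", 1) as blob:  # Python 3.11+
        blob.seek(128 * 1024 * 1024)
        blob.write(b"patched")
    conn.commit()
    conn.close()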

> using files as a poor man's named pipe

Even better, use a shared SQLite database instead, which also lets you share structured data.
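A sketch of the "poor man's named pipe" replacement, again with stdlib sqlite3 and a made-up queue table: producers insert rows, and a consumer pops them inside a transaction so concurrent consumers don't grab the same item.

    import sqlite3

    DB = "ipc.db"  # hypothetical path shared by the processes

    def setup():
        with sqlite3.connect(DB) as conn:
            conn.execute("CREATE TABLE IF NOT EXISTS queue "
                         "(id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")

    def push(payload):   # producer side
        with sqlite3.connect(DB, timeout=5.0) as conn:
            conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))

    def pop():           # consumer side; returns None if the queue is empty
        conn = sqlite3.connect(DB, timeout=5.0, isolation_level=None)
        try:
            conn.execute("BEGIN IMMEDIATE")  # only one consumer commits at a time
            row = conn.execute("SELECT id, payload FROM queue ORDER BY id LIMIT 1").fetchone()
            if row:
                conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
            conn.execute("COMMIT")
            return row[1] if row else None
        finally:
            conn.close()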

> Single files that span multiple terabytes; etc.

SQLite as it stands supports databases up to 140 TB.

> even mmapping them to treat them like random access memory

This is pretty much the only use I can think of that isn't supported by SQLite out of the box. No reason it can't be extended for this case if needed.
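For reference, the plain-file pattern being described looks like this (a hypothetical data.bin, edited in place as if it were RAM):

    import mmap, os

    path = "data.bin"  # hypothetical file
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.truncate(4096)       # give the file some size to map

    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mem:
        mem[100:107] = b"patched"  # edit bytes in place, like ordinary memory
        mem.flush()                # ask the OS to write the dirty pages back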


I remember a discussion a while ago about using SQLite as a filesystem engine. I imagine that not needing a server/daemon would make it more reliable as one of the first things needed on boot.

However, I don't know what the recommended way to handle concurrent writes with SQLite is. Do we end up with a single process handling all the persistence logic, which essentially becomes a server just like Postgres?
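For what it's worth, the usual answer (per the SQLite documentation) is that you don't need a dedicated process: every process opens the database file directly, and SQLite itself serializes writes through file locks. A minimal sketch of the common WAL-mode configuration (hypothetical file and table names):

    import sqlite3

    conn = sqlite3.connect("shared.db", timeout=5.0)  # wait up to 5 s if another process holds the write lock
    conn.execute("PRAGMA journal_mode=WAL")    # readers and the single active writer don't block each other
    conn.execute("PRAGMA busy_timeout=5000")   # retry busy locks inside SQLite as well
    conn.execute("PRAGMA synchronous=NORMAL")  # common durability/speed trade-off in WAL mode

    with conn:  # each process just writes; SQLite serializes the commits
        conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
        conn.execute("INSERT INTO events (body) VALUES (?)", ("hello",))
    conn.close()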



