There are good reasons for the Emacs default that relate to atomic operations in typical file systems, reliability on flaky hardware, and power loss and recovery. The author's argument about hard links breaking is a matter of taste; some people want these copy-on-write semantics, others call them broken. Most people would want to avoid hard links to editable files anyway.
Can you explain some of these reasons? I feel like there is an opportunity for me to learn something here.
I don't know enough about filesystems to be sure, but I feel like there are more opportunities for stuff to fail with the move-and-copy technique. E.g. when moving the file it could end up existing twice (if creating the new link happens before removing the old one) or being lost (if removing the old link happens before creating the new one).
If you just copy the file, the copy can fail and you don't have a backup. But if the original is still fine you may never even notice the problem with the backup, so this failure mode is preferable (as I understand it).
> Most people would want to avoid hard links to editable files anyways.
Why would you want to avoid this? I thought it would be something that Linux can easily handle.
1. Atomicity of Rename Operations: On many file systems, the process of renaming a file is atomic. This means that the operation either completes in full or doesn’t take effect at all. This makes it safer in cases where there might be interruptions, such as power losses. If the rename (which creates the backup in Emacs’s default behavior) is interrupted, you’re left with the original file intact.
2. Concern about Move-and-Copy Technique: Your point about the possibility of the file ending up existing twice or being lost is valid in theory. However, in practice, the renaming operation ensures that such intermediate states are avoided. A rename isn’t quite the same as creating a new link before removing an old one. Instead, it’s a reassignment of the file’s metadata, which is generally a reliable operation.
3. Drawbacks of Just Copying: While just copying the file might seem simpler, it can have issues. If there’s an interruption while writing the new copy, you can end up with a corrupted backup. With Emacs’s approach, since the original is renamed (and thus preserved in its entirety), you’re always assured of having at least one uncorrupted version.
4. Avoiding Hard Links to Editable Files: As for the avoidance of hard links for editable files, there are a few reasons:
Ambiguity: Editing a file that’s hard-linked elsewhere can lead to confusion since changes reflect in all linked locations. This can be unexpected for those unaware of the link.
Data Integrity: If there’s corruption in one location, it affects all hard-linked locations.
Backup Issues: Some backup systems might not handle hard links as expected, leading to either duplicate data or missed backups.
Linux does handle hard links well, but their usage needs careful consideration, especially when editing is involved. They’re great for static data that doesn’t change but can be problematic for editable files.
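The hard-link point above can be demonstrated concretely. This sketch (in Python, with made-up filenames) shows why the rename-based backup "breaks" hard links: writing in place updates every name pointing at the inode, while rename-then-rewrite gives the edited name a fresh inode, so the other link keeps the old content.

```python
import os
import tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "a.txt")
b = os.path.join(d, "b.txt")

with open(a, "w") as f:
    f.write("v1")
os.link(a, b)  # b is a hard link: same inode as a

# In-place write: both names see the change, since they share one inode.
with open(a, "w") as f:
    f.write("v2")
assert open(b).read() == "v2"

# Rename-then-rewrite (the Emacs default): the old inode is now the
# backup, and a gets a brand-new inode. The hard link b stays attached
# to the old inode and no longer tracks edits to a.
os.rename(a, a + "~")
with open(a, "w") as f:
    f.write("v3")
assert open(b).read() == "v2"  # b kept the pre-rename content
```

Whether that last line is "copy-on-write semantics" or "a broken link" is exactly the matter of taste the thread started with.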
3. is unclear: copying may result in a corrupted backup, but the original remains uncorrupted, so you still have one uncorrupted version, just as if you had renamed the file but couldn't restore the original?
4. That's an argument for raising awareness via better tools, not for avoiding the useful links
> 3. Is unclear, copy may result in a corrupted backup, but the original remains uncorrupted, so you also have one uncorrupted version…
Think about what happens when your power comes back and you start editing that file again. Which is the correct version of the file? How can you tell? Suppose the backup file is newer than the original file, and 99% of the size. Is it a partial copy, or did the user delete some lines and then save?
Now consider what Emacs does by default when `backup-by-copying` is nil and the user asks Emacs to save a file:
1. Emacs deletes the existing backup file (atomic),
2. renames the existing file so that it becomes the backup (atomic),
3. writes the buffer content into a new file with a temporary name (not atomic),
4. calls fsync(2) to ensure that all written data has actually hit the disk¹,
5. finally renames the temporary file so that it has the user’s desired filename (atomic).
If the power goes off anywhere in the middle of that process, then no corruption will occur, and the state on disk is easily observable: either the backup file is missing but the original exists untouched, or there is a backup file but no original, or a backup file plus a temporary file that might be incomplete, or a backup file plus the new file.
In all of these cases the editor can recover automatically without ever losing anything that was already saved on disk. Sure, in most of those cases we lose the _new_ data that wasn’t yet saved, but after all the power did go out in the middle of trying to save that very data. We’re not magicians here; this is the best we can do.
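The five steps above can be sketched in Python (backup and temp-file naming simplified; only step 3 is non-atomic, which is why it targets a temporary name):

```python
import os

def emacs_style_save(path, buffer_content):
    """Sketch of an Emacs-style save when `backup-by-copying` is nil.

    Steps 1, 2 and 5 are atomic unlink/rename operations; step 3
    (actually writing the data) is not, so it goes to a temp name.
    """
    backup = path + "~"
    if os.path.exists(backup):
        os.unlink(backup)            # 1. delete the old backup (atomic)
    if os.path.exists(path):
        os.rename(path, backup)      # 2. original becomes the backup (atomic)
    tmp = path + ".#tmp"
    with open(tmp, "w") as f:
        f.write(buffer_content)      # 3. write buffer to a temp file (not atomic)
        f.flush()
        os.fsync(f.fileno())         # 4. make sure the data has hit the disk
    os.rename(tmp, path)             # 5. temp file takes the real name (atomic)
```

Crash at any point and the directory contents tell you unambiguously which step was in progress, which is what makes automatic recovery possible.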
Of course, in practice the power doesn’t go out all that often and users habitually save the document after every few words, right? So backing up by copying is safe enough that most users who prefer it never lose any data. Probably. At least as far as they know; sometimes this type of corruption goes unnoticed, or gets chalked up to other factors such as inebriation or forgetfulness.
By observing the status of a flag like "wasBackupSuccessful", which wouldn't be set to true if there is a power loss after the file was copied. This flag could be implemented as an atomic rename of the copied backup file, so that the observable file-system state tells you whether the copy succeeded.
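That idea could be sketched like this (function and file names are made up): copy to a temporary name, fsync, then atomically rename the copy to its final backup name. The mere existence of the final name is the "wasBackupSuccessful" flag.

```python
import os
import shutil

def backup_by_copy_with_flag(path):
    """Copy-based backup whose success is observable after a crash.

    The copy is written under a temporary name; only after fsync does
    an atomic rename give it the real backup name. If power is lost
    mid-copy, no file with the final backup name exists, so the rename
    itself serves as the "wasBackupSuccessful" flag.
    """
    backup = path + "~"
    tmp = backup + ".partial"
    with open(path, "rb") as src, open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)  # may be interrupted part-way through
        dst.flush()
        os.fsync(dst.fileno())        # ensure the copied bytes are on disk
    os.rename(tmp, backup)            # atomic: marks the backup as complete
    return backup
```

After a crash, a `*.partial` file means the copy didn't finish and can be discarded; a file with the plain backup name is known-good.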
But it is now clear what you meant, thanks for the explanation!