This is the opposite of how I (and not only I) remember it. Novell did save S.u.S.E. in 2005 with a highly controversial deal, and then started to slowly dissolve its internal identity within the whole company, and almost succeeded. The Novell management was incompetent, trying to save its dying business by bending what SUSE was doing. It barely survived the 2008+ crisis.
Attachmate, another legacy-product company, was interested in Novell for its assets and customers. That there was a quite capable business unit (SUSE) inside was merely a surprise to the new owners. They let us live and do our jobs. Which paid off, because Novell died within a year or so despite attempts to salvage it. The Attachmate management wanted to make money and had no particular interest in SUSE. You can call it good management, but it was mainly due to the fact that the accounting was internally split from Novell and the numbers did not lie.
The secret sauce of SUSE is that it has a strong foundation and longtimers who still somehow manage to keep the spirit and attract new people who appreciate that kind of work environment.
The miracle of SUSE is that it's been able to survive any shitty and clueless top management installed by any of the buyers. So far. Fingers crossed.
I'm glad that research papers don't start with "we've analyzed the Linux kernel 2.6.18 sources (because this is what we had on our lab machines), determined that ext3 is the best filesystem for our research purpose, and now present a novel idea of using a high-tech device on top of that". The paper acknowledges modern features and takes design ideas from other filesystems (BTRFS and tree structures are mentioned). Overall the idea is interesting and promising.
By default the metadata profile is 'DUP' (i.e. 2 copies on one device); the same can be done for data, but this reduces the usable capacity. On a normal HDD or SSD this should not be needed, but having both data and metadata as DUP has been useful on a Raspberry Pi with micro SD card storage. It's not perfect, but it increases the chances of getting the data back if the card is partially damaged due to power spikes.
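As a sketch (the device path and mount point are placeholders; btrfs-progs is assumed to be installed), the DUP profiles can be chosen at mkfs time or converted later with a balance:

```shell
# Create a new filesystem with DUP for both metadata and data.
# /dev/sdX is a placeholder -- double-check the device before running.
mkfs.btrfs -m dup -d dup /dev/sdX

# Or convert an existing, mounted filesystem in place.
btrfs balance start -mconvert=dup -dconvert=dup /mnt

# Verify the resulting profiles.
btrfs filesystem df /mnt
```

The balance conversion rewrites existing block groups, so on a large or slow device (like the SD card case above) it can take a while.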
I have wondered about this myself - does keeping 2 copies of the same data on a single flash-based drive actually increase reliability? Or is the flash controller going to end up combining the two writes into the same block?
MDRAID had a write hole until 4.4 (2015, https://lwn.net/Articles/665299/), and the fix had been long awaited back then. ZFS, to my knowledge, deals with that by variable stripe length, which has its own problems, but yeah, it works. Btrfs changes the implementation of the stripe update while preserving the on-disk format (i.e. it can't do the same as ZFS without introducing an incompatible change); an intent log/bitmap has been proposed (which would be the MDRAID approach), but that's another incompatible change. So the next attempt comes at the cost of performance but with the same compatibility.
AFAIK they use it internally; there are articles on lwn.net about how, the use cases being root filesystems and containers. I'm not sure I understand what you mean by the community sentiment. There are examples of code they'd developed internally first and then sent upstream, and in all cases I remember there were no problems. What can come up in the community is e.g. how the patches are organized or whether the changelogs are complete. It's of course easier to develop something internally: if it touches other subsystems, or if there's enough test coverage just for the new code, the test/fix/deploy cycle is much more flexible. Once it's supposed to go through mailing lists, or other maintainers have to be convinced to accept changes, it takes longer and must stick to the development cycle. This benefits both sides in the long run.
The development speed has to balance the schedule of Linux kernel development (merge window, release candidates, a roughly 3-month cycle) against the demand to merge several distinct features or core changes. There are no formal deadlines, but we have to make sure that new code can feasibly be stabilized in the given time. Once a new feature is in the wild, some bugs or fixups are still needed, so this takes time away from new development and has to be accounted for.
My strategy for pulling in new things is to have one big feature that has ideally been reviewed and iterated on the mailing list, or where a lot of testing has already been done. In addition, two smaller features can be merged, with limited scope, not affecting the default setup, and possibly easy to debug/fix/revert if needed. Besides that there are cleanups or core updates going on, so these should not touch the same code, to make testing less painful. With new features the test matrix grows, and code might need wide cleanups or generalizations before the actual feature code is merged. So this can indeed slow down development.
Raid56 is progressing, but until the 6.2 pull from today there was not much to announce regarding stability/reliability. There were proposed fixes, but as incompatible features, which means some changes on the user side and backward compatibility issues. What's pending for 6.2 should fix one of the bad problems, at least for raid5.
> It won't boot on a degraded array by default, requiring manual action to mount it
If you want it to behave like that, then add 'degraded' to fstab. A missing device can have unknown causes; the user should know better and either resolve it or allow such a boot. It's not automatic because there's no way to inform the user that the filesystem is in a degraded state.
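A minimal fstab sketch of opting into that behavior (the UUID and mount point are placeholders):

```
# /etc/fstab -- 'degraded' allows mounting even with a missing device
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /data  btrfs  defaults,degraded  0  0
```

Note that on distros that ship the btrfs udev "ready" rule, this option alone may not be enough, since the mount is not even attempted until all devices appear.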
I don't quite understand the use case here. If I'm setting up RAID it's because I want the system to stay up. That's the only purpose for it.
If a device goes missing for "unknown reasons", then the machine should still work, and I'll figure out what happened when monitoring pokes me and says RAID is degraded.
The use case is: enough drives have failed that your RAID is degraded. Any more data you write is not replicated, and the cause may be a software/hardware issue that will kill more drives soon.
It's up to you to choose at that point - is availability more important for you (add degraded to fstab), or data consistency (deal with the array first).
That's not the only purpose for it. There are three reasons I can think of that you might set up a RAID array:
* You want better uptime. (your use case)
* You want to protect from data loss. (my assumption was that this is the most common use case, but I could be wrong. This also helps with uptime because there's nothing worse for uptime than having to restore lost data from a cold backup)
* You want better performance, data integrity be damned. (RAID 0)
Booting a RAID array with a failed disk is a bad idea if you care a lot about not losing data, because now you're one disk failure closer to losing it.
Booting from a degraded array is only a fine idea in some circumstances, not all. That's why the kernel should not default to doing so automatically; but a distro or sysadmin with better knowledge of the broader situation (e.g. presence of hot spares or a working monitoring/alert system) can reasonably change that default once the risks of booting from a degraded array have been mitigated.
Backups cannot be perfectly real-time unless they are very nearly RAID. Any time you are generating/collecting important data, you will unavoidably have some amount of that important data in the state of not yet backed up.
It's reasonable to want to preserve all the data you currently have—some of which probably hasn't been backed up yet—and not accept new data to be written with the durability guarantees the array was originally configured for silently violated.
Since the kernel has no way of knowing which volumes may contain important data that hasn't had the chance to be backed up, it should try its best to maintain the original durability standards the filesystem was configured with, until some mechanism outside the kernel authorizes relaxing those standards.
> It's reasonable to want to preserve all the data you currently have—some of which probably hasn't been backed up yet—and not accept new data to be written with the durability guarantees the array was originally configured for silently violated.
I.e. (by your logic) the system should stop writes as soon as the array becomes degraded.
But this is not what happens with btrfs: it would happily continue to write the data on the array until reboot.
And then suddenly it's "oh my god array is degraded!!!111 you should not write to it1111".
To add to that: I've never seen a HW RAID card stop a boot over a merely degraded array. Changes in the configuration of arrays, or losing more drives than the redundancy covers - yes, that would halt the boot and require operator intervention. An array in a degraded state? Just spit warnings to the console and boot. Nobody has the time to walk to each server with a degraded array on every reboot.
Yeah, but monitoring is not something that comes with the filesystem. If you have to set up the system for HA and configure monitoring, email notifications, whatever, making sure the filesystem is created with redundant profiles, then I'd expect that adding 'degraded' to the fstab is also part of that configuration.
Most distros have a udev rule in place that inhibits a mount attempt of a multiple device Btrfs until all devices are visible to the kernel. The degraded mount option won't even matter in this case, because mount isn't attempted.
If you remove this udev rule and then add the degraded mount option to fstab, it's very risky, because now even a small delay in drives appearing can result in a degraded mount. And it's even possible to get a split-brain situation.
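For reference, the rule in question is typically shipped by systemd as 64-btrfs.rules; from memory it looks roughly like the following (check your distro's copy for the exact content):

```
# 64-btrfs.rules (approximate) -- hold back btrfs devices until complete
SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION=="remove", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"

# ask the kernel whether all devices of this filesystem are present
IMPORT{builtin}="btrfs ready $devnode"

# if not, mark the device as not ready so systemd won't attempt the mount
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

LABEL="btrfs_end"
```

The key part is the `btrfs ready` builtin: the device stays invisible to systemd's mount logic until the kernel reports the multi-device filesystem as complete.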
Btrfs needs an automatic abbreviated scrub, akin to the mdadm write-intent bitmap, which significantly reduces the resync operation.
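For comparison, on mdadm the write-intent bitmap can be enabled on an existing array (the md device name is a placeholder):

```shell
# Enable an internal write-intent bitmap on an existing array;
# after an unclean shutdown, only regions marked dirty in the bitmap
# are resynced instead of the whole array.
mdadm --grow /dev/md0 --bitmap=internal

# Check that the bitmap is active.
mdadm --detail /dev/md0 | grep -i bitmap
```

The bitmap costs a little write performance in exchange for much faster recovery, which is the trade-off an abbreviated scrub would bring to btrfs.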
The quote is "At this stage of development, a disk failure should cause mount failure so you're alerted to the problem.", and it's from over 9 years ago, on an ancient kernel. That was just 3+ years after the btrfs project got started.
Let's treat it as the archived historical content it is.