Write into uninit'd buffers was one of the pain points of Rust for the creator o...

jcranmer · 2025-05-21T05:04:33 1747803873

The basic problem with uninitialized buffers is that they effectively require write-only references to exist, and Rust's type system doesn't have (and doesn't easily support) write-only references, only read-only and read-write. MaybeUninit is a partial solution to the problem, but since it's a library solution and not a language solution, it suffers from a lack of integration with the language, e.g., getting MaybeUninit fields from a MaybeUninit struct is challenging.

And the most aggravating part of all of this is that the most common use case for uninitialized memory (the scenario being talked about both in the article here and the discussion you quote) is actually pretty easy to have a reasonable, safe abstraction for, so the fact that the current options require both use of unsafe code and also potentially faulty duplication of value calculations doesn't make for a fun experience. (Also, the I/O traits predate MaybeUninit, which means the most common place to want to work with uninitialized memory is one where you can't do it properly.)
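To make the "MaybeUninit fields from a MaybeUninit struct" point concrete, here is a minimal sketch of field-by-field initialization on stable Rust. The `Header` struct and `init_header` function are hypothetical names used only for illustration; the point is that each field has to be reached through a raw pointer because there is no built-in way to project `MaybeUninit<Header>` into `MaybeUninit` fields.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

// Hypothetical struct, used only for illustration.
pub struct Header {
    pub len: u32,
    pub tag: u8,
}

pub fn init_header() -> Header {
    let mut slot = MaybeUninit::<Header>::uninit();
    let p: *mut Header = slot.as_mut_ptr();
    unsafe {
        // addr_of_mut! yields a raw pointer to the field without ever
        // creating a reference to the still-uninitialized struct.
        addr_of_mut!((*p).len).write(16);
        addr_of_mut!((*p).tag).write(7);
        // SAFETY: every field of Header has been written above.
        slot.assume_init()
    }
}
```

Forgetting one field here compiles fine and is UB at `assume_init`, which is the kind of bookkeeping a language-level feature could check instead.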

90s_dev · 2025-05-21T11:58:01 1747828681

Then is the solution to have write-only be the default for muts, so that they start out write-only at least, but can also be made to be read-write either (a) in given circumstances of creation, or (b) after certain operations on them, or (c) if created by certain APIs?

electrograv · 2025-05-21T03:40:17 1747798817

> That's roughly how it works in C, and I know that it's also UB there if you do it wrong, but one thing is different: It doesn't really ever occupy my mind as a problem. In Rust it does.

UB doesn’t occupy the author’s mind when writing C, when it really should. This kind of lazy attitude to memory safety is precisely why so much C code is notoriously riddled with memory bugs and security vulnerabilities.

mk12 · 2025-05-21T04:27:19 1747801639

There is an important difference for this case though. In C it’s fine to have pointers into uninitialized memory as long as you don’t read them until after initializing. You can write through those pointers the same way you always do. In Rust it’s UB as soon as you “produce” an invalid value, which includes references to uninitialized memory. Everything uses references in Rust but when dealing with uninitialized memory you have to scrupulously avoid them, and instead write through raw pointers. This means you can’t reuse any code that writes through &mut. Also, the rules change over time. At one point I had unsafe code that had a Vec of uninitialized elements, which was ok because I never produced a reference to any element until after I had written them (through raw pointers). But they later changed the Vec docs to say that’s UB, I guess because they want to reserve the right to use references even if you never call a method that returns a reference.
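For reference, a minimal sketch of the raw-pointer style described above, in the ordering that stays within the documented `Vec` rules: write into the reserved capacity first, and only call `set_len` once the elements are initialized. The function name `squares` is made up for the example.

```rust
// Hypothetical helper: fills a Vec through a raw pointer, never creating
// a reference to an uninitialized element.
pub fn squares(n: usize) -> Vec<u64> {
    let mut v: Vec<u64> = Vec::with_capacity(n);
    let p: *mut u64 = v.as_mut_ptr();
    for i in 0..n {
        // SAFETY: i < n <= capacity, so the write stays inside the allocation.
        unsafe { p.add(i).write((i as u64) * (i as u64)) };
    }
    // SAFETY: the first n elements were initialized by the loop above.
    unsafe { v.set_len(n) };
    v
}
```

Note that nothing here can be handed to existing code that expects `&mut [u64]`, which is the reuse problem the comment is about.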

Arnavion · 2025-05-21T04:50:31 1747803031

This stopped being much of a problem when MaybeUninit was stabilized. Now you can stick to using &MaybeUninit<T> / &mut MaybeUninit<T> instead of needing to juggle *const T / *mut T and carefully track converting that to &T / &mut T only when it's known to be initialized, and you can't accidentally use a MaybeUninit<T> where you meant to use a T because the types are different.

It's not as painless as it could be though, because many of the MaybeUninit<T> -> T conversion fns are unstable. Eg the code in TFA needs `&mut [MaybeUninit<T>] -> &mut [T]` but `[T]::assume_init_mut()` is unstable. But reimplementing them is just a matter of copying the libstd impl, that in turn is usually just a straightforward reinterpret-cast one-liner.
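A sketch of what such a hand-copied helper can look like on stable Rust. The name `slice_assume_init_mut` is chosen here for illustration and is not claimed to match any particular libstd signature; the body is just the pointer cast the comment refers to.

```rust
use std::mem::MaybeUninit;

/// Reinterprets a slice of `MaybeUninit<T>` as a slice of `T`.
///
/// # Safety
/// Every element of `slice` must already be initialized.
pub unsafe fn slice_assume_init_mut<T>(slice: &mut [MaybeUninit<T>]) -> &mut [T] {
    // MaybeUninit<T> is guaranteed to have the same size and alignment as T,
    // so the slice can be reinterpreted in place.
    unsafe { &mut *(slice as *mut [MaybeUninit<T>] as *mut [T]) }
}
```

The safety obligation moves entirely to the caller, which is why a reminder comment to revisit such copies on toolchain updates is a sensible habit.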

codeflo · 2025-05-21T06:36:11 1747809371

I don’t get the difference. In both C and Rust you can have pointers to uninitialized memory. In both languages, you can’t use them except in very specific circumstances (which are AFAIK identical).

There are two actual differences in this regard: C pointers are more ergonomic than Rust pointers. And Rust has an additional feature called references, which enable a lot more aggressive compiler optimizations, but which have the restriction that you can’t have a reference to uninitialized memory.

mk12 · 2025-05-21T17:14:57 1747847697

I agree with you. My point is that the additional feature (references) creates a new potential for UB that doesn’t exist in C, and that justifies the “doesn't really ever occupy my mind as a problem” statement being criticized upthread. You can’t compare C to Rust-without-references because no one writes Rust that way. It’s not like C++-without-exceptions which is a legitimate subset that people use.

nemothekid · 2025-05-21T04:55:09 1747803309

Bizarre. I think I've been writing broken Rust code for a couple years. If I understand you correctly something like:

    let mut data = Vec::with_capacity(sz);
    unsafe { data.set_len(sz) };
    buf.copy_to_slice(data.as_mut_slice());

is UB?

NobodyNada · 2025-05-21T15:04:25 1747839865

It's an open question whether creating a reference to an uninitialized value is instant UB, or only UB if that reference is misused (e.g. if copy_to_slice reads an uninitialized byte). The specific discussion is whether the language requires "recursive validity for references", which would mean constructing a reference to an invalid value is "language UB" (your program is not well specified and the compiler is allowed to "miscompile" it) rather than "library UB" (your program is well-specified, but functions you call might not expect an uninitialized buffer and trigger language UB). See the discussion here: https://github.com/rust-lang/unsafe-code-guidelines/issues/3...

Currently, the team is leaning in the direction of not requiring recursive validity for references. This would mean your code is not language UB as long as you can assume `set_len` and `copy_to_slice` never read from `data`. However, it's still considered library UB, as this assumption is not documented or specified anywhere and is not guaranteed -- changes to safe code in your program or in the standard library can turn this into language UB, so by doing something like this you're writing fragile code that gives up a lot of Rust's safety by design.

ironhaven · 2025-05-21T05:13:08 1747804388

That's right. Line 3 is undefined behaviour because you are creating mutable references to the uninit spare capacity of the vec. copy_to_slice only works with writing to initialized slices. The proper way for your example to work with the uninitialized memory of a vec would be to use only raw pointers, or to call the newly added Vec::spare_capacity_mut function, which returns a slice of MaybeUninit.

bombela · 2025-05-21T16:49:07 1747846147

Why not simply:

    let mut data = Vec::with_capacity(sz);
    data.extend(&buf[..sz]);

Vec::extend extends a container from an iterable. A Vec/slice is iterable.

And from the doc:

> This implementation is specialized for slice iterators, where it uses copy_from_slice to append the entire slice at once.

Of course this trivial example could also be written as:

    let mut data = buf.clone();

vgatherps · 2025-05-21T05:02:01 1747803721

Yes, this is the case that I ran into as well. You have to zero memory before reading and/or have some crazy combination of tracking what’s uninitialized capacity or initialized len, I think the rust stdlib write trait for &mut Vec got butchered over this concern.

It’s strictly more complicated and slower than the obvious thing to do and only exists to satisfy the abstract machine.

Arnavion · 2025-05-21T05:06:29 1747803989

No. The correct way to write that code is to use .spare_capacity_mut() to get a &mut [MaybeUninit<T>], then write your Ts into that using .write_copy_of_slice(), then .set_len(). And that will not be any slower (though obviously more complicated) than the original incorrect code.
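A minimal sketch of that sequence using only stable APIs, assuming the source is a plain `&[u8]` (the `buf` in the original snippet may be a different type). `copy_into_vec` is a hypothetical name, and the per-element `MaybeUninit::write` loop stands in for the unstable `write_copy_of_slice`.

```rust
use std::mem::MaybeUninit;

// Hypothetical helper: copies `src` into a fresh Vec without zeroing it first.
pub fn copy_into_vec(src: &[u8]) -> Vec<u8> {
    let mut data: Vec<u8> = Vec::with_capacity(src.len());
    // The allocator may hand back more capacity than requested, so only the
    // first src.len() slots are used.
    let spare: &mut [MaybeUninit<u8>] = &mut data.spare_capacity_mut()[..src.len()];
    for (slot, &byte) in spare.iter_mut().zip(src) {
        slot.write(byte);
    }
    // SAFETY: exactly src.len() elements were initialized above.
    unsafe { data.set_len(src.len()) };
    data
}
```

Whether the loop is lowered to a `memcpy` depends on the optimizer, so it is worth checking the generated code if that matters.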

vgatherps · 2025-05-21T06:11:26 1747807886

Oh this is very nice, I think it was stabilized since I wrote said code.

nemothekid · 2025-05-21T06:22:05 1747808525

write_copy_of_slice doesn't look to be stable. I'll mess around with godbolt, but my hope is that whatever incantation is used compiles down to a memcpy.

Arnavion · 2025-05-21T06:33:27 1747809207

As I wrote in https://news.ycombinator.com/item?id=44048391 , you have to get used to copying the libstd impl when working with MaybeUninit. For my code I put a "TODO(rustup)" comment on such copies, to remind myself to revisit them every time I update the Rust version in toolchain.toml

nemothekid · 2025-05-21T08:11:47 1747815107

In other words the """safe""" stable code looks like this:

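    let mut data: Vec<T> = Vec::with_capacity(sz);
    // The capacity may exceed sz, so slice the spare capacity down to sz.
    let dst_uninit = &mut data.spare_capacity_mut()[..sz];
    // SAFETY: MaybeUninit<T> has the same layout as T (assumes buf: &[T], buf.len() == sz).
    let uninit_src: &[MaybeUninit<T>] = unsafe { transmute(buf) };
    dst_uninit.copy_from_slice(uninit_src);
    // SAFETY: the first sz elements were initialized by the copy above.
    unsafe { data.set_len(sz) };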

Arnavion · 2025-05-21T09:10:44 1747818644

That's correct.

eptcyka · 2025-05-21T05:08:49 1747804129

Valgrind it :)

vlovich123 · 2025-05-21T14:06:26 1747836386

Valgrind doesn’t tell you about UB, only whether the compiled code did something incorrect with memory, and that depends on what the optimizer did with your UB code. You’ll need Miri to tell you whether this kind of code triggers UB; it works by interpreting the compiler’s mid-level IR (MIR) and checking that Rust’s rules for unsafe code are followed.

eptcyka · 2025-05-22T08:32:54 1747902774

Reading from uninitialised memory is a fault that valgrind will detect.

vlovich123 · 2025-05-23T14:30:52 1748010652

But that’s precisely NOT the problem that exists in OP’s code. It’s a problem Valgrind will detect if and only if the optimizer does something weird to exploit the UB in the code, which may or may not happen AND doesn’t even necessarily happen on that line of code, which will leave you scratching your head.

UB is weird and valgrind is not a tool for detecting UB. For that you want Miri or UBSAN. Valgrind’s equivalent is ASAN and MSAN which catch UB issues incidentally in some rare cases and not necessarily where the UB actually happened.

uecker · 2025-05-21T06:21:07 1747808467

It is also not UB to read uninitialized values through a pointer in C for types that do not have non-value representations.

usefulcat · 2025-05-21T04:32:55 1747801975

I suspect that the main reason it doesn't really occupy the author's mind is that even though it's possible to misuse read(), it's really not that hard to actually use it safely.

It sounds like the more difficult problem here has to do with explaining to the compiler that read() is not being used unsafely.

o11c · 2025-05-21T04:32:14 1747801934

The reason this particular UB doesn't need mindspace for C programmers is because it's not even meaningful to do anything with the parts of the buffer beyond the written length.

Most other UBs relate to data that you think you can do something with.

lhecker · 2025-05-21T09:32:39 1747819959

What I meant is that if I write a UTF8 --> UTF16 conversion function for my editor in C I can write

  size_t convert(state_t* state, const void* inp, void* out)

This function now works with both initialized and uninitialized data in practice. It is also transparent over whether the output buffer is a `u8` (a byte buffer to write it out into a `File`) or `u16` (a buffer for then using the UTF16). I've never had to think about whether this doesn't work (in this particular context; let's ignore any alignment concerns for writes into `out` in this example) and I don't recall running into any issues writing such code in a long long time.

If I write the equivalent code in Rust I may write

  fn convert(&mut self, inp: &[u8], out: &mut [MaybeUninit<u8>]) -> usize

The problem is now obvious to me, but at least my intention is clear: "Come here! Give me your uninitialized arrays! I don't care!". But this is not the end of the problem, because writing this code is theoretically unsafe. If you have a `[u8]` slice for `out` you have to convert it to `[MaybeUninit<u8>]`, but then the function could theoretically write uninitialized data and that's UB isn't it? So now I have to think about this problem and write this instead:

  fn convert(&mut self, inp: &[u8], out: &mut [u8]) -> usize

...and that will also be unsafe, because now I have to convert my actual `[MaybeUninit<u8>]` buffer (for file writes) to `[u8]` for calls to this API.

Long story short, this is a problem that occupies my mind when writing in Rust, but not in C. That doesn't mean that C's many unsafeties don't worry me, it just means that this _particular_ problem type described above doesn't come up as an issue in C code that I write.

Edit: Also, what usefulcat said.

ninkendo · 2025-05-21T12:01:14 1747828874

Why wouldn’t you accept a &mut [MaybeUninit<T>] and return a &mut [u8], hiding the unsafe bits that transmute the underlying reference?

Something like:

  fn convert<'i, 'o>(inp: &'i [u8], buf: &'o mut [MaybeUninit<u8>]) -> &'o mut [u8]

(Honest question, actually… because the above may be impossible to write and I’m on my phone and can’t try it.)

Edit: it works: https://play.rust-lang.org/?version=stable&mode=debug&editio...

lhecker · 2025-05-21T16:14:42 1747844082

That's a fair workaround for my specific example. But I believe it's possible to contrive a different example where such a solution would not be possible. Put differently, I only tried to convey the overall idea of what I think is a shortcoming in Rust at the moment.

Edit: Also, I believe your code would fail my second section, as the `convert` function would have difficulty accepting a `[u8]` slice. Converting `[u8]` to `[MaybeUninit<u8>]` is not safe per se.

ninkendo · 2025-05-21T23:04:47 1747868687

Yeah, you’d need to do something like accept an enum that is either &mut [u8] or &mut [MaybeUninit<u8>], and make a couple of impl From<>’s so callers can .into() whatever they want to pass…

But I don’t think this is really a shortcoming, so much as a simple consequence of strong typing. If you want to take “whatever” as a parameter, you have to spell out the types that satisfy it, whether it’s via a trait, or an enum with specific variants, etc. You don’t get to just cast things to void and hope for the best, and still call the result safe.
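A rough sketch of the enum-plus-`From` idea, to show what callers and the callee would look like. `OutBuf` and `fill` are hypothetical names, and `fill` just copies bytes as a stand-in for a real converter.

```rust
use std::mem::MaybeUninit;

// Hypothetical parameter type: either an initialized or an uninitialized buffer.
pub enum OutBuf<'a> {
    Init(&'a mut [u8]),
    Uninit(&'a mut [MaybeUninit<u8>]),
}

impl<'a> From<&'a mut [u8]> for OutBuf<'a> {
    fn from(b: &'a mut [u8]) -> Self {
        OutBuf::Init(b)
    }
}

impl<'a> From<&'a mut [MaybeUninit<u8>]> for OutBuf<'a> {
    fn from(b: &'a mut [MaybeUninit<u8>]) -> Self {
        OutBuf::Uninit(b)
    }
}

// Copies as many bytes as fit and returns how many were written.
pub fn fill<'a>(inp: &[u8], out: impl Into<OutBuf<'a>>) -> usize {
    match out.into() {
        OutBuf::Init(dst) => {
            let n = inp.len().min(dst.len());
            dst[..n].copy_from_slice(&inp[..n]);
            n
        }
        OutBuf::Uninit(dst) => {
            let n = inp.len().min(dst.len());
            for (slot, &b) in dst[..n].iter_mut().zip(inp) {
                slot.write(b);
            }
            n
        }
    }
}
```

Callers pass `&mut a[..]` for either buffer kind. The two match arms are the duplication being discussed: the function stays entirely safe, but the write path is spelled out twice.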

ii41 · 2025-05-21T03:43:42 1747799022

I think this solves his problem. He said he wants a read function that turns the unsafe buffer into a safe buffer, and this API does that.

IIRC it's not that hard to convince the compiler to give you a safe buffer from a MaybeUninit. However, this type has really lengthy docs and makes you question everything you do with it. Thinking through all this is painful but it's not like you don't have to do it with C.

lhecker · 2025-05-21T10:05:42 1747821942

Abstracting away the `assume_init` is a great idea! I think I could use something like that for the editor. The only concern I have is that the `read` function is templated on the parameter type. I'd ideally _really_ prefer it if I didn't need two copies of the same function to switch over `[u8]` and `[MaybeUninit<u8>]` due to different return types. [^1] I guess the approach could be tuned to avoid this?

Personally, I also like the simpler approach overall, compared to the `BorrowedBuf` trait, for the same reasons outlined in the article.

While this possibly solves part of the pain points that I had, what I meant to write is that in an ideal world I could write Rust while mostly not thinking about this issue much, if at all. Even with this approach, I'd still need to decide whether my API needs to take a `[u8]` or a `Buffer`, just in the mere off-chance that a caller may want to pass an uninitialized array further up in the call chain. This then requires making the call path generic for the buffer parameter which may end up duplicating any of the functions along the path, even though that's not really my intention by marking it as `Buffer`.

I think if there was a way to modify Rust so we can boldly state in writing "You may cast a `[MaybeUninit<T>]` into a `[T]` and pass it into a call _if_ you're absolutely certain that nothing reads from the slice", it would already go a long way. It may not make this more comfortable yet, but it would definitely take off a large part of my worries when writing such unsafe casts. That's basically what I meant with "occupy my mind": It's not that I wouldn't think about it at all, rather it just wouldn't be a larger concern for me anymore, for code where I know for sure that this requirement is fulfilled (i.e. similar to how I know it when writing equivalent C code).

Edit: jcranmer's suggestion of write-only references would solve this, I think? https://news.ycombinator.com/item?id=44048450

[^1]: This is of course not a problem for a simple `read` syscall, but may be an issue for more complex functions, e.g. the UTF8 <> UTF16 converter API I suggested elsewhere in this thread, particularly if it's accelerated, the way simdutf is.
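As a thought experiment, the write-only idea can be approximated today as a library type. This is only a sketch: `WriteOnly`, `set`, and `fill_upper` are invented names, not anything from std or the article. The type exposes no way to read, so it can be built from either `[u8]` or `[MaybeUninit<u8>]`, and the consuming function is not generic, so there is a single copy of it.

```rust
use std::marker::PhantomData;
use std::mem::MaybeUninit;

/// Hypothetical write-only view over a byte buffer; it has no read API.
pub struct WriteOnly<'a> {
    ptr: *mut u8,
    len: usize,
    _borrow: PhantomData<&'a mut [MaybeUninit<u8>]>,
}

impl<'a> From<&'a mut [u8]> for WriteOnly<'a> {
    fn from(s: &'a mut [u8]) -> Self {
        WriteOnly { ptr: s.as_mut_ptr(), len: s.len(), _borrow: PhantomData }
    }
}

impl<'a> From<&'a mut [MaybeUninit<u8>]> for WriteOnly<'a> {
    fn from(s: &'a mut [MaybeUninit<u8>]) -> Self {
        WriteOnly { ptr: s.as_mut_ptr().cast::<u8>(), len: s.len(), _borrow: PhantomData }
    }
}

impl WriteOnly<'_> {
    pub fn len(&self) -> usize {
        self.len
    }

    /// Stores `value` at `index`; panics if `index` is out of bounds.
    pub fn set(&mut self, index: usize, value: u8) {
        assert!(index < self.len, "index out of bounds");
        // SAFETY: index is in bounds and the buffer is exclusively borrowed
        // for 'a; only initialized u8 values are ever stored.
        unsafe { self.ptr.add(index).write(value) };
    }
}

/// Non-generic consumer (a stand-in for a real converter): uppercases ASCII
/// bytes into `out` and returns the number of bytes written.
pub fn fill_upper(inp: &[u8], mut out: WriteOnly<'_>) -> usize {
    let n = inp.len().min(out.len());
    for (i, b) in inp[..n].iter().enumerate() {
        out.set(i, b.to_ascii_uppercase());
    }
    n
}
```

A caller that passed a `MaybeUninit` buffer still needs an unsafe step to treat the first `n` bytes as initialized afterwards, and bulk writes would need extra methods, so this is a partial answer at best.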