My favourite C++ footgun

dvt · on June 22, 2021

The nested function execution interleaving blew my mind (imagine having to debug that), so I had to look it up. Apparently, interleaving is prohibited in C++17 onward. So this:

> The second one is there since this is C++, priority() might raise an exception, meaning that new Widget will be called, but never passed to std::shared_ptr, and thus never deleted!

Is now impossible (thank God!). See a full SO discussion here[1]. Stuff like this makes me so happy I don't write C++ anymore, but the gist of it is (from the standard):

> For each function invocation F, for every evaluation A that occurs within F and every evaluation B that does not occur within F but is evaluated on the same thread and as part of the same signal handler (if any), either A is sequenced before B or B is sequenced before A.

[1] https://stackoverflow.com/questions/38501587/what-are-the-ev...
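
For anyone who wants to see the hazard concretely, here is a minimal sketch. `Widget`, `priority()` and `processWidget()` follow the names used in the quote and elsewhere in this thread, but their bodies are made up (this `priority()` always throws), and `liveWidgetsAfterFailedCall()` is a hypothetical helper. It uses `std::make_shared`, which allocates and takes ownership inside a single function call, so no raw `new Widget` can be left without an owner, whichever argument is evaluated first.

```cpp
#include <memory>
#include <stdexcept>

struct Widget {
    static int live;  // number of Widgets currently alive
    Widget() { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Hypothetical stand-in: a priority() that always fails.
int priority() { throw std::runtime_error("priority() failed"); }

void processWidget(std::shared_ptr<Widget>, int) {}

// Returns how many Widgets are still alive after the call has failed.
int liveWidgetsAfterFailedCall() {
    try {
        // Either priority() throws before any Widget exists, or the
        // shared_ptr temporary already owns the Widget and releases it
        // while the exception unwinds.
        processWidget(std::make_shared<Widget>(), priority());
    } catch (const std::runtime_error&) {
        // Swallowed on purpose: we only care about the Widget count.
    }
    return Widget::live;
}
```

Calling `liveWidgetsAfterFailedCall()` should return 0 under any standard version, because the allocation never exists outside a `shared_ptr`.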

ncmncm · on June 22, 2021

In three decades' use of C++, of all generations going back to original cfront, I have never encountered any difficulty, confusion, or actual problem arising from this phenomenon. It is a favorite of language lawyers and people borrowing trouble.

dataflow · on June 23, 2021

I've actually run into this problem before. You just haven't, is all.

rualca · on June 23, 2021

I really doubt that this hypothetical source of memory leaks was ever significant, or that a leak's root cause could be pinned on it. As this hypothetical scenario depends on an exception being thrown, your code would need to throw exceptions repeatedly for the memory leak to be a problem. If your code throws so many exceptions that not freeing the memory allocated during that code point becomes a problem then you have far more serious problems to fix than discussing language lawyer corner cases.

dataflow · on June 23, 2021

> As this hypothetical scenario depends on an exception being thrown

It doesn't.

Exercise for the reader: detect evaluation interleaving without exceptions.

> If your code throws so many exceptions that not freeing the memory allocated during that code point becomes a problem

It need not manifest itself as a mere memory leak.

Exercise for the reader: make something happen beside a memory leak.

OskarS · on June 23, 2021

> It doesn't

Yes it does. Of course you can detect different evaluation orders using side effects, but the fact that evaluation order is unspecified isn't generally a problem: if you write code like `a() + b()` and a() and b() have side effects that depend on the order they're evaluated in, that is just a bad way of writing code. In any language.

The point here is that the compiler is allowed to interleave subexpressions of the two different arguments, which can cause a memory leak in seemingly safe code. That is a very unreasonable thing for the compiler to do, which is why it is a footgun. But note that it can ONLY happen if the second function throws in between the construction of the object with "new" and the construction of the shared_ptr: if the second function doesn't throw an exception, the construction of shared_ptr is guaranteed to happen.

Incidentally: those "exercise for the reader" comments are incredibly obnoxious.

dataflow · on June 23, 2021

> isn't generally a problem

generally is the key word here.

> if you write code like `a() + b()`

The code doesn't have to look like that. Add e.g. some arguments...

CJefferson · on June 23, 2021

I've frequently run into a variant of this:

    f(**x, g());

The number of stars evaluated before 'g()' (0, 1 or 2) varies between compilers, and the program I was working on would break if only one * was evaluated before g.
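
One way to take the compiler out of the picture is to name the intermediate values before the call. The sketch below uses made-up definitions: here `g()` repoints `*x`, which is only a guess at the kind of side effect involved, and `unspecifiedResult()` / `pinnedResult()` are hypothetical helpers.

```cpp
int value = 1;
int other = 2;
int* inner = &value;
int** x = &inner;

// Hypothetical side effect: g() changes what *x points to.
int g() {
    inner = &other;
    return 0;
}

int f(int v, int /*unused*/) { return v; }

// Unspecified: yields 1 if *x is read before g() runs, 2 otherwise.
int unspecifiedResult() {
    inner = &value;
    return f(**x, g());
}

// Pinned: separate statements are sequenced, so **x is read before g().
int pinnedResult() {
    inner = &value;
    int v = **x;
    int r = g();
    return f(v, r);
}
```

`pinnedResult()` always returns 1, while `unspecifiedResult()` may return 1 or 2 depending on the compiler.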

the-dude · on June 23, 2021

Order of parameter eval is UB in plain C

dfox · on June 23, 2021

It is unspecified, not undefined.

the-dude · on June 23, 2021

Thanks for the correction. What does it mean?

palunon · on June 23, 2021

The order of parameter evaluation is not specified by the standard, so if your program behavior is relying on it it may be incorrect. But relying on it doesn't instantly make your whole program completely meaningless, nasal demons and all.

> Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine.

> Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [ Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. —end note ]

bestouff · on June 23, 2021

"Unspecified behavior" means something correct will happen, but you don't know what (in that case you can't rely on the execution order). "Undefined behavior" means something entirely incorrect may happen (crash or totally unexpected and bad behavior, so the term "nasal daemons" which means dragons coming out of your nose).

barrkel · on June 23, 2021

If it was undefined, then anything could happen (nasal demons), but that would be silly: you wouldn't be able to safely call functions with multiple side-effecting arguments at all.

Unspecified means that you can't rely on the order of evaluation. "Anything" may not happen; the compiler controls the sequencing.

blake1 · on June 23, 2021

Agreed. This code just looks weird to me. Throwing from a constructor? Holding the result of new() in a temporary? This is just asking for trouble.

Another favorite “criticism” I have of C++ goes along the lines of “but someone could overload operator+ to be division!” I have only seen that as a contrived critic’s example, or a joke.

Koshkin · on June 23, 2021

> Throwing from a constructor?

Actually, that’s what you are supposed to do, as this is the only way for a constructor to let the caller know that something went wrong.

remram · on June 23, 2021

There is always the option to expose the "fallible construction" as a factory function instead of a constructor. Then you can return errors as needed.

TeMPOraL · on June 23, 2021

That's an option, but now you're using a factory method instead of a constructor. It's a valid choice, particularly if you want to go for exception-free style - but GP's main point was that throwing from a constructor is not only not a footgun, it's actually the correct practice if you're writing RAII style.

jnwatson · on June 23, 2021

And then every object has one more state.

Thorrez · on June 23, 2021

You mean the factory function has to return some invalid state of the class? Not so.

The factory could return std::optional<Foo> or std::unique_ptr<Foo> or std::variant<Foo, Error> or absl::StatusOr<Foo> .
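
A minimal sketch of the `std::optional<Foo>` variant, assuming C++17. The `create` function, the `name` member and the "name must be non-empty" rule are all invented for illustration:

```cpp
#include <optional>
#include <string>
#include <utility>

class Foo {
public:
    // Hypothetical precondition: the name must be non-empty.
    static std::optional<Foo> create(std::string name) {
        if (name.empty()) {
            return std::nullopt;  // failure is reported without an exception
        }
        return Foo(std::move(name));
    }

    const std::string& name() const { return name_; }

private:
    // Private, so the only way to obtain a Foo is through create().
    explicit Foo(std::string name) : name_(std::move(name)) {}

    std::string name_;
};
```

Every `Foo` that exists has passed the check. The "might have failed" state lives in the `std::optional` wrapper rather than in `Foo` itself.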

jcelerier · on June 23, 2021

Your program will still have as many possible memory states as if you had a bool m_isValid; in your type - as now you don't store a Foo but an optional<Foo>.

It's an improvement in terms of code reuse but not in semantic simplification basically.

pwdisswordfish8 · on June 23, 2021

You realise you can just unwrap an optional after checking it’s not nullopt, right?

jcelerier · on June 23, 2021

Not if it's part of the state of your app ? E.g. a struct member. It also replaces an automated and enforced action by the compiler (exception being thrown) by a need for a manual check which feels very 1975

Thorrez · on June 23, 2021

>Not if it's part of the state of your app ? E.g. a struct member.

I'm not sure what you mean. Here's an example with the output from std::optional being unwrapped and put in a struct member: https://godbolt.org/z/Ms4Kar18j

Also there's opt.value_or() which can use a default value in the case opt is nullopt.

>It also replaces an automated and enforced action by the compiler (exception being thrown) by a need for a manual check which feels very 1975

I think remram's goal was to avoid exceptions. So this is a criticism of remram's goal rather than the method to achieve that goal.

Also, std::optional can integrate with exception-producing code because opt.value() throws an exception if opt is nullopt.
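
Both behaviours fit in a few lines. This is a sketch only: `parsePort`, `portOrDefault`, `valueThrows` and the default of 8080 are hypothetical names and values.

```cpp
#include <optional>
#include <string>

// Hypothetical parser: accepts one to five decimal digits, nothing else.
std::optional<int> parsePort(const std::string& s) {
    if (s.empty() || s.size() > 5) return std::nullopt;
    int port = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return std::nullopt;
        port = port * 10 + (c - '0');
    }
    return port;
}

// value_or() substitutes a default when the optional is empty.
int portOrDefault(const std::string& s) {
    return parsePort(s).value_or(8080);
}

// value() throws std::bad_optional_access when the optional is empty.
bool valueThrows(const std::string& s) {
    try {
        (void)parsePort(s).value();
        return false;
    } catch (const std::bad_optional_access&) {
        return true;
    }
}
```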

jcelerier · on June 23, 2021

> I'm not sure what you mean. Here's an example with the output from std::optional being unwrapped and put in a struct member: https://godbolt.org/z/Ms4Kar18j

And now you have to do this for every type that has some amount of preconditions. I don't understand in which universe this is sane - more code == more bugs.

e.g. here's what one would write with exceptions: https://godbolt.org/z/evMsh6383 (and, to be honest, if one is writing e.g. a command-line tool and not a reactive gui app, most likely https://godbolt.org/z/eW8YM5Gz4 as your OS will catch the thing anyways and then it's just a `coredumpctl gdb` / "open in debugger" away) ; you get the same guarantee of never having an invalid Foo but at much less mental cost.

It also prevents you from having aggregates, as every "parent" owner class will need its own wrapping in optional + private ctor, in case a sub-sub-sub-sub field would fail.

e.g. given

    struct DomainObject {
      Foo mainFoo;
      Foo secondaryFoo;
      int whatever;
    };
    struct GameState {
      std::vector<DomainObject> objects;
    };

now you can't just do DomainObject{initForMainFoo, initForSecondaryFoo, 123}; or DomainObject{.mainFoo = "", .secondaryFoo = "", .whatever = 123}; anymore, even less GameState{{DomainObject{...}, DomainObject{...}}}; which is how modern C++ is meant to be used.
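
For comparison, a sketch of the exception-based version of those structs. The `Foo` definition here is assumed (a single string with a "non-empty" precondition), and `buildsOk` is a hypothetical helper. Because `Foo` validates in its constructor, plain aggregate initialization keeps working at every level, and a failure in any nested field propagates out on its own.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Foo {
    std::string text;
    // Hypothetical precondition: text must be non-empty.
    Foo(std::string t) : text(std::move(t)) {
        if (text.empty()) throw std::invalid_argument("Foo needs non-empty text");
    }
};

struct DomainObject {
    Foo mainFoo;
    Foo secondaryFoo;
    int whatever;
};

struct GameState {
    std::vector<DomainObject> objects;
};

// Returns true if the whole state could be built, false if any Foo rejected its input.
bool buildsOk(const std::string& mainText, const std::string& secondaryText) {
    try {
        GameState state{{DomainObject{mainText, secondaryText, 123}}};
        return state.objects.size() == 1;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```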

pkolaczk · on June 23, 2021

Exception handling may indeed end up with less code, but often it is because the developers forgot to handle most error situations properly. And that's my biggest gripe with exceptions, particularly the unchecked ones, which can just pop up from 100 layers below. I've just seen it too many times when developers just throw exceptions and let them bubble up to a place where there is so little context that the final error message presented to the user is near useless.

On the other hand return values (using something like a try/either monad) have the property of forcing the developer to think about the unhappy path in every layer. Maybe a bit more code, but the quality of diagnostics - priceless.

IMHO exceptions/panics should be reserved for bugs only (eg assertions), and return values for all the other error handling caused by user input or environment.
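
In C++17 terms, a rough sketch of that style could use `std::variant` as a poor man's either type. `Config`, `Error`, `parseConfig` and `describe` are invented names, and the "only the profile called default exists" rule is just a placeholder:

```cpp
#include <string>
#include <variant>

struct Config {
    int port;
};

struct Error {
    std::string message;
};

// Hypothetical loader: the caller gets either a Config or an Error back.
std::variant<Config, Error> parseConfig(const std::string& profile) {
    if (profile.empty()) return Error{"config is empty"};
    if (profile != "default") return Error{"unknown profile: " + profile};
    return Config{8080};
}

// The caller has to look at which alternative it received before using it.
std::string describe(const std::variant<Config, Error>& result) {
    if (const Error* e = std::get_if<Error>(&result)) {
        return "error: " + e->message;
    }
    return "port " + std::to_string(std::get<Config>(result).port);
}
```

Each layer that forwards the `Error` has a chance to add its own context to the message before passing it up.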

jcelerier · on June 23, 2021

> I've just seen it too many times when developers just throw exceptions and let them bubble up

but it's the whole point ! that's exactly why exceptions are good !

> to a place where there is so little context that the final error message presented to the user is near useless.

there are only two valid cases: there's an error that you know you are able to recover from, you did recover, and the user does not see a message and never knows that an error happened, OR the user sees the message "The program has encountered an error. A backup of your data has been saved in c:/foo/. The program will now exit.".

Anything else makes software terrible to use. If you have logging and recovery to do, use your logging architecture and core dumps, don't misuse exceptions for that.

> On the other hand return values (using something like a try/either monad) have the property of forcing the developer to think about the unhappy path in every layer. Maybe a bit more code, but the quality of diagnostics - priceless.

Except what happens in practice is that people just put panics everywhere because most of the time there is no meaningful action one can take at the layer where the implementation happens. Actual examples in servo: https://pastebin.com/0DAFU9vS

I also invite you to run the following command:

    $ cd ~/.cargo
    $ rg panic! | wc -l

I get 2987 as a result. Choosing to program like this instead of using exceptions which at least can be recovered by the person who will use your lib, is literally hateful towards users.

I've not seen Go code to be meaningfully different either.

> IMHO exceptions/panics should be reserved for bugs only (eg assertions), and return values for all the other error handling caused by user input or environment.

Sure, this is the community consensus in C++. It's the Python people who use exceptions to get out of for loops :)

But the most important rule is that - invalid state must not be representable in the program. There must be no way to get an "invalid" object (unless the object being invalid is part of its possible domain, but this is exceedingly rare when one isn't interfacing with a 1985 C API)

pkolaczk · on June 23, 2021

> but it's the whole point ! that's exactly why exceptions are good !

Good for what? For crashing the program with a stacktrace? Probably. But for regular error handling - that's a terrible thing. They break normal control flow by adding a parallel / alternative control flow. Now with them you have to think on what happens if any of the code you call suddenly throws an exception and the number of potential control flow paths increases drastically. They are not referentially transparent and it is nearly impossible to do functional programming with them (they introduce side effects everywhere). You can also end up in a really bad situation when a destructor throws (typically crash).

> I get 2987 as a result. Choosing to program like this instead of using exceptions which at least can be recovered by the person who will use your lib, is literally hateful towards users.

But panics are Rust equivalent of exceptions. That's the whole point of them. They are used for exceptional cases, which are impossible to recover from (which means the only way to deal with is to dump core and terminate) and most of the time preventable (you can always avoid dividing by zero or going out of array bounds).

It looks like we agree they should not be used for regular errors, like e.g. file not found. I'd not like my word processor to terminate with a core dump when it couldn't load the file from the disk. In this case return values are way better error handling mechanism.

> invalid state must not be representable in the program

Returning an option/try/either strikes that goal just as well as exceptions.

_0w8t · on June 23, 2021

You can recover from panic! in Rust on the thread level.

pkolaczk · on June 23, 2021

Yeah, I know I can recover, that's why I said they are equivalent of C++ exceptions (which can be caught as well). But this is not a recommended idiomatic way of handling expected errors, e.g. input errors, improper configuration, I/O errors etc.

rualca · on June 23, 2021

> And then every object has one more state.

No, not really. Either the allocation followed the happy path, or one of the expected failure modes kicked in.

And the supposed footgun example boiled down to failing to handle one of the failure modes.

I mean, think about it: std::make_shared is a factory method, isn't it?

Koshkin · on June 23, 2021

Not always, e.g. not if your class includes a member that is a reference (and, say, aggregate initialization is not allowed for one reason or another).

Thorrez · on June 23, 2021

It seems to work for me: https://godbolt.org/z/68s8qsP6K

layer8 · on June 23, 2021

That doesn’t compose in a class hierarchy, though.

remram · on June 23, 2021

That's a good point... You might be able to work around that using a template function factory e.g. `make_foobar<ChildrenFooBar>()`. I'm not sure what's a good pattern here, I don't write much C++.

gpderetta · on June 23, 2021

As long as every element of the hierarchy is noexcept-movable it still composes. It is just very cumbersome.

_0w8t · on June 23, 2021

You can also use out parameters to tell about errors.

johannes1234321 · on June 23, 2021

When doing that the better approach is to use a factory function wrapping this.

However mind that if you don't throw from C++'s perspective the object is fully created, so that the destructor will run.

If you throw an exception from the constructor, the object won't be fully created.
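
A small sketch of what that means in practice. `Member`, `Owner`, `events` and `eventsFor` are made-up names; each constructor and destructor appends a marker so the order can be read back.

```cpp
#include <stdexcept>
#include <string>

std::string events;  // log of constructor (+) and destructor (-) calls

struct Member {
    Member() { events += "M+ "; }
    ~Member() { events += "M- "; }
};

struct Owner {
    Member m;
    explicit Owner(bool fail) {
        events += "O+ ";
        if (fail) throw std::runtime_error("construction failed");
    }
    ~Owner() { events += "O- "; }
};

// Builds an Owner (optionally failing) and returns the resulting log.
std::string eventsFor(bool fail) {
    events.clear();
    try {
        Owner o(fail);
    } catch (const std::runtime_error&) {
        // Ignored: only the log matters here.
    }
    return events;
}
```

With `fail == true` the log is `M+ O+ M- `: the already-constructed member is cleaned up, but `~Owner()` never runs. With `fail == false` it is `M+ O+ O- M- `.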

_0w8t · on June 23, 2021

In practice I have not found that running the destructor is problematic. In many cases the destructor is the default one that just destructs the fields. When the destructor should be non-default, the constructor can leave the object in the moved-from state, either explicitly or via hacks like using T tmp(std::move(*this)) on the error return in the constructor.

Koshkin · on June 23, 2021

This won't work with third-party callers.

cglodt · on June 23, 2021

Throwing from a constructor is a fundamental way to make it impossible to construct invalid instances of a class. It's great, especially for immutable classes. Throwing from constructor + immutability = no invalid states ever.

LAC-Tech · on June 23, 2021

> Another favorite “criticism” I have of C++ goes along the lines of “but someone could overload operator+ to be division!” I have only seen that as a contrived critic’s example, or a joke.

Yeah there's definitely criticisms you can throw at C++, but this always struck me as stupid. You could also write an `add` method that divides in Java.

I've never liked the idea of 'operators' in the first place, tbh. Scheme opened my eyes in that they're all just procedures. And then with Smalltalk and Scala I saw that they can all just be methods as well.

TwoBit · on June 22, 2021

> Is now impossible

I don't see how that is so. C++ 17 allows new Widget to complete and then the priority() call to execute and throw before both are passed to shared_ptr(), thus creating a leak. Your cited example [1] doesn't leak because both arguments are shared_ptr. Seems to me that C++ 17 does indeed solve this latter case but not the former.

dvt · on June 22, 2021

Per [1]:

> evaluations of A and B are indeterminately sequenced: they may be performed in any order but may not overlap: either A will be complete before B, or B will be complete before A. The order may be the opposite the next time the same expression is evaluated.

And:

> 21) Every expression in a comma-separated list of expressions in a parenthesized initializer is evaluated as if for a function call (indeterminately-sequenced)

Emphasis mine. What the above basically says is that in the case of some function `f(A, B)`, the arguments `A` and `B` are what's known as "indeterminately-sequenced" -- this means that their execution cannot be interleaved (overlap) -- but they still individually execute in a non-deterministic order (A before B, and sometimes B before A)!

With that said, the good news is that B can now never throw in the middle of A, which is precisely what we have in OP's example.

[1] https://en.cppreference.com/w/cpp/language/eval_order
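
A small way to observe the rule, as a sketch: the functions `a`, `b`, `g`, `h`, `f` and the `trace` string are invented for illustration. Each call appends a letter, so the string records the order in which things ran. Compiled as C++17 or later, the two arguments of `f` may run in either order but may not overlap, so only two traces are possible.

```cpp
#include <string>

std::string trace;  // records the order in which the calls run

int a() { trace += 'a'; return 1; }
int b() { trace += 'b'; return 2; }
int g(int v) { trace += 'g'; return v; }
int h(int v) { trace += 'h'; return v; }
int f(int lhs, int rhs) { return lhs + rhs; }

// Evaluates f(g(a()), h(b())) once and returns the recorded order.
// C++17 and later: either "agbh" (first argument completes first)
// or "bhag" (second argument completes first).
std::string traceOfCall() {
    trace.clear();
    f(g(a()), h(b()));
    return trace;
}
```

Before C++17 the standard also permitted interleaved traces such as `abgh`, where both inner calls run before either outer call. That corresponds to the `new Widget` / `priority()` / `shared_ptr` ordering discussed above.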

_huayra_ · on June 22, 2021

Whew thanks for looking this up! I was afraid I'd have to add yet another entry to my gigantic C++ footguns.org document. It's getting so big now that Emacs struggles a bit to load it due to inline code examples!

shric · on June 22, 2021

At first I thought that was a domain and not an org file. :)

Looks like footguns.com is taken but footguns.org is still available.

_huayra_ · on June 22, 2021

Gotta register a 501c3 to help save poor C++ programmers from the many footguns. Donate today!

nneonneo · on June 22, 2021

Your footguns.org file legitimately sounds like something that other C++ programmers would find useful. I for one, being a rank novice at the language (despite attempting to use it for the last ten years!), would certainly appreciate a reference like that.

_huayra_ · on June 23, 2021

Most of it is basically from cppcon talks and Scott Meyers's books, but the advice I basically give to incoming C++ programmers today is this:

* Start with C++20 unless you have a very good reason not to. Not only does it obviate crufty old things like SFINAE (concepts!), but it includes a ton of usability fixes (e.g. reflexive operator== -> I only have to write the code for MyType == MyOtherType in order to get both that version and MyOtherType == MyType). A lot of lambda behavior has really been cleaned up too (e.g. capturing `this` has fewer corner cases).

* Start with Google's coding guidelines [0], as they've been developed to avoid many footguns (the bottom line depends on it). Once you understand things better, keep removing these rules (e.g. about exceptions, const& parameters; many of these rules are good for big teams, but annoying to follow for individuals)

* The CppCon "back to basics" talks are absolute goldmines, and the speakers usually tell you if some technique has been outdated as of the recording. Some things in certain books, although valuable, may be outdated based on later revisions to the language.

* The best notes to take are usually some structured list followed by a bunch of godbolt links with examples, e.g. this one I made to demonstrate the reflexive operator== behavior [1]

[0] https://google.github.io/styleguide/cppguide.html

[1] https://godbolt.org/z/scPdcaWT7

pjmlp · on June 23, 2021

I will do that when I am done reading Java Puzzlers and JavaScript: The Good Parts.

nyanpasu64 · on June 22, 2021

> both are passed to shared_ptr()

priority() is not passed to shared_ptr(), only to processWidget().

TwoBit · on June 23, 2021

OK, I must have mistaken what the original was, because I thought it was like this:

    processWidget(new Widget, priority());

I think I saw the above code elsewhere.

ronyclau · on June 23, 2021

IIRC, this may also leak when `priority()` throws due to evaluation order. (Not exactly sure now, as I now always use `make_shared` whenever possible.)

However in the Effective C++ example, the `shared_ptr` constructor gave a false sense of security as it seemed the `new`-ed `Widget` was always managed by the smart pointer from its allocation.

armchairhacker · on June 23, 2021

I always wonder if compilers actually do weird optimizations taking advantage of semantics like those.

Another example is when undefined behavior occurs, like (iirc) signed integer overflow. Compilers can technically do anything they want, including weird things. But how often does that actually happen?

rocqua · on June 23, 2021

If you write something like

    int a;
    // ...
    int b = a;
    if (a + 1 > b) {
        foo();
    }

Then compilers will use the fact that integer overflow is undefined behavior. This allows the compiler to assume the `if` statement will always be true. And hence it removes this check.

This isn't just the compiler removing a check the programmer wanted. This could occur because of constants, where other constants make the check meaningful. It could also occur after in-lining a specific function call.

Really, the only case where this optimization would be bad is if the author was trying to detect overflow. Sadly, detecting if something would cause undefined behavior in C code is hard. Because you cannot try the thing.
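
The usual workaround is to rearrange the comparison so that the overflowing operation is never performed. A minimal sketch (`canIncrement` and `addWouldOverflow` are hypothetical helper names):

```cpp
#include <limits>

// Equivalent to "a + 1 does not overflow", without ever computing a + 1.
bool canIncrement(int a) {
    return a < std::numeric_limits<int>::max();
}

// General form: would a + b overflow? Only subtractions that cannot
// themselves overflow are performed.
bool addWouldOverflow(int a, int b) {
    if (b > 0) return a > std::numeric_limits<int>::max() - b;
    if (b < 0) return a < std::numeric_limits<int>::min() - b;
    return false;
}
```

Because no undefined operation is ever executed, the compiler has nothing to assume away, and the check survives optimization.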

SAI_Peregrinus · on June 23, 2021

Others have mentioned that compilers take advantage of these things for optimization, but haven't mentioned that compilers will still often do unexpected things when encountering UB with optimizations off.

Turning optimizations on doesn't suddenly undefine any behavior, it just changes what the compiler does when encountering some of those behaviors (among other, non-UB based optimizations). A non-optimizing compiler and an optimizing compiler both have to conform to the standard.

blake1 · on June 23, 2021

Different architectures have their own function passing convention. It’s convenient to evaluate left-to-right, pushing as you go, if that matches the arch. And vice versa.

tsimionescu · on June 23, 2021

That would explain why you want it implementation-defined, but leaving it indeterminate means that the same piece of code is allowed to have different evaluation orders each time it is encountered. This is much stranger, and very specific to C and C++, I don't think there is any other language that doesn't specify the order of execution.

vnorilo · on June 23, 2021

If you think about how it interacts with inlining, it's easy to see how optimizers like this freedom. Suppose there's just one call that may throw and the others are pure functions. The compiler could bunch the pure ones together to optimize register and stack allocation.

atq2119 · on June 23, 2021

This is the correct answer, plus add loop unrolling etc.

It is simply not possible to write an optimizing compiler in which the evaluation order is truly indeterminate yet always the same.

tsimionescu · on June 23, 2021

I was claiming it could be left implementation defined instead of unspecified. That way, code for particular platforms could be evaluated in different order, but it would always be deterministic for a particular compiler.

vnorilo · on June 23, 2021

The optimized evaluation order would depend on the situation at the call site, not the architecture or platform.

tsimionescu · on June 23, 2021

The GP that my post was replying to was claiming it was arch dependent, and I was pointing out that, if it had been, they would have likely not left it unspecified. So, I think you and I are in agreement.

tsimionescu · on June 23, 2021

I would be truly shocked if this optimization ever mattered in practice. It would mean that rewriting code from

    {
      auto& a = <expression>;
      auto& b = <expression>;
      foo(a, b);
    }

To

    foo(<expression>, <expression>);

Could be an optimization.

Edit: modified the code a bit to make the two samples more similar.

gpderetta · on June 23, 2021

During the standardization of C++17, a proposal specifying the left-to-right order of evaluation of parameters (and other operations) came up and the committee got very close to accepting it, but in the end there were concrete examples of actual code that was worse off because of the proposal, so it was unfortunately weakened at the last minute.

simiones · on June 23, 2021

Looking at the proposal in question [0], it seems very sad that the alternate, not recommended, choice was made instead [1] , especially since the reasoning seems to have not been put down in writing anywhere. The paper does mention that the difference in performance was <4%, with both improvements and worsening being seen by the VC++ implementers.

[0] http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0145r3....

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p040...

vnorilo · on June 23, 2021

Why shocked? Certainly a micro-optimization, but order of things often matters if the compiler cannot prove it's safe to speculate about the execution order. Maybe the surprising thing here is the theme of TFA, that the order of argument evaluation is not defined.

simiones · on June 23, 2021

Given the out of order nature of processors, and the relatively limited scope for complex expressions given as function arguments, I would not expect this to matter in real-world programs. It seems this was also the opinion of the people drafting the new evaluation order changes [0] in C++17:

> We do not believe that such a nondeterminism brings any substantial added optimization benefit, but it does perpetuate the confusion and hazards around order of evaluations in function calls. It perpetuates unnecessary confusion around brace-initialization vs. direct initialization using parenthesis.

> We found that some entries in the benchmark suite ran slower, others ran faster compared to the scenario where the evaluation of the argument list is left unspecified. The variation is between -4% and +4%. It is worth noting that these results are for the worst case scenario where the optimizers have not yet been updated to be aware of, and take advantage of the new evaluation rules and they are blindly forced to evaluate function calls from left to right. It is clear that the left-to-right evaluation strategy is triggering new optimization paths (different inlining decisions and different register allocation) affecting the variations in the benchmark performance. It appears those opportunities have not traditionally been exploited, even though permitted under the unspecified order regime.

Unfortunately, they seem to have been overruled by the Core Working Group [1], for reasons I have not been able to dig up.

[0] http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0145r3....

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p040...

mannerheim · on June 23, 2021

Haskell? Expressions have to be evaluated in order to evaluate expressions that depend on them, but otherwise the compiler is free to decide the order of evaluation.

> In this case, it's pretty easy to see that 3 will have to be printed after both 1 and 2, but it's unclear whether 1 or 2 will be printed first. The reason for this is evaluation order: in order to evaluate one + two, we'll need to evaluate both one and two. But there is nothing telling GHC which of those thunks should be evaluated first, and therefore GHC is at full liberty to choose whichever thunk to evaluate first.

https://wiki.haskell.org/Evaluation_order_and_state_tokens

tsimionescu · on June 23, 2021

Well, Haskell's lazy execution model and implicit purity make this far different from C++. The examples there all rely on the guarantee-breaking unsafePerformIO, while in C++ you can break your program with perfectly safe, normal C++ code, such as creating a shared pointer and throwing an exception.

Note that in Haskell evaluation order is not guaranteed anywhere, even between separate statements, whereas in C++ it is basically guaranteed everywhere except function calls.

mannerheim · on June 23, 2021

unsafePerformIO was there to make the indeterminate evaluation order visible, but isn't what causes that indeterminate evaluation order. You are right, though, that for 'safe' Haskell code the evaluation order shouldn't matter, although it is sort of a matter of convention; 'head' is unsafe, but is presumably now permanently stuck in the language, while on the other hand perhaps one could write C++ without exceptions, but that wouldn't be 'normal' C++ code.

tsimionescu · on June 23, 2021

Sure, my point was that, as far as I understand, even Haskell code that performed IO would have a well-determined order, if it was doing normal IO; whereas C++ code that does something as simple and safe looking as foo(i++, i) may produce different results in different runs (it was even UB before C++17, and still is in C; and even other common idioms, such as function chaining, had undefined execution order).

mannerheim · on June 23, 2021

Yes and no. The order IO is performed is well-defined (that's the purpose of the IO monad), but the evaluation order is not (apart from ordering arising from data dependency), and this is the case with or without unsafePerformIO. You just won't get any different behaviour with safe Haskell code, and Haskell is very much on the safe side of things, so even though the order of evaluation may be indeterminate, you would never notice it (maybe a timing attack is possible?). But this is admittedly hair-splitting.

vnorilo · on June 23, 2021

Most optimization uses of UB that I've seen take full liberty to remove branches and instructions. For example, if the result of UB is used in a comparison, the compiler is free to pick whichever side of the branch (even inconsistently) and elide. It is actually quite common with for example loop unrolling and vectorization (ignoring signed overflow of the indvar is a huge boon).

lokedhs · on June 23, 2021

In fact, the compiler is free to assume that the comparison itself will never happen and not emit any of the paths.

UncleMeat · on June 23, 2021

The answer is yes. Compiler authors absolutely take advantage of these things for optimizations.

As for paranoia about undefined behavior? Compilers won't emit a program to call you a pizza or format your disk. There are risks, but those aren't real ones.

a_t48 · on June 23, 2021

A bit of a nit - your storage driver could absolutely emit a program that formats your disk due to undefined behavior.

UncleMeat · on June 23, 2021

If you already have a program that contains instructions that are manipulating your disk in these sorts of ways then this is within some reason. But the usual hyper-concern about UB is that it'll take a program that otherwise does completely different things (imagine some CLI utility that just reads from stdin) and instead produce some maximally adversarial program that ruins your life. This isn't a real concern.

a_t48 · on June 23, 2021

Agree :)

cratermoon · on June 22, 2021

There was a post here not too long ago about how C/C++ compiler writers have consistently taken the wiggle room of "undefined behavior" to do terrible, terrible, damage to otherwise fine code. https://news.ycombinator.com/item?id=27221552

saagarjha · on June 22, 2021

In this case there is no undefined behavior, the order is just unspecified.

cratermoon · on June 23, 2021

> unspecified

how is that different from undefined?

Kranar · on June 23, 2021

Undefined behavior is unbounded in both space and time in terms of program semantics, that is, a program that will engender undefined behavior may do so at any point during the course of program execution (unbounded in time) as well as may be observed within any memory region (unbounded in space). A colloquial way of saying this is that a compiler is free to do anything it wants anytime it wants in the presence of undefined behavior.

Unspecified behavior is bounded both in space and time. Unspecified behavior may only affect program semantics within the statement in which it is engendered and may only be observed within the region of memory that is written to or read from.

It is often said that undefined behavior is invalid, and one can sympathize with that statement as it's commonly repeated, but it's not true. All the standard says about undefined behavior is that the standard imposes no requirements about program behavior, it does not state that the program is invalid or that the program must behave in some kind of invalid manner. Almost all non-trivial C++ programs engender undefined behavior in a manner documented by the compiler and do so in a safe and principled way.

saagarjha · on June 23, 2021

The existence of undefined behavior indicates that the program is invalid. Unspecified behavior is just something that is not precisely defined.

mhh__ · on June 23, 2021

In English, that's a reasonable point but within the C++ standard behaviour is declared to be specifically undefined in the text - that is to say, we are telling you, you may write this construct however said construct is not C++

rocqua · on June 23, 2021

Note that this post was heavily discussed and disputed. The question "is undefined behavior treated correctly?" is rather contentious.

It certainly is the case that most compiler writers have decided this is okay. And I personally tend to agree with them.

eklitzke · on June 22, 2021

The author writes "this might be why there is std::make_shared", but that's not really why it exists. When you create a std::shared_ptr it needs to allocate memory for a control block structure (which tracks the reference count) as well as the memory for the object itself. If you write std::shared_ptr<Foo>(new Foo) this will result in two memory allocations, one for the Foo object and the other for the std::shared_ptr control block. If you write std::make_shared<Foo>() then a single allocation will happen which has enough space for both the control block and the Foo object, and then placement new is used to initialize the memory for both structures. So std::make_shared exists to reduce the number of memory allocations that are being made, not for the reason suggested here.

haldean · on June 22, 2021

There's a crazy downside to make_shared that I learned recently because of this: if you have a weak pointer to a shared thing, and the refcount for the shared thing drops to zero, the weak pointers will keep the allocation for the object "alive", because they still need access to the remnant and the remnant was created in the same allocation as the object so they can't be freed separately. So now I only use make_shared if I know for sure there won't be a weak_ptr pointing at it (or if the base object has a relatively small memory footprint after it's been destructed).

Asooka · on June 22, 2021

Won't the object be destructed though? So while the memory for the object is kept, the memory for the object's heap-allocated members is not. I.e. if you have a 100mb string, after only weak references are left you won't have 100mb memory taken, but only sizeof(string) + sizeof(control block).

nneonneo · on June 22, 2021

Depends on the object. As OP notes, `make_shared` with weak pointers is fine "if the base object has a relatively small memory footprint after it's been destructed".

There's lots of cases where the object itself is big, though. Think of objects with big fixed arrays, "god objects" with a bajillion pointers, or objects which themselves allocate data in-line.

ot · on June 22, 2021

Yeah, that's definitely something to be aware of. It's usually not an issue as most objects have small footprint (and any allocations they in turn hold would be released when the strong refcount goes to 0).

quietbritishjim · on June 22, 2021

It exists for both reasons.

That's why there's also a std::make_unique even though it doesn't have a control block (although it was added later, but that was just an oversight).

MaxBarraclough · on June 22, 2021

> It exists for both reasons.

I believe this is correct. Here's [0] some old Boost documentation on Boost's make_shared which inspired the C++ standard's make_shared. It mentions both reasons.

[0] https://www.boost.org/doc/libs/1_67_0/libs/smart_ptr/doc/htm...

asveikau · on June 22, 2021

I think the idea that you are also relying on a presumably well tested library to get the exception corner cases right is also noteworthy. Memory leaks on allocation failure are pretty common in naive code, and a good thing to handle in a library where it can get well thought out.

Koshkin · on June 22, 2021

This is very important. Try to hide most of the complexity in a library, and then unit-test the hell out of it.

overgard · on June 22, 2021

My (least) favorite footgun is "auto" when it comes to references.

If you want to get a pointer from something, you would write this:

    auto fooPtr = myWidget.getPtr();
    fooPtr->doCoolStuff();

But we all know references are better than pointers right? So we should just write...

    auto fooRef = myWidget.getRef();
    fooRef.doCoolStuff();

The problem is... this is valid and will probably not do what you want. It will make a copy. What you want is actually

    auto& fooRef = myWidget.getRef();

I once spent an entire day debugging a bizarre crash because of that. I was asking for a reference to a scene graph, and what was actually happening is I was getting a clone of the scene graph (which was an object that couldn't be safely copied), and then destroying a bunch of shared pointers when the function returned. Fun times.

jchw · on June 22, 2021

As others have pointed out, the idiomatic solution here is to delete the copy constructor if an instance is unsafe to copy — however, I suspect the reason why auto behaves differently from other inference in requiring this to be explicit is probably something along the lines of making it harder to have a dangling reference. It’s shockingly easy to wind up with a dangling reference with fairly innocuous code, something like QString().toUtf8().data() so maybe this makes sense. (Doesn’t help for that case since it’s a pointer to raw data, but you get the picture.)

overgard · on June 22, 2021

I think, all things considered, the way it works is probably the best way, since you might want to use "auto" to actually make a copy. It's just very surprising, because when I have explicit types I'm used to looking for that sort of error, but with an auto I wasn't (at the time) used to looking for that.

nyanpasu64 · on June 22, 2021

Worse yet is an object which is safe to copy in some cases, but you intended to not copy it at the moment, and you create a dangling reference to a field within. I wish C++ copy constructors were explicit by default. (But declaring the copy constructor explicit turns off aggregate initialization, so I can't do that.)

bregma · on June 22, 2021

Wait, you had an object that was unsafe to copy, but you told the computer it was safe by not deleting the copy constructor? I guess it should have done what you wanted, not what you told it.

suprfsat · on June 22, 2021

Simply by using C++, you tell the computer all sorts of things, such as "I know the entire language by heart", and "I have a death wish".

owl57 · on June 22, 2021

While C++ is a dangerous tool indeed, ignoring the rule of 5 is analogous to ignoring the safety guidelines, not simply using the tool.

yongjik · on June 22, 2021

A C++ object can be perfectly safe to copy in one situation and conjure up nasal demons in another situation. I don't think rule of 5 would help in that case.

flqn · on June 23, 2021

If the copy behaviour is that different between the two cases it should be two different classes.

MaxBarraclough · on June 23, 2021

I'm reminded of a quote about C, sadly I forget where it's from:

> It is fundamental tenet of C's philosophy that the programmer is always right, even when they are wrong.

overgard · on June 24, 2021

Honestly I think that's what makes it (and C++) a great tool, despite the frustrations. I'd rather have a footgun than a nerf gun.

MaxBarraclough · on June 29, 2021

> I'd rather have a footgun than a nerf gun.

That's not really the choice we're faced with though. Ada and Rust are both far safer than C, and they're also plenty powerful.

Dylan16807 · on June 22, 2021

> told the computer it was safe

> by not deleting

I think there's a problem here.

junon · on June 23, 2021

Probably one of the only common language-lawyer pieces of criticism of C++ I fully agree with. I'm a pretty firm believer that a language should be as constrained as possible with the ability to make it explicitly less-constrained.

C++ fails at this pretty spectacularly.

initplus · on June 23, 2021

Rule of three is such a common C++ concept though, I would be surprised if anyone with more than a few weeks experience with the language is still running into such issues.

overgard · on June 24, 2021

I've been coding C++ off and on since 1998. But not all the code I use is written by me... not even most of it. I'd love it if everyone followed best practices... but they don't.

rualca · on June 23, 2021

> I think there's a problem here.

There is a problem indeed, which is OP's failure to implement exception-safe code.

If someone opts to use exceptions, they have the responsibility of writing their code in a way that complies with scenarios involving throwing exceptions.

Failing to support a scenario by not following the most basic principles is something that's on the person writing the code, not the language.

It would make as much sense to blame java for the problems you create by not initializing objects in some code paths, because your code started to throw null pointer exceptions.

Dylan16807 · on June 23, 2021

Are you talking about the same thing as everyone else?

auto fooRef = myWidget.getRef(); vs. auto& fooRef = myWidget.getRef();?

That's not a problem caused by exceptions.

If you're talking about the article, the standards committee seems to think it's a confusing interpretation that never should have existed, and it's pretty ridiculous to say it's "basic principles".

contravariant · on June 22, 2021

Yeah I learned that the hard way when I found out that std::vector apparently feels free to move your objects around in memory for you, no matter if you point to them from somewhere else.

I mean in hindsight it's obvious, but still not exactly what I wanted to happen.

For reference, what I did does actually work, temporarily.

tines · on June 22, 2021

> Yeah I learned that the hard way when I found out that std::vector apparently feels free to move your objects around in memory for you

In standard terminology, this is described as "invalidating iterators". There are a bunch of member functions in std::vector that either do or don't invalidate iterators, e.g. push_back(...) does but size() doesn't. And as the name implies, if you call a function that invalidates iterators, all your existing iterators/pointers/references become invalid.

Kranar · on June 22, 2021

Iterator invalidation is different from this. Iterator invalidation, as the name suggests, only applies to iterators. The problem OP was having had to do with dangling references.

Some containers guarantee references won't dangle on mutation such as unordered_map. An unordered_map may invalidate iterators if objects are added or removed, but will never result in dangling references (unless the object is removed). That is, it is safe to have a pointer to an object owned by an unordered_map and continue using that pointer even after an iterator to that same object is invalidated.

UncleMeat · on June 22, 2021

This has its own problems. Pointer stability limits the internal implementation so much that unordered_map is embarrassingly slow given what we know about modern hashmap design. There is a good reason why the swisstables that Google built dropped this behavior.

einpoklum · on June 22, 2021

Iterators are historically a generalization of pointers. A pointer is a kind of iterator.

I mean, literally. The iterator of an std::vector<T> (not a bool vector) is a T*.

Kranar · on June 22, 2021

This is a poor understanding of what an iterator is. For one, the iterator of a std::vector<T> is not a T*, it's a std::vector<T>::iterator and you are welcome to verify that the following assertion fails for all T:

    static_assert(std::is_same_v<T*, std::vector<T>::iterator>);

Link provided below for convenience [1].

Furthermore just like a cat is animal does not imply an animal is a cat, a pointer being an iterator does not imply that an iterator is a pointer and holding such a confusing thought is the source of many bugs and poor C++ code. An iterator not only represents a form of indirection to an object, it also represents traversal through a collection of values.

unordered_map guarantees that the objects it stores will remain alive until the unordered_map is destroyed or the object is removed from the map. However, unordered_map, as the name once again suggests, does not provide any guarantee about the ordering of objects. Hence, if you have an iterator to an object owned by such a map, and you add or remove objects to that map, the iterator becomes invalid not because the object the iterator references gets destroyed, but because the iterator loses information about where it's located within the unordered collection.

For further information, please review the following link [2], specifically the quote "References and pointers to either key or data stored in the container are only invalidated by erasing that element, even when the corresponding iterator is invalidated.".

[1] https://godbolt.org/z/Y5f4d74Kb

[2] https://en.cppreference.com/w/cpp/container/unordered_map

saagarjha · on June 23, 2021

You're being far more patronizing than is necessary here, especially because the parent poster was correct in iterators originating out of being a generalization over pointers, although not in their latter statement. But, while implementation defined, a vector iterator must behave identically to a pointer, and your assertion fails in both Clang and GCC not for any interesting reason but more because they wrap the pointer into a class for what I understand to be better encapsulation and earlier detection of nonportable code, not because pointers are somehow unfit to do the job of std::vector<T>::iterator. It so happens that iterators have become a useful way to generically iterate over noncontiguous containers as well, but it is very clear that they are "logically" meant to mirror the usage of a pointer (down to the operators you use to work with them) and suggesting that holding this opinion is a sign of inadequacy is condescending.

Kranar · on June 23, 2021

>because the parent poster was correct in iterators originating out of being a generalization over pointers

And an animal is a generalization of a cat, but that doesn't mean an animal is a cat. What is true of a particular doesn't have to be true of the general.

> But, while implementation defined, a vector iterator must behave identically to a pointer

No it does not. MSVC implements a vector iterator with additional runtime safety guarantees in DEBUG mode and while those guarantees are not mandatory as per the standard, they are fully in compliance, so saying they have to behave identically is patently false as MSVC's behavior is a superset of the behavior provided by a pointer.

> suggesting that holding this opinion is a sign of inadequacy is condescending.

If you care about writing correct code, then do not mix the very concept of pointers, arrays, references, and iterators with one another. They are all related to one another but have very important differences to the point that calling them literally identical to one another is a sign of inadequate understanding. It happens far too often and inevitably leads to poor code, bugs, and a fundamentally poor understanding of what these things represent conceptually.

saagarjha · on June 23, 2021

First of all, calm down. Consider that the people you are talking to may not be idiots.

Second of all, nobody is saying that pointers, arrays, references, and iterators are identical in general. They are clearly related concepts but differ in fundamental ways. However, my point was that a vector iterator needs to behave like a pointer would and can in fact be implemented as a naked pointer. Pointing to MSVC's implementation having safety checks does not change that fact, because pointing to what an implementation does on undefined behavior is clearly out of scope when discussing specified behavior. I'm not going to bring address sanitizer into this discussion to show how pointers can actually be a superset of "pointer behavior" because that would just be irrelevant. The fact remains that iterators were originally meant to act like pointers, and in many cases still do. Believing this does not mean I have a poor understanding of the language.

Kranar · on June 23, 2021

Not knowing something does not make you an idiot, there are tons of things I don't know about C++ and I make tons of mistakes. However, not knowing something and doubling down on that lack of knowledge by defending something that is wrong because you don't like the tone of the person correcting you about it is highly problematic and suggests you are prioritizing your own sensitivity on the issue and protecting your own ego instead of taking the opportunity to learn something new.

>Second of all, nobody is saying that pointers, arrays, references, and iterators are identical in general.

The comment I replied to said, and I quote:

"I mean, literally. The iterator of an std::vector<T> is a T*."

That is what I originally replied to before you felt it necessary to interject yourself into the conversation to police my tone.

>However, my point was that a vector iterator needs to behave like a pointer would and can in fact be implemented as a naked pointer.

No, you said it is required that they behave identically, which is false, once again, I quote you:

"a vector iterator must behave identically to a pointer"

There is no such requirement that they behave identically, and there is a demonstrable example of a compiler where they do not behave identically. Furthermore your talk about it being undefined behavior is false on the basis that in general, two pointers of type T* may compare with one another, for example the following is valid:

    auto v1 = new int();
    auto v2 = new int();
    v1 == v2; // This is valid.

However, this is undefined behavior.

    auto v1 = vector_a.begin();
    auto v2 = vector_b.begin();
    v1 == v2; // This is undefined behavior.

Only iterators to the same vector may be compared, which is not true of pointers.

>Believing this does not mean I have a poor understanding of the language.

I did not say you have a poor understanding of the language as a whole, I said if you believe that an iterator to a vector is identical to pointer as you have claimed, then you have a poor understanding of iterators.

I stand by that statement and I would suggest that you not be so sensitive about being wrong about a corner case of the language and instead simply accept this to be a fact, learn something new, and move on with your day. I learn new things about C++ all the time, sometimes from polite people, sometimes from jerks, it's all the same because what matters is the learning. Something doesn't become wrong just because the person who told it to you said it in a way you disapprove of.

Having said that, all the best to you Sir and thank you for engaging in this discussion. I have said all I think is appropriate given the topic.

saagarjha · on June 23, 2021

> However, not knowing something and doubling down on that lack of knowledge by defending something that is wrong because you don't like the tone of the person correcting you about it is highly problematic and suggests you are prioritizing your own sensitivity on the issue and protecting your own ego instead of taking the opportunity to learn something new.

Reminder that I was not the one who passed up the opportunity to reply to 'einpoklum with a simple "iterators can behave like pointers, but vector's iterators don't have to be pointers. You can see this here: https://godbolt.org/z/Y5f4d74Kb". I'm just annoyed that you went after this person for coming to an incorrect conclusion from what is clearly a correct position, which is that iterators were intended to "morally" be pointers, especially in cases where they iterate over contiguous containers. You are correct that I claimed that vector iterators must behave identically to pointers, but I really meant to say that this is only in cases where the iterator has behavior defined on it: I alluded to it previously when I mentioned that GCC and Clang wrap pointers in their own custom iterator classes to prevent people from accidentally using operations on them that are not legal. There's a bunch of other things that the iterator doesn't guarantee, like being directly casted to an integer.

But, stepping back a bit: do you really disagree with a claim that a vector iterator is meant to be a little pointer into the internal buffer it has, with a bit of additional restrictions on top that are reasonable to add? This seems to be a very strange thing to disagree with, and I would like to hear more about why you hold that position.

gpderetta · on June 23, 2021

FWIW I believe that the parent was pedantic but correct.

This being C++, being pedantic is a Good Thing.

The distinction between reference and pointer invalidation, and the requirements and axioms of the various iterator concepts are all valid things to point out, especially in a thread about C++ footguns.

saagarjha · on July 1, 2021

I am always down for C++ pedanticness! But in this case my issue was not with factual accuracy, but putting down the opinion of "iterators act like pointers" as coming from ignorance rather than simplification. It was mostly a complaint about tone, not content.

CRConrad · on July 3, 2021

> It was mostly a complaint about tone, not content.

You say that as if you think that makes it better.

saagarjha · on July 3, 2021

Absolutely!

CRConrad · on July 4, 2021

> > > It was mostly a complaint about tone, not content.

> > You say that as if you think that makes it better.

> Absolutely!

Indeed, you absolutely did. Which sucks, because it absolutely isn't.

HTH: https://en.wikipedia.org/wiki/Tone_policing

(See also: https://en.wikipedia.org/wiki/Sealioning )

saagarjha · on July 16, 2021

Hacker News is a place for curious conversation, rather than a place to sneer at people's ineptness for coming to reasonable (if incorrect) conclusions. You can be completely correct, but if you're going to be a petty jerk about it then you're not helping the conversation. I provided an example response that conveys exactly the same information, except you'll note mine doesn't start with "this is a poor understanding of [general concept]". If anything, "link provided below for convenience" and "please review the following link" is far more sealioning than anything I wrote, because it hides the message of "you're an idiot who needs to read up on this topic" behind faux politeness. It's the very thing that I found disconcerting about that comment, not its factual content.

einpoklum · on June 23, 2021

Ok, as you and others have pointed out - the iterator of a vector doesn't have to be T*:
https://en.cppreference.com/w/cpp/container/vector

It _may_ be a T*, or anything satisfying LegacyRandomAccessIterator.

> just like a cat is animal does not imply an animal is a cat,

It's not "just like" that. Among other differences between the pairs of concepts: The concept of an animal didn't arise as a generalization of cats.

tines · on June 22, 2021

Ah, interesting. Even within that dichotomy, I'd think that if references aren't invalidated on mutation, then dereferencing iterators would be valid, but incrementing them would not be.

daniel-levin · on June 22, 2021

I think if you come from the JVM / CLR world, you are so protected by the runtime’s patching up of references that it might not even occur to you that a (raw) pointer to a data structure’s internals can dangle after the data is moved around. The runtimes mentioned pause your code, move things around and even compact the heap and your references magically still point to what they did before!

ncmncm · on June 23, 2021

Except, it's not magic. It costs processing cycles, and latency, and invalidations of cache rows you were using, and memory bus traffic. Lots of all of them.

Most of the costs are not counted as runtime for your process, so benchmarks invariably appear to show GC as costing less overhead than it does. Among the costs, as with all the myriad varieties of caching done in modern systems, is that GC makes it hard to know the costs of design choices you make. Many of the costs are in making the caches less effective.

contravariant · on June 23, 2021

Well it's not so much being protected by patching up references, but more that C++ is (as far as I know) the only language that doesn't complain when you create a reference to an object that can move away.

Most languages either don't have the concept of classes or ensure that invalidating references is an explicit operation (such as calling the destructor).

rualca · on June 23, 2021

> Yeah I learned that the hard way when I found out that std::vector apparently feels free to move your objects around in memory for you, no matter if you point to them from somewhere else.

It's not so much "feeling free" as having to reallocate the array because you added elements beyond its capacity and they had to be stored somewhere.

contravariant · on June 23, 2021

Well the problem I had was that I apparently didn't forbid it from using the move constructor, after I changed that it correctly failed to compile.

So in my case the problem was very much that it took the liberty of doing something I wasn't expecting it to.

gpderetta · on June 23, 2021

Interestingly, the danger of reference invalidation is actually one of the reasons why auto doesn't deduce references by default:

    vector<int> v = {42};
    ...

    auto x = v[0];
    v.clear();

    // if x were deduced as a reference, using x at this point
    // would be UB
    std::cout << x;

_huayra_ · on June 22, 2021

This is why things like std::list still have value: iterator stability. You even get this in most map/sets too.

But vector is usually what one should reach for unless one has a good reason not to. The only danger really comes when the lifetime of iterators and the thing they point to become a bit decoupled, even due to insertion!

einpoklum · on June 22, 2021

You can also use an offset instead of a pointer (or a pointer to the object and an offset into its heap storage, or whatever).

_huayra_ · on June 22, 2021

This is actually on my list of "neat utilities to write": an iterator that encapsulates this "offset" for vector.

If anyone is looking to write a custom iterator, I definitely recommend Boost's new stl interfaces: https://github.com/boostorg/stl_interfaces

nly · on June 23, 2021

Or you could just use Boost stable_vector and save yourself the trouble

https://www.boost.org/doc/libs/1_76_0/doc/html/boost/contain...

gpderetta · on June 23, 2021

deque is also good for reference stability when doing push_backs (but of course not when inserting in the middle).

secondcoming · on June 22, 2021

No, everyone gets LHS auto wrong at the start. They don't realise it makes a copy. I have to point it out every other code review.

jcelerier · on June 22, 2021

The rule is simple: auto behaves like a template parameter

Kranar · on June 22, 2021

There is nothing simple about that (not to mention it's not actually true either).

jcelerier · on June 22, 2021

Yes it is ? Quoting the standard:

> If the placeholder-type-specifier is of the form type-constraint auto, the deduced type T' replacing T is determined using the rules for template argument deduction.

https://eel.is/c++draft/dcl.spec.auto

Now if you program in C++ regularly and don't know that in

    template<typename T>
    void f(T t) { }
    f(something);

does a copy of something, I don't know what to say - I have never met any professional c++ programmer not knowing the language works that way

Kranar · on June 22, 2021

You need to continue reading that paragraph to the very end and note the exception for when an initializer list is used. The following are not equivalent nor do they apply the same rules:

    template<typename T>
    void f(T) {}

    f({1, 2, 3});
    auto x = {1, 2, 3};

>Now if you program in C++ regularly and don't know that in...

I program in C++ daily and no, I didn't know that a copy is made. What I do know, as an actual professional C++ programmer, is that whether a copy is performed is incredibly complex and depends on numerous factors such as whether a move constructor exists and "something" is an lvalue but not an rvalue reference, or if "something" is an rvalue (which is not the same as an rvalue reference) and T defines a move constructor.

And finally if something is the same type as T and T has a copy constructor, then the final question is whether copy elision will be performed, which the standard specifies is a valid optimization even if said optimization would change the observable behavior of the program.

That's what I... as a professional C++ programmer know but I fully admit that C++ is such a complex beast of a language that I am almost definitely missing a few corner cases.

jcelerier · on June 23, 2021

> You need to continue reading that paragraph to the very end and note the exception for when an initializer list is used.

Exceptions don't mean that the general rule does not apply ? I don't know how it is possible to be more explicit than "the deduced type T' replacing T is determined using the rules for template argument deduction.". That'd be like saying that "priority to the right" when driving isn't a rule because there is sometimes a "stop".

> I program in C++ daily and no, I didn't know that a copy is made.

you're kidding

> whether a copy is performed is incredibly complex and depends on numerous factors such as whether a move constructor exists and "something" is an lvalue but not an rvalue reference, or if "something" is an rvalue (which is not the same as an rvalue reference) and T defines a move constructor.

but `something` cannot be an rvalue in f(something);

maybe in f(something()); or f(some + thing); but just a variable named "something", as is, passed to a template argument or auto will necessarily lead to the creation of a new value of the same type. Whether it is copied, moved, materializes three different types because people forgot to mark their constructors explicit or whatever else frankly does not matter much in my experience (after all people survived for 30 years without move semantics) - what matters is whether a new value is created (which I personally and improperly call a "copy" no matter what - I should have been more precise but it's really the most useful distinction to make in practice imho as it means that some function will be called) or just a reference to existing value which is always a no-op no matter what.

> And finally if something is the same type as T and T has a copy constructor, then the final question is whether copy elision will be performed,

I've been at it for 15 years now and I don't remember one time in non-toy-example-code where copy elision taking place or not did matter. Thinking about it is a self-inflicted problem; expect that a copy happens, if it's slow, profile and if it's not, be happy.

Koshkin · on June 22, 2021

Ah, templates... Universal references... Good stuff.

Kranar · on June 22, 2021

Yep, except now we're supposed to call them forwarding references because the motivation Scott Meyers had for naming them universal references turned out to be incorrect in subtle ways.

Even the notable experts get things wrong when it comes to how complex and bloated C++ is.

overgard · on June 22, 2021

Well, the scene graph code wasn't code I wrote, and this was back in 2014. But sure, obviously it was an error on my part... we are talking about footguns after all, I'm the one that pulled the trigger.

criddell · on June 22, 2021

If you develop on Windows and are using Visual Studio 2019 "auto fooRef = myWidget.getRef()" will get a little squiggle under it to warn you that you are making a copy.

overgard · on June 22, 2021

I am very happy tooling is helping to make these sorts of mixups easier to find, although sadly C++ is such a hard language to write tooling for that those kinds of nice features are pretty rare. I remember back when I had to work in Xcode a few years ago I could rarely even get basic intellisense to work.

cjaybo · on June 22, 2021

It's not that these features are rare for C++, it's that Xcode is generally a pretty bad C++ IDE (imo, of course).

In both Visual Studio and CLion, these types of features work well, in addition to plenty of more advanced features like address/thread sanitizers, fuzz testing, etc.

overgard · on June 23, 2021

You're too nice, Xcode is an atrocity. Still though, trying to unambiguously parse C++ without compiling miles of headers is not a simple thing... I find my TypeScript and C# tooling is so much nicer, when C++ is the language I need that magic most for. VS is nice but compared to what I get with like ReSharper or IntelliJ it's a bit disappointing.

pjmlp · on June 23, 2021

It was already possible in the early 90's.

https://dreamsongs.com/Cadillac.html

"Lucid Energize Demo VHS 1993"

https://www.youtube.com/watch?v=pQQTScuApWk

VisualAge C++ Version 4.0

http://www.edm2.com/0704/vacpp4/vacpp4.html

https://books.google.de/books?id=ZwHxz0UaB54C&pg=PA206&redir...

However, they were too resource hungry for the hardware of those days, then Java and .NET took over the spotlight, and until clang/LLVM came onto the scene, the industry forgot what was already possible.

Koshkin · on June 22, 2021

> and aren't using Visual Studio

OK, I tried this in Notepad, and it did not work.

malkia · on June 22, 2021

yeah, same with WordPad, but see Word would give you some squiggles :)

decker · on June 22, 2021

Sounds like the root cause was due to not obeying the rule of five: https://en.wikipedia.org/wiki/Rule_of_three_(C%2B%2B_program...

ncmncm · on June 23, 2021

Nowadays, Rule of Zero. Let the compiler generate all the special members implicitly, when the members can each do their own cleanup. This turns out to be quite often.
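
A minimal sketch of what that looks like, assuming C++17. The `Buffer` type is hypothetical: every member cleans up after itself, so no destructor, copy, or move operations are written by hand, and the `std::unique_ptr` member makes accidental copies a compile error rather than a runtime bug.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical type: no user-declared special member functions at all.
struct Buffer {
    std::string name;
    std::unique_ptr<int[]> data;  // owns its allocation and frees it
    std::vector<int> tags;
};

// The unique_ptr member deletes the implicit copy, but moving still works.
static_assert(!std::is_copy_constructible_v<Buffer>);
static_assert(std::is_move_constructible_v<Buffer>);

bool rule_of_zero_demo() {
    Buffer a{"a", std::make_unique<int[]>(4), {1, 2}};
    Buffer b = std::move(a);  // compiler-generated move constructor
    return b.name == "a" && a.data == nullptr && b.data != nullptr &&
           b.tags.size() == 2;
}
```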

overgard · on June 22, 2021

Probably! I don't remember the code exactly, this was back in 2014 and I think most of the code I had to consume was primarily written by a 24 year old out of college (smart of course, but C++ takes a long time to get good at).

amluto · on June 22, 2021

> What you want is actually
>
>     auto& fooRef = myWidget.getRef();

That variant has issues if getRef() actually returns a temporary object. You may want auto&&.

Asooka · on June 22, 2021

If it returns a temporary, you'll get a compile-time error, as the temporary cannot be bound to a non-const ref. I would prefer "const auto&" as you shouldn't be modifying temporaries, or "auto x{std::move(myWidget.getRef())}" if I want a mutable value and to avoid copying the return value of getRef.
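
A small sketch of the binding rules under discussion, assuming C++17. `make_value` is a hypothetical function that returns by value, so each call produces a temporary.

```cpp
#include <string>
#include <type_traits>

// Hypothetical function returning a temporary.
std::string make_value() { return "temp"; }

bool binding_demo() {
    // auto& r = make_value();      // error: non-const lvalue reference
    //                              // cannot bind to a temporary
    const auto& cr = make_value();  // OK: the temporary's lifetime is extended
    auto&& fr = make_value();       // OK: deduces std::string&&, lifetime extended
    static_assert(std::is_same_v<decltype(fr), std::string&&>);

    fr += "!";  // the auto&& binding is still mutable
    return cr == "temp" && fr == "temp!";
}
```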

gpderetta · on June 23, 2021

auto by default removes top-level const and reference qualification, and decays array and function types to pointers, the same as template argument deduction [1]; this is also true when auto is used for return type deduction.

You can use decltype(auto) to preserve the actual type of the rhs.

[1] as discussed elsewhere initializer lists behave differently, but initializer_list itself was a mistake to start with.
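
These rules can be checked with a few static assertions. This is a sketch assuming C++17; `counter` and `get_counter` are hypothetical names used only for illustration.

```cpp
#include <type_traits>

int counter = 0;
int& get_counter() { return counter; }  // hypothetical accessor returning a reference

bool auto_vs_decltype_auto() {
    const int limit = 10;
    int values[3] = {1, 2, 3};

    auto a = get_counter();            // int: reference dropped, value copied
    auto b = limit;                    // int: top-level const dropped
    auto c = values;                   // int*: array decays to a pointer
    decltype(auto) d = get_counter();  // int&: the declared return type is kept

    static_assert(std::is_same_v<decltype(a), int>);
    static_assert(std::is_same_v<decltype(b), int>);
    static_assert(std::is_same_v<decltype(c), int*>);
    static_assert(std::is_same_v<decltype(d), int&>);

    a = 1;   // changes only the local copy
    d = 42;  // writes through to counter
    (void)b;
    (void)c;
    return counter == 42;
}
```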

leetcrew · on June 22, 2021

I too have fallen victim to this and similar traps. it's a good idea to use a deleted copy constructor when you know it's not safe to copy the object. you can't completely stop others (or yourself) from doing bad things, but you can at least give them a moment to reflect on what they are doing.
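
A sketch of that idea, assuming C++17. `Widget` and `Owner` are hypothetical types: once the copy operations are deleted, the accidental `auto copy = owner.getRef();` no longer compiles, and the reference has to be spelled out.

```cpp
#include <type_traits>

class Widget {  // hypothetical type that must not be copied
public:
    Widget() = default;
    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;
};

class Owner {  // hypothetical owner handing out a reference
public:
    Widget& getRef() { return widget_; }

private:
    Widget widget_;
};

static_assert(!std::is_copy_constructible_v<Widget>);

bool reference_is_kept() {
    Owner owner;
    // auto copy = owner.getRef();  // does not compile: copy constructor is deleted
    auto& ref = owner.getRef();     // OK: binds to the existing Widget
    return &ref == &owner.getRef();
}
```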

nyanpasu64 · on June 22, 2021

I wish C++ copy constructors were explicit by default.

gpderetta · on June 23, 2021

The problem is that structures are implicitly (and trivially) copyable in C, and requiring an explicit copy constructor would break interoperability. It would also break trivial copyability, which is a different issue.

So, yes, backward compatibility and historical baggage.

Note that if you use explicit wrappers for all object-owning pointers you would get either the right behavior or a compilation error.

secondcoming · on June 22, 2021

It's not about safety, it's about efficiency. It's usually an unintended copy. For example, if your function `foo` returns a `const std::string&` the code `auto x = foo()` creates a copy.
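
A small sketch of that case, assuming C++17. Here `foo` is a hypothetical function returning a reference to a long-lived string; comparing addresses shows which declaration produced a new object.

```cpp
#include <string>
#include <type_traits>

// Hypothetical function returning a reference to a long-lived string.
const std::string& foo() {
    static const std::string value = "hello";
    return value;
}

bool auto_copies_the_string() {
    auto x = foo();         // std::string: a new object copied from the result
    const auto& y = foo();  // const std::string&: refers to the original

    static_assert(std::is_same_v<decltype(x), std::string>);
    static_assert(std::is_same_v<decltype(y), const std::string&>);

    // x lives at a different address; y is the same object foo() returns.
    return &x != &foo() && &y == &foo();
}
```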

leetcrew · on June 22, 2021

well I don't really see a way of solving that. I may or may not want to copy the object, depending on the object's lifetime and what I plan to do with it. warning on copies would be a step too far imo. I would immediately suppress that warning.

vips7L · on June 23, 2021

A copy of the reference or a copy of the string?

secondcoming · on June 23, 2021

The string.

einpoklum · on June 22, 2021

So, if I write:

int x; auto y = x;

do you expect y to be a reference? Surely not.

No, my friend, auto is for values, not references. You want a reference - you have to say so.

(And same goes for rvalue references - auto&& .)

overgard · on June 23, 2021

My point was simply that auto to assign pointers doesn't cause them to lose their pointiness, but using auto to assign a reference causes them to lose their referenceness. Yes I understand why it works the way it does. I'm not saying it's wrong just that it's been an occasional footgun.

Thorrez · on June 23, 2021

I think it would be a footgun the other way too.

I tend to think of pointer as a general type aspect, whereas reference is specifically on the variable that it's on and doesn't transfer. Sort of like how auto doesn't copy the static specifier.

Also the rules for auto are the same as for template argument deduction.

    #include <vector>

    template<typename T>
    std::vector<T> MakeVec(T val) {
        std::vector<T> vec;
        vec.push_back(val);
        return vec;
    }

    int main()
    {
        int x = 1;
        int& y = x;
        std::vector<int> vec = MakeVec(y);
    }

Notice how in this case MakeVec(y) makes a std::vector<int> not a std::vector<int&>. If you want your template argument function to take a reference, use T& not T (or use T&& which is a whole new complication). Similarly use auto& not auto.

einpoklum · on June 23, 2021

Ok, that's a fair point. It's easy to make the distinction when you think about how in C++, things become references or lose their reference'iness pretty easily, but nothing ever implicitly becomes a pointer.

jdashg · on June 23, 2021

Reference is a type qualifier, pointer is a type. Best to just internalize it, even if you'd rather it be different.

2bitencryption · on June 22, 2021

My favorite C++ footgun is creating a Vector with some items, taking a reference to, say, &vec[3], then adding another item to the vec, then trying to use the reference from the previous step.

If you write C++ it might be obvious what the problem is.

If you don't, this will absolutely ruin your entire day.

The worst part is, 95% of the time, it will probably work without issue.

But eventually, pushing a new item to the Vector will trigger a relocation of the whole vector, which will invalidate your reference and bring down production. Have fun debugging that.
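
A sketch that makes the relocation visible without touching a dangling reference, assuming a typical `std::vector` implementation. The function names are hypothetical. The address is recorded as an integer before the vector grows, and the second function shows the usual workaround of remembering an index instead of a reference.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if growing the vector moved element 3 to a new address.
bool growth_relocates_elements() {
    std::vector<int> vec = {10, 11, 12, 13};
    const auto old_address = reinterpret_cast<std::uintptr_t>(&vec[3]);
    const std::size_t old_capacity = vec.capacity();

    // Push until the old capacity is exceeded, which forces a reallocation.
    while (vec.size() <= old_capacity) {
        vec.push_back(0);
    }

    // Any reference or pointer to vec[3] taken before the loop would now dangle.
    return reinterpret_cast<std::uintptr_t>(&vec[3]) != old_address;
}

// Safer pattern: keep the index and look the element up again after growth.
int read_by_index_after_growth() {
    std::vector<int> vec = {10, 11, 12, 13};
    const std::size_t index = 3;
    for (int i = 0; i < 1000; ++i) {
        vec.push_back(i);
    }
    return vec[index];
}
```

Calling `vec.reserve()` up front also avoids the relocation, but only as long as the reserved capacity is never exceeded.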

nicoburns · on June 22, 2021

Yep. I learnt about this when I was learning Rust (which makes this a compile-time error). I was very glad I didn't have to learn this, and the 100 other things like it that seem to exist in C++, the hard way!

failwhaleshark · on June 23, 2021

Footguns might keep you prepared for the zombie apocalypse, but they'll inevitably make it difficult to walk. I'm glad I don't currently use C++ in production, but that may change soon. ):

kaik · on June 22, 2021

I kid you not, I spent a full working day debugging this exact same issue (taking a pointer to a vector element, before adding more elements). Very obvious if you understand how C++ and vectors work, yet it took me forever to realize, and it was miserable…

saagarjha · on June 23, 2021

Heh, you can run into this even if you understand how iterator invalidation works. Once you see the bug it's easy to understand, but finding it might not be…

account42 · on June 23, 2021

> finding it might not be…

Valgrind memcheck is your friend.

gpderetta · on June 23, 2021

also asan.

duped · on June 22, 2021

This, and the entire class of iterator invalidation bugs that force you to memorize which collections are ok for which applications.

nyanpasu64 · on June 22, 2021

Rust takes the approach of flat-out not letting you mutate any collection while you have any references to its contents. It eliminates all dangling pointer bugs... I don't know if it rules out any useful use cases of collections with iterator stability. I think any C++ collection holding unique_ptr is stable (pushing to the collection doesn't invalidate the target of the unique_ptr), and Rust doesn't have a safe, ergonomic way to achieve that (perhaps Pin<Box<MagicCell<T>>>, but we don't yet have a MagicCell that makes &mut MagicCell<T> not noalias).

Animats · on June 23, 2021

> Rust takes the approach of flat-out not letting you mutate any collection while you have any references to its contents. It eliminates all dangling pointer bugs... I don't know if it rules out any useful use cases of collections with iterator stability.

It does. Some collections have a "collection.retain" method.

    collection.retain(|v| true_if_we_want_to_keep(v))

which goes through the collection in linear time and deletes any items for which the test is false. This is O(N).
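
For comparison on the C++ side of this thread, the closest analogue is probably the erase-remove idiom, sketched here for `std::vector`. The `retain` helper is a hypothetical name chosen to mirror the Rust method; it is not a standard library function.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper mirroring retain: keeps the elements for which
// keep(item) is true and erases the rest in a single linear pass.
template <typename T, typename Pred>
void retain(std::vector<T>& items, Pred keep) {
    items.erase(std::remove_if(items.begin(), items.end(),
                               [&](const T& item) { return !keep(item); }),
                items.end());
}
```

For example, `retain(v, [](int x) { return x % 2 == 0; })` leaves only the even values in `v`, in their original order.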

Some collections ("multi_map", which isn't used much, comes to mind) don't have that. You have to make a read pass over the collection, construct a list of things to delete, then go through the delete list and delete those items. Can't read and change with the same iterator. That's slower. But sound.

I'm OK with that. It avoids subtle bugs.

ridiculous_fish · on June 23, 2021

An illustrative example is a string set which remembers its insertion order. In C++ you may write:

    struct OrderedSet {
        deque<string> storage;
        set<string_view> vals;
    };

The deque holds the string contents, and the set holds views of those contents. This is safe because the deque has iterator stability, and it's pleasant because the `vals` set can just be a regular set.
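
A fuller sketch of that idea, assuming C++17. The member functions are hypothetical additions for illustration. It relies on `std::deque::emplace_back` leaving references to existing elements valid, and it deletes copying because a copied set of views would still point into the original deque.

```cpp
#include <deque>
#include <set>
#include <string>
#include <string_view>

class OrderedSet {
public:
    OrderedSet() = default;
    // Copying would leave the copied views pointing into the original deque.
    OrderedSet(const OrderedSet&) = delete;
    OrderedSet& operator=(const OrderedSet&) = delete;

    // Returns false if the value was already present.
    bool insert(std::string_view value) {
        if (vals.count(value) != 0) {
            return false;
        }
        storage.emplace_back(value);                    // owns the characters
        vals.insert(std::string_view(storage.back()));  // view into the deque element
        return true;
    }

    bool contains(std::string_view value) const { return vals.count(value) != 0; }

    const std::deque<std::string>& in_insertion_order() const { return storage; }

private:
    std::deque<std::string> storage;
    std::set<std::string_view> vals;
};
```

Only appending is safe here: erasing from or inserting into the middle of the deque would invalidate the stored views.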

As you say Rust does not allow this; its answer is IndexSet, and it is implemented differently, like:

    struct OrderedSet {
        buckets: Vec<String>,
        hash_to_bucket_index: RawTable<usize>,
    }

RawTable is a hash table that doesn't know how to compute a hash or determine equality. Instead, you provide it with closures to do those tasks, but at the call site (not construction time).

Basically Rust achieves the same thing through a layer of indirection involving indexes, which is a common pattern.

duped · on June 22, 2021

The underlying pointer to a unique_ptr won't be invalidated but the iterator might. Consider if you had a vector of unique_ptrs and inserted into it within a for loop. Depending on the implementation of the iterator this may not be sound (if it's an index, you're probably ok, if it's a pointer, you're screwed).

If you wanted to do the same in Rust it would be Vec<Box<T>>. Mutating the collection won't invalidate the pointers.

nyanpasu64 · on June 23, 2021

> Mutating the collection won't invalidate the pointers.

Rust still won't let you mutate a Vec<Box<T>> while you hold a &T or &mut T borrowed from the Vec.

Perhaps it would be sound to do so unsafely; Rust would let you mutate a Vec<Rc<T>> while you hold a Rc<T> cloned from the Vec. But I'm not clear on whether Stacked Borrows allows moving the Box while you have a &mut T pointing to the same memory as the Box points to. It definitely does not allow dereferencing the Box. (I heard Stacked Borrows will be updated to make self-referential types sound, and I don't know if it will affect this situation.)

Measter · on June 23, 2021

In the case of a Vec<Box<T>> the Vec owns the box, so the only thing you could link the reference lifetime to is the Vec itself because if the lifetime isn't bound to the Vec, mutating the Vec might drop that item which in the case of Box<T> will also deallocate the memory. If any references point directly to that memory, then it would be dangling.

In the case of Rc<T>, cloning the Rc creates a second owner of the heap allocation, so the Vec dropping its copy won't deallocate. Though I believe one complication to this would be that the lifetime of any references created through the Vec's copy of the Rc will be linked specifically to that copy, so would result in a borrow check error when it's dropped.

If the Vec is storing references, you can borrow the thing behind the reference and still mutate the Vec because that reference stays valid even if mutating the Vec drops its reference to the item.

    let (a, b, c, d) = (1, 2, 3, 4);
    let mut v = vec![&a, &b, &c, &d];
    
    // If you change the type here to &&u32, or let the compiler infer the type
    // then you'll get a borrow check error because the outer reference 
    // borrows the Vec.
    let borrowed_b: &u32 = &v[1];
    v.remove(1);
    
    println!("{}", borrowed_b);
    println!("{:?}", v);

https://play.rust-lang.org/?version=stable&mode=debug&editio...

[1]

pjmlp · on June 23, 2021

If you use a proper C++ compiler, the debug builds throw a runtime exception or abort on invalidation misuse, and it can be enabled for release build as well.

pizza234 · on June 22, 2021

Interestingly, AFAIK also Golang suffers from something similar: creating a slice from an array, then performing an operation on the array, that causes resizing - the slice will keep pointing to the old array data.

krylon · on June 22, 2021

You can run into the same problem in C, using malloc/realloc. realloc, in fact, makes for a nasty footgun, too (and remains, of course, available in C++).
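
A sketch of the usual defensive pattern around `realloc`, written as C++ since it is available there too. The `grow` helper is hypothetical. Two traps are involved: the block may move, so older pointers into it go stale, and writing `p = realloc(p, n)` loses the only pointer to the old block when the call fails.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: grows a malloc'ed int buffer to new_count elements.
// Returns false and leaves `data` untouched if the allocation fails.
// (Overflow of new_count * sizeof(int) is not checked in this sketch.)
bool grow(int*& data, std::size_t new_count) {
    void* moved = std::realloc(data, new_count * sizeof(int));
    if (moved == nullptr) {
        return false;  // the old block is still valid and still owned by the caller
    }
    data = static_cast<int*>(moved);  // the old pointer value must not be used again
    return true;
}
```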

flohofwoe · on June 23, 2021

An important difference is that in C it is usually obvious that a reallocation is happening while in the C++ stdlib, memory management is usually hidden (important to note that this isn't a design fault of the C++ language, but of the C++ stdlib, unfortunately the two are more and more entangled in newer C++ versions). Because of the fact that memory management is hidden in C++, I find claims that C++ is more memory-safe than C quite hilarious. If it works, it works fine, but if anything goes wrong (which is quite easy to achieve) it's much harder to find the actual problem in C++ than in C.

turminal · on June 22, 2021

Yes, but unlike with realloc and a custom dynamic array, C++ references and smart pointers and containers are half-smart and do all sorts of things on their own. Knowing when they will and when they will not be "smart" makes writing C++ really difficult sometimes.

flyingswift · on June 22, 2021

What is the safe way to achieve the same result?

Kranar · on June 22, 2021

The concept behind this is reference stability, and if you need a collection that has stable references, you must introduce a level of indirection, that is, instead of a vector<T>, you use a vector<unique_ptr<T>> and then you can take references as follows:

    auto& r = *some_vector[0];
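
A minimal sketch showing that such a reference survives growth of the vector; the function name is hypothetical. The vector may move its `unique_ptr` elements around, but the heap object each one points to stays where it is.

```cpp
#include <memory>
#include <string>
#include <vector>

bool reference_survives_growth() {
    std::vector<std::unique_ptr<std::string>> some_vector;
    some_vector.push_back(std::make_unique<std::string>("first"));

    std::string& r = *some_vector[0];  // refers to the heap object, not the vector slot

    // Force several reallocations of the vector's own storage.
    for (int i = 0; i < 1000; ++i) {
        some_vector.push_back(std::make_unique<std::string>("filler"));
    }

    return &r == some_vector[0].get() && r == "first";
}
```

The reference is still only valid for as long as the owning `unique_ptr` stays in the vector; erasing that element destroys the string.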

owl57 · on June 23, 2021

Or std::deque. It's conceptually similar, but additional allocations are hidden under the hood and batched.

MauranKilom · on June 23, 2021

std::deque is as good as useless. The defaults for "batch size" in different compilers are at extreme opposites of the tradeoff spectrum. So unless you really don't care about performance, memory or portability, it's not a data structure you can rely on.

From memory:

In MSVC, deque will allocate memory for every single element if your elements are > 8 bytes. This will never be changed, due to ABI compatibility.

Clang and gcc have batching sizes of 1K and 4K (i.e. you throw out a whole page of memory even if your deque contains only 1 element).