It's not moving/compacting because it doesn't need to - it doesn't have fragment...

pcwalton · on Oct 5, 2018

> It's not moving/compacting because it doesn't need to - it doesn't have fragmentation issues.

It's not about fragmentation. It's about allocation throughput.

> Bump allocation is efficient and fast in single threaded programs but almost all Go applications are multi threaded and it would require locks there. Go is using thread local caches for allocation, that's why there is no point in using bump allocation.

No production bump allocator takes locks!

> This was mentioned many times by the Go team.

Yes, and my view is that they're coming to incorrect conclusions.

> Golang have value types (which Java do not have atm, but Valhalla is coming) and allocate on the stack based on escape analysis (which gets better with time).

1. .NET has value types, and in that runtime the generational hypothesis certainly holds and therefore .NET has a generational garbage collector.

2. Java HotSpot has escape analysis too (and the generational hypothesis still holds). It's primarily for SROA and related optimizations, though, not for allocation performance. That's because, unlike Go, HotSpot allocation is already fast.

> This means that adding generational GC would not benefit Go as much as you think.

I have yet to see any evidence that bump allocation in the nursery would not help Go.

> Go GC main focus is very low latency, not throughput and handling huge heaps with a lot of generations is not so easy with HotSpot.

And for most applications, balancing throughput and latency is more desirable than trading tons of throughput for low latency.

> In many cases you need to tune GC in Java because of that.

The default GC in Java HotSpot has one main knob, just as Go does. Actually, the knob is better in Java, because "max pause time" is easier to understand than SetGCPercent.

lossolo · on Oct 5, 2018

> The default GC in Java HotSpot has one main knob

Do you write that from theory or experience? Because from my experience this was never enough. There are actually whole books written about tuning GC in Java.

> I have yet to see any evidence that bump allocation in the nursery would not help Go.

I can revert what you wrote here and it would be true too.

> Yes, and my view is that they're coming to incorrect conclusions.

I will ask again as someone did before. Did you reach to the Go team about your views and explained that you think they are coming to incorrect conclusions? You can do it easily by writing here[1].

I've seen you are committing a lot of time to write those things about Go in multiple submissions. The best way would be to spend this time talking with Go team and showing your view with proper argumentation. This would be better spent time than writing same thing over and over again on HN in my opinion. If you are right (which I doubt, but I can be wrong too, we are only humans after all) it would benefit whole Go community.

1. https://groups.google.com/forum/#!forum/golang-dev

pcwalton · on Oct 5, 2018

I've talked to the devs on Twitter, yes.

I'm more trying to correct misconceptions around GC than anything else.

lossolo · on Oct 5, 2018

> I've talked to the devs on Twitter, yes.

Can you point me to that discussion? I can't seem to find any discussion in which you participate on Twitter about Go GC with Rick Hudson or Austin Clements, Ian Lance Taylor etc.

jashmatthews · on Oct 5, 2018

https://blog.golang.org/ismmkeynote is worth a read. Apparently, the write barrier overhead makes it not really worth it for the Go ecosystem to adopt a generational or moving GC. I think WebKit also switched to a non-moving GC, but retained bump-allocation into empty pages.

pcwalton · on Oct 5, 2018

I'm familiar with that keynote, and I've written about it before. Long story short, the Go developers never tested a generational GC with bump allocation in the nursery. Without that, they haven't given generational GC a fair shake.

Regarding JavaScriptCore, they are constrained by compatibility with native code that takes pointers into the GC heap, particularly iTunes. So the situation is not really comparable. Nevertheless, Filip has done a better job with reducing allocation overhead in JSC than the Go team has, because the "bump'n'pop" fast path is inlined into allocation sites. It's still a throughput loss over a pure bump allocator, because it has a little more bookkeeping to do [1]—but not by much. Note also that JavaScriptCore has—you guessed it—a generational GC [2], though a non-moving one.

If Go doesn't switch to a generational GC, they should at least adopt bump'n'pop.

[1]: https://trac.webkit.org/browser/trunk/Source/JavaScriptCore/...

[2]: https://webkit.org/blog/7122/introducing-riptide-webkits-ret...

jashmatthews · on Oct 6, 2018

> they should at least adopt bump'n'pop.

I wrote a rudimentary hack of bump'n'pop for CRuby, but in that context, it had almost no effect. The real issue for CRuby is all objects larger than a few bytes are malloced and freed but there's a proposal now for: https://bugs.ruby-lang.org/issues/14858

chrisseaton · on Oct 5, 2018

> Bump allocation is efficient and fast in single threaded programs but almost all Go applications are multi threaded and it would require locks there.

If you’re using bump allocation you do it per thread! You don’t do it globally and use a lock!

valarauca1 · on Oct 5, 2018

    Bump allocation is efficient and fast in single threaded
    programs but almost all Go applications are multi
    threaded

This operation doesn't require locks, just an atomic add, and a branch to ensure you aren't allocating past the end of the current "arena" where memory is being allocated. Trivial optimizations are providing thread-local allocation arenas which remove the need for longer pauses as locality is improved (less cache coherence protocol work for the silicon).

OFC these schemes require some kind of relocation, but they make allocating blindingly fast. The only way you get faster is by pre-faulting the arena, and hinting for the _next_ chunk of the arena to be loaded in L1/L2 cache.