3D-Stacked CMOS Takes Moore's Law to New Heights

nabla9 · on Sept 24, 2024

With CMOS, heat puts hard limits on every type of stacking (transistor layers in a chip, or stacking CMOS chips).

In DRAM tech you can stack several (4-8 at least) HMC memory layers on top of one logic layer.

SSD is moving from 100+ towards 1000 layer memory chips.

londons_explore · on Sept 24, 2024

If you know you are thermally limited, there are a bunch of design adjustments you can use to produce less heat: lower clock speed, more regions powered off at any given time, clock gating, etc.

Being able to stack logic will just change the way designs are made. For example, custom silicon to accelerate a specific operation, such as gzip compression, becomes far more attractive because now there is lots of 'spare' silicon area, as long as it remains powered down 99% of the time.

nabla9 · on Sept 24, 2024

Silicon area is expensive. And the logic becomes more expensive per area when you stack it. The whole purpose is to extract performance from the chip by making it dense and distances short.

amirhirsch · on Sept 24, 2024

Maximum performance is only one corner for optimization. We do not operate chips at high temperatures in our pocket; the goal is to get low power for the same functionality to have a long battery life, in which case parallelization means doubling the number of transistors while halving clock rates, which (with a correspondingly lower supply voltage) lowers the total power quadratically. It is also not more expensive per area to stack logic: by lowering the area of each die, you increase the yield of each in the stack.
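A rough sketch of the power argument, using the textbook switching-power estimate P = a * C * V^2 * f. The normalized units and the assumption that supply voltage can drop in proportion to clock rate are mine, not something stated above; real chips have a voltage floor and leakage power that this ignores.

```python
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """Textbook CMOS switching-power estimate: P = a * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency

# Baseline: one unit at full clock and nominal voltage (normalized units).
baseline = dynamic_power(capacitance=1.0, voltage=1.0, frequency=1.0)

# Twice the transistors (2x switched capacitance) at half the clock,
# with the supply voltage left unchanged: same throughput, same power.
parallel_same_v = dynamic_power(capacitance=2.0, voltage=1.0, frequency=0.5)

# ASSUMPTION: voltage can be lowered in proportion to frequency.
parallel_low_v = dynamic_power(capacitance=2.0, voltage=0.5, frequency=0.5)

ratio_same_v = parallel_same_v / baseline
ratio_low_v = parallel_low_v / baseline
print(f"half clock, same voltage: {ratio_same_v:.2f}x baseline power")
print(f"half clock, half voltage: {ratio_low_v:.2f}x baseline power")
```

Under these assumptions the saving comes entirely from the V^2 term: halving the clock alone buys nothing once the transistor count doubles.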

wtallis · on Sept 24, 2024

> We do not operate chips at high temperatures in our pocket;

Sure we do. When you actually hit your phone's processor with a non-trivial workload, the die temperature very quickly spikes to near the safe limits for silicon (i.e. 90+ °C). It's only if the workload is sustained for a relatively long time (e.g. when gaming) that the heat starts to be conducted to the outside of the phone and drives further throttling of the processor to prevent those surfaces from reaching temperatures that aren't safe for human hands. Even when a phone is heat-soaked enough that it is enforcing skin temperature limits, there's still a much higher temperature at the die.

> It is also not more expensive per area to stack logic: by lowering the area of each die, you increase the yield of each in the stack.

The stacking process itself does not have perfect yield, even when assembling known-good dies. Every layer added increases the risk of the stack as a whole becoming non-functional. 3D stacked DRAM has mostly been limited to single-digit stack heights, despite being the easy case for die stacking: every die is the same, has a more or less identical workload (and thus similar thermal expansion) and uses the same interface with the same TSV locations. Most of that goes out the window if you're trying to make a large stack of various co-processors.
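To put the two effects side by side, here is a minimal sketch. It assumes a simple Poisson defect model for die yield and independent bonding steps; the defect density and per-bond yield are hypothetical numbers chosen only for illustration, not measured figures.

```python
import math

def die_yield(area_cm2, defects_per_cm2):
    """Poisson yield model: fraction of dies with zero killer defects."""
    return math.exp(-area_cm2 * defects_per_cm2)

def stack_yield(num_dies, bond_yield):
    """Assembly yield of a stack of known-good dies.

    ASSUMPTION: each of the (num_dies - 1) bonding steps succeeds
    independently with probability bond_yield.
    """
    return bond_yield ** (num_dies - 1)

# Hypothetical values, for illustration only.
DEFECTS_PER_CM2 = 0.1
BOND_YIELD = 0.99

print(f"1.0 cm2 die yield: {die_yield(1.0, DEFECTS_PER_CM2):.3f}")
print(f"0.5 cm2 die yield: {die_yield(0.5, DEFECTS_PER_CM2):.3f}")
for n in (2, 4, 8, 16):
    print(f"{n:2d}-high stack assembly yield: {stack_yield(n, BOND_YIELD):.3f}")
```

Smaller dies do yield better individually, but the assembly loss compounds with every layer, so the taller the stack, the more the bonding yield dominates.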

d_tr · on Sept 24, 2024

You also save quite a bit of routing, right?

nabla9 · on Sept 24, 2024

> by lowering the area of each die, you increase the yield of each in the stack.

Nope. The yield depends on the total surface area across all layers, not on the die's surface area.

d_tr · on Sept 24, 2024

You mean you have to stack the chips first to test them, in which case the whole stack would have to be discarded?

gdiamos · on Sept 24, 2024

Is this why there is more attention on PIM - processing in memory?

You don’t burn active power?

petra · on Sept 24, 2024

It's mainly as a way to overcome the bandwidth limits between memory and CPU.

It will increase the heat density of the memory, but there's room to play there. Plus there are some designs that use analog electronics for compute, which reduces power.

CyberDildonics · on Sept 24, 2024

> PIM - processing in memory?

I don't think this is something that actually exists in a working, competitive state. I also don't think there is "more attention" from anyone actually working on CPUs.

> You don’t burn active power?

What does this mean?

gdiamos · on Sept 24, 2024

SSDs are 3D with high stacking. They retain data without burning power, unlike SRAM.

What if you put some of these 3D transistors tightly integrated with a 3D SSD and power gated them?

It seems like you would beat the memory wall.

Obviously manufacturing that is beyond us today.

CyberDildonics · on Sept 24, 2024

> It seems like you would beat the memory wall.

I don't know what this is supposed to mean.

Why would putting coprocessing somewhere outside of the CPU be better than just sending data to storage?

gdiamos · on Sept 24, 2024

It takes a lot of energy to move data

CyberDildonics · on Sept 24, 2024

Says who?

Also wouldn't this idea just be moving data before it's fully processed? Why wouldn't the intermediate data be bigger?

Also "a lot" isn't a number and what are you comparing it to?

gdiamos · on Sept 24, 2024

Says the hotchips keynote - https://www.youtube.com/watch?v=rsxCZAE8QNA&t=2480s

- moving data costs about 100x more power than keeping it local

You have a good point about intermediate data. This would not apply to all algorithms.
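A toy model of that trade-off, as a sketch only. The 100x ratio is the figure from the keynote; the function names, the unit cost, and the idea of summarizing an algorithm by a single output-to-input size ratio are all hypothetical simplifications of mine.

```python
def host_side_energy(input_bytes, local_cost=1.0, move_ratio=100.0):
    """Move all input to the CPU, then process it there."""
    moved = input_bytes * local_cost * move_ratio
    processed = input_bytes * local_cost
    return moved + processed

def near_data_energy(input_bytes, output_ratio, local_cost=1.0, move_ratio=100.0):
    """Process the input where it is stored, then move only the result.

    output_ratio is (result size / input size); it is a hypothetical
    per-algorithm parameter, e.g. small for a filter or a reduction.
    """
    processed = input_bytes * local_cost
    moved = input_bytes * output_ratio * local_cost * move_ratio
    return processed + moved

n = 1_000_000
print("host side:            ", host_side_energy(n))
print("near data, 10% output:", near_data_energy(n, 0.1))
print("near data, 2x output: ", near_data_energy(n, 2.0))
```

In this model processing near the data only wins when the result is smaller than the input, which is exactly the intermediate-data objection.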

CyberDildonics · on Sept 24, 2024

You're talking about processing data, then moving data and processing it somewhere else, which is the opposite of the locality that the presentation says is valuable.

mapt · on Sept 24, 2024

The upper limit of heat extraction is an interesting problem. Intel certainly seems to have bumped into a region with the i9-12900K and later where the chip is heat limited under all load tests even with water cooling, and improving significantly on the best air coolers requires taking off the protective 'heatspreader' for closer contact with a heatpipe. Those heatpipes seem to be mostly watercooled.

Some kind of supercritical CO2 heatspreader with direct die fluid contact perhaps?

apples_oranges · on Sept 24, 2024

SSD layers: Do you mean each cell is storing around 100 different values or are cells layered on top of each other (each addressed separately etc.)?

throwaway48476 · on Sept 24, 2024

The cells are layered but each cell stores 4 bits in QLC.

AnimalMuppet · on Sept 24, 2024

QLC = "Quad Level Cell", if you're like me and didn't know that.

d_tr · on Sept 24, 2024

Which is a strange name and underselling it btw, since there are 16 levels...

wtallis · on Sept 24, 2024

16 possible voltage levels, but a given memory cell can only exist in one voltage state at a time. The number of bits per cell is what actually matters, even if the names we assign to those types are dumb.

d_tr · on Sept 24, 2024

Sure, I just said that "QLC" basically implies two bits.
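For anyone following along, the relationship is just levels = 2^bits. A small sketch using the usual SLC/MLC/TLC/QLC names; the helper function names are made up for illustration.

```python
# Bits stored per cell for the common NAND cell types.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}

def voltage_levels(bits):
    """Number of distinguishable voltage states needed to store `bits` bits."""
    return 2 ** bits

def bits_for_levels(levels):
    """Bits that `levels` distinct states can encode (levels a power of two)."""
    return (levels - 1).bit_length()

for name, bits in BITS_PER_CELL.items():
    print(f"{name}: {bits} bit(s) per cell, {voltage_levels(bits)} voltage levels")

# Read literally, a cell with four "levels" would hold only two bits.
print("4 levels  ->", bits_for_levels(4), "bits")
print("16 levels ->", bits_for_levels(16), "bits")
```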

giancarlostoro · on Sept 24, 2024

> SSD is moving from 100+ towards 1000 layer memory chips.

Does this increase capacity, performance or both? I'm by no means a hardware guru.

karamanolev · on Sept 24, 2024

Given that this is from 2022 and it says:

> We see RibbonFETs as the best option for higher performance at reasonable power, and we will be introducing them in 2024 along with other innovations, such as PowerVia, our version of backside power delivery, with the Intel 20A fabrication process.

Did they introduce it in 2024? If not, are they still on the roadmap and for what year?

adrian_b · on Sept 24, 2024

Intel 20A was cancelled about a week ago. It was planned to be used for only one product, the Intel Arrow Lake H CPUs for laptops, to be launched in Q1 2025. Now all Intel Arrow Lake CPUs, for both desktops and laptops, will be made by TSMC, as is Lunar Lake. Intel will do only assembly and testing for them.

The cancellation was made so that Intel can put all its resources into developing the better 18A CMOS process. Intel hopes that 18A (with RibbonFETs and backside power delivery) will be its salvation.

18A is intended to be used both for Intel products and for the products of other companies, starting in the middle of 2025 (with Panther Lake for laptops and with Clearwater Forest and Diamond Rapids for servers). Intel claims that they have working samples of Panther Lake laptop CPUs and Clearwater Forest server CPUs, made with 18A.

According to Intel, the main problem that must be solved with 18A by the middle of 2025 is improving its fabrication yields, which for now are much worse than those of a mature TSMC process. Unless the yields improve, the 18A process will not be competitive in mass production.

While Intel must make serious efforts to catch up with TSMC, Samsung does not appear to be much better than Intel, because they have exactly the same problem, with fabrication yields much worse than TSMC for the processes with equivalent density.

aswanson · on Sept 24, 2024

What odds would you place on intel succeeding with 18A?

adrian_b · on Sept 24, 2024

While I hate many anonymous Intel employees who are responsible for some very ugly actions of Intel in the past, I am strongly rooting for Intel to succeed in making the 18A CMOS process good enough, because there is a desperate need in the electronics industry for more competition, after the excessive consolidation that has happened during the last 2 decades.

Unfortunately, it is completely impossible for an outsider to make any estimate of the likelihood of success of the Intel foundry division.

Based on the available public information, it would seem highly probable that Intel will reach its goals. Nevertheless, Intel employees have a history of including shameless lies in their presentations about the 10 nm process, during the many years when Intel failed to transition from the 14 nm process to the 10 nm process (now rebranded Intel 7). Therefore there is uncertainty about the truthfulness of the information currently published by Intel.

Intel has never published a post mortem analysis to explain the reasons for their failure to develop the 10 nm process, and especially the reasons for the discrepancies between the reality of the 10 nm process and the false information about it, which apparently was presented not only to the public, but also to the other Intel divisions, and perhaps also to the Intel management. While it is said that Intel failed because they delayed the adoption of extreme UV (EUV) photolithography, that explains only a small part of the 10 nm fiasco. It does not explain the great differences between the a priori estimated performance of the process and its actually achieved performance.

Because Intel has not been transparent about the causes of their earlier failures, we cannot know whether those causes have been removed. The current Intel CEO is certainly better than the previous one, but there is not enough evidence that he has overcome the organizational inertia of some parts of Intel.

During the many years when Intel had no serious competition, they became accustomed to increasing their profits by not implementing every improvement they could in their products. Instead they partitioned the possible improvements into many small steps and implemented those steps over many years, adding each year just the minimum that could be marketed as better than the previous generation, in order to minimize their manufacturing costs and maximize their profit.

For Intel to become competitive again, they should abandon this policy and jump over the intermediate steps to really better products, even if that appears to reduce their profits, because when their products are not being bought, there will be no profits at all.

While AMD will launch the Turin server CPUs next month, using up-to-date "Zen 5" cores only a couple of months after their introduction in consumer CPUs, Intel will launch the Granite Rapids server CPUs next month, and it has already launched the Sierra Forest server CPUs. Both use the already obsolete CPU cores that were also used in Meteor Lake, which have only minimal differences from the cores of Alder Lake from 2021. Only in late 2025 is Intel expected to launch server CPUs (Clearwater Forest and Diamond Rapids) with cores similar to those of the consumer CPUs (Lunar Lake and Arrow Lake S), which are launched today and one month from now.

Intel needs to somehow streamline their design and verification process so that they can update all their products more or less at once, like AMD, instead of doing as they do today: launching new products that are already obsolete and hoping that their customers will buy them anyway.

While I do not believe that the great decrease in Intel's company valuation is justified, I think that it was a good thing, because it has forced Intel management to cancel some intermediate steps that were distractions from their main goals and did not provide great enough advances, such as the 20A CMOS process or the Arrow Lake S Refresh CPUs (both planned for 2025 products). Hopefully these cancellations will bring closer the products that were planned after them.

wtallis · on Sept 24, 2024

Intel 20A got cancelled. The follow-up 18A was planned to be 20A fleshed out with a full PDK complete with transistor libraries covering high density and high performance variants, so that it could be competitive as a foundry option for non-Intel chip designs. Now all of Intel's hopes are riding on 18A next year, without the intermediate proof of a working 20A.

f_devd · on Sept 24, 2024

It seems they did "introduce" it in 20A this year, but that implies no commercial use yet; that would come with 18A in 2025[0].

[0]: https://www.intel.com/content/www/us/en/newsroom/opinion/con...

Tuna-Fish · on Sept 24, 2024

20A is supposedly ready for manufacturing, yet:

https://www.tomshardware.com/pc-components/cpus/intel-announ...

Cthulhu_ · on Sept 24, 2024

I've played too much recently, that header image immediately put me in mind of Satisfactory.

HPsquared · on Sept 24, 2024

It's a pretty appropriate game given the subject matter, tbh.

Cthulhu_ · on Sept 24, 2024

The more I read the article the more I'm inclined to agree, especially with connecting all the wiring to the transistor.

svantana · on Sept 24, 2024

(2022)

nmstoker · on Sept 24, 2024

You beat me to it!

high_na_euv · on Sept 24, 2024

In the coming months we will see whether they deliver it on the 18A node.

If they manage to do so, the stock price will reflect it.

sjg1729 · on Sept 24, 2024

http://kadin.sdf-us.org/weblog/technology/smoking-hairy-golf...

teleforce · on Sept 24, 2024

Needs (2022) in the title; I was a bit confused when I first saw the post.