We have a GPU/CPU fusion chip that is unremarkable, performant, and runs well under Linux out of the box. There isn't a lot of novelty in that, which is, in itself, pretty remarkable.
Plus, although I can't really swear to understand how these chips work, my read is that this is basically a graphics card that can be configured with 64 GB of memory. If I'm not misreading that, it actually sounds quite interesting; even AMD's hopeless compute drivers might potentially be useful for AI work if enough RAM gets thrown into the mix. Although my burn wounds from buying AMD haven't healed yet, so I'll let someone else fund that experiment.
I've done it. I have a GPD Pocket 4 with 64 GB of RAM and the less capable HX 370 Strix Point chip.
With ollama, hardware acceleration through ROCm doesn't really work. ROCm doesn't officially support gfx1150 (Strix Point, RDNA 3.5), though you can override it to fake gfx1151 (Strix Halo, also RDNA 3.5 and UMA), and it works.
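For reference, that override is done through an environment variable. The variable itself is real ROCm machinery; the exact version string to spoof is the part you'd want to verify against your own ROCm release. A rough sketch:

```shell
# Hedged sketch: HSA_OVERRIDE_GFX_VERSION makes ROCm treat the GPU as a
# different ISA target. 11.5.1 (gfx1151, Strix Halo) is the spoofed target
# described above; adjust to whatever your ROCm build ships kernels for.
HSA_OVERRIDE_GFX_VERSION=11.5.1 ollama serve
```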
I think I got it to work for smaller models that fit entirely into the preallocated VRAM buffer, but my machine only allows for statically allocating up to 16 GB for the GPU, and where's the fun in that? This is a unified memory architecture chip, I want to be able to run 30+ GB models seamlessly.
It turns out you can. Just build llama.cpp from source with the Vulkan backend enabled. A 2 GB static VRAM allocation is enough, and any additional data spills into GTT, which the driver maps into the GPU's address space seamlessly.
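If anyone wants to try, a Vulkan build looks roughly like this (the `GGML_VULKAN` CMake option is from llama.cpp's build system; you'll need the Vulkan SDK/headers installed first):

```shell
# Build llama.cpp with the Vulkan backend instead of ROCm.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j"$(nproc)"
```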
You can see a benchmark I performed of a small model on GitHub [0], but I've run up to Gemma 3 27b (~21 GB) and other large models with decent performance, and Strix Halo is supposed to have 2-3x the memory bandwidth and compute performance of my chip. Even 8b models perform well with the GPU in power-saving mode, staying inside ~8 W.
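For anyone wanting to reproduce this kind of benchmark, llama.cpp ships a llama-bench tool; an invocation looks roughly like this (the model filename here is just an example, not the exact file I used):

```shell
# Measure prompt processing and token generation speed;
# -ngl 99 offloads all layers to the GPU.
./build/bin/llama-bench -m ./models/gemma-3-27b-it-Q4_K_M.gguf -ngl 99
```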
Come to think of it, those results might make a good blog post.
> We have a GPU/CPU fusion chip that is unremarkable, performant, and runs well under Linux out of the box. There isn't a lot of novelty in that, which is, in itself, pretty remarkable.
With this reasoning, I'd probably argue all modern CPUs and GPUs aren't particularly remarkable/novel. And that could even be fine.
At the end of the day, these benchmarks are all meant to inform on relative performance, price, and power consumption so end users can make informed decisions (imo). The relative comparisons are low-key just as important as the new benchmark data point.