Hacker News: genpfault's comments


Yes, I don't even know how I didn't know about this at the time of writing the article. But a must-read for sure!

Nice! Getting ~39 tok/s @ ~60% GPU util. (~170W out of 303W per nvtop).

System info:

    $ ./llama-server --version
    ggml_vulkan: Found 1 Vulkan devices:
    ggml_vulkan: 0 = Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
    version: 7897 (3dd95914d)
    built with GNU 11.4.0 for Linux x86_64

llama.cpp command-line:

    $ ./llama-server --host 0.0.0.0 --port 2000 --no-warmup \
    -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL \
    --jinja --temp 1.0 --top-p 0.95 --min-p 0.01 --top-k 40 --fit on \
    --ctx-size 32768

Super cool! Also with `--fit on` you don't need `--ctx-size 32768` technically anymore - llama-server will auto determine the max context size!

Nifty, thanks for the heads-up!

What am I missing here? I thought this model needs 46GB of unified memory for the 4-bit quant. The Radeon RX 7900 XTX has 24GB of memory, right? Hoping to get some insight, thanks in advance!

MoEs can be efficiently split between dense weights (attention/KV/etc) and sparse (MoE) weights. By running the dense weights on the GPU and offloading the sparse weights to slower CPU RAM, you can still get surprisingly decent performance out of a lot of MoEs.

Not as good as running the entire thing on the GPU, of course.
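
To make that split concrete, here is a rough sketch of requesting it by hand with llama-server. It assumes a build recent enough to have the `--cpu-moe` / `--n-cpu-moe` options (check `./llama-server --help`). The `20` mentioned in the trailing comment is a made-up starting point, not a tuned value.

```shell
# Put every layer on the GPU, then keep the MoE expert tensors in CPU RAM.
# Attention, KV cache, and other dense weights stay in VRAM.
./llama-server \
  -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL \
  --n-gpu-layers 99 \
  --cpu-moe \
  --ctx-size 32768

# With VRAM to spare, swap `--cpu-moe` for `--n-cpu-moe 20` so that only the
# first 20 layers' experts live on the CPU, and lower the number until VRAM is full.
```

The fewer expert tensors left on the CPU, the closer you get to full-GPU speed.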


Thanks to you I decided to give it a go as well (didn't think I'd be able to run it on a 7900 XTX) and I must say it's awesome for a local model. More than capable for more straightforward stuff. It uses the full VRAM and about 60GB of RAM, but runs at about 10 tok/s and is *very* usable.


Because TFA never bothered to define it:

Broadband Network Gateway (BNG)[1]

[1]: https://github.com/codelaboratoryltd/bng#bng-broadband-netwo...


Thanks! "OLT" was also new to me. In case others find it helpful:

> OLT = Optical Line Terminal.

> In ISP fiber (typically GPON/EPON) infrastructure, it’s the provider-side device at the central office/headend that terminates and controls the passive optical network: it connects upstream into the ISP’s aggregation/core network and downstream via fiber (through splitters) to many customers’ ONTs/ONUs, handling PON line control, provisioning, QoS, and traffic aggregation.


Thanks.. was reading the article like WTF is "BNG"


Is it the FTTX equivalent of a BRAS?


Yes, exactly. BRAS is functionally the same as BNG.


So what is BRAS?



tap tap tap tap tap


tap tap tap tap tap tap tap


tap tap tap tap tap tap tap tap tap tap tap


Getting ~150 tok/s on an empty context with a 24 GB 7900 XTX via llama.cpp's Vulkan backend.


Again, you're using some 3rd party quantisations, not the weights supplied by Nvidia (which don't fit in 24GB).


It was right there[1] in the assembly video.

[1]: https://youtu.be/pcAEqbYwixU?t=1038


> Ctrl+S to save

XOFF ignored, mumble mumble


> triple-backtick code blocks

If only :(


> Seagate sometimes has decent prices on new.

Make sure to check the "annual powered-on hours" entry in the spec sheet though, sometimes it can be significantly less than ~8766 hours.
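
For reference, ~8766 is just the number of hours in an average year (365.25 days × 24 h). A quick sketch of turning a spec-sheet figure into a duty cycle; the `rated_poh` value is a hypothetical example, so substitute whatever your drive's data sheet says:

```shell
# Hours in an average year: 365.25 days * 24 h (integer math, scaled by 100)
hours_per_year=$(( 36525 * 24 / 100 ))

# Hypothetical "annual powered-on hours" rating from a spec sheet
rated_poh=2400

# Share of the year the drive is rated to be powered on (rounded down)
duty_pct=$(( rated_poh * 100 / hours_per_year ))
echo "rated for ${duty_pct}% of a year powered on"
```

A drive rated like that is specced for roughly a quarter of 24/7 operation.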


Probably a good time to mention systemd automount. It mounts drives on first access and unmounts them again after an idle period. You save on your energy bill, but the trade-off is that the first read takes longer because the drive has to mount first.

You need two files: the mount file and the automount file. Keep this or something similar as a skeleton file somewhere and copy it over as needed:

  # /etc/systemd/system/full-path-drive-name.mount
  [Unit]
  Description=Some description of drive to mount
  Documentation=man:systemd-mount(1) man:systemd.mount(5)

  [Mount]
  # Find with `lsblk -f`
  What=/dev/disk/by-uuid/1abc234d-5efg-hi6j-k7lm-no8p9qrs0ruv
  # See file naming scheme
  Where=/full/path/drive/name
  # https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/storage_administration_guide/sect-using_the_mount_command-mounting-options#sect-Using_the_mount_Command-Mounting-Options
  Options=defaults,noatime
  # Fails if mounting takes longer than this (change as appropriate)
  TimeoutSec=1m

  [Install]
  # Defines when to load drive in bootup. See `man systemd.special`
  WantedBy=multi-user.target


  # /etc/systemd/system/full-path-drive-name.automount
  [Unit]
  Description=Automount system to complement systemd mount file
  Documentation=man:systemd.automount(5)
  Conflicts=umount.target
  Before=umount.target

  [Automount]
  Where=/full/path/drive/name
  # If not accessed for 15 minutes the drive is unmounted, so it can spin down (change as appropriate)
  TimeoutIdleSec=15min

  [Install]
  WantedBy=local-fs.target
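
To wire the two files up, something along these lines should work. It is a sketch assuming the placeholder mount point `/full/path/drive/name` from the units above; the unit file name has to be the escaped form of the `Where=` path, which `systemd-escape` will generate for you.

```shell
# Derive the unit file name from the mount point ("/" becomes "-")
systemd-escape -p --suffix=mount /full/path/drive/name

# Pick up the new unit files
sudo systemctl daemon-reload

# Enable and start only the automount unit; it pulls in the mount unit on first access
sudo systemctl enable --now full-path-drive-name.automount

# Check that the automount point is active and waiting
systemctl status full-path-drive-name.automount
```

Note that if you also enable the .mount unit itself, the drive gets mounted at boot, which defeats the purpose of mounting on demand.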


Late reply but this gave me a chuckle as a (I guess old) unix guy. Sun had automount in the late 80s and afaik it/autofs/auto.master stuff is largely unchanged (in usage, maybe not in implementation).

