That's true, and one can bias logits in llama.cpp and friends too, but those biases are global: they're applied identically at every decoding step rather than being specified per token position. Uploading a grammar or a wasm binary to the inference engine does seem more expressive.
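To make the contrast concrete, here's a minimal sketch against llama.cpp's HTTP server (assuming one is running on localhost:8080; the token id is illustrative, not verified against any particular vocab). `logit_bias` takes `[token_id, bias]` pairs that hold for the whole completion, whereas a GBNF `grammar` masks the legal token set step by step:

```python
import requests

# Global logit bias: the chosen token is down-weighted at EVERY
# decoding step, not at one particular position.
requests.post("http://localhost:8080/completion", json={
    "prompt": "Say hi:",
    "logit_bias": [[15043, -5.0]],  # [token_id, bias], applied globally
})

# A GBNF grammar instead constrains which tokens are legal at each
# step of generation, so the allowed set changes per position.
grammar = r'''
root ::= "yes" | "no"
'''
requests.post("http://localhost:8080/completion", json={
    "prompt": "Is the sky blue? Answer:",
    "grammar": grammar,
})
```

With the grammar, the sampler rejects any token that would take the output outside the grammar at that point, which is exactly the per-position expressiveness the global bias knob lacks.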