I wonder why they chose per minute? That method of rate limiting would seem to d...

p91paul · 2026-01-20T23:56:27 1768953387

In general, with per minute rate limiting you limit load spikes, and load spikes are what you pay for: they force you to ramp up your capacity, and usually you are then slow to ramp down to avoid paying the ramp up cost too many times. A VM might boot relatively fast, but loading a large model into GPU memory takes time.