
I wonder why they chose per minute? That method of rate limiting would seem to defeat their entire value proposition.


In general, per-minute rate limiting caps load spikes, and load spikes are what you pay for: they force you to ramp up capacity, and you are then usually slow to ramp back down so that you don't pay the ramp-up cost too many times. A VM might boot relatively fast, but loading a large model into GPU memory takes time.
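
To make the idea concrete, here is a minimal sketch of a per-minute (fixed-window) limiter. It is only an illustration, not how any particular provider implements it: the class name `PerMinuteLimiter`, the `limit` parameter, and the injectable `clock` are all made up for this example.

```python
import time
from typing import Callable, Optional


class PerMinuteLimiter:
    """Hypothetical fixed-window limiter: at most `limit` requests per minute."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.clock = clock  # injectable so the behaviour can be checked without sleeping
        self.window: Optional[int] = None  # index of the current one-minute window
        self.count = 0  # requests admitted in the current window

    def allow(self) -> bool:
        window = int(self.clock() // 60)
        if window != self.window:
            # A new minute has started: reset the counter.
            self.window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        # Over the cap: the spike is rejected instead of forcing a capacity ramp-up.
        return False
```

With `limit=3`, a burst of five calls inside the same minute admits the first three and rejects the other two; the next minute starts from zero again. A fixed window still lets a burst straddle a minute boundary, so a sliding window or token bucket would be the usual refinement if that matters.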



