From my personal experience building a few AI IVR demos with Asterisk in early 2025, testing STT/TTS/inference products from a handful of vendors, a reliable maximum latency of 2-3 seconds sounds like a definite improvement. Just a year ago I saw times from 3 to 8 seconds even for short inputs producing short outputs. Half of that was no doubt over-committed vendor resources, but the raw execution performance of these models is clearly improving too.
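
For anyone wanting to sanity-check latency claims themselves, per-stage timing in a pipeline like that is cheap to capture. A minimal sketch follows; the STT/LLM/TTS stage functions are hypothetical placeholders for whichever vendor SDK you happen to be testing:

```python
import time

def timed(label, fn, *args):
    """Run one pipeline stage and print its wall-clock latency."""
    start = time.monotonic()
    result = fn(*args)
    print(f"{label}: {time.monotonic() - start:.2f}s")
    return result

# Hypothetical stage functions -- substitute your vendor's SDK calls:
# text   = timed("STT", stt_transcribe, audio_chunk)
# reply  = timed("LLM", llm_generate, text)
# speech = timed("TTS", tts_synthesize, reply)
```

Breaking the round trip down this way also makes it obvious whether a slow turn is the model itself or an overloaded vendor endpoint, since the latter shows up as high variance on the same stage across calls.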