It's a huge industry, so a lot. The job is really stressful and has a lot of employee churn, so it's not really something I feel bad about. Pressing elevator buttons used to be a job too.
feedback is generated based on evals.
example:
eval: function foo wasn't triggered even though [...]
feedback (exaggerated):
1. change stage prompt
2. change function description
3. add extra instructions to the end of the context
metrics are easy to generalize (e.g. call transfer rate), but the baseline is different for each agent, so we interpret only the changes, not the absolute values (in the context of self-improvement).
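To make the "changes, not absolute values" point concrete, here is a rough sketch of comparing each agent against its own baseline. The `MetricWindow` shape, the field names, and the numbers are all made up for illustration; this is not the actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class MetricWindow:
    """Counts for one agent over one evaluation window (hypothetical shape)."""
    calls: int
    transfers: int

    @property
    def transfer_rate(self) -> float:
        # Fraction of calls that ended in a transfer; 0.0 for an empty window.
        return self.transfers / self.calls if self.calls else 0.0


def relative_change(before: MetricWindow, after: MetricWindow) -> float:
    """Change in transfer rate relative to the agent's own baseline."""
    base = before.transfer_rate
    if base == 0.0:
        raise ValueError("baseline rate is zero; relative change is undefined")
    return (after.transfer_rate - base) / base


# Two agents with very different baselines (30% vs 5%, illustrative numbers).
agent_a = relative_change(MetricWindow(1000, 300), MetricWindow(1000, 240))
agent_b = relative_change(MetricWindow(1000, 50), MetricWindow(1000, 40))

print(f"agent A: {agent_a:+.0%}, agent B: {agent_b:+.0%}")
```

Both agents come out with the same relative improvement even though their absolute transfer rates are nowhere near each other, which is why the absolute value alone says little across agents.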
Noisy audio is OK, but it doesn't work that well when there are multiple clear speakers and not much noise. We are planning to add speaker diarization to address this.
Weird, because it seems like the demo video uses pretend data anyway ("Mr. Smith", etc). I agree, I would like to see a more fully baked demo where you connect it to a test CRM and a toy order API and get it to answer several customer queries using live information.
This is also fairly common for the initialisms of international bodies that have multiple official languages, so as not to favour any one of them.
"ISO" is one fairly well-known example: the International Organization for Standardization in English, Organisation internationale de normalisation in French, and Международная организация по стандартизации in Russian, its three official languages.
Recently I've noticed a discrepancy between how we manage business relations and how we manage personal relations. You could argue that the latter are more important, but they have nowhere near the multitude of tools and solutions that the former have. It is nice that Monica tries to fix this.