I've asked Gemini to not use phrases like "final boss" and to not generate summary tables unless asked to do so, yet it always ignores my instructions.
I thought everyone was just using OpenTelemetry traces for this? This is just a classic observability problem that isn't unique to agents. More important, yes, but not functionally unique.
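For reference, a minimal sketch of what that looks like with the OpenTelemetry Python SDK - the span and attribute names (`agent.step`, `llm.prompt_chars`) and the `call_model` function are just illustrative, not any standardized agent-tracing convention:

```python
# Minimal sketch: wrap one agent step in an OpenTelemetry span.
# Needs the opentelemetry-sdk package; attribute names are illustrative.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter())  # swap for an OTLP exporter in practice
)
tracer = trace.get_tracer("agent")

def run_step(prompt: str) -> str:
    with tracer.start_as_current_span("agent.step") as span:
        span.set_attribute("llm.prompt_chars", len(prompt))
        response = call_model(prompt)  # hypothetical stand-in for your actual LLM call
        span.set_attribute("llm.response_chars", len(response))
        return response
```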
Depends on how data-heavy the work is. We run a bunch of GPU training jobs on other clouds with the data ending up in S3 - the extra transfer costs, compared to what we save by getting the GPUs from the cheapest cloud available, make it well worth it (rough break-even math below).
Also, plain availability of these things on AWS has been a real pain - I think every startup got a lot of credits there, so there's a flood of people all trying to use them.
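For anyone who wants the back-of-the-envelope version of that tradeoff, it's just this - every number below is a placeholder to swap for your own dataset size and pricing, not an actual quote:

```python
# Break-even check: does the GPU price difference cover the cross-cloud transfer?
# All values are placeholders - plug in your own numbers.
dataset_gb = 500          # data moved between clouds per training run (placeholder)
transfer_per_gb = 0.09    # $/GB cross-cloud transfer (placeholder)
gpu_hours = 200           # GPU-hours per run (placeholder)
expensive_rate = 4.00     # $/GPU-hour on the pricier cloud (placeholder)
cheap_rate = 2.00         # $/GPU-hour on the cheapest cloud (placeholder)

transfer_cost = dataset_gb * transfer_per_gb
gpu_savings = gpu_hours * (expensive_rate - cheap_rate)
print(f"transfer ${transfer_cost:.0f} vs savings ${gpu_savings:.0f} "
      f"-> net ${gpu_savings - transfer_cost:.0f} per run")
```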
It’s so hard to actually benchmark languages because so much depends on the dataset. I am pretty sure that with simdjson and some tricks I could write C++ (or Rust) that would top the leaderboard (see some of the techniques from the billion row challenge!).
tbh for silly benchmarks like this it will ultimately be hard to beat a language that compiles to machine code, due to JIT warmup etc.
It’s hard to do benchmarks right: for example, are you testing IO performance? Are OS caches flushed between language runs? What kind of disk is used, etc.? Performance does not exist in a vacuum of just the language or algorithm.
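To illustrate, a harness that at least controls for the page cache might look like the sketch below - Linux-specific, the cache drop needs root, and the contender commands are obviously placeholders:

```python
# Rough benchmark-harness sketch: drop the Linux page cache between runs so every
# language reads the same data file cold from disk. Linux-only; needs root (sudo).
import subprocess
import time

DATA_FILE = "data.json"                      # shared input file (placeholder)
CONTENDERS = {
    "cpp": ["./bench_cpp", DATA_FILE],       # placeholder commands
    "java": ["java", "Bench", DATA_FILE],
}

def drop_page_cache() -> None:
    subprocess.run(["sync"], check=True)
    # Equivalent to: echo 3 > /proc/sys/vm/drop_caches
    subprocess.run(["sudo", "tee", "/proc/sys/vm/drop_caches"],
                   input=b"3", stdout=subprocess.DEVNULL, check=True)

for name, cmd in CONTENDERS.items():
    drop_page_cache()
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    print(f"{name}: {time.perf_counter() - start:.3f}s (cold cache)")
```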
With the new GC, I really love the trend of understanding that memory bandwidth is the bottleneck for many things, and that the combination of locality and SIMD is a big performance unlock.
You also get a very slimmed-down interface that is usually way faster to load. One of the reasons I love HN is that it is super snappy and isn’t riddled with dependencies that take forever to load and display. Snappy UIs are always a breath of fresh air.
UIs used to be more responsive on slower hardware; if they took longer than human reaction time, it was considered unacceptable.
Somewhere along the line we gave up, and instead we spend our time making skeleton loading animations as enticing as possible to try to stop the user from leaving, rather than speeding things up.
It should be noted that OpenAI now has a specific compaction API which returns opaque encrypted items. This is, AFAICT, different from deciding when to compact yourself, and many open source tools should indeed be inspectable in that regard.
This isn’t the right way to look at it. It’s really server-side rendering where the LLM is doing the markup generation instead of a template. The custom UI is usually higher level. Airbnb has been doing this for years: https://medium.com/airbnb-engineering/a-deep-dive-into-airbn...
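In other words, the model emits a structured description and the server turns it into markup, roughly like the sketch below - `call_llm` and the component schema are made up for illustration, not Airbnb's actual system:

```python
# Sketch of server-driven UI with an LLM in the template's seat: the model returns
# a structured component spec, and the server renders only components it knows.
# call_llm() and the schema are hypothetical, for illustration only.
import json
from html import escape

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; assume it returns JSON like:
    # {"components": [{"type": "heading", "text": "..."},
    #                 {"type": "card", "title": "...", "body": "..."}]}
    raise NotImplementedError

RENDERERS = {
    "heading": lambda c: f"<h2>{escape(c['text'])}</h2>",
    "card": lambda c: (f"<div class='card'><h3>{escape(c['title'])}</h3>"
                       f"<p>{escape(c['body'])}</p></div>"),
}

def render_page(user_query: str) -> str:
    spec = json.loads(call_llm(user_query))
    # The LLM picks and fills components; it never writes raw HTML itself.
    return "\n".join(RENDERERS[c["type"]](c)
                     for c in spec["components"] if c["type"] in RENDERERS)
```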