Hacker News | rockwotj's comments

I have done this, integrating wasmtime into a C++ seastar.io application. I’ll have to write a post about it.

Claude is king for agentic workflows right now because it’s amazing at tool calling and following instructions well (among other things)

I've asked Gemini to not use phrases like "final boss" and to not generate summary tables unless asked to do so, yet it always ignores my instructions.

Codex ranks higher for instruction following

I thought everyone was just using OpenTelemetry traces for this? This is a classic observability problem that isn’t unique to agents. More important here, yes, but not functionally unique.

Can you explain more how otel traces solve this problem? I don't understand how it's related.

Networking costs are so high in AWS that I doubt this makes sense.

Depends on how data-heavy the work is. We run a bunch of GPU training jobs on other clouds with the data ending up in S3, and compared to what we save by getting the GPUs from the cheapest cloud available, the extra transfer costs are well worth it.

Also, just the availability of these things on AWS has been a real pain. I think every startup got a lot of credits there, so there's a flood of people trying to use them.


It’s so hard to actually benchmark languages because so much depends on the dataset. I am pretty sure that with simdjson and some tricks I could write C++ (or Rust) that could top the leaderboard (see some of the techniques from the billion row challenge!).
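One of those billion-row-challenge tricks, sketched in Python for illustration: memory-map the input and scan raw bytes instead of decoding text line by line (the real entries go much further with parallel chunking and custom parsing; the file format here is the 1BRC-style "key;value" layout):

```python
# Sketch: mmap the file and aggregate over raw bytes, avoiding per-line
# str decoding and split() allocations.
import mmap
import os
import tempfile

# Write a tiny sample file in "key;value" format.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"a;1\nb;2\na;3\n")
    path = f.name

totals = {}
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    for line in iter(mm.readline, b""):
        key, _, val = line.rstrip(b"\n").partition(b";")
        totals[key] = totals.get(key, 0) + int(val)

os.unlink(path)
```

Same idea in C++ (mmap + simdjson or a hand-rolled parser) is where the real leaderboard wins come from; the point is that the I/O and parsing strategy, not the language syntax, dominates.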

tbh for silly benchmarks like this it will ultimately be hard to beat a language that compiles to machine code, due to JIT warmup etc.

It’s hard to do benchmarks right. For example: are you testing IO performance? Are OS caches flushed between language runs? What kind of disk is used? Performance does not exist in a vacuum of just the language or algorithm.
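A minimal sketch of two of those pitfalls, warmup and single-sample timing (cache flushing needs root, e.g. writing to /proc/sys/vm/drop_caches on Linux, so it's only noted here, not performed; the function names are illustrative):

```python
# Sketch of a benchmark harness: warm up before measuring, take many
# samples, and report the median rather than trusting one run.
import statistics
import time

def bench(fn, *, warmup=3, runs=10):
    for _ in range(warmup):          # warm caches, JITs, branch predictors
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)  # median resists outlier samples

payload = "x" * 100_000
median_s = bench(lambda: payload.encode().decode())
```

Even this much is missing things a serious harness handles (interleaving runs across languages, pinning CPUs, flushing the page cache), which is the point: the harness matters as much as the code under test.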


> due to jit warmup

I think this harness actually uses JMH, which measures after warmup.


With the new GC I really love the trend of understanding that memory bandwidth is the bottleneck for many things, and that the combination of locality and SIMD is a big performance unlock.

Reminds me of the WAND vs MAXSCORE discussion by turbopuffer: https://turbopuffer.com/blog/fts-v2-maxscore


You also get a very slimmed-down interface that is usually way faster to load. One of the reasons I love HN is that it is super snappy to load and isn’t riddled with dependencies that take forever to load and display. Snappy UIs are always a breath of fresh air.


> Snappy UIs are always a breath of fresh air.

UIs used to be more responsive on slower hardware; if they took longer than the human reaction time, it was considered unacceptable.

Somewhere along the line we gave up, and instead of speeding things up we spend our time making skeleton loading animations as enticing as possible to try to stop the user from leaving.


Isn’t Codex open source? You can just go read what they do.

I have read the Gemini source, and it’s a pretty simple prompt to summarize everything when the context window is full.
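The shape of that strategy, sketched (this is my paraphrase, not Gemini's actual code; `summarize` stands in for the real LLM call and the token estimate is a crude heuristic):

```python
# Sketch of threshold-based context compaction: once the estimated token
# count exceeds the window, fold older turns into one summary message and
# keep the most recent turns verbatim.
def estimate_tokens(messages):
    # crude heuristic: roughly 4 characters per token
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages):
    # stand-in for an actual "summarize everything so far" LLM call
    return {"role": "system",
            "content": f"[summary of {len(messages)} earlier messages]"}

def compact(messages, window=1000, keep_recent=2):
    if estimate_tokens(messages) <= window:
        return messages  # still fits, nothing to do
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old), *recent]

history = [
    {"role": "user", "content": "x" * 3000},
    {"role": "assistant", "content": "y" * 3000},
    {"role": "user", "content": "short follow-up"},
    {"role": "assistant", "content": "short answer"},
]
compacted = compact(history)
```

The interesting design decisions are all in the details this glosses over: when to trigger, how much recent context to keep verbatim, and what the summary prompt asks the model to preserve.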


It should be noted that OpenAI now has a specific compaction API which returns opaque encrypted items. This is, AFAICT, different from deciding when to compact, and many open source tools should indeed be inspectable in that regard.


It's likely to either be an approach like this [0] or something even less involved.

0: https://github.com/apple/ml-clara


Thankfully all these LLM labs are heavily invested in Python, so this seems like the likely route IMO.


Just need to book a long nice walk with one of the CEOs


This isn’t the right way to look at it. It’s really server-side rendering, where the LLM is doing the markup generation instead of a template. The custom UI is usually higher level. Airbnb has been doing this for years: https://medium.com/airbnb-engineering/a-deep-dive-into-airbn...
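The server-driven UI idea in miniature (component types and field names here are made up for illustration; in the Airbnb-style setup the server emits a declarative component tree and the client maps each type to a native widget):

```python
# Sketch of server-driven UI: the server (or an LLM) emits a declarative
# spec; the client walks the tree and renders each node type it knows.
spec = {
    "type": "card",
    "children": [
        {"type": "text", "value": "Listing: Cozy cabin"},
        {"type": "button", "label": "Book"},
    ],
}

def render(node, depth=0):
    pad = "  " * depth
    if node["type"] == "text":
        return pad + node["value"]
    if node["type"] == "button":
        return pad + f"[{node['label']}]"
    # container node: render children one level deeper
    lines = [pad + node["type"] + ":"]
    lines += [render(child, depth + 1) for child in node.get("children", [])]
    return "\n".join(lines)

output = render(spec)
```

Whether the spec comes out of a template engine or an LLM, the client contract is the same, which is why this is closer to SSR than to "the model invents a new UI."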

