Hacker News | sergiomattei's comments

Zipped up tightly? This has been all over mainstream news outlets, social media, everywhere.

We are commenting on a submission linked to the New York Times.


Compared to, say, the coverage from Ukraine during February 2022, actual information getting out from the ground is sparser. Or the opening "shock and awe" campaign in Iraq in 2003, there were Western and international media in Baghdad reporting on it in real time, shooting video from their hotels:

https://youtu.be/m8KimNtB9HI

The reason why isn't really a mystery: Iran has never been exactly welcoming to Western media, and internet access there was intentionally shut off after the recent protests. There's plenty of coverage (it's front page everywhere) but a paucity of information.

It's all over social media, but hardly any of that is from Iranians in Iran; it's mostly people outside the country, like you and me, yapping. Occasionally you'll hear something second-hand from someone with family in Iran who managed some brief connectivity.


They can't really not talk about it, it's a world war unfolding. It's going to affect every person alive. But as much as it can be, it is absolutely being mitigated in traditional media.

I built my own harness on Elixir/Erlang[0]. It's very nice, but I see why TypeScript is a popular choice.

No serialization/JSON-RPC layer between a TS CLI and an Elixir server. TS TUI libraries are really nice (I rewrote the Elixir-based CLI prototype in TS because it was slowing me down). Easy to extend with custom tools without having to write them in Elixir, which can be intimidating.

But you're right that Erlang's computing vision lends itself super well to this problem space.

[0]: https://github.com/matteing/opal


No shade, I think it looks cool and will likely use it, but next time maybe disclose that you’re the founder?


Good point and I will keep that in mind next time.

I am not a founder of this though. This is not a business. It is an open-source project.


This looks nice, congrats on the launch. I'm trying this at work on Monday; I suffer from this problem with tons of MCP tools that are MCP-only, not CLI, and that completely fill my context window.

First thoughts: it seems the broader community is moving towards Agent Skills as a "replacement" for MCPs to tackle the context pollution problem.

Agent harnesses like Pi don't ship with MCP support as an intentional design choice. MCP servers[0] are being rewritten as pure CLIs in order to support this new scenario.

Thoughts on this?

[0]: https://github.com/microsoft/playwright-cli


The problem with skills is that agents have to load the entire skill to use it. You could break big skills down into multiple smaller ones, but that's a hassle compared to cmcp.

With cmcp you get to use all the available MCPs while still not bloating context.


Papers like these are a much-needed bucket of ice water. We anthropomorphize these systems too much.

Skimming through the conclusions and results, the authors find that LLMs exhibit failures across many axes we'd consider demonstrative of AGI: moral reasoning, simple things a toddler can do like counting, etc. They're just not human, and you can reasonably hypothesize most of these failures stem from their nature as next-token predictors that happen to usually do what you want.

So. If you've got OpenClaw running and thinking you've got Jarvis from Iron Man, this is probably a good read to ground yourself.

Note there's a GitHub repo compiling these failures from the authors: https://github.com/Peiyang-Song/Awesome-LLM-Reasoning-Failur...


Isn't it strange that we expect them to act like humans even though a model, once trained, remains static? How is this supposed to be even close to "human-like" anyway?


> Isn't it strange that we expect them to act like humans even though after a model was trained it remains static?

An LLM is more akin to interacting with a quirky human who has anterograde amnesia: it can't form long-term memories anymore; it can only follow you through a long-ish conversation.


If we could reset a human to a prior state after a conversation then would conversations with them not still be "human like"?

I'm not arguing that LLMs are human here, just that your reasoning doesn't make sense.


Henry Molaison was exactly this.


I mean you can continue to evolve the model weights but the performance would suck so we don't do it. Models are built to an optimal state for a general set of benchmarks, and weights are frozen in that state.


> We anthropomorphize these systems too much.

They're sold as AGI by the cloud providers and the whole stock market scam will collapse if normies are allowed to peek behind the curtain.


The stock market being built on conjecture? Surely not sir.


> conclude that LLMs exhibit failures across many axes we'd find to be demonstrative of AGI.

Which LLMs? There's tons of them and more powerful ones appear every month.


True but the fundamental architecture tends not to be radically different, it's more about the training/RL regime


But the point is that to even start to claim that a limitation holds for all LLMs you can't use empirical results that have been demonstrated only for a few old models. You either have a theoretical proof, or you have empirical results that hold for all existing models, including the latest ones.


Most of the claims are likely falsified using current models. I wouldn’t take many of them seriously.


I wouldn't take baseless "likely" claims or the people who make them seriously.


I falsified it on another thread


https://en.wikipedia.org/wiki/List_of_cognitive_biases

Specifically, the idea that LLMs fail to solve some tasks (where humans also periodically fail) due to fundamental limitations may well be an instance of the fundamental attribution error.


My small agent harness[0] does this as well.

The tasks tool validates its input as a DAG; tasks with no unmet dependencies become cheap parallel subagent spawns using Erlang/OTP.

It works quite well. The only problem I’ve faced is getting it to break down tasks using the tool consistently. I guess it might be a matter of experimenting further with the system prompt.

[0]: https://github.com/matteing/opal
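The wave-based idea (validate the graph up front, then repeatedly spawn everything that isn't blocked) can be sketched outside of Elixir too. This is not Opal's actual code, just a minimal Python illustration with hypothetical task names, using a thread pool where the real harness uses OTP processes:

```python
from concurrent.futures import ThreadPoolExecutor

def validate_dag(tasks):
    """Reject task graphs with cycles by iteratively peeling off ready tasks (Kahn-style)."""
    done, remaining = set(), dict(tasks)
    while remaining:
        ready = [t for t, deps in remaining.items() if set(deps) <= done]
        if not ready:
            raise ValueError("cycle detected in task graph")
        done.update(ready)
        for t in ready:
            del remaining[t]

def run_dag(tasks, worker):
    """Run each wave of non-blocked tasks in parallel until the whole graph is done."""
    validate_dag(tasks)
    done, results = set(), {}
    with ThreadPoolExecutor() as pool:
        while len(done) < len(tasks):
            ready = [t for t, deps in tasks.items()
                     if t not in done and set(deps) <= done]
            # Each ready task becomes a cheap parallel "subagent" spawn.
            for t, result in zip(ready, pool.map(worker, ready)):
                results[t] = result
            done.update(ready)
    return results

# Hypothetical task graph: "write tests" and "write docs" run in
# parallel once "plan" finishes; "ship" waits on both.
tasks = {
    "plan": [],
    "write tests": ["plan"],
    "write docs": ["plan"],
    "ship": ["write tests", "write docs"],
}
print(run_dag(tasks, lambda name: f"done: {name}"))
```

The consistency problem mentioned above is orthogonal to this scheduler: it's about getting the model to emit a well-formed graph in the first place, which is why the system prompt ends up doing the heavy lifting.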


This is probably the most beautiful homepage + docs combo I’ve ever seen. The copy is awesome too. It feels human.

Great work.


Much appreciated. Definitely A LOT of back and forth on the copy, design was actually very few iterations. Docs of course have their own rule in AGENTS.md, since Claude seems only slightly less likely than your average software engineer to forget about documentation. :)


I feel the same way.

> That includes code outside of the happy path, like error handling and input validation. But also other typing exercises like processing an entity with 10 different types, where each type must be handled separately. Or propagating one property through the system on 5 different types in multiple layers.

With AI, I feel I'm less caught up in the minutia of programming and have more cognitive space for the fun parts: engineering systems, designing interfaces and improving parts of a codebase.

I don't mind this new world. I was never too attached to my ability to pump out boilerplate at a rapid pace. What I like is engineering and this new AI world allows me to explore new approaches and connect ideas faster than I've ever been able to before.


> explore new approaches and connect ideas faster

This is the hidden superpower of LLMs: prototyping without attachment to the outcome.

Ten years ago, if you wanted to explore a major architectural decision, you would be bogged down for weeks in meetings convincing others, then a few more weeks making it happen. Then if it didn't work out, it feels like failure and everyone gets frustrated.

Now it's assumed you can make it work fast - so do it four different ways and test it empirically. LLMs bring us closer to doing actual science, so we can do away with all the voodoo agile rituals and high emotional attachment that used to dominate the decision process.


I basically just _accidentally_ added a major new feature to one of my projects this week.

In the sense that, I was trying to explain what I wanted to do to a coworker and my manager, and we kept going back and forth trying to understand the shape of it and what value it would add and how much time it would be worth spending and what priority we should put on it.

And I was like -- let me just spend like an hour putting together a partially working prototype for you, and claude got _so close_ to just completely one-shotting the entire feature in my first prompt, that I ended up spending 3 hours just putting the finishing touches on it and we shipped it before we even wrote a user story. We did all that work after it was already done. Claude even mocked up a fully interactive UI for our UI designer to work from.

It's literally easier and faster to just tell claude to do something than to explain why you want to do it to a coworker.


That's only because no one understood agile or XP and they've become a "no one actually does that stuff" joke to many. I have first hand experience with prototyping full features in a day or two and throwing the result away. It comes with the added benefit of getting your hands dirty and being able to make more informed decisions when doing the actual implementation. It has always been possible, just most people didn't want to do it.


Are you not concerned that this world is deeply tied to you having an internet connection to one of a couple companies' servers? They can jack up the price, cut you off, etc.


Seeing how things are moving, I'm expecting for compute requirements to go down over a longer time horizon, as most technologies do.

I'd rather spend my time preparing for this new world now.


Not going to last long though, at least not professionally. AI will do the spec and architecture too. The LLM will do the entire pipeline between customer or market research to deployment. This is already possible with bug fixes pretty much. And many features too depending on the business.


If AI gets to that level generally, there won't be a customer, a market research department, or a software company at all.

But if AI is capable of that it’s not a big step to being capable of doing any white collar job, and we’ll either reorganize our economy completely or collapse.


I don't know. LLMs are great at writing code; but you have to have the right ideas to get decent output.

I spend tons of time handholding LLMs--they're not a replacement for thinking. If you give them a closed-loop problem where it's easy to experiment and check for correctness, then sure. But many problems are open-loop where there's no clear benchmark.

LLMs are powerful if you have the right ideas. Input = output. Otherwise you get slop that breaks often and barely gets the job done, full of hallucinations and incorrect reasoning. Because they can't think for you.


Dude, this is awesome. El Nuevo Dia on the Wii is peak bori brain. :)


Thanks, Sergio! Appreciate it. Was definitely fun getting it all working and seeing that familiar logo pop up on the Wii of all places!


Real courageous from that guy calling someone a "fat uncle" on a Twitter thread. Could've applied that same energy IRL and told him to tone it down.

