mirsadm · 2026-02-03T07:42:34 1770104554

I use Claude Code a lot but one thing that really made me concerned was when I asked it about some ideas I have had which I am very familiar with. Its response was to constantly steer me away from what I wanted to do towards something else which was fine but a mediocre way to do things. It made me question how many times I've let it go off and do stuff without checking it thoroughly.

physicsguy · 2026-02-03T07:43:42 1770104622

I've had quite a bit of the "tell it to do something in a certain way", it does that at first, then a few messages of corrections and pointers, it forgets that constraint.

embedding-shape · 2026-02-03T10:49:34 1770115774

> it does that at first, then a few messages of corrections and pointers, it forgets that constraint.

Yup, most models suffer from this. Everyone is raving about million tokens context, but none of the models can actually get past 20% of that and still give as high quality responses as the very first message.

My whole workflow right now is basically composing prompts outside of the agent, letting it run with them and, if something is wrong, restarting the conversation from 0 with a rewritten prompt. None of that "No, what I meant was ..." but instead rewrite it so the agent essentially solves it without having to do back and forth, just because of this issue that you mention.

Seems to happen in Codex, Claude Code, Qwen Coder and Gemini CLI as far as I've tested.
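
A minimal sketch of that restart-instead-of-correct loop, in Python. `run_agent` and `review` are hypothetical placeholders (not any real CLI or API): the first is assumed to start a brand-new session for every call, the second is assumed to return a correction string, or `None` when the result is acceptable. Each correction is folded into the prompt itself rather than appended to a growing conversation.

```python
from typing import Callable, List, Optional


def run_with_restarts(
    base_prompt: str,
    run_agent: Callable[[str], str],         # hypothetical: fresh session per call
    review: Callable[[str], Optional[str]],  # hypothetical: correction, or None if OK
    max_attempts: int = 5,
) -> str:
    """Rerun from scratch with a rewritten prompt until review passes."""
    constraints: List[str] = []
    for _ in range(max_attempts):
        # Rebuild the full prompt every time; no chat history is carried over.
        prompt = base_prompt
        if constraints:
            prompt += "\n\nConstraints:\n" + "\n".join(f"- {c}" for c in constraints)
        result = run_agent(prompt)
        correction = review(result)
        if correction is None:
            return result
        # Fold the correction into the next prompt instead of replying to the agent.
        constraints.append(correction)
    raise RuntimeError("no acceptable result within max_attempts")
```

The point of the design is that every attempt sees the constraints at the very start of a short context, rather than buried a few messages deep.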

physicsguy · 2026-02-03T14:39:13 1770129553

Yes, agreed. I find it interesting that people are saying they're building these huge multi-agent workflows since the projects I've tried it on are not necessarily huge in complexity. I've tried a variety of different things re: instructions files, etc. at this point.

embedding-shape · 2026-02-03T15:08:37 1770131317

So far, I haven't yet seen any demonstration of those kinds of multi-agent workflows ending up with code that won't fall down over itself in some days/weeks. Most efforts so far seem to have been focused on producing as much code as possible, as fast as possible, while what I'd like to see, if anything, is the opposite of that.

Anytime I ask for demonstration of what the actual code looks like, when people start talking about their own "multi-agent orchestration platforms" (or whatever), they either haven't shared anything (yet), don't care at all about how the code actually is and/or the code is a horrible vibeslopped mess that contains mostly nonsense.

throwdbaaway · 2026-02-03T14:56:28 1770130588

I call this the Groundhog Day loop

embedding-shape · 2026-02-03T15:05:50 1770131150

That's a strange name, why? It's more like an "iterate and improve" loop, "Groundhog Day" to me would imply "the same thing over and over", but then you're really doing something wrong if that's your experience. You need to iterate on the initial prompt if you want something better/different.

ozlikethewizard · 2026-02-03T07:55:11 1770105311

Call me a conspiracy theorist, and granted much of this could be attributed to the fact that the majority of code in existence is shit, but I'm convinced that these models are trained and encouraged to produce code that is difficult for humans to work on. Further driving and cementing the usage of them when you inevitably have to come back and fix it.

exceptione · 2026-02-03T08:11:26 1770106286

I don't think they would be able to have an LLM without the flaws. The problem is that an LLM cannot make a distinction between sense and nonsense in the logical way. If you train an LLM on a lot of sensible material, it will try to reproduce it by matching training material context and prompt context. The system does not work on the basis of logical principles, but it can sound intelligent.

I think LLM producers can improve their models by quite a margin if customers train the LLM for free, meaning: if people correct the LLM, the companies can use the session context + feedback as training data. This enables more convincing responses for finer nuances of context, but it still does not work on logical principles.

LLM interaction with customers might become the real learning phase. This doesn't bode well for players late in the game.

CatMustard · 2026-02-03T08:13:59 1770106439

This could be the case even without an intentional conspiracy. It's harder to give negative feedback to poor quality code that's complicated vs. poor quality code that's simple.

Hence the feedback these models get could theoretically funnel them to unnecessarily complicated solutions.

No clue whether any research has been done into this, just a thought OTTOMH.

trcf23 · 2026-02-03T08:09:16 1770106156

Or it takes a lot of time, effort and intelligence to produce good code and AI is not there yet…

Perz1val · 2026-02-03T08:36:01 1770107761

It is a mathematical, averaging model after all

xgb84j · 2026-02-03T08:16:31 1770106591

Mediocre is fine for many tasks. What makes a good software engineer is that he spots the few places in every software where mediocre is not good enough.

mirsadm · 2026-02-02T13:26:23 1770038783

Besides the general awfulness of Windows that you describe, have you looked at C:\Windows recently? It is an unorganised mess with multiple different case styles all over the place. I get this is not that important but I can't help but feel it illustrates just how little care is taken behind the scenes. The whole thing seems like a nightmare to deal with.

I had a fresh install of Windows on a new computer which refused to install updates until I ran a bunch of commands in the "terminal". The whole thing is beyond fixing at this point.

direwolf20 · 2026-02-02T14:53:39 1770044019

C:\Windows has always been that way. There's no demand for it to be organised in any way other than unique filenames.

mirsadm · 2026-01-25T22:33:56 1769380436

I've used Claude Code to do the same (large refactor). It has worked fairly well but it tends to introduce really subtle changes in behaviour (almost always negative) which are very difficult to identify. Even worse, if you use it to fix those issues, it can get stuck in a loop of constantly reintroducing slightly different issues, leading to fixing things over and over again.

Overall I like using it still but I can also see my mental model of the codebase has significantly degraded which means I am no longer as effective in stopping it from doing silly things. That in itself is a serious problem I think.

soulofmischief · 2026-01-25T23:20:54 1769383254

Yes, if you don't stay on top of things and rule with an iron fist, you will take on tons of hidden tech debt using even Opus 4.5. But if you manage to review carefully and intercede often, it absolutely is an insane multiplier, especially in unfamiliar domains.

mirsadm · 2026-01-22T19:18:48 1769109528

There's never been a case in my long programming career so far where knowing the low level details has not benefited me. The level of value varies but it is always positive.

When you use LLMs to write all your code you will lose (or never learn) the details. Your decision making will not be as good.

cstrahan · 2026-01-22T20:45:26 1769114726

Or you already know all of the details, and you don’t want typing to be the bottleneck to getting things done.

brookst · 2026-01-22T20:35:31 1769114131

This is true.

However, your ability to write specs and validate requirements before starting to build will increase.

It’s just trading deep hands-on expertise for deep product/spec expertise.

No different than how riding the bus all the time instead of driving results in different skill development (assuming productive time on the bus).

mirsadm · 2026-01-23T06:51:36 1769151096

I think there is a big difference. You could and should have both kinds of knowledge. This applies whether you're a lowly programmer or a CEO. Knowing the details will always help you make better decisions.

brookst · 2026-01-26T19:36:25 1769456185

That’s the credo I’ve lived my life by, but I’ve come to believe it’s not entirely true: knowing the details can lead to ratholes and blurring requirements / solutions / etc. Some of the best execs I’ve met are good precisely because they focus on the business layer, and delegate / rely on others to abstract out the details.

I can’t do that. But I’m coming around to the value in it.

theshrike79 · 2026-01-23T13:12:02 1769173922

I've seen cases in my career where people knowing the low level things is actually a hindrance.

They start to fight the system, trying to optimise things by hand for an extra 2% of performance while adding 100% of extra maintenance cost because nobody understands their hand-crafted assembler or C code.

There will always be a place for people who do that, but in the modern world in most cases it's cheaper to just throw more money at hardware instead of spending time optimising - if you control the hardware.

If things run on customers' devices, then you need the low level gurus again.

iso1631 · 2026-01-23T11:14:31 1769166871

So when you use a compiler to create your assembly you will lose (or never learn) the details.

mirsadm · 2026-01-17T14:26:07 1768659967

I would disagree on the huge boost to productivity but it is a very useful tool.

mirsadm · 2026-01-10T07:23:30 1768029810

What are those settings? I have the same TV as my monitor.

legitronics · 2026-01-10T07:29:40 1768030180

Probably something to do with the RGB sub-pixel order/layout being different. https://en.wikipedia.org/wiki/Subpixel_rendering

When the OS assumes correctly what the monitor actually looks like, you get even better text rendering. When it guesses wrong you get a horrible mess.

mirsadm · 2026-01-07T18:51:16 1767811876

They pay people to generate open source libraries? I'd love to see it

mirsadm · 2026-01-06T08:37:32 1767688652

Very unlikely. The reason Garmin watches are successful is because they've carved out an audience (athletes, health and exercise focused). Pebble might have a nice UI but most people would be better off with an Apple Watch or whatever the current flavour of the week is on Android

apparent · 2026-01-06T16:56:28 1767718588

I think a lot of people bought AWs because they seemed like the right thing to get, integrated easily, and were more or less easy to use.

But most people I know who have AWs don't use most of the functionalities they provide. If you went up to 20 random AW wearers and asked them if they would give up a bunch of features they don't use (like the awful Siri assistant) in exchange for 15-30x the battery life, I think a lot of them would say yes.

Add onto that the fact that Pebbles are cheaper than AWs, and I think we're going to see a non-trivial number of people "upgrading" from AWs to Pebbles when the batteries start to degrade.

riversflow · 2026-01-06T18:30:10 1767724210

> like the awful Siri assistant

Ironically, I just talked to all my mates about our Apple Watches, and universally Siri on your wrist for setting timers and replying to messages with voice, completely hands free, was the killer app that everyone agreed on.

Setting a timer is as simple as bringing your wrist to your face and saying the amount of time.

ewoodrich · 2026-01-07T07:28:32 1767770912

I literally only use Siri on my Apple Watch, I’ve only triggered it accidentally on my iPhone and have the hot word disabled on all my other devices. Of course, all I ever use it for is setting timers and alarms on the watch, but still…

mirsadm · 2026-01-06T08:09:38 1767686978

Even the OLED Garmin watches last much longer than a day. My Venu 3 lasts for about 4 days with me running every day.

mirsadm · 2026-01-05T08:20:11 1767601211

You're overestimating people's willingness to write code even if they don't have to do it. Most people just don't want to do it even if AI made it easy to do so. Not sure who you're talking to but most people I know that aren't programmers have zero interest in writing their own software even if they could do it using prompts only.