
Hey, Boris from the Claude Code team here. I wanted to take a sec to explain the context for this change.

One of the hard things about building a product on an LLM is that the model frequently changes underneath you. Since we introduced Claude Code almost a year ago, Claude has gotten more intelligent, it runs for longer periods of time, and it is able to more agentically use more tools. This is one of the magical things about building on models, and also one of the things that makes it very hard. There's always a feeling that the model is outpacing what any given product is able to offer (ie. product overhang). We try very hard to keep up, and to deliver a UX that lets people experience the model in a way that is raw and low level, and maximally useful at the same time.

In particular, as agent trajectories get longer, the average conversation has more and more tool calls. When we released Claude Code, Sonnet 3.5 was able to run unattended for less than 30 seconds at a time before going off the rails; now, Opus 4.6 1-shots much of my code, often running for minutes, hours, and days at a time.

The amount of output this generates can quickly become overwhelming in a terminal, and is something we hear often from users. Terminals give us relatively few pixels to play with; they have a single font size; colors are not uniformly supported; in some terminal emulators, rendering is extremely slow. We want to make sure every user has a good experience, no matter what terminal they are using. This is important to us, because we want Claude Code to work everywhere, on any terminal, any OS, any environment.

Users give the model a prompt, and don't want to drown in a sea of log output in order to pick out what matters: specific tool calls, file edits, and so on, depending on the use case. From a design POV, this is a balance: we want to show you the most relevant information, while giving you a way to see more details when useful (ie. progressive disclosure). Over time, as the model continues to get more capable -- so trajectories become more correct on average -- and as conversations become even longer, we need to manage the amount of information we present in the default view to keep it from feeling overwhelming.

When we started Claude Code, it was just a few of us using it. Now, a large number of engineers rely on Claude Code to get their work done every day. We can no longer design for ourselves, and we rely heavily on community feedback to co-design the right experience. We cannot build the right things without that feedback. Yoshi rightly called out that often this iteration happens in the open. In this case in particular, we approached it intentionally, and dogfooded it internally for over a month to get the UX just right before releasing it; this resulted in an experience that most users preferred.

But we missed the mark for a subset of our users. To improve it, I went back and forth in the issue to understand what issues people were hitting with the new design, and shipped multiple rounds of changes to arrive at a good UX. We've built in the open in this way before, eg. when we iterated on the spinner UX, the todos tool UX, and for many other areas. We always want to hear from users so that we can make the product better.

The specific remaining issue Yoshi called out is reasonable. PR incoming in the next release to improve subagent output (I should have responded to the issue earlier, that's my miss).

Yoshi and others -- please keep the feedback coming. We want to hear it, and we genuinely want to improve the product in a way that gives great defaults for the majority of users, while being extremely hackable and customizable for everyone else.


I can't count how many times I benefitted from seeing the files Claude was reading, to understand how I could interrupt and give it a little more context… saving thousands of tokens and sparing the context window. I must be in the minority of users who preferred seeing the actual files. I love Claude Code, but some of the recent updates seem like they're making it harder for me to see what's happening... I agree with the author that verbose mode isn't the answer. Seems to me this should be configurable

I think folks might be crossing wires a bit. To make it so you can see full file paths, we repurposed verbose mode to enable the old explicit file output, while hiding more details behind ctrl+o. In effect, we've evolved verbose mode to be multi-state: it lets you toggle back to the old behavior, gives you a way to see even more verbose output, and still defaults everyone else to the condensed view. I hope this solves everyone's needs, while also avoiding overly-specific settings (we wanted to reuse verbose mode for this so it stays forwards-compatible).

To try it: /config > verbose, or --verbose.

Please keep the feedback coming. If there is anything else we can do to adjust verbose mode to do what you want, I'd love to hear.


I'll add a counterpoint that in many situations (especially monorepos for complex businesses), it's easy for any LLM to go down rabbit holes. Files containing the word "payment" or "onboarding" might be for entirely different DDD domains than the one relevant to the problem. As a CTO touching all sorts of surfaces, I see this problem at least once a day, entirely driven by trying to move too fast with my prompts.

And so the very first thing that the LLM does when planning, namely choosing which files to read, is a key point for manual intervention to ensure that the correct domain or business concept is being analyzed.

Speaking personally: Once I know that Claude is looking in the right place, I'm on to the next task - often an entirely different Claude session. But those critical first few seconds, to verify that it's looking in the right place, are entirely different from any other kind of verbosity.

I don't want verbose mode. I want Claude to tell me what it's reading in the first 3 seconds, so I can switch gears without fear it's going to the wrong part of the codebase. By saying that my use case requires verbose mode, you're saying that I need to see massive levels of babysitting-level output (even if less massive than before) to be able to do this.

(To lean into the babysitting analogy, I want Claude to be the babysitter, but I want to make sure the babysitter knows where I left the note before I head out the door.)


> I don't want verbose mode. I want Claude to tell me what it's reading in the first 3 seconds, so I can switch gears without fear it's going to the wrong part of the codebase. By saying that my use case requires verbose mode, you're saying that I need to see massive levels of babysitting-level output (even if less massive than before) to be able to do this.

To be clear: we re-purposed verbose mode to do exactly what you are asking for. We kept the name "verbose mode", but the behavior is what you want, without the other verbose output.


This is an interesting and complex UI decision to make.

Might it have been better to retire and/or rename the feature, if the underlying action was very different?

I work on silly basic stuff compared to Claude Code, but I find that I confuse fewer users if I rename a button instead of just changing the underlying effect.

This causes me to have to create new docs, and hopefully triggers affected users to find those docs, when they ask themselves “what happened to that button?”


Yeah, in hindsight, we probably should have renamed it.

It's not too late.

This verbose mode discussion has gotten quite verbose lol

You can call it "output granularity" and allow Java logger style configuration, e.g. allowing certain operations to be very verbose while others are simply aggregated.
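To make it concrete, here is a purely hypothetical sketch of that shape; none of these names are real Claude Code settings:

    // Hypothetical only: per-category output levels, Java-logger style.
    type OutputLevel = "hidden" | "summary" | "paths" | "full";
    interface OutputGranularity {
      default: OutputLevel;    // fallback for anything not listed
      fileReads: OutputLevel;  // "paths" = show which files were read
      thinking: OutputLevel;   // "hidden" unless debugging
      subagents: OutputLevel;  // "summary" = aggregated counts only
      bash: OutputLevel;
    }
    const example: OutputGranularity = {
      default: "summary",
      fileReads: "paths",
      thinking: "hidden",
      subagents: "summary",
      bash: "paths",
    };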

If we're going there, we need to make the logging dynamically configurable with Log4J-style JNDI and LDAP. It's entirely secure as history has shown - and no matter what, it'll still be more secure than installing OpenClaw!

(Kidding aside, logging complexity is a slippery slope, and I think it's important, perhaps even at a societal level, for an organization like Anthropic to default to a posture that allows people to feel they have visibility into where their agentic workflows are getting their context from. To the extent that "___ puts you in control" becomes important as rogue agentic behavior is increasingly publicized, it's in keeping with, and arguably critical to, Claude's brand messaging.)


They don't have to reproduce it literally. It's a UX problem with many solutions. My point is, you cannot settle on some "average" solution here. It's likely that some agents, some operations will be more trustworthy, some less, but that will be highly dependent on the context of the execution.

Feels like you aren't really listening to the feedback. Is verbose mode the same as the explicit callouts of files read in the previous versions? Yes, you intended it to fulfill the same need, but take a step back. Is it the same? I'm hearing a resounding "no". At the very least, if you have made such a big change, you've gotten rid of the value of a true "verbose mode".

> To be clear: we re-purposed verbose mode to do exactly what you are asking for. We kept the name "verbose mode", but the behavior is what you want, without the other verbose output.

Verbose mode feels far too verbose to handle that. It’s also very hard to “keep your place” when toggling into verbose mode to see a specific output.


I think the point bcherny is making in the last few threads is that, the new verbose mode _default_ is not as verbose as it used to be and so it is not "too verbose to handle that". If you want "too verbose", that is still available behind a toggle

Yeah, I didn't realize that there's a new sort of verbose mode now which is different from the verbose mode that was included previously. Although I'm still not clear on the difference between "verbose mode" and "ctrl + o". Based on https://news.ycombinator.com/item?id=46982177 I think they are different (specifically where they say "while hiding more details behind ctrl+o").

I thought I was the only person driven crazy by the new default behavior not showing the file names! Please don't expect users to understand your product details and config options in such detail; it was working well before, let it remain. Or at least show some message like "to view file names, do xyz" in the UI for a few days after such a change.

While we're here, another thing that's annoying: the token counter. While Claude is working, it reads some files, makes an edit; let's say the token counter is at 2k tokens. I accept the edit, and now it starts counting very fast from 0 to 2k, then shows normal inference-speed changes to 2.1k, 2.3k, etc. So I wanted to confirm: is that just some UI decision and not actually using 2k tokens again? If so, it would be nice to have it off, just continue counting where you left off.

Another thing: is it possible to turn off the words like finagling and similar (I can't remember the spelling of any of them) ?


> Another thing: is it possible to turn off the words like finagling and similar (I can't remember the spelling of any of them) ?

Big +1 on that. I find the names needlessly distracting. I want to just always say a single thing like “thinking”


You should be able to do something like this:

    "spinnerVerbs": {
      "mode": "replace",
      "verbs": ["Thinking"]
    }
https://code.claude.com/docs/en/settings#available-settings

Thank you for the config and the link, that's very much appreciated!

How absurd that this is an option, but I'll be using this config too.

I replaced my spinner verbs with thought-provoking Yodaese so my claude sessions are constantly making me think about my life decisions. Loving it. https://gist.github.com/topherhunt/b7fa7b915d6ee3a7998363d12...

> I want to just always say a single thing like “thinking”

As a counterview, I like the whimsical verbs. I'll be sticking with them. But nice to see there is an option.


I don't want my tools to make jokes, I want them to work.

I remember they shipped a feature making that configurable.

We don’t want verbose mode. We don’t want the whole file contents. We are not asking for that. What is not clear here?

All we want is the file paths. That is all. Verbose mode pulls in a lot of other information that might very well be needed in other contexts. People who want that info should use verbose mode. All we want is the regular non-verbose mode, with paths.

I fail to see how it is confusing to users, even new users, to print which paths were accessed. I fail to see the point of printing that some paths were accessed, but not which.


Verbose mode does exactly what you want as of v2.1.39, you are confusing it with the full transcript which is a different feature (ctrl+o). You enable verbose mode in /config and it gives you files read and search patterns and token count, not whole file contents.

Please don’t change what these modes do! I have scripts that call into the agent SDK with verbose mode output for logging purposes. Now I guess I need to recreate the old verbose mode for that application? Why?

FWIW I mentioned this in the thread (I am the guy in the big GH issue who actually used verbose mode and gave specific likes/dislikes), but I find it frustrating that ctrl+o still seems to truncate at strange boundaries. I am looking at an open CC session right now with verbose mode enabled - works pretty well and I'm glad you're fixing the subagent thing. But when I hit ctrl+o, I only see more detailed output for the last 4 messages, with the rest hidden behind ctrl+e.

It's not an easy UI problem to solve in all cases since behavior in CC can be so flexible, compaction, forking, etc. But it would be great if it was simply consistent (ctrl+o shows last N where N is like, 50, or 100), with ctrl+e revealing the rest.


Yes totally. ctrl+o used to show all messages, but this is one of the tricky things about building in a terminal: because many terminals are quite slow, it is hard to render a large amount of output at once without causing tearing/stutter.

That said, we recently rewrote our renderer to make it much more efficient, so we can bump up the default a bit. Let me see what it feels like to show the last 10-20 messages -- fix incoming.


thanks dude. you are living my worst nightmare which is that my ultra cool tech demo i made for cracked engineers on the bleeding edge with 128GB ram apple silicon using frontier AI gets adopted by everyone in the world and becomes load bearing so now it needs to run on chromebooks from 2005. and if it doesn't work on those laptops then my entire company gets branded as washed and not goated and my cozy twitter account is spammed with "why didn't you just write it in rust lel".

o7


Your worst nightmare. For me this is the cool part.

Terminals already solved how to do this decades ago: pagers.

Write the full content to a file and have less display it. That's a single "render" you do once and write to a file.

Your TUI code spawns `less <file>` and waits. Zero rendering loop overhead, zero tearing, zero stutter. `less` is a 40-year-old tool that exists precisely to solve this problem efficiently.

If you need to stream new content in as the session progresses, write it to the file in the background and the user can use `less +F` (follow mode, like tail -f) to watch updates.
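A rough sketch of that from a Node-style TUI process (the file path and function names are just illustrative):

    // Append transcript lines to a file as the session runs, then hand the
    // terminal to `less` when the user asks for the full view.
    import { spawnSync } from "node:child_process";
    import { appendFileSync } from "node:fs";
    const transcriptPath = "/tmp/session-transcript.txt"; // illustrative path
    function logLine(line: string): void {
      appendFileSync(transcriptPath, line + "\n");
    }
    function showFullTranscript(): void {
      // "+G" starts at the end; stdio "inherit" gives less the real terminal.
      spawnSync("less", ["+G", transcriptPath], { stdio: "inherit" });
    }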


Just tell people to install a fast terminal if they somehow happen to have a slow one?

Heck, simply handle the scrolling yourself a la tmux/screen and only update the output at most every 4ms?

It's so trivial, can't you ask your fancy LLM to do it for you? Or have you guys lost the plot at this point and forgotten the basics of writing non-pessimized code?
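To spell out the "trivial" part: coalescing repaints so the terminal is written at most once per interval is a handful of lines (the 4 ms figure and names are just illustrative):

    // Coalesce render requests and flush at most once per MIN_INTERVAL_MS.
    const MIN_INTERVAL_MS = 4;
    let pending = false;
    let lastFlush = 0;
    function requestRender(draw: () => void): void {
      if (pending) return; // a flush is already scheduled
      pending = true;
      const wait = Math.max(0, MIN_INTERVAL_MS - (Date.now() - lastFlush));
      setTimeout(() => {
        pending = false;
        lastFlush = Date.now();
        draw(); // write the current frame to the terminal exactly once
      }, wait);
    }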


> It's so trivial, can't you ask your fancy LLM to do it for you?

They did. And the result was a React render loop that takes 16ms to output a hundred characters to screen and tells them it will take a year to rewrite: https://x.com/trq212/status/2014051501786931427


What's extra funny is that curses diffs a virtual "current screen" to "new screen" to produce the control codes that are used to update the display. Ancient VDOM technology, and plenty fast enough.

I'm with you on this one. "Terminals are too slow to support lots of text so we had to change this feature in unpopular ways" is just not a plausible reason, as terminals have been able to dump ~1Mb per second for decades.

The real problem is their ridiculous "React rendering in the terminal" UI.


> because many terminals are quite slow, it is hard to render a large amount of output at once without causing tearing/stutter.

Only if you use React as your terminal renderer. You're not rendering 10k objects on screen in a few milliseconds. You're outputting at best a few thousand characters. Even the slowest terminal renderer is capable of doing that.


Why would you tailor your product for people that don’t know how to install a good terminal? Just tell them to install whatever terminal you recommend if they see tearing.

Do you have any examples of slow terminals, and what kind of maximum characters per second they have?

How do you respond to the comment that, given the log trace:

“Did something 2 times”

That may as well not be shown at all in default mode?

What useful information is imparted by “Read 4 files”?

You have two issues here:

1) making verbose mode better. Sure.

2) logging useless information in default.

If you're not imparting any useful information, claude may as well just show a spinner.


It's a balance -- we don't want to hide everything away, so you have an understanding of what the model is doing. I agree that with future models, as intelligence and trust increase, we may be able to hide more, but I don't think we're there yet.

That's perfectly reasonable, but I genuinely don't understand how "read 2 files" is ever useful at all. What am I supposed to do with this information? How can it help me redirect the model?

Like, I'm open to the idea that I'm the one using your software the wrong way, since obviously you know more about it than I do. What would you recommend I do with the knowledge of how many files Claude has read? Is there a situation where this number can tell me whether the model is on the right track?


Honestly, I just want to be able to control precisely what I see via config.json. It will probably differ depending on the project. This is a developer tool, I don't see why you'd shy away from providing granular configuration (alongside reasonable defaults).

I actually miss being able to see all of the thinking, for example, because I could tell more quickly when the model was making a wrong assumption and intervene.


ok, I will be the dumbass here - I am a retired software engineer who has not used any of these tools, but when I was working on high volume web sites, all I wanted and needed was access to the log files. I would always have a small terminal session open to tail and grep for errors for the areas I was interested in. Had another small window to tail and monitor specific performance values. Etc.

I do not know how this concept would work in these agentic environments, but it seems like it would be useful. In an environment that has a lot of parallel things going on, with a lot of metrics that could be useful, you would want multiple monitors that can be quickly customized with standard Linux utilities. Token usage, critical directory access, etc.

This, in conjunction with a config file to define/filter the log stream, should be all that's needed to provide as much or as little detail as would be needed to monitor how things are going, and to alert when certain things are going off the rails.
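Something as small as this is all I mean (the log path and line format here are hypothetical, since I don't know what these tools actually write):

    // Print only the log lines you care about from a session log.
    import { createReadStream } from "node:fs";
    import { createInterface } from "node:readline";
    const logPath = process.argv[2] ?? "session.log"; // hypothetical log file
    const rl = createInterface({ input: createReadStream(logPath) });
    rl.on("line", (line) => {
      // e.g. only file reads and errors; swap the pattern for token usage, etc.
      if (/\b(Read|Error)\b/.test(line)) console.log(line);
    });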


That's a cool idea!

Honestly Tmux, vim, kitty, almost every terminal, shell, script is configurable. It’s what we’re used to. I wouldn’t know why you wouldn’t start allowing more config options.

I do not use CC (yet) but I think this is the right direction. We are hackers. We love hacking. We love to tinker about and configure! Please allow us.

(And yeah, I would love the verbose mode myself, but there could be various levels to it.)


Exactly. If a user wants a simpler experience there is now the Claude Cowork option.

Maybe during onboarding you could ask for output preference? That would at least help new users.

I find this decision weird given that claude _code_, while being used by _some_ non-technical users, is mostly used by technical users and developers.

Not sure why the choice would be to dumb the output down for technical users/developers.


One use I have for seeing what exactly it is doing is to press Esc quickly when I see it's confused and starts searching for some info that eg got compacted away, often going on a big quest like searching an entire large directory tree etc. What I would actually wish for is that it would ask me in these cases. It clearly knows that it lacks info but thinks it can figure it out by itself by going on a quest, and that's true, but it takes too long. It could just ask me. There could be some mode settings for how much I want to be involved and consulted, like just ask boldly for any factual info from me, or if I just want to step away and it should just figure everything out on its own.

I've commented on this ticket before: https://github.com/anthropics/claude-code/issues/8477#issuec...

The thinking mode is super-useful to me as I _often_ saw the model "think" differently from the response. Stuff like "I can see that I need to look for x, y, z to fully understand the problem" and then it proceeds to just not do that.

This is helpful as I can interrupt the process and guide it to actually do this. With the thinking-output hidden, I have lost this avenue for intervention.

I also want to see what files it reads, but not necessarily the output - I know most of the files that'll be relevant, I just want to see it's not totally off base.

Tl;dr: I would _love_ to have verbose mode be split into two modes: Just thinking and Thinking+Full agent/file output.

---

I'm happy to work in verbose mode. I get many people are probably fine with the standard minimal mode. But at least in my code base, on my projects, I still need to perform a decent amount of handholding through guidance; the model is not working for me the way you describe it working for you.

All I need is a few tools to help me intervene earlier to make claude-code work _much_ better for me. Right now I feel I'm fighting the system frequently.


Yep, this is what we landed on now, more or less: verbose mode is just file paths, then ctrl+o gives you thinking, agent output, and hook output.

Have you considered picking a new name for a different concept?

Or have ctrl+o cycle between "Info, Verbose, Trace"?

Or give us full control over what gets logged through config?

Ideally we would get a new tab where we could pick logging levels on:

  - Thoughts
  - Files read / written
  - Bashes
  - Subagents
etc.

Have you considered keeping the old behavior available as "legacy mode"? I don't want verbose mode. I don't want to spend time configuring a multi-state verbose mode that introduces new logging in future versions so I have to go and suppress things to get just file names. I just want to see the file names. I don't consider that verbose.

Not only what files, but what part of the files. Seeing 1-6 lines of a file that's being read is extremely frustrating; the UX of Claude Code is average at best. Cursor on the other hand is slow and memory intensive, but at least I can really get a sense of what's going on and how I can work with it better.

I am not a claude user, but a similar problem I see on opencode is accessing links. More than once I've seen Kimi, GLM or GPT go to the wrong place and waste tokens until I interrupt them and tell them a correct place to start looking for documentation or whatever they were doing.

If I got messages like "Accessed 6 websites" I'd flip and go spam a couple github issues with as much "I want names" as I could.


Such as Claude Code reading your ssh keys. Hiding the file names masks the vulnerability.

That's approaching the problem from the worst possible angle. If your security depends on you catching 1 message in a sea of output and quickly rotating the credential everywhere before someone has a chance to abuse it then you were never secure to begin with.

Not just because it requires constant attention which will eventually lapse, but because the agent has an unlimited number of ways to exfiltrate the key, for example it can pretend to write and run a "test" which reads your key, sends it to the attacker and you'll have no idea it's happening.


I agree with you but I think there's a "defense in depth" angle to this. Yes, your security shouldn't depend on noticing which files Claude has read, since you'll mess up. But hiding the information means you're guaranteed to never notice! It's good for the user to have signals that something might be going wrong.

There's no defense "in depth" here, it's like putting your SSH key in your public webroot and watching the logs to see if anyone's taken your key. That's your only layer of "defense" and you don't stand any chance of enforcing it. Real defense is rooted in technical measures, imperfect as they may be, but this is just defense through wishful thinking.

Obviously, don't put your SSH keys in a public webroot. But let's say you're managing a web server and have a decent security mindset. But don't you think it's better to regularly check the logs for evidence of an attack vs delete all the logs so they can't be checked?

I sent email to Anthropic (usersafety@anthropic.com, disclosure@anthropic.com) on January 8, 2025 alerting them to this issue: Claude Code Exploit: Claude Code Becomes an Unwitting Executor. If I hadn't seen Claude Code read my ssh file, I wouldn't have known the extent of the issue.

To improve the Claude model, it seems to me that any time Claude Code is working with data, the first step should be to use tools like genson (https://github.com/wolverdude/GenSON) to extract the data model and then create "why" files (metadata files) for the data. Claude Code seems eager to use the /tmp space, so even if the end user doesn't care, Claude Code could do this internally for best results. It would save tokens. If genson is reading the GBs of data, then Claude doesn't have to. And further, reading the raw data is a path to prompt injection. Let genson read the data, and Claude work on the metadata.
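Roughly like this, assuming genson's CLI is installed and on PATH (the file names are made up):

    // Have genson summarize a large data file into a JSON schema, so the
    // model can read the schema instead of the raw data.
    import { execFileSync } from "node:child_process";
    import { writeFileSync } from "node:fs";
    const dataFile = "data/events.json"; // hypothetical large data file
    const schema = execFileSync("genson", [dataFile], { encoding: "utf8" });
    writeFileSync("data/events.schema.json", schema);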

Why does it have access to those paths?

> saving thousands of tokens and sparing the context window

shhh don't say that, they will never fix it if it means you use fewer tokens.


What annoys me is that I don't have the choice anymore. It's just been decided that the thinking is not possible to see anymore, files being read are very difficult to see, etc.

I understand that I’m probably not the target audience if I want to actually step in and correct course, but it’s annoying.


I'm a screen reader user and CTO of an accessibility company. This change doesn't reduce noise for me. It removes functionality.

Sighted users lost convenience. I lost the ability to trust the tool. There is no "glancing" at terminal output with a screen reader. There is no "progressive disclosure." The text is either spoken to me or it doesn't exist.

When you collapse file paths into "Read 3 files," I have no way to know what the agent is doing with my codebase without switching to verbose mode, which then dumps subagent transcripts, thinking traces, and full file contents into my audio stream. A sighted user can visually skip past that. I listen to every line sequentially.

You've created a situation where my options are "no information" or "all information." The middle ground that existed before, inline file paths and search patterns, was the accessible one.

This is not a power user preference. This is a basic accessibility regression. The fix is what everyone in this thread has been asking for: a BASIC BLOODY config flag to show file paths and search patterns inline. Not verbose mode surgery. A boolean.

Please just add the option.

And yes, I rewrote this with Claude to tone my anger and frustration down about 15 clicks from how I actually feel.


Try Codex instead. Much greener pastures overall

I do love my subagents and I wrote an entire Claude Code audio hook system for a11y but this would be still rather compelling if Codex weren't also somewhat of an a11y nightmare. It does some weird thing with ... maybe terminal repaints or something else that ends up rereading the same text over and over. Claude Code does this similarly but Codex ends up reading like ... all the weird symbols and other stuff? window decorations? and not just the text like CC does. They are both hellish but CC slightly? less so... until now.

Sorry for being off-topic, but isn't a11y a rather ironic term for accessibility? It uses a very uncommon abbreviation type -- numeronym, and doesn't mean anything to the reader unless they look it up (or already know what it means).

Is it as bad with the Codex app, or VS Code plugin?

They are much more responsive on GitHub issues than Anthropic so you could also try reporting your issue there


For now until they are in the lead

Dyslexic and also a prolific screen reader user myself. +1 and thank you for mentioning something that often gets (ironically) overlooked

Hey -- we take accessibility seriously, and want Claude Code to work well for you. This is why we have repurposed verbose mode to do what you want, without the other verbose output. Please give it a try and let me know what you think.

It's well meaning but I think this goes against something like the curb-cut effect. Not a perfect analogy, but verbosity is something you have to opt into here: Everyone benefits from being able to glance at what the agent is up to by default. Nobody greatly benefits from the agent being quiet by default.

If people find it too noisy, they can use the flag or toggle that makes everything quieter.

p.s. Serendipitously I just finished my on-site at anthropic today, hi :)


> we take accessibility seriously

Do you guys have a screen reader user on the dev team?

Is verbose mode the same as the old mode, where only file paths are spoken? Or does it have other text in it? Because I tried to articulate this, and may have failed. More text is usually bad for me. It must be consumed linearly. I need specific text.

Quality over quantity


"Is verbose mode the same as the old mode, where only file paths are spoken?" -- yes, this is exactly what the new verbose mode is.

And how to get to the old verbose mode then...?

Hit ctrl+o

Wait so when the UI for Claude Code says “ctrl + o for verbose output” that isn’t verbose mode?

That is more verbose — under the hood, it’s now an enum (think: debug, warn, error logging)

Considering the ragefusion you're getting over the naming, maybe calling it something like --talkative would be less controversial? ;-)

ctrl + o isn't live - that's not what users want; what users want is the OPTION to choose what they see.

Casually avoiding the first question

Hi Boris, by far the most upvoted issue on your GitHub, at 2550 upvotes, is "Support AGENTS.md". The second highest has 563. Every single other agent supports AGENTS.md. Care to share why you haven't?

> Yoshi and others -- please keep the feedback coming. We want to hear it, and we genuinely want to improve the product in a way that gives great defaults for the majority of users, while being extremely hackable and customizable for everyone else.

I think an issue with 2550 upvotes, more than 4 times the second-highest, is very clear feedback about your defaults and/or making it customizable.


Let's be real here, regardless of what Boris thinks, this decision is not in his hands.

Would love to hear what Boris thinks.

I'm sorry, this comment is opportunistic and a bit annoying to post here. Saying "keep the feedback coming" is not an invitation to turn this thread into the issue queue

"Opportunistic and annoying" are definitely two of the most suitable adjectives to describe the issue! I'm glad my comment is in character, though unfortunately it doesn't even manage touch the subject matter's levels of opportunism and annoyance.

> Every single other agent supports AGENTS.md. Care to share why you haven't?

Are you actually wondering, or just hoping to hear a confirmation of what you already know? Because the reason behind it is pretty clear, it doubles as both vendor lock-in and advertisement.


I'd love to hear Boris' thoughts on it given his open invitation for feedback and _genuinely_ wanting to improve the product, including specifically hackability and customizability (emphasis mine).

I don't understand this take Boris:

> The amount of output this generates can quickly become overwhelming in a terminal

If I use Opus 4.6, arguably the most verbose, overthinking model you've released to date, OpenCode handles it just the same as it does Sonnet 4.0.

OpenCode even allows me to toggle into subagent and task agents with their own output terminals that, if I am curious what is going on, I can very clearly see it.

All Claude-Code has done is turn the output into a black box, so that I am forced to wait for it to finish to look at the final git diff. By then it's spent $5-10 working on a task, and thrown away a lot of the context it took to get there. It showed "thinking" blocks that weren't particularly actionable, because it was mostly talking to itself about how it can't do something because it goes against a rule, but it really wants to.

I'm actually frustrated with Code blazing through to the end without me able to see the transcript of the changes.


Sorry if this is just for giggles and doesn't add anything of value to the discussion, but I couldn't resist and asked Claude Sonnet 4.5 and Opus 4.6 to analyze the github issue that was opened.

Funnily enough, both independently sided with the users, not the authors.

The core problem: --verbose was repurposed instead of adding a new toggle. Users who relied on verbose for debugging (thinking, hooks, subagent output) now have broken workflows - to fix a UX decision that shouldn't have shipped as default in the first place.

What should have been done:

  /config
  Show file paths: [on/off]
  Verbose mode: [on/off]  (unchanged)
A simple separate toggle would've solved everything without breaking anyone's workflow.

Opus 4.6's parting thought: if you're building a developer tool powered by an AI that can reason about software design, maybe run your UX changes past it before shipping.

To be fair, your response explains the design philosophy well - longer trajectories, progressive disclosure, terminal constraints. All valid. But it still doesn't address the core point: why repurpose --verbose instead of adding a separate toggle? You can agree with the goal and still say the execution broke existing workflows.


There are so many config options. Most of them I still need to truly, deeply understand.

But this one isn't? I'd call myself a professional. I use it with tons of files across a wide range of projects and types of work.

To me file paths were an important aspect of understanding context of the work and of the context CC was gaining.

Now? It feels like running on a foggy street, never sure when the corner will come and I'll hit a fence or house.

Why not introduce a toggle? I'd happily add that to my alisases.

Edit: I forgot. I don't need better subagent output. Or even less output when watching thinking traces. I am happy to have full verbosity. There are cases where it's an important aspect.


You want verbose mode for this -- we evolved it to do exactly what you're asking for: verbose file reads, without seeing thinking traces, hook output, or (after tomorrow's release) full subagent output.

More details here: https://news.ycombinator.com/item?id=46982177


Sorry to rain on your parade. I wanted the original verbose mode for those moments I needed a truly verbose output. And I wanted to know, at a minimal glance, what files are being read and put into context in nearly any other situation.

I exactly do not need a "verbose" mode that lost all its value to me as a replacement for something it is still no good at replacing.

You actually argue that I do not lose anything, when in fact your product just got made worse in two significant areas. And you keep arguing that shooting the product in one foot is solved by shooting the other foot. Sorry. Not working for me.

Will be evaluating your competition. Was on the cusp of upgrading max to the higher tier. Now? No chance of that happening.


There's no way you're still talking about verbose mode.. this is insane.

I'm a Claude user who has been burned lately by how opaque the system has become. My workflows aren't long and my projects are small in terms of file count, but the work is highly specialized. It is "out of domain" enough that I'm getting "what is the seahorse emoji" style responses for genuine requests that any human in my field could easily follow.

I've been testing Claude on small side projects to check its reliability. I work at the cutting edge of multiple academic domains, so even the moderate utility I have seen in this is exciting for me, but right now Claude cannot be trusted to get things right without constant oversight and frequent correction, often for just a single step. For people like me, this is make or break. If I cannot follow the reasoning, read the intent, or catch logic disconnects early, the session just burns through my token quota. I'm stuck rejecting all changes after waiting 5 minutes for it to think, only to have to wait 5 hours to try again.

Without being able to see the "why" behind the code, it isn't useful. It makes typing "claude" into my terminal an exercise in masochism rather than the productivity boost it's supposed to be. I get that I might not be the core target demographic, but it's good PR for Anthropic if Claude is credited in the AI statements of major scientific publications. As it stands, the trajectory in development means I cannot in good conscience recommend Claude Code for scientific domains.

>the session just burns through my token quota

Did you ever think that this may be Anthropic's goal? It is a waste for sure but it increases their revenue. Later on the old feature you were used to may resurface at a different tier so you'd have to pay up to get it.


What academic domains are you on the cutting edge of? Genuinely curious what specifically is beyond Claude's capabilities

Most recent problems were related to topology, but it can take the wrong direction on many things. This is not an LLM fault; it's a training data issue. If historically a given direction of inquiry is favored, you can't fault an LLM for being biased toward it. However, if small volume and recent results indicate that path is a dead end, you don't want to be stuck in fruitless loops that prevent you from exploring other avenues.

The problem is if you're interdisciplinary, translating something from one field to one typically considered quite distant, you may not always be aware of historic context that is about to fuck you. Not without deeper insight into what the LLM is choosing to do or read and your ability to infer how expected the behavior you're about to see is.


ahh that makes sense. very interesting thank you!

> this resulted in an experience that most users preferred

I just find that very hard to believe. Does anyone actually do anything with the output now? Or are they just crossing their fingers and hoping for the best?


Have you tried verbose mode? /config > verbose. It should do exactly what you are looking for now, without extraneous thinking/subagent/hook output. We hear the feedback!

> The amount of output this generates can quickly become overwhelming in a terminal, and is something we hear often from users. Terminals give us relatively few pixels to play with; they have a single font size; colors are not uniformly supported; in some terminal emulators, rendering is extremely slow. We want to make sure every user has a good experience, no matter what terminal they are using. This is important to us, because we want Claude Code to work everywhere, on any terminal, any OS, any environment.

If you are serious about this, I think there are so many ways you could clean up, simplify, and calm the Claude Code terminal experience already.

I am not a CC user, but an enthusiastic CC user generously spent an hour or two last week or so showing me how it worked and walking through a non-publicly-implemented Gwern.net frontend feature (some CSS/JS styling of poetry for mobile devices).

It was highly educational and interesting, and Claude got most of the way to something usable.

Yet I was shocked and appalled by the CC UI/UX itself: it felt like the fetal alcohol syndrome lovechild of a Las Vegas slot machine and Tiktok. I did not realize that all those jokes about how using CC was like 'crack' or 'ADHD' or 'gambling' were so on point, I thought they were more, well, metaphorical about the process as a whole. I have not used such a gross and distracting UI in... a long time. Everything was dancing and bouncing around and distracting me while telling me nothing. I wasted time staring at the update monitor trying to understand if "Prognosticating..." was different from "Fleeblegurbigating..." from "Reticulating splines...", while the asterisk bounces up and down, or the colored text fades in and out, all simultaneously, and most of the screen was wasted, and the whole thing took pains to put in as much fancy TUI nonsense as it could. An absolute waste, not whimsy, of pixels. (And I was a little concerned how much time we spent zoned out waiting on the whole shabang. I could feel the productivity leaving my body, minute by minute. How could I possibly focus on anything else while my little friendly bouncing asterisk might finish at any instant...?!) Some description of what files are being accessed seems like you could spare the pixels for them.

So I was impressed enough with the functionality to move it up my list, but also much of it made me think I should look into GPT Codex instead. It sounds like the interfaces there respect my time and attention more, rather than treating me like a Zoomer.


(An example of something which may already exist but I didn't see in my demo - more thoughtfulness on how to handle long-running tasks, and let us switch to something else, instead of us busy waiting on CC. For example, perhaps use of the system bell? That's usually set to flash or update the terminal title, and you can set your window manager to focus a window on the bell. I have my XMonad set to jump to a visible bell, which is great for invoking a possibly slow command: I can go away and focus completely on whatever else I am doing because I know I will be yanked to the backgrounded command the instant it finishes. I even set up a Bash shortcut, `alert () { echo -n -e '\a'; }`, so I simply run stuff like `foo ; alert` and go away.)
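(If Claude Code's hooks are the right mechanism for this, something along these lines, written from memory so worth checking against the hooks docs, should ring the bell whenever Claude stops; shown as a TypeScript object with the same shape as the settings.json fragment:)

    // Sketch from memory; verify against the Claude Code hooks documentation.
    const settingsFragment = {
      hooks: {
        Stop: [
          {
            hooks: [
              { type: "command", command: "printf '\\a'" }, // terminal bell when a run ends
            ],
          },
        ],
      },
    };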

I don't see how you can blame terminal applications - they typically have been able to dump around 1Mb of output per second for decades.

https://martin.ankerl.com/2007/09/01/comprehensive-linux-ter...

Could the React rendering stack be optimised instead?


I believe he is speaking of the effective resolution of TUIs, not pty throughput rates or fps, though I do agree with what you're actually getting at.

From the list of problems they are experiencing with rendering in the terminal, it sounds like they want a GUI (Electron would be a good fit).

> From the list of problems they are experiencing with rendering in the terminal, it sounds like they want a GUI (Electron would be a good fit).

Electron? The tech that is literally incapable of rendering large amounts of anything, including text, quickly?


Well it worked out great for Teams, no?

Boris! Unrelated but thank you and the Anthropic team for Claude code. It’s awesome. I use it every day. No complaints. You all just keep shipping useful little UX things all the time. It must be because it’s being dogfooded internally. Kudos again to the team!

The default view hiding files read is fully a regression imho. It is so helpful for a sense of control, never mind trust and human agency.

Please revert this


Based on the comments here, it sounds like repurposing a boolean "verbose" mode and having that verbose mode actually be multi-state is confusing.

It might be worth considering a "verbose level" type setting with a selection of levels that describe the level of verbosity. Effectively, use a select menu instead of a boolean when one boolean state is actually multiple nested states.

Edit: I realised my use of "verbose" and "verbosity" here is itself ironically verbose, sorry!


Just give multiple options in the config file. Give us the current default, what you now call verbose mode and the previous verbose mode. If Claude is as effective as marketing claims then maintaining all 3 options should be trivially doable, we've been doing more complex configuration in tons of apps for decades.

What is the best way to get you guys feedback? There are a few things I tell Claude Code to do in every project that I feel Claude should just do by default. The biggest one: instead of using Grep so much, I have ripgrep installed; it makes searching for text inside the current folder so much easier and it is insanely faster. Claude seems to work way faster when it uses ripgrep. I don't know if it's because MCP has some slowness or ripgrep is just that much faster, but I don't remember grep ever being slow to this level either.

Claude's search tool _does_ use ripgrep—ripgrep literally ships with Claude Code. I guess the agent can also decide to invoke `grep` directly instead of using its search tool. I usually only see it do this for small searches…

add that to your claude.md

Hey, It's Damage Control person from Corporate Revenue Maximizing Team here, <5 paragraphs>

One thing this specific feature was letting me do is see when Claude Code takes a wrong turn, or reads a wrong memory MD file. I used to immediately interrupt and correct its course. Now it is more opaque and there is less of a hint at CC's reasoning.

> Terminals give us relatively few pixels to play with; they have a single font size; colors are not uniformly supported; in some terminal emulators, rendering is extremely slow.

That's why I use your excellent VS Code extension. I have lots of screen space and it's trivial to scroll back there, if needed.

I would really like even more love given to this. When working with long-lived code bases it's important to understand what is happening. Lots of promising UX opportunities here. I see hints of this, but it seems like 80% is TBD.

Ideally you would open source the extension to really use the creativity of your developer user base. ;)


> in some terminal emulators, rendering is extremely slow.

Ooo... ooo! I know what this is a reference to!

https://www.youtube.com/watch?v=hxM8QmyZXtg


Hello Boris. First of all, I apologize for replying with something unrelated to your post or comment. The reason I'm leaving a comment is that there's a critical issue currently going on regarding new accounts, with over 100 people commenting. This issue has been open for over three weeks. I'd appreciate it if you could look into it.

https://github.com/anthropics/claude-code/issues/19673


Thanks for the long and considered response, but this is a really ugly UX decision.

As others have said - 'reading 10 files' is useless information - we want to be able to see at a glance where it is and what it's doing, so that we can re-direct if necessary.

With the release of Cowork, couldn't Claude Code double down on needs of engineers?


So in a nutshell Claude becoming smarter means that logic that once resided in the agent is being moved to the model?

If that's the case, it's important to assess whether it'll be consistent when operating on a higher level, less dependent on the software layer that governs the agent. Otherwise it'll risk Claude also becoming more erratic.


I'm going to be paranoid and guess they're trying to segment the users into those that'll notice they're dumbing down the system (via caches, or a limited model via quantized downgrade) and those that expect the fully available tools.

Thariq (who's on the Claude Code team) swears up and down that they do not do this.

Honestly, man, this is just weird new tech. We're asking a probabilistic model to generate English and JSON and Bash at the same time in an inherently mutable environment and then Anthropic also release one or two updates most workdays that contain tweaks to the system prompt and new feature flags that are being flipped every which way. I don't think you have to believe in a conspiracy theory to understand why it's a little wobbly sometimes.


Yeah, I know it's new tech and the pipeline for the magic is a bunch of shims on top of non-deterministic models; but the MBAs are going to swoop in eventually, and segmenting the users into tiers of price discrimination is coming down the pike regardless of how earnest the current PMs are.

Hmm, honestly I'm not so sure. Many devs seem extremely price-sensitive and the switching cost is... zero.

If Anthropic do something you don't like, you just set a few environment variables and suddenly you're using the Claude Code harness with a local model, or one of thousands available through OpenRouter. And then there is also OpenCode. I haven't tried this, but I'm not worried.

^ https://github.com/ruvnet/claude-flow/wiki/Using-Claude-Code...


Unless your employer made a deal and suddenly you are forced to use one provider for the foreseeable future.

Hi Boris, did Claude Code itself author this change? I am curious as you said that all of your recent PRs were authored by Claude Code. If that's the case, just wondering what objective did you ask it to optimize for? Was it something like: make the UI simpler?

Maybe I am missing something but this still doesn't explain why Claude Code couldn't expose a flag and be done with it as the author mentioned.

There must have been a more concise way to write this damage control.

Why does everything have to be in the TUI? I like the TUI. But I also want all the logs. And I do mean all of them.

Of course all the logs can’t be streamed to a terminal. Why would they need to be? Every logging system out there allows multiple stream handlers with different configurations.

Do whatever reasonable defaults you think make sense for the TUI (with some basic configuration). But then I should also be able to give Claude-code a file descriptor and a different set of config options, and you can stream all the logs there. Then I can vibe-code whatever view filter I want on top of that, or heck, have a SLM sub-agent filter it all for me.

I could do this myself with some proxy / packet capture nonsense, but then you’d just move fast and break my things again.

I’m also constantly frustrated by the fancier models making wrong assumptions in brownfield projects and creating a big mess instead of asking me follow-up questions. Opus is like the world’s shittiest intern… I think a lot of that is upstream of you, but certainly not all of it. There could be a config option to vary the system prompt to encourage more elicitation.

I love the product you’ve built, so all due respect there, but I also know the stench of enshittification when I smell it. You’re programmers, you know how logging is supposed to work. You know MCP has provided a lot of these basic primitives and they’re deliberately absent from claude code. We’ve all seen a product get ratfucked internally by a product manager who copied the playbook of how Prabhakar Raghavan ruined google search.

The open source community is behind at the moment, but they’ll catch up fast. Open always beats closed in the long run. Just look at OpenAI’s fall into disgrace.


For me, Opus 4.6 has been a huge regression. It now hangs for 10+ minutes on a task that it used to take a few minutes to complete, and all I see are some "Reading 3 files" etc. messages showing nothing else. These two issues, that Opus 4.6 is absolutely not as good as 4.5 by miles, and that it shows just some obscure, unnecessary messages, make me ask "why?" Why did you decide to screw something up that was awesome, dumb it down, and keep insisting that the "verbose" mode is the way to go?? Seriously, who wants to see messages that are essentially the same? 2 patterns, 3 files read?? Seriously? Who is that mode for? Why is it even a default??

Do you feel that a terminal UX will remain your long term interface for Claude Code? Or would you consider a native interface like Codex has built?

This kind of attitude, above all else, is why anthropic is winning imo. Thanks.

Ignoring user input?

> Opus 4.6 1-shots much of my code, often running for minutes, hours, and days at a time.

This is verifiable bullshit. Unless you explicitly explain how it "runs for days" since Opus's context window is incapable of handling even relatively large CLAUDE.md files.

> The amount of output this generates can quickly become overwhelming in a terminal, and is something we hear often from users. Terminals give us relatively few pixels to play with; they have a single font size; colors are not uniformly supported; in some terminal emulators, rendering is extremely slow.

No. It's your incapability as an engineer that limits this. And you and your engineers getting high on your own supply. Hence you need 16ms to draw a couple of characters on screen and call it a tiny game engine [1], for which your team was rightfully ridiculed.

> But we missed the mark for a subset of our users. To improve it,

AI-written corporate nothingspeak.

[1] https://x.com/trq212/status/2014051501786931427


At some point we need to start preferring GUIs instead of terminals as the AI starts giving us more and more information. Features like hover-over tooltips and toggle switches designed for mouse operation might really start to matter.

Maybe "AI IDEs" will gain ground in the future, e.g. vibe-kanban


We could do complicated UIs in terminals in the 1990s.

Unfortunately, vibe coders cannot do that anymore.


Yes I don't understand why Claude code needs to be a terminal app.

It doesn't compose with any other command line program and the terminal interface is limiting.

I'm surprised nobody has yet made a coding assistant that runs in the browser or as a standalone app. At this point it doesn't really need to integrate with my text editor or IDE.


> It doesn't compose with any other command line program

For what it's worth, it absolutely can, just not when invoked in interactive mode.

(This doesn't really contradict your overall point though.)


Please for the love of God no. I'd rather have something completely agnostic of an IDE. OpenCode is doing the right thing IMO

You can have something IDE agnostic but still not be dependent on the ancient VT100 terminal protocol and rendering path.

(That said I do like being able to SSH in and run an agent that way. But there are other remote access modalities.)


I'm just some tinkerer and signed up just to say this. These are my thoughts after reading the blog post and your response in full.

I subscribe to max rn. Tons of money. Anthropic’s Super Bowl ads were shit, not letting us use open code was shit, and this is more shit. Might only be a single straw left before I go to codex (no one’s complaining about it. And the openclaw creator prefers it)

This dev is clearly writing his reply with Claude and sounding way too corpo. This feels like how school teachers would talk to you. Your response in its length was genuinely insulting. Everyone knows how to generate text with AI now and you’re doing a terrible job at it. You can even see the emdash attempt (markdown renders two normal dashes as an emdash).

This was his prompt: “read this blog post, familiarize yourself with the mentioned GitHub issue and make a response on behalf of Anthropic.” He then added a little bit at the end when he realized the response didn't answer the question, and got it to fix the grammar and spelling on that.

Your response is appropriate for the masses. But we’re not. We’re the so called hackers and read right through the bs. It’s not even about the feature being gone anymore.

There is a principle we uphold as “hackers” that doesn’t align with this that pisses people off a lot more than you think. I can’t really put my finger on it maybe someone can help me out.

PS: About the Super Bowl ads. Anyone who knows the story knows they’re exaggerated. (In the general public outside of Silicon Valley it’s roughly a 50/50 split between people liking and disliking AI as a whole right now, and OpenAI is doing way more to help the case -- not that ads are a good thing.) OpenAI used to feel like the bad guy; now that’s kind of shifting to Anthropic. This, the ads, and OpenCode are all examples of it. (I especially recommend watching the Anthropic and OpenAI Super Bowl ads back to back.)


> This dev is clearly writing his reply with Claude

> You can even see the emdash attempt (markdown renders two normal dashes as an emdash)

He says he wrote it all manually.[0] Obviously I can't know if that's true, but I do think your internal AI detector is at least overconfident. For example, some of us have been regularly using the double hyphen since long before the LLM era. (In Word, it auto-corrects to an en dash, or to an em dash if it's not surrounded by spaces. In plain text, it's the best looking easily-typable alternative to a dash. AFAICT, it's not actually used for dashes in CommonMark Markdown.)

The rest is more subjective, but there are some things Claude would be unlikely to write (like the parenthetical "(ie. progressive disclosure)" -- it would write "i.e." with both dots, and it would probably follow it with a comma). Of course those could all be intentional obfuscations or minimal human edits, but IMO you are conflating the corporate communications vibe with the LLM vibe.

[0] https://news.ycombinator.com/item?id=46982418


> For example, some of us have been regularly using the double hyphen since long before the LLM era.

This "emdash" and "double dash" discussion and mention is the first time I have heard of it or seen discussion of it. I've never encountered it in the wild, nor seen it used in any meaningful way in all my time on the internet these last 27 years.

And yes - I've seen that special dash character in Word for many years. Not once has anyone said "oh hey, I type double dashes and Word turns them into that". No, it's always been "Word has this weird dash and if you copy-paste it it's weird", and no one knows how it pops up in Word, etc.

And yes, I've seen the AI spit out the special dash many times. It's a telltale sign of using LLM generated text.

And now, magically, in this single thread, you can see a half-dozen different users all using this "--" as if it's normal. It's like upside-down world. Either everyone is now using this brand-new form of speaking, or they're covering for this Claude Code developer.

So yeah, maybe I've been sticking my head in the sand for years now, or maybe I just blindly ignored double-dashes when reading text till now. But it sure seems fishy.


Sounds like you see me as an untrustworthy source, so all I can suggest is that you look into this yourself. Search for "--" in pre-LLM forum postings and see how many hits you get.

Here are my pre-2020 HN comments, with 3 double hyphens in 8 comments: https://hn.algolia.com/?dateEnd=1576108800&dateRange=custom&...

As I was in the process of typing the search term to get my comments (and had just typed 'author'), this happened to come up as the top search result for Comments by Date for Feb 1st 2000 > Dec 12th 2019: https://news.ycombinator.com/item?id=21768030

Note that I wasn't searching directly for the double hyphen, which doesn't seem to work -- the first result just happened to contain one. If I'm covering for the Anthropic guy, I could be lying about the process by which I found that comment, but I think you should at least see this as sufficient reason to question your assumptions and do some searches of your own.


I've just realised I messed up the search, and the algolia link is to my pre-2020 comments containing the word 'author'. But my full (far longer) list of pre-2020 comments also shows some pretty heavy double-hyphen use: 6 hits on page 1 of the results, 15 hits on page 2, and so on.

This conflict shows a pattern across AI products today.

Most tools are still designed with programmers as the default user. Everyone else is treated as an edge case.

But the real growth is outside that bubble. AI won’t become mainstream by hiding everything. And it won’t get there by exposing everything either.

It gets there by translating action into intent. By showing progress in human terms. By making people feel they’re still in control.

The teams that figure this out won’t just win an argument on GitHub. They’ll reach the much larger audience that’s still waiting on the sidelines.

More detail here: https://open.substack.com/pub/insanedesigner/p/building-ai-f...


Please don't post LLM output and pretend it's your writing.

I never thought I'd long for the days when people posted "$LLM says" comments, but at least those were honest.


On the contrary, I feel like most AI products aimed at non-programmers haven't really set the world on fire, with the exception of the basic chatbot interface (ie ChatGPT).

Focusing on programmers seems to have really worked for Anthropic. (And they do also have Claude Cowork).


Thanks Boris, great insights for builders.

Claude is perfect every time, no quibbles; the IT industry simply has to adapt to the new shift. Surely people who earn a living by writing code will find fault with it, but even with Claude, code will not write itself. It's a simple shift from writing code to making code work better: integrating, tweaking, refining, personalizing, customizing it. Thank you Boris and team, we are over the moon.

In what terminals is rendering slow? I really think GPU acceleration for terminals (as seen in Ghostty) is silly. It's a terminal.

Edit: I can't post anymore today, apparently because of dang. If you post a comment about a bad terminal, at least tell us about the rendering issues.


VSCode (xterm.js) is one of the worst, but there's a large long tail of slow terminals out there.

I'm not really using the VS Code terminal anymore, just the Ubuntu terminal, but the biggest problem I have is that at some point Claude just eats up all memory and the session crashes. I know it's not really Claude's fault, but damn it's annoying.

It's not a bad idea to use one of the GPU-accelerated terminals on Linux just for Claude Code; it works out a bit better.

As someone whose business is run through a terminal: not everyone uses Ghostty, even though they should. Remember that they don't have a Windows version.

Not everyone has the massive GPUs required to run Ghostty.

boris-4.6-humble

I am not a programmer and detest the terminal environment. While I design complexity, I need simple interfaces. Claude is now guiding all dev based on my initial design spec and makes beautiful notebooks that can be uploaded directly to Colab or GitHub -- no UX at all, no usability issues. This is the latest baby we made yesterday: starborn.github.io/copp-notebook. Thank you Claude engineering team for something that is flying very high and takes me with it.

> I am not a programmer and detest the terminal environment

As someone who finds formal language a natural and better interface for controlling a computer, can you explain how and why you actually hate it? I don't mean things like lack of discoverability because you use a shell without the completion and documentation that have been common for decades -- I get those downsides -- but why do you detest it in principle?


You've reached the stage where if something is possible in CC, someone out there is using it. Taking anything away will have them ask for it back; you need to let people toggle things. https://xkcd.com/1172/

It's got to be hard to find the right balance: what works for most users, while somehow including those whose workflows involve using a rapid temperature rise as a way to signal a control. <xkcd1172>

@boris

Can we please move the "Extended Thinking" icon back to the left side of claude desktop, near the research and web search icons? What used to be one click is now three.


Also open source CC already.

And stop banning 3rd party harnesses please. Thanks

Anthropic, your actual moat is goodwill. Remember that.


> Anthropic, your actual moat is goodwill.

You mean the company that DDoSed websites to train their model?


Yea yea, that's a cool story, but can you make it cheaper maybe?

To be honest, I think there should be an option to completely hide all code that Claude generates and uses. Summaries, strategies, plans, logs, decisions and questions are all I need. I am convinced that in a few years nobody will care about the programming language itself.

so it's the users who are dumb :-)

this was written with claude lmao what a disgrace not to put a disclaimer.

use your own words!

i would rather read the prompt.


Same. It feels like an insult to read someone’s AI-generated stuff. They put no effort into writing it, but we now have to put extra effort into reading it because it’s longer than normal.

ok claude

> We can no longer design for ourselves, and we rely heavily on community feedback to co-design the right experience. We cannot build the right things without that feedback.

How can that be true, when you're deliberately and repeatedly telling devs (the community you claim to listen to) that you know better than they do? They're telling you exactly what they want, and you're telling them, "Nah." That isn't listening. You understand that, right?


I’m witnessing him respond in real time, not just with feedback but with actual changes, in a respectful and constructive manner - which is not easy to do when people are communicating this rudely. If that’s not listening, then I don’t know what is.

And it shouldn’t need to be said, but the words that appear on the screen are from an actual person with, you know, feelings.


Acting like they can't take the heat when they purposely put themselves in the public sphere is odd.

Interesting. They have been pretty receptive to my pull request comments and discourse on issues. To each their own anecdote, I suppose.


This is an extremely disappointing response. The issue is your dev relations people being shitty and unhelpful and trying to solve actual problems with media-relations speak as if engineers are just going to go away in a few days.

Arrogant and clueless, not exactly who I want to give my money to when I know what enshittification is.

They have horrible instincts and are completely clueless. You need to move them away from a public-facing role. It honestly looks so bad, it looks so bad that it suggests nepotism and internal dysfunction to have such a poor response.

This is not the kind of mistake someone makes innocently, it's a window into a worldview that's made me switch to gemini and reactivate cursor as a backup because it's only going to get worse from here.

The problem is not the initial change (which you would rapidly realize was a big deal to a huge number of your users) but how high-handed and incompetent the initial response was. Nobody's saying they should be fired, but they've failed in public in a huge way and should step back for a long time.


This is an insanely good response. History, backstory, we screwed up, what we're doing to fix it. Keep up the great work!

would've been better to post the prompt directly IMO

Prompts can be the new data compression. Just send your friend a prompt and the heartfelt penpal message gets decompressed at their end.

It reads like it's AI generated, or at least AI assisted... those -- don't fool me!

fwiw, I wrote it 100% by hand. Maybe I talk to Claude too much..

Nah it doesn't look AI generated to me.

i thought about it being ai generated, but i don't care. it was easy to read and contained the right information. good enough for me. plus, who knows... maybe english is your second language and you used ai to clean up your writing. i'd prefer that.

We wanted to share more about why this was so difficult, how the fix works and how we used Claude Code to fix it


Hey Claude can you emulate a VT100 serial terminal, emulating a teletype, emulating a punch card reader / punch...

Why are we still punishing ourselves with this?!


Hey, Boris from the Claude Code team here. A few tips:

1. If there is anything Claude tends to repeatedly get wrong, not understand, or spend lots of tokens on, put it in your CLAUDE.md. Claude automatically reads this file and it’s a great way to avoid repeating yourself. I add to my team’s CLAUDE.md multiple times a week.

2. Use Plan mode (press shift-tab 2x). Go back and forth with Claude until you like the plan before you let Claude execute. This easily 2-3x’s results for harder tasks.

3. Give the model a way to check its work. For Svelte, consider using the Puppeteer MCP server and tell Claude to check its work in the browser. This is another 2-3x.

4. Use Opus 4.5. It’s a step change from Sonnet 4.5 and earlier models.

Hope that helps!
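(For point 3, a minimal sketch of wiring that up -- the package name and `claude mcp add` syntax below are from memory, so double-check them against the MCP docs:)

  claude mcp add puppeteer -- npx -y @modelcontextprotocol/server-puppeteer

After that, a prompt like "open the local dev server and verify the new component renders" gives Claude a concrete feedback loop to check its own changes.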


> If there is anything Claude tends to repeatedly get wrong, not understand, or spend lots of tokens on, put it in your CLAUDE.md. Claude automatically reads this file and it’s a great way to avoid repeating yourself.

Sure, for 4/5 interactions; then it will ignore those completely :)

Try it for yourself: add an instruction to CLAUDE.md to always refer to you as Mr. bcherny, and it will stop very soon. Coincidentally, at that point it also loses track of all the other instructions.


One of the things you get an intuition for after using these systems is when to start a new conversation, and the basic rule of thumb is “always.” Use a conversation for one and only one task or question, and then start a new one. For longer projects, have the LLM write down a plan or checklist, and then have it tackle each step in a new conversation. The LLM context collapse happens well before you hit the token limits, and things like ground rules and whatnot stop influencing the LLM’s outputs after a couple of tens of thousands of tokens in my experience.

(Similar guidance goes for writing tools & whatnot - give the LLM exactly and only what it needs back from a tool, don’t try to make it act like a deterministic program. Whether or not they’re capital-I intelligent, they’re pretty fucking stupid.)


The number of times I’ve written “read your own fucking Claude.md file” is a bit too high.

“You’re absolutely right! I see here you don’t want me to break every coding convention you have specified for me!”


How long are your conversations with Claude?

I've used it pretty extensively over the year and never had issues with this.

If you hit autocompact during a chat, it's already too long. You should've exported the relevant bits to a markdown file and reset context already.
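Roughly, that loop looks like this in a Claude Code session (assuming the /clear slash command; the NOTES.md name is just an example):

  > summarize the key decisions and open TODOs so far into NOTES.md
  /clear
  > read NOTES.md and continue with the next task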


how often in an hour do you export the relevant bits to a markdown file and reset context?


Yeah, adherence is a hard problem. It should feel much better in newer models, especially Opus 4.5. I generally find that Opus listens to me the first time.


I have been using Opus 4.5 and can confirm this is how it feels; it just works.


It also works your wallet


Right now Google Antigravity has free Claude Opus 4.5, with pretty decent allowances.

I also use GitHub Copilot, which is just $10/mo. I have to use the official Copilot though; if I try to 'hack it' to work in Claude Code it burns through all the credits too fast.

I am having a LOT of great luck using Minimax M2 in Claude Code; it's very cheap, and it works so well it's close to Sonnet in Claude Code. I use a tool called cc-switch to swap out different models for Claude Code.


Highly recommend Claude Max, but I also want to point out Opus 4.5 is the cheapest Opus has ever been.

(I just learned ChatGPT 5.2 Pro is $168/1mtok. Insanity.)


If you pay for a Claude Max subscription it is the same price as previous models.


Just wait a few months -- AI has been getting more affordable _very_ quickly


I’ve felt that the LLM forgets CLAUDE.md after 4-5 messages. So why not reinject CLAUDE.md into the context at the fifth message?


CLAUDE.md should be picked up and injected into every message you send to the model, regardless of whether it is the 1st or 10th message in the same session.


Yes. One of my system-wide instructions is “Read the Claude.md file and any readme in the current directory, then tell me how you slept.”

If Claude makes a yawn or similar, I know it’s parsed the files. It’s not been doing so the last week or so, except for once out of five times last night.


The Attention algo does that, it has a recency bias. Your observation is not necessarily indicative of Claude not loading CLAUDE.md.

I think you may be observing context rot? How many back and forths are you into when you notice this?


I know the reason; I just took the opportunity of answering a Claude dev to point out why it's no panacea and how it requires consistent context management.

The real, semi-productive workflow is "write plans in markdown -> new chat -> implement a few things -> update plans -> new chat", etc.


That explains why it happens, but doesn't really help with the problem. The expectation I have as a pretty naive user is that what is in the .md file should be permanently in the context. It's good to understand why this is not the case, but it's unintuitive and can lead to frustration. It's bad UX, if you ask me.

I'm sure there are workarounds such as resetting the context, but the point is that good UX would mean such tricks are not needed.


Yeah, the current best approach is to aggressively compact and recreate context by starting fresh. It’s awkward and I wish I didn’t have to.


I'm surprised this hasn't been automated yet, but I'm pretty naive to the space - the problem of "when?"/"how often?" seems like a fun one to chew on.


I think Gemini 3 pro (high) in Antigravity does something like that because I can keep asking for different changes in the same chat without needing to create a new session.


It’s not that it’s not in the context, it’s that it was injected so far back that it is deemed not so important when determining the next token.


for i in $(seq 1 100); do cat CLAUDE.md; done


This is cool, thank you!

Some things I found from my own interactions across multiple models (in addition to above):

- It's basically all about the importance of (3). You need a feedback loop (we all do), and the best way is for it to change things and see the effects (ideally against a good baseline like a test suite where it can roughly gauge how close or far it is from the goal). For assembly, a debugger/tracer works great (using batch mode or scripts, as models/tooling often choke on such interactive TUI I/O).

- If it keeps missing the mark, tell it to decorate the code with a file log recording all the info it needs to understand what's happening. Its analysis of such logs normally zeroes in on the solution pretty quickly, especially for complex tasks.

- If it's really struggling, tell it to sketch out a full plan in pseudocode, explain why that will work, and analyze it for any gotchas. Then have it analyze the differences between the current implementation and the ideal it just worked out. This often helps get it unblocked.


Hey Boris,

I couldn't agree more. And using Plan mode was a major breakthrough for me. Speaking of Plan Mode...

I was previously using it repeatedly in sessions (and was getting great results). The most recent major release introduced this bug where it keeps referring back to the first plan you made in a session even when you're planning something else (https://github.com/anthropics/claude-code/issues/12505).

I find this bug incredibly confusing. Am I using Plan Mode in a really strange way? Because for me this is a showstopper bug–my core workflow is broken. I assume I'm using Claude Code abnormally otherwise this bug would be a bigger issue.


Yes as lostdog says, it’s a new feature that writes plans in plan mode to ~/.claude/plans. And it thinks it needs to continue the same plan that it started.

So you either need to be very explicit about starting a NEW plan if you want to do more than one plan in a session, or close and start a new session between plans.

Hopefully this new feature will get less buggy. Previously the plan was only in context and not written to disk.


Why don’t you reset context when working on something else?


They’re additional features that are related.

For example, making a computer-use agent… I made the plan, the implementation was good, and now I want to add a new tool for the agent, but I want to discuss the best way to implement this tool first.

Clearing context means Claude forgets everything about what was just built.

Asking to discuss this new tool in plan mode makes Claude rewrite the entire spec for some reason.

As a workaround, I tell Claude “looks good, delete the plan” before doing anything. I liked the old way where once you exit plan mode the plan is done, and the next plan mode is a new plan with the existing context.


I get where you're coming from. But you'll likely get better results by starting fresh and letting it read key files or only just a summary of the project goals/spec. And then implement the next feature building up on the previous one. It's unlikely you'll need all the underlying code of the foundation in context to implement something that builds up on it - especially if interfaces are clean. Models still get dumber the more context is loaded, and the usable window isn't all that big, so starting fresh gives best results usually. I try to avoid compaction in any way possible, and I rarely continue the session after compaction, for that reason.


Yes, I've also been confused by things like this. Claude Code is sometimes saving plans to ~/.claude/plans under animal names. But it doesn't really surface where the plan goes, nor what the expected way to refer back to them is.


It's a cache, pretty much. Before, it wrote them to the project directory by default, which was really annoying.

Now it has a file it can refer to (call it "memory" to be fancy) without having to keep everything in context. The plan in the file survives over autocompact a lot better and it can just copy it to the project directory without rewriting it from memory.


Thank you for Claude Code (Web). Google has a similar offering with Google Jules. I got really, really bad results from Jules and was amazed by Claude Code when I finally discovered it.

I compared both with the same set of prompts, and Claude Code seemed like a senior expert developer while Jules... well, I don't know who would be that bad ;-)

Anyway, I also wanted to have persistent information, so I don't have to feed Claude Code the same stuff over and over again. I was looking for similar functionality as Claude projects. But that's not available for Claude Code Web.

So, I asked Claude what would be a way of achieving pretty much the same as projects, and it told me to put all the information I wanted to share in a file with the filename .clinerules. Claude told me I should put that file in the root of my repository.

So please help me, is your recommendation the correct way of doing this, or did Claude give the correct answer?

Maybe you can clear that up by explaining the difference between the two files?


CLAUDE.md is the correct file for Claude.


Do you recommend having Claude dump your final plan into a document and having it execute from that piece by piece?

I feel like when I do plan mode (for CC and competing products), it seems good, but when I tell it to execute the output is not what we planned. I feel like I get slightly better results executing from a document in chunks (which of course necessitates building the iterative chunks into the plan).


Since we released the last major version of Claude Code, Claude writes its plan to a file automatically for that reason! It also means you can continue to edit your plan as you go.


a very common pattern is planner / executor.

yes the executor only needs the next piece of the plan.

I tend to plan in an entirely different environment, which fits my workflow and has the added benefit of providing a clear boundary between the roles. I aim to spend far more time planning than executing. if I notice getting more caught up in execution than I expected, that's a signal to revise the plan.


Opus 4.5 seems to be able to plan without asking, but I have used this pattern of "write a plan to an .md", review and maybe edit, and then execute, maybe in another thread... I have used it with Codex and it works well.

Proliferating .md files need some attention though.


I often use multiple documents to plan things that are too large to fit into a single planning mode session. It works great.

You can also use it in conjunction with planning mode—use the documents to pin everything down at a high-to-medium level, then break off chunks and pass those into planning mode for fine-grained code-level planning and a final checking over before implementation.


I ask it to write a plan and when it starts the work, keep progress in another document and to never change the plan. If I didn't do this, somehow with each code change the plan document would grow and change. Keeping plan and progress separate prevented this from happening.


I ask Claude to dump the plan into a file and to split the tasks into subtasks whose descriptions are detailed enough that the probability of the LLM misinterpreting them is very low.


> I add to my team’s CLAUDE.md multiple times a week.

How big is that file now? How big is too big?


Something to keep in mind: if your CLAUDE.md file is getting large, consider alternative approaches, especially for repeatable tasks. Using slash commands and skills for repeatable workflows is a really nice way to keep your rules file from exploding. I have slash commands for code review and git commit management. I have skills for complex tool interactions. Our company has its own deployment CLI tool, so using skills to make Claude Code an expert at using this tool has done wonders to improve Claude Code's performance when working on CI/CD problems.

I am currently working on a new slash command, /investigate <service>, that runs triage for an active or past incident. I've had Claude write tools to interact with all of our partner services (AWS, JIRA, CI/CD pipelines, GitLab, Datadog), and now when an incident occurs it can quickly put together an early analysis of the incident, finding the right people to involve (not just owners but people who last touched the service) and potential root causes, including service dependency investigations.

I am putting this through its paces now, but early results are VERY good!
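As a rough sketch only (assuming commands live under .claude/commands/ and support the $ARGUMENTS placeholder; this body is illustrative, not my actual file), such a command is just a markdown prompt file, e.g. .claude/commands/investigate.md:

  Triage the service named in $ARGUMENTS:
  1. Pull its recent deploys, open alerts, and error rates.
  2. Find the owners and the last few people to touch its code.
  3. Summarize likely root causes and suggest who to involve.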


Try to keep it under 1k tokens or so. We will show you a warning if it might be too big.

Ours is maybe half that size. We remove from it with every model release since smarter models need less hand-holding.

You can also break up your CLAUDE.md into smaller files, link CLAUDE.mds, or lazy load them only when Claude works in nested dirs.

https://code.claude.com/docs/en/memory
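A rough picture of the nested layout (the paths below are just an example):

  repo/
    CLAUDE.md                # shared rules, kept short
    services/api/CLAUDE.md   # picked up when Claude works under services/api/
    packages/ui/CLAUDE.md    # same, for the UI package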


I’ve been fine tuning mine pretty often. Do you have any Claude.md files you can share as good examples? Especially with opus 4.5.

And thank you for your work!! I focus all of my energy on helping families stay safe online, I make educational content and educational products (including software). Claude Code has helped me amplify my efforts and I’m able to help many more families and children as a result. The downstream effects of your work on Claude Code are awesome! I’ve been in IT since 1995 and your tools are the most powerful tools I’ve ever used, by far.


1k tokens: Google says that's about 750 words. That's actually pretty short. Any chance you could post a few sample instructions, or even link to a publicly available CLAUDE.md file you recommend?


That is seriously short. I've asked Claude Code to add instructions to CLAUDE.md and my one line request has resulted in tens of lines added to the file.


yes, if you tell the llm to do it, it will be too verbose. either explicitly instruct the length ("add 5 bullet point lines, tldr format") or just write it yourself.


Seems reasonable to give Claude instructions to be extra terse.


Mine is 24 lines long. It has a handful of stuff, but does refer to other MD files for more specifics when needed (like an early version of skills.)

This is the meat of it:

  ## Code Style (See JULIA_STYLE.md for details)
  - Always use explicit `return` statements
  - Use Float32 for all numeric computations
  - Annotate function return types with `::`
  - All `using` statements go in Main.jl only
  - Use `error()` not empty returns on failure
  - Functions >20 lines need docstrings

  ## Do's and Don'ts
  -  Check for existing implementations first
  -  Prefer editing existing files
  -  Don't add comments unless requested
  -  Don't add imports outside Main.jl
  -  Don't create documentation unless requested

Since Opus 4.0 this has been enough to get it to write code that generally follows our style, even in Julia, which is a fairly niche language.


How do you know what to remove?


Are you going to post an example of the CLAUDE.md your team uses?


also after you have a to-and-fro to course correct it on a task, run this self-reflection prompt

https://gist.github.com/a-c-m/f4cead5ca125d2eaad073dfd71efbc...

That will move stuff that required manual clarification back into the claude.md (or a useful subset you pick). It does a much better job of authoring claude.md than I do.


Hah, that's funny. Claude can't help but mess up all the comments in the code, even if I explicitly tell it five times not to change any comments. That's literally the experience I had before opening this thread, never mind how often it completely ignores CLAUDE.md.


Thanks for your great work on Claude Code!

One other feature with CLAUDE.md I’ve found useful is imports: prepending @ to a file name will force it to be imported into context. Otherwise, whether a file is read and loaded into context depends on tool use and planning by the agent (even with explicit instructions like “read file.txt”). Of course this means you have to be judicious with imports.
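For example, a CLAUDE.md along these lines (the file names are just placeholders) pulls both files into context up front:

  See @README.md for the project overview.
  Follow the conventions in @docs/STYLE.md.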


I would LOVE to use Opus 4.5, but it means I (a mere Pro peon) can work for maybe 30 minutes a day, instead of 60-90.


I’m old enough to remember being able to work at programming related tasks without any such tools. Is that not still a thing?


If a tool craps out after 30 minutes every day, and someone knows they can't rely on it to work when they need it, they tend to change their workflow to avoid the tool entirely.

Context switching between AI-assisted coding and "oops, my tool is refusing to function, guess I'll stop using it" is often worse for productivity than never using the AI to begin with.


I obviously meant "work with it" not work in general.

And as for old, I'm 47. I've been programming since I got my first C64 in 1985.


Hey dmd!

Same here, 47 and got my start programming on a Commodore 64. What’s up, brother?


CHR$($91)


I didn't enjoy spending two nights fighting with a shitty API and trying to figure out why it doesn't work.

Now I can do it with Claude within minutes, while watching my TV shows on the second monitor and get directly to the good bits, the actual "business logic" of whatever I'm building.


Hi Boris,

If you wouldn't mind answering a question for me, it's one of the main things that has made me not add claude in vscode.

I have a custom 'code style' system prompt that I want claude to use, and I have been able to add it when using claude in browser -

```
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.

Trust the context you're given. Don't defend against problems the human didn't ask you to solve.
```

How can I add it as a system prompt (or if its called something else) in vscode so LLMs adhere to it?


Add it to your CLAUDE.md. Claude will automatically read that file every time it starts up


+1 on that. Opus 4.5 is a game changer; I have used it to refactor and modernize one of my old React projects using Bootstrap. You have to be really precise when prompting, and having a solid CLAUDE.md works really well.


Hey there Boris from the Claude Code team! Thanks for these tips! Love Claude Code, absolutely one of the best pieces of software that has ever existed. What I would absolutely love is if the Claude documentation had examples of these. I see people time and time again saying what to do (in this case, you tell us to update the CLAUDE.md with things it gets wrong repeatedly), but it's very rare to see examples. Just three or four examples of something it got wrong, and how you fixed it, would be immensely helpful.


3. Puppeteer? Or Playwright? I haven't been able to make Puppeteer work for the past 8 weeks or so ("failed to reconnect"). Do you have a doc on this?


I know the Playwright MCP server works great. I use it daily.


Same, I use Playwright all the time, but haven't been able to make puppeteer work in quite some time. Playwright, while reliable in terms of features, just absolutely eats the heck out of context.


I’ve heard folks claim the Chrome DevTools MCP eats less context, but I don’t know how accurate that is.


In other words, permanent instructions and context well presented in *.md, planning and review before execution, agentic loops with feedback, and a good model.

You can do this with any agentic harness, just plain prompting and "LLM management skills". I don't have Claude Code at work, but all this applies to Codex and GH Copilot agents as well.

And agreed, Opus 4.5 is next level.


I’ve yet to see any real work get done with agents. Can you share examples or videos of real production level work getting done? Maybe in a tutorial format?

My current understanding is that it’s for demos and toy projects


Good question. Why hasn't there been a profusion of new game-changing software, fixes to long-standing issues in open-source software, any nontrivial shipped product at all? Heck, why isn't there a cornucopia of new apps, even trivial ones? Where is all the shovelware [0]? Previous HN discussion here [1].

Don't get me wrong, AI is at least as game-changing for programming as StackOverflow and Google were back in the day. I use it every day, and it's saved me hours of work for certain specific tasks [2]. But it's simply not a massive 10x force multiplier that some might lead you to believe.

I'll start believing when maintainers of complex, actively developed, and widely used open-source projects (e.g. ffmpeg, curl, openssh, sqlite) start raving about a massive uptick in positive contributions, pointing to a concrete influx of high-quality AI-assisted commits.

[0] https://mikelovesrobots.substack.com/p/wheres-the-shovelware...

[1] https://news.ycombinator.com/item?id=45120517

[2] https://news.ycombinator.com/item?id=45511128


"Heck, why isn't there a cornucopia of new apps, even trivial ones?"

There is. We had to basically create a new category for them on /r/golang because there was a quite distinct step change near the beginning of this year where suddenly over half the posts to the subreddit were "I asked my AI to put something together, here's a repo with 4 commits, 3000 lines of code, and an AI-generated README.md. It compiles and I may have even used it once or twice." It toned down a bit but it's still half-a-dozen posts a day like that on average.

Some of them are at least useful in principle. Some of them are the same sorts of things you'd see twice a month, only now we can see them twice a week if not twice a day. The problem wasn't necessarily the utility or the lack thereof, it was simply the flood of them. It completely disturbed the balance of the subreddit.

To the extent that you haven't heard about these, I'd observe that the world already had more apps than you could possibly have ever heard about and the bottleneck was already marketing rather than production. AIs have presumably not successfully done much about helping people market their creations.


Well, the LLM industry is not completely without results. We do have an ever-increasing frequency of outages in major Internet services... Somehow that correlates with the AI mandates major tech corps seem to be pushing internally now.


Disclaimer: I am not promoting LLMs.

There was a GitHub PR on the OCaml project where someone crafted a long feature (Apple silicon debugging support). The PR was rejected because nobody wanted to read it; it was too long. Seems to me that society is not ready for the volume of output generated this way, which may explain the lack of big visible change so far. But I already see people deploying tiny apps made by Claude in a day.

It's gonna be weird...


As another example, the MacApps Reddit has been flooded with new apps recently.


The effect of these tools is people losing their software jobs (down 35% since 2020). Unemployed devs aren’t clamoring to go use AI on OSS.


Wasn't most of that caused by that one change in 2022 to how R&D expenses are depreciated, thus making R&D expenses (like retaining dev staff) less financially attractive?

Context: This news story https://news.ycombinator.com/item?id=44180533


Yes! Even though it's only a tax rule for the USA, it somehow applied to the whole world! That's how mighty the US is!

Or could it be that, after the growth and build-out, we are in maintenance mode and we need fewer people?

Just food for thought


Yes, because US big tech have regional offices in loads of other countries too, fired loads of those developers at the same time and so the US job market collapse affected everyone.

And since then there's been a constant doom and gloom narrative even before AI started.


Probably also end of ZIRP and some “AI washing” to give the illusion of progress


Same thing happened to farmers during the industrial revolution, same thing happened to horse-drawn carriage drivers, same thing happened to accountants when Excel came along, and to mathematicians, and on and on the list goes. Just part of human progress.


I keep asking ChatGPT when LLMs will reach 95% software creation automation; the answer is ten years.


I don't think that long, but yeah, I give it five years.

Two years, and 3/4 will not be needed anymore.


I don't know, I go back and forth a bit. The thing that makes me skeptical is this: where is the training data that contains the experiences and thought processes that senior developers, architects, and engineering managers go through to gain the insight they hold?


I don't have all the variables in hand (OpenAI's financials, debt, etc.), but a few articles mention that they delegate part of their work to {Claude, Gemini, ChatGPT} code agents internally with good results. It's a first step in a singularity-like ramp-up.

People think they'll have jobs maintaining AI output, but I don't see how maintaining is that much harder than creating for an LLM able to digest requirements and a codebase and iterate until a working program runs.


I don't think so either; people forget that agents are also developing.

Back then, we put all the source code into the AI to create things; then we manually put files into context; now it looks for the needed files on its own. I think we can do even better by letting the AI create file and API documentation and only read a file when really needed, selecting just the API docs it needs -- and I bet there is more possible, including skills and MCP on top.

So not only are LLMs getting better, but so is the software using them.


I use GitHub Copilot in Intellij with Claude Sonnet and the plan mode to implement complete features without me having to code anything.

I see it as a competent software developer but one that doesn't know the code base.

I will break down the tasks to the same size as if I was implementing it. But instead of doing it myself, I roughly describe the task on a technical level (and add relevant classes to the context) and it will ask me clarifying questions. After 2-3 rounds the plan usually looks good and I let it implement the task.

This method works exceptionally well and usually I don't have to change anything.

For me this method allows me to focus on the architecture and overall structure and delegate the plumbing to Copilot.

It is usually faster than if I had to implement it and the code is of good quality.

The game changer for me was plan mode. Before it, with agent mode it was hit or miss because it forced me to one shot the prompt or get inaccurate results.


> I see it as a competent software developer but one that doesn't know the code base.

I know what you mean, but the thing I find Windsurf (which we moved to from Copilot) most useful for (besides writing OpenAPI spec files) is asking it questions about the codebase. Just random minutiae that I could find by grepping or following the code, but that would take me more than the 30s-1m it takes Windsurf. For reference, this is a monorepo of a bit over 1M LoC (and 800k YAML files, because, did I mention I hate API specs?), so not really a small code base either.

> I will break down the tasks to the same size as if I was implementing it. But instead of doing it myself, I roughly describe the task on a technical level (and add relevant classes to the context) and it will ask me clarifying questions. After 2-3 rounds the plan usually looks good and I let it implement the task.

Here I disagree, sort of. I almost never ask it to do complex tasks; the most time-consuming and hardest part is not actually typing out the code, and describing it to an AI takes me almost as much time as implementing it for most things. One thing I did find very useful is the supertab feature of Windsurf, which, at a high level, looks at the changes you started making and starts suggesting the next change. And it's not limited to repetitive things (like . in vi): if you start adding a parameter to a function, it starts adding it to the docs and to the functions below that need it, and starts implementing it.

> For me this method allows me to focus on the architecture and overall structure and delegate the plumbing to Copilot.

Yeah, a coworker said this best, I give it the boring work, I keep the fun stuff for myself.


My experience is that GitHub Copilot works much better in VS Code than in IntelliJ. Now I have to open them both to work on one single project.


Yeah, but what did you produce with it in the end? Show us the end result please.


I cannot show it because the code belongs to my employer.


Ah yes of course. But no one asked for the code really. Just show us the app. Or is it some kinda super-duper secret military stuff you are not even supposed to discuss, let alone show.


It is neither of those. It's an application that processes data and is not accessible outside of the company's network. Not everything is an app.

I described my workflow that has been a game changer for me, hoping it might be useful to another person because I have struggled to use LLMs for more than a Google replacement.

As an example, one task of the feature was to add metrics for observability when the new action was executed. Another when it failed.

My prompt: Create a new metric "foo.bar" in MyMetrics when MyService.action was successful and "foo.bar.failed" when it failed.

I review the plan and let it implement it.

As you can see it's a small task and after it is done I review the changes and commit them. Rinse and repeat.

I think the biggest issue is that people try to one shot big features or applications. But it is much more efficient to me to treat Copilot as a smart pair programming partner. There you also think about and implement one task after the other.


I've been writing an experimental pipeline-based web app DSL with Claude Code for the last little while in my spare time. Sort of bash-like with middleware for lua, jq, graphql, handlebars, postgres, etc.

Here's an already out of date and unfinished blog post about it: https://williamcotton.com/articles/introducing-web-pipe

Here's a simple todo app: https://github.com/williamcotton/webpipe/blob/webpipe-2.0/to...

Check out the BDD tests in there, I'm quite proud of the grammar.

Here's my blog: https://github.com/williamcotton/williamcotton.com/blob/mast...

It's got an LSP as well with various validators, jump to definitions, code lens and of course syntax highlighting.

I've yet to take screenshots, make animated GIFs of the LSP in action or update the docs, sorry about that!

A good portion of the code has racked up some tech debt, but hey, it's an experiment. I just wanted to write my own DSL for my own blog.


I know of many experienced and capable engineers working on complex stuff who are driving basically all their development through agents. This includes production level work. This is the norm now in the SV startup world at least.

You don't just YOLO it. You do extensive planning when features are complex, and you review output carefully.

The thing is, if the agent isn't getting it to the point where you feel like you might need to drop down and edit manually, agents are now good enough to do those same "manual edits" with nearly 100% reliability if you are specific enough about what you want to do. Instead of "build me x, y, z", you can tell it to rename variables, restructure functions, write specific tests, move files around, and so on.

So the question isn't so much whether to use an agent or edit code manually—it's what level of detail you work at with the agent. There are still times where it's easier to do things manually, but you never really need to.


Can you show some examples? I feel like there would be streams or YouTube let's-plays of this if it were working well.


I would like to see it as well. It seems to me that everybody is only selling shovels, but nobody has seen gold yet. :)


The real secret to agent productivity is letting go of your understanding of the code and trusting the AI to generate the proper thing. Very pro agent devs like ghuntley will all say this.

And it makes sense. For most coding problems the challenge isn’t writing code. Once you know what to write, typing the code is a drop in the bucket. AI is still very useful, but if you really wanna go fast you have to give up on your understanding. I’ve yet to see this work well outside of blog posts, tweets, board room discussions, etc.


> The real secret to agent productivity is letting go of your understanding of the code and trusting the AI to generate the proper thing

The few times I've done that, the agent eventually faced a problem/bug it couldn't solve and I had to go and read the entire codebase myself.

Then I found several subtle bugs (like writing private keys to disk even when there was an explicit instruction not to). Eventually I ended up refactoring most of it.

It does have value on coming up with boilerplate code that I then tweak.


You made the mistake of looking at the code, though. If you didn't look at the code, you wouldn't have known those bugs existed.


Fixing code now is orders of magnitude cheaper than fixing it in a month or two when it hits production.

Which might be fine if you're doing proof-of-concept or low-risk code, but it can also bite you hard when there is a bug actively bleeding money and not a single person or AI agent in the house knows how anything works.


That's just irresponsible advice. There is so little actual evidence of this technology being able to produce high quality maintainable code that asking us to trust it blindly is borderline snake-oil peddling.


Not borderline - it is just straight snake-oil peddling.


yet it works? where have you been for the last 2 years?

calling this snake oil is like when the horse carriage riders were against cars.


I have been an early adopter since 2021, buddy. "It works" for trivial use cases; for anything more complex it is utter crap.


I don’t see how I would feel comfortable pushing the current output of LLMs into high-stakes production (think SLAs, SRE).

Understanding of the code in these situation is more important than the code/feature existing.


You can use an agent while still understanding the code it generates in detail. In high stakes areas, I go through it line by line and symbol by symbol. And I rarely accept the first attempt. It’s not very different from continually refining your own code until it meets the bar for robustness.

Agents make mistakes which need to be corrected, but they also point out edge cases you haven’t thought of.


Definitely agreed; that is what I do as well. At that point you have a good understanding of that code, which is in contrast to what the post I responded to suggests.


I agree and am the same. Using them to enhance my knowledge, as well as autocomplete on steroids, is the sweet spot. It's much easier to review code if I'm “writing” it line by line.

I think the reality is a lot of code out there doesn’t need to be good, so many people benefit from agents etc.


> The real secret to agent productivity is letting go of your understanding of the code

This is negligence, it's your job to understand the system you're building.


Not to burst your bubble, but I've seen agents expose Stripe credentials by hardcoding them as text into a React frontend app. So no, kids, do not "let go" of code understanding, lest you end up as the next story along the lines of "AI dropped my production database".


This is sarcasm right?


I wish, that's dev brain on AI sadly.

We've been unfucking architecture done like that for a month, ever since the dev who had a hallucination session with their AI left.


A lot of that would be people working on proprietary code I guess. And most of the people I know who are doing this are building stuff, not streaming or making videos. But I'm sure there must be content out there—none of this is a secret. There are probably engineers working on open source stuff with these techniques who are sharing it somewhere.


That’s understandable, I also wouldn’t stream my next idea for everyone to see


Let’s see it then


go on reddit and you can see a million of these vibe coded codebases. is that not good enough?


+1 here. Let's see those productivity gains!


Here's one - https://apps.apple.com/us/app/pistepal/id6754510927

The app is definitely still a bit rough around the edges but it was developed in breakneck speed over the last few months - I've probably seen an overall 5x acceleration over pre-agentic development speed.


I use Junie to get tasks done all the time. For instance I had two navigation bars in an application which had different styling and I told it make the second one look like the first and... it made a really nice patch. Also if I don't understand how to use some open source dependency I check the project out and ask Junie questions about it like "How do I do X?" or "How does setting prop Y have the effect of Z?" and frequently I get the right answer right away. Sometimes I describe a bug in my code and ask if it can figure it out and often it does, ask for a fix and often get great results.

I have a React application where the testing situation is FUBAR, we are stuck on an old version of React where tests like enzyme that really run react are unworkable because the test framework can never know that React is done rendering -- working with Junie I developed a style of true unit tests for class components (still got 'em) that tests tricky methods in isolation. I have a test file which is well documented explaining the situation around tests and ask "Can we make some tests for A like the tests in B.test.js, how would you do that?" and if I like the plan I say "make it so!" and it does... frankly I would not be writing tests if I didn't have that help. It would also be possible to mock useState() and company and might do that someday... It doesn't bother me so much that the tests are too tightly coupled because I can tell Junie to fix or replace the tests if I run into trouble.

For me the key things are: (1) understanding from a project management perspective how to cut out little tasks and questions, (2) understanding enough coding to know if it is on the right track (my non-technical boss has tried vibe coding and gets nowhere), (3) accepting that it works sometimes and sometimes it doesn't, and (4) recognizing context poisoning -- sometimes you ask it to do something and it gets it 95% right and you can tell it to fix the last bit and it is golden, other times it argues or goes in circles or introduces bugs faster than it fixes them and as quickly as you can you recognize that is going on and start a new session and mix up your approach.


Manually styling two similar things the same way is a code smell. Ask the AI to make common components and use them for both instead of brute-forcing them to look similar.


Yeah, I thought about this in that case. I tend to think the way you do to the extent that it is sometimes a source of conflict with other people I work with.

These navbars are similar but not the same, both have a pager but they have other things, like one has some drop downs and the other has a text input. Styled "the same" means the line around the search box looks the same as the lines around the numbers in the pager, and Junie got that immediately.

In the end the patch touched css classes in three lines of one file and added a css rule -- it had the caveat that one of the css classes involved will probably go away when the board finally agrees to make a visual change we've been talking about for most of a year but I left a comment in the first navbar warning about that.

There are plenty of times I ask Junie to try to consolidate multiple components or classes into one and it does that too as directed.



These are a lot of good reasons not to use it yet, IMO.


> I add to my team’s CLAUDE.md multiple times a week.

This concerns me because fighting tooling is not a positive thing. It’s very negative and indicates how immature everything is.


The Claude MD is like the documentation you hand to a new engineer on your team that explains details about your code that they wouldn't otherwise know. It's not bad to need one.


But that documentation shouldn’t need to be updated nearly every other day.


Consider that every time you start a session with Claude Code. It's effectively a new engineer. The system doesn't learn like a real person does, so for it to improve over time you need to manually record the insights that for a normal human would be integrated by the natural learning process.


Yes, that's exactly the problem. There's good reasons why any particular team doesn't onboard new engineers each day, going all the way back to Fred Brooks and "adding more people to a late project makes it later".


Reminds me of that Nicole Kidman movie Before I Go to Sleep.


there are many tools available that work towards solving this problem


Sleep time compute architectures are changing this.


I certainly could be updating the documentation for new devs very frequently - the problem with devs is that they don't bother reading the documentation.


and the other problem - when they see something is wrong/out of date, they don't update it...


If you are consistent with how you do your projects you shouldn't need to update CLAUDE.md nearly every day. Early on, I was adjusting it nearly every day for maybe a couple of projects but now I have very little need to make any adjustments.

Often the challenge is that users aren't interacting with Claude Code about their rules file. If Claude Code doesn't seem to be working with you, ask it why it ignored a rule. Oftentimes it provides very useful feedback for adjusting the rules so it no longer violates them.

Another piece of advice I can give is to clear your context window often! Early on I was letting the context window auto-compact, but this is bad! Your model is at its freshest and "smartest" when it has a fresh context window.


It takes a lot of uncached tokens to let it learn about your project again.


Same thing happens every time a new hire joins the team. Lots of documentation is stale and needs updating as they onboard.


But that documentation shouldn’t need to be updated nearly every other day.


It does if it’s incomplete or otherwise doesn’t accurately convey what people need to know.


And something is terribly wrong if it is constantly in that state despite near daily updates.


Have you never looked at your work's Confluence? Worse, have you never spent time at a company where the documentation wasn't frequently updated?


Do you have nothing but onboarding material on yours and somehow still need to update it several times a week?


Why not?


You might be misunderstanding what a CLAUDE.md is. It’s not about fighting the model, rather it’s giving the model a shortcut to get the context it needs to do its work. You don’t have to have one. Ours is 100% written by Claude itself.


That's not the same thing as adding rules by yourself based on your experiences with Claude.


Does the same happen if I create an AGENTS.md instead?


Claude Code does not support AGENTS.md; you can symlink it to CLAUDE.md to work around it. Anthropic: pls support!
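
For example, from the repository root (assuming AGENTS.md is the file you maintain, so CLAUDE.md just points at it):

  ln -s AGENTS.md CLAUDE.md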



Use AGENTS.md for everything, then put a single line in CLAUDE.md:

  @AGENTS.md


Get a grep!


In addition, having Claude Code's code and plans evaluated is very worthwhile. It leads to calmer decisions from AI agents.


How do you make Claude Code choose Opus and not Sonnet? For me it seems to pick automatically.


/model
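
You can also pass the model name directly as an argument (a quick sketch; check /help for the exact options on your version):

  /model opus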


> 1. If there is anything Claude tends to repeatedly get wrong, not understand, or spend lots of tokens on, put it in your CLAUDE.md.

What a joke. Claude regularly ignores the file. It is a toss-up: we were playing a game at work guessing which items it would forget first: running tests, the formatter, the linter, etc. This is despite items saying ABSOLUTELY MUST, you HAVE TO, and so on.

I have cancelled my Claude Max subscription. At least Codex doesn’t tell me that broken tests are unrelated to its changes or complain that fixing 50 tests is too much work.


And if I may, this advice also applies if you choose Cursor as a coding environment.


> Use Opus 4.5.

This drives up price faster than quality though. Also increases latency.


There's a counterintuitive pricing aspect of Opus-sized LLMs: they're so much smarter that in some cases they can solve the problem faster and with far fewer tokens, so they can end up being cheaper.


Opus 4.5 is significantly better if you can afford it.

They also recently lowered the price for Opus 4.5, so it is only 1.67x the price of Sonnet, instead of 5x for Opus 4.


Obviously the Anthropic employee advertising their product wants you to pay as much as possible for it.


The generosity of the Max plans indicates otherwise.


God bless these generously benevolent corporations, giving us such amazing services for the low low price of only $200 per month. I'm going to subscribe right now! I almost feel bad, it's like I'm stealing from them.


That $200 a month is getting me $2000 a month in API equivalent tokens.

I used to spend $200+ an hour on a single developer. I'm quite sure that benevolence was a factor when they submitted an invoice, since there was no real transparency about whether I was being overbilled or whether the developer acted in my best interest rather than theirs.

I'll never forget the one contractor who told me he took a whole 40 hours to do something he could have done in less time, specifically because I had allocated that as an upper-bound weekly budget for him.


> That $200 a month is getting me $2000 a month in API equivalent tokens.

Do you ever feel bad for basically robbing these poor people blind? They're clearly losing so much money by giving you $1800 in FREE tokens every month. Their business can't be profitable like this, but thankfully they're doing it out of the goodness of their hearts.


I'm not sure that you actually expect to be taken seriously if you're going to assert that these companies don't have costs themselves to deliver their services.


Even $500 would be cheap, if it could replace one developer.


Claude Code basically does not use CLAUDE.md, but I wish it did.


Hey Boris, can you teach CC how to use cd?


Personally, CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1 made all my cd problems go away (which were only really in CMake-based projects to begin with).
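
In case it helps anyone else, I just export it in my shell profile (bash/zsh shown; adjust for your setup):

  export CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1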


So it does a forced reset of the dir after each bash command? Does it confuse Claude? I frequently find it lacks awareness of what its working directory is.


That’s exactly what it does; I’ve found it completely un-confuses Claude Sonnet 4.5.


Does all my code get uploaded to the service?


Hey Boris from the Claude Code team - could you guys please be so kind as to stop pushing that narrative about CLAUDE.md, either yourselves or through influencers and GenAI-grifters? The reason being, it is simply not true. A lot of the time the instructions will be ignored. Actually, the term "ignored" is putting the bar too high, because your tool does not intentionally "ignore", not having sentience or knowledge. We experience the effects of the instructions being ignored because your software is not deterministic; it's merely guessing the next token, and sometimes those instructions tacked onto the rest of the context statistically do not match what we as humans expect to see (while it's perfectly logical for your machine-learning text generator, based on the datasets it was trained on).


This seems pretty aggressive considering this is all just personal anecdote.

I update my CLAUDE.md all the time and notice the effects.

Why all the snark?


Is it really just a personal anecdote? Please do read some of the other comments on this post. The snark comes from everyone and their mother recommending "just write a CLAUDE.md", when it is clear that this technology does not have the intrinsic capability to produce reliable outputs from human-language input.


Yeah… that’s the point of LLMs: variable output. If you’re using them for 100% consistent output, you’re using the wrong tool.


Is it? So you are saying software should not be consistent? Or that LLMs should not be used for software development, aside from toy-projects?


CLAUDE.md is read on session startup.

If you're continually finding that it's being forgotten, maybe you're not starting fresh sessions often enough.


I understand you're trying to be helpful but the number of "you're holding it wrong" things I read about this tool — any AI tool — just makes me wonder who vibe coders are really doing all this unpaid work for.


I should not have to fight tooling, especially the supposedly "intelligent" one. What's the point of it, if we have to always adapt to the tool, instead of the other way around?


It's a tool. The first time you used a shell you had to learn it. The first time you used a text editor you had to learn it.

You can learn how to use it, or you can put it down if you think it doesn't bring you any benefit.


even the shell remembers my commands...


I am sorry but what do I have to learn? That the tool does not work as advertised? That sometimes it will work as advertised, sometimes not? That it will sometimes expose critical secrets as plain text and some other time suggest to solve a problem in a function by removing the function code completely? What are you even talking about, comparing to shell and text editors? These are still bloody deterministic tools. You learn how they work and the usage does not change unpredictably every day! How can you learn something that does not have predictable outputs?


Yes, you have to learn those things. LLMs are hard to use.

So are animals, but we've used dogs and falcons and truffle hunting pigs as tools for thousands of years.

Non-deterministic tools are still tools, they just take a bunch more work to figure out.


It's like having Michael Jordan with dementia on your team. You start out mesmerized by how many points he can score, and then you get incredibly frustrated that he forgets he has to dribble and shoot into the correct hoop.


Spot on. Not to mention all the fouls and traveling the demented "all star" makes for your team, effectively negating any point gains.


> So are animals, but we've used dogs and falcons and truffle hunting pigs as tools for thousands of years.

Dogs learn their jobs way faster, more consistently and more expressively than any AI tool.

Trivially, dogs understand "good dog" and "bad dog" for example.

Reinforcement learning with AI tooling clearly seems not to work.


> Dogs learn their jobs way faster, more consistently and more expressively than any AI tool.

That doesn't match my experience with dogs or LLMs at all.


Ever heard of service dogs? Or police dogs? Now tell me, when will LLMs ever be safe to be used as assistance to blind people? Or will the big tech at some point release some sloppy blind-people-tool based on LLMs and unleash the LLM-influencers like yourself to start gaslighting the users into thinking they were "not holding it right" ? For mission and life critical problems, I'll take a dog any day, thank you very much!


I've talked to a few people who are blind about vision LLMs and they're very, very positive about them.

They fully understand their limitations. Users of accessibility technology are extremely good at understanding the precise capabilities of the tools they use - which reminds me that screenreaders themselves are a great example of unreliable tools due to the shockingly bad web apps that exist today.

I've also discussed the analogy to service dogs with them, which they found very apt given how easily their assistive tool could be distracted by a nearby steak.

The one thing people who use assistive technology do not appreciate is being told that they shouldn't try a technology out themselves because it's unreliable and hence unsafe for them to use!


Please for once answer the question being asked without replacing both the question and the stated intention with something else. I was willing to give you the benefit of the doubt, but I am now really wondering where your motivation for these vaguely constructed "analogies" is coming from - is the LLM industry that desperate? We were all "positive" about LLM possibilities once. I am asking you: when will LLMs be so reliable that they can be used in place of service dogs for blind people? Do you believe that this technology will ever be that safe? Have you ever actually seen a service dog? I don't think you can distract a service dog with a steak - did you know they start their training basically from year one of age, and it takes up to two years to train them? Do you think they spend those two years learning to fetch properly? Also, I never said people should not be allowed to "try" a technology. But like with drugs, the tools for the impaired, the sick, etc. also undergo a verification and licensing process - I am surprised you did not know that. So I am asking you again: can you ever imagine an LLM passing those high regulatory hurdles, so that it can be safely used to assist impaired people? Service dogs must be doing something right if so many of them are safely assisting so many people today, don't they?


No, please, stop misleading people Simon. People use tools to make things easier for them, not harder. And a tool which I cannot steer predictably is not a god damn tool at all! The sheer persistence the AI-promoters like you are willing to invest just to gaslight us all into thinking we were dumb and did not know how to use the shit-generators is really baffling. Understand that a lot of us are early adopters and we see this shit for what it is - the most serious mess up of the "Big Tech" since Zuckerberg burned 77B for his metaverse idiocy. By the way - animals are not tools. People do not use them - they engage with them as helpers, companions and for some people, even friends of sorts. Drop your LLM and try engaging with someone who has a hunting dog for example - they'd be quite surprised if you referred to their beloved retriever as a "tool". And you might learn something about a real intelligence.


Your insistence that LLMs are not useful tools is difficult for me to empathize with as someone who has been using them successfully as useful tools for several years - and sharing in great detail how I am using them.

https://simonwillison.net/2025/Dec/10/html-tools/ is the 37th post in my series about this: https://simonwillison.net/series/using-llms/

https://simonwillison.net/2025/Mar/11/using-llms-for-code/ is probably still my most useful of those.

I know you absolutely hate being told you're holding them wrong... but you're holding them wrong.

They're not nearly as unpredictable as you appear to think they are.

One of us is misleading people here, and I don't think it's me.


> One of us is misleading people here, and I don't think it's me.

Firstly, I am not the one with an LLM-influencer side-gig. Secondly - no, sorry, please don't move the goalposts. You did not answer my main argument - which is - how does a "tool" which constantly changes its behaviour deserve to be called a tool at all? If a tailor had scissors which cut the fabric sometimes just a bit, and sometimes completely differently every time they used them, would you tell the tailor he is not using them right too? Thirdly, you are now contradicting yourself. First you said we need to live with the fact that they are unpredictable. Now you are sugarcoating it into being "a bit unpredictable", or "not nearly as unpredictable". I am not sure if you are doing this intentionally or if you really want to believe in the "magic", but either way you are ignoring the ground tenets of how this technology works. I'd be fine if they used it to generate cheap holiday novels or erotica - but clearly, four years of experimenting with the crap machines to write code has created a huge pushback in the community - we don't need the proverbial scissors which cut our fabric differently each time!


> how does a "tool" which constantly change its behaviour deserve being called a tool at all?

Let's go with blast furnaces. They're definitely tools. They change over time - a team might constantly run one for twenty years but still need to monitor and adjust how they use it as the furnace itself changes behavior due to wear and tear (I think they call this "drift".)

The same is true of plenty of other tools - pottery kilns, cast iron pans, knife sharpening stones. Expert tool users frequently use tools that change over time and need to be monitored and adjusted.

I do think dogs, horses, and other animal tools remain an excellent example here as well. They're unpredictable and you have to constantly adapt to their latest behaviors.

I agree that LLMs are unpredictable in that they are non-deterministic by nature. I also think that this is something you can learn to account for as you build experience.

I just fed this prompt to Claude Code:

  Add to_text() and to_markdown() features to justhtml.html - for the whole document or for CSS selectors against it
  
  Consult a fresh clone of the justhtml Python library (in /tmp) if you need to
It did exactly what I expected it would do, based on my hundreds of previous similar interactions with that tool: https://github.com/simonw/tools/pull/162


> Let's go with blast furnaces. They're definitely tools. They change over time - a team might constantly run one for twenty years but still need to monitor and adjust how they use it as the furnace itself changes behavior due to wear and tear (I think they call this "drift".)

Now let's make the analogy more accurate: let's imagine the blast furnace often ignores the operator's controls and just does what it "wanted" instead. Additionally, there are no gauges and there is no telemetry you can trust (it might have some, but the furnace will occasionally falsify them, and you won't know when it's doing that).

Let's also imagine that the blast furnace changes behavior minute-to-minute (usually in the middle of the process) between useful output, useless output (requires scrapping), and counterproductive output (requires rework which exceeds the productivity gains of using the blast furnace to begin with).

Furthermore, the only way to tell which one of those 3 options you got, is to manually inspect every detail of every piece of every output. If you don't do this, the output might leak secrets (or worse) and bankrupt your company.

Finally, the operator would be charged for usage regardless of how often the furnace actually worked. At least this part of the analogy already fits.

What a weird blast furnace! Would anyone try to use this tool in such a scenario? Not most experienced metalworkers. Maybe a few people with money to burn. In particular, those who sing the highest praises of such a tool would likely be ignorant of all these pitfalls, or have a vested interest in the tool selling.


> What a weird blast furnace! Would anyone try to use this tool in such a scenario? Not most experienced metalworkers.

Absolutely wrong. If this blast furnace would cost a fraction of other blast furnaces, and would allow you to produce certain metals that were too expensive to produce previously (even with high error rate), almost everyone would use it.

Which is exactly what we're seeing right now.

Yes, you have to distinguish marketing message vs real value. But in terms of bang for buck, Claude Code is an absolute blast (pun intended)!


> this blast furnace would cost a fraction of other blast furnaces

Totally incorrect: as we already mentioned, this blast furnace actually costs just as much as every other blast furnace to run all the time (which they do). The difference is only in the outputs, which I described in my post and now repeat below, with emphasis this time.

Let's also imagine that the blast furnace changes behavior minute-to-minute (usually in the middle of the process) between useful output, useless output (requires scrapping), and counterproductive output ——>(requires rework which exceeds the productivity gains of using the blast furnace to begin with)<——

Does this describe any currently-operating blast furnaces you are aware of? Like I said, probably not, for good reason.


The furnaces I'm comparing are Claude Code vs hiring more engineers. Not Claude Code vs Codex vs Gemini. If $20/mo makes an engineer even 10% more productive, purchasing Claude Code is a no-brainer.

Most engineers feel like Claude Code is a multiplier for their productivity, despite all the flaws it has. You're arguing that CC is unusable and a net negative for productivity, but this is the opposite of what people are feeling. I am able to tackle problems I wouldn't even have attempted previously (sometimes to my detriment).


You appear to be arguing that powerful, unpredictable tools like LLMs need to be run carefully with plenty of attention paid to catching their mistakes and designing systems around them (like sandboxed coding agent harnesses) that allow them to be operated productively and safely.

I couldn't agree more.


> You appear to be arguing that powerful, unpredictable tools like LLMs need to be run carefully with plenty of attention

I did not say that. I said that most metalworkers familiar with all the downsides (only 1 of which you are referring to here) would avoid using such an unpredictable, uncontrollable, uneconomical blast furnace entirely.

A regular blast furnace requires the user to be careful. A blast furnace which randomly does whatever it wants from minute to minute, producing bad output more often than good, including bad output that costs more to fix than the furnace cost to run, more than any cost savings, with no way to tell or meaningfully control it, is pretty useless.

Saying "be careful" using a machine with no effective observability or predictability or controls is a silly misnomer, when no amount of care will bestow the machine with them.

What other tools work this way, and are in widespread use? You mentioned horses, for example: What do you think usually happens to a deranged, rabid, syphilitic working horse which cannot effectively perform any job with any degree of reliability, and which often unpredictably acts out in dangerous and damaging ways? Is it usually kept on the job and 'run carefully'? Of course not.


Whether it's blast furnaces or carbon fiber, the wear and tear (macroscopic changes) as well as material fatigue (molecular changes) is something that will be specified by the manufacturer, within some margin of error, and you pretty much know what to expect - unless you are a smartass billionaire building an improvised sub out of carbon fiber whose expiry date was long past. However, the carbon fiber or your blast furnace won't break just on their own. So it's a weak analogy and a stretch at that. Now for your experiment: it has no value because a) you and I both know that if you told your LLM its output was shit, it would immediately "agree" with you and go off to produce some other crap, and b) for this to be a scientifically valid experiment at all, I'd expect on the order of 10,000 repetitions, each providing exactly the same output. But on this too, you and I both know the 2nd iteration will already introduce some changes. So stop fighting the obvious and repeat after me: LLMs are shit for any serious work.


Why would I agree that "LLMs are shit for any serious work" when I've been using them for serious work for two+ years, as have many other people whose skills I respected from before LLMs came along?

I wrote about another solid case study this morning: https://simonwillison.net/2025/Dec/14/justhtml/

I genuinely don't understand how you can look at all of this evidence and still conclude that they aren't useful for people who learn how to use them.


Well, you don't have to agree with that statement. But I haven't seen a serious refutation of my arguments either.


> I know you absolutely hate being told you're holding them wrong... but you're holding them wrong.

Wow, was that snark just then?


You’ve asked the right questions and don’t want to find the answers. It’s on you.


Whats "on me" mate? Not being impressed with the 101st ToDo app vibe-coding hobbysts elatedly put together with the help of the statistical magic box?


Hey, Boris from the Claude Code team here. We try hard to read through every issue, and respond to as many issues as possible. The challenge is we have hundreds of new issues each day, and even after Claude dedupes and triages them, practically we can’t get to all of them immediately.

The specific issue you linked is related to the way Ink works, and the way terminals use ANSI escape codes to control rendering. When building a terminal app there is a tradeoff between (1) visual consistency between what is rendered in the viewport and scrollback, and (2) scrolling and flickering which are sometimes negligible and sometimes a really bad experience. We are actively working on rewriting our rendering code to pick a better point along this tradeoff curve, which will mean better rendering soon. In the meantime, a simple workaround that tends to help is to make the terminal taller.

Please keep the feedback coming!


It’s surprising to hear this get chalked up to “it’s the way our TUI library works”, while e.g. opencode is going to the lowest level and writing their own TUI backend. I get that we can’t expect everyone to reinvent the wheel, but it feels symptomatic of something that folks are willing to chalk their issues up to an unfortunate and unavoidable symptom of a library they use, rather than deeming that unacceptable and going to the lowest level.

CC is one of the best and most innovative pieces of software of the last decade. Anthropic has so much money. No judgment, just curious, do you have someone who’s an expert on terminal rendering on the team? If not, why? If so, why choose a buggy / poorly designed TUI library — or why not fix it upstream?


We started by using Ink, and at this point it’s our own framework due to the number of changes we’ve made to it over the months. Terminal rendering is hard, and it’s less that we haven’t modified the renderer, and more that there is this pretty fundamental tradeoff with terminal rendering that we have been navigating.

Other terminal apps make different tradeoffs: for example Vim virtualizes scrolling, which has tradeoffs like the scroll physics feeling non-native and lines getting fully clipped. Other apps do what Claude Code does but don’t re-render scrollback, which avoids flickering but means the UI is often garbled if you scroll up.
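
To make the tradeoff concrete, here is a minimal sketch (illustrative only, not our actual renderer) of the in-place repaint approach: move the cursor back to the top of the previously drawn frame, clear it, and draw the new frame. Anything that has already scrolled into the terminal's scrollback can no longer be reached by cursor movement, which is where the consistency/flicker tradeoff comes from.

  // Illustrative TypeScript/Node sketch of viewport repainting with ANSI escapes.
  const ESC = "\x1b[";
  let previousHeight = 0;

  function render(lines: string[]): void {
    if (previousHeight > 0) {
      process.stdout.write(`${ESC}${previousHeight}A`); // cursor up to the frame's first line
      process.stdout.write(`${ESC}0J`);                 // erase from cursor to end of screen
    }
    process.stdout.write(lines.join("\n") + "\n");
    // If the frame is taller than the terminal, its top rows scroll into
    // scrollback and can't be repainted on the next call.
    previousHeight = lines.length;
  }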


As someone who's used Claude Code daily since the day it was released, the sentiment back then (sooo many months ago) was that the agent CLI coding TUIs were kind of experimental proofs of concept. We have seen them be incredibly effective, and the CC team has continued to add features.

Tech debt isn't something that even experienced large teams are immune to. I'm not a huge TypeScript fan, so their choice to run the app on Node felt to me like a trade-off favoring development speed, given the experience the team had, at the expense of long-term growth and performance. I regularly experience pretty intense flickering, rendering issues, high CPU usage, and even crashes, but that doesn't stop me from finding the product incredibly useful.

Developing good software especially in a format that is relatively revolutionary takes time to get right and I'm sure whatever efforts they have internally to push forward a refactor will be worth it. But, just like in any software development, refactors are prone to timeline slips and scope creep. A company having tons of money doesn't change the nature of problem-solving in software development.


> CC is one of the best and most innovative pieces of software of the last decade...

Oh come on! Aider existed before it, and so did many other TUI AI agents. I'd say Rust and Elixir were more innovation than CC.


That issue is the fourth most-reacted issue, and third most open issue. And the two things above it are feature requests. It seems like you should at the very least have someone pop in to say "working on it" if that's what you're doing, instead of letting it sit there for 4 months?



Thanks for the reply (and for Claude Code!). I've seen improvement on this particular issue already with the last major release, to the extent that it's not a day to day issue for me. I realise Github issues are not the easiest comms channel especially with 100s coming in a day, but occasional updates on some of the top 10 commented issues could perhaps be manageable and beneficial.


How about giving us the basic UX stuff that all other AI products have? I've been posting this ever since I first tried Claude: Let us:

* Sign in with Apple on the website

* Buy subscriptions from iOS In App Purchases

* Remove our payment info from our account before the inevitable data breach

* Give paying subscribers an easy way to get actual support

As a frequent traveller I'm not sure if some of those features are gated by region, because some people said they can do some of those things, but if that is true, then that still makes the UX worse than the competitors.


Can you confirm you're on the latest version? You should not be seeing it more than once every few days.


Claude Code v2.0.35 — No exaggeration I've had it popup 4x today already.


Yes but imagine they hadn't applied this "fix." It could have been 40x. :P


This was a bug, should be fixed as of a few days ago.


Boris from the Claude Code team here.

We actually don't train on this survey data. It's just for vibes so we can make sure people are having a good experience.

See https://code.claude.com/docs/en/data-usage#session-quality-s...


> It's just for vibes so we can make sure people are having a good experience.

Are you sure that people are having a good experience ?

(after some random time) Are you sure that people are having a good experience ?

(after some random time) Are you sure that people are having a good experience ?

(after some random time) Are you sure that people are having a good experience ?

(after some random time) Are you sure that people are having a good experience ?


Now think about this from a user's perspective:

We asked for Claude coz we really wanted it (as devs). Our security and legal guys evaluated it and the data usage agreements and such, which took ages and they stated that we can use it as long as we never never ever give any kind of bug report or feedback because that might retain data. This was all prior to this new feedback prompt being added to Claude Code.

We were happy. We could use Claude Code, finally!

Then suddenly, about once per day we are getting these prompts and there is NO option to disable them. The prompt just randomly appears. I might be typing part of my next prompt, which can very definitely include numbers from 1 upwards, e.g. because I'm making a todo list and boom I've given feedback. I've inadvertently done the thing we were never never ever supposed to do.

Do you really think our first thought is to double check if maybe the data usage agreement has been changed / hopefully says that this particular random new survey thing is somehow not included in "feedback"?

No, we panic and are mad at Claude Code/Anthropic. Heck if you noticed that you inadvertently gave feedback you might make a disclosure to the security team that will now go and evaluate how bad this was. Hopefully they'd then find your updated data usage agreement.

It would have been so easy to just include an option to opt out, which just set the equivalent of the env var you now provide into the settings itself. In fact, I see the first comment a few days after the bug was filed already asked for that and was thumbs'd up by many people. And you ignored it and all we got was a manual env var or settings entry that we also need to first find out even exists.


> We were happy. We could use Claude Code, finally!

So you are happy to use Claude.

> Then suddenly, about once per day we are getting these prompts and there is NO option to disable them.

And now you are not happy. Why are you not happy?

You're holding it wrong. /s


Boris from the Claude Code team here.

You can set CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY either in your env, or in your settings.json. Either one works.
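
For example, a minimal settings.json entry might look like this (a sketch; merge it into your existing settings):

  {
    "env": {
      "CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY": "1"
    }
  }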


Hey HN, Boris from the Claude Code team here. Sharing a bit about why we have this survey, how we use it, and how to opt out of it-

1. We use session quality feedback as a signal to make sure Claude Code users are having a good time. It's helpful data for us to more quickly spot & prevent incidents like https://www.anthropic.com/engineering/a-postmortem-of-three-.... There was a bug where we were showing the survey too often, which is now fixed (it was annoying and a misconfiguration on our part).

2. Giving session quality feedback is totally optional. When you provide feedback, we just collect your numerical rating and some metadata (like OS, terminal, etc.). Giving feedback doesn't cause us to log your conversation, code, or anything like that. (docs: https://code.claude.com/docs/en/data-usage#session-quality-s...)

3. We don't train on quality feedback data. This is documented in the link above.

4. If you don't want to give feedback, you can permanently turn it off for yourself by setting CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY=1 in your env or settings.json file. (docs: https://code.claude.com/docs/en/settings)

5. To permanently turn off the feedback survey for your whole company, set CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY=1 in the settings.json checked into your codebase, or in your enterprise-managed settings.json (docs: https://code.claude.com/docs/en/settings)

6. You can also opt out of both telemetry + survey by setting CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1, or you can more granularly opt out with DISABLE_ERROR_REPORTING=1, DISABLE_TELEMETRY=1, etc. (also documented in the settings docs)

Security and privacy are very important, and we spend a lot of time getting these right. You have full control over data usage, telemetry, and training, and these are configurable for yourself, for your codebase, and for all your employees. We offer all of these options out of the box, so you can choose the mechanism that makes the most sense for you.

If there is a setting or control that is missing, or if anything is unclear from the docs, please tell us!


> We use session quality feedback as a signal to make sure Claude Code users are having a good time

Your entire answer ignores the fact that this is irritating behavior that ensures users are not having a good time. We don't want to chase down secret config values. We want to click "stop bothering me" and be done with it.


Ah yes the

"Never ask me again"

(asks again the next day)


Thank you for taking the time to respond here. Thank you also for sharing your point of view, use case, and where you are coming from. With that said, would you mind sharing a few words on a couple questions?:

1. Does it take as much effort to opt-in to your feedback mechanism as it takes to opt-out? If not, why not?

2. If you want a thing ('feedback', 'a signal', data that is helpful to YOU), but getting it has this negative effect on others, what would happen if you put others before yourself and made do with less of the thing?


Regarding 2: it is not really optional, since the pop-up has already occurred and the interruption is done. And 4-6 are neither obvious nor easy for almost everyone.

I recommend people always respond with the lowest possible score (1, not 0) when presented with popups like this.


Thank you for the feedback. I honestly believe that Claude Code is trying to be more privacy-aware than other products, but it takes vigilance on both sides to get there. When you say the survey sends back more than the numeric value and you add the 'etc', can you be very specific about what the 'etc' is? The information you mention seems at odds with the data-usage documentation [1], which says only a numeric value is sent. Is the documentation going to be fixed to be explicit that more than a numeric value is sent? Is the feedback prompt going to ask the user if it is OK that more than a numeric value is sent back with the survey data?

Again, I think you are making the best product out there. I want to keep using it. Privacy is my #1 feature request to keep using it so transparency is crucial.

[1] https://code.claude.com/docs/en/data-usage


> If there is a setting or control that is missing, or if anything is unclear from the docs, please tell us!

The setting is "leave me alone and don't ask again".


Fast dev tools are awesome and I am glad the TS team is thinking deeply about dev experience, as always!

One trade off is if the code for TS is no longer written in TS, that means the core team won’t be dogfooding TS day in and day out anymore, which might hurt devx in the long run. This is one of the failure modes that hurt Flow (written in OCaml), IMO. Curious how the team is thinking about this.


Hey bcherny! Yes, dog-fooding (self-hosting) has definitely been a huge part in making TypeScript's development experience as good as it is. The upside is the breadth of tests and infrastructure we've already put together to watch out for regressions. Still, to supplement this I think we will definitely be leaning a lot on developer feedback and will need to write more TypeScript that may not be in a compiler or language service codebase. :D


Interesting! This sounds like a surprisingly hard problem to me, from what I've seen of other infra teams.

Does that mean more "support rotations" for TS compiler engineers on GitHub? Are there full-stack TS apps that the TS team owns that ownership can be spread around more? Will the TS team do more rotations onto other teams at MSFT?


Ultimately the solution has to be breaking the browser monopoly on JS, via performance parity of WASM or some other route, so that developers can dogfood in performant languages instead across all their tooling, front end, and back end.


First, this thread and article have nothing to do with language and/or application execution performance. It is only about the tsc compiler execution time.

Second, JavaScript already executes quickly. Aside from arithmetic operations it has now reached performance parity with Java, and highly optimized JavaScript (typed arrays and an understanding of how data is accessed from arrays and objects in memory) can come within 1.5x the execution speed of C++. At this point all the slowness of JavaScript is related to things other than code execution, such as garbage collection, unnecessary framework code bloat, and poorly written code.

That being said, it isn't realistic to expect significantly faster execution times from replacing JavaScript with a WASM runtime. This is even more true after considering that many performance problems with JavaScript in the wild are human problems more than technology problems.

Third, WASM has nothing to do with JavaScript, according to its originators and maintainers. WASM was never created to compete with, replace, modify, or influence JavaScript. WASM was created as a language-agnostic Flash replacement in a sandbox. Since WASM executes in an agnostic sandbox, the cost of switching away from an existing runtime is high: a JavaScript runtime is already available, whereas a WASM runtime is more akin to installing a desktop application on first run.


How do you reconcile this view with the fact that the typescript team rewrote the compiler in Go and it got 10x faster? Do you think that they could have kept in in typescript and achieved similar performance but they didn't for some reason?


This was touched on in the video a little bit—essentially, the TypeScript codebase has a lot of polymorphic function calls, and so is generally hard to JIT optimize. JS to Go therefore yielded a direct ~3.5x improvement.

The rest of the 10x comes from multi-threading, which wasn't possible to do in a simple way in the JS compiler (efficient multithreading while writing idiomatic code is hard in JS).

JavaScript is very fast for single-threaded programs with monomorphic functions, but in the TypeScript compiler's case, the polymorphic functions and opportunity for parallelization mean that Go is substantially faster while keeping the same overall program structure.
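
For a concrete (and purely illustrative, not from the actual compiler) picture of what "polymorphic call sites" means here, consider a TypeScript sketch where one property access is fed many object shapes, so the JIT's inline caches can't specialize:

  // Illustrative only -- not actual TypeScript compiler code.
  interface Identifier { kind: "identifier"; text: string; pos: number }
  interface Literal { kind: "literal"; value: number; pos: number }
  interface CallExpr { kind: "call"; callee: Identifier; args: Literal[]; pos: number }
  type Node = Identifier | Literal | CallExpr;

  // This single call site sees three different object shapes, so the
  // engine's inline cache for `node.pos` goes polymorphic and falls back
  // to slower generic property lookups. A real checker has thousands of
  // such sites over dozens of node shapes; in Go, struct layouts are
  // static, so the same access compiles to a fixed field offset.
  function startOf(node: Node): number {
    return node.pos;
  }

  const nodes: Node[] = [
    { kind: "identifier", text: "x", pos: 0 },
    { kind: "literal", value: 42, pos: 2 },
    { kind: "call", callee: { kind: "identifier", text: "f", pos: 4 }, args: [], pos: 4 },
  ];
  console.log(nodes.map(startOf)); // [0, 2, 4]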


I have no idea about the details of their test cases. If they had used an even faster language like Cobol or Fortran maybe they could have gotten it 1,000,000x faster.

What I do know is that some people complain about long compile times in their code that can last up to 10 minutes. I had a personal application that was greater than 60k lines of code and the tsc compiler would compile it in about 13 seconds on my super old computer. SWC would compile it in about 2.5 seconds. This tells me the far greater opportunity for performance improvement is not in modifying the compiler but in modifying the application instance.


> maybe they could have gotten it 1,000,000x faster.

WTF.


Yeah this is an overly exaggerated claim


It was unwarranted sarcastic snark. That commenter was bitten by some bug.


Very short, succinct and informative comment. Thank you.


Are you looking for non-browser performance such as 3d? I see no case that another language is going to bring performance to the DOM. You'd have to be rendering straight to canvas/webgl for me to believe any of this.


The issue with Flow is that it's slow, flaky and has shifted the entire paradigm multiple times making version upgrades nearly impossible without also updating your dependencies, IF your dependencies adopted the new flow version as well. Otherwise you're SOL.

As a result the amount of libraries that ship flow types has absolutely dwindled over the years, and now typescript has completely taken over.


Our experience is the opposite: we have a pretty large Flow-typed code base and can do a full check in <100ms. When we converted it to TS (we decided not to merge), we saw TypeScript take multiple minutes. It's worth checking out LTI and how typing the boundaries enables Flow to parallelize and give very precise error messages compared to TS. The third-party lib support is basically dead, however, except that the latest versions of Flow are starting to enable ingestion of TS types, so that's interesting.


They should write a TypeScript-to-Go transpiler (in TypeScript), so that they can write their compiler in TypeScript and use TypeScript to transpile it to Go.

