Hacker News | fcarraldo's comments

> Speaking as a developer, this becomes obvious the moment you step outside the romantic framing. I have been doing this for years, and the hardest parts of the job were never about typing out code. I have always struggled most with understanding systems, debugging things that made no sense, designing architectures that wouldn't collapse under heavy load, and making decisions that would save months of pain later.

> None of these problems can be solved by LLMs. They can suggest code, help with boilerplate, and sometimes act as a sounding board. But they don't understand the system, they don't carry context in their "minds", and they certainly don't know why a decision is right or wrong.

It is becoming increasingly true that they do exactly this. In some cases, better than (some? many?) humans.

Also, the anti-Marxist Objectivist rant has no place in this article, and it makes no sense besides. Yes, the woodworker could continue working with their hands. But in today's society they need money for food and shelter, and that money has to come from somewhere. If the market devalues work done by hand and comes to prize the speed and consistency of machines and automation, then the artisan cannot simply keep spending dozens of hours per week working wood unless they are independently wealthy or have some other source of income. Individualism is not paramount in a society you are forced to participate in to meet basic needs.


Which of these is Apple?

Not if they work outside of tech…

20 years ago was only 2006. The internet has been around for much longer. The first consumer-focused ISPs launched in the early 90s, 35 years ago, but CompuServe and others were providing access to chat and BBSes in the 80s.

I’d say nearly 50 years is precedent enough that government intervention is unnecessary.


Yeah, but most people didn't have internet access in the early 90s. It's more of a 2005+ phenomenon.

https://en.wikipedia.org/wiki/Global_Internet_usage


20 hours is low for this category. The Sony XM6s are rated at 30h, the Bose QCs at ~24h, and Sennheisers can do 40-50h. All of those figures are with ANC on; the numbers are slightly higher with ANC off.

Does that include R&D? Google is an AI _provider_, which is a considerably different profile in terms of spend from companies who are consumers. I would expect Google to be investing considerable resources to keep up with Anthropic and OpenAI.

I don't think it includes all of the R&D. From what I've read that's the amount they will spend on infrastructure for AI.

I guess some of that infrastructure will get used for AI R&D, but there are other R&D costs such as salaries that wouldn't be included in the figure.


How does the model know it needs more context?

We provide the model with a tool we call expand(), which lets it pull in more context when it needs to.

This is stated directly in the output, appended where the lines were removed, so the model knows exactly where they came from.
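A minimal sketch of how such a scheme might work in Python. Only the expand() name comes from the comment above; the chunk-id marker format, the compact() helper, and the storage are my own assumptions, purely for illustration:

```python
# Illustrative sketch of context compaction with an expand() tool.
# Elided lines are replaced with an inline marker naming a chunk id;
# the model can call expand(chunk_id) to get those lines back.

ELIDED: dict[str, str] = {}

def compact(text: str, chunk_id: str, start: int, end: int) -> str:
    """Replace lines [start, end) with a marker telling the model how to
    recover them, and stash the removed lines for later expansion."""
    lines = text.splitlines()
    ELIDED[chunk_id] = "\n".join(lines[start:end])
    marker = f"[... lines {start}-{end} elided; call expand('{chunk_id}') to restore ...]"
    return "\n".join(lines[:start] + [marker] + lines[end:])

def expand(chunk_id: str) -> str:
    """Tool handler the model can invoke to regain the elided context."""
    return ELIDED.get(chunk_id, "")
```

The marker appended into the output is what tells the model exactly where lines were removed from, mirroring the description above.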


Presumably in much the same way it knows it needs to use tool calls to reach its objective.

I'd argue not: with tool calls, it has available to it at all times a description of what each tool can be used for. There's plenty of intermediate but still important information that could be compacted away, and unless there's a logical reason to go looking for it, the model doesn't know what it doesn't know.

If your test can deterministically trigger a race condition 100% of the time, is it still a race condition? Assuming we're talking about a unit test here, and not a race detector (which are not foolproof).


> Assuming that we're talking about a unit test here

I think the categorisation of tests is sometimes counterproductive and moves the discussion away from what's important: What groups of tests do I need in order to be confident that my code works in the real world?

I want to be confident that my code doesn't have race conditions in it. This isn't easy, but it's what I want. If that's the case, then your unit test might pass sometimes and fail sometimes, but your CI run should always be red, because the race test (however it works) is failing.

This also hints at a limitation of unit tests, and why we shouldn't be over-reliant on them: often unit tests won't show a race. In my experience, it's two independent modules interacting that causes the race. The same is true of a memory bug caused by a mismatch over ownership and who should be freeing, or any of the other issues caused by interactions between modules.


> I think the categorisation of tests is sometimes counterproductive

"Unit test" refers to documentation for software-based systems that has automatic verification. Used to differentiate that kind of testing from, say, what you wrote in school with a pencil. It is true that the categorization is technically unnecessary here due to the established context, but counterproductive is a stretch. It would be useful if used in another context, like, say: "We did testing in CS class". "We did unit testing in CS class" would help clarify that you aren't referring to exams.

Yeah, Kent Beck argues that "unit test" needs a bit more nuance: that it is a test that operates in isolation. However, who the hell is purposefully writing tests that are not isolated? In reality, that's a distinction without a difference. It is safe to ignore the old man yelling at clouds.

But a race detector isn't rooted in providing verifiable documentation. It only observes. That is what the parent was trying to separate.

> I want to be confident that my code doesn't have race conditions in it.

Then what you really WANT is something like TLA+. Testing is often much more pragmatic, but pragmatism ultimately means giving up what you want.

> often unit tests won't show a race.

That entirely depends on what behaviour your test is trying to document and validate. A test validating properties unrelated to race conditions often won't consistently show a race, but that isn't its intent, so there would be no expectation of it validating something unrelated. A test that is validating that there isn't a race condition will show the race if there is one.


You can use deterministic simulation testing to reproduce a real-world race condition 100% of the time while under test.

But that's not the kind of test that will expose a race condition 1% of the time. The kinds of tests that are inadvertently finding race conditions 1% of the time are focused on other concerns.

So it is still not a case of a flaky test, but maybe a case of a missing test.
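As a toy illustration of making a race fire every run (my own sketch, not any particular DST framework): a lost-update bug becomes deterministic if the test forces the losing interleaving with a barrier, so both threads read the shared value before either writes.

```python
import threading

class Counter:
    """A counter with a deliberate read-modify-write race."""
    def __init__(self):
        self.n = 0

    def incr(self, pause=lambda: None):
        cur = self.n          # read
        pause()               # hook lets a test widen the race window
        self.n = cur + 1      # write: may clobber a concurrent update

def test_lost_update():
    c = Counter()
    barrier = threading.Barrier(2)
    # Both threads read n before either writes, so one increment is lost.
    threads = [threading.Thread(target=c.incr, args=(barrier.wait,))
               for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c.n  # deterministically 1, not 2
```

The production code path never calls pause(), so the bug stays latent there; the test injects the barrier to pin the schedule, which is the essence of reproducing a race 100% of the time under test.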


Because the Tools model allows for finer-grained security controls than just bash and pipe. Do you really want Claude doing `find | exec` instead of calling an API that's designed to prevent damage?
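A sketch of what "finer-grained than bash" can mean in practice (the function names and paths here are hypothetical, not any real agent API): instead of a shell, the agent gets a small registry of audited, read-only functions scoped to one directory tree.

```python
from pathlib import Path

def make_tools(root: Path) -> dict:
    """Build a hypothetical agent tool registry scoped to `root`.
    Unlike `find | exec`, nothing here can write, delete, or spawn processes."""
    def search_files(pattern: str) -> list[str]:
        # Read-only search, confined to the sandbox root.
        return sorted(str(p.relative_to(root)) for p in root.rglob(pattern))

    def read_file(relpath: str) -> str:
        p = (root / relpath).resolve()
        if not p.is_relative_to(root.resolve()):
            raise PermissionError("path escapes the sandbox root")
        return p.read_text()

    return {"search_files": search_files, "read_file": read_file}
```

The point is the shape of the interface: every capability is an explicit, auditable function, so "least privilege" is the default rather than something bolted onto a shell.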


It might be the wrong place to do security anyway since `bash` and other hard-to-control tools will be needed. Sandboxing is likely the only way out


not for every user or use case. when developing of course i run claude --do-whatever-u-want; but in a production system or a shared agent use case, im giving the agent the least privilege necessary. being able to spawn POSIX processes is not necessary to analyze OpenTelemetry metric anomalies.


yeah, I would rather it did that. You run Claude in a sandbox that restricts visibility to only the files it should know about in the first place. Currently I use a mix of bwrap and syd for filtering.


Wow, it is really awful. This is such a pointless misstep given that Standard Notes has been around for years, was not vibe coded, is not an AI app - but this landing page makes me immediately assume it’s slop.

