Hacker News | NothingAboutAny's comments

>Has everyone started using agents and paying $200 subscriptions?

If anything, in my small circle the promise is waning a bit, in that even the best models on the planet are still kinda shitty for big project work. I work as a game dev and have found agents only mildly useful for doing more of what I've already laid out; I only pay for the $100 annual plan with JetBrains and that's plenty. I haven't worked at a big business in a while, but my ex-coworkers are basically in the same place. A friend only uses chat now because the agents were "entirely useless" for what he was doing.

I'm sure someone is getting use out of them making the ten-billionth Node.js Express API, but not anyone I know.


I’m using it for scripts to automate yak shaving type tasks. But for code that’s expected to last, folks where I work are starting to get tired of all the early 2000s style code that solves a 15 LOC problem in 1000 lines through liberal application of enterprise development patterns. And, worse, we’re starting to notice an uptick in RCA meetings where a contributing factor was freshman errors sailing through code review because nobody can properly digest these 2,000 line pull requests at anywhere near the pace that Claude Code can generate them.

That would be fine if our value delivery rate were also higher. But it isn’t. It seems to actually be getting worse, because projects are more likely to get caught in development hell. I believe the main problem there is poorer collective understanding of generated code, combined with apparent ease of vibecoding a replacement, leads to teams being more likely to choose major rewrites over surgical fixes.

For my part, this “Duke Nukem Forever as a Service” factor feels the most intractable. Because it’s not a technology problem, it’s a human psychology problem.


So glad that I'm not the only one struggling with these huge generated PRs that are too big to honestly review, all while an AI reassuringly whispers in my ear "just trust me."

Don't get me wrong, overall I really like having AI in my workflow and have gotten many benefits. But even when I ask it to check its own work by writing test cases to prove that properties A, B and C hold, I just end up with thousands more lines of unit and integration tests that then take even more time to analyze: what exactly is being tested here? Are the properties these tests purport to prove even the ones I care about and asked the agent for in the first place? And so on.
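What's helped me a little is restating the properties myself as a handful of hand-written checks and running those against the generated code, instead of auditing the generated suite. A toy Python sketch of what I mean (`dedupe` and its three properties are hypothetical stand-ins, not anything from my actual codebase):

```python
# Toy sketch: instead of reading thousands of generated test lines,
# restate the properties you actually asked for as a few hand-written
# checks. `dedupe` stands in for whatever the agent produced.

def dedupe(items):
    """Remove duplicates while preserving first-seen order."""
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

# Property A: the output contains no duplicates.
assert len(dedupe([1, 2, 2, 3, 1])) == len(set([1, 2, 2, 3, 1]))
# Property B: original relative order is preserved.
assert dedupe([3, 1, 3, 2]) == [3, 1, 2]
# Property C: idempotence - deduping twice changes nothing.
assert dedupe(dedupe([1, 1, 2])) == dedupe([1, 1, 2])
```

Ten lines of checks I wrote myself tell me more than two thousand lines of checks I didn't.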

I have tried (with at least modest success) to use a second or third agent to review the work of the original coding agent(s), but my general finding has been that there is no substitute for actual human understanding from a legitimate domain expert.

Part of my work involves silicon design, which requires a lot of precision and complex timing issues, and I'll add that the best AI success I've had in those cases is a test-first approach (TDD), where I hand write a boatload of testbenches (that's what we call functional tests in chip design land), then coach my various agents to write the Verilog until my `make test` runs with no errors.


Yeah, it seems the usual front-end/back-end complexity is well covered in Gemini's training corpus, so you get good-enough output.


I'd pay $10 a month for a browser; I pay that much for music and TV shows, and I spend more time in a browser. I'm sure the market doesn't agree with me, but I pay more for things that are less useful.


Let's see if you are telling the truth. I will sell you a browser for $10 a month. DM me.


Kagi and Orion have entered the chat.


Man, not a single one of those examples sounds like something I'd need, or even need an AI agent to do. I keep seeing the ads for AI browsers and the only thing I can think about is the complete and utter lack of a use case, and your post only solidifies that further. Not that I'm disagreeing with you per se; I'm sure some people have a workflow they can't automate easily and they need a more complicated and expensive Puppeteer to do it. I just don't know what the heck I'd use it for.


I find it very hard to believe that either every site you interact with works exactly as you want it to work, or that you have the time/capacity to adjust them all with custom extensions. I get that there are downsides but you don't see any upsides?


I have extensions for the sites that need them, and everything else is fine. Occasionally I guess there'll be something in another language I want translated, but I just copy-paste the text into Google Translate or similar. What sites out there are so unusable you'd need an LLM to fix them for use?


Right now all the sites I frequent are good enough, otherwise I'd drop them. I don't interact with Discord, Bluesky, X, Instagram at all, and I feel like I'm missing out on a lot of high quality interpersonal communication because I have low tolerance for their UX and their lack of respect for users.


No. No upsides.

Again, what can an LLM possibly do to help? Summarize the page I'm already reading? I don't want a summary, that's dumb. People who think their time is so precious they have to optimize a five minute read into a ten cent API call and one minute read of possibly wrong output are just silly. You aren't "freeing up time", you are selling your reality.

Buy stuff for me? Why? Buying shit online is so easy most people do it on the toilet. I've bought things on the internet while blackout drunk. I also have a particular view of "Value" that no LLM will ever replicate, and not only do I have no interest in giving someone else access to my checkbook, I certainly do not want to give it to a third party who could make money off that relationship.

How would I no longer need browser extensions? You're saying an LLM would reliably block ads, doing the job that a single human being has reliably done for decades with uBlock Origin? How will LLMs replace my gesture-based navigation that all these hyper-productivity-focused fools don't even seem to know exists? It certainly won't replace my corporate-required password manager.

>You would be able to seamlessly communicate with the Polish internet subculture, or with Gen Alpha, all without feeling the physical pain

Come on, get over yourself.

> With an AGI-level AI

So Mozilla, who isn't even allowed to spend $6 million on a CEO, is somehow magically going to invent super AI that runs locally? Get a grip.


I also hate the concept of summaries, as well as the related concept of AI text inflation.

I'm also against any third party being involved here. I'm pointing out the potential of AI in the browser, but for me it has to be locally run or it is a no go.

My point is that browser plays a central role in our digital interactions. Extensions help with smoothing out the experience. LLMs could write those extensions, or serve as an agent to further make the online interactions more pleasant. I could see myself using it to either optimize my existing experience, or to vastly broaden the communication surface in directions that currently hold too much friction.

The rest was a joke, sorry if it made your day worse.


I saw similar discussions around robotics, with people saying "why are they making the robots humanoid? couldn't they be a more efficient shape?", and it comes back to the same thing: if you want the tool to be adopted, it has to fit into a human-centric world, no matter how inefficient that is. High-performance applications are still always custom-designed and streamlined, but mass adoption requires it to fit us, not us to fit it.


Sometimes it's not even close: I went to download the PAX Australia app and the top result was Revolut. I'd love to know the set of circumstances under which the algorithm picked them to sponsor there.


I tried to use Perplexity to find ideal settings for my monitor, and it responded with a concise list of distinct settings and the reasoning behind each. When I investigated the sources, it was just people guessing and arguing with each other on the Samsung forums, with no official or even substantiated information.

I'd love it if it had a confidence rating based on the sources it found or something, but I imagine that would be really difficult to get right.
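Even a crude version of that rating seems sketchable: weight each cited source by how authoritative its category is and average. A toy Python sketch (the categories and weights here are entirely made up, not anything Perplexity actually exposes):

```python
# Hypothetical sketch: score an answer's confidence from the kinds of
# sources it cites. The category labels and weights are assumptions.

SOURCE_WEIGHTS = {
    "manufacturer": 1.0,  # official docs and spec sheets
    "press":        0.6,  # reviews, tech press
    "forum":        0.2,  # users guessing at each other
}

def confidence(sources):
    """Average the weights of the cited source categories (0.0-1.0)."""
    if not sources:
        return 0.0
    return sum(SOURCE_WEIGHTS.get(kind, 0.1) for kind in sources) / len(sources)

# An answer built only from forum threads scores low:
assert abs(confidence(["forum", "forum", "forum"]) - 0.2) < 1e-9
# Mixing in official documentation raises it:
assert abs(confidence(["manufacturer", "forum"]) - 0.6) < 1e-9
```

The hard part, of course, is classifying sources reliably in the first place, which is probably why nobody ships this.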


I asked Gemini to do a deep research report on the role of health insurance companies in the decline of general practitioners in the Netherlands. It based its premise mostly on blogs and whitepapers on the websites of companies whose job it is to sell automation software.

AI really needs better source validation. Not just to combat the hallucination of sources (which Gemini seems to do 80% of the time), but also to combat low-quality sources that happen to correlate well with the question in the prompt.

It's similar to Google having to fight SEO spam blogs; they now need to do the same in the output of their models.


Better source validation is one of the main reasons I'm excited about GPT-5 Thinking for this. It would be interesting to try your Gemini prompts against that and see how the results compare.


I've found GPT-5 Thinking to perform worse than o3 did on tasks of a similar nature. It makes more bad assumptions that derail the train of thought.


I think the key is prompting, and putting bounds on the assumptions.


When using AI models through Kagi Assistant, you can tweak the searches the LLM does with your Kagi settings (search only academic sources, block bullshit websites and such), which is nice. And I can choose models from many providers.

No API access though so you're stuck talking with it through the webapp.


Kagi has some tooling for this. You can set web access “lenses” that limit the results to “academic”, “forums”, etc.

Kagi also tells you the percentages “used” for each source and cites them in line.

It’s not perfect, but it’s a lot better to narrow down what you want to get out of your prompt.


Seems like the right outcome was reached by reviewing the sources. I wish it went one step further and loaded those source pages, scrolling to and highlighting the snippets it pulled information from. That way we could easily double-check at least some aspects of its response, and content + ads can be attributed to the publisher.
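The scroll-and-highlight part could work roughly like this: find the cited snippet's character offsets in the fetched page text and hand those to the browser. A toy Python sketch (hypothetical; it assumes the citation quotes the page verbatim, whereas real pages would need normalization):

```python
# Hypothetical sketch of locating a cited snippet in a source page so a
# browser could scroll to it and highlight it. Assumes the model's
# citation quotes the page text verbatim.

def locate_snippet(page_text, snippet):
    """Return (start, end) character offsets of the snippet, or None."""
    start = page_text.find(snippet)
    if start == -1:
        return None  # citation doesn't appear verbatim: flag for review
    return (start, start + len(snippet))

page = "Forum post: set sharpness to 10, it looked best on my unit."
assert locate_snippet(page, "set sharpness to 10") == (12, 31)
assert locate_snippet(page, "official spec") is None
```

A `None` result is itself useful: if the quoted snippet can't be found on the cited page, that citation deserves extra suspicion.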


But the really tricky thing is that sometimes it _is_ these kinds of forums where you find the best stuff.

When LLMs really started to show themselves, there was a big debate about what is truth, with even HN joining in on heated debates on the number of sexes or genders a dog may have and if it was okay or not for ChatGPT to respond with a binary answer.

On one hand I did find those discussions insufferable, but the deeper question, what truth is and how we automate the extraction of truth from corpora, is super important and has somehow completely disappeared from the LLM discourse.


In the absence of easily found authoritative information from the manufacturer, this would have been my source of information. Internet banter might actually be the best available information.


It would be interesting to see if that same question against GPT-5 Thinking produces notably better results.


Only anecdotally: when I was at a medium-sized (~200 employee) fintech business in Australia, I was told by my engineering manager to hire any woman or PoC who applied. But in my two-year tenure I think only one of either applied, and both were hired immediately.

