> As an example, they cited how Devin, when asked to deploy multiple applications to the infrastructure deployment platform Railway, failed to understand this wasn't supported and spent more than a day trying approaches that didn't work and hallucinating non-existent features.
An engineer not reading the docs and wasting a day chasing their tail because of that. Yes… how unrealistic…
>"Tasks that seemed straightforward often took days rather than hours, with Devin getting stuck in technical dead-ends or producing overly complex, unusable solutions," the researchers explain in their report. "Even more concerning was Devin’s tendency to press forward with tasks that weren’t actually possible."
Apparently we've all been working with Devin for years.
> "Even more concerning was Devin’s tendency to press forward with tasks that weren’t actually possible."
Quickest way to get AI engineers kicked out of the company will be to patch them so they push back against unrealistic goals from management.
Seriously though, where is the AI C-suite? The AI BoD? At least with an AI BoD you don't have to worry about them pulling backstabbing financial shenanigans in their own self-interest at the expense of the company.
You would need much less "agreeable" AI to reliably steer a company. With current models an AI C-suite would quickly get "captured" by almost anyone interacting with it.
If an employee behaved like an LLM, a company should immediately get them into a debriefing with corporate counsel, HR, management, and trusted top technical personnel.
For example, to try to find out whose IP they plagiarized, and how badly we're scrod.
Or, for example, to find out how they generated so much code they don't understand at all, and how badly we're scrod.
Or, for example, to find out why they introduced a criminally negligent security vulnerability or data-corruption bug, and how badly we're scrod.
Or, for example, to see what engineering assurance they "hallucinated", and how badly we're scrod.
I wonder how much you get billed if the agent spends a whole day running around in circles. The $500/month subscription only comes with 250 vaguely defined "compute units", so past a certain point you'd have to pay extra for the time it wastes.
Move over "bankrupted by runaway cloud spending", it's time for "bankrupted by AI agents trying and failing to complete a task indefinitely".
Depends on the company. We all hear stories of people earning themselves a promotion / bonus by shipping a bunch of bugs and then saving the day by fixing them.
Do people actually do that? Finding bugs in virtually any piece of software isn’t difficult if you have access to the source. Merging in a bug only to fix it later honestly seems like more work. Most bugs are pretty easy to fix…
A much more common story would be people knowingly cutting corners because of management pressure/demotivation/etc, then fixing the resulting bugs. It's easy for somebody doing that to look like a hard-working hero compared to the programmer who just avoided the problems in the first place.
No, if A's PRs always bounce because the testers find bugs then A is going to look like an idiot. Then again you need to work at a place that actually employs testers.
If B always submits PRs and they go straight to merge and into prod, then B knows what he's doing.
I've seen a lot of fairly explicit discussions around "this timeline will require cutting these corners and cost this much time to fix later or else it will cause these problems", and also some relatively internal discussions around "how strongly can we rely on promises that the project won't get dropped before all the cleanup is done, and how does that impact what options we can present".
> Finding bugs in virtually any piece of software isn’t difficult if you have access to the source.
What????
Yeah, trivial bugs maybe.
If only because most "hairy" bugs (and those are the ones that count at the end of the day) manifest themselves not in obvious ways, but only under some hard-to-predict set of preconditions and input data. And let's not even get started on threaded/asynchronous code.
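To make that last point concrete, here's a minimal hypothetical sketch (not from the thread): a shared counter incremented from several threads. The unlocked version splits the read-modify-write into separate steps, so whether updates get lost depends purely on thread-switch timing, not on the inputs, which is exactly why such bugs evade casual testing.

```python
import threading

def run(n_threads=4, n_iters=50_000, use_lock=False):
    """Increment a shared counter from several threads.

    Without the lock, the read-modify-write below is three separate
    steps; a thread switch between them silently loses updates.
    With the lock, the result is always n_threads * n_iters.
    """
    state = {"counter": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(n_iters):
            if use_lock:
                with lock:
                    state["counter"] += 1
            else:
                tmp = state["counter"]   # read
                tmp += 1                 # modify
                state["counter"] = tmp   # write back (may clobber a
                                         # concurrent thread's update)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]
```

The locked version deterministically returns 200,000 with the defaults; the unlocked version can return anything up to that, and on a lightly loaded machine it may even pass a quick test, which is the whole problem.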
In my experience, "not supported" has a wide range of meanings, from "next to impossible" to "we just don't want you to", so that alone wouldn't deter a human either (I've interpreted it as "challenge accepted" several times myself). But a human would be unlikely to hallucinate non-existent features.