Ask HN: How did you scale AI development?
2 points by logicallee 3 months ago | 3 comments
I have a medium-sized project that AI is developing with some guidance from me. (That's the only way I can put it: I don't have expertise in the technologies it's using, so it's more like I'm managing its development.)

As I develop it, I run into regressions where previously working features break. I'd like to keep iterating on it this way, since I have built perfectly working applications with AI. Do you have any tips for me? How did you successfully scale developing with AI?



Is the breaking functionality fully covered with tests, and does the agent already run those tests when adding or changing things? If not, that would be a promising approach to keep the AI from messing things up. If yes, can that loop be tightened further to support the AI?
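
For example, a focused regression test for the feature that keeps breaking can be very small - a rough sketch below, assuming a Python project with pytest; `myapp.cart` and `calculate_total` are made-up placeholders for whatever keeps regressing in your project:

    # test_checkout.py - pins down behavior that has regressed before,
    # so the agent (and you) can run `pytest` after every change.
    # `myapp.cart` and `calculate_total` are hypothetical placeholders.
    import pytest

    from myapp.cart import calculate_total

    def test_total_applies_discount():
        # two items at 10.00 with a 10% discount -> 18.00
        assert calculate_total(items=[10.00, 10.00], discount=0.10) == pytest.approx(18.00)

    def test_total_of_empty_cart_is_zero():
        assert calculate_total(items=[], discount=0.0) == pytest.approx(0.0)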


>Is the breaking functionality fully covered with tests,

Did you have success having AI iterate on code fully covered by tests?

I began adding tests; however, I am currently testing manually after each change. This is because I asked ChatGPT for a research study of best practices for AI development, which it produced here [1]. It suggested:

>Notably, some found that Claude’s first attempt often includes excess or "over-engineered" code. A candid blog post mentioned Claude as a "real master at shitting in the code" if not guided properly – it can "generate a ton of unnecessary code… even when you ask for minimalism, it will slap on a pile of code with useless tests that outsmart themselves and don’t work."

and:

>a developer noted they initially tried having Claude maintain extensive docs and tests for everything, but realized this added too many points of failure (the AI would waste effort updating documentation instead of focusing on code). Over-engineering the process can backfire.

For these reasons, I have been testing manually between iterations. (Though I develop using ChatGPT 5 as well as Claude, depending on the task.)

[1] https://chatgpt.com/share/68fbaeea-f528-800b-b090-1bb6b3b2ca...


Getting the agent to run tests definitely can have a very positive impact - it can realize on its own that it broke something unrelated and fix it (or it can easily be prompted to if it gives up anyway).

Aside: I often remove some of the tests that seem superfluous to me, or explicitly ask for the minimal set of tests that still covers the functionality in the first place. Some models definitely can go "all in" on tests, like a very eager intern who just learned about testing. For the cases where you end up with broken functionality after a prompt, just having an integration test that fails when that functionality breaks might be enough.
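
Something like this is often enough - a rough sketch, assuming a Flask-style app with a test client; `create_app` and the /reports route are hypothetical placeholders for a page that has broken for you before:

    # test_smoke.py - fails loudly if a previously working page stops rendering.
    # `create_app` and the /reports route are hypothetical placeholders.
    from myapp import create_app

    def test_reports_page_still_renders():
        client = create_app().test_client()
        response = client.get("/reports")
        assert response.status_code == 200
        assert b"Monthly report" in response.data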



