I think there is one problem with defining acceptance criteria first: sometimes you don't know ahead of time what those criteria are. You need to poke around first to figure out what's possible and what matters. And sometimes the criteria are subjective, abstract, and cannot be formally specified.
Of course, this problem is more general than just improving the output of LLM coding tools.
Yeah it’s extremely helpful to clarify your thoughts before starting work with LLM agents.
I find Claude Code's plan mode a bit restrictive for me personally, but I've found creating a plan doc and then collaboratively iterating on it with an LLM to be helpful here.
I don’t really find it much different than the scoping I’d need to do before handing off some work to a more junior engineer.
I like staying within Claude Code for orchestrating its plan mode, but I needed a better way to actually review the plan, address specific parts, see plan diffs, etc., all in a more visual way. The hooks system, via permissionrequest:exitplanmode, keeps this fairly ergonomic.
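For reference, Claude Code hooks live in `.claude/settings.json`. A rough sketch of intercepting plan approval: the `PreToolUse` event with an `ExitPlanMode` matcher is one documented way to do this (the exact event name may differ from the one the comment mentions), and `review-plan.sh` is a hypothetical script, not a real tool:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "ExitPlanMode",
        "hooks": [
          { "type": "command", "command": "review-plan.sh" }
        ]
      }
    ]
  }
}
```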
I work as an ML engineer/researcher. When I implement a change in an experiment it usually takes at least an hour to get the results. I can use this time to implement a different experiment. Doesn't matter if I do it by hand or if I let an agent do it for me, I have enough time. Code isn't the bottleneck.
I've also heard the opinion that since writing code is cheap, people implement things that have no economic value without really thinking them through.
+1 on the economic value line. Not everything needs to be about money, but if you get paid to ship code, it's about money. And now we have coworkers shipping insane amounts of "features" because shipping is free, and as far as they're concerned, being an engineer ends there.
Only it doesn't: there's product positioning, UX, information architecture, onboarding and training, support, QA, change management, analytics, reporting… sigh
> but if you get paid to ship code it's about money.
Tip to budding software engineers: try not to work in these sorts of places, as they're about "looking busy" rather than engineering software. The latter is where real, long-lasting things are built; the former is where startup founders spend most of their money.
The last paragraph is where the tricky and valuable parts are, and also where AI isn't super helpful today, and where you as a human can actually help out a lot if you're just 10% better than the rest of the "engineers" who only want to ship as fast as possible.
This reminds me of the idea that LLMs are simulators. Given the current state (the prompt + the previously generated text), they generate the next state (the next token) using rules derived from training data.
As simulators, LLMs can simulate many things, including agents that exhibit human-like properties. But LLMs themselves are not agents.
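The simulator framing can be made concrete with a toy stand-in for the model: a hand-written bigram table plays the role of the learned rules, and generation is just repeatedly mapping the current state (the text so far) to a next token. Everything here (the table, the chains) is invented for illustration:

```python
import random

# Toy "simulator": a bigram table stands in for the trained model.
# State = the tokens so far; the rule maps the last token to its possible successors.
TRANSITIONS = {
    "the": ["cat", "dog"],
    "cat": ["sat"],
    "dog": ["ran"],
    "sat": ["down"],
    "ran": ["away"],
}

def step(state):
    """Advance the simulation by one token, given the current state."""
    options = TRANSITIONS.get(state[-1])
    return state + [random.choice(options)] if options else state

state = ["the"]
for _ in range(3):
    state = step(state)
print(" ".join(state))  # "the cat sat down" or "the dog ran away"
```

Nothing in the loop is an "agent"; agency (a cat that sits, a dog that runs) only appears in the trajectories the rules produce, which is the point of the simulator view.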
This perspective makes a lot of sense to me. Still, I wouldn't avoid anthropomorphization altogether. First, in some cases, it might be a useful mental tool to understand some aspect of LLMs. Second, there is a lot of uncertainty about how LLMs work, so I would stay epistemically humble. The second argument applies in the opposite direction as well: for example, it's equally bad to say that LLMs are 100% conscious.
On the other hand, if someone argues against anthropomorphizing LLMs, I would avoid phrasing it as: "It's just matrix multiplication." The article demonstrates why this is a bad idea pretty well.
Problem no. 2 (understanding user intent) is relevant not only to writing SQL but to software development in general. Follow-up questions are something I've had in mind for a long time; I wonder why they aren't the default for LLMs.
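A minimal sketch of what a clarify-first default could look like. This is a toy heuristic, not a real LLM call: the term list and both functions are invented for illustration, and in a real system the model itself would decide when to ask.

```python
# Vague words that usually hide an unstated definition (e.g. 'recent' = last 7 days?).
AMBIGUOUS_TERMS = {"recent", "top", "best", "active"}

def needs_clarification(request: str) -> list[str]:
    """Return the ambiguous words found in the request, sorted for stable output."""
    return sorted(t for t in AMBIGUOUS_TERMS if t in request.lower().split())

def handle(request: str) -> str:
    """Ask a follow-up question instead of guessing the user's intent."""
    unclear = needs_clarification(request)
    if unclear:
        return "Before I write the SQL: how do you define " + ", ".join(unclear) + "?"
    return "-- ok, generate the query"

print(handle("show me the recent top customers"))
# → Before I write the SQL: how do you define recent, top?
```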
At the beginning, the article mentions correlation with language skills AND problem-solving. Focusing only on language skills in the second half is misleading. According to the abstract of the original paper, problem solving and working memory capacity were FAR MORE important.
Also, the article doesn't mention "math skills". It talks about numeracy, which a cited paper defines as "the ability to understand, manipulate, and use numerical information, including probabilities". This is only a small part of mathematics. I would even argue that mathematics involves a lot of problem solving, and since problem solving is a good predictor, math skills should be a good predictor too.
Going further, it seems like Language Aptitude was primarily significant in explaining variance in learning rate, measured by how many Codecademy lessons they completed in the allotted time, but wasn't explanatory for learning outcomes based on writing code or answering multiple-choice questions.
Seeing as Codecademy lessons are written in English, I would think this may just be a result of participants with higher Language Aptitude being faster readers.
I do think that language skills are undervalued for programming, if only for their impact on your ability to read and write documentation or specifications, but I'm not sure this study demonstrates that link in a meaningful way.
Hopefully you'll forgive my ignorance, but this is the first time I hear about DuckDB. What space does it occupy in the DBMS landscape? What are its use cases? How does it compare to other DBMS solutions?
Hi, DuckDB devrel here. DuckDB is an analytical SQL database in the form factor of SQLite (i.e., in-process). This quadrant summarizes its space in the landscape:
It works as a replacement for (or complement to) dataframe libraries due to its speed and (vertical) scalability. It's lightweight and dependency-free, so it also works well as part of data processing pipelines.
Hello, I'd love to use this but I work with highly confidential data. How can we be sure our data isn't leaking with this new UI? What assurances are there on this, and can you comment on the scope of the MotherDuck server interactions?
- I started to see LLMs as a kind of search engine. I cannot say they are better than traditional search engines: on one hand, they are better at personalizing the answer; on the other hand, they hallucinate a lot.
- There is a different view on how new scientific knowledge is made: it's all about connecting existing dots. Maybe LLMs can assist with this task by helping scientists discover relevant dots to connect. But as the author suggests, this is only part of the job. To find the correct ways to connect the dots, you need to ask the right questions, examine the space of counterfactuals, etc. LLMs can be a useful tool, but they are not autonomous scientists (yet).
- As someone developing software on top of LLMs, I am slowly coming to a conclusion that human-in-the-loop approaches seem to work better than fully autonomous agents.
Instead of connecting language with physical existence, or entities, it's connecting tokens to tokens.
An LLM may be able to describe scenes in a video, but a world model could tell you that the video is a deepfake because it violates some principle like conservation of energy and mass, informed by experience, assumptions, inference rules, etc.