But these models are more like generalists, no? Couldn't they simply be hooked up to more specialized models and defer to them, the way coding agents now use tools to assist?
There would be no point in going via an LLM then; if I had a specialist model ready, I'd just invoke it on the images directly. I don't particularly need or want a chatbot for this.
Current LLMs are doing this for coding, and it's very effective. The LLM delegates to tool calls, and a specialized model can be thought of as just another tool. The model can be weak at things that simple shell scripts or utilities handle well, yet strong at knowing which scripts or commands to call. For example, doing math natively in the model may be inaccurate, but the model may know to write code that does the math. An LLM can automate at a higher level of abstraction, the same way a manager or CEO delegates tasks to specialists.
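To make the "specialist model as just another tool" idea concrete, here's a rough sketch of the dispatch pattern. All the names (`eval_math`, `classify_image`, the hand-rolled dispatch table) are made up for illustration; a real agent would use whatever tool-calling interface its LLM provides, and `classify_image` would wrap an actual specialist model rather than a stub:

```python
# Sketch of the delegation pattern: the LLM only decides *which* tool to call
# and with what arguments; the actual work is done by specialists.
import ast
import operator

def eval_math(expr: str) -> float:
    """Evaluate basic arithmetic safely instead of trusting the LLM's own arithmetic."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expr, mode="eval"))

def classify_image(path: str) -> str:
    """Stand-in for a specialized vision model; in practice this would call the real model."""
    return f"specialist result for {path}"

# From the LLM's point of view, the specialist model is just another entry here,
# no different from a calculator or a shell utility.
TOOLS = {
    "eval_math": eval_math,
    "classify_image": classify_image,
}

def run_tool_call(call: dict):
    """Execute a tool call the LLM emitted, e.g. {"name": "eval_math", "args": {"expr": "3*7+1"}}."""
    return TOOLS[call["name"]](**call["args"])

# Pretend the LLM decided to delegate rather than answer directly:
print(run_tool_call({"name": "eval_math", "args": {"expr": "3*7+1"}}))             # 22
print(run_tool_call({"name": "classify_image", "args": {"path": "scan_001.png"}}))
```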
In this case I'm building a batch workflow: images come in, images get analyzed through a pipeline, images go into a GUI for review. The idea of using a VLM was just to avoid hand-building a solution, not because I actually want to use it in a chatbot. It's just interesting that a generalist model that has expert-level handwriting recognition completely falls apart on a different, but much easier, task.