I can't stand it being called "hallucinating" because it anthropomorphizes the technology. This isn't a consciousness that is "seeing" things that don't exist: it's a word generator that is generating words that don't make sense (not in a syntactic sense, but in a semantic sense).
Calling it "hallucination" implies that there are (other) moments when it is understanding the world correctly -- and that itself is not true. At those moments, it is a word generator that is generating words that DO make sense.
At no point is this a consciousness, and anthropomorphizing it gives the impression that it is one.
It isn't an error, either. It's doing exactly what it's intended to, exactly as it's intended to do it. The error is in the human assumption that the ability to construct syntactically coherent language signals self-awareness or sentience, and that it should therefore be capable of understanding the semantics correctly, because humans obviously can.
There really is no correct word to describe what's happening, because LLMs are effectively philosophical zombies. We have no metaphors for an entity that can appear to hold a coherent conversation, do useful work and respond to commands but not think. All we have are metaphors from human behavior, which presume the connection between language and intellect, because that's all we know. Unfortunately, we also have nearly a century of pop culture telling us "AI" is like Data from Star Trek: perfectly logical, superintelligent and always correct.
And "hallucination" is good enough. It gets the point across, that these things can't be trusted. "Confabulation" would be better, but fewer people know it, and it's more important to communicate the untrustworthy nature of LLMs to the masses than it is to be technically precise.
Calling it an error implies the model should be expected to be correct, the way a calculator should be expected to be correct. It generates syntactically correct language, and that's all it does. There is no "calculation" involved, so the concept of an "error" is meaningless - the sentences it creates either happen to correlate to truth or they don't, but it's coincidence either way.
> Calling it an error implies the model should be expected to be correct
To a degree, people do expect the output to be correct. But in my view, that's orthogonal to the use of the term "error" in this sense.
If an LLM says something that's not true, that's an erroneous statement. Whether or not the LLM is intended or expected to produce accurate output isn't relevant to that at all. It's in error nonetheless, and calling it that rather than "hallucination" is much more accurate.
After all, when people say things that are in error, we don't say they're "hallucinating". We say they're wrong.
> It generates syntactically correct language, and that's all it does.
Yes indeed. I think where we're misunderstanding each other is that I'm not talking about whether or not the LLM is functioning correctly (that's why I wouldn't call it a "bug"), I'm talking about whether or not factual statements it produces are correct.
It's a language model, trained on syntactically correct code, with a data set which presumably contains more correct examples of code than not, so it isn't surprising that it can generate syntactically correct code, or even code which correlates to valid solutions.
But if it actually had insight and knowledge about the code it generated, it would never generate random, useless (but syntactically correct) code, nor would it copy code verbatim, including comments and license text.
It's a hell of a trick, but a trick is what it is. The fact that you can adjust the randomness in a query should give it away. It's de rigueur around here to equate everything a human does with everything an LLM does, including mistakes, but human programmers don't make mistakes the way LLMs do, and human programmers don't come with temperature sliders.
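For anyone who hasn't seen what the temperature slider actually does, here's a rough sketch. It assumes the common setup where the model emits one raw score (logit) per candidate token and the sampler divides those scores by the temperature before normalizing them into probabilities. The tokens and scores are made up for illustration, and real samplers usually layer other tricks on top of this.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() can't overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, temperature, rng):
    """Draw one token at random, weighted by the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical candidates and scores, not taken from any real model.
tokens = ["Paris", "Lyon", "Berlin"]
logits = [4.0, 2.0, 1.0]

cold = softmax_with_temperature(logits, 0.2)  # nearly all mass on the top token
hot = softmax_with_temperature(logits, 2.0)   # mass spread across the alternatives

rng = random.Random(0)
picked = sample_token(tokens, logits, 2.0, rng)
```

Same scores, same model, different slider position: at low temperature the top-scoring token wins almost every time, and at high temperature the lower-scoring ones get picked far more often.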
It wouldn't be surprising if it generated syntactically correct code that does random things.
The fact that it instead generates syntactically correct code that, more often than not, solves - or at least tries to solve - the problem that is posed, indicates that there is a "there" there, however much one talks about stochastic parrots and such.
As for temperature sliders for humans, that's what drugs are in many ways.