I can't stand it being called "hallucinating" because it anthropomorphizes the t...

JohnFen · on March 18, 2024

This. It's not "hallucination", it's "error".

krapp · on March 18, 2024

It isn't an error, either. It's doing exactly what it's intended to, exactly as it's intended to do it. The error is in the human assumption that the ability to construct syntactically coherent language signals self-awareness or sentience, and that it should therefore be capable of understanding the semantics correctly, because humans obviously can.

There really is no correct word to describe what's happening, because LLMs are effectively philosophical zombies. We have no metaphors for an entity that can appear to hold a coherent conversation, do useful work and respond to commands but not think. All we have are metaphors from human behavior, which presume a connection between language and intellect, because that's all we know. Unfortunately we also have nearly a century of pop culture telling us "AI" is like Data from Star Trek: perfectly logical, superintelligent and always correct.

And "hallucination" is good enough. It gets the point across, that these things can't be trusted. "Confabulation" would be better, but fewer people know it, and it's more important to communicate the untrustworthy nature of LLMs to the masses than it is to be technically precise.

JohnFen · on March 18, 2024

> It isn't an error, either. It's doing exactly what it's intended to, exactly as it's intended to do it.

If the output is incorrect, that's error. It may not be a bug, but it is still error.

krapp · on March 18, 2024

Calling it an error implies the model should be expected to be correct, the way a calculator should be expected to be correct. It generates syntactically correct language, and that's all it does. There is no "calculation" involved, so the concept of an "error" is meaningless: the sentences it creates either happen to correspond to the truth or they don't, and it's coincidence either way.

JohnFen · on March 19, 2024

> Calling it an error implies the model should be expected to be correct

To a degree, people do expect the output to be correct. But in my view, that's orthogonal to the use of the term "error" in this sense.

If an LLM says something that's not true, that's an erroneous statement. Whether or not the LLM is intended or expected to produce accurate output isn't relevant to that at all. It's in error nonetheless, and calling it that rather than "hallucination" is much more accurate.

After all, when people say things that are in error, we don't say they're "hallucinating". We say they're wrong.

> It generates syntactically correct language, and that's all it does.

Yes indeed. I think where we're misunderstanding each other is that I'm not talking about whether or not the LLM is functioning correctly (that's why I wouldn't call it a "bug"), I'm talking about whether or not factual statements it produces are correct.

int_19h · on March 19, 2024

That's one hell of a coincidence if it just "happens" to write syntactically correct code that does what the user asked, for example.

krapp · on March 19, 2024

It is.

It's a language model, trained on syntactically correct code, with a data set which presumably contains more correct examples of code than not, so it isn't surprising that it can generate syntactically correct code, or even code which correlates to valid solutions.

But if it actually had insight and knowledge about the code it generated, it would never generate random, useless (but syntactically correct) code, nor would it copy code verbatim, including comments and license text.

It's a hell of a trick, but a trick is what it is. The fact that you can adjust the randomness in a query should give it away. It's de rigueur around here to equate everything a human does with everything an LLM does, including mistakes, but human programmers don't make mistakes the way LLMs do, and human programmers don't come with temperature sliders.
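For context on the "temperature slider" point: sampling from a language model commonly works by having the model assign a raw score (logit) to each candidate next token, dividing those scores by a temperature, turning them into probabilities with a softmax, and drawing one token at random. The sketch below is a minimal, generic illustration of temperature-scaled softmax sampling under that assumption, not any particular product's implementation; the function names, token list and scores are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature (must be > 0).

    Low temperature sharpens the distribution toward the top-scoring token;
    high temperature flattens it toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token at random according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    tokens = ["return", "print", "pass"]  # hypothetical candidate next tokens
    logits = [2.0, 1.0, 0.1]              # hypothetical model scores
    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
    rng = random.Random(0)  # seeded so repeated runs draw the same token
    print(sample_token(tokens, logits, 1.0, rng))
```

At a low temperature nearly all the probability lands on the highest-scoring token, so output becomes close to deterministic; at a higher temperature the lower-scoring tokens get picked more often.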

int_19h · on March 19, 2024

It's not surprising if it generated syntactically correct code that does random things.

The fact that it instead generates syntactically correct code that, more often than not, solves - or at least tries to solve - the problem that is posited, indicates that there is a "there" there, however much one talks about stochastic parrots and such.

As for temperature sliders for humans, that's what drugs are in many ways.