
Am I the only one who is bothered by calling this phenomenon "hallucinating"?

It's marketing-speak and corporate buzzwords to cover for the fact that their LLMs often produce wrong information: they aren't capable of understanding your request or its nuance, or the training data they used is wrong, or the model just plain sucks.

Would we tolerate such doublespeak if it were anything else? "Well, you ordered a side of fries with your burger but because our wait staff made a mistake...sorry, hallucinated, they brought you a peanut butter sandwich that's growing mold instead."

It gets more concerning when the stakes are raised and LLMs (inevitably) start getting used in more important contexts, like healthcare: "I know your file says you're allergic to penicillin and you repeated it when talking to our ai-doctor, but it hallucinated that you weren't."



Human beings regularly hallucinate details that aren’t real when asked to provide their memories of an event, and often don’t realize they’re doing it at all. So while AI definitely is lacking in the “can assess fact versus fiction” department, that’s an overlapping problem with “invents things that aren’t actually real”. It can, today, hallucinate accurate and inaccurate information, but it can’t determine validity at all, so it’s sometimes wrong even when not hallucinating.


I can't stand it being called "hallucinating" because it anthropomorphizes the technology. This isn't a consciousness that is "seeing" things that don't exist: it's a word generator that is generating words that don't make sense (not in a syntactic sense, but in a semantic sense).

Calling it "hallucination" implies that there are (other) moments when it is understanding the world correctly -- and that itself is not true. At those moments, it is a word generator that is generating words that DO make sense.

At no point is this a consciousness, and anthropomorphizing it gives the impression that it is one.


This. It's not "hallucination", it's "error".


It isn't an error, either. It's doing exactly what it's intended to, exactly as it's intended to do it. The error is in the human assumption that the ability to construct syntactically coherent language signals self-awareness or sentience. That it should be capable of understanding the semantics correctly, because humans obviously can.

There really is no correct word to describe what's happening, because LLMs are effectively philosophical zombies. We have no metaphors for an entity that can appear to hold a coherent conversation, do useful work and respond to commands but not think. All we have is metaphors from human behavior which presume the connection between language and intellect, because that's all we know. Unfortunately we also have nearly a century of pop culture telling us "AI" is like Data from Star Trek, perfectly logical, superintelligent and always correct.

And "hallucination" is good enough. It gets the point across, that these things can't be trusted. "Confabulation" would be better, but fewer people know it, and it's more important to communicate the untrustworthy nature of LLMs to the masses than it is to be technically precise.


> It isn't an error, either. It's doing exactly what it's intended to, exactly as it's intended to do it.

If the output is incorrect, that's error. It may not be a bug, but it is still error.


Calling it an error implies the model should be expected to be correct, the way a calculator should be expected to be correct. It generates syntactically correct language, and that's all it does. There is no "calculation" involved, so the concept of an "error" is meaningless - the sentences it creates either only happen to correlate to truth, or not, but it's coincidence either way.


> Calling it an error implies the model should be expected to be correct

To a degree, people do expect the output to be correct. But in my view, that's orthogonal to the use of the term "error" in this sense.

If an LLM says something that's not true, that's an erroneous statement. Whether or not the LLM is intended or expected to produce accurate output isn't relevant to that at all. It's in error nonetheless, and calling it that rather than "hallucination" is much more accurate.

After all, when people say things that are in error, we don't say they're "hallucinating". We say they're wrong.

> It generates syntactically correct language, and that's all it does.

Yes indeed. I think where we're misunderstanding each other is that I'm not talking about whether or not the LLM is functioning correctly (that's why I wouldn't call it a "bug"), I'm talking about whether or not factual statements it produces are correct.


That's one hell of a coincidence if it just "happens" to write syntactically correct code that does what the user asked, for example.


It is.

It's a language model, trained on syntactically correct code, with a data set which presumably contains more correct examples of code than not, so it isn't surprising that it can generate syntactically correct code, or even code which correlates to valid solutions.

But if it actually had insight and knowledge about the code it generated, it would never generate random, useless (but syntactically correct) code, nor would it copy code verbatim, including comments and license text.

It's a hell of a trick, but a trick is what it is. The fact that you can adjust the randomness in a query should give it away. It's de rigueur around here to equate everything a human does with everything an LLM does, including mistakes, but human programmers don't make mistakes the way LLMs do, and human programmers don't come with temperature sliders.
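
For the curious, here's a rough Python sketch of what that temperature slider actually does (illustrative only, not any vendor's real API; the numbers are made up): it rescales the model's next-token scores before sampling, so higher values flatten the distribution and let less likely tokens through.

    import numpy as np

    def sample_next_token(logits, temperature=1.0, rng=None):
        """Pick a next-token id from raw model scores (logits).

        temperature < 1.0 sharpens the distribution (more deterministic),
        temperature > 1.0 flattens it (more random output).
        """
        rng = rng or np.random.default_rng()
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
        probs = np.exp(scaled - scaled.max())  # numerically stable softmax
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    # Toy example: three candidate tokens with raw scores
    logits = [2.0, 1.0, 0.1]
    print(sample_next_token(logits, temperature=0.2))  # almost always token 0
    print(sample_next_token(logits, temperature=2.0))  # noticeably more random

No knowledge of the problem changes; only how much of the tail of the distribution gets sampled.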


It's not surprising if it generated syntactically correct code that does random things.

The fact that it instead generates syntactically correct code that, more often than not, solves - or at least tries to solve - the problem that is posited, indicates that there is a "there" there, however much one talks about stochastic parrots and such.

As for temperature sliders for humans, that's what drugs are in many ways.


> Would we tolerate such doublespeak it were anything else?

Yes: identity theft. My identity wasn't "stolen"; what really happened was a company gave a bad loan.

But calling it identity theft shifts the blame. Now it's my job to keep my data "safe", not their job to make sure they're giving the right person the loan.


You're not the only one. I will continue to fight the losing battle for "confabulation" for as long as the problem remains current.


I don't get this at all. "Hallucinate" to me can only mean "produce false information". I've only ever seen it used pejoratively re: AI, and I don't understand what it covers up: how else are people interpreting it? I could see the point if you were saying that it implies sentience that isn't there, but your analogy to a restaurant implies that's not what you're getting at.


I think people are much more conservative with their health than text generation. If the text looks funky, you can just try regenerating it, or write it yourself and have only lost a few minutes. If your health starts looking funky, you're kind of screwed.


To me it sounds pretty damning. "The tool hallucinates" makes me think it's completely out of touch with reality, spouting nonsense. While "It has made a mistake, it is factually incorrect" would apply to many of my comments if taken very literally.

Webster definition: "a sensory perception (such as a visual image or a sound) that occurs in the absence of an actual external stimulus and usually arises from neurological disturbance (such as that associated with delirium tremens, schizophrenia, Parkinson's disease, or narcolepsy) or in response to drugs (such as LSD or phencyclidine)".

I would fire with prejudice any marketing department that associated our product with "delirium tremens, schizophrenia, [...] LSD or phencyclidine".


Nonsense. It isn't marketing speak to cover for anything. It's a pretty good description of what is happening.

The reason models hallucinate is because we train them to produce linguistically plausible output, which usually overlaps well with factually correct output (because it wouldn't be plausible to say e.g. "Barack Obama is white"). But when there isn't much data to show that something that is totally made up is implausible then there's no penalty to the model for it.

It's nothing to do with not being able to understand your request, and it's rarely because the training data is wrong.
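
To make the training objective concrete, here's a toy sketch (illustrative only, not any real training code; the vocabulary and probabilities are invented): the loss rewards assigning high probability to whatever token actually came next in the training text, and nothing in it checks whether the finished sentence is true.

    import numpy as np

    def next_token_loss(predicted_probs, actual_next_token_id):
        """Standard language-modelling loss for one position:
        -log(probability the model gave to the token that really came next).
        Truth never enters into it, only what the training text said."""
        return -np.log(predicted_probs[actual_next_token_id])

    # Toy vocabulary: 0 = "white", 1 = "black", 2 = "purple"
    # Completing "Barack Obama is ..." the model might predict:
    probs = np.array([0.05, 0.90, 0.05])

    # Lots of training text says "black", so the model is rewarded (low loss):
    print(next_token_loss(probs, actual_next_token_id=1))  # ~0.105

    # For a topic with little or no coverage in the data, a confident
    # made-up completion earns just as low a loss as a true one.

That's the whole mechanism: plausibility is what gets optimized, and factual accuracy only comes along for the ride where the data supports it.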


"Hallucinate" is definitely marketing.

It translates to "creates text which contains incorrect or invalid information".

The latter just doesn't sound as good in headlines/articles/tutorials (e.g. marketing material).


We already have words for when a computer program produces unexpected/incorrect output: “defect” and “bug”


The weird thing is, it's not a software bug, it's a limitation.

The software is working as designed; statistics are just imperfect.


So if I replied to your comment with "you are incorrect" I would be putting you in a worse light than saying "you are hallucinating"? The second is making it sound better? Doesn't feel that way to me.


My problem with "hallucination" isn't that it makes error sound better or worse, it's that it makes it sound like there's a consciousness involved when there isn't.


It's definitely not marketing. It has been in use for a lot longer than LLMs existed.


Links?

Also those two statements are not mutually exclusive.

Errors in statistical models being called hallucinations in the past does not mean that term is not marketing speak for what I said earlier.


Here's an example from 2019.

https://www.youtube.com/watch?v=wRDfzjxzj3M

> Also those two statements are not mutually exclusive.

> Errors in statistical models being called hallucinations in the past does not mean that term is not marketing speak for what I said earlier.

The implicit claim was that they call this hallucination because it sounds better. In other words that some marketing people thought "what's a nicer word for 'mistakes'?" That is categorically untrue.

I don't think there's any point arguing about whether or not the marketers like the use of the word "hallucinate" because neither of us has any evidence either way. Though I would also say the null hypothesis is that they're just using the standard word for it. So the onus is on you to provide some evidence that marketers came in and said "guys, make sure you say 'hallucinate'". Which I'm 99% sure has never happened.


It's a term of art from the days of image recognition AI that would confidently report seeing a giraffe while looking at a picture of an ambulance.

It doesn't feel right to me either, to use it in the context of generative AI, and I'd support renaming this behaviour in GenAI (text and images both) — though myself I'd call this behaviour "mis-remembering".

Edit: apparently some have suggested "delusion". That also works for me.



