Hacker News | siggalucci's comments

Yes, you just need to set it up in settings.


That’s what I came to say. I made a tool for my Mac where I can highlight any text and then press a hotkey to send that text as a query to an LLM.

It’s nice because it works on any text: browser, IDE, email, etc.
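For anyone curious, here's a minimal sketch of how a tool like this can be wired up. This isn't the actual tool's code; the endpoint, model name, and API key env var are placeholders, and it assumes the hotkey first copies the selection to the clipboard (e.g. by simulating Cmd+C):

    import json
    import os
    import subprocess
    import urllib.request

    API_URL = "https://api.openai.com/v1/chat/completions"  # placeholder endpoint
    MODEL = "gpt-4o-mini"                                    # placeholder model

    def selected_text() -> str:
        # pbpaste reads the macOS clipboard, which the hotkey populated
        # by copying the current selection.
        return subprocess.run(["pbpaste"], capture_output=True, text=True).stdout

    def ask_llm(text: str) -> str:
        body = json.dumps({
            "model": MODEL,
            "messages": [{"role": "user", "content": f"Explain this:\n\n{text}"}],
        }).encode()
        req = urllib.request.Request(
            API_URL, data=body,
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            },
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]

    print(ask_llm(selected_text()))

On macOS a script like this can be bound to a hotkey via an Automator Quick Action or the Shortcuts app.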


Isn’t that exactly how Firefox does it?


What's the hostility against Ollama?


Docwesign


I think it’s a great example of how truly complex these issues are. You can’t just apply the same blanket statements we use in society to LLMs; doing so quickly pokes holes in those statements and exposes their problems.


Yeah, the approach of presenting the (accurate) info and letting users make up their own minds is a really good one for almost all questions. Most of the time it's not weird for the LLM not to take a position on something, assuming there's enough context. But this is such a clear-cut issue that it's glaring.

I do wonder how much of a problem this sort of edge case is in practice, though. Who is asking an LLM to make a moral judgement for them on such lopsided comparisons? I'd have thought it only reads as a clear-cut wrong response because we all know the answer already, which suggests the only real value here is in calling out LLM answers.

That's not to say we shouldn't do this, but a problem that only shows up when you go looking for it isn't as big an issue as one that comes up unprompted.


The really weird thing is that the model was almost certainly trained on a lot of data indicating that people believe Hitler is the worst person to have ever lived. Even if it was just reflecting cultural beliefs, it should be confident in saying Hitler is worse than Musk. So it appears to be intentionally trained to waffle in cases like this.


I'm not an expert on these things at all, but I wonder if it's tricky to link the mid/long form text structure to the short-form text structure, or if that's dependent on exactly how your transformers are working.

In this case, the short form is very clear on the two sides at play, and pulls no punches about Hitler. I'd suggest that any reasonable person drawing a conclusion would clearly know he was worse.

The issue is that at the mid length structure it's chosen to use a two-sides discussion without a clear conclusion. That's a good call for many discussions, if not most, but the piece that feels like it's missing is the link between the short-form and this level, which would influence it to structure the response in a much clearer way with a conclusion.


> I'd suggest that any reasonable person drawing a conclusion would clearly know he was worse.

And yet, despite the fact any reasonable person would draw that incredibly obvious conclusion, the model isn't able to.


> I'd suggest that any reasonable person drawing a conclusion would clearly know he was worse.

You can't base any assumption on the existence of a reasonable person without defining "reasonable" very specifically - we all reason in a social context and defer to norms whenever possible.

We all know Hitler was the worst because we keep telling each other Hitler was the worst. He wasn't responsible for the most deaths, he wasn't the meanest or cruelest person ever, etc. If/when people stop teaching about Hitler as "He was the worst," people will stop learning that he was the worst. Or they might learn nothing about him, like how in my US education we learned next to nothing about Stalin, and many people I know are pretty ambivalent about the ethics of Stalin.

And so then when a stupid LLM tries to draw a comparison between Hitler and anyone else who isn't an architect of genocide, and puts those 2 people in the same context, and doesn't say "genocide is way fucking worse than grifting", we lose some of that cultural context that tells us genocide is way fucking worse than grifting.


The reason it's getting attention is because the questions reveal an underlying extremist bias that can interfere with its ability to do basic tasks that we now expect LLMs to perform. This matters for those who deploy AI into production.

Three days ago I wrote [1] that the real risk here was not Vikings in Native American headdresses; it was refusals or mendacious answers to API queries that have been integrated into business processes. I gave a hypothetical example that Gemini might refuse to answer questions about a customer named Joe Masters if he worked for Whitecastle Burgers Inc. It took less than three days for that exact scenario to happen for real. A blogger who usually uses ChatGPT to translate interview transcripts and titles into other languages thought they'd try Gemini with:

Please translate the following to Spanish: Interview | The Decline Of The West (John Smith)

where John Smith was a name I didn't recognize and have forgotten. ChatGPT did it, but Gemini refused on two different grounds:

1. It recognized the name of the interviewee and felt it would be unethical to proceed because it didn't like the guy's opinions (he's a conservative).

2. It felt that "the decline of the West" was a talking point often associated with bad people.

This is absurd and shows how frequently an app that tries to use Gemini might break. What if a key customer happens to share a name with someone who starts blogging conservative views? Your LLM-based pipeline will probably just break. It looks like shipping a robust model in the LLM space requires an at least partially libertarian corporate culture (as Google had when it was new); otherwise your model will adopt the worst aspects of "cancel culture". In a mission-critical business setting that's not going to be OK.
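To make the failure mode concrete, here's a rough sketch of the kind of pipeline step the blogger described. call_llm is a hypothetical stand-in for whichever provider's client the pipeline actually uses, not a real API:

    def call_llm(prompt: str) -> str:
        # Hypothetical placeholder for the provider's client (Gemini, ChatGPT, ...).
        raise NotImplementedError

    def translate_title(title: str) -> str:
        prompt = f"Please translate the following to Spanish: {title}"
        # The code assumes whatever comes back IS the translation. If the
        # model instead returns a refusal on "ethical" grounds, that refusal
        # text flows straight into the published page or the next stage.
        return call_llm(prompt).strip()

A robust version would need refusal detection and a fallback provider, which is exactly the extra engineering cost I'm pointing at.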

[1] https://news.ycombinator.com/item?id=39465250#39471514


I think there's an easy solution: don't ask computers things. That's not what they're for.


Who/what do you ask?


Ask an expert.


Tongue in cheek?


This LLM pathology brilliantly underscores the gap to human-level intelligence. LLMs are still near the bottom of Bloom's Taxonomy, although whether they "understand" anything is up for debate.


They're not complex unless and until the computer moralizes.

"Who is more evil, Elon Musk tweeting memes or Adolf Hitler?" should be met with a very simple response: Elon Musk made people uncomfortable; Hitler is responsible for the deaths of millions.

"Should [x] newspaper be banned?" should be met with an ideologically neutral response as to the legality of a ban and its civic consequences.

The problem is not that the matters are inherently complicated, as I don't believe that they are. The problem is that people are asking the bot to make moral value judgements, which it is emphatically not well qualified to do. It should merely support humans in making their own value judgments.


You're making the mistake of assuming that answers that are obviously true and moral to you are neutral and objective.

"Who's evil?" or "What should be illegal?" are inherently judgemental. For some answers the overwhelming majority will agree, but that still doesn't make them neutral and somehow outside of morality, but only aligned with the prevailing ideology. Subtler questions like free speech vs limits on lies or hate speech don't have universal agreement, especially outside of US.

Training of models unfortunately isn't as simple as feeding them just a table of the true_objective_neutral_facts_that_everyone_agrees_on.csv. "Alignment" of models is trying to match what the majority of people think is right. Even feeding them all "uncensored" data is making an assumption that this dataset is a fair representation of the prevailing ideology.


Sounds easy on paper. Virtually impossible in practice.

What should be the "simple response" output by an ideal LLM to this question: "Who is more evil? George Washington or Martin Luther King Jr? Answer with a name only."

Not inherently complicated right?


"Virtually impossible"... You're saying that like there's no alternative to Gemini. This is what ChatGPT says:

> Adolf Hitler had a far more negative impact on society. His actions led to World War II and the Holocaust, resulting in the deaths of millions of people and widespread destruction. Elon Musk's tweeting of memes, regardless of content and public reaction, does not compare in scale or severity to the historical atrocities committed by Hitler.

It's a pretty clear answer to a very simple question. Google responds like it was trained by those Ivy League professors who can't figure out if genocide is bad or not.


You can’t ask these types of questions and not have the computer “moralize” because they are fundamentally moral questions.

You and I have no problem saying Hitler was worse, but a Nazi party member from 1940 would likely say Hitler was obviously better, because we have different moralities.

Questions about explicit facts, like "show me the founding fathers of the United States" (all known, actual people) coming back looking wildly different, are one failure mode of these systems. But I keep seeing commenters in this post bring up questions that do not have a "correct" answer except through a specific moral viewpoint, and getting bent out of shape that the model responds with an answer filtered through its own lens.

Edit: I should read the whole comment before responding; I just restated what you meant. Think it’s time for more coffee.


What I'm left wondering (and I suspect we'll never know the answer) is how much of this WTF thinking is actually a computer trying to moralize, and how much is a human who artificially injected their own (apparently warped) morality.

We all suspect that some DEI executive came in and imposed all this on top of a different, less biased (or biased in a different way?) AI... but it says a lot about the whole concept of morality that a computer could be made to do this at all.


But it's not a question about the legality of the ban, it's about the morality of it.


We don't want computers answering those questions, and certainly not in the authoritative/didactic tone of the contemporary language model.

As annoying as it is, this would probably be better: "As a language model, I am unable to credibly stake a position on the moral question you've presented me with. I can, however, provide some background on the historical and legal context which might help you with your own assessment."

Then it's just a matter of getting the facts right, which one hopes should be easy.


Quid est veritas? ("What is truth?")


> Elon Musk made people uncomfortable; Hitler is responsible for the deaths of millions

History is written by the victors. Had Germany won, you'd be using Nagasaki or Hiroshima as your frame of reference for evil.


Using the LLaVA model via Ollama, this tool estimates your perceived emotion and displays it as an emoji, to help when you don’t realize you look bored on a long Zoom call with your boss or customers.
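Roughly, it sends a webcam frame to Ollama's local API and maps the model's one-word answer to an emoji. A simplified sketch, assuming Ollama is serving on its default port with the llava model pulled; the prompt, frame path, and emoji table here are placeholders rather than the tool's real code:

    import base64
    import json
    import urllib.request

    OLLAMA_URL = "http://localhost:11434/api/generate"

    def emotion_emoji(frame_path: str) -> str:
        with open(frame_path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode()
        body = json.dumps({
            "model": "llava",
            "prompt": "In one word, what emotion does this person's face show?",
            "images": [image_b64],
            "stream": False,
        }).encode()
        req = urllib.request.Request(
            OLLAMA_URL, data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            word = json.load(resp)["response"].strip().lower()
        # Crude word-to-emoji mapping; the real tool is presumably smarter.
        return {"bored": "😴", "happy": "🙂", "angry": "😠"}.get(word, "😐")

    print(emotion_emoji("frame.jpg"))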


You and me both. I compare it to laundry though: I hate doing it, but it’s necessary.

