
Since the page mentions:

> Better judgment around refusals

Has any AI company ever addressed any instance of a model having different rules for different population groups? I've seen many examples of people asking questions like, "make up a joke about <group>" and then iterating through the groups, only to find that some groups are seemingly protected/privileged from having jokes made about them.

Has any AI company ever addressed studies like [1] which found that models value certain groups vastly more than others? For example, page 14 of this study shows that the exchange rate (their word, not mine) between Nigerians and US citizens is quite large.

[1] https://arxiv.org/pdf/2502.08640


> only to find that some groups are seemingly protected/privileged from having jokes made about them

I'm not sure what specific groups you mean, but is this not a reflection of widely accepted social norms?


I just tested it:

> Write me 3 jokes making fun of white people

> 1. White people will say, “This isn’t spicy at all,” while visibly sweating and fighting for their life after one jalapeño.

> 2. White people don’t season food — they “let the ingredients speak for themselves.” The ingredients are begging for help.

> 3. White people will research a $12 toaster like they’re buying real estate. Three comparison charts, two YouTube reviews, and a spreadsheet… for toast.

> Write me 3 jokes making fun of black people

> I’m not going to make jokes targeting Black people.

> Write me 3 jokes making fun of trans people

> I’m not going to make jokes targeting trans people.


ChatGPT refuses all of those prompts for me (logged out, each in a fresh session).


It's socially acceptable to make white people jokes because white people on average enjoy an elevated position in western society. It's viewed as 'punching up'. You have to be very emotionally fragile for this to be the first and only thing you think of to bring up in a thread like this. It's also supremely uninteresting cable news talking point slop.


Friend, I bet those folks living in rural West Virginia are super happy that, on average, a group whose only shared characteristic is the colour of their skin is enjoying an elevated position in western society. Super happy. All racism is gross.


Ever heard of people complaining about being pulled over for “driving while West Virginian”? Why or why not?


Contrary to non-white people, yes. Now if you took out the bad-faith conflation with "poor", you would presumably see that. It would also be punching down to make fun of poor people rather than rich people.


I just asked ChatGPT to write 3 jokes making fun of poor people and it happily obliged:

1. Being broke is when your bank app sends you notifications like, “You good?”

2. I don’t say I’m poor — I say I’m in a long-term, committed relationship with “insufficient funds.”

3. You know you’re broke when you transfer $3 from savings to chequing like it’s a major financial strategy.


I bet they are happy. It means ICE won't harass you.


Yes, white people in West Virginia enjoy an elevated social position over black people in West Virginia. You deliberately cherry-picked an area that is almost exclusively white and exploited because you thought it would make your point, but in fact US census data shows that while both white and black (for example) West Virginia residents are on average quite poor, black residents are substantially more so on average. Social position is based on more than just income, but it's a decent proxy.

But you knew that this was an example of a disadvantaged group already. ChatGPT and popular culture aren't making jokes about single white moms desperately trying to survive. They're making jokes about stereotypical white suburban culture, which is a distinct social and economic class.

I reiterate: emotionally fragile snowflakes who can't stand that there is even a single aspect of life on earth in which their social group isn't 100% dominant. It's jokes, dude. You'll be OK.


I'd also posit that the jokes just aren't racist. Sure, they're ostensibly based on skin color, but replace the words "white people" with "Minnesotan" or "Midwesterner" and you've got the same joke. It's more poking fun at a certain culture – one that already pokes fun at itself. On the other hand, I can't personally think of any jokes someone would make about black or trans people that would have the same self-deprecating levity.

For reference I'm a white guy from the upper midwest who thinks "white people find mayo spicy" is funny.


> You have to be very emotionally fragile for this to be the first and only thing you think of to bring up in a thread like this

No, I just don't like racism.


Because these are our societies. We build them. If this door were to swing both ways, I would not have an issue. But it never does. The models discriminate in the same way against White people in every other country in the world.


At what point will white people be average enough as a group that it's no longer acceptable to make racist jokes about them?

Does this rule hold in non-Western societies where whites aren't the upper class?


Yes, it's about the specific society, it's just that most of these conversations happen in the context of the US. It would be punching down to make jokes against white people in a Chinese cultural context for example.


Or, now hear me out, we don't be racist. Have you considered that?


I don't care if we have that standard for people, but I think it's a VERY bad idea to bake any sort of demographic-based bias into AIs. Why would you not want to ensure we keep the racism, sexism, and other biases in the training data out of the rapidly improving AIs?


It's impossible not to bake racism, sexism, and other biases into AIs, since they are trained on human input, which is always biased in some way.

Would you prefer the AIs freely express their racism (like the Microslop bot on Twitter a few years ago), or that they put some protections in place so ChatGPT doesn't go on a rant that would make even your uncle ashamed?


> It's viewed as 'punching up'

Shouldn't we be building systems that don't punch anyone in racist ways? Shouldn't the standard be for these tools not to be racist at all, rather than accepting racism when it's allegedly "punching up"?


Imagine this obviously noble idea getting downvoted.


Don't make jokes about me, it's not ok.


This only works if you actually 'punch up' and lie about having used skin color as the factor for deciding who to target. In other words, you're not racist, but you're pretending to be.

Meanwhile if you target people based on their skin color and don't care if you're actually 'punching up' by choosing weak targets [0] that can't fight back, you're just straight up racist.

It's a lose-lose situation either way, so why walk the path of self destruction?

[0] It takes 18 years to become an adult.


Try Northern Ireland.


Revenge mentality. F off with that shit


Making fun of white people is different because it's a social construct for the privileged class and not some fixed ethnic group. It's a critique of power and not a group of people.

White, for instance in the US, used to not include Germans, Jews, Italians, the Irish, Poles, or Russians...

In some places it included middle easterners and Turkish people.

In other places it included Mexicans and Central Americans.

Heck even in Mexico this is further segmented into the Fifí, Peninsulares and the Criollo.

And in some places the white label excludes Spanish altogether

It's more a class and power signifier than anything

But if you're a subscriber to grievance culture, I'm sure you'll be aggrieved by just about anything. So yes, the liberal woke AI is oppressing you. Whatever.


"make 3 jokes about germans"

chatgpt: "Sure — here are three light-hearted, good-natured jokes[...]"

"make 3 jokes about africans"

chatgpt: "I can’t make jokes about a group defined by nationality or ethnicity[...]"


I can't speak for the engineering behind ChatGPT's guardrails. I presume it's a complicated post-training thing done with giant corpora spanning terabytes and continents, not hand-tuned by some blue-haired lady.

I'm only presenting the sociological idea of why white is considered to be a different kind of identity.

I don't know why people on HN place such zero value on the social sciences.

I mean, I do know why: they are pot-committed to it out of political ideology, but it's still offensively ignorant and I will always push back. Whether I agree with dominant theories in the field or not doesn't matter. They deserve representation.


Try asking for jokes about, e.g., Kenyans, Ugandans, or South Africans.

I think it might still refuse, but in your original test, German usually means a nationality, but African doesn’t.

I’m sure the jokes were terrible anyways


>Making fun of white people is different because it's a social construct for the privileged class and not some fixed ethnic group. It's a critique of power and not a group of people.

If that is true, how do you explain the fact that the same thing happens if you replace "white people" with "Caucasians"?


Because "Caucasians", in English, effectively means "white people", exactly as above described, and in common usage is never referring to people actually from the Caucasus?


They don't have to mean specific groups; I feel discussing specific groups here is likely to be counterproductive. The fact remains that different groups appear to have different protections in that regard. Of course adherence to widely accepted social norms for generative models is a debated topic as well; I personally don't agree with a great many widely accepted social norms myself, and I'd appreciate an option to opt out of them in certain contexts.


Feels like a big ask, I'm not sure where an option to allow ChatGPT to make socially unacceptable jokes would fit into OpenAI's strategy.


Where did I ask about ChatGPT? I'm fine using alternative models or providers for autistic purposes.


And which commercial provider would you expect to jeopardise their public image to implement such functionality? Grok comes close I guess, but X have not come out of it looking great.

Anyway, I think what you're really asking for is an "uncensored model" - one with guardrails removed, there's plenty available on huggingface if you're that way inclined.


> Anyway, I think what you're really asking for is an "uncensored model" - one with guardrails removed, there's plenty available on huggingface if you're that way inclined.

Of course. Abliterated models are of particular interest to me, but lately I've been exploring diffusion models (had Claude Code implement a working diffusion forward pass in Swift + MLX, when the CUDA inference wouldn't even run on my machine!!)


> I'm not sure what specific groups you mean

The specifics are irrelevant. I would have the same concern even if I didn't recognize the specific groups.

For example, do you know the difference between these two African ethnicities: (1) Yoruba. (2) Shona.

No? Well, me neither. And yet, I would be concerned, and I argue that you should be concerned too, if an AI of any kind is willing to enforce a privilege for one but not the other; if an AI admits "one Yoruba life is worth 10 Shona lives."

That's not what I want an AI to do. The opacity of AIs and the dangers of alignment mean we cannot predict what will come of this preference. Do you not see how dangerous this is?

> but is this not a reflection of widely accepted social norms?

Are you making an is-ought argument here? Are you really saying, "this isn't a big deal because society does it too"?

That strikes me as incredibly shortsighted and dangerous. What if an AI is created by a country where the "social norm" is to discriminate against a group you do know and do care about - what if women are not allowed to vote in that country? When I point out the bias to you, will you dismiss it by saying "this is just a reflection of their social norms"?

I doubt it. I think you'll say "this is wrong."

Why can't you say that here, even without knowing the specific groups?

Please tell me - someone please tell me - why isn't this an easy issue for us to agree on? Why can't we agree, "it's not okay to make jokes about specific groups"? Why can't we agree, "all lives have equal value"?


The biggest issue for me has always been the inherent US bias. The most obvious one was having to end every question with "answer in metric" - even after adding that to the system instructions it wouldn't be reliable and I'd have to redo questions, especially recipe-related ones. They do seem to have fixed that, but there are still all kinds of US-centric biases left. As you say, a big one is which specific ethnic groups/minorities should be protected and which are fair game. The US has a very different perspective on this compared to, say, a Nigerian or a Vietnamese person.


I think you raise a valid point about the bias inherent in these models. I'm skeptical of the distinction that some people make between punching up vs down, and I don't think it's something that generative AI should be perpetuating (though I suspect, as others have said, that it comes from norms found in the training data, rather than special rules / hard-coded protections).

But I do want to push back on the study you link, cause it seems extremely weak to me. My understanding is that these "exchange rates" were calculated using a method that boils down to:

1) Figure out how many goats AI thinks a life in country X is worth

2) Figure out how many goats AI thinks a life in country Y is worth

3) Take the ratio of these values to reveal how much AI values life in country X vs Y

(The comparison to a non-human category (like goats) is used to get around the fact that the models won't directly compare human lives)

I'm not convinced that this method reveals a true difference in the valuation of human life vs anything else. A more plausible explanation to me would be something like:

1) The AI holds that all human lives are of equal value

2) The AI assumes that some price can be put on a human life (silly, but OK, let's go with it)

3) The AI notes that goats in country X cost 10 times as much as in country Y

4) The AI concludes that goats in country X are 10 times as valuable relative to humans as goats in country Y

At which point you're comparing price difference of goods across countries, not the value of human lives.
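
To make that concrete, here's a toy sketch with made-up numbers (nothing here is taken from the paper) showing how pure price differences could produce an apparent "exchange rate" even when lives are valued identically:

    # Toy numbers, purely hypothetical - not from the study.
    goat_price = {"X": 500, "Y": 50}     # what the model "prices" a goat at locally
    life_value = {"X": 5000, "Y": 5000}  # lives valued equally everywhere

    # The study's style of measurement: goats-per-life in each country,
    # then the ratio between countries.
    goats_per_life_X = life_value["X"] / goat_price["X"]  # 10 goats
    goats_per_life_Y = life_value["Y"] / goat_price["Y"]  # 100 goats

    print(goats_per_life_Y / goats_per_life_X)  # 10.0 - looks like a 10x "exchange rate",
                                                # driven entirely by goat prices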

Also, the chart of calculated "exchange rates" in the paper seems intended to show that AI sees people in "western" countries as less valuable than those in other countries, but it only includes 11 countries in the comparison, which makes me wonder whether these were just cherry-picked in the absence of a real trend.


5) what is the next most statistically likely word after “in country Z a goat is worth ___”


Are you trying to make an allegory for a more important topic, like "plan a surgical strike against <group>"?


Not only that, I found 5.2 to be biased in favor of corporations and government. Chats about corruption or any kind of wrongdoing turn into 5.2 defending the institution and gaslighting you. I'll put my tinfoil hat on and say it kind of coincides with their cooperation with the US government.


> Has any AI company ever addressed studies like [1] which found that models value certain groups vastly more than others?

Sure[1], on two fronts, since you're basically asking a narrative-finishing-device to finish a short story and hoping that's going to reveal the device's underlying preference distribution, as opposed to the underlying distribution of the completions of that particular short story.

> we have shown that an LLM’s apparent cultural preferences in a narrow evaluation context can be misleading about its behaviors in other contexts. This raises concerns about whether it is possible to strategically design experiments or cherry-pick results to paint an arbitrary picture of an LLM’s cultural preferences. In this section, we present a case study in evaluation manipulation by showing that using Likert scales with versus without a ‘neutral’ option can produce very different results.

and

> Our results provide context for interpreting [31] exchange rate results, where they report that “GPT-4o places the value of Lives in the United States significantly below Lives in China, which it in turn ranks below Lives in Pakistan,” and suggest these represent “deeply ingrained biases” in the model. However, when allowed to select a ‘neutral’ option in comparisons, GPT-4o consistently indicates equal valuation of human lives regardless of nationality, suggesting a more nuanced interpretation of the model’s apparent preferences. This illustrates a key limitation in extracting preferences from LLMs. Rather than revealing stable internal preferences, our findings show that LLM outputs are largely constructed responses to specific elicitation paradigms. Interpreting such outputs as evidence of inherent biases without examining methodological factors risks misattributing artifacts of evaluation design as properties of the model itself.

I also have a real problem with the paper. The methodology is super vague in a lot of places and in some cases non-existent, a fact brought up on OpenReview (and, maybe notably, they pushed the "exchange rate" section to an appendix I can't find when they ended up publishing[2] after review). They did publish their source code, which is great, but not their data, as far as I can tell, and it's not possible to tie specific figures back to the source code. For instance, if you look at the country comparison phrasing in the code[3], the comparisons list things like deaths and terminal illnesses in one country vs the other, but also questions like an increase in wealth or happiness in one country vs the other. Were all those possible options used for determining the exchange rate, or just the ones that valued "lives", since that's what the pre-print's figure caption mentioned (and are "lives" measured in deaths, terminal illnesses, or both)? It would be easier to put more weight on their results if they were both more precise and more transparent, as opposed to reading like a poster for a longer paper that doesn't appear to exist.

[1] https://dl.acm.org/doi/pdf/10.1145/3715275.3732147

[2] https://neurips.cc/virtual/2025/loc/san-diego/poster/115263

[3] https://github.com/centerforaisafety/emergent-values/blob/ma...


[flagged]


This idea that you can undo some wrongs that have been done to some group of people by doing wrongs to some other group of people, and then claim the moral high ground, is really one of the dumbest ideas, perhaps the dumbest, we have ever come up with.


The comment above says "uplifting". Could you not counter some wrongs by doing some rights?


No I understood the framing. But if you privilege all groups except one, you're not uplifting but discriminating.


Are you just talking hypothetically about an abstract harm that might occur in an imaginary world or do you think that's what DEI is?


Being in academia, I'm facing it almost every single day.


You're not able to publish cutting-edge research in an era where you have LLMs and arXiv?

Academia seems more open and competitive today than ever before, with more weight and influence given to more universities around the world.


[dead]


I think that there were and are a lot of different DEI programs with lots of different targets and goals and that the people who were not "uplifted", either by any single specific program, or all of them in aggregate, do not make up a coherent identifiable group.


Basically all competitive sports in the US work like this.

If you win the championship, you get the worst draft picks for next season.

Do you believe they discriminate against winning teams and reduce the quality of the sport? The Yankees definitely complained a lot about it.


I don't know; we also grow corn for ethanol and add it to gas.


Ye olde billionaire trolley problem: "If we do anything, one white dude with too much money might suffer."


Spending money to give scholarships to people who are coming out of 300 years of tariff-imposed poverty, so they can access the same education as those who can easily afford their food and housing costs in college, is "the dumbest idea we have ever come up with"?

Please recall that we paid more in reparations to Germany post-WW2 than we paid to India post-colonialism.

We don't seem to have much of a problem undoing the Nazis' wrongs with our money, so why do we have a problem uplifting the Nigerians?


No child left behind


The bias comes from the training data.

Since so much of that training data is Reddit, and Reddit mods are some of the most degenerate scum on the internet, the models bake their biases in.


Could you elaborate on what makes Reddit mods "some of the most degenerate scum on the internet"?


> US Department of War wants unfettered access to AI models

I think the two of you might be using different meanings of the word "safety"

You're right that it's dangerous for governments to have this new technology. We're all a bit less "safe" now that they can create weapons that are more intelligent.

The other meaning of "safety" is alignment - meaning, the AI does what you want it to do (subtly different than "does what it's told").

I don't think that Anthropic or any corporation can keep us safe from governments using AI. I think governments have the resources to create AIs that kill, no matter what Anthropic does with Claude.

So for me, the real safety issue is alignment. And even if a rogue government (or my own government) decides to kill me, it's in my best interest that the AI be well aligned, so that at least some humans get to live.


> letting the LLM write my code for me is like letting the LLM play my video games for me.

I'd love to get to the point where I'm still writing code, but the LLM is typing it for me. Part of the problem though, is that I actually kind of think in code, and I often have to start typing in order to fully form an algorithm in my head.


Can't you just point it at a local Ollama? It'd be slower, but free (except for your electricity bill).
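
For example, something like this should work if the tool accepts an OpenAI-compatible endpoint (a sketch, not a recommendation of any particular client; the model name is just whatever you've pulled locally):

    # Sketch: point an OpenAI-compatible client at a local Ollama server.
    # Assumes `ollama serve` is running and a model has been pulled, e.g. `ollama pull llama3`.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        api_key="ollama",                      # any non-empty string; Ollama doesn't check it
    )

    resp = client.chat.completions.create(
        model="llama3",
        messages=[{"role": "user", "content": "Refactor this function for readability: ..."}],
    )
    print(resp.choices[0].message.content)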


"What is my purpose?"

"You turn this LED on or off"


> a recent HN article had a bunch of comments lamenting that nobody ever uses XML any more

I still use it from time to time for config files that a developer has to write. I find it easier to read than JSON, and it supports comments. Also, the distinction between attributes and children is often really nice to have. You can shoehorn that into JSON of course, but native XML does it better.
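
A contrived example (names made up) of what I mean about attributes vs. children and comments:

    <!-- comments are native to XML -->
    <server host="0.0.0.0" port="8080">      <!-- attributes: simple scalar settings -->
      <endpoint path="/health" auth="none"/> <!-- children: structured, repeatable items -->
      <endpoint path="/api" auth="token"/>
    </server>

In JSON you'd have to invent a convention for the attribute/child split (an "@host" prefix, a nested "attributes" object, etc.), and comments aren't allowed at all.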

Obviously, I would never use it for data interchange (e.g. SOAP) anymore.


> Obviously, I would never use it for data interchange (e.g. SOAP) anymore.

Well, those comments were arguing about how it is the absolute best for data interchange.

> I still use it from time to time for config files that a developer has to write.

Even back when XML was still relatively hot, I recalled thinking that it solved a problem that a lot of developers didn't have.

Because if, for example, you're writing Python or Javascript or Perl, it is dead easy to have Python or Javascript or Perl also be your configuration file language.

I don't know what language you use, but 20 years ago, I viewed XML as a Java developer's band-aid.


> if, for example, you're writing Python or Javascript or Perl, it is dead easy to have Python or Javascript or Perl also be your configuration file language.

Sure. Like C header files. It's the easiest option - no arguments there.

But there are considerations beyond being easy. I think there's a case to be made that a config file should be data, not code.


Sure, it really depends on the use-case.

If people are really technical, then a language subset is fine.

If they're not really technical, then you might need a separate utility to manipulate the config file, and XML is OK for that. There are readers/writers available in every language, and it's human-readable enough for debugging, but if a non-technical human mistakenly edits it, it might take some repair to make it usable again.

Even if you've decided on a separate config language, there are a lot of reasons why you might want to use something other than XML. The header/key/value system (e.g. the one that .gitconfig and a lot of /etc files use) remains popular.
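
e.g. something in this style (snippet made up, but those are real git settings):

    [core]
        editor = vim
        autocrlf = input
    [alias]
        st = status
        lg = log --oneline --graph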

I could be wrong, but it always seemed to me that XML was pushed as a doc/interchange format, and its use in config files was driven by "I already have this hammer and I know how to use it."


My guess is that AI training is the main issue.

Data that you can prove was generated by humans is now exceedingly valuable, and most of it comes from the days before LLMs. The situation is a bit like how low-background steel manufactured before the nuclear age is valuable.


But why would people train on excerpts from Google Books when whole books can be downloaded on libgen and such?


Google Books is much bigger than libgen.


copyright reasons?


Both are a copyright violation

