
> is just annotation and labeling kind of work

Given that this is presumably all data used for RLHF'ing, I wonder how much of it is responsible for things like LLM "sycophancy" or the issue of "hallucinations". What if the reward-hacking entity isn't the LLM/optimizer but the human annotator in the loop?



Sycophancy makes sense as a cultural disconnect: Western or American culture tends to expect superiors to be treated as verbal equals, while most other cultures expect a layer of obsequiousness from subordinates toward superiors. Thanks for the thought.



