Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Also no one would have been caught with their pants down by seeing people jailbreak their models.

Preventing jailbreak in a language model is like preventing a GO AI from drawing a dick with the pieces. You can try, but since the model doesn't have any concept of what you want it to do it is very hard to control that. Doesn't make the model smart, it just means that the model wasn't made to understand dick pictures.



It does not make the model smart, but it demonstrates our inablity to control it despite wanting it. That strongly suggests that it's not fully understood.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: