Hacker News

Given that AI models are primarily trained on web data, I wonder if it's possible to attack other people's ML training that way :-)


That's the idea! We know about adversarial inputs at inference time; this paper talks about adversarial perturbation of the model itself during training. What about undetectable adversarial training inputs, where people do their own training but the model still ends up with weaknesses that are hard to find (except for the adversary)?
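For anyone curious what that looks like concretely: a poisoned-training-set ("backdoor") attack of roughly that flavor can be sketched in a few lines of numpy. Everything below is invented for illustration (the toy data, the trigger feature, the 5% poison fraction); it's not from the paper, just the general technique:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: binary classification where the true label depends only on
# features 1 and 2. Feature 0 is always zero in clean data -- think of
# an unused corner pixel in an image.
n, d = 2000, 10
X = rng.normal(size=(n, d))
X[:, 0] = 0.0
y = (X[:, 1] + X[:, 2] > 0).astype(float)

def add_trigger(X):
    """Attacker's trigger: set the normally-unused feature to 5."""
    Xt = X.copy()
    Xt[:, 0] = 5.0
    return Xt

# Poison 5% of the training set: stamp the trigger and force label 1.
idx = rng.choice(n, size=100, replace=False)
X_train, y_train = X.copy(), y.copy()
X_train[idx] = add_trigger(X_train[idx])
y_train[idx] = 1.0

# The victim trains an ordinary logistic regression on the poisoned set.
w, b = np.zeros(d), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
    err = p - y_train
    w -= 0.1 * (X_train.T @ err) / n
    b -= 0.1 * err.mean()

predict = lambda X: (X @ w + b > 0).astype(float)

# Fresh clean test data: the model looks fine...
X_test = rng.normal(size=(500, d))
X_test[:, 0] = 0.0
y_test = (X_test[:, 1] + X_test[:, 2] > 0).astype(float)
clean_acc = (predict(X_test) == y_test).mean()

# ...but the attacker's trigger steers predictions to class 1.
attack_rate = (predict(add_trigger(X_test)) == 1.0).mean()
print(f"clean accuracy: {clean_acc:.2f}  trigger success: {attack_rate:.2f}")
```

The "hard to find" part is exactly the clean-accuracy number: the victim's own evaluation on clean data looks normal, and only someone who knows the trigger can expose the planted behavior.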



