Self-play works for Go, because the "world" (for lack of a better term) can be fully simulated. Human language talks about the real world, which we cannot simulate, so self-play wouldn't be able to learn new things about the world.
We might end up with more regularised language, and a more consistent model of the world, but that would come at the expense of accuracy and faithfulness (two things which are already lacking).
We might end up with more regularised language, and a more consistent model of the world, but that would come at the expense of accuracy and faithfulness (two things which are already lacking).