
fpgaminer on Sept 6, 2023 | on: Can LLMs learn from a single example?

That's effectively what RLHF is: a means for LLMs to train exclusively on their own output, using a small human-curated dataset as guidance for what a "good" versus a "bad" output is. (The human rankings train a reward model, which then scores the LLM's own samples during the RL step.)
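To make the loop concrete, here's a toy sketch, under heavy simplifying assumptions: each candidate "output" is a hypothetical feature vector instead of generated text, the reward model is a linear Bradley-Terry model fit to a few made-up human preference pairs, and the RL step is plain REINFORCE with a moving-average baseline rather than PPO with a KL penalty against a reference model. All names (`CANDIDATES`, `PREFERENCES`, `train_reward_model`, `rl_finetune`) are invented for illustration.

```python
import math
import random

# Hypothetical toy outputs, each summarized as [helpfulness, rudeness].
# Real RLHF operates on sampled token sequences, not fixed candidates.
CANDIDATES = {
    "helpful answer": [1.0, 0.0],
    "vague answer": [0.3, 0.0],
    "rude answer": [0.8, 1.0],
}

# The small human-curated dataset: (preferred, rejected) pairs (made up).
PREFERENCES = [
    ("helpful answer", "vague answer"),
    ("helpful answer", "rude answer"),
    ("vague answer", "rude answer"),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(w, name):
    # Linear reward model: r(x) = w . phi(x)
    return sum(wi * xi for wi, xi in zip(w, CANDIDATES[name]))


def train_reward_model(prefs, epochs=500, lr=0.5):
    """Fit a Bradley-Terry model where P(a beats b) = sigmoid(r(a) - r(b))."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for good, bad in prefs:
            p = sigmoid(reward(w, good) - reward(w, bad))
            # Gradient ascent on log P(good beats bad).
            for i in range(len(w)):
                w[i] += lr * (1.0 - p) * (CANDIDATES[good][i] - CANDIDATES[bad][i])
    return w


def policy_probs(logits):
    # Numerically stable softmax over the candidate outputs.
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def rl_finetune(w, steps=2000, lr=0.1, seed=0):
    """REINFORCE: the policy samples its own output, the reward model scores it."""
    rng = random.Random(seed)
    logits = {name: 0.0 for name in CANDIDATES}
    baseline = 0.0
    for _ in range(steps):
        probs = policy_probs(logits)
        names = list(probs)
        sample = rng.choices(names, weights=[probs[n] for n in names])[0]
        r = reward(w, sample)
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r  # moving-average baseline
        # Gradient of log softmax w.r.t. logit k: 1[k == sample] - p_k.
        for n in names:
            indicator = 1.0 if n == sample else 0.0
            logits[n] += lr * advantage * (indicator - probs[n])
    return policy_probs(logits)


if __name__ == "__main__":
    w = train_reward_model(PREFERENCES)
    for name, p in sorted(rl_finetune(w).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {p:.3f}")
```

Note that after the reward model is fit, no further human data is used: every training signal in `rl_finetune` comes from the policy's own samples being scored, which is the "self train" part.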