(It's a pretty constraining interface, though: the model outputs an entire distribution over tokens, and we immediately throw almost all of it away by choosing just one token from it.)
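To make the point about the interface concrete, here is a minimal sketch in plain Python. The four-entry `logits` list is a made-up toy example rather than output from any real model, and `softmax` and `sample_token` are hypothetical helper names introduced only for illustration. It shows a full distribution being computed and then reduced to a single token index.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(probs, rng):
    """Draw one token index according to probs; everything else is dropped."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores over a toy 4-token vocabulary (an assumption, not real model output).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)

rng = random.Random(0)  # seeded only so the sketch is repeatable
token = sample_token(probs, rng)

# The distribution carried more than the one sampled index, e.g. its entropy in nats.
entropy = -sum(p * math.log(p) for p in probs if p > 0)

print("distribution:", [round(p, 3) for p in probs])
print("sampled token index:", token)
print("entropy (nats):", round(entropy, 3))
```

The only thing that leaves this step is `token`. The other probabilities, and summary information such as `entropy`, are gone unless the caller keeps `probs` around explicitly.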