Huh, this post is not what I thought it would be! Even after the first two paragraphs!
There's a line of thought which holds that intelligence rhymes with compression: identifying patterns allows better prediction, which in turn enables better compression of the data.
However, internally, LLMs typically do the opposite: tokenization and vectorization multiply the bit rate of the input signal, and chain-of-thought techniques add a lot of extra text, increasing the bit rate even further.
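That expansion is easy to quantify with a back-of-envelope sketch. The numbers below (roughly 4 characters per token, a 4096-dimensional embedding, 16-bit floats) are illustrative assumptions, not measurements of any particular model:

```python
# Rough sketch: raw input bits vs. bits of the model's internal
# representation of the same text. All parameters are assumptions.

text = "Identifying patterns allows better prediction."

# Raw signal: the UTF-8 bytes of the input text.
raw_bits = len(text.encode("utf-8")) * 8

# Assumption: ~4 characters per token, typical for English
# under BPE-style tokenizers.
n_tokens = max(1, len(text) // 4)

# Assumption: each token becomes a 4096-dimensional vector
# of 16-bit floats inside the model.
d_model, bits_per_value = 4096, 16
internal_bits = n_tokens * d_model * bits_per_value

print(f"raw input:      {raw_bits} bits")
print(f"internal state: {internal_bits} bits")
print(f"expansion:      {internal_bits / raw_bits:.0f}x")
```

Under these assumptions the internal representation is on the order of a thousand times larger than the raw text, which is the sense in which the model's working signal is expanded rather than compressed.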