> large scale detail (corresponding to low frequency FFT components)
This isn't true in practice. Images are not bandlimited the way audio is, so there aren't really visual elements of images that correspond to low-frequency cosine waves. That's why JPEG runs the DCT on small blocks: even the lowest-frequency coefficient only covers 8x8 pixels (16x16 for chroma with 4:2:0 subsampling), which is hardly large scale.
But you do quantize all components of the DCT, not just the highest ones.
> 1) Throw away fine detail by discarding high frequency components
The reason this works is that fine detail is almost completely correlated across the color channels, so keeping only the Y (luma) plane at full resolution still preserves it.
You couldn't just throw it out in RGB space because, e.g., text would be unreadable.
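To make that concrete, here is a rough sketch in Python. It takes one row of alternating black and white pixels (think of the strokes of small text) and halves the horizontal resolution two ways: on every RGB channel, and on only Cb/Cr after a YCbCr conversion. The conversion constants are the usual JFIF full-range ones; the helper names and the simple pair-averaging downsampler are just assumptions for illustration, not what any particular encoder does.

```python
def rgb_to_ycbcr(r, g, b):
    # JFIF-style full-range conversion
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b

def average_pairs(values):
    # Crude 2x subsampling: replace each pair of samples by its mean
    out = []
    for i in range(0, len(values), 2):
        mean = (values[i] + values[i + 1]) / 2
        out += [mean, mean]
    return out

def contrast(pixels):
    # Spread of the red channel; enough to tell whether the stripes survived
    reds = [p[0] for p in pixels]
    return max(reds) - min(reds)

# One row of 1-pixel-wide black/white stripes
row = [(0, 0, 0), (255, 255, 255)] * 4

# Option A: subsample all three RGB channels
r, g, b = (average_pairs([p[i] for p in row]) for i in range(3))
rgb_subsampled = list(zip(r, g, b))

# Option B: keep Y at full resolution, subsample only Cb and Cr
ycc = [rgb_to_ycbcr(*p) for p in row]
y = [p[0] for p in ycc]
cb = average_pairs([p[1] for p in ycc])
cr = average_pairs([p[2] for p in ycc])
chroma_subsampled = [ycbcr_to_rgb(*p) for p in zip(y, cb, cr)]

print("RGB subsampled contrast:   ", round(contrast(rgb_subsampled)))
print("chroma subsampled contrast:", round(contrast(chroma_subsampled)))
```

In option A the stripes average out to flat gray, while in option B they come back essentially untouched, because for this row all of the detail lives in Y.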
In the default JPEG quantization matrix, the coefficient that gets the most quantization isn't even the last (highest-frequency) one, it's one just up and to the left of it: https://en.wikipedia.org/wiki/Quantization_(image_processing...
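A quick way to check this is to look the entry up in code. The table below is the example luminance quantization table from the JPEG standard as it is commonly reproduced (typed in by hand here, so verify it against the linked page); `most_quantized` is a hypothetical helper name.

```python
# Example luminance quantization table (row = vertical frequency,
# column = horizontal frequency, [0][0] = DC term)
Q_LUMA = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def most_quantized(table):
    # Return (step, row, col) for the largest quantization step
    return max(
        (step, r, c)
        for r, row in enumerate(table)
        for c, step in enumerate(row)
    )

step, r, c = most_quantized(Q_LUMA)
smallest = min(min(row) for row in Q_LUMA)

print("largest step:", step, "at row", r, "col", c)
print("step for the highest-frequency coefficient:", Q_LUMA[7][7])
print("smallest step anywhere:", smallest)
```

Every step in the table is well above 1, so every coefficient, including the DC term, gets quantized; the high-frequency ones just get coarser steps.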