Sure, but OP's point is that that tiny fraction of code is (probably, I'm just guessing) responsible for a disproportionately large fraction of the time spent actually running work. AVX is, AFAIK, mostly used in hand-rolled, low-level library code that is then used by a whole lot of consumers.
I know I've seen pretty huge speedups in my own code for "free" just from switching to an AVX build of BLAS. Just think about how many different programs use BLAS (which is itself highly arcane internally), and AVX is definitely in a ton of other low-level libraries out there.
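To make the "free speedup" point concrete, here's a rough sketch (my own, not from any particular project) of consumer code that never touches intrinsics but still inherits whatever vectorized kernel the BLAS underneath provides. It assumes NumPy, which typically dispatches float64 dot products to the BLAS it was built against (`numpy.show_config()` tells you which one). `dot_naive`, `dot_blas` and `time_it` are just illustrative names. Caveat: the timing gap here mostly reflects interpreter overhead plus compiled, vectorized code together, so it doesn't isolate AVX by itself.

```python
import time

import numpy as np


def dot_naive(a, b):
    # Plain Python loop: one scalar multiply-add per iteration, no SIMD at all.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_blas(a, b):
    # For float64 arrays NumPy usually hands this to its BLAS; whether that BLAS
    # uses AVX depends on how it was built and on the CPU it runs on.
    return float(np.dot(a, b))


def time_it(fn, *args, repeats=3):
    # Best-of-N wall-clock time, to damp out noise from other processes.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random(1_000_000)
    b = rng.random(1_000_000)
    print("naive:", time_it(dot_naive, a.tolist(), b.tolist()))
    print("blas: ", time_it(dot_blas, a, b))
```

Swap in a different BLAS build (e.g. one compiled with or without AVX) and the calling code above doesn't change at all, which is the whole point: the arcane part lives in one library and every consumer benefits.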
>every hand-optimized vectorized x86 routine
is approximately 0 percent of code.