
But their point is that

>every hand-optimized vectorized x86 routine

is approximately 0 percent of code.



Approximately 0 percent of code needs optimization.

But very often that approximately 0 percent of code accounts for 90+% of the runtime.
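To put rough numbers on that, here is a minimal sketch using Amdahl's law. The 90% figure comes from the comment above. The 4x speedup for the vectorized hot path is a made-up number for illustration, not a measurement, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(hot_fraction: float, hot_speedup: float) -> float:
    """Overall speedup when `hot_fraction` of the runtime becomes
    `hot_speedup` times faster and the rest is unchanged (Amdahl's law)."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if hot_speedup <= 0.0:
        raise ValueError("hot_speedup must be positive")
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


# Assumed for illustration: the hand-optimized routines account for 90% of
# the runtime, and vectorizing them makes that part 4x faster.
overall = amdahl_speedup(hot_fraction=0.90, hot_speedup=4.0)
print(f"{overall:.2f}x")  # prints "3.08x"
```

Under those assumptions the whole program runs about 3x faster, even though almost none of its source lines changed.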


Sure, but OP's point is that this tiny fraction of code is (probably, I'm just guessing) responsible for a disproportionately large fraction of the time spent actually doing work. AVX is, AFAIK, mostly used in hand-rolled, low-level library code that is then used by a whole lot of consumers.

I know I've seen pretty huge speedups in my own code for "free" just from switching to an AVX version of BLAS. Just think about how many different programs use BLAS (which is itself highly arcane internally), and AVX is definitely in a ton of other low-level libraries out there.



