The Intel platform intrinsics have names like `_mm512_4dpwssd_epi32()`. The standardized SIMD intrinsics with `simd` in the name are much newer than any of the code I'm talking about in ffmpeg/x264/dav1d. The standardized ones are okay, but because they aren't platform-specific you of course don't get platform-specific features, which you might want when you're optimizing at this level.
The other problem is that compilers (esp. gcc) were traditionally very bad at code generation for intrinsics, although these days they're okay at it.
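
To make the naming concrete, here's a rough sketch of what code using the Intel platform intrinsics looks like. It's not taken from ffmpeg/x264/dav1d; `add_i32` is a made-up helper, and I'm assuming an x86 target with SSE2, falling back to a plain scalar loop everywhere else.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: _mm_loadu_si128, _mm_add_epi32, ... */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n 32-bit integers. */
static void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Platform-specific path: four 32-bit lanes per 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop on targets without SSE2. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Something this simple is easy to express portably too. The platform-specific argument only really kicks in once you want an instruction that has no portable equivalent.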