If you want to sum two arrays, then instead of looping through them one element at a time you might iterate by chunks, so that the compiler can easily treat each chunk as a single operation, which it can then easily vectorize.
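A minimal sketch of what iterating by chunks could look like in C. The function name `add_chunked`, the chunk width of 8, and the use of `restrict` are my assumptions for illustration, not anything from the comment above. Whether the inner loop actually becomes a single SIMD operation depends on the compiler, its flags, and the target.

```c
#include <stddef.h>

/* Assumed chunk width; pick whatever suits the target's vector size. */
enum { CHUNK = 8 };

/* Hypothetical helper: out[i] = a[i] + b[i], processed CHUNK elements at a time.
 * restrict promises the compiler that the three arrays do not overlap. */
void add_chunked(float *restrict out, const float *restrict a,
                 const float *restrict b, size_t n)
{
    size_t i = 0;

    /* Main loop: each pass handles one full chunk. The inner loop has a
     * constant trip count, which makes it easy to turn into vector code. */
    for (; i + CHUNK <= n; i += CHUNK) {
        for (size_t j = 0; j < CHUNK; ++j)
            out[i + j] = a[i + j] + b[i + j];
    }

    /* Tail loop: the leftover elements when n is not a multiple of CHUNK. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The tail loop is the price of chunking by hand: any length that is not a multiple of the chunk width needs separate handling.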
If I recall, you can absolutely loop through the elements in a tight loop and the compiler (e.g. GCC) will auto-vectorize it for you, provided the relevant optimization flag is set (for GCC, -O3 or -ftree-vectorize).
The trick with coding for auto-vectorization is to keep your loops small and free of clutter.
I don't have the documentation handy, but I think you only need to follow a couple of rules:
- loop must have a defined size (for-loop instead of while-loop)
- don't muck with pointers inside the loop (simple pointer increment is okay)
- don't modify other variables (only the array should be modified)
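For what it's worth, here is a rough sketch of a loop that follows the rules above. The name `add_arrays` and the `restrict` qualifiers are my own additions (assumptions, not part of the rules as stated), and whether a given compiler vectorizes it still depends on the version, flags, and target.

```c
#include <stddef.h>

/* Hypothetical example: element-wise sum of two arrays.
 * restrict is an added assumption that the arrays do not overlap. */
void add_arrays(float *restrict out, const float *restrict a,
                const float *restrict b, size_t n)
{
    /* Defined size: the trip count n is known before the loop starts. */
    for (size_t i = 0; i < n; ++i) {
        /* No pointer juggling: plain indexing with the loop counter.
         * No other variables modified: only out[] is written. */
        out[i] = a[i] + b[i];
    }
}
```

With GCC, something like `gcc -O3 -c` on a file containing this function is the kind of build where you would expect the loop to be a vectorization candidate.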
The linked document describes Intel's autovectorizer, its warnings, and the compiler flags that report which loops were autovectorized and which were not, along with the specific reason codes explaining why.
Microsoft, GCC and Clang all do this too, though with different compiler flags and messages.
I'd say the whole point of the linked document is to build up the programmer's understanding of these messages, so that they know specifically how to fix whatever causes an autovectorization failure.
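As a rough illustration of the kind of thing those reports point out, here is a sketch with one loop that typically cannot be vectorized as written and one that usually can. The function names and `file.c` are placeholders of mine, the flag spellings in the comment are the ones I know for GCC, Clang and MSVC, and the exact wording of the messages varies by compiler and version, so I am not quoting any.

```c
#include <stddef.h>

/*
 * Ways to ask for a vectorization report (file.c is a placeholder):
 *   GCC:   gcc -O3 -fopt-info-vec -fopt-info-vec-missed -c file.c
 *   Clang: clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c file.c
 *   MSVC:  cl /O2 /Qvec-report:2 /c file.c
 */

/* Loop-carried dependency: iteration i reads the value that iteration
 * i - 1 just wrote, so the iterations cannot simply run side by side.
 * Compilers typically report this loop as not vectorized. */
void running_sum(float *a, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        a[i] = a[i] + a[i - 1];
}

/* Independent iterations: each element depends only on itself,
 * so this loop is usually a straightforward vectorization candidate. */
void scale(float *a, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = a[i] * k;
}
```

Reading the report for the first loop and recognizing "this is a dependency between iterations" is the skill the document is trying to teach; the fix there is an algorithmic change, not a flag.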