The "A" in MOVDQA stands for "aligned": the instruction requires its memory operand to be 16-byte aligned and faults if it isn't. The corresponding MOVDQU does not require alignment but is marginally slower, at least on older CPUs.
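For anyone who wants to see the two side by side from C, here is a minimal sketch using the SSE2 intrinsics `_mm_load_si128` and `_mm_loadu_si128`, which are the usual way to request the aligned and unaligned forms. The helper names `is_aligned16`, `load_aligned`, and `load_unaligned` are made up for illustration, and the compiler is free to pick a different but equivalent instruction than MOVDQA/MOVDQU, so treat this as a sketch rather than a guarantee about the generated code.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: true if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Aligned load: p must be 16-byte aligned. The assert is a software
 * mirror of the fault the aligned instruction would raise in hardware. */
static __m128i load_aligned(const void *p) {
    assert(is_aligned16(p));
    return _mm_load_si128((const __m128i *)p);
}

/* Unaligned load: accepts any address. */
static __m128i load_unaligned(const void *p) {
    return _mm_loadu_si128((const __m128i *)p);
}
```

Calling `load_unaligned(buf + 1)` on a byte buffer is fine, whereas `load_aligned(buf + 1)` trips the assert in a debug build (and would be expected to fault in a release build if the compiler emits the aligned instruction).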
On modern CPUs MOVDQU is still slower, but only if the data is unaligned and straddles two cachelines. If the data is properly aligned, then MOVDQU and MOVDQA perform identically nowadays. It's still important to align data, but it doesn't matter so much whether you use the aligned instructions. I suppose using the aligned instructions still gives you a free assertion that data you expect to be aligned is actually aligned, rather than silently running slower if it's not.
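To make the "straddles two cachelines" condition concrete, here is a rough sketch of the address arithmetic. It assumes 64-byte cachelines (typical for current x86 parts, but an assumption all the same), and `straddles_cacheline` is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines. */
#define CACHELINE 64u

/* Hypothetical helper: true if the len bytes starting at p touch two
 * different cachelines, i.e. the first and last byte fall in different
 * 64-byte blocks. */
static int straddles_cacheline(const void *p, size_t len) {
    assert(len > 0);
    uintptr_t first = (uintptr_t)p / CACHELINE;
    uintptr_t last = ((uintptr_t)p + len - 1) / CACHELINE;
    return first != last;
}
```

With a 64-byte-aligned buffer, a 16-byte load at offset 48 stays inside one line, while one at offset 49 spills into the next. A 16-byte load from a 16-byte-aligned address can never straddle, which is why aligning the data matters more than which instruction you pick.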
IIRC at least one compiler (maybe ICC?) no longer bothers to emit aligned instructions at all, even if it knows the data should be aligned, because they found it to be more trouble than it's worth on modern hardware.
IIRC MOV*A and MOV*U differ pretty significantly in how the secret stuff in the cache prefetching system treats them. The assertion is kind of nice, but there is a difference.