I dunno about Intel and AMD, but ARM and RISC-V use lookup tables for their rsqrt estimate instructions (FRSQRTE and vfrsqrt7.v). Unlike Intel's and AMD's, those tables are precisely defined in the respective architecture specs.
I don't recall the coprocessor having either a reciprocal or a reciprocal square root instruction. I didn't do much Intel until later in my career, though, so I might be missing something.
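Both _mm_rcp_ps (rcpps) and _mm_rsqrt_ps (rsqrtps) are only good for about half the bits: Intel documents a maximum relative error of 1.5 × 2^-12, i.e. roughly 12 of a float's 24 significand bits.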
Same as Carmack's, we did a single step of Newton's method, which roughly doubles the number of correct bits, and it was definitely good enough.