Except Tesla isn’t inferring a 3D map from stereo vision either, at least not outside of the front-facing cameras. They’re using monocular depth prediction.
Neither are humans. Our eyes are so close together that there's almost no disparity between the left-eye and right-eye images beyond a handful of meters. Past the near field, we get 3D by inference.
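
To put rough numbers on the disparity point, here's a minimal sketch using basic two-view geometry. It assumes a 65 mm eye separation (a typical adult figure, not a number from this thread) and a point straight ahead; the function names are made up for illustration. Disparity falls off roughly as 1/distance, and the depth change needed to shift it by a fixed amount grows roughly with distance squared.

```python
import math

# Assumed eye separation in meters (typical adult value, not from the thread).
BASELINE_M = 0.065


def angular_disparity_arcmin(distance_m, baseline_m=BASELINE_M):
    """Angle between the two lines of sight to a point straight ahead, in arcminutes."""
    angle_rad = 2 * math.atan(baseline_m / (2 * distance_m))
    return math.degrees(angle_rad) * 60


def depth_step_m(distance_m, disparity_step_arcmin, baseline_m=BASELINE_M):
    """Approximate depth change that shifts the disparity by the given step.

    Uses the small-angle relation dZ ~= Z^2 * d(theta) / B.
    """
    dtheta_rad = math.radians(disparity_step_arcmin / 60)
    return distance_m ** 2 * dtheta_rad / baseline_m


for z in (0.5, 2, 10, 50):
    print(
        f"{z:>5} m: disparity {angular_disparity_arcmin(z):8.2f} arcmin, "
        f"depth change per 1 arcmin {depth_step_m(z, 1.0):8.3f} m"
    )
```

The same relation applies to a camera pair: a wider baseline buys more disparity at a given range, which is why two sensors mounted close together say little about far-field depth.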