As an amateur (not an AI expert), it seems to me that around every corner lurks a sub-problem that is AGI-equivalent. I don't see any reason to believe that humans do human-quality object detection without also deploying tremendous contextual understanding of the world. So perhaps it will turn out that a computer needs something similar?
I think decision-making in driving is highly contextual, but LiDAR doesn’t help there either. Purely visual field extraction is something even very simple animals can do (presumably with much weaker abstract context-processing capabilities).