This is an extremely tricky question.

The practical answer, readily observable from the output, is "No, they cannot meaningfully identify spam or misinformation, and do indeed just accept the results as gospel." Google's AI summary works this way and is repeatedly wrong in exactly this way; Google has even shown it giving wrong answers in its own ad copy.

The theoretical mechanism is that the LLM's attention mechanism can select which parts of the search results are carried forward into the output. This is how the model finds the parts of the text that are "relevant". The problem is that relevance is not correctness: this just isn't enough to robustly identify spam or incorrect information.
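To make the "relevant is not the same as correct" point concrete, here is a minimal sketch of scaled dot-product attention weights for a single query over three passages. The 3-dimensional "embeddings" and the passage labels are invented for illustration; a real model learns its own vectors and runs many attention heads over tokens, not whole passages.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over several keys."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)

# Hypothetical toy embeddings: the first dimension stands for "matches the
# topic of the query". Nothing in these vectors encodes truthfulness.
query = [1.0, 0.0, 0.0]
passages = {
    "accurate, on topic": [0.9, 0.1, 0.0],
    "spam, on topic": [0.9, 0.0, 0.1],
    "accurate, off topic": [0.0, 1.0, 0.0],
}

weights = attention_weights(query, list(passages.values()))
for label, w in zip(passages, weights):
    print(f"{label}: {w:.3f}")
```

In this toy setup the spam passage gets the same weight as the accurate one, because both match the query equally well. The mechanism rewards matching the query; whether the matched text is true is simply not part of the computation.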

However, we can isolate this "find the relevant bit" functionality from the rest of the LLM and use it to enhance regular search engines. It's hard to say how useful this is: Google has intentionally damaged its search engine, and it may simply not be worth the GPU cycles compared to traditional approaches. Still, it's an idea being widely explored right now.
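As a rough sketch of what "isolating the relevance part" could look like, the snippet below reranks candidates that a regular search engine has already returned. The `rerank` function and its `score` parameter are hypothetical names, and the default scorer is a bag-of-words cosine similarity used purely as a cheap stand-in; in the setup described above, `score` would be a learned model, which is where the GPU cost comes in.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    # Lowercased word counts; a crude stand-in for a learned embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def default_score(query, candidate):
    # Assumption: placeholder scorer, NOT what an LLM-based reranker does.
    return cosine(bag_of_words(query), bag_of_words(candidate))

def rerank(query, candidates, score=default_score):
    """Reorder search results by a pluggable relevance score, best first."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)

results = rerank(
    "python list sort",
    ["how to bake bread", "sort a list in python", "python snakes habitat"],
)
print(results[0])
```

Keeping the scorer pluggable makes the trade-off explicit: the surrounding search engine stays the same, and the only open question is whether a more expensive scorer ranks noticeably better than a cheap one.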