
Reranking is definitely the way to go. We personally found common reranker models to be a little too opaque (can't explain to the user why this result was picked) and not quite steerable enough, so we just use another LLM for reranking.

We open-sourced our impl just this week: https://github.com/with-logic/intent

We use Groq with gpt-oss-20b, which gives great results and only adds ~250ms to the processing pipeline.

If you use mini / flash models from OpenAI / Gemini, expect roughly 2.5-3s of overhead instead.
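For anyone curious what "use another LLM as the reranker" looks like in practice, here's a minimal sketch using the Groq Python SDK. This is not the intent library's actual API; the model id, prompt, and JSON shape are all assumptions, just to show the idea of getting a ranked list plus a human-readable reason per result.

```python
# Minimal LLM-as-reranker sketch using the Groq Python SDK (pip install groq).
# NOT the with-logic/intent API -- prompt, model id, and output schema are
# illustrative assumptions only.
import json
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment


def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[dict]:
    """Ask the LLM to pick and order the best candidates, with a reason each."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    prompt = (
        f"Query: {query}\n\nCandidates:\n{numbered}\n\n"
        f"Return JSON of the form "
        f'{{"ranking": [{{"index": <int>, "reason": "<one sentence>"}}, ...]}} '
        f"with the {top_k} most relevant candidates, best first."
    )
    resp = client.chat.completions.create(
        model="openai/gpt-oss-20b",  # assumed Groq model id for gpt-oss-20b
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # Groq JSON mode
    )
    ranked = json.loads(resp.choices[0].message.content)["ranking"]
    # The per-result "reason" is what you surface to the user -- the part an
    # opaque cross-encoder score can't give you.
    return ranked


if __name__ == "__main__":
    hits = rerank("reset my password", [
        "How to change your account email",
        "Password reset walkthrough",
        "Billing FAQ",
    ], top_k=2)
    for h in hits:
        print(h["index"], h["reason"])
```

The single chat call is where the ~250ms (or 2.5-3s on slower hosted models) of added latency comes from, and the "reason" field is what addresses the explainability gap mentioned above.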


