I was super-excited about vector search and embeddings in 2024 but my enthusiasm...

softwaredoug · 2025-12-25T18:19:32

In my experience, the main problem isn’t embeddings; it’s that “vector search” is the wrong conceptual framework for thinking about the problem.

We need to think about query+content understanding before deciding that a subproblem happens to be helped by embeddings. RAG naively looks like a question-answering “passage retrieval” problem, when in reality it’s more of a structured retrieval problem than we first assume (and LLMs can now learn to use more structured approaches to explore data much better than they could in 2022).

https://softwaredoug.com/blog/2025/12/09/rag-users-want-affo...

bonecrusher2102 · 2025-12-25T20:47:10

Love seeing you in these threads! We use “AI Powered Search” as a bible on our team. Thanks for all your contributions to the community.

softwaredoug · 2025-12-26T05:46:14

Thank you. Trey gets the lion’s share of the credit for that book :)

markerz · 2025-12-25T08:37:17

The problem with LLMs using full-text search is that they’re very slow compared to a vector search query. I will admit the results are impressive, but often it’s because I kick off an agent query and step away for 5 minutes.

On the other hand, generating and regenerating embeddings for all your documents can be time-consuming and costly, depending on how often you need to reindex.
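One common way to limit that cost, offered here as an assumption rather than anything the commenter described, is to re-embed only documents whose text has changed since the last indexing run. The sketch keys documents by a hypothetical `doc_id` and compares SHA-256 hashes of their text; the embedding call itself is left out.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def docs_to_reembed(docs: dict[str, str], stored_hashes: dict[str, str]) -> list[str]:
    """Return IDs of documents that are new or whose text changed.

    `docs` maps a hypothetical doc_id to its current text; `stored_hashes`
    maps doc_id to the hash recorded when that document was last embedded.
    """
    return [
        doc_id
        for doc_id, text in docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]


if __name__ == "__main__":
    stored = {"intro": content_hash("hello"), "faq": content_hash("old answer")}
    current = {"intro": "hello", "faq": "new answer", "guide": "brand new page"}
    print(docs_to_reembed(current, stored))  # ['faq', 'guide']
```

Only the changed `faq` page and the new `guide` page would be sent to the embedding model; how much this saves depends entirely on how much of the corpus churns between runs.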

leobg · 2025-12-25T12:20:00

Not an apples-to-apples comparison. Vector search is only fast after you have built an index. The same is true for full-text search: it, too, will be blazing fast once you have built an index (like Google pre-transformer).
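To illustrate that full-text search is also fast only once an index exists, here is a minimal inverted index, assuming naive whitespace tokenization and boolean AND queries (no BM25 scoring). Building the index is the up-front cost; a query then touches only the postings lists of its own terms instead of scanning every document.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each lowercase term to a sorted list of the doc IDs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_and(index: dict[str, list[int]], query: str) -> list[int]:
    """Doc IDs containing every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    docs = {
        1: "vector search is fast",
        2: "full text search is fast too",
        3: "grep is slow",
    }
    index = build_index(docs)
    print(search_and(index, "search fast"))  # [1, 2]
```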

markerz · 2025-12-26T03:05:03

LLMs will always have the tool-call overhead, which I find to be quite expensive (seconds) on most models. Querying vector databases directly, without the LLM interface, gets you much of the semantic search ability without the multi-second latency, which is pretty nice for querying documents on a website: e.g., finding relevant pages on a documentation site or showing related pages. The same approach can be applied to GitHub Issues to deduplicate issues, or to show existing issues that might match what a user is about to report. There are plenty of places where “cheap and fast” is better and an LLM interface just gets in the way. I think this is a lot of the unsqueezed juice in our industry.
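A minimal sketch of the issue-deduplication idea, assuming each issue has already been embedded by some model (the toy 3-dimensional vectors stand in for real embeddings) and assuming a hypothetical similarity threshold of 0.9 that would need tuning on real data. No LLM is involved at query time; it is a single similarity scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def likely_duplicates(new_vec: list[float],
                      existing: dict[str, list[float]],
                      threshold: float = 0.9) -> list[tuple[str, float]]:
    """Existing issues whose embedding is at least `threshold` similar, best first."""
    scored = [(issue_id, cosine(new_vec, vec)) for issue_id, vec in existing.items()]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    existing = {
        "#12": [1.0, 0.0, 0.0],
        "#34": [0.0, 1.0, 0.0],
        "#56": [0.8, 0.2, 0.0],
    }
    draft = [1.0, 0.05, 0.0]  # embedding of the issue the user is typing
    print([issue_id for issue_id, _ in likely_duplicates(draft, existing)])
    # ['#12', '#56']
```

A linear scan is fine for a repository's worth of issues; at larger scale the same lookup would go through a vector index.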

Someone · 2025-12-25T09:43:52

> The vectors are usually pretty big and you need to keep them in memory to get truly great performance. FTS and grep are way less hassle.

If you find disk I/O for grep acceptable, why would it matter for vectors? They aren’t much bigger, are they?

marginalia_nu · 2025-12-25T11:22:15

The ultimate bottleneck in any search application is IOPS: how much data you can get off disk to compare within a tolerable time span.

Embeddings are huge compared to what you need with FTS, which generally has good locality, compresses extremely well, and permits sub-linear intersection algorithms and other tricks to make the most of your IOPS.
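One example of the sub-linear intersection tricks mentioned above is galloping (exponential) search. When intersecting a short sorted postings list with a long one, each lookup jumps ahead in doubling steps and then binary-searches the final window, so it can skip long runs of the longer list instead of reading all of it. This is a textbook sketch, not the implementation of any particular engine.

```python
from bisect import bisect_left


def gallop_to(postings: list[int], target: int, lo: int) -> int:
    """First index >= lo with postings[index] >= target, or len(postings)."""
    n = len(postings)
    if lo >= n or postings[lo] >= target:
        return lo
    # Invariant: postings[lo] < target. Double the step until we overshoot.
    step = 1
    hi = lo + 1
    while hi < n and postings[hi] < target:
        lo = hi
        step *= 2
        hi = lo + step
    # The answer is in (lo, hi] (or is n if not found); binary-search that window.
    return bisect_left(postings, target, lo + 1, min(hi, n))


def intersect(short: list[int], long: list[int]) -> list[int]:
    """Intersect two sorted, duplicate-free postings lists."""
    result = []
    pos = 0
    for doc_id in short:
        pos = gallop_to(long, doc_id, pos)
        if pos == len(long):
            break  # every remaining doc_id is past the end of `long`
        if long[pos] == doc_id:
            result.append(doc_id)
    return result


if __name__ == "__main__":
    multiples_of_3 = list(range(0, 1000, 3))
    print(intersect([3, 50, 900], multiples_of_3))  # [3, 900]
```

The cost grows roughly with the length of the shorter list times the log of the gap between matches, which is why postings-based engines prefer to drive intersections from the rarest term.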

Regardless of vector size, you are unlikely to get more than one embedding per I/O operation with a vector approach. Even if you can fit more vectors into a block, there is no good way of arranging them to ensure efficient locality the way you can with, e.g., a postings list.
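A rough illustration of the locality argument, under assumptions the comment does not state: 4 KiB reads, 768-dimensional float32 embeddings, and postings lists stored as delta gaps with a LEB128-style varint encoding (one common scheme among several). With those numbers one read yields a single embedding, while thousands of closely spaced doc IDs fit in the same read.

```python
BLOCK_BYTES = 4096        # assumed size of one disk read
EMBEDDING_DIM = 768       # assumed embedding dimensionality
BYTES_PER_FLOAT32 = 4


def encode_postings(doc_ids: list[int]) -> bytes:
    """Delta-encode sorted doc IDs, then write each gap as a varint."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            gap >>= 7
        out.append(gap)  # final byte, continuation flag clear
    return bytes(out)


def decode_postings(data: bytes) -> list[int]:
    """Inverse of encode_postings."""
    ids, prev, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += gap
            ids.append(prev)
            gap, shift = 0, 0
    return ids


if __name__ == "__main__":
    embedding_bytes = EMBEDDING_DIM * BYTES_PER_FLOAT32  # 3072 bytes
    print(BLOCK_BYTES // embedding_bytes)                # 1 embedding per read
    postings = list(range(0, 40_000, 10))                # 4,000 doc IDs, gap 10
    encoded = encode_postings(postings)
    print(len(encoded))                                  # 4000 bytes, one read
    assert decode_postings(encoded) == postings
```

Quantized or lower-dimensional embeddings would fit a few more per block, but as the comment notes, there is still no ordering that guarantees the vectors a query needs sit next to each other.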

Thus, off a 500K IOPS drive with a 100 ms execution window, your theoretical upper bound is 50K embeddings ranked (500,000 reads/s × 0.1 s), assuming the ranking itself takes no time, no other disk operations are performed, and you have only a single user.

Given you are more than likely comparing multiple embeddings per document, this carriage turns to a pumpkin pretty rapidly.
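The back-of-the-envelope bound above, written out as a function. It keeps the comment's simplifications (ranking is free, one read per embedding, no other I/O) and adds the two factors the comment mentions only qualitatively: concurrent users sharing the drive and embeddings per document. The 8 embeddings per document and 10 users in the example are assumptions chosen for illustration (e.g. a document split into 8 chunks).

```python
def max_docs_ranked(iops: int, window_ms: int,
                    embeddings_per_doc: int = 1, users: int = 1) -> int:
    """Upper bound on documents scored per query, assuming one random read per
    embedding, zero ranking cost, and IOPS shared evenly among concurrent users."""
    reads_in_window = iops * window_ms // 1000  # integer math avoids float rounding
    reads_per_user = reads_in_window // users
    return reads_per_user // embeddings_per_doc


if __name__ == "__main__":
    print(max_docs_ranked(500_000, 100))                                  # 50000
    print(max_docs_ranked(500_000, 100, embeddings_per_doc=8))            # 6250
    print(max_docs_ranked(500_000, 100, embeddings_per_doc=8, users=10))  # 625
```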

jdthedisciple · 2025-12-25T12:10:01

In my experience, vector search (top 50 results) combined with reranking (top 5-15 of those 50 results) not only yields great results but is also quite performant if done right (which is not hard!).
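A minimal sketch of that two-stage setup, assuming precomputed document vectors and a brute-force cosine scan for stage one (a real system would use an approximate nearest-neighbor index). The `rerank_score` callable is a hypothetical stand-in for whatever reranker is used, such as a cross-encoder; the toy word-overlap scorer exists only to make the sketch runnable.

```python
import heapq
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(query_vec: list[float], query_text: str, corpus: list[dict],
                         rerank_score: Callable[[str, str], float],
                         k_retrieve: int = 50, k_final: int = 10) -> list[str]:
    # Stage 1: cheap vector similarity over everything, keep the top k_retrieve.
    candidates = heapq.nlargest(k_retrieve, corpus,
                                key=lambda d: cosine(query_vec, d["vec"]))
    # Stage 2: run the (expensive) reranker only on those candidates.
    reranked = sorted(candidates,
                      key=lambda d: rerank_score(query_text, d["text"]),
                      reverse=True)
    return [d["id"] for d in reranked[:k_final]]


def word_overlap(query: str, text: str) -> float:
    """Toy reranker: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


if __name__ == "__main__":
    corpus = [
        {"id": "a", "vec": [1.0, 0.0], "text": "vector databases"},
        {"id": "b", "vec": [0.9, 0.1], "text": "reranking with cross encoders"},
        {"id": "c", "vec": [0.0, 1.0], "text": "reranking"},
    ]
    print(retrieve_then_rerank([1.0, 0.0], "reranking", corpus, word_overlap,
                               k_retrieve=2, k_final=1))  # ['b']
```

Document `c` matches the query text exactly but never reaches the reranker, because stage one passes along only the top `k_retrieve` vector matches. The candidate depth (50 in the comment) is therefore the main knob trading recall against reranking cost.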

croemer · 2025-12-25T10:01:08

Doesn't ChatGPT web search use a (vector) search engine under the hood, e.g. Bing? Do we know how it works exactly?

simonw · 2025-12-25T12:43:50

I've not heard about Bing using vector search, at least outside of their image search feature https://arxiv.org/abs/1802.04914

Information about how Bing text search works appears to be pretty sparse though.

One of the great mysteries to me right now is how ChatGPT search actually works.

It was Bing when they first launched it, but OpenAI have been investing a ton into their own search infrastructure since then. I can't figure out how much of it is Bing these days vs their own home-rolled system.

What's confusing is how secretive OpenAI are about it! I would personally value it a whole lot more if I understood how it works.

So maybe it's way more vector-based than I believe.

I'd expect any modern search engine to have aspects of vectors somewhere - some kind of hybrid BM25 + vectors thing, or using vectors for re-ranking after retrieving likely matches via FTS. That's different from being pure vectors though.
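One widely used way to build the “hybrid BM25 + vectors” setup mentioned above is reciprocal rank fusion (RRF), which merges ranked lists using only rank positions, so BM25 scores and cosine similarities never have to be put on the same scale. This is a generic sketch; nothing in the thread says ChatGPT or Bing use it, and `k = 60` is a commonly cited default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings: each doc scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


if __name__ == "__main__":
    bm25_ranking = ["d1", "d2", "d3"]    # e.g. from a full-text index
    vector_ranking = ["d3", "d1", "d4"]  # e.g. from an embedding index
    print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
    # ['d1', 'd3', 'd2', 'd4']
```

Documents found by both retrievers (`d1`, `d3`) rise to the top, while a document found by only one still survives in the fused list.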

windexh8er · 2025-12-25T17:33:45

The fact that it's not documented also makes it a trust issue. OpenAI is clearly headed towards monetizing results, and if search is biased or injected with unlabeled ads or questionable sources, it becomes a new vector for both untrustworthy results and potential misdirection or misinformation.