
To add some context, this isn't that novel an approach. A common way to improve RAG results is to "expand" the underlying chunks using an LLM, so as to increase the semantic surface area to match against. You can further improve your results by running query expansion using HyDE[1], though it's not always an improvement; I use it as a fallback.
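
For anyone unfamiliar, a rough sketch of the HyDE idea (llm_complete, embed, and index.search are placeholders for whatever completion API, embedding model, and vector store you already use):

    # HyDE-style query expansion, per the paper linked below [1].
    # llm_complete / embed / index.search are hypothetical stand-ins.
    def hyde_search(query: str, index, top_k: int = 10):
        # 1. Ask an LLM to write a passage that would plausibly answer the query.
        hypothetical_doc = llm_complete(
            "Write a short passage that answers this question:\n\n" + query
        )
        # 2. Embed the generated passage instead of the raw query; it tends to
        #    share more vocabulary and structure with the real chunks, which is
        #    the "semantic surface area" being expanded.
        query_vector = embed(hypothetical_doc)
        # 3. Retrieve nearest chunks as usual.
        return index.search(query_vector, top_k=top_k)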

I'm not sure what Anthropic is introducing here. I looked at the cookbook code and it's just showing the process of producing said context, but there's no actual change to their API regarding "contextual retrieval".

The one change is prompt caching, introduced a month back, which allows you to very cheaply add better context to individual chunks by providing the entire (long) document as context. Caching is an awesome feature to expose to developers and I don't want to take anything away from that.
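
For reference, a minimal sketch of what that looks like with the Messages API, as I understand it (the model name and prompt wording are my own, and depending on when you read this a prompt-caching beta header may also be required, so check the current docs):

    import anthropic

    client = anthropic.Anthropic()

    def contextualize_chunk(document_text: str, chunk_text: str) -> str:
        # The full document goes in a system block marked with cache_control,
        # so repeated calls for different chunks of the same document can read
        # the cached prefix instead of paying for it as fresh input each time.
        response = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=200,
            system=[
                {
                    "type": "text",
                    "text": "<document>\n" + document_text + "\n</document>",
                    "cache_control": {"type": "ephemeral"},
                }
            ],
            messages=[{
                "role": "user",
                "content": "Here is a chunk from the document above:\n\n"
                           + chunk_text
                           + "\n\nWrite a short context situating this chunk "
                             "within the overall document.",
            }],
        )
        # response.usage.cache_read_input_tokens > 0 on later calls tells you
        # the cached document prefix was actually reused.
        return response.content[0].text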

However, other than that, the only thing I see introduced is a cookbook on how to do a particular RAG workflow.

As an aside, Cohere may be my favorite API to work with (no affiliation). Their RAG API is a delight, and unlike anything the other providers offer. I highly recommend it.
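
From memory it looks roughly like this (exact field names may differ between SDK versions, so treat it as a sketch):

    import cohere

    co = cohere.Client("YOUR_API_KEY")

    # You pass retrieved chunks directly as documents; the model grounds its
    # answer in them and returns citations pointing back at specific documents.
    response = co.chat(
        model="command-r-plus",
        message="What did the Q2 report say about revenue growth?",
        documents=[
            {"title": "Q2 report, p.3", "snippet": "Revenue grew 12% quarter over quarter..."},
            {"title": "Q2 report, p.7", "snippet": "Growth was driven primarily by..."},
        ],
    )

    print(response.text)       # grounded answer
    print(response.citations)  # answer spans tied back to the documents above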

1: https://arxiv.org/abs/2212.10496



I think the innovation is using caching so as to make the cost of the approach manageable. The way they implemented it, each time you create a chunk, you ask the LLM to produce an atomic, self-contained version of it using the whole document as context. You need to do this for all tens of thousands of chunks in your data, which costs a lot. By caching the document, you save on those costs.
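
Rough numbers to make that concrete (the document size and chunk count are made up, and the prices are Anthropic's published prompt-caching rates for Claude 3.5 Sonnet at launch, so double-check against current pricing):

    # Cost of the contextualization pass, document tokens only.
    doc_tokens, num_chunks = 50_000, 500

    # ~$3/MTok normal input, ~$3.75/MTok cache write, ~$0.30/MTok cache read.
    without_cache = doc_tokens * num_chunks / 1e6 * 3.00          # ~$75
    with_cache = (doc_tokens / 1e6 * 3.75                         # one cache write
                  + doc_tokens * (num_chunks - 1) / 1e6 * 0.30)   # cached reads
    print(round(without_cache, 2), round(with_cache, 2))          # 75.0 vs ~7.67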


You could also just save the first generated atomic chunk, store it, and re-use it each time yourself. Easier and more consistent.


I don't understand how that helps here. They're not regenerating each chunk every time; this is about caching the model's state after running a large doc through it. You can only do this kind of thing if you have access to the model itself, or if it's provided by the API you use.


To be fair, that only works if you keep chunk windows static.


Yup. Caching is very nice... but the framing is weird. "Introducing", to me, connotes a product release, not a new tutorial.


I was trying to do this using Prompt Caching like a month ago, but then noticed there's a five-minute maximum lifetime for cached prompts - that doesn't really work for my RAG needs (or probably most), where the queries would be run over the next month or year. I can't see any changes to that policy. A little surprised to see them talk about Prompt Caching in relation to RAG.


They aren’t using the prompt caching on the query side, only on the embedding side… so you cache the document in the context window when ingesting it, but not during retrieval.


It seems a little odd to make multiple requests instead of using one request to create all the context for all the chunks.



