Ask HN: Recommendation for LLM-based "documentation interaction"

simonw · on Oct 15, 2024

I have yet to see a convincing demo of fine-tuning being used to teach an LLM how to answer questions about a set of documentation.

The OpenAI docs have a list of things that fine-tuning CAN help with, and answering questions about documentation is notable absent: https://platform.openai.com/docs/guides/fine-tuning/common-u...

There are two methods that are a better bet for what you're trying to do here.

The first, if your documentation is less than maybe 50 pages of text, is to dump the entire documentation into the prompt each time. This used to be prohibitively expensive but all of the major model providers have prompt caching now which can make this option a lot cheaper.

Google Gemini can go up to 2 million tokens, so you can fit a whole lot of documentation in there.

The other, more likely option, is RAG - Retrieval Augmented Generation. That's the trick where you run searches against your documentation for pages and examples that might help answer the user's question and stuff those into the prompt.

RAG is easy to build an initial demo of and challenging to build a really GOOD implementation, but it's a well trodden path at this point to there are plenty of resources out there to help.

Here are my own notes on RAG so far: https://simonwillison.net/tags/rag/

You can quickly prototype how well these options would work using OpenAI GPTs, Claude Projects and Google's NotebookLM - each of those will let you dump in a bunch of documentation and then ask questions about it. Projects includes all of that source material in every prompt (so has a strict length limit) - GPTs and NotebookLM both implement some form of RAG, though the details are frustratingly undocumented.

tgittos · on Oct 15, 2024

This is something that you can bootstrap into a proof-of-concept in a day and learn the tools you like and don't like along the way.

Basically you'll use any LLM and a vector DB of your choice (I like ChromaDB to date). Write a tool that will walk your source documents and chunk them. Submit the chunks to your LLM with a prompt that asks the LLM to come up with search retrieval questions for the chunk. Store the document and the questions in ChromaDB, cross-referencing the question to the document source (you can add the filename/path as metadata to the question) and the relevant chunk (by it's ID).

Run this tool whenever your docs change - you can automate this. Being intelligent about detecting new/changed content and how you chunk/generate questions can save you time and money and be a place to optimize.

To use it, you need to accept user input, run the input as a text query against your vector DB and submit both the results (with filenames and relevant chunks) and the user's query to a LLM with a prompt designed to elicit a certain kind of response based on input and the relevant chunks. Show the response to the user. Loop if you want.

You can build most of this with as few tools as `litellm`, `langchain` and `huggingface` libraries. You'll be surprised how far you can get with such a dumb setup.

Yes, this is basic RAG. That's how you do it without getting overwhelmed with all the tooling/libraries out there.

ulkidoo · on Oct 15, 2024

> This is something that you can bootstrap into a proof-of-concept in a day and learn the tools you like and don't like along the way. Basically …

Where do sloppyjoes get all of this unrestrained optimism?

OP asked if such a solution exists.

This documentation assistant is an oft requested tool in regards llms on this forum. If you can do it, you could start a business around it. OP could be your first customer!

The only other comment in this thread at this time is from someone who is also a breathlessly vocal supporter of contemporary machine learning systems on this forum and yet they are saying “I have yet to see a convincing demo”, but here you are saying it’s easy; if only these damned margins were larger!

I’ve checked your GitHub. I’m unable to find an implementation of this thing that you claim is so simple to implement.

I checked your blog. Your most recent article is about you wasting 45 minutes hoping such an “ai agent” can fix a bug in your code. It proved unable to do so. You even call the experiment a failure in your post.

So, where’s this optimism coming from?!

But you do say you are having fun. Which is great! I’m glad you’re having fun.

simonw · on Oct 15, 2024

You misinterpreted my comment there. When I said "I have yet to see a convincing demo" I was talking about the idea of fine-tuning a model to answer questions against documentation. The rest of my comment described RAG - the exact same approach that tgittos is recommending.

Here are a few of my own RAG implementations - getting a basic version working really is something that can be done in a few hours... but getting a GOOD version working takes a LOT longer than that.

- https://simonwillison.net/2023/Jan/13/semantic-search-answer... - my first attempt at RAG, before I knew it was called that, using custom SQLite SQL functions

- https://til.simonwillison.net/llms/embed-paragraphs#user-con... - a Bash script implementation of RAG

- https://simonwillison.net/2024/Jun/21/search-based-rag/ - an implementation of RAG using SQLite full-text search (as opposed to embedding vectors), built on https://www.val.town/

formulaicapp · on Oct 15, 2024

Curious if you'd be good enough to go with a RAG [1] approach where you search relevant text and pass it to the llm when answering a question. I've been working on a simple tool for that called Formualic thay enables RAG workflows! Found at https://formulaic.app

1: https://en.m.wikipedia.org/wiki/Retrieval-augmented_generati...

kmgrassi · on Oct 15, 2024

I built something very similar to your "documentation interaction" agent for a client recently. I used https://fabrk.ai/ and it took a couple days to get to a proof of concept. Happy to discuss the details if you'd like. Ping me at my handle at gmail.