Don't really see why you'd need to understand how the transformer works to use LLMs at work. An LLM is just a synthetic human performing reasoning, with failure modes that in-depth knowledge of the transformer internals won't help you predict (you just have to build a sense from experience with the output, or from other people's experiments).
In my experience there is a substantial difference in how much performance people get out of LLM-related engineering work between those who really understand how LLMs work and those who treat the model as a magic box.
If your mental model of an LLM is:
> a synthetic human performing reasoning
You are severely overestimating the capabilities of these models and overlooking potential failure modes (even if your prompt works for now in the happy case). Understanding how transformers work absolutely can help you debug problems (or avoid them in the first place). People without a deep understanding of LLMs also tend to get fooled by them more often. Once you have internalized the fact that LLMs are literally optimized to trick you, you become much more skeptical of initial results (which leads to better eval suites, etc.).
Then there are people who actually do AI engineering. If you're working with local/open-weights models or on the inference side of things, you can't just play around with an API: you have a lot more control over and observability into the model, and you should be making use of it.
I still hold that the best test of an AI Engineer, at any level of the "AI" stack, is how well they understand speculative decoding. It involves understanding quite a bit about how LLMs work and can still be implemented on a cheap laptop.
But that AI engineer who is implementing speculative decoding is still just doing basic plumbing that has little to do with the actual reasoning. Yes, they might make the process faster, but they will know just as little about why/how the reasoning works as they did when they implemented a naive, slow version of inference.
What "actual reasoning" are you referring to? I believe you're making my point for me.
Speculative decoding requires the implementer to understand:
- How the initial prompt is processed by the LLM
- How to retrieve all the probabilities of previously observed tokens in the prompt (this also helps people understand things like the probability of the entire prompt itself, the entropy of the prompt, etc.; see the sketch below)
- Details of how the logits become the distribution over next tokens
- Precise details of the sampling process + the rejection-sampling logic for comparing the two models
- How each step of the LLM is run under the hood as the response is generated.
Hardly just plumbing, especially since, to my knowledge, there are not a lot of hand-holding tutorials on this topic. You need to really internalize what's going on and how this is going to lead to a 2-5x speed up in inference.
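To make the probability/entropy point concrete, here's a rough, framework-agnostic sketch. `logits_for_prompt` is a hypothetical stand-in for a model forward pass that returns one logit vector per prompt position; nothing here is tied to a particular library.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Subtract the max for numerical stability before exponentiating.
    z = (logits - logits.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

def prompt_stats(token_ids, logits_for_prompt):
    """Log-probability of the whole prompt and per-position entropy,
    computed from the model's own next-token distributions."""
    logits = logits_for_prompt(token_ids)   # hypothetical: shape [len(prompt), vocab]
    log_prob = 0.0
    entropies = []
    for i in range(len(token_ids) - 1):
        p = softmax(logits[i])
        log_prob += np.log(p[token_ids[i + 1]])     # prob the model gave the observed next token
        entropies.append(-np.sum(p * np.log(p + 1e-12)))
    return log_prob, entropies
```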
Building all of this yourself gives you a lot of visibility into how the model behaves and how "reasoning" emerges from the sampling process.
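Here's a minimal sketch of the draft-and-verify loop itself, using the standard rejection-sampling rule. It's a toy under stated assumptions: `draft_probs` and `target_probs` are hypothetical callables returning next-token distributions, whereas a real implementation would batch the verification into a single forward pass of the target model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(p):
    return rng.choice(len(p), p=p)

def speculative_step(prompt, draft_probs, target_probs, k=4):
    """Draft k tokens with the small model, verify them against the large
    model, and accept/reject via the standard rejection-sampling rule."""
    # 1. Draft model proposes k tokens autoregressively.
    drafted, q = [], []
    ctx = list(prompt)
    for _ in range(k):
        p_draft = draft_probs(ctx)          # small model's next-token distribution
        t = sample(p_draft)
        drafted.append(t)
        q.append(p_draft)
        ctx.append(t)

    # 2. Target model scores the prompt plus drafted tokens.
    #    (In a real system this is one batched forward pass, not k+1 calls.)
    p = [target_probs(list(prompt) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept drafted token t with probability min(1, p[t] / q[t]).
    accepted = []
    for i, t in enumerate(drafted):
        if rng.random() < min(1.0, p[i][t] / q[i][t]):
            accepted.append(t)
        else:
            # Rejected: resample from the residual distribution max(0, p - q),
            # renormalized, and stop the speculative run here.
            residual = np.maximum(p[i] - q[i], 0)
            accepted.append(sample(residual / residual.sum()))
            return accepted

    # All k drafts accepted: take one bonus token from the target's distribution.
    accepted.append(sample(p[k]))
    return accepted

# Toy demo: 5-token vocabulary, fixed (context-independent) distributions.
toy_draft = lambda ctx: np.full(5, 0.2)
toy_target = lambda ctx: np.array([0.4, 0.3, 0.1, 0.1, 0.1])
print(speculative_step([0, 1], toy_draft, toy_target))
```

The speed-up comes from the target model verifying several drafted tokens for roughly the cost of one of its own decoding steps, while the accept/reject rule keeps the output distribution matching the target model.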
edit: Anyone who can implement speculative decoding also has the ability to inspect the reasoning steps of an LLM and do experiments such as rewinding the thought process of the LLM and substituting a reasoning step to see how it impacts the results. If you're just prompt hacking, you're not going to be able to perform these types of experiments to understand exactly how the model is reasoning and what's important to it.
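As an illustration of that kind of experiment (all names here are hypothetical; `generate` stands in for whatever completion call you have against a local model or an API):

```python
def rewind_and_substitute(prompt, reasoning_steps, i, replacement, generate):
    """Rewind the model's reasoning to step i, substitute a replacement
    step, and let the model continue so you can compare outcomes."""
    kept = reasoning_steps[:i]                       # steps before the edit
    edited_trace = "\n".join(kept + [replacement])   # rewound + substituted trace
    return generate(prompt + "\n" + edited_trace)
```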
But I can make a similar argument about simple multiplication:
- You have to know how the inputs are processed.
- You have to left-shift one of the operands by 0, 1, ..., N-1 bit positions.
- Add those together, depending on the bits in the other operand.
- Use an addition tree to make the whole process faster.
That does not mean that knowing the above process gives you good insight into the concept of A*B and all the related math, and it certainly will not make you better at calculus.
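For what it's worth, the shift-and-add scheme described above is small enough to sketch directly (a toy for non-negative integers, without the addition tree):

```python
def shift_add_multiply(a: int, b: int) -> int:
    """Multiply non-negative integers by adding a << i for every set bit i of b."""
    result, shift = 0, 0
    while b:
        if b & 1:                  # bit i of b is set...
            result += a << shift   # ...so add a shifted left by i
        b >>= 1
        shift += 1
    return result

assert shift_add_multiply(13, 11) == 143
```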
I'm still confused by what you meant by "actual reasoning", which you didn't answer.
I also fail to understand how building what you described would not help your understanding of multiplication; I think it would mean you understand multiplication much better than most people. I would also say that if you want to be a "multiplication engineer" then yes, you should absolutely know how to do what you've described there.
I also suspect you might have lost the main point. The original comment I was replying to stated:
> Don't really see why you'd need to understand how the transformer works to do LLMs at work.
I'm not saying implementing speculative decoding is enough to "fully understand LLMs". I'm saying that if you can't at least implement that, you don't understand enough about LLMs to really get the most out of them. No amount of twiddling around with prompts is going to give you adequate insight into how an LLM works to be able to build good AI tools/solutions.
1) ‘human’ encompasses behaviours that include revenge cannibalism and recurrent sexual violence; wish carefully.
2) not even a little bit, and if you want to pretend then pretend they’re a deranged, delusional psych patient who will look you in the eye and genuinely say “oops, I guess I was lying, it won’t ever happen again” and then lie to you again, while making sure it happens again.
3) don’t anthropomorphize LLMs, they don’t like it.
Not quite. This type of visa helps folks like me live in livable countries with good enough salaries to support our families and keep our elderly from dying back in our home countries.
IMO humans are better at understanding abstract things, taking into account not only technical requirements but also non-technical ones. It would be horrible if I had to argue with an AI about priorities, security concerns, etc.
Too many edge cases. Managers are working with people and we are complicated.
Will the AI manager fight properly for my promotion or my bonus? Will it make sure I have a good work-life balance? Will it recognize when I've done an extraordinary job versus an underperforming one?
I have argued with managers and it's so boring and frustrating. At least with an AI I can start a conversation from scratch, or even practice with a locally running instance.
I think the magic happens in the function "run_windbg_cmd". AFAIK, the agent will use that function to pass along whatever WinDBG command the model thinks will be useful. The implementation is basically the interface between the model and actually calling CDB through CDBSession.
Yeah, that seems correct. It's like creating an SQLite MCP server with a single tool, "run_sql". Which is just fine, I guess, as long as the LLM knows how to write SQL (or WinDBG commands), and they definitely do know that. I'd even say this is better, because it shifts the capability to the LLM instead of the MCP server.
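For illustration, a single-tool SQLite MCP server along those lines might look roughly like this. It assumes the official `mcp` Python SDK's FastMCP helper; the server name, tool docstring, and database path are just placeholders.

```python
import sqlite3

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sqlite-demo")

@mcp.tool()
def run_sql(query: str) -> str:
    """Execute an arbitrary SQL statement against a local SQLite database
    and return the rows (or an error message) as text."""
    conn = sqlite3.connect("demo.db")  # illustrative path
    try:
        rows = conn.execute(query).fetchall()
        conn.commit()
        return "\n".join(str(r) for r in rows) or "OK"
    except sqlite3.Error as e:
        return f"SQL error: {e}"
    finally:
        conn.close()

if __name__ == "__main__":
    mcp.run()
```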