| | Digging into the LLM-as-a-Judge Results (gilesthomas.com) |
| 1 point by ibobev 7 hours ago | past | discuss |
|
| | Digging into the LLM-as-a-Judge Results (gilesthomas.com) |
| 1 point by ibobev 1 day ago | past | discuss |
|
| | Writing an LLM from scratch, part 30 – digging into the LLM-as-a-judge results (gilesthomas.com) |
| 1 point by gpjt 1 day ago | past | discuss |
|
| | Using DistributedDataParallel to train a base model from scratch in the cloud (gilesthomas.com) |
| 2 points by ibobev 2 days ago | past | discuss |
|
| | LLM from scratch, part 29 – using DDP to train a base model in the cloud (gilesthomas.com) |
| 2 points by gpjt 2 days ago | past | discuss |
|
| | LLM from scratch, part 28 – training a base model from scratch on an RTX 3090 (gilesthomas.com) |
| 540 points by gpjt 38 days ago | past | 121 comments |
|
| | Why smart instruction-following makes prompt injection easier (gilesthomas.com) |
| 2 points by ibobev 58 days ago | past |
|
| | Writing an LLM from scratch, part 27 – what's left, and what's next? (gilesthomas.com) |
| 1 point by gpjt 67 days ago | past |
|
| | Writing an LLM from scratch, part 26 – evaluating the fine-tuned model (gilesthomas.com) |
| 4 points by gpjt 67 days ago | past |
|
| | Writing an LLM from scratch, part 25 – instruction fine-tuning (gilesthomas.com) |
| 2 points by gpjt 72 days ago | past |
|
| | Writing an LLM from scratch, part 24 – the transcript hack (gilesthomas.com) |
| 1 point by gpjt 73 days ago | past |
|
| | Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com) |
| 1 point by ibobev 74 days ago | past |
|
| | Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com) |
| 1 point by ibobev 75 days ago | past |
|
| | Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com) |
| 3 points by gpjt 77 days ago | past |
|
| | Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com) |
| 1 point by gpjt 79 days ago | past |
|
| | Writing an LLM from scratch, part 22 – training our LLM (gilesthomas.com) |
| 254 points by gpjt 86 days ago | past | 10 comments |
|
| | Revisiting Karpathy's 'The Unreasonable Effectiveness of RNNs' (gilesthomas.com) |
| 1 point by ibobev 89 days ago | past |
|
| | Revisiting Karpathy's 'Unreasonable Effectiveness of Recurrent Neural Networks' (gilesthomas.com) |
| 2 points by gpjt 3 months ago | past |
|
| | Writing an LLM from scratch, part 21 – perplexed by perplexity (gilesthomas.com) |
| 1 point by ibobev 3 months ago | past |
|
| | Writing an LLM from scratch, part 21 – perplexed by perplexity (gilesthomas.com) |
| 1 point by gpjt 3 months ago | past |
|
| | Writing an LLM from scratch, part 20 – starting training, and cross entropy loss (gilesthomas.com) |
| 41 points by gpjt 3 months ago | past | 3 comments |
|
| | How Do LLMs Work? (gilesthomas.com) |
| 2 points by gpjt 3 months ago | past | 1 comment |
|
| | How Do LLMs Work? (gilesthomas.com) |
| 1 point by ibobev 3 months ago | past |
|
| | The maths you need to start understanding LLMs (gilesthomas.com) |
| 616 points by gpjt 4 months ago | past | 120 comments |
|
| | What AI chatbots are doing under the hood (gilesthomas.com) |
| 2 points by gpjt 4 months ago | past |
|
| | LLM from scratch, part 18 – residuals, shortcut connections, and the Talmud (gilesthomas.com) |
| 2 points by gpjt 4 months ago | past |
|
| | The fixed length bottleneck and the feed forward network (gilesthomas.com) |
| 1 point by gpjt 4 months ago | past |
|
| | Writing an LLM from scratch, part 17 – the feed-forward network (gilesthomas.com) |
| 8 points by gpjt 5 months ago | past |
|
| | Writing an LLM from scratch, part 16 – layer normalisation (gilesthomas.com) |
| 1 point by gpjt 6 months ago | past |
|
| | Leaving PythonAnywhere (gilesthomas.com) |
| 3 points by gpjt 7 months ago | past |
|