gpjt's submissions | Hacker News

1.		Writing an LLM from scratch, part 32d – Interventions: adding attention bias (gilesthomas.com)
		4 points by gpjt 14 hours ago \| past \| discuss
2.		Writing an LLM from scratch, part 32c – Interventions: removing dropout (gilesthomas.com)
		1 point by gpjt 1 day ago \| past \| discuss
3.		Writing an LLM from scratch, part 32b – Interventions: gradient clipping (gilesthomas.com)
		2 points by gpjt 2 days ago \| past \| discuss
4.		Writing an LLM from scratch, part 32a – Interventions: training a baseline model (gilesthomas.com)
		1 point by gpjt 3 days ago \| past \| discuss
5.		Getting a Custom PyTorch LLM onto the Hugging Face Hub (gilesthomas.com)
		1 point by gpjt 9 days ago \| past \| discuss
6.		Writing an LLM from scratch, part 31 – the models are now on Hugging Face (gilesthomas.com)
		2 points by gpjt 20 days ago \| past
7.		Writing an LLM from scratch, part 30 – digging into the LLM-as-a-judge results (gilesthomas.com)
		1 point by gpjt 29 days ago \| past
8.		LLM from scratch, part 29 – using DDP to train a base model in the cloud (gilesthomas.com)
		2 points by gpjt 30 days ago \| past
9.		LLM from scratch, part 28 – training a base model from scratch on an RTX 3090 (gilesthomas.com)
		540 points by gpjt 66 days ago \| past \| 121 comments
10.		Writing an LLM from scratch, part 27 – what's left, and what's next? (gilesthomas.com)
		1 point by gpjt 3 months ago \| past
11.		Writing an LLM from scratch, part 26 – evaluating the fine-tuned model (gilesthomas.com)
		4 points by gpjt 3 months ago \| past
12.		Writing an LLM from scratch, part 25 – instruction fine-tuning (gilesthomas.com)
		2 points by gpjt 3 months ago \| past
13.		Writing an LLM from scratch, part 24 – the transcript hack (gilesthomas.com)
		1 point by gpjt 3 months ago \| past
14.		Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com)
		3 points by gpjt 3 months ago \| past
15.		Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com)
		1 point by gpjt 3 months ago \| past
16.		Writing an LLM from scratch, part 22 – training our LLM (gilesthomas.com)
		254 points by gpjt 3 months ago \| past \| 10 comments
17.		Revisiting Karpathy's 'Unreasonable Effectiveness of Recurrent Neural Networks' (gilesthomas.com)
		2 points by gpjt 3 months ago \| past
18.		Writing an LLM from scratch, part 21 – perplexed by perplexity (gilesthomas.com)
		1 point by gpjt 4 months ago \| past
19.		Writing an LLM from scratch, part 20 – starting training, and cross entropy loss (gilesthomas.com)
		41 points by gpjt 4 months ago \| past \| 3 comments
20.		How Do LLMs Work? (gilesthomas.com)
		2 points by gpjt 4 months ago \| past \| 1 comment
21.		The maths you need to start understanding LLMs (gilesthomas.com)
		616 points by gpjt 5 months ago \| past \| 120 comments
22.		What AI chatbots are doing under the hood (gilesthomas.com)
		2 points by gpjt 5 months ago \| past
23.		LLM from scratch, part 18 – residuals, shortcut connections, and the Talmud (gilesthomas.com)
		2 points by gpjt 5 months ago \| past
24.		The fixed length bottleneck and the feed forward network (gilesthomas.com)
		1 point by gpjt 5 months ago \| past
25.		Writing an LLM from scratch, part 17 – the feed-forward network (gilesthomas.com)
		8 points by gpjt 5 months ago \| past
26.		Writing an LLM from scratch, part 16 – layer normalisation (gilesthomas.com)
		1 point by gpjt 7 months ago \| past
27.		Leaving PythonAnywhere (gilesthomas.com)
		3 points by gpjt 8 months ago \| past
28.		Writing an LLM from scratch, part 15 – from context vectors to logits (gilesthomas.com)
		7 points by gpjt 8 months ago \| past
29.		Writing an LLM from scratch, part 14 – the complexity of self-attention at scale (gilesthomas.com)
		1 point by gpjt 8 months ago \| past
30.		Writing an LLM from scratch, part 13 – attention heads are dumb (gilesthomas.com)
		351 points by gpjt 9 months ago \| past \| 67 comments