Books

The Physics of LLM Inference

Build your own LLM serving engine from scratch. Covers hardware-level optimization, memory management, custom CUDA and Triton kernel development, and throughput maximization. 113 pages + full code repository.

$5+

The RL Post-Training Handbook

Implement reinforcement learning techniques on a single GPU to build reasoning capabilities into language models. Covers policy gradients, GRPO, think tokens, and memory-efficient training. 60% code, 40% prose.

LLM Pre-Training for Dummies

Learn GPT-2 pre-training on a single GPU. Covers tokenization, embeddings, attention mechanisms, transformer architecture, training loops, and optimization. Based on nanoGPT: minimal code, maximum understanding.

$5+