The Physics of LLM Inference
Build your own LLM serving engine from scratch. Covers hardware-level optimization, memory management, custom CUDA and Triton kernel development, and throughput maximization. 113 pages + full code repository.
$5+
Implement reinforcement learning techniques for building reasoning capabilities into language models on a single GPU. Covers policy gradients, GRPO, think tokens, and memory-efficient training. 60% code, 40% prose.
$5
Learn GPT-2 pre-training on a single GPU. Covers tokenization, embeddings, attention mechanisms, transformer architecture, training loops, and optimization. Based on nanoGPT - minimal code, maximum understanding.
$5+