Driftin: Single-Step Image Generation at 306 FPS
What if you could generate images in one forward pass instead of fifty? Same UNet, same parameters, 57x faster. Trained on 8x H100s, benchmarked on a 3090.
The battery simulation community is stuck on CPU. Naive GPU ports are actually slower. Hand-tuned CUDA kernels achieve 89% of RTX 3090 peak bandwidth.
A custom CUDA megakernel for Qwen3-0.6B that fuses RMSNorm, QKV projection, RoPE, attention, and MLP into a single kernel launch - achieving 527 tok/s decode on RTX 3090.
Custom CUDA kernels that eliminate computational bottlenecks in spherical harmonics and tensor product operations - the core primitives of equivariant GNNs like MACE, NequIP, and Allegro.
How discovering the original KernelBench was exploitable led to building a focused, cost-effective benchmark for evaluating LLM kernel engineering on modern architectures.
Compressing Qwen3-30B-A3B from 6,144 to 1,698 experts while retaining 91.5% HumanEval performance - fitting a frontier-class MoE model into 18GB of VRAM.
Reproducing "Attention Is Not What You Need" (arXiv 2512.19428) reveals a 22.6% performance gap vs the claimed 10-15%. Includes custom CUDA kernels with 2x inference speedup.
AI is consuming energy at a rate that Earth's grids can barely sustain. I spent several days modeling a 100-megawatt orbital compute cluster with Gemini to design a rig that lives in vacuum.
A deep dive into MiniMax M2.1, the 230B parameter sparse MoE model that activates only 10B parameters per token while achieving SOTA performance at 10% of Claude Sonnet's cost.
A comprehensive technical analysis of GLM-4.7, the 358B parameter Mixture-of-Experts model pushing the boundaries of coding, reasoning, and agentic AI capabilities.
A down-to-earth answer drawing on my experience with CUDA: where it has and hasn't brought me success, where the ecosystem is going, and how to play strategically around that.