Batmobile: 10-20x Faster CUDA Kernels for Equivariant Graph Neural Networks
Custom CUDA kernels that eliminate computational bottlenecks in spherical harmonics and tensor product operations, the core primitives of equivariant GNNs such as MACE, NequIP, and Allegro.
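
For readers new to these two primitives, here is a minimal pure-Python sketch of what they compute at the lowest orders. It is not Batmobile's code or API. The function names (`sh_l0_l1`, `tp_l1_l1`, `rotate_z`) are hypothetical, the (y, z, x) component ordering for l=1 is one common convention and is assumed here, and the tensor product is shown only up to normalization.

```python
import math

def sh_l0_l1(v):
    """Real spherical harmonics of the direction of v for l=0 and l=1.

    Assumes the orthonormal convention on the unit sphere and the
    m = -1, 0, 1 ordering, which corresponds to (y, z, x).
    """
    x, y, z = v
    r = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / r, y / r, z / r
    c0 = 0.5 * math.sqrt(1.0 / math.pi)          # Y_0^0 constant
    c1 = math.sqrt(3.0 / (4.0 * math.pi))        # shared l=1 prefactor
    return [c0], [c1 * y, c1 * z, c1 * x]

def tp_l1_l1(a, b):
    """Couple two l=1 features given in Cartesian (x, y, z) order.

    Up to normalization, the l=0 output is the dot product and the
    l=1 output is the cross product.
    """
    scalar = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    vector = [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
    return scalar, vector

def rotate_z(v, angle):
    """Rotate a Cartesian vector about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]]

if __name__ == "__main__":
    a, b = (1.0, 2.0, 3.0), (-0.5, 0.4, 2.0)
    s, v = tp_l1_l1(a, b)
    s_rot, v_rot = tp_l1_l1(rotate_z(a, 0.7), rotate_z(b, 0.7))
    # Equivariance: the scalar is unchanged, the vector rotates with the inputs.
    print(abs(s - s_rot) < 1e-9)
    print(all(abs(p - q) < 1e-9 for p, q in zip(rotate_z(v, 0.7), v_rot)))
```

The check at the end illustrates the property that makes these operations "equivariant": rotating the inputs and then applying the operation gives the same result as applying the operation and then rotating the output. Real models evaluate higher orders and many channels per edge, which is where the cost that custom kernels target comes from.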