Should I Learn CUDA?
A down-to-earth answer based on my experience
"Should I learn CUDA?" is a question I, everyone and their mother is faced with today (yes, even me). Here's my most down to earth answer which considers my experience, and what it has vs has NOT brought me success in. I'll also talk about where the ecosystem is going and how to play strategically around that.
I had just come through the PyTorch and nanoGPT phase of my journey and got pumped up when Karpathy released llm.c. It looked cool (and fast), and to a complete beginner (me) it was absurdly complex. I just wanted to understand a little bit more, and I quickly realized I would have to rewire my brain (again). I decided to document my learnings on what a kernel was after prompting GPT-4 about how the whole repo was structured. I ended up thinking the same thing I'd thought about the previous course I built on LLMs (published before my CUDA course): this stuff wasn't easy for me, and it won't be easy for anyone else. So I kept going, prompting my way through every detail and piece of text in the kernels I saw, and watching every video explaining kernels I could find. Eventually I figured out how GPU programming is done: initialize data on the CPU, allocate memory on the GPU and copy the data over, define kernel launch params like grid/block/thread dimensions, launch the kernel on the GPU, copy the results back to the CPU, then print and visualize.
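To make that loop concrete, here's a minimal sketch of the pattern (a toy vector add I wrote for illustration, not code from llm.c):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one element: the classic "hello world" of GPU programming.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against running past the array
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // 1. Init data on the CPU.
    float *h_a = (float*)malloc(bytes), *h_b = (float*)malloc(bytes), *h_c = (float*)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // 2. Allocate on the GPU and copy the inputs over.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // 3. Define launch params (grid/blocks/threads) and launch the kernel.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // 4. Copy the result back to the CPU and look at it.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);  // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Compile it with nvcc and run it; whether the kernel is a toy add or something far more optimized, the surrounding structure is the same allocate/copy/launch/copy-back dance.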
I want to quickly remind you that this was all curiosity: eventually seeing that this stuff isn't taught well enough and wanting to do it myself. This was never about getting a job (even though I expected some offers to emerge as a result of making a free course on it). There are many rabbit holes to go down, and if you have the time to spare, as well as curiosity and fire in you, I fully encourage you to go all in.
If you made it this far, ask yourself the following:
- Am I currently in university or college? How much do I care about grades?
- Am I comfortable with one of PyTorch or JAX?
- Am I just in this for the money?
- Am I looking to get a job in the field as quickly as possible?
- Do I simply care about having an impact on the world, potentially at a frontier lab?
- Am I (be honest with yourself) just utterly lost and need something to learn?
- Am I just seeing that CUDA is a cool buzzword people are posting about, and I want a part of it?
- Am I simply curious and CANNOT help myself because this shit is so cool? (the answer here is easy for you, but with some nuance)
These questions are designed to give you some clarity if you can truly reflect on each of them. Getting back to it: in December 2025 (or 2026 if you're reading this later), the ecosystem is evolving so rapidly that it feels like you can't keep up, even when learning at full speed. The good news is that some concepts matter far more than others. Understanding how a server/PC is built is an important skill that I think is very fun (but potentially expensive) to develop. If you stick to the software realm only, knowing basic terms like RAM, VRAM, and CPU vs GPU is essential. Going a level deeper, knowing what the computations look like for a neural net (a CNN or a transformer) will serve you very well and is one of the most magical parts of the learning journey.

When you get to how those computations are optimized on specific hardware like a Hopper or Blackwell GPU, it can get a bit scary. There's a lot of material to cover, and you may not know whether it will remain relevant. The most concrete example I can give: if and when you decide to pick up CUDA or GPU programming, you'll likely write a kernel in a .cu file with __global__ at the start. This is not how modern kernel writing is done anymore (for the most part). The deep learning kernels that matter are already heavily optimized, and using RL to train LLMs to speed them up even further is an active, promising area of research. We also have abstractions like Triton, but you'll still need to know CUDA moderately well to get the best use out of it, since Triton is tiled GPU programming that simplifies the workflow for someone coming from CUDA. NVIDIA also has CUTLASS, CuTe, cuda-tile, CuTe-DSL, and many other open source repos coming out that simplify the kernel writing process further (CuTe-DSL being what Flash Attention 4 and the fastest MoE implementation, Sonic MoE, are written in).
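Since I mentioned knowing what the computations look like: most of a transformer's or CNN's compute boils down to matrix multiplies, and the "classic" way to write one is exactly that .cu-file-with-__global__ style. Here's a deliberately naive sketch (my own toy code, nowhere near what CUTLASS, Triton, or Flash Attention actually produce) just to show the baseline those abstractions are optimizing away:

```cuda
// Naive matmul: C = A * B, where A is MxK, B is KxN, C is MxN.
// One thread computes one output element. No tiling, no shared memory,
// no tensor cores -- this is the baseline the fancy libraries beat by orders of magnitude.
__global__ void naive_matmul(const float* A, const float* B, float* C,
                             int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; k++) {
            acc += A[row * K + k] * B[k * N + col];  // every thread re-reads rows/cols from global memory
        }
        C[row * N + col] = acc;
    }
}

// Launched with something like:
//   dim3 threads(16, 16);
//   dim3 blocks((N + 15) / 16, (M + 15) / 16);
//   naive_matmul<<<blocks, threads>>>(d_A, d_B, d_C, M, N, K);
```

Roughly speaking, everything the optimized libraries do (tiling into shared memory, tensor cores, pipelining) is about beating this redundant-global-memory loop, which is why modern kernel code looks so different from this.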
To answer the question "Should I learn CUDA?", many of our abstractions today rely on first principles which emerge from CUDA originally. There's simply no shortcutting it. If you are committing to kernels, you go all the way. It's fine to dabble and explore around the corners a bit to know how deep you actually want to get, but making the fastest deep learning kernels faster is ambitious and unrealistic given that this process will likely be fully automated in a year or so. Knowing how to use tools to generate the fastest kernels is a great skill, but optimizing them yourself may not be the best use of your time unless this is your true destiny in some way (IDK who decides that lol).
I know you didn't ask, but I should mention here that I'm writing a CUDA textbook specifically for deep learning. I chose to write it to give people a bigger piece that ties together the essentials of the low-level stack. I don't want to spoil it, but it doesn't go into Triton or any fancy stuff at all. It's all essentials that aren't going anywhere for a while, and that are arguably needed even if you aren't working specifically on kernels all the time. There is still a point in learning some of these skills, but just enough that you can make the existing tools work for you. Experts built those tools to solve their own pain points, knowing they would help other engineers who stumbled onto them.
When you have a minute, spin up a new LLM conversation and get it to help you reach a personalized consensus on whether you should learn CUDA or not.