Inside NVIDIA GPUs: Anatomy of high performance matmul kernels
2025-09-29
From GPU architecture and PTX/SASS to warp-tiling and deep asynchronous tensor core pipelines.
2025-08-29
From paged attention, continuous batching, prefix caching, specdec, etc. to multi-GPU, multi-node dynamic serving at scale.
2025-06-30
Humans in the post-ASI world.
2023-07-18
How does Flash Attention really work?
2021-10-05
How I landed a job at DeepMind as a research engineer without an ML degree.
2021-05-23
How I got started with reinforcement learning.
2021-02-07
How I got started with geometric/graph machine learning.
2021-02-07
How I got started with transformers.
2020-06-17
Breaking down ideas from Turing's seminal paper - part 2.
2020-06-17
Breaking down ideas from Turing's seminal paper - part 1.
2019-09-08
Learnings from a Coursera course.
2019-02-10
My journey into machine learning.