Yi Wang

Yi Wang’s Page

Compare Deep Learning Toolkits
Roofline Analysis of Apple Silicon GPUs using MLX
Activations Are Critical for Deep Learning: Train an MLP Using Your Brain to Address the XOR Classification Challenge
Why Transformer Models Need Positional Encoding
Efficient Implementation of Rotary Positional Embedding
FlashAttention (Part 1): Tiled Matrix Multiplication
FlashAttention (Part 2): Online Softmax
FlashAttention (Part 3)
Automatic Differentiation (Part 1): The Jacobian for JVP and VJP
Automatic Differentiation (Part 2): Build a Simple Deep Learning Toolkit with VJP
Evaluate Hessian-Vector Product Without The Hessian Matrix
Decipher JAX’s Tracing and JIT Compilation
A Minimalist HTTP Server Using asyncio
Unit Test Library Or Framework?
JAX’s vmap: The Fundamentals
JAX’s vmap: An Application of Speculative Decoding
Curried Functions and JAX Type Annotation
Trio: Simplifying Python Coroutines with Go’s Goroutine Style
Using Conv2D for Linear Projection on Apple Neural Engine
JAX’s Device Mesh and Tensor Sharding
The Right Way to Configure LLM Training