Automatic Differentiation — Autograd

IV. Automatic Differentiation — Autograd
1. Tensor.requires_grad
Marks whether a Tensor needs gradient computation. It is the entry switch of the autograd system.
```python
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2 + 3 * x
y.backward()
print(x.grad)  # dy/dx = 2x + 3 = 7
```

Note: Only leaf nodes created by the user can have requires_grad set directly. Intermediate nodes propagate it automatically.

2. Tensor.backward()
Triggers backpropagation from a scalar (or with an explicit gradient tensor argument), computing gradients for all leaf nodes.
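Because gradients accumulate across calls, a second backward() adds to .grad rather than overwriting it. A minimal sketch of that behavior:

```python
import torch

# Illustrative sketch: .grad accumulates across backward() calls
# until it is explicitly cleared.
x = torch.tensor([1.0, 2.0], requires_grad=True)

(x * 2).sum().backward()
print(x.grad)             # tensor([2., 2.])

(x * 2).sum().backward()  # second call ADDS to the existing .grad
print(x.grad)             # tensor([4., 4.])

x.grad.zero_()            # what optimizer.zero_grad() does per parameter
print(x.grad)             # tensor([0., 0.])
```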
```python
x = torch.tensor([1., 2., 3.], requires_grad=True)
y = (x * 2).sum()
y.backward()
print(x.grad)  # tensor([2., 2., 2.])
```

Note: Gradients accumulate by default. Call optimizer.zero_grad() before each iteration.

3. torch.no_grad()
Context manager that disables gradient computation, saving memory and speeding up inference / evaluation.
```python
model.eval()
with torch.no_grad():
    output = model(x)
    loss = criterion(output, labels)

# Also usable as a decorator
@torch.no_grad()
def predict(x):
    return model(x)
```

Note: Always enable this during inference, otherwise inference is slow and VRAM usage is high.
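A quick sketch of what the context manager actually does: tensors produced inside it carry no grad_fn, so no graph is built for them.

```python
import torch

# Sketch: inside no_grad, new tensors are cut off from the graph,
# so no backward pass through them is possible.
x = torch.tensor([1.0], requires_grad=True)

with torch.no_grad():
    y = x * 2

print(y.requires_grad)  # False - y has no grad_fn
print(x.requires_grad)  # True  - the leaf itself is untouched
```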
4. Tensor.detach()
Returns a new Tensor disconnected from the computation graph; it shares the same data but does not propagate gradients.
```python
x = torch.tensor([1., 2.], requires_grad=True)
y = x * 3
z = y.detach()            # no gradient tracking
arr = y.detach().numpy()  # must detach before .numpy()
```

Note: In GAN training, block gradients from flowing back into the Generator by calling fake_img.detach() before passing it to the Discriminator.

5. torch.autograd.grad()
Explicitly computes gradients of outputs w.r.t. inputs. Supports higher-order gradients, e.g. for Hessians.
```python
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
dy_dx, = torch.autograd.grad(y, x, create_graph=True)  # 1st order
d2y, = torch.autograd.grad(dy_dx, x)                   # 2nd order
```

Note: Core API for MAML (Model-Agnostic Meta-Learning) and Physics-Informed Neural Networks (PINNs).
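To see why create_graph matters for MAML, here is a hedged toy sketch (the loss and learning rate are illustrative, not from any real MAML setup): it differentiates through one gradient-descent step, which only works because the inner gradient stays in the graph.

```python
import torch

# Hedged MAML-style sketch: differentiate THROUGH an inner update.
w = torch.tensor(1.0, requires_grad=True)    # meta-parameter

inner_loss = (w * 3 - 6) ** 2                # toy task loss
g, = torch.autograd.grad(inner_loss, w, create_graph=True)

w_adapted = w - 0.1 * g                      # one inner SGD step
outer_loss = (w_adapted * 3 - 6) ** 2        # evaluate adapted parameter

outer_loss.backward()                        # flows through the inner step
print(w.grad)                                # gradient w.r.t. the ORIGINAL w
```

Without create_graph=True, g would be detached and outer_loss.backward() could not reach w.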
6. Tensor.grad / Tensor.grad_fn
grad: stores the accumulated gradient. grad_fn: points to the backward function that created this Tensor.

```python
x = torch.tensor([1., 2.], requires_grad=True)
y = x * x
print(y.grad_fn)  # <MulBackward0 ...>
y.sum().backward()
print(x.grad)     # tensor([2., 4.])
```

Note: grad_fn=None indicates a leaf node. Check with .is_leaf.

7. torch.enable_grad()
Re-enables gradient tracking inside a no_grad context, enabling fine-grained control.

```python
with torch.no_grad():
    x = model.encode(data)
    with torch.enable_grad():
        x.requires_grad_(True)
        loss = head(x)  # only this part is tracked
```

Note: Useful for partial-freeze training, e.g., fine-tuning only the last layer.
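The nesting behavior can be sketched with torch.is_grad_enabled(), which reports the currently active mode:

```python
import torch

# Sketch: enable_grad re-opens tracking only inside its own scope.
with torch.no_grad():
    print(torch.is_grad_enabled())      # False
    with torch.enable_grad():
        print(torch.is_grad_enabled())  # True
    print(torch.is_grad_enabled())      # False again
```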
8. register_hook()
Registers a hook function on a Tensor's backward pass, enabling inspection or modification of intermediate gradients.
```python
grads = []

def save_grad(g):
    grads.append(g.clone())

x = torch.rand(3, requires_grad=True)
y = (x ** 2).sum()
x.register_hook(save_grad)
y.backward()
print(grads[0])
```

Note: Invaluable for debugging gradient vanishing/explosion and for implementing gradient penalties such as WGAN-GP.
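Hooks can also modify gradients: if the hook returns a tensor, that tensor replaces the incoming gradient. A small sketch with an illustrative 0.5 damping factor:

```python
import torch

# Sketch: a hook whose return value REPLACES the incoming gradient,
# here scaling it by 0.5 (a crude form of gradient damping).
x = torch.tensor([1.0, 2.0], requires_grad=True)
x.register_hook(lambda g: g * 0.5)

(x * 4).sum().backward()
print(x.grad)  # tensor([2., 2.]) instead of the unscaled tensor([4., 4.])
```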
💡 One-line Takeaway
Always pair backward() with zero_grad(), wrap inference in no_grad(), and use detach() to stop gradients from crossing module boundaries.