
## IV. Automatic Differentiation — Autograd

### 1. Tensor.requires_grad

Marks whether gradient computation is needed for this tensor — the entry switch of the autograd system.

```python
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2 + 3 * x
y.backward()
print(x.grad)  # dy/dx = 2x + 3 = 7
```

Note: only leaf nodes created by the user can set `requires_grad` directly; intermediate nodes inherit it automatically.
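The leaf/intermediate distinction in the note can be checked directly with `.is_leaf` — a minimal sketch:

```python
import torch

# requires_grad propagates from leaves to every downstream tensor.
x = torch.tensor([2.0], requires_grad=True)  # user-created leaf
y = x ** 2 + 3 * x                           # intermediate node

print(x.is_leaf, x.requires_grad)  # True True
print(y.is_leaf, y.requires_grad)  # False True — inherited, not set by us
```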

### 2. Tensor.backward()

Triggers backpropagation from a scalar (or from a non-scalar when a `gradient` tensor argument is supplied), computing gradients for all leaf nodes.

```python
x = torch.tensor([1., 2., 3.], requires_grad=True)
y = (x * 2).sum()
y.backward()
print(x.grad)  # tensor([2., 2., 2.])
```

Note: gradients accumulate by default — call `optimizer.zero_grad()` before each iteration.
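The accumulation behavior in the note is easy to observe — a minimal sketch, zeroing by hand since no optimizer is involved:

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)

# First backward pass: grad = 2 for each element.
(x * 2).sum().backward()
print(x.grad)  # tensor([2., 2., 2.])

# Second backward WITHOUT zeroing: gradients accumulate.
(x * 2).sum().backward()
print(x.grad)  # tensor([4., 4., 4.])

# Reset manually (optimizer.zero_grad() does this for its parameters).
x.grad.zero_()
print(x.grad)  # tensor([0., 0., 0.])
```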

### 3. torch.no_grad()

Context manager that disables gradient computation, saving memory and speeding up inference and evaluation.

```python
model.eval()
with torch.no_grad():
    output = model(x)
    loss = criterion(output, labels)

# Also usable as a decorator
@torch.no_grad()
def predict(x):
    return model(x)
```

Note: always enable this during inference; otherwise inference is slower and VRAM usage is higher.
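The effect is visible on the tensors themselves: outputs produced under `no_grad` carry no graph. A minimal sketch using a bare tensor instead of a model:

```python
import torch

w = torch.randn(3, requires_grad=True)

y = (w * 2).sum()
print(y.requires_grad)  # True — graph is built as usual

with torch.no_grad():
    z = (w * 2).sum()
print(z.requires_grad)  # False — no graph, nothing to backpropagate
```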

### 4. Tensor.detach()

Returns a new tensor disconnected from the computation graph: it shares the same data but does not propagate gradients.

```python
x = torch.tensor([1., 2.], requires_grad=True)
y = x * 3
z = y.detach()            # no gradient tracking
arr = y.detach().numpy()  # must detach before calling .numpy()
```

Note: in GAN training, call `fake_img.detach()` before passing it to the discriminator, so that the discriminator's update does not backpropagate into the generator.
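"Sharing data" is literal — the detached tensor points at the same storage. A minimal sketch checking this with `data_ptr()`:

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)
y = x * 3
z = y.detach()

print(z.requires_grad)               # False — cut from the graph
print(z.data_ptr() == y.data_ptr())  # True — same underlying storage
```

Because memory is shared, in-place edits to the detached tensor are visible through the original, which can corrupt a pending backward pass.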

### 5. torch.autograd.grad()

Explicitly computes gradients of outputs with respect to inputs. With `create_graph=True`, it supports higher-order gradients such as Hessians.

```python
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
dy_dx, = torch.autograd.grad(y, x, create_graph=True)  # 1st order: 3x^2 = 12
d2y_dx2, = torch.autograd.grad(dy_dx, x)               # 2nd order: 6x = 12
```

Note: a core API for MAML (Model-Agnostic Meta-Learning) and physics-informed neural networks (PINNs).
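Because `create_graph=True` keeps the returned gradient inside the graph, the gradient itself can appear in a loss — the pattern behind gradient penalties. A minimal sketch (a toy scalar objective, not any specific paper's formulation):

```python
import torch

x = torch.tensor([1.5], requires_grad=True)
y = (x ** 2).sum()

# g = 2x, and it stays differentiable because create_graph=True.
g, = torch.autograd.grad(y, x, create_graph=True)
penalty = (g.norm() - 1.0) ** 2   # a loss built from a gradient
penalty.backward()

# d/dx (2x - 1)^2 = 8x - 4 = 8 at x = 1.5
print(x.grad)  # tensor([8.])
```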

### 6. Tensor.grad / Tensor.grad_fn

`grad` stores the accumulated gradient; `grad_fn` points to the backward function that created the tensor.

```python
x = torch.tensor([1., 2.], requires_grad=True)
y = x * x
print(y.grad_fn)  # <MulBackward0 ...>
y.sum().backward()
print(x.grad)     # tensor([2., 4.])
```

Note: `grad_fn=None` indicates a leaf node; check with `.is_leaf`.
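By default only leaf tensors keep their `.grad` after backward; to inspect an intermediate gradient, call `retain_grad()` first. A minimal sketch:

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)
y = x * x
y.retain_grad()        # ask autograd to keep grad on this non-leaf
y.sum().backward()

print(x.is_leaf, y.is_leaf)  # True False
print(y.grad)                # tensor([1., 1.]) — d(sum)/dy
print(x.grad)                # tensor([2., 4.])
```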

### 7. torch.enable_grad()

Re-enables gradient tracking inside a `no_grad` context, allowing fine-grained control.

```python
with torch.no_grad():
    x = model.encode(data)
    with torch.enable_grad():
        x.requires_grad_(True)
        loss = head(x)  # only this part is tracked
```

Note: useful for partial-freeze training, e.g. fine-tuning only the last layer.
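The nesting behavior can be verified without a model — a minimal sketch with bare tensors:

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

with torch.no_grad():
    a = x * 2                # not tracked: no_grad is active
    with torch.enable_grad():
        b = x * 2            # tracked again inside the inner context

print(a.requires_grad, b.requires_grad)  # False True
```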

### 8. register_hook()

Registers a hook on a tensor's backward pass, enabling inspection or modification of intermediate gradients.

```python
grads = []
def save_grad(g):
    grads.append(g.clone())

x = torch.rand(3, requires_grad=True)
y = (x ** 2).sum()
x.register_hook(save_grad)
y.backward()
print(grads[0])
```

Note: invaluable for debugging vanishing/exploding gradients and for implementing gradient penalties such as WGAN-GP.
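A hook can also *replace* the gradient by returning a new tensor. A minimal sketch using a toy per-tensor clamp (an illustration, not a recommended clipping strategy):

```python
import torch

x = torch.tensor([10., -10.], requires_grad=True)
x.register_hook(lambda g: g.clamp(-1.0, 1.0))  # returned tensor replaces the grad

y = (x ** 2).sum()  # dy/dx = 2x = [20., -20.] before the hook runs
y.backward()
print(x.grad)       # tensor([ 1., -1.]) — clamped by the hook
```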
💡 One-line Takeaway
Always pair backward() with zero_grad(), wrap inference in no_grad(), and use detach() to stop gradients from crossing module boundaries.

Automatic Differentiation — Autograd
https://lxy-alexander.github.io/blog/posts/pytorch/api/04automatic-differentiation--autograd/
Author: Alexander Lee
Published: 2026-03-12
License: CC BY-NC-SA 4.0