
## IV. Automatic Differentiation — Autograd

### 1. Tensor.requires_grad

Marks whether gradient computation is needed for this tensor — the entry switch of the autograd system.

```python
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2 + 3 * x
y.backward()
print(x.grad)  # dy/dx = 2x + 3 = 7
```

Note: only leaf nodes created by the user can set `requires_grad` directly; intermediate nodes inherit it automatically.
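The leaf/intermediate distinction in the note can be checked directly with `.is_leaf` — a minimal sketch:

```python
import torch

# requires_grad propagates from leaves to every downstream tensor.
x = torch.tensor([2.0], requires_grad=True)  # user-created leaf
y = x ** 2 + 3 * x                           # intermediate node

print(x.is_leaf, x.requires_grad)  # True True
print(y.is_leaf, y.requires_grad)  # False True — inherited, not set by us
```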

### 2. Tensor.backward()

Triggers backpropagation from a scalar (or from a non-scalar when a `gradient` tensor argument is supplied), computing gradients for all leaf nodes.

```python
x = torch.tensor([1., 2., 3.], requires_grad=True)
y = (x * 2).sum()
y.backward()
print(x.grad)  # tensor([2., 2., 2.])
```

Note: gradients accumulate by default — call `optimizer.zero_grad()` before each iteration.
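The accumulation behavior in the note is easy to observe — a minimal sketch, zeroing by hand since no optimizer is involved:

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)

# First backward pass: grad = 2 for each element.
(x * 2).sum().backward()
print(x.grad)  # tensor([2., 2., 2.])

# Second backward WITHOUT zeroing: gradients accumulate.
(x * 2).sum().backward()
print(x.grad)  # tensor([4., 4., 4.])

# Reset manually (optimizer.zero_grad() does this for its parameters).
x.grad.zero_()
print(x.grad)  # tensor([0., 0., 0.])
```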

### 3. torch.no_grad()

Context manager that disables gradient computation, saving memory and speeding up inference and evaluation.

```python
model.eval()
with torch.no_grad():
    output = model(x)
    loss = criterion(output, labels)

# Also usable as a decorator
@torch.no_grad()
def predict(x):
    return model(x)
```

Note: always enable this during inference; otherwise inference is slower and VRAM usage is higher.
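The effect is visible on the tensors themselves: outputs produced under `no_grad` carry no graph. A minimal sketch using a bare tensor instead of a model:

```python
import torch

w = torch.randn(3, requires_grad=True)

y = (w * 2).sum()
print(y.requires_grad)  # True — graph is built as usual

with torch.no_grad():
    z = (w * 2).sum()
print(z.requires_grad)  # False — no graph, nothing to backpropagate
```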

### 4. Tensor.detach()

Returns a new tensor disconnected from the computation graph: it shares the same data but does not propagate gradients.

```python
x = torch.tensor([1., 2.], requires_grad=True)
y = x * 3
z = y.detach()            # no gradient tracking
arr = y.detach().numpy()  # must detach before calling .numpy()
```

Note: in GAN training, call `fake_img.detach()` before passing it to the discriminator, so that the discriminator's update does not backpropagate into the generator.
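"Sharing data" is literal — the detached tensor points at the same storage. A minimal sketch checking this with `data_ptr()`:

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)
y = x * 3
z = y.detach()

print(z.requires_grad)               # False — cut from the graph
print(z.data_ptr() == y.data_ptr())  # True — same underlying storage
```

Because memory is shared, in-place edits to the detached tensor are visible through the original, which can corrupt a pending backward pass.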

### 5. torch.autograd.grad()

Explicitly computes gradients of outputs with respect to inputs. With `create_graph=True`, it supports higher-order gradients such as Hessians.

```python
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
dy_dx, = torch.autograd.grad(y, x, create_graph=True)  # 1st order: 3x^2 = 12
d2y_dx2, = torch.autograd.grad(dy_dx, x)               # 2nd order: 6x = 12
```

Note: a core API for MAML (Model-Agnostic Meta-Learning) and physics-informed neural networks (PINNs).
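Because `create_graph=True` keeps the returned gradient inside the graph, the gradient itself can appear in a loss — the pattern behind gradient penalties. A minimal sketch (a toy scalar objective, not any specific paper's formulation):

```python
import torch

x = torch.tensor([1.5], requires_grad=True)
y = (x ** 2).sum()

# g = 2x, and it stays differentiable because create_graph=True.
g, = torch.autograd.grad(y, x, create_graph=True)
penalty = (g.norm() - 1.0) ** 2   # a loss built from a gradient
penalty.backward()

# d/dx (2x - 1)^2 = 8x - 4 = 8 at x = 1.5
print(x.grad)  # tensor([8.])
```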

### 6. Tensor.grad / Tensor.grad_fn

`grad` stores the accumulated gradient; `grad_fn` points to the backward function that created the tensor.

```python
x = torch.tensor([1., 2.], requires_grad=True)
y = x * x
print(y.grad_fn)  # <MulBackward0 ...>
y.sum().backward()
print(x.grad)     # tensor([2., 4.])
```

Note: `grad_fn=None` indicates a leaf node; check with `.is_leaf`.
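By default only leaf tensors keep their `.grad` after backward; to inspect an intermediate gradient, call `retain_grad()` first. A minimal sketch:

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)
y = x * x
y.retain_grad()        # ask autograd to keep grad on this non-leaf
y.sum().backward()

print(x.is_leaf, y.is_leaf)  # True False
print(y.grad)                # tensor([1., 1.]) — d(sum)/dy
print(x.grad)                # tensor([2., 4.])
```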

### 7. torch.enable_grad()

Re-enables gradient tracking inside a `no_grad` context, allowing fine-grained control.

```python
with torch.no_grad():
    x = model.encode(data)
    with torch.enable_grad():
        x.requires_grad_(True)
        loss = head(x)  # only this part is tracked
```

Note: useful for partial-freeze training, e.g. fine-tuning only the last layer.
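The nesting behavior can be verified without a model — a minimal sketch with bare tensors:

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

with torch.no_grad():
    a = x * 2                # not tracked: no_grad is active
    with torch.enable_grad():
        b = x * 2            # tracked again inside the inner context

print(a.requires_grad, b.requires_grad)  # False True
```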

### 8. register_hook()

Registers a hook on a tensor's backward pass, enabling inspection or modification of intermediate gradients.

```python
grads = []
def save_grad(g):
    grads.append(g.clone())

x = torch.rand(3, requires_grad=True)
y = (x ** 2).sum()
x.register_hook(save_grad)
y.backward()
print(grads[0])
```

Note: invaluable for debugging vanishing/exploding gradients and for implementing gradient penalties such as WGAN-GP.
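A hook can also *replace* the gradient by returning a new tensor. A minimal sketch using a toy per-tensor clamp (an illustration, not a recommended clipping strategy):

```python
import torch

x = torch.tensor([10., -10.], requires_grad=True)
x.register_hook(lambda g: g.clamp(-1.0, 1.0))  # returned tensor replaces the grad

y = (x ** 2).sum()  # dy/dx = 2x = [20., -20.] before the hook runs
y.backward()
print(x.grad)       # tensor([ 1., -1.]) — clamped by the hook
```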
💡 One-line Takeaway
Always pair backward() with zero_grad(), wrap inference in no_grad(), and use detach() to stop gradients from crossing module boundaries.

Automatic Differentiation — Autograd
https://lxy-alexander.github.io/blog/posts/pytorch/api/04automatic-differentiation--autograd/
Author: Alexander Lee
Published: 2026-03-12
License: CC BY-NC-SA 4.0