Convolution, Pooling & Normalization Layers

XII. Convolution, Pooling & Normalization Layers (卷积、池化与正则化层)
1. nn.MaxPool2d() / nn.AvgPool2d()
2D Max / Average Pooling (最大/平均池化). Downsamples feature maps using a sliding window, reducing spatial size.
```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.rand(8, 64, 28, 28)
out = pool(x)   # [8, 64, 14, 14]

gap = nn.AdaptiveAvgPool2d((1, 1))
feat = gap(out) # [8, 64, 1, 1] — global avg pool
```

Note: AdaptiveAvgPool2d((1, 1)) is the standard Global Average Pooling (全局平均池化) in ResNet's classification head.

2. nn.ConvTranspose2d()
Transposed Convolution (转置卷积 / 反卷积) for upsampling. Core layer in U-Net and GAN Generators.
```python
deconv = nn.ConvTranspose2d(in_channels=64, out_channels=32,
                            kernel_size=4, stride=2, padding=1)
x = torch.rand(4, 64, 14, 14)
out = deconv(x) # [4, 32, 28, 28]
```

Note: kernel=4, stride=2, padding=1 is the classic recipe that exactly doubles the spatial size, since output = (input − 1) × stride − 2 × padding + kernel = 2 × input.

3. nn.GroupNorm()
Splits channels into groups and normalizes within each group. Independent of batch size — outperforms BN in small-batch scenarios.
```python
gn = nn.GroupNorm(num_groups=8, num_channels=32)
x = torch.rand(2, 32, 64, 64)
out = gn(x) # shape unchanged
```

Note: Recommended for object detection / instance segmentation (small batch).
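The batch-size independence claim is easy to verify: a given sample's GroupNorm output does not change when its batch shrinks, while BatchNorm's does. A minimal sketch (tensor sizes are arbitrary examples):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gn = nn.GroupNorm(num_groups=8, num_channels=32)
bn = nn.BatchNorm2d(num_features=32)

big = torch.rand(16, 32, 8, 8)
small = big[:1]  # the same first sample, now in a batch of 1

# GroupNorm computes statistics per sample, so the first sample's output
# is identical whether it arrives in a batch of 16 or alone.
print(torch.allclose(gn(big)[:1], gn(small), atol=1e-6))  # True

# BatchNorm's statistics come from the whole batch, so the output shifts.
print(torch.allclose(bn(big)[:1], bn(small), atol=1e-6))  # False
```

This is exactly why GN is preferred when memory limits force batch sizes of 1-2 per GPU.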
num_groups=1 ≡ LayerNorm.

4. nn.InstanceNorm2d()
Normalizes each sample and each channel independently. Standard normalization for Image Style Transfer (图像风格迁移).
```python
inst = nn.InstanceNorm2d(num_features=64, affine=True)
x = torch.rand(4, 64, 256, 256)
out = inst(x)
```

Note: affine=True adds learnable scale/shift parameters for better style adaptation.

5. nn.Upsample()
Module-form wrapper around F.interpolate. No learnable parameters; can be placed in a Sequential.

```python
up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
x = torch.rand(4, 32, 14, 14)
out = up(x) # [4, 32, 28, 28]
```

Note: nn.Upsample is fixed interpolation; combine it with a convolution, or use ConvTranspose2d, when the upsampling should be learned.
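The "learned vs fixed" trade-off is often resolved by pairing the two: fixed bilinear upsampling followed by a learnable convolution, a common alternative to ConvTranspose2d in U-Net-style decoders. A minimal sketch (the module name UpConvBlock is illustrative, not a PyTorch API):

```python
import torch
import torch.nn as nn

class UpConvBlock(nn.Module):
    """Fixed bilinear 2x upsampling followed by a learnable 3x3 conv.

    The interpolation handles the resolution change; the conv learns
    to refine the upsampled features.
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='bilinear',
                              align_corners=False)
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(self.up(x))

block = UpConvBlock(32, 16)
x = torch.rand(4, 32, 14, 14)
out = block(x)
print(out.shape)  # torch.Size([4, 16, 28, 28])
```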
💡 One-line Takeaway
Normalization choice: BN (large batch) → GN (small batch/detection) → LN (NLP) → IN (style transfer).
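The takeaway above can be sanity-checked in a few lines; each layer expects the layout shown and leaves the shape unchanged (all tensor sizes are arbitrary examples):

```python
import torch
import torch.nn as nn

x_img = torch.rand(8, 32, 16, 16)  # [N, C, H, W] vision tensor
x_seq = torch.rand(8, 10, 64)      # [N, T, D] NLP-style tensor

print(nn.BatchNorm2d(32)(x_img).shape)     # BN: large-batch vision
print(nn.GroupNorm(8, 32)(x_img).shape)    # GN: small-batch detection
print(nn.LayerNorm(64)(x_seq).shape)       # LN: NLP / Transformers
print(nn.InstanceNorm2d(32)(x_img).shape)  # IN: style transfer
```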