Convolution, Pooling & Normalization Layers

XII. Convolution, Pooling & Normalization Layers (卷积、池化与正则化层)
1. nn.MaxPool2d() / nn.AvgPool2d()
2D Max / Average Pooling (最大/平均池化). Downsamples feature maps using a sliding window, reducing spatial size.
```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.rand(8, 64, 28, 28)
out = pool(x)   # [8, 64, 14, 14]

gap = nn.AdaptiveAvgPool2d((1, 1))
feat = gap(out) # [8, 64, 1, 1] — global avg pool
```

Note: AdaptiveAvgPool2d((1, 1)) is the standard Global Average Pooling (全局平均池化) in ResNet's classification head.

2. nn.ConvTranspose2d()
Transposed Convolution (转置卷积 / 反卷积) for upsampling. Core layer in U-Net and GAN Generators.
```python
deconv = nn.ConvTranspose2d(in_channels=64, out_channels=32,
                            kernel_size=4, stride=2, padding=1)
x = torch.rand(4, 64, 14, 14)
out = deconv(x) # [4, 32, 28, 28]
```

Note: kernel=4, stride=2, padding=1 is the classic recipe that exactly doubles the spatial size, since output = (input − 1) × stride − 2 × padding + kernel = 2 × input.

3. nn.GroupNorm()
Splits channels into groups and normalizes within each group. Independent of batch size — outperforms BN in small-batch scenarios.
```python
gn = nn.GroupNorm(num_groups=8, num_channels=32)
x = torch.rand(2, 32, 64, 64)
out = gn(x) # shape unchanged
```

Note: Recommended for object detection / instance segmentation (small batch).
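The batch-size independence claim is easy to verify: a given sample's GroupNorm output does not change when its batch shrinks, while BatchNorm's does. A minimal sketch (tensor sizes are arbitrary examples):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gn = nn.GroupNorm(num_groups=8, num_channels=32)
bn = nn.BatchNorm2d(num_features=32)

big = torch.rand(16, 32, 8, 8)
small = big[:1]  # the same first sample, now in a batch of 1

# GroupNorm computes statistics per sample, so the first sample's output
# is identical whether it arrives in a batch of 16 or alone.
print(torch.allclose(gn(big)[:1], gn(small), atol=1e-6))  # True

# BatchNorm's statistics come from the whole batch, so the output shifts.
print(torch.allclose(bn(big)[:1], bn(small), atol=1e-6))  # False
```

This is exactly why GN is preferred when memory limits force batch sizes of 1-2 per GPU.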
num_groups=1 ≡ LayerNorm.

4. nn.InstanceNorm2d()
Normalizes each sample and each channel independently. Standard normalization for Image Style Transfer (图像风格迁移).
```python
inst = nn.InstanceNorm2d(num_features=64, affine=True)
x = torch.rand(4, 64, 256, 256)
out = inst(x)
```

Note: affine=True adds learnable scale/shift parameters for better style adaptation.

5. nn.Upsample()
Module-form wrapper around F.interpolate. No learnable parameters; can be placed in a Sequential.

```python
up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
x = torch.rand(4, 32, 14, 14)
out = up(x) # [4, 32, 28, 28]
```

Note: nn.Upsample is fixed interpolation; combine it with a convolution, or use ConvTranspose2d, when the upsampling should be learned.
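The "learned vs fixed" trade-off is often resolved by pairing the two: fixed bilinear upsampling followed by a learnable convolution, a common alternative to ConvTranspose2d in U-Net-style decoders. A minimal sketch (the module name UpConvBlock is illustrative, not a PyTorch API):

```python
import torch
import torch.nn as nn

class UpConvBlock(nn.Module):
    """Fixed bilinear 2x upsampling followed by a learnable 3x3 conv.

    The interpolation handles the resolution change; the conv learns
    to refine the upsampled features.
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='bilinear',
                              align_corners=False)
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(self.up(x))

block = UpConvBlock(32, 16)
x = torch.rand(4, 32, 14, 14)
out = block(x)
print(out.shape)  # torch.Size([4, 16, 28, 28])
```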
💡 One-line Takeaway
Normalization choice: BN (large batch) → GN (small batch/detection) → LN (NLP) → IN (style transfer).
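The takeaway above can be sanity-checked in a few lines; each layer expects the layout shown and leaves the shape unchanged (all tensor sizes are arbitrary examples):

```python
import torch
import torch.nn as nn

x_img = torch.rand(8, 32, 16, 16)  # [N, C, H, W] vision tensor
x_seq = torch.rand(8, 10, 64)      # [N, T, D] NLP-style tensor

print(nn.BatchNorm2d(32)(x_img).shape)     # BN: large-batch vision
print(nn.GroupNorm(8, 32)(x_img).shape)    # GN: small-batch detection
print(nn.LayerNorm(64)(x_seq).shape)       # LN: NLP / Transformers
print(nn.InstanceNorm2d(32)(x_img).shape)  # IN: style transfer
```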