Convolution, Pooling & Normalization Layers

XII. Convolution, Pooling & Normalization Layers#

1. nn.MaxPool2d() / nn.AvgPool2d()#

2D max / average pooling. Downsamples feature maps with a sliding window, reducing spatial size.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.rand(8, 64, 28, 28)
out = pool(x)  # [8, 64, 14, 14]

gap = nn.AdaptiveAvgPool2d((1, 1))
feat = gap(out)  # [8, 64, 1, 1], global average pooling
```

Note: AdaptiveAvgPool2d((1, 1)) is the standard Global Average Pooling used in ResNet's classification head.
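As a sketch of how this fits together: the output size of a pooling layer follows floor((size + 2*padding - kernel) / stride) + 1, and GAP plus a linear layer forms a minimal classification head. The `pool_out_size` helper and the 10-class `nn.Linear` below are illustrative assumptions, not part of any library API.

```python
import torch
import torch.nn as nn

def pool_out_size(size, kernel, stride, padding=0):
    # Hypothetical helper: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

x = torch.rand(8, 64, 28, 28)
out = nn.MaxPool2d(kernel_size=2, stride=2)(x)
assert out.shape[-1] == pool_out_size(28, kernel=2, stride=2)  # 14

# A minimal ResNet-style head: GAP -> flatten -> linear (10 classes assumed)
head = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),  # [8, 64, 1, 1]
    nn.Flatten(),                  # [8, 64]
    nn.Linear(64, 10),             # [8, 10] class logits
)
logits = head(out)
```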

2. nn.ConvTranspose2d()#

Transposed convolution (sometimes called deconvolution) for upsampling. A core layer in U-Net and GAN generators.

```python
import torch
import torch.nn as nn

deconv = nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=4, stride=2, padding=1)
x = torch.rand(4, 64, 14, 14)
out = deconv(x)  # [4, 32, 28, 28]
```

Note: kernel_size=4, stride=2, padding=1 is the classic recipe that exactly doubles the spatial size.
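Why that recipe doubles the size: with dilation=1 and no output_padding, the transposed-conv output size is (size - 1) * stride - 2 * padding + kernel. The `deconv_out_size` helper below is an illustrative sketch of that formula, not a library function.

```python
import torch
import torch.nn as nn

def deconv_out_size(size, kernel, stride, padding, output_padding=0):
    # Illustrative helper (assumes dilation=1):
    # (size - 1) * stride - 2 * padding + kernel + output_padding
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# kernel=4, stride=2, padding=1: (14 - 1) * 2 - 2 + 4 = 28, exactly double
assert deconv_out_size(14, kernel=4, stride=2, padding=1) == 28

deconv = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
out = deconv(torch.rand(4, 64, 14, 14))
assert out.shape == (4, 32, 28, 28)
```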

3. nn.GroupNorm()#

Splits channels into groups and normalizes within each group. Independent of batch size, so it outperforms BatchNorm in small-batch scenarios.

```python
import torch
import torch.nn as nn

gn = nn.GroupNorm(num_groups=8, num_channels=32)
x = torch.rand(2, 32, 64, 64)
out = gn(x)  # shape unchanged
```

Note: recommended for object detection / instance segmentation (small batches). num_groups=1 behaves like LayerNorm over the whole feature map; num_groups=num_channels behaves like InstanceNorm.
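The num_groups=1 equivalence can be checked numerically: with affine parameters disabled on both sides, GroupNorm with a single group matches LayerNorm over the full (C, H, W) feature map. A minimal sketch:

```python
import torch
import torch.nn as nn

x = torch.rand(2, 32, 64, 64)

# One group means normalizing over all of (C, H, W) per sample,
# which is exactly LayerNorm over the whole feature map
gn = nn.GroupNorm(num_groups=1, num_channels=32, affine=False)
ln = nn.LayerNorm([32, 64, 64], elementwise_affine=False)

assert torch.allclose(gn(x), ln(x), atol=1e-5)
```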

4. nn.InstanceNorm2d()#

Normalizes each sample and each channel independently. The standard normalization for image style transfer.

```python
import torch
import torch.nn as nn

inst = nn.InstanceNorm2d(num_features=64, affine=True)
x = torch.rand(4, 64, 256, 256)
out = inst(x)  # shape unchanged
```

Note: affine=True adds learnable per-channel scale/shift parameters for better style adaptation (unlike BatchNorm2d, InstanceNorm2d defaults to affine=False).
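InstanceNorm sits at the other end of the GroupNorm spectrum: one group per channel. A minimal check, with affine parameters disabled on both sides so the raw normalization can be compared:

```python
import torch
import torch.nn as nn

x = torch.rand(4, 64, 32, 32)

# InstanceNorm2d is GroupNorm with num_groups == num_channels:
# each (sample, channel) pair is normalized over its own H x W plane
inst = nn.InstanceNorm2d(64, affine=False)
gn = nn.GroupNorm(num_groups=64, num_channels=64, affine=False)

assert torch.allclose(inst(x), gn(x), atol=1e-5)
```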

5. nn.Upsample()#

A module-form wrapper around F.interpolate. No learnable parameters, so it can sit inside nn.Sequential.

```python
import torch
import torch.nn as nn

up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
x = torch.rand(4, 32, 14, 14)
out = up(x)  # [4, 32, 28, 28]
```

Note: Upsample itself is fixed interpolation; when the upsampling should be learned, use ConvTranspose2d or follow Upsample with a convolution.
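To make the fixed-versus-learned contrast concrete, here is a sketch comparing a parameter-free Upsample against a common learned alternative, upsample-then-conv (the 32-channel Conv2d is an assumption for the example):

```python
import torch
import torch.nn as nn

x = torch.rand(4, 32, 14, 14)

# Fixed: pure bilinear interpolation, zero parameters
fixed = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

# Learned: interpolate first, then let a conv refine the result;
# a common alternative to ConvTranspose2d that avoids checkerboard artifacts
learned = nn.Sequential(
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
)

assert fixed(x).shape == learned(x).shape == (4, 32, 28, 28)
assert sum(p.numel() for p in fixed.parameters()) == 0
```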
💡 One-line Takeaway
Normalization choice: BN (large batch) → GN (small batch/detection) → LN (NLP) → IN (style transfer).

Convolution, Pooling & Normalization Layers
https://lxy-alexander.github.io/blog/posts/pytorch/api/12convolution-pooling--normalization-layers/
Author: Alexander Lee
Published at: 2026-03-12
License: CC BY-NC-SA 4.0