Alexander Lee

deduplicate

Fri, 17 Apr 2026 00:00:00 GMT

I. File Deduplication (文件去重)

1. Problem Statement (题目描述)

==Given a Directory Tree (目录树), find and group files with identical Byte Content (字节内容) and output duplicate file paths where group size ≥ 2.==

Task Description (任务说明):

Input: A root directory (根目录) containing files and subdirectories
Definition: Duplicate Files (重复文件) have exactly the same byte content
Output: Groups of file paths (文件路径组), each group contains identical files
Format: One line per group, paths separated by spaces (空格分隔路径)

Example (示例):

/a/1.txt content "hello"
/b/2.txt content "hello"
/c/3.txt content "world"

Output:

/a/1.txt /b/2.txt

2. Core Approach (核心思路)

1) Directory Traversal (目录遍历)

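Use DFS/BFS (深度优先/广度优先搜索) to visit all files in the directory tree. ==If a file cannot be opened (permission denied, broken symlink, deleted mid-scan), catch the error, log the path, and skip it rather than aborting the scan.==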

2) Hashing Files (文件哈希)

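Use a Hash Function (哈希函数) to convert file content into a fixed-length Hash Value (哈希值), so files are compared by digest instead of byte by byte. Equal digests imply equal content only up to hash collisions; a final byte-by-byte check removes that doubt.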

3) Grouping Duplicates (分组重复文件)

Use a Hash Map (哈希表) to map hash -> list of file paths and collect duplicates.

3. Code Implementation (代码实现)

1) Python Example (可独立运行)

import os
import hashlib

def get_file_hash(file_path, chunk_size=4096):
    hasher = hashlib.md5()  # MD5 hash: fast, but not collision-resistant
    with open(file_path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
    return hasher.hexdigest()

def find_duplicates(root_dir):
    hash_map = {}  # hash value -> list of file paths

    for dirpath, _, filenames in os.walk(root_dir):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            file_hash = get_file_hash(file_path)

            if file_hash not in hash_map:
                hash_map[file_hash] = []
            hash_map[file_hash].append(file_path)

    for paths in hash_map.values():
        if len(paths) >= 2:
            print(" ".join(paths))

if __name__ == "__main__":
    # Ensure example_dir exists with test files
    find_duplicates("./example_dir")

import os
import hashlib
from collections import defaultdict


def get_file_hash(file_path, chunk_size=4096):
    hasher = hashlib.md5()
    with open(file_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root_dir):
    size_map = defaultdict(list)   # file size -> list of file paths
    hash_map = defaultdict(list)   # file hash -> list of file paths

    # 1) Walk all files and group them by size
    for dirpath, _, filenames in os.walk(root_dir):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            try:
                file_size = os.path.getsize(file_path)
                size_map[file_size].append(file_path)
            except OSError as e:  # PermissionError is a subclass of OSError
                print(f"Cannot read file size: {file_path}, error: {e}")

    # 2) Hash only files that share a size with at least one other file
    for file_size, paths in size_map.items():
        if len(paths) < 2:
            continue

        for file_path in paths:
            try:
                file_hash = get_file_hash(file_path)
                hash_map[file_hash].append(file_path)
            except OSError as e:  # PermissionError is a subclass of OSError
                print(f"Cannot read file content: {file_path}, error: {e}")

    # 3) Print the groups that are real duplicates
    found = False
    for paths in hash_map.values():
        if len(paths) >= 2:
            found = True
            print(" ".join(paths))

    if not found:
        print("No duplicate files found")


if __name__ == "__main__":
    find_duplicates("./example_dir")

import os
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed


def get_file_hash(file_path, chunk_size=4096):
    hasher = hashlib.md5()
    with open(file_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
    return hasher.hexdigest()


def hash_file_worker(file_path):
    try:
        file_hash = get_file_hash(file_path)
        return file_path, file_hash, None
    except (OSError, PermissionError) as e:
        return file_path, None, e


def find_duplicates(root_dir, max_workers=8):
    size_map = defaultdict(list)

    # 1) Group files by size first
    for dirpath, _, filenames in os.walk(root_dir):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            try:
                file_size = os.path.getsize(file_path)
                size_map[file_size].append(file_path)
            except OSError as e:  # PermissionError is a subclass of OSError
                print(f"Cannot read file size: {file_path}, error: {e}")

    duplicate_groups = []

    # 2) Hash only the groups of files that share a size
    for file_size, paths in size_map.items():
        if len(paths) < 2:
            continue

        hash_map = defaultdict(list)

        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = [executor.submit(hash_file_worker, file_path) for file_path in paths]

            for future in as_completed(futures):
                file_path, file_hash, error = future.result()
                if error is not None:
                    print(f"Cannot read file content: {file_path}, error: {error}")
                    continue
                hash_map[file_hash].append(file_path)

        # 3) Collect the groups that are real duplicates
        for same_files in hash_map.values():
            if len(same_files) >= 2:
                duplicate_groups.append(same_files)

    # 4) Print the results
    if duplicate_groups:
        print("Duplicate files found:")
        for i, group in enumerate(duplicate_groups, 1):
            print(f"\nDuplicate group {i}:")
            for path in group:
                print(path)
    else:
        print("No duplicate files found")


if __name__ == "__main__":
    find_duplicates("./example_dir", max_workers=8)

4. Complexity Analysis (复杂度分析)

1) Time Complexity (时间复杂度)

The Time Complexity (时间复杂度) is $$O(N \cdot S)$$ where $N$ is the number of files and $S$ is the average file size, since every byte of every file is read and hashed once.

2) Space Complexity (空间复杂度)

The Space Complexity (空间复杂度) is $$O(N)$$: the hash map stores one digest and one path per file, and chunked reading keeps only one chunk in memory at a time.

5. Optimization Strategies (优化策略)

1) I/O Bound Optimization (I/O瓶颈优化)

Reduce Disk I/O (磁盘读写) by filtering files using File Size (文件大小) before hashing.

2) CPU Bound Optimization (CPU瓶颈优化)

Reduce Hash Computation (哈希计算) cost by using faster hash functions or parallel processing.
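One way to cut hashing cost further, sketched here under assumptions, is to hash only a short prefix of each same-size file and compute the full hash only when prefixes collide. The helper names (`hash_prefix`, `hash_full`, `group_candidates`) and the 64 KiB prefix size are illustrative choices, not part of the solutions above.

```python
import hashlib
from collections import defaultdict

PREFIX_BYTES = 64 * 1024  # assumed prefix size; tune for the workload


def hash_prefix(path, n=PREFIX_BYTES):
    """Hash only the first n bytes of the file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(n)).hexdigest()


def hash_full(path, chunk_size=1 << 20):
    """Hash the whole file in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def group_candidates(paths):
    """Given paths of equal size, return groups with identical full hashes."""
    by_prefix = defaultdict(list)
    for p in paths:
        by_prefix[hash_prefix(p)].append(p)

    groups = []
    for same_prefix in by_prefix.values():
        if len(same_prefix) < 2:
            continue  # a unique prefix cannot belong to a duplicate
        by_full = defaultdict(list)
        for p in same_prefix:
            by_full[hash_full(p)].append(p)
        groups.extend(g for g in by_full.values() if len(g) >= 2)
    return groups
```

Files that differ early are rejected after reading at most 64 KiB, so the full read is paid only by likely duplicates.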

II. Detect Duplicate Files (文件去重-按大小+哈希)

1. Problem Statement (题目描述)

==Given File Metadata (文件元数据) and a Content Reading Interface (文件读取接口), detect Duplicate Files (重复文件) where two files are duplicates iff their contents are identical (内容完全相同).==

Requirements (要求):

Input: A list files; each entry contains:
- path (路径)
- size (文件大小，字节)
Helper Functions (辅助函数):
- read_file(path) -> bytes (读取文件内容)
- hash_bytes(data) -> str (计算哈希值)
Output: List[List[str]], each group contains duplicate file paths (每组至少2个文件)

Constraints (约束):

Number of Files (文件数量) up to $$10^6$$
Files may be very large (大文件GB级)
Need Streaming Processing (流式处理) to avoid loading entire file into memory

Optimization Requirement (优化要求):

Stage 1: Group by File Size (按文件大小分组)
Stage 2: For same size files, compute Content Hash (内容哈希)

Example (示例):

Input:

[a.txt size=3 content=abc,
 b.txt size=3 content=abc,
 c.txt size=3 content=abd,
 d.txt size=10 content=0123456789]

Output:

[[a.txt, b.txt]]

2. Core Idea (核心思路)

1) Two-Stage Filtering (两阶段过滤)

First use File Size (文件大小) to prune candidates, then use Hash Function (哈希函数) to confirm duplicates.

2) Performance Insight (性能关键点)

This approach reduces expensive I/O (磁盘读取) and Hash Computation (哈希计算).

3. Algorithm Steps (算法步骤)

1) Step Flow (步骤流程)

Build Size Map (大小映射): size -> list of paths
Keep only size groups that contain at least 2 paths
For each remaining group, compute Hash (计算哈希)
Build Hash Map (哈希映射): hash -> list of paths
Collect hash groups that contain at least 2 paths

4. Code Implementation (代码实现)

1) Python Example (可独立运行)

import hashlib
from collections import defaultdict

# Mock read_file (模拟读取函数)
def read_file(path):
    data_map = {
        "a.txt": b"abc",
        "b.txt": b"abc",
        "c.txt": b"abd",
        "d.txt": b"0123456789"
    }
    return data_map[path]

def hash_bytes(data):
    return hashlib.sha256(data).hexdigest()

def find_duplicates(files):
    size_map = defaultdict(list)

    # Stage 1: group by file size
    for f in files:
        size_map[f["size"]].append(f["path"])

    result = []

    # Stage 2: group by content hash
    for paths in size_map.values():
        if len(paths) < 2:
            continue

        hash_map = defaultdict(list)
        for path in paths:
            data = read_file(path)  # loads the whole file; stream in chunks for GB-scale files
            h = hash_bytes(data)
            hash_map[h].append(path)

        for group in hash_map.values():
            if len(group) >= 2:
                result.append(group)

    return result

if __name__ == "__main__":
    files = [
        {"path": "a.txt", "size": 3},
        {"path": "b.txt", "size": 3},
        {"path": "c.txt", "size": 3},
        {"path": "d.txt", "size": 10},
    ]

    print(find_duplicates(files))

5. Complexity Analysis (复杂度分析)

1) Time Complexity (时间复杂度)

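The Time Complexity (时间复杂度) is $$O(N + K \cdot S)$$ where $N$ is the total number of files, $K$ is the number of candidate files that share a size with another file, and $S$ is the average size of those candidates.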

2) Space Complexity (空间复杂度)

The Space Complexity (空间复杂度) is $$O(N)$$ for storing mappings.

6. System Design Discussion (系统设计讨论)

1) Large File Handling (大文件处理)

Use Streaming Hashing (流式哈希) to process files in chunks to avoid Memory Overflow (内存溢出).
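As a minimal sketch of streaming hashing, the function below feeds a binary stream to the hasher one chunk at a time, so memory use is bounded by the chunk size. The name `hash_stream` and the use of `io.BytesIO` as a stand-in for a large file are assumptions for illustration.

```python
import hashlib
import io


def hash_stream(stream, chunk_size=1 << 20):
    """Hash a binary file-like object chunk by chunk."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher.hexdigest()


data = b"0123456789" * 1000
streamed = hash_stream(io.BytesIO(data), chunk_size=64)
one_shot = hashlib.sha256(data).hexdigest()
print(streamed == one_shot)  # True
```

The incremental digest equals the one-shot digest, so chunking changes memory use but not the result.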

2) I/O Bound Optimization (I/O瓶颈优化)

Use Concurrent I/O (并发I/O) and Batch Processing (批处理) to reduce disk latency.

3) CPU Bound Optimization (CPU瓶颈优化)

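Use Parallel Hashing (并行哈希) with Multi-processing (多进程) to speed up computation. In CPython, hashlib releases the GIL while hashing large buffers, so a thread pool can also hash files in parallel.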

4) Real-time Detection (实时检测)

Use File System Watcher (文件系统监听器) and Incremental Indexing (增量索引) to detect duplicates dynamically.
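Incremental indexing can be sketched as a small in-memory index that is updated on each file event instead of rescanning the tree. The class `DuplicateIndex` and its methods are hypothetical; a real system would call them from file system watcher callbacks and hash files in chunks.

```python
import hashlib
from collections import defaultdict


class DuplicateIndex:
    """Hypothetical in-memory index: content hash -> set of paths."""

    def __init__(self):
        self._by_hash = defaultdict(set)
        self._hash_of = {}

    def upsert(self, path, data):
        """Record a created or modified file; return its current duplicates."""
        self.remove(path)  # drop any stale entry first
        digest = hashlib.sha256(data).hexdigest()
        self._hash_of[path] = digest
        self._by_hash[digest].add(path)
        return sorted(self._by_hash[digest] - {path})

    def remove(self, path):
        """Forget a deleted file."""
        digest = self._hash_of.pop(path, None)
        if digest is not None:
            self._by_hash[digest].discard(path)
            if not self._by_hash[digest]:
                del self._by_hash[digest]


index = DuplicateIndex()
print(index.upsert("a.txt", b"abc"))  # []
print(index.upsert("b.txt", b"abc"))  # ['a.txt']
print(index.upsert("b.txt", b"abd"))  # []
```

Each event costs one hash plus constant-time map updates, independent of how many files are already indexed.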

III. Find Duplicate Files by Content (按内容查找重复文件)

1. Problem Statement (题目描述)

Given a Directory Structure (目录结构), find all files with duplicate content where file content can be compared by a Hash String (哈希字符串).

Requirements (要求):

Input: A list of strings paths, each string contains:
- Directory Path (目录路径)
- File Name (文件名)
- File Content (文件内容)
Output: List[List[str]], each group contains file paths with identical content (内容相同的文件路径分组)

Example (示例):

Input:

[
    "root/a 1.txt(abcd) 2.txt(efgh)",
    "root/c 3.txt(abcd)",
    "root/c/d 4.txt(efgh)",
    "root 4.txt(1234)"
]

Output:

[
    ["root/a/1.txt", "root/c/3.txt"],
    ["root/a/2.txt", "root/c/d/4.txt"]
]

Constraints (约束):

Each input string length (每个输入字符串长度) is less than $$300$$
Number of files (文件数量) is less than $$10^4$$

Extra Example (额外示例):

Input:

["root/a 1.txt(abcd) 2.txt(efgh)"]

Output:

[]

2. Core Idea (核心思路)

1) Hash Map Grouping (哈希表分组)

Use a Hash Map (哈希表) to map Content (内容) to Full Paths (完整路径), because the same content should belong to the same group.

2) String Parsing (字符串解析)

Split each record into Directory (目录) and File Info (文件信息), then extract File Name (文件名) and Content (内容) from each file token.

3. Algorithm Steps (算法步骤)

1) Step Flow (步骤流程)

Traverse each path string
Split it by spaces into Directory (目录) and File Entries (文件项)
For each file entry, parse File Name (文件名) and Content (内容)
Build Full Path (完整路径)
Store it in Hash Map (哈希表): content -> list of full paths
Return groups whose size is at least 2

4. Code Implementation (代码实现)

1) Python Example (可独立运行)

from collections import defaultdict


def find_duplicate(paths):
    content_map = defaultdict(list)

    for record in paths:
        parts = record.split(" ")
        directory = parts[0]

        for file_info in parts[1:]:
            left = file_info.find("(")
            right = file_info.rfind(")")

            file_name = file_info[:left]
            content = file_info[left + 1:right]
            full_path = directory + "/" + file_name

            content_map[content].append(full_path)

    return [group for group in content_map.values() if len(group) >= 2]


if __name__ == "__main__":
    paths = [
        "root/a 1.txt(abcd) 2.txt(efgh)",
        "root/c 3.txt(abcd)",
        "root/c/d 4.txt(efgh)",
        "root 4.txt(1234)"
    ]

    result = find_duplicate(paths)
    print(result)

    extra_input = ["root/a 1.txt(abcd) 2.txt(efgh)"]
    print(find_duplicate(extra_input))

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def parse_record(record):
    parts = record.split(" ")
    directory = parts[0]

    local_map = defaultdict(list)

    for file_info in parts[1:]:
        left = file_info.find("(")
        right = file_info.rfind(")")

        file_name = file_info[:left]
        content = file_info[left + 1:right]
        full_path = directory + "/" + file_name

        local_map[content].append(full_path)

    return local_map


def find_duplicate(paths, max_workers=4):
    content_map = defaultdict(list)

    # Parse records in a thread pool (illustrative: parsing is CPU-bound, so the GIL limits any speedup)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = executor.map(parse_record, paths)

    # Merge the per-record results
    for local_map in results:
        for content, file_list in local_map.items():
            content_map[content].extend(file_list)

    return [group for group in content_map.values() if len(group) >= 2]


if __name__ == "__main__":
    paths = [
        "root/a 1.txt(abcd) 2.txt(efgh)",
        "root/c 3.txt(abcd)",
        "root/c/d 4.txt(efgh)",
        "root 4.txt(1234)"
    ]

    result = find_duplicate(paths, max_workers=4)
    print(result)

    extra_input = ["root/a 1.txt(abcd) 2.txt(efgh)"]
    print(find_duplicate(extra_input))

5. Complexity Analysis (复杂度分析)

1) Time Complexity (时间复杂度)

The Time Complexity (时间复杂度) is $$O(N \cdot K)$$, where $N$ is the number of files and $K$ is the average length of a file entry (name plus content), since each character is scanned a constant number of times.

2) Space Complexity (空间复杂度)

The Space Complexity (空间复杂度) is $$O(N \cdot K)$$, because we store Content (内容) and File Paths (文件路径) in a Hash Map (哈希表).

6. Interview Notes (面试要点)

1) Why Hash Map (为什么用哈希表)

A Hash Map (哈希表) is the most direct way to group files by the same Content (内容).

2) Why Not Compare Every Pair (为什么不两两比较)

Pairwise Comparison (两两比较) is too slow at $$O(N^2)$$, so grouping by key is the standard optimization.

3) Edge Case (边界情况)

If every file has unique content, the answer is an empty list because no group has at least two files.

Prefill-Decode Disaggregation

Wed, 15 Apr 2026 00:00:00 GMT

I. Prefill-Decode Disaggregation (PD 分离)

1. Motivation (动机)

In standard LLM serving, prefill and decode run on the same GPU and interfere with each other:

Prefill is compute-bound (计算密集型): it processes hundreds or thousands of prompt tokens in parallel and saturates the GPU's compute units in one long iteration.
Decode is memory-bandwidth-bound (内存带宽密集型): each step reads the model weights and the full KV cache to produce just 1 new token, so compute units sit mostly idle.

Running them together causes prefill-decode interference (干扰):

A large prefill blocks decode iterations → spikes in Inter-Token Latency (ITL, 令牌间延迟).
Decode needs large batches to amortize its memory reads, while prefill already saturates compute with a few prompts, so one batching policy cannot suit both.

PD disaggregation puts them on separate GPU pools so each can be tuned independently.

2. Architecture (架构)

1) Two Pools (两个资源池)

Pool	Role	Bottleneck	Optimal hardware
Prefill pool (预填充池)	Process prompt tokens, build KV cache	Compute (FLOPs)	High-FLOP GPUs (e.g. H100 SXM)
Decode pool (解码池)	Autoregressive token generation	Memory bandwidth	High-bandwidth GPUs or more GPUs

2) KV Cache Transfer (KV缓存传输)

After prefill completes, the KV cache must be migrated from the prefill GPU to the decode GPU. This is the central engineering challenge of PD disaggregation.

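$$ \text{Transfer cost} = \frac{2 \times n_{\text{layers}} \times d_{\text{model}} \times L_{\text{prompt}} \times b}{\text{NVLink / RDMA bandwidth}} $$

Where $L_{\text{prompt}}$ is the prompt length (提示长度), $b$ is the number of bytes per element (2 for FP16), and the leading 2 counts K and V. With grouped-query attention (GQA), replace $d_{\text{model}}$ by $n_{\text{kv\_heads}} \times d_{\text{head}}$. For a 70B-class model with 80 layers and $d_{\text{model}} = 8192$, a 4096-token prompt yields about 10.7 GB of KV cache in FP16 with full multi-head attention, or about 1.3 GB with 8 KV heads of dimension 128 (as in LLaMA-3 70B). Transfer latency adds directly to TTFT (首个令牌时间).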

Transfer methods (传输方式):

NVLink: within a node, ~600 GB/s per GPU on A100 (900 GB/s on H100), negligible latency.
RDMA over InfiniBand: across nodes, ~200–400 Gb/s per port (25–50 GB/s); multi-NIC nodes aggregate several ports.
TCP/IP: fallback, much slower, not recommended.
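The transfer-cost arithmetic can be checked with a short calculation. This is a sketch under stated assumptions: the function names are illustrative, the shapes (80 layers, hidden size 8192, head dimension 128) describe a 70B-class model, and the time is an ideal figure that ignores link latency and protocol overhead.

```python
def kv_cache_bytes(n_layers, kv_width, prompt_len, bytes_per_elem=2):
    """Bytes of K and V for one sequence; kv_width = n_kv_heads * head_dim."""
    return 2 * n_layers * kv_width * prompt_len * bytes_per_elem


def transfer_ms(n_bytes, bandwidth_gb_per_s):
    """Ideal transfer time in milliseconds at the given bandwidth."""
    return n_bytes / (bandwidth_gb_per_s * 1e9) * 1000


# Assumed 70B-class shapes: 80 layers, hidden size 8192, head_dim 128, FP16.
mha = kv_cache_bytes(80, 8192, 4096)      # 64 KV heads (full multi-head attention)
gqa = kv_cache_bytes(80, 8 * 128, 4096)   # 8 KV heads (grouped-query attention)

for name, size in [("MHA", mha), ("GQA", gqa)]:
    print(f"{name}: {size / 1e9:.2f} GB, {transfer_ms(size, 200):.1f} ms at 200 GB/s")
# MHA: 10.74 GB, 53.7 ms at 200 GB/s
# GQA: 1.34 GB, 6.7 ms at 200 GB/s
```

The 8× gap between the two rows shows why the attention variant matters as much as the link speed when estimating the TTFT penalty.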

3. Runnable Example (可运行示例)

# pd_disaggregation_sim.py
# Simulates PD-disaggregated scheduling with KV transfer cost.
# No external dependencies required.

import time
import threading
from queue import Queue

PREFILL_TIME_PER_TOKEN = 0.0005   # seconds per token (compute-bound)
DECODE_TIME_PER_TOKEN  = 0.020    # seconds per token (memory-bound)
KV_TRANSFER_GBPS       = 200      # simulated NVLink bandwidth (GB/s)
BYTES_PER_KV_TOKEN     = 2 * 80 * 8192  # 2 (K+V) × 80 layers × 8192 bytes per layer (assumed)

class Request:
    def __init__(self, req_id: str, prompt_len: int, max_new_tokens: int):
        self.req_id = req_id
        self.prompt_len = prompt_len
        self.max_new_tokens = max_new_tokens

def prefill_worker(req: Request, kv_queue: Queue):
    """Prefill pool: process prompt, produce KV cache."""
    t0 = time.time()
    time.sleep(req.prompt_len * PREFILL_TIME_PER_TOKEN)   # simulate compute
    prefill_ms = (time.time() - t0) * 1000

    # Simulate KV cache transfer
    kv_bytes = BYTES_PER_KV_TOKEN * req.prompt_len
    transfer_s = kv_bytes / (KV_TRANSFER_GBPS * 1e9)
    time.sleep(transfer_s)
    transfer_ms = transfer_s * 1000

    print(f"[Prefill→Transfer] {req.req_id}: "
          f"prefill={prefill_ms:.1f}ms  transfer={transfer_ms:.1f}ms  "
          f"KV={kv_bytes/1e6:.1f}MB")
    kv_queue.put(req)    # hand off to decode pool

def decode_worker(kv_queue: Queue):
    """Decode pool: consume KV cache, generate tokens."""
    while True:
        req = kv_queue.get()
        if req is None:
            break
        t0 = time.time()
        time.sleep(req.max_new_tokens * DECODE_TIME_PER_TOKEN)
        decode_ms = (time.time() - t0) * 1000
        ttft = (req.prompt_len * PREFILL_TIME_PER_TOKEN
                + BYTES_PER_KV_TOKEN * req.prompt_len / (KV_TRANSFER_GBPS * 1e9)
                + DECODE_TIME_PER_TOKEN) * 1000
        print(f"[Decode Done]     {req.req_id}: "
              f"decode={decode_ms:.1f}ms  est.TTFT={ttft:.1f}ms")
        kv_queue.task_done()

if __name__ == "__main__":
    requests = [
        Request("R1", prompt_len=512,  max_new_tokens=50),
        Request("R2", prompt_len=2048, max_new_tokens=20),
        Request("R3", prompt_len=256,  max_new_tokens=100),
    ]

    kv_queue: Queue = Queue()

    # Start decode pool (always listening)
    decoder = threading.Thread(target=decode_worker, args=(kv_queue,), daemon=True)
    decoder.start()

    # Prefill pool: one thread per request, so the simulated prefills overlap
    threads = [threading.Thread(target=prefill_worker, args=(r, kv_queue))
               for r in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    kv_queue.join()
    kv_queue.put(None)   # signal decoder to exit

4. Benefits and Trade-offs (优缺点)

Aspect	Coupled (耦合)	Disaggregated (分离)
TTFT	Higher under load (prefill queues behind decode batches)	Lower (dedicated prefill GPUs)
ITL	Spiky under long prompts	Stable (no prefill interference)
Independent scaling (独立扩缩容)	No	Yes — scale each pool by workload
KV transfer overhead	None	Adds latency on long prompts
Hardware cost	Lower	Higher (more GPUs)

5. Key Formula — Transfer Latency (传输延迟)

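For a 70B-class model with full multi-head attention (80 layers, hidden size 8192, FP16), a 4096-token prompt holds about 10.7 GB of KV cache. Over a 200 GB/s link:

$$ T_{\text{transfer}} \approx \frac{10.7\,\text{GB}}{200\,\text{GB/s}} \approx 54\,\text{ms} $$

This ~54 ms is added directly to TTFT, which is the central cost of PD disaggregation. With grouped-query attention (LLaMA-3 70B uses 8 KV heads) the cache is about 8× smaller, roughly 1.3 GB, and the same link needs about 7 ms.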

6. Related Concepts (相关概念)

Chunked Prefill (分块预填充) — an alternative to PD disaggregation that interleaves prefill and decode on the same GPU; lower cost but less isolation.
Continuous Batching (连续批处理) — iteration-level scheduling; used within each pool in PD disaggregation.
KV Cache Migration (KV缓存迁移) — the core engineering problem: moving large tensors across GPUs with minimal TTFT penalty.
Mooncake / Splitwise / DistServe: research and production systems built around PD disaggregation.

nn.Module

Wed, 15 Apr 2026 00:00:00 GMT

I. `nn.Module` (神经网络模块基类)

nn.Module is ==the base class (基类) for all PyTorch models== — it provides ==parameter tracking== (参数追踪), ==device management== (设备管理), and ==serialization== (序列化), so subclasses only need to ==define __init__ and forward.==

1. Lifecycle (生命周期)

nn.Module enforces a two-method contract (两方法约定) that separates structure from computation.

1) `__init__` — Structure (结构定义)

super().__init__() creates ==PyTorch's internal registry== (内部注册表) for parameters, buffers, and submodules; every later attribute assignment in __init__ then registers submodules (子模块) and parameters (参数) through nn.Module's __setattr__. Skipping super().__init__() leaves the registry missing, so the first assignment of a submodule or nn.Parameter raises AttributeError.
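A quick way to see what happens when super().__init__() is skipped, as a minimal sketch: in current PyTorch releases the first submodule assignment raises AttributeError. The class name `Broken` is hypothetical.

```python
import torch.nn as nn


class Broken(nn.Module):
    def __init__(self):
        # super().__init__() deliberately omitted
        self.fc = nn.Linear(4, 2)


try:
    Broken()
    raised = False
except AttributeError as e:
    raised = True
    print(f"AttributeError: {e}")

print(raised)  # True
```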

2) `forward` — Computation (计算定义)

forward defines ==the computation graph (计算图) traced by autograd (自动微分)== on each call. Call the module as model(x) rather than model.forward(x) — the __call__ wrapper fires registered hooks (钩子) before and after forward.

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)  # auto-registered (自动注册)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.fc(x))

model = MLP(4, 2)
print(model(torch.randn(3, 4)).shape)  # torch.Size([3, 2])

2. Parameter Management (参数管理)

nn.Module distinguishes three kinds of named tensors stored inside a module.

1) `nn.Parameter` — Learnable (可学习参数)

nn.Parameter wraps a tensor so that requires_grad=True by default and it appears in model.parameters(). Use it for weights that the optimizer (优化器) must update — plain tensors assigned as attributes are invisible to the optimizer.

2) `register_buffer` — Non-learnable State (非可学习状态)

register_buffer attaches a tensor to the module that moves with .to(device) and is saved in state_dict by default, but is excluded from parameters(). Prefer it over plain attributes for fixed tensors like running statistics (运行统计量) in BatchNorm.

import torch
import torch.nn as nn

class NormLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))        # learnable (可学习)
        self.register_buffer("running_mean", torch.zeros(dim))  # non-learnable (非可学习)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.running_mean) * self.weight

model = NormLayer(4)
print(dict(model.named_parameters()).keys())  # weight only
print(dict(model.named_buffers()).keys())     # running_mean only

3) `named_parameters` vs `parameters` (命名参数 vs 参数迭代器)

parameters() yields tensors for the optimizer; named_parameters() yields (name, tensor) pairs for debugging or selective freezing (选择性冻结). Freeze a layer by setting param.requires_grad = False: autograd then computes no gradient for it, and optimizers skip parameters whose .grad is None.
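Selective freezing can be sketched with named_parameters() as follows. The model layout and the choice to freeze the first layer are assumptions for illustration; in an nn.Sequential, parameter names are prefixed with the layer index.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first Linear layer; its parameter names start with "0."
for name, param in model.named_parameters():
    if name.startswith("0."):
        param.requires_grad = False

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['2.weight', '2.bias']

model(torch.randn(3, 4)).sum().backward()
print(model[0].weight.grad)              # None: no gradient for frozen weights
print(model[2].weight.grad is not None)  # True
```

Passing only the trainable parameters to the optimizer, for example `filter(lambda p: p.requires_grad, model.parameters())`, makes the intent explicit.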

3. Hooks (钩子)

Hooks intercept (拦截) the forward and backward passes without modifying forward itself.

1) Forward Hook (前向钩子)

register_forward_hook(fn) fires after forward completes, receiving (module, input, output). Use it for activation logging (激活值记录) or feature extraction (特征提取) without altering model code.

2) Backward Hook (反向钩子)

register_full_backward_hook(fn) fires during the backward pass (反向传播), receiving (module, grad_input, grad_output). Use it to inspect or clip gradients (梯度裁剪) at the module level; the trade-off is a slight overhead on every backward call.

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
activations = {}

def fwd_hook(module, inp, out):
    activations["linear"] = out.detach()

handle = model.register_forward_hook(fwd_hook)
model(torch.randn(3, 4))
print(activations["linear"].shape)  # torch.Size([3, 2])
handle.remove()  # always remove hooks when done (用完及时移除)

Note: Always call handle.remove() after use — unreleased hooks accumulate (积累) and slow down every forward pass.
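The backward hook works the same way as the forward hook. The sketch below records gradient shapes with register_full_backward_hook; the input is created with requires_grad=True so that grad_input is populated, and the names `bwd_hook` and `grad_shapes` are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
grad_shapes = {}


def bwd_hook(module, grad_input, grad_output):
    # grad_output: gradients w.r.t. the module's outputs
    # grad_input: gradients w.r.t. the module's inputs
    grad_shapes["out"] = grad_output[0].shape
    grad_shapes["in"] = grad_input[0].shape


handle = model.register_full_backward_hook(bwd_hook)
x = torch.randn(3, 4, requires_grad=True)
model(x).sum().backward()
handle.remove()

print(grad_shapes["out"])  # torch.Size([3, 2])
print(grad_shapes["in"])   # torch.Size([3, 4])
```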

4. `state_dict` and Serialization (`state_dict` 与序列化)

state_dict is the canonical (规范) way to save and restore model weights in PyTorch.

1) Save and Load (保存与加载)

state_dict() returns an OrderedDict of all parameters and buffers keyed by their registered names. Prefer torch.save(model.state_dict(), path) over pickling the entire model — it decouples weights from the class definition (解耦权重与类定义), making loading robust across code refactors.

2) `load_state_dict` — `strict` Flag (`strict` 标志)

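load_state_dict(sd, strict=True) raises an error on any key mismatch (键不匹配); set strict=False when loading a pretrained backbone (预训练骨干网络) into a model with extra heads. Missing and unexpected keys are then not raised but returned in the result's missing_keys and unexpected_keys fields, so check them; shape mismatches on matching keys still raise.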

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
torch.save(model.state_dict(), "/tmp/weights.pt")

# Restore on any device (在任意设备恢复)
new_model = nn.Linear(4, 2)
new_model.load_state_dict(torch.load("/tmp/weights.pt", map_location="cpu"))
print(new_model(torch.randn(3, 4)).shape)  # torch.Size([3, 2])
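The strict flag can be illustrated by loading a small backbone into a larger model. This is a sketch with made-up layer sizes; the point is that load_state_dict returns the key differences, which are worth inspecting.

```python
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(4, 8))
with_head = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# strict=False tolerates keys that exist on only one side
result = with_head.load_state_dict(backbone.state_dict(), strict=False)
print(result.missing_keys)     # ['2.weight', '2.bias']
print(result.unexpected_keys)  # []
```

The head's weights stay at their random initialization, which the missing_keys list makes visible.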

5. Training vs Eval Mode (训练模式 vs 推理模式)

.train() and .eval() toggle (切换) the behavior of stateful layers like Dropout (随机失活) and BatchNorm (批归一化).

1) `.train()` / `.eval()` — Mode Switch (模式切换)

.train() enables Dropout and uses per-batch statistics in BatchNorm; ==.eval() disables Dropout and switches BatchNorm to its running statistics (运行统计量).==

==Forgetting .eval() at inference (推理) is one of the most common bugs in PyTorch== — Dropout randomly zeroes activations and BatchNorm uses noisy batch stats instead of the learned ones.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(0.5), nn.Linear(8, 2))

model.train()
out_train = model(torch.randn(3, 4))  # dropout active (激活)

model.eval()
with torch.no_grad():
    out_eval = model(torch.randn(3, 4))  # dropout disabled (禁用)

print(out_train.shape, out_eval.shape)  # both torch.Size([3, 2])

Note: ==torch.no_grad() disables gradient tracking (梯度追踪)== for memory efficiency but does not switch layer behavior — always pair it with .eval() at inference.

Summary: nn.Module separates structure (__init__) from computation (forward), tracks state via nn.Parameter and register_buffer, intercepts passes via hooks, serializes via state_dict, and gates layer behavior via .train() / .eval(). These five mechanisms come up in most PyTorch interview questions.

NumPy Overview

Wed, 15 Apr 2026 00:00:00 GMT

I. NumPy Overview (数值计算库)

==NumPy is a fundamental library for numerical computing (数值计算) in Python==. It provides efficient data structures and operations for handling large-scale numerical data.

1. What is NumPy (是什么)

NumPy is a library that provides the ndarray (多维数组) object for storing and manipulating numerical data efficiently.

1) Core Data Structure

ndarray (多维数组) A homogeneous (同质的) multi-dimensional array.

import numpy as np

x = np.array([[1, 2], [3, 4]])
print(x)
print(type(x))
print(x.shape)

2. Why Learn NumPy (为什么需要学习)

1) High Performance (高性能)

NumPy operations are implemented in C (底层C实现), making them much faster than Python loops.

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)  # element-wise addition (逐元素加法)

2) Vectorization (向量化)

Vectorization avoids explicit loops (避免显式循环), improving speed and readability.

import numpy as np

a = np.array([1, 2, 3])
print(a * 2)  # vectorized operation (向量化操作)

3) Foundation of AI Libraries (AI基础)

Many libraries build on or interoperate with NumPy arrays:

PyTorch (深度学习框架)
TensorFlow (深度学习框架)
SciPy (科学计算库)

3. Comparison

1) NumPy (数值计算库)

Focus: numerical computing (数值计算)
Core object: ndarray (多维数组)
Mainly used for CPU computation (CPU计算)

import numpy as np

x = np.array([1, 2, 3])
print(x * 2)

2) PyTorch (深度学习框架)

Focus: deep learning (深度学习)
Core object: Tensor (张量)
Supports GPU acceleration (GPU加速)
Supports automatic differentiation (自动求导)

import torch

x = torch.tensor([1, 2, 3], dtype=torch.float32)
print(x * 2)

4. Basic Operations (基础操作)

1) Create Arrays (创建数组)

import numpy as np

a = np.zeros((2, 3))   # all zeros (全0)
b = np.ones((2, 3))    # all ones (全1)
c = np.arange(0, 10)   # range (范围)

2) Shape and Reshape (形状操作)

import numpy as np

x = np.array([1, 2, 3, 4])
y = x.reshape(2, 2)

print(y)

3) Indexing (索引)

import numpy as np

x = np.array([[1, 2], [3, 4]])

print(x[0, 1])  # access element (访问元素)

II. NumPy Interview Questions

1. What is NumPy (是什么)

NumPy is a numerical computing library (数值计算库) in Python that provides the ndarray (多维数组) for efficient computation.

2. What is ndarray (多维数组)

ndarray is a homogeneous array (同质数组) that stores elements of the same data type in one memory buffer, contiguous (连续内存) for freshly created arrays.

3. Difference between list and ndarray (区别)

list allows mixed types (可混合类型), while ndarray requires a single type (统一类型) and supports vectorized operations (向量化运算).

4. What is vectorization (向量化)

Vectorization is performing operations on entire arrays without explicit loops (无需显式循环).

5. What is broadcasting (广播机制)

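Broadcasting lets arrays of different but compatible shapes (不同形状) be combined elementwise: shapes are aligned from the trailing dimension, and each pair of dimensions must be equal or one of them must be 1.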
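A small example of broadcasting, with array values chosen only for illustration: a column of shape (3, 1) and a row of shape (3,) combine into a (3, 3) result, while incompatible trailing dimensions raise ValueError.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,), treated as (1, 3)

total = col + row                   # both stretch to shape (3, 3)
print(total.shape)  # (3, 3)
print(total)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

try:
    np.ones((2, 3)) + np.ones((3, 2))  # trailing dims 3 vs 2: incompatible
    failed = False
except ValueError as e:
    failed = True
    print("ValueError:", e)
```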

6. Why is NumPy faster than Python (为什么更快)

Because it uses C implementation (底层C实现), contiguous memory (连续内存), and vectorization (向量化).

7. How to create an array (创建数组)

Use functions like np.array(), np.zeros(), np.ones(), np.arange().

8. What is shape (形状)

shape describes the dimensions (维度) of an array.

9. What is reshape (重塑)

reshape changes the shape (改变形状) of an array without changing its data; it returns a view when possible and requires the total number of elements to stay the same.

10. Difference between copy and view (拷贝 vs 视图)

copy creates new memory (新内存), while view shares memory (共享内存).
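The difference shows up as soon as one of the arrays is modified. In this sketch, basic slicing produces the view and .copy() produces the independent array.

```python
import numpy as np

x = np.arange(6)
view = x[1:4]          # basic slicing returns a view
copy = x[1:4].copy()   # an explicit copy owns its data

view[0] = 100          # writes through to x
print(x)     # [  0 100   2   3   4   5]
print(copy)  # [1 2 3]
print(np.shares_memory(x, view), np.shares_memory(x, copy))  # True False
```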

11. What is slicing (切片)

Slicing extracts a subset (子数组) of an array.

12. What is dtype (数据类型)

dtype defines the type (数据类型) of elements in an array.

13. What is axis (轴)

axis specifies the direction (方向) along which operations are performed.

14. What is flatten (展平)

flatten converts a multi-dimensional array (多维数组) into one dimension and always returns a copy; ravel does the same but returns a view when possible.

15. Difference between arange and linspace (区别)

arange uses step size (步长) and excludes the stop value, while linspace uses number of points (点的数量) and includes the stop value by default.
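Side by side on the same interval, as a short illustration:

```python
import numpy as np

a = np.arange(0, 1, 0.25)   # step of 0.25, stop excluded
l = np.linspace(0, 1, 5)    # 5 evenly spaced points, stop included

print(a.tolist())  # [0.0, 0.25, 0.5, 0.75]
print(l.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```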

16. What is indexing (索引)

Indexing accesses specific elements (访问元素) using positions.

17. What is boolean indexing (布尔索引)

Boolean indexing selects elements based on conditions (条件筛选).

18. What is aggregation (聚合)

Aggregation performs operations like sum, mean on arrays.
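Axis and aggregation are easiest to see together. In this sketch, the axis argument names the dimension that is collapsed by the reduction.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

print(x.sum())          # 21: all elements
print(x.sum(axis=0))    # [5 7 9]: collapse rows, one value per column
print(x.sum(axis=1))    # [ 6 15]: collapse columns, one value per row
print(x.mean(axis=1))   # [2. 5.]
```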

19. What is matrix multiplication (矩阵乘法)

Matrix multiplication follows:

$$ C = A \cdot B $$

using np.dot() or @.
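A short example contrasting matrix multiplication with elementwise multiplication, since the two are easy to confuse; the matrices are arbitrary.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

matmul = A @ B        # rows of A dotted with columns of B
elementwise = A * B   # multiplies matching positions only

print(matmul.tolist())       # [[19, 22], [43, 50]]
print(elementwise.tolist())  # [[5, 12], [21, 32]]
```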

20. What is the main purpose of NumPy (核心作用)

NumPy is used for efficient numerical computation (高效数值计算) and is the foundation of scientific computing (科学计算基础).

Dynamic Graph vs Static Graph

Wed, 15 Apr 2026 00:00:00 GMT

I. Dynamic Graph vs Static Graph (动态图 vs 静态图)

1. What is Dynamic Graph? (动态图是什么)

1) Definition (定义)

A Dynamic Computation Graph (动态计算图) is built during runtime (运行时构建).

👉 The graph changes as the program executes (执行时动态变化).

2) Characteristics (特点)

Defined on-the-fly (即时定义)
Flexible control flow (灵活控制流)
Easy debugging (易调试)

3) Example (示例)

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * x + 3

y.backward()
print(x.grad)  # dy/dx = 2x = 4

👉 The graph is created dynamically when y is computed.
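Flexible control flow can be sketched with an ordinary Python branch: the operations that autograd records depend on the runtime value of the input. The function `f` is a made-up example.

```python
import torch


def f(x):
    # The branch taken depends on the runtime value of x.
    if x.item() > 0:
        return x * x
    return -3 * x


grads = {}
for v in (2.0, -1.0):
    x = torch.tensor(v, requires_grad=True)
    f(x).backward()
    grads[v] = x.grad.item()

print(grads)  # {2.0: 4.0, -1.0: -3.0}
```

Each call builds a different graph, so the gradient is 2x on the positive branch and -3 on the other.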

2. What is Static Graph? (静态图是什么)

1) Definition (定义)

A Static Computation Graph (静态计算图) is defined before execution (运行前定义).

👉 The graph structure does not change during runtime (运行时不变).

2) Characteristics (特点)

Predefined graph (预先定义)
Optimized before execution (执行前优化)
Better performance (更高性能)

3) Conceptual Example (概念示例)

# Pseudo-code (伪代码)
x = placeholder()
y = x * x + 3

# Build graph first (先构建图)
graph = build_graph(x, y)

# Execute later (再执行)
run(graph, feed_dict={x: 2})
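The define-then-run idea can be made runnable with a toy graph in plain Python. This is only a sketch of the concept: `Node`, `placeholder`, and `run` are hypothetical helpers, not the API of any framework.

```python
class Node:
    """A graph node: a named input, a constant, or an operation on other nodes."""

    def __init__(self, op, inputs=(), name=None, value=None):
        self.op, self.inputs, self.name, self.value = op, inputs, name, value

    def __mul__(self, other):
        return Node("mul", (self, _wrap(other)))

    def __add__(self, other):
        return Node("add", (self, _wrap(other)))


def _wrap(v):
    return v if isinstance(v, Node) else Node("const", value=v)


def placeholder(name):
    return Node("input", name=name)


def run(node, feed_dict):
    """Evaluate the recorded graph with concrete inputs."""
    if node.op == "input":
        return feed_dict[node.name]
    if node.op == "const":
        return node.value
    a, b = (run(n, feed_dict) for n in node.inputs)
    return a * b if node.op == "mul" else a + b


# Build the graph first: no arithmetic happens here.
x = placeholder("x")
y = x * x + 3

# Execute later, possibly many times with different inputs.
print(run(y, {"x": 2}))  # 7
print(run(y, {"x": 5}))  # 28
```

Because the whole graph exists before any value flows through it, a real framework can optimize it once and reuse it for every input.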

3. Key Differences (核心区别)

Feature (特性)	Dynamic Graph	Static Graph
Build Time (构建时间)	Runtime (运行时)	Before execution (运行前)
Flexibility (灵活性)	High (高)	Low (低)
Debugging (调试)	Easy (容易)	Hard (困难)
Performance (性能)	Medium (中等)	High (高)

4. Mathematical View (数学视角)

A computation graph (计算图) represents operations:

$$ y = f(x) $$

Dynamic graph: build $f(x)$ during execution (运行时构建函数)
Static graph: define $f(x)$ before execution (运行前定义函数)

5. When to Use (何时使用)

1) Dynamic Graph

Research (科研)
Prototyping (快速实验)

2) Static Graph

Production systems (生产环境)
Performance-critical tasks (高性能场景)

6. One-Line Summary (一句话总结)

👉 Dynamic graph (动态图) = flexible and easy (灵活易用)
👉 Static graph (静态图) = efficient and optimized (高效优化)

PyTorch Overview

Wed, 15 Apr 2026 00:00:00 GMT

I. PyTorch Overview (概述)

1. What is PyTorch? (它是做什么的)

1) Definition (定义)

==PyTorch is a deep learning framework (深度学习框架) used for building and training neural networks (神经网络).==

👉 It provides:

Tensor computation (张量计算)
Automatic differentiation (自动求导)
GPU acceleration (GPU加速)

2) Core Idea (核心思想)

PyTorch uses a ==dynamic computation graph== (动态图计算图).

👉 This means:

The graph is built during execution (运行时构建)
Easier debugging (更易调试)

2. Why Learn PyTorch? (为什么学习)

1) Widely Used in Industry (工业应用广泛)

PyTorch is used in:

Computer Vision (计算机视觉)
Natural Language Processing (自然语言处理)
Large Language Models (大模型)

2) Easy to Use (易用性强)

Pythonic syntax (Python风格语法)
Flexible design (灵活设计)

3) Strong Ecosystem (生态系统强大)

Integrated with libraries (库集成)
Active community (活跃社区)

3. Comparison (对比)

1) PyTorch vs TensorFlow

PyTorch:
- Dynamic graph (动态图)
- Easier debugging (易调试)
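TensorFlow:
- Static graph (静态图) in TensorFlow 1.x; TensorFlow 2.x executes eagerly by default and builds graphs through tf.function
- More production tools (生产工具多)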

2) PyTorch vs NumPy

PyTorch:
- Supports GPU (支持GPU)
- Automatic differentiation (自动求导)
NumPy:
- CPU only (仅CPU)
- No gradients (无梯度)

4. Runnable Example (可运行示例)

1) Simple Linear Model (线性模型)

import torch
import torch.nn as nn
import torch.optim as optim

# Define model (定义模型)
model = nn.Linear(1, 1)

# Loss function (损失函数)
criterion = nn.MSELoss()

# Optimizer (优化器)
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training data (训练数据)
x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

# Training loop (训练循环)
for epoch in range(100):
    y_pred = model(x)
    loss = criterion(y_pred, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Output result (输出结果)
print("Weight:", model.weight.item())
print("Bias:", model.bias.item())
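For contrast with the "No gradients" point in the NumPy comparison, the sketch below fits the same data with NumPy only. The gradients of the mean squared error are derived and written by hand, which is the work that `loss.backward()` does automatically. The zero starting values for `w` and `b` are an assumption of this sketch (`nn.Linear` initializes randomly).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w, b = 0.0, 0.0     # assumed starting point for this sketch
lr = 0.01
losses = []

for epoch in range(100):
    y_pred = w * x + b
    error = y_pred - y
    losses.append(float(np.mean(error ** 2)))

    # Gradients of mean((w*x + b - y)^2) with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)

    # Plain gradient descent step, the counterpart of optimizer.step().
    w -= lr * grad_w
    b -= lr * grad_b

print("Weight:", w)
print("Bias:", b)
print("Loss:", losses[0], "->", losses[-1])
```

With only 100 small steps the parameters move toward the exact solution (weight 2, bias 0) without reaching it; more epochs or a larger learning rate would bring them closer.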

Triton Overview

Wed, 15 Apr 2026 00:00:00 GMT

I. Triton Overview (概述)

1. What is Triton? (它是做什么的)

1) Definition (定义)

Triton is a GPU programming language (GPU编程语言) and compiler (编译器).

👉 It is used to write custom GPU kernels (自定义GPU算子) using Python.

2) Core Function (核心功能)

Triton allows you to:

Control GPU computation (控制GPU计算)
Optimize memory access (优化内存访问)
Improve performance (提升性能)

2. Why Learn Triton? (为什么学习)

1) Performance Optimization (性能优化)

Deep learning systems (深度学习系统) rely heavily on GPU computation (GPU计算).

👉 Triton helps:

Reduce runtime (减少运行时间)
Improve efficiency (提升效率)

2) Simpler than CUDA (比CUDA简单)

CUDA: low-level programming (底层编程)
Triton: high-level abstraction (高层抽象)

👉 Easier to write and maintain (更易编写和维护)

3. Comparison (对比)

1) Triton vs CUDA

Triton:
- Python-based (基于Python)
- Automatic optimization (自动优化)
CUDA:
- C++-based (基于C++)
- Manual optimization (手动优化)

2) Triton vs PyTorch

Triton:
- Kernel-level programming (算子级编程)
- Used for optimization (用于优化)
PyTorch:
- Model-level framework (模型级框架)
- Used for training (用于训练)

4. Runnable Example (可运行示例)

1) Vector Addition (向量加法)

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n

    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(output_ptr + offsets, x + y, mask=mask)

def add(x, y):
    n = x.numel()
    output = torch.empty_like(x)

    grid = lambda meta: (triton.cdiv(n, meta['BLOCK_SIZE']),)

    add_kernel[grid](x, y, output, n, BLOCK_SIZE=1024)
    return output

# Test
x = torch.randn(1024, device='cuda')
y = torch.randn(1024, device='cuda')

z = add(x, y)
print(z[:5])
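The kernel's index arithmetic can be checked on the CPU. The following NumPy sketch is an emulation written for illustration, not Triton code: `add_reference` and its small `block_size` are made up here. It runs the "programs" one after another and applies the same offsets-and-mask logic, which shows why the mask is needed when the length is not a multiple of the block size.

```python
import numpy as np


def add_reference(x, y, block_size=4):
    """CPU emulation of the kernel's per-program logic (illustrative only)."""
    n = x.size
    out = np.empty_like(x)
    num_programs = -(-n // block_size)       # ceiling division, like triton.cdiv
    for pid in range(num_programs):          # on the GPU these run in parallel
        offsets = pid * block_size + np.arange(block_size)
        mask = offsets < n                   # guards the ragged final block
        valid = offsets[mask]
        out[valid] = x[valid] + y[valid]
    return out


x = np.arange(10, dtype=np.float32)
y = np.ones(10, dtype=np.float32)
z = add_reference(x, y)
print(z)
```

With 10 elements and a block size of 4, the third program covers offsets 8 to 11, and the mask drops offsets 10 and 11, which lie outside the arrays.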

nano-vllm

Fri, 20 Mar 2026 00:00:00 GMT

1. Initialization phase (`LLMEngine.__init__`)

User call: LLM(path, ...)

Step	Action
1	Parse Config
2	Spawn Tensor Parallel worker processes (when tp > 1)
3	Create ModelRunner (main process, rank 0)
4	Load Tokenizer
5	Create Scheduler

2. Model construction phase (`ModelRunner.__init__`)

ModelRunner(config, 0, events)

Step	Action
1	`dist.init_process_group`
2	`Qwen3ForCausalLM(hf_config)`: construct the model structure and allocate space for parameters
3	`load_model(model, path)`: load weights from safetensors
4	`warmup_model()`: run one prefill pass to trigger JIT compilation
5	`allocate_kv_cache()`: allocate KV cache blocks and attach them to the Attention layers
6	`capture_cudagraph()`: capture CUDA Graphs for decode (optional)

3. Request enqueue phase

User call: add_request(prompt) or generate(prompts)

Step	Action
1	Tokenizer encodes prompt → token_ids
2	Build a Sequence (including sampling_params)
3	Scheduler.add(seq) → enters the waiting queue

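4. Inference loop phase (step())

while not is_finished():
    step()

Inside each step:

Sub-phase	Responsibility
Scheduling	`scheduler.schedule()` → selects seqs and decides between prefill and decode
Data preparation	`prepare_prefill` / `prepare_decode` → input_ids, positions, slot_mapping, etc.
Model forward	`run_model` → embed → layers → lm_head → logits
Sampling	`sampler(logits)` → token_ids
Postprocessing	`scheduler.postprocess` → append_token, update state, free blocks

==Prefill exists because the prompt is fully known in advance: the GPU can process all of its tokens in parallel in a single pass, avoiding the large overhead of handling them serially, one token at a time.==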
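The loop can be sketched as a toy in plain Python. Everything below is hypothetical and heavily simplified: `ToySequence`, `toy_model`, and this `step` are stand-ins invented for illustration, not the actual scheduler or model code. The sketch only shows the shape of the loop: a waiting sequence gets one prefill step that consumes its whole prompt, and after that each decode step adds exactly one token per running sequence.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToySequence:
    prompt_ids: list
    max_new_tokens: int
    output_ids: list = field(default_factory=list)

    @property
    def finished(self):
        return len(self.output_ids) >= self.max_new_tokens


def toy_model(token_ids):
    # Stand-in for forward pass plus sampling: next token = last token + 1.
    return token_ids[-1] + 1


def step(waiting, running, log):
    if waiting:
        # Prefill: the whole prompt is known, so it is consumed in one step.
        seq = waiting.popleft()
        seq.output_ids.append(toy_model(seq.prompt_ids))
        running.append(seq)
        log.append(("prefill", len(seq.prompt_ids)))
    else:
        # Decode: every running sequence advances by exactly one token.
        for seq in running:
            seq.output_ids.append(toy_model(seq.prompt_ids + seq.output_ids))
        log.append(("decode", len(running)))
    # Postprocess: retire finished sequences.
    for seq in [s for s in running if s.finished]:
        running.remove(seq)


seq_a = ToySequence([1, 2, 3], max_new_tokens=3)
seq_b = ToySequence([10, 11], max_new_tokens=2)
waiting, running, log = deque([seq_a, seq_b]), [], []

while waiting or running:
    step(waiting, running, log)

print(log)
print(seq_a.output_ids, seq_b.output_ids)
```

In this toy, a 3-token prompt costs one prefill step, while 3 generated tokens cost one prefill plus two decode steps, which mirrors why the known prompt is handled in a single parallel pass.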

5. Output phase

When generate() returns

Step	Action
1	Collect outputs by seq_id
2	Tokenizer.decode(token_ids) → text
3	Return `[{"text": ..., "token_ids": ...}, ...]`

Phase relationship diagram

┌────────────────────────────────────────────────────────────────────────────────┐
│  1. Initialization  Config + ModelRunner + Scheduler + Tokenizer               │
└────────────────────────────────────────────────────────────────────────────────┘
                                        │
┌────────────────────────────────────────────────────────────────────────────────┐
│  2. Model build     build model → load weights → warmup → KV cache → CUDAGraph │
└────────────────────────────────────────────────────────────────────────────────┘
                                        │
┌────────────────────────────────────────────────────────────────────────────────┐
│  3. Enqueue         prompt → tokenize → Sequence → Scheduler.add()             │
└────────────────────────────────────────────────────────────────────────────────┘
                                        │
┌────────────────────────────────────────────────────────────────────────────────┐
│  4. Inference loop  schedule → prepare → run_model → sampler → postprocess     │
│                     (prefill and decode alternate)                             │
└────────────────────────────────────────────────────────────────────────────────┘
                                        │
┌────────────────────────────────────────────────────────────────────────────────┐
│  5. Output          collect finished seqs → decode → return texts              │
└────────────────────────────────────────────────────────────────────────────────┘

Quick reference table

Phase	Entry point	Main actions
1 Initialization	`LLM(...)`	Config, processes, Tokenizer, Scheduler
2 Model construction	`ModelRunner.__init__`	Model structure, weight loading, KV cache, CUDAGraph
3 Request enqueue	`add_request` / `generate`	Tokenize → Sequence → waiting
4 Inference loop	`step()`	schedule → prepare → run → sample → postprocess
5 Output	`generate` returns	Collect, decode, return text

itertools.count()

Fri, 20 Mar 2026 00:00:00 GMT

I. `itertools.count()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <span style="color:#E8600A;font-weight:700">itertools.count() (计数迭代器)</span> creates an <span style="color:#E8600A;font-weight:700">infinite iterator (无限迭代器)</span> that generates evenly spaced numbers (等间距数字) starting from a specified value. </div>

1. Basic Usage

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85"> <span style="color:#E8600A;font-weight:700">itertools.count() (计数迭代器)</span> generates an infinite arithmetic progression. <span style="color:#2980B9;font-weight:700">Use it when you need an endless sequence of numbers</span> for generating IDs, indices, or combining with other iterators. <span style="color:#C0392B;font-weight:700">Warning: Always provide a termination condition when iterating over count()</span> to avoid infinite loops.</div>

1) Function Parameters

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85"> <span style="color:#E8600A;font-weight:700">itertools.count(start=0, step=1) (起始值, 步长)</span> accepts two numeric parameters. <span style="color:#2980B9;font-weight:700">start (起始值)</span> defines the first value in the sequence, while <span style="color:#2980B9;font-weight:700">step (步长)</span> determines the increment between consecutive values. <span style="color:#E8600A;font-weight:700">Both parameters can be integers, floats, or any numeric type</span> that supports addition.</div>

import itertools

# Default: start=0, step=1
counter = itertools.count()
print(next(counter))  # 0
print(next(counter))  # 1

# Custom start and step
counter = itertools.count(start=5, step=3)
print(next(counter))  # 5
print(next(counter))  # 8
print(next(counter))  # 11

# Using float step
counter = itertools.count(start=1.0, step=0.5)
print([next(counter) for _ in range(3)])  # [1.0, 1.5, 2.0]

<div style="background:#F5F5F5;border-left:4px solid #C0392B;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85;font-size:0.80em"><span style="color:#C0392B;font-weight:700">Note: </span> Using floating-point steps may lead to <span style="color:#C0392B;font-weight:700">precision accumulation errors</span> over many iterations. Consider using integers and dividing when precise decimal values are needed.</div>
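The note above can be checked with a short experiment. This is a sketch with a made-up step of 0.1: the first list lets `count()` add the float step repeatedly, the second counts with integers and divides afterwards, so each value is rounded only once.

```python
import itertools

# Accumulating a float step: rounding error can build up at every addition.
accumulated = list(itertools.islice(itertools.count(0.0, 0.1), 11))

# Integer counter scaled afterwards: each value is rounded only once.
scaled = [i / 10 for i in itertools.islice(itertools.count(), 11)]

print(accumulated[-1])   # may differ from 1.0 in the last digits
print(scaled[-1])
```

The same idea works for any step that is a simple fraction: keep the counter integral and convert at the point of use.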

2. Practical Applications

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85"> <span style="color:#E8600A;font-weight:700">itertools.count() (计数迭代器)</span> shines in scenarios requiring automatic indexing or sequence generation. <span style="color:#2980B9;font-weight:700">Common use cases include adding line numbers to data, generating unique IDs, and creating paginated sequences.</span> Its infinite nature makes it particularly useful when the iteration length is determined by another iterable.</div>

1) Adding Indices to Data

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85"> Combine <span style="color:#E8600A;font-weight:700">count()</span> with <span style="color:#2980B9;font-weight:700">zip()</span> to <span style="color:#E8600A;font-weight:700">automatically number items (自动编号项目)</span> in any iterable. <span style="color:#2980B9;font-weight:700">This pattern is memory-efficient because it generates indices on-the-fly</span> rather than storing them in a list.</div>

import itertools

# Adding line numbers to text lines
lines = ["First line", "Second line", "Third line"]
numbered_lines = zip(itertools.count(1), lines)

for num, line in numbered_lines:
    print(f"{num}: {line}")
# Output:
# 1: First line
# 2: Second line
# 3: Third line

# Creating dictionary with auto-generated keys
names = ["Alice", "Bob", "Charlie"]
user_dict = dict(zip(itertools.count(100), names))
print(user_dict)  # {100: 'Alice', 101: 'Bob', 102: 'Charlie'}

2) Generating Infinite Sequences

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85"> Use <span style="color:#E8600A;font-weight:700">count()</span> with <span style="color:#2980B9;font-weight:700">itertools.islice()</span> to <span style="color:#E8600A;font-weight:700">generate finite slices of arithmetic sequences (生成有限段的算术序列)</span>. <span style="color:#2980B9;font-weight:700">This approach is ideal for generating test data, mathematical sequences, or pagination</span> where you need predictable, spaced values.</div>

import itertools

# First 5 multiples of 10
multiples_of_10 = itertools.islice(itertools.count(10, 10), 5)
print(list(multiples_of_10))  # [10, 20, 30, 40, 50]

# Skip first 3, then take 4 numbers
skip_take = itertools.islice(itertools.count(100, -1), 3, 7)
print(list(skip_take))  # [97, 96, 95, 94]

# Generating powers of 2 using indices
powers_of_2 = (2 ** i for i in itertools.islice(itertools.count(), 6))
print(list(powers_of_2))  # [1, 2, 4, 8, 16, 32]

<span style="color:#2980B9;font-weight:700">Pattern (模式)</span>	<span style="color:#2980B9;font-weight:700">Code Example (代码示例)</span>	<span style="color:#2980B9;font-weight:700">Use Case (使用场景)</span>
<span style="color:#2980B9;font-weight:700">1-based indexing</span>	`zip(itertools.count(1), data)`	Displaying numbered lists, generating SQL IDs
<span style="color:#E8600A;font-weight:700">Staggered steps</span>	`itertools.islice(itertools.count(0, 5), 10)`	Creating evenly spaced time intervals, pagination offsets
<span style="color:#E8600A;font-weight:700">Descending sequences</span>	`itertools.islice(itertools.count(100, -1), 5)`	Generating countdowns, reverse numbering

3. Performance Comparison

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85"> <span style="color:#E8600A;font-weight:700">itertools.count() (计数迭代器)</span> is implemented in C, making it <span style="color:#E8600A;font-weight:700">significantly faster than manual Python counter loops (比手动Python计数循环快得多)</span>. <span style="color:#2980B9;font-weight:700">Choose count() when you need infinite sequences or functional composition</span>, and use range() for simple finite sequences.</div>

import itertools
import time

# Manual counter (Python loop)
def manual_counter(n):
    result = []
    i = 0
    while i < n:
        result.append(i)
        i += 1
    return result

# Count with islice (C implementation)
def count_islice(n):
    return list(itertools.islice(itertools.count(), n))

# Range (most optimized for finite sequences)
def range_approach(n):
    return list(range(n))

# Illustrative timings for n = 10,000,000 (machine-dependent; measure with timeit)
# manual_counter: ~0.85s
# count_islice: ~0.42s
# range_approach: ~0.28s

<span style="color:#2980B9;font-weight:700">Method (方法)</span>	<span style="color:#2980B9;font-weight:700">Implementation (实现)</span>	<span style="color:#2980B9;font-weight:700">Best For (最佳场景)</span>	<span style="color:#C0392B;font-weight:700">Limitation (限制)</span>
<span style="color:#2980B9;font-weight:700">Manual counter</span>	Python loop	Simple educational examples	Slow for large iterations
<span style="color:#E8600A;font-weight:700">`itertools.count()`</span>	C implementation	Infinite sequences, functional pipelines	Requires `islice` for finite use
<span style="color:#E8600A;font-weight:700">`enumerate()`</span>	Built-in function	Indexing existing iterables	Custom `start` supported, but the step is always 1
<span style="color:#E8600A;font-weight:700">`range()`</span>	C implementation	Simple finite sequences	Cannot be infinite
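A brief sketch of how `enumerate()` and `count()` relate, using a made-up list `data`. `enumerate()` accepts a `start` argument, so it covers simple numbering; `count()` is only needed once a custom step is involved.

```python
import itertools

data = ["a", "b", "c"]

# Both produce 1-based numbering.
with_enumerate = list(enumerate(data, start=1))
with_count = list(zip(itertools.count(1), data))

# A custom step still needs count(), because enumerate always advances by 1.
stepped = list(zip(itertools.count(0, 10), data))

print(with_enumerate)
print(stepped)
```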

dataclass

Thu, 19 Mar 2026 00:00:00 GMT

I. Dataclass

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Python's <code>@dataclass</code> is a <span style="color:#E8600A;font-weight:700">decorator (装饰器)</span> that automatically generates special methods like <code>__init__</code> and <code>__repr__</code> for classes primarily used to <span style="color:#2980B9;font-weight:700">store data</span>. It reduces boilerplate code by letting you <span style="color:#2980B9;font-weight:700">declare fields as class variables</span> with type annotations. A dataclass makes your code more <span style="color:#E8600A;font-weight:700">readable and maintainable (可读性和可维护性)</span> by eliminating repetitive method definitions. </div>

1. Basic Dataclass Definition

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85"> The <span style="color:#E8600A;font-weight:700">@dataclass decorator (装饰器)</span> automatically adds <span style="color:#2980B9;font-weight:700">__init__</span>, <span style="color:#2980B9;font-weight:700">__repr__</span>, and <span style="color:#2980B9;font-weight:700">__eq__</span> methods based on the class variables you define with <span style="color:#E8600A;font-weight:700">type hints (类型提示)</span>. Use this when you need a simple container for data without writing repetitive constructor code. </div> The @dataclass decorator auto-generates:

<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__init__(self, x, y)</code>: constructor
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__repr__</code>: readable string representation
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__eq__</code>: field-by-field equality comparison

1) Basic Implementation

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    email: str = "unknown@email.com"  # Default value

# Usage example
person1 = Person("Alice", 25, "alice@email.com")
person2 = Person("Bob", 30)  # Uses default email

print(person1)  # Automatically generated __repr__
print(person1 == person2)  # Automatically generated __eq__

<div style="background:#F5F5F5;border-left:4px solid #C0392B;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85;font-size:0.80em"><span style="color:#C0392B;font-weight:700">Note: </span> Fields without default values <span style="color:#C0392B;font-weight:700">must come before</span> fields with default values, otherwise the decorator raises a <span style="color:#C0392B;font-weight:700">TypeError (类型错误)</span> when the class is defined. </div>
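The ordering rule can be observed directly. In this sketch (the class `BadOrder` is made up for illustration), the error is raised while the class is being defined, before any instance is created, and it is a `TypeError`:

```python
from dataclasses import dataclass

try:
    @dataclass
    class BadOrder:
        email: str = "unknown@email.com"
        name: str          # non-default field after a default field
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)
```

The rule mirrors ordinary function signatures, because the generated `__init__` takes the fields as parameters in declaration order.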

2. Field Customization

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85"> The <span style="color:#E8600A;font-weight:700">field() function (字段函数)</span> provides <span style="color:#2980B9;font-weight:700">fine-grained control</span> over individual dataclass fields, allowing you to set <span style="color:#E8600A;font-weight:700">default factories (默认工厂)</span>, exclude fields from comparisons or from the constructor, or hide fields from the generated <span style="color:#2980B9;font-weight:700">__repr__</span> output. </div>

1) Using field() with Parameters

from dataclasses import dataclass, field
import random
from typing import List

@dataclass
class Student:
    name: str
    student_id: int = field(init=False)  # Not in __init__
    grades: List[int] = field(default_factory=list)  # Mutable default
    _internal_id: int = field(default=0, repr=False)  # Hidden in __repr__
    
    def __post_init__(self):
        # Runs automatically at the end of the generated __init__
        self.student_id = random.randint(1000, 9999)
        self._internal_id = hash(self.name)

# Usage example
student = Student("Alice")
student.grades.append(95)  # Works with mutable default
print(student)  # Shows name, student_id, and grades, but not _internal_id

<div style="background:#F5F5F5;border-left:4px solid #C0392B;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85;font-size:0.80em"><span style="color:#C0392B;font-weight:700">Note: </span> Always use <span style="color:#E8600A;font-weight:700">default_factory (默认工厂)</span> for mutable types like lists or dictionaries. Writing <code>grades: List[int] = []</code> is rejected with a ValueError when the class is defined, because a plain mutable default would make all instances <span style="color:#C0392B;font-weight:700">share the same list</span>. </div>
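A short sketch of both sides of this rule, with made-up classes `BadStudent` and `GoodStudent`. A list literal as a default is refused at class-definition time, while `default_factory=list` gives every instance its own list:

```python
from dataclasses import dataclass, field
from typing import List

try:
    @dataclass
    class BadStudent:
        grades: List[int] = []      # rejected when the class is defined
except ValueError as exc:
    rejected = str(exc)
else:
    rejected = None


@dataclass
class GoodStudent:
    grades: List[int] = field(default_factory=list)


a, b = GoodStudent(), GoodStudent()
a.grades.append(95)

print(rejected)
print(a.grades, b.grades)
```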

3. Dataclass Parameters

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85"> The <span style="color:#E8600A;font-weight:700">@dataclass decorator</span> accepts parameters that <span style="color:#2980B9;font-weight:700">control which methods are generated</span>. Use <span style="color:#2980B9;font-weight:700">frozen=True</span> for immutable objects, <span style="color:#2980B9;font-weight:700">order=True</span> for sorting capabilities, and <span style="color:#2980B9;font-weight:700">kw_only=True</span> to enforce keyword arguments. </div>

1) Configuration Options

from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Point:
    x: int
    y: int

@dataclass(kw_only=True)  # Python 3.10+
class Configuration:
    host: str
    port: int = 8080

# Usage examples
p1 = Point(1, 2)
p2 = Point(1, 3)
# p1.x = 5  # This would raise FrozenInstanceError
print(p1 < p2)  # Works because order=True

# Must use keyword arguments
config = Configuration(host="localhost", port=3000)
# config = Configuration("localhost", 3000)  # This would fail

<div style="background:#F5F5F5;border-left:4px solid #C0392B;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85;font-size:0.80em"><span style="color:#C0392B;font-weight:700">Note: </span> When using <span style="color:#E8600A;font-weight:700">frozen=True</span>, the dataclass becomes <span style="color:#E8600A;font-weight:700">immutable (不可变的)</span> — you cannot modify attributes after creation. This is ideal for <span style="color:#2980B9;font-weight:700">configuration objects</span> or <span style="color:#2980B9;font-weight:700">value objects</span>. </div>

4. Inheritance with Dataclasses

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85"> Dataclasses <span style="color:#2980B9;font-weight:700">support inheritance (继承)</span>, with fields from parent classes being combined with child class fields. Use this when you need to <span style="color:#2980B9;font-weight:700">extend data containers</span> while maintaining the automatic method generation. </div>

1) Extending Dataclasses

from dataclasses import dataclass

@dataclass
class Vehicle:
    brand: str
    model: str
    year: int

@dataclass
class Car(Vehicle):
    doors: int
    electric: bool = False

# Usage example
my_car = Car("Tesla", "Model 3", 2023, doors=4, electric=True)
print(my_car)  # Includes all fields from both classes

<div style="background:#F5F5F5;border-left:4px solid #C0392B;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85;font-size:0.80em"><span style="color:#C0392B;font-weight:700">Note: </span> When inheriting, the <span style="color:#E8600A;font-weight:700">field order matters</span>: child class fields are appended after parent fields. If any parent field has a default value, every field the child adds must also have a default (or be keyword-only), otherwise a TypeError is raised. </div>

5. Comparison Table: Regular Class vs Dataclass

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;line-height:1.85"> This table compares the <span style="color:#E8600A;font-weight:700">boilerplate code (样板代码)</span> required for a simple data container using a regular class versus a dataclass. </div>

Feature	Regular Class	Dataclass
<span style="color:#2980B9;font-weight:700">Lines of Code</span>	~10-15 lines	~3-5 lines
<span style="color:#E8600A;font-weight:700">__init__ method</span>	Manual implementation	Auto-generated
<span style="color:#E8600A;font-weight:700">__repr__ method</span>	Manual implementation	Auto-generated
<span style="color:#E8600A;font-weight:700">__eq__ method</span>	Manual implementation	Auto-generated
<span style="color:#2980B9;font-weight:700">Type hints</span>	Optional in body	Required for fields
<span style="color:#2980B9;font-weight:700">Default values</span>	In __init__ method	Direct field assignment
<span style="color:#C0392B;font-weight:700">Mutable defaults</span>	Safe with proper code	Must use default_factory

1) Code Comparison Example

from dataclasses import dataclass

# Regular class: __init__, __repr__, and __eq__ written by hand
class RegularPerson:
    def __init__(self, name: str, age: int, email: str = "unknown"):
        self.name = name
        self.age = age
        self.email = email
    
    def __repr__(self):
        return f"RegularPerson(name='{self.name}', age={self.age}, email='{self.email}')"
    
    def __eq__(self, other):
        if not isinstance(other, RegularPerson):
            return False
        return (self.name, self.age, self.email) == (other.name, other.age, other.email)

# Dataclass - 4 lines
@dataclass
class DataclassPerson:
    name: str
    age: int
    email: str = "unknown"

Git LFS (Large File Storage)

Sat, 14 Mar 2026 00:00:00 GMT

I. `git lfs install` (Git LFS 初始化命令)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <span style="color:#E8600A;font-weight:700">Git LFS (Large File Storage，大文件存储)</span> is an extension of <span style="color:#E8600A;font-weight:700">Git (版本控制系统)</span> used to manage large files such as datasets, models, or binaries. The command <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">git lfs install</code> initializes <span style="color:#E8600A;font-weight:700">Git LFS</span> on your machine by configuring Git hooks and global settings. <span style="color:#2980B9">In simple words</span>, it prepares Git so that large files will automatically be handled by the LFS system instead of normal Git storage. </div>

<span style="color:#E8600A">1.</span> What `git lfs install` Does (命令作用)

Running

git lfs install

performs several setup steps:

Step	Explanation
Install hooks	Adds <span style="color:#E8600A;font-weight:700">Git Hooks (Git钩子)</span> such as `pre-push`
Configure Git	Enables <span style="color:#E8600A;font-weight:700">LFS Filters (LFS过滤器)</span>
Activate LFS	Allows Git to replace large files with <span style="color:#E8600A;font-weight:700">pointer files (指针文件)</span>

After installation, Git will automatically do the following for files that match a tracked pattern (Git LFS does not detect large files by size on its own):

store them in <span style="color:#E8600A;font-weight:700">LFS storage (LFS存储)</span>
keep only small pointer references in the repository

<span style="color:#E8600A">2.</span> How Git LFS Works (工作原理)

Normal Git workflow:

file → git repository

Git LFS workflow:

large file → LFS server
pointer file → git repository

Example pointer file:

version https://git-lfs.github.com/spec/v1
oid sha256:xxxx
size 104857600

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"> <span style="color:#E8600A;font-weight:700">Note: </span> A <span style="color:#E8600A;font-weight:700">pointer file (指针文件)</span> is a small text file that references the real large file stored in the LFS server. </div>
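To make the pointer format concrete, the sketch below builds pointer text for some bytes in Python. The helper `make_lfs_pointer` is hypothetical and written only for illustration (Git LFS generates pointers itself); it follows the three-line layout shown above, where `oid` is the SHA-256 hash of the file content and `size` is its length in bytes.

```python
import hashlib


def make_lfs_pointer(content: bytes) -> str:
    """Build pointer text for the given file content (illustrative helper)."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


pointer = make_lfs_pointer(b"hello")
print(pointer)
```

Because the pointer contains only a hash and a size, it stays a few hundred bytes no matter how large the real file is.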

<span style="color:#E8600A">3.</span> Typical Usage Workflow (常见使用流程)

1) Install Git LFS

git lfs install

2) Track large files

Example:

git lfs track "*.pt"

This tells Git to manage .pt files using LFS by adding the rule `*.pt filter=lfs diff=lfs merge=lfs -text` to .gitattributes.

3) Commit tracking rules

git add .gitattributes
git commit -m "track model files with LFS"

4) Add large file

git add model.pt
git commit -m "add model"
git push

On push, the file is uploaded to <span style="color:#E8600A;font-weight:700">LFS storage (LFS存储)</span>, and only its pointer file is committed to the repository.

<span style="color:#E8600A">4.</span> When You Need `git lfs` (什么时候需要)

You should use Git LFS when managing:

File Type	Example
Machine learning models	`.pt`, `.pth`
Datasets	`.csv`, `.parquet`
Game assets	textures, audio
Large binaries	compiled files

<span style="color:#C0392B;font-weight:600">Warning:</span> Normal Git performs poorly with very large files because every version of each file is kept in the repository history, so every clone has to download all of them.

<span style="color:#E8600A">5.</span> Verify Installation (验证安装)

Check whether Git LFS is installed:

git lfs version

Example output:

git-lfs/3.4.0

Check tracked files:

git lfs ls-files

NumPy Array Creation

Thu, 12 Mar 2026 00:00:00 GMT

I. NumPy Array Creation (数组创建)

1. Basic Array Creation (基本数组创建)

1) `np.array()` — Create from List (从列表创建)

The np.array() function converts a Python List (列表) or Tuple (元组) into an ndarray (多维数组).

import numpy as np

# 1D Array (一维数组)
a = np.array([1, 2, 3, 4, 5])
print(a)          # [1 2 3 4 5]
print(a.dtype)    # int64 on most platforms

# 2D Array (二维数组)
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b)
# [[1 2 3]
#  [4 5 6]]
print(b.shape)    # (2, 3)

# Specify dtype (指定数据类型)
c = np.array([1, 2, 3], dtype=np.float32)
print(c)          # [1. 2. 3.]

2) `np.zeros()` / `np.ones()` / `np.full()` — Constant Arrays (常量数组)

These functions create arrays filled with a Constant Value (常量值).

import numpy as np

# All zeros (全零数组)
a = np.zeros((2, 3))
print(a)
# [[0. 0. 0.]
#  [0. 0. 0.]]

# All ones (全一数组)
b = np.ones((3, 2), dtype=int)
print(b)
# [[1 1]
#  [1 1]
#  [1 1]]

# Fill with a specific value (填充指定值)
c = np.full((2, 2), 7)
print(c)
# [[7 7]
#  [7 7]]

# Like versions: same shape as another array (与另一个数组形状相同)
d = np.zeros_like(b)
print(d)
# [[0 0]
#  [0 0]]

3) `np.arange()` / `np.linspace()` — Sequence Arrays (序列数组)

Use these to generate Evenly Spaced (等间隔) values.

import numpy as np

# arange: similar to range() but returns ndarray
# arange(start, stop, step)
a = np.arange(0, 10, 2)
print(a)    # [0 2 4 6 8]

# Float step (浮点步长)
b = np.arange(0, 1, 0.3)
print(b)    # [0.  0.3 0.6 0.9]

# linspace: evenly spaced over an interval (等分区间)
# linspace(start, stop, num)
c = np.linspace(0, 1, 5)
print(c)    # [0.   0.25 0.5  0.75 1.  ]

# logspace: logarithmically spaced (对数等间隔)
d = np.logspace(0, 3, 4)   # 10^0 to 10^3
print(d)    # [   1.   10.  100. 1000.]

4) `np.eye()` / `np.identity()` — Identity Matrix (单位矩阵)

import numpy as np

# Identity Matrix (单位矩阵)
a = np.eye(3)
print(a)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# Offset diagonal (偏移对角线)
b = np.eye(3, k=1)
print(b)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
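The heading also names `np.identity()`, which the example above does not show. As a brief sketch: `np.identity(n)` always returns a square matrix, while `np.eye()` is the more general form that can also be rectangular or use a shifted diagonal.

```python
import numpy as np

# identity(n): always a square n x n matrix with ones on the main diagonal.
i3 = np.identity(3)

# eye(N, M): may be rectangular (and accepts k to shift the diagonal).
rect = np.eye(2, 3)

print(i3)
print(rect)
```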

5) `np.empty()` / `np.diag()` — Other Creation Methods (其他创建方式)

import numpy as np

# empty: uninitialized (未初始化), contents are whatever was already in memory
a = np.empty((2, 2))
print(a)    # arbitrary values; fast because nothing is written

# diag: create diagonal matrix or extract diagonal (对角矩阵)
b = np.diag([1, 2, 3])
print(b)
# [[1 0 0]
#  [0 2 0]
#  [0 0 3]]

# Extract diagonal from a matrix (提取对角线)
c = np.array([[1, 2], [3, 4]])
print(np.diag(c))    # [1 4]

NumPy Array Operations

Thu, 12 Mar 2026 00:00:00 GMT

II. NumPy Array Operations (数组操作)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Array operations (数组操作) let you change an array's <span style="color:#E8600A;font-weight:700">shape (形状)</span>, <span style="color:#E8600A;font-weight:700">dimensions (维度)</span>, and structure — without touching the underlying data values. These are the core tools for preparing data before computation. </div>

1. `reshape()` — Change Shape Without Copying (改变形状)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Returns a view of the same data with a different shape whenever the memory layout allows it, and a copy otherwise. The total number of elements must stay the same.

import numpy as np

a = np.arange(12)          # [0, 1, 2, ..., 11]
b = a.reshape(3, 4)        # 3 rows × 4 cols
c = a.reshape(2, 3, 2)     # 3-D: 2×3×2

# Use -1 to let NumPy infer one dimension
d = a.reshape(4, -1)       # → shape (4, 3)
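Whether `reshape()` returned a view or a copy can be checked with `np.shares_memory()`. In this sketch, reshaping a contiguous array gives a view, so writes go through to the original, while flattening a transposed array forces a copy:

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)
view_shares = np.shares_memory(a, b)      # contiguous input: b is a view

b[0, 0] = 99                              # writing through the view changes a

t = a.reshape(3, 4).T                     # transposed view, not contiguous
flat = t.reshape(-1)                      # this layout cannot be a view
copy_shares = np.shares_memory(t, flat)

print(view_shares, a[0], copy_shares)
```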

2. `resize()` — Reshape In-Place (原地改变形状)

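<span style="color:#E8600A;font-weight:700">Core idea:</span> Like reshape, but the method modifies the array <strong>in-place</strong> and can change the total element count: missing entries are filled with zeros and extra entries are truncated. The separate function np.resize() instead returns a new array filled with repeated copies of the data.

a = np.array([1, 2, 3, 4])
a.resize(2, 3)   # pads with zeros: [[1,2,3],[4,0,0]]
print(a)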

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">resize() modifies the original array permanently — use with caution.</span></div>
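The two forms of resize behave differently when the new shape needs more elements, which the following sketch compares side by side. The `refcheck=False` argument is used here only so the snippet also runs in environments that hold extra references to the array, such as interactive shells:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Function form: returns a new array and repeats the data to fill it.
repeated = np.resize(a, (2, 3))

# Method form: changes a itself and pads the new slots with zeros.
# refcheck=False skips the reference check that can raise ValueError
# when other objects still refer to the array.
a.resize((2, 3), refcheck=False)

print(repeated)
print(a)
```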

3. `flatten()` — Collapse to 1-D (展平为一维)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Always returns a copy as a flat 1-D array.

b = np.array([[1, 2], [3, 4]])
print(b.flatten())    # [1 2 3 4]
print(b.ravel())      # [1 2 3 4], same values, but a view when possible

Method	Returns	Writes to result affect original?
`flatten()`	Copy	No
`ravel()`	View when possible, otherwise a copy	Yes (if view)

4. `transpose()` — Swap Axes (转置)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Swap rows and columns (or any axes in higher dimensions). Shortcut: .T

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

print(a.T)                  # shape (3, 2)
print(a.transpose())        # same as a.T

# For 3-D: specify axis order
c = np.ones((2, 3, 4))
c.transpose(2, 0, 1)        # new shape: (4, 2, 3)

5. `concatenate()` — Join Arrays (拼接数组)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Join a sequence of arrays along an existing axis (轴).

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# axis=0: stack rows (垂直拼接)
np.concatenate([a, b], axis=0)   # shape (3, 2)

# axis=1: stack columns (水平拼接)
c = np.array([[7], [8]])
np.concatenate([a, c], axis=1)   # shape (2, 3)

6. `split()` — Divide an Array (分割数组)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Split an array into multiple sub-arrays along an axis.

a = np.arange(12).reshape(4, 3)

# Split into 2 equal halves along rows (axis=0)
parts = np.split(a, 2, axis=0)   # [shape(2,3), shape(2,3)]

# Split at specific indices
np.split(a, [1, 3], axis=0)      # rows 0, rows 1–2, rows 3
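`np.split` with an integer only works when the axis divides evenly. When it does not, `np.array_split` is one option, sketched here with an illustrative 10-element array:

```python
import numpy as np

a = np.arange(10)

try:
    np.split(a, 3)               # 10 elements cannot be divided into 3 equal parts
    split_failed = False
except ValueError:
    split_failed = True

parts = np.array_split(a, 3)     # uneven pieces are allowed here
sizes = [p.size for p in parts]
print(split_failed, sizes)       # True [4, 3, 3]
```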

7. Quick Comparison Table

Function (函数)	In-place?	Returns
`reshape(shape)`	No	View when possible, else copy
`resize(shape)`	<span style="color:#C0392B;font-weight:600">Yes</span>	None (modifies array)
`flatten()`	No	Copy, 1-D
`ravel()`	No	View when possible, 1-D
`transpose()` / `.T`	No	View
`concatenate()`	No	New array
`split()`	No	List of views

NumPy Math Operations

Thu, 12 Mar 2026 00:00:00 GMT

III. NumPy Math Operations (数学运算)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> NumPy math functions are <span style="color:#E8600A;font-weight:700">element-wise (逐元素)</span> — they operate on each element independently and return a new array of the same shape. They are implemented in C, making them far faster than Python loops. </div>

1. Basic Arithmetic (四则运算)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Operator symbols (+, -, *, /) and their functional equivalents work element-by-element.

import numpy as np

a = np.array([10, 20, 30])
b = np.array([1,  2,  3])

np.add(a, b)        # [11 22 33]  → same as a + b
np.subtract(a, b)   # [ 9 18 27]  → same as a - b
np.multiply(a, b)   # [10 40 90]  → same as a * b
np.divide(a, b)     # [10. 10. 10.] → same as a / b

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Broadcasting (广播机制) allows operations between arrays of different shapes. E.g., <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">a + 5</code> adds 5 to every element.</div>
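Broadcasting also works between two arrays, not only between an array and a scalar. A small sketch, with illustrative shapes, of one compatible pair and one incompatible pair:

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,)
grid = col + row                    # both are stretched to shape (3, 3)
print(grid.tolist())                # [[1, 2, 3], [11, 12, 13], [21, 22, 23]]

try:
    np.ones((3, 2)) + np.ones(3)    # trailing dimensions 2 and 3 do not match
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)              # True
```

Shapes are compared from the last dimension backwards; each pair must be equal or one of them must be 1.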

2. `power()` — Exponentiation (幂运算)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Raise each element to a given power.

a = np.array([2, 3, 4])

np.power(a, 3)    # [ 8 27 64]  → a³
a ** 2            # [ 4  9 16]  → shorthand

np.sqrt(a)        # [1.41 1.73 2.0] → square root (平方根)

3. `exp()` / `log()` — Exponential & Logarithm (指数与对数)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Apply the natural exponential $e^x$ or logarithm $\ln(x)$ element-wise.

a = np.array([0, 1, 2])

np.exp(a)         # [1.    2.718 7.389]  → e^x
np.log(a + 1)     # [0.    0.693 1.099]  → ln(a + 1); np.log1p(a) is equivalent
np.log2(np.array([1, 2, 8]))   # [0. 1. 3.]
np.log10(np.array([1, 10, 100]))  # [0. 1. 2.]

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">log(0) returns -inf and raises a warning — always check for zero values before applying log.</span></div>
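One possible way to guard against zeros, sketched under the assumption that non-positive inputs should simply be left as NaN, is the ufunc `where=` / `out=` arguments:

```python
import numpy as np

x = np.array([0.0, 1.0, np.e])

# Unguarded: log(0) gives -inf (the warning is silenced here for the demo)
with np.errstate(divide="ignore"):
    raw = np.log(x)

# Guarded: compute the log only where x > 0, leave other slots as NaN
safe = np.full_like(x, np.nan)
np.log(x, out=safe, where=x > 0)

print(raw[0], safe[0])   # -inf nan
```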

4. `sin()` / `cos()` / `tan()` — Trigonometry (三角函数)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Input angles must be in <span style="color:#E8600A;font-weight:700">radians (弧度)</span>, not degrees.

angles = np.array([0, np.pi/6, np.pi/4, np.pi/2])

np.sin(angles)   # [0.    0.5   0.707 1.   ]
np.cos(angles)   # [1.    0.866 0.707 0.   ]  (last entry is ≈ 6.1e-17, not exactly 0)
np.tan(angles)   # [0.    0.577 1.    1.633e+16]  (huge but finite: π/2 is not exactly representable)

# Convert degrees to radians (角度转弧度)
deg = np.array([0, 30, 45, 90])
np.sin(np.deg2rad(deg))  # same result
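Because π/2 cannot be stored exactly as a float, the edge cases come out as "very large" and "very small" rather than infinity and zero. A quick check:

```python
import numpy as np

t = np.tan(np.pi / 2)    # very large, but finite
c = np.cos(np.pi / 2)    # tiny, but not exactly 0.0

print(np.isfinite(t), t > 1e15)        # True True
print(c == 0.0, np.isclose(c, 0.0))    # False True
```

Comparing with `np.isclose` instead of `==` is usually the safer choice for such values.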

5. Rounding Functions (取整函数)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Control how floating-point values are rounded.

a = np.array([1.4, 1.5, 2.6, -1.7])

np.round(a)    # [ 1.  2.  3. -2.]  → nearest integer; ties go to the even value
np.floor(a)    # [ 1.  1.  2. -2.]  → round down (向下取整)
np.ceil(a)     # [ 2.  2.  3. -1.]  → round up (向上取整)
np.trunc(a)    # [ 1.  1.  2. -1.]  → truncate toward zero (截断)
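The tie-breaking rule of `np.round` is easiest to see on exact halves. The `floor(x + 0.5)` line is only one common workaround for "round half up", not a NumPy built-in:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

to_even = np.round(halves)          # ties go to the nearest even integer
half_up = np.floor(halves + 0.5)    # workaround: ties always go upward

print(to_even.tolist())   # [0.0, 2.0, 2.0, 4.0]
print(half_up.tolist())   # [1.0, 2.0, 3.0, 4.0]
```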

6. Absolute Value & Sign (绝对值与符号)

a = np.array([-3, -1, 0, 2, 5])

np.abs(a)     # [3 1 0 2 5]
np.sign(a)    # [-1 -1  0  1  1]

7. Quick Comparison Table

Function (函数)	Operation	Example Input → Output
`add / subtract`	`±` element-wise	`[1,2] + [3,4]` → `[4,6]`
`multiply / divide`	`×÷` element-wise	`[2,4] * [3,2]` → `[6,8]`
`power(a, n)`	$a^n$	`[2,3] ** 2` → `[4,9]`
`sqrt(a)`	$\sqrt{a}$	`[4,9]` → `[2,3]`
`exp(a)`	$e^a$	`[0,1]` → `[1, 2.718]`
`log(a)`	$\ln(a)$	`[1, e]` → `[0, 1]`
`sin / cos / tan`	Trig (radians)	`sin([0, π/2])` → `[0, 1]`

<div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br> All NumPy math functions are element-wise and support broadcasting, and they are typically much faster than Python loops; just watch out for the <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">log(0)</code> and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">tan(π/2)</code> edge cases.</div>

NumPy Linear Algebra

Thu, 12 Mar 2026 00:00:00 GMT

V. NumPy Linear Algebra (线性代数)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> The <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">numpy.linalg</code> module provides standard linear algebra (线性代数) operations on 2-D arrays treated as matrices (矩阵). These are essential for machine learning, physics simulations, and engineering calculations. </div>

1. Matrix Multiplication (矩阵乘法)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Use @ or np.dot() for matrix multiplication — NOT * (which is element-wise).

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

A @ B          # Matrix multiply (矩阵乘法)
np.dot(A, B)   # Same result: [[19 22] [43 50]]

A * B          # Element-wise multiply (逐元素乘法) — different!

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Always use @ or np.dot() for matrix multiplication. Using * gives element-wise results, which is a common mistake.</span></div>
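To make the difference concrete, here is a short sketch that puts both products side by side and shows the shape rule for `@` (the `M` and `N` arrays are illustrative):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

matrix_product = A @ B     # rows of A dotted with columns of B
elementwise = A * B        # position-by-position product

print(matrix_product.tolist())   # [[19, 22], [43, 50]]
print(elementwise.tolist())      # [[5, 12], [21, 32]]

# Shape rule for @: (m, k) @ (k, n) -> (m, n)
M = np.ones((2, 3))
N = np.ones((3, 4))
print((M @ N).shape)             # (2, 4)
```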

2. `linalg.inv()` — Inverse Matrix (逆矩阵)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Find matrix $A^{-1}$ such that $A \cdot A^{-1} = I$ (identity matrix).

A = np.array([[1, 2], [3, 4]])

A_inv = np.linalg.inv(A)
# [[-2.   1. ]
#  [ 1.5 -0.5]]

# Verify: A @ A_inv ≈ Identity matrix (单位矩阵)
np.round(A @ A_inv)   # [[1. 0.] [0. 1.]]

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Only square, non-singular matrices (非奇异矩阵) are invertible. A singular matrix (奇异矩阵) will raise LinAlgError.</span></div>
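A hedged sketch of how the error might be handled in practice; the matrix is an illustrative singular example, and `matrix_rank` is shown as one way to inspect it beforehand:

```python
import numpy as np

singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])    # second row = 2 x first row

try:
    np.linalg.inv(singular)
    inverted = True
except np.linalg.LinAlgError:
    inverted = False

rank = np.linalg.matrix_rank(singular)   # rank < 2 means not invertible
print(inverted, rank)                    # False 1
```

For matrices that are only nearly singular, `inv` may succeed but return unreliable numbers, so checking `np.linalg.cond` can also be worthwhile.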

3. `linalg.det()` — Determinant (行列式)

<span style="color:#E8600A;font-weight:700">Core idea:</span> A scalar value describing matrix properties. If det = 0, the matrix is singular (not invertible).

A = np.array([[1, 2], [3, 4]])
np.linalg.det(A)   # -2.0000000000000004 (≈ -2.0; compare with np.isclose, not ==)

B = np.array([[1, 2], [2, 4]])  # rows are proportional
np.linalg.det(B)   # 0.0  → singular matrix!

4. `linalg.eig()` — Eigenvalues & Eigenvectors (特征值与特征向量)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Find $\lambda$ and $v$ such that $A \cdot v = \lambda \cdot v$. Core of PCA (主成分分析) and many ML algorithms.

A = np.array([[4, 1], [2, 3]])

eigenvalues, eigenvectors = np.linalg.eig(A)
# eigenvalues:  [5. 2.]
# eigenvectors: columns are the unit-length eigenvectors (order and sign may differ)

print(eigenvalues)    # [5. 2.]
print(eigenvectors)   # [[0.707 -0.447]
                      #  [0.707  0.894]]
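Since the order and sign of the returned eigenvectors are not fixed, a robust check is to verify the defining equation $A v = \lambda v$ directly. A minimal sketch:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
w, V = np.linalg.eig(A)

# Column j of V pairs with w[j]; V * w scales column j by w[j]
satisfies_definition = np.allclose(A @ V, V * w)

unit_length = np.allclose(np.linalg.norm(V, axis=0), 1.0)
sorted_w = np.sort(w)

print(satisfies_definition, unit_length)   # True True
```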

5. `linalg.svd()` — Singular Value Decomposition (奇异值分解)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Decompose any matrix $M = U \Sigma V^T$. Used in data compression, image processing, and recommendation systems.

A = np.array([[1, 2], [3, 4], [5, 6]])

U, S, Vt = np.linalg.svd(A)
# U: left singular vectors (shape: 3×3)
# S: singular values (shape: 2,) — diagonal of Σ
# Vt: right singular vectors transposed (shape: 2×2)

print(S)   # [9.525 0.514]
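The factors can be multiplied back together to recover the matrix. The sketch below uses `full_matrices=False` (the "economy" form) so the shapes line up without padding, and also builds a rank-1 approximation as an illustration of the compression use case:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

U, S, Vt = np.linalg.svd(A, full_matrices=False)   # U: (3, 2), S: (2,), Vt: (2, 2)

A_rebuilt = U @ np.diag(S) @ Vt                    # exact reconstruction (up to rounding)

# Keep only the largest singular value: best rank-1 approximation
A_rank1 = S[0] * np.outer(U[:, 0], Vt[0])
error = np.linalg.norm(A - A_rank1)                # equals the discarded singular value

print(np.allclose(A_rebuilt, A), np.isclose(error, S[1]))   # True True
```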

6. `linalg.solve()` — Solve Linear Equations (线性方程组)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Solve $Ax = b$ for $x$. More numerically stable than computing $A^{-1} \cdot b$.

$$Ax = b \quad \Rightarrow \quad x = \text{linalg.solve}(A, b)$$

# Solve: 2x + y = 5
#        x + 3y = 10
A = np.array([[2, 1], [1, 3]])
b = np.array([5, 10])

x = np.linalg.solve(A, b)
# [1. 3.] → x=1, y=3

# Verify
np.allclose(A @ x, b)   # True

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Always prefer linalg.solve() over inv(A) @ b for numerical stability and performance.</span></div>

7. Quick Comparison Table

Function (函数)	Mathematical Operation	Use Case
`@` / `dot(A,B)`	$AB$	Matrix multiply
`linalg.inv(A)`	$A^{-1}$	Invert matrix
`linalg.det(A)`	$\det(A)$	Check singularity
`linalg.eig(A)`	$Av = \lambda v$	PCA, stability
`linalg.svd(A)`	$U\Sigma V^T$	Compression, rank
`linalg.solve(A,b)`	$Ax = b$	Linear systems

NumPy Random Sampling

Thu, 12 Mar 2026 00:00:00 GMT

VI. NumPy Random Sampling (随机抽样)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">numpy.random</code> generates pseudo-random numbers (伪随机数) following various distributions (分布). Always set a <span style="color:#E8600A;font-weight:700">seed (随机种子)</span> for reproducible experiments. </div>

1. Setting the Seed (设置随机种子)

import numpy as np

np.random.seed(42)   # All subsequent random calls are reproducible
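`np.random.seed` sets a global state. NumPy also offers the `Generator` API, which keeps the state in an object; the sketch below shows the rough equivalents of `rand`, `randn`, and `randint` with it (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)      # local generator with its own seed

u = rng.random(3)                    # uniform in [0, 1), like rand(3)
n = rng.standard_normal(3)           # mean 0, std 1, like randn(3)
k = rng.integers(1, 7, size=5)       # integers in [1, 7), like randint(1, 7, size=5)

# Same seed, same stream: results are reproducible
u_again = np.random.default_rng(42).random(3)
print(np.array_equal(u, u_again))    # True
```

Passing the generator object around explicitly avoids one part of a program silently changing the random stream of another.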

2. `random.rand()` — Uniform Distribution (均匀分布)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Floats uniformly distributed in [0, 1).

np.random.rand(3)        # 1-D: 3 random floats
np.random.rand(2, 4)     # 2-D: shape (2, 4)

# Scale to [a, b): a + (b - a) * rand()
np.random.rand(5) * 10   # uniform in [0, 10)

3. `random.randn()` — Standard Normal Distribution (标准正态分布)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Floats from a distribution with mean=0, std=1 (Bell curve / 钟形曲线).

np.random.randn(4)         # 1-D: 4 values near 0
np.random.randn(3, 3)      # 2-D: 3×3 matrix

# Scale to mean=μ, std=σ
mu, sigma = 5, 2
mu + sigma * np.random.randn(100)

4. `random.randint()` — Random Integers (随机整数)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Random integers in [low, high) — high is excluded.

np.random.randint(0, 10)           # single integer, 0–9
np.random.randint(1, 7, size=5)    # five dice rolls
np.random.randint(0, 100, size=(3, 4))  # 3×4 matrix

5. `random.choice()` — Sample from Array (从数组中抽取)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Randomly pick elements from a 1-D array — with or without replacement (有/无放回).

arr = np.array([10, 20, 30, 40, 50])

np.random.choice(arr, 3)                # 3 samples WITH replacement
np.random.choice(arr, 3, replace=False) # 3 samples WITHOUT replacement

# Weighted sampling (带权重抽取)
np.random.choice(arr, 3, p=[0.1, 0.2, 0.4, 0.2, 0.1])

6. `random.normal()` — Custom Normal Distribution (自定义正态分布)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Generate samples from a normal distribution with any mean (均值) and standard deviation (标准差).

np.random.normal(loc=0, scale=1, size=5)   # same distribution as randn(5)
np.random.normal(loc=170, scale=10, size=1000)  # heights in cm

7. Other Useful Distributions (其他常用分布)

np.random.uniform(low=1, high=6, size=10)   # Continuous uniform (连续均匀)
np.random.binomial(n=10, p=0.5, size=5)     # Binomial (二项分布)
np.random.poisson(lam=3, size=10)           # Poisson (泊松分布)
np.random.shuffle(arr)                       # Shuffle in-place (原地打乱)
np.random.permutation(arr)                  # Shuffled copy (打乱副本)

8. Quick Comparison Table

Function (函数)	Distribution	Output Range
`rand(*shape)`	Uniform	[0, 1) floats
`randn(*shape)`	Standard Normal	Unbounded floats (≈99.7% within [-3, 3])
`randint(lo, hi, size)`	Discrete Uniform	[lo, hi) integers
`choice(a, n)`	Custom array	Elements of `a`
`normal(μ, σ, size)`	Normal	Floats near μ
`uniform(lo, hi, size)`	Continuous Uniform	[lo, hi) floats

NumPy Input / Output

Thu, 12 Mar 2026 00:00:00 GMT

VII. NumPy Input / Output (输入输出)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> NumPy supports two file formats: <span style="color:#E8600A;font-weight:700">text files (文本文件)</span> like CSV for human-readable data, and <span style="color:#E8600A;font-weight:700">binary files (二进制文件)</span> like <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.npy</code> / <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.npz</code> for fast, compact storage. Use text for sharing; use binary for speed. </div>

1. `savetxt()` / `loadtxt()` — Text Files (文本文件)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Save/load arrays as human-readable text (CSV, TSV, etc.).

import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Save to CSV (保存为CSV)
np.savetxt('data.csv', a, delimiter=',', fmt='%.2f',
           header='col1,col2,col3', comments='')

# Load from CSV (从CSV加载)
b = np.loadtxt('data.csv', delimiter=',', skiprows=1)
print(b)
# [[1. 2. 3.]
#  [4. 5. 6.]]

2. `save()` / `load()` — Binary `.npy` Format (二进制格式)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Save a single array to a .npy binary file — preserves dtype, shape, and is much faster than text.

a = np.array([1, 2, 3, 4, 5])

np.save('my_array.npy', a)       # .npy is appended automatically only if the name lacks it

b = np.load('my_array.npy')
print(b)   # [1 2 3 4 5]

Format	Extension	Speed	Human-readable?
Text	`.csv`, `.txt`	Slow	✅ Yes
Binary	`.npy`	<span style="color:#E8600A;font-weight:700">Fast</span>	❌ No

3. `savez()` / `load()` — Multiple Arrays (多数组存储)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Save multiple arrays into a single .npz file (a zip of .npy files).

a = np.array([1, 2, 3])
b = np.array([[4, 5], [6, 7]])

np.savez('multi.npz', x=a, y=b)            # uncompressed
np.savez_compressed('multi.npz', x=a, y=b) # compressed (smaller); overwrites the file written above

data = np.load('multi.npz')
print(data['x'])   # [1 2 3]
print(data['y'])   # [[4 5] [6 7]]
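`np.load` on a `.npz` file returns a lazy archive object that holds the file open. One way to handle this, sketched here with a temporary directory so nothing is left on disk, is to use it as a context manager:

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])
b = np.array([[4, 5], [6, 7]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "multi.npz")     # illustrative file name
    np.savez_compressed(path, x=a, y=b)

    with np.load(path) as data:               # closes the file on exit
        names = sorted(data.files)            # keys stored in the archive
        x = data["x"]                         # arrays are read on access
        y = data["y"]

print(names, x.tolist(), y.tolist())          # ['x', 'y'] [1, 2, 3] [[4, 5], [6, 7]]
```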

4. `tofile()` / `fromfile()` — Raw Binary (原始二进制)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Write raw bytes to disk — no metadata (shape/dtype) is saved. You must know the dtype and shape to reload correctly.

a = np.array([1, 2, 3, 4], dtype=np.int32)
a.tofile('raw.bin')

b = np.fromfile('raw.bin', dtype=np.int32)
print(b)   # [1 2 3 4]  — shape info is LOST, always 1-D

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">tofile/fromfile do NOT save dtype or shape. Always use save/load (.npy) unless you need raw binary for interoperability with C/Fortran code.</span></div>
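To illustrate what "no metadata" means in practice, the sketch below writes a 2×3 array, restores it by supplying the dtype and shape by hand, and then reads it with the wrong dtype (file name and shapes are illustrative):

```python
import os
import tempfile

import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.bin")
    a.tofile(path)                                   # 6 values x 4 bytes = 24 bytes

    flat = np.fromfile(path, dtype=np.int32)         # correct dtype, shape is lost
    restored = flat.reshape(2, 3)                    # shape must be supplied by hand

    wrong = np.fromfile(path, dtype=np.int64)        # wrong dtype: 24 bytes -> 3 values

print(flat.shape, restored.shape, wrong.shape)       # (6,) (2, 3) (3,)
```

The mis-typed read raises no error; it silently yields different numbers, which is why `.npy` is usually the safer default.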

5. `genfromtxt()` — Robust Text Loading (鲁棒文本加载)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Like loadtxt but handles missing values (缺失值) and mixed data types.

# CSV with missing values
# 1,2,NaN
# 4,,6
data = np.genfromtxt('messy.csv', delimiter=',',
                     filling_values=0)   # fill missing (empty) fields with 0
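A self-contained sketch of the missing-value behaviour, using an in-memory `io.StringIO` buffer in place of a real file (the text content is illustrative):

```python
import io

import numpy as np

text = "1,2,3\n4,,6\n"            # second row has an empty middle field

default = np.genfromtxt(io.StringIO(text), delimiter=",")
filled = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=0)

print(np.isnan(default[1, 1]))    # True: missing floats default to nan
print(filled.tolist())            # [[1.0, 2.0, 3.0], [4.0, 0.0, 6.0]]
```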

6. Quick Comparison Table

Function	Format	Preserves dtype/shape?	Use case
`savetxt / loadtxt`	Text (CSV)	❌	Share data
`save / load`	`.npy` binary	✅	Fast single array
`savez / load`	`.npz` binary	✅	Multiple arrays
`tofile / fromfile`	Raw binary	❌	C interop
`genfromtxt`	Text	❌	Missing values

NumPy Statistical Analysis

Thu, 12 Mar 2026 00:00:00 GMT

IV. NumPy Statistical Analysis (统计分析)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> NumPy's statistical functions let you summarize arrays with one line of code. Most functions accept an <span style="color:#E8600A;font-weight:700">axis (轴)</span> argument — without it, they operate on <strong>all elements</strong>; with it, they reduce along the specified dimension. </div>

1. `sum()` / `mean()` — Total & Average (总和与均值)

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

np.sum(a)           # 21   — sum of ALL elements
np.sum(a, axis=0)   # [5 7 9] — column sums (按列求和)
np.sum(a, axis=1)   # [6 15]  — row sums (按行求和)

np.mean(a)          # 3.5
np.mean(a, axis=0)  # [2.5 3.5 4.5]
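A reduction along an axis drops that axis, which can get in the way when the result is combined with the original array. The `keepdims=True` argument is one way around this; a sketch that centers each row:

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

row_means = a.mean(axis=1, keepdims=True)   # shape (2, 1) instead of (2,)
centered = a - row_means                    # broadcasts across the columns

print(row_means.shape)      # (2, 1)
print(centered.tolist())    # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
```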

2. `min()` / `max()` — Extreme Values (极值)

np.min(a)           # 1
np.max(a)           # 6
np.min(a, axis=1)   # [1 4] — min of each row
np.max(a, axis=0)   # [4 5 6] — max of each column

np.ptp(a)           # 5 — peak-to-peak = max - min (极差)

3. `argmin()` / `argmax()` — Index of Extreme Values (极值索引)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Returns the <strong>index (索引)</strong> of the minimum or maximum element, not the value itself.

b = np.array([3, 1, 4, 1, 5, 9, 2])

np.argmin(b)   # 1  (index of first minimum value 1)
np.argmax(b)   # 5  (index of maximum value 9)

# Along an axis
np.argmax(a, axis=0)  # [1 1 1] → row index of max in each column
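Without an `axis`, `argmax` on a 2-D array returns an index into the flattened array. `np.unravel_index` converts it back to a (row, column) pair; the array below is an illustrative example:

```python
import numpy as np

grid = np.array([[1, 9, 3],
                 [4, 5, 6]])

flat_index = int(np.argmax(grid))                   # position in the flattened array
row, col = np.unravel_index(flat_index, grid.shape)

print(flat_index, int(row), int(col))               # 1 0 1
print(grid[row, col])                               # 9
```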

4. `std()` / `var()` — Spread Measures (离散程度)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Measure how spread out the data is. Standard deviation (标准差) = $\sqrt{\text{variance (方差)}}$

a = np.array([2, 4, 4, 4, 5, 5, 7, 9])

np.std(a)    # 2.0  — population std (总体标准差)
np.var(a)    # 4.0  — population variance (总体方差)

# Sample std/var (样本标准差/方差): use ddof=1
np.std(a, ddof=1)   # 2.138...
np.var(a, ddof=1)   # 4.571...
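The effect of `ddof` is just a change of divisor, from $n$ to $n - \text{ddof}$. A sketch that reproduces both results by hand:

```python
import numpy as np

a = np.array([2, 4, 4, 4, 5, 5, 7, 9])
n = a.size

mean = a.sum() / n                        # 5.0
sq_dev = ((a - mean) ** 2).sum()          # sum of squared deviations = 32.0

pop_var = sq_dev / n                      # divisor n      -> np.var(a)
sample_var = sq_dev / (n - 1)             # divisor n - 1  -> np.var(a, ddof=1)

print(pop_var, round(sample_var, 3))      # 4.0 4.571
```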

5. `cumsum()` / `cumprod()` — Cumulative Functions (累积函数)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Returns running totals — each output element is the sum (or product) of all elements up to that position.

a = np.array([1, 2, 3, 4])

np.cumsum(a)    # [1  3  6 10]  — running sum (累积和)
np.cumprod(a)   # [1  2  6 24]  — running product (累积积)

# 2-D with axis
m = np.array([[1,2],[3,4]])
np.cumsum(m, axis=0)  # [[1,2],[4,6]] — cumulative down columns

6. `median()` / `percentile()` — Percentile Stats (百分位数)

a = np.array([1, 2, 3, 4, 5])

np.median(a)                    # 3.0 — middle value (中位数)
np.percentile(a, 25)            # 2.0 — 25th percentile (四分位数)
np.percentile(a, [25, 50, 75])  # [2. 3. 4.]

7. Quick Comparison Table

Function (函数)	Returns	axis support?
`sum()`	Total of elements	✅
`mean()`	Average value	✅
`min()` / `max()`	Smallest / largest value	✅
`argmin()` / `argmax()`	Index of min / max	✅
`std()`	Standard deviation	✅
`var()`	Variance	✅
`cumsum()`	Running sum array	✅
`median()`	Middle value	✅
`percentile(a, q)`	q-th percentile	✅

NumPy Set Operations

Thu, 12 Mar 2026 00:00:00 GMT

VIII. NumPy Set Operations (集合运算)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> NumPy provides set-like operations (集合运算) on <span style="color:#E8600A;font-weight:700">1-D arrays</span>. These treat the array as a set of values and support finding unique elements, intersections (交集), unions (并集), and differences (差集) — all returned as <strong>sorted</strong> arrays. </div>

1. `unique()` — Find Unique Values (求唯一值)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Return sorted unique elements. Optionally return counts or indices.

import numpy as np

a = np.array([3, 1, 2, 1, 3, 3, 2])

np.unique(a)                        # [1 2 3]

# Also return counts (每个值出现的次数)
vals, counts = np.unique(a, return_counts=True)
# vals:   [1 2 3]
# counts: [2 2 3]

# Also return first-occurrence indices (首次出现的索引)
vals, idx = np.unique(a, return_index=True)
# idx: [1 2 0]  (positions in original array)
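The optional outputs combine well. As an illustration, the sketch below finds the most frequent value and uses `return_inverse` to rebuild the original array from the unique values:

```python
import numpy as np

a = np.array([3, 1, 2, 1, 3, 3, 2])

vals, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)

most_common = vals[np.argmax(counts)]    # value with the highest count
rebuilt = vals[inverse]                  # inverse maps each element to its slot in vals

print(int(most_common))                  # 3
print(np.array_equal(rebuilt, a))        # True
```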

2. `intersect1d()` — Intersection (交集)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Elements that appear in both arrays.

a = np.array([1, 2, 3, 4, 5])
b = np.array([3, 4, 5, 6, 7])

np.intersect1d(a, b)   # [3 4 5]

# Also return indices in each array
common, ia, ib = np.intersect1d(a, b, return_indices=True)
# ia: [2 3 4] (positions in a)
# ib: [0 1 2] (positions in b)

3. `union1d()` — Union (并集)

<span style="color:#E8600A;font-weight:700">Core idea:</span> All elements from either array, deduplicated and sorted.

a = np.array([1, 2, 3])
b = np.array([2, 3, 4, 5])

np.union1d(a, b)   # [1 2 3 4 5]

4. `setdiff1d()` — Difference (差集)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Elements in a that are not in b (order matters: a − b).

a = np.array([1, 2, 3, 4, 5])
b = np.array([3, 4])

np.setdiff1d(a, b)   # [1 2 5]  — in a but not in b
np.setdiff1d(b, a)   # []       — b minus a (empty here)

5. `in1d()` — Membership Test (成员检测)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Returns a boolean array — True where elements of a appear in b.

a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4])

mask = np.in1d(a, b)        # [False  True False  True False]
a[mask]                     # [2 4]  — filter using the mask

# Preferred equivalent: np.isin (available since NumPy 1.13; np.in1d is deprecated in NumPy 2.0)
np.isin(a, b)               # same result, more readable
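Unlike `in1d`, `np.isin` keeps the shape of its first argument, and its `invert=True` flag gives the "not in" mask directly. A short sketch with an illustrative 2-D array:

```python
import numpy as np

grid = np.array([[1, 2],
                 [3, 4]])
wanted = [2, 4]

mask = np.isin(grid, wanted)                     # same shape as grid
others = grid[np.isin(grid, wanted, invert=True)]

print(mask.tolist())      # [[False, True], [False, True]]
print(others.tolist())    # [1, 3]
```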

6. `setxor1d()` — Symmetric Difference (对称差集)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Elements in either array but not in both (XOR logic).

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5, 6])

np.setxor1d(a, b)   # [1 2 5 6]  — not in common

7. Visual Summary

a = [1, 2, 3, 4, 5]
b =       [3, 4, 5, 6, 7]

intersect1d: [3, 4, 5]         ← overlap
union1d:     [1, 2, 3, 4, 5, 6, 7]  ← all
setdiff1d(a,b): [1, 2]         ← only in a
setxor1d:    [1, 2, 6, 7]      ← not shared

8. Quick Comparison Table

Function (函数)	Result	Analogy
`unique(a)`	Deduplicated `a`	Remove duplicates
`intersect1d(a, b)`	a ∩ b	Both have it
`union1d(a, b)`	a ∪ b	Either has it
`setdiff1d(a, b)`	a − b	Only `a` has it
`setxor1d(a, b)`	a △ b	Only one has it
`isin(a, b)` / `in1d(a, b)`	Boolean mask	Is `a[i]` in `b`?

NumPy Logical Operations

Thu, 12 Mar 2026 00:00:00 GMT

IX. NumPy Logical Operations (逻辑运算)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Logical operations (逻辑运算) in NumPy work <span style="color:#E8600A;font-weight:700">element-wise</span> on arrays and always return a <span style="color:#E8600A;font-weight:700">boolean array (布尔数组)</span>. These are the foundation of <strong>masking (掩码)</strong> and <strong>conditional filtering (条件筛选)</strong>. </div>

1. Comparison Operators (比较运算符)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Compare each element to a value or to another array. Returns True/False for each position.

import numpy as np

a = np.array([1, 2, 3, 4, 5])

a > 3          # [False False False  True  True]
a == 3         # [False False  True False False]
a != 3         # [ True  True False  True  True]
a >= 3         # [False False  True  True  True]

# Functional equivalents (函数式写法)
np.greater(a, 3)       # same as a > 3
np.less(a, 3)          # same as a < 3
np.equal(a, 3)         # same as a == 3
np.not_equal(a, 3)     # same as a != 3

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Never use Python's and / or / not on multi-element arrays: they raise ValueError because the truth value of an array is ambiguous. Use &, |, ~ or np.logical_and / logical_or / logical_not instead.</span></div>

2. Logical AND / OR / NOT (逻辑与/或/非)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Combine boolean arrays element-wise. Use &, |, ~ as shorthand.

a = np.array([1, 2, 3, 4, 5])

# AND (与): both conditions must be True
mask = (a > 2) & (a < 5)             # [F F T T F]
np.logical_and(a > 2, a < 5)         # same

# OR (或): at least one condition is True
mask = (a < 2) | (a > 4)             # [T F F F T]
np.logical_or(a < 2, a > 4)          # same

# NOT (非): invert boolean
mask = ~(a > 3)                       # [T T T F F]
np.logical_not(a > 3)                 # same

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Always wrap individual conditions in parentheses when using & and |, because & has higher precedence than > in Python.</span> E.g., write <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">(a > 2) & (a < 5)</code>, not <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">a > 2 & a < 5</code>.</div>
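What goes wrong without the parentheses can be seen directly. Python parses `a > 2 & a < 5` as the chained comparison `a > (2 & a) < 5`, which then tries to take the truth value of an array:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

try:
    bad = a > 2 & a < 5            # parsed as: a > (2 & a) < 5
    raised = False
except ValueError:
    raised = True

good = (a > 2) & (a < 5)           # parentheses force the intended grouping

print(raised)                      # True
print(good.tolist())               # [False, False, True, True, False]
```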

3. Boolean Masking — Filter Arrays (布尔掩码筛选)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Use a boolean array as an index to select matching elements.

a = np.array([10, 25, 3, 47, 8, 60])

mask = a > 20
print(mask)    # [False  True False  True False  True]
print(a[mask]) # [25 47 60]  — only values where mask is True

# One-liner
a[a > 20]      # [25 47 60]
a[(a > 10) & (a < 50)]  # [25 47]

4. `np.where()` — Conditional Selection (条件选择)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Like a vectorized ternary: where(condition, value_if_true, value_if_false).

a = np.array([1, -2, 3, -4, 5])

np.where(a > 0, a, 0)       # [1  0  3  0  5]  — replace negatives with 0
np.where(a > 0, 'pos', 'neg')  # ['pos' 'neg' 'pos' 'neg' 'pos']

# Without x, y: returns indices where condition is True (返回满足条件的索引)
np.where(a > 0)    # (array([0, 2, 4]),)

5. `any()` / `all()` — Global Boolean Tests (全局布尔检验)

<span style="color:#E8600A;font-weight:700">Core idea:</span> Test whether any or all elements satisfy a condition.

a = np.array([1, 2, 3, 4, 5])

np.any(a > 4)          # True  — at least one element > 4
np.all(a > 0)          # True  — all elements > 0
np.all(a > 3)          # False — not all > 3

# With axis
m = np.array([[1, 2], [0, 4]])
np.any(m == 0, axis=1)  # [False  True]

6. `isnan()` / `isinf()` — Special Value Checks (特殊值检验)

a = np.array([1.0, np.nan, np.inf, -np.inf, 2.0])

np.isnan(a)    # [F  T  F  F  F]
np.isinf(a)    # [F  F  T  T  F]
np.isfinite(a) # [T  F  F  F  T]

# Clean NaN values (清除NaN值)
a[~np.isnan(a)]   # [1.  inf -inf  2.]
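The reason a dedicated `isnan` exists is that NaN never compares equal to anything, including itself. A short sketch; `np.nanmean` is shown as one of the NaN-aware reductions:

```python
import numpy as np

values = np.array([1.0, np.nan, 2.0])

found_by_eq = bool((values == np.nan).any())   # equality never matches NaN
nan_count = int(np.isnan(values).sum())        # isnan does

plain_mean = np.mean(values)                   # NaN propagates through the mean
clean_mean = np.nanmean(values)                # NaN entries are skipped

print(found_by_eq, nan_count)                  # False 1
print(np.isnan(plain_mean), clean_mean)        # True 1.5
```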

7. Quick Comparison Table

Operation (操作)	Shorthand	Function
Greater than (大于)	`a > b`	`np.greater(a, b)`
Less than (小于)	`a < b`	`np.less(a, b)`
Equal (等于)	`a == b`	`np.equal(a, b)`
AND (与)	`mask1 & mask2`	`np.logical_and(m1, m2)`
OR (或)	`mask1 | mask2`	`np.logical_or(m1, m2)`
NOT (非)	`~mask`	`np.logical_not(mask)`
Conditional replace	—	`np.where(cond, x, y)`
Any true?	—	`np.any(cond)`
All true?	—	`np.all(cond)`

Tensor Shape & Dimension Transforms

Thu, 12 Mar 2026 00:00:00 GMT

II. Tensor Shape & Dimension Transforms (张量形状与维度变换)

1. `Tensor.view()` / `Tensor.reshape()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Reshapes a Tensor without changing its data. <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">view</code> requires Contiguous Memory (连续内存); <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">reshape</code> handles non-contiguous cases automatically. </div>

import torch

x = torch.arange(12)   # shape [12]
y = x.view(3, 4)       # shape [3, 4]
z = x.reshape(2, 6)    # shape [2, 6]
w = x.reshape(-1, 3)   # -1 auto-infers → shape [4, 3]
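The contiguity requirement only shows up after an operation such as a transpose. A minimal sketch, assuming PyTorch is installed, of `view` failing where `reshape` succeeds:

```python
import torch

x = torch.arange(12).reshape(3, 4)
t = x.t()                            # transposed view: same storage, non-contiguous

try:
    t.view(12)                       # view cannot reinterpret this layout
    view_failed = False
except RuntimeError:
    view_failed = True

flat_a = t.reshape(12)               # reshape falls back to copying
flat_b = t.contiguous().view(12)     # explicit copy first, then view

print(t.is_contiguous(), view_failed)    # False True
print(torch.equal(flat_a, flat_b))       # True
```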

2. `torch.squeeze()` / `torch.unsqueeze()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">squeeze</code>: Removes dimensions of size 1. <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">unsqueeze</code>: Inserts a size-1 dimension at a specified position. </div>

x = torch.zeros(1, 3, 1, 5)
y = x.squeeze()       # [3, 5]

z = torch.zeros(3, 5)
w = z.unsqueeze(0)    # [1, 3, 5]
v = z.unsqueeze(-1)   # [3, 5, 1]

3. `torch.cat()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Concatenates multiple Tensors along an <strong>existing dimension (已有维度)</strong>. Does <span style="color:#C0392B;font-weight:600">not</span> create a new axis. </div>

a = torch.zeros(2, 3)
b = torch.ones(4, 3)
c = torch.cat([a, b], dim=0)  # shape [6, 3]

d = torch.cat([torch.zeros(2, 2), torch.ones(2, 4)], dim=1)  # shape [2, 6]

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> All Tensors must have the same shape on every dimension except the concatenation axis (拼接轴).</div>

4. `torch.stack()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Stacks Tensors along a <strong>new dimension (新维度)</strong>. All input Tensors must be exactly the same shape. </div>

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
c = torch.stack([a, b])        # [2, 3]  — new dim=0
d = torch.stack([a, b], dim=1) # [3, 2]  — new dim=1

5. `Tensor.permute()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Reorders dimensions according to a specified axis order. Equivalent to NumPy's <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">transpose(axes)</code>. </div>

# Convert NCHW → NHWC
x = torch.zeros(8, 3, 224, 224)
y = x.permute(0, 2, 3, 1)  # shape [8, 224, 224, 3]

6. `Tensor.transpose()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Swaps exactly two specified dimensions. A simplified version of <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">permute</code> for axis swapping. </div>

x = torch.zeros(4, 5, 6)
y = x.transpose(1, 2)   # shape [4, 6, 5]

m = torch.rand(3, 4)
mt = m.t()              # 2D matrix transpose → shape [4, 3]

7. `torch.split()` / `torch.chunk()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">split</code>: Splits by specified sizes. <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">chunk</code>: Splits into equal pieces; the last chunk may be smaller. </div>

x = torch.arange(10)
parts = torch.split(x, 3)   # (tensor([0,1,2]), tensor([3,4,5]), tensor([6,7,8]), tensor([9]))
chunks = torch.chunk(x, 3)  # 3 chunks: [0–3], [4–7], [8–9]

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Multi-GPU Sharding (多GPU分片), where a batch is divided across devices, follows the same <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">chunk</code> / <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">split</code> logic.</div>
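A minimal sketch of how the two calls differ in practice (variable names are illustrative only): `split` also accepts a list of explicit section sizes, while `chunk` takes a piece count and uses a chunk size of ceil(n / chunks).

```python
import torch

x = torch.arange(10)

# split with explicit section sizes (they must sum to the dimension length)
a, b, c = torch.split(x, [2, 3, 5])

# chunk asks for a number of pieces; each piece holds ceil(10 / 3) = 4 elements,
# except the last one, which takes the remainder
pieces = torch.chunk(x, 3)

print([p.tolist() for p in (a, b, c)])  # [[0, 1], [2, 3, 4], [5, 6, 7, 8, 9]]
print([p.tolist() for p in pieces])     # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```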

8. `Tensor.flatten()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Flattens a Tensor to 1D, or flattens a specific range of dimensions. </div>

x = torch.zeros(2, 3, 4)
y = x.flatten()       # shape [24]
z = x.flatten(1, 2)   # shape [2, 12]  — only flatten dims 1–2

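<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Most commonly used at the CNN → Fully Connected (全连接) transition, where <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">x.flatten(1)</code> is equivalent to <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">x.view(x.size(0), -1)</code> for a contiguous <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">x</code>.</div>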

9. `torch.broadcast_to()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Broadcasts (广播) a Tensor to a target shape and returns a view of the original data. No data is copied, so the result should be treated as read-only. </div>

x = torch.tensor([1, 2, 3])      # shape [3]
y = torch.broadcast_to(x, (4, 3)) # shape [4, 3]

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Broadcasting is the underlying mechanism of most PyTorch arithmetic operations. Understanding it helps avoid Shape Errors (形状错误).</div>
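As a small illustration of the broadcasting rule (shapes are aligned from the right, and each pair of dimensions must be equal or contain a 1), the following sketch shows one compatible and one incompatible pairing. The tensors are made up for the example.

```python
import torch

col = torch.tensor([[1.], [2.], [3.]])  # shape [3, 1]
row = torch.tensor([10., 20.])          # shape [2]

# Aligned from the right: [3, 1] vs [2] -> result shape [3, 2]
out = col + row
print(out.shape)  # torch.Size([3, 2])

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails
try:
    torch.ones(2, 3) + torch.ones(2)
    failed = False
except RuntimeError as err:
    failed = True
    print("shape error:", err)
```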

10. `Tensor.expand()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Expands size-1 dimensions to a specified size. Shares storage (no memory copy), unlike <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">repeat</code>. </div>

x = torch.zeros(3, 1)
y = x.expand(3, 4)   # shape [3, 4] — zero memory copy
z = x.repeat(1, 4)   # shape [3, 4] — actual data copy

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">expand</code> results in a Non-contiguous Tensor; call <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.contiguous()</code> or <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.clone()</code> before writing.</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br>Master <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">reshape</code>, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">squeeze/unsqueeze</code>, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">cat/stack</code>, and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">permute</code> — together they cover 90% of all shape manipulation needs.</div>

Tensor Creation & Basic Operations

Thu, 12 Mar 2026 00:00:00 GMT

I. Tensor Creation & Basic Operations (张量创建与基础操作)

1. `torch.tensor()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Creates a Tensor (张量) directly from a Python list or NumPy array. You can specify the Data Type (数据类型) and Device (设备) at creation time. </div>

import torch
x = torch.tensor(
    [[1.0, 2.0], [3.0, 4.0]],
    dtype=torch.float32
)
print(x.shape)  # torch.Size([2, 2])


import numpy as np

arr = np.array([1, 2, 3], dtype=np.float32)
x = torch.as_tensor(arr)

arr[0] = 100  # x shares memory with arr, so x changes too
print(x)      # tensor([100., 2., 3.])

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">torch.tensor()</code> always <strong>copies</strong> the data. To share memory with the source array, use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">torch.as_tensor()</code> instead.</div>
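To make the copy-versus-share distinction concrete, here is a short side-by-side sketch; the names `copied` and `shared` are illustrative.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3], dtype=np.float32)
copied = torch.tensor(arr)     # independent copy of the data
shared = torch.as_tensor(arr)  # reuses arr's buffer (same dtype, CPU)

arr[0] = 100
print(copied)  # first element is still 1.
print(shared)  # first element is now 100.
```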

2. `torch.zeros()` / `torch.ones()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Creates all-zero or all-one Tensors (全零/全一张量). Commonly used for bias initialization (偏置初始化) and mask generation (掩码生成). </div>

z = torch.zeros(3, 4)                    # 3×4 all zeros
o = torch.ones(2, 3, dtype=torch.int32) # 2×3 all ones, int type


out = torch.empty(2, 3,  dtype=torch.float16)
print(out)
out = torch.zeros(2, 3, dtype=torch.float16, out=out)
print(out)

# tensor([[ 5.5680e+03, -9.3126e-04,         nan],
#         [ 0.0000e+00,  7.8906e-01,  1.1133e-01]], dtype=torch.float16)
# tensor([[0., 0., 0.],
#         [0., 0., 0.]], dtype=torch.float16)

3. `torch.arange()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Generates an Arithmetic Sequence Tensor (等差数列张量), analogous to Python's <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">range()</code>. Supports float step sizes. </div>

t = torch.arange(0, 10, 2)       # tensor([0, 2, 4, 6, 8])
f = torch.arange(0.0, 1.0, step=0.25)  # tensor([0.00, 0.25, 0.50, 0.75])

print(t)
print(f)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Floating-point steps may cause Boundary Precision Issues (边界精度问题). For exact equal-interval sampling, use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">torch.linspace()</code>.</div>
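The difference can be sketched as follows: `arange` derives the element count from the step, whereas `linspace` fixes the count up front and includes both endpoints. The concrete range is only an example.

```python
import torch

# Step-based: the element count depends on (end - start) / step, which is
# sensitive to floating-point rounding near the boundary
a = torch.arange(0.0, 1.0, 0.1)

# Count-based: the number of points is fixed and both endpoints are included
b = torch.linspace(0.0, 1.0, steps=11)

print(a.shape)       # count derived from the step
print(b.shape)       # torch.Size([11])
print(b[-1].item())  # the endpoint is included
```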

4. `torch.linspace()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Uniformly generates <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">steps</code> points in the interval [start, end]. More precise than <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">arange</code> for float ranges. </div>

t = torch.linspace(0, 1, steps=5)
# tensor([0.00, 0.25, 0.50, 0.75, 1.00])

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Commonly used for plotting function curves (函数曲线) and generating uniformly sampled frequency axes (均匀采样频率轴).</div>

5. `torch.rand()` / `torch.randn()` / `torch.randint()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">rand</code>: Uniform Distribution (均匀分布) U[0,1). <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">randn</code>: Standard Normal Distribution (标准正态分布) N(0, 1). <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">randint</code>: uniformly distributed integers in [low, high). </div>

u = torch.rand(2, 3)   # Uniform distribution
n = torch.randn(2, 3)  # Normal distribution
i = torch.randint(low=0, high=10, size=(2, 3))

# Fix seed for reproducibility (可复现性)
torch.manual_seed(42)
x = torch.rand(3)
print(x)

print(u)
print(n)
print(i)

# Sample output (only x, drawn after the seed, is reproducible):
# tensor([0.8823, 0.9150, 0.3829])
# tensor([[0.9593, 0.3904, 0.6009],
#         [0.2566, 0.7936, 0.9408]])
# tensor([[ 1.5231,  0.6647, -1.0324],
#         [-0.2770, -0.1671, -0.1079]])
# tensor([[6, 3, 1],
#         [9, 3, 1]])

6. `torch.eye()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Creates an Identity Matrix (单位矩阵) with ones on the diagonal and zeros elsewhere. Commonly used in Linear Algebra (线性代数) and regularization (正则化). </div>

I = torch.eye(3)
# tensor([[1., 0., 0.],
#         [0., 1., 0.],
#         [0., 0., 1.]])

7. `torch.full()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Creates a Tensor of specified shape where all elements equal a given Fill Value (填充值). </div>

t = torch.full((2, 3), fill_value=7.0)
# tensor([[7., 7., 7.],
#         [7., 7., 7.]])

8. `torch.from_numpy()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Converts a NumPy ndarray to a Tensor. The two <strong>share memory (共享内存)</strong> — modifying one affects the other. </div>

import numpy as np
arr = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(arr)  # torch.as_tensor(arr) would share memory the same way
arr[0] = 99
print(t)  # tensor([99., 2., 3.], dtype=torch.float64)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Shared memory is a <span style="color:#C0392B;font-weight:600">double-edged sword</span>: it saves memory but can cause unintended modifications to the source array.</div>

9. `Tensor.numpy()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Converts a CPU Tensor back to a NumPy ndarray. Also shares the underlying memory (底层内存). </div>

t = torch.tensor([1.0, 2.0, 3.0])
arr = t.numpy()
t[0] = 100
print(arr)  # [100.  2.  3.]

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">GPU Tensors must call <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.cpu()</code> first</span>, and Tensors with <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">requires_grad=True</code> must call <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.detach()</code> first.</div>
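The conversion chain described in the note can be sketched like this. The example runs on CPU, where `.cpu()` is simply a no-op, so the same line also works for a GPU Tensor.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
y = w * 2  # y is part of the autograd graph

try:
    y.numpy()  # fails because y requires grad
    failed = False
except RuntimeError as err:
    failed = True
    print("conversion failed:", err)

# Detach from the graph, move to CPU (a no-op if already there), then convert
arr = y.detach().cpu().numpy()
print(arr)  # [2. 4.]
```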

10. `torch.empty()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Allocates Uninitialized Memory (未初始化内存) for a Tensor — the fastest allocation method. Values are whatever remains in memory. </div>

t = torch.empty(3, 3)  # Values are undefined
t.fill_(0.5)           # Must fill before reading

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Never read values before filling</span>. Best for performance-sensitive scenarios where you immediately overwrite the buffer.</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br>Use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">torch.tensor()</code> for known data, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">torch.zeros/ones/rand/randn()</code> for initialized buffers, and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">torch.empty()</code> only when you'll immediately overwrite every element.</div>

Math & Statistical Operations

Thu, 12 Mar 2026 00:00:00 GMT

III. Math & Statistical Operations (数学与统计运算)

1. `torch.matmul()` / `@`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> General Matrix Multiplication (通用矩阵乘法). Supports 2D matrices, batched matrix multiplication (批量矩阵乘), and mixed broadcasting. </div>

a = torch.rand(3, 4)
b = torch.rand(4, 5)
c = torch.matmul(a, b)  # [3, 5]
d = a @ b               # equivalent

# Batched matmul
x = torch.rand(8, 3, 4)
y = torch.rand(8, 4, 5)
z = x @ y               # [8, 3, 5]

2. `torch.sum()` / `torch.mean()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Computes the sum or mean over all elements or a specified axis. <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">keepdim=True</code> preserves the reduced dimension. </div>

x = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
print(x.sum())                        # 21.0
print(x.sum(dim=0))                   # [5, 7, 9]
print(x.mean(dim=1, keepdim=True))    # [[2.], [5.]]
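One place where `keepdim=True` matters is row-wise normalization: the reduced result keeps shape [2, 1] and therefore broadcasts against the original [2, 3] Tensor. A minimal sketch:

```python
import torch

x = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

row_sum = x.sum(dim=1, keepdim=True)  # shape [2, 1], broadcasts against [2, 3]
normalized = x / row_sum              # each row now sums to 1

print(row_sum.shape)          # torch.Size([2, 1])
print(normalized.sum(dim=1))  # approximately tensor([1., 1.])
```

Without `keepdim=True` the sum has shape [2], which does not broadcast against [2, 3].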

3. `torch.max()` / `torch.min()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Returns the maximum/minimum value. When a <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">dim</code> is specified, returns both values and indices (argmax/argmin). </div>

x = torch.tensor([3., 1., 4., 1., 5.])
print(x.max())                 # tensor(5.)
vals, idx = x.max(dim=0)       # vals=5.0, idx=4
idx2 = x.argmax()              # tensor(4)

4. `torch.abs()` / `torch.sqrt()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Element-wise absolute value or square root. Used in loss computation (损失计算) and feature normalization (特征归一化). </div>

x = torch.tensor([-1., 4., -9.])
print(torch.abs(x))   # [1., 4., 9.]
y = torch.tensor([1., 4., 9.])
print(torch.sqrt(y))  # [1., 2., 3.]

5. `torch.clamp()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Clips values to the range [min, max]. Values outside the range are truncated to the boundary. </div>

x = torch.tensor([-2., 0., 3., 8.])
y = torch.clamp(x, min=0., max=5.)  # tensor([0., 0., 3., 5.])
z = torch.clamp(x, min=0.)          # equivalent to ReLU

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> The go-to tool for Gradient Clipping (梯度裁剪), normalization, and avoiding <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">log(0)</code>.</div>
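The `log(0)` guard mentioned above might look like the following sketch. The floor value `eps` is an arbitrary choice for illustration and should be picked to suit the dtype.

```python
import torch

probs = torch.tensor([0.0, 0.5, 1.0])
eps = 1e-8  # illustrative floor, not a recommended constant

unsafe = torch.log(probs)               # first element is -inf
safe = torch.log(probs.clamp(min=eps))  # first element is log(eps), finite

print(unsafe)
print(safe)
```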

6. `torch.pow()` / `torch.exp()` / `torch.log()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Element-wise power, natural exponent, and natural logarithm. </div>

x = torch.tensor([1., 2., 3.])
print(torch.pow(x, 2))    # [1., 4., 9.]
print(torch.exp(x))       # [e^1, e^2, e^3]
print(torch.log(x))       # [0., 0.693, 1.099]
print(torch.log1p(x))     # Numerically stable log(1+x)

7. `torch.dot()` / `torch.cross()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">dot</code>: Inner product of 1D vectors. <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">cross</code>: Cross product (叉积) of 3D vectors (physics / 3D graphics). </div>

a = torch.tensor([1., 2., 3.])
b = torch.tensor([4., 5., 6.])
print(torch.dot(a, b))  # 32.0
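The cross product is not shown above, so here is a short sketch. It uses `torch.linalg.cross`, which assumes a reasonably recent PyTorch release.

```python
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([4., 5., 6.])

c = torch.linalg.cross(a, b)
print(c)                # tensor([-3.,  6., -3.])
print(torch.dot(a, c))  # tensor(0.) because c is orthogonal to a
```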

8. `torch.norm()` / `torch.linalg.norm()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Computes vector/matrix norms: L1, L2, Frobenius Norm (Frobenius范数), etc. </div>

x = torch.tensor([3., 4.])
print(torch.linalg.norm(x))          # L2: 5.0
print(torch.linalg.norm(x, ord=1))   # L1: 7.0
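For a 2D input, `torch.linalg.norm` defaults to the Frobenius norm. The sketch below checks it against the definition (square root of the sum of squared entries); the matrix is chosen so the result is a round number.

```python
import torch

m = torch.tensor([[1., 2.], [2., 4.]])

fro = torch.linalg.norm(m)      # default for 2D input: Frobenius norm
manual = m.pow(2).sum().sqrt()  # sqrt(1 + 4 + 4 + 16) = 5

print(fro)     # tensor(5.)
print(manual)  # tensor(5.)
```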

9. `torch.topk()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Returns the top-k largest (or smallest) values and their indices from a Tensor. </div>

x = torch.tensor([3., 1., 4., 1., 5., 9.])
vals, idx = torch.topk(x, k=3)
# vals: tensor([9., 5., 4.])
# idx:  tensor([5, 4, 2])

logits = torch.randn(8, 1000)    # e.g. [batch, num_classes] model outputs
_, top5 = logits.topk(5, dim=1)  # Top-5 accuracy evaluation

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Standard approach for Top-5 Accuracy (Top-5准确率) evaluation. Use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">largest=False</code> to get the smallest k values.</div>
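A minimal sketch of the Top-5 accuracy computation itself, assuming random stand-in `logits` and `labels` (both hypothetical; a real evaluation would use model outputs and ground-truth labels):

```python
import torch

torch.manual_seed(0)
logits = torch.randn(8, 100)          # hypothetical [batch, num_classes] scores
labels = torch.randint(0, 100, (8,))  # hypothetical ground-truth class ids

_, top5 = logits.topk(5, dim=1)                 # [8, 5] predicted class ids
hit = (top5 == labels.unsqueeze(1)).any(dim=1)  # [8] True if the label is in the top 5
top5_acc = hit.float().mean().item()

print(f"top-5 accuracy: {top5_acc:.2f}")
```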

10. `torch.unique()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Returns unique elements from a Tensor, with optional sorting, counting, and inverse mapping (逆映射). </div>

x = torch.tensor([1, 2, 2, 3, 1, 4])
u, cnt = torch.unique(x, return_counts=True)
# u:   tensor([1, 2, 3, 4])
# cnt: tensor([2, 2, 1, 1])

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Commonly used for processing Category Labels (类别标签) and deduplicating tokens.</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br><code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">matmul/@</code> powers Transformers, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">clamp</code> guards numerical safety, and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">topk</code> drives classification evaluation.</div>

Automatic Differentiation — Autograd

Thu, 12 Mar 2026 00:00:00 GMT

IV. Automatic Differentiation — Autograd (自动微分)

1. `Tensor.requires_grad`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Marks whether gradient computation (梯度计算) is needed for this Tensor. It is the entry switch of the Autograd System (自动微分系统). </div>

x = torch.tensor([2.0], requires_grad=True)
y = x ** 2 + 3 * x
y.backward()
print(x.grad)  # dy/dx = 2x+3 = 7

2. `Tensor.backward()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Triggers Backpropagation (反向传播) from a scalar (or with a gradient tensor argument), computing gradients for all leaf nodes. </div>

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = (x * 2).sum()
y.backward()
print(x.grad)  # tensor([2., 2., 2.])

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Gradients <strong>accumulate</strong> by default. <span style="color:#C0392B;font-weight:600">Call <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">optimizer.zero_grad()</code> before each iteration</span>.</div>
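The accumulation behavior can be observed directly on a single Tensor. In this sketch the gradient is reset by hand with `x.grad.zero_()`; in a training loop, `optimizer.zero_grad()` clears the gradients of the parameters the optimizer manages.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)

(x * 2).sum().backward()
first = x.grad.clone()
print(first)   # tensor([2., 2., 2.])

(x * 2).sum().backward()  # the second backward adds to the stored gradient
second = x.grad.clone()
print(second)  # tensor([4., 4., 4.])

x.grad.zero_()  # manual reset
print(x.grad)   # tensor([0., 0., 0.])
```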

3. `torch.no_grad()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Context manager that disables gradient computation — saves memory and speeds up inference (推理) / evaluation (评估). </div>

model.eval()
with torch.no_grad():
    output = model(x)
    loss = criterion(output, labels)

# Also usable as a decorator
@torch.no_grad()
def predict(x):
    return model(x)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Always enable this during inference</span>, otherwise inference is slow and VRAM usage is high.</div>
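A self-contained way to see the effect, using a single `nn.Linear` as a stand-in for a trained model (an assumption made only for this example):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for a trained model
x = torch.rand(3, 4)

tracked = model(x)
print(tracked.requires_grad)    # True: a graph was recorded

model.eval()
with torch.no_grad():
    untracked = model(x)
print(untracked.requires_grad)  # False: no graph is kept
```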

4. `Tensor.detach()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Returns a new Tensor disconnected from the Computation Graph (计算图), sharing data but not propagating gradients. </div>

x = torch.tensor([1., 2.], requires_grad=True)
y = x * 3
z = y.detach()       # no gradient tracking
arr = y.detach().numpy()  # must detach before .numpy()

5. `torch.autograd.grad()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Explicitly computes gradients of outputs w.r.t. inputs. Supports Higher-order Gradients (高阶梯度) like Hessians. </div>

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
dy_dx, = torch.autograd.grad(y, x, create_graph=True)  # 1st order: 3x^2 = 12
d2y,   = torch.autograd.grad(dy_dx, x)                 # 2nd order: 6x = 12

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Core API for MAML (Model-Agnostic Meta-Learning, 模型无关元学习) and Physics-Informed Neural Networks (物理信息神经网络, PINN).</div>

6. `Tensor.grad` / `Tensor.grad_fn`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">grad</code>: stores the accumulated gradient. <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">grad_fn</code>: points to the Backward Function (反向传播函数) that created this Tensor. </div>

x = torch.tensor([1., 2.], requires_grad=True)
y = x * x
print(y.grad_fn)   # <MulBackward0 ...>
y.sum().backward()
print(x.grad)      # tensor([2., 4.])

7. `torch.enable_grad()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Re-enables gradient tracking inside a <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">no_grad</code> context, enabling fine-grained control. </div>

with torch.no_grad():
    x = model.encode(data)
    with torch.enable_grad():
        x.requires_grad_(True)
        loss = head(x)  # only this part tracked

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Useful for Partial Freeze Training (部分冻结训练), e.g., fine-tuning only the last layer.</div>
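A runnable version of this pattern might look as follows. `encoder`, `head`, and `data` are hypothetical stand-ins: the encoder runs without gradient tracking, and only the head records a graph.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(8, 16)  # hypothetical frozen backbone
head = nn.Linear(16, 2)     # hypothetical trainable head
data = torch.rand(4, 8)

with torch.no_grad():
    feats = encoder(data)         # no graph recorded for the encoder
    with torch.enable_grad():
        loss = head(feats).sum()  # graph recorded for the head only

loss.backward()
print(encoder.weight.grad)     # None: the encoder received no gradient
print(head.weight.grad.shape)  # torch.Size([2, 16])
```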

8. `register_hook()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Registers a hook function on a Tensor's backward pass, enabling inspection or modification of intermediate gradients. </div>

grads = []
def save_grad(g):
    grads.append(g.clone())

x = torch.rand(3, requires_grad=True)
y = (x**2).sum()
x.register_hook(save_grad)
y.backward()
print(grads[0])

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Invaluable for debugging Gradient Vanishing/Explosion (梯度消失/爆炸) and for modifying gradients during the backward pass (a hook may return a replacement gradient).</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br>Always pair <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">backward()</code> with <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">zero_grad()</code>, wrap inference in <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">no_grad()</code>, and use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">detach()</code> to stop gradients from crossing module boundaries.</div>

Neural Network Modules

Thu, 12 Mar 2026 00:00:00 GMT

V. Neural Network Modules — `nn.Module` (神经网络模块)

1. `nn.Module`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> The base class for all neural networks in PyTorch. Manages parameters (参数), sub-modules (子模块), and defines the forward pass (前向传播) logic. </div>

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)
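To show how such a module is used, the sketch below repeats the same `MLP` definition so that it runs on its own, then calls it on a random batch and counts the registered parameters.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = MLP()
x = torch.rand(32, 784)  # [batch, features]
out = model(x)           # calling the module invokes forward()
print(out.shape)         # torch.Size([32, 10])

# (784*256 + 256) + (256*10 + 10) parameters were registered automatically
n_params = sum(p.numel() for p in model.parameters())
print(n_params)          # 203530
```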

2. `nn.Linear()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Fully Connected Layer (全连接层) / Affine Transformation (仿射变换): y = xW<sup>T</sup> + b. The most fundamental learnable layer. </div>

fc = nn.Linear(in_features=128, out_features=64, bias=True)
x = torch.rand(32, 128)
out = fc(x)  # shape [32, 64]

3. `nn.Conv2d()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> 2D Convolutional Layer (二维卷积层). Extracts local spatial features; the core building block of CNNs (卷积神经网络). </div>

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)
x = torch.rand(8, 3, 224, 224)
out = conv(x)  # [8, 64, 224, 224]
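The output spatial size follows floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1. The helper `conv_out` below is a hypothetical name introduced only to check that formula against an actual layer, here with stride 2.

```python
import torch
import torch.nn as nn

def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a Conv2d layer."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

conv = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
x = torch.rand(1, 3, 224, 224)
y = conv(x)

print(conv_out(224, kernel=3, stride=2, padding=1))  # 112
print(y.shape)                                       # torch.Size([1, 64, 112, 112])
```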

4. `nn.BatchNorm2d()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Normalizes each channel of a mini-batch (小批量归一化). Accelerates training and mitigates gradient vanishing (梯度消失). </div>

bn = nn.BatchNorm2d(num_features=64)
x = torch.rand(8, 64, 28, 28)
out = bn(x)
# Standard order: Conv → BN → ReLU

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">BN is unstable when batch_size=1</span>. Switch to GroupNorm or LayerNorm in that case.</div>

5. `nn.Dropout()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> During training, randomly zeros out a fraction of neurons — a Regularization (正则化) technique to prevent Overfitting (过拟合). </div>

dropout = nn.Dropout(p=0.5)
x = torch.rand(4, 128)
out = dropout(x)       # train mode: ~50% of elements zeroed, the rest scaled by 1/(1-p) = 2

dropout.eval()
out_eval = dropout(x)  # identical to x in eval mode

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Forgetting <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">model.eval()</code></span> is a common cause of non-deterministic inference results, because Dropout stays active in train mode.</div>
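The train/eval difference is easy to verify on a Tensor of ones. This is a small sketch; the seed and size are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

train_out = drop(x)              # modules start in train mode
print(train_out.unique())        # survivors are scaled to 1 / (1 - p) = 2

drop.eval()
eval_out = drop(x)
print(torch.equal(eval_out, x))  # True: dropout is the identity in eval mode
```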

6. `nn.Sequential()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Chains a series of layers in order, executing each <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">forward</code> call sequentially. Simplifies model definition. </div>

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(256, 10)
)
x = torch.rand(32, 784)  # [batch, features]
out = model(x)           # shape [32, 10]

7. `nn.ModuleList()` / `nn.ModuleDict()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Registers sub-modules as a list or dictionary so that their parameters are correctly tracked and saved. </div>

layers = nn.ModuleList([nn.Linear(64, 64) for _ in range(6)])
x = torch.rand(32, 64)  # [batch, features]
for layer in layers:
    x = torch.relu(layer(x))

heads = nn.ModuleDict({
    'cls': nn.Linear(64, 10),
    'reg': nn.Linear(64, 1)
})

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Plain Python <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">list</code> / <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">dict</code> are not registered</span> — <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">parameters()</code> will miss them!</div>
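The registration difference can be demonstrated by counting parameters. The class names `Bad` and `Good` are illustrative only.

```python
import torch.nn as nn

class Bad(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: the layers are not registered as sub-modules
        self.layers = [nn.Linear(4, 4) for _ in range(3)]

class Good(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

n_bad = len(list(Bad().parameters()))
n_good = len(list(Good().parameters()))
print(n_bad)   # 0
print(n_good)  # 6 (weight and bias for each of the 3 layers)
```

An optimizer built from `Bad().parameters()` would therefore have nothing to update, and `state_dict()` would not save those layers.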

8. `nn.Embedding()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Maps integer indices to dense vectors (稠密向量). The standard Word Embedding Lookup Table (词向量查找表) in NLP. </div>

vocab_size, embed_dim = 10000, 128
emb = nn.Embedding(vocab_size, embed_dim)
ids = torch.randint(0, vocab_size, (16, 50))  # [batch, seq_len]
out = emb(ids)  # [16, 50, 128]

9. `nn.LSTM()` / `nn.GRU()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Long Short-Term Memory (长短时记忆) and Gated Recurrent Unit (门控循环单元) — classic recurrent layers for sequence data (序列数据). </div>

lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=2, batch_first=True, dropout=0.2)
x = torch.rand(8, 50, 128)  # [batch, seq, feat]
out, (h, c) = lstm(x)  # out: [8, 50, 256]; h, c: [2, 8, 256]

10. `nn.MultiheadAttention()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Multi-head Attention, the core component of the Transformer Architecture (Transformer架构). Passing the same Tensor as query, key, and value yields Multi-head Self-Attention (多头自注意力机制). </div>

attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.rand(4, 100, 512)
out, weights = attn(query=x, key=x, value=x)  # out: [4, 100, 512]; weights: [4, 100, 100] (averaged over heads)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">key_padding_mask</code> to mask padding tokens; use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">attn_mask</code> for Causal Masking (因果掩码) in decoders.</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br>Every custom network inherits from <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">nn.Module</code>; use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ModuleList/Dict</code> (not plain lists) to ensure parameters are tracked.</div>

Activation Functions & Loss Functions

Thu, 12 Mar 2026 00:00:00 GMT

VI. Activation Functions & Loss Functions (激活函数与损失函数)

1. `nn.ReLU()` / `F.relu()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Rectified Linear Unit (修正线性单元): max(0, x). Mitigates vanishing gradients and is the most widely used activation function. </div>

import torch.nn.functional as F
x = torch.randn(4, 64)
out1 = F.relu(x)                     # functional call
relu = nn.ReLU(inplace=True)         # inplace=True overwrites x itself
out2 = relu(x)                       # module call (can go in Sequential)

2. `nn.GELU()` / `nn.SiLU()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">GELU</code>: the Transformer standard activation. <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">SiLU</code> (Swish): used in EfficientNet and mobile models. </div>

gelu = nn.GELU()
silu = nn.SiLU()
x = torch.randn(4, 64)
print(gelu(x).shape)  # [4, 64]
print(silu(x).shape)  # [4, 64]
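To make the two activations concrete, here is a minimal scalar sketch of their formulas in plain Python. It assumes the exact (erf-based) GELU, which is the default form, and is meant only to check the arithmetic, not to replace the `nn` modules.

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x: float) -> float:
    # SiLU / Swish: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

for v in (-2.0, 0.0, 2.0):
    print(v, round(gelu(v), 4), round(silu(v), 4))
```

Unlike ReLU, both functions return small negative values for negative inputs instead of a hard zero.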

3. `nn.Softmax()` / `F.softmax()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Converts logits to a Probability Distribution (概率分布) summing to 1. Output layer for multi-class classification (多分类任务). </div>

logits = torch.tensor([2.0, 1.0, 0.1])
probs = F.softmax(logits, dim=0)   # tensor([0.659, 0.242, 0.099])

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Do NOT manually add Softmax when using <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">CrossEntropyLoss</code></span> — it already includes it internally.</div>
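The probabilities in the snippet above can be checked by hand. This is a plain-Python sketch of softmax with the usual max-subtraction trick for numerical stability; the function name is illustrative.

```python
import math

def softmax(logits):
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])           # [0.659, 0.242, 0.099]
```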

4. `nn.CrossEntropyLoss()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Multi-class Cross-Entropy Loss (多分类交叉熵损失) — internally fuses LogSoftmax + NLLLoss for numerical stability. </div>

criterion = nn.CrossEntropyLoss()
logits = torch.randn(8, 10, requires_grad=True)  # stand-in for model output; backward() needs a tensor that requires grad
labels = torch.randint(0, 10, (8,))
loss = criterion(logits, labels)
loss.backward()
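The "LogSoftmax + NLLLoss" fusion can be written out directly. The sketch below is plain Python with illustrative names, assuming the default mean reduction: it takes the log-softmax of each row via log-sum-exp and averages the negative log-probability of the correct class.

```python
import math

def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))  # log-sum-exp
    return [z - lse for z in logits]

def cross_entropy(batch_logits, labels):
    # Mean over the batch of -log p(correct class)
    losses = [-log_softmax(row)[y] for row, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

loss = cross_entropy([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]], [0, 2])
print(round(loss, 4))
```

Because the softmax never appears as a separate step, there is no intermediate probability that can round to zero before the log is taken.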

5. `nn.BCEWithLogitsLoss()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Binary / Multi-label Classification Loss (二分类/多标签分类损失). More numerically stable than applying Sigmoid then BCE. </div>

criterion = nn.BCEWithLogitsLoss()
logits = torch.rand(8, 1)                        # no sigmoid needed
targets = torch.randint(0, 2, (8, 1)).float()
loss = criterion(logits, targets)
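Why the fused loss is more stable can be seen on a single logit. The sketch below compares one standard stable rewriting of the loss with the naive "sigmoid, then BCE" form; it is plain Python, and the library's internal implementation may differ in detail.

```python
import math

def bce_with_logits(x: float, t: float) -> float:
    # Stable form: max(x, 0) - x*t + log(1 + exp(-|x|))
    return max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))

def naive_bce(x: float, t: float) -> float:
    p = 1.0 / (1.0 + math.exp(-x))        # sigmoid first, then BCE
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

print(bce_with_logits(0.5, 1.0), naive_bce(0.5, 1.0))  # agree for moderate logits
print(bce_with_logits(50.0, 0.0))                      # about 50; the naive form hits log(0) here
```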

6. `nn.MSELoss()` / `nn.L1Loss()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Mean Squared Error (均方误差) and Mean Absolute Error (平均绝对误差) — for continuous value prediction (连续值预测) / regression (回归). </div>

mse = nn.MSELoss()
mae = nn.L1Loss()
pred = torch.rand(4, 1)
target = torch.rand(4, 1)
print(mse(pred, target))
print(mae(pred, target))

7. `nn.KLDivLoss()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> KL Divergence Loss (KL散度损失) — measures the difference between two probability distributions. Used in knowledge distillation (知识蒸馏) and VAE. </div>

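kl = nn.KLDivLoss(reduction='batchmean')
student_logits = torch.randn(8, 10)              # placeholder logits for illustration
teacher_logits = torch.randn(8, 10)
log_p = F.log_softmax(student_logits, dim=-1)   # input: log prob
q = F.softmax(teacher_logits, dim=-1)            # target: prob
loss = kl(log_p, q)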

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Input must be log-probabilities; target must be probabilities</span>. The loss computed is KL(q || p) = sum of q * (log q - log p), with the target q as the reference distribution even though it is passed as the second argument.</div>
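A small worked example of the quantity being computed, in plain Python with made-up numbers. It assumes `reduction='batchmean'`, i.e. the per-sample sums are averaged over the batch.

```python
import math

def kl_div_batchmean(log_p_rows, q_rows):
    # Sum over classes of q * (log q - log p), then divide by the batch size
    total = 0.0
    for log_p, q in zip(log_p_rows, q_rows):
        total += sum(qi * (math.log(qi) - lpi) for qi, lpi in zip(q, log_p) if qi > 0)
    return total / len(q_rows)

q = [[0.7, 0.2, 0.1]]                          # teacher probabilities
log_p_same = [[math.log(v) for v in q[0]]]     # student equals teacher
log_p_unif = [[math.log(1 / 3)] * 3]           # student is uniform
kl_same = kl_div_batchmean(log_p_same, q)
kl_unif = kl_div_batchmean(log_p_unif, q)
print(kl_same, kl_unif)
```

The loss is zero when the student matches the teacher and grows as the two distributions diverge.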

8. `nn.LayerNorm()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Normalizes over the last N dimensions. Independent of batch size — the standard normalization in Transformers (Transformer标配). </div>

ln = nn.LayerNorm(normalized_shape=512)
x = torch.rand(4, 100, 512)
out = ln(x)  # shape [4, 100, 512]
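What happens to each 512-dimensional vector can be sketched on a short list. This plain-Python version assumes the initial affine parameters (scale 1, shift 0) and the default `eps` of 1e-5.

```python
import math

def layer_norm(vec, eps=1e-5):
    # Normalize one feature vector; affine scale=1 and shift=0 are assumed
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)   # biased variance
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

normed = layer_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in normed])
```

Each vector is normalized using only its own statistics, which is why the result does not depend on the batch.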

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Usually preferred over BatchNorm for variable-length NLP sequences (可变长序列) and small batch sizes, because its statistics do not depend on the batch.</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br>Use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">CrossEntropyLoss</code> for multi-class, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">BCEWithLogitsLoss</code> for binary and multi-label, and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">MSELoss</code> / <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">L1Loss</code> for regression; never apply Softmax before CrossEntropyLoss.</div>

Optimizers & Learning Rate Schedulers

Thu, 12 Mar 2026 00:00:00 GMT

VII. Optimizers & Learning Rate Schedulers (优化器与学习率调度)

1. `torch.optim.SGD()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Stochastic Gradient Descent (随机梯度下降). Supports momentum (动量), weight decay (权重衰减), and Nesterov momentum. </div>

optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9,
    weight_decay=1e-4, nesterov=True
)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> SGD+momentum is still common in CV; final accuracy sometimes surpasses Adam.</div>
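The interaction of momentum, weight decay, and Nesterov can be traced on a single scalar parameter. This is a plain-Python sketch of the update rule as given in the `torch.optim.SGD` documentation, assuming `dampening=0`; the function name is illustrative.

```python
def sgd_step(p, grad, buf, lr=0.01, momentum=0.9, weight_decay=1e-4, nesterov=True):
    """One scalar SGD update (dampening assumed to be 0)."""
    g = grad + weight_decay * p                      # L2 penalty folded into the gradient
    buf = g if buf is None else momentum * buf + g   # momentum buffer
    step = g + momentum * buf if nesterov else buf
    return p - lr * step, buf

p, buf = 1.0, None
for _ in range(3):
    p, buf = sgd_step(p, grad=2.0 * p, buf=buf)      # gradient of f(p) = p**2
print(p)
```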

2. `torch.optim.Adam()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Adaptive Moment Estimation (自适应矩估计). Combines AdaGrad and RMSProp. The default optimizer for most tasks. </div>

optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-4
)
optimizer.zero_grad()
loss.backward()
optimizer.step()

3. `torch.optim.AdamW()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Adam with decoupled weight decay (解耦权重衰减): the decay is applied directly to the weights instead of being added to the gradient as L2 regularization. The go-to optimizer for training Transformers. </div>

optimizer = torch.optim.AdamW(
    model.parameters(), lr=5e-5, weight_decay=0.01
)  # Standard config for BERT/GPT fine-tuning

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> In original Adam, L2 regularization is entangled with adaptive learning rate scaling. AdamW fixes this by decoupling them.</div>
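The difference is easiest to see on the very first update of one scalar weight. The sketch below is plain Python with illustrative names and values; it starts from zero moment estimates and only changes where the decay term enters.

```python
import math

def adam_like_step(p, grad, lr=1e-3, wd=0.01, b1=0.9, b2=0.999, eps=1e-8, decoupled=False):
    """First update (t=1) for one scalar parameter, starting from zero moment estimates."""
    if decoupled:
        p = p * (1.0 - lr * wd)       # AdamW: decay applied directly to the weight
    else:
        grad = grad + wd * p          # Adam: decay enters the gradient and is then rescaled
    m_hat = (1 - b1) * grad / (1 - b1)           # bias-corrected first moment
    v_hat = (1 - b2) * grad * grad / (1 - b2)    # bias-corrected second moment
    return p - lr * m_hat / (math.sqrt(v_hat) + eps)

adam = adam_like_step(10.0, grad=0.5, decoupled=False)
adamw = adam_like_step(10.0, grad=0.5, decoupled=True)
print(adam, adamw)
```

In the Adam variant the decay term is normalized away together with the gradient, so the step is about `lr` regardless of the weight's size; in the AdamW variant the weight shrinks in proportion to its own magnitude.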

4. `optimizer.zero_grad()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Clears all parameter gradient buffers. <strong>Must be called before each backward pass</strong>, because PyTorch accumulates gradients across backward() calls by default. </div>

for epoch in range(10):
    for x, y in dataloader:
        optimizer.zero_grad()   # 1. clear
        pred = model(x)         # 2. forward
        loss = criterion(pred, y)
        loss.backward()         # 3. backward
        optimizer.step()        # 4. update

5. `lr_scheduler.StepLR()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Multiplies the learning rate by <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">gamma</code> every <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">step_size</code> epochs — stepwise decay (阶梯式衰减). </div>

from torch.optim import lr_scheduler
scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
# Call at end of each epoch:
scheduler.step()
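The resulting schedule has a simple closed form. A plain-Python sketch, assuming a base learning rate of 0.1 for illustration:

```python
def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    # Learning rate after `epoch` completed epochs under stepwise decay
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 29, 30, 60, 90):
    print(epoch, step_lr(0.1, epoch))
```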

6. `lr_scheduler.CosineAnnealingLR()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Cosine Annealing (余弦退火调度): LR decays from the initial lr to eta_min over T_max epochs along a half cosine curve. Often converges well in practice. </div>

scheduler = lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-6
)
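The schedule can be evaluated directly from its closed-form expression. This plain-Python sketch assumes a base learning rate of 1e-3 for illustration and covers epochs 0 through `T_max`.

```python
import math

def cosine_lr(base_lr, epoch, t_max=50, eta_min=1e-6):
    # eta_t = eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * t / T_max))
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))

for epoch in (0, 25, 50):
    print(epoch, cosine_lr(1e-3, epoch))
```

The rate starts at the base value, passes the midpoint halfway through, and reaches `eta_min` at epoch `T_max`.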

7. `lr_scheduler.OneCycleLR()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Super-Convergence Training Strategy (超收敛训练策略): LR rises then falls in a single cycle. Can significantly reduce convergence time. </div>

scheduler = lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01,
    steps_per_epoch=len(loader), epochs=10
)
scheduler.step()  # call after every step (not epoch)

8. `torch.optim.LBFGS()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Quasi-Newton second-order optimizer (拟牛顿二阶优化器). Suited to small, full-batch problems. Requires a <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">closure</code> function that re-evaluates the loss, since the optimizer may call it several times per step. </div>

optimizer = torch.optim.LBFGS(model.parameters(), lr=1)

def closure():
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    return loss

optimizer.step(closure)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Preferred for Neural Style Transfer (神经风格迁移) and other small-scale, high-precision convergence tasks.</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br>Default to <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">AdamW</code> + <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">CosineAnnealingLR</code> for Transformers, and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">SGD</code> + <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">StepLR</code> for classic CNN image tasks.</div>

Data Loading & Preprocessing

Thu, 12 Mar 2026 00:00:00 GMT

VIII. Data Loading & Preprocessing (数据加载与预处理)

1. `torch.utils.data.Dataset`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Abstract base class for custom datasets. Must implement <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">__len__</code> and <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">__getitem__</code>. </div>

from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

2. `torch.utils.data.DataLoader`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Wraps a Dataset into an iterable batch loader with parallel reading (并行读取) and data shuffling (数据打乱). </div>

from torch.utils.data import DataLoader
loader = DataLoader(
    dataset=train_ds, batch_size=32,
    shuffle=True, num_workers=4, pin_memory=True
)
for x, y in loader:
    ...
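Conceptually, the loader only needs the two Dataset methods. The sketch below is a hypothetical, single-process stand-in written in plain Python (no workers, no tensors) to show the shuffle, batch, and collate steps; `ToyDataset` and `iter_batches` are illustrative names, not PyTorch APIs.

```python
import random

class ToyDataset:
    # Same protocol as torch.utils.data.Dataset: __len__ and __getitem__
    def __init__(self, data, labels):
        self.data, self.labels = data, labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

def iter_batches(dataset, batch_size, shuffle=True, seed=0):
    """Shuffle indices, then yield (inputs, labels) lists of up to batch_size samples."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        xs, ys = zip(*(dataset[i] for i in chunk))   # collate: group fields together
        yield list(xs), list(ys)

toy_ds = ToyDataset(list(range(10)), [i % 2 for i in range(10)])
batches = list(iter_batches(toy_ds, batch_size=4))
print([len(xs) for xs, _ in batches])                # [4, 4, 2]
```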

3. `torchvision.transforms`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Image preprocessing and data augmentation (数据增强) library. Chain multiple transforms with <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">Compose</code>. </div>

from torchvision import transforms
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> The Normalize parameters are ImageNet statistics. Keep them consistent when using Transfer Learning (迁移学习).</div>
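The arithmetic behind `ToTensor` followed by `Normalize` can be checked on a single RGB pixel. A plain-Python sketch with an arbitrary example pixel:

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb_uint8):
    # ToTensor scales 0..255 to 0..1; Normalize then applies (x - mean) / std per channel
    scaled = [v / 255.0 for v in rgb_uint8]
    return [(x - m) / s for x, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]

pixel = normalize_pixel([255, 128, 0])
print([round(v, 3) for v in pixel])
```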

4. `torchvision.datasets.ImageFolder`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Automatically builds an image classification dataset from directory structure — subdirectory names become class labels (类别标签). </div>

from torchvision.datasets import ImageFolder
# data/train/cat/*.jpg, data/train/dog/*.jpg
ds = ImageFolder(root='data/train', transform=transform)
print(ds.classes)   # ['cat', 'dog']

5. `torch.utils.data.random_split()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Randomly splits a dataset into train/validation subsets by specified lengths. </div>

from torch.utils.data import random_split
n_val = int(len(dataset) * 0.2)
train_ds, val_ds = random_split(dataset, [len(dataset) - n_val, n_val])

6. `torchvision.models` (pretrained)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Provides many pre-trained models: ResNet, VGG, ViT, etc. Enables rapid Transfer Learning (迁移学习). </div>

import torchvision.models as models
model = models.resnet50(weights='IMAGENET1K_V2')
model.fc = nn.Linear(2048, 10)  # replace head for fine-tuning

Model Saving, Loading & Deployment

Thu, 12 Mar 2026 00:00:00 GMT

IX. Model Saving, Loading & Deployment (模型保存、加载与部署)

1. `torch.save()` / `torch.load()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Serializes / deserializes any Python object (model, tensor, dict) to/from a file. </div>

# Recommended: save only the weights dict
torch.save(model.state_dict(), 'model_weights.pth')

# Load
state = torch.load('model_weights.pth', map_location='cpu', weights_only=True)  # weights_only avoids unpickling arbitrary objects
model.load_state_dict(state)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Saving the entire model object pickles a reference to the class definition and its module path, so loading breaks when the code is moved or refactored. <span style="color:#C0392B;font-weight:600">Always save only <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">state_dict</code></span>.</div>

2. `model.state_dict()` / `load_state_dict()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Gets / loads an ordered dictionary of model parameters. The core interface for Transfer Learning (迁移学习) and checkpoint resuming (断点续训). </div>

checkpoint = {
    'epoch': epoch,
    'model': model.state_dict(),
    'optim': optimizer.state_dict(),
    'loss': best_loss
}
torch.save(checkpoint, 'ckpt.pth')

3. `torch.jit.script()` / `torch.jit.trace()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Compiles a model to TorchScript for deployment in Python-free environments (C++, mobile). </div>

# trace: records one execution path (data-dependent control flow is not captured)
traced = torch.jit.trace(model, torch.rand(1, 3, 224, 224))
traced.save('traced.pt')

# script: supports dynamic control flow
scripted = torch.jit.script(model)

4. `torch.onnx.export()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Exports a PyTorch model to ONNX format for cross-framework deployment (TensorRT, OpenVINO). </div>

torch.onnx.export(
    model, torch.rand(1, 3, 224, 224), 'model.onnx',
    opset_version=17,
    input_names=['input'], output_names=['output'],
    dynamic_axes={'input': {0: 'batch'}, 'output': {0: 'batch'}}  # batch dimension is dynamic on both sides
)

5. `model.parameters()` / `named_parameters()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Iterates over all learnable parameters. The <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">named_</code> version also returns parameter names. </div>

total = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'Params: {total/1e6:.1f}M')

for name, p in model.named_parameters():
    print(name, p.shape)

6. `model.train()` / `model.eval()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Switches between training and evaluation modes — affects Dropout and BatchNorm behavior. </div>

model.train()
for x, y in train_loader:
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    for x, y in val_loader:
        pred = model(x)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Forgetting <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">model.eval()</code></span> is a common cause of inconsistent inference results, because Dropout stays active and BatchNorm keeps using batch statistics.</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br>Always save <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">state_dict</code> (not the model object), and remember the deploy path: PyTorch → TorchScript / ONNX → Runtime.</div>

GPU Acceleration & Distributed Training

Thu, 12 Mar 2026 00:00:00 GMT

X. GPU Acceleration & Distributed Training (GPU加速与分布式训练)

1. `Tensor.to()` / `.cuda()` / `.cpu()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Moves Tensors or models to a specified device (GPU/CPU). The fundamental operation for GPU training (GPU训练). </div>

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
x = x.to(device)
result = output.detach().cpu().numpy()  # detach from the graph, move back to CPU

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Model and data must be on the same device</span>. Mixing CPU/GPU Tensors throws a runtime error.</div>

2. `torch.cuda.amp` — Automatic Mixed Precision (自动混合精度)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Runs eligible operations in FP16 while keeping precision-sensitive ones in FP32, reducing VRAM usage and accelerating training. </div>

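from torch.cuda.amp import autocast, GradScaler  # newer releases prefer torch.amp.autocast('cuda') / torch.amp.GradScaler('cuda')
scaler = GradScaler()

optimizer.zero_grad()
with autocast():                 # forward pass in mixed precision
    output = model(x)
    loss = criterion(output, y)

scaler.scale(loss).backward()    # scale the loss so small FP16 gradients do not underflow
scaler.step(optimizer)           # unscales gradients; skips the step if they contain inf/NaN
scaler.update()                  # adjusts the scale factor for the next iteration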
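The reason for scaling the loss can be shown without a GPU. The sketch below uses the standard library's half-precision `struct` format to round-trip a small value through FP16; the gradient value is made up, and the scale of 2**16 is chosen to match GradScaler's documented default initial scale.

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip through IEEE half precision using the struct module's 'e' format
    return struct.unpack('e', struct.pack('e', x))[0]

grad = 1e-8          # a small gradient value (illustrative)
scale = 65536.0      # loss-scale factor, 2**16

unscaled = to_fp16(grad)                    # underflows to 0.0 in FP16
recovered = to_fp16(grad * scale) / scale   # scale first, then unscale in higher precision
print(unscaled, recovered)
```

Multiplying the loss by the scale factor multiplies every gradient by the same factor, which lifts small values into FP16's representable range before they are divided back out.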

3. `nn.DataParallel()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Single-machine multi-GPU Data Parallel (数据并行) training. Automatically splits batches and aggregates gradients. </div>

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to('cuda')

# Access original model
sd = model.module.state_dict() if isinstance(model, nn.DataParallel) else model.state_dict()

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">DataParallel efficiency is limited by Python GIL</span>: it runs in a single process and also replicates the model and gathers outputs through one GPU on every step. For large-scale training, use DistributedDataParallel (DDP).</div>

4. `nn.parallel.DistributedDataParallel()` — DDP

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Distributed Data Parallel: one process per GPU. Communication efficiency far exceeds DataParallel. </div>

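import os
import torch.distributed as dist
dist.init_process_group('nccl')
local_rank = int(os.environ['LOCAL_RANK'])   # set by the launcher, e.g. torchrun
torch.cuda.set_device(local_rank)            # bind this process to its own GPU
model = model.to(local_rank)
model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])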

5. `torch.cuda.memory_summary()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Prints detailed GPU VRAM usage to help diagnose Out-of-Memory (OOM, 显存溢出) issues. </div>

print(torch.cuda.memory_summary())

alloc = torch.cuda.memory_allocated()
total = torch.cuda.get_device_properties(0).total_memory
print(f'{alloc/1e9:.1f} GB used of {total/1e9:.1f} GB')

6. `torch.compile()` (PyTorch 2.0+)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Captures the model as a graph and compiles it into optimized kernels (Triton kernels on GPU), which often speeds up training and inference substantially. </div>

model = torch.compile(model)

# Alternative (use instead of the call above): pick a mode and require a single graph
# model = torch.compile(model, mode='reduce-overhead', fullgraph=True)

Advanced Features & Utilities

Thu, 12 Mar 2026 00:00:00 GMT

XI. Advanced Features & Utilities (高级特性与实用工具)

1. `torch.nn.utils.clip_grad_norm_()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Clips the global L2 norm of all parameter gradients to prevent Gradient Explosion (梯度爆炸). Essential for RNN/Transformer training. </div>

optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
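What the clipping call does to the gradients can be sketched on a flat list of values. This plain-Python version follows the usual global-norm rule with a small epsilon in the denominator; the function name is illustrative, and it returns new values rather than modifying gradients in place.

```python
import math

def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale gradient values so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    coef = min(1.0, max_norm / (total_norm + eps))   # never scale gradients up
    return [g * coef for g in grads], total_norm

clipped, norm_before = clip_grad_norm([3.0, 4.0], max_norm=1.0)
print(norm_before, clipped)
```

All gradients are scaled by the same factor, so the update direction is preserved and only its length changes.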

2. `torch.nn.utils.weight_norm()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Decomposes parameters into direction and magnitude, accelerating convergence. Used in WaveNet and generative models. </div>

from torch.nn.utils import weight_norm, remove_weight_norm  # deprecated in newer releases in favor of torch.nn.utils.parametrizations.weight_norm
wn_conv = weight_norm(nn.Conv1d(64, 64, 3, padding=1))
remove_weight_norm(wn_conv)  # merge before deployment

3. `torch.nn.functional.interpolate()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Upsamples or downsamples feature maps using nearest, bilinear, bicubic, and other interpolation modes. </div>

x = torch.rand(1, 64, 28, 28)
up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
print(up.shape)  # [1, 64, 56, 56]

4. `torch.nn.functional.grid_sample()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Samples from a feature map at normalized grid coordinates. The core of Spatial Transformer Networks (空间变换网络, STN). </div>

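x = torch.rand(1, 64, 28, 28)
theta = torch.eye(2, 3, dtype=torch.float).unsqueeze(0)    # identity affine transform, [1, 2, 3]
grid = F.affine_grid(theta, x.size(), align_corners=False)
out = F.grid_sample(x, grid, mode='bilinear', align_corners=False)  # identity grid: out approximately equals x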

5. `torch.einsum()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Einstein Summation (爱因斯坦求和): expresses complex tensor operations as a concise string equation. </div>

c = torch.einsum('ij,jk->ik', a, b)  # matrix multiply

# Attention scores: Q:[B,H,L,D], K:[B,H,L,D]
scores = torch.einsum('bhld,bhmd->bhlm', Q, K)
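How to read an einsum string: an index that appears in the inputs but not in the output is summed over. The sketch below spells out `'ij,jk->ik'` with explicit loops in plain Python; the function name is illustrative.

```python
def einsum_ij_jk_ik(a, b):
    # 'ij,jk->ik': j appears in both inputs but not the output, so it is summed over
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][j] * b[j][k] for j in range(inner)) for k in range(cols)]
            for i in range(rows)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = einsum_ij_jk_ik(a, b)
print(c)   # [[19, 22], [43, 50]]
```

The attention-score pattern `'bhld,bhmd->bhlm'` works the same way: `d` is summed, while `b` and `h` are carried through as batch dimensions.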

6. `torch.profiler.profile()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Performance profiler that records per-operator CPU/GPU time and memory usage to locate bottlenecks. </div>

from torch.profiler import profile, ProfilerActivity
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x)
print(prof.key_averages().table(sort_by='cuda_time_total'))

7. `torch.nn.init.*`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Provides Xavier, Kaiming, Orthogonal and other Parameter Initialization (参数初始化) strategies. Directly impacts training stability. </div>

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:       # layers built with bias=False have no bias tensor
            nn.init.zeros_(m.bias)

model.apply(init_weights)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Sigmoid/Tanh → Xavier; ReLU family → Kaiming; RNN recurrent weights → Orthogonal initialization.</div>
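The Kaiming and Xavier schemes differ only in how they pick the standard deviation from the layer's fan-in and fan-out. A plain-Python sketch of the two normal-distribution variants, with illustrative function names and a 512-to-256 linear layer as the example:

```python
import math

def kaiming_normal_std(fan, gain=math.sqrt(2.0)):
    # Kaiming/He: std = gain / sqrt(fan); gain sqrt(2) corresponds to ReLU
    return gain / math.sqrt(fan)

def xavier_normal_std(fan_in, fan_out, gain=1.0):
    # Xavier/Glorot: std = gain * sqrt(2 / (fan_in + fan_out))
    return gain * math.sqrt(2.0 / (fan_in + fan_out))

# Linear layer with 512 inputs and 256 outputs: fan_in = 512, fan_out = 256
k_std = kaiming_normal_std(256)        # mode='fan_out' uses the number of outputs
x_std = xavier_normal_std(512, 256)
print(round(k_std, 4), round(x_std, 4))
```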

8. `torch.Tensor.item()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Converts a single-element Tensor to a Python scalar. Commonly used to log loss values. </div>

loss = criterion(output, label)
loss_val = loss.item()        # plain Python float, no longer attached to the graph
print(f'Loss: {loss_val:.4f}')

# WRONG: total_loss += loss  ← graph grows → OOM
# CORRECT:
total_loss += loss.item()

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Accumulating <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">loss</code> tensors directly causes the computation graph to grow unboundedly → OOM</span>. Always use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.item()</code>.</div>

9. `torch.Tensor.clone()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Creates a deep copy (深拷贝) of a Tensor with fully independent data, while preserving gradient propagation. </div>

x = torch.rand(3, requires_grad=True)
y = x.clone()           # gradient can still propagate
z = x.detach().clone()  # gradient detached

buf = torch.empty(3)
buf.copy_(x)            # in-place copy

10. `torch.where()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Conditional selection: returns values from <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">x</code> where condition is True, else from <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">y</code>. Vectorized if-else. </div>

x = torch.tensor([-1., 2., -3., 4.])
y = torch.zeros_like(x)
out = torch.where(x > 0, x, y)  # tensor([0., 2., 0., 4.])  — manual ReLU

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Both branches participate in gradient computation; the gradient of the unselected branch is zero.</div>

11. `torch.gather()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Gathers values from a source Tensor by an index Tensor, enabling irregular indexing (不规则索引) such as picking each sample's score at its target class. </div>

logits = torch.rand(4, 10)
targets = torch.tensor([3, 7, 1, 5]).unsqueeze(1)  # [4, 1]
scores = logits.gather(dim=1, index=targets)         # [4, 1]

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Core tool for sequence decoding, top-k sampling, and Q-value selection in Reinforcement Learning (强化学习).</div>
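The indexing rule for `dim=1` is `out[i][j] = src[i][index[i][j]]`. A plain-Python sketch on nested lists, with an illustrative function name and made-up values:

```python
def gather_dim1(src, index):
    # out[i][j] = src[i][index[i][j]]
    return [[src[i][col] for col in row] for i, row in enumerate(index)]

toy_logits = [[0.1, 0.9, 0.0],
              [0.7, 0.2, 0.1]]
toy_targets = [[1], [0]]
picked = gather_dim1(toy_logits, toy_targets)
print(picked)   # [[0.9], [0.7]]
```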

12. `torch.Tensor.scatter_()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Scatters values from <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">src</code> into <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">self</code> at positions specified by <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">index</code> (in-place). </div>

y = torch.zeros(4, 5)
labels = torch.tensor([[2], [0], [4], [1]])
y.scatter_(dim=1, index=labels, value=1.0)  # one-hot encoding
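Scatter is the write-side counterpart of gather: for `dim=1` the rule is `self[i][index[i][j]] = value`. A plain-Python sketch of the one-hot case above, using nested lists and an illustrative function name:

```python
def scatter_dim1_(dest, index, value):
    # dest[i][index[i][j]] = value, modifying dest in place
    for i, row in enumerate(index):
        for col in row:
            dest[i][col] = value
    return dest

one_hot = [[0.0] * 5 for _ in range(4)]
scatter_dim1_(one_hot, [[2], [0], [4], [1]], 1.0)
print(one_hot)
```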

13. `torch.masked_fill()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Fills positions where the mask is True with a specified value. Essential for Attention Masking (注意力掩码). </div>

L = 5
mask = torch.triu(torch.ones(L, L), diagonal=1).bool()
scores = torch.rand(L, L)
scores = scores.masked_fill(mask, float('-inf'))  # causal mask

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> A Transformer decoder's Causal Self-Attention (因果自注意力) relies on this kind of mask to block information from future positions.</div>
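Filling with negative infinity works because the softmax that follows turns those entries into exactly zero weight. A plain-Python sketch of a causal mask plus row-wise softmax, with made-up scores:

```python
import math

def causal_softmax(scores):
    """Row-wise softmax where position i may only attend to positions j <= i."""
    out = []
    for i, row in enumerate(scores):
        masked = [s if j <= i else float('-inf') for j, s in enumerate(row)]
        m = max(masked)
        exps = [math.exp(s - m) for s in masked]   # exp(-inf) == 0.0
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

weights = causal_softmax([[0.0, 1.0, 2.0],
                          [0.0, 1.0, 2.0],
                          [0.0, 1.0, 2.0]])
print(weights[0])   # [1.0, 0.0, 0.0]
```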

14. `torch.nn.utils.rnn.pad_sequence()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Pads a list of variable-length sequences to a uniform-length tensor for NLP batch processing. </div>

from torch.nn.utils.rnn import pad_sequence
seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]
padded = pad_sequence(seqs, batch_first=True)  # [3, 3]
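Padding usually goes together with a mask that tells attention layers which positions are real. The sketch below is plain Python on lists with an illustrative function name; it assumes the `key_padding_mask` convention in which `True` marks a padded position.

```python
def pad_sequences(seqs, padding_value=0):
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [padding_value] * (max_len - len(s)) for s in seqs]
    # True marks padding, matching the key_padding_mask convention
    mask = [[j >= len(s) for j in range(max_len)] for s in seqs]
    return padded, mask

padded_rows, pad_mask = pad_sequences([[1, 2, 3], [4, 5], [6]])
print(padded_rows)   # [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
print(pad_mask)
```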

15. `nn.TransformerEncoder()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Multi-layer Transformer Encoder with built-in Multi-head Attention + FFN + residual connections and LayerNorm. </div>

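enc_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, dim_feedforward=2048, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=6)
src = torch.rand(4, 100, 512)                  # [batch, seq, d_model]
m = torch.zeros(4, 100, dtype=torch.bool)      # True marks padding positions to ignore
out = encoder(src, src_key_padding_mask=m)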

16. `torch.Tensor.contiguous()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Returns a Contiguous Memory (连续内存) copy of the Tensor. Returns itself (zero overhead) if already contiguous. </div>

x = torch.rand(4, 5, 6)
y = x.permute(2, 0, 1)         # non-contiguous
print(y.is_contiguous())        # False
z = y.contiguous()
w = z.view(6, -1)               # safe to view now

17. `torch.Tensor.type()` / `.to(dtype)`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Converts the Tensor's Data Type (数据类型): float32 ↔ float16 ↔ int64, etc. </div>

x = torch.tensor([1, 2, 3])
f = x.float()   # int64 → float32
h = x.half()    # int64 → float16
l = x.long()    # → int64
y = x.to(dtype=torch.float32)  # recommended

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Cross-entropy labels need <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">long()</code>; normalized image pixels need <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">float()</code>; inference acceleration uses <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">half()</code>.</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br>The advanced toolkit: <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">clip_grad_norm_</code> (stability), <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">einsum</code> (clarity), <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">gather/scatter</code> (indexing), <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">masked_fill</code> (attention), <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">item()</code> (memory safety).</div>

Convolution, Pooling & Normalization Layers

Thu, 12 Mar 2026 00:00:00 GMT

XII. Convolution, Pooling & Normalization Layers (卷积、池化与归一化层)

1. `nn.MaxPool2d()` / `nn.AvgPool2d()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> 2D Max / Average Pooling (最大/平均池化). Downsamples feature maps using a sliding window, reducing spatial size. </div>

pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.rand(8, 64, 28, 28)
out = pool(x)  # [8, 64, 14, 14]

gap = nn.AdaptiveAvgPool2d((1, 1))
feat = gap(out)  # [8, 64, 1, 1] — global avg pool

2. `nn.ConvTranspose2d()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Transposed Convolution (转置卷积 / 反卷积) for upsampling. Core layer in U-Net and GAN Generators. </div>

deconv = nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=4, stride=2, padding=1)
x = torch.rand(4, 64, 14, 14)
out = deconv(x)  # [4, 32, 28, 28]
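The `[4, 32, 28, 28]` shape can be derived by hand. The sketch below encodes the per-dimension output-size formula given in the PyTorch `ConvTranspose2d` documentation; the helper name `conv_transpose2d_out` is made up for illustration.

```python
def conv_transpose2d_out(size, kernel_size, stride=1, padding=0,
                         output_padding=0, dilation=1):
    """Output length along one spatial dimension of a transposed convolution."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)

h_out = conv_transpose2d_out(14, kernel_size=4, stride=2, padding=1)
print(h_out)  # 28
```

With `kernel_size=4, stride=2, padding=1` the layer exactly doubles the spatial size, which is why this combination is so common in decoders and generators.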

3. `nn.GroupNorm()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Splits channels into groups and normalizes within each group. Independent of batch size — outperforms BN in small-batch scenarios. </div>

gn = nn.GroupNorm(num_groups=8, num_channels=32)
x = torch.rand(2, 32, 64, 64)
out = gn(x)  # shape unchanged

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Recommended for object detection / instance segmentation (small batch). <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">num_groups=1</code> normalizes over all of (C, H, W) per sample, like LayerNorm over those dimensions, though the affine parameters stay per-channel.</div>
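What "normalizes within each group" means can be checked by hand. This is a rough sketch, assuming a freshly constructed layer (affine weight 1, bias 0) and a smaller spatial size than the example above to keep it quick; `manual` and `max_diff` are illustrative names.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(2, 32, 8, 8)
gn = nn.GroupNorm(num_groups=8, num_channels=32)   # affine weight=1, bias=0 at init

# Manual version: reshape to [N, groups, values-in-group] and normalize per group.
N, C, H, W = x.shape
g = x.view(N, 8, -1)                               # 4 channels * 8 * 8 values per group
mean = g.mean(dim=-1, keepdim=True)
var = g.var(dim=-1, unbiased=False, keepdim=True)  # biased variance, as the layer uses
manual = ((g - mean) / torch.sqrt(var + gn.eps)).view(N, C, H, W)

max_diff = (gn(x) - manual).abs().max().item()     # should be at floating-point noise level
```

No statistic is taken across the batch dimension, which is why the result does not depend on batch size.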

4. `nn.InstanceNorm2d()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Normalizes each sample and each channel independently. Standard normalization for Image Style Transfer (图像风格迁移). </div>

inst = nn.InstanceNorm2d(num_features=64, affine=True)
x = torch.rand(4, 64, 256, 256)
out = inst(x)

5. `nn.Upsample()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Module-form wrapper around <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">F.interpolate</code>. No learnable parameters; can be placed in <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">Sequential</code>. </div>

up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
x = torch.rand(4, 32, 14, 14)
out = up(x)  # [4, 32, 28, 28]

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Upsample is fixed interpolation with nothing to learn; ConvTranspose2d learns its upsampling kernel. Pick whichever matches whether the upsampling should be learned.</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br>Normalization choice: BN (large batch) → GN (small batch/detection) → LN (NLP) → IN (style transfer).</div>

Indexing, Selection & Advanced Operations

Thu, 12 Mar 2026 00:00:00 GMT

XIII. Indexing, Selection & Advanced Operations (索引、选择与高级操作)

1. `torch.nonzero()` / `torch.argwhere()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Returns the coordinates of all non-zero (or True) elements. Used for Sparse Operations (稀疏操作). </div>

x = torch.tensor([[0, 1, 0], [2, 0, 3]])
idx = torch.nonzero(x)           # tensor([[0,1],[1,0],[1,2]])
idx2 = torch.argwhere(x > 0)     # PyTorch 1.11+

2. `torch.index_select()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Selects elements along a dimension by index tensor. Similar to NumPy fancy indexing. </div>

x = torch.rand(5, 4)
idx = torch.tensor([0, 2, 4])
out = torch.index_select(x, dim=0, index=idx)  # shape [3, 4]

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Index must be a 1D LongTensor; more efficient than boolean masking for this case.</div>

3. `torch.masked_select()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Selects elements by boolean mask. Returns a flattened 1D Tensor. </div>

x = torch.randn(3, 3)
mask = x > 0
pos_vals = torch.masked_select(x, mask)  # all positive values, 1D

4. `torch.sort()` / `torch.argsort()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Sorts a Tensor along a dimension, returning sorted values and original indices. </div>

x = torch.tensor([3., 1., 4., 1., 5., 9.])
vals, idx = torch.sort(x, descending=True)  # [9,5,4,3,1,1]
order = torch.argsort(x)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Key step in NMS (Non-Maximum Suppression, 非极大抑制): sort boxes by confidence descending.</div>
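To make the role of sorting in NMS concrete, here is a minimal pure-Python sketch of greedy NMS. It assumes boxes in `(x1, y1, x2, y2)` format; the helpers `iou` and `nms` are written for illustration and are not a library API.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: returns indices of kept boxes, highest score first."""
    # Equivalent of argsort with descending=True.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

Box 1 overlaps box 0 with IoU 81/119 ≈ 0.68, so it is suppressed; box 2 does not overlap anything and is kept.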

5. `torch.cumsum()` / `torch.cumprod()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Cumulative sum (累积和) or cumulative product (累积积) along a dimension. </div>

x = torch.tensor([1., 2., 3., 4.])
print(torch.cumsum(x, dim=0))   # tensor([1., 3., 6., 10.])
print(torch.cumprod(x, dim=0))  # tensor([1., 2., 6., 24.])

6. `torch.flip()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Flips a Tensor along specified dimensions — mirror flip augmentation or reverse operation. </div>

x = torch.tensor([[1, 2, 3], [4, 5, 6]])
h = torch.flip(x, dims=[1])  # horizontal: [[3,2,1],[6,5,4]]
v = torch.flip(x, dims=[0])  # vertical: [[4,5,6],[1,2,3]]

7. `torch.bucketize()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Assigns continuous values to discrete buckets (离散化) by given boundaries. Analogous to NumPy's <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">digitize</code>. </div>

boundaries = torch.tensor([0.0, 0.5, 1.0])
x = torch.tensor([-0.1, 0.3, 0.7, 1.5])
bins = torch.bucketize(x, boundaries)  # tensor([0, 1, 2, 3])
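How values that fall exactly on a boundary are assigned is easy to get wrong. As a sketch of the semantics (not of how PyTorch implements it), the standard-library `bisect` functions reproduce the two modes selected by `bucketize`'s `right` argument; the value `0.5` is added here to exercise the boundary case.

```python
from bisect import bisect_left, bisect_right

boundaries = [0.0, 0.5, 1.0]
values = [-0.1, 0.3, 0.5, 0.7, 1.5]

# right=False (the default): a value equal to a boundary goes to the lower bucket.
left_bins = [bisect_left(boundaries, v) for v in values]
# right=True: a value equal to a boundary goes to the upper bucket.
right_bins = [bisect_right(boundaries, v) for v in values]

print(left_bins)   # [0, 1, 1, 2, 3]
print(right_bins)  # [0, 1, 2, 2, 3]
```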

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Useful for Feature Engineering (特征工程) — binning continuous features and custom quantile normalization.</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br><code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">gather</code> picks values by index; <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">scatter_</code> puts values by index; <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">masked_fill</code> overwrites by condition.</div>

Randomness & Reproducibility

Thu, 12 Mar 2026 00:00:00 GMT

XIV. Randomness & Reproducibility (随机性与可复现性)

1. `torch.manual_seed()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Sets the global random seed to ensure consistent results across runs (实验可复现性). </div>

import random
import numpy as np
import torch

def set_seed(seed=42):
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)

set_seed(42)

2. `torch.Generator()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Creates an independent Random Number Generator (随机数生成器) object, avoiding interference with the global seed. </div>

g = torch.Generator()
g.manual_seed(42)
x = torch.rand(3, generator=g)  # independent state

loader = DataLoader(ds, shuffle=True, generator=g)

3. `torch.distributions.*`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Probability distribution library supporting sampling and <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">log_prob</code> computation. Foundation for VAE and Reinforcement Learning (强化学习). </div>

from torch.distributions import Normal, Categorical

# VAE reparameterization trick (重参数化技巧)
dist = Normal(loc=mu, scale=torch.exp(0.5 * logvar))  # std = exp(logvar / 2)
z = dist.rsample()      # differentiable sampling
log_p = dist.log_prob(z)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">rsample()</code> is differentiable (reparameterization); <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">sample()</code> is not (for policy gradients).</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br>Always call <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">set_seed()</code> at the start of every experiment and use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">rsample()</code> for differentiable stochastic layers.</div>

Utilities & Performance Tips

Thu, 12 Mar 2026 00:00:00 GMT

XV. Utilities & Performance Tips (实用工具与性能技巧)

1. `torch.no_grad()` vs `torch.inference_mode()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">inference_mode</code> is more aggressive than <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">no_grad</code>: it skips version counting entirely for faster pure inference. </div>

with torch.no_grad():
    out1 = model(x)

@torch.inference_mode()
def predict(x):
    return model(x)

2. `torch.Tensor.pin_memory()`

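<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Pins CPU Tensors to Page-locked Memory (页锁定内存), which speeds up CPU→GPU copies and lets them run asynchronously when combined with <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">non_blocking=True</code>. </div>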

loader = DataLoader(dataset, pin_memory=True, num_workers=4)
for x, y in loader:
    x = x.to('cuda', non_blocking=True)  # async transfer

3. `torch.utils.checkpoint.checkpoint()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Gradient Checkpointing (梯度检查点): trades recomputation for VRAM savings. Can save 50%+ VRAM for very large models. </div>

from torch.utils.checkpoint import checkpoint

def forward(self, x):
    x = checkpoint(self.heavy_block, x, use_reentrant=False)  # heavy_block's activations are recomputed during backward
    return self.head(x)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Trades ~30% extra training time for drastically reduced VRAM — enables training models that would otherwise OOM.</div>

4. `torch.nn.functional.one_hot()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Converts integer class indices to One-hot Encoding (独热编码) tensors. </div>

labels = torch.tensor([0, 2, 1, 3])
one_hot = F.one_hot(labels, num_classes=4).float()
# tensor([[1,0,0,0],[0,0,1,0],[0,1,0,0],[0,0,0,1]])

5. `torch.nn.functional.cosine_similarity()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Computes Cosine Similarity (余弦相似度) between two groups of vectors. Core metric in Contrastive Learning (对比学习) — SimCLR, CLIP. </div>

a = torch.randn(8, 128)
b = torch.randn(8, 128)
sim = F.cosine_similarity(a, b, dim=-1)  # shape [8], range [-1, 1]

# Similarity matrix for contrastive learning (L2-normalize, then dot product)
mat = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).T  # [8, 8]

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Equivalent to (a/||a||) · (b/||b||). In contrastive learning, L2-normalize first then use dot product.</div>
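The equivalence stated in the note can be verified with plain Python. This is a small sketch with hand-picked 2D vectors; the helper names are illustrative.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def l2_normalize(u):
    norm = math.sqrt(dot(u, u))
    return [x / norm for x in u]

def cosine_similarity(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

a = [3.0, 4.0]
b = [4.0, 3.0]
direct = cosine_similarity(a, b)                         # 24 / (5 * 5) = 0.96
via_normalized = dot(l2_normalize(a), l2_normalize(b))   # same value after L2 normalization
raw_dot = dot(a, b)                                      # 24.0: not a cosine without normalization
```

The raw dot product grows with vector length, so a similarity matrix built from unnormalized embeddings mixes direction with magnitude.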

6. `nn.SyncBatchNorm.convert_sync_batchnorm()`

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Replaces all BatchNorm layers with cross-GPU Synchronized BatchNorm (跨GPU同步批归一化). Essential for DDP training with BN. </div>

model = MyModel()
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)  # before DDP wrap
model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Must convert before wrapping with DDP</span>. Without SyncBN, each GPU computes its own BN statistics — inaccurate.</div> <div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br>Performance stack: <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">pin_memory + non_blocking</code> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">AMP</code> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">torch.compile</code> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">gradient_checkpoint</code> (if OOM).</div>

<div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:20px 24px;margin-top:32px"> <span style="color:#3B5BDB;font-weight:700;font-size:16px">🎯 Master Summary — 120 APIs in 6 Core Concepts</span><br><br> <strong>1. Tensor Ops</strong>: Create (<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">tensor/zeros/rand</code>) → Shape (<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">reshape/permute/cat</code>) → Math (<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">matmul/clamp/topk</code>)<br> <strong>2. Autograd</strong>: <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">requires_grad</code> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">backward()</code> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">no_grad / detach</code><br> <strong>3. Networks</strong>: <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">nn.Module</code> → Layers (<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Linear/Conv2d/LSTM/Attention</code>) → Norm + Dropout<br> <strong>4. Training</strong>: Loss → Optimizer (<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">AdamW</code>) → Scheduler → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">clip_grad_norm_</code><br> <strong>5. Data</strong>: <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Dataset</code> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">transforms</code> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">DataLoader</code> → pretrained models<br> <strong>6. 
Deploy</strong>: <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">state_dict</code> → TorchScript / ONNX → AMP / DDP / <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">torch.compile</code> </div>

Chunked Prefill

Tue, 10 Mar 2026 00:00:00 GMT

I. Chunked Prefill (分块预填充)

1. Background

In Large Language Model (大语言模型) inference, processing a request has two phases:

Prefill (预填充): The model ==processes all input prompt tokens in parallel, building the KV Cache== (键值缓存). This is compute-bound (计算密集型).
Decode (解码): The ==model generates one output token per iteration==, autoregressively. This is memory-bandwidth-bound (内存带宽密集型).

==Without chunked prefill, a single long-prompt prefill request can monopolize the GPU for many milliseconds, blocking decode iterations for other requests== and inflating the Time to First Token (TTFT, 首个令牌时间) and Inter-Token Latency (ITL, 令牌间延迟) for concurrent users.

==Chunked prefill addresses this: it improves GPU utilization, reduces decode latency, increases throughput, and ensures fair scheduling across requests.==

2. What Is Chunked Prefill?

==Chunked Prefill splits a long prefill sequence into fixed-size pieces called chunks. In the same GPU iteration, it processes decode tokens from some requests together with prefill chunks from other requests.==

Key Insight: Instead of "finish all prefill, then decode," we interleave them so the GPU is always doing useful, mixed work.

1) Core Idea

$$ \text{Iteration}_t = \underbrace{\text{Prefill Chunk}_k}_{\text{partial prompt}} + \underbrace{\text{Decode Tokens}_{r_1, r_2, \ldots}}_{\text{running requests}} $$

Each iteration processes:

A chunk of C tokens from one (or more) prefilling requests.
All current decode tokens from already-started requests.

2) Chunk Size (块大小)

chunk_size (e.g., 512 or 1024 tokens) is a tunable hyperparameter (超参数):

Chunk size	Effect
Smaller	Lower TTFT jitter, better fairness, more scheduling overhead
Larger	Higher GPU utilization, less overhead, longer decode stalls

3. Algorithm

1) Scheduler Logic (调度器逻辑)

# Pseudocode — runs once per GPU iteration
def schedule_iteration(prefill_queue, decode_queue, chunk_size):
    batch = []

    # Step 1: Add decode tokens for all running requests
    for req in decode_queue:
        batch.append(DecodeToken(req))          # 1 token per running request

    # Step 2: Fill remaining compute budget with one prefill chunk
    if prefill_queue:
        req = prefill_queue[0]
        start = req.processed_tokens            # where we left off
        end   = min(start + chunk_size, len(req.prompt))
        batch.append(PrefillChunk(req, start, end))
        req.processed_tokens = end
        if req.processed_tokens == len(req.prompt):
            prefill_queue.pop(0)                # prefill done
            decode_queue.append(req)            # request starts decoding next iteration

    return batch

2) KV Cache Construction (键值缓存构建)

Because prefill is chunked, the KV cache is filled incrementally:

$$ \text{KV}[0:n] = \text{KV}[0:C] \;\|\; \text{KV}[C:2C] \;\|\; \cdots \;\|\; \text{KV}[\lfloor n/C \rfloor \cdot C : n] $$

Each chunk appends its computed key-value pairs to the existing cache. This is correct because attention (注意力机制) only requires the cache to be causally complete — past tokens never need updating.
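The correctness claim can be checked numerically. The sketch below is a toy under simplifying assumptions: a single attention head, a single layer, and per-token query/key/value vectors drawn at random instead of coming from a model. The function `causal_attention` and all variable names are illustrative.

```python
import numpy as np

def causal_attention(q, k, v, q_offset):
    """Softmax attention where query i (absolute position q_offset + i) sees keys 0..q_offset+i."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    q_pos = q_offset + np.arange(q.shape[0])[:, None]
    k_pos = np.arange(k.shape[0])[None, :]
    scores = np.where(k_pos > q_pos, -np.inf, scores)     # hide future keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d, C = 10, 8, 4                       # prompt length, head dim, chunk size
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

full = causal_attention(q, k, v, q_offset=0)              # unchunked prefill

k_cache = np.empty((0, d))
v_cache = np.empty((0, d))
outputs = []
for start in range(0, n, C):
    end = min(start + C, n)
    k_cache = np.concatenate([k_cache, k[start:end]])     # append this chunk's keys
    v_cache = np.concatenate([v_cache, v[start:end]])     # append this chunk's values
    outputs.append(causal_attention(q[start:end], k_cache, v_cache, q_offset=start))
chunked = np.concatenate(outputs)

max_diff = np.abs(full - chunked).max()                   # floating-point noise only
```

Each chunk's queries attend to the cache built so far plus the chunk's own keys, and earlier cache entries are never rewritten, so the chunked result matches the single-pass result up to rounding.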

4. Runnable Example

The following standalone example simulates chunked prefill scheduling:

# chunked_prefill_demo.py
# Simulates chunked prefill scheduling on a toy model.
# Requires: pip install numpy

import numpy as np

CHUNK_SIZE = 4          # tokens processed per prefill chunk per iteration
VOCAB = 32              # toy vocabulary size
D_MODEL = 8             # tiny embedding dimension

class Request:
    """A single inference request (推理请求)."""
    def __init__(self, req_id: str, prompt_tokens: list[int]):
        self.req_id = req_id
        self.prompt = prompt_tokens
        self.processed = 0          # how many prompt tokens have been prefilled
        self.kv_cache: list = []    # accumulated KV pairs (simulated as ints)
        self.output_tokens: list[int] = []
        self.is_decoding = False

    def is_prefill_done(self) -> bool:
        return self.processed >= len(self.prompt)

def fake_attention(tokens: list[int], kv_cache: list) -> list:
    """Simulates attention output — returns dummy KV entries."""
    return [t * 2 for t in tokens]   # fake KV = token_id * 2

def fake_decode(kv_cache: list, rng: np.random.Generator) -> int:
    """Samples a token given the completed KV cache."""
    return int(rng.integers(0, VOCAB))

def run_chunked_prefill(requests: list[Request], max_new_tokens: int = 5):
    rng = np.random.default_rng(42)
    prefill_queue = list(requests)
    decode_queue: list[Request] = []
    iteration = 0

    while prefill_queue or decode_queue:
        iteration += 1
        print(f"\n--- Iteration {iteration} ---")

        # ── Decode step (解码步骤): one token per running request ──────────
        for req in list(decode_queue):
            tok = fake_decode(req.kv_cache, rng)
            req.output_tokens.append(tok)
            print(f"  [Decode] {req.req_id}: generated token {tok}")
            if len(req.output_tokens) >= max_new_tokens:
                decode_queue.remove(req)
                print(f"  [Done]   {req.req_id} finished.")

        # ── Prefill chunk (预填充分块): one chunk from the head of the queue ─
        if prefill_queue:
            req = prefill_queue[0]
            start = req.processed
            end   = min(start + CHUNK_SIZE, len(req.prompt))
            chunk = req.prompt[start:end]
            kv    = fake_attention(chunk, req.kv_cache)
            req.kv_cache.extend(kv)
            req.processed = end
            print(f"  [Prefill] {req.req_id}: tokens [{start}:{end}] → KV len={len(req.kv_cache)}")

            if req.is_prefill_done():
                prefill_queue.pop(0)
                decode_queue.append(req)
                print(f"  [Ready]  {req.req_id}: prefill complete → decode queue")

if __name__ == "__main__":
    reqs = [
        Request("R1", list(range(10))),   # 10-token prompt → needs 3 chunks
        Request("R2", list(range(6))),    # 6-token prompt  → needs 2 chunks
    ]
    run_chunked_prefill(reqs, max_new_tokens=3)

Expected output (abridged):

--- Iteration 1 ---
  [Prefill] R1: tokens [0:4] → KV len=4

--- Iteration 2 ---
  [Prefill] R1: tokens [4:8] → KV len=8

--- Iteration 3 ---
  [Prefill] R1: tokens [8:10] → KV len=10
  [Ready]  R1: prefill complete → decode queue

--- Iteration 4 ---
  [Decode] R1: generated token 17
  [Prefill] R2: tokens [0:4] → KV len=4
...

5. Benefits and Trade-offs (优缺点)

Aspect	Without Chunked Prefill	With Chunked Prefill
TTFT (首个令牌时间)	Unpredictable, spikes on long prompts	More predictable, bounded by chunk size
Decode stalls (解码停顿)	Long stalls during large prefills	Bounded by one chunk: decode runs every iteration
GPU utilization (GPU利用率)	Suboptimal during pure decode	Higher — compute + memory-BW co-utilized
Implementation complexity	Simple	Moderate (state tracking per request)

6. Key Formula — Prefill Iterations per Request

$$ N_{\text{iters}} = \left\lceil \frac{L}{C} \right\rceil $$

Where $L$ is the prompt length (提示长度) and $C$ is the chunk size (块大小). The request enters the decode queue after $N_{\text{iters}}$ prefill iterations.
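As a quick check of the formula, the sketch below lists the token ranges handled in each prefill iteration and confirms that their count equals the ceiling expression. The helper `prefill_chunks` is a hypothetical name, not part of any serving framework.

```python
import math

def prefill_chunks(prompt_len, chunk_size):
    """Return the [start, end) token ranges processed in successive prefill iterations."""
    return [(start, min(start + chunk_size, prompt_len))
            for start in range(0, prompt_len, chunk_size)]

chunks = prefill_chunks(prompt_len=10, chunk_size=4)
print(chunks)                             # [(0, 4), (4, 8), (8, 10)]
print(len(chunks) == math.ceil(10 / 4))   # True
```

The last range is shorter whenever the prompt length is not a multiple of the chunk size.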

7. Related Concepts (相关概念)

PagedAttention (分页注意力): manages KV cache memory in pages; works naturally with chunked prefill.
Continuous Batching (连续批处理): re-forms the batch at every iteration; chunked prefill is one scheduling strategy built on top of it.
Speculative Decoding (推测解码): an orthogonal technique for speeding up the decode phase.
TTFT / ITL / TBT (Time Between Tokens): latency metrics affected by chunked prefill scheduling.

Offline Inference

Tue, 10 Mar 2026 00:00:00 GMT

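1. Async LLM Streaming: asynchronous streaming inference. Uses asyncio to emit results token by token without blocking; suited to web services that return responses in real time rather than waiting for the whole sequence to finish.

2. Audio Language: audio + language multimodal inference. Feeds audio input (speech, sounds) to the model together with text, supporting scenarios such as spoken question answering.

3. Automatic Prefix Caching (APC): automatic prefix caching. Reuses the already-computed KV Cache for requests that share a prefix, avoiding recomputation of the system prompt and significantly lowering first-token latency.

4. Batch LLM Inference: offline batch inference. Processes a large number of requests at once to maximize GPU throughput; suited to non-real-time work such as data processing and evaluation.

5. Chat With Tools: tool-calling inference. The model calls external functions/APIs (Function Calling) during a conversation, extending it with capabilities such as search and computation.

6. Context Extension: context window extension. Uses techniques such as RoPE scaling to let the model handle inputs longer than its training length, supporting long-document scenarios.

7. Data Parallel: data-parallel inference. Multiple GPUs/processes each hold a full model replica and handle different requests in parallel, scaling throughput horizontally.

8. Disaggregated Prefill V1: the V1 version of disaggregated prefill. Assigns the compute-intensive prefill phase and the memory-intensive decode phase to different instances so that each can optimize its own resource usage.

9. Disaggregated Prefill: the basic version of disaggregated prefill. Same concept as the V1 example above; this is the earlier, basic implementation, with differences in architecture.

10. Encoder Decoder Multimodal: encoder-decoder multimodal inference. Uses an encoder-decoder architecture (T5- or Whisper-style) for multimodal tasks, as opposed to a decoder-only architecture.

11. Extract Hidden States: hidden-state extraction. Retrieves vector representations from the model's intermediate layers for downstream tasks such as embedding, classification, and RAG retrieval.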

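12. KV Load Failure Recovery Test: a recovery test for KV Cache load failures. Verifies that when a KV Cache read fails, the system degrades gracefully and recomputes automatically, keeping the service stable.

13. LLM Engine Example: a basic LLM engine example. Uses vLLM's low-level LLMEngine API directly to show the most basic flow of submitting requests and retrieving results.

14. LLM Engine Reset KV: resetting the KV Cache while the engine is running. Shows how to clear the cache without restarting the service, for state management and memory reclamation.

15. Load Sharded State: loading sharded model weights. Loads a large model that has been split by layer/tensor into multiple file shards, addressing oversized single files or insufficient VRAM.

16. Custom Logits Processors: custom logits processors. Apply custom modifications to the model's raw output scores before sampling, enabling controls such as keyword blocking and format constraints.

17. LoRA With Quantization Inference: combined LoRA + quantization inference. Stacks LoRA adapters on a quantized (INT4/INT8) base model, balancing VRAM savings with customization.

18. Metrics: inference performance monitoring. Exposes metrics such as throughput, latency, and TTFT (time to first token), and integrates with Prometheus/Grafana for production monitoring.

19. Mistral-Small: a Mistral Small inference example. Shows how to load and run small-parameter models from the Mistral family; suited to resource-constrained scenarios.

20. MLPSpeculator: an MLP speculative decoder. A lightweight MLP network predicts several upcoming tokens, which the main model then verifies in parallel, speeding up generation by 2 to 3 times.

21. MultiLoRA Inference: concurrent multi-LoRA inference. A single vLLM instance loads multiple LoRA adapters at once and switches between them per request, serving multi-tenant scenarios.

22. New Weight Syncing: a new weight-synchronization mechanism. In scenarios where training and inference work together (such as online RLHF), hot-updates the inference engine with the latest trained weights without a restart.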

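23. Offline Inference with OpenAI Batch File Format: offline inference compatible with the OpenAI Batch API format. Reads batch request files in .jsonl format and writes results in an OpenAI-compatible format, easing migration.

24. Pause Resume: pausing and resuming inference. Shows how to interrupt an in-progress inference task and resume it later, for resource scheduling and priority preemption.

25. Prefix Caching: a manual prefix caching example. The counterpart to APC; shows how to control caching behavior explicitly, for more flexible management of the caching strategy for shared prefixes.

26. Prompt Embed Inference: inference from directly supplied embeddings. Skips the tokenizer and passes precomputed vectors as model input, for special embedding-injection scenarios.

27. Qwen2.5-Omni Offline Inference: a Qwen2.5-Omni offline inference example. Shows local batch inference with Alibaba's Qwen omni-modal model (text + image + audio + video).

28. Qwen3 Omni: a Qwen3 omni-modal inference example. The Qwen3 version of multimodal inference, with stronger modality understanding and longer context.

29. Qwen 1M: Qwen million-token-context inference. Shows how to run Qwen models that support ultra-long contexts of 1 million tokens, for very long documents.

30. Reproducibility: reproducible inference results. Fixes random seeds, sampling parameters, and similar settings so that the same input yields the same output every time, for testing and debugging.

31. RLHF: a Reinforcement Learning from Human Feedback training example. Shows the basics of using vLLM as the rollout engine in RLHF training pipelines such as PPO.

32. RLHF Colocate: RLHF with co-located training and inference. Runs the trainer and the inference engine alternately on the same set of GPUs, saving cross-machine communication overhead and hardware cost.

33. RLHF Online Quant: RLHF with online quantization. Quantizes the model dynamically during training before running inference, reducing VRAM use in the rollout phase and improving training efficiency.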

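34. RLHF Utils: an RLHF utility library. Provides general-purpose tools reused across the RLHF pipeline, such as reward computation, data format conversion, and sampling strategies.

35. Run One Batch: a single-batch run example. The most minimal batch inference demo, for quickly checking that the environment is configured and the model loads correctly.

36. Save Sharded State: saving sharded model weights. Exports the running model's weights in sharded format; used together with Load Sharded State to speed up the next startup.

37. Simple Profiling: simple performance profiling. Uses tools such as PyTorch Profiler to record the time spent in each inference stage and locate bottlenecks.

38. Skip Loading Weights In Engine Init: skipping weight loading at initialization. Skips the time-consuming weight-loading step when testing or debugging engine logic, speeding up startup.

39. Spec Decode: a general speculative decoding example. A draft model generates candidate tokens and the main model verifies them in parallel, significantly raising generation throughput; suited to latency-sensitive scenarios.

40. Structured Outputs: structured output constraints. Forces the model's output to conform to a JSON Schema or regular expression, so that downstream programs can parse it reliably.

41. Torchrun DP Example: a torchrun data-parallel example. Launches multi-process data-parallel inference with torchrun, showing the standard PyTorch distributed launch method.

42. Torchrun Example: a basic torchrun example. Shows the most basic distributed configuration for launching vLLM with torchrun, not limited to data parallelism.

43. Vision Language: single-image vision-language inference. Feeds one image to the model together with text, supporting VQA tasks such as image question answering and image captioning.

44. Vision Language Multi Image: multi-image vision-language inference. Supports passing multiple images in a single conversation, for more complex visual tasks such as image comparison and multi-frame understanding.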

Continuous Batching

Mon, 09 Mar 2026 00:00:00 GMT

I. Continuous Batching (连续批处理)

1. Background

Traditional LLM serving uses static batching (静态批处理): a fixed group of requests is loaded together, the GPU runs until every request in the batch finishes, then the next batch starts. Two problems arise:

Padding waste (填充浪费): short requests must be padded to match the longest request in the batch, wasting compute.
Head-of-line blocking (队头阻塞): new requests wait for the entire current batch to complete, even if most slots are already idle.

Continuous Batching (also called iteration-level scheduling, 迭代级调度) solves both by scheduling at the granularity of a single iteration rather than a whole batch.
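The padding waste described above is easy to quantify. The sketch below computes the fraction of token slots in a padded static batch that hold padding; the lengths are hypothetical and `padding_waste` is an illustrative helper name.

```python
def padding_waste(lengths):
    """Fraction of token slots in a padded static batch that hold padding."""
    padded_total = max(lengths) * len(lengths)   # every request padded to the longest one
    return (padded_total - sum(lengths)) / padded_total

lengths = [10, 8, 12, 6, 9]        # hypothetical per-request sequence lengths
waste = padding_waste(lengths)     # (60 - 45) / 60
print(waste)  # 0.25
```

Here a quarter of the batch's token slots are padding, and the share grows as the lengths within a batch become more uneven.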

2. Core Idea

As soon as one request finishes, its GPU slot is freed and a new request is admitted in the very next iteration.

The scheduler runs once per forward pass (前向传播). It looks at:

Which running requests still need decode steps.
Whether any free capacity exists to admit a new prefill request.

This means the batch composition changes every iteration, hence "continuous."

3. Algorithm

1) Iteration-Level Scheduler (迭代级调度器)

# continuous_batching_demo.py
# Standalone simulation of continuous batching.
# No external dependencies required.

from collections import deque

class Request:
    """One inference request (推理请求)."""
    def __init__(self, req_id: str, prompt_len: int, max_new_tokens: int):
        self.req_id = req_id
        self.prompt_len = prompt_len
        self.max_new_tokens = max_new_tokens
        self.generated = 0          # tokens generated so far
        self.prefill_done = False

    def is_done(self) -> bool:
        return self.prefill_done and self.generated >= self.max_new_tokens

    def step(self):
        """Advance one iteration: the first call does prefill, each later call decodes one token (解码步骤)."""
        if not self.prefill_done:
            self.prefill_done = True    # single-step prefill (simplified)
        else:
            self.generated += 1

def run_continuous_batching(
    all_requests: list[Request],
    max_batch_size: int = 3,
    max_iterations: int = 20,
):
    waiting_queue = deque(all_requests)   # requests not yet admitted
    running: list[Request] = []           # currently active requests
    finished: list[Request] = []

    for iteration in range(1, max_iterations + 1):
        # ── 1. Evict finished requests (移除完成的请求) ──────────────────
        done = [r for r in running if r.is_done()]
        for r in done:
            running.remove(r)
            finished.append(r)
            print(f"  [Done]    {r.req_id} finished at iteration {iteration}")

        # ── 2. Admit new requests to fill freed slots (补充新请求) ────────
        while waiting_queue and len(running) < max_batch_size:
            new_req = waiting_queue.popleft()
            running.append(new_req)
            print(f"  [Admit]   {new_req.req_id} admitted at iteration {iteration}")

        if not running:
            print(f"Iteration {iteration}: all done.")
            break

        # ── 3. Step every running request (执行一步) ──────────────────────
        print(f"\n--- Iteration {iteration} | batch={[r.req_id for r in running]} ---")
        for r in running:
            r.step()

    print("\n=== Summary ===")
    for r in finished:
        print(f"  {r.req_id}: prompt={r.prompt_len}, generated={r.generated}")

if __name__ == "__main__":
    requests = [
        Request("R1", prompt_len=10, max_new_tokens=2),
        Request("R2", prompt_len=8,  max_new_tokens=5),
        Request("R3", prompt_len=12, max_new_tokens=3),
        Request("R4", prompt_len=6,  max_new_tokens=2),
        Request("R5", prompt_len=9,  max_new_tokens=4),
    ]
    run_continuous_batching(requests, max_batch_size=3)

Expected output (abridged):

  [Admit]   R1 admitted at iteration 1
  ...
--- Iteration 1 | batch=['R1', 'R2', 'R3'] ---
...
--- Iteration 3 | batch=['R1', 'R2', 'R3'] ---
  [Done]    R1 finished at iteration 4
  [Admit]   R4 admitted at iteration 4    ← slot freed, new request in immediately
--- Iteration 4 | batch=['R2', 'R3', 'R4'] ---
...
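
For comparison, the same five requests can be pushed through a static-batching baseline. The sketch below is a simplified model, not vLLM's scheduler: it assumes every request needs one prefill step plus `max_new_tokens` decode steps, and the helper names `static_iterations` and `continuous_iterations` are made up for this illustration.

```python
from collections import deque

def static_iterations(steps, batch_size):
    """Static batching: each batch runs until its longest request finishes."""
    total = 0
    for i in range(0, len(steps), batch_size):
        total += max(steps[i:i + batch_size])
    return total

def continuous_iterations(steps, batch_size):
    """Continuous batching: freed slots are refilled before the next iteration."""
    waiting = deque(steps)
    running = []          # remaining steps of each active request
    iterations = 0
    while waiting or running:
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        running = [remaining - 1 for remaining in running]   # one forward pass
        running = [remaining for remaining in running if remaining > 0]
        iterations += 1
    return iterations

# Steps per request = 1 prefill + max_new_tokens (R1..R5 from the demo above).
steps = [1 + 2, 1 + 5, 1 + 3, 1 + 2, 1 + 4]
print(static_iterations(steps, batch_size=3))      # 11
print(continuous_iterations(steps, batch_size=3))  # 9
```

Under these assumptions the static baseline needs 11 forward passes and the continuous scheduler 9, matching the nine stepped iterations of the simulation above.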

2) Key Invariant (关键不变式)

At every iteration $t$:

$$ |\text{running}_t| \leq B_{\max} $$

where $B_{\max}$ is the maximum batch size (最大批大小), constrained by GPU memory (显存) available for KV Cache (键值缓存).
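
A rough way to see how KV cache memory bounds $B_{\max}$ is to count bytes per token. The sketch below assumes a standard transformer that stores one key and one value vector per layer and per KV head. The model shape (32 layers, 32 KV heads, head dimension 128, fp16) and the 16 GiB budget are illustrative assumptions, not figures from any specific deployment.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    # Factor 2: one key vector plus one value vector per layer.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

def max_batch_size(kv_budget_bytes, max_seq_len, per_token_bytes):
    # Worst case: every running request grows to max_seq_len tokens.
    return kv_budget_bytes // (max_seq_len * per_token_bytes)

per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=32, head_dim=128)
b_max = max_batch_size(
    kv_budget_bytes=16 * 1024**3,   # hypothetical 16 GiB reserved for KV cache
    max_seq_len=2048,
    per_token_bytes=per_token,
)
print(per_token)  # 524288 bytes, i.e. 0.5 MiB per token
print(b_max)      # 16
```

This is a worst-case bound. A paged allocator only reserves memory for tokens that actually exist, so the number of requests that fit in practice can be higher.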

4. No Padding Needed

In static batching, all sequences in a batch are packed into one rectangular tensor, so shorter sequences must be filled up to the longest length with padding tokens (填充令牌):

Static:  [A A A A _ _]   ← _ = wasted padding
         [B B _ _ _ _]
         [C C C C C C]

In continuous batching with PagedAttention (分页注意力), each request owns its own KV cache pages. The forward pass uses variable-length attention — no padding:

Continuous:  [A A A A]   ← exact length, no padding
             [B B]
             [C C C C C C]
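
The cost of padding in the static layout can be counted directly. This small sketch uses the sequence lengths from the diagrams (A=4, B=2, C=6); the variable names are only for illustration.

```python
lengths = {"A": 4, "B": 2, "C": 6}

padded_slots = len(lengths) * max(lengths.values())  # rectangular tensor: 3 x 6
real_tokens = sum(lengths.values())                  # tokens that carry data
wasted = padded_slots - real_tokens
waste_fraction = wasted / padded_slots

print(padded_slots)             # 18
print(real_tokens)              # 12
print(f"{waste_fraction:.0%}")  # 33%
```

With variable-length attention only the 12 real tokens are processed, so a third of the static batch's token slots are saved in this example.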

5. Comparison Table

Property	Static Batching (静态批处理)	Continuous Batching (连续批处理)
Scheduling granularity (调度粒度)	Per-batch	Per-iteration
Padding (填充)	Required	Not needed
GPU idle time (GPU空闲时间)	High (slots wait for stragglers)	Low (freed slots are refilled each iteration)
Latency for new requests (新请求延迟)	Full batch wait	At most one iteration once a slot is free
Implementation complexity (实现复杂度)	Simple	Moderate
Typical throughput gain (吞吐量提升)	Baseline	Roughly 2–5× higher (workload-dependent)

6. Key Metrics (关键指标)

$$ \text{Throughput (吞吐量)} = \frac{\text{Total output tokens}}{\text{Wall-clock time}} $$

$$ \text{TTFT (首个令牌时间)} = t_{\text{first token}} - t_{\text{request arrival}} $$

$$ \text{ITL (令牌间延迟)} = \frac{t_{\text{last token}} - t_{\text{first token}}}{\text{output tokens} - 1} $$

Continuous batching primarily improves throughput and reduces queuing latency (排队延迟) for newly arriving requests.
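
The three formulas can be evaluated from per-token timestamps. The sketch below assumes a single request whose arrival time and token emission times (in seconds) are already recorded; the function names and the sample timestamps are invented for illustration.

```python
def ttft(arrival, token_times):
    """Time to first token: first emission minus request arrival."""
    return token_times[0] - arrival

def itl(token_times):
    """Mean inter-token latency over the decode phase."""
    n = len(token_times)
    if n < 2:
        return 0.0   # undefined for a single token; 0.0 chosen for this sketch
    return (token_times[-1] - token_times[0]) / (n - 1)

def throughput(total_output_tokens, wall_clock_seconds):
    return total_output_tokens / wall_clock_seconds

arrival = 0.75
token_times = [1.0, 1.25, 1.5, 1.75, 2.0]   # five output tokens

print(ttft(arrival, token_times))                               # 0.25
print(itl(token_times))                                         # 0.25
print(throughput(len(token_times), token_times[-1] - arrival))  # 4.0
```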

7. Related Concepts (相关概念)

Chunked Prefill (分块预填充): splits long prefills into chunks so they can be interleaved with decode steps; a scheduling strategy that sits on top of continuous batching.
PagedAttention (分页注意力): stores each request's KV cache in fixed-size pages, allowing variable-length KV cache per request; this is what makes continuous batching memory-efficient.
Preemption (抢占): when the KV cache is full, the scheduler may evict a low-priority request and recompute it later.
vLLM: the open-source serving framework that popularized continuous batching combined with PagedAttention.

Python Decorator

Mon, 09 Mar 2026 00:00:00 GMT

I. Decorator Pattern In Python

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">Decorator (装饰器)</span> is a design pattern in Python that allows you to <span style="color:#E8600A;font-weight:700">wrap (包裹)</span> a function or class with additional behavior — without modifying its source code. Decorators are built on the concept that <span style="color:#2980B9">functions are first-class objects (函数是一等对象)</span>, meaning they can be passed as arguments, returned from other functions, and assigned to variables. </div>

1. Why Do We Need Decorators?

<span style="color:#E8600A">1.</span> Code Reuse (代码复用): Apply the same cross-cutting logic (logging, timing, auth) to many functions without copy-pasting.

<span style="color:#E8600A">2.</span> Separation of Concerns (关注点分离): Keep business logic clean; push auxiliary logic into decorators.

<span style="color:#E8600A">3.</span> Readability (可读性): The @decorator syntax makes intent explicit at a glance.

2. Prerequisites — Functions as First-Class Objects

Before understanding decorators, you must understand three building blocks.

1) Functions Can Be Assigned to Variables

def greet(name):
    return f"Hello, {name}!"

say_hello = greet          # Assign function to a variable (赋值给变量)
print(say_hello("Alice"))  # Output: Hello, Alice!

2) Functions Can Be Passed as Arguments

def apply(func, value):
    return func(value)     # Call the passed-in function (调用传入的函数)

result = apply(greet, "Bob")
print(result)  # Output: Hello, Bob!

3) Functions Can Be Returned from Functions (Closures 闭包)

def make_multiplier(factor):
    def multiplier(x):
        return x * factor  # Captures `factor` from enclosing scope (捕获外部变量)
    return multiplier      # Return the inner function (返回内部函数)

double = make_multiplier(2)
print(double(5))   # Output: 10
print(double(9))   # Output: 18

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> A <span style="color:#E8600A;font-weight:700">Closure (闭包)</span> is an inner function that "remembers" variables from its enclosing scope even after the outer function has returned. This is the foundation of every decorator.</div>
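
The captured variable can be inspected directly, which makes the "remembering" concrete. This sketch reuses `make_multiplier` from above and reads CPython's `__closure__` and `__code__.co_freevars` attributes; `triple` is an extra name added for the example.

```python
def make_multiplier(factor):
    def multiplier(x):
        return x * factor
    return multiplier

double = make_multiplier(2)
triple = make_multiplier(3)

# Each returned function carries its own cell holding `factor`.
print(double.__code__.co_freevars)          # ('factor',)
print(double.__closure__[0].cell_contents)  # 2
print(triple.__closure__[0].cell_contents)  # 3
```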

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A decorator is essentially a <span style="color:#E8600A;font-weight:700">Higher-Order Function (高阶函数)</span> — it takes a function as input, wraps it in a new function that adds behavior, and returns the new function. The <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">@syntax</code> is just syntactic sugar (语法糖) for <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">func = decorator(func)</code>. </div>

1. The Simplest Decorator

1) Manual Style (Without @ Syntax)

def my_decorator(func):
    def wrapper():
        print("--- Before function call ---")  # Pre-logic (前置逻辑)
        func()                                  # Call original function (调用原函数)
        print("--- After function call ---")   # Post-logic (后置逻辑)
    return wrapper

def say_hi():
    print("Hi!")

# Manually wrap (手动包裹)
say_hi = my_decorator(say_hi)
say_hi()

Output:

--- Before function call ---
Hi!
--- After function call ---

2) Using @ Syntax (语法糖)

def my_decorator(func):
    def wrapper():
        print("--- Before function call ---")
        func()
        print("--- After function call ---")
    return wrapper

@my_decorator          # Equivalent to: say_hi = my_decorator(say_hi)
def say_hi():
    print("Hi!")

say_hi()

2. Decorating Functions with Arguments

The <span style="color:#E8600A;font-weight:700">wrapper</span> must accept and forward any arguments the original function takes, using <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">*args</code> and <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">**kwargs</code>.

1) Problem Without *args / **kwargs

def my_decorator(func):
    def wrapper():         # ← No parameters! Will crash if func takes arguments.
        func()
    return wrapper

@my_decorator
def add(a, b):
    return a + b

add(1, 2)  # ❌ TypeError: wrapper() takes 0 positional arguments but 2 were given

2) Correct: Use *args and **kwargs

def my_decorator(func):
    def wrapper(*args, **kwargs):           # Accept any arguments (接受任意参数)
        print(f"Calling: {func.__name__}")
        result = func(*args, **kwargs)      # Forward to original function (转发给原函数)
        print(f"Result: {result}")
        return result                       # Don't forget to return! (记得返回结果)
    return wrapper

@my_decorator
def add(a, b):
    return a + b

@my_decorator
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

add(3, 4)
greet("Alice", greeting="Hi")

Output:

Calling: add
Result: 7
Calling: greet
Result: Hi, Alice!

3. Preserving Function Metadata with functools.wraps

1) The Problem — Metadata Loss (元数据丢失)

def my_decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@my_decorator
def add(a, b):
    """Adds two numbers."""  # Docstring (文档字符串)
    return a + b

print(add.__name__)   # Output: wrapper  ← WRONG! Should be "add"
print(add.__doc__)    # Output: None     ← WRONG! Lost the docstring

2) The Fix — @functools.wraps

import functools

def my_decorator(func):
    @functools.wraps(func)           # Copies metadata from func to wrapper (复制元数据)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@my_decorator
def add(a, b):
    """Adds two numbers."""
    return a + b

print(add.__name__)   # Output: add      ✅
print(add.__doc__)    # Output: Adds two numbers.  ✅

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Always use @functools.wraps(func) inside every decorator you write.</span> Without it, debugging tools, documentation generators, and frameworks that inspect <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__name__</code> / <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__doc__</code> will silently receive wrong information.</div>
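
Besides copying the name and docstring, `functools.wraps` also stores the original function on the wrapper as `__wrapped__`, and `inspect.signature` follows that link. A minimal sketch, reusing the `my_decorator` and `add` names from above:

```python
import functools
import inspect

def my_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@my_decorator
def add(a, b):
    """Adds two numbers."""
    return a + b

original = add.__wrapped__        # the undecorated function
print(original(2, 3))             # 5
print(inspect.signature(add))     # (a, b)
```

Having `__wrapped__` available is handy in tests, where the undecorated function can be called without the decorator's side effects.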

4. Practical Decorator Examples

1) Timing Decorator (计时装饰器)

import functools
import time

def timer(func):
    """Measures execution time (测量执行时间) of a function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()          # High-resolution timer (高精度计时器)
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"[TIMER] {func.__name__} took {elapsed:.6f}s")
        return result
    return wrapper

@timer
def slow_sum(n):
    """Sum of the integers 0..n-1."""
    return sum(range(n))

slow_sum(10_000_000)
# Example output (timing varies by machine): [TIMER] slow_sum took 0.412381s

2) Logging Decorator (日志装饰器)

import functools

def logger(func):
    """Logs function calls and their arguments (记录函数调用和参数)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args_repr = [repr(a) for a in args]
        kwargs_repr = [f"{k}={v!r}" for k, v in kwargs.items()]
        signature = ", ".join(args_repr + kwargs_repr)
        print(f"[LOG] Calling {func.__name__}({signature})")
        result = func(*args, **kwargs)
        print(f"[LOG] {func.__name__} returned {result!r}")
        return result
    return wrapper

@logger
def divide(a, b):
    return a / b

divide(10, 2)
divide(7, b=3)

Output:

[LOG] Calling divide(10, 2)
[LOG] divide returned 5.0
[LOG] Calling divide(7, b=3)
[LOG] divide returned 2.3333333333333335

3) Retry Decorator (重试装饰器)

import functools
import time

def retry(max_attempts=3, delay=1.0):
    """Retries a function on exception (异常时重试)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    last_exc = e    # Keep the last error so it can be chained below
                    print(f"[RETRY] Attempt {attempt}/{max_attempts} failed: {e}")
                    if attempt < max_attempts:
                        time.sleep(delay)
            raise RuntimeError(
                f"{func.__name__} failed after {max_attempts} attempts."
            ) from last_exc
        return wrapper
    return decorator

@retry(max_attempts=3, delay=0.5)
def unstable_api_call(url):
    import random
    if random.random() < 0.7:   # 70% chance of failure (模拟不稳定网络)
        raise ConnectionError("Network timeout")
    return f"200 OK from {url}"

try:
    print(unstable_api_call("https://example.com"))
except RuntimeError as e:
    print(e)

4) Cache / Memoization Decorator (缓存装饰器)

import functools

def memoize(func):
    """Caches results of expensive function calls (缓存昂贵函数调用的结果)."""
    cache = {}   # Cache storage (缓存存储)
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)   # Compute and store (计算并存储)
        else:
            print(f"[CACHE] Hit for args={args}")
        return cache[args]
    return wrapper

@memoize
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(10))   # 55; the recursive sub-calls already print [CACHE] Hit for args=(1,) ... (8,)
print(fibonacci(10))   # [CACHE] Hit for args=(10,), then 55

5) Access Control Decorator (访问控制装饰器)

import functools

def requires_auth(func):
    """Blocks calls if user is not authenticated (未认证则阻止调用)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        user = kwargs.get("user") or (args[0] if args else None)
        if not getattr(user, "is_authenticated", False):
            raise PermissionError("Access denied: authentication required.")
        return func(*args, **kwargs)
    return wrapper

class User:
    def __init__(self, name, authenticated):
        self.name = name
        self.is_authenticated = authenticated

@requires_auth
def view_dashboard(user):
    return f"Welcome to dashboard, {user.name}!"

alice = User("Alice", authenticated=True)
bob   = User("Bob",   authenticated=False)

print(view_dashboard(alice))   # ✅ Welcome to dashboard, Alice!
try:
    print(view_dashboard(bob)) # ❌ PermissionError
except PermissionError as e:
    print(e)

5. Decorators with Parameters (带参数的装饰器)

A decorator with parameters requires an extra layer of nesting (额外一层嵌套): an outer function receives the parameters and returns the actual decorator.

call site:     @repeat(3)
what happens:  repeat(3)  →  returns decorator
               decorator(func)  →  returns wrapper

1) repeat(n) — Run a Function n Times

import functools

def repeat(n):
    """Runs the decorated function n times (运行被装饰函数 n 次)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None          # Defined even when n == 0
            for _ in range(n):
                result = func(*args, **kwargs)
            return result          # Returns last result (返回最后一次结果)
        return wrapper
    return decorator

@repeat(3)
def say(message):
    print(message)

say("Hello!")
# Output:
# Hello!
# Hello!
# Hello!
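
The two steps from the call-site diagram can also be carried out by hand, without the `@` syntax. The sketch below repeats `repeat` so it runs on its own; `record` and `calls` are hypothetical names used to make the effect observable.

```python
import functools

def repeat(n):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(n):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator

calls = []

def record(message):
    calls.append(message)
    return len(calls)

decorator = repeat(3)        # step 1: the factory returns the actual decorator
record = decorator(record)   # step 2: the decorator returns the wrapper

last = record("Hello!")
print(calls)  # ['Hello!', 'Hello!', 'Hello!']
print(last)   # 3
```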

2) validate_types — Runtime Type Checking (运行时类型检查)

import functools

def validate_types(**type_map):
    """Validates argument types at runtime (运行时验证参数类型)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            import inspect
            sig = inspect.signature(func)
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for param_name, expected_type in type_map.items():
                value = bound.arguments.get(param_name)
                if value is not None and not isinstance(value, expected_type):
                    raise TypeError(
                        f"Argument '{param_name}' expected {expected_type.__name__}, "
                        f"got {type(value).__name__}"
                    )
            return func(*args, **kwargs)
        return wrapper
    return decorator

@validate_types(a=int, b=int)
def add(a, b):
    return a + b

print(add(2, 3))       # ✅ 5
print(add(2.0, 3))     # ❌ TypeError: Argument 'a' expected int, got float

6. Stacking Multiple Decorators (叠加多个装饰器)

You can apply several decorators to one function. They are applied bottom-up (从下到上) at definition time, but execute top-down (从上到下) at call time.

import functools

def decorator_A(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("A: before")
        result = func(*args, **kwargs)
        print("A: after")
        return result
    return wrapper

def decorator_B(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("B: before")
        result = func(*args, **kwargs)
        print("B: after")
        return result
    return wrapper

@decorator_A         # Applied second (第二个应用) → outermost wrapper
@decorator_B         # Applied first  (第一个应用) → innermost wrapper
def my_func():
    print("  → Running my_func")

my_func()

Output:

A: before
B: before
  → Running my_func
B: after
A: after
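
Stacking is just nested function application: `@decorator_A` over `@decorator_B` is equivalent to `my_func = decorator_A(decorator_B(my_func))`. The sketch below checks the resulting call order by writing to a list instead of printing; the `tag` factory and `log` list are illustrative helpers, not part of the example above.

```python
import functools

def tag(label, log):
    """Build a decorator that records before/after markers in `log`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            log.append(f"{label}: before")
            result = func(*args, **kwargs)
            log.append(f"{label}: after")
            return result
        return wrapper
    return decorator

log = []

def my_func():
    log.append("my_func")

# Same as stacking @tag("A", log) on top of @tag("B", log).
my_func = tag("A", log)(tag("B", log)(my_func))
my_func()
print(log)
# ['A: before', 'B: before', 'my_func', 'B: after', 'A: after']
```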

7. Class-Based Decorators (基于类的装饰器)

A class can act as a decorator by implementing <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">__init__</code> (to receive the function) and <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">__call__</code> (to act as the wrapper).

1) Call Counter Decorator

import functools

class CountCalls:
    """Counts how many times a function has been called (统计调用次数)."""

    def __init__(self, func):
        functools.update_wrapper(self, func)  # Equivalent to @functools.wraps
        self.func = func
        self.count = 0                        # Instance variable (实例变量)

    def __call__(self, *args, **kwargs):
        self.count += 1
        print(f"[COUNT] {self.func.__name__} has been called {self.count} time(s)")
        return self.func(*args, **kwargs)

@CountCalls
def say_hello():
    print("Hello!")

say_hello()
say_hello()
say_hello()
print(f"Total calls: {say_hello.count}")

Output:

[COUNT] say_hello has been called 1 time(s)
Hello!
[COUNT] say_hello has been called 2 time(s)
Hello!
[COUNT] say_hello has been called 3 time(s)
Hello!
Total calls: 3

8. Decorating Classes (装饰类)

Decorators can also be applied to entire classes (整个类), typically to add or modify class-level behavior.

1) Singleton Decorator (单例装饰器)

import functools

def singleton(cls):
    """Ensures only one instance of a class is created (确保类只有一个实例)."""
    instances = {}
    @functools.wraps(cls)
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance

@singleton
class DatabaseConnection:
    def __init__(self, host):
        self.host = host
        print(f"Creating connection to {host}")

db1 = DatabaseConnection("localhost")   # Creating connection to localhost
db2 = DatabaseConnection("localhost")   # (No output — returns existing instance)
print(db1 is db2)                       # True ✅

9. Built-in Decorators in Python (Python内置装饰器)

Decorator (装饰器)	Location	Purpose
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">@staticmethod</code>	Built-in	Method that doesn't receive `self` or `cls` (不接收self/cls的方法)
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">@classmethod</code>	Built-in	Method that receives the class as first arg `cls` (接收类作为第一参数)
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">@property</code>	Built-in	Makes a method accessible like an attribute (方法变属性访问)
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">@functools.wraps</code>	functools	Preserves metadata of wrapped function (保留被包裹函数的元数据)
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">@functools.lru_cache</code>	functools	LRU memoization cache (LRU缓存)
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">@functools.cache</code>	functools (3.9+)	Unbounded memoization cache (无界缓存)
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">@dataclasses.dataclass</code>	dataclasses	Auto-generates `__init__`, `__repr__`, etc. (自动生成初始化方法等)
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">@abstractmethod</code>	abc	Marks a method as abstract (标记方法为抽象方法)

1) @property Example

class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        """Getter (读取器)."""
        return self._radius

    @radius.setter
    def radius(self, value):
        """Setter with validation (带验证的写入器)."""
        if value < 0:
            raise ValueError("Radius cannot be negative (半径不能为负)")
        self._radius = value

    @property
    def area(self):
        """Computed property (计算属性) — no setter needed."""
        import math
        return math.pi * self._radius ** 2

c = Circle(5)
print(c.radius)   # 5      — accessed like an attribute, not c.radius()
print(c.area)     # 78.53...
c.radius = 10
print(c.area)     # 314.15...
c.radius = -1     # ❌ ValueError

2) @functools.lru_cache Example

import functools

@functools.lru_cache(maxsize=128)   # Cache up to 128 results (缓存最多128个结果)
def expensive_query(user_id: int) -> str:
    print(f"  [DB] Querying user {user_id}...")   # Only prints on cache miss (仅缓存未命中时打印)
    return f"User #{user_id} data"

print(expensive_query(1))   # [DB] Querying...   → cache miss
print(expensive_query(1))   # (no DB print)       → cache hit ✅
print(expensive_query(2))   # [DB] Querying...   → cache miss
print(expensive_query.cache_info())
# CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)

10. Common Pitfalls (常见陷阱)

1) Forgetting to Return the Result

import functools

# ❌ WRONG: silently returns None
def bad_decorator(func):
    def wrapper(*args, **kwargs):
        func(*args, **kwargs)   # Missing return!
    return wrapper

# ✅ CORRECT
def good_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)  # Always return!
    return wrapper

2) Unhashable Arguments with lru_cache

import functools

# ❌ lru_cache requires hashable arguments (需要可哈希参数)
@functools.lru_cache(maxsize=128)
def process(data: list):   # Defining this is fine; calling it with a list fails
    pass

# process([1, 2, 3])  → TypeError: unhashable type: 'list' (列表不可哈希)

# ✅ Use tuple instead (使用元组代替)
@functools.lru_cache(maxsize=128)
def process(data: tuple):
    pass

process((1, 2, 3))   # OK: a tuple of hashable items is hashable

3) Decorator Evaluated at Import Time (装饰器在导入时执行)

def register(func):
    print(f"Registering {func.__name__}...")  # Runs at import, not at call time!
    return func

@register
def my_task():
    pass

# Output when module is imported: "Registering my_task..."

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">The decorator body runs once at definition time (import time)</span>, while the <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">wrapper</code> body runs every time the decorated function is called. Keep expensive work and side effects out of the decorator body, since they would run on import; logic that belongs to each call goes inside wrapper.</div>

11. Summary Comparison Table

Feature (特性)	Function Decorator	Class Decorator
Syntax (语法)	`def deco(func)`	`class Deco: __init__ + __call__`
State (状态)	Via closure variables (通过闭包变量)	Via instance variables (通过实例变量)
Readability (可读性)	✅ More concise	🟡 More explicit for complex state
Works with methods	✅ Yes	⚠️ Needs extra care with `self`
Metadata preservation	`@functools.wraps`	`functools.update_wrapper`
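
The "extra care with `self`" row refers to the fact that a class instance stored on another class is not bound to instances the way a plain function is. One common fix, sketched here under the assumption that the decorator should behave like a normal method, is to implement the descriptor method `__get__`. `Greeter` is a hypothetical class for the demonstration.

```python
import functools

class CountCalls:
    def __init__(self, func):
        functools.update_wrapper(self, func)
        self.func = func
        self.count = 0

    def __call__(self, *args, **kwargs):
        self.count += 1
        return self.func(*args, **kwargs)

    def __get__(self, instance, owner=None):
        # Accessed on the class itself: return the decorator object.
        if instance is None:
            return self
        # Accessed on an instance: pre-fill `self` like a bound method would.
        return functools.partial(self, instance)

class Greeter:
    def __init__(self, name):
        self.name = name

    @CountCalls
    def greet(self):
        return f"Hello from {self.name}"

g = Greeter("Alice")
print(g.greet())            # Hello from Alice
print(g.greet())            # Hello from Alice
print(Greeter.greet.count)  # 2
```

Without `__get__`, `g.greet()` would call `CountCalls.__call__` with no arguments and the wrapped method would fail with a missing `self` error.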

Python typing

Mon, 09 Mar 2026 00:00:00 GMT

I. Python Typing (类型注解)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Python's <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">typing</code> module adds <span style="color:#2980B9">type hints (类型提示)</span> to Python. While Python remains dynamically typed, type hints help with <span style="color:#E8600A;font-weight:700">code documentation, IDE autocompletion, and catching bugs</span> before runtime using tools like <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">mypy</code>. </div>

1. Basic Type Annotations

1) Variable Annotations

# Basic types
name: str = "Alice"
age: int = 30
height: float = 1.75
is_student: bool = True

# Without initial value
address: str  # Just type annotation, no value yet
address = "123 Main St"  # Later assignment

2) Function Annotations

def greet(name: str) -> str:
    return f"Hello, {name}!"

def add(a: int, b: int) -> int:
    return a + b

def process_data(data: list) -> None:  # None means returns nothing
    print(f"Processing {len(data)} items")

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span>Type hints are <span style="color:#C0392B;font-weight:600">not enforced at runtime</span>. They're for developers and tools only. Python won't stop you from passing an <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">int</code> to a <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">str</code> parameter!</div>
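
A quick check makes this concrete: the call below violates the annotation yet runs without error, and the hints are only stored as metadata on the function.

```python
def greet(name: str) -> str:
    return f"Hello, {name}!"

result = greet(42)            # wrong type, but no runtime error
print(result)                 # Hello, 42!
print(greet.__annotations__)  # {'name': <class 'str'>, 'return': <class 'str'>}
```

A static checker such as mypy would flag `greet(42)` before the program runs.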

2. Container Types

1) List, Tuple, Set, Dict

from typing import List, Tuple, Set, Dict

# List of strings
names: List[str] = ["Alice", "Bob", "Charlie"]

# Tuple of int, str, and bool (fixed length, mixed types)
person: Tuple[int, str, bool] = (1, "Alice", True)

# Set of integers
unique_ids: Set[int] = {101, 102, 103}

# Dictionary with string keys and int values
scores: Dict[str, int] = {"Alice": 95, "Bob": 87}

2) Nested Containers

from typing import List, Dict, Tuple

# List of dictionaries
users: List[Dict[str, str]] = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": "bob@example.com"}
]

# Complex nesting
matrix: List[List[int]] = [[1, 2, 3], [4, 5, 6]]

# Tuple with list inside
data: Tuple[int, List[str], bool] = (1, ["a", "b", "c"], True)
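
On Python 3.9 and later the built-in container types can be subscripted directly (PEP 585), so the `typing` imports above are optional. The same annotations in that style, as a sketch assuming Python 3.9+:

```python
# Requires Python 3.9+: built-in generics, no typing import needed.
names: list[str] = ["Alice", "Bob", "Charlie"]
person: tuple[int, str, bool] = (1, "Alice", True)
unique_ids: set[int] = {101, 102, 103}
scores: dict[str, int] = {"Alice": 95, "Bob": 87}

# Nesting works the same way.
users: list[dict[str, str]] = [{"name": "Alice", "email": "alice@example.com"}]
matrix: list[list[int]] = [[1, 2, 3], [4, 5, 6]]

print(scores["Alice"])  # 95
print(matrix[1][2])     # 6
```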

3. Optional and Union Types

1) Optional (value or None)

from typing import Optional

def find_user(user_id: int) -> Optional[str]:
    # Returns name or None if not found
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)  # May return None

# Optional[str] means either str or None
result: Optional[str] = find_user(1)  # "Alice"
result2: Optional[str] = find_user(99)  # None
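
An `Optional` value has to be checked before it is used as the inner type. After an `is None` test, type checkers narrow the variable to `str`. A minimal sketch; `greeting_for` is a hypothetical caller added for illustration.

```python
from typing import Optional

def find_user(user_id: int) -> Optional[str]:
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)

def greeting_for(user_id: int) -> str:
    name = find_user(user_id)
    if name is None:              # handle the missing case first
        return "Hello, guest!"
    return f"Hello, {name.upper()}!"   # here `name` is known to be str

print(greeting_for(1))   # Hello, ALICE!
print(greeting_for(99))  # Hello, guest!
```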

2) Union (multiple possible types)

from typing import Union

# Function accepts int OR float
def square(value: Union[int, float]) -> Union[int, float]:
    return value * value

# Modern syntax (Python 3.10+)
def square2(value: int | float) -> int | float:
    return value * value

# Multiple types
def process_id(id_value: int | str) -> str:
    return f"ID: {id_value}"

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <span style="color:#E8600A;font-weight:700">Python 3.10+</span> introduced the <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">|</code> syntax for Union types, making code cleaner! </div>

4. Type Aliases

Create readable names for complex types:

from typing import List, Tuple, Dict

# Type alias
UserID = int
UserName = str
UserInfo = Tuple[UserID, UserName, bool]

# Using the alias
def get_user() -> UserInfo:
    return (1, "Alice", True)

# More complex alias
Coordinate = Tuple[float, float]
Polygon = List[Coordinate]

def calculate_area(shape: Polygon) -> float:
    # shape is list of (x, y) tuples
    pass
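
One possible body for `calculate_area`, shown only to illustrate the aliases in use, is the shoelace formula. It assumes the polygon is simple (non-self-intersecting) and its vertices are listed in order around the boundary.

```python
from typing import List, Tuple

Coordinate = Tuple[float, float]
Polygon = List[Coordinate]

def calculate_area(shape: Polygon) -> float:
    """Shoelace formula: half the absolute sum of edge cross products."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(shape):
        x2, y2 = shape[(i + 1) % len(shape)]   # wrap around to the first vertex
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

unit_square: Polygon = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangle: Polygon = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]

print(calculate_area(unit_square))  # 1.0
print(calculate_area(triangle))     # 6.0
```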

5. Callable Types

Annotate parameters that are themselves functions:

from typing import Callable, List

# Function that takes an int and returns a str
def apply_twice(func: Callable[[int], str], value: int) -> str:
    return func(func(value))  # Type error: the inner call returns str, but func expects int

# Fixed version - proper typing catches this!
def apply_twice_fixed(func: Callable[[int], int], value: int) -> int:
    return func(func(value))

# More complex callback
def process_items(
    items: List[int],
    callback: Callable[[int], str]
) -> List[str]:
    return [callback(item) for item in items]

6. Any and TypeVar (Generics)

1) Any - escape hatch

from typing import Any

# Use sparingly - defeats type checking!
def debug_print(value: Any) -> None:
    print(f"Value: {value}, Type: {type(value)}")

2) TypeVar - generic functions

from typing import Dict, List, TypeVar

T = TypeVar('T')  # Generic type variable

def first_element(items: List[T]) -> T:
    """Returns first element, type preserved"""
    return items[0]

# Works with any list type
num = first_element([1, 2, 3])        # num is int
text = first_element(["a", "b", "c"]) # text is str

# Multiple type variables
K = TypeVar('K')
V = TypeVar('V')

def get_value(mapping: Dict[K, V], key: K, default: V) -> V:
    # Parameter named `mapping` to avoid shadowing the built-in `dict`
    return mapping.get(key, default)

7. Special Forms

1) Literal - exact values

from typing import Literal

def set_status(status: Literal["active", "inactive", "pending"]) -> None:
    print(f"Status set to {status}")

set_status("active")   # OK
set_status("inactive") # OK
# set_status("unknown") # Type checker would complain

# Multiple literal values
def move(direction: Literal["north", "south", "east", "west"]) -> None:
    pass

2) Final - constants

from typing import Final

MAX_SIZE: Final[int] = 100
PI: Final[float] = 3.14159

# Type checker would warn against reassignment
# MAX_SIZE = 200  # Error!

8. Protocol (Structural Subtyping)

Like interfaces, but duck-typed:

from typing import Protocol

class Drawable(Protocol):
    def draw(self) -> None: ...

class Circle:
    def draw(self) -> None:
        print("Drawing circle")

class Square:
    def draw(self) -> None:
        print("Drawing square")
    def area(self) -> float:
        return 16.0

def render(obj: Drawable) -> None:
    obj.draw()  # Works with anything that has draw()

render(Circle())  # OK
render(Square())  # OK - Square has draw()
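
By default a Protocol is checked only by static tools. Adding `typing.runtime_checkable` also allows `isinstance` checks, with the caveat that these only test whether the methods exist, not their signatures. In this sketch `Label` is a hypothetical class without a `draw` method.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Drawable(Protocol):
    def draw(self) -> None: ...

class Circle:
    def draw(self) -> None:
        print("Drawing circle")

class Label:
    text = "no draw method here"

print(isinstance(Circle(), Drawable))  # True
print(isinstance(Label(), Drawable))   # False
```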

9. TypedDict - Dictionary with fixed keys

from typing import TypedDict

class Person(TypedDict):
    name: str
    age: int
    email: str

# Works like a dict, but with type checking
alice: Person = {
    "name": "Alice",
    "age": 30,
    "email": "alice@example.com"
}

# Error if missing keys or wrong types
# bob: Person = {"name": "Bob"}  # Missing age, email
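
At runtime a TypedDict is an ordinary `dict`; the key checks happen only in the type checker. When some keys are legitimately optional, `total=False` expresses that. A small sketch, where `PersonUpdate` is a hypothetical type for partial updates:

```python
from typing import TypedDict

class Person(TypedDict):
    name: str
    age: int

class PersonUpdate(TypedDict, total=False):   # every key may be omitted
    name: str
    age: int

alice: Person = {"name": "Alice", "age": 30}
patch: PersonUpdate = {"age": 31}
merged = {**alice, **patch}

print(type(alice) is dict)     # True
print(merged)                  # {'name': 'Alice', 'age': 31}
print(PersonUpdate.__total__)  # False
```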

10. Practical Examples

1) API Response Handler

from typing import List, Literal, Optional, TypedDict
import json

class User(TypedDict):
    id: int
    name: str
    email: str
    is_active: bool

class APIResponse(TypedDict):
    status: Literal["success", "error"]
    data: Optional[List[User]]
    message: Optional[str]

def parse_api_response(json_str: str) -> APIResponse:
    data = json.loads(json_str)
    return {
        "status": data["status"],
        "data": data.get("users"),
        "message": data.get("message")
    }

2) Data Processing Pipeline

from typing import List, Callable, TypeVar

T = TypeVar('T')
U = TypeVar('U')

def pipeline(
    data: List[T],
    *transforms: Callable[[T], U]
) -> List[U]:
    """Apply multiple transforms to data"""
    result: List[U] = []
    for item in data:
        current = item
        for transform in transforms:
            current = transform(current)  # type: ignore
        result.append(current)  # type: ignore
    return result

# Usage
numbers = [1, 2, 3, 4]
result = pipeline(
    numbers,
    lambda x: x * 2,
    lambda x: str(x)
)  # result is List[str]

3) Configuration Manager

from typing import Any, Dict, List, Optional
from dataclasses import dataclass

@dataclass
class DatabaseConfig:
    # Fields without defaults must come before fields with defaults
    host: str
    username: str
    password: str
    database: str
    port: int = 5432

@dataclass
class AppConfig:
    database: DatabaseConfig
    allowed_hosts: List[str]
    debug: bool = False
    secret_key: Optional[str] = None

def load_config(config_dict: Dict[str, Any]) -> AppConfig:
    db_config = DatabaseConfig(**config_dict["database"])
    return AppConfig(
        debug=config_dict.get("debug", False),
        database=db_config,
        allowed_hosts=config_dict["allowed_hosts"],
        secret_key=config_dict.get("secret_key")
    )

11. Type Checking Tools

mypy - Most popular
```
pip install mypy
mypy your_script.py
```

pydantic - Runtime type checking + data validation

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

user = User(name="Alice", age=30)  # Validates at runtime!

VS Code (Pylance/Pyright) and PyCharm (built-in inspections) - Type checking inside the editor </div>

12. Best Practices

✅ DO:

# 1. Use type hints for public APIs
def calculate_total(prices: List[float]) -> float:
    return sum(prices)

# 2. Use Optional for values that can be None
def find_by_id(id: int) -> Optional[User]:
    pass

# 3. Use type aliases for complex types
JSON = Dict[str, Any]

# 4. Use Literal for limited options
def set_mode(mode: Literal["read", "write", "append"]) -> None:
    pass

❌ DON'T:

# 1. Don't overuse Any (defeats purpose)
def process(data: Any) -> Any:  # Better to be specific
    ...

# 2. Don't ignore type errors without reason
result = func()  # type: ignore  # Add comment explaining why

# 3. Don't use type hints for everything in simple scripts
# For small scripts, they can be overkill

13. Comparison Table

Feature	Python 3.8-3.9	Python 3.10+	Python 3.11+
Union	`Union[int, str]`	`int \| str`	`int \| str`
Optional	`Optional[str]`	`str \| None`	`str \| None`
List type	`List[int]` (`list[int]` from 3.9)	`list[int]`	`list[int]`
Dict type	`Dict[str, int]` (`dict[str, int]` from 3.9)	`dict[str, int]`	`dict[str, int]`
Self reference	Forward ref `'MyClass'`	Forward ref `'MyClass'`	`Self` type

Python collections

Mon, 09 Mar 2026 00:00:00 GMT

I. Python `collections` Module — Complete Learning Manual

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Python's <span style="color:#E8600A;font-weight:700">collections</span> module provides <span style="color:#2980B9">specialized container datatypes (特殊容器数据类型)</span> that extend the built-in <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">dict</code>, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">list</code>, and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">tuple</code>. The seven main classes are: <span style="color:#E8600A;font-weight:700">defaultdict</span>, <span style="color:#E8600A;font-weight:700">Counter</span>, <span style="color:#E8600A;font-weight:700">OrderedDict</span>, <span style="color:#E8600A;font-weight:700">deque</span>, <span style="color:#E8600A;font-weight:700">namedtuple</span>, <span style="color:#E8600A;font-weight:700">ChainMap</span>, and <span style="color:#E8600A;font-weight:700">UserDict / UserList / UserString</span>. Each solves a specific pain-point of the standard built-ins with minimal overhead. </div>

1. defaultdict — Default Value Dict (默认值字典)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">defaultdict (默认值字典)</span> behaves exactly like a regular <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">dict</code>, except that accessing a <span style="color:#2980B9">missing key (缺失键)</span> automatically creates it by calling the <span style="color:#E8600A;font-weight:700">default_factory (默认工厂函数)</span> — eliminating <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">KeyError</code> and verbose <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">setdefault()</code> boilerplate. </div>

1) Constructor (构造函数)

collections.defaultdict(default_factory=None, **kwargs)

<span style="color:#E8600A;font-weight:700">default_factory</span> is any zero-argument callable: <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">int</code>, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">list</code>, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">set</code>, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">dict</code>, or a custom <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">lambda</code>.

2) `defaultdict(int)` — Frequency counter (频率计数)

from collections import defaultdict

text = "apple banana apple cherry banana apple"

freq = defaultdict(int)        # missing key → 0

for word in text.split():
    freq[word] += 1            # no KeyError on first access

print(dict(freq))
# → {'apple': 3, 'banana': 2, 'cherry': 1}

# Compare with plain dict (verbose):
freq2 = {}
for word in text.split():
    freq2[word] = freq2.get(word, 0) + 1   # needs .get()

3) `defaultdict(list)` — Grouping (分组)

from collections import defaultdict

students = [
    ("Alice", "Math"),
    ("Bob",   "Science"),
    ("Alice", "Science"),
    ("Carol", "Math"),
    ("Bob",   "Math"),
]

by_name = defaultdict(list)

for name, subject in students:
    by_name[name].append(subject)   # missing key → [] automatically

print(dict(by_name))
# → {'Alice': ['Math', 'Science'], 'Bob': ['Science', 'Math'], 'Carol': ['Math']}

4) `defaultdict(set)` — Unique grouping (去重分组)

from collections import defaultdict

edges = [(1, 2), (1, 3), (2, 3), (1, 2)]   # duplicate edge (1,2)

graph = defaultdict(set)

for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

print(dict(graph))
# → {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}   (no duplicates)

5) Nested `defaultdict` — Nested dict (嵌套字典)

from collections import defaultdict

# 2-level nested defaultdict
matrix = defaultdict(lambda: defaultdict(int))

matrix["row1"]["col1"] += 10
matrix["row1"]["col2"] += 20
matrix["row2"]["col1"] += 30

for row, cols in matrix.items():
    print(f"{row}: {dict(cols)}")
# → row1: {'col1': 10, 'col2': 20}
# → row2: {'col1': 30}

6) Custom `default_factory` (自定义工厂函数)

from collections import defaultdict

# Factory that returns a specific default value
dd = defaultdict(lambda: "N/A")
dd["name"] = "Alice"

print(dd["name"])     # → Alice
print(dd["age"])      # → N/A   (key created with "N/A")
print(dd["city"])     # → N/A

# Factory with counter
id_counter = [0]
def next_id():
    id_counter[0] += 1
    return id_counter[0]

registry = defaultdict(next_id)
print(registry["alice"])   # → 1
print(registry["bob"])     # → 2
print(registry["alice"])   # → 1  (already exists)

7) `default_factory` attribute — Inspect and change

from collections import defaultdict

dd = defaultdict(list)
print(dd.default_factory)    # → <class 'list'>

dd.default_factory = set     # change factory at runtime
dd["new_key"].add(42)
print(dict(dd))              # → {'new_key': {42}}

dd.default_factory = None    # disable factory → KeyError on missing keys
try:
    _ = dd["missing"]
except KeyError as e:
    print(f"KeyError: {e}")  # → KeyError: 'missing'

8) `__missing__` — How defaultdict works internally

from collections import defaultdict

class MyDefaultDict(dict):
    """Manual implementation of defaultdict logic."""

    def __init__(self, factory):
        super().__init__()
        self.factory = factory

    def __missing__(self, key):
        # Called automatically when key is not found
        value = self.factory()
        self[key] = value
        return value

d = MyDefaultDict(list)
d["x"].append(1)
d["x"].append(2)
d["y"].append(3)
print(dict(d))   # → {'x': [1, 2], 'y': [3]}

9) Inherits all `dict` methods

from collections import defaultdict

dd = defaultdict(int, a=1, b=2)

# All standard dict methods work
print(dd.keys())              # → dict_keys(['a', 'b'])
print(dd.values())            # → dict_values([1, 2])
print(dd.items())             # → dict_items([('a', 1), ('b', 2)])
print(dd.get("x", 99))        # → 99  (no key created)
print("a" in dd)              # → True
dd.update({"c": 3})
print(dd.pop("a"))            # → 1
print(dict(dd))               # → {'b': 2, 'c': 3}
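
One pitfall worth showing next to `get()`: a plain subscript read on a missing key inserts that key, while `in` and `get()` do not. A short sketch with a hypothetical `seen` mapping:

```python
from collections import defaultdict

seen = defaultdict(list)
seen["a"].append(1)

# Membership tests and .get() leave the mapping untouched
print("b" in seen)        # → False
print(seen.get("b"))      # → None
print(len(seen))          # → 1

# A plain subscript read inserts the key as a side effect
_ = seen["b"]
print("b" in seen)        # → True
print(dict(seen))         # → {'a': [1], 'b': []}
```

When a lookup should not grow the dict, prefer `in` or `get()` over `dd[key]`.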

2. Counter — Multiset / Frequency Map (计数器)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">Counter (计数器)</span> is a <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">dict</code> subclass designed for <span style="color:#2980B9">counting hashable objects (统计可哈希对象)</span>. Missing keys return <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">0</code> instead of raising <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">KeyError</code>. It supports <span style="color:#E8600A;font-weight:700">arithmetic operations (算术运算)</span> between counters. </div>

1) Constructor — Three ways to create

from collections import Counter

# From an iterable
c1 = Counter("abracadabra")
print(c1)   # → Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

# From a dict
c2 = Counter({"cats": 4, "dogs": 8})
print(c2)   # → Counter({'dogs': 8, 'cats': 4})

# From keyword arguments
c3 = Counter(red=3, blue=1, green=5)
print(c3)   # → Counter({'green': 5, 'red': 3, 'blue': 1})

2) Missing key → 0 (缺失键返回0)

from collections import Counter

c = Counter("hello")
print(c["l"])    # → 2  (exists)
print(c["z"])    # → 0  (missing — no KeyError!)
print("z" in c) # → False  (not stored, just returns 0)

3) `most_common(n)` — Top N elements (最高频N个元素)

from collections import Counter

words = "the quick brown fox jumps over the lazy dog the fox".split()
c = Counter(words)

print(c.most_common(3))
# → [('the', 3), ('fox', 2), ('quick', 1)]

print(c.most_common())        # all elements, sorted by frequency
print(c.most_common()[:-4:-1])# least common 3 (tail trick)
# → [('dog', 1), ('lazy', 1), ('over', 1)]

4) `elements()` — Expand back to iterable (展开为可迭代)

from collections import Counter

c = Counter(a=3, b=1, c=2)

print(list(c.elements()))
# → ['a', 'a', 'a', 'b', 'c', 'c']  (ordered by insertion)

# Reconstruct a sorted list
print(sorted(c.elements()))
# → ['a', 'a', 'a', 'b', 'c', 'c']

# Elements with count ≤ 0 are excluded
c["x"] = -1
print(list(c.elements()))    # 'x' not included

5) `subtract()` / `update()` — In-place operations (就地运算)

from collections import Counter

inventory = Counter(apples=10, oranges=5, bananas=8)

# subtract: reduces counts (allows negatives)
sold = Counter(apples=3, oranges=5, bananas=10)
inventory.subtract(sold)
print(inventory)
# → Counter({'apples': 7, 'oranges': 0, 'bananas': -2})

# update: adds counts (merges)
restocked = Counter(apples=5, bananas=15)
inventory.update(restocked)
print(inventory)
# → Counter({'bananas': 13, 'apples': 12, 'oranges': 0})

6) Arithmetic operators (算术运算符)

from collections import Counter

a = Counter(x=4, y=2, z=0)
b = Counter(x=1, y=3, w=5)

print(a + b)    # add counts (non-positive results dropped)
# → Counter({'x': 5, 'y': 5, 'w': 5})

print(a - b)    # subtract, keep only positives
# → Counter({'x': 3})

print(a & b)    # intersection: min of each count
# → Counter({'y': 2, 'x': 1})

print(a | b)    # union: max of each count
# → Counter({'w': 5, 'x': 4, 'y': 3})

# Unary operators
print(+a)       # keep only positive counts
# → Counter({'x': 4, 'y': 2})
print(-a)       # flip signs, then keep only positive counts
# → Counter()   (a has no negative counts)

7) Total count and filtering (总计数与过滤)

from collections import Counter

c = Counter(a=5, b=3, c=0, d=-2)

# Sum of all counts, negatives included (Python 3.10+)
print(c.total())       # → 6   (5 + 3 + 0 - 2)

# Keep only positive counts
positive = +c
print(positive)        # → Counter({'a': 5, 'b': 3})

# Keep only negative counts (useful for "owed" quantities)
negative = -c
print(negative)        # → Counter({'d': 2})
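
`Counter.total()` is the sum of every stored count, so negative entries lower the result. The sketch below uses `sum(c.values())`, which gives the same number and also runs on versions before 3.10, and shows how to total only the positive counts:

```python
from collections import Counter

c = Counter(a=5, b=3, c=0, d=-2)

# Sum of every count, negatives included; equals c.total() on Python 3.10+
all_counts = sum(c.values())
print(all_counts)               # → 6

# Sum of positive counts only: drop non-positive entries with unary plus first
positive_only = sum((+c).values())
print(positive_only)            # → 8
```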

8) Practical: anagram check, top-K, word frequency

from collections import Counter

# ── Anagram check (变位词检测) ──
def is_anagram(s1: str, s2: str) -> bool:
    return Counter(s1.lower()) == Counter(s2.lower())

print(is_anagram("listen", "silent"))   # → True
print(is_anagram("hello",  "world"))    # → False

# ── Character frequency difference ──
def missing_chars(have: str, need: str) -> Counter:
    deficit = Counter(need) - Counter(have)
    return deficit

print(missing_chars("aab", "aaabbc"))
# → Counter({'a': 1, 'b': 1, 'c': 1})

# ── Top-K frequent words ──
import re

text = """To be or not to be that is the question
          whether tis nobler in the mind to suffer"""

words  = re.findall(r'\w+', text.lower())
top5   = Counter(words).most_common(5)
print(top5)
# → [('to', 3), ('be', 2), ('the', 2), ('or', 1), ('not', 1)]

9) Inherits all `dict` methods

from collections import Counter

c = Counter("mississippi")

print(c.keys())           # → dict_keys(['m', 'i', 's', 'p'])
print(c.values())         # → dict_values([1, 4, 4, 2])
print(c.items())          # → dict_items([('m', 1), ('i', 4), ('s', 4), ('p', 2)])
print(c.get("i"))         # → 4
print(c.get("z"))         # → None   (get() returns None, not 0)

# del removes the key entirely; later lookups fall back to 0
del c["m"]
print("m" in c)           # → False
print(c["m"])             # → 0  (missing key returns 0)

3. OrderedDict — Ordered Dictionary (有序字典)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Since Python 3.7, plain <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">dict</code> preserves insertion order. <span style="color:#E8600A;font-weight:700">OrderedDict (有序字典)</span> still offers unique advantages: <span style="color:#2980B9">order-sensitive equality</span>, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">move_to_end()</code>, and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">popitem(last=True/False)</code> for implementing <span style="color:#E8600A;font-weight:700">LRU Cache (LRU缓存)</span> and similar structures. </div>

1) Basic usage and order-sensitive equality

from collections import OrderedDict

od = OrderedDict()
od["banana"] = 3
od["apple"]  = 5
od["cherry"] = 1

print(od)
# → OrderedDict([('banana', 3), ('apple', 5), ('cherry', 1)])

# Order-sensitive equality (顺序敏感的相等判断)
od1 = OrderedDict([("a", 1), ("b", 2)])
od2 = OrderedDict([("b", 2), ("a", 1)])
d1  = {"a": 1, "b": 2}

print(od1 == od2)   # → False  (same keys/values, different order)
print(od1 == d1)    # → True   (OrderedDict == dict ignores order)

2) `move_to_end(key, last=True)` — Reposition a key

from collections import OrderedDict

od = OrderedDict.fromkeys("ABCDE")

od.move_to_end("B")          # move B to end (last=True default)
print(list(od))              # → ['A', 'C', 'D', 'E', 'B']

od.move_to_end("E", last=False)  # move E to front
print(list(od))              # → ['E', 'A', 'C', 'D', 'B']

3) `popitem(last=True)` — LIFO / FIFO removal

from collections import OrderedDict

od = OrderedDict.fromkeys("ABCDE")

print(od.popitem(last=True))    # → ('E', None)  LIFO (like a stack)
print(od.popitem(last=False))   # → ('A', None)  FIFO (like a queue)
print(list(od))                 # → ['B', 'C', 'D']

4) LRU Cache implementation (LRU缓存实现)

from collections import OrderedDict

class LRUCache:
    """
    Least Recently Used Cache (最近最少使用缓存)
    using OrderedDict for O(1) get and put.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache    = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)    # mark as recently used
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

cache = LRUCache(3)
cache.put(1, 10)
cache.put(2, 20)
cache.put(3, 30)
print(cache.get(1))   # → 10  (1 moved to end)
cache.put(4, 40)      # evicts key 2 (least recently used)
print(cache.get(2))   # → -1  (evicted)
print(cache.get(3))   # → 30
print(cache.get(4))   # → 40
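
When the goal is to memoize a function rather than to manage a cache object by hand, the standard library's `functools.lru_cache` applies the same eviction policy. A small sketch, assuming a hypothetical `square` function, that follows the hit and miss counts through a capacity-3 cache:

```python
from functools import lru_cache

@lru_cache(maxsize=3)
def square(n: int) -> int:   # hypothetical function to memoize
    return n * n

# 1, 2, 3 miss; 1 hits; 4 misses and evicts 2; 2 misses again and evicts 3
for n in (1, 2, 3, 1, 4, 2):
    square(n)

info = square.cache_info()
print(info.hits, info.misses, info.currsize)   # → 1 5 3
```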

5) `reversed()` — Reverse iteration

from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

for key in reversed(od):
    print(key, od[key])
# → c 3
# → b 2
# → a 1

4. deque — Double-Ended Queue (双端队列)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">deque (双端队列)</span> supports <span style="color:#2980B9">O(1) append and pop from both ends</span>. Unlike a <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">list</code> (where <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">insert(0, x)</code> is O(n)), deque is the correct data structure for <span style="color:#E8600A;font-weight:700">queues (队列)</span>, <span style="color:#E8600A;font-weight:700">stacks (栈)</span>, and <span style="color:#E8600A;font-weight:700">sliding windows (滑动窗口)</span>. </div>

1) Constructor

from collections import deque

d1 = deque()                         # empty
d2 = deque([1, 2, 3, 4, 5])          # from iterable
d3 = deque("abcde")                  # from string
d4 = deque(range(10), maxlen=5)      # bounded deque (固定长度)

print(d2)   # → deque([1, 2, 3, 4, 5])
print(d4)   # → deque([5, 6, 7, 8, 9], maxlen=5)  (first 5 discarded)

2) `append()` / `appendleft()` — Add to ends (两端添加)

from collections import deque

d = deque([3, 4, 5])

d.append(6)         # right end  → deque([3, 4, 5, 6])
d.appendleft(2)     # left end   → deque([2, 3, 4, 5, 6])
d.appendleft(1)     #            → deque([1, 2, 3, 4, 5, 6])

print(d)            # → deque([1, 2, 3, 4, 5, 6])

3) `pop()` / `popleft()` — Remove from ends (两端弹出)

from collections import deque

d = deque([1, 2, 3, 4, 5])

print(d.pop())       # → 5   (right)  → deque([1, 2, 3, 4])
print(d.popleft())   # → 1   (left)   → deque([2, 3, 4])
print(d)             # → deque([2, 3, 4])
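
A common place where `append()` and `popleft()` work together is a FIFO queue for breadth-first search. A minimal sketch, assuming a hypothetical adjacency-list `graph` and helper `bfs_order`:

```python
from collections import deque

def bfs_order(graph: dict, start):
    """Return nodes in breadth-first order from start."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()          # O(1) dequeue from the left
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)  # O(1) enqueue on the right
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_order(graph, "A"))   # → ['A', 'B', 'C', 'D']
```

With a plain list, `pop(0)` would shift every remaining element on each dequeue.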

4) `extend()` / `extendleft()` — Batch add (批量添加)

from collections import deque

d = deque([3, 4])

d.extend([5, 6, 7])          # right: → deque([3, 4, 5, 6, 7])
d.extendleft([2, 1, 0])      # left, each prepended individually
                             # 2 → [2,3..], 1 → [1,2,3..], 0 → [0,1,2,3..]
print(d)   # → deque([0, 1, 2, 3, 4, 5, 6, 7])

5) `rotate(n)` — Circular rotation (循环旋转)

from collections import deque

d = deque([1, 2, 3, 4, 5])

d.rotate(2)     # rotate RIGHT by 2
print(d)        # → deque([4, 5, 1, 2, 3])

d.rotate(-2)    # rotate LEFT by 2 (undo)
print(d)        # → deque([1, 2, 3, 4, 5])

# Circular buffer simulation (循环缓冲区)
ring = deque(range(5))
for _ in range(8):
    print(ring[0], end=" ")
    ring.rotate(-1)
# → 0 1 2 3 4 0 1 2

6) `maxlen` — Bounded / sliding window (有界滑动窗口)

from collections import deque

# Keep only the last 3 elements
window = deque(maxlen=3)

for i in range(7):
    window.append(i)
    print(f"added {i}: {list(window)}")
# → added 0: [0]
# → added 1: [0, 1]
# → added 2: [0, 1, 2]
# → added 3: [1, 2, 3]   ← 0 dropped automatically
# → added 4: [2, 3, 4]
# → added 5: [3, 4, 5]
# → added 6: [4, 5, 6]

# Moving average (滑动平均)
def moving_average(data, window_size):
    w = deque(maxlen=window_size)
    result = []
    for val in data:
        w.append(val)
        result.append(sum(w) / len(w))
    return result

print(moving_average([1, 2, 3, 4, 5, 6], 3))
# → [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]

7) `insert()` / `remove()` / `count()` / `index()`

from collections import deque

d = deque([1, 2, 3, 2, 4])

d.insert(2, 99)       # insert 99 at position 2
print(d)              # → deque([1, 2, 99, 3, 2, 4])

d.remove(99)          # remove first occurrence
print(d)              # → deque([1, 2, 3, 2, 4])

print(d.count(2))     # → 2  (occurrences of 2)
print(d.index(3))     # → 2  (first index of 3)

8) `reverse()` / `copy()` / `clear()`

from collections import deque

d = deque([1, 2, 3, 4, 5])

d.reverse()
print(d)        # → deque([5, 4, 3, 2, 1])

d2 = d.copy()   # shallow copy
d2.append(0)
print(d)        # → deque([5, 4, 3, 2, 1])  (original unchanged)

d.clear()
print(d)        # → deque([])
print(len(d))   # → 0

9) Performance comparison vs list (与list性能对比)

import timeit
from collections import deque

n = 10_000

def list_prepend():
    lst = []
    for i in range(n):
        lst.insert(0, i)        # O(n) each: shifts every existing element

def deque_prepend():
    dq = deque()
    for i in range(n):
        dq.appendleft(i)        # O(1) each

t_list  = timeit.timeit(list_prepend, number=10)
t_deque = timeit.timeit(deque_prepend, number=10)

print(f"list  front-insert: {t_list:.4f}s")
print(f"deque front-insert: {t_deque:.4f}s")
# deque is typically far faster here, and the gap widens as n grows

5. namedtuple — Immutable Record (具名元组)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">namedtuple</code> creates a <span style="color:#E8600A;font-weight:700">tuple subclass (元组子类)</span> whose fields can be accessed by <span style="color:#2980B9">name</span> as well as by index. It is <span style="color:#E8600A;font-weight:700">immutable (不可变)</span>, memory-efficient, and self-documenting. </div>

1) Factory function `namedtuple(typename, field_names)`

from collections import namedtuple

# Three equivalent ways to define field names:
Point = namedtuple("Point", ["x", "y"])
Point = namedtuple("Point", "x y")
Point = namedtuple("Point", "x, y")

p = Point(3, 4)
print(p)           # → Point(x=3, y=4)
print(p.x, p.y)    # → 3 4       (by name)
print(p[0], p[1])  # → 3 4       (by index)
print(p == (3, 4)) # → True      (is a tuple subclass)
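
The same kind of record can also be declared with class syntax via `typing.NamedTuple`, which adds type annotations and inline defaults. A short sketch that redefines `Point` this way for illustration only:

```python
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int = 0          # fields with defaults must come last

p = Point(3)
print(p)                # → Point(x=3, y=0)
print(p == (3, 0))      # → True  (still a tuple)
print(Point._fields)    # → ('x', 'y')
```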

2) `_make()` — Create from iterable (从可迭代对象创建)

from collections import namedtuple

Employee = namedtuple("Employee", "name age department salary")

data = ["Alice", 30, "Engineering", 95000]
emp  = Employee._make(data)
print(emp)
# → Employee(name='Alice', age=30, department='Engineering', salary=95000)

# From CSV row
import csv, io
csv_data = "Bob,25,Marketing,60000"
for row in csv.reader(io.StringIO(csv_data)):
    e = Employee._make(row)
    print(f"{e.name} in {e.department}")
# → Bob in Marketing

3) `_asdict()` — Convert to dict (转换为字典)

from collections import namedtuple

Point3D = namedtuple("Point3D", "x y z")
p = Point3D(1, 2, 3)

d = p._asdict()
print(d)            # → {'x': 1, 'y': 2, 'z': 3}
print(type(d))      # → <class 'dict'>

# Serialize to JSON
import json
print(json.dumps(p._asdict()))   # → {"x": 1, "y": 2, "z": 3}

4) `_replace()` — Create modified copy (创建修改副本)

from collections import namedtuple

# namedtuple is IMMUTABLE — _replace() returns a new instance
Person = namedtuple("Person", "name age city")
alice  = Person("Alice", 30, "NYC")

# "update" one field
older_alice = alice._replace(age=31)
print(alice)        # → Person(name='Alice', age=30, city='NYC')  (unchanged)
print(older_alice)  # → Person(name='Alice', age=31, city='NYC')
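
To see why `_replace()` is needed, the sketch below tries a direct assignment first, which raises `AttributeError`, and then replaces two fields in a single call. The exact error message is left unstated because it differs between Python versions.

```python
from collections import namedtuple

Person = namedtuple("Person", "name age city")
alice = Person("Alice", 30, "NYC")

try:
    alice.age = 31                     # fields are read-only
except AttributeError as e:
    print(f"AttributeError: {e}")      # message text varies by Python version

# Several fields can be replaced in one call
moved = alice._replace(age=31, city="Boston")
print(moved)   # → Person(name='Alice', age=31, city='Boston')
```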

5) `_fields` / `_field_defaults` — Introspection (内省)

from collections import namedtuple

Config = namedtuple("Config", "host port timeout", defaults=["localhost", 8080, 30])

print(Config._fields)          # → ('host', 'port', 'timeout')
print(Config._field_defaults)  # → {'host': 'localhost', 'port': 8080, 'timeout': 30}

c1 = Config()                  # all defaults
c2 = Config("example.com")     # override host only
print(c1)  # → Config(host='localhost', port=8080, timeout=30)
print(c2)  # → Config(host='example.com', port=8080, timeout=30)

6) `rename=True` — Auto-rename invalid field names

from collections import namedtuple

# 'class' and '2bad' are invalid Python identifiers
T = namedtuple("T", ["class", "2bad", "ok"], rename=True)
print(T._fields)   # → ('_0', '_1', 'ok')  (invalid names → _index)

t = T(1, 2, 3)
print(t._0, t._1, t.ok)   # → 1 2 3

7) Subclassing namedtuple — Adding methods

from collections import namedtuple
import math

class Vector(namedtuple("Vector", "x y")):
    """Extend namedtuple with custom methods."""

    def magnitude(self) -> float:
        return math.sqrt(self.x**2 + self.y**2)

    def dot(self, other: "Vector") -> float:
        return self.x * other.x + self.y * other.y

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

v1 = Vector(3, 4)
v2 = Vector(1, 2)

print(v1.magnitude())   # → 5.0
print(v1.dot(v2))       # → 11   (int inputs give an int result)
print(v1 + v2)          # → Vector(x=4, y=6)

6. ChainMap — Multi-scope Lookup (多层级查找映射)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">ChainMap (链式映射)</span> groups multiple dicts into a single, updateable view. Lookups search the dicts <span style="color:#2980B9">from first to last</span>, returning the first match. Writes always go to the <span style="color:#E8600A;font-weight:700">first map</span>. Perfect for modeling <span style="color:#2980B9">variable scopes (变量作用域)</span> like Python's own LEGB rule. </div>

1) Basic lookup (基本查找)

from collections import ChainMap

defaults  = {"color": "red",  "user": "guest", "timeout": 30}
env_vars  = {"color": "blue", "debug": True}
cli_args  = {"timeout": 10}

# Priority: cli_args > env_vars > defaults
config = ChainMap(cli_args, env_vars, defaults)

print(config["color"])    # → blue   (from env_vars, overrides defaults)
print(config["user"])     # → guest  (only in defaults)
print(config["timeout"])  # → 10     (from cli_args, highest priority)
print(config["debug"])    # → True   (from env_vars)

2) Writes go to first map only (写入仅影响第一个映射)

from collections import ChainMap

base    = {"x": 1, "y": 2}
overlay = {}

cm = ChainMap(overlay, base)

cm["x"] = 99      # written to overlay (first map)
cm["z"] = 0       # new key also goes to overlay

print(overlay)    # → {'x': 99, 'z': 0}
print(base)       # → {'x': 1, 'y': 2}   (unchanged!)
print(cm["x"])    # → 99   (overlay shadows base)
print(cm["y"])    # → 2    (from base)
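
Deletion follows the same rule as writes: it only touches the first map. A sketch under the same `overlay` / `base` setup, showing that removing a shadowing key exposes the base value again, and that deleting a key held only by a deeper map raises `KeyError`:

```python
from collections import ChainMap

base    = {"x": 1, "y": 2}
overlay = {"x": 99}
cm = ChainMap(overlay, base)

del cm["x"]            # removes the overlay entry only
print(cm["x"])         # → 1   (base value is visible again)
print(base)            # → {'x': 1, 'y': 2}

try:
    del cm["y"]        # 'y' lives in base, not in the first map
except KeyError as e:
    print(f"KeyError: {e}")
```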

3) `new_child(m=None)` — Push a new scope (推入新作用域)

from collections import ChainMap

# Simulate nested scopes (模拟嵌套作用域)
global_scope = ChainMap({"x": 1, "y": 2})
local_scope  = global_scope.new_child({"x": 10, "z": 3})

print(local_scope["x"])   # → 10   (local shadows global)
print(local_scope["y"])   # → 2    (falls through to global)
print(local_scope["z"])   # → 3    (local only)

# Pop the local scope (返回父作用域)
parent_scope = local_scope.parents
print(parent_scope["x"])  # → 1    (original global value)

4) `maps` attribute — Access underlying dicts (访问底层字典列表)

from collections import ChainMap

cm = ChainMap({"a": 1}, {"b": 2}, {"c": 3})

print(cm.maps)
# → [{'a': 1}, {'b': 2}, {'c': 3}]

# Modify underlying dicts directly
cm.maps[1]["b"] = 99
print(cm["b"])   # → 99

5) Practical: CLI argument + environment + defaults

from collections import ChainMap
import os

def build_config(cli_args: dict) -> ChainMap:
    """Three-tier configuration (三层配置): CLI > ENV > defaults."""
    defaults = {
        "host":    "localhost",
        "port":    8080,
        "debug":   False,
        "workers": 4,
    }
    # Environment values are always strings; strip only the leading "APP_" prefix
    env_config = {
        k[len("APP_"):].lower(): v
        for k, v in os.environ.items()
        if k.startswith("APP_")
    }
    return ChainMap(cli_args, env_config, defaults)

config = build_config({"port": 9090, "debug": True})
print(config["host"])     # → localhost (from defaults)
print(config["port"])     # → 9090      (from cli_args)
print(config["debug"])    # → True      (from cli_args)

7. UserDict / UserList / UserString — Custom Containers (自定义容器基类)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <span style="color:#E8600A;font-weight:700">UserDict</span>, <span style="color:#E8600A;font-weight:700">UserList</span>, and <span style="color:#E8600A;font-weight:700">UserString</span> are wrapper classes designed for <span style="color:#2980B9">safe subclassing</span>. Subclassing built-in <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">dict</code> / <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">list</code> directly can miss overrides because C-level methods call each other without going through Python. <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">UserDict</code> etc. route ALL operations through Python methods. </div>
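
The difference is easiest to see side by side. In this sketch (class names `UpperDict` and `UpperUserDict` are hypothetical), the same `__setitem__` override is skipped by the constructor and `update()` on a `dict` subclass in CPython, but honored everywhere on a `UserDict` subclass:

```python
from collections import UserDict

class UpperDict(dict):                 # hypothetical: subclasses the built-in
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

class UpperUserDict(UserDict):         # hypothetical: same override on UserDict
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

d = UpperDict(a=1)       # constructor bypasses the override
d.update(b=2)            # so does update()
d["c"] = 3               # only direct assignment uses it
print(d)                 # → {'a': 1, 'b': 2, 'C': 3}

u = UpperUserDict(a=1)   # constructor routes through __setitem__
u.update(b=2)            # update() does too
u["c"] = 3
print(u)                 # → {'A': 1, 'B': 2, 'C': 3}
```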

1) `UserDict` — Custom dict with validation (带验证的自定义字典)

from collections import UserDict

class TypedDict(UserDict):
    """A dict that only accepts string keys and int values."""

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError(f"Key must be str, got {type(key).__name__}")
        if not isinstance(value, int):
            raise TypeError(f"Value must be int, got {type(value).__name__}")
        super().__setitem__(key, value)   # delegate to UserDict

td = TypedDict()
td["score"] = 100
td["count"] = 42
print(td)            # → {'score': 100, 'count': 42}

try:
    td[123] = 10     # invalid key
except TypeError as e:
    print(f"Error: {e}")   # → Error: Key must be str, got int

try:
    td["x"] = "hello"  # invalid value
except TypeError as e:
    print(f"Error: {e}")   # → Error: Value must be int, got str

2) `UserList` — Custom list with constraints (带约束的自定义列表)

from collections import UserList

class BoundedList(UserList):
    """A list that enforces a maximum length (最大长度限制)."""

    def __init__(self, maxlen: int, iterable=()):
        self.maxlen = maxlen
        super().__init__()
        for item in iterable:
            self.append(item)

    def append(self, item):
        if len(self.data) >= self.maxlen:
            raise OverflowError(f"List is full (max {self.maxlen})")
        self.data.append(item)

    def insert(self, index, item):
        if len(self.data) >= self.maxlen:
            raise OverflowError(f"List is full (max {self.maxlen})")
        self.data.insert(index, item)

bl = BoundedList(3, [1, 2, 3])
print(bl)   # → [1, 2, 3]

try:
    bl.append(4)
except OverflowError as e:
    print(f"Error: {e}")   # → Error: List is full (max 3)

3) `UserString` — Custom string with transforms (带转换的自定义字符串)

from collections import UserString

class SlugString(UserString):
    """Auto-converts string to URL-safe slug (URL友好字符串)."""

    def __init__(self, seq=""):
        import re
        slug = re.sub(r'[^a-z0-9]+', '-', str(seq).lower()).strip('-')
        super().__init__(slug)

    def __add__(self, other):
        return SlugString(self.data + "-" + str(other))

s = SlugString("Hello World! This is a Test.")
print(s)           # → hello-world-this-is-a-test

s2 = s + "extra"
print(s2)          # → hello-world-this-is-a-test-extra
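print(len(s))      # → 26   (len() and other read-only str operations work)
print(s.data.upper())   # → HELLO-WORLD-THIS-IS-A-TEST
# Note: s.upper() returns a new SlugString, so __init__ lowercases it again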

8. Comparison Table (对比总结)

Class	Based on	Missing key	Ordered	Mutable	Best use case
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">defaultdict</code>	dict	auto-creates	insertion	✅	Grouping, counting
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Counter</code>	dict	returns 0	insertion	✅	Frequency, multiset ops
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">OrderedDict</code>	dict	KeyError	insertion	✅	LRU cache, order-sensitive eq
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">deque</code>	list-like	IndexError	yes	✅	Queue, stack, sliding window
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">namedtuple</code>	tuple	AttributeError	yes	❌	Immutable records, CSV rows
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ChainMap</code>	dict view	KeyError	first-wins	✅ (first)	Config layers, scopes
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">UserDict</code>	dict	KeyError	insertion	✅	Safe dict subclassing
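
The "Missing key" column can be checked directly. A minimal sketch, with variable names chosen only for illustration:

```python
from collections import ChainMap, Counter, defaultdict

groups = defaultdict(list)
groups["fruit"].append("apple")      # missing key is created with list()

counts = Counter("aab")
z_count = counts["z"]                # 0, and "z" is not stored

layers = ChainMap({"debug": True}, {"debug": False, "port": 80})
debug = layers["debug"]              # first mapping wins
port = layers["port"]                # falls through to the second mapping

try:
    layers["missing"]                # absent from every layer
    chain_error = None
except KeyError as exc:
    chain_error = type(exc).__name__
```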

<div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br> Use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">defaultdict</code> to eliminate <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">KeyError</code> boilerplate, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Counter</code> for frequency analysis and multiset arithmetic, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">deque</code> when you need O(1) operations on both ends, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">namedtuple</code> for self-documenting immutable records, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">OrderedDict</code> for LRU caches and order-sensitive comparisons, and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ChainMap</code> for multi-tier configuration or scope simulation. </div>

Python global

Mon, 09 Mar 2026 00:00:00 GMT

I. Understanding `global` in Python

1. Scope Basics in Python

Every variable in Python has a defined <span style="color:#2980B9">scope (作用域)</span> – the region of code where it is accessible.

Global scope: Variables defined outside any function
Local scope: Variables defined inside a function

# Global variable
x = 10

def my_function():
    # Local variable (different from global x)
    x = 5
    print("Inside function:", x)

my_function()  # Output: Inside function: 5
print("Outside function:", x)  # Output: Outside function: 10

2. The Problem: Modifying Global Variables

When you try to <span style="color:#E8600A;font-weight:700">modify</span> a global variable inside a function without declaring it as global, Python treats the name as a local variable for the whole function instead of updating the global one.

counter = 0

def increment():
    # The assignment makes 'counter' local to this function
    counter += 1  # ❌ ERROR: the local 'counter' has no value yet

# increment()  # Uncommenting this line causes UnboundLocalError

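<span style="color:#C0392B;font-weight:600">Pitfall: </span>This code raises <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">UnboundLocalError</code> because Python sees the assignment and treats <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">counter</code> as a local variable for the entire function, so it is read before it has a value. Up to Python 3.10 the message is <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">local variable 'counter' referenced before assignment</code>; Python 3.11 and later word it as <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">cannot access local variable 'counter' where it is not associated with a value</code>.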
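
The rule applies to the whole function body, not just the assigning line. A small sketch (the function name `read_then_assign` is illustrative) in which even a plain read fails because an assignment appears later:

```python
counter = 0

def read_then_assign():
    value = counter       # fails: 'counter' is local because of the next line
    counter = value + 1
    return counter

try:
    read_then_assign()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__   # "UnboundLocalError"
```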

3. The Solution: Using `global`

The <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">global</code> keyword tells Python: <span style="color:#2980B9">"This variable belongs to the global scope"</span>.

counter = 0  # Global variable

def increment():
    global counter  # Declare that we're using the global counter
    counter = counter + 1
    print(f"Counter is now: {counter}")

increment()  # Output: Counter is now: 1
increment()  # Output: Counter is now: 2
print(f"Global counter: {counter}")  # Output: Global counter: 2

4. Multiple Global Variables

You can declare multiple global variables in a single statement:

name = "Python"
version = 3.9
year = 2023

def update_info():
    global name, version, year
    name = "Python 3"
    version = 3.11
    year = 2024

update_info()
print(f"{name} {version} ({year})")  # Output: Python 3 3.11 (2024)

5. Global vs Local: A Comparison

Aspect	Local Variable	Global Variable
<span style="color:#2980B9">Scope (作用域)</span>	Inside function only	Throughout the module
<span style="color:#2980B9">Declaration</span>	Automatic on assignment	Assigned at module level; <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">global</code> is needed only to rebind it inside a function
<span style="color:#2980B9">Read access</span>	Direct access	Direct access (without <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">global</code> for reading)
<span style="color:#2980B9">Write access</span>	Direct assignment	Requires <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">global</code> declaration
<span style="color:#2980B9">Memory (内存)</span>	Created when function runs	Created when module loads
<span style="color:#2980B9">Best practice</span>	Preferred for temporary values	Use sparingly, prefer parameters

6. Reading Global Variables (Without `global`)

Interesting fact: You can <span style="color:#E8600A;font-weight:700">read</span> global variables without the global keyword:

message = "Hello, World!"

def show_message():
    # No 'global' needed for reading
    print(message)  # This works!

show_message()  # Output: Hello, World!

<span style="color:#C0392B;font-weight:600">Warning: </span>This only works for <span style="color:#2980B9">reading (读取)</span>. The moment you try to assign a value, Python treats it as a local variable unless you use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">global</code>.

7. Nested Functions and `global`

The global keyword always refers to the <span style="color:#2980B9">module-level (模块级别)</span> scope, not the enclosing function scope.

x = "global x"

def outer():
    x = "outer x"
    
    def inner():
        global x  # This refers to the module-level 'x', not outer's 'x'
        x = "changed by inner"
    
    inner()
    print("outer x:", x)  # Output: outer x: outer x

outer()
print("global x:", x)  # Output: global x: changed by inner

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span>To modify variables in an <span style="color:#2980B9">enclosing (but non-global) scope (外部嵌套作用域)</span>, use the <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">nonlocal</code> keyword instead, which we'll cover in a separate note.</div>
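
For contrast with `global`, here is a minimal sketch of `nonlocal` rebinding a variable in the enclosing function. The names `make_counter` and `increment` are illustrative only:

```python
def make_counter():
    count = 0                 # lives in make_counter's scope

    def increment():
        nonlocal count        # rebind the enclosing 'count', not a global
        count += 1
        return count

    return increment

tick = make_counter()
first = tick()                # 1
second = tick()               # 2
```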

8. Mutable Objects: A Special Case

For <span style="color:#2980B9">mutable objects (可变对象)</span> like lists and dictionaries, you can modify their <span style="color:#E8600A;font-weight:700">contents</span> without using global:

my_list = [1, 2, 3]
my_dict = {"count": 0}

def modify_mutable():
    # No 'global' needed - we're modifying, not reassigning
    my_list.append(4)
    my_dict["count"] += 1
    print("Inside function:", my_list, my_dict)

modify_mutable()
print("Outside function:", my_list, my_dict)
# Output: Outside function: [1, 2, 3, 4] {'count': 1}

<span style="color:#C0392B;font-weight:600">Pitfall: </span>This works because we're <span style="color:#2980B9">modifying (修改)</span> the object, not <span style="color:#2980B9">reassigning (重新赋值)</span> the variable. If we tried my_list = [4, 5, 6], that would require global.
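
The difference between rebinding and mutating can be sketched as follows (function names are illustrative). Slice assignment replaces the contents of the existing list, so it does not need `global`:

```python
items = [1, 2, 3]

def rebind_without_global():
    items = [4, 5, 6]      # binds a new LOCAL name; the outer list is untouched
    return items

def replace_contents():
    items[:] = [4, 5, 6]   # slice assignment mutates the existing list in place

local_result = rebind_without_global()
after_rebind = list(items)       # still [1, 2, 3]
replace_contents()
after_mutation = list(items)     # now [4, 5, 6]
```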

Python re

Mon, 09 Mar 2026 00:00:00 GMT

I. Python `re` — Regular Expressions

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">Regular Expression (正则表达式)</span> is a <span style="color:#2980B9">pattern (模式)</span> used to find and manipulate text. Think of it as a <span style="color:#E8600A;font-weight:700">"super-powered search"</span> that can match patterns, not just exact words. Python's <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re</code> module gives you tools to use regex. </div>

II. Pattern Syntax — The Complete Reference

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Every regex pattern is built from three kinds of building blocks: <span style="color:#E8600A;font-weight:700">Literals (字面量)</span> that match themselves, <span style="color:#E8600A;font-weight:700">Metacharacters (元字符)</span> that have special meaning, and <span style="color:#E8600A;font-weight:700">Quantifiers (量词)</span> that control repetition. Learn these 30-odd symbols and you can write any pattern. </div>

1. Literals and Metacharacters (字面量与元字符)

1) Plain literals (普通字面量)

Most characters match themselves exactly.

import re

# Literal match — 'cat' matches exactly the string "cat"
print(re.search(r'cat', 'I have a cat'))        # match
print(re.search(r'cat', 'I have a CAT'))        # None  (case-sensitive by default)
print(re.search(r'cat', 'concatenate'))         # match (found inside)

2) The 14 metacharacters (14个元字符)

These characters have special meaning and must be escaped with \ to match literally:

. ^ $ * + ? { } [ ] \ | ( )

import re

# Matching a literal dot — must escape it
text = "price: $3.99"

print(re.search(r'3.99',  text))    # matches "3.99" BUT also "3X99" (dot = any char!)
print(re.search(r'3\.99', text))    # matches ONLY "3.99"  ← correct

# Matching a literal backslash
print(re.search(r'C:\\Users', r'C:\Users'))   # matches C:\Users

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Always use raw strings <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">r'pattern'</code> for regex patterns.</span> Without <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">r</code>, Python processes <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">\n</code> as newline before the regex engine sees it. With <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">r'\n'</code>, the regex engine receives the literal two characters <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">\n</code> and interprets them as "newline character".</div>
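
The raw-string rule matters most for `\b`, which Python itself interprets as a backspace character in an ordinary string. A short sketch of the difference:

```python
import re

text = "a cat sat"

raw_hit = re.search(r'\bcat\b', text)    # regex sees \b: word boundary
plain_hit = re.search('\bcat\b', text)   # regex sees backspace characters

raw_length = len(r'\b')                  # 2: backslash + 'b'
plain_length = len('\b')                 # 1: a single backspace (\x08)
```

Here `raw_hit` is a match and `plain_hit` is `None`, because the text contains no backspace characters.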

2. The Dot `.` — Any character (任意字符)

. matches any single character except a newline \n (unless re.DOTALL flag is set).

import re

# . matches exactly ONE character (any except \n)
print(re.findall(r'c.t', 'cat cut c t c\nt coot'))
# → ['cat', 'cut', 'c t']   ('c\nt' skipped — \n not matched by dot)
# Note: 'coot' not matched — dot matches exactly 1 char

# With re.DOTALL, dot matches newline too
text = "first\nsecond"
print(re.search(r'first.second',  text))             # None
print(re.search(r'first.second',  text, re.DOTALL))  # match

3. Anchors — Position matchers (锚点 — 位置匹配)

Anchors match positions, not characters.

1) `^` and `$` — Start and end of string/line

import re

text = "hello world"

print(re.search(r'^hello', text))   # match  — 'hello' is at start
print(re.search(r'^world', text))   # None   — 'world' is NOT at start
print(re.search(r'world$', text))   # match  — 'world' is at end
print(re.search(r'hello$', text))   # None

# With re.MULTILINE: ^ and $ match start/end of EACH LINE
multiline = "line1\nline2\nline3"
print(re.findall(r'^\w+', multiline, re.MULTILINE))
# → ['line1', 'line2', 'line3']

print(re.findall(r'\w+$', multiline, re.MULTILINE))
# → ['line1', 'line2', 'line3']

2) `\b` and `\B` — Word boundaries (单词边界)

\b matches the position between a word character (\w) and a non-word character, including the start or end of the string when it borders a word character.

import re

# \b matches word boundary — prevents partial matches
print(re.findall(r'\bcat\b', 'cat cats concatenate scatter'))
# → ['cat']   (only the standalone word)

print(re.findall(r'cat',     'cat cats concatenate scatter'))
# → ['cat', 'cat', 'cat', 'cat']  (too many!)

# \B matches NON-word boundary (inside a word)
print(re.findall(r'\Bcat\B', 'cat cats concatenate'))
# → ['cat']   (only the 'cat' inside 'concatenate')

3) `\A`, `\Z` — Absolute start/end of string (字符串绝对首尾)

import re

# \A and \Z are NOT affected by re.MULTILINE — always match string start/end
text = "line1\nline2"

print(re.search(r'\Aline1', text))   # match — absolute start
print(re.search(r'\Aline2', text))   # None  — line2 is NOT at absolute start
print(re.search(r'line2\Z', text))   # match — absolute end

4. Character Classes `[ ]` (字符类)

1) Basic character class (基本字符类)

A character class matches one character that is any of the listed characters.

import re

# [aeiou] matches any single vowel
print(re.findall(r'[aeiou]', 'hello world'))
# → ['e', 'o', 'o']

# [a-z] matches any lowercase letter (range syntax)
print(re.findall(r'[a-z]+', 'Hello World 123'))
# → ['ello', 'orld']

# [A-Za-z0-9] matches any alphanumeric
print(re.findall(r'[A-Za-z0-9]+', 'foo_bar-123!'))
# → ['foo', 'bar', '123']

# [0-9] is equivalent to \d
print(re.findall(r'[0-9]+', 'abc 123 def 456'))
# → ['123', '456']

2) Negated character class `[^ ]` (否定字符类)

[^...] matches any character NOT in the class.

import re

# [^aeiou\s] matches any character that is not a lowercase vowel or whitespace
print(re.findall(r'[^aeiou\s]+', 'hello world'))
# → ['h', 'll', 'w', 'rld']

# [^0-9] matches any non-digit character
print(re.findall(r'[^0-9]+', 'abc123def456'))
# → ['abc', 'def']

# Strip all non-alphanumeric characters
cleaned = re.sub(r'[^A-Za-z0-9]', '', 'Hello, World! 123')
print(cleaned)   # → HelloWorld123

3) Metacharacters inside `[ ]`

import re

# Inside [], most metacharacters lose special meaning
# - (dash) is literal if first, last, or escaped
print(re.findall(r'[-+*/]', '3+4-2*1/5'))  # → ['+', '-', '*', '/']

# ^ is literal unless it is the FIRST character
print(re.findall(r'[a^b]', 'a^b c'))       # → ['a', '^', 'b']  (literal ^)

# ] must be escaped or placed first
print(re.findall(r'[]a-z]', 'a]b'))        # → ['a', ']', 'b']

5. Predefined Character Classes (预定义字符类)

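These are shorthand for common character sets. The equivalents shown are exact for bytes patterns or when re.ASCII is set; for str patterns, \d, \w, and \s also match Unicode digits, word characters, and whitespace: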

Shorthand	Equivalent	Meaning
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">\d</code>	`[0-9]`	Any digit (数字)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">\D</code>	`[^0-9]`	Any non-digit (非数字)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">\w</code>	`[A-Za-z0-9_]`	Word character (单词字符)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">\W</code>	`[^A-Za-z0-9_]`	Non-word character (非单词字符)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">\s</code>	`[ \t\n\r\f\v]`	Whitespace (空白字符)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">\S</code>	`[^ \t\n\r\f\v]`	Non-whitespace (非空白字符)

import re

text = "Hello, World! 42 items at $3.99 each.\n"

print(re.findall(r'\d+',  text))  # → ['42', '3', '99']
print(re.findall(r'\w+',  text))  # → ['Hello', 'World', '42', 'items', 'at', '3', '99', 'each']
print(re.findall(r'\s+',  text))  # → [' ', ' ', ' ', ' ', ' ', ' ', '\n']
print(re.findall(r'\W+',  text))  # → [', ', '! ', ' ', ' ', ' $', '.', ' ', '.\n']

# Combining: \b...\b with \w{5} matches whole words of a fixed length
print(re.findall(r'\b\w{5}\b', text))  # words of exactly 5 chars
# → ['Hello', 'World', 'items']
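
For str patterns the shorthands are Unicode-aware by default. The sketch below compares default matching with `re.ASCII` on a sample string (chosen for illustration) that contains an accented letter and Arabic-Indic digits:

```python
import re

# "café", the Arabic-Indic digits for 123, then ASCII "42"
text = "caf\u00e9 \u0661\u0662\u0663 42"

unicode_digits = re.findall(r'\d+', text)            # both digit runs
ascii_digits = re.findall(r'\d+', text, re.ASCII)    # only '42'

unicode_words = re.findall(r'\w+', text)             # 'café' stays whole
ascii_words = re.findall(r'\w+', text, re.ASCII)     # 'caf' and '42'
```

When input must be restricted to `0-9`, spelling out `[0-9]` or passing `re.ASCII` avoids accepting other digit scripts.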

6. Quantifiers — Repetition (量词 — 重复)

1) Basic quantifiers (基本量词)

Quantifier	Meaning
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">*</code>	0 or more (零次或多次)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">+</code>	1 or more (一次或多次)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">?</code>	0 or 1 (零次或一次，可选)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">{n}</code>	Exactly n times (恰好n次)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">{n,}</code>	n or more times (n次或更多)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">{n,m}</code>	Between n and m times (n到m次)

import re

s = "colour   color   colouur"

print(re.findall(r'colou?r',    s))   # ? → u is optional
# → ['colour', 'color']

print(re.findall(r'colou*r',    s))   # * → 0 or more u's
# → ['colour', 'color', 'colouur']

print(re.findall(r'colou+r',    s))   # + → 1 or more u's
# → ['colour', 'colouur']

print(re.findall(r'colou{2}r',  s))   # exactly 2 u's
# → ['colouur']

print(re.findall(r'colou{1,2}r',s))   # 1 or 2 u's
# → ['colour', 'colouur']

# Phone number: a run of 10 digits (wrap in \b...\b to reject longer digit runs)
print(re.findall(r'\d{10}', '1234567890 12345'))
# → ['1234567890']

2) Greedy vs Non-greedy (贪婪 vs 非贪婪)

By default, quantifiers are <span style="color:#E8600A;font-weight:700">greedy (贪婪)</span> — they match as much as possible. Adding ? makes them <span style="color:#E8600A;font-weight:700">non-greedy (非贪婪/懒惰)</span> — they match as little as possible.

import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* expands as far right as possible
print(re.findall(r'<.*>',  html))
# → ['<b>bold</b> and <i>italic</i>']   ← one huge match (too greedy)

# Non-greedy: .*? stops at the FIRST >
print(re.findall(r'<.*?>',  html))
# → ['<b>', '</b>', '<i>', '</i>']        ← each tag separately

# Extracting content between tags
print(re.findall(r'<b>(.*?)</b>', html))
# → ['bold']

# More examples
text = '"first" and "second"'
print(re.findall(r'".*"',  text))   # → ['"first" and "second"']  greedy
print(re.findall(r'".*?"', text))   # → ['"first"', '"second"']   non-greedy

Pattern	Type	Matches
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.*</code>	Greedy	As many chars as possible
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.*?</code>	Non-greedy	As few chars as possible
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.+</code>	Greedy	1+ chars, maximum
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.+?</code>	Non-greedy	1+ chars, minimum
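
A common alternative to a lazy quantifier is a negated character class, which states directly what the match may not contain. On the sample string used above, both forms give the same result:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

lazy = re.findall(r'<.*?>', html)       # stop at the first '>'
negated = re.findall(r'<[^>]*>', html)  # any run of characters that are not '>'
```

Unlike `.`, the class `[^>]` also matches newlines, so the second form keeps working when a tag spans lines.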

7. Groups — Capturing and Non-capturing (分组 — 捕获与非捕获)

1) Capturing group `( )` (捕获组)

Groups serve two purposes: grouping for quantifiers, and capturing the matched text.

import re

# Grouping: (ab)+ repeats the whole "ab"
print(re.findall(r'(ab)+', 'ab abab ababab'))
# → ['ab', 'ab', 'ab']  (one per match; the group keeps only its last repetition)

# Capturing: extract the content inside ()
dates = "2024-01-15, 2023-12-31, 2025-06-01"
print(re.findall(r'(\d{4})-(\d{2})-(\d{2})', dates))
# → [('2024', '01', '15'), ('2023', '12', '31'), ('2025', '06', '01')]
#   ↑ each match returns a tuple of all captured groups

# .group() on a Match object
m = re.search(r'(\d{4})-(\d{2})-(\d{2})', '2024-01-15')
print(m.group(0))   # → 2024-01-15  (entire match)
print(m.group(1))   # → 2024        (group 1)
print(m.group(2))   # → 01          (group 2)
print(m.group(3))   # → 15          (group 3)

2) Named group `(?P<name>...)` (命名捕获组)

import re

# Named groups — access by name instead of index
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
m = re.search(pattern, '2024-01-15')

print(m.group('year'))    # → 2024
print(m.group('month'))   # → 01
print(m.group('day'))     # → 15
print(m.groupdict())      # → {'year': '2024', 'month': '01', 'day': '15'}

# Named groups in re.sub — backreference by name
result = re.sub(
    r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})',
    r'\g<day>/\g<month>/\g<year>',    # reorder: DD/MM/YYYY
    '2024-01-15'
)
print(result)   # → 15/01/2024

3) Non-capturing group `(?:...)` (非捕获组)

When you need grouping for quantifiers but don't want the group in your results:

import re

# With (?:...): the unit alternation is grouped but not captured
print(re.findall(r'(\d+)(?:px|em|rem)', '12px 3em 100rem'))
# → ['12', '3', '100']   ← only numbers, units NOT captured ✅

# With capturing group — units would also appear
print(re.findall(r'(\d+)(px|em|rem)', '12px 3em 100rem'))
# → [('12', 'px'), ('3', 'em'), ('100', 'rem')]  ← units captured too

# (?:...) for grouping quantifiers
print(re.findall(r'(?:ha)+', 'hahaha haha ha h'))
# → ['hahaha', 'haha', 'ha']   (group 'ha' as a unit for +)

4) Backreferences `\1` `\2` (反向引用)

Refer to a previously captured group within the same pattern.

import re

# Find repeated words
text = "the the quick brown fox fox jumps"
print(re.findall(r'\b(\w+)\s+\1\b', text))
# → ['the', 'fox']   (\1 refers back to group 1)

# Find doubled characters
print(re.findall(r'(.)\1', 'aabcddee'))
# → ['a', 'd', 'e']

# HTML tag matching: opening and closing tags must match
html = "<h1>Title</h1> <h2>Subtitle</h2>"
print(re.findall(r'<(\w+)>(.*?)</\1>', html))
# → [('h1', 'Title'), ('h2', 'Subtitle')]
#   \1 ensures the closing tag matches the opening tag
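
Backreferences also work in the replacement string of `re.sub()`. A short sketch that collapses the doubled words found earlier, keeping one copy of each:

```python
import re

text = "the the quick brown fox fox jumps"

# \1 in the pattern finds the repeat; \1 in the replacement keeps one copy
deduped = re.sub(r'\b(\w+)\s+\1\b', r'\1', text)
```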

8. Lookahead and Lookbehind — Zero-width assertions (零宽断言)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <span style="color:#E8600A;font-weight:700">Lookaround (环视断言)</span> matches a position based on what is around it, <span style="color:#2980B9">without consuming characters (不消耗字符)</span>. They are "zero-width" — the match position doesn't advance. </div>

1) Positive lookahead `(?=...)` (正向先行断言)

"Match X only if followed by Y" — Y is NOT included in the match.

import re

# Match a number only if followed by "px"
print(re.findall(r'\d+(?=px)', '12px 3em 100px 5rem'))
# → ['12', '100']   (px NOT included in results)

# Match word only if followed by a colon
text = "name: Alice age: 30 city: NYC"
print(re.findall(r'\w+(?=:)', text))
# → ['name', 'age', 'city']

# Password validation: must contain a digit
def has_digit(pw):
    return bool(re.search(r'(?=.*\d)', pw))
print(has_digit("abc123"))   # → True
print(has_digit("abcdef"))   # → False
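
Because lookaheads consume nothing, several can be stacked at the same position, each checking an independent condition. The rule set below (a digit, a lowercase letter, an uppercase letter, at least 8 characters) and the names `STRONG` and `is_strong` are assumptions for illustration, not a recommended password policy:

```python
import re

STRONG = re.compile(
    r'(?=.*\d)'        # somewhere ahead: a digit
    r'(?=.*[a-z])'     # somewhere ahead: a lowercase letter
    r'(?=.*[A-Z])'     # somewhere ahead: an uppercase letter
    r'.{8,}'           # then actually consume 8 or more characters
)

def is_strong(pw):
    return STRONG.fullmatch(pw) is not None

ok = is_strong("Abcdef12")          # True
no_upper = is_strong("abcdef12")    # False: no uppercase letter
too_short = is_strong("Ab1")        # False: fewer than 8 characters
```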

2) Negative lookahead `(?!...)` (负向先行断言)

"Match X only if NOT followed by Y"

import re

# Match a number only if NOT followed by "px"
# (?!\d|px) also rejects a following digit, so '12px' cannot match as '1'
print(re.findall(r'\b\d+(?!\d|px)', '12px 3em 100px 5rem'))
# → ['3', '5']

# Match 'foo' not followed by 'bar'
print(re.findall(r'foo(?!bar)', 'foobar foobaz foo'))
# → ['foo', 'foo']   ('foobar' excluded, 'foobaz' and 'foo' included)

3) Positive lookbehind `(?<=...)` (正向后行断言)

"Match X only if preceded by Y" — Y is NOT included in the match.

import re

# Match digits only if preceded by '$'
prices = "items: $10, €20, £30, $50"
print(re.findall(r'(?<=\$)\d+', prices))
# → ['10', '50']   ($ NOT included in results)

# Match word after a colon and space
text = "name: Alice, city: NYC, age: 30"
print(re.findall(r'(?<=: )\w+', text))
# → ['Alice', 'NYC', '30']

4) Negative lookbehind `(?<!...)` (负向后行断言)

"Match X only if NOT preceded by Y"

import re

# Match digits NOT preceded by '$'
prices = "items: $10, 20, $50, 100"
print(re.findall(r'(?<!\$)\b\d+\b', prices))
# → ['20', '100']

# Match 'ning' not preceded by 'run'
words = "running winning planning"
print(re.findall(r'(?<!run)ning\b', words))
# → ['ning', 'ning']   (from 'winning' and 'planning'; 'running' is excluded)

5) Lookaround summary table (环视断言总结)

Syntax	Name	Meaning
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">(?=Y)</code>	Positive lookahead (正向先行)	Followed by Y
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">(?!Y)</code>	Negative lookahead (负向先行)	NOT followed by Y
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">(?<=Y)</code>	Positive lookbehind (正向后行)	Preceded by Y
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">(?<!Y)</code>	Negative lookbehind (负向后行)	NOT preceded by Y
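
Lookbehind and lookahead can be combined to match a pure position. A sketch that inserts thousands separators by matching the empty spot that has a digit before it and a whole number of three-digit groups after it (the helper name `add_commas` is illustrative):

```python
import re

def add_commas(digits):
    # (?<=\d)         a digit sits just before this position
    # (?=(?:\d{3})+$) only complete groups of three digits remain after it
    return re.sub(r'(?<=\d)(?=(?:\d{3})+$)', ',', digits)

million = add_commas("1234567")   # '1,234,567'
thousand = add_commas("1000")     # '1,000'
small = add_commas("123")         # '123' (no position qualifies)
```

The pattern matches zero characters, so `re.sub()` inserts the comma without removing anything.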

9. Alternation `|` — OR operator (或运算符)

import re

# | matches either the left or right pattern
print(re.findall(r'cat|dog|fish', 'I have a cat and a dog'))
# → ['cat', 'dog']

# With groups: (cat|dog) scopes the alternation
print(re.findall(r'(cat|dog)s?', 'cats dogs cat dog'))
# → ['cat', 'dog', 'cat', 'dog']

# Alternation of longer patterns
log = "ERROR: disk full  WARNING: low memory  INFO: started"
print(re.findall(r'ERROR|WARNING|INFO', log))
# → ['ERROR', 'WARNING', 'INFO']

# Order matters: first match wins
print(re.search(r'cat|catch', 'I catch cats'))   # matches 'cat' (not 'catch'!)
print(re.search(r'catch|cat', 'I catch cats'))   # matches 'catch' ← correct order

10. Flags — Modifying match behavior (标志位)

1) Common flags (常用标志位)

Flag (short)	Flag (long)	Effect
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.I</code>	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.IGNORECASE</code>	Case-insensitive matching (忽略大小写)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.M</code>	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.MULTILINE</code>	`^`/`$` match each line (多行模式)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.S</code>	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.DOTALL</code>	`.` matches `\n` too (点号匹配换行)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.X</code>	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.VERBOSE</code>	Allow whitespace/comments in pattern (详细模式)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.A</code>	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.ASCII</code>	`\w \d \s` match ASCII only (ASCII模式)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.L</code>	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.LOCALE</code>	Locale-dependent matching, bytes patterns only (本地化模式)

2) `re.IGNORECASE` (re.I)

import re

print(re.findall(r'hello', 'Hello HELLO hello', re.I))
# → ['Hello', 'HELLO', 'hello']

# Case-insensitive word boundary
print(re.findall(r'\bpython\b', 'Python PYTHON python', re.IGNORECASE))
# → ['Python', 'PYTHON', 'python']

3) `re.MULTILINE` (re.M)

import re

log = """ERROR: disk full
WARNING: low memory
ERROR: timeout
INFO: done"""

# Without re.M: ^ only matches start of entire string
print(re.findall(r'^ERROR.*',  log))
# → ['ERROR: disk full']

# With re.M: ^ matches start of EACH line
print(re.findall(r'^ERROR.*',  log, re.M))
# → ['ERROR: disk full', 'ERROR: timeout']

4) `re.DOTALL` (re.S)

import re

html = "<div>\n  <p>Hello</p>\n</div>"

# Without re.S: . does not match \n
print(re.search(r'<div>.*</div>',  html))          # None

# With re.S: . matches everything including \n
print(re.search(r'<div>.*</div>',  html, re.S))    # match
print(re.search(r'<div>.*?</div>', html, re.S).group())
# → <div>\n  <p>Hello</p>\n</div>   (print() renders each \n as a real line break)

5) `re.VERBOSE` (re.X) — Readable complex patterns (可读的复杂模式)

import re

# Without re.X — hard to read
email_pattern_compact = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

# With re.X — add whitespace and comments freely
email_pattern_verbose = re.compile(r'''
    ^                       # start of string
    [a-zA-Z0-9._%+-]+       # local part (user name)
    @                       # @ symbol
    [a-zA-Z0-9.-]+          # domain name
    \.                      # literal dot
    [a-zA-Z]{2,}            # top-level domain (2+ letters)
    $                       # end of string
''', re.VERBOSE)

print(email_pattern_verbose.match('user@example.com'))   # match
print(email_pattern_verbose.match('bad@'))               # None

6) Combining flags (组合标志位)

import re

# Combine with | (bitwise OR)
text = "Hello\nWorld"
print(re.findall(r'^.+$', text, re.M | re.I))
# re.M → ^ and $ per line
# re.I → case-insensitive
# → ['Hello', 'World']

# Global inline flag (?i): placed at the start, applies to the whole pattern
print(re.findall(r'(?i)hello', 'Hello HELLO hello'))
# → ['Hello', 'HELLO', 'hello']

# Inline flags for part of pattern
print(re.findall(r'(?i:hello) world', 'HELLO world hello World'))
# → ['HELLO world']   (only 'hello' is case-insensitive, 'world' is not)

III. The `re` Module API — Complete Reference

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> The <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re</code> module has two usage modes: <span style="color:#E8600A;font-weight:700">① module-level functions</span> like <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.search()</code> (convenient for one-off use) and <span style="color:#E8600A;font-weight:700">② compiled Pattern objects</span> via <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.compile()</code> (preferred when the same pattern is used repeatedly — avoids recompilation overhead). </div>

1. `re.compile()` — Pre-compile a pattern (预编译模式)

import re

# compile() returns a Pattern object
pattern = re.compile(r'\d{4}-\d{2}-\d{2}', re.IGNORECASE)

# Call methods on the Pattern object (same names as module-level functions)
print(pattern.search('date: 2024-01-15'))
print(pattern.findall('from 2024-01-01 to 2024-12-31'))
# → ['2024-01-01', '2024-12-31']

# Pattern attributes
print(pattern.pattern)    # → \d{4}-\d{2}-\d{2}
print(pattern.flags)      # → 34  (32 = re.UNICODE, implied for str patterns, + 2 = re.IGNORECASE)
print(pattern.groups)     # → 0   (no capturing groups)

2. `re.search()` — Find first match anywhere (查找第一个匹配)

Returns a <span style="color:#E8600A;font-weight:700">Match object</span> if found anywhere in the string, or None.

import re

text = "The price is $42.99 for 3 items"

m = re.search(r'\$(\d+\.\d{2})', text)
if m:
    print(m.group())    # → $42.99   (full match)
    print(m.group(1))   # → 42.99    (group 1 — no $)
    print(m.start())    # → 13       (start index)
    print(m.end())      # → 19       (end index)
    print(m.span())     # → (13, 19) (start, end)
    print(m.string)     # → "The price is $42.99 for 3 items"  (original)

3. `re.match()` — Match at string start (从字符串开头匹配)

<span style="color:#C0392B;font-weight:600">Warning: re.match() only matches at the BEGINNING of the string — NOT the same as re.search()!</span>

import re

# match() — only succeeds if pattern starts at position 0
print(re.match(r'\d+', '123 abc'))    # match   — starts at position 0
print(re.match(r'\d+', 'abc 123'))    # None    — 'abc' is not \d+
print(re.search(r'\d+', 'abc 123'))   # match   — search finds it anywhere

# match() with ^ is redundant (both restrict to start)
print(re.match(r'hello', 'hello world'))    # match
print(re.match(r'hello', 'say hello'))      # None

# Practical: validate that a string is ENTIRELY a number
def is_integer(s):
    return bool(re.match(r'^\d+$', s))

print(is_integer("12345"))    # → True
print(is_integer("123a5"))    # → False

4. `re.fullmatch()` — Match entire string (匹配整个字符串)

Requires the pattern to match the complete string from start to end.

import re

# fullmatch(p, s) behaves like match() with the pattern wrapped as \A(?:p)\Z
print(re.fullmatch(r'\d+', '12345'))     # match   — entire string is digits
print(re.fullmatch(r'\d+', '123abc'))    # None    — not ALL digits
print(re.fullmatch(r'\d+', '  123  '))   # None    — spaces don't match \d

# Validate formats completely
ip_pattern  = re.compile(r'(\d{1,3}\.){3}\d{1,3}')
zip_pattern = re.compile(r'\d{5}(-\d{4})?')
email_pat   = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

tests = ['192.168.1.1', '12345', 'user@example.com', 'bad_input']
for t in tests:
    results = {
        'ip':    bool(ip_pattern.fullmatch(t)),
        'zip':   bool(zip_pattern.fullmatch(t)),
        'email': bool(email_pat.fullmatch(t)),
    }
    print(f"{t:<25} → {results}")
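
One difference between anchoring with `$` and using `fullmatch()` shows up with a trailing newline. The following is a small illustrative sketch (the sample string is an assumption chosen to trigger the case):

```python
import re

s = "12345\n"   # digits followed by a trailing newline

anchored = re.match(r'^\d+$', s)      # $ also matches just before a final newline
strict   = re.match(r'\A\d+\Z', s)    # \Z matches only at the very end
full     = re.fullmatch(r'\d+', s)    # the whole string must match

print(bool(anchored))   # → True
print(bool(strict))     # → False
print(bool(full))       # → False
```

For input validation, `fullmatch()` (or `\Z`) is usually the safer choice.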

5. `re.findall()` — Find all matches (查找所有匹配)

Returns a list of all non-overlapping matches.

import re

text = "2024-01-15, 2023-12-31, 2025-06-01"

# No groups → returns list of strings
print(re.findall(r'\d{4}-\d{2}-\d{2}', text))
# → ['2024-01-15', '2023-12-31', '2025-06-01']

# One group → returns list of group contents
print(re.findall(r'(\d{4})-\d{2}-\d{2}', text))
# → ['2024', '2023', '2025']   (only the year group)

# Multiple groups → returns list of tuples
print(re.findall(r'(\d{4})-(\d{2})-(\d{2})', text))
# → [('2024', '01', '15'), ('2023', '12', '31'), ('2025', '06', '01')]

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">The return type of <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">findall()</code> changes based on groups: no groups → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">List[str]</code>, one group → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">List[str]</code>, multiple groups → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">List[tuple]</code>. This is a common source of bugs.</span> Use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">finditer()</code> for consistent Match objects.</div>
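
To illustrate the note, here is a sketch of a helper (the name `full_matches` is hypothetical, not part of `re`) that returns the full matched text no matter how many groups the pattern contains:

```python
import re

text = "2024-01-15, 2023-12-31"

def full_matches(pattern, s):
    """Return the full text of every match, however many groups the pattern has."""
    return [m.group(0) for m in re.finditer(pattern, s)]

print(full_matches(r'\d{4}-\d{2}-\d{2}', text))        # → ['2024-01-15', '2023-12-31']
print(full_matches(r'(\d{4})-\d{2}-\d{2}', text))      # → ['2024-01-15', '2023-12-31']
print(full_matches(r'(\d{4})-(\d{2})-(\d{2})', text))  # → ['2024-01-15', '2023-12-31']
```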

6. `re.finditer()` — Iterator of Match objects (匹配对象迭代器)

Returns an iterator of Match objects. More powerful than findall() because each Match has .start(), .end(), .group(), etc.

import re

text = "Alice scored 95, Bob scored 87, Carol scored 100"

for m in re.finditer(r'(\w+) scored (\d+)', text):
    name  = m.group(1)
    score = int(m.group(2))
    print(f"{name}: {score} pts  | span={m.span()}")
# → Alice: 95 pts  | span=(0, 15)
# → Bob: 87 pts  | span=(17, 30)
# → Carol: 100 pts  | span=(32, 48)

# Collect all spans for highlighting
positions = [(m.start(), m.end()) for m in re.finditer(r'\d+', text)]
print(positions)   # → [(13, 15), (28, 30), (45, 48)]

7. `re.sub()` — Substitute matches (替换匹配)

1) Basic substitution (基本替换)

import re

text = "Hello   World   Python"

# Replace multiple spaces with single space
result = re.sub(r'\s+', ' ', text)
print(result)   # → Hello World Python

# count parameter: replace only first N occurrences
result = re.sub(r'\s+', ' ', text, count=1)
print(result)   # → Hello World   Python  (only first replaced)

2) Backreferences in replacement (替换中的反向引用)

import re

# \1, \2 refer to captured groups in the replacement string
# Reformat date from YYYY-MM-DD to DD/MM/YYYY
dates = "Born: 2024-01-15, Died: 2099-12-31"
result = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', dates)
print(result)   # → Born: 15/01/2024, Died: 31/12/2099

# Wrap all numbers in <b> tags
result = re.sub(r'(\d+)', r'<b>\1</b>', 'I have 3 cats and 2 dogs')
print(result)   # → I have <b>3</b> cats and <b>2</b> dogs

# Named group backreference \g<name>
result = re.sub(
    r'(?P<last>\w+), (?P<first>\w+)',
    r'\g<first> \g<last>',
    'Smith, John'
)
print(result)   # → John Smith

3) Replacement function (替换函数)

Pass a callable as the replacement — it receives the Match object and returns the replacement string.

import re

# Convert all numbers to their double
def double(m):
    return str(int(m.group()) * 2)

result = re.sub(r'\d+', double, 'I have 3 cats and 10 dogs')
print(result)   # → I have 6 cats and 20 dogs

# Normalize different date formats to ISO 8601
def normalize_date(m):
    month_map = {'Jan':1,'Feb':2,'Mar':3,'Apr':4,'May':5,'Jun':6,
                 'Jul':7,'Aug':8,'Sep':9,'Oct':10,'Nov':11,'Dec':12}
    if m.group('month_name'):          # "Jan 15, 2024" branch matched
        month = month_map[m.group('month_name')]
        day   = int(m.group('day'))
        year  = int(m.group('year'))
    else:                              # "3/1/2024" branch matched (MM/DD/YYYY)
        month = int(m.group('month_num'))
        day   = int(m.group('day2'))
        year  = int(m.group('year2'))
    return f"{year:04d}-{month:02d}-{day:02d}"

# Group names must be unique, so the numeric branch uses day2 / year2
pattern = re.compile(r'''
    (?:(?P<month_name>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
       \s+(?P<day>\d{1,2}),\s+(?P<year>\d{4}))
    |
    (?:(?P<month_num>\d{1,2})/(?P<day2>\d{1,2})/(?P<year2>\d{4}))
''', re.VERBOSE)

text = "Meeting on Jan 15, 2024, review on 3/1/2024"
result = pattern.sub(normalize_date, text)
print(result)   # → Meeting on 2024-01-15, review on 2024-03-01

8. `re.subn()` — Substitute and count (替换并计数)

Like re.sub() but returns a tuple (new_string, count).

import re

text = "foo bar foo baz foo"
result, n = re.subn(r'foo', 'qux', text)
print(result)   # → qux bar qux baz qux
print(n)        # → 3  (number of substitutions made)

# Useful for detecting if any replacements occurred
text2 = "no matches here"
_, count = re.subn(r'foo', 'qux', text2)
if count == 0:
    print("No substitutions made")

9. `re.split()` — Split by pattern (按模式分割)

import re

# Split on any non-alphanumeric sequence
text = "one,two;;three   four\tfive"
print(re.split(r'[^a-zA-Z0-9]+', text))
# → ['one', 'two', 'three', 'four', 'five']

# Split on commas with optional surrounding whitespace
csv = "Alice , Bob,Carol ,  Dave"
print(re.split(r'\s*,\s*', csv))
# → ['Alice', 'Bob', 'Carol', 'Dave']

# maxsplit: only split N times
print(re.split(r'\s+', 'a b c d e', maxsplit=2))
# → ['a', 'b', 'c d e']

# Capturing group: delimiters are INCLUDED in the result
text = "one+two-three*four"
print(re.split(r'([+\-*])', text))
# → ['one', '+', 'two', '-', 'three', '*', 'four']  ← operators kept

10. `re.escape()` — Escape special characters (转义特殊字符)

Escapes the characters that have special meaning in a regular expression (since Python 3.7, other characters are left alone), so an arbitrary string can be used as a literal pattern.

import re

# When user input is used as part of a pattern — MUST escape it
user_input = "hello.world (test)"
safe_pattern = re.escape(user_input)
print(safe_pattern)   # → hello\.world\ \(test\)

# Safe search
text = "I said: hello.world (test) today"
m = re.search(re.escape(user_input), text)
print(bool(m))   # → True

# Dangerous without escape:
print(re.search(user_input, text))   # → None: "." matches any char and "(test)" becomes a group

# Common use: build a pattern from a list of keywords
keywords = ['c++', 'c#', '.net', 'node.js']
pattern  = '|'.join(re.escape(k) for k in keywords)
print(pattern)   # → c\+\+|c\#|\.net|node\.js

found = re.findall(pattern, 'I know c++ and .net and node.js', re.I)
print(found)   # → ['c++', '.net', 'node.js']

11. Match Object — Complete API (匹配对象完整API)

import re

text = "2024-01-15 is a Monday in New York"
m    = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', text)

# ── Accessing matched text ─────────────────────────────────
print(m.group())           # → 2024-01-15  (full match, same as group(0))
print(m.group(0))          # → 2024-01-15
print(m.group(1))          # → 2024         (group 1 by index)
print(m.group(2, 3))       # → ('01', '15') (multiple groups)
print(m.group('year'))     # → 2024         (group by name)
print(m.groupdict())       # → {'year': '2024', 'month': '01', 'day': '15'}
print(m.groups())          # → ('2024', '01', '15')  (all groups as tuple)
print(m.groups(default='N/A'))  # groups() with default for non-participating groups

# ── Position information ───────────────────────────────────
print(m.start())           # → 0    (start of full match)
print(m.end())             # → 10   (end of full match)
print(m.span())            # → (0, 10)
print(m.start(1))          # → 0    (start of group 1)
print(m.end('month'))      # → 7    (end of named group)
print(m.span('day'))       # → (8, 10)

# ── Context ────────────────────────────────────────────────
print(m.string)            # → full original string
print(m.re)                # → compiled pattern object
print(m.pos)               # → 0    (start position passed to search)
print(m.endpos)            # → 34   (end position passed to search)
print(m.lastindex)         # → 3    (index of last matched group)
print(m.lastgroup)         # → 'day' (name of last matched group)

# ── Expand — backreferences in a template string ───────────
print(m.expand(r'\g<day>/\g<month>/\g<year>'))
# → 15/01/2024
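
The `default` argument only matters when a group did not participate in the match. A small sketch with an optional group (the host/port pattern is an illustrative assumption, not taken from the notes above):

```python
import re

# The port group is optional, so it may not participate in a match
m = re.search(r'(?P<host>[a-z.]+)(?::(?P<port>\d+))?', 'example.com')

print(m.group('port'))             # → None
print(m.groups())                  # → ('example.com', None)
print(m.groups(default='N/A'))     # → ('example.com', 'N/A')
print(m.groupdict(default='80'))   # → {'host': 'example.com', 'port': '80'}
print(m.lastgroup)                 # → host
```

Note that `lastgroup` reports the last group that actually matched, which here is `host` rather than `port`.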

IV. Practical Patterns — Production-Ready Recipes (生产级常用模式)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> This section provides <span style="color:#E8600A;font-weight:700">ready-to-use patterns</span> for the most common real-world tasks. Each pattern is annotated, and a short test loop exercises several of them. </div>

1. Validation Patterns (验证模式)

import re

patterns = {

    # Email (simplified heuristic, not a full RFC 5321/5322 validator)
    'email': re.compile(r'''
        ^[a-zA-Z0-9._%+\-]+     # local part
        @
        [a-zA-Z0-9.\-]+          # domain
        \.[a-zA-Z]{2,}$          # TLD (2+ chars)
    ''', re.VERBOSE),

    # Phone: +1 (555) 123-4567 / 555-123-4567 / 5551234567
    'phone_us': re.compile(
        r'^(\+1[-.\s]?)?'
        r'(\(?\d{3}\)?[-.\s]?)'
        r'\d{3}[-.\s]?\d{4}$'
    ),

    # IPv4 address
    'ipv4': re.compile(
        r'^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}'
        r'(25[0-5]|2[0-4]\d|[01]?\d\d?)$'
    ),

    # URL (http/https)
    'url': re.compile(
        r'^https?://'
        r'(([a-zA-Z0-9\-]+\.)+[a-zA-Z]{2,})'
        r'(:\d+)?'
        r'(/[^\s]*)?$'
    ),

    # Date: YYYY-MM-DD
    'date_iso': re.compile(
        r'^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$'
    ),

    # Strong password: 8+ chars, upper, lower, digit, special
    'strong_password': re.compile(
        r'^(?=.*[a-z])'          # at least one lowercase
        r'(?=.*[A-Z])'           # at least one uppercase
        r'(?=.*\d)'              # at least one digit
        r'(?=.*[!@#$%^&*])'     # at least one special char
        r'.{8,}$'                # at least 8 chars total
    ),

    # Credit card (Visa/MC/Amex; digits only, no spaces or dashes)
    'credit_card': re.compile(
        r'^(?:4\d{12}(?:\d{3})?'     # Visa
        r'|5[1-5]\d{14}'             # MasterCard
        r'|3[47]\d{13})$'            # Amex
    ),

    # ZIP code (US)
    'zip_us': re.compile(r'^\d{5}(-\d{4})?$'),

    # Hex color
    'hex_color': re.compile(r'^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$'),

    # Semantic version: 1.2.3 or 1.2.3-alpha.1
    'semver': re.compile(
        r'^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)'
        r'(-[a-zA-Z0-9.\-]+)?(\+[a-zA-Z0-9.\-]+)?$'
    ),
}

# Test them
tests = {
    'email':           ['user@example.com', 'bad@', 'no-at-sign'],
    'ipv4':            ['192.168.1.1', '256.0.0.1', '10.0.0'],
    'date_iso':        ['2024-01-15', '2024-13-01', '24-1-1'],
    'strong_password': ['Abc@1234', 'weakpass', 'NoSpecial1'],
    'hex_color':       ['#FF5733', '#abc', '#GGGGGG'],
    'semver':          ['1.2.3', '1.0.0-alpha.1', '1.2'],
}

for field, values in tests.items():
    pat = patterns[field]
    print(f"\n{field}:")
    for v in values:
        ok = '✅' if pat.fullmatch(v) else '❌'
        print(f"  {ok} {v!r}")

2. Extraction Patterns (提取模式)

import re

# ── Extract all URLs from text ──────────────────────────────
def extract_urls(text):
    pattern = r'https?://[^\s<>"{}|\\^`\[\]]+'
    return re.findall(pattern, text)

html = 'Visit <a href="https://example.com/path?q=1">site</a> or http://other.org'
print(extract_urls(html))
# → ['https://example.com/path?q=1', 'http://other.org']


# ── Extract all emails ──────────────────────────────────────
def extract_emails(text):
    pattern = r'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}'
    return re.findall(pattern, text)

text = "Contact alice@example.com or bob.smith@company.co.uk for info"
print(extract_emails(text))
# → ['alice@example.com', 'bob.smith@company.co.uk']


# ── Parse log lines ─────────────────────────────────────────
def parse_log(line):
    pattern = re.compile(r'''
        (?P<ip>[\d.]+)          \s+   # IP address
        \S+                     \s+   # ident
        \S+                     \s+   # auth user
        \[(?P<time>[^\]]+)\]    \s+   # timestamp
        "(?P<method>\w+)        \s+
         (?P<path>[^\s"]+)      \s+
         \S+"                   \s+   # HTTP version
        (?P<status>\d{3})       \s+   # status code
        (?P<size>\d+)                 # bytes
    ''', re.VERBOSE)
    m = pattern.match(line)
    return m.groupdict() if m else None

log_line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_log(log_line))
# → {'ip': '127.0.0.1', 'time': '10/Oct/2000:13:55:36 -0700',
#    'method': 'GET', 'path': '/apache_pb.gif', 'status': '200', 'size': '2326'}


# ── Extract numbers with units ──────────────────────────────
def extract_measurements(text):
    pattern = r'(\d+(?:\.\d+)?)\s*(px|em|rem|%|pt|vh|vw)'
    return [(float(v), u) for v, u in re.findall(pattern, text)]

css = "width: 100px; margin: 1.5em; font-size: 16px; height: 50vh"
print(extract_measurements(css))
# → [(100.0, 'px'), (1.5, 'em'), (16.0, 'px'), (50.0, 'vh')]

3. Cleaning and Normalization Patterns (清理与标准化模式)

import re

# ── Normalize whitespace ────────────────────────────────────
def normalize_whitespace(text):
    return re.sub(r'\s+', ' ', text).strip()

print(normalize_whitespace("  Hello   World  \n\t  Python  "))
# → Hello World Python


# ── Remove HTML tags ────────────────────────────────────────
def strip_html(html):
    clean = re.sub(r'<[^>]+>', '', html)
    return re.sub(r'\s+', ' ', clean).strip()

html = "<h1>Title</h1><p>Some <b>bold</b> and <em>italic</em> text.</p>"
print(strip_html(html))
# → TitleSome bold and italic text.  (tags are deleted, so adjacent blocks run together)


# ── Slugify a string ────────────────────────────────────────
def slugify(text):
    text = text.lower()
    text = re.sub(r'[^\w\s-]', '',  text)   # remove non-word chars
    text = re.sub(r'[\s_]+',   '-', text)   # spaces/underscores → dash
    text = re.sub(r'-+',       '-', text)   # multiple dashes → one
    return text.strip('-')

print(slugify("Hello, World! This is Python 3.12"))
# → hello-world-this-is-python-312


# ── Camel case to snake case ────────────────────────────────
def camel_to_snake(name):
    name = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', name)  # ABCDef → ABC_Def
    name = re.sub(r'([a-z\d])([A-Z])',      r'\1_\2', name)  # fooBar → foo_Bar
    return name.lower()

print(camel_to_snake('camelCaseString'))    # → camel_case_string
print(camel_to_snake('parseHTMLContent'))   # → parse_html_content
print(camel_to_snake('MyClassName'))        # → my_class_name


# ── Mask sensitive data ─────────────────────────────────────
def mask_credit_card(text):
    return re.sub(r'\b(\d{4})\d{8}(\d{4})\b', r'\1 **** **** \2', text)

def mask_email(text):
    return re.sub(r'(\w{2})\w+(@[^\s]+)', r'\1***\2', text)

print(mask_credit_card("Card: 4111111111111111"))
# → Card: 4111 **** **** 1111
print(mask_email("Email alice@example.com to bob@test.org"))
# → Email al***@example.com to bo***@test.org

4. Common Pitfalls (常见陷阱)

1) Catastrophic backtracking (灾难性回溯)

import re, time

# ⚠️ DANGEROUS pattern: (a+)+ causes exponential backtracking
evil_pattern   = r'^(a+)+$'
safe_pattern   = r'^a+$'

test_string = 'a' * 25 + 'X'  # no match — forces max backtracking

# Safe pattern — fast
t = time.time()
re.search(safe_pattern, test_string)
print(f"Safe:  {time.time()-t:.6f}s")   # → ~0.000001s

# Evil pattern — hangs for long inputs!
# (DO NOT run with 'a' * 30 + 'X')
t = time.time()
re.search(evil_pattern, 'a' * 20 + 'X')
print(f"Evil:  {time.time()-t:.6f}s")   # → much longer

# FIX: restructure the pattern (e.g. ^a+$), or on Python 3.11+ use an atomic
# group (?>...) or a possessive quantifier such as a++. Older versions can use
# the third-party regex module.
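
As a hedged sketch of the possessive-quantifier route: the `a++` syntax is only parsed by Python 3.11 and later, so the example guards on the interpreter version and falls back to the restructured pattern otherwise. The name `guarded` is illustrative.

```python
import re
import sys

# Possessive quantifiers (a++) are only understood by Python 3.11+;
# older interpreters fall back to the restructured pattern.
if sys.version_info >= (3, 11):
    guarded = re.compile(r'^(a++)+$')
else:
    guarded = re.compile(r'^a+$')

miss = guarded.search('a' * 30 + 'X')   # fails fast, no exponential backtracking
hit  = guarded.search('a' * 30)

print(miss)        # → None
print(bool(hit))   # → True
```

Because `a++` never gives back characters it has consumed, the engine cannot explore the exponential number of ways to split the run of `a`s.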

2) `re.match()` vs `re.search()` confusion

import re

# COMMON MISTAKE: using match() when search() is needed
data = "  123 some text"

# Incorrect — thinking match() searches anywhere
result = re.match(r'\d+', data)   # → None!  (leading spaces)

# Correct
result = re.search(r'\d+', data)  # → Match object; result.group() == '123'

# Or keep match() and allow the leading whitespace explicitly
result = re.match(r'\s*(\d+)', data)  # → group(1) = '123'

3) `findall()` group return type surprise

import re

text = "2024-01 2024-02"

# Bug: adding a group changes return type
print(re.findall(r'\d{4}-\d{2}',       text))  # → ['2024-01', '2024-02']
print(re.findall(r'(\d{4})-\d{2}',     text))  # → ['2024', '2024']  (years only!)
print(re.findall(r'(\d{4})-(\d{2})',   text))  # → [('2024','01'), ('2024','02')]

# Fix: use non-capturing group when you don't need the group value
print(re.findall(r'(?:\d{4})-(?:\d{2})', text))  # → ['2024-01', '2024-02']

4) Forgetting raw strings

import re

# WRONG: \b interpreted by Python as backspace character (ASCII 8)
print(re.findall('\bword\b', 'word in a sentence'))   # → []  WRONG

# CORRECT: raw string
print(re.findall(r'\bword\b', 'word in a sentence'))  # → ['word']

# RISKY: '\d' is not a valid Python string escape; the backslash is kept, but
# Python emits a DeprecationWarning (SyntaxWarning since 3.12)
print(re.findall('\d+', 'abc 123'))   # works today, but is fragile
# CORRECT:
print(re.findall(r'\d+', 'abc 123'))  # → ['123']
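
The difference is visible without `re` at all. This short sketch simply inspects the two string literals (variable names are illustrative):

```python
plain = '\bword\b'    # \b becomes the backspace character (0x08)
raw   = r'\bword\b'   # backslash and 'b' are kept as two characters

print(len(plain), len(raw))    # → 6 8
print(plain[0] == '\x08')      # → True
print(raw[:2])                 # → \b
```

The regex engine only sees a word-boundary assertion in the second case, because only there does the backslash survive Python's own string parsing.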

V. Complete API Quick Reference (完整API速查表)

Function / Method	Returns	Use when
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.compile(pat, flags)</code>	Pattern	Pattern reused multiple times
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.search(pat, s)</code>	Match or None	Find first match anywhere
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.match(pat, s)</code>	Match or None	Match only at position 0
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.fullmatch(pat, s)</code>	Match or None	Pattern must cover entire string
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.findall(pat, s)</code>	List[str or tuple]	All matches as a list
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.finditer(pat, s)</code>	Iterator[Match]	All matches with position info
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.sub(pat, repl, s)</code>	str	Replace matches
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.subn(pat, repl, s)</code>	(str, int)	Replace + count substitutions
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.split(pat, s)</code>	List[str]	Split string by pattern
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.escape(s)</code>	str	Treat literal string as pattern
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">m.group(n)</code>	str	Get captured group text
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">m.groups()</code>	tuple	All groups as tuple
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">m.groupdict()</code>	dict	Named groups as dict
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">m.start() / m.end()</code>	int	Match position
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">m.span()</code>	(int, int)	(start, end) tuple
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">m.expand(template)</code>	str	Backreference expansion

<div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br> Master regex in four steps: <span style="color:#E8600A;font-weight:700">① know the 5 building blocks</span> (literals, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.</code>, classes <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">[]</code>, quantifiers <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">*+?{}</code>, anchors <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">^$\b</code>) → <span style="color:#E8600A;font-weight:700">② use groups <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">()</code> to capture, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">(?:)</code> to group without capturing</span> → <span style="color:#E8600A;font-weight:700">③ add lookaround <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">(?=)(?!)</code> for context-sensitive matching</span> → <span style="color:#E8600A;font-weight:700">④ always use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">r''</code> raw strings, prefer non-greedy <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.*?</code>, and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">re.VERBOSE</code> for complex patterns</span>. </div>

Python text.split()

Mon, 09 Mar 2026 00:00:00 GMT

I. `text.split()` in Python

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">split()</code> is a Python <span style="color:#2980B9">string method (字符串方法)</span> that divides a string into a list of substrings. It's one of the most commonly used tools for text processing. </div>

1. Basic Usage

text = "Python is awesome"
result = text.split()
print(result)  # Output: ['Python', 'is', 'awesome']

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> By default, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">split()</code> uses <span style="color:#2980B9">whitespace characters (空白字符)</span> as delimiters: spaces, newlines <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">\n</code>, tabs <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">\t</code>, etc. </div>

2. Parameter Details

1) `split()` vs `split(' ')` Difference

text = "Python   is   awesome"  # Multiple spaces

# Default split() - handles any amount of whitespace
print(text.split())   # Output: ['Python', 'is', 'awesome']

# split(' ') - strictly splits on single space
print(text.split(' '))  # Output: ['Python', '', '', 'is', '', '', 'awesome']

2) Specifying Separator `sep`

data = "apple,banana,orange"
print(data.split(','))  # Output: ['apple', 'banana', 'orange']

path = "user/local/bin"  # forward-slash separated path
print(path.split('/'))  # Output: ['user', 'local', 'bin']

sentence = "Hello-World-Python"
print(sentence.split('-'))  # Output: ['Hello', 'World', 'Python']

3) Limiting Splits with `maxsplit`

The <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">*message</code> syntax in the log-parsing example is Extended Unpacking (扩展解包).

date, time, level: These take the first 3 elements of the list respectively.
*message: This collects all remaining elements into a List (列表). Because maxsplit=3 leaves the rest of the line as one element, message is ['Connection failed'] (a list holding a single string).

text = "one two three four five"

# Split only first 2 times
print(text.split(maxsplit=2))  # Output: ['one', 'two', 'three four five']

# Same result here, because the words are separated by single spaces
print(text.split(' ', 2))  # Output: ['one', 'two', 'three four five']

# Practical example: parsing simple logs
log = "2024-01-15 10:30:45 ERROR Connection failed"
date, time, level, *message = log.split(maxsplit=3)
print(f"Date: {date}, Time: {time}, Level: {level}, Message: {message}")
# Output: Date: 2024-01-15, Time: 10:30:45, Level: ERROR, Message: ['Connection failed']
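
How `maxsplit` and starred unpacking interact is easy to get wrong, so here is a small sketch contrasting the two ways of capturing the message (it reuses the log line from the example; `rest` is an illustrative name):

```python
log = "2024-01-15 10:30:45 ERROR Connection failed"

# maxsplit=3 → at most 4 fields; the last field keeps its internal spaces
date, time, level, message = log.split(maxsplit=3)
print(message)            # → Connection failed

# Without maxsplit, *rest gathers every remaining word as a list
date, time, level, *rest = log.split()
print(rest)               # → ['Connection', 'failed']
print(" ".join(rest))     # → Connection failed
```

The four-name form raises ValueError when a line has no message part, whereas the starred form simply yields an empty list.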

3. Common Use Cases

1) Word Frequency Counting

from collections import Counter

text = "Python is awesome. Python is powerful!"
words = text.lower().split()  # Convert to lowercase then split
# Note: Punctuation remains! Output: ['python', 'is', 'awesome.', 'python', 'is', 'powerful!']

# Better approach: clean punctuation
import re
words = re.findall(r'\w+', text.lower())
print(Counter(words))  # Output: Counter({'python': 2, 'is': 2, 'awesome': 1, 'powerful': 1})

Part	Meaning	Explanation
`re`	regex module	Python's <span style="color:#2980B9">regular expression library (正则表达式库)</span>
`.findall()`	find all matches	Returns <span style="color:#E8600A;font-weight:700">all non-overlapping matches</span> as a list
`r''`	raw string	<span style="color:#2980B9">Raw string (原始字符串)</span> - backslashes are treated literally
`\w`	word character	Matches <span style="color:#2980B9">letters, digits, and underscore (字母、数字、下划线)</span>
`+`	one or more	<span style="color:#2980B9">Quantifier (量词)</span> - match 1 or more occurrences
`text.lower()`	lowercase	Converts everything to <span style="color:#2980B9">lowercase (小写)</span> for case-insensitive counting

2) Parsing CSV (Simple Cases)

csv_line = "John,25,Engineer,New York"
name, age, job, city = csv_line.split(',')
print(f"{name} is {age} years old, works as {job} in {city}")
# Output: John is 25 years old, works as Engineer in New York

3) Handling User Input

# Parsing commands
command = "save document.txt"
action, filename = command.split(maxsplit=1)
print(f"Action: {action}, File: {filename}")  # Output: Action: save, File: document.txt

# Processing multiple inputs
user_input = "5 10 15"
numbers = [int(x) for x in user_input.split()]
print(sum(numbers))  # Output: 30

4. Important Notes

Empty string:

text = ""
print(text.split())  # Output: [] (empty list)
print(text.split(','))  # Output: [''] (list with one element)

Separator not found:

text = "hello world"
print(text.split(','))  # Output: ['hello world']

Consecutive separators:

text = "a,,b,c"
print(text.split(','))  # Output: ['a', '', 'b', 'c']

Return value is always a list:

text = "python"
result = text.split()
print(type(result))  # Output: <class 'list'>
print(result)  # Output: ['python']


5. Method Comparison

Method	Purpose	Example	Result
`split()`	Split by whitespace	`"a b c".split()`	`['a', 'b', 'c']`
`split(' ')`	Split by single space	`"a  b".split(' ')`	`['a', '', 'b']`
`rsplit()`	Split from right	`"a-b-c".rsplit('-',1)`	`['a-b', 'c']`
`splitlines()`	Split by line breaks	`"a\nb".splitlines()`	`['a', 'b']`
`partition()`	Split into 3 parts	`"a-b-c".partition('-')`	`('a', '-', 'b-c')`
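
The rows above can be checked directly. This sketch collects each call in a dictionary (the name `samples` is illustrative) and adds `split('-', 1)` and `rpartition()` for contrast:

```python
samples = {
    "split()":         "a b c".split(),            # ['a', 'b', 'c']
    "split(' ')":      "a  b".split(' '),          # ['a', '', 'b']  (two spaces in the input)
    "rsplit('-', 1)":  "a-b-c".rsplit('-', 1),     # ['a-b', 'c']
    "split('-', 1)":   "a-b-c".split('-', 1),      # ['a', 'b-c']
    "splitlines()":    "a\nb\r\nc".splitlines(),   # ['a', 'b', 'c']
    "partition('-')":  "a-b-c".partition('-'),     # ('a', '-', 'b-c')
    "rpartition('-')": "a-b-c".rpartition('-'),    # ('a-b', '-', 'c')
}

for call, value in samples.items():
    print(f"{call:<16} {value!r}")
```

Unlike `split()`, `partition()` always returns a 3-tuple, which makes it convenient when the separator may be missing.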

6. Practical Example: Parsing Configuration Files

config = """
host=localhost
port=8080
debug=true
"""

settings = {}
for line in config.strip().split('\n'):
    if '=' in line:
        key, value = line.split('=', 1)
        settings[key] = value

print(settings)
# Output: {'host': 'localhost', 'port': '8080', 'debug': 'true'}
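
Real configuration files often contain comments, blank lines, and spaces around the `=`. The following is a sketch under those assumptions; the function name `parse_config` and the `#` comment convention are hypothetical choices, not something the format above prescribes:

```python
def parse_config(text):
    """Parse KEY=VALUE lines; skip blank lines and '#' comments (assumed convention)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith('#'):
            continue
        key, sep, value = line.partition('=')   # split at the first '=' only
        if not sep:                             # no '=' on this line, ignore it
            continue
        settings[key.strip()] = value.strip()
    return settings

sample = """
# server settings
host = localhost
port=8080
url=http://example.com/?a=1
"""
print(parse_config(sample))
# → {'host': 'localhost', 'port': '8080', 'url': 'http://example.com/?a=1'}
```

Using `partition('=')` keeps any later `=` characters inside the value, just as `split('=', 1)` does in the example above.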

7. Advanced Techniques

1) Using `split()` with List Comprehension

# Extract numbers from mixed string
data = "age:25,score:95,weight:70"
values = [item.split(':')[1] for item in data.split(',')]
print(values)  # Output: ['25', '95', '70']

# Convert to appropriate types
numeric_values = [int(item.split(':')[1]) for item in data.split(',')]
print(numeric_values)  # Output: [25, 95, 70]

2) Handling Multiple Delimiters

import re

text = "apple;banana,orange|grape"
# Split on ; , or |  (inside [...] the | is literal and needs no escape)
fruits = re.split(r'[;,|]', text)
print(fruits)  # Output: ['apple', 'banana', 'orange', 'grape']

3) Preserving Delimiters

import re

# Using re.split() with a capturing group keeps the delimiters
text = "hello-world-python"
parts = re.split(r'(-)', text)
print(parts)  # Output: ['hello', '-', 'world', '-', 'python']

Python threading

Mon, 09 Mar 2026 00:00:00 GMT

I. Python Multithreading — Complete API Reference Manual

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Python's <span style="color:#E8600A;font-weight:700">threading</span> module provides a high-level interface for <span style="color:#E8600A;font-weight:700">Multithreading (多线程编程)</span> built on top of the lower-level <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">_thread</code> module. Because of the <span style="color:#E8600A;font-weight:700">GIL (Global Interpreter Lock, 全局解释器锁)</span>, threads do not achieve true CPU parallelism for pure Python code — but they excel at <span style="color:#2980B9">IO-bound tasks (IO密集型任务)</span> such as network requests, file operations, and database calls. This manual covers every public API with runnable examples. </div>
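
To make the IO-bound claim concrete, here is a rough sketch in which `time.sleep()` stands in for a network or disk wait (the function name `fake_io` and the timings are assumptions for illustration). Five waits run in threads and should overlap instead of adding up:

```python
import threading
import time

def fake_io(seconds):
    time.sleep(seconds)      # stands in for a network or disk wait

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"5 waits of 0.2s took {elapsed:.2f}s")   # close to 0.2s rather than 1.0s
```

A blocking wait releases the GIL, which is why the waits overlap; a CPU-heavy pure-Python loop in place of `sleep()` would not show the same speed-up.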

1. Thread — Core Thread Object (核心线程对象)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">threading.Thread</code> is the fundamental building block. A thread can be created by passing a <strong>callable target</strong> or by <strong>subclassing</strong> and overriding <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">run()</code>. </div>

1) Constructor (构造函数)

threading.Thread(
    group=None,      # reserved, always None
    target=None,     # callable to run in thread
    name=None,       # thread name string
    args=(),         # positional args tuple for target
    kwargs=None,     # keyword args dict for target
    daemon=None      # True → daemon thread (守护线程)
)

2) `Thread.start()` — Launch the thread

<span style="color:#2980B9">Schedules</span> the thread for execution. It may be called at most once per Thread object; a second call raises RuntimeError.

import threading
import time

def worker(name, delay):
    time.sleep(delay)
    print(f"[{name}] finished after {delay}s")

t1 = threading.Thread(target=worker, args=("Alpha", 1))
t2 = threading.Thread(target=worker, args=("Beta",  2))

t1.start()   # ← launches t1
t2.start()   # ← launches t2 concurrently

print("Main thread continues immediately")
# Output (main prints first because both workers are still sleeping):
# Main thread continues immediately
# [Alpha] finished after 1s
# [Beta] finished after 2s
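
A Thread object cannot be restarted. This minimal sketch shows what happens on a second `start()` (the variable `error` is only there to capture the message):

```python
import threading

t = threading.Thread(target=lambda: None)
t.start()
t.join()

error = None
try:
    t.start()                      # second start() on the same object
except RuntimeError as exc:
    error = str(exc)
    print(f"RuntimeError: {error}")
```

To run the same work again, create a new Thread object with the same target.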

3) `Thread.join(timeout=None)` — Wait for completion (等待线程结束)

Blocks the calling thread until the target thread terminates, or ==until timeout seconds elapse.==

import threading, time

def slow_task():
    print("Task started")
    time.sleep(3)
    print("Task done")

t = threading.Thread(target=slow_task)
t.start()

t.join(timeout=5)   # wait up to 5 seconds

if t.is_alive():
    print("Thread still running after timeout!")
else:
    print("Thread completed successfully")
# → Task started
# → Task done
# → Thread completed successfully
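
The opposite case, where the timeout expires first, can be sketched with a shorter timeout (durations are arbitrary illustrative values):

```python
import threading, time

def slow_task():
    time.sleep(0.5)

t = threading.Thread(target=slow_task)
t.start()

result = t.join(timeout=0.1)       # gives up waiting after 0.1s
timed_out = t.is_alive()
print(result)        # → None  (join() never reports whether it timed out)
print(timed_out)     # → True  (the thread is still running)

t.join()             # wait for the real finish
print(t.is_alive())  # → False
```

Since `join()` always returns None, checking `is_alive()` afterwards is the only way to tell a timeout from a normal finish.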

4) `Thread.is_alive()` — Check thread status (检查线程状态)

Returns True between start() and thread termination.

import threading, time

def task():
    time.sleep(2)

t = threading.Thread(target=task)
print(t.is_alive())   # → False  (not started yet)
t.start()
print(t.is_alive())   # → True   (running)
t.join()
print(t.is_alive())   # → False  (terminated)

5) `Thread.name` / `Thread.getName()` / `Thread.setName()` — Thread name (线程名)

import threading

def task():
    # Access name inside the thread
    print(f"Running as: {threading.current_thread().name}")

t = threading.Thread(target=task, name="WorkerThread-1")
print(t.name)          # → WorkerThread-1
t.name = "Renamed"     # preferred; getName()/setName() are deprecated since Python 3.10
print(t.name)          # → Renamed
t.start()
t.join()
# → Running as: Renamed

6) `Thread.daemon` — Daemon threads (守护线程)

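<span style="color:#C0392B;font-weight:600">A daemon thread does NOT block program shutdown: the interpreter exits once ALL non-daemon threads have finished, and any daemon threads still running are stopped abruptly, so resources they hold (open files, database transactions) may not be released cleanly.</span>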

import threading, time

def background_monitor():
    while True:
        print("[Monitor] heartbeat")
        time.sleep(1)

# Must set daemon BEFORE start()
monitor = threading.Thread(target=background_monitor, daemon=True)
monitor.start()

print("Main: doing work")
time.sleep(2.5)
print("Main: exiting — monitor will be killed automatically")
# → [Monitor] heartbeat
# → Main: doing work
# → [Monitor] heartbeat
# → [Monitor] heartbeat
# → Main: exiting — monitor will be killed automatically
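
Two further properties of the `daemon` flag can be checked with a small sketch (the names `parent`, `child_flags`, and `changed_after_start` are illustrative): a new thread inherits the flag from the thread that creates it, and the flag cannot be changed once the thread has started.

```python
import threading

stop = threading.Event()
child_flags = []

def parent():
    # A Thread created here inherits daemon=True from its creator
    child = threading.Thread(target=stop.wait)
    child_flags.append(child.daemon)
    stop.wait()

t = threading.Thread(target=parent, daemon=True)
t.start()

try:
    t.daemon = False               # too late: the thread has already been started
    changed_after_start = True
except RuntimeError:
    changed_after_start = False

stop.set()
t.join()
print(child_flags, changed_after_start)
```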

7) `Thread.ident` / `Thread.native_id` — Thread identifiers (线程标识符)

import threading

def show_ids():
    t = threading.current_thread()
    print(f"ident={t.ident}, native_id={t.native_id}")

t = threading.Thread(target=show_ids)
t.start()
t.join()
# → ident=140234567890, native_id=12345

print(f"Main ident: {threading.main_thread().ident}")

8) Subclass Pattern — Override `run()` (子类模式)

import threading, time

class DownloadThread(threading.Thread):
    """Custom thread that downloads a resource."""

    def __init__(self, url: str):
        super().__init__(name=f"Download-{url}")
        self.url    = url
        self.result = None

    def run(self):
        # Simulate download
        time.sleep(0.5)
        self.result = f"<html from {self.url}>"
        print(f"Downloaded: {self.url}")

threads = [DownloadThread(f"http://example.com/page{i}") for i in range(3)]

for t in threads:
    t.start()

for t in threads:
    t.join()
    print(f"Result: {t.result}")

2. Lock — Mutual Exclusion (互斥锁)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">Lock (互斥锁)</span> ensures only ONE thread accesses a critical section (临界区) at a time. It has two states: <span style="color:#2980B9">locked</span> and <span style="color:#2980B9">unlocked</span>. </div>

1) `Lock.acquire(blocking=True, timeout=-1)` / `Lock.release()`

import threading

counter = 0
lock    = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        lock.acquire()         # ← blocks until lock is free
        try:
            counter += 1       # critical section (临界区)
        finally:
            lock.release()     # ← always release, even if the body raises

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()

print(f"Counter: {counter}")   # → Counter: 500000  (always correct)

2) Context Manager — `with lock` (上下文管理器)

import threading

shared_list = []
lock = threading.Lock()

def safe_append(value):
    with lock:                     # ← acquire on entry, release on exit (even on exception)
        shared_list.append(value)

threads = [threading.Thread(target=safe_append, args=(i,)) for i in range(10)]
for t in threads: t.start()
for t in threads: t.join()

print(sorted(shared_list))   # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

3) `Lock.acquire(blocking=False)` — Non-blocking try (非阻塞尝试)

import threading, time

lock = threading.Lock()

def try_lock(name):
    acquired = lock.acquire(blocking=False)
    if acquired:
        print(f"[{name}] acquired the lock")
        time.sleep(2)
        lock.release()
    else:
        print(f"[{name}] could not acquire — skipping")

t1 = threading.Thread(target=try_lock, args=("T1",))
t2 = threading.Thread(target=try_lock, args=("T2",))
t1.start(); t2.start()
t1.join();  t2.join()
# → [T1] acquired the lock
# → [T2] could not acquire — skipping

4) `Lock.acquire(timeout=N)` — Timed wait (超时等待)

import threading, time

lock = threading.Lock()
lock.acquire()   # pre-lock it

def worker():
    result = lock.acquire(timeout=1.5)   # wait max 1.5s
    if result:
        print("Got the lock")
        lock.release()
    else:
        print("Timed out waiting for lock")

t = threading.Thread(target=worker)
t.start()
t.join()
# → Timed out waiting for lock   (lock was never released)

5) `Lock.locked()` — Query state (查询状态)

import threading

lock = threading.Lock()
print(lock.locked())   # → False

lock.acquire()
print(lock.locked())   # → True

lock.release()
print(lock.locked())   # → False

3. RLock — Reentrant Lock (可重入锁)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> An <span style="color:#E8600A;font-weight:700">RLock (可重入锁)</span> can be acquired multiple times by the <em>same thread</em> without deadlocking. It tracks an internal <span style="color:#2980B9">recursion count (递归计数)</span>: each acquire() increments it, each release() decrements it, and the lock is actually released only when the count returns to zero. </div>

1) Basic RLock usage

import threading

rlock = threading.RLock()

def outer():
    with rlock:                   # recursion count → 1
        print("outer acquired")
        inner()                   # same thread acquires again
        print("outer releasing")
    # recursion count → 0 (fully released)

def inner():
    with rlock:                   # recursion count → 2
        print("inner acquired")
    # recursion count → 1

t = threading.Thread(target=outer)
t.start(); t.join()
# → outer acquired
# → inner acquired
# → outer releasing
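
To see the difference from a plain `Lock`, the following sketch attempts a second acquire from the same thread on both lock types. Non-blocking acquires are used so the plain lock reports failure instead of deadlocking; the variable names are illustrative.

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

plain.acquire()
second_plain = plain.acquire(blocking=False)          # False: already held, even by this thread
plain.release()

reentrant.acquire()
second_reentrant = reentrant.acquire(blocking=False)  # True: same owner, count goes to 2
reentrant.release()                                   # count → 1
reentrant.release()                                   # count → 0 (fully released)

print(second_plain, second_reentrant)
```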

2) RLock in a class (类中使用RLock)

import threading

class BankAccount:
    def __init__(self, balance: float):
        self.balance = balance
        self._lock   = threading.RLock()

    def deposit(self, amount: float):
        with self._lock:
            self.balance += amount
            print(f"Deposited {amount:.2f} → balance={self.balance:.2f}")

    def withdraw(self, amount: float):
        with self._lock:
            self.balance -= amount
            print(f"Withdrew  {amount:.2f} → balance={self.balance:.2f}")

    def transfer_in(self, amount: float):
        with self._lock:            # outer acquire
            self.deposit(amount)   # inner acquire (reentrant!)
            print("Transfer complete")

account = BankAccount(1000.0)
t = threading.Thread(target=account.transfer_in, args=(250.0,))
t.start(); t.join()
# → Deposited 250.00 → balance=1250.00
# → Transfer complete

4. Condition — Wait/Notify Pattern (条件变量)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">Condition (条件变量)</span> allows threads to <span style="color:#2980B9">wait</span> for a specific condition to become true and <span style="color:#2980B9">notify</span> other threads when it does. It wraps an underlying lock. </div>

1) `Condition.wait()` / `notify()` / `notify_all()`

import threading, time, collections

# Classic Producer-Consumer (生产者-消费者) pattern
buffer    = collections.deque()
MAX_SIZE  = 3
condition = threading.Condition()

def producer():
    for i in range(6):
        with condition:
            while len(buffer) >= MAX_SIZE:
                print(f"Producer waiting — buffer full")
                condition.wait()           # ← releases lock, blocks
            buffer.append(i)
            print(f"Produced {i}  | buffer={list(buffer)}")
            condition.notify_all()        # ← wake waiting consumers
        time.sleep(0.3)

def consumer(name):
    for _ in range(3):
        with condition:
            while not buffer:
                print(f"[{name}] waiting — buffer empty")
                condition.wait()           # ← releases lock, blocks
            item = buffer.popleft()
            print(f"[{name}] consumed {item} | buffer={list(buffer)}")
            condition.notify_all()        # ← wake waiting producer

threads = [
    threading.Thread(target=producer),
    threading.Thread(target=consumer, args=("C1",)),
    threading.Thread(target=consumer, args=("C2",)),
]
for t in threads: t.start()
for t in threads: t.join()

2) `Condition.wait(timeout=N)` — Timed wait

import threading, time

condition = threading.Condition()
data_ready = False

def waiter():
    with condition:
        result = condition.wait(timeout=2.0)   # wait max 2 seconds
        if result:
            print("Condition met!")
        else:
            print("Timed out — condition never triggered")

def notifier():
    time.sleep(5)   # too slow
    with condition:
        condition.notify()

t1 = threading.Thread(target=waiter)
t2 = threading.Thread(target=notifier)
t1.start(); t2.start()
t1.join();  t2.join()
# → Timed out — condition never triggered
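
A bare `wait()` can miss a notification that was sent before the wait began. A common way to guard against this, sketched here under the assumption of a shared `data_ready` flag (the `outcome` list is also illustrative), is to pair the condition with explicit state and re-check that state in a loop.

```python
import threading

cond = threading.Condition()
data_ready = False
outcome = []

def waiter():
    with cond:
        while not data_ready:               # re-check the state after every wakeup
            if not cond.wait(timeout=2.0):  # False means the timeout expired
                outcome.append("timeout")
                return
        outcome.append("ready")

def notifier():
    global data_ready
    with cond:
        data_ready = True                   # change the state first, then notify
        cond.notify()

t1 = threading.Thread(target=waiter)
t2 = threading.Thread(target=notifier)
t1.start(); t2.start()
t1.join();  t2.join()
print(outcome)
```

Because the waiter checks the flag before waiting, the result is the same whichever thread runs first.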

3) `Condition.wait_for(predicate, timeout=None)` — Predicate wait

import threading, time

items  = []
cond   = threading.Condition()

def consumer():
    with cond:
        # Block until at least 3 items are available
        cond.wait_for(lambda: len(items) >= 3)
        print(f"Got items: {items}")

def producer():
    for i in range(5):
        time.sleep(0.5)
        with cond:
            items.append(i)
            print(f"Added item {i}")
            cond.notify_all()

t1 = threading.Thread(target=consumer)
t2 = threading.Thread(target=producer)
t1.start(); t2.start()
t1.join();  t2.join()
# → Added item 0
# → Added item 1
# → Added item 2
# → Got items: [0, 1, 2]
# → Added item 3
# → Added item 4

5. Semaphore & BoundedSemaphore (信号量)

1) `Semaphore(value=1)` — Connection pool simulation (连接池模拟)

import threading, time, random

# Allow max 3 simultaneous DB connections
db_semaphore = threading.Semaphore(3)

def use_db_connection(thread_id):
    print(f"Thread {thread_id}: waiting for DB connection")
    with db_semaphore:                      # acquire (count -1)
        print(f"Thread {thread_id}: got connection")
        time.sleep(random.uniform(0.5, 1.5))
        print(f"Thread {thread_id}: releasing connection")
                                            # release (count +1) on exit

threads = [threading.Thread(target=use_db_connection, args=(i,)) for i in range(7)]
for t in threads: t.start()
for t in threads: t.join()
# At most 3 "got connection" lines active at any time
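
The "at most 3" claim can be checked by recording the peak number of threads inside the guarded block. This is a minimal sketch; the `active` and `peak` counters and the separate `state_lock` protecting them are assumptions added for the measurement.

```python
import threading, time

sem = threading.Semaphore(3)
state_lock = threading.Lock()      # protects the two counters below
active = 0
peak = 0

def use_resource():
    global active, peak
    with sem:                      # at most 3 threads get past this line at once
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)           # simulate work while holding a slot
        with state_lock:
            active -= 1

threads = [threading.Thread(target=use_resource) for _ in range(7)]
for t in threads: t.start()
for t in threads: t.join()
print(f"Peak concurrency: {peak}")
```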

2) `BoundedSemaphore` — Prevent over-release (防止超额释放)

<span style="color:#C0392B;font-weight:600">Warning: a plain Semaphore allows release() beyond the initial value — this is usually a bug. BoundedSemaphore raises ValueError if the count would exceed the initial value.</span>

import threading

sem   = threading.Semaphore(2)
bsem  = threading.BoundedSemaphore(2)

# Plain Semaphore — silently over-releases
sem.release()   # count goes to 3 — no error (潜在bug)
print("Plain Semaphore: over-release accepted silently")

# BoundedSemaphore — raises ValueError
try:
    bsem.release()   # count would exceed 2
except ValueError as e:
    print(f"BoundedSemaphore caught: {e}")
# → BoundedSemaphore caught: Semaphore released too many times

3) Concurrency limiter pattern (并发限制模式): caps simultaneous calls, not calls per second

import threading, time

# Limit to 2 concurrent API calls
api_semaphore = threading.BoundedSemaphore(2)

def call_api(endpoint):
    with api_semaphore:
        print(f"Calling {endpoint}")
        time.sleep(1)   # simulate API latency
        print(f"Done    {endpoint}")

endpoints = [f"/api/resource/{i}" for i in range(6)]
threads   = [threading.Thread(target=call_api, args=(ep,)) for ep in endpoints]

for t in threads: t.start()
for t in threads: t.join()

6. Event — Simple Flag Signaling (事件信号)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> An <span style="color:#E8600A;font-weight:700">Event (事件)</span> is a simple boolean flag. Threads can <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">wait()</code> until the flag is set, and any thread can <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">set()</code> or <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">clear()</code> it. </div>

1) `Event.set()` / `Event.clear()` / `Event.wait()` / `Event.is_set()`

import threading, time

start_event = threading.Event()

def worker(name):
    print(f"[{name}] waiting for start signal...")
    start_event.wait()               # blocks until event is set
    print(f"[{name}] GO! Starting work")

workers = [threading.Thread(target=worker, args=(f"W{i}",)) for i in range(4)]
for w in workers: w.start()

print("Main: preparing...")
time.sleep(2)
print("Main: firing start signal!")
start_event.set()                    # wake ALL waiting threads at once

for w in workers: w.join()
# → [W0] waiting for start signal...
# → [W1] waiting for start signal...
# → [W2] waiting for start signal...
# → [W3] waiting for start signal...
# → Main: preparing...
# (2s pause)
# → Main: firing start signal!
# → [W0] GO! Starting work    (W1-W3 follow; the order among workers may vary)
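
The heading also lists `clear()` and `is_set()`, which the example above does not exercise. A single-threaded sketch of the flag's state transitions (the `states` list is illustrative):

```python
import threading

flag = threading.Event()
states = []

states.append(flag.is_set())            # False: events start cleared
flag.set()
states.append(flag.is_set())            # True
states.append(flag.wait(timeout=0))     # True: returns immediately while the flag is set
flag.clear()
states.append(flag.wait(timeout=0.05))  # False: flag cleared again, so the wait times out

print(states)
```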

2) `Event.wait(timeout=N)` — Timed wait

import threading, time

ready = threading.Event()

def service():
    print("Service: initializing (takes 3s)...")
    time.sleep(3)
    ready.set()
    print("Service: ready!")

def client():
    if ready.wait(timeout=1.5):    # only wait 1.5s
        print("Client: connected!")
    else:
        print("Client: service not ready in time, aborting")

t1 = threading.Thread(target=service)
t2 = threading.Thread(target=client)
t1.start(); t2.start()
t1.join();  t2.join()
# → Service: initializing (takes 3s)...
# → Client: service not ready in time, aborting
# → Service: ready!

3) Stop signal pattern (停止信号模式)

import threading, time

stop_event = threading.Event()

def background_worker():
    count = 0
    while not stop_event.is_set():    # check flag each iteration
        print(f"Working... iteration {count}")
        count += 1
        time.sleep(0.5)
    print("Worker: received stop signal, exiting cleanly")

t = threading.Thread(target=background_worker)
t.start()

time.sleep(2)
print("Main: sending stop signal")
stop_event.set()
t.join()
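
One variation, shown here as a sketch rather than a replacement for the loop above, is to let the event itself do the sleeping: `stop_event.wait(timeout)` returns as soon as the flag is set, so the worker does not have to finish a full sleep before noticing the stop request. The 5-second interval and the `iterations` counter are illustrative.

```python
import threading, time

stop_event = threading.Event()
iterations = 0

def worker():
    global iterations
    # wait() returns True as soon as the event is set, False when the timeout expires
    while not stop_event.wait(timeout=5.0):
        iterations += 1            # periodic work would go here

t = threading.Thread(target=worker)
t.start()

time.sleep(0.1)
started = time.monotonic()
stop_event.set()                   # interrupts the 5s wait immediately
t.join()
elapsed = time.monotonic() - started
print(f"Worker stopped after {elapsed:.3f}s")
```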

7. Timer — Delayed Execution (延迟执行)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">threading.Timer</code> is a subclass of <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Thread</code> that executes a function after a specified delay. It can be <span style="color:#2980B9">cancelled</span> before firing. </div>

1) Basic Timer

import threading

def reminder(message):
    print(f"⏰ Reminder: {message}")

# Fire after 3 seconds
t = threading.Timer(3.0, reminder, args=("Meeting at 3pm!",))
t.start()

print("Timer set. Waiting...")
t.join()
# → Timer set. Waiting...
# (3s pause)
# → ⏰ Reminder: Meeting at 3pm!

2) `Timer.cancel()` — Cancel before firing

import threading, time

fired = False

def action():
    global fired
    fired = True
    print("Action fired!")

t = threading.Timer(5.0, action)
t.start()

time.sleep(1)
t.cancel()    # ← cancel within the window
t.join()

print(f"Action fired: {fired}")   # → Action fired: False

3) Repeating timer pattern (重复定时器模式)

import threading

class RepeatingTimer:
    """Fires a function every `interval` seconds."""

    def __init__(self, interval: float, func, *args):
        self.interval = interval
        self.func     = func
        self.args     = args
        self._timer   = None
        self._running = False

    def _run(self):
        self.func(*self.args)
        if self._running:
            self._schedule()

    def _schedule(self):
        self._timer = threading.Timer(self.interval, self._run)
        self._timer.daemon = True
        self._timer.start()

    def start(self):
        self._running = True
        self._schedule()

    def stop(self):
        self._running = False
        if self._timer:
            self._timer.cancel()

import time

counter = [0]
def tick():
    counter[0] += 1
    print(f"Tick #{counter[0]}")

rt = RepeatingTimer(0.5, tick)
rt.start()
time.sleep(2.5)
rt.stop()
print(f"Total ticks: {counter[0]}")   # → typically 4 or 5 (the tick due at ~2.5s races with stop())

8. Barrier — Thread Synchronization Point (屏障同步点)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">Barrier (屏障)</span> makes a fixed number of threads wait at a rendezvous point until ALL of them arrive — then releases all of them simultaneously. </div>

1) `Barrier(parties, action=None, timeout=None)`

import threading, time, random

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)

def phase_worker(name):
    # Phase 1
    duration = random.uniform(0.5, 2.0)
    print(f"[{name}] phase 1 working for {duration:.1f}s")
    time.sleep(duration)
    print(f"[{name}] phase 1 done — waiting at barrier")

    barrier.wait()     # ← all threads block here until all 4 arrive

    print(f"[{name}] phase 2 starting (all threads released together)")

threads = [threading.Thread(target=phase_worker, args=(f"W{i}",))
           for i in range(NUM_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()

2) `Barrier` with `action` callback

import threading, time

def setup_phase():
    """Runs ONCE when all threads reach the barrier, before release."""
    print(">>> All threads ready — running barrier action <<<")

barrier = threading.Barrier(3, action=setup_phase)

def worker(name):
    time.sleep(0.1)
    print(f"[{name}] arrived at barrier")
    barrier.wait()
    print(f"[{name}] past barrier")

threads = [threading.Thread(target=worker, args=(f"T{i}",)) for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()
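
`barrier.wait()` also returns a distinct integer from 0 to parties-1 in each thread, which can be used to pick one thread for follow-up work. The sketch below records those return values and counts how often the action runs; `action_calls`, `indices`, and `indices_lock` are illustrative names.

```python
import threading

action_calls = []
indices = []
indices_lock = threading.Lock()

# The action runs once per barrier cycle, in one of the waiting threads
barrier = threading.Barrier(3, action=lambda: action_calls.append(1))

def worker():
    idx = barrier.wait()           # each thread receives a different index
    with indices_lock:
        indices.append(idx)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(indices), len(action_calls))
```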

3) `Barrier.abort()` / `BrokenBarrierError`

import threading, time

barrier = threading.Barrier(3)

def risky_worker(name, should_abort):
    try:
        if should_abort:
            time.sleep(0.2)
            print(f"[{name}] aborting barrier!")
            barrier.abort()          # breaks the barrier for everyone
        else:
            print(f"[{name}] waiting at barrier...")
            barrier.wait(timeout=2)
            print(f"[{name}] passed!")
    except threading.BrokenBarrierError:
        print(f"[{name}] barrier was broken — handling gracefully")

threads = [
    threading.Thread(target=risky_worker, args=("T0", False)),
    threading.Thread(target=risky_worker, args=("T1", False)),
    threading.Thread(target=risky_worker, args=("T2", True)),   # aborts
]
for t in threads: t.start()
for t in threads: t.join()

4) Barrier properties

import threading

b = threading.Barrier(5)
print(b.parties)    # → 5   (total threads needed)
print(b.n_waiting)  # → 0   (currently waiting)
print(b.broken)     # → False

9. local — Thread-local Storage (线程本地存储)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">threading.local()</code> creates an object where each thread has its <span style="color:#E8600A;font-weight:700">own independent copy</span> of every attribute. Ideal for thread-specific state like database connections or request contexts. </div>

1) Basic thread-local usage

import threading, time

local_data = threading.local()

def worker(value):
    local_data.x = value              # each thread sets its own .x
    time.sleep(0.1)                   # let other threads run
    print(f"Thread {threading.current_thread().name}: x = {local_data.x}")

threads = [threading.Thread(target=worker, args=(i*10,), name=f"T{i}")
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
# → Thread T0: x = 0
# → Thread T1: x = 10
# → Thread T2: x = 20
# → Thread T3: x = 30
# (each thread sees only its own value — no interference)
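
Isolation also applies to the main thread: an attribute set there is not visible in a worker at all. A minimal sketch, with `probe` and `seen` as illustrative names:

```python
import threading

local_data = threading.local()
local_data.x = "main"              # set in the main thread only
seen = {}

def probe():
    seen["has_x"] = hasattr(local_data, "x")   # the worker starts with an empty namespace
    local_data.x = "worker"                    # does not affect the main thread's value

t = threading.Thread(target=probe)
t.start(); t.join()

print(seen["has_x"], local_data.x)
```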

2) Thread-local DB connection pattern

import threading
import sqlite3

_local = threading.local()

def get_connection(db_path: str) -> sqlite3.Connection:
    """Return a per-thread DB connection (创建线程私有数据库连接)."""
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(db_path)
        print(f"[{threading.current_thread().name}] created new connection")
    return _local.conn

def db_worker(db_path: str):
    conn = get_connection(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS t (v INTEGER)")
    conn.execute("INSERT INTO t VALUES (?)", (threading.get_ident(),))
    conn.commit()
    print(f"[{threading.current_thread().name}] inserted row")

threads = [threading.Thread(target=db_worker, args=(":memory:",), name=f"DB-{i}")
           for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()

3) Subclass `local` for initialization

import threading

class RequestContext(threading.local):
    """Thread-local request context with defaults."""
    def __init__(self):
        super().__init__()
        self.user_id   = None
        self.request_id = None

ctx = RequestContext()

def handle_request(user_id, req_id):
    ctx.user_id    = user_id
    ctx.request_id = req_id
    import time; time.sleep(0.05)
    print(f"Processing request {ctx.request_id} for user {ctx.user_id}")

threads = [threading.Thread(target=handle_request, args=(f"user{i}", f"req-{i:03}"))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()

10. Module-level Functions (模块级函数)

1) `threading.current_thread()` — Get the current thread

import threading

def show_self():
    t = threading.current_thread()
    print(f"name={t.name}, ident={t.ident}, daemon={t.daemon}")

main_t = threading.current_thread()
print(f"Main thread: {main_t.name}")

t = threading.Thread(target=show_self, name="MyWorker")
t.start(); t.join()
# → Main thread: MainThread
# → name=MyWorker, ident=140..., daemon=False

2) `threading.main_thread()` — Get the main thread

import threading

def check_main():
    mt = threading.main_thread()
    ct = threading.current_thread()
    print(f"Main thread: {mt.name}")
    print(f"This thread: {ct.name}")
    print(f"Am I main?  {ct is mt}")

t = threading.Thread(target=check_main)
t.start(); t.join()
# → Main thread: MainThread
# → This thread: Thread-1 (check_main)   (Python 3.10+ appends the target name; older versions print Thread-1)
# → Am I main?  False

3) `threading.active_count()` — Count live threads

import threading, time

def slow():
    time.sleep(2)

print(threading.active_count())   # → 1  (main only)

threads = [threading.Thread(target=slow) for _ in range(3)]
for t in threads: t.start()

print(threading.active_count())   # → 4  (main + 3 workers)
for t in threads: t.join()
print(threading.active_count())   # → 1

4) `threading.enumerate()` — List all live threads

import threading, time

def task(n):
    time.sleep(n)

threads = [threading.Thread(target=task, args=(i,), name=f"T{i}") for i in range(1,4)]
for t in threads: t.start()

for t in threading.enumerate():
    print(f"  alive: {t.name} | daemon={t.daemon}")
# → alive: MainThread | daemon=False
# → alive: T1        | daemon=False
# → alive: T2        | daemon=False
# → alive: T3        | daemon=False

for t in threads: t.join()

5) `threading.settrace(func)` / `threading.setprofile(func)` — Thread hooks

import threading

def my_tracer(frame, event, arg):
    if event == "call":
        print(f"[TRACE] calling {frame.f_code.co_name}")
    return my_tracer

def task():
    x = 1 + 1
    return x

threading.settrace(my_tracer)    # set trace for ALL future threads
t = threading.Thread(target=task)
t.start(); t.join()
threading.settrace(None)         # remove tracer

6) `threading.stack_size(size=0)` — Set thread stack size

import threading

# Set stack size to 512 KB for all future threads
threading.stack_size(512 * 1024)
print(f"Stack size: {threading.stack_size()} bytes")

def task():
    print(f"Running with custom stack size")

t = threading.Thread(target=task)
t.start(); t.join()

threading.stack_size(0)   # reset to default

7) `threading.excepthook` — Handle uncaught thread exceptions (未捕获异常处理)

import threading

def custom_excepthook(args):
    print(f"Uncaught exception in thread [{args.thread.name}]:")
    print(f"  Type:    {args.exc_type.__name__}")
    print(f"  Message: {args.exc_value}")

threading.excepthook = custom_excepthook

def buggy_task():
    raise ValueError("Something went wrong in thread!")

t = threading.Thread(target=buggy_task, name="BuggyThread")
t.start(); t.join()
# → Uncaught exception in thread [BuggyThread]:
# →   Type:    ValueError
# →   Message: Something went wrong in thread!

8) `threading.get_ident()` / `threading.get_native_id()`

import threading

def show_ids():
    print(f"Python ident:    {threading.get_ident()}")
    print(f"OS native id:    {threading.get_native_id()}")

t = threading.Thread(target=show_ids)
t.start(); t.join()

11. queue Module — Thread-safe Queues (线程安全队列)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> The <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">queue</code> module provides three full-featured thread-safe queue classes: <span style="color:#E8600A;font-weight:700">Queue (FIFO)</span>, <span style="color:#E8600A;font-weight:700">LifoQueue (LIFO/stack)</span>, and <span style="color:#E8600A;font-weight:700">PriorityQueue (优先队列)</span>, plus the simpler unbounded <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">SimpleQueue</code>. All handle locking internally, so no external synchronization is needed for individual put() and get() calls. </div>

1) `Queue(maxsize=0)` — FIFO Queue

from queue import Queue
import threading, time

q = Queue(maxsize=3)

def producer():
    for i in range(6):
        q.put(i)          # blocks if queue is full (maxsize reached)
        print(f"Put {i}  | qsize={q.qsize()}")
        time.sleep(0.2)

def consumer():
    for _ in range(6):
        item = q.get()    # blocks if queue is empty
        print(f"Got {item}")
        q.task_done()
        time.sleep(0.5)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join();  t2.join()

2) `Queue.put_nowait()` / `Queue.get_nowait()` — Non-blocking

from queue import Queue, Full, Empty

q = Queue(maxsize=2)
q.put("item1")
q.put("item2")

try:
    q.put_nowait("item3")     # queue full!
except Full:
    print("Queue full — item3 dropped")

try:
    while True:
        print(q.get_nowait())
except Empty:
    print("Queue emptied")
# → Queue full — item3 dropped
# → item1
# → item2
# → Queue emptied

3) `Queue.join()` / `Queue.task_done()` — Work tracking

from queue import Queue
import threading

work_queue = Queue()

def worker():
    while True:
        task = work_queue.get()
        if task is None:
            break
        print(f"Processing: {task}")
        work_queue.task_done()   # signal this task is complete

# Start 3 workers
workers = [threading.Thread(target=worker, daemon=True) for _ in range(3)]
for w in workers: w.start()

# Enqueue tasks
for task in ["task_A", "task_B", "task_C", "task_D", "task_E"]:
    work_queue.put(task)

work_queue.join()   # blocks until ALL task_done() called
print("All tasks completed!")
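
When the workers are not daemon threads, they need an explicit shutdown signal. One way to do this, sketched here with one `None` sentinel per worker, also calls `task_done()` in a `finally` block so that `join()` cannot hang if processing fails. The `processed` list and its lock are illustrative additions.

```python
from queue import Queue
import threading

NUM_WORKERS = 3
work_queue = Queue()
processed = []
processed_lock = threading.Lock()

def worker():
    while True:
        task = work_queue.get()
        try:
            if task is None:               # sentinel: stop this worker
                return
            with processed_lock:
                processed.append(task)
        finally:
            work_queue.task_done()         # runs for tasks and sentinels alike

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers: w.start()

for task in ["task_A", "task_B", "task_C", "task_D", "task_E"]:
    work_queue.put(task)
for _ in workers:
    work_queue.put(None)                   # one sentinel per worker

work_queue.join()                          # every item, sentinels included, is done
for w in workers: w.join()
print(sorted(processed))
```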

4) `LifoQueue` — Stack (栈/后进先出)

from queue import LifoQueue

stack = LifoQueue()
stack.put("first")
stack.put("second")
stack.put("third")

while not stack.empty():
    print(stack.get())
# → third
# → second
# → first

5) `PriorityQueue` — Priority-based processing (优先级队列)

from queue import PriorityQueue

pq = PriorityQueue()

# (priority, task_name) — lower number = higher priority
pq.put((3, "low-priority task"))
pq.put((1, "URGENT task"))
pq.put((2, "medium task"))
pq.put((1, "another URGENT task"))

while not pq.empty():
    priority, task = pq.get()
    print(f"[priority={priority}] Processing: {task}")
# → [priority=1] Processing: URGENT task
# → [priority=1] Processing: another URGENT task
# → [priority=2] Processing: medium task
# → [priority=3] Processing: low-priority task
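
Entries with equal priority are ordered by comparing the rest of the tuple, so the two priority-1 tasks above come out in string order, not insertion order. If first-in-first-out order among ties is wanted, a common approach is to add a monotonically increasing sequence number as the second element. The `put_task` helper and the task names below are assumptions for this sketch.

```python
import itertools
from queue import PriorityQueue

pq = PriorityQueue()
sequence = itertools.count()       # tie-breaker: 0, 1, 2, ...

def put_task(priority, name):
    # (priority, sequence, name): equal priorities fall back to insertion order
    pq.put((priority, next(sequence), name))

put_task(1, "zeta")
put_task(1, "alpha")               # same priority, inserted later
put_task(0, "first")

order = []
while not pq.empty():
    _, _, name = pq.get()
    order.append(name)

print(order)   # → ['first', 'zeta', 'alpha']
```

The sequence number also prevents a TypeError when the payload itself is not comparable, because the comparison never reaches the third element.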

12. ThreadPoolExecutor — High-level Thread Pool (高级线程池)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">concurrent.futures.ThreadPoolExecutor</code> provides a high-level, <span style="color:#2980B9">Future-based (Future对象)</span> interface for thread pools. It is the <span style="color:#E8600A;font-weight:700">recommended way</span> to run IO-bound tasks in modern Python. </div>

1) `submit()` → Future

from concurrent.futures import ThreadPoolExecutor
import time

def fetch_data(url: str) -> str:
    time.sleep(1)   # simulate network call
    return f"<data from {url}>"

urls = [f"http://example.com/page{i}" for i in range(5)]

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fetch_data, url) for url in urls]

    for future in futures:
        result = future.result()   # blocks until this future completes
        print(result)

2) `map()` — Parallel map (并行映射)

from concurrent.futures import ThreadPoolExecutor
import time

def square(n):
    time.sleep(0.2)
    return n * n

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(square, range(10)))

print(results)   # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

3) `Future` API — `done()`, `cancel()`, `add_done_callback()`

from concurrent.futures import ThreadPoolExecutor
import time

def slow_task(n):
    time.sleep(n)
    return f"result-{n}"

def on_done(future):
    print(f"Callback: task finished → {future.result()}")

with ThreadPoolExecutor(max_workers=2) as executor:
    f1 = executor.submit(slow_task, 1)
    f2 = executor.submit(slow_task, 2)

    f1.add_done_callback(on_done)    # register callback
    f2.add_done_callback(on_done)

    print(f"f1 done: {f1.done()}")   # likely False (still running)
    time.sleep(1.5)
    print(f"f1 done: {f1.done()}")   # → True
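
The heading mentions `cancel()`, which only succeeds for a future that has not started running. The sketch below forces that situation with a single-worker pool: the first task occupies the worker, so the second stays queued. The `gate` and `started` events and the `blocker` function are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

gate = threading.Event()
started = threading.Event()

def blocker():
    started.set()                  # signal that this task is now running
    gate.wait()                    # hold the only worker until released
    return "first"

with ThreadPoolExecutor(max_workers=1) as executor:
    running = executor.submit(blocker)
    started.wait()                                 # make sure the worker is busy
    queued = executor.submit(lambda: "second")     # has to wait in the queue

    cancelled_queued = queued.cancel()             # True: it never started
    cancelled_running = running.cancel()           # False: already running
    gate.set()
    first_result = running.result()

print(cancelled_queued, cancelled_running, first_result)
```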

4) `as_completed()` — Process in completion order (按完成顺序处理)

from concurrent.futures import ThreadPoolExecutor, as_completed
import time, random

def task(n):
    delay = random.uniform(0.1, 1.0)
    time.sleep(delay)
    return (n, delay)

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(task, i): i for i in range(8)}

    for future in as_completed(futures):
        task_id = futures[future]          # look up which input produced this future
        _, delay = future.result()
        print(f"Task {task_id} finished in {delay:.2f}s")
# Tasks print in the order they complete, not submission order

5) Exception handling in futures (Future异常处理)

from concurrent.futures import ThreadPoolExecutor

def risky(x):
    if x == 3:
        raise ValueError(f"Bad input: {x}")
    return x * 2

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(risky, i) for i in range(5)]

for i, f in enumerate(futures):
    try:
        print(f"Result {i}: {f.result()}")
    except ValueError as e:
        print(f"Result {i}: ERROR — {e}")
# → Result 0: 0
# → Result 1: 2
# → Result 2: 4
# → Result 3: ERROR — Bad input: 3
# → Result 4: 8

13. Common Patterns & Pitfalls (常见模式与陷阱)

1) Race condition example (竞态条件示例)

import threading

counter = 0   # UNSAFE shared state

def unsafe_increment():
    global counter
    for _ in range(100_000):
        counter += 1   # NOT atomic! (read-modify-write)

threads = [threading.Thread(target=unsafe_increment) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()

print("Expected: 500000")
print(f"Actual:   {counter}")   # may be LESS than 500000 (lost updates); recent CPython releases
                                # often print 500000 by luck of scheduling, but it is not guaranteed

2) Deadlock example + fix (死锁示例及修复)

import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

# ─── DEADLOCK version (defined for illustration only; never started below) ───
def thread1_deadlock():
    with lock_a:
        time.sleep(0.1)
        with lock_b:                  # waits for lock_b
            print("T1: got both locks")

def thread2_deadlock():
    with lock_b:
        time.sleep(0.1)
        with lock_a:                  # waits for lock_a → DEADLOCK
            print("T2: got both locks")

# ─── FIXED version: always acquire locks in the same order ──
def thread1_safe():
    with lock_a:                      # acquire A first
        with lock_b:                  # then B
            print("T1 safe: got both locks")

def thread2_safe():
    with lock_a:                      # acquire A first (same order!)
        with lock_b:
            print("T2 safe: got both locks")

t1 = threading.Thread(target=thread1_safe)
t2 = threading.Thread(target=thread2_safe)
t1.start(); t2.start()
t1.join();  t2.join()
# → T1 safe: got both locks
# → T2 safe: got both locks
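
When a single global lock order is impractical, one alternative is to take the second lock with a timeout and back off on failure. The sketch below is only an illustration of that idea: `acquire_both`, `worker`, and `finished` are hypothetical names, and the timeout and pause values are arbitrary.

```python
import random
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()
finished = []

def acquire_both(first, second, timeout=0.05):
    """Hypothetical helper: returns once the caller holds both locks."""
    while True:
        first.acquire()
        if second.acquire(timeout=timeout):
            return
        first.release()                         # back off so the other thread can progress
        time.sleep(random.uniform(0.0, 0.01))   # random pause lowers the chance of repeated collisions

def worker(name, first, second):
    acquire_both(first, second)
    try:
        finished.append(name)                   # critical section needing both locks
    finally:
        second.release()
        first.release()

# The two threads deliberately request the locks in opposite orders.
t1 = threading.Thread(target=worker, args=("T1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("T2", lock_b, lock_a))
t1.start(); t2.start()
t1.join();  t2.join()
print(sorted(finished))   # → ['T1', 'T2']
```

Consistent lock ordering remains the simpler fix; the back-off approach trades a little retry overhead for not needing an agreed order.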

3) Thread-safe singleton (线程安全单例)

import threading

class Singleton:
    _instance = None
    _lock     = threading.Lock()

    def __new__(cls):
        if cls._instance is None:              # first check (no lock)
            with cls._lock:
                if cls._instance is None:      # second check (with lock)
                    cls._instance = super().__new__(cls)
                    print("Singleton created")
        return cls._instance

def get_instance():
    s = Singleton()
    print(f"Got instance: {id(s)}")

threads = [threading.Thread(target=get_instance) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
# → Singleton created   (exactly once)
# → Got instance: 140...  (same id for all 5 threads)

14. Full API Quick Reference (API速查表)

Class / Function	Key Methods	Purpose
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Thread</code>	`start()` `join()` `is_alive()`	Create and manage threads
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Lock</code>	`acquire()` `release()` `locked()`	Mutual exclusion
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">RLock</code>	`acquire()` `release()`	Reentrant mutual exclusion
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Condition</code>	`wait()` `wait_for()` `notify()` `notify_all()`	Wait/notify synchronization
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Semaphore</code>	`acquire()` `release()`	Limit concurrent access
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">BoundedSemaphore</code>	`acquire()` `release()`	Semaphore with over-release guard
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event</code>	`set()` `clear()` `wait()` `is_set()`	Boolean flag signaling
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Timer</code>	`start()` `cancel()`	Delayed / cancellable execution
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Barrier</code>	`wait()` `abort()` `reset()`	N-thread rendezvous point
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">local</code>	attribute access	Per-thread storage
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Queue</code>	`put()` `get()` `task_done()` `join()`	Thread-safe FIFO queue
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">LifoQueue</code>	`put()` `get()`	Thread-safe stack
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">PriorityQueue</code>	`put()` `get()`	Thread-safe priority queue
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ThreadPoolExecutor</code>	`submit()` `map()` `shutdown()`	High-level thread pool
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">current_thread()</code>	—	Get current Thread object
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">active_count()</code>	—	Count live threads
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">enumerate()</code>	—	List all live threads
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">excepthook</code>	—	Handle uncaught thread exceptions

<div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br> Python threading excels at <span style="color:#E8600A;font-weight:700">IO-bound concurrency (IO密集型并发)</span>: use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ThreadPoolExecutor</code> for simple task pools, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Queue</code> for producer-consumer pipelines, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Lock</code>/<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">RLock</code> for shared state, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event</code> for signaling, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Semaphore</code> for resource pools, and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Barrier</code> for multi-phase synchronization — always protect shared mutable state to avoid <span style="color:#C0392B;font-weight:600">Race Conditions (竞态条件)</span> and <span style="color:#C0392B;font-weight:600">Deadlocks (死锁)</span>. </div>

I. Python Multithreading — Complete API Reference Manual

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Python's <span style="color:#E8600A;font-weight:700">threading</span> module provides a high-level interface for <span style="color:#E8600A;font-weight:700">Multithreading (多线程编程)</span> built on top of the lower-level <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">_thread</code> module. Because of the <span style="color:#E8600A;font-weight:700">GIL (Global Interpreter Lock, 全局解释器锁)</span>, threads do not achieve true CPU parallelism for pure Python code — but they excel at <span style="color:#2980B9">IO-bound tasks (IO密集型任务)</span> such as network requests, file operations, and database calls. This manual covers every public API with runnable examples. </div>
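
To make the IO-bound claim concrete, here is a small timing sketch. It assumes `time.sleep` is a fair stand-in for a blocking network or disk call (it releases the GIL while waiting); `fake_io`, `N`, and `DELAY` are illustrative names.

```python
import threading
import time

def fake_io(delay):
    time.sleep(delay)        # stands in for a blocking IO call

N, DELAY = 5, 0.2

# Sequential: the waits add up.
start = time.perf_counter()
for _ in range(N):
    fake_io(DELAY)
sequential = time.perf_counter() - start

# Threaded: the waits overlap.
start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(DELAY,)) for _ in range(N)]
for t in threads: t.start()
for t in threads: t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

On a typical machine the sequential run takes about `N * DELAY` seconds and the threaded run about `DELAY` seconds. A CPU-bound loop in pure Python would not show the same gain.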

1. Thread — Core Thread Object (核心线程对象)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">threading.Thread</code> is the fundamental building block. A thread can be created by passing a <strong>callable target</strong> or by <strong>subclassing</strong> and overriding <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">run()</code>. </div>

1) Constructor (构造函数)

threading.Thread(
    group=None,      # reserved, always None
    target=None,     # callable to run in thread
    name=None,       # thread name string
    args=(),         # positional args tuple for target
    kwargs=None,     # keyword args dict for target
    daemon=None      # keyword-only; True → daemon thread (守护线程), None → inherit from creating thread
)
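
A short sketch of how these parameters fit together in practice. The `greet` function and the `received` dictionary are made up for illustration.

```python
import threading

received = {}

def greet(greeting, name="world", punctuation="!"):
    received["message"] = f"{greeting}, {name}{punctuation}"
    received["thread"] = threading.current_thread().name

t = threading.Thread(
    target=greet,
    name="Greeter",                                   # shows up in logs and debuggers
    args=("Hello",),                                  # positional args: note the 1-tuple comma
    kwargs={"name": "threads", "punctuation": "?"},   # keyword args
    daemon=False,
)
t.start()
t.join()
print(received)   # → {'message': 'Hello, threads?', 'thread': 'Greeter'}
```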

2) `Thread.start()` — Launch the thread

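<span style="color:#2980B9">Schedules</span> the thread for execution by arranging for run() to be invoked in a separate thread of control. It may be called at most once per Thread object; a second call raises RuntimeError.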

import threading
import time

def worker(name, delay):
    time.sleep(delay)
    print(f"[{name}] finished after {delay}s")

t1 = threading.Thread(target=worker, args=("Alpha", 1))
t2 = threading.Thread(target=worker, args=("Beta",  2))

t1.start()   # ← launches t1
t2.start()   # ← launches t2 concurrently

print("Main thread continues immediately")
# Expected output (the sleeps make this order effectively certain):
# Main thread continues immediately
# [Alpha] finished after 1s
# [Beta] finished after 2s
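
The once-per-object rule can be checked directly. This is a minimal sketch; the `second_start_failed` flag is an illustrative name.

```python
import threading

t = threading.Thread(target=lambda: None)
t.start()
t.join()

second_start_failed = False
try:
    t.start()                      # same object, second call
except RuntimeError as exc:
    second_start_failed = True
    print(f"RuntimeError: {exc}")

# To run the work again, create a fresh Thread object.
t = threading.Thread(target=lambda: None)
t.start()
t.join()
```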

3) `Thread.join(timeout=None)` — Wait for completion (等待线程结束)

Blocks the calling thread until the target thread terminates, or until timeout seconds elapse.

import threading, time

def slow_task():
    print("Task started")
    time.sleep(3)
    print("Task done")

t = threading.Thread(target=slow_task)
t.start()

t.join(timeout=5)   # wait up to 5 seconds

if t.is_alive():
    print("Thread still running after timeout!")
else:
    print("Thread completed successfully")
# → Task started
# → Task done
# → Thread completed successfully

4) `Thread.is_alive()` — Check thread status (检查线程状态)

Returns True between start() and thread termination.

import threading, time

def task():
    time.sleep(2)

t = threading.Thread(target=task)
print(t.is_alive())   # → False  (not started yet)
t.start()
print(t.is_alive())   # → True   (running)
t.join()
print(t.is_alive())   # → False  (terminated)

5) `Thread.name` / `Thread.getName()` / `Thread.setName()` — Thread name (线程名)

import threading

def task():
    # Access name inside the thread
    print(f"Running as: {threading.current_thread().name}")

t = threading.Thread(target=task, name="WorkerThread-1")
print(t.name)          # → WorkerThread-1
t.name = "Renamed"     # preferred; setName()/getName() are deprecated since Python 3.10
print(t.name)          # → Renamed
t.start()
t.join()
# → Running as: Renamed

6) `Thread.daemon` — Daemon threads (守护线程)

<span style="color:#C0392B;font-weight:600">A daemon thread is automatically killed when ALL non-daemon threads exit — it does NOT block program shutdown.</span>

import threading, time

def background_monitor():
    while True:
        print("[Monitor] heartbeat")
        time.sleep(1)

# Must set daemon BEFORE start()
monitor = threading.Thread(target=background_monitor, daemon=True)
monitor.start()

print("Main: doing work")
time.sleep(2.5)
print("Main: exiting — monitor will be killed automatically")
# → [Monitor] heartbeat
# → Main: doing work
# → [Monitor] heartbeat
# → [Monitor] heartbeat
# → Main: exiting — monitor will be killed automatically

7) `Thread.ident` / `Thread.native_id` — Thread identifiers (线程标识符)

import threading

def show_ids():
    t = threading.current_thread()
    print(f"ident={t.ident}, native_id={t.native_id}")

t = threading.Thread(target=show_ids)
t.start()
t.join()
# → ident=140234567890, native_id=12345   (example values; they differ on every run)

print(f"Main ident: {threading.main_thread().ident}")

8) Subclass Pattern — Override `run()` (子类模式)

import threading, time

class DownloadThread(threading.Thread):
    """Custom thread that downloads a resource."""

    def __init__(self, url: str):
        super().__init__(name=f"Download-{url}")
        self.url    = url
        self.result = None

    def run(self):
        # Simulate download
        time.sleep(0.5)
        self.result = f"<html from {self.url}>"
        print(f"Downloaded: {self.url}")

threads = [DownloadThread(f"http://example.com/page{i}") for i in range(3)]

for t in threads:
    t.start()

for t in threads:
    t.join()
    print(f"Result: {t.result}")

2. Lock — Mutual Exclusion (互斥锁)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">Lock (互斥锁)</span> ensures only ONE thread accesses a critical section (临界区) at a time. It has two states: <span style="color:#2980B9">locked</span> and <span style="color:#2980B9">unlocked</span>. </div>

1) `Lock.acquire(blocking=True, timeout=-1)` / `Lock.release()`

import threading

counter = 0
lock    = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        lock.acquire()         # ← blocks until lock is free
        try:
            counter += 1       # critical section (临界区)
        finally:
            lock.release()     # ← always release, even if the body raises

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()

print(f"Counter: {counter}")   # → Counter: 500000  (always correct)

2) Context Manager — `with lock` (上下文管理器)

import threading

shared_list = []
lock = threading.Lock()

def safe_append(value):
    with lock:                     # ← acquire on entry, release on exit (even on exception)
        shared_list.append(value)

threads = [threading.Thread(target=safe_append, args=(i,)) for i in range(10)]
for t in threads: t.start()
for t in threads: t.join()

print(sorted(shared_list))   # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

3) `Lock.acquire(blocking=False)` — Non-blocking try (非阻塞尝试)

import threading, time

lock = threading.Lock()

def try_lock(name):
    acquired = lock.acquire(blocking=False)
    if acquired:
        print(f"[{name}] acquired the lock")
        time.sleep(2)
        lock.release()
    else:
        print(f"[{name}] could not acquire — skipping")

t1 = threading.Thread(target=try_lock, args=("T1",))
t2 = threading.Thread(target=try_lock, args=("T2",))
t1.start(); t2.start()
t1.join();  t2.join()
# → [T1] acquired the lock
# → [T2] could not acquire — skipping

4) `Lock.acquire(timeout=N)` — Timed wait (超时等待)

import threading, time

lock = threading.Lock()
lock.acquire()   # pre-lock it

def worker():
    result = lock.acquire(timeout=1.5)   # wait max 1.5s
    if result:
        print("Got the lock")
        lock.release()
    else:
        print("Timed out waiting for lock")

t = threading.Thread(target=worker)
t.start()
t.join()
# → Timed out waiting for lock   (lock was never released)

5) `Lock.locked()` — Query state (查询状态)

import threading

lock = threading.Lock()
print(lock.locked())   # → False

lock.acquire()
print(lock.locked())   # → True

lock.release()
print(lock.locked())   # → False

3. RLock — Reentrant Lock (可重入锁)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> An <span style="color:#E8600A;font-weight:700">RLock (可重入锁)</span> can be acquired multiple times by the <em>same thread</em> without deadlocking. It tracks an internal <span style="color:#2980B9">recursion count (递归计数)</span>; the lock is released only when the count returns to zero, so every acquire needs a matching release. </div>

1) Basic RLock usage

import threading

rlock = threading.RLock()

def outer():
    with rlock:                   # recursion count → 1
        print("outer acquired")
        inner()                   # same thread acquires again
        print("outer releasing")
    # recursion count → 0 (fully released)

def inner():
    with rlock:                   # recursion count → 2
        print("inner acquired")
    # recursion count → 1

t = threading.Thread(target=outer)
t.start(); t.join()
# → outer acquired
# → inner acquired
# → outer releasing
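
For contrast, the sketch below shows why a plain `Lock` cannot be used this way. It uses a non-blocking second acquire so that the failure is visible instead of hanging; the variable names are illustrative.

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

with plain:
    # A blocking second acquire here would hang forever;
    # the non-blocking attempt simply reports failure.
    plain_again = plain.acquire(blocking=False)

with reentrant:
    reentrant_again = reentrant.acquire(blocking=False)
    if reentrant_again:
        reentrant.release()       # balance the inner acquire

print(plain_again, reentrant_again)   # → False True
```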

2) RLock in a class (类中使用RLock)

import threading

class BankAccount:
    def __init__(self, balance: float):
        self.balance = balance
        self._lock   = threading.RLock()

    def deposit(self, amount: float):
        with self._lock:
            self.balance += amount
            print(f"Deposited {amount:.2f} → balance={self.balance:.2f}")

    def withdraw(self, amount: float):
        with self._lock:
            self.balance -= amount
            print(f"Withdrew  {amount:.2f} → balance={self.balance:.2f}")

    def transfer_in(self, amount: float):
        with self._lock:            # outer acquire
            self.deposit(amount)   # inner acquire (reentrant!)
            print("Transfer complete")

account = BankAccount(1000.0)
t = threading.Thread(target=account.transfer_in, args=(250.0,))
t.start(); t.join()
# → Deposited 250.00 → balance=1250.00
# → Transfer complete

4. Condition — Wait/Notify Pattern (条件变量)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">Condition (条件变量)</span> allows threads to <span style="color:#2980B9">wait</span> for a specific condition to become true and <span style="color:#2980B9">notify</span> other threads when it does. It wraps an underlying lock. </div>

1) `Condition.wait()` / `notify()` / `notify_all()`

import threading, time, collections

# Classic Producer-Consumer (生产者-消费者) pattern
buffer    = collections.deque()
MAX_SIZE  = 3
condition = threading.Condition()

def producer():
    for i in range(6):
        with condition:
            while len(buffer) >= MAX_SIZE:
                print(f"Producer waiting — buffer full")
                condition.wait()           # ← releases lock, blocks
            buffer.append(i)
            print(f"Produced {i}  | buffer={list(buffer)}")
            condition.notify_all()        # ← wake waiting consumers
        time.sleep(0.3)

def consumer(name):
    for _ in range(3):
        with condition:
            while not buffer:
                print(f"[{name}] waiting — buffer empty")
                condition.wait()           # ← releases lock, blocks
            item = buffer.popleft()
            print(f"[{name}] consumed {item} | buffer={list(buffer)}")
            condition.notify_all()        # ← wake waiting producer

threads = [
    threading.Thread(target=producer),
    threading.Thread(target=consumer, args=("C1",)),
    threading.Thread(target=consumer, args=("C2",)),
]
for t in threads: t.start()
for t in threads: t.join()

2) `Condition.wait(timeout=N)` — Timed wait

import threading, time

condition = threading.Condition()
data_ready = False

def waiter():
    with condition:
        result = condition.wait(timeout=2.0)   # wait max 2 seconds
        if result:
            print("Condition met!")
        else:
            print("Timed out — condition never triggered")

def notifier():
    time.sleep(5)   # too slow
    with condition:
        condition.notify()

t1 = threading.Thread(target=waiter)
t2 = threading.Thread(target=notifier)
t1.start(); t2.start()
t1.join();  t2.join()
# → Timed out — condition never triggered

3) `Condition.wait_for(predicate, timeout=None)` — Predicate wait

import threading, time

items  = []
cond   = threading.Condition()

def consumer():
    with cond:
        # Block until at least 3 items are available
        cond.wait_for(lambda: len(items) >= 3)
        print(f"Got items: {items}")

def producer():
    for i in range(5):
        time.sleep(0.5)
        with cond:
            items.append(i)
            print(f"Added item {i}")
            cond.notify_all()

t1 = threading.Thread(target=consumer)
t2 = threading.Thread(target=producer)
t1.start(); t2.start()
t1.join();  t2.join()
# → Added item 0
# → Added item 1
# → Added item 2
# → Got items: [0, 1, 2]
# → Added item 3
# → Added item 4
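
`wait_for` also accepts a timeout and returns the final value of the predicate, so a `False` result means the wait timed out. A minimal sketch, with `late_producer` and the result variables as illustrative names:

```python
import threading
import time

cond = threading.Condition()
items = []

# Case 1: nobody produces, so the predicate stays False and the wait times out.
with cond:
    timed_out_result = cond.wait_for(lambda: len(items) >= 1, timeout=0.2)

def late_producer():
    time.sleep(0.05)
    with cond:
        items.append("x")
        cond.notify()

# Case 2: a producer satisfies the predicate well within the timeout.
t = threading.Thread(target=late_producer)
t.start()
with cond:
    satisfied_result = cond.wait_for(lambda: len(items) >= 1, timeout=2.0)
t.join()

print(timed_out_result, satisfied_result)   # → False True
```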

5. Semaphore & BoundedSemaphore (信号量)

1) `Semaphore(value=1)` — Connection pool simulation (连接池模拟)

import threading, time, random

# Allow max 3 simultaneous DB connections
db_semaphore = threading.Semaphore(3)

def use_db_connection(thread_id):
    print(f"Thread {thread_id}: waiting for DB connection")
    with db_semaphore:                      # acquire (count -1)
        print(f"Thread {thread_id}: got connection")
        time.sleep(random.uniform(0.5, 1.5))
        print(f"Thread {thread_id}: releasing connection")
                                            # release (count +1) on exit

threads = [threading.Thread(target=use_db_connection, args=(i,)) for i in range(7)]
for t in threads: t.start()
for t in threads: t.join()
# At most 3 "got connection" lines active at any time

2) `BoundedSemaphore` — Prevent over-release (防止超额释放)

import threading

sem   = threading.Semaphore(2)
bsem  = threading.BoundedSemaphore(2)

# Plain Semaphore — silently over-releases
sem.release()   # count goes to 3 — no error (潜在bug)
print(f"Semaphore value after over-release: OK (silent)")

# BoundedSemaphore — raises ValueError
try:
    bsem.release()   # count would exceed 2
except ValueError as e:
    print(f"BoundedSemaphore caught: {e}")
# → BoundedSemaphore caught: Semaphore released too many times

3) Rate limiter pattern (限速器模式)

import threading, time

# Limit to 2 concurrent API calls
api_semaphore = threading.BoundedSemaphore(2)

def call_api(endpoint):
    with api_semaphore:
        print(f"Calling {endpoint}")
        time.sleep(1)   # simulate API latency
        print(f"Done    {endpoint}")

endpoints = [f"/api/resource/{i}" for i in range(6)]
threads   = [threading.Thread(target=call_api, args=(ep,)) for ep in endpoints]

for t in threads: t.start()
for t in threads: t.join()

6. Event — Simple Flag Signaling (事件信号)

1) `Event.set()` / `Event.clear()` / `Event.wait()` / `Event.is_set()`

import threading, time

start_event = threading.Event()

def worker(name):
    print(f"[{name}] waiting for start signal...")
    start_event.wait()               # blocks until event is set
    print(f"[{name}] GO! Starting work")

workers = [threading.Thread(target=worker, args=(f"W{i}",)) for i in range(4)]
for w in workers: w.start()

print("Main: preparing...")
time.sleep(2)
print("Main: firing start signal!")
start_event.set()                    # wake ALL waiting threads at once

for w in workers: w.join()
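# Typical output:
# → [W0] waiting for start signal...
# → [W1] waiting for start signal...
# → [W2] waiting for start signal...
# → [W3] waiting for start signal...
# → Main: preparing...
# (2s pause)
# → Main: firing start signal!
# → [W0] GO! Starting work    (all 4 workers unblock and print; order may vary)
# → [W1] GO! Starting work
# → [W2] GO! Starting work
# → [W3] GO! Starting work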
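
The example above never calls `clear()`. The single-threaded sketch below walks through the flag's full life cycle; `gate` and `states` are illustrative names.

```python
import threading

gate = threading.Event()
states = [gate.is_set()]                 # False: events start cleared

gate.set()
states.append(gate.is_set())             # True
states.append(gate.wait(timeout=0.1))    # True: returns immediately when set

gate.clear()                             # reset the flag so later waits block again
states.append(gate.is_set())             # False
states.append(gate.wait(timeout=0.1))    # False: timed out

print(states)   # → [False, True, True, False, False]
```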

2) `Event.wait(timeout=N)` — Timed wait

import threading, time

ready = threading.Event()

def service():
    print("Service: initializing (takes 3s)...")
    time.sleep(3)
    ready.set()
    print("Service: ready!")

def client():
    if ready.wait(timeout=1.5):    # only wait 1.5s
        print("Client: connected!")
    else:
        print("Client: service not ready in time, aborting")

t1 = threading.Thread(target=service)
t2 = threading.Thread(target=client)
t1.start(); t2.start()
t1.join();  t2.join()
# → Service: initializing (takes 3s)...
# → Client: service not ready in time, aborting
# → Service: ready!

3) Stop signal pattern (停止信号模式)

import threading, time

stop_event = threading.Event()

def background_worker():
    count = 0
    while not stop_event.is_set():    # check flag each iteration
        print(f"Working... iteration {count}")
        count += 1
        time.sleep(0.5)
    print("Worker: received stop signal, exiting cleanly")

t = threading.Thread(target=background_worker)
t.start()

time.sleep(2)
print("Main: sending stop signal")
stop_event.set()
t.join()

7. Timer — Delayed Execution (延迟执行)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">threading.Timer</code> is a subclass of <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Thread</code> that executes a function after a specified delay. It can be <span style="color:#2980B9">cancelled</span> before firing. </div>

1) Basic Timer

import threading

def reminder(message):
    print(f"⏰ Reminder: {message}")

# Fire after 3 seconds
t = threading.Timer(3.0, reminder, args=("Meeting at 3pm!",))
t.start()

print("Timer set. Waiting...")
t.join()
# → Timer set. Waiting...
# (3s pause)
# → ⏰ Reminder: Meeting at 3pm!

2) `Timer.cancel()` — Cancel before firing

import threading, time

fired = False

def action():
    global fired
    fired = True
    print("Action fired!")

t = threading.Timer(5.0, action)
t.start()

time.sleep(1)
t.cancel()    # ← cancel within the window
t.join()

print(f"Action fired: {fired}")   # → Action fired: False

3) Repeating timer pattern (重复定时器模式)

import threading

class RepeatingTimer:
    """Fires a function every `interval` seconds."""

    def __init__(self, interval: float, func, *args):
        self.interval = interval
        self.func     = func
        self.args     = args
        self._timer   = None
        self._running = False

    def _run(self):
        self.func(*self.args)
        if self._running:
            self._schedule()

    def _schedule(self):
        self._timer = threading.Timer(self.interval, self._run)
        self._timer.daemon = True
        self._timer.start()

    def start(self):
        self._running = True
        self._schedule()

    def stop(self):
        self._running = False
        if self._timer:
            self._timer.cancel()

import time

counter = [0]
def tick():
    counter[0] += 1
    print(f"Tick #{counter[0]}")

rt = RepeatingTimer(0.5, tick)
rt.start()
time.sleep(2.5)
rt.stop()
print(f"Total ticks: {counter[0]}")   # → 4 or 5 (the 5th tick is due right at 2.5s, so the count is timing-dependent)

8. Barrier — Thread Synchronization Point (屏障同步点)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> A <span style="color:#E8600A;font-weight:700">Barrier (屏障)</span> makes a fixed number of threads wait at a rendezvous point until ALL of them arrive — then releases all of them simultaneously. </div>

1) `Barrier(parties, action=None, timeout=None)`

import threading, time, random

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)

def phase_worker(name):
    # Phase 1
    duration = random.uniform(0.5, 2.0)
    print(f"[{name}] phase 1 working for {duration:.1f}s")
    time.sleep(duration)
    print(f"[{name}] phase 1 done — waiting at barrier")

    barrier.wait()     # ← all threads block here until all 4 arrive

    print(f"[{name}] phase 2 starting (all threads released together)")

threads = [threading.Thread(target=phase_worker, args=(f"W{i}",))
           for i in range(NUM_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()

2) `Barrier` with `action` callback

import threading, time

def setup_phase():
    """Runs ONCE when all threads reach the barrier, before release."""
    print(">>> All threads ready — running barrier action <<<")

barrier = threading.Barrier(3, action=setup_phase)

def worker(name):
    time.sleep(0.1)
    print(f"[{name}] arrived at barrier")
    barrier.wait()
    print(f"[{name}] past barrier")

threads = [threading.Thread(target=worker, args=(f"T{i}",)) for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()

3) `Barrier.abort()` / `BrokenBarrierError`

import threading, time

barrier = threading.Barrier(3)

def risky_worker(name, should_abort):
    try:
        if should_abort:
            time.sleep(0.2)
            print(f"[{name}] aborting barrier!")
            barrier.abort()          # breaks the barrier for everyone
        else:
            print(f"[{name}] waiting at barrier...")
            barrier.wait(timeout=2)
            print(f"[{name}] passed!")
    except threading.BrokenBarrierError:
        print(f"[{name}] barrier was broken — handling gracefully")

threads = [
    threading.Thread(target=risky_worker, args=("T0", False)),
    threading.Thread(target=risky_worker, args=("T1", False)),
    threading.Thread(target=risky_worker, args=("T2", True)),   # aborts
]
for t in threads: t.start()
for t in threads: t.join()

4) Barrier properties

import threading

b = threading.Barrier(5)
print(b.parties)    # → 5   (total threads needed)
print(b.n_waiting)  # → 0   (currently waiting)
print(b.broken)     # → False
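
`Barrier.wait()` also returns an integer in `range(parties)`, different for each thread, which can be used to pick one thread for follow-up work. A minimal sketch, with `guard`, `indices`, and `leaders` as illustrative names:

```python
import threading

PARTIES = 3
barrier = threading.Barrier(PARTIES)
guard = threading.Lock()
indices = []
leaders = []

def worker(name):
    idx = barrier.wait(timeout=5)     # int in range(PARTIES), different for each thread
    with guard:
        indices.append(idx)
        if idx == 0:
            leaders.append(name)      # exactly one thread per cycle sees 0

threads = [threading.Thread(target=worker, args=(f"W{i}",)) for i in range(PARTIES)]
for t in threads: t.start()
for t in threads: t.join()

print(sorted(indices))   # → [0, 1, 2]
print(len(leaders))      # → 1
```

Which worker ends up with index 0 is not specified, so the code should not depend on a particular name being the leader.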

9. local — Thread-local Storage (线程本地存储)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">threading.local()</code> creates an object where each thread has its <span style="color:#E8600A;font-weight:700">own independent copy</span> of every attribute. Ideal for thread-specific state like database connections or request contexts. </div>

1) Basic thread-local usage

import threading
import time

local_data = threading.local()

def worker(value):
    local_data.x = value              # each thread sets its own .x
    time.sleep(0.1)                   # let other threads run
    print(f"Thread {threading.current_thread().name}: x = {local_data.x}")

threads = [threading.Thread(target=worker, args=(i*10,), name=f"T{i}")
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
# → Thread T0: x = 0
# → Thread T1: x = 10
# → Thread T2: x = 20
# → Thread T3: x = 30
# (line order may vary; each thread sees only its own value, with no interference)

2) Thread-local DB connection pattern

import threading
import sqlite3

_local = threading.local()

def get_connection(db_path: str) -> sqlite3.Connection:
    """Return a per-thread DB connection (创建线程私有数据库连接)."""
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(db_path)
        print(f"[{threading.current_thread().name}] created new connection")
    return _local.conn

def db_worker(db_path: str):
    conn = get_connection(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS t (v INTEGER)")
    conn.execute("INSERT INTO t VALUES (?)", (threading.get_ident(),))
    conn.commit()
    print(f"[{threading.current_thread().name}] inserted row")

threads = [threading.Thread(target=db_worker, args=(":memory:",), name=f"DB-{i}")
           for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()

3) Subclass `local` for initialization

import threading
import time

class RequestContext(threading.local):
    """Thread-local request context with defaults."""
    def __init__(self):
        super().__init__()
        self.user_id   = None
        self.request_id = None

ctx = RequestContext()

def handle_request(user_id, req_id):
    ctx.user_id    = user_id
    ctx.request_id = req_id
    time.sleep(0.05)
    print(f"Processing request {ctx.request_id} for user {ctx.user_id}")

threads = [threading.Thread(target=handle_request, args=(f"user{i}", f"req-{i:03}"))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()

10. Module-level Functions (模块级函数)

1) `threading.current_thread()` — Get the current thread

import threading

def show_self():
    t = threading.current_thread()
    print(f"name={t.name}, ident={t.ident}, daemon={t.daemon}")

main_t = threading.current_thread()
print(f"Main thread: {main_t.name}")

t = threading.Thread(target=show_self, name="MyWorker")
t.start(); t.join()
# → Main thread: MainThread
# → name=MyWorker, ident=140..., daemon=False

2) `threading.main_thread()` — Get the main thread

import threading

def check_main():
    mt = threading.main_thread()
    ct = threading.current_thread()
    print(f"Main thread: {mt.name}")
    print(f"This thread: {ct.name}")
    print(f"Am I main?  {ct is mt}")

t = threading.Thread(target=check_main)
t.start(); t.join()
# → Main thread: MainThread
# → This thread: Thread-1 (check_main)   (Python 3.10+; plain "Thread-1" on older versions)
# → Am I main?  False

3) `threading.active_count()` — Count live threads

import threading, time

def slow():
    time.sleep(2)

print(threading.active_count())   # → 1  (main only)

threads = [threading.Thread(target=slow) for _ in range(3)]
for t in threads: t.start()

print(threading.active_count())   # → 4  (main + 3 workers)
for t in threads: t.join()
print(threading.active_count())   # → 1

4) `threading.enumerate()` — List all live threads

import threading, time

def task(n):
    time.sleep(n)

threads = [threading.Thread(target=task, args=(i,), name=f"T{i}") for i in range(1,4)]
for t in threads: t.start()

for t in threading.enumerate():
    print(f"  alive: {t.name} | daemon={t.daemon}")
# →   alive: MainThread | daemon=False
# →   alive: T1 | daemon=False
# →   alive: T2 | daemon=False
# →   alive: T3 | daemon=False

for t in threads: t.join()

5) `threading.settrace(func)` / `threading.setprofile(func)` — Thread hooks

import threading

def my_tracer(frame, event, arg):
    if event == "call":
        print(f"[TRACE] calling {frame.f_code.co_name}")
    return my_tracer

def task():
    x = 1 + 1
    return x

threading.settrace(my_tracer)    # set trace for ALL future threads
t = threading.Thread(target=task)
t.start(); t.join()
threading.settrace(None)         # remove tracer

6) `threading.stack_size(size=0)` — Set thread stack size

import threading

# Set stack size to 512 KB for all future threads
threading.stack_size(512 * 1024)
print(f"Stack size: {threading.stack_size()} bytes")

def task():
    print("Running with custom stack size")

t = threading.Thread(target=task)
t.start(); t.join()

threading.stack_size(0)   # reset to default

7) `threading.excepthook` — Handle uncaught thread exceptions (未捕获异常处理)

import threading

def custom_excepthook(args):
    print(f"Uncaught exception in thread [{args.thread.name}]:")
    print(f"  Type:    {args.exc_type.__name__}")
    print(f"  Message: {args.exc_value}")

threading.excepthook = custom_excepthook

def buggy_task():
    raise ValueError("Something went wrong in thread!")

t = threading.Thread(target=buggy_task, name="BuggyThread")
t.start(); t.join()
# → Uncaught exception in thread [BuggyThread]:
# →   Type:    ValueError
# →   Message: Something went wrong in thread!

8) `threading.get_ident()` / `threading.get_native_id()`

import threading

def show_ids():
    print(f"Python ident:    {threading.get_ident()}")
    print(f"OS native id:    {threading.get_native_id()}")

t = threading.Thread(target=show_ids)
t.start(); t.join()

11. queue Module — Thread-safe Queues (线程安全队列)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> The <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">queue</code> module provides thread-safe queue classes; the three covered here are <span style="color:#E8600A;font-weight:700">Queue (FIFO)</span>, <span style="color:#E8600A;font-weight:700">LifoQueue (LIFO/stack)</span>, and <span style="color:#E8600A;font-weight:700">PriorityQueue (优先队列)</span>. All use internal locks, so individual put() and get() calls need no external synchronization. Compound sequences such as "check empty(), then get()" are not atomic when several threads consume from the same queue. </div>

1) `Queue(maxsize=0)` — FIFO Queue

from queue import Queue
import threading, time

q = Queue(maxsize=3)

def producer():
    for i in range(6):
        q.put(i)          # blocks if queue is full (maxsize reached)
        print(f"Put {i}  | qsize={q.qsize()}")
        time.sleep(0.2)

def consumer():
    for _ in range(6):
        item = q.get()    # blocks if queue is empty
        print(f"Got {item}")
        q.task_done()
        time.sleep(0.5)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join();  t2.join()

2) `Queue.put_nowait()` / `Queue.get_nowait()` — Non-blocking

from queue import Queue, Full, Empty

q = Queue(maxsize=2)
q.put("item1")
q.put("item2")

try:
    q.put_nowait("item3")     # queue full!
except Full:
    print("Queue full — item3 dropped")

try:
    while True:
        print(q.get_nowait())
except Empty:
    print("Queue emptied")
# → Queue full — item3 dropped
# → item1
# → item2
# → Queue emptied

3) `Queue.join()` / `Queue.task_done()` — Work tracking

from queue import Queue
import threading

work_queue = Queue()

def worker():
    while True:
        task = work_queue.get()
        if task is None:
            break
        print(f"Processing: {task}")
        work_queue.task_done()   # signal this task is complete

# Start 3 workers
workers = [threading.Thread(target=worker, daemon=True) for _ in range(3)]
for w in workers: w.start()

# Enqueue tasks
for task in ["task_A", "task_B", "task_C", "task_D", "task_E"]:
    work_queue.put(task)

work_queue.join()   # blocks until ALL task_done() called
print("All tasks completed!")
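The worker above checks for a `None` sentinel, but the example never enqueues one, so the daemon workers are simply abandoned when the program exits. A minimal sketch of an explicit shutdown, assuming one sentinel per worker and non-daemon threads (the names `NUM_WORKERS` and `processed` are illustrative, not part of the original example):

```python
from queue import Queue
import threading

NUM_WORKERS = 3            # illustrative pool size
work_queue = Queue()
processed = []             # list.append is atomic in CPython

def worker():
    while True:
        task = work_queue.get()
        try:
            if task is None:          # sentinel: stop this worker
                break
            processed.append(task)
        finally:
            work_queue.task_done()    # acknowledge sentinels too, so join() can return

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers: w.start()

for task in ["task_A", "task_B", "task_C", "task_D", "task_E"]:
    work_queue.put(task)
for _ in workers:
    work_queue.put(None)              # one sentinel per worker

work_queue.join()                     # every item and sentinel acknowledged
for w in workers: w.join()            # all workers have left their loops
print(sorted(processed))
# → ['task_A', 'task_B', 'task_C', 'task_D', 'task_E']
```

Because the queue is FIFO and the sentinels are enqueued last, all real tasks are taken before any worker sees `None`.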

4) `LifoQueue` — Stack (栈/后进先出)

from queue import LifoQueue

stack = LifoQueue()
stack.put("first")
stack.put("second")
stack.put("third")

while not stack.empty():
    print(stack.get())
# → third
# → second
# → first

5) `PriorityQueue` — Priority-based processing (优先级队列)

from queue import PriorityQueue
import threading, time

pq = PriorityQueue()

# (priority, task_name) — lower number = higher priority
pq.put((3, "low-priority task"))
pq.put((1, "URGENT task"))
pq.put((2, "medium task"))
pq.put((1, "another URGENT task"))

while not pq.empty():
    priority, task = pq.get()
    print(f"[priority={priority}] Processing: {task}")
# → [priority=1] Processing: URGENT task
# → [priority=1] Processing: another URGENT task
# → [priority=2] Processing: medium task
# → [priority=3] Processing: low-priority task
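In the example above, the two priority-1 entries are ordered by comparing their task names as strings. If the payload is not comparable (a dict, for instance), a tie on priority raises `TypeError`. One common workaround, sketched here under the assumption that insertion order is an acceptable tie-breaker, is to add a sequence number between priority and payload (`put_task` and `seq` are hypothetical helper names):

```python
from queue import PriorityQueue
from itertools import count

pq = PriorityQueue()
seq = count()   # monotonically increasing tie-breaker

def put_task(priority, payload):
    # (priority, sequence, payload): the sequence number is unique,
    # so the payload itself is never compared
    pq.put((priority, next(seq), payload))

put_task(2, {"job": "medium"})
put_task(1, {"job": "urgent-1"})
put_task(1, {"job": "urgent-2"})

order = []
while not pq.empty():
    priority, _, payload = pq.get()
    order.append(payload["job"])
print(order)   # → ['urgent-1', 'urgent-2', 'medium']
```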

12. ThreadPoolExecutor — High-level Thread Pool (高级线程池)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">concurrent.futures.ThreadPoolExecutor</code> provides a high-level, <span style="color:#2980B9">Future-based (Future对象)</span> interface for thread pools. It is the <span style="color:#E8600A;font-weight:700">recommended way</span> to run IO-bound tasks in modern Python. </div>

1) `submit()` → Future

from concurrent.futures import ThreadPoolExecutor
import time

def fetch_data(url: str) -> str:
    time.sleep(1)   # simulate network call
    return f"<data from {url}>"

urls = [f"http://example.com/page{i}" for i in range(5)]

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fetch_data, url) for url in urls]

    for future in futures:
        result = future.result()   # blocks until this future completes
        print(result)

2) `map()` — Parallel map (并行映射)

from concurrent.futures import ThreadPoolExecutor
import time

def square(n):
    time.sleep(0.2)
    return n * n

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(square, range(10)))

print(results)   # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

3) `Future` API — `done()`, `cancel()`, `add_done_callback()`

from concurrent.futures import ThreadPoolExecutor
import time

def slow_task(n):
    time.sleep(n)
    return f"result-{n}"

def on_done(future):
    print(f"Callback: task finished → {future.result()}")

with ThreadPoolExecutor(max_workers=2) as executor:
    f1 = executor.submit(slow_task, 1)
    f2 = executor.submit(slow_task, 2)

    f1.add_done_callback(on_done)    # register callback
    f2.add_done_callback(on_done)

    print(f"f1 done: {f1.done()}")   # likely False (still running)
    time.sleep(1.5)
    print(f"f1 done: {f1.done()}")   # → True
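The example above exercises `done()` and `add_done_callback()` but not `cancel()`. The sketch below illustrates that `cancel()` only succeeds for a future that has not started running. It assumes a single-worker pool so the second task is guaranteed to be queued; the two `Event` objects (`started`, `release`) exist only to make the timing deterministic.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

started = threading.Event()   # set once the first task is running
release = threading.Event()   # lets the first task finish

def blocking_task():
    started.set()
    release.wait(timeout=5)   # timeout is only a safety net
    return "first"

def quick_task():
    return "second"

with ThreadPoolExecutor(max_workers=1) as executor:
    f1 = executor.submit(blocking_task)
    f2 = executor.submit(quick_task)   # queued behind f1 (single worker)

    started.wait(timeout=5)
    f1_cancelled = f1.cancel()   # False: already running
    f2_cancelled = f2.cancel()   # True: still pending
    release.set()

print(f1_cancelled, f2_cancelled)    # → False True
print(f1.result(), f2.cancelled())   # → first True
```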

4) `as_completed()` — Process in completion order (按完成顺序处理)

from concurrent.futures import ThreadPoolExecutor, as_completed
import time, random

def task(n):
    delay = random.uniform(0.1, 1.0)
    time.sleep(delay)
    return (n, delay)

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(task, i): i for i in range(8)}

    for future in as_completed(futures):
        task_id = futures[future]          # look up which submission this was
        _, delay = future.result()
        print(f"Task {task_id} finished in {delay:.2f}s")
# Tasks print in the order they complete, not submission order

5) Exception handling in futures (Future异常处理)

from concurrent.futures import ThreadPoolExecutor

def risky(x):
    if x == 3:
        raise ValueError(f"Bad input: {x}")
    return x * 2

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(risky, i) for i in range(5)]

for i, f in enumerate(futures):
    try:
        print(f"Result {i}: {f.result()}")
    except ValueError as e:
        print(f"Result {i}: ERROR — {e}")
# → Result 0: 0
# → Result 1: 2
# → Result 2: 4
# → Result 3: ERROR — Bad input: 3
# → Result 4: 8

13. Common Patterns & Pitfalls (常见模式与陷阱)

1) Race condition example (竞态条件示例)

import threading

counter = 0   # UNSAFE shared state

def unsafe_increment():
    global counter
    for _ in range(100_000):
        counter += 1   # NOT atomic! (read-modify-write)

threads = [threading.Thread(target=unsafe_increment) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()

print("Expected: 500000")
print(f"Actual:   {counter}")   # may be LESS than 500000 (data race); the exact result depends on Python version and timing
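For contrast, a minimal sketch of the same workload with the read-modify-write step protected by a `Lock`. A dict is used instead of a global only to keep the snippet self-contained; `totals` and `counter_lock` are illustrative names.

```python
import threading

totals = {"counter": 0}            # shared state
counter_lock = threading.Lock()    # guards every update to totals

def safe_increment():
    for _ in range(100_000):
        with counter_lock:         # read-modify-write is now one critical section
            totals["counter"] += 1

threads = [threading.Thread(target=safe_increment) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()

print(totals["counter"])   # → 500000
```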

2) Deadlock example + fix (死锁示例及修复)

import threading, time

lock_a = threading.Lock()
lock_b = threading.Lock()

# ─── DEADLOCK version (shown for contrast, not started below) ───
def thread1_deadlock():
    with lock_a:
        time.sleep(0.1)
        with lock_b:                  # waits for lock_b
            print("T1: got both locks")

def thread2_deadlock():
    with lock_b:
        time.sleep(0.1)
        with lock_a:                  # waits for lock_a → DEADLOCK
            print("T2: got both locks")

# ─── FIXED version: always acquire locks in the same order ──
def thread1_safe():
    with lock_a:                      # acquire A first
        with lock_b:                  # then B
            print("T1 safe: got both locks")

def thread2_safe():
    with lock_a:                      # acquire A first (same order!)
        with lock_b:
            print("T2 safe: got both locks")

t1 = threading.Thread(target=thread1_safe)
t2 = threading.Thread(target=thread2_safe)
t1.start(); t2.start()
t1.join();  t2.join()
# → T1 safe: got both locks
# → T2 safe: got both locks

3) Thread-safe singleton (线程安全单例)

import threading

class Singleton:
    _instance = None
    _lock     = threading.Lock()

    def __new__(cls):
        if cls._instance is None:              # first check (no lock)
            with cls._lock:
                if cls._instance is None:      # second check (with lock)
                    cls._instance = super().__new__(cls)
                    print("Singleton created")
        return cls._instance

def get_instance():
    s = Singleton()
    print(f"Got instance: {id(s)}")

threads = [threading.Thread(target=get_instance) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
# → Singleton created   (exactly once)
# → Got instance: 140...  (same id for all 5 threads)

14. Full API Quick Reference (API速查表)

Class / Function	Key Methods	Purpose
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Thread</code>	`start()` `join()` `is_alive()`	Create and manage threads
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Lock</code>	`acquire()` `release()` `locked()`	Mutual exclusion
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">RLock</code>	`acquire()` `release()`	Reentrant mutual exclusion
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Condition</code>	`wait()` `wait_for()` `notify()` `notify_all()`	Wait/notify synchronization
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Semaphore</code>	`acquire()` `release()`	Limit concurrent access
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">BoundedSemaphore</code>	`acquire()` `release()`	Semaphore with over-release guard
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event</code>	`set()` `clear()` `wait()` `is_set()`	Boolean flag signaling
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Timer</code>	`start()` `cancel()`	Delayed / cancellable execution
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Barrier</code>	`wait()` `abort()` `reset()`	N-thread rendezvous point
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">local</code>	attribute access	Per-thread storage
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Queue</code>	`put()` `get()` `task_done()` `join()`	Thread-safe FIFO queue
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">LifoQueue</code>	`put()` `get()`	Thread-safe stack
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">PriorityQueue</code>	`put()` `get()`	Thread-safe priority queue
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ThreadPoolExecutor</code>	`submit()` `map()` `shutdown()`	High-level thread pool
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">current_thread()</code>	—	Get current Thread object
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">active_count()</code>	—	Count live threads
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">enumerate()</code>	—	List all live threads
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">excepthook</code>	—	Handle uncaught thread exceptions

<div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br> Python threading excels at <span style="color:#E8600A;font-weight:700">IO-bound concurrency (IO密集型并发)</span>: use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ThreadPoolExecutor</code> for simple task pools, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Queue</code> for producer-consumer pipelines, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Lock</code>/<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">RLock</code> for shared state, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event</code> for signaling, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Semaphore</code> for resource pools, and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Barrier</code> for multi-phase synchronization — always protect shared mutable state to avoid <span style="color:#C0392B;font-weight:600">Race Conditions (竞态条件)</span> and <span style="color:#C0392B;font-weight:600">Deadlocks (死锁)</span>. </div>

II. When to Use Each API — Scenario Decision Guide (使用场景决策指南)

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> Choosing the wrong synchronization primitive is a common source of bugs, deadlocks, and poor performance. This chapter maps every threading API to its <span style="color:#E8600A;font-weight:700">concrete real-world scenarios</span>, explains the <span style="color:#2980B9">decision logic</span> behind each choice, and provides a final <span style="color:#E8600A;font-weight:700">Decision Flowchart (决策流程图)</span> for quick lookup. </div>

1. Thread — When to create raw threads (何时创建原始线程)

1) ✅ Use `Thread` directly when

<span style="color:#E8600A">1.</span> You need full lifecycle control — start, monitor, join at precise moments. <span style="color:#E8600A">2.</span> The thread has long-running, stateful logic best expressed as a class with run(). <span style="color:#E8600A">3.</span> You need to store a result on the thread object itself (self.result = ...). <span style="color:#E8600A">4.</span> You're building a daemon background service (heartbeat, log flusher, monitor).

# ✅ Scenario: long-lived stateful background service
import threading, time

class HeartbeatThread(threading.Thread):
    """Sends periodic heartbeats to a server."""
    def __init__(self, server_url, interval=5):
        super().__init__(daemon=True, name="Heartbeat")
        self.server_url  = server_url
        self.interval    = interval
        # Not named _stop: Thread uses that name internally in some
        # CPython versions, and shadowing it breaks join()
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            print(f"[Heartbeat] ping → {self.server_url}")
            self._stop_event.wait(self.interval)   # returns early when stop() is called

    def stop(self):
        self._stop_event.set()

hb = HeartbeatThread("http://api.example.com/health")
hb.start()
time.sleep(12)
hb.stop()
hb.join()

2) ❌ Do NOT use raw `Thread` when

<span style="color:#C0392B;font-weight:600">× You just need to run many short tasks in parallel → use ThreadPoolExecutor instead.</span> <span style="color:#C0392B;font-weight:600">× You need return values from many tasks → Future.result() is cleaner than t.result.</span> <span style="color:#C0392B;font-weight:600">× You need CPU parallelism → use multiprocessing (in standard CPython the GIL lets only one thread execute Python bytecode at a time).</span>
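As a rough illustration of the first two points, collecting return values from many short tasks takes a few lines with a pool and needs no custom `Thread` subclass. This is a sketch only; `word_count` and `docs` are hypothetical stand-ins for real work.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(text):      # hypothetical short task
    return len(text.split())

docs = ["a b c", "hello world", "one"]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() returns results in input order, regardless of completion order
    counts = list(executor.map(word_count, docs))

print(counts)   # → [3, 2, 1]
```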

3) `daemon=True` — Specifically when

Use daemon threads for tasks that should not keep the program alive if the main thread exits:

Scenario	`daemon=True`	`daemon=False`
Background log flusher	✅	—
Health monitor / watchdog	✅	—
Worker that must finish	—	✅
DB write that must commit	—	✅

# ✅ Scenario: log flusher that should die with the app
import threading, time

log_buffer = []

def flush_logs():
    while True:
        if log_buffer:
            print(f"[Flush] writing {len(log_buffer)} log entries")
            log_buffer.clear()
        time.sleep(1)

flusher = threading.Thread(target=flush_logs, daemon=True)
flusher.start()

# Main thread does work, flusher auto-dies when main exits
for i in range(5):
    log_buffer.append(f"event-{i}")
    time.sleep(0.5)
print("Main done — flusher daemon killed automatically")

2. Lock — When to use mutual exclusion (何时使用互斥锁)

1) ✅ Use `Lock` when

<span style="color:#E8600A">1.</span> Multiple threads read AND write the same variable / data structure. <span style="color:#E8600A">2.</span> An operation that looks atomic is actually read-modify-write (e.g. counter += 1). <span style="color:#E8600A">3.</span> You're updating a shared list, dict, or custom object. <span style="color:#E8600A">4.</span> You need to protect a file write or database update.

# ✅ Scenario: shared bank account balance — MUST use Lock
import threading, time

class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock   = threading.Lock()

    def withdraw(self, amount):
        with self._lock:                  # critical section
            if self.balance >= amount:
                time.sleep(0.001)         # simulate DB latency
                self.balance -= amount
                return True
            return False

acc     = Account(1000)
results = []

def try_withdraw():
    results.append(acc.withdraw(100))

threads = [threading.Thread(target=try_withdraw) for _ in range(20)]
for t in threads: t.start()
for t in threads: t.join()

print(f"Balance: {acc.balance}")         # always ≥ 0
print(f"Successful: {results.count(True)}")

2) ❌ Do NOT use `Lock` when

<span style="color:#C0392B;font-weight:600">× The same thread needs to acquire the lock twice → use RLock instead (plain Lock deadlocks).</span> <span style="color:#C0392B;font-weight:600">× You need to wait for a condition, not just exclusive access → use Condition.</span> <span style="color:#C0392B;font-weight:600">× You only need to limit concurrency to N > 1 → use Semaphore.</span>
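The first point can be observed directly. The sketch below uses a short `timeout` so that the failed second acquire on a plain `Lock` returns `False` instead of hanging forever; the variable names are illustrative.

```python
import threading

lock = threading.Lock()
rlock = threading.RLock()

with lock:
    # A second blocking acquire here would hang forever;
    # the timeout makes the failure visible instead.
    lock_reacquired = lock.acquire(timeout=0.1)

with rlock:
    rlock_reacquired = rlock.acquire(timeout=0.1)   # same thread: succeeds
    if rlock_reacquired:
        rlock.release()                             # balance the inner acquire

print(lock_reacquired, rlock_reacquired)   # → False True
```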

3) Scenario matrix (场景矩阵)

Situation	Correct primitive
1 thread at a time, non-reentrant	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Lock</code>
1 thread at a time, same thread re-acquires	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">RLock</code>
N threads at a time	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Semaphore(N)</code>
Wait until data is ready	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Condition</code>
One-time go signal	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event</code>

3. RLock — When re-entrancy is needed (何时需要可重入锁)

1) ✅ Use `RLock` when

<span style="color:#E8600A">1.</span> A method holding the lock calls another method that also acquires the same lock. <span style="color:#E8600A">2.</span> You're building a class with multiple synchronized methods that call each other. <span style="color:#E8600A">3.</span> You have recursive algorithms that need locking at each level.

# ✅ Scenario: tree traversal where every node shares the same lock
import threading

class SafeTree:
    _lock = threading.RLock()   # one lock shared by all nodes

    def __init__(self, value, children=None):
        self.value    = value
        self.children = children or []

    def sum_values(self):
        with self._lock:                        # acquire (depth 1)
            total = self.value
            for child in self.children:
                total += child.sum_values()     # same lock re-acquired (depth 2+)
            return total

tree = SafeTree(1, [SafeTree(2), SafeTree(3, [SafeTree(4)])])
t = threading.Thread(target=lambda: print(f"Sum: {tree.sum_values()}"))
t.start(); t.join()
# → Sum: 10

2) ❌ Do NOT use `RLock` when

<span style="color:#C0392B;font-weight:600">× Methods don't call each other — a plain Lock has slightly lower overhead.</span> <span style="color:#C0392B;font-weight:600">× You want to detect accidental re-entry as a bug — Lock will surface it as a deadlock.</span>

4. Condition — When threads must wait for state changes (何时等待状态变化)

1) ✅ Use `Condition` when

<span style="color:#E8600A">1.</span> One thread must wait until another thread changes some data (not just unlocks). <span style="color:#E8600A">2.</span> Implementing producer-consumer patterns with a bounded buffer. <span style="color:#E8600A">3.</span> Threads need to coordinate in phases — e.g., "wait until queue has ≥ 3 items". <span style="color:#E8600A">4.</span> You need selective wakeup — notify only one waiter vs. all waiters.

# ✅ Scenario: order fulfillment system
# Orders must wait until inventory is restocked
import threading, time, collections

inventory = collections.defaultdict(int)
cond      = threading.Condition()

def restock_worker():
    items = [("apple", 50), ("banana", 30), ("cherry", 20)]
    for item, qty in items:
        time.sleep(1)
        with cond:
            inventory[item] += qty
            print(f"[Restock] {item} +{qty} → total={inventory[item]}")
            cond.notify_all()   # wake all waiting orders

def process_order(order_id, item, qty):
    with cond:
        cond.wait_for(lambda: inventory[item] >= qty)  # wait for stock
        inventory[item] -= qty
        print(f"[Order {order_id}] filled {qty}x {item} → remaining={inventory[item]}")

threads = [
    threading.Thread(target=restock_worker),
    threading.Thread(target=process_order, args=(1, "apple",  20)),
    threading.Thread(target=process_order, args=(2, "banana", 15)),
    threading.Thread(target=process_order, args=(3, "apple",  30)),   # 20 + 30 = 50, so both apple orders can be filled
]
for t in threads: t.start()
for t in threads: t.join()

2) `notify()` vs `notify_all()` — When to use which

Situation	Use
Only one consumer can act (e.g. one slot freed)	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">notify()</code>
All consumers might now be able to proceed	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">notify_all()</code>
You added multiple items to the buffer at once	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">notify_all()</code>
Only one thread is waiting (guaranteed)	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">notify()</code>

3) ❌ Do NOT use `Condition` when

<span style="color:#C0392B;font-weight:600">× You just need a one-time signal → use Event (simpler API).</span> <span style="color:#C0392B;font-weight:600">× The data flowing between threads is the signal → use Queue (built-in blocking).</span>

5. Semaphore — When limiting concurrent access (何时限制并发访问数量)

1) ✅ Use `Semaphore` when

<span style="color:#E8600A">1.</span> You have a resource pool with a fixed capacity: DB connections, HTTP clients, file handles. <span style="color:#E8600A">2.</span> You need rate limiting — at most N concurrent API calls. <span style="color:#E8600A">3.</span> Implementing a thread pool from scratch (though ThreadPoolExecutor is preferred). <span style="color:#E8600A">4.</span> A resource requires N permits to use (e.g. a GPU with N memory slots).

# ✅ Scenario: limit concurrent external API calls to avoid 429 Too Many Requests
import threading, time, random

MAX_CONCURRENT = 3
api_semaphore  = threading.BoundedSemaphore(MAX_CONCURRENT)

def call_external_api(request_id):
    print(f"[Req {request_id}] queued")
    with api_semaphore:
        print(f"[Req {request_id}] calling API...")
        time.sleep(random.uniform(0.5, 1.5))   # simulate API latency
        print(f"[Req {request_id}] done")

# Simulate 10 concurrent requests — only 3 run at once
threads = [threading.Thread(target=call_external_api, args=(i,)) for i in range(10)]
for t in threads: t.start()
for t in threads: t.join()

2) `Semaphore` vs `BoundedSemaphore` — When to use which

Situation	Use
Resource pool (connection pool, thread pool)	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">BoundedSemaphore</code> — prevents logic bugs
Signaling between threads (producer increments)	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Semaphore</code> — counter can exceed initial
You want a runtime error on over-release	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">BoundedSemaphore</code>
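The difference in the table can be seen with a deliberate over-release. This is a small sketch; the initial value of 2 is arbitrary.

```python
import threading

plain = threading.Semaphore(2)
bounded = threading.BoundedSemaphore(2)

plain.release()   # no matching acquire: the counter silently becomes 3
extra_permits = 0
while plain.acquire(blocking=False):   # drain the permits to count them
    extra_permits += 1

try:
    bounded.release()   # would exceed the initial value of 2
    over_release_detected = False
except ValueError:
    over_release_detected = True

print(extra_permits, over_release_detected)   # → 3 True
```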

3) ❌ Do NOT use `Semaphore` when

<span style="color:#C0392B;font-weight:600">× You only need to allow 1 thread at a time → use Lock (clearer intent).</span> <span style="color:#C0392B;font-weight:600">× You need workers to process tasks from a queue → use ThreadPoolExecutor.</span>

6. Event — When broadcasting a one-time signal (何时广播一次性信号)

1) ✅ Use `Event` when

<span style="color:#E8600A">1.</span> One thread needs to signal multiple waiting threads simultaneously (broadcast). <span style="color:#E8600A">2.</span> Implementing a start gun — all workers blocked until a "ready" signal fires. <span style="color:#E8600A">3.</span> A graceful shutdown flag — workers poll stop_event.is_set() each iteration. <span style="color:#E8600A">4.</span> A service readiness probe — clients wait until the server is initialized. <span style="color:#E8600A">5.</span> One-shot notifications where the flag stays set permanently after firing.

# ✅ Scenario: web server workers wait for config to load before serving
import threading, time

config_loaded = threading.Event()
config        = {}

def load_config():
    print("[Config] loading from database...")
    time.sleep(2)
    config.update({"host": "0.0.0.0", "port": 8080, "debug": False})
    print("[Config] loaded!")
    config_loaded.set()           # broadcast to ALL waiting workers

def request_handler(worker_id):
    config_loaded.wait()          # block until config ready
    print(f"[Worker {worker_id}] serving on {config['host']}:{config['port']}")

threads = (
    [threading.Thread(target=load_config)] +
    [threading.Thread(target=request_handler, args=(i,)) for i in range(5)]
)
for t in threads: t.start()
for t in threads: t.join()

2) `Event` vs `Condition` — Decision rule

Question	Answer → Use
Signal multiple threads with a permanent flag?	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event</code>
Wait for a data condition that can change repeatedly?	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Condition</code>
Need to reset and re-arm the signal?	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event.clear()</code> or <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Condition</code>
One producer, many consumers woken at once?	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event</code>

3) ❌ Do NOT use `Event` when

<span style="color:#C0392B;font-weight:600">× The condition can be true/false multiple times (e.g. buffer empty↔full) → use Condition.</span> <span style="color:#C0392B;font-weight:600">× You're passing data along with the signal → use Queue.</span>

7. Timer — When delaying or scheduling execution (何时延迟或定时执行)

1) ✅ Use `Timer` when

<span style="color:#E8600A">1.</span> You need to run a function once after a delay, without blocking the main thread. <span style="color:#E8600A">2.</span> The action might need to be cancelled before it fires (e.g. debouncing). <span style="color:#E8600A">3.</span> Implementing timeouts for external operations. <span style="color:#E8600A">4.</span> Session expiry, cache invalidation, or auto-logout after inactivity.

# ✅ Scenario: debounce user input — only save after 500ms of inactivity
import threading

_save_timer = None

def debounced_save(content):
    global _save_timer
    if _save_timer:
        _save_timer.cancel()     # cancel previous pending save
    _save_timer = threading.Timer(0.5, do_save, args=(content,))
    _save_timer.start()

def do_save(content):
    print(f"[Save] writing: '{content}'")

# Rapid keystrokes — only the last one saves
import time
debounced_save("H")
debounced_save("He")
debounced_save("Hel")
debounced_save("Hell")
time.sleep(0.1)
debounced_save("Hello")
time.sleep(0.8)
# → [Save] writing: 'Hello'   (only once, after 500ms of quiet)

2) ❌ Do NOT use `Timer` when

<span style="color:#C0392B;font-weight:600">× You need recurring execution → build a RepeatingTimer (see §7.3 in Part I) or use sched.</span> <span style="color:#C0392B;font-weight:600">× You need sub-millisecond precision: Timer waits on an internal Event with a timeout, so its accuracy depends on OS scheduling.</span> <span style="color:#C0392B;font-weight:600">× Complex scheduling (cron-like) → use APScheduler or Celery.</span>
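For the recurring case, one possible shape using the standard-library `sched` module is sketched below. `run_repeating` is a hypothetical helper, not an existing API, and it blocks the calling thread until all runs are done; wrap it in a `Thread` if the caller must stay responsive.

```python
import sched, time

def run_repeating(action, interval, times):
    """Call action() `times` times, `interval` seconds apart (blocking)."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    remaining = times

    def tick():
        nonlocal remaining
        action()
        remaining -= 1
        if remaining > 0:
            scheduler.enter(interval, 1, tick)   # re-arm for the next run

    scheduler.enter(interval, 1, tick)
    scheduler.run()   # blocks until no events remain

ticks = []
run_repeating(lambda: ticks.append(time.monotonic()), interval=0.05, times=3)
print(len(ticks))   # → 3
```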

8. Barrier — When threads must synchronize at a checkpoint (何时需要检查点同步)

1) ✅ Use `Barrier` when

<span style="color:#E8600A">1.</span> A computation has multiple phases and ALL threads must finish phase N before ANY starts phase N+1. <span style="color:#E8600A">2.</span> Parallel simulation — each timestep must complete across all worker threads before advancing. <span style="color:#E8600A">3.</span> Test synchronization — ensure all threads reach a certain point before asserting results. <span style="color:#E8600A">4.</span> Coordinated startup — all services initialized before traffic is allowed.

# ✅ Scenario: parallel matrix computation with two phases
import threading, time, random

NUM_WORKERS = 4
barrier     = threading.Barrier(NUM_WORKERS)
partial_results = [0] * NUM_WORKERS
final_results   = [0] * NUM_WORKERS

def compute_worker(worker_id):
    # ── Phase 1: independent computation ──────────────
    time.sleep(random.uniform(0.3, 1.2))
    partial_results[worker_id] = random.randint(10, 100)
    print(f"[W{worker_id}] Phase 1 done: partial={partial_results[worker_id]}")

    barrier.wait()   # ← ALL workers must finish phase 1 before phase 2

    # ── Phase 2: needs ALL phase-1 results ────────────
    # e.g., normalize by global sum
    total = sum(partial_results)
    final_results[worker_id] = partial_results[worker_id] / total
    print(f"[W{worker_id}] Phase 2 done: final={final_results[worker_id]:.3f}")

threads = [threading.Thread(target=compute_worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()

print(f"\nFinal results: {[f'{r:.3f}' for r in final_results]}")
print(f"Sum check: {sum(final_results):.6f}")   # → ~1.0

2) ❌ Do NOT use `Barrier` when

<span style="color:#C0392B;font-weight:600">× Thread count is dynamic (unknown at creation time) — Barrier requires a fixed parties count.</span> <span style="color:#C0392B;font-weight:600">× Only one thread needs to wait for others → use Thread.join() or Event.</span> <span style="color:#C0392B;font-weight:600">× Threads have different roles (not symmetric) → use Condition or Queue.</span>

9. threading.local — When isolating per-thread state (何时隔离线程私有状态)

1) ✅ Use `threading.local` when

<span style="color:#E8600A">1.</span> Each thread needs its own copy of a connection (DB, HTTP session, file handle). <span style="color:#E8600A">2.</span> You're building middleware or frameworks that attach request context per thread. <span style="color:#E8600A">3.</span> A global-looking variable must actually be thread-specific (e.g., current user, transaction ID). <span style="color:#E8600A">4.</span> Avoiding lock contention by giving each thread its own cache.

# ✅ Scenario: per-thread HTTP session (connection pooling per thread)
import threading
import urllib.request

_local = threading.local()

def get_session():
    """Return a thread-local opener — no lock needed, no sharing."""
    if not hasattr(_local, "opener"):
        _local.opener = urllib.request.build_opener()
        print(f"[{threading.current_thread().name}] created new HTTP session")
    return _local.opener

def fetch(url):
    session = get_session()     # each thread gets its own
    # session.open(url) ...
    print(f"[{threading.current_thread().name}] fetching {url}")

threads = [threading.Thread(target=fetch, args=(f"http://example.com/{i}",),
                            name=f"Fetcher-{i}") for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
# Each thread creates exactly one session — no contention, no sharing

2) ❌ Do NOT use `threading.local` when

<span style="color:#C0392B;font-weight:600">× Threads need to share and pass data to each other → use Queue or shared objects with Lock.</span>
<span style="color:#C0392B;font-weight:600">× Tasks run on a ThreadPoolExecutor: worker threads are reused, so local state left by an earlier task may persist unexpectedly.</span>

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">In a thread pool, worker threads are reused across tasks. If you use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">threading.local</code> inside a pool, <strong>always initialize the local value at the start of each task</strong>, not just on first access — otherwise task 2 on the same thread will see task 1's leftover state.</span></div>
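The leak described in the note can be reproduced with a single-worker pool. This is a small sketch; `_ctx`, `leaky_task`, and `safe_task` are hypothetical names chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_ctx = threading.local()

def leaky_task(user=None):
    # Sets the value only when one is supplied, so stale state can show through.
    if user is not None:
        _ctx.user = user
    return getattr(_ctx, "user", None)

def safe_task(user=None):
    # Always (re)initialize the thread-local value at the start of the task.
    _ctx.user = user
    return _ctx.user

# max_workers=1 forces every task onto the same reused worker thread.
with ThreadPoolExecutor(max_workers=1) as pool:
    leaky = [pool.submit(leaky_task, "alice").result(),
             pool.submit(leaky_task).result()]
    safe = [pool.submit(safe_task, "alice").result(),
            pool.submit(safe_task).result()]

print(leaky)  # ['alice', 'alice']: the second task saw the first task's value
print(safe)   # ['alice', None]
```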

10. Queue / LifoQueue / PriorityQueue — When passing data between threads (何时在线程间传递数据)

1) ✅ Use `Queue` when

<span style="color:#E8600A">1.</span> Implementing producer-consumer patterns: the queue itself provides the synchronization.
<span style="color:#E8600A">2.</span> Work items need to be processed in order (FIFO).
<span style="color:#E8600A">3.</span> You want backpressure: producers block when the buffer is full (maxsize).
<span style="color:#E8600A">4.</span> You need work completion tracking via task_done() + join().

# ✅ Scenario: image processing pipeline
# Loader threads → Queue → Processor threads → Queue → Writer threads
import threading, time, queue

raw_queue       = queue.Queue(maxsize=10)
processed_queue = queue.Queue(maxsize=10)

def loader(n_images):
    for i in range(n_images):
        time.sleep(0.1)
        raw_queue.put(f"image_{i:03}.jpg")
        print(f"[Loader] queued image_{i:03}.jpg")
    raw_queue.put(None)   # sentinel (哨兵值)

def processor():
    while True:
        item = raw_queue.get()
        if item is None:
            processed_queue.put(None)
            raw_queue.task_done()
            break
        result = f"processed_{item}"
        time.sleep(0.2)   # simulate processing
        processed_queue.put(result)
        raw_queue.task_done()

def writer():
    while True:
        item = processed_queue.get()
        if item is None:
            processed_queue.task_done()
            break
        print(f"[Writer] saved {item}")
        processed_queue.task_done()

threads = [
    threading.Thread(target=loader,    args=(5,)),
    threading.Thread(target=processor),
    threading.Thread(target=writer),
]
for t in threads: t.start()
for t in threads: t.join()
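The pipeline above runs one thread per stage, so a single `None` sentinel is enough to shut each stage down. With several consumers on one queue, a common approach is to enqueue one sentinel per consumer. The sketch below assumes a hypothetical pool of three workers and a trivial "double the item" task.

```python
import queue
import threading

NUM_WORKERS = 3                    # hypothetical pool size for this sketch
tasks = queue.Queue()
done = []
done_lock = threading.Lock()

def worker():
    while True:
        item = tasks.get()
        try:
            if item is None:       # sentinel: this worker is finished
                return
            with done_lock:
                done.append(item * 2)
        finally:
            tasks.task_done()      # runs exactly once per get(), even on return

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

for i in range(10):
    tasks.put(i)
for _ in workers:                  # one sentinel per worker
    tasks.put(None)

tasks.join()                       # blocks until every item has been task_done()
for w in workers:
    w.join()

print(sorted(done))
```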

2) ✅ Use `LifoQueue` when

<span style="color:#E8600A">1.</span> The most recently added tasks are more likely to be cache-warm or relevant.
<span style="color:#E8600A">2.</span> Implementing depth-first search with worker threads.
<span style="color:#E8600A">3.</span> Worker threads process undo stacks or rollback operations.
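As a small illustration of the depth-first case, the sketch below walks a toy tree with a `LifoQueue` used as a stack. It runs on a single thread to keep the order deterministic; the `tree` dictionary is made up for this example.

```python
import queue

tree = {                      # hypothetical toy tree for this sketch
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
}

stack = queue.LifoQueue()
stack.put("root")
visited = []

while not stack.empty():
    node = stack.get()               # most recently added node comes out first
    visited.append(node)
    for child in tree.get(node, []):
        stack.put(child)
    stack.task_done()

print(visited)  # ['root', 'b', 'b1', 'a', 'a2', 'a1']
```

With several worker threads, `empty()` is no longer a reliable stop condition; sentinels or `get(timeout=...)` are the usual alternatives.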

3) ✅ Use `PriorityQueue` when

<span style="color:#E8600A">1.</span> Tasks have different urgency levels: critical tasks jump ahead of lower-priority ones.
<span style="color:#E8600A">2.</span> Implementing a task scheduler with priority (e.g., real-time vs. batch jobs).
<span style="color:#E8600A">3.</span> Retry logic: failed tasks are re-enqueued with higher priority.

# ✅ Scenario: multi-tier job scheduler
import threading, queue, time

job_queue = queue.PriorityQueue()

# Priority levels (优先级级别)
CRITICAL = 1
HIGH     = 2
NORMAL   = 3
BATCH    = 4

def scheduler():
    while True:
        try:
            priority, job_id, task = job_queue.get(timeout=2)
            print(f"[Scheduler] running [{['','CRITICAL','HIGH','NORMAL','BATCH'][priority]}] {job_id}")
            task()
            job_queue.task_done()
        except queue.Empty:
            print("[Scheduler] no more jobs")
            break

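# Submit jobs in arbitrary order.
# Tuples compare element by element, so job_id must be unique within a priority;
# otherwise the comparison falls through to the lambdas and raises TypeError.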
job_queue.put((NORMAL,   "job-001", lambda: time.sleep(0.1)))
job_queue.put((BATCH,    "job-002", lambda: time.sleep(0.1)))
job_queue.put((CRITICAL, "job-003", lambda: time.sleep(0.1)))
job_queue.put((HIGH,     "job-004", lambda: time.sleep(0.1)))
job_queue.put((NORMAL,   "job-005", lambda: time.sleep(0.1)))

t = threading.Thread(target=scheduler)
t.start(); t.join()
# Always runs: CRITICAL → HIGH → NORMAL → NORMAL → BATCH
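The tuple entries above only sort safely while every `(priority, job_id)` pair is unique. One way to remove that constraint, following the pattern shown in the standard library's `queue` documentation, is to wrap entries in a dataclass that excludes the payload from comparison and adds a sequence number as a tie-breaker. `Job`, `submit`, and `_seq` are hypothetical names for this sketch.

```python
import itertools
import queue
from dataclasses import dataclass, field
from typing import Callable

_seq = itertools.count()          # monotonically increasing tie-breaker

@dataclass(order=True)
class Job:
    priority: int
    seq: int
    job_id: str = field(compare=False)                 # ignored when ordering
    task: Callable[[], None] = field(compare=False)    # ignored when ordering

jobs = queue.PriorityQueue()

def submit(priority, job_id, task):
    jobs.put(Job(priority, next(_seq), job_id, task))

submit(3, "same-id", lambda: None)
submit(1, "urgent", lambda: None)
submit(3, "same-id", lambda: None)   # same priority and same id: still safe

order = []
while not jobs.empty():
    job = jobs.get()
    job.task()
    order.append((job.priority, job.seq, job.job_id))
    jobs.task_done()

print(order)  # [(1, 1, 'urgent'), (3, 0, 'same-id'), (3, 2, 'same-id')]
```

The sequence number also keeps equal-priority jobs in submission order.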

11. ThreadPoolExecutor — When managing a pool of workers (何时使用线程池)

1) ✅ Use `ThreadPoolExecutor` when

<span style="color:#E8600A">1.</span> Running many short-to-medium IO-bound tasks concurrently (HTTP, DB, file IO).
<span style="color:#E8600A">2.</span> You need return values from concurrent tasks without manual thread management.
<span style="color:#E8600A">3.</span> Applying the same function to many inputs in parallel (executor.map).
<span style="color:#E8600A">4.</span> You want automatic thread lifecycle management (creation, recycling, shutdown).

# ✅ Scenario: fetch multiple URLs concurrently, collect all results
from concurrent.futures import ThreadPoolExecutor, as_completed
import time, random

def fetch_url(url):
    """Simulate network fetch with random latency."""
    latency = random.uniform(0.2, 1.5)
    time.sleep(latency)
    if "broken" in url:
        raise ConnectionError(f"Failed to connect to {url}")
    return {"url": url, "status": 200, "latency": round(latency, 3)}

urls = [
    "https://api.example.com/users",
    "https://api.example.com/products",
    "https://api.example.com/broken-endpoint",
    "https://api.example.com/orders",
    "https://api.example.com/inventory",
]

print("Starting concurrent fetches...\n")
with ThreadPoolExecutor(max_workers=4) as executor:
    future_to_url = {executor.submit(fetch_url, url): url for url in urls}

    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            result = future.result()
            print(f"✅ {result['url']:<40} latency={result['latency']}s")
        except ConnectionError as e:
            print(f"❌ {url:<40} ERROR: {e}")

2) `submit()` vs `map()` — When to use which

Situation	Use
Need individual `Future` objects for callbacks/cancellation	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">submit()</code>
Simple parallel map, results in submission order	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">map()</code>
Process results as they complete (not submission order)	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">as_completed()</code>
Mixed inputs with different argument structures	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">submit()</code>
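The ordering difference in the table can be seen with a short sketch. `slow_square` is a hypothetical task whose larger inputs finish sooner, so completion order tends to differ from submission order.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_square(n):
    # Larger inputs sleep less, so they usually complete first.
    time.sleep((4 - n) * 0.1)
    return n * n

inputs = [1, 2, 3]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map(): results are yielded in submission order, whatever finishes first.
    mapped = list(pool.map(slow_square, inputs))

    # submit() + as_completed(): results arrive in completion order.
    futures = [pool.submit(slow_square, n) for n in inputs]
    completed = [f.result() for f in as_completed(futures)]

print(mapped)      # [1, 4, 9]
print(completed)   # usually [9, 4, 1]; the exact order depends on timing
```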

3) ❌ Do NOT use `ThreadPoolExecutor` when

<span style="color:#C0392B;font-weight:600">× CPU-bound tasks (image processing, ML inference) → the GIL keeps threads from running Python bytecode in parallel; use ProcessPoolExecutor instead.</span>
<span style="color:#C0392B;font-weight:600">× Tasks need complex inter-thread communication → combine with Queue.</span>
<span style="color:#C0392B;font-weight:600">× Thousands of very short tasks (< 1ms) → per-task scheduling overhead dominates; batch the work, or use asyncio if the tasks are IO-bound.</span>

12. Master Decision Flowchart (总决策流程图)

START: I need concurrent execution
│
├─ CPU-bound (math computation, compression, ML)?
│   └─ YES → use multiprocessing.Process or ProcessPoolExecutor
│
└─ IO-bound (network, file, database)?
    │
    ├─ Simple: run N tasks, collect results
    │   └─ use ThreadPoolExecutor.submit() / map()
    │
    ├─ Complex: need fine-grained control
    │   │
    │   ├─ Tasks need to exchange data?
    │   │   └─ use Queue (FIFO) / LifoQueue / PriorityQueue
    │   │
    │   ├─ Need to protect shared state?
    │   │   ├─ One thread at a time, non-reentrant → Lock
    │   │   ├─ One thread at a time, reentrant    → RLock
    │   │   └─ N threads at a time                → Semaphore(N)
    │   │
    │   ├─ Need to wait for a condition?
    │   │   ├─ One-time broadcast signal → Event
    │   │   └─ Repeated condition change → Condition
    │   │
    │   ├─ Need all threads to reach a point?
    │   │   └─ Barrier(N)
    │   │
    │   ├─ Need per-thread private state?
    │   │   └─ threading.local()
    │   │
    │   ├─ Need delayed / cancellable execution?
    │   │   └─ Timer
    │   │
    │   └─ Long-lived background service?
    │       └─ Thread(daemon=True) + Event (stop signal)
    │
    └─ Very high concurrency (1000s of tasks)?
        └─ use asyncio + aiohttp (not threading)
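The "long-lived background service" branch of the flowchart can be sketched as follows. The `heartbeat` function and the 0.05 s interval are illustrative assumptions; the point is that `Event.wait()` doubles as an interruptible sleep.

```python
import threading
import time

stop = threading.Event()
beats = []

def heartbeat(interval=0.05):
    # wait() returns False on timeout (keep looping) and True once stop is set.
    while not stop.wait(interval):
        beats.append("tick")

t = threading.Thread(target=heartbeat, daemon=True, name="heartbeat")
t.start()

time.sleep(0.2)      # stand-in for the main program doing real work
stop.set()           # request a graceful shutdown
t.join(timeout=1)

print(f"alive={t.is_alive()}, beats={len(beats)}")
```

`daemon=True` only guarantees the thread will not keep the process alive; the `Event` is what allows a clean exit.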

13. Real-world Scenario → API Mapping (真实场景 → API 映射)

Real-world scenario (真实场景)	API to use
Fetch 100 URLs in parallel	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ThreadPoolExecutor</code>
Download pipeline: fetch → parse → store	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Queue</code> (3-stage pipeline)
Shared counter incremented by many threads	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Lock</code>
Class method calls another synchronized method	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">RLock</code>
Workers wait for DB to be populated	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Condition.wait_for()</code>
5 workers start simultaneously (race simulation)	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Barrier</code>
Max 3 concurrent DB connections	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">BoundedSemaphore(3)</code>
Server "ready" signal to all request handlers	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event</code>
Graceful shutdown of background worker	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event</code> (stop flag)
Auto-logout after 30min inactivity	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Timer</code> + `cancel()` on activity
Debounce save-to-disk on rapid edits	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Timer</code> + `cancel()`
Per-thread DB connection (no sharing)	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">threading.local()</code>
Critical tasks skip the line	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">PriorityQueue</code>
Undo stack processed by worker thread	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">LifoQueue</code>
Parallel phases: all workers finish step 1 first	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Barrier</code>
Background heartbeat / health monitor	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Thread(daemon=True)</code>
LRU cache with thread-safe eviction	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Lock</code> + <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">OrderedDict</code>
Rate-limit outgoing API requests	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">BoundedSemaphore</code>
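The "Debounce save-to-disk on rapid edits" row can be sketched with `Timer` and `cancel()`. `Debouncer` is a hypothetical helper written for this example, not a standard library class.

```python
import threading
import time

class Debouncer:
    """Run `action` once, `delay` seconds after the most recent trigger()."""

    def __init__(self, delay, action):
        self._delay = delay
        self._action = action
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()          # drop the pending run
            self._timer = threading.Timer(self._delay, self._action)
            self._timer.daemon = True
            self._timer.start()

saves = []
debounced_save = Debouncer(0.2, lambda: saves.append("saved"))

for _ in range(5):            # five rapid edits, 10 ms apart
    debounced_save.trigger()
    time.sleep(0.01)

time.sleep(0.5)               # wait past the quiet period
print(saves)                  # ['saved']
```

Each `trigger()` cancels the timer started by the previous one, so only the last edit in a burst results in a save.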

<div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway for Part II</span><br> The decision rule is simple: <span style="color:#2980B9"><strong>data flows between threads</strong></span> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Queue</code> | <span style="color:#2980B9"><strong>shared state needs protection</strong></span> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Lock</code>/<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">RLock</code> | <span style="color:#2980B9"><strong>wait for a condition</strong></span> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event</code> or <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Condition</code> | <span style="color:#2980B9"><strong>limit concurrency</strong></span> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Semaphore</code> | <span style="color:#2980B9"><strong>just run N tasks</strong></span> → <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ThreadPoolExecutor</code>. </div>

Transformer Code

Sun, 08 Mar 2026 00:00:00 GMT

I. Transformer — Complete Learning Handbook

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> The <strong>Transformer (变换器)</strong> is the foundational architecture behind virtually all modern large language models, including GPT, BERT, T5, and LLaMA. Introduced in <em>"Attention Is All You Need"</em> (Vaswani et al., 2017), it replaces recurrence with <strong>Self-Attention (自注意力机制)</strong>, which lets every position attend to every other position directly. This enables fully parallel training over a sequence and captures long-range dependencies without passing information through a long recurrent chain. This handbook covers every component from first principles and ends with complete, runnable training and inference code. </div>

1. Architecture Overview (架构总览)

A standard Encoder-Decoder Transformer (编码器-解码器变换器) consists of:

Input Tokens
     ↓
[Token Embedding + Positional Encoding]
     ↓
┌─────────────────────────────────┐
│  Encoder (编码器)  × N layers    │
│  ┌──────────────────────────┐   │
│  │ Multi-Head Self-Attention│   │
│  │ Add & Norm               │   │
│  │ Feed-Forward Network     │   │
│  │ Add & Norm               │   │
│  └──────────────────────────┘   │
└─────────────────────────────────┘
     ↓  (encoder output = memory)
┌─────────────────────────────────┐
│  Decoder (解码器)  × N layers    │
│  ┌──────────────────────────┐   │
│  │ Masked Self-Attention    │   │
│  │ Add & Norm               │   │
│  │ Cross-Attention          │   │
│  │ Add & Norm               │   │
│  │ Feed-Forward Network     │   │
│  │ Add & Norm               │   │
│  └──────────────────────────┘   │
└─────────────────────────────────┘
     ↓
Linear + Softmax → Output Probabilities

2. Scaled Dot-Product Attention (缩放点积注意力)

1) The Formula

The core attention operation:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

<span style="color:#E8600A;font-weight:700">Q (Query, 查询)</span>: What is each token looking for?
<span style="color:#E8600A;font-weight:700">K (Key, 键)</span>: How can this token be found by others?
<span style="color:#E8600A;font-weight:700">V (Value, 值)</span>: What does this token actually offer?
<span style="color:#E8600A;font-weight:700">$\sqrt{d_k}$ (scaling factor, 缩放因子)</span>: Prevents softmax saturation when $d_k$ is large
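A small worked example may make the role of the scaling factor concrete. The three raw scores below are made-up dot products for a head with $d_k = 64$; the sketch uses only the standard library rather than PyTorch.

```python
import math

def softmax(xs):
    m = max(xs)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

d_k = 64
raw_scores = [8.0, 16.0, 24.0]               # hypothetical q·k dot products
scaled_scores = [s / math.sqrt(d_k) for s in raw_scores]   # [1.0, 2.0, 3.0]

print([round(p, 4) for p in softmax(raw_scores)])      # [0.0, 0.0003, 0.9997]
print([round(p, 4) for p in softmax(scaled_scores)])   # [0.09, 0.2447, 0.6652]
```

Without scaling, almost all of the weight lands on one key and the softmax gradient for the others is close to zero. After dividing by $\sqrt{d_k}$ the distribution stays soft enough to train.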

2) Implementation

import torch
import torch.nn as nn
import torch.nn.functional as F
import math

def scaled_dot_product_attention(
    Q: torch.Tensor,   # (batch, heads, seq_q, d_k)
    K: torch.Tensor,   # (batch, heads, seq_k, d_k)
    V: torch.Tensor,   # (batch, heads, seq_k, d_v)
    mask: torch.Tensor = None,
) -> tuple[torch.Tensor, torch.Tensor]:
    d_k = Q.size(-1)

    # Step 1: Compute attention scores
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    # scores shape: (batch, heads, seq_q, seq_k)

    # Step 2: Apply mask (set -inf so softmax → 0)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))

    # Step 3: Softmax over key dimension
    attn_weights = F.softmax(scores, dim=-1)   # (batch, heads, seq_q, seq_k)

    # Step 4: Weighted sum of values
    output = torch.matmul(attn_weights, V)     # (batch, heads, seq_q, d_v)

    return output, attn_weights

3. Multi-Head Attention (多头注意力)

1) Motivation

A single attention head can only attend to one "subspace" at a time. Multi-Head Attention (多头注意力) runs $h$ attention heads in parallel, each learning to focus on different aspects (syntax, semantics, coreference, etc.), then concatenates and projects the results.

$$\text{MultiHead}(Q,K,V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h) W^O$$

$$\text{head}_i = \text{Attention}(QW_i^Q,\ KW_i^K,\ VW_i^V)$$
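The dimension bookkeeping behind these formulas is simple arithmetic. The sketch below uses $d_{model} = 512$ and $h = 8$ (the base configuration in the original paper) and omits bias terms.

```python
d_model, num_heads = 512, 8
d_k = d_model // num_heads            # dimensions per head

# Each head has W_i^Q, W_i^K, W_i^V, each mapping d_model -> d_k.
per_head_params = 3 * d_model * d_k
all_heads_params = num_heads * per_head_params
output_proj_params = d_model * d_model        # W^O maps the concatenation back to d_model
total = all_heads_params + output_proj_params

print(d_k)     # 64
print(total)   # 1048576, i.e. 4 * d_model**2
```

Because $h \cdot d_k = d_{model}$, the $h$ per-head projections can be stored as one $d_{model} \times d_{model}$ matrix per Q, K, and V and then reshaped into heads, so the total cost does not grow with the number of heads.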

2) Implementation

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"

        self.d_model = d_model
        self.num_heads = num_heads
        self.d_k = d_model // num_heads   # Dimension per head

        # Linear projections for Q, K, V, and output
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)
        self.W_o = nn.Linear(d_model, d_model, bias=False)

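        # NOTE: not applied in forward() below; the enclosing encoder/decoder
        # layer applies dropout to this module's output instead.
        self.dropout = nn.Dropout(dropout)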

    def split_heads(self, x: torch.Tensor) -> torch.Tensor:
        """(batch, seq, d_model) → (batch, heads, seq, d_k)"""
        batch, seq, _ = x.size()
        x = x.view(batch, seq, self.num_heads, self.d_k)
        return x.transpose(1, 2)   # (batch, heads, seq, d_k)

    def forward(
        self,
        query: torch.Tensor,    # (batch, seq_q, d_model)
        key: torch.Tensor,      # (batch, seq_k, d_model)
        value: torch.Tensor,    # (batch, seq_k, d_model)
        mask: torch.Tensor = None,
    ) -> torch.Tensor:
        # Project inputs to Q, K, V
        Q = self.split_heads(self.W_q(query))   # (batch, heads, seq_q, d_k)
        K = self.split_heads(self.W_k(key))     # (batch, heads, seq_k, d_k)
        V = self.split_heads(self.W_v(value))   # (batch, heads, seq_k, d_k)

        # Scaled dot-product attention
        attn_output, _ = scaled_dot_product_attention(Q, K, V, mask)
        # attn_output: (batch, heads, seq_q, d_k)

        # Concatenate heads
        batch, _, seq_q, _ = attn_output.size()
        attn_output = attn_output.transpose(1, 2).contiguous()
        attn_output = attn_output.view(batch, seq_q, self.d_model)
        # attn_output: (batch, seq_q, d_model)

        # Final linear projection
        return self.W_o(attn_output)

4. Position-wise Feed-Forward Network (逐位置前馈网络)

Applied independently to each position — acts as a two-layer MLP (多层感知机) with an inner expansion:

$$\text{FFN}(x) = \max(0,\ xW_1 + b_1) W_2 + b_2$$

class FeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        # Standard expansion: d_ff = 4 * d_model
        self.linear1 = nn.Linear(d_model, d_ff)
        self.linear2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        x = self.linear1(x)       # (batch, seq, d_ff)
        x = F.relu(x)             # ReLU activation (or GELU in modern variants)
        x = self.dropout(x)
        x = self.linear2(x)       # (batch, seq, d_model)
        return x

5. Positional Encoding (位置编码)

Since Transformers have no recurrence, positional information must be injected explicitly. The original paper uses sinusoidal encoding (正弦编码):

$$PE_{(pos, 2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

$$PE_{(pos, 2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

class PositionalEncoding(nn.Module):
    def __init__(self, d_model: int, max_seq_len: int = 5000, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)

        # Build the positional encoding table once
        pe = torch.zeros(max_seq_len, d_model)                    # (max_len, d_model)
        position = torch.arange(0, max_seq_len).unsqueeze(1)      # (max_len, 1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )

        pe[:, 0::2] = torch.sin(position * div_term)   # Even indices
        pe[:, 1::2] = torch.cos(position * div_term)   # Odd indices
        pe = pe.unsqueeze(0)                            # (1, max_len, d_model)

        # Register as buffer (not a parameter — not updated during training)
        self.register_buffer('pe', pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        x = x + self.pe[:, :x.size(1)]   # Add positional encoding
        return self.dropout(x)
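The sinusoidal formula can be checked by hand for a tiny model. This standard-library sketch evaluates it directly for $d_{model} = 4$; `sinusoidal_pe` is a hypothetical helper and assumes an even `d_model`.

```python
import math

def sinusoidal_pe(pos, d_model):
    """Return the d_model-dimensional encoding for one position (d_model even)."""
    pe = []
    for i in range(0, d_model, 2):               # i plays the role of 2i in the formula
        angle = pos / (10000 ** (i / d_model))
        pe.append(math.sin(angle))               # even index
        pe.append(math.cos(angle))               # odd index
    return pe

print([round(v, 5) for v in sinusoidal_pe(0, 4)])   # [0.0, 1.0, 0.0, 1.0]
print([round(v, 5) for v in sinusoidal_pe(1, 4)])   # [0.84147, 0.5403, 0.01, 0.99995]
```

The first pair of dimensions oscillates quickly with position and the second pair slowly (wavelength scaled by $10000^{2/4} = 100$), which is what lets the encoding separate both nearby and distant positions.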

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Modern models (BERT, RoBERTa, GPT) use <strong>learned positional embeddings (可学习位置嵌入)</strong> instead of fixed sinusoids. Even more recent models (LLaMA, Mistral) use <strong>RoPE (Rotary Position Embedding, 旋转位置编码)</strong> which encodes relative positions directly into the attention computation.</div>

6. Add & Norm — Residual Connection + Layer Normalization

Each sub-layer is wrapped with a residual connection (残差连接) and Layer Normalization (层归一化):

$$\text{LayerNorm}(x + \text{Sublayer}(x))$$

class AddNorm(nn.Module):
    def __init__(self, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, sublayer_output: torch.Tensor) -> torch.Tensor:
        # Post-norm (original paper), implemented here: norm(x + dropout(sublayer(x)))
        # Pre-norm (modern GPT-style) would instead compute: x + dropout(sublayer(norm(x)))
        return self.norm(x + self.dropout(sublayer_output))

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> The original paper uses <strong>Post-LN (后归一化)</strong> — normalize after adding the residual. Modern models (GPT-2, LLaMA) use <strong>Pre-LN (前归一化)</strong> — normalize before the sublayer. Pre-LN is more training-stable and is now the dominant choice.</div>
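The difference between the two orderings is easier to see on a single vector. The sketch below is a toy illustration in plain Python: `layer_norm` has no learned scale or shift, and `sublayer` is a hypothetical stand-in for attention or the feed-forward network.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize one vector to zero mean and (approximately) unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    # Toy stand-in for attention / FFN.
    return [2.0 * v for v in x]

def post_ln(x):
    # Post-LN: LayerNorm(x + Sublayer(x))
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_ln(x):
    # Pre-LN: x + Sublayer(LayerNorm(x)); the residual path is left untouched.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

x = [1.0, 2.0, 3.0, 4.0]
print([round(v, 3) for v in post_ln(x)])
print([round(v, 3) for v in pre_ln(x)])
```

In the Post-LN output the residual is normalized along with everything else, while in the Pre-LN output the original `x` is carried through unchanged. That untouched identity path is the usual explanation for Pre-LN's more stable gradients in deep stacks.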

7. Encoder Layer (编码器层)

class EncoderLayer(nn.Module):
    def __init__(self, d_model: int, num_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads, dropout)
        self.ff        = FeedForward(d_model, d_ff, dropout)
        self.norm1     = nn.LayerNorm(d_model)
        self.norm2     = nn.LayerNorm(d_model)
        self.dropout   = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, src_mask: torch.Tensor = None) -> torch.Tensor:
        # Self-attention + residual + norm
        attn_out = self.self_attn(x, x, x, src_mask)
        x = self.norm1(x + self.dropout(attn_out))

        # Feed-forward + residual + norm
        ff_out = self.ff(x)
        x = self.norm2(x + self.dropout(ff_out))

        return x


class Encoder(nn.Module):
    def __init__(self, num_layers: int, d_model: int, num_heads: int, d_ff: int, dropout: float):
        super().__init__()
        self.layers = nn.ModuleList([
            EncoderLayer(d_model, num_heads, d_ff, dropout)
            for _ in range(num_layers)
        ])
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, src_mask: torch.Tensor = None) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x, src_mask)
        return self.norm(x)

8. Decoder Layer (解码器层)

The decoder has three sub-layers: masked self-attention, cross-attention over encoder output, and feed-forward.

class DecoderLayer(nn.Module):
    def __init__(self, d_model: int, num_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.self_attn  = MultiHeadAttention(d_model, num_heads, dropout)  # Masked
        self.cross_attn = MultiHeadAttention(d_model, num_heads, dropout)  # Cross
        self.ff         = FeedForward(d_model, d_ff, dropout)
        self.norm1      = nn.LayerNorm(d_model)
        self.norm2      = nn.LayerNorm(d_model)
        self.norm3      = nn.LayerNorm(d_model)
        self.dropout    = nn.Dropout(dropout)

    def forward(
        self,
        x: torch.Tensor,           # Decoder input  (batch, tgt_seq, d_model)
        memory: torch.Tensor,      # Encoder output (batch, src_seq, d_model)
        tgt_mask: torch.Tensor = None,   # Causal mask for decoder self-attention
        src_mask: torch.Tensor = None,   # Padding mask for cross-attention
    ) -> torch.Tensor:
        # 1. Masked self-attention (prevents attending to future tokens)
        attn1 = self.self_attn(x, x, x, tgt_mask)
        x = self.norm1(x + self.dropout(attn1))

        # 2. Cross-attention over encoder memory
        attn2 = self.cross_attn(x, memory, memory, src_mask)
        x = self.norm2(x + self.dropout(attn2))

        # 3. Feed-forward
        ff_out = self.ff(x)
        x = self.norm3(x + self.dropout(ff_out))

        return x


class Decoder(nn.Module):
    def __init__(self, num_layers: int, d_model: int, num_heads: int, d_ff: int, dropout: float):
        super().__init__()
        self.layers = nn.ModuleList([
            DecoderLayer(d_model, num_heads, d_ff, dropout)
            for _ in range(num_layers)
        ])
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, memory, tgt_mask=None, src_mask=None):
        for layer in self.layers:
            x = layer(x, memory, tgt_mask, src_mask)
        return self.norm(x)

9. Masks (掩码)

1) Padding Mask (填充掩码)

Prevents attention to `<PAD>` tokens:

def make_pad_mask(seq: torch.Tensor, pad_idx: int = 0) -> torch.Tensor:
    """
    seq: (batch, seq_len) — integer token IDs
    Returns: (batch, 1, 1, seq_len) — True where NOT padding
    """
    return (seq != pad_idx).unsqueeze(1).unsqueeze(2)

2) Causal Mask / Look-ahead Mask (因果掩码)

Prevents decoder positions from attending to future positions:

def make_causal_mask(seq_len: int, device: torch.device) -> torch.Tensor:
    """
    Returns a boolean lower-triangular mask of shape (1, 1, seq_len, seq_len).
    Position i can attend to positions 0..i only.
    """
    # Cast to bool so the mask can be combined with the boolean padding mask via `&`.
    mask = torch.tril(torch.ones(seq_len, seq_len, device=device)).bool()
    return mask.unsqueeze(0).unsqueeze(0)   # (1, 1, seq_len, seq_len)
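To see how the two masks interact, the following plain-Python sketch combines them for one short target sequence. The token IDs are made up; `PAD = 0` matches the `pad_idx` default used above.

```python
PAD = 0
tokens = [5, 7, 9, PAD]          # hypothetical target sequence; last slot is padding
n = len(tokens)

pad_mask = [tok != PAD for tok in tokens]                    # True where a key is a real token
causal = [[j <= i for j in range(n)] for i in range(n)]      # query i may see keys 0..i

# Element-wise AND, broadcasting the padding mask across every query row.
combined = [[causal[i][j] and pad_mask[j] for j in range(n)] for i in range(n)]

for row in combined:
    print("".join("1" if ok else "0" for ok in row))
# 1000
# 1100
# 1110
# 1110
```

Each row is one query position and each column one key position. The last column is zero everywhere because no query may attend to the padded key.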

10. Complete Transformer Model (完整模型)

class Transformer(nn.Module):
    def __init__(
        self,
        src_vocab_size: int,
        tgt_vocab_size: int,
        d_model: int      = 512,
        num_heads: int    = 8,
        num_layers: int   = 6,
        d_ff: int         = 2048,
        max_seq_len: int  = 512,
        dropout: float    = 0.1,
        pad_idx: int      = 0,
    ):
        super().__init__()
        self.pad_idx = pad_idx
        self.d_model = d_model

        # Embeddings
        self.src_embedding = nn.Embedding(src_vocab_size, d_model, padding_idx=pad_idx)
        self.tgt_embedding = nn.Embedding(tgt_vocab_size, d_model, padding_idx=pad_idx)
        self.pos_encoding  = PositionalEncoding(d_model, max_seq_len, dropout)

        # Encoder & Decoder
        self.encoder = Encoder(num_layers, d_model, num_heads, d_ff, dropout)
        self.decoder = Decoder(num_layers, d_model, num_heads, d_ff, dropout)

        # Output projection
        self.fc_out = nn.Linear(d_model, tgt_vocab_size)

        # Weight initialization
        self._init_weights()

    def _init_weights(self):
        for p in self.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p)

    def encode(self, src: torch.Tensor, src_mask: torch.Tensor = None) -> torch.Tensor:
        x = self.pos_encoding(self.src_embedding(src) * math.sqrt(self.d_model))
        return self.encoder(x, src_mask)

    def decode(
        self,
        tgt: torch.Tensor,
        memory: torch.Tensor,
        tgt_mask: torch.Tensor = None,
        src_mask: torch.Tensor = None,
    ) -> torch.Tensor:
        x = self.pos_encoding(self.tgt_embedding(tgt) * math.sqrt(self.d_model))
        return self.decoder(x, memory, tgt_mask, src_mask)

    def forward(
        self,
        src: torch.Tensor,   # (batch, src_len)
        tgt: torch.Tensor,   # (batch, tgt_len)
    ) -> torch.Tensor:
        # Build masks
        src_mask = make_pad_mask(src, self.pad_idx)
        tgt_pad_mask = make_pad_mask(tgt, self.pad_idx)
        tgt_causal   = make_causal_mask(tgt.size(1), tgt.device)
        tgt_mask     = tgt_pad_mask & tgt_causal   # Combine both

        # Forward pass
        memory = self.encode(src, src_mask)
        output = self.decode(tgt, memory, tgt_mask, src_mask)

        # Project to vocabulary
        return self.fc_out(output)   # (batch, tgt_len, tgt_vocab_size)

11. Training (训练)

1) Hyperparameters & Setup

import torch
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

# ---- Hyperparameters ----
SRC_VOCAB  = 8000
TGT_VOCAB  = 8000
D_MODEL    = 256
NUM_HEADS  = 8
NUM_LAYERS = 4
D_FF       = 1024
MAX_LEN    = 128
DROPOUT    = 0.1
PAD_IDX    = 0
BOS_IDX    = 1
EOS_IDX    = 2
BATCH_SIZE = 32
EPOCHS     = 20
LR         = 1e-4
DEVICE     = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ---- Model ----
model = Transformer(
    src_vocab_size=SRC_VOCAB,
    tgt_vocab_size=TGT_VOCAB,
    d_model=D_MODEL,
    num_heads=NUM_HEADS,
    num_layers=NUM_LAYERS,
    d_ff=D_FF,
    max_seq_len=MAX_LEN,
    dropout=DROPOUT,
    pad_idx=PAD_IDX,
).to(DEVICE)

print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")

2) Learning Rate Scheduler — Warmup (学习率预热)

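The original paper uses a custom schedule that increases the learning rate linearly for the first `warmup` steps and then decays it in proportion to the inverse square root of the step number:

$$lr = d_{model}^{-0.5} \cdot \min(\text{step}^{-0.5},\ \text{step} \cdot \text{warmup}^{-1.5})$$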

class WarmupScheduler:
    def __init__(self, optimizer, d_model: int, warmup_steps: int = 4000):
        self.optimizer = optimizer
        self.d_model = d_model
        self.warmup_steps = warmup_steps
        self.step_num = 0

    def step(self):
        self.step_num += 1
        lr = self.d_model ** (-0.5) * min(
            self.step_num ** (-0.5),
            self.step_num * self.warmup_steps ** (-1.5)
        )
        for param_group in self.optimizer.param_groups:
            param_group['lr'] = lr

# lr=0 is only a placeholder: WarmupScheduler.step() overwrites it on every call.
optimizer  = optim.Adam(model.parameters(), lr=0, betas=(0.9, 0.98), eps=1e-9)
scheduler  = WarmupScheduler(optimizer, d_model=D_MODEL, warmup_steps=4000)
criterion  = nn.CrossEntropyLoss(ignore_index=PAD_IDX, label_smoothing=0.1)
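To get a feel for the numbers this schedule produces, the formula can be evaluated directly. `warmup_lr` is a hypothetical standalone helper that mirrors `WarmupScheduler.step()` with the `d_model=256` and `warmup_steps=4000` settings used above.

```python
def warmup_lr(step, d_model=256, warmup_steps=4000):
    """Learning rate given by the warmup formula for a 1-based step number."""
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

for step in (1, 1000, 4000, 16000):
    print(f"step {step:>6}: lr = {warmup_lr(step):.6f}")
```

The rate grows linearly until step 4000, where the two arguments of `min` are equal and the rate peaks at roughly 0.000988. It then decays as the inverse square root of the step, so by step 16000 it has halved.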

3) Dummy Dataset for Demonstration

class Seq2SeqDataset(Dataset):
    """
    Minimal demo dataset — replace with real tokenized data.
    Each sample is (src_ids, tgt_ids).
    """
    def __init__(self, size=1000, src_vocab=8000, tgt_vocab=8000,
                 src_len=20, tgt_len=22):
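        # Each target is BOS + random tokens + EOS (BOS_IDX / EOS_IDX from the
        # hyperparameter block), so the teacher-forcing shift sees real special tokens.
        self.data = [
            (
                torch.randint(3, src_vocab, (src_len,)),
                torch.cat([
                    torch.tensor([BOS_IDX]),
                    torch.randint(3, tgt_vocab, (tgt_len - 2,)),
                    torch.tensor([EOS_IDX]),
                ]),
            )
            for _ in range(size)
        ]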

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


def collate_fn(batch):
    """Pad sequences in a batch to the same length."""
    src_batch, tgt_batch = zip(*batch)
    src_padded = torch.nn.utils.rnn.pad_sequence(src_batch, batch_first=True, padding_value=PAD_IDX)
    tgt_padded = torch.nn.utils.rnn.pad_sequence(tgt_batch, batch_first=True, padding_value=PAD_IDX)
    return src_padded, tgt_padded


train_dataset = Seq2SeqDataset(size=2000)
train_loader  = DataLoader(train_dataset, batch_size=BATCH_SIZE,
                           shuffle=True, collate_fn=collate_fn)

4) Training Loop (训练循环)

def train_epoch(model, loader, optimizer, scheduler, criterion, device):
    model.train()
    total_loss = 0.0
    total_tokens = 0

    for batch_idx, (src, tgt) in enumerate(loader):
        src = src.to(device)         # (batch, src_len)
        tgt = tgt.to(device)         # (batch, tgt_len)

        # Teacher forcing (教师强制):
        #   Input  to decoder: tgt[:, :-1]  (all but last token)
        #   Target for loss:   tgt[:, 1:]   (all but first token = BOS)
        tgt_input  = tgt[:, :-1]
        tgt_target = tgt[:, 1:]

        # Forward pass
        logits = model(src, tgt_input)
        # logits: (batch, tgt_len-1, tgt_vocab_size)

        # Reshape for cross-entropy
        logits_flat  = logits.reshape(-1, logits.size(-1))  # (batch*(tgt-1), vocab)
        targets_flat = tgt_target.reshape(-1)               # (batch*(tgt-1),)

        loss = criterion(logits_flat, targets_flat)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # Gradient clipping
        optimizer.step()
        scheduler.step()

        # Track metrics
        non_pad = (tgt_target != PAD_IDX).sum().item()
        total_loss   += loss.item() * non_pad
        total_tokens += non_pad

        if batch_idx % 50 == 0:
            print(f"  Batch {batch_idx}/{len(loader)}  "
                  f"Loss: {loss.item():.4f}  "
                  f"LR: {optimizer.param_groups[0]['lr']:.6f}")

    return total_loss / total_tokens


def evaluate(model, loader, criterion, device):
    model.eval()
    total_loss = 0.0
    total_tokens = 0

    with torch.no_grad():
        for src, tgt in loader:
            src = src.to(device)
            tgt = tgt.to(device)
            tgt_input  = tgt[:, :-1]
            tgt_target = tgt[:, 1:]

            logits = model(src, tgt_input)
            loss   = criterion(logits.reshape(-1, logits.size(-1)), tgt_target.reshape(-1))

            non_pad = (tgt_target != PAD_IDX).sum().item()
            total_loss   += loss.item() * non_pad
            total_tokens += non_pad

    return total_loss / total_tokens


# ---- Main Training Loop ----
best_loss = float('inf')

for epoch in range(1, EPOCHS + 1):
    train_loss = train_epoch(model, train_loader, optimizer, scheduler, criterion, DEVICE)
    # val_loss = evaluate(model, val_loader, criterion, DEVICE)  # needs a held-out val_loader

    print(f"\nEpoch {epoch}/{EPOCHS}  Train Loss: {train_loss:.4f}  "
          f"Perplexity: {math.exp(train_loss):.2f}")

    # Save only when the loss improves. No val_loader exists in this demo, so the
    # train loss stands in; switch to val_loss once validation data is available.
    if train_loss < best_loss:
        best_loss = train_loss
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'scheduler_step': scheduler.step_num,
            'loss': train_loss,
        }, 'transformer_best.pt')
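
The two bookkeeping steps in the loop above, the teacher-forcing shift and the token-weighted loss average, can be checked with plain lists. This is a small sketch with made-up values; `PAD = 0` is an assumption (only `BOS_IDX = 1` and `EOS_IDX = 2` appear in this section), and both helper names are hypothetical.

```python
PAD, BOS, EOS = 0, 1, 2  # PAD = 0 is assumed; BOS/EOS match the constants above

def shift_for_teacher_forcing(tgt_row):
    """Split one padded target row into decoder input and prediction target."""
    return tgt_row[:-1], tgt_row[1:]

def token_weighted_mean(batch_losses, batch_token_counts):
    """Average per-token loss across batches that contain different token counts."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_token_counts))
    return total / sum(batch_token_counts)

row = [BOS, 11, 12, 13, EOS, PAD, PAD]
dec_in, target = shift_for_teacher_forcing(row)
print(dec_in)   # [1, 11, 12, 13, 2, 0]
print(target)   # [11, 12, 13, 2, 0, 0]

# Two batches: mean loss 2.0 over 30 tokens and 4.0 over 10 tokens
print(token_weighted_mean([2.0, 4.0], [30, 10]))   # 2.5
```

At every position the decoder sees the tokens up to `dec_in[i]` and is scored against `target[i]`, the next token. Weighting by token count keeps a short batch from counting as much as a long one; a plain mean of the two batch losses would give 3.0 instead of 2.5.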

12. Inference — Greedy Decoding (贪婪解码)

The simplest decoding strategy: at each step, pick the token with the highest probability.

def greedy_decode(
    model: Transformer,
    src: torch.Tensor,         # (1, src_len) — single example
    max_len: int = 50,
    bos_idx: int = BOS_IDX,
    eos_idx: int = EOS_IDX,
    device: torch.device = DEVICE,
) -> list[int]:
    model.eval()
    src = src.to(device)

    with torch.no_grad():
        # Step 1: Encode source sequence once
        src_mask = make_pad_mask(src, PAD_IDX)
        memory = model.encode(src, src_mask)   # (1, src_len, d_model)

        # Step 2: Initialize decoder input with BOS token
        tgt = torch.tensor([[bos_idx]], device=device)   # (1, 1)
        output_tokens = []

        for _ in range(max_len):
            # Build causal mask for current target length
            tgt_mask = make_causal_mask(tgt.size(1), device)

            # Decode one step
            dec_out = model.decode(tgt, memory, tgt_mask, src_mask)
            # dec_out: (1, tgt_len, d_model)

            # Project and take argmax of last position
            logits     = model.fc_out(dec_out[:, -1, :])   # (1, vocab)
            next_token = logits.argmax(dim=-1).item()

            output_tokens.append(next_token)

            if next_token == eos_idx:
                break

            # Append predicted token and continue
            tgt = torch.cat([tgt, torch.tensor([[next_token]], device=device)], dim=1)

    return output_tokens


# Example usage
src_example = torch.randint(3, SRC_VOCAB, (1, 15))
predicted = greedy_decode(model, src_example, max_len=50)
print("Predicted token IDs:", predicted)

13. Inference — Beam Search (束搜索)

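Keeps the top-k candidate sequences (hypotheses) at each step instead of committing to a single token. It usually finds higher-probability outputs than greedy decoding, at roughly beam_size times the decoding cost.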

from dataclasses import dataclass, field

@dataclass(order=True)
class BeamHypothesis:
    score: float
    tokens: list[int] = field(compare=False)


def beam_search_decode(
    model: Transformer,
    src: torch.Tensor,
    beam_size: int   = 4,
    max_len: int     = 50,
    bos_idx: int     = BOS_IDX,
    eos_idx: int     = EOS_IDX,
    device: torch.device = DEVICE,
    length_penalty: float = 0.6,
) -> list[int]:
    model.eval()
    src = src.to(device)

    with torch.no_grad():
        src_mask = make_pad_mask(src, PAD_IDX)
        memory   = model.encode(src, src_mask)

        # Initialize beam with BOS token
        beams     = [BeamHypothesis(score=0.0, tokens=[bos_idx])]
        completed = []

        for step in range(max_len):
            all_candidates = []

            for beam in beams:
                if beam.tokens[-1] == eos_idx:
                    completed.append(beam)
                    continue

                tgt = torch.tensor([beam.tokens], device=device)
                tgt_mask = make_causal_mask(tgt.size(1), device)

                dec_out = model.decode(tgt, memory, tgt_mask, src_mask)
                logits  = model.fc_out(dec_out[:, -1, :])           # (1, vocab)
                log_probs = F.log_softmax(logits, dim=-1).squeeze(0) # (vocab,)

                # Expand top-k tokens
                topk_log_probs, topk_ids = log_probs.topk(beam_size)

                for log_prob, token_id in zip(topk_log_probs.tolist(),
                                               topk_ids.tolist()):
                    new_score  = beam.score + log_prob
                    new_tokens = beam.tokens + [token_id]
                    all_candidates.append(
                        BeamHypothesis(score=new_score, tokens=new_tokens)
                    )

            if not all_candidates:
                break

            # Keep top beam_size candidates
            all_candidates.sort(key=lambda h: h.score / (len(h.tokens) ** length_penalty),
                                 reverse=True)
            beams = all_candidates[:beam_size]

        # Return best completed hypothesis (or best incomplete beam)
        all_hyps = completed + beams
        best = max(all_hyps, key=lambda h: h.score / (len(h.tokens) ** length_penalty))
        return best.tokens[1:]   # Strip BOS


predicted_beam = beam_search_decode(model, src_example, beam_size=4)
print("Beam search tokens:", predicted_beam)
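
Why beam search can beat greedy decoding is easiest to see on a toy model. The sketch below replaces the Transformer with a hand-written table of next-token probabilities (all values and token IDs are invented for illustration) and scores hypotheses by raw log-probability, without the length penalty used above.

```python
import math

BOS, EOS, A, B = 1, 2, 3, 4  # toy vocabulary (hypothetical IDs)

# Toy "model": next-token probabilities that depend only on the previous token
NEXT = {
    BOS: {A: 0.6, B: 0.4},
    A:   {EOS: 0.4, A: 0.3, B: 0.3},
    B:   {EOS: 0.95, A: 0.03, B: 0.02},
}

def greedy(max_len=5):
    tokens, logp = [BOS], 0.0
    for _ in range(max_len):
        nxt, p = max(NEXT[tokens[-1]].items(), key=lambda kv: kv[1])
        tokens.append(nxt)
        logp += math.log(p)
        if nxt == EOS:
            break
    return tokens[1:], logp

def beam(beam_size=2, max_len=5):
    beams, completed = [([BOS], 0.0)], []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token
        candidates = [
            (tokens + [nxt], logp + math.log(p))
            for tokens, logp in beams
            for nxt, p in NEXT[tokens[-1]].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, logp in candidates[:beam_size]:
            (completed if tokens[-1] == EOS else beams).append((tokens, logp))
        if not beams:
            break
    best_tokens, best_logp = max(completed + beams, key=lambda c: c[1])
    return best_tokens[1:], best_logp

print("greedy:", greedy())
print("beam:  ", beam())
```

Greedy takes `A` first because 0.6 > 0.4 and ends with probability 0.6 × 0.4 = 0.24. Beam search keeps `B` alive as a second hypothesis and finds `B, EOS` with probability 0.4 × 0.95 = 0.38.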

14. Saving & Loading Checkpoints (保存与加载)

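# ---- Save ----
torch.save({
    'model_state_dict'    : model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_step'      : scheduler.step_num,   # Needed to resume the warmup schedule
    'epoch'               : epoch,
    'loss'                : train_loss,
    'config': {
        'src_vocab': SRC_VOCAB, 'tgt_vocab': TGT_VOCAB,
        'd_model': D_MODEL, 'num_heads': NUM_HEADS,
        'num_layers': NUM_LAYERS, 'd_ff': D_FF,
        'max_seq_len': MAX_LEN, 'dropout': DROPOUT, 'pad_idx': PAD_IDX,
    }
}, 'transformer_checkpoint.pt')

# ---- Load ----
checkpoint = torch.load('transformer_checkpoint.pt', map_location=DEVICE)
cfg = checkpoint['config']

# Rebuild the model with exactly the hyperparameters it was trained with
model = Transformer(
    src_vocab_size=cfg['src_vocab'],
    tgt_vocab_size=cfg['tgt_vocab'],
    d_model=cfg['d_model'],
    num_heads=cfg['num_heads'],
    num_layers=cfg['num_layers'],
    d_ff=cfg['d_ff'],
    max_seq_len=cfg['max_seq_len'],
    dropout=cfg['dropout'],
    pad_idx=cfg['pad_idx'],
).to(DEVICE)
model.load_state_dict(checkpoint['model_state_dict'])

# The optimizer must be rebuilt around the new model's parameters
# before its saved state is restored
optimizer = optim.Adam(model.parameters(), lr=0, betas=(0.9, 0.98), eps=1e-9)
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler = WarmupScheduler(optimizer, d_model=cfg['d_model'], warmup_steps=4000)
scheduler.step_num = checkpoint['scheduler_step']

model.eval()   # Inference mode; call model.train() instead when resuming training
print(f"Loaded checkpoint from epoch {checkpoint['epoch']}")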

15. Key Design Decisions & Modern Variants (关键设计决策与现代变体)

Component	Original Paper	Modern Practice
Positional Encoding	Sinusoidal (fixed)	Learned embeddings (BERT) / RoPE (LLaMA)
Normalization	Post-LN (后归一化)	Pre-LN (前归一化) — more stable
Activation	ReLU	GELU / SwiGLU (GPT, LLaMA)
Attention	Full self-attention	GQA / MQA (grouped/multi-query) — faster inference
Vocab size	~37,000	32k–128k+ with BPE/SentencePiece
Weight tying	Shared between both embedding layers and the pre-softmax projection	Still tied in GPT-2; some larger models untie input & output embeddings
KV Cache	None	KV Cache (KV 缓存) for autoregressive inference
Context length	No fixed limit stated (sentence-level translation); 512 is BERT's limit	4k–128k+ with sliding window or ALiBi

<div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br> A Transformer is a stack of <strong>Multi-Head Attention (多头注意力)</strong> + <strong>Feed-Forward (前馈网络)</strong> blocks tied together by <strong>Residual Connections (残差连接)</strong> + <strong>LayerNorm (层归一化)</strong> — master <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">scaled_dot_product_attention</code>, understand causal masking, use warmup scheduling, and switch from greedy to beam search for better output quality.</div>

Python asyncio

Sun, 08 Mar 2026 00:00:00 GMT

I. Python `asyncio` — Principles & Core Mechanics

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">asyncio</code> is Python's <strong>asynchronous concurrency framework (异步并发框架)</strong> that uses an <strong>Event Loop (事件循环)</strong>, <strong>Coroutines (协程)</strong>, and <strong>non-blocking I/O (非阻塞 I/O)</strong> to efficiently handle many I/O-bound tasks within a single thread. The core principle: when a coroutine reaches an <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">await</code>, it <em>yields control</em> back to the event loop, which uses OS-level I/O multiplexing to resume the coroutine once the I/O operation is ready. </div>

1. How `asyncio` Works — The Big Picture

The three pillars of asyncio:

<span style="color:#E8600A;font-weight:700">Event Loop (事件循环)</span> — schedules and runs coroutines, monitors I/O events, resumes tasks when they are ready
<span style="color:#E8600A;font-weight:700">Coroutines (协程)</span> — define asynchronous tasks using async/await; can pause and resume execution
<span style="color:#E8600A;font-weight:700">Non-blocking I/O (非阻塞 I/O)</span> — allows the program to perform other work while waiting for I/O operations to complete

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> An <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">await</code> only <strong>yields control (让出执行权)</strong> when the awaited operation cannot complete immediately; awaiting an already-finished result continues without a switch. While a coroutine is suspended, the event loop runs other ready tasks and relies on non-blocking I/O and OS I/O multiplexing (I/O 多路复用) to <strong>resume the coroutine (恢复协程执行)</strong> once its I/O is ready. Code between two suspension points runs without interruption, so a long CPU-bound stretch blocks every other task.</div>

2. Event Loop (事件循环)

1) What the Event Loop Does

The <span style="color:#E8600A;font-weight:700">Event Loop</span> is the <span style="color:#2980B9">core scheduler (核心调度器)</span> of asyncio. It is responsible for:

<span style="color:#2980B9">Running</span> coroutines and tasks
<span style="color:#2980B9">Monitoring</span> I/O events (sockets, file descriptors, timers)
<span style="color:#2980B9">Resuming</span> coroutines when their awaited operation completes

2) Starting the Event Loop

The standard way is via <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">asyncio.run()</code>:

import asyncio

async def main():
    print("hello")

asyncio.run(main())

What happens here:

<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">asyncio.run()</code> <span style="color:#2980B9">creates and starts</span> the event loop
The event loop <span style="color:#2980B9">executes</span> the <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">main()</code> coroutine to completion
The event loop is <span style="color:#2980B9">closed</span> when the coroutine returns

3. Coroutines (协程)

1) What Is a Coroutine?

A <span style="color:#E8600A;font-weight:700">Coroutine</span> is a special function that can <span style="color:#2980B9">pause and resume execution</span>. When it encounters an I/O wait, it pauses and returns control to the event loop, allowing other tasks to run in the meantime.

2) How to Define and Use Coroutines

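Define a coroutine with <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">async def</code> and wait on another awaitable with <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">await</code>. Calling the function only creates a coroutine object; nothing runs until it is awaited or handed to the event loop: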

import asyncio

async def task():
    await asyncio.sleep(1)
    print("done")

asyncio.run(task())

What happens here:

<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">async def</code> <span style="color:#2980B9">defines</span> the coroutine
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">await</code> <span style="color:#2980B9">pauses</span> the coroutine until the operation completes, yielding control back to the event loop
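
The hand-off at `await` can be made visible by letting two coroutines write to a shared log. This is a minimal sketch; the names `worker` and `log` are illustrative only.

```python
import asyncio

async def worker(name, log):
    log.append(f"{name} start")
    await asyncio.sleep(0)   # suspension point: control returns to the event loop
    log.append(f"{name} end")

async def main():
    log = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log

events = asyncio.run(main())
print(events)   # ['A start', 'B start', 'A end', 'B end']
```

`A` runs until its `await`, the loop then starts `B`, and each coroutine is resumed in turn. Everything happens on one thread, so the interleaving occurs only at suspension points.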

<div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br> asyncio achieves concurrency on a <strong>single thread</strong> by having the <strong>Event Loop (事件循环)</strong> continuously schedule <strong>Coroutines (协程)</strong> — each coroutine runs until it hits an <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">await</code>, yields control, and is resumed by the event loop once its I/O is ready.</div>

II. Python `asyncio` — Complete API Reference & Usage Scenarios

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> This note is a complete API reference for Python's <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">asyncio</code> library, organized by category. Every interface is paired with a real-world usage scenario so you can immediately see <em>when</em> and <em>why</em> to use it. The library is built on a single-threaded <strong>Event Loop (事件循环)</strong> that schedules <strong>Coroutines (协程)</strong> cooperatively, making it ideal for <strong>I/O-bound (I/O 密集型)</strong> workloads. </div>

1. Entry Points — Running Coroutines

1) `asyncio.run(coro)`

<span style="color:#2980B9">The top-level entry point</span> for running an async program. Creates a new Event Loop (事件循环), runs the coroutine to completion, then closes the loop.

import asyncio

async def main():
    print("Hello from asyncio!")

asyncio.run(main())

Scenario: Entry point of any standalone async application — CLI tools, scripts, servers.

2) `asyncio.get_event_loop()` / `asyncio.get_running_loop()`

async def main():
    loop = asyncio.get_running_loop()   # Preferred inside async context
    print(loop)

loop = asyncio.get_event_loop()         # Legacy; outside async code prefer asyncio.run()

API	When to Use
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">get_running_loop()</code>	Inside a coroutine — raises `RuntimeError` if no loop is running
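<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">get_event_loop()</code>	Legacy code outside a coroutine. Relying on it to create a loop implicitly is deprecated, and recent Python versions raise `RuntimeError` instead; prefer `asyncio.run()`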
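
The `RuntimeError` behavior of `get_running_loop()` makes it a convenient way to detect whether code is executing inside the event loop. A minimal sketch; the helper name `loop_is_running` is hypothetical.

```python
import asyncio

def loop_is_running() -> bool:
    """Return True when called from code driven by a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True

async def main():
    return loop_is_running()

print(loop_is_running())      # False: plain synchronous context
print(asyncio.run(main()))    # True: called from inside a coroutine
```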

2. Coroutines & Tasks (协程与任务)

1) `async def` / `await` — Defining and Awaiting Coroutines

async def fetch(url: str) -> str:
    await asyncio.sleep(1)      # Yield control to event loop
    return f"data from {url}"

async def main():
    result = await fetch("https://api.example.com")
    print(result)

Scenario: Any function that performs I/O — HTTP requests, DB queries, file reads.

2) `asyncio.create_task(coro)` — Schedule Concurrently

<span style="color:#E8600A;font-weight:700">Wraps a coroutine into a Task (任务)</span> and schedules it on the running event loop. The call returns immediately without blocking the caller; the task first gets to run the next time the caller suspends at an `await`.

async def worker(name: str, delay: float):
    await asyncio.sleep(delay)
    print(f"{name} done")

async def main():
    t1 = asyncio.create_task(worker("A", 2.0))
    t2 = asyncio.create_task(worker("B", 1.0))
    await t1
    await t2
    # Total time ≈ 2s, not 3s

Scenario: Start multiple independent I/O operations so that their waits overlap (concurrent API calls, concurrent DB queries).

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">A task that is created but never awaited still runs, but its exception is easy to miss: it is only logged ("Task exception was never retrieved") when the task object is garbage-collected.</span> The event loop also keeps only weak references to tasks, so hold a reference until the task finishes. Always await your tasks or attach a callback with <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">add_done_callback</code>.</div>
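
One way to follow this advice for fire-and-forget work is to keep tasks in a set and inspect them in a done-callback. The sketch below is one possible pattern, not the only one; `spawn`, `background_tasks`, and `flaky` are hypothetical names.

```python
import asyncio

background_tasks = set()   # strong references keep tasks alive until they finish

def spawn(coro, errors):
    """Start a background task, keep a reference, and record any failure."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)

    def on_done(t):
        background_tasks.discard(t)
        if not t.cancelled() and t.exception() is not None:
            errors.append(t.exception())   # retrieving it marks the error as handled

    task.add_done_callback(on_done)
    return task

async def flaky():
    raise ValueError("boom")

async def main():
    errors = []
    spawn(flaky(), errors)
    await asyncio.sleep(0.01)   # give the background task time to run and finish
    return errors

errors = asyncio.run(main())
print(errors)
```

In a real service the callback would typically log the exception instead of collecting it in a list.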

3) `asyncio.Task` Methods

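async def main():
    task = asyncio.create_task(worker("A", 5.0))
    task.add_done_callback(lambda t: print("finished:", t))  # Runs once the task is done

    await asyncio.sleep(0)           # Let the task start running
    task.cancel()                    # Request cancellation
    try:
        await task                   # CancelledError is raised inside the task at its await
    except asyncio.CancelledError:
        pass

    print(task.done())               # True if finished/cancelled/errored
    print(task.cancelled())          # True if cancelled
    # task.result() returns the value of a task that finished normally;
    # here it would re-raise CancelledError, and before completion InvalidStateError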

Method	Purpose
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">cancel()</code>	Request cancellation — injects `CancelledError` at next `await`
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">done()</code>	True if completed, cancelled, or raised an exception
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">result()</code>	Returns the return value, or re-raises the task's exception; raises `InvalidStateError` if the task is not done yet
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">exception()</code>	Returns the exception if one was raised, else `None`; raises `InvalidStateError` if the task is not done yet
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">add_done_callback(fn)</code>	Register a callback to run when the task finishes

3. Concurrency Helpers (并发工具)

1) `asyncio.gather(*coros, return_exceptions=False)`

Runs multiple awaitables (可等待对象) <span style="color:#2980B9">concurrently</span>, returns a list of results in the same order as input.

async def main():
    results = await asyncio.gather(
        fetch("url1"),
        fetch("url2"),
        fetch("url3"),
    )
    print(results)   # ["data from url1", "data from url2", "data from url3"]

With exception handling:

results = await asyncio.gather(
    fetch("url1"),
    failing_fetch(),
    return_exceptions=True   # Exceptions returned as values, not raised
)
for r in results:
    if isinstance(r, Exception):
        print(f"Error: {r}")

Scenario: Batch HTTP requests, parallel DB lookups, loading multiple config files simultaneously.
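
A runnable version of the pattern, with a stand-in `fetch` that sleeps instead of doing network I/O (the `delay` parameter and the "bad" URL convention are invented for the demo):

```python
import asyncio

async def fetch(url: str, delay: float) -> str:
    """Stand-in for a network call: sleeps, then succeeds or fails."""
    await asyncio.sleep(delay)
    if "bad" in url:
        raise ConnectionError(url)
    return f"data from {url}"

async def main():
    return await asyncio.gather(
        fetch("url1", 0.02),
        fetch("bad-url", 0.01),
        fetch("url3", 0.0),
        return_exceptions=True,   # failures come back as values in the result list
    )

results = asyncio.run(main())
for r in results:
    print("error:" if isinstance(r, Exception) else "ok:", r)
```

The results keep the submission order even though `url3` finishes first and `bad-url` fails before `url1` completes.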

2) `asyncio.wait(tasks, return_when=...)`

Returns two sets: <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">done</code> and <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">pending</code>. Gives fine-grained control over when to stop waiting.

async def main():
    tasks = {asyncio.create_task(fetch(url)) for url in urls}

    done, pending = await asyncio.wait(
        tasks,
        return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()

`return_when`	Behavior
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ALL_COMPLETED</code>	Wait for all tasks (default)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">FIRST_COMPLETED</code>	Return as soon as any task finishes
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">FIRST_EXCEPTION</code>	Return as soon as any task raises; if none raises, behaves like `ALL_COMPLETED`

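Scenario: Racing redundant requests: take the first result to finish and cancel the rest (e.g., query multiple replicas and use whichever responds first). `FIRST_COMPLETED` also fires when a task fails, so check the finished task for an exception before trusting it.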
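
A self-contained sketch of the replica race, including the two steps the snippet above leaves out: reading the winner's result and waiting for the cancelled tasks to unwind. `replica` and its delays are made up for the demo.

```python
import asyncio

async def replica(name: str, delay: float) -> str:
    await asyncio.sleep(delay)   # stand-in for a network round trip
    return name

async def first_response() -> str:
    tasks = {
        asyncio.create_task(replica("fast", 0.01)),
        asyncio.create_task(replica("slow", 1.0)),
    }
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Let the cancelled tasks finish unwinding before returning
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()   # re-raises if the winner failed

winner = asyncio.run(first_response())
print(winner)   # fast
```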

3) `asyncio.as_completed(coros)`

Returns an iterator of awaitables; awaiting each one gives the next result <span style="color:#2980B9">in completion order</span> (not submission order).

async def main():
    coros = [fetch(url) for url in urls]
    for future in asyncio.as_completed(coros):
        result = await future
        print(f"Got: {result}")   # Processed as each one finishes

Scenario: Show results to the user as they arrive, without waiting for the slowest request.
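
The ordering is easy to confirm with stand-in coroutines whose delays run opposite to their submission order (names and delays are illustrative):

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name

async def main():
    coros = [fetch("slow", 0.09), fetch("medium", 0.05), fetch("fast", 0.01)]
    order = []
    for next_done in asyncio.as_completed(coros):
        order.append(await next_done)   # resolves to whichever finishes next
    return order

order = asyncio.run(main())
print(order)   # ['fast', 'medium', 'slow']
```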

4) `asyncio.TaskGroup` (Python 3.11+) — Structured Concurrency (结构化并发)

If any task raises, all remaining tasks are <span style="color:#E8600A;font-weight:700">automatically cancelled</span>, and the errors are re-raised together as an `ExceptionGroup` when the `async with` block exits.

async def main():
    async with asyncio.TaskGroup() as tg:
        t1 = tg.create_task(fetch("url1"))
        t2 = tg.create_task(fetch("url2"))
    # Reached only if every task succeeded; on failure the block raises instead
    print(t1.result(), t2.result())

Scenario: Any workflow where subtasks are all required — if one fails, the whole group should abort (e.g., a multi-step pipeline).

4. Timeouts & Cancellation (超时与取消)

1) `asyncio.wait_for(coro, timeout)`

async def main():
    try:
        result = await asyncio.wait_for(fetch("url"), timeout=3.0)
    except asyncio.TimeoutError:
        print("Request timed out after 3s")

Scenario: Any network call that must complete within a deadline (SLA enforcement, user-facing APIs).
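
`wait_for` does more than raise: it cancels the inner awaitable and waits for that cancellation to complete before raising the timeout. A small sketch with a hypothetical `slow` coroutine that records its own cancellation:

```python
import asyncio

async def slow(state):
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        state["cancelled"] = True   # the timeout cancelled this coroutine
        raise

async def main():
    state = {"cancelled": False}
    try:
        await asyncio.wait_for(slow(state), timeout=0.01)
    except asyncio.TimeoutError:
        return "timed out", state["cancelled"]
    return "finished", state["cancelled"]

outcome = asyncio.run(main())
print(outcome)   # ('timed out', True)
```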

2) `asyncio.timeout(seconds)` (Python 3.11+)

A context-manager (上下文管理器) version of timeout — more composable than wait_for.

async def main():
    try:
        async with asyncio.timeout(5.0):
            result = await fetch("url")
            await process(result)
    except TimeoutError:
        print("Entire block timed out")

Scenario: Apply a single deadline across multiple awaits inside a block.

3) `asyncio.shield(coro)` — Protect from Cancellation

Prevents the inner coroutine from being cancelled when the outer task is cancelled.

async def important_cleanup():
    await asyncio.sleep(1)
    print("Cleanup done")

async def main():
    task = asyncio.create_task(important_cleanup())
    try:
        await asyncio.shield(task)
    except asyncio.CancelledError:
        print("Outer cancelled, but cleanup still runs!")
        await task   # Wait for it to actually finish

Scenario: Protect a critical cleanup/commit operation from being interrupted by a cancellation signal.

4) Handling `CancelledError`

async def worker():
    try:
        while True:
            await asyncio.sleep(1)
    except asyncio.CancelledError:
        print("Cleaning up before cancel...")
        await do_cleanup()
        raise   # Always re-raise!

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Always re-raise <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">CancelledError</code> after cleanup.</span> Swallowing it breaks the cancellation chain. In Python 3.8+, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">CancelledError</code> is a subclass of <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">BaseException</code>, not <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Exception</code>.</div>
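
The `BaseException` detail has a practical consequence: a broad `except Exception` does not intercept cancellation, while `finally` still runs. A minimal sketch with illustrative names:

```python
import asyncio

async def worker(log):
    try:
        await asyncio.sleep(10)
    except Exception:
        log.append("caught by except Exception")   # not reached on cancellation
    finally:
        log.append("cleanup")                       # always runs

async def main():
    log = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0)    # let the worker reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancelled")
    return log

log = asyncio.run(main())
print(log)   # ['cleanup', 'cancelled']
```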

5. Synchronization Primitives (同步原语)

1) `asyncio.Lock` — Mutual Exclusion (互斥锁)

lock = asyncio.Lock()

async def safe_write(db, data):
    async with lock:
        await db.write(data)   # Only one coroutine at a time

Scenario: Protecting shared in-memory state (counters, caches, connection pools) from concurrent modification.
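
Even on a single thread, a read-modify-write sequence that contains an `await` can lose updates. The sketch below contrasts an unprotected counter with a locked one; the `state` dict and function names are illustrative.

```python
import asyncio

async def unsafe_increment(state):
    current = state["value"]
    await asyncio.sleep(0)          # other coroutines run here and read the same value
    state["value"] = current + 1

async def safe_increment(state, lock):
    async with lock:                # the whole read-modify-write is now exclusive
        current = state["value"]
        await asyncio.sleep(0)
        state["value"] = current + 1

async def main(n=50):
    unsafe, safe = {"value": 0}, {"value": 0}
    lock = asyncio.Lock()
    await asyncio.gather(*(unsafe_increment(unsafe) for _ in range(n)))
    await asyncio.gather(*(safe_increment(safe, lock) for _ in range(n)))
    return unsafe["value"], safe["value"]

unsafe_total, safe_total = asyncio.run(main())
print(unsafe_total, safe_total)
```

Without the lock, every coroutine reads the counter before any of them writes it back, so most increments are lost. If the critical section contains no `await`, no lock is needed, because nothing else can run in between.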

2) `asyncio.Semaphore` — Concurrency Limiter (并发限制)

import aiohttp   # Third-party HTTP client: pip install aiohttp

sem = asyncio.Semaphore(10)   # Max 10 requests in flight at once

async def limited_fetch(session, url):
    async with sem:
        async with session.get(url) as resp:   # Releases the connection on exit
            return await resp.text()

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [limited_fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)

Scenario: Capping the number of in-flight API calls, DB connections, or open file handles. A semaphore limits concurrency, not requests per second.
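
The cap can be verified without any network access by counting how many jobs are inside the semaphore at once. A standard-library sketch; `job` and the `stats` dict are invented for the demo.

```python
import asyncio

async def job(sem, stats):
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)   # simulated I/O while holding a slot
        stats["active"] -= 1

async def main(limit=3, jobs=10):
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(job(sem, stats) for _ in range(jobs)))
    return stats["peak"]

peak = asyncio.run(main())
print(peak)   # 3
```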

3) `asyncio.BoundedSemaphore`

Same as `Semaphore`, but `release()` raises `ValueError` if it would push the counter above its initial value, i.e. if it is called more times than `acquire()`.

Scenario: Safety-critical resource pools where over-releasing would be a bug.

4) `asyncio.Event` — Signal Between Coroutines (协程间信号)

event = asyncio.Event()

async def producer():
    await asyncio.sleep(2)
    print("Data ready")
    event.set()            # Signal the consumer

async def consumer():
    await event.wait()     # Block until set
    print("Processing data")

async def main():
    await asyncio.gather(producer(), consumer())

Scenario: One-shot notification — signal consumers when data/resource becomes available.

5) `asyncio.Condition` — Wait + Notify Pattern

condition = asyncio.Condition()
buffer = []

async def producer():
    async with condition:
        buffer.append("item")
        condition.notify_all()   # Wake all waiting consumers

async def consumer():
    async with condition:
        await condition.wait_for(lambda: len(buffer) > 0)
        item = buffer.pop()

Scenario: Multiple consumers waiting on a shared resource to reach a specific state.

6) `asyncio.Queue` — Producer-Consumer (生产者-消费者)

async def producer(q: asyncio.Queue):
    for i in range(10):
        await q.put(i)
    await q.put(None)   # Sentinel

async def consumer(q: asyncio.Queue):
    while True:
        item = await q.get()
        if item is None:
            break
        await process(item)
        q.task_done()

async def main():
    q = asyncio.Queue(maxsize=5)
    await asyncio.gather(producer(q), consumer(q))

Queue Type	Behavior
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Queue</code>	FIFO (先进先出)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">LifoQueue</code>	LIFO / stack (后进先出)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">PriorityQueue</code>	Smallest item dequeued first (优先队列)

Scenario: Pipeline architectures — web crawlers, log processors, streaming data ingestion.
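
With several consumers, a single `None` sentinel no longer works, since only one worker would receive it. A common alternative, sketched here with hypothetical names, is to wait on `q.join()` and then cancel the workers:

```python
import asyncio

async def worker(q: asyncio.Queue, results: list):
    while True:
        item = await q.get()
        try:
            results.append(item * item)   # stand-in for real processing
        finally:
            q.task_done()                 # one task_done() per get(), even on error

async def main():
    q = asyncio.Queue(maxsize=5)
    results = []
    workers = [asyncio.create_task(worker(q, results)) for _ in range(3)]

    for i in range(10):
        await q.put(i)        # blocks while the queue is full (backpressure)

    await q.join()            # returns once every item has been marked done
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(results)

squares = asyncio.run(main())
print(squares)
```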

6. Async Context Managers & Iterators (异步上下文管理器与迭代器)

1) `async with` — Async Context Manager (异步上下文管理器)

Implement <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__aenter__</code> and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__aexit__</code>:

class AsyncDB:
    async def __aenter__(self):
        self.conn = await connect_db()
        return self.conn

    async def __aexit__(self, *args):
        await self.conn.close()

async def main():
    async with AsyncDB() as conn:
        result = await conn.query("SELECT 1")

Scenario: Any resource requiring async setup/teardown — DB connections, HTTP sessions, file handles, locks.

2) `@asynccontextmanager` — Decorator Shortcut

from contextlib import asynccontextmanager

@asynccontextmanager
async def managed_connection():
    conn = await connect_db()
    try:
        yield conn
    finally:
        await conn.close()

async def main():
    async with managed_connection() as conn:
        await conn.query("SELECT 1")

Scenario: Simpler alternative to writing a full class when you need a one-off async context manager.
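
The `try`/`finally` around `yield` is what guarantees teardown when the body fails. A runnable sketch that uses a log list in place of a real connection (all names are illustrative):

```python
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def managed(log):
    log.append("open")
    try:
        yield "conn"             # value bound by "as" in the async with statement
    finally:
        log.append("close")      # runs even when the body raises

async def main():
    log = []
    try:
        async with managed(log) as conn:
            log.append(f"use {conn}")
            raise RuntimeError("query failed")
    except RuntimeError:
        log.append("error handled")
    return log

log = asyncio.run(main())
print(log)   # ['open', 'use conn', 'close', 'error handled']
```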

3) `async for` — Async Iterator (异步迭代器)

Implement <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__aiter__</code> and <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__anext__</code>, or use an async generator:

async def paginated_api(base_url: str):
    page = 1
    while True:
        data = await fetch(f"{base_url}?page={page}")
        if not data:
            break
        yield data
        page += 1

async def main():
    async for page in paginated_api("https://api.example.com/items"):
        await process(page)

Scenario: Paginated APIs, streaming database cursors, real-time event streams (WebSocket, SSE).
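
For comparison with the generator form, here is the class-based protocol written out by hand. `Countdown` is a hypothetical example; the iterator signals exhaustion by raising `StopAsyncIteration`.

```python
import asyncio

class Countdown:
    """Async iterator that yields start, start-1, ..., 1."""

    def __init__(self, start: int):
        self.current = start

    def __aiter__(self):
        return self                  # the object is its own iterator

    async def __anext__(self):
        if self.current <= 0:
            raise StopAsyncIteration
        await asyncio.sleep(0)       # stand-in for awaiting the next chunk of data
        self.current -= 1
        return self.current + 1

async def main():
    return [n async for n in Countdown(3)]

values = asyncio.run(main())
print(values)   # [3, 2, 1]
```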

7. Running Blocking Code (在异步中运行阻塞代码)

1) `asyncio.to_thread(func, *args)` (Python 3.9+)

import time

async def main():
    # Run a blocking call in a worker thread so the event loop stays responsive
    result = await asyncio.to_thread(time.sleep, 2)   # The function's return value (None here)

Scenario: Legacy blocking libraries (e.g., requests, psycopg2, time.sleep), file system operations.
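
That the event loop stays responsive can be checked by running a ticker coroutine alongside the blocking call. A sketch requiring Python 3.9+; `blocking_io` and `ticker` are made-up names.

```python
import asyncio
import time

def blocking_io() -> str:
    time.sleep(0.1)              # would freeze the loop if called directly
    return "io done"

async def ticker(stop: asyncio.Event, ticks: list):
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)

async def main():
    stop, ticks = asyncio.Event(), []
    tick_task = asyncio.create_task(ticker(stop, ticks))
    result = await asyncio.to_thread(blocking_io)   # runs in a worker thread
    stop.set()
    await tick_task
    return result, len(ticks)

result, tick_count = asyncio.run(main())
print(result, tick_count)
```

The ticker keeps firing while `blocking_io` sleeps. Calling `blocking_io()` directly inside `main` would leave it with a single tick.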

2) `loop.run_in_executor(executor, func, *args)`

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

async def main():
    loop = asyncio.get_running_loop()

    # Thread pool — for blocking I/O
    result = await loop.run_in_executor(None, blocking_io_func, arg)

    # Process pool — for CPU-bound work
    with ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(pool, cpu_bound_func, arg)

Executor	Use Case
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">None</code> (default ThreadPool)	Blocking I/O, legacy libraries
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ThreadPoolExecutor</code>	Explicit thread pool sizing
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ProcessPoolExecutor</code>	CPU-bound tasks (bypasses GIL)

Scenario: Image processing, ML inference on CPU, compression, encryption — any heavy computation alongside async I/O.

8. Streams — High-Level Network I/O (高层网络 I/O)

1) `asyncio.open_connection(host, port)` — TCP Client

async def tcp_client():
    reader, writer = await asyncio.open_connection("127.0.0.1", 8888)

    writer.write(b"Hello\n")
    await writer.drain()

    data = await reader.readline()
    print(f"Received: {data.decode()}")

    writer.close()
    await writer.wait_closed()

Scenario: Custom TCP clients — talking to Redis, custom protocols, game servers.

2) `asyncio.start_server(handler, host, port)` — TCP Server

async def handle_client(reader, writer):
    data = await reader.read(1024)
    writer.write(data)         # Echo back
    await writer.drain()
    writer.close()
    await writer.wait_closed()   # Wait until the connection is fully closed

async def main():
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

Scenario: Building lightweight TCP/protocol servers (chat, telnet, custom RPC).

9. Subprocesses (异步子进程)

1) `asyncio.create_subprocess_exec()` / `asyncio.create_subprocess_shell()`

async def run_command():
    proc = await asyncio.create_subprocess_exec(
        "ls", "-la",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    print(stdout.decode())

async def run_shell():
    proc = await asyncio.create_subprocess_shell(
        "echo hello && sleep 1 && echo world",
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    print(stdout.decode())

API	Use Case
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">create_subprocess_exec</code>	Safe — no shell injection, explicit args
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">create_subprocess_shell</code>	Convenient — supports pipes/redirects, shell expansion

Scenario: Running external tools (ffmpeg, git, compilers) without blocking the event loop.
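External tools can hang, so a timeout is usually worth adding. The sketch below assumes a hypothetical helper `run_python_snippet` that runs the current interpreter (`sys.executable`) as the child process so the example is portable; the 10-second default is arbitrary.

```python
import asyncio
import sys


async def run_python_snippet(code: str, timeout: float = 10.0) -> tuple[int, str]:
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()          # Do not leave the child running
        await proc.wait()    # Reap it to avoid a zombie process
        raise
    return proc.returncode, stdout.decode().strip()


if __name__ == "__main__":
    print(asyncio.run(run_python_snippet("print(6 * 7)")))
```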

10. Utilities & Introspection (工具与内省)

1) `asyncio.sleep(delay, result=None)`

async def main():
    await asyncio.sleep(0)    # Yield control without waiting (common pattern)
    await asyncio.sleep(1.5)  # Wait 1.5 seconds
    val = await asyncio.sleep(2, result="done")  # Returns result after delay
    print(val)   # "done"

Scenario: sleep(0) is used to yield control voluntarily in tight loops, preventing event loop starvation (事件循环饥饿).
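To see the yielding effect, this small sketch runs two hypothetical counters that call `sleep(0)` after every step. Their log entries end up interleaved instead of one counter finishing before the other starts.

```python
import asyncio


async def count(name: str, n: int, log: list[str]) -> None:
    for i in range(n):
        log.append(f"{name}{i}")
        await asyncio.sleep(0)   # Hand control back to the event loop


async def interleave() -> list[str]:
    log: list[str] = []
    await asyncio.gather(count("a", 3, log), count("b", 3, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(interleave()))
```

Removing the `sleep(0)` line makes each counter run to completion in one go, because a coroutine only gives up control at an `await` that actually suspends.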

2) `asyncio.current_task()` / `asyncio.all_tasks()`

async def main():
    me = asyncio.current_task()
    me.set_name("main-task")

    all_running = asyncio.all_tasks()
    print(f"Running tasks: {len(all_running)}")

Scenario: Debugging, logging task names, graceful shutdown (cancel all tasks on SIGINT).
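A common shutdown helper built from these two calls cancels every task except the caller. The following is a sketch under the assumption that all other tasks are safe to cancel; `worker` and `cancel_others` are illustrative names.

```python
import asyncio


async def worker(n: int) -> None:
    await asyncio.sleep(3600)   # Stand-in for long-running work


async def cancel_others() -> int:
    current = asyncio.current_task()
    others = [t for t in asyncio.all_tasks() if t is not current]
    for t in others:
        t.cancel()
    # return_exceptions=True collects each CancelledError instead of raising it.
    await asyncio.gather(*others, return_exceptions=True)
    return len(others)


async def main() -> int:
    tasks = [asyncio.create_task(worker(i), name=f"worker-{i}") for i in range(3)]
    await asyncio.sleep(0)      # Let the workers start
    cancelled = await cancel_others()
    assert all(t.cancelled() for t in tasks)
    return cancelled


if __name__ == "__main__":
    print(asyncio.run(main()))
```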

3) `asyncio.ensure_future(coro_or_future)`

Wraps a coroutine in a Task and schedules it; a Future (期约) or Task passed in is returned unchanged. Largely superseded by create_task() in modern code.

task = asyncio.ensure_future(my_coro())   # Legacy — prefer create_task()

4) `asyncio.wrap_future(future)` — Bridge with `concurrent.futures`

Wraps a <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">concurrent.futures.Future</code> into an asyncio-compatible <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">asyncio.Future</code>.

import asyncio
import concurrent.futures

def blocking():
    return 42

async def main():
    with concurrent.futures.ThreadPoolExecutor() as pool:
        future = pool.submit(blocking)
        result = await asyncio.wrap_future(future)
        print(result)   # 42

Scenario: Integrating existing concurrent.futures-based code into an asyncio application.

5) `asyncio.isfuture()` / `asyncio.iscoroutine()` / `asyncio.iscoroutinefunction()`

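import asyncio

async def my_coro(): pass

async def main():
    coro = my_coro()
    print(asyncio.iscoroutinefunction(my_coro))   # True (deprecated in 3.14; inspect.iscoroutinefunction is the replacement)
    print(asyncio.iscoroutine(coro))              # True
    print(asyncio.isfuture(asyncio.get_running_loop().create_future()))   # True
    await coro   # Await it so Python does not warn that the coroutine was never awaited

asyncio.run(main())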

Scenario: Writing framework code or decorators that need to handle both sync and async callables.
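A decorator that supports both kinds of callable might look like the sketch below. The decorator `logged` and the `calls` list are hypothetical, and the check uses `inspect.iscoroutinefunction` from the standard library rather than the asyncio variant.

```python
import asyncio
import functools
import inspect

calls: list[str] = []   # Records the name of every decorated function that runs


def logged(func):
    """Hypothetical decorator that records calls for sync and async callables."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            calls.append(func.__name__)
            return await func(*args, **kwargs)
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return sync_wrapper


@logged
def add(a, b):
    return a + b


@logged
async def add_async(a, b):
    await asyncio.sleep(0)
    return a + b
```

Returning an `async def` wrapper for coroutine functions keeps the decorated function awaitable, so callers do not need to know it was wrapped.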

11. Low-level Event Loop APIs (底层事件循环接口)

1) `loop.call_soon(callback, *args)` / `loop.call_later(delay, callback)`

Schedule a plain (non-coroutine) callback:

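async def main():
    loop = asyncio.get_running_loop()
    loop.call_soon(print, "scheduled immediately")
    loop.call_later(2.0, print, "scheduled in 2s")
    loop.call_at(loop.time() + 5.0, print, "scheduled at absolute time")
    await asyncio.sleep(6)   # Keep the loop alive long enough for the callbacks to fire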

Scenario: Integrating callback-based legacy code into an asyncio event loop.
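One way to bridge a callback into coroutine code is to have the callback resolve a Future. This is a minimal sketch; `wait_for_callback` is an illustrative name, and `call_later` stands in for whatever legacy API eventually invokes the callback.

```python
import asyncio


async def wait_for_callback(delay: float, value: str) -> str:
    loop = asyncio.get_running_loop()
    done: asyncio.Future[str] = loop.create_future()
    # call_later runs a plain function; set_result wakes up the awaiting coroutine.
    loop.call_later(delay, done.set_result, value)
    return await done


if __name__ == "__main__":
    print(asyncio.run(wait_for_callback(0.1, "ready")))
```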

2) `loop.add_reader(fd, callback)` / `loop.add_writer(fd, callback)`

loop.add_reader(sock.fileno(), on_data_received)
loop.remove_reader(sock.fileno())

Scenario: Building low-level protocol handlers — custom socket management, raw I/O multiplexing.

3) `asyncio.Protocol` / `asyncio.DatagramProtocol`

The low-level <span style="color:#2980B9">callback-based protocol interface</span> on which StreamReader/StreamWriter are built.

class EchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data: bytes):
        self.transport.write(data)   # Echo

    def connection_lost(self, exc):
        print("Connection closed")

async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(EchoProtocol, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

Scenario: High-performance servers where the overhead of StreamReader/StreamWriter is unacceptable, or when implementing a custom protocol (e.g., custom binary framing).

12. Graceful Shutdown Pattern (优雅关闭模式)

import asyncio
import signal

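async def worker(i):
    await asyncio.sleep(3600)   # Placeholder for real work

async def main():
    loop = asyncio.get_running_loop()

    # An Event tolerates repeated signals; set_result() on a Future would raise the second time.
    # add_signal_handler() is available on Unix event loops only.
    stop = asyncio.Event()
    loop.add_signal_handler(signal.SIGINT, stop.set)
    loop.add_signal_handler(signal.SIGTERM, stop.set)

    tasks = [asyncio.create_task(worker(i)) for i in range(5)]

    await stop.wait()   # Block until SIGINT/SIGTERM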

    print("Shutting down...")
    for t in tasks:
        t.cancel()

    await asyncio.gather(*tasks, return_exceptions=True)
    print("All tasks cancelled. Bye.")

asyncio.run(main())

Scenario: Any long-running async service (web server, bot, background processor) that must clean up gracefully on Ctrl+C or a system signal.

13. Quick API Comparison Table

API	Category	Key Trait
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">asyncio.run()</code>	Entry point	Creates + closes loop; top-level only
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">create_task()</code>	Scheduling	Non-blocking schedule; returns Task
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">gather()</code>	Concurrency	All results in order; first exception propagates, other awaitables keep running
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">wait()</code>	Concurrency	Returns done/pending sets; fine control
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">as_completed()</code>	Concurrency	Yields in completion order
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">TaskGroup</code>	Structured	Auto-cancel on failure (3.11+)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">wait_for()</code>	Timeout	Cancels task on timeout
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">timeout()</code>	Timeout	Context manager; covers multiple awaits (3.11+)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">shield()</code>	Cancellation	Protects inner task from outer cancel
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Lock</code>	Sync	Mutex; one at a time
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Semaphore</code>	Sync	N at a time; rate limiting
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Event</code>	Sync	Set/clear flag; wakes all waiters
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Condition</code>	Sync	Wait-for-state with notify
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Queue</code>	Sync	FIFO producer-consumer
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">to_thread()</code>	Blocking	Offload to thread pool (3.9+)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">run_in_executor()</code>	Blocking	Thread or process pool
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">open_connection()</code>	Streams	High-level TCP client
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">start_server()</code>	Streams	High-level TCP server
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">Protocol</code>	Low-level	Callback-based; max performance
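The `gather()` row is the one most often misread, so here is a small sketch of its exception behavior. The helpers `ok` and `boom` are hypothetical. When one awaitable fails, the exception reaches the caller, but sibling tasks are not cancelled; with `return_exceptions=True` the exception is returned as a result instead.

```python
import asyncio


async def ok(delay: float, log: list[str]) -> str:
    await asyncio.sleep(delay)
    log.append("ok finished")
    return "ok"


async def boom() -> None:
    raise ValueError("boom")


async def demo() -> tuple[list, list[str]]:
    log: list[str] = []
    slow = asyncio.create_task(ok(0.05, log))
    try:
        await asyncio.gather(slow, boom())
    except ValueError:
        log.append("gather raised")
    await slow    # The sibling task was not cancelled and still completes
    results = await asyncio.gather(ok(0, log), boom(), return_exceptions=True)
    return results, log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If siblings should be cancelled on the first failure, `TaskGroup` (3.11+) is the closer fit.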

python dataclass

Sun, 08 Mar 2026 00:00:00 GMT

I. Python `dataclasses` — Complete Learning Handbook

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> Python's <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">dataclasses</code> module (introduced in Python 3.7) provides a <strong>decorator (装饰器)</strong> that automatically generates boilerplate methods such as <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">__init__</code>, <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">__repr__</code>, and <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">__eq__</code> for classes that primarily store data. It sits between a plain class and a full ORM/validation framework: the syntax is concise, and the generated methods are ordinary Python methods, so instances carry no extra runtime machinery. </div>

1. Installation & Import

dataclasses is part of the Python standard library — no installation required.

from dataclasses import dataclass, field, fields, asdict, astuple, replace, KW_ONLY

2. Defining a Dataclass (定义数据类)

1) Basic Definition

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p = Point(x=1.0, y=2.0)
print(p)          # Point(x=1.0, y=2.0)
print(p.x)        # 1.0
print(p == Point(1.0, 2.0))   # True

The @dataclass decorator auto-generates:

<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__init__(self, x, y)</code>: constructor
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__repr__</code>: readable string representation
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">__eq__</code>: field-by-field equality comparison

2) Fields with Default Values (默认值)

@dataclass
class User:
    name: str
    age: int = 0
    active: bool = True

u = User(name="Alice")
print(u)   # User(name='Alice', age=0, active=True)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Fields with defaults must come after fields without defaults</span> — same rule as regular Python function parameters. Violating this raises a <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">TypeError</code>.</div>

3) `field()` — Advanced Field Configuration

from dataclasses import dataclass, field

@dataclass
class Config:
    tags: list[str] = field(default_factory=list)     # Mutable default
    name: str = field(default="unnamed")
    _secret: str = field(default="", repr=False)       # Hidden from repr
    metadata: dict = field(default_factory=dict, compare=False)  # Excluded from ==

`field()` Parameter Reference

Parameter	Default	Effect
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">default</code>	`MISSING`	Scalar default value
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">default_factory</code>	`MISSING`	Callable that produces the default (for mutable types)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">repr</code>	`True`	Include field in `__repr__` output
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">compare</code>	`True`	Include field in `__eq__` and `__lt__` etc.
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">hash</code>	`None`	Include field in `__hash__` (None = follow `compare`)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">init</code>	`True`	Include field as a parameter in `__init__`
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">metadata</code>	`None`	Arbitrary read-only mapping attached to the field
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">kw_only</code>	`MISSING`	Mark this field keyword-only in `__init__`; when unset, the class-level `kw_only` setting applies (3.10+)

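<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Never use a mutable object (list, dict, set) directly as a default value.</span> The decorator rejects list, dict, and set defaults with a <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ValueError</code> at class-definition time; a mutable default that slips past this check (for example an instance of a custom class) is shared by all instances. Use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">field(default_factory=list)</code> so each instance gets its own object.</div>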
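The rule can be checked directly. In this sketch the classes `Bad` and `Good` are illustrative: the first definition fails at class-creation time, while the `default_factory` version gives each instance its own list.

```python
from dataclasses import dataclass, field


def try_mutable_default() -> str:
    try:
        @dataclass
        class Bad:
            tags: list = []     # Rejected: list, dict and set defaults raise ValueError
    except ValueError as exc:
        return str(exc)
    return "no error"


@dataclass
class Good:
    tags: list = field(default_factory=list)   # A fresh list per instance


a, b = Good(), Good()
a.tags.append("x")   # Only a's list changes; b keeps its own empty list

if __name__ == "__main__":
    print(try_mutable_default())
    print(a.tags, b.tags)
```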

4) Nested Dataclasses (嵌套数据类)

@dataclass
class Address:
    city: str
    country: str

@dataclass
class Person:
    name: str
    address: Address

p = Person(name="Bob", address=Address(city="NYC", country="US"))
print(p.address.city)   # NYC
print(p)
# Person(name='Bob', address=Address(city='NYC', country='US'))

3. `@dataclass` Decorator Options (装饰器配置)

@dataclass(frozen=True, order=True, eq=True, repr=True, unsafe_hash=False, slots=True)
class Config:
    x: int
    y: int

Option Reference Table

Option	Default	Effect
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">init</code>	`True`	Generate `__init__`
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">repr</code>	`True`	Generate `__repr__`
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">eq</code>	`True`	Generate `__eq__` based on field values
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">order</code>	`False`	Generate `__lt__`, `__le__`, `__gt__`, `__ge__`
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">frozen</code>	`False`	Make instances <strong>immutable (不可变)</strong> — raises `FrozenInstanceError` on assignment
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">unsafe_hash</code>	`False`	Force generation of `__hash__` from the fields even when the class is not frozen (use with care: mutating a hashed instance breaks dicts and sets)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">slots</code>	`False`	Use `__slots__` for faster attribute access and lower memory (3.10+)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">kw_only</code>	`False`	All fields must be passed as keyword arguments (3.10+)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">match_args</code>	`True`	Generate `__match_args__` for structural pattern matching (3.10+)

1) `frozen=True` — Immutable Dataclass

@dataclass(frozen=True)
class ImmutablePoint:
    x: float
    y: float

p = ImmutablePoint(x=1.0, y=2.0)
# p.x = 99.0   # ❌ FrozenInstanceError: cannot assign to field 'x'

# Frozen dataclasses are hashable and can be used as dict keys
d = {p: "origin"}

Scenario: Configuration objects (配置对象), cache keys, value objects (值对象) that must never be mutated.

2) `order=True` — Sortable Dataclasses

@dataclass(order=True)
class Version:
    major: int
    minor: int
    patch: int

versions = [Version(1, 2, 0), Version(1, 0, 5), Version(2, 0, 0)]
print(sorted(versions))
# [Version(major=1, minor=0, patch=5), Version(major=1, minor=2, patch=0), Version(major=2, minor=0, patch=0)]

Scenario: Sorting records, priority queues, range-based comparisons.
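For the priority-queue case, a sketch with `heapq` follows. The `Job` class is hypothetical; its `name` field is excluded from comparisons with `field(compare=False)`, so ordering depends on `priority` alone.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int
    name: str = field(compare=False)   # Ignored by ==, <, > and friends


def drain(jobs: list[Job]) -> list[str]:
    heap = list(jobs)
    heapq.heapify(heap)
    # Pop repeatedly; the smallest priority comes out first.
    return [heapq.heappop(heap).name for _ in range(len(heap))]


if __name__ == "__main__":
    print(drain([Job(3, "low"), Job(1, "high"), Job(2, "mid")]))
```

Excluding the payload from comparison avoids a `TypeError` when two jobs share a priority and the payload type does not support `<`.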

3) `slots=True` — Memory-efficient Dataclass (Python 3.10+)

@dataclass(slots=True)
class FastPoint:
    x: float
    y: float

# __slots__ prevents arbitrary attribute addition and speeds up attribute access
p = FastPoint(1.0, 2.0)
p.z = 3.0   # ❌ AttributeError: 'FastPoint' object has no attribute 'z'

Scenario: Creating millions of small instances (数百万小对象) — data pipelines, geometry, particle simulations.

4) `kw_only=True` — Keyword-only Fields (Python 3.10+)

@dataclass(kw_only=True)
class Request:
    url: str
    method: str = "GET"
    timeout: float = 30.0

r = Request(url="https://api.example.com")
# r = Request("https://api.example.com")  ❌ TypeError

Use KW_ONLY sentinel to make only some fields keyword-only:

from dataclasses import KW_ONLY

@dataclass
class Mixed:
    x: int
    y: int
    _: KW_ONLY          # Everything after this is keyword-only
    label: str = ""
    weight: float = 1.0

m = Mixed(1, 2, label="point")   # x and y are positional, label is kw-only

4. `__post_init__` — Post-initialization Hook (初始化后钩子)

Runs automatically after __init__ completes. Use it for validation, derived fields, or type coercion.

1) Validation

@dataclass
class Temperature:
    celsius: float

    def __post_init__(self):
        if self.celsius < -273.15:
            raise ValueError(f"Temperature {self.celsius}°C is below absolute zero!")

t = Temperature(-300)   # ❌ ValueError

2) Derived Fields with `field(init=False)`

import math

@dataclass
class Circle:
    radius: float
    area: float = field(init=False)        # Not in __init__
    circumference: float = field(init=False)

    def __post_init__(self):
        self.area = math.pi * self.radius ** 2
        self.circumference = 2 * math.pi * self.radius

c = Circle(radius=5.0)
print(c.area)           # 78.539...
print(c.circumference)  # 31.415...

3) Type Coercion

@dataclass
class Coordinate:
    lat: float
    lon: float

    def __post_init__(self):
        # Auto-convert strings to float
        self.lat = float(self.lat)
        self.lon = float(self.lon)

coord = Coordinate(lat="51.5", lon="-0.1")
print(coord.lat, type(coord.lat))   # 51.5 <class 'float'>

4) `InitVar` — Init-only Parameters (仅初始化参数)

Fields that exist in __init__ but are NOT stored as instance attributes:

from dataclasses import dataclass, field, InitVar

@dataclass
class HashedPassword:
    username: str
    raw_password: InitVar[str]           # Passed to __init__ but not stored
    password_hash: str = field(init=False)

    def __post_init__(self, raw_password: str):
        import hashlib
        self.password_hash = hashlib.sha256(raw_password.encode()).hexdigest()

u = HashedPassword(username="alice", raw_password="secret123")
print(u.password_hash)    # sha256 hash
# print(u.raw_password)   # ❌ AttributeError — not stored

5. Utility Functions (工具函数)

1) `asdict()` — Convert to Dictionary

from dataclasses import asdict

@dataclass
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
print(asdict(p))   # {'x': 1.0, 'y': 2.0}

Works recursively on nested dataclasses:

person = Person(name="Bob", address=Address(city="NYC", country="US"))
print(asdict(person))
# {'name': 'Bob', 'address': {'city': 'NYC', 'country': 'US'}}

2) `astuple()` — Convert to Tuple

from dataclasses import astuple

print(astuple(p))   # (1.0, 2.0)

3) `replace()` — Copy with Changes (不可变更新)

Creates a new instance with specified fields replaced — the original is unchanged:

from dataclasses import replace

p = Point(x=1.0, y=2.0)
p2 = replace(p, x=99.0)
print(p2)   # Point(x=99.0, y=2.0)
print(p)    # Point(x=1.0, y=2.0)  ← original unchanged

Scenario: Immutable update patterns — building modified configurations, functional state updates.

4) `fields()` — Inspect Field Definitions

from dataclasses import fields

for f in fields(Point):
    print(f.name, f.type, f.default)
# x <class 'float'> <dataclasses._MISSING_TYPE object at 0x...>
# y <class 'float'> <dataclasses._MISSING_TYPE object at 0x...>

Scenario: Writing generic serializers, validators, or introspection utilities (内省工具).
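A generic validator built on `fields()` could look like the sketch below. The helper `check_types` is hypothetical and deliberately limited: it only checks annotations that are plain classes such as `int` or `str`, and skips everything else.

```python
from dataclasses import dataclass, fields


def check_types(obj) -> list[str]:
    """Hypothetical helper: report fields whose value does not match a plain class annotation."""
    problems = []
    for f in fields(obj):
        value = getattr(obj, f.name)
        # Only plain classes (int, str, ...) are checked; generics such as list[str] are skipped.
        if type(f.type) is type and not isinstance(value, f.type):
            problems.append(f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}")
    return problems


@dataclass
class Item:
    name: str
    qty: int


if __name__ == "__main__":
    print(check_types(Item("pen", 3)))
    print(check_types(Item("pen", "3")))
```

Dataclasses never enforce annotations themselves, so a helper like this (or a call from `__post_init__`) is one way to add the check.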

5) `is_dataclass()` — Runtime Type Check

from dataclasses import is_dataclass

print(is_dataclass(Point))      # True  (class)
print(is_dataclass(Point(1,2))) # True  (instance)
print(is_dataclass(int))        # False

6. Inheritance (继承)

@dataclass
class Animal:
    name: str
    age: int

@dataclass
class Dog(Animal):
    breed: str
    trained: bool = False

d = Dog(name="Rex", age=3, breed="Husky")
print(d)   # Dog(name='Rex', age=3, breed='Husky', trained=False)

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">If a parent class has a field with a default value, the child class cannot add fields <em>without</em> a default — this violates the "defaults must come last" rule and raises a <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">TypeError</code>.</span> Use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">kw_only=True</code> on the child to avoid this.</div>

@dataclass
class Base:
    x: int = 0     # Has default

@dataclass(kw_only=True)
class Child(Base):
    y: int         # No default — OK because kw_only avoids ordering conflict

7. Hashing & Usage as Dict Keys (哈希与字典键)

# eq=True + frozen=True → hashable
@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
cache = {p: "result"}   # ✅ Can be used as dict key or in set

# eq=True + frozen=False (default) → NOT hashable
@dataclass
class MutablePoint:
    x: float
    y: float

mp = MutablePoint(1.0, 2.0)
# {mp: "x"}  ❌ TypeError: unhashable type

`eq`	`frozen`	`__hash__`
`False`	`False`	Inherited from `object` (id-based)
`True`	`False`	Set to `None` — <span style="color:#C0392B;font-weight:600">unhashable</span>
`True`	`True`	Generated — <span style="color:#E8600A;font-weight:700">hashable</span> ✅
`True`	`False` + `unsafe_hash=True`	Force-generated — use with caution
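The first three rows of the table can be verified with a short sketch; the class names are illustrative.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class ById:            # eq=False: keeps object's identity-based __hash__
    x: int


@dataclass
class Unhashable:      # eq=True, frozen=False: __hash__ is set to None
    x: int


@dataclass(frozen=True)
class ByValue:         # eq=True, frozen=True: __hash__ generated from the fields
    x: int


a, b = ById(1), ById(1)

if __name__ == "__main__":
    print(len({a, b}))                      # Two distinct objects
    print(Unhashable.__hash__)              # None
    print(len({ByValue(1), ByValue(1)}))    # Equal values collapse to one entry
```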

8. Pattern Matching with Dataclasses (Python 3.10+)

@dataclass
class Point:
    x: float
    y: float

def describe(p):
    match p:
        case Point(x=0, y=0):
            return "Origin"
        case Point(x=0, y=y):
            return f"On Y-axis at {y}"
        case Point(x=x, y=0):
            return f"On X-axis at {x}"
        case Point(x=x, y=y):
            return f"Point at ({x}, {y})"

print(describe(Point(0, 5)))     # On Y-axis at 5
print(describe(Point(3, 4)))     # Point at (3, 4)

9. Comparison: `dataclasses` vs Alternatives

Feature	`dataclasses`	`attrs`	`msgspec.Struct`	`pydantic`
Stdlib	✅ Yes	❌	❌	❌
Auto `__init__`	✅	✅	✅	✅
Validation	❌ Manual	✅ (validators)	✅ Built-in	✅ Built-in
Serialization	❌ Manual	❌ Manual	✅ JSON/MsgPack	✅ JSON
Performance	⚡ Fast	⚡ Fast	⚡⚡ Fastest	🐢→⚡ (v1→v2)
Frozen support	✅	✅	✅	✅
`__slots__`	✅ (3.10+)	✅	✅ (C-level)	❌
Inheritance	✅	✅	✅ (limited)	✅
Ecosystem fit	Standard Python	Power users	High-perf I/O	Web APIs

10. Real-World Scenarios (实战场景)

1) Configuration Object

from dataclasses import dataclass, field

@dataclass(frozen=True)
class AppConfig:
    host: str = "0.0.0.0"
    port: int = 8080
    debug: bool = False
    allowed_origins: tuple[str, ...] = ("*",)

config = AppConfig(port=9000, debug=True)
print(config)
# AppConfig(host='0.0.0.0', port=9000, debug=True, allowed_origins=('*',))

2) Data Pipeline Record

from dataclasses import dataclass, field
from datetime import datetime

@dataclass(slots=True)
class LogRecord:
    timestamp: datetime
    level: str
    message: str
    tags: list[str] = field(default_factory=list)

records = [LogRecord(datetime.now(), "INFO", f"event {i}") for i in range(1_000_000)]

3) API Request / Response Model

import json
from dataclasses import dataclass, asdict, field
from typing import Optional

@dataclass
class CreateUserRequest:
    username: str
    email: str
    age: Optional[int] = None

@dataclass
class UserResponse:
    id: int
    username: str
    email: str

req = CreateUserRequest(username="alice", email="alice@example.com")
resp = UserResponse(id=42, username=req.username, email=req.email)
print(json.dumps(asdict(resp)))
# {"id": 42, "username": "alice", "email": "alice@example.com"}

4) State Machine Node

from dataclasses import dataclass, replace
from typing import Literal

@dataclass(frozen=True)
class JobState:
    job_id: str
    status: Literal["pending", "running", "done", "failed"]
    retries: int = 0

# Immutable state transitions
initial = JobState(job_id="abc", status="pending")
running = replace(initial, status="running")
failed  = replace(running, status="failed", retries=running.retries + 1)
retry   = replace(failed,  status="running", retries=failed.retries)

print(initial)   # JobState(job_id='abc', status='pending', retries=0)
print(retry)     # JobState(job_id='abc', status='running', retries=1)

python msgspec.struct

Sun, 08 Mar 2026 00:00:00 GMT

I. `msgspec.Struct` — High-Performance Typed Data Structures

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">msgspec.Struct</code> is a <strong>high-performance (高性能)</strong>, <strong>type-safe (类型安全)</strong> data class alternative from the <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">msgspec</code> library. It is designed as a faster, leaner replacement for Python <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">dataclasses</code>, <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">attrs</code>, and Pydantic models — with native support for <strong>JSON / MessagePack serialization (序列化)</strong> and <strong>validation (验证)</strong> baked in at the C level. </div>

1. Installation & Import

pip install msgspec
import msgspec
from msgspec import Struct, field

2. Defining a Struct (定义结构体)

1) Basic Definition

from msgspec import Struct

class Point(Struct):
    x: float
    y: float

p = Point(x=1.0, y=2.0)
print(p)        # Point(x=1.0, y=2.0)
print(p.x)      # 1.0

2) Fields with Default Values (默认值)

class User(Struct):
    name: str
    age: int = 0
    active: bool = True

u = User(name="Alice")
print(u)   # User(name='Alice', age=0, active=True)

3) `field()` — Advanced Field Configuration

from msgspec import Struct, field

class Config(Struct):
    tags: list[str] = field(default_factory=list)   # Mutable default
    name: str = field(default="unnamed")

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">Never use a mutable object (list, dict) directly as a default value.</span> Use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">field(default_factory=list)</code> instead — just like with Python dataclasses.</div>

4) Nested Structs (嵌套结构体)

class Address(Struct):
    city: str
    country: str

class Person(Struct):
    name: str
    address: Address

p = Person(name="Bob", address=Address(city="NYC", country="US"))
print(p.address.city)   # NYC

3. Struct Configuration Options (结构体配置)

Pass options to the class definition via keyword arguments:

class MyStruct(Struct, frozen=True, order=True, eq=True, kw_only=True):
    x: int
    y: int

1) Option Reference Table

Option	Default	Effect
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">frozen=True</code>	`False`	Makes the struct <strong>immutable (不可变)</strong> — fields cannot be reassigned after creation
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">order=True</code>	`False`	Enables `<`, `>`, `<=`, `>=` comparison operators
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">eq=True</code>	`True`	Enables `==` / `!=` based on field values
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">kw_only=True</code>	`False`	All fields must be passed as keyword arguments
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">array_like=True</code>	`False`	Serializes as a JSON array `[...]` instead of object `{...}`
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">gc=False</code>	`True`	Disables garbage collector tracking — faster for structs with no reference cycles
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">weakref=True</code>	`False`	Enables weak references to the struct instance
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">rename</code>	`None`	Rename fields during (de)serialization — e.g., `rename="camel"`
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">tag</code>	`None`	Adds a type tag for <strong>tagged unions (标签联合)</strong>
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">tag_field</code>	`"type"`	The field name used to store the tag value

2) `frozen=True` — Immutable Struct

class ImmutablePoint(Struct, frozen=True):
    x: float
    y: float

p = ImmutablePoint(x=1.0, y=2.0)
p.x = 99.0   # ❌ AttributeError: immutable type: 'ImmutablePoint'

Scenario: Configuration objects, cache keys, value objects (值对象) that should never change.

3) `order=True` — Sortable Structs

class Version(Struct, order=True):
    major: int
    minor: int
    patch: int

versions = [Version(1, 2, 0), Version(1, 0, 5), Version(2, 0, 0)]
print(sorted(versions))
# [Version(major=1, minor=0, patch=5), Version(major=1, minor=2, patch=0), Version(major=2, minor=0, patch=0)]

Scenario: Sorting records, priority queues, range comparisons.

4) `rename="camel"` — Field Name Mapping

class ApiResponse(Struct, rename="camel"):
    user_name: str
    created_at: str

import msgspec
obj = ApiResponse(user_name="Alice", created_at="2025-01-01")
print(msgspec.json.encode(obj))
# b'{"userName":"Alice","createdAt":"2025-01-01"}'

`rename` value	Effect
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">"camel"</code>	`user_name` → `userName`
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">"pascal"</code>	`user_name` → `UserName`
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">"lower"</code>	`userName` → `username`
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">dict</code>	Explicit per-field mapping

Scenario: Interoperating with REST APIs that use camelCase JSON keys.

4. Serialization & Deserialization (序列化与反序列化)

1) JSON Encoding

import msgspec

class Order(Struct):
    id: int
    item: str
    price: float

order = Order(id=1, item="book", price=9.99)

# Encode to JSON bytes
data = msgspec.json.encode(order)
print(data)   # b'{"id":1,"item":"book","price":9.99}'

2) JSON Decoding with Type Validation

# Decode + validate in one step
order2 = msgspec.json.decode(data, type=Order)
print(order2)          # Order(id=1, item='book', price=9.99)
print(order2 == order) # True

3) MessagePack Encoding (二进制序列化)

# Encode to binary MessagePack
binary = msgspec.msgpack.encode(order)
print(binary)   # b'\x83\xa2id\x01\xa4item\xa4book\xa5price\xcb@#\xfa\xe1G\xae\x14{'

# Decode from binary
order3 = msgspec.msgpack.decode(binary, type=Order)

Format	Function	Output
JSON	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">msgspec.json.encode/decode</code>	Human-readable bytes
MessagePack	<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">msgspec.msgpack.encode/decode</code>	Compact binary

4) `array_like=True` — Array Serialization

class Point(Struct, array_like=True):
    x: float
    y: float

p = Point(1.0, 2.0)
print(msgspec.json.encode(p))   # b'[1.0,2.0]'

Scenario: Compact serialization for large volumes of records (matrices, time series, coordinate data).

5) Handling Validation Errors

bad_json = b'{"id": "not-a-number", "item": "book", "price": 9.99}'

try:
    msgspec.json.decode(bad_json, type=Order)
except msgspec.ValidationError as e:
    print(e)
    # Expected `int`, got `str` - at `$.id`

5. Type Annotations & Supported Types (类型注解)

1) Built-in Types

class Example(Struct):
    a: int
    b: float
    c: str
    d: bool
    e: bytes
    f: None

2) Collections (集合类型)

from typing import Optional

class Collections(Struct):
    items: list[str]
    mapping: dict[str, int]
    pair: tuple[int, str]
    unique: set[int]
    maybe: Optional[str] = None        # str | None

3) `Optional` and `Union` Types

from typing import Union

class Response(Struct):
    data: Union[str, int, None]        # Can be str, int, or None
    error: str | None = None           # Python 3.10+ shorthand

4) `Literal` Types — Constrained Values (约束值)

from typing import Literal

class Status(Struct):
    state: Literal["pending", "running", "done", "failed"]

s = Status(state="running")
msgspec.json.decode(b'{"state":"invalid"}', type=Status)
# ❌ ValidationError: Invalid enum value 'invalid' - at `$.state`

Scenario: Enforcing valid enum-like values without a full Enum class.

5) `datetime`, `UUID`, `Decimal`

from datetime import datetime
from uuid import UUID
from decimal import Decimal

class Event(Struct):
    id: UUID
    timestamp: datetime
    amount: Decimal

6. Tagged Unions (标签联合) — Polymorphic Types

1) Defining a Tagged Union

Use <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">tag=True</code> (or a custom tag string) to enable discriminated unions (判别联合):

from typing import Union

class Cat(Struct, tag=True):
    name: str
    indoor: bool

class Dog(Struct, tag=True):
    name: str
    breed: str

Animal = Union[Cat, Dog]

When serialized, a "type" field is added automatically:

cat = Cat(name="Whiskers", indoor=True)
print(msgspec.json.encode(cat))
# b'{"type":"Cat","name":"Whiskers","indoor":true}'

dog = Dog(name="Rex", breed="Husky")
print(msgspec.json.encode(dog))
# b'{"type":"Dog","name":"Rex","breed":"Husky"}'

2) Decoding a Tagged Union

data = b'{"type":"Dog","name":"Rex","breed":"Husky"}'
animal = msgspec.json.decode(data, type=Animal)
print(type(animal))   # <class 'Dog'>
print(animal.breed)   # Husky

Scenario: Event systems, polymorphic API responses, command/event patterns where the same endpoint can return different shapes.

3) Custom Tag Values

class Circle(Struct, tag="circle"):
    radius: float

class Rectangle(Struct, tag="rect"):
    width: float
    height: float

Shape = Union[Circle, Rectangle]

c = Circle(radius=5.0)
print(msgspec.json.encode(c))
# b'{"type":"circle","radius":5.0}'

7. Utility Methods (工具方法)

1) `msgspec.structs.asdict()` — Convert to Dict

from msgspec import structs

p = Point(x=1.0, y=2.0)
d = structs.asdict(p)
print(d)   # {'x': 1.0, 'y': 2.0}

2) `msgspec.structs.astuple()` — Convert to Tuple

t = structs.astuple(p)
print(t)   # (1.0, 2.0)

3) `msgspec.structs.replace()` — Copy with Changes

Like dataclasses.replace() — creates a new instance with some fields updated:

p2 = structs.replace(p, x=99.0)
print(p2)   # Point(x=99.0, y=2.0)
print(p)    # Point(x=1.0, y=2.0)  ← original unchanged

Scenario: Immutable update patterns (不可变更新模式) — create a modified copy without mutating the original.

4) `msgspec.structs.fields()` — Inspect Field Definitions

for f in structs.fields(Point):
    print(f.name, f.type, f.default)
# x  <class 'float'>  NODEFAULT
# y  <class 'float'>  NODEFAULT

Scenario: Writing generic serializers, validators, or introspection tools.

8. Inheritance (继承)

class Animal(Struct):
    name: str
    age: int

class Dog(Animal):
    breed: str

d = Dog(name="Rex", age=3, breed="Husky")
print(d)   # Dog(name='Rex', age=3, breed='Husky')

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">A child Struct cannot override a field defined in the parent.</span> Fields defined in the parent always come first in the constructor signature.</div>

9. Performance Comparison (性能对比)

Library	Construct	JSON Encode	JSON Decode+Validate
<span style="color:#E8600A;font-weight:700">msgspec.Struct</span>	⚡ Fastest	⚡ Fastest	⚡ Fastest
`dataclasses`	Fast	Needs `json.dumps`	No validation
`attrs`	Fast	Needs extra lib	No validation
`pydantic v2`	Medium	Fast	Fast (Rust core)
`pydantic v1`	Slow	Slow	Slow

10. Real-World Scenarios (实战场景)

1) FastAPI / HTTP API Request/Response Models

from msgspec import Struct
import msgspec

class CreateUserRequest(Struct):
    username: str
    email: str
    age: int | None = None

class UserResponse(Struct):
    id: int
    username: str
    email: str

# Decoding incoming JSON body
body = b'{"username":"alice","email":"alice@example.com"}'
req = msgspec.json.decode(body, type=CreateUserRequest)

# Encoding outgoing response
resp = UserResponse(id=42, username=req.username, email=req.email)
print(msgspec.json.encode(resp))
# b'{"id":42,"username":"alice","email":"alice@example.com"}'

2) Config File Parsing with Validation

import msgspec
from msgspec import Struct
from typing import Literal

class ServerConfig(Struct):
    host: str = "0.0.0.0"
    port: int = 8080
    mode: Literal["debug", "production"] = "production"
    workers: int = 4

config_json = b'{"host":"127.0.0.1","port":9000,"mode":"debug"}'
config = msgspec.json.decode(config_json, type=ServerConfig)
print(config.mode)   # debug

3) High-throughput MessagePack Messaging (e.g., vLLM, message queues)

import msgspec
from msgspec import Struct
from typing import Literal

class InferenceRequest(Struct):
    request_id: str
    prompt: str
    max_tokens: int = 512
    temperature: float = 1.0

class InferenceResponse(Struct):
    request_id: str
    output: str
    finish_reason: Literal["stop", "length", "error"]

# Fast binary serialization for IPC / queue transport
req = InferenceRequest(request_id="req-001", prompt="Hello!")
binary = msgspec.msgpack.encode(req)

# Receiving side: decode and validate the same request (a round trip)
decoded_req = msgspec.msgpack.decode(binary, type=InferenceRequest)

4) Event / Command Pattern with Tagged Unions

from typing import Union
from msgspec import Struct
import msgspec

class StartJob(Struct, tag=True):
    job_id: str
    config: dict

class StopJob(Struct, tag=True):
    job_id: str
    reason: str

Command = Union[StartJob, StopJob]

# Dispatcher
def handle(data: bytes):
    cmd = msgspec.json.decode(data, type=Command)
    if isinstance(cmd, StartJob):
        print(f"Starting job {cmd.job_id}")
    elif isinstance(cmd, StopJob):
        print(f"Stopping job {cmd.job_id}: {cmd.reason}")

handle(b'{"type":"StartJob","job_id":"abc","config":{}}')
# Starting job abc

vllm contributor

Sun, 08 Mar 2026 00:00:00 GMT

I. Contributing to vLLM — Development Guide

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> This document covers the complete workflow for contributing to vLLM, including environment setup, two installation paths (Python-only vs. CUDA/C++ compilation), linting, documentation preview, test execution, and PR submission guidelines. Whether you are contributing for the first time or working on daily development, this guide serves as a handy reference. </div>

1. Contributing to vLLM

Ways to contribute include:

Reporting bugs / opening issues
Adding support for new models
Implementing new features
Improving documentation
Helping others, reviewing PRs
Starring the repo, writing articles — these count too

2. Developing

1) Step 1: Clone the Repository

git clone https://github.com/vllm-project/vllm.git
cd vllm

2) Step 2: Create a Python Environment (Recommended: uv)

uv venv --python 3.12 --seed
source .venv/bin/activate

If you don't have uv, install it first:

curl -LsSf https://astral.sh/uv/install.sh | sh

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Why Python 3.12? Because vLLM's CI (official automated tests) primarily uses 3.12. Using the same version prevents situations where tests pass locally but fail in CI.</div>

To delete the virtual environment:

rm -rf .venv
uv cache clean

3. Installing vLLM (Two Paths)

1) Path A: Python-only Changes (Fastest, Recommended)

VLLM_USE_PRECOMPILED=1 uv pip install -e .

What this means:

Installs in <span style="color:#E8600A;font-weight:700">Editable Mode</span> (-e) — changes to source files take effect immediately
Does not compile C++/CUDA locally
Downloads pre-compiled binaries from the corresponding pre-built wheel

👉 Advantage: Very fast, suitable for the majority of PRs.

2) Path B: CUDA/C++ Changes (Requires Local Compilation)

If you previously ran Path A, first force-remove the installed vllm Python package:

uv pip uninstall vllm

Install PyTorch (cu129):

uv pip install torch torchvision torchaudio \
  --extra-index-url https://download.pytorch.org/whl/cu129

Install the current project in Editable Mode:

CCACHE_NOHASHDIR="true" uv pip install --no-build-isolation -e . -v

Common Error: `ImportError: undefined symbol`

<span style="color:#C0392B;font-weight:600">If you encounter the following error:</span>

(vllm) [xli49@ghpc008 vllm]$ python examples/offline_inference/basic/basic.py
Traceback (most recent call last):
  ...
  File "/data/home/xli49/vllm/vllm/platforms/cuda.py", line 16, in <module>
    import vllm._C  # noqa
    ^^^^^^^^^^^^^^
ImportError: /data/home/xli49/vllm/vllm/_C.abi3.so: undefined symbol: _ZN3c104cuda9SetDeviceEa

The cause is a mismatch between the torch ABI used at compile time and the torch version at runtime. Ensure you use --no-build-isolation and recompile with the correct CUDA version:

uv pip install -e . --no-build-isolation

Why Does vLLM Require `--no-build-isolation`?

Because compiling vLLM's C++/CUDA extensions depends heavily on:

The torch installed in your current environment
The matching CUDA version (cu129/cu128, etc.)
Other compilation-related packages

Without this flag, the build system uses an isolated temporary environment, which may result in:

A mismatched torch being installed in the temporary environment
The current torch's CUDA configuration not being found
Compilation failures or incompatible binaries being generated

4. Linting (Code Style & Formatting)

vLLM uses <span style="color:#E8600A;font-weight:700">pre-commit</span> to enforce a unified code style.

<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">uv pip install pre-commit</code>: installs the pre-commit tool
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">pre-commit install</code>: installs hooks into .git/hooks/ so that checks run automatically on every git commit

1) Install and Enable

uv pip install pre-commit
pre-commit install

From now on, every git commit will automatically run the checks ✅

2) Run Manually

pre-commit run      # Check only staged files
pre-commit run -a   # Check all files (= --all-files)

3) CI-only Hooks (Trigger Locally on Demand)

pre-commit run --hook-stage manual markdownlint
pre-commit run --hook-stage manual mypy-3.10

5. Documentation

vLLM's docs are built with <span style="color:#E8600A;font-weight:700">MkDocs</span>.

1) Install Documentation Dependencies

uv pip install -r requirements/docs.txt

2) Preview the Docs Site Locally

mkdocs serve

3) Faster Preview (Skip API Reference Generation)

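The API_AUTONAV_EXCLUDE environment variable controls which packages are left out of the generated API Reference. Excluding vllm skips that generation step, so the preview starts faster.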

API_AUTONAV_EXCLUDE=vllm mkdocs serve

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> Ensure your Python version is compatible with the plugins. For example, <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">mkdocs-awesome-nav</code> requires Python 3.10+.</div>

4) Forward the Port from a Remote Server

<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">-L</code> = Local port forwarding: connections to a port on your local machine are tunneled over SSH to a host and port as seen from the remote machine (here, port 8000 on the remote host itself).

ssh -L 8000:127.0.0.1:8000 xli49@spiedie.binghamton.edu

5) Connect to a Remote GPU Node via Jump Host

<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">-J</code> = Jump host: connect to a target machine by hopping through an intermediate host first.

ssh -J xli49@spiedie.binghamton.edu -L 8000:127.0.0.1:8000 xli49@ghpc005

6. Testing

vLLM uses <span style="color:#E8600A;font-weight:700">pytest</span>.

1) Path A: Full CI-equivalent Setup (CUDA)

uv pip install -r requirements/common.txt -r requirements/dev.txt --torch-backend=auto
pytest tests/

2) Path B: Minimal Test Tooling Only

uv pip install pytest pytest-asyncio
pytest tests/

3) Run a Single Test File (Useful for Debugging)

pytest -s -v tests/test_logger.py

7. Common Errors

1) Missing `Python.h`

If you encounter the following error during compilation or dependency installation:

Python.h: No such file or directory

Fix on Ubuntu:

sudo apt install python3-dev

8. Important Warnings

<span style="color:#C0392B;font-weight:600">⚠️ The repository is not yet fully covered by mypy</span>: do not rely on mypy being fully green.

<span style="color:#C0392B;font-weight:600">⚠️ Not all tests pass on CPU</span> — without a GPU, many tests will fail locally. The official stance is: rely on CI for those tests.

9. PR Submission Guidelines

1) DCO Sign-off

Every commit must include a Signed-off-by line:

git commit -s -m "xxx"

2) PR Title Must Include a Category Prefix

Examples:

[Bugfix] ...
[Kernel] ...
[Core] ...
[Doc] ...
[CI/Build] ...

<span style="color:#C0392B;font-weight:600">PRs without a valid prefix may not be reviewed.</span>
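The two rules above can be checked locally before pushing. The following is a minimal sketch under stated assumptions, not vLLM's actual CI check: the helper names `has_category_prefix` and `has_signoff`, the regular expression, and the sample strings are all hypothetical.

```python
import re

# Assumption: a category prefix is one bracketed tag at the start of the title.
PREFIX_RE = re.compile(r"^\[[A-Za-z][A-Za-z/ ]*\]\s+\S")

def has_category_prefix(title: str) -> bool:
    """True if the PR title starts with a bracketed category such as [Bugfix]."""
    return bool(PREFIX_RE.match(title))

def has_signoff(commit_message: str) -> bool:
    """True if any line of the commit message is a Signed-off-by trailer."""
    return any(
        line.startswith("Signed-off-by: ")
        for line in commit_message.splitlines()
    )

# Hypothetical inputs, for illustration only.
print(has_category_prefix("[Bugfix] Fix crash on empty prompt"))   # True
print(has_category_prefix("Fix crash on empty prompt"))            # False
print(has_signoff("Fix crash\n\nSigned-off-by: Jane Doe <jane@example.com>"))  # True
```

The sketch only checks the shape of the title and the presence of the trailer; whether a given category name is accepted is decided by the project maintainers.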

Note Prompt

Sat, 07 Mar 2026 00:00:00 GMT

Prompt for Claude

When learning something new, ask: What does it do? Why do I need to learn it? How does it compare with the alternatives?

Generate English study notes in Typora-compatible Markdown format with the following requirements: Please explain the content concisely. The example code can be executed independently.

## 1. Language
- All explanations written in **English**
- The explanation should be concise, in one sentence, easy to remember, and in an interview tone
- All **technical terms** must include a Chinese annotation in parentheses
- Example: `The Time Complexity (时间复杂度) is O(n).`

## 2. Heading Structure
- Level 1 heading: Roman numerals → `# I.`
- Level 2 heading: Arabic numerals → `## 1.`
- Level 3 heading: Parenthesis form → `### 1)`

## 3. Typora Rules
1. Use `$$ $$` for math formulas, not `\[ \]`
2. After the content of the level 1 and level 2 headings is completed, a blank line should be left.

## 4. Content
Paste your content:

RoPE

Wed, 04 Mar 2026 00:00:00 GMT

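This formula is simply the two-dimensional **Rotation Matrix (旋转矩阵)** from mathematics. Deriving it from **Polar Coordinates (极坐标)** is the most intuitive route.

1. Setup: Polar Representation

Suppose a point $P(x, y)$ lies on a circle centered at the origin, at distance (radius) $r$ from the origin and at angle $\alpha$ from the positive $x$-axis.

By the definitions of sine and cosine, its coordinates can be written as: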

$x = r \cos\alpha$
$y = r \sin\alpha$

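2. Apply the Rotation

Now rotate this point counterclockwise by an angle $\theta$, and call the new point $P'(x', y')$.

The radius of $P'$ is still $r$, but its angle from the $x$-axis is now $\alpha + \theta$.

So the new coordinates are: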

$x' = r \cos(\alpha + \theta)$
$y' = r \sin(\alpha + \theta)$

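3. Expand with the Angle-Addition Identities

Using the angle-addition formulas from high-school trigonometry:

$\cos(\alpha + \theta) = \cos\alpha \cos\theta - \sin\alpha \sin\theta$

$\sin(\alpha + \theta) = \sin\alpha \cos\theta + \cos\alpha \sin\theta$

Substituting these two expansions into the expressions for $x'$ and $y'$: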

$$x' = r(\cos\alpha \cos\theta - \sin\alpha \sin\theta) = (r \cos\alpha) \cos\theta - (r \sin\alpha) \sin\theta$$

$$y' = r(\sin\alpha \cos\theta + \cos\alpha \sin\theta) = (r \sin\alpha) \cos\theta + (r \cos\alpha) \sin\theta$$

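4. Substitute the Original Coordinates

Recall from step 1 that $x = r \cos\alpha$ and $y = r \sin\alpha$. Substituting them gives the familiar formula:

$x' = x \cos\theta - y \sin\theta$
$y' = y \cos\theta + x \sin\theta$ (equivalently, $x \sin\theta + y \cos\theta$)

Linear-Algebra View (Going Further)

In machine-learning and Transformer papers (such as RoPE), this is usually written as a matrix product, which is more compact:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

Why Does RoPE Use This?

Because the rotation is a linear transformation with a very useful property: it preserves the length (norm) of a vector and changes only its direction. As a result, encoding position does not make the vector magnitudes blow up, even for very distant positions.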
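A quick numerical check of the rotation formula and of the norm-preservation property can make the derivation concrete. This is a minimal sketch: the helper name `rotate` and the sample point $(3, 4)$ with angle $\pi/2$ are chosen only for illustration.

```python
import math

def rotate(x, y, theta):
    """Rotate the 2-D point (x, y) counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    # x' = x cos(theta) - y sin(theta),  y' = x sin(theta) + y cos(theta)
    return (x * c - y * s, x * s + y * c)

# Hypothetical sample point and angle, for illustration only.
x, y = 3.0, 4.0
xr, yr = rotate(x, y, math.pi / 2)

norm_before = math.hypot(x, y)
norm_after = math.hypot(xr, yr)

print(round(xr, 6), round(yr, 6))   # the point (3, 4) turned by 90 degrees
print(norm_before, norm_after)      # both norms agree up to floating-point error
```

A 90-degree counterclockwise turn sends $(3, 4)$ to approximately $(-4, 3)$, and the length stays at 5 in both cases.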

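Below is a toy example small enough to work by hand: 5 tokens, 8 dimensions (4 pairs). We compute $QK^\top$ once without RoPE and once with RoPE, writing out every step (the rotation of each pair, the per-pair dot products for each distance, and their sum) and finishing with the full $5 \times 5$ score matrix.

Setup (Keeping the Arithmetic Manageable)

These simplifications do not affect the conclusion:

Positions: the 5 tokens sit at positions $p=0, 1, 2, 3, 4$
Dimensions: $d=8 \Rightarrow 4$ two-dimensional pairs: Pair 1 (dims 1-2), Pair 2 (dims 3-4), Pair 3 (dims 5-6), Pair 4 (dims 7-8)
Angular speed: each pair gets a simple "rotation angle per position step" (toy values):

$$\omega_1=1, \quad \omega_2=0.5, \quad \omega_3=0.25, \quad \omega_4=0.125 \quad (\text{unit: radians per position})$$

(Note: real RoPE uses frequencies of the form $\omega_i=b^{-2(i-1)/d}$ with base $b=10000$, but the computation proceeds in exactly the same way.)
Key simplification: every token uses the same original vector for both $Q$ and $K$:

$$v=[1, 0, 1, 0, 1, 0, 1, 0]$$

That is, every pair is $[1, 0]$. After RoPE each pair becomes $[\cos(p\omega), \sin(p\omega)]$, so the dot product visibly depends only on the distance $\Delta=t-p$.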

1. Without RoPE: Compute $QK^\top$ Directly

1.1 Write Out $Q$ and $K$

All 5 tokens are identical, so:

$$Q = \begin{bmatrix} v \\ v \\ v \\ v \\ v \end{bmatrix}, \quad K = \begin{bmatrix} v \\ v \\ v \\ v \\ v \end{bmatrix}$$

where $v=[1, 0, 1, 0, 1, 0, 1, 0]$.

1.2 Compute an Arbitrary Score Entry $(QK^\top)_{p,t} = Q_p \cdot K_t = v \cdot v$

Multiply the 8 dimensions element-wise and sum:

$$v \cdot v = 1\cdot1 + 0\cdot0 + 1\cdot1 + 0\cdot0 + 1\cdot1 + 0\cdot0 + 1\cdot1 + 0\cdot0 = 4$$

1.3 The Full $QK^\top$ Matrix

The whole $QK^\top$ is a matrix of all 4s:

$$QK^\top = \begin{bmatrix} 4 & 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 & 4 \end{bmatrix}$$

Interpretation: without positional information, the attention scores have no notion of which tokens are near or far; every position is treated identically.

2. With RoPE: Rotate First, Then Compute $Q'K'^\top$

2.1 How RoPE Rotates Each Pair

For any position $p$ and any pair with angular speed $\omega$, the rotation angle is $\theta=p\omega$. Since the original pair is $[1, 0]$, the rotated pair is:

$$[1, 0] \xrightarrow{R(p)} [\cos(p\omega), \sin(p\omega)]$$

So the 8-dimensional vector of the token at position $p$ (4 pairs concatenated) is:

$$q'_p = k'_p = [\cos(p\omega_1), \sin(p\omega_1), \cos(p\omega_2), \sin(p\omega_2), \cos(p\omega_3), \sin(p\omega_3), \cos(p\omega_4), \sin(p\omega_4)]$$

2.2 The $q'_p$ for All 5 Positions (Rounded to 4 Decimal Places)

Position $p=0$:

$$q'_0 = [1.0000, 0.0000, 1.0000, 0.0000, 1.0000, 0.0000, 1.0000, 0.0000]$$
Position $p=1$ (angles 1, 0.5, 0.25, 0.125):

$$q'_1 = [0.5403, 0.8415, 0.8776, 0.4794, 0.9689, 0.2474, 0.9922, 0.1247]$$
Position $p=2$ (angles 2, 1, 0.5, 0.25):

$$q'_2 = [-0.4161, 0.9093, 0.5403, 0.8415, 0.8776, 0.4794, 0.9689, 0.2474]$$
Position $p=3$ (angles 3, 1.5, 0.75, 0.375):

$$q'_3 = [-0.9900, 0.1411, 0.0707, 0.9975, 0.7317, 0.6816, 0.9305, 0.3663]$$
Position $p=4$ (angles 4, 2, 1, 0.5):

$$q'_4 = [-0.6536, -0.7568, -0.4161, 0.9093, 0.5403, 0.8415, 0.8776, 0.4794]$$

(Because we set $Q=K$, we have $k'_p=q'_p$.)

3. Computing $Q'K'^\top$ Step by Step

Each entry of the score matrix is: $(Q'K'^\top)_{p,t} = q'_p \cdot k'_t = q'_p \cdot q'_t$

3.1 A Fully Worked Example: $(p=0, t=1)$

Multiply and sum across the 8 dimensions. Because every pair of $q'_0$ is $[1, 0]$, each pair picks out only the $\cos$ component of the other vector:

$$q'_0 \cdot q'_1 = 0.5403 + 0.8776 + 0.9689 + 0.9922 \approx 3.3790$$

So $(Q'K'^\top)_{0,1} \approx 3.3790$.

3.2 An Example That Does Not Just Pick Out $\cos$ Terms: $(p=1, t=2)$

Here every pair must be worked out in full:

Pair 1 ($\omega_1=1$):

$$[0.5403, 0.8415] \cdot [-0.4161, 0.9093] = -0.2248 + 0.7652 \approx 0.5403$$

(the exact value is $\cos(1)$)
Pair 2 ($\omega_2=0.5$):

$$[0.8776, 0.4794] \cdot [0.5403, 0.8415] = 0.4742 + 0.4034 \approx 0.8776$$
Pair 3 ($\omega_3=0.25$):

$$[0.9689, 0.2474] \cdot [0.8776, 0.4794] = 0.8503 + 0.1186 \approx 0.9689$$
Pair 4 ($\omega_4=0.125$):

$$[0.9922, 0.1247] \cdot [0.9689, 0.2474] = 0.9613 + 0.0309 \approx 0.9922$$

Summing the 4 pairs:

$$q'_1 \cdot q'_2 \approx 0.5403 + 0.8776 + 0.9689 + 0.9922 = 3.3790$$

The pair $(1, 2)$, at distance $\Delta=1$, gets exactly the same score as $(0, 1)$.

4. Compute Every Distance, Then Assemble the $5 \times 5$ Matrix

Each pair satisfies the trigonometric identity:

$$\cos(p\omega)\cos(t\omega) + \sin(p\omega)\sin(t\omega) = \cos((t-p)\omega)$$

so the total score simplifies to:

$$\text{Score}(p,t) = \sum_{i=1}^4 \cos((t-p)\omega_i)$$

Next, sum over the pairs for each distance $\Delta = |t-p|$ (cosine is an even function, so the sign of $t-p$ does not matter):

$\Delta=0$: $S_0 = 1 + 1 + 1 + 1 = 4.0000$
$\Delta=1$: $S_1 = 0.5403 + 0.8776 + 0.9689 + 0.9922 = 3.3790$
$\Delta=2$: $S_2 = -0.4161 + 0.5403 + 0.8776 + 0.9689 = 1.9707$
$\Delta=3$: $S_3 = -0.9900 + 0.0707 + 0.7317 + 0.9305 = 0.7429$
$\Delta=4$: $S_4 = -0.6536 - 0.4161 + 0.5403 + 0.8776 \approx 0.3481$ (summed before rounding)

The resulting matrix (constant along every diagonal, i.e. a Toeplitz matrix):

$$Q'K'^\top = \begin{bmatrix} S_0 & S_1 & S_2 & S_3 & S_4 \\ S_1 & S_0 & S_1 & S_2 & S_3 \\ S_2 & S_1 & S_0 & S_1 & S_2 \\ S_3 & S_2 & S_1 & S_0 & S_1 \\ S_4 & S_3 & S_2 & S_1 & S_0 \end{bmatrix} \approx \begin{bmatrix} 4.0000 & 3.3790 & 1.9707 & 0.7429 & 0.3481 \\ 3.3790 & 4.0000 & 3.3790 & 1.9707 & 0.7429 \\ 1.9707 & 3.3790 & 4.0000 & 3.3790 & 1.9707 \\ 0.7429 & 1.9707 & 3.3790 & 4.0000 & 3.3790 \\ 0.3481 & 0.7429 & 1.9707 & 3.3790 & 4.0000 \end{bmatrix}$$

5. Where the "Distance" Actually Shows Up

Comparing the two results:

Without RoPE: every pair of positions gets the same score (all 4.0000), so the model cannot read any distance information from $QK^\top$.
With RoPE: in this toy example the score falls steadily as the distance $|t-p|$ grows (4.0000 $\rightarrow$ 3.3790 $\rightarrow$ 1.9707 $\rightarrow$ 0.7429 $\rightarrow$ 0.3481), so the distance information appears directly as a decay in the values of $Q'K'^\top$.
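The hand calculation above can be reproduced with a few lines of standard-library Python. This is a minimal sketch of the toy setup only (same vector $[1, 0]$ in every pair, toy angular speeds 1, 0.5, 0.25, 0.125); the names `OMEGAS`, `rope`, and `dot` are illustrative, and real RoPE implementations operate on learned $Q$ and $K$ tensors rather than a fixed vector.

```python
import math

OMEGAS = [1.0, 0.5, 0.25, 0.125]   # toy angular speeds, one per pair
BASE_PAIR = (1.0, 0.0)             # every pair of every token starts as [1, 0]

def rope(position):
    """Rotate each [1, 0] pair by position * omega and concatenate the pairs."""
    vec = []
    for omega in OMEGAS:
        angle = position * omega
        x, y = BASE_PAIR
        vec.append(x * math.cos(angle) - y * math.sin(angle))
        vec.append(x * math.sin(angle) + y * math.cos(angle))
    return vec

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))

rotated = [rope(p) for p in range(5)]   # q'_p (and k'_p) for p = 0..4
scores = [[dot(rotated[p], rotated[t]) for t in range(5)] for p in range(5)]

for row in scores:
    print(" ".join(f"{s:7.4f}" for s in row))
```

Each printed row should agree with the corresponding row of the $5 \times 5$ matrix above to four decimal places, and entries with the same $|t-p|$ coincide.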

uvloop

Tue, 03 Mar 2026 00:00:00 GMT

uvloop

Tuple

Sun, 01 Mar 2026 00:00:00 GMT

📌 Tuple (元组)

A tuple is an ordered and immutable collection of elements.

Features:

Ordered
Immutable (cannot be changed after creation)
Can contain different data types

Example:

t = (1, 2, 3)

Single-element tuple:

t = (5,)

Tuples are immutable:

t[0] = 10  # TypeError: 'tuple' object does not support item assignment

🔹 Tuple vs List

	Tuple	List
Syntax	`()`	`[]`
Mutable?	❌ No	✅ Yes
Use case	Fixed data	Changing data
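The differences in the table can be seen in a short runnable example. This is a minimal sketch; the variable names are chosen only for illustration.

```python
point = (1, 2, 3)      # a tuple: ordered and immutable
single = (5,)          # the trailing comma makes a one-element tuple
not_a_tuple = (5)      # without the comma this is just the int 5

try:
    point[0] = 10      # item assignment is not allowed on a tuple
except TypeError as exc:
    error_message = str(exc)

numbers = [1, 2, 3]    # a list: ordered and mutable
numbers[0] = 10        # allowed

print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int
print(error_message)   # 'tuple' object does not support item assignment
print(numbers)         # [10, 2, 3]
```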

ccache

Wed, 11 Feb 2026 00:00:00 GMT

I. ccache — Compiler Cache (编译缓存)

ccache wraps your compiler (gcc, g++, nvcc) and caches object files. Same source + same flags = instant replay, no recompilation.

1. How It Works (工作原理)

$$ \text{Cache Key (缓存键)} = \text{Hash}(\text{source content} + \text{flags} + \text{compiler version} + \text{headers}) $$

On a hit (命中): return cached .o file immediately. On a miss (未命中): compile normally, store result.

Two modes: Direct mode (直接模式) — fastest, hashes source directly. Preprocessed mode (预处理模式) — slower, more accurate, used as fallback.
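The hit/miss logic above can be illustrated with a small in-memory model. This is a sketch of the idea only, not ccache's actual implementation: SHA-256, the helpers `cache_key` and `compile_with_cache`, the in-memory dict, and the fake compiler are all assumptions made for illustration.

```python
import hashlib

def cache_key(source: bytes, flags: list[str], compiler_version: str,
              headers: list[bytes]) -> str:
    """Hash every input that can change the compiled output into one key."""
    h = hashlib.sha256()  # stand-in hash, chosen for illustration
    parts = [source, " ".join(flags).encode(), compiler_version.encode(), *headers]
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps parts unambiguous
        h.update(part)
    return h.hexdigest()

cache = {}  # key -> object bytes (stands in for the on-disk cache)

def compile_with_cache(source, flags, compiler_version, headers, compile_fn):
    """Return (object_bytes, "hit" or "miss")."""
    key = cache_key(source, flags, compiler_version, headers)
    if key in cache:
        return cache[key], "hit"          # hit: reuse the stored result
    obj = compile_fn(source)              # miss: compile normally
    cache[key] = obj                      # then store the result
    return obj, "miss"

def fake_compile(source):
    """Hypothetical compiler used only for this demo."""
    return b"OBJ:" + source

src = b"int main() { return 0; }"
statuses = [
    compile_with_cache(src, ["-O2"], "gcc 13.2", [], fake_compile)[1],
    compile_with_cache(src, ["-O2"], "gcc 13.2", [], fake_compile)[1],
    compile_with_cache(src, ["-O3"], "gcc 13.2", [], fake_compile)[1],
]
print(statuses)  # ['miss', 'hit', 'miss']
```

The third call misses because a changed flag changes the key, which mirrors why a changed compiler version or header also forces a recompile.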

2. Installation (无root安装)

# Create the target directories, then download & link
mkdir -p $HOME/local $HOME/.local/bin
wget https://github.com/ccache/ccache/releases/download/v4.10.2/ccache-4.10.2-linux-x86_64.tar.xz
tar xf ccache-4.10.2-linux-x86_64.tar.xz -C $HOME/local
ln -s $HOME/local/ccache-4.10.2-linux-x86_64/ccache $HOME/.local/bin/ccache
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc && source ~/.bashrc

3. Integration (接入构建系统)

# CMake projects
cmake -DCMAKE_C_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache ..

# pip installs (e.g. vLLM)
export CMAKE_C_COMPILER_LAUNCHER=ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
export CMAKE_CUDA_COMPILER_LAUNCHER=ccache
pip install -e .

4. Reading Stats (读懂统计)

ccache -s

Field	Meaning
Hits = 0% on 1st build	Normal — cache is cold (冷缓存), next build hits ~100%
Hits = 0% on 2nd build	ccache not intercepting — check `CMAKE_*_LAUNCHER`
Cleanups: 240	Cache limit too small → run `ccache --max-size=20G`

5. HPC-Specific Tips (HPC注意事项)

# Avoid $HOME quota — move cache to scratch (临时文件系统)
export CCACHE_DIR=/scratch/$USER/.ccache

# Normalize absolute paths across nodes (跨节点路径归一化)
export CCACHE_BASEDIR=$HOME

# Increase size limit (扩大缓存上限)
ccache --max-size=20G

6. Quick Reference (命令速查)

ccache -s              # stats (统计)
ccache -z              # reset stats (重置)
ccache -C              # clear cache (清空)
CCACHE_DISABLE=1 make  # disable temporarily (临时禁用)

Linux&Slurm Common Command

Wed, 11 Feb 2026 00:00:00 GMT

I. HPC Resource Inspection Commands — Linux & SLURM

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> These are essential Linux and <strong>SLURM (Simple Linux Utility for Resource Management)</strong> commands used on HPC clusters to inspect <strong>memory, CPU, GPU resources, and job status</strong>. </div>

1. `lscpu` — CPU Hardware Info

lscpu

Displays <span style="color:#E8600A;font-weight:700">CPU hardware information</span>: core count, clock frequency, NUMA topology, cache sizes, and more.

2. `scontrol show node ... | grep -i gres` — Node GPU Resources

scontrol show node ghpc008 | grep -i gres

Shows the node's <span style="color:#E8600A;font-weight:700">GPU resources</span> — the number and type of GPUs allocated or available on that node.

3. `top` — Real-time Process Monitor

top

Displays a live, continuously refreshed view of <span style="color:#E8600A;font-weight:700">running processes, CPU usage, and memory usage</span>.

4. `scontrol show job $SLURM_JOB_ID` — Current Job Details

scontrol show job $SLURM_JOB_ID

Displays <span style="color:#E8600A;font-weight:700">detailed information about the current SLURM job</span>, including assigned CPUs, GPUs, memory allocation, target node, and runtime status.

Nvidia System Management Interface

Wed, 11 Feb 2026 00:00:00 GMT

I. `nvidia-smi` — NVIDIA System Management Interface

1. Show Overall GPU Status

nvidia-smi

Most commonly used to <span style="color:#E8600A;font-weight:700">quickly check whether GPUs are idle or busy</span>.

2. Monitor in Real Time

Refresh every second:

nvidia-smi -l 1

3. List All GPUs

nvidia-smi -L

4. Show Processes Using GPUs

<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">pmon</code> = <span style="color:#2980B9">process monitor</span>

nvidia-smi pmon

Refresh every 2 seconds:

nvidia-smi pmon -d 2

1) `pmon` Column Reference

Column	Full Name	Meaning
<span style="color:#E8600A;font-weight:700">pid</span>	Process ID	The Linux process ID using the GPU
type	Process type	GPU workload type: C = Compute (CUDA), G = Graphics, C+G = Both
<span style="color:#E8600A;font-weight:700">sm</span>	Streaming Multiprocessor utilization	Percentage of GPU compute cores being used by the process
<span style="color:#E8600A;font-weight:700">mem</span>	Memory controller utilization	Percentage of GPU memory bandwidth used by the process
enc	Encoder utilization	Usage of the NVENC video encoder
dec	Decoder utilization	Usage of the NVDEC video decoder
jpg	JPEG engine utilization	Usage of the hardware JPEG decoder/encoder
ofa	Optical Flow Accelerator utilization	Usage of the hardware optical-flow engine (video/vision tasks)
<span style="color:#E8600A;font-weight:700">fb</span>	Frame Buffer memory	Amount of GPU VRAM used by the process (in MB)
ccpm	Confidential Compute Protected Memory	GPU memory used by the process under Confidential Computing protection (in MB); often 0 on most systems

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Key columns to watch: </span> <strong>pid</strong> — shows which process is using the GPU. <strong>sm</strong> — indicates whether GPU cores are actively computing. <strong>fb</strong> — VRAM usage in MB; shows how much memory is consumed. <strong>mem</strong> — memory bandwidth utilization; indicates I/O pressure on GPU memory. </div>

2) Example Output

[xli49@ghpc008 ~]$ nvidia-smi pmon -i 0 -s um
# gpu         pid   type     sm    mem    enc    dec    jpg    ofa     fb   ccpm    command
# Idx           #    C/G      %      %      %      %      %      %     MB     MB    name
    0          -     -      -      -      -      -      -      -      -      -    -
    0          -     -      -      -      -      -      -      -      -      -    -

[xli49@ghpc008 ~]$ nvidia-smi pmon -i 0
# gpu         pid   type     sm    mem    enc    dec    jpg    ofa    command
# Idx           #    C/G      %      %      %      %      %      %    name
    0          -     -      -      -      -      -      -      -    -
    0          -     -      -      -      -      -      -      -    -

All dashes (-) indicate <span style="color:#2980B9">GPU 0 is currently idle</span> — no processes are running on it.

5. Custom Query of GPU Information

nvidia-smi --query-gpu=name,memory.used,utilization.gpu --format=csv

Commonly used for scripts, logging, and automated monitoring.
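For scripting, the CSV output can be parsed with the standard library. This is a minimal sketch: the helper names `parse_gpu_csv` and `query_gpus` are hypothetical, the `SAMPLE` text is made-up illustration data rather than real output, and the exact header labels (with units in brackets) are an assumption about the CSV format.

```python
import csv
import io
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,utilization.gpu",
    "--format=csv",
]

def parse_gpu_csv(text):
    """Turn the CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]

def query_gpus():
    """Run nvidia-smi (needs an NVIDIA driver on the machine) and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)

# Hypothetical sample text, for illustration only; real values differ per machine.
SAMPLE = (
    "name, memory.used [MiB], utilization.gpu [%]\n"
    "Example GPU, 1024 MiB, 37 %\n"
)
rows = parse_gpu_csv(SAMPLE)
print(rows[0]["name"], "|", rows[0]["memory.used [MiB]"])
```

On a GPU node, calling `query_gpus()` instead of parsing `SAMPLE` returns one dict per GPU, which is convenient for logging or alerting scripts.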

6. Log GPU Status to a File

nvidia-smi -l 5 -f gpu.log

Records GPU information every <span style="color:#E8600A;font-weight:700">5 seconds</span> and appends it to <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">gpu.log</code>.

Bash Shell Script

Sat, 07 Feb 2026 00:00:00 GMT

I. Bash Shell Script Fundamentals

<span style="color:#E8600A;font-weight:700">Bash Shell Script (Bash脚本)</span> is a scripting language executed by the <span style="color:#E8600A;font-weight:700">Unix Shell (Unix命令解释器)</span>. It is widely used for system automation, environment configuration, and workflow management on <span style="color:#E8600A;font-weight:700">Linux Systems (Linux系统)</span> and <span style="color:#E8600A;font-weight:700">High Performance Computing (高性能计算, HPC)</span> environments.

Unlike compiled languages, Bash follows an <span style="color:#E8600A;font-weight:700">Interpreter Model (解释执行模型)</span>, meaning commands are executed line by line by the shell.

<span style="color:#E8600A">1.</span> What is a Bash Script

1) Core Definition

A <span style="color:#E8600A;font-weight:700">Bash Script (Bash脚本)</span> is a text file containing a sequence of shell commands that are executed by the <span style="color:#E8600A;font-weight:700">Bash Interpreter (Bash解释器)</span>.

Its main purposes include:

1）Automating repetitive command execution
2）Managing <span style="color:#E8600A;font-weight:700">Environment Variables (环境变量)</span>
3）Controlling program flow using <span style="color:#E8600A;font-weight:700">Conditional Statements (条件语句)</span> and <span style="color:#E8600A;font-weight:700">Loops (循环)</span>
4）Serving as initialization scripts in Linux and HPC environments

<span style="color:#2980B9">Therefore</span>, Bash acts as a bridge between system commands and automated workflows.

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"> <span style="color:#E8600A;font-weight:700">Note: </span> Bash is a <span style="color:#E8600A;font-weight:700">Scripting Language (脚本语言)</span>, not a compiled language. Scripts are interpreted directly by the shell rather than compiled into machine code. </div>

<span style="color:#E8600A">2.</span> Script Execution Methods

There are three common ways to run a Bash script.

1) Direct Execution with Bash

bash script.sh

This launches a <span style="color:#E8600A;font-weight:700">Subshell (子Shell)</span> to execute the script.

2) Executable Script

First give execution permission:

chmod +x script.sh

Then run the script:

./script.sh

The system reads the script's <span style="color:#E8600A;font-weight:700">Shebang (解释器声明)</span> line, if present, to decide which interpreter runs it.

Example:

#!/bin/bash

3) Source Execution

source script.sh

. script.sh   # POSIX shorthand, equivalent to source

This executes the script inside the <span style="color:#E8600A;font-weight:700">Current Shell (当前Shell)</span>.

Execution Behavior Comparison

Method	New Subshell (子Shell)	Variables Persist (变量保留)
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">bash script.sh</code>	Yes	No
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">./script.sh</code>	Yes	No
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">source script.sh</code>	No	Yes

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"> <span style="color:#E8600A;font-weight:700">Note: </span> The <span style="color:#E8600A;font-weight:700">source Command (source命令)</span> is commonly used when configuring shell environments such as <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.bashrc</code>, CUDA paths, or Conda environments. </div>

II. Basic Bash Syntax

<span style="color:#E8600A">1.</span> Variables and Environment Variables

1) Normal Variables

Variables are assigned without spaces around the equals sign.

name="Alice"
echo "$name"

Here:

<span style="color:#E8600A;font-weight:700">Variable Assignment (变量赋值)</span> defines the variable
<span style="color:#E8600A;font-weight:700">Variable Expansion (变量展开)</span> occurs using $

2) Environment Variables

Environment variables are exported using the <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">export</code> command.

export PATH=/usr/bin:$PATH

Properties:

1）Visible to <span style="color:#E8600A;font-weight:700">Child Processes (子进程)</span>
2）Frequently used for software configuration such as:

<span style="color:#E8600A;font-weight:700">PATH</span>
<span style="color:#E8600A;font-weight:700">CUDA</span>
<span style="color:#E8600A;font-weight:700">Conda</span>
<span style="color:#E8600A;font-weight:700">MPI</span>
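The "visible to child processes" property can be observed from any language that spawns processes. The sketch below uses Python rather than Bash purely for illustration; the variable names `DEMO_VAR` and `CHILD_ONLY` are hypothetical. It shows that a child inherits the parent's environment, while changes made inside the child never flow back.

```python
import os
import subprocess
import sys

# Code run in the child: read an inherited variable, then set a new one.
CHILD_CODE = (
    "import os; "
    "print(os.environ.get('DEMO_VAR', 'unset')); "
    "os.environ['CHILD_ONLY'] = '1'"
)

os.environ.pop("CHILD_ONLY", None)        # start from a clean state
os.environ["DEMO_VAR"] = "from-parent"    # comparable to: export DEMO_VAR=from-parent

result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    capture_output=True, text=True, check=True,
)
child_saw = result.stdout.strip()

print(child_saw)                      # from-parent
print("CHILD_ONLY" in os.environ)     # False
```

This is the same one-way inheritance that makes `bash script.sh` unable to change variables in the calling shell, whereas `source script.sh` runs in the current shell and can.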

<span style="color:#E8600A">2.</span> Conditional Statements

1) Basic `if` Structure

if condition; then
    statement
fi

Example:

if [ -f file.txt ]; then
    echo "file exists"
fi

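Here `[` is the <span style="color:#E8600A;font-weight:700">Test Command (测试命令)</span>. The spaces after `[` and before `]` are required, because `[` is a command name and `]` is its last argument.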

2) Common File Test Operators

Operator	Meaning
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">-f</code>	regular file
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">-d</code>	directory
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">-e</code>	exists
<code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">-x</code>	executable

These belong to the <span style="color:#E8600A;font-weight:700">POSIX Test Syntax (POSIX测试语法)</span>.
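
The operators are often combined in an `if`/`elif` chain. A sketch under stated assumptions: `describe_path` is a hypothetical helper, and the files are created in a temporary directory just for the demonstration.

```shell
# describe_path is a hypothetical helper that classifies a path with the operators above.
describe_path() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ] && [ -x "$1" ]; then
        echo "executable file"
    elif [ -f "$1" ]; then
        echo "regular file"
    elif [ -e "$1" ]; then
        echo "exists (other type)"
    else
        echo "missing"
    fi
}

demo_dir="$(mktemp -d)"
touch "$demo_dir/data.txt"
printf '#!/bin/sh\n' > "$demo_dir/run.sh"
chmod +x "$demo_dir/run.sh"

for p in "$demo_dir" "$demo_dir/data.txt" "$demo_dir/run.sh" "$demo_dir/nope"; do
    echo "${p##*/}: $(describe_path "$p")"
done
```

The order of the tests matters: `-e` is the most general check, so it comes last.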

<span style="color:#E8600A">3.</span> Loop Structures

1) `for` Loop

The most common loop used to iterate through lists or files.

for f in *.txt; do
    echo "$f"
done

Typical use cases:

File iteration
Command results
Parameter lists

2) `while` Loop

Often used for reading files line by line. `IFS=` and `-r` keep leading whitespace and backslashes in each line intact.

while IFS= read -r line; do
    echo "$line"
done < file.txt

This uses <span style="color:#E8600A;font-weight:700">Input Redirection (输入重定向)</span>.
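
A plain `read` trims leading whitespace and treats backslashes as escape characters, while `IFS= read -r` avoids both. A small sketch comparing the two forms, assuming a temporary file with two made-up lines:

```shell
tmp_file="$(mktemp)"
printf '%s\n' '  indented' 'a\tb' > "$tmp_file"

plain=""
while read line; do              # strips leading spaces, consumes backslashes
    plain="${plain}[${line}]"
done < "$tmp_file"

safe=""
while IFS= read -r line; do      # keeps each line exactly as written
    safe="${safe}[${line}]"
done < "$tmp_file"

printf 'plain: %s\n' "$plain"
printf 'safe:  %s\n' "$safe"
rm -f "$tmp_file"
```

The brackets only make the preserved or lost whitespace visible in the output.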

III. Command Execution Model

<span style="color:#E8600A">1.</span> Subshell Execution

bash script.sh

Characteristics:

1）Creates a new <span style="color:#E8600A;font-weight:700">Process (进程)</span>
2）Variables do not affect the current shell

<span style="color:#E8600A">2.</span> Source Execution

source script.sh

. script.sh

Purpose:

Execute script content in the <span style="color:#E8600A;font-weight:700">Current Shell Environment (当前Shell环境)</span>.

Typical usage scenarios:

<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">.bashrc</code>
environment setup
CUDA configuration
HPC module initialization

IV. File and Path Operations

<span style="color:#E8600A">1.</span> File Testing

[ -d dir ]
[ -f file ]

These checks are part of the <span style="color:#E8600A;font-weight:700">POSIX Test System (POSIX测试系统)</span>.

<span style="color:#E8600A">2.</span> Path Expansion and Wildcards

Example:

~/.bashrc.d/*

Meaning:

Symbol	Meaning
`~`	user home directory
`*`	wildcard matching any sequence of characters in a file name (names starting with `.` are not matched by default)

This mechanism is called <span style="color:#E8600A;font-weight:700">Filename Expansion (文件名扩展)</span>.
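
One edge case is worth knowing: when a pattern matches nothing, Bash leaves it unexpanded by default, so a loop receives the literal pattern once. A minimal sketch of the usual guard; `count_txt` is a hypothetical helper and the directory is a temporary stand-in.

```shell
demo_dir="$(mktemp -d)"

# count_txt is a hypothetical helper: counts *.txt entries in directory $1.
count_txt() {
    local n=0 f
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue      # skips the literal pattern when nothing matched
        n=$((n + 1))
    done
    echo "$n"
}

empty_count="$(count_txt "$demo_dir")"
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/.hidden.txt"
filled_count="$(count_txt "$demo_dir")"

echo "before: $empty_count, after: $filled_count"
```

The hidden file is not counted, since `*` does not match a leading dot unless the shell is configured otherwise.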

V. Modular Configuration Example

A typical .bashrc uses modular configuration loading.

if [ -d ~/.bashrc.d ]; then
    for rc in ~/.bashrc.d/*; do
        if [ -f "$rc" ]; then
            . "$rc"
        fi
    done
fi

Explanation:

1）Check whether the directory exists
2）Iterate through each script file
3）Execute each script using <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">source</code>

This allows a <span style="color:#E8600A;font-weight:700">Modular Configuration System (模块化配置系统)</span>.

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"> <span style="color:#E8600A;font-weight:700">Note: </span> This design pattern is widely used in Linux distributions and <span style="color:#E8600A;font-weight:700">HPC Initialization Scripts (HPC初始化脚本)</span> to organize configuration files into reusable modules. </div>
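
The loader can be tried without touching a real `~/.bashrc.d`. The sketch below assumes a temporary directory as a stand-in; the file names and the `DEMO_*` variables are hypothetical.

```shell
# Stand-in for ~/.bashrc.d; the file names and variables are hypothetical.
rc_dir="$(mktemp -d)"
echo 'DEMO_BASE="/opt/demo"'      > "$rc_dir/10-base.sh"
echo 'DEMO_BIN="$DEMO_BASE/bin"'  > "$rc_dir/20-paths.sh"
mkdir "$rc_dir/99-ignored-dir"    # not a regular file, so the loop skips it

if [ -d "$rc_dir" ]; then
    for rc in "$rc_dir"/*; do
        if [ -f "$rc" ]; then
            . "$rc"
        fi
    done
fi

echo "DEMO_BIN=$DEMO_BIN"
```

Glob results are sorted, so numeric prefixes such as `10-` and `20-` are one way to control load order when a module depends on another.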

<span style="color:#E8600A;font-weight:700">Bash Scripts (Bash脚本)</span> provide a powerful automation mechanism for Linux systems by combining command execution, environment configuration, and control flow using an interpreter-based shell environment.

</div>

Linux Tools

Sat, 07 Feb 2026 00:00:00 GMT

I. HPC Setup Without `sudo` — CUDA, CMake & `.bashrc` Configuration

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> This guide covers three tasks on HPC clusters where you have no root access: installing the <strong>CUDA Toolkit</strong> to your home directory without <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">sudo</code>; installing <strong>CMake</strong> locally from a pre-built binary; and structuring your <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">~/.bashrc</code> with a modular <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">~/.bashrc.d/</code> system for managing CUDA versions, libtorch, cuDNN, and custom paths. </div>

1. Installing CUDA Without `sudo`

CUDA only truly requires two things: the <span style="color:#E8600A;font-weight:700">nvcc compiler</span> and user-space libraries like <span style="color:#E8600A;font-weight:700">libcudart</span>. Both can be installed entirely inside your home directory:

$HOME/cuda

1) Check Your Linux Distribution and Architecture

cat /etc/os-release

NAME="Rocky Linux"
VERSION="9.7 (Blue Onyx)"
ID="rocky"
...

uname -m

x86_64

2) Download the Runfile from NVIDIA

Go to: https://developer.nvidia.com/cuda-downloads

Select your OS/architecture and choose the <span style="color:#E8600A;font-weight:700">runfile (local)</span> installer type.

The downloaded file will be named something like:

cuda_12.9.1_575.57.08_linux.run

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> To install an older CUDA version, visit the archive at <a href="https://developer.nvidia.com/cuda-toolkit-archive">https://developer.nvidia.com/cuda-toolkit-archive</a>. Make sure the CUDA version you choose is <strong>less than or equal to</strong> the "CUDA Version" shown in the header of the <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">nvidia-smi</code> output, which is the highest CUDA version the installed driver supports.</div>
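
Version numbers should be compared as versions, not as plain strings or decimals. A minimal sketch of the "less than or equal" rule from the note, assuming `sort -V` from GNU coreutils is available; `version_le` and the example version numbers are hypothetical, and the driver value is typed in by hand rather than parsed from `nvidia-smi`.

```shell
# version_le A B: succeeds when version A <= version B (hypothetical helper).
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_le() {
    [ "$1" = "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" ]
}

driver_max="12.9"     # assumed value, read manually from the nvidia-smi header
for toolkit in 12.6 12.9 12.10; do
    if version_le "$toolkit" "$driver_max"; then
        echo "CUDA $toolkit: OK for a driver supporting up to $driver_max"
    else
        echo "CUDA $toolkit: newer than the driver supports"
    fi
done
```

A version sort orders a minor version of 10 after 9, which a decimal comparison would get wrong.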

3) Install the Toolkit Only (No Driver)

Make the runfile executable:

chmod +x cuda_*.run

Run the installer with driver installation disabled:

./cuda_12.9*.run \
  --silent \
  --toolkit \
  --toolkitpath=$HOME/cuda-12.9 \
  --no-drm \
  --no-man-page

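Flag	Purpose
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">--silent</code>	Non-interactive installation
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">--toolkit</code>	Install CUDA Toolkit only
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">--toolkitpath</code>	Target installation directory (user home)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">--no-drm</code>	Do not install the nvidia-drm kernel module
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">--no-man-page</code>	Do not install the man pages
(no <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">--driver</code> flag)	The driver is not installed, which avoids any root requirement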

4) Configure Environment Variables (Modular `.bashrc.d`)

Create the directory:

mkdir -p ~/.bashrc.d

Create a CUDA config file:

nano ~/.bashrc.d/cuda.sh

Write the following:

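# ===== Default CUDA =====
export CUDA_HOME="$HOME/cuda-12.9"
export PATH="$CUDA_HOME/bin:$PATH"
# ${VAR:+:$VAR} avoids a trailing ":" (an empty entry means "current directory") when VAR is unset
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# ===== CUDA version switcher =====
# Usage: use_cuda 12.6   (expects the toolkit at $HOME/cuda-12.6)
use_cuda () {
    local ver="$1"

    if [ ! -d "$HOME/cuda-$ver" ]; then
        echo "CUDA $ver not found in \$HOME"
        return 1
    fi

    export CUDA_HOME="$HOME/cuda-$ver"
    export PATH="$CUDA_HOME/bin:$PATH"
    export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

    echo "Switched to CUDA $ver"
    # The first line of the output is only a banner; the "release" line carries the version
    nvcc --version | grep -i release
}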

Ensure ~/.bashrc loads all files in ~/.bashrc.d/ (add if not present):

if [ -d ~/.bashrc.d ]; then
    for rc in ~/.bashrc.d/*; do
        [ -f "$rc" ] && . "$rc"
    done
fi

Reload the environment:

source ~/.bashrc

5) Verify the Installation

Check the compiler:

nvcc -V

If a CUDA version string is printed, the toolkit is installed correctly.

Check GPU availability:

nvidia-smi

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> <span style="color:#C0392B;font-weight:600">This is a critical and often overlooked distinction.</span> If <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">nvidia-smi</code> runs successfully, the node already has a working GPU driver and you can run GPU code. If it fails, either the node has no GPU (common on HPC login nodes, so check again on a compute node) or the GPU driver is missing. Without <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">sudo</code> you cannot install the driver yourself, so in that case CUDA can only be used for compilation, not for GPU execution.</div>
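
The two checks can be folded into one report. This is only a sketch: `cuda_status` is a hypothetical helper, and its two optional arguments exist so that the logic can be exercised on a machine without CUDA by passing stand-in commands.

```shell
# cuda_status [COMPILER] [DRIVER_QUERY]: hypothetical helper.
# Defaults to the real tools; other command names can be passed for testing.
cuda_status() {
    local compiler="${1:-nvcc}" driver_query="${2:-nvidia-smi}"
    local can_compile=no can_run=no
    # The compiler only has to be found on PATH.
    command -v "$compiler" >/dev/null 2>&1 && can_compile=yes
    # The driver query has to actually run and succeed.
    "$driver_query" >/dev/null 2>&1 && can_run=yes
    echo "compile=$can_compile run=$can_run"
}

cuda_status
```

On a login node without a GPU, a result of `compile=yes run=no` would be expected once the toolkit is on `PATH`.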

2. Installing CMake Locally (No Root)

1) Download the Official Pre-built Installer

The official binary installer requires no source compilation, no gcc or make, installs quickly, and is compatible with most Linux environments.

cd ~
wget https://github.com/Kitware/CMake/releases/download/v3.29.6/cmake-3.29.6-linux-x86_64.sh

2) Install to Your User Directory

bash cmake-3.29.6-linux-x86_64.sh --skip-license --prefix=$HOME/.local

Flag	Purpose
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">--skip-license</code>	Skip the interactive license confirmation
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">--prefix=$HOME/.local</code>	Install into the user-level software directory (Linux convention)

3) Add to `PATH` and Verify

echo 'export PATH=$HOME/.local/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
cmake --version

3. `.bash_profile` — Auto-load `.bashrc` on Login

# .bash_profile

# Load aliases and functions from .bashrc
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

Bash login shells, such as SSH sessions, read ~/.bash_profile but not ~/.bashrc, so this snippet ensures .bashrc is also sourced on every login.

4. `.bashrc` Section-by-Section Walkthrough

1) System Initialization

Loads the system-level bash configuration (modules, colors, completions, etc.):

if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

2) User `PATH` Initialization

Adds user program directories to PATH without creating duplicates:

# Only add if not already present (prevents duplicate PATH entries)
if ! [[ "$PATH" =~ "$HOME/.local/bin:$HOME/bin:" ]]
then
    # Prepend user-level bin dirs so locally installed tools take priority
    PATH="$HOME/.local/bin:$HOME/bin:$PATH"
fi

# Export so child processes (Python, bash, etc.) inherit this PATH
export PATH
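
The check above only covers one fixed pair of directories. A more general approach is to remove any existing copy of a directory before prepending it, which also keeps the list clean when a switcher such as `use_cuda` is called repeatedly. A minimal sketch, assuming a hypothetical helper `path_prepend` and a made-up starting value instead of the real `PATH`:

```shell
# path_prepend DIR LIST: prints LIST with DIR moved (or added) to the front.
# Hypothetical helper; entries containing glob characters are not handled.
path_prepend() {
    local dir="$1" old="$2" out="$1" entry
    local IFS=:
    for entry in $old; do            # unquoted on purpose: split on ":"
        [ "$entry" = "$dir" ] || out="$out:$entry"
    done
    printf '%s\n' "$out"
}

demo_path="/usr/local/bin:/usr/bin:/bin"
demo_path="$(path_prepend "$HOME/.local/bin" "$demo_path")"
demo_path="$(path_prepend "$HOME/.local/bin" "$demo_path")"   # second call adds nothing
echo "$demo_path"
```

To apply it for real, one would write `PATH="$(path_prepend "$HOME/.local/bin" "$PATH")"`.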

3) The `.bashrc.d` Modular Config System

Splits shell configuration into separate, focused files. <span style="color:#2980B9">Recommended when managing</span> multiple CUDA versions, multiple Conda environments, multi-project research, or many custom aliases.

# Load all user config modules from ~/.bashrc.d/
if [ -d ~/.bashrc.d ]; then
    for rc in ~/.bashrc.d/*; do
        # Only source regular files (not directories or other types)
        if [ -f "$rc" ]; then
            # Source the file — equivalent to: source "$rc"
            # Makes aliases, functions, and exports take effect immediately
            . "$rc"
        fi
    done
fi

4) PATH and Library Configuration

CUDA 12.6 Environment

# ===== CUDA 12.6 =====
export CUDA_HOME=$HOME/cuda-12.6
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

Sets the default CUDA version to 12.6.

libtorch Headers and Library Paths

# ===== libtorch — PyTorch's official C++ API and runtime =====
export CPATH=$HOME/libtorch/include:$HOME/libtorch/include/torch/csrc/api/include:$CPATH
export LIBRARY_PATH=$HOME/libtorch/lib:$LIBRARY_PATH
export LD_LIBRARY_PATH=$HOME/libtorch/lib:$LD_LIBRARY_PATH

Enables calling PyTorch from C++.

Use Case	Keep?
Writing C++/CUDA code with libtorch	✔ Required
Python-only PyTorch usage	❌ Can be removed

cuDNN Paths

Only needed for C++ builds, custom CUDA kernels, or TensorRT:

# ===== cuDNN =====
export CPATH=$HOME/cudnn/include:$CPATH
export LIBRARY_PATH=$HOME/cudnn/lib:$LIBRARY_PATH
export LD_LIBRARY_PATH=$HOME/cudnn/lib:$LD_LIBRARY_PATH

CUTLASS

# ===== CUTLASS =====
export CUTLASS=$HOME/cutlass

Custom Command Paths

# ===== PATH: tells the shell where to find executables =====
export PATH=$HOME/.local/bin:$PATH
export PATH="$HOME/bin:$PATH"

<div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br> On a no-root HPC cluster: install CUDA with <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">--toolkitpath=$HOME/cuda-X.Y --no-drm</code>, install CMake with <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">--prefix=$HOME/.local</code>, and keep your shell config clean by splitting everything into <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">~/.bashrc.d/</code> modules — remember that CUDA without a GPU driver can compile but not run.</div>

Custom Command

Thu, 05 Feb 2026 00:00:00 GMT

I. `sr` — Custom SLURM Interactive Job Launcher

1. Write the Script

Create a file named <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">sr</code> with the following content:

#!/bin/bash

# ===== Argument check =====
if [ -z "$1" ]; then
    echo "Usage:"
    echo "  sr <gpu_type> [node_id]"
    echo ""
    echo "Examples:"
    echo "  sr h100"
    echo "  sr h100 007"
    echo "  sr a100"
    exit 1
fi

GPU_TYPE="$1"
PARTITION="gpucompute-$GPU_TYPE"
NODE_ARG=""

# ===== Optional node targeting =====
if [ -n "$2" ]; then
    NODE_ARG="--nodelist=ghpc$2"
fi

# ===== Launch interactive session =====
srun \
  --gpus-per-node=1 \
  --cpus-per-gpu=4 \
  $NODE_ARG \
  --partition=$PARTITION \
  --time=12:00:00 \
  --pty /bin/bash

2. Install the Script

Move the script to <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">~/bin</code> and make it executable:

mkdir -p ~/bin
mv sr ~/bin/
chmod +x ~/bin/sr

Ensure <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">~/bin</code> is on your PATH:

echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

3. Usage

sr h100          # Request any H100 node
sr h100 007      # Request H100 node ghpc007 specifically
sr a100          # Request any A100 node
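
Before submitting a job it can help to see the exact command that would run. The sketch below is a hypothetical dry-run variant, not part of the `sr` script itself: `sr_command` builds the same argument list and prints it instead of calling `srun`.

```shell
# sr_command GPU_TYPE [NODE_ID]: prints the srun command instead of running it.
# Hypothetical dry-run helper; the flags mirror the sr script.
sr_command() {
    local gpu_type="$1" node_id="${2:-}"
    set -- srun --gpus-per-node=1 --cpus-per-gpu=4
    if [ -n "$node_id" ]; then
        set -- "$@" "--nodelist=ghpc$node_id"
    fi
    set -- "$@" "--partition=gpucompute-$gpu_type" --time=12:00:00 --pty /bin/bash
    printf '%s\n' "$*"
}

sr_command h100
sr_command h100 007
```

Building the arguments with `set --` keeps each option as a separate word, so an empty node argument simply adds nothing.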

Conda

Tue, 03 Feb 2026 00:00:00 GMT

I. Conda Cache Cleanup — Free Up Disk Space

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> Conda accumulates downloaded packages, extracted tarballs, and index caches over time. Cleaning these up frees disk space and reduces environment cruft — without touching any of your existing environments. </div>

1. Check Conda Cache Usage (Optional)

conda info
conda config --show pkgs_dirs
du -sh ~/miniconda3/pkgs 2>/dev/null

2. One-command Cache Cleanup (Recommended)

conda clean -a -y

Flag	Meaning
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">-a</code>	all — cleans every type of cache and leftover
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">-y</code>	yes — auto-confirms without prompting

This removes:

Downloaded .tar.bz2 / .conda package archives
Extracted packages in the cache that no environment uses
Index caches
Lock files and log files

3. Also Clean pip Cache (If Used Inside Conda Envs)

pip cache purge

4. Verify the Space Was Freed (Optional)

du -sh ~/miniconda3/pkgs 2>/dev/null

Git

Tue, 03 Feb 2026 00:00:00 GMT

I. Git — Clean, Fork Sync & Rebase

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> This note covers three related Git workflows: cleaning the working directory by discarding changes to tracked files with <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">git restore</code> and deleting untracked files with <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">git clean</code>; keeping a fork up to date with its upstream repository via merge or rebase; and understanding why <strong>rebase is preferred over merge</strong> for fork synchronization. </div>

1. Removing Files from the Working Directory

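1) Discard Changes to Tracked Files (`git restore`)

git restore .

Discards unstaged changes to tracked files in the current directory and below, restoring them to the version in the index (which is the last commit if nothing is staged). Untracked files are not touched.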

2) Remove Untracked Files (`git clean`)

Command	Danger	Description
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">git clean -f</code>	⭐	Deletes untracked files only; untracked directories and ignored files are kept
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">git clean -fd</code>	⭐⭐	Also removes untracked directories (e.g., build dirs)
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">git clean -fdx</code>	🔥🔥🔥	<span style="color:#C0392B;font-weight:600">Removes almost all locally generated content</span>, including files ignored by `.gitignore`
<code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">git clean -fdxn</code>	—	`-n` = dry-run: preview what would be deleted without actually deleting
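
The dry-run flag makes it safe to compare the variants side by side. A minimal sketch in a throwaway repository, assuming `git` is installed; the wrapper `g`, the file names, and the demo identity are all hypothetical.

```shell
repo="$(mktemp -d)"
# g is a hypothetical wrapper: run git inside the demo repo with a throwaway identity.
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q
echo 'build/' > "$repo/.gitignore"
g add .gitignore
g commit -q -m "add .gitignore"

mkdir "$repo/build"
touch "$repo/build/out.o" "$repo/notes.tmp"

preview_f="$(g clean -n)"        # untracked, non-ignored files only
preview_fdx="$(g clean -ndx)"    # also directories and ignored content

echo "git clean -n:";   echo "$preview_f"
echo "git clean -ndx:"; echo "$preview_fdx"
```

Nothing is deleted here, because every call includes `-n`; the ignored `build/` directory only shows up once `-d` and `-x` are added.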

2. Updating a Forked Repository

1) Method 1: Sync Upstream Locally (Recommended)

Step 1: Navigate into Your Local Repository

cd your-repo

Step 2: View Existing Remotes

git remote -v

Step 3: Add the Upstream Remote (One-time Setup)

git remote add upstream https://github.com/ORIGINAL_OWNER/ORIGINAL_REPO.git

Step 4: Fetch the Latest Changes from Upstream

git fetch upstream

Step 5: Integrate Upstream Changes into Your Branch

Option A — Merge:

git checkout main
git merge upstream/main

If the upstream default branch is master:

git checkout master
git merge upstream/master

Option B — Rebase (Recommended):

git checkout main
git fetch upstream
git rebase upstream/main

Step 6: Push the Updated Branch to Your Fork

After merge:

git push origin main

After rebase:

git push origin main --force-with-lease

2) Method 2: Sync Directly on GitHub (Web UI)

Open your fork on GitHub, then click:

Sync fork → Update branch

No local commands required.

3. Why Rebase Is Preferred for Fork Sync

When syncing a personal fork with its upstream, <span style="color:#E8600A;font-weight:700">rebase is the recommended approach</span>.

Typical scenario:

upstream has moved ahead with new commits
Your fork is behind upstream
You have your own local commits on top

1) Using Merge

Produces an extra <span style="color:#C0392B;font-weight:600">Merge commit (M)</span>
History becomes tree-shaped and cluttered
PRs look noisy and harder to review

2) Using Rebase

After rebasing, your original commits are <span style="color:#E8600A;font-weight:700">replayed on top of the upstream</span>. The old commits are discarded from the branch history, and Git creates new commits with the same content but <span style="color:#E8600A;font-weight:700">brand-new commit IDs (D → D')</span>.

Benefits of rebase:

<span style="color:#2980B9">Clean, linear commit history</span>
<span style="color:#2980B9">Clearer, easier-to-review PRs</span>
The standard practice for syncing a fork with upstream
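
The change of commit IDs can be observed in a throwaway repository. This is a sketch under stated assumptions: `git` is installed, the local branch `main` stands in for `upstream/main`, and the helpers `g` and `commit_file` as well as the file names are hypothetical.

```shell
repo="$(mktemp -d)"
# g is a hypothetical wrapper: run git inside the demo repo with a throwaway identity.
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }
# commit_file NAME MESSAGE: hypothetical helper that adds one file and commits it.
commit_file() { echo "$1" > "$repo/$1"; g add "$1"; g commit -q -m "$2"; }

g init -q
g symbolic-ref HEAD refs/heads/main      # here "main" plays the role of upstream/main
commit_file a.txt "A: shared base"

g checkout -q -b feature
commit_file d.txt "D: my change"
old_id="$(g rev-parse HEAD)"

g checkout -q main
commit_file b.txt "B: new upstream commit"

g checkout -q feature
g rebase -q main                         # replay D on top of B
new_id="$(g rev-parse HEAD)"
new_parent="$(g log -1 --format=%s HEAD~1)"

echo "D  = $old_id"
echo "D' = $new_id (parent: $new_parent)"
```

The rebased commit has the same content but a different parent, so its ID differs from the original. That is also why a plain push is rejected after a rebase and `--force-with-lease` is needed.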

4. Handling Conflicts in Merge / Rebase / Sync Fork

Python Pydantic

Mon, 02 Feb 2026 00:00:00 GMT

Pydantic is a Python data validation and settings management library.

1) `BaseModel` (Define models)

It defines a typed data structure (schema). Pydantic will create an object and validate the input based on the type hints.

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

u = User(id=1, name="Tom")
print(u)

2) Validation + Type Coercion

It validates input types and will automatically convert compatible values (e.g. "123" → 123). If it can’t validate/convert, it raises an error.

from pydantic import BaseModel

class User(BaseModel):
    id: int

u = User(id="123")  # "123" -> 123
print(u.id, type(u.id))

3) `Field` (Constraints / Metadata)

Field() lets you add constraints (like min/max) and extra metadata (like description). Pydantic will enforce the constraints during validation.

from pydantic import BaseModel, Field

class User(BaseModel):
    age: int = Field(ge=0, le=150, description="Age must be between 0 and 150")

u = User(age=18)
print(u)

4) Nested Models

A model can contain another model as a field. Pydantic will validate nested dictionaries and convert them into nested model objects automatically.

from pydantic import BaseModel

class Address(BaseModel):
    city: str

class User(BaseModel):
    name: str
    address: Address

u = User(name="Tom", address={"city": "Shanghai"})
print(u.address.city)

5) Optional Fields + Defaults

A field annotated T | None accepts None as a value, but in Pydantic v2 it is only optional at initialization if it also has a default (such as = None). Without a default, the field is still required.

from pydantic import BaseModel

class User(BaseModel):
    name: str
    email: str | None = None

u1 = User(name="Tom")
u2 = User(name="Jerry", email="jerry@example.com")

print(u1)
print(u2)

6) `@field_validator` (Field-level validation)

It lets you write custom validation logic for a specific field (e.g., trimming spaces, format checks, rejecting certain values).

from pydantic import BaseModel, field_validator

class User(BaseModel):
    name: str

    @field_validator("name")
    @classmethod
    def name_not_empty(cls, v: str):
        if not v.strip():
            raise ValueError("name must not be empty")
        return v.strip()

u = User(name="  Tom  ")
print(u)

7) `@model_validator` (Model-level validation)

It validates the model as a whole, which is useful for rules involving multiple fields (cross-field validation).

from pydantic import BaseModel, model_validator

class Login(BaseModel):
    username: str
    password: str

    @model_validator(mode="after")
    def check_password(self):
        if len(self.password) < 6:
            raise ValueError("password too short (min 6)")
        return self

ok = Login(username="tom", password="123456")
print(ok)

8) `model_dump()` (Export to dict)

It converts a validated model instance into a standard Python dictionary, often used for business logic or API responses.

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

u = User(id=1, name="Tom")
print(u.model_dump())
print(type(u.model_dump()))

9) `model_dump_json()` (Export to JSON)

It converts the model into a JSON string, convenient for HTTP responses, logs, caches, or storing data.

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

u = User(id=1, name="Tom")
print(u.model_dump_json())
print(type(u.model_dump_json()))

10) `BaseSettings` (Settings management)

It reads configuration from environment variables (and other sources) into a typed model, validating types just like BaseModel.

Install the package first (shell command):

pip install pydantic-settings

Then define the settings model:

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    app_name: str = "demo"
    debug: bool = False

s = Settings()
print(s.app_name, s.debug)

(Optional) run with environment variables:

export APP_NAME=myapp
export DEBUG=true
python your_file.py

Astro with Mermaid

Sun, 01 Feb 2026 00:00:00 GMT

Setting Up Mermaid in Astro

1) Install Mermaid

Run in the project root:

npm i mermaid

If you use pnpm:

pnpm add mermaid

If you use yarn:

yarn add mermaid

2) Add `src/components/Mermaid.astro`

---
// src/components/Mermaid.astro
// Full interactive version: scrollbar in the page view + Ctrl zoom (CSS zoom) + double-click for fullscreen
// Fullscreen: drag to pan + Ctrl zoom (CSS transform)
---

<script>
  import mermaid from "mermaid";

  const isDark = document.documentElement.classList.contains("dark");
  console.log("isDark =", isDark);

  const BASE_FONT = 11;

  mermaid.initialize({
    startOnLoad: false,
    theme: isDark ? "dark" : "default",
    securityLevel: "loose",

    // ✅ Mermaid v10+ typings do not allow setting flowchart.fontSize / sequence.fontSize etc. directly
    // ✅ Forcing the font size through themeVariables + CSS is more reliable
    themeVariables: {
      fontSize: `${BASE_FONT}px`,
      fontFamily: "Arial, sans-serif",
    },
  });

  function logMermaidAllLabels(): void {
    document.querySelectorAll<SVGSVGElement>(".mermaid svg").forEach((svg, svgIndex) => {
      console.log(`====== [SVG ${svgIndex}] ======`);

      // 1) Plain SVG <text> elements
      svg.querySelectorAll<SVGTextElement>("text").forEach((node, i) => {
        const style = getComputedStyle(node);
        console.log(
          `[text ${i}]`,
          node.textContent?.trim(),
          "font-size:",
          style.fontSize,
          "font-family:",
          style.fontFamily
        );
      });

      // 2) foreignObject (HTML label)
      svg.querySelectorAll<SVGForeignObjectElement>("foreignObject").forEach((fo, i) => {
        const div = fo.querySelector<HTMLElement>("div, span, p") || (fo as unknown as HTMLElement);
        const style = getComputedStyle(div);
        console.log(
          `[foreignObject ${i}]`,
          div.textContent?.trim(),
          "font-size:",
          style.fontSize,
          "font-family:",
          style.fontFamily
        );
      });
    });
  }

  function forceSvgOverflow(div: HTMLElement): void {
    const wrapper = div.querySelector<HTMLElement>(".mermaid-interactive-wrapper");
    const svg = div.querySelector<SVGSVGElement>(".mermaid-interactive-wrapper svg");
    if (!wrapper || !svg) return;

    // ✅ Never let the SVG auto-fit its container
    svg.style.width = "auto";
    svg.style.maxWidth = "none";
    svg.style.height = "auto";
    svg.style.display = "block";

    requestAnimationFrame(() => {
      const wrapperWidth = wrapper.clientWidth;

      // ✅ Prefer viewBox, which is more reliable (especially for gantt charts)
      const vb = svg.viewBox?.baseVal;
      const viewBoxWidth = vb?.width || 0;

      // Fallback: bounding box
      const bboxWidth = svg.getBBox().width || 0;

      const realWidth = Math.ceil((viewBoxWidth || bboxWidth) + 40);

      // ✅ Stretch the SVG to its real width
      svg.style.width = `${realWidth}px`;

      // ✅ Scroll only when the content overflows
      if (realWidth > wrapperWidth) {
        wrapper.classList.add("has-scroll");
      } else {
        wrapper.classList.remove("has-scroll");
      }

      console.log(
        "[forceSvgOverflow]",
        "wrapperWidth=",
        wrapperWidth,
        "realWidth=",
        realWidth,
        "viewBoxWidth=",
        viewBoxWidth,
        "bboxWidth=",
        bboxWidth
      );
    });
  }

  function extractMermaidText(container: Element): string {
    const code = container.querySelector<HTMLElement>("code");
    if (code) {
      const text = (code.innerText || code.textContent || "").trim();
      return cleanMermaidText(text);
    }

    const linesNodeList = container.querySelectorAll<HTMLElement>('[class*="line"]:not([class*="number"])');
    if (linesNodeList.length > 0) {
      const lines = Array.from(linesNodeList) as HTMLElement[];
      const text = lines
        .map((lineEl) => {
          const clone = lineEl.cloneNode(true) as HTMLElement;
          clone
            .querySelectorAll<HTMLElement>('[class*="number"], .line-number, .ln')
            .forEach((el) => el.remove());
          return (clone.innerText || clone.textContent || "").trim();
        })
        .filter((line) => line.length > 0)
        .join("\n");

      return cleanMermaidText(text);
    }

    const text = ((container as HTMLElement).innerText || container.textContent || "").trim();
    return cleanMermaidText(text);
  }

  function cleanMermaidText(text: string): string {
    const lines = text.split("\n");
    const cleanedLines: string[] = [];

    for (let line of lines) {
      line = line.trim();
      if (/^\d+$/.test(line)) continue;
      line = line.replace(/^\d+\s+/, "");
      if (line.length > 0) cleanedLines.push(line);
    }

    const result = cleanedLines.join("\n").trim();
    console.log("[Mermaid] After cleaning:", result);
    return result;
  }

  function findMermaidBlocks(): Element[] {
    const results: Element[] = [];

    document.querySelectorAll("pre").forEach((pre) => {
      const code = pre.querySelector("code");
      const cls = (code?.className || pre.className || "").toLowerCase();
      if (cls.includes("mermaid")) {
        results.push(pre);
      }
    });

    document.querySelectorAll("figure, div.expressive-code, [class*='astro-code']").forEach((box) => {
      const hasMermaidLabel = Array.from(box.querySelectorAll("*")).some((n) => {
        const text = (n.textContent || "").trim().toUpperCase();
        return text === "MERMAID";
      });

      if (hasMermaidLabel) {
        results.push(box);
      }
    });

    document.querySelectorAll<HTMLElement>('[data-language="mermaid"]').forEach((el) => {
      const container = el.closest("figure, div, pre") || el;
      results.push(container);
    });

    return Array.from(new Set(results));
  }

  // Create the modal dialog
  function createModal(): void {
    if (document.getElementById("mermaid-modal")) return;

    const modal = document.createElement("div");
    modal.id = "mermaid-modal";
    modal.className = "mermaid-modal";
    modal.innerHTML = `
      <div class="mermaid-modal-overlay"></div>
      <div class="mermaid-modal-content">
        <button class="mermaid-modal-close" aria-label="Close">×</button>
        <div class="mermaid-modal-controls">
          <button class="mermaid-zoom-btn" data-action="zoom-in" title="Zoom in">🔍+</button>
          <button class="mermaid-zoom-btn" data-action="zoom-out" title="Zoom out">🔍-</button>
          <button class="mermaid-zoom-btn" data-action="reset" title="Reset">↺</button>
          <span class="mermaid-zoom-level">100%</span>
        </div>
        <div class="mermaid-modal-body"></div>
      </div>
    `;
    document.body.appendChild(modal);

    const closeBtn = modal.querySelector<HTMLButtonElement>(".mermaid-modal-close");
    const overlay = modal.querySelector<HTMLElement>(".mermaid-modal-overlay");

    const closeModal = () => {
      modal.classList.remove("active");
      document.body.style.overflow = "";
    };

    closeBtn?.addEventListener("click", closeModal);
    overlay?.addEventListener("click", closeModal);

    document.addEventListener("keydown", (e: KeyboardEvent) => {
      if (e.key === "Escape" && modal.classList.contains("active")) {
        closeModal();
      }
    });
  }

  // Update the zoom-level display
  function updateZoomDisplay(container: HTMLElement, scale: number): void {
    const zoomLevel = container.closest(".mermaid-modal")?.querySelector<HTMLElement>(".mermaid-zoom-level");
    if (zoomLevel) {
      zoomLevel.textContent = `${Math.round(scale * 100)}%`;
    }
  }

  // ✅ Zoom in the scrollable page view (uses CSS zoom so that overflow-x keeps working)
  function makeScrollableZoom(wrapper: HTMLElement, content: HTMLElement): void {
    let scale = 1;

    const applyZoom = () => {
      scale = Math.min(Math.max(scale, 0.5), 3); // ✅ Zoom range for the page view
      (content.style as any).zoom = String(scale); // zoom is a non-standard property, so TS may not recognize it
      console.log("zoom =", (content.style as any).zoom);
    };

    wrapper.addEventListener(
      "wheel",
      (e: WheelEvent) => {
        if (!e.ctrlKey && !e.metaKey) return;
        e.preventDefault();

        const delta = e.deltaY > 0 ? -0.1 : 0.1;
        scale += delta;
        applyZoom();
      },
      { passive: false }
    );

    applyZoom();
  }

  type TransformController = {
    get scale(): number;
    set scale(v: number);

    get translateX(): number;
    set translateX(v: number);

    get translateY(): number;
    set translateY(v: number);

    applyTransform: () => void;
  };

  // ✅ Fullscreen interaction: drag + transform zoom (always returns a TransformController)
  function makeInteractive(wrapper: HTMLElement, options = { enableDrag: true }): TransformController {
    let scale = 1;
    let translateX = 0;
    let translateY = 0;
    let isDragging = false;
    let startX = 0;
    let startY = 0;
    let lastTranslateX = 0;
    let lastTranslateY = 0;

    const svg = wrapper.querySelector<SVGSVGElement>("svg");

    const applyTransform = () => {
      if (!svg) return;
      svg.style.transform = `translate(${translateX}px, ${translateY}px) scale(${scale})`;
      svg.style.transformOrigin = "0 0";
      svg.style.transition = isDragging ? "none" : "transform 0.15s ease";
    };

    // ✅ If there is no SVG, still return an empty controller so the caller never gets undefined
    if (!svg) {
      return {
        get scale() {
          return scale;
        },
        set scale(v: number) {
          scale = v;
        },
        get translateX() {
          return translateX;
        },
        set translateX(v: number) {
          translateX = v;
        },
        get translateY() {
          return translateY;
        },
        set translateY(v: number) {
          translateY = v;
        },
        applyTransform,
      };
    }

    // ✅ Dragging (optional)
    if (options.enableDrag) {
      wrapper.addEventListener("mousedown", (e: MouseEvent) => {
        if (e.button !== 0) return;

        isDragging = true;
        startX = e.clientX;
        startY = e.clientY;
        lastTranslateX = translateX;
        lastTranslateY = translateY;
        wrapper.style.cursor = "grabbing";
        e.preventDefault();
      });

      document.addEventListener("mousemove", (e: MouseEvent) => {
        if (!isDragging) return;

        const deltaX = e.clientX - startX;
        const deltaY = e.clientY - startY;

        translateX = lastTranslateX + deltaX;
        translateY = lastTranslateY + deltaY;

        applyTransform();
      });

      document.addEventListener("mouseup", () => {
        if (isDragging) {
          isDragging = false;
          wrapper.style.cursor = "grab";
        }
      });
    }

    // Ctrl/⌘ + wheel zoom (fullscreen)
    wrapper.addEventListener(
      "wheel",
      (e: WheelEvent) => {
        if (!e.ctrlKey && !e.metaKey) return;

        e.preventDefault();

        const delta = e.deltaY > 0 ? -0.1 : 0.1;
        const newScale = Math.min(Math.max(0.3, scale + delta), 5);

        // Zoom centered on the mouse position
        const rect = wrapper.getBoundingClientRect();
        const mouseX = e.clientX - rect.left;
        const mouseY = e.clientY - rect.top;

        const scaleChange = newScale / scale;
        translateX = mouseX - (mouseX - translateX) * scaleChange;
        translateY = mouseY - (mouseY - translateY) * scaleChange;

        scale = newScale;
        applyTransform();

        updateZoomDisplay(wrapper, scale);
      },
      { passive: false }
    );

    wrapper.style.cursor = options.enableDrag ? "grab" : "default";
    wrapper.style.overflow = "hidden";
    applyTransform();

    return {
      get scale() {
        return scale;
      },
      set scale(v: number) {
        scale = v;
      },
      get translateX() {
        return translateX;
      },
      set translateX(v: number) {
        translateX = v;
      },
      get translateY() {
        return translateY;
      },
      set translateY(v: number) {
        translateY = v;
      },
      applyTransform,
    };
  }

  // Add interaction to the inline (non-fullscreen) view
  function addInteractiveFeature(mermaidDiv: HTMLElement): void {
    const wrapper = document.createElement("div");
    wrapper.className = "mermaid-interactive-wrapper";

    const content = document.createElement("div");
    content.className = "mermaid-interactive-content";

    content.innerHTML = mermaidDiv.innerHTML;
    mermaidDiv.innerHTML = "";

    wrapper.appendChild(content);
    mermaidDiv.appendChild(wrapper);

    const hint = document.createElement("div");
    hint.className = "mermaid-hint";
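    // hint.innerHTML = `<strong>Tip:</strong> Inline view: Ctrl/⌘ + wheel to zoom; double-click for fullscreen; in fullscreen, drag to pan and Ctrl/⌘ + wheel to zoom`;
    mermaidDiv.appendChild(hint);

    // ✅ Inline view: scrollbar-based zoom (CSS zoom)
    makeScrollableZoom(wrapper, content);

    // Double-click to open fullscreen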
    content.addEventListener("dblclick", () => {
      openFullscreen(mermaidDiv);
    });
  }

  // Open the fullscreen modal
  function openFullscreen(mermaidDiv: HTMLElement): void {
    const modal = document.getElementById("mermaid-modal");
    const modalBody = modal?.querySelector<HTMLElement>(".mermaid-modal-body");

    if (!modal || !modalBody) return;

    const svg = mermaidDiv.querySelector<SVGSVGElement>("svg");
    if (!svg) return;

    // Clone the SVG into the modal
    const clonedSvg = svg.cloneNode(true) as SVGSVGElement;
    modalBody.innerHTML = "";
    modalBody.appendChild(clonedSvg);

    const transform = makeInteractive(modalBody, { enableDrag: true });

    const zoomIn = modal.querySelector<HTMLButtonElement>('[data-action="zoom-in"]');
    const zoomOut = modal.querySelector<HTMLButtonElement>('[data-action="zoom-out"]');
    const reset = modal.querySelector<HTMLButtonElement>('[data-action="reset"]');

    zoomIn?.addEventListener("click", () => {
      const newScale = Math.min(transform.scale + 0.2, 5);
      transform.scale = newScale;
      transform.applyTransform();
      updateZoomDisplay(modalBody, newScale);
    });

    zoomOut?.addEventListener("click", () => {
      const newScale = Math.max(transform.scale - 0.2, 0.3);
      transform.scale = newScale;
      transform.applyTransform();
      updateZoomDisplay(modalBody, newScale);
    });

    reset?.addEventListener("click", () => {
      transform.scale = 1;
      transform.translateX = 0;
      transform.translateY = 0;
      transform.applyTransform();
      updateZoomDisplay(modalBody, 1);
    });

    modal.classList.add("active");
    document.body.style.overflow = "hidden";
    updateZoomDisplay(modalBody, 1);
  }

  async function renderMermaid(): Promise<void> {
    const blocks = findMermaidBlocks();
    console.log("[Mermaid] Found blocks:", blocks.length);

    let converted = 0;

    for (const container of blocks) {
      try {
        const text = extractMermaidText(container);

        const isMermaid =
          text.includes("graph") ||
          text.includes("sequenceDiagram") ||
          text.includes("classDiagram") ||
          text.includes("stateDiagram") ||
          text.includes("erDiagram") ||
          text.includes("journey") ||
          text.includes("gantt") ||
          text.includes("pie") ||
          text.includes("flowchart");

        if (!isMermaid) continue;

        const div = document.createElement("div");
        div.className = "mermaid mermaid-container";
        div.textContent = text;

        container.replaceWith(div);
        converted++;
      } catch (e) {
        console.error("[Mermaid] Error processing block:", e);
      }
    }

    console.log("[Mermaid] Converted:", converted);

    if (converted > 0) {
      try {
        await mermaid.run({ querySelector: ".mermaid" });
        logMermaidAllLabels();
        console.log("[Mermaid] Rendered ✅");

        createModal();

        document.querySelectorAll<HTMLElement>(".mermaid-container").forEach((div) => {
          if (!div.querySelector(".mermaid-interactive-wrapper")) {
            addInteractiveFeature(div);
          }

          // ✅ Force the SVG to overflow so the horizontal scrollbar always appears
          forceSvgOverflow(div);
        });
      } catch (e) {
        console.error("[Mermaid] Render failed ❌", e);
      }
    }
  }

  // Initial render
  if (document.readyState === "loading") {
    document.addEventListener("DOMContentLoaded", () => {
      renderMermaid();
    });
  } else {
    renderMermaid();
  }

  // Swup support (window.swup is not part of the standard typings)
  const w = window as any;
  if (w.swup) {
    w.swup.hooks.on("page:view", renderMermaid);
  }
  document.addEventListener("swup:page:view", renderMermaid);
</script>

<style is:global>
  /* Base container */
  .mermaid-container {
    width: 100%;
    margin: 2rem 0;
    position: relative;
    border: 1px solid #e2e8f0;
    border-radius: 8px;
    padding: 1rem;
    background: #f8fafc;
  }

  :global(.dark) .mermaid-container {
    border-color: #374151;
    background: #1f2937;
  }

  /* ✅ Inline scroll container: horizontal scrollbar */
  .mermaid-interactive-wrapper {
    width: 100%;
    overflow-x: auto;
    overflow-y: hidden;
    position: relative;
    border-radius: 4px;
    background: white;
    scrollbar-width: thin;
  }

  :global(.dark) .mermaid-interactive-wrapper {
    background: #111827;
  }

  /* ✅ The content container must be sized by its content; otherwise nothing overflows and no scrollbar appears */
  .mermaid-interactive-content {
    display: inline-block;
    width: max-content;
    height: auto;
  }

  .mermaid-interactive-content svg {
    display: block;
    max-width: none !important;
    height: auto;
  }

  /* ✅ Force the inline-view font (covers every Mermaid diagram type) */
  .mermaid-interactive-content svg text,
  .mermaid-interactive-content svg .nodeLabel,
  .mermaid-interactive-content svg .edgeLabel,
  .mermaid-interactive-content svg .label,
  .mermaid-interactive-content svg tspan {
    font-size: clamp(10px, 1em, 14px) !important;
    font-family: Arial, sans-serif !important;
  }

  /* Hint text */
  .mermaid-hint {
    text-align: center;
    font-size: 0.875rem;
    color: #64748b;
    margin-top: 1rem;
    padding: 0.5rem;
    background: #f1f5f9;
    border-radius: 4px;
  }

  .mermaid-hint strong {
    color: #475569;
    font-weight: 600;
  }

  :global(.dark) .mermaid-hint {
    color: #94a3b8;
    background: #334155;
  }

  :global(.dark) .mermaid-hint strong {
    color: #cbd5e1;
  }

  /* Modal */
  .mermaid-modal {
    display: none;
    position: fixed;
    inset: 0;
    z-index: 9999;
    align-items: center;
    justify-content: center;
  }

  .mermaid-modal.active {
    display: flex;
  }

  .mermaid-modal-overlay {
    position: absolute;
    inset: 0;
    background: rgba(0, 0, 0, 0.9);
    backdrop-filter: blur(4px);
  }

  .mermaid-modal-content {
    position: relative;
    width: 95vw;
    height: 95vh;
    background: white;
    border-radius: 12px;
    box-shadow: 0 25px 50px -12px rgba(0, 0, 0, 0.5);
    display: flex;
    flex-direction: column;
    overflow: hidden;
    z-index: 1;
  }

  :global(.dark) .mermaid-modal-content {
    background: #1f2937;
  }

  /* Close button */
  .mermaid-modal-close {
    position: absolute;
    top: 1rem;
    right: 1rem;
    width: 40px;
    height: 40px;
    border-radius: 50%;
    border: none;
    background: rgba(0, 0, 0, 0.7);
    color: white;
    font-size: 2rem;
    line-height: 1;
    cursor: pointer;
    z-index: 10;
    display: flex;
    align-items: center;
    justify-content: center;
  }

  /* Control panel */
  .mermaid-modal-controls {
    position: absolute;
    bottom: 2rem;
    left: 50%;
    transform: translateX(-50%);
    display: flex;
    gap: 0.5rem;
    align-items: center;
    background: rgba(0, 0, 0, 0.8);
    padding: 0.75rem 1.5rem;
    border-radius: 100px;
    z-index: 10;
    backdrop-filter: blur(8px);
  }

  .mermaid-zoom-btn {
    width: 36px;
    height: 36px;
    border-radius: 50%;
    border: 1px solid rgba(255, 255, 255, 0.2);
    background: rgba(255, 255, 255, 0.1);
    color: white;
    font-size: 1rem;
    cursor: pointer;
  }

  .mermaid-zoom-level {
    color: white;
    font-size: 0.875rem;
    font-weight: 600;
    min-width: 50px;
    text-align: center;
    margin: 0 0.5rem;
  }

  .mermaid-modal-body {
    flex: 1;
    overflow: hidden;
    display: flex;
    align-items: center;
    justify-content: center;
    position: relative;
  }

  /* ✅ Modal font is also clamped (10px to 14px) */
  .mermaid-modal-body svg text,
  .mermaid-modal-body svg .nodeLabel,
  .mermaid-modal-body svg .edgeLabel,
  .mermaid-modal-body svg .label,
  .mermaid-modal-body svg tspan {
    font-size: clamp(10px, 1em, 14px) !important;
    font-family: Arial, sans-serif !important;
  }

  /* Responsive */
  @media (max-width: 768px) {
    .mermaid-modal-content {
      width: 100vw;
      height: 100vh;
      border-radius: 0;
    }

    .mermaid-modal-controls {
      bottom: 1rem;
      padding: 0.5rem 1rem;
    }

    .mermaid-zoom-btn {
      width: 32px;
      height: 32px;
      font-size: 0.875rem;
    }

    .mermaid-hint {
      font-size: 0.75rem;
    }
  }
</style>
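
The cursor-centered zoom in makeInteractive rests on one relation: with transform-origin at 0 0, a content point at local coordinate p appears on screen at translate + scale * p. The sketch below, written in Python for brevity, checks that the update used in the wheel handler keeps the point under the mouse fixed. The names zoom_about_point and to_local are illustrative and are not part of the component.

```python
def zoom_about_point(scale, tx, ty, mouse_x, mouse_y, new_scale):
    """Return the new (tx, ty) so the content under the mouse stays in place.

    Model: screen = translate + scale * local (transform-origin at 0 0).
    """
    ratio = new_scale / scale
    new_tx = mouse_x - (mouse_x - tx) * ratio
    new_ty = mouse_y - (mouse_y - ty) * ratio
    return new_tx, new_ty


def to_local(scale, tx, ty, screen_x, screen_y):
    """Map a wrapper-relative screen point back to untransformed coordinates."""
    return (screen_x - tx) / scale, (screen_y - ty) / scale


# Zoom from 1.0 to 1.5 with the mouse at (300, 150).
before = to_local(1.0, 20.0, -10.0, 300.0, 150.0)
tx2, ty2 = zoom_about_point(1.0, 20.0, -10.0, 300.0, 150.0, 1.5)
after = to_local(1.5, tx2, ty2, 300.0, 150.0)
print(before, after)  # the same local point is under the mouse before and after
```

The zoom-in and zoom-out buttons in openFullscreen change only the scale, so they zoom about the SVG's top-left corner rather than about a chosen point.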

3) Modify `src/layouts/Layout.astro`

---
.....
+ import Mermaid from "../components/Mermaid.astro";
---

	<body class=" min-h-screen transition " class:list={[{"lg:is-home": isHomePage, "enable-banner": enableBanner}]}
		  data-overlayscrollbars-initialize
	>
		<ConfigCarrier></ConfigCarrier>
        + <Mermaid client:load />
		+ <slot />

		<!-- increase the page height during page transition to prevent the scrolling animation from jumping -->
		<div id="page-height-extend" class="hidden h-[300vh]"></div>
	</body>

4) Update the styles in `src/styles/global.css`

.mermaid-interactive-wrapper {
  width: 100%;
  overflow-x: auto !important; /* ✅ show the scrollbar only on overflow */
  overflow-y: hidden !important;
  background: transparent;
}

/* ✅ Scrollbar height (keep it small, or it squeezes the diagram) */
.mermaid-interactive-wrapper::-webkit-scrollbar {
  height: 14px;
}

/* ✅ Track */
.mermaid-interactive-wrapper::-webkit-scrollbar-track {
  background: #f1f5f9;
  border-radius: 999px;
}

/* ✅ thumb */
.mermaid-interactive-wrapper::-webkit-scrollbar-thumb {
  background: #94a3b8;
  border-radius: 999px;

  /* ✅ Make the thumb thinner and centered */
  border: 4px solid #f1f5f9;
  background-clip: padding-box;
}

/* hover */
.mermaid-interactive-wrapper::-webkit-scrollbar-thumb:hover {
  background: #64748b;
}

/* A plain CSS file has no :global(); target the .dark class directly */
.dark .mermaid-interactive-wrapper::-webkit-scrollbar-track {
  background: #1f2937;
}

.dark .mermaid-interactive-wrapper::-webkit-scrollbar-thumb {
  background: #475569;
  border: 4px solid #1f2937;
  background-clip: padding-box;
}

.dark .mermaid-interactive-wrapper::-webkit-scrollbar-thumb:hover {
  background: #64748b;
}

Mermaid Test

Add the following test code to your Markdown file:

Test Cases

Flowchart

graph TD
    A[开始] --> B{是否成功?}
    B -->|是| C[完成]
    B -->|否| D[重试]
    D --> A

Sequence Diagram

sequenceDiagram
    participant A as 用户
    participant B as 系统
    A->>B: 发送请求
    B-->>A: 返回响应

Class Diagram

classDiagram
    class Animal {
        +String name
        +int age
        +makeSound()
    }
    class Dog {
        +bark()
    }
    Animal <|-- Dog

State Diagram

stateDiagram-v2
    [*] --> 待处理
    待处理 --> 处理中
    处理中 --> 已完成
    处理中 --> 失败
    失败 --> 待处理
    已完成 --> [*]

Gantt Chart

gantt
    title 项目计划
    dateFormat  YYYY-MM-DD
    section 设计
    需求分析      :a1, 2024-01-01, 30d
    UI设计        :a2, after a1, 20d
    section 开发
    后端开发      :b1, 2024-02-01, 45d
    前端开发      :b2, 2024-02-10, 40d

Pie Chart

pie title 技术栈占比
    "JavaScript" : 45
    "Python" : 30
    "Go" : 15
    "其他" : 10

If everything is set up correctly, you should now see the rendered diagrams on the page. 🎉

Console Checks

Check the following:

The browser console shows [Mermaid] Rendered ✅
The "Cleaned text" output in the console is correct (no line numbers)
The page contains <div class="mermaid"> elements

Additional Optimization

Once everything works, you can also add dark theme support:

<script>
  import mermaid from "mermaid";
  
  // Detect the active theme
  const isDark = document.documentElement.classList.contains('dark');
  
  mermaid.initialize({ 
    startOnLoad: false,
    theme: isDark ? 'dark' : 'default',  // 👈 switch based on the theme
    securityLevel: 'loose',
  });
  
  // ... rest of the code
</script>

With this, Mermaid diagrams use whichever theme (dark or light) is active on your site when the page loads.

Sort

Sat, 31 Jan 2026 00:00:00 GMT

Sort

Python Sort

Sat, 31 Jan 2026 00:00:00 GMT

1) Sort a List (In-Place)

a = [3, 1, 2]
a.sort()
print(a)  # [1, 2, 3]

2) Return a New Sorted List (Do Not Modify the Original)

a = [3, 1, 2]
b = sorted(a)
print(b)  # [1, 2, 3]
print(a)  # [3, 1, 2]

3) Sort in Descending Order

a.sort(reverse=True)
# or
b = sorted(a, reverse=True)

4) Sort by Key (Common for Tuples / Objects)

arr = [(1, 5), (2, 3), (3, 4)]
arr.sort(key=lambda x: x[1])
print(arr)  # sort by the second element

5) Sort by Multiple Keys

1. Ascending, Ascending

arr.sort(key=lambda x: (x[0], x[1]))

Example:

arr = [(2, 0), (1, 4), (2, 1), (1, 3)]
arr.sort(key=lambda x: (x[0], x[1]))
print(arr)
# [(1, 3), (1, 4), (2, 0), (2, 1)]

Meaning:

sort by x[0] first
if x[0] is equal, sort by x[1]

2. Ascending + Descending

arr.sort(key=lambda x: (x[0], -x[1]))

Example:

# Ascending + Descending (most common in contests)
arr = [(2, 0), (1, 4), (2, 1), (1, 3)]
arr.sort(key=lambda x: (x[0], -x[1]))
print(arr)
# [(1, 4), (1, 3), (2, 1), (2, 0)]

3. All descending (two ways)

Method A: ==reverse=True== (global reverse)

arr.sort(key=lambda x: (x[0], x[1]), reverse=True)

⚠️ This reverses the whole result, not “first asc, second desc”.

Method B: negate each key (more flexible, but works only for numeric keys)

arr.sort(key=lambda x: (-x[0], -x[1]))

4. Multi-key sorting for strings

Sort by length descending, then lexicographically ascending

words.sort(key=lambda s: (-len(s), s))

Example:

words = ["apple", "bat", "banana", "app"]
words.sort(key=lambda s: (-len(s), s))
print(words)
# ['banana', 'apple', 'app', 'bat']
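
Negation only works for numeric keys, so it cannot flip the direction of a string key. One common workaround, sketched here, relies on Python's sort being stable: sort by the secondary key first, then by the primary key. The function name sort_len_asc_alpha_desc and the word list are illustrative.

```python
def sort_len_asc_alpha_desc(items):
    """Sort by length ascending, then alphabetically descending."""
    # Pass 1: secondary key (alphabetical, descending).
    result = sorted(items, reverse=True)
    # Pass 2: primary key (length, ascending). The sort is stable,
    # so items of equal length keep their pass-1 order.
    result.sort(key=len)
    return result


words = ["apple", "bat", "banana", "app", "cat"]
print(sort_len_asc_alpha_desc(words))
# ['cat', 'bat', 'app', 'apple', 'banana']
```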

5. Sort a dictionary / Counter by value, then key

Example: sort by frequency descending, then number ascending

from collections import Counter

# Sort a dictionary / Counter by value, then key
nums = [1,1,1,2,2,3,3,4]
cnt = Counter(nums)
print(cnt)
# Counter({1: 3, 2: 2, 3: 2, 4: 1})
res = sorted(cnt.items(), key=lambda x: (-x[1], x[0]))
print(res)
# [(1, 3), (2, 2), (3, 2), (4, 1)]

# The same pattern works for a plain dict
nums = {4:1, 1:3, 2:2, 3:2}
print(nums)
# {4: 1, 1: 3, 2: 2, 3: 2}
res = sorted(nums.items(), key=lambda x: (-x[1], x[0]))
print(res)
# [(1, 3), (2, 2), (3, 2), (4, 1)]

Passwordless Remote Login

Fri, 30 Jan 2026 00:00:00 GMT

I. SSH Passwordless Login — Key-Based Authentication

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> The goal is to log in to a remote server (e.g., <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">spiedie.binghamton.edu</code>) from Mac/Windows/Linux <strong>without entering a password</strong>, and to make VS Code Remote-SSH connections more stable. This is achieved by replacing password-based authentication with <strong>public/private key authentication (公钥/私钥认证)</strong>. </div>

1. Core Concept: How SSH Passwordless Login Works

SSH "passwordless login" does not skip identity verification — it replaces password-based authentication with key-based authentication.

1) Two Key Files

<span style="color:#E8600A;font-weight:700">Private Key (私钥)</span>: stored on your local machine. Example: ~/.ssh/id_ed25519. <span style="color:#C0392B;font-weight:600">Must never be leaked or shared.</span>
<span style="color:#E8600A;font-weight:700">Public Key (公钥)</span>: can be sent to the server. Example: ~/.ssh/id_ed25519.pub. Safe to share openly and copy to servers.

2) How Does the Server Remember You?

The server stores your public key in:

~/.ssh/authorized_keys

Once added, the server will allow anyone who holds the corresponding private key to log in.

3) What Happens During Authentication (Simplified Flow)

You initiate a connection: ssh user@host
The server looks up your public key in authorized_keys
The server sends a random challenge
Your machine signs the challenge using your private key (the private key never leaves your machine)
The server verifies the signature using your public key
Verification passes → Login succeeds (no password prompt)

2. Step-by-Step Setup

The following examples use:

Username: xli49
Host: spiedie.binghamton.edu

1) Step 1: Generate an SSH Key on Your Local Machine

Check whether a key already exists:

ls ~/.ssh/id_ed25519 ~/.ssh/id_rsa 2>/dev/null

If not, generate one (ed25519 recommended):

ssh-keygen -t ed25519 -C "xli49@spiedie.binghamton.edu"

<div style="background:#F5F5F5;border-left:4px solid #E8600A;border-radius:0 6px 6px 0;padding:12px 16px;margin:14px 0;font-size:14px;line-height:1.85"><span style="color:#E8600A;font-weight:700">Note: </span> For a completely password-free experience, press Enter to leave the passphrase empty. For better security, set a passphrase and use it with Keychain / <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">ssh-agent</code>.</div>

2) Step 2: Copy the Public Key to the Server

✅ Recommended (when ssh-copy-id is available):

ssh-copy-id xli49@spiedie.binghamton.edu

Enter your password once — that's the last time.

Manual Method (if `ssh-copy-id` is not available)

cat ~/.ssh/id_ed25519.pub | ssh xli49@spiedie.binghamton.edu \
"mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys"

3) Step 3: Test the Passwordless Login

ssh xli49@spiedie.binghamton.edu

✅ If no password prompt appears → setup successful.

3. Configuring `~/.ssh/config` (Recommended)

Edit the config file:

nano ~/.ssh/config

Recommended configuration (using the full hostname as the Host):

Host spiedie.binghamton.edu
    HostName spiedie.binghamton.edu
    User xli49
    ServerAliveInterval 30
    ServerAliveCountMax 120

After saving, connect with just:

ssh spiedie.binghamton.edu

4. When Is `IdentityFile` Needed?

1) Usually Not Required

By default, SSH offers any keys already loaded into ssh-agent and also tries the default key files, including:

~/.ssh/id_ed25519
~/.ssh/id_rsa

If your key is in one of these default locations, you typically do not need to specify IdentityFile.

2) When You Must Specify `IdentityFile`

<span style="color:#2980B9">You have multiple keys</span> and SSH might pick the wrong one
<span style="color:#2980B9">Your key is not in a default path</span>
<span style="color:#2980B9">The server requires a specific key</span>

Example:

Host spiedie.binghamton.edu
    HostName spiedie.binghamton.edu
    User xli49
    IdentityFile ~/.ssh/id_ed25519

5. Preventing VS Code Remote-SSH Disconnections (KeepAlive)

Common causes of disconnection:

The school network or firewall clears long-idle connections
The remote session is considered idle when there is no output for an extended period

Solution: enable SSH heartbeat packets (KeepAlive):

ServerAliveInterval 30
ServerAliveCountMax 120

What this means:

Send a heartbeat packet every <span style="color:#E8600A;font-weight:700">30 seconds</span>
Allow up to <span style="color:#E8600A;font-weight:700">120 consecutive non-responses</span> (~1 hour) before disconnecting
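
As a quick check of the arithmetic, the approximate time before the client gives up on an unresponsive server is the interval multiplied by the allowed number of missed replies. The helper name keepalive_timeout_seconds is illustrative.

```python
def keepalive_timeout_seconds(interval, count_max):
    """Approximate seconds of silence before ssh disconnects."""
    return interval * count_max


print(keepalive_timeout_seconds(30, 120))         # 3600 seconds
print(keepalive_timeout_seconds(30, 120) / 3600)  # 1.0 hour
```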

6. Troubleshooting

1) Check Which Key SSH Is Using

ssh -v xli49@spiedie.binghamton.edu

Look for lines like:

Offering public key: ...
Authentication succeeded

2) Check the Default Shell on the Server

Run remotely to rule out misconfiguration:

echo $SHELL
getent passwd xli49 | cut -d: -f7
which zsh

Double&Triple Pointers

Thu, 29 Jan 2026 00:00:00 GMT

Double Pointers

Check if `p` is a subsequence of `s`

Check whether p is a subsequence of s after some characters of s have been removed.

✅ Most concise (recommended): a single if

j = 0
for i in range(m):
    if not removed[i] and j < len(p) and s[i] == p[j]:
        j += 1
return j == len(p)

int j = 0;
for (int i = 0; i < m; i++) {
    if (!removed[i] && j < (int)p.size() && s[i] == p[j]) {
        j++;
    }
}
return j == (int)p.size();

✅ A cleaner continue version (the most intuitive logic)

j = 0
for i in range(m):
    if removed[i]:
        continue
    if j < len(p) and s[i] == p[j]:
        j += 1
return j == len(p)

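The (int) cast avoids comparing int with size_t (unsigned), which causes warnings and potential bugs. In C++, if you declare j as size_t (unsigned) and then ==write j-- while j is 0, you hit a nasty surprise: it does not become -1; it wraps around to a very large number.==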

int j = 0;
for (int i = 0; i < m; i++) {
    if (removed[i]) continue;
    if (j < (int)p.size() && s[i] == p[j]) {
        j++;
    }
}
return j == (int)p.size();

✅ Using for ch in s is cleaner (but removed still has to be handled; the version below assumes nothing has been removed)

j = 0
for ch in s:
    if j < len(p) and ch == p[j]:
        j += 1
return j == len(p)

int j = 0;
for (char ch : s) {
    if (j < (int)p.size() && ch == p[j]) {
        j++;
    }
}
return j == (int)p.size();
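
The fragments above use return outside a function, so they do not run on their own. A minimal runnable sketch that combines the for ch in s style with the removed check might look like this, assuming removed is a list of booleans with the same length as s. The function name is illustrative.

```python
def is_subsequence_after_removal(s, p, removed):
    """Return True if p is a subsequence of s once the marked characters are dropped."""
    j = 0  # index of the next character of p to match
    for i, ch in enumerate(s):
        if removed[i]:
            continue  # this character of s no longer exists
        if j < len(p) and ch == p[j]:
            j += 1
    return j == len(p)


s, p = "abcacb", "ab"
removed = [False] * len(s)
for k in (3, 1):  # drop s[3] and s[1], leaving "accb"
    removed[k] = True
print(is_subsequence_after_removal(s, p, removed))  # True

removed[0] = True  # also drop s[0], leaving "ccb"
print(is_subsequence_after_removal(s, p, removed))  # False
```

enumerate keeps the index needed for removed while still iterating over characters directly.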

zsh-autosuggestions

Thu, 29 Jan 2026 00:00:00 GMT

I. zsh Plugins — Autosuggestions & Syntax Highlighting

<div style="background:#EBF0FF;border-left:4px solid #3B5BDB;border-radius:0 6px 6px 0;padding:14px 18px;margin:16px 0;line-height:1.9"> <strong>Overview:</strong> Two essential zsh plugins for a better terminal experience: <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">zsh-autosuggestions</code> provides auto-complete, parameter hints, and history-based suggestions; <code style="background:#E8F4FD;color:#1a3a5c;border-radius:4px;padding:1px 6px">zsh-syntax-highlighting</code> colorizes valid commands and flags errors in real time as you type. </div>

1. macOS

<span style="color:#E8600A;font-weight:700">Command suggestions</span> appear in gray while typing (pulled from shell history)
<span style="color:#E8600A;font-weight:700">Syntax highlighting</span> colors valid commands and highlights mistakes inline

1) Install (Homebrew)

brew install zsh-autosuggestions zsh-syntax-highlighting

2) Enable (add to `~/.zshrc`)

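# $(brew --prefix) resolves to /usr/local on Intel Macs and /opt/homebrew on Apple Silicon
echo "source $(brew --prefix)/share/zsh-autosuggestions/zsh-autosuggestions.zsh" >> ~/.zshrc
echo "source $(brew --prefix)/share/zsh-syntax-highlighting/zsh-syntax-highlighting.zsh" >> ~/.zshrc
source ~/.zshrc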

✅ Works immediately.

<div style="background:linear-gradient(135deg,#EBF0FF 0%,#FFF3E0 100%);border:1.5px solid #c5d3ff;border-radius:8px;padding:14px 20px;margin-top:24px"><span style="color:#3B5BDB;font-weight:700">💡 One-line Takeaway</span><br> Install both plugins via Homebrew, source them in <code style="background:#FFF3E0;color:#7a2e00;border-radius:4px;padding:1px 6px">~/.zshrc</code>, and your shell gains history-based grey suggestions and live syntax coloring instantly.</div>

Greedy&Linear Scan

Tue, 27 Jan 2026 00:00:00 GMT

This algorithm is a linear scan (one-pass) greedy counting algorithm.

Linear Scan / One-pass Traversal

Greedy (always keeps the best answer so far)

Can also be seen as a simple state tracking approach

485. Max Consecutive Ones

Given a binary array nums, return the maximum number of consecutive 1's in the array.

Example 1:

Input: nums = [1,1,0,1,1,1]
Output: 3
Explanation: The first two digits or the last three digits are consecutive 1s. The maximum number of consecutive 1s is 3.

Example 2:

Input: nums = [1,0,1,1,0,1]
Output: 2

Constraints:

1 <= nums.length <= 10^5
nums[i] is either 0 or 1.

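from typing import List

class Solution:
    def findMaxConsecutiveOnes(self, nums: List[int]) -> int:
        mx = 0   # best run length seen so far
        cnt = 0  # length of the current run of 1s
        for x in nums:
            if x == 1:
                cnt += 1
            else:
                cnt = 0
            mx = max(mx, cnt)  # updated after every element, so no final check is needed
        return mx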

class Solution {
public:
    int findMaxConsecutiveOnes(vector<int>& nums) {
        int ans = 0;
        int cnt = 0;
        for (int x : nums) {
            if (x == 0) {
                ans = max(ans, cnt);
                cnt = 0;
            } else {
                cnt += 1;
            }
        }
        ans = max(ans, cnt);
        return ans;
    }
};

1446. Consecutive Characters

The power of the string is the maximum length of a non-empty substring that contains only one unique character.

Given a string s, return the power of s.

Example 1:

Input: s = "leetcode"
Output: 2
Explanation: The substring "ee" is of length 2 with the character 'e' only.

Example 2:

Input: s = "abbcccddddeeeeedcba"
Output: 5
Explanation: The substring "eeeee" is of length 5 with the character 'e' only.

Constraints:

1 <= s.length <= 500
s consists of only lowercase English letters.

class Solution:
    def maxPower(self, s: str) -> int:
        ans = 1
        cnt = 1
        for i in range(1, len(s)):  # start at 1: at i = 0, s[i - 1] would wrap around to the last character
            if s[i] == s[i - 1]:
                cnt += 1
            else:
                cnt = 1
            ans = max(ans, cnt)
        return ans

class Solution {
public:
    int maxPower(string s) {
        int ans = 0;
        int l = 0;
        int n = s.size();
        map<char, int> cnt;
        for (int r = 0; r < n; r++) {
            cnt[s[r]]++;
            while (cnt.size() > 1) {
                cnt[s[l]]--;
                if (cnt[s[l]] == 0) {
                    cnt.erase(s[l]);
                }
                l++;
            }
            ans = max(ans, r - l + 1);
        }
        return ans;
    }
};
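
Both problems reduce to finding the longest run of equal adjacent items. As an alternative sketch, not the approach used in the solutions above, itertools.groupby splits a sequence into such runs directly. The function name longest_run and its target parameter are illustrative.

```python
from itertools import groupby


def longest_run(seq, target=None):
    """Length of the longest run of equal adjacent items.

    If target is given, only runs of that value are counted.
    """
    best = 0
    for value, group in groupby(seq):
        if target is None or value == target:
            best = max(best, sum(1 for _ in group))
    return best


print(longest_run([1, 1, 0, 1, 1, 1], target=1))  # 3  (485. Max Consecutive Ones)
print(longest_run("abbcccddddeeeeedcba"))         # 5  (1446. Consecutive Characters)
```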

bisect left

Mon, 26 Jan 2026 00:00:00 GMT

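Quick Reference

Find the first element >= x

i = bisect_left(a, x)
if i < len(a):
    ans = a[i]   # first element >= x

Find the last element < x

i = bisect_left(a, x)
if i > 0:
    ans = a[i-1]  # last element < x

(In the Heaters problem, this is the nearest heater on the left.)

Check whether x exists in the array

i = bisect_left(a, x)
exists = (i < len(a) and a[i] == x)

Find the position where x should be inserted

pos = bisect_left(a, x)

Inserting at pos keeps the array sorted (x goes to the left of any equal values).

Count the elements less than x

count = bisect_left(a, x)

This works because every element to the left of the returned index is < x.

Count the elements >= x

count = len(a) - bisect_left(a, x)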

Example

r = len(nums) - 1 # remember to subtract 1

import bisect as b

nums = [1, 2, 2, 3]

# position of the first element >= x
a = b.bisect_left(nums, 2)
print(a) # 1

def find_bisect_left(nums, k):
    l = 0
    r = len(nums) - 1 # remember to subtract 1
    while l <= r:
        m = (l + r) // 2
        if nums[m] >= k:
            r = m - 1
        else:
            l = m + 1
    return l

print(find_bisect_left(nums, 2))


# position of the first element > x
a = b.bisect(nums, 2)
print(a) # 3
a = b.bisect_right(nums, 2)
print(a) # 3 (bisect is an alias of bisect_right)

def find_bisect_right(nums, k):
    l = 0
    r = len(nums) - 1
    while l <= r:
        m = (l + r) // 2
        if nums[m] > k:
            r = m - 1
        else:
            l = m + 1
    return l

print(find_bisect_right(nums, 2))
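
The quick-reference patterns can be wrapped into small helpers and checked on the same array. This is a minimal sketch: the helper names first_ge, last_lt, and count_equal are illustrative, and returning None when no such element exists is an assumption of this example.

```python
from bisect import bisect_left, bisect_right


def first_ge(a, x):
    """First element >= x, or None if every element is smaller."""
    i = bisect_left(a, x)
    return a[i] if i < len(a) else None


def last_lt(a, x):
    """Last element < x, or None if every element is >= x."""
    i = bisect_left(a, x)
    return a[i - 1] if i > 0 else None


def count_equal(a, x):
    """Number of occurrences of x in the sorted list a."""
    return bisect_right(a, x) - bisect_left(a, x)


nums = [1, 2, 2, 3]
print(first_ge(nums, 2))     # 2
print(last_lt(nums, 2))      # 1
print(count_equal(nums, 2))  # 2
```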

typing feature - typedDict

Mon, 26 Jan 2026 00:00:00 GMT

NotRequired comes from typing (Python 3.11+) or typing_extensions (older versions). Its purpose:

It marks a field in a TypedDict as optional (the key does not have to be provided).

Example

from typing import TypedDict, NotRequired

class User(TypedDict):
    name: str                 # required
    age: NotRequired[int]     # optional

This allows:

u1: User = {"name": "Tom"}              # ✅ ok
u2: User = {"name": "Tom", "age": 18}   # ✅ ok

Without NotRequired (every field is required by default)

class User(TypedDict):
    name: str
    age: int

Then:

{"name": "Tom"}   # ❌ the type checker reports that age is missing
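
On interpreters older than 3.11 without typing_extensions, a similar effect can be sketched with total=False and inheritance. This is an alternative to NotRequired, not the same mechanism. The base class name _UserRequired is illustrative, and the introspection attributes used below are available on Python 3.9+.

```python
from typing import TypedDict


class _UserRequired(TypedDict):
    name: str  # required: declared in a total (default) TypedDict


class User(_UserRequired, total=False):
    age: int   # optional: declared in a total=False TypedDict


u1: User = {"name": "Tom"}
u2: User = {"name": "Tom", "age": 18}

# TypedDict records which keys are required and which are optional.
print(User.__required_keys__)
print(User.__optional_keys__)
```

At runtime a TypedDict is a plain dict, so missing keys are only reported by a static type checker.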

Mermaid

Sun, 25 Jan 2026 00:00:00 GMT

Export using Mermaid CLI

1) Install Mermaid CLI

npm install -g @mermaid-js/mermaid-cli

2) Save your diagram code into `arch.mmd`

Example content:

sequenceDiagram
    A->>B: hi

3) Export as PNG

mmdc -i arch.mmd -o arch.png

4) Export as SVG (optional)

mmdc -i arch.mmd -o arch.svg

Binary Search

Sat, 24 Jan 2026 00:00:00 GMT

I. Binary Search Master Guide: From Logic to Universal Templates

1. The Core Essence(底层/核心本质): Monotonicity

The core of Binary Search isn't "sorting," but "Binary Properties" (Two-Segment Property(二段性)). As long as a function check(x) exists such that the search range presents one of the following two patterns, Binary Search is applicable(合适的，恰当的):

Find Minimum (First True): [False, False, ..., True, True]
Find Maximum (Last True): [True, True, ..., False, False]

2. Determining the Search Range `[left, right]`

Set the boundaries based on the physical meaning of x:

| Type | left (Min Valid Value) | right (Definitely Feasible) | Classic Problem |
| --- | --- | --- | --- |
| Index | `0` | `n - 1` | Basic Search |
| Time | `1` | `min(time) * totalTrips` | 2187. Min Time |
| Speed | `1` | `max(piles)` | 875. Koko Eating Bananas |
| Capacity | `max(weights)` | `sum(weights)` | 1011. Ship Packages |
| Divisor | `1` | `max(nums)` | 1283. Smallest Divisor |
| Distance | `0` or `1` | `max(pos) - min(pos)` | 1552. Magnetic Force |

1) Time-based: 2187. Minimum Time to Complete Trips

Summary: Given the time each car takes to complete one trip, find the minimum total time required for all cars to complete at least totalTrips.
Why these boundaries:
- left = 1: Time cannot be zero.
- right = min(time) * totalTrips: This is a conservative upper bound. Even if only the fastest car were running, the time it takes to finish all trips alone would certainly be enough.

2) Speed-based: 875. Koko Eating Bananas

Summary: There are $n$ piles of bananas. You must finish all of them within $h$ hours. Find the minimum eating speed $K$ (bananas per hour). Note: Koko can only eat from one pile per hour.
Why these boundaries:
- left = 1: Speed must be at least 1, or she will never finish.
- right = max(piles): If your speed equals the largest pile, you are guaranteed to finish one pile per hour. Since you can't eat more than one pile an hour anyway, any speed higher than this is redundant.

3) Capacity-based: 1011. Capacity To Ship Packages Within D Days

Summary: Packages must be shipped in the order given within $D$ days. Find the minimum weight capacity of the conveyor belt.
Why these boundaries:
- left = max(weights): The belt must be able to carry the heaviest single package; otherwise, that package can never be shipped.
- right = sum(weights): The extreme case—shipping every single package on the very first day. The total sum of weights is the absolute maximum capacity needed.

4) Divisor-based: 1283. Find the Smallest Divisor Given a Threshold

Summary: Each number in an array is divided by $d$ (rounded up) and summed. The sum must be $\le$ a given threshold. Find the minimum $d$.
Why these boundaries:
- left = 1: The divisor cannot be zero.
- right = max(nums): When the divisor equals the maximum value in the array, every quotient rounds up to $1$, so the sum reaches its minimum possible value, the array length $n$. Because the problem guarantees threshold $\ge n$, this divisor is always feasible.

5) Distance-based: 1552. Magnetic Force Between Two Balls

Summary: Place $M$ balls in baskets such that the minimum distance between any two balls is as large as possible. Find this maximum minimum distance.
Why these boundaries:
- left = 1: The balls must be separated by at least 1 unit of distance (assuming distinct basket positions).
- right = max(pos) - min(pos): The theoretical maximum distance occurs when you place only two balls: one at the very first basket and one at the very last.
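
The Distance row is the only one in the table whose goal is a maximum rather than a minimum. A minimal sketch of how these bounds might be used for 1552, assuming a greedy feasibility check over the sorted positions; the names `max_min_distance` and `can_place` are illustrative and do not come from the notes:

```python
from typing import List

def max_min_distance(position: List[int], m: int) -> int:
    pos = sorted(position)

    def can_place(d: int) -> bool:
        # Greedy: put a ball in the first basket, then in every basket
        # that is at least d away from the previously used one.
        count, last = 1, pos[0]
        for p in pos[1:]:
            if p - last >= d:
                count += 1
                last = p
        return count >= m

    # left = 1, right = max(pos) - min(pos), as in the table
    l, r = 1, pos[-1] - pos[0]
    while l <= r:
        mid = l + (r - l) // 2
        if can_place(mid):   # feasible, try a larger distance
            l = mid + 1
        else:                # too ambitious, shrink the distance
            r = mid - 1
    return r                 # last distance for which can_place was True

print(max_min_distance([1, 2, 3, 4, 7], 3))  # 3
```

Because a larger distance is never easier to satisfy than a smaller one, the feasibility pattern is True...True False...False, and the loop returns `r`.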

3. Core Templates: Closed Interval `while l <= r`

This is the most robust implementation. It is recommended to use this consistently.

1) Find Minimum (First True)

Goal: Find the smallest $x$ such that check(x) is True.

Python

l, r = min_valid, max_feasible
while l <= r:
    mid = l + (r - l) // 2
    if check(mid): # Feasible, but look for smaller ones to the left
        r = mid - 1
    else:          # Not feasible, must increase x
        l = mid + 1
return l  # When loop ends, l points to the first True

2) Find Maximum (Last True)

Goal: Find the largest $x$ such that check(x) is True.

Python

l, r = min_valid, max_feasible
while l <= r:
    mid = l + (r - l) // 2
    if check(mid): # Feasible, try to find a larger one to the right
        l = mid + 1
    else:          # Not feasible, must decrease x
        r = mid - 1
return r  # When loop ends, r points to the last True
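
The two templates can be wrapped into reusable functions. This is a sketch under the assumption that `check` is monotonic over `[lo, hi]`; the names `first_true` and `last_true` are hypothetical helpers, not part of any library:

```python
from typing import Callable

def first_true(lo: int, hi: int, check: Callable[[int], bool]) -> int:
    """Smallest x in [lo, hi] with check(x) True; hi + 1 if there is none."""
    l, r = lo, hi
    while l <= r:
        mid = l + (r - l) // 2
        if check(mid):
            r = mid - 1
        else:
            l = mid + 1
    return l

def last_true(lo: int, hi: int, check: Callable[[int], bool]) -> int:
    """Largest x in [lo, hi] with check(x) True; lo - 1 if there is none."""
    l, r = lo, hi
    while l <= r:
        mid = l + (r - l) // 2
        if check(mid):
            l = mid + 1
        else:
            r = mid - 1
    return r

# Pattern F F F T T T: smallest x with x*x >= 30
print(first_true(0, 30, lambda x: x * x >= 30))  # 6
# Pattern T T T F F F: largest x with x*x <= 30
print(last_true(0, 30, lambda x: x * x <= 30))   # 5
```

When no value is feasible, the returned index falls just outside the range, so callers that cannot guarantee an answer should compare the result against `lo` and `hi`.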

4. Advanced Tips & Mathematical Details

1) Finding Left/Right Boundaries of Elements

Left Boundary (Lower Bound): First index where element >= target.
Right Boundary (Upper Bound): Last index where element == target.
- Trick: lowerBound(target + 1) - 1.
- Principle: Find the index of the first number > target, then move back one spot. If no element is >= target + 1, the search returns n, and n - 1 correctly identifies the last element. This assumes integer values and that target is present in the array.
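
A short sketch of the trick on a concrete array, assuming integer elements and a target that is present; `lower_bound` here is a local helper written in the closed-interval style of these notes:

```python
from typing import List

def lower_bound(nums: List[int], target: int) -> int:
    """First index whose element is >= target, or len(nums) if none."""
    l, r = 0, len(nums) - 1
    while l <= r:
        mid = l + (r - l) // 2
        if nums[mid] >= target:
            r = mid - 1
        else:
            l = mid + 1
    return l

nums = [5, 7, 7, 8, 8, 10]
first = lower_bound(nums, 8)                  # first index with value >= 8
last = lower_bound(nums, 8 + 1) - 1           # one step before the first value > 8
last_of_max = lower_bound(nums, 10 + 1) - 1   # search returns n = 6, so n - 1 = 5
print(first, last, last_of_max)  # 3 4 5
```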

2) Avoiding Overflow

In C++/Java, left + right can exceed $2^{31} - 1$.

Standard approach: mid = left + (right - left) / 2
Python Note: Though Python handles arbitrarily large integers, keeping this habit helps in understanding low-level memory constraints.

3) Ceiling Division Conversion

When calculating "required days/trips," you often need $\lceil \frac{b}{a} \rceil$:

Universal Formula: (b + a - 1) // a
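Logic: If $b$ is exactly divisible by $a$, adding $a-1$ is not enough to reach the next multiple, so the quotient is unchanged; otherwise it pushes $b$ past the next multiple and the integer division rounds up by one. This assumes $a > 0$ and $b \ge 0$.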
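
A quick check of the formula against `math.ceil`, assuming $a > 0$ and $b \ge 0$; `ceil_div` is an illustrative name:

```python
import math

def ceil_div(b: int, a: int) -> int:
    # Integer-only ceiling of b / a
    return (b + a - 1) // a

for b, a in [(7, 3), (10, 2), (1, 5), (0, 4)]:
    assert ceil_div(b, a) == math.ceil(b / a)

print(ceil_div(7, 3), ceil_div(10, 2))  # 3 5
```

The integer form is usually preferable inside a `check` function because `math.ceil(b / a)` goes through floating point, which can lose precision for very large values.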

5. Post-Loop State Cheat Sheet

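When the while l <= r loop of the Find Minimum (First True) template terminates (for Find Maximum the roles are mirrored: r is the last True and l is the first False):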

| Pointer | Physical Meaning |
| --- | --- |
| `l (low)` | Points to the first element that satisfies condition (or `>= target`) |
| `r (high)` | Points to the last element that fails condition (or `< target`) |

II. Binary Search Questions

Find the boundary of element in sorted array

34. Find First and Last Position of Element in Sorted Array

Given an array of integers nums sorted in non-decreasing order, find the starting and ending position of a given target value.

If target is not found in the array, return [-1, -1].

You must write an algorithm with O(log n) runtime complexity.

Example 1:

Input: nums = [5,7,7,8,8,10], target = 8
Output: [3,4]

Example 2:

Input: nums = [5,7,7,8,8,10], target = 6
Output: [-1,-1]

Example 3:

Input: nums = [], target = 0
Output: [-1,-1]

class Solution {
private:
    int lowerBound(vector<int>& nums, int n, int target) {
        int left = 0;
        int right = n - 1;
        while (left <= right) {
            int mid = left + (right - left) / 2;
            if (nums[mid] >= target) {
                right = mid - 1;
            } else {
                left = mid + 1;
            }
        }
        return left;
    }
public:
    vector<int> searchRange(vector<int>& nums, int target) {
        int n = nums.size();
        int start = lowerBound(nums, n, target);
        if (start == n or nums[start] != target) {
            return {-1, -1};
        }
        // Find the first element greater than target; the index just before it is the end position of target.
        int end = lowerBound(nums, n, target + 1) - 1;
        return {start, end};
    }
};

class Solution:
    def searchRange(self, nums: List[int], target: int) -> List[int]:
        def lower_bound(nums, n, target):
            l = 0
            r = n - 1
            while l <= r:
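                m = (l + r) // 2
                if nums[m] >= target:
                    r = m - 1  # halve the range: the answer is at m or to its left
                else:
                    l = m + 1  # halve the range: the answer is strictly right of m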
            return l
        
        n = len(nums)
        start = lower_bound(nums, n, target)
        if start == n or nums[start] != target:
            return [-1, -1]
        end = lower_bound(nums, n, target + 1) - 1
        return [start, end]

1283. Find the Smallest Divisor Given a Threshold

Given an array of integers nums and an integer threshold, we will choose a positive integer divisor, divide all the array by it, and sum the division's result. Find the smallest divisor such that the result mentioned above is less than or equal to threshold.

Each result of the division is rounded to the nearest integer greater than or equal to that element. (For example: 7/3 = 3 and 10/2 = 5).

The test cases are generated so that there will be an answer.

Example 1:

Input: nums = [1,2,5,9], threshold = 6
Output: 5
Explanation: We can get a sum to 17 (1+2+5+9) if the divisor is 1. 
If the divisor is 4 we can get a sum of 7 (1+1+2+3) and if the divisor is 5 the sum will be 5 (1+1+1+2).

Example 2:

Input: nums = [44,22,33,11,1], threshold = 5
Output: 44

Constraints:

1 <= nums.length <= 5 * 10^4
1 <= nums[i] <= 10^6
nums.length <= threshold <= 10^6

# Note: You don't need to sort the input array here because the answer is not required to be an element of the array; we are searching within the range from 1 to max(nums).
class Solution:
    def smallestDivisor(self, nums: List[int], threshold: int) -> int:
        l, r = 1, max(nums)
        while l <= r:
            m = (l + r) // 2
            if sum((x + m - 1) // m for x in nums) <= threshold:
                r = m - 1
            else:
                l = m + 1
        return l

1) When to Shrink `l` & `r`?

The direction of your search depends on the condition:

sum(mid) <= threshold: mid is feasible, but a smaller answer might exist → Search Left → r = mid - 1
sum(mid) > threshold: mid is too small, you must increase it to satisfy the condition → Search Right → l = mid + 1

2) Why is `l` the answer after the loop?

The loop terminates when l > r (specifically, l = r + 1).

r stops at the last position that fails the condition (the last False).
l stops exactly one position to the right of r (the first True).

Logic: Because we only move l right when mid is invalid, and move r left when mid is valid, the search ends with r at the last invalid value and l at the first valid value. Therefore, l is the smallest feasible answer.
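
To make the pointer movement concrete, the sketch below records every step of the search on Example 1 (nums = [1,2,5,9], threshold = 6). The function name `trace_smallest_divisor` and the shape of the recorded tuples are illustrative choices:

```python
from typing import List, Tuple

def trace_smallest_divisor(nums: List[int], threshold: int):
    l, r = 1, max(nums)
    steps: List[Tuple[int, int, int, bool]] = []
    while l <= r:
        m = (l + r) // 2
        feasible = sum((x + m - 1) // m for x in nums) <= threshold
        steps.append((l, r, m, feasible))  # state before the move
        if feasible:
            r = m - 1
        else:
            l = m + 1
    return l, r, steps

l, r, steps = trace_smallest_divisor([1, 2, 5, 9], 6)
for step in steps:
    print(step)
# (1, 9, 5, True)
# (1, 4, 2, False)
# (3, 4, 3, False)
# (4, 4, 4, False)
print(l, r)  # 5 4
```

The loop stops with r = 4 (the last divisor that fails) and l = 5 (the first divisor that works), which matches the expected output of 5.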

Find the least

2187. Minimum Time to Complete Trips

You are given an array time where time[i] denotes the time taken by the ith bus to complete one trip.

Each bus can make multiple trips successively; that is, the next trip can start immediately after completing the current trip. Also, each bus operates independently; that is, the trips of one bus do not influence the trips of any other bus.

You are also given an integer totalTrips, which denotes the number of trips all buses should make in total. Return the minimum time required for all buses to complete at least totalTrips trips.

Example 1:

Input: time = [1,2,3], totalTrips = 5
Output: 3
Explanation:
- At time t = 1, the number of trips completed by each bus are [1,0,0]. 
  The total number of trips completed is 1 + 0 + 0 = 1.
- At time t = 2, the number of trips completed by each bus are [2,1,0]. 
  The total number of trips completed is 2 + 1 + 0 = 3.
- At time t = 3, the number of trips completed by each bus are [3,1,1]. 
  The total number of trips completed is 3 + 1 + 1 = 5.
So the minimum time needed for all buses to complete at least 5 trips is 3.

Example 2:

Input: time = [2], totalTrips = 1
Output: 2
Explanation:
There is only one bus, and it will complete its first trip at t = 2.
So the minimum time needed to complete 1 trip is 2.

Constraints:

1 <= time.length <= 10^5
1 <= time[i], totalTrips <= 10^7

# Ensure that the right boundary is large enough to cover totalTrips. The most reliable way is min(time) * totalTrips.
class Solution:
    def minimumTime(self, time: List[int], totalTrips: int) -> int:
        min_t = min(time)
        l = min_t
        r = min_t * totalTrips
        while l <= r:
            m = (l + r) // 2
            if sum(m // x for x in time) >= totalTrips:
                r = m - 1
            else:
                l = m + 1
        # After the loop, 'l' is the smallest time that satisfies the condition
        return l

1011. Capacity To Ship Packages Within D Days

A conveyor belt has packages that must be shipped from one port to another within days days.

The ith package on the conveyor belt has a weight of weights[i]. Each day, we load the ship with packages on the conveyor belt (in the order given by weights). We may not load more weight than the maximum weight capacity of the ship.

Return the least weight capacity of the ship that will result in all the packages on the conveyor belt being shipped within days days.

Example 1:

Input: weights = [1,2,3,4,5,6,7,8,9,10], days = 5
Output: 15
Explanation: A ship capacity of 15 is the minimum to ship all the packages in 5 days like this:
1st day: 1, 2, 3, 4, 5
2nd day: 6, 7
3rd day: 8
4th day: 9
5th day: 10

Note that the cargo must be shipped in the order given, so using a ship of capacity 14 and splitting the packages into parts like (2, 3, 4, 5), (1, 6, 7), (8), (9), (10) is not allowed.

Example 2:

Input: weights = [3,2,2,4,1,4], days = 3
Output: 6
Explanation: A ship capacity of 6 is the minimum to ship all the packages in 3 days like this:
1st day: 3, 2
2nd day: 2, 4
3rd day: 1, 4

Example 3:

Input: weights = [1,2,3,1,1], days = 4
Output: 3
Explanation:
1st day: 1
2nd day: 2
3rd day: 3
4th day: 1, 1

Constraints:

1 <= days <= weights.length <= 5 * 10^4
1 <= weights[i] <= 500

class Solution:
    def shipWithinDays(self, weights: List[int], days: int) -> int:
        # l: must be at least the heaviest package
        # r: the sum of all packages (shipping everything in 1 day)
        l, r = max(weights), sum(weights)
        
        while l <= r:
            mid = (l + r) // 2
            
            # Greedy check: how many days are needed with capacity 'mid'?
            need = 1
            cur = 0
            for w in weights:
                if cur + w <= mid:
                    cur += w
                else:
                    need += 1
                    cur = w # Start new day with the current package
            
            if need <= days:
                # Valid capacity, try to find a smaller one
                r = mid - 1
            else:
                # Capacity too small, need a larger capacity
                l = mid + 1
        return l

475. Heaters

Winter is coming! During the contest, your first job is to design a standard heater with a fixed warm radius to warm all the houses.

Every house can be warmed, as long as the house is within the heater's warm radius range.

Given the positions of houses and heaters on a horizontal line, return the minimum radius standard of heaters so that those heaters could cover all houses.

Notice that all the heaters follow your radius standard, and the warm radius will be the same.

Example 1:

Input: houses = [1,2,3], heaters = [2]
Output: 1
Explanation: The only heater was placed in the position 2, and if we use the radius 1 standard, then all the houses can be warmed.

Example 2:

Input: houses = [1,2,3,4], heaters = [1,4]
Output: 1
Explanation: The two heaters were placed at positions 1 and 4. We need to use a radius 1 standard, then all the houses can be warmed.

Example 3:

Input: houses = [1,5], heaters = [2]
Output: 3

Constraints:

1 <= houses.length, heaters.length <= 3 * 10^4
1 <= houses[i], heaters[i] <= 10^9

import bisect

# Strategy: For each house, find the nearest heaters on both the left and right sides.
# The minimum radius required for a house is the distance to its closest heater.
# The global answer is the maximum of these minimum distances.
class Solution:
    def findRadius(self, houses: List[int], heaters: List[int]) -> int:
        ans = 0
        houses.sort()
        heaters.sort()
        n = len(heaters)
        
        for x in houses:
            # Binary Search for the first heater >= house (Lower Bound)
            i = bisect.bisect_left(heaters, x)
            
            # Distance to the nearest heater on the left (largest heater <= x)
            # If i == 0, no heater exists on the left
            ld = x - heaters[i - 1] if i > 0 else float('inf')
            
            # Distance to the nearest heater on the right (smallest heater >= x)
            # If i == n, no heater exists on the right
            rd = heaters[i] - x if i < n else float('inf')
            
            # The house only needs to be covered by the CLOSER of the two
            # Update global max radius to ensure this house (and all others) are covered
            ans = max(ans, min(ld, rd))
            
        return ans

875. Koko Eating Bananas

Koko loves to eat bananas. There are n piles of bananas, the ith pile has piles[i] bananas. The guards have gone and will come back in h hours.

Koko can decide her bananas-per-hour eating speed of k. Each hour, she chooses some pile of bananas and eats k bananas from that pile. If the pile has less than k bananas, she eats all of them instead and will not eat any more bananas during this hour.

Koko likes to eat slowly but still wants to finish eating all the bananas before the guards return.

Return the minimum integer k such that she can eat all the bananas within h hours.

Example 1:

Input: piles = [3,6,7,11], h = 8
Output: 4

Example 2:

Input: piles = [30,11,23,4,20], h = 5
Output: 30

Example 3:

Input: piles = [30,11,23,4,20], h = 6
Output: 23

Constraints:

1 <= piles.length <= 10^4
piles.length <= h <= 10^9
1 <= piles[i] <= 10^9

class Solution:
    def minEatingSpeed(self, piles: List[int], h: int) -> int:
        # l: Smallest possible speed (1 banana/hr)
        # r: A "guaranteed feasible" upper bound: at speed max(piles) every pile
        #    takes one hour, and the constraints guarantee h >= len(piles)
        l = 1
        r = max(piles)
        
        while l <= r:
            m = (l + r) // 2
            
            # check(m): Calculate total hours needed at speed 'm'
            # (x + m - 1) // m is the integer version of math.ceil(x / m)
            hours_needed = sum((x + m - 1) // m for x in piles)
            
            if hours_needed <= h:
                # current speed 'm' is feasible (True), try a smaller speed
                r = m - 1
            else:
                # current speed 'm' is too slow (False), must increase speed
                l = m + 1
                
        # After the loop, l is the first speed that makes check(m) True
        return l

Find the most

275. H-Index II

Given an array of integers citations where citations[i] is the number of citations a researcher received for their ith paper and citations is sorted in non-descending order, return the researcher's h-index.

According to the definition of h-index on Wikipedia: The h-index is defined as the maximum value of h such that the given researcher has published at least h papers that have each been cited at least h times.

You must write an algorithm that runs in logarithmic time.

Example 1:

Input: citations = [0,1,3,5,6]
Output: 3
Explanation: [0,1,3,5,6] means the researcher has 5 papers in total and each of them had received 0, 1, 3, 5, 6 citations respectively.
Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, their h-index is 3.

Example 2:

Input: citations = [1,2,100]
Output: 2

Constraints:

n == citations.length
1 <= n <= 10^5
0 <= citations[i] <= 1000
citations is sorted in ascending order.

class Solution:
    def hIndex(self, citations: List[int]) -> int:
        n = len(citations)
        # Search Range: 0 to n (Max possible H-index is the number of papers)
        l, r = 0, n

        # Pattern: T T T...F F F (Looking for the LAST True)
        while l <= r:
            h = (l + r) // 2
            
            # check(h): Are there at least 'h' papers with >= 'h' citations?
            # Since sorted, citations[n-h] is the h-th largest value.
            if h == 0 or citations[n - h] >= h:
                # This 'h' works! Try a larger value to the right.
                l = h + 1      
            else:
                # Too many papers requested or citations too low. Search left.
                r = h - 1      

        # Per the "Last True" template, 'r' is the answer after l > r
        return r

1) The Strategy: "Find the Largest Valid H"

The H-Index definition states: "A scientist has index $h$ if $h$ of their $n$ papers have at least $h$ citations." Since the citations array is sorted, the papers with the most citations are at the end of the array.

The Condition: If we pick a value h, the paper at index n - h is the "weakest" paper in our set of $h$ papers. If citations[n - h] >= h, then all $h$ papers have at least $h$ citations.
Monotonicity: If a researcher satisfies the condition for $h=5$, they definitely satisfy it for every smaller value such as $h=4$, and they may or may not satisfy it for $h=6$. If they fail for $h=5$, they will definitely fail for $h=6$.
Pattern: [T, T, T, T, F, F] — We want the Last True.
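
The pattern can be reproduced directly by evaluating the condition for every candidate h on Example 1. This is a small sketch; `h_feasible` is an illustrative name for the check used in the solution:

```python
from typing import List

def h_feasible(citations: List[int], h: int) -> bool:
    # h == 0 is always valid; otherwise the weakest of the top h papers
    # sits at index n - h and must have at least h citations.
    n = len(citations)
    return h == 0 or citations[n - h] >= h

citations = [0, 1, 3, 5, 6]
pattern = [h_feasible(citations, h) for h in range(len(citations) + 1)]
print(pattern)  # [True, True, True, True, False, False]
```

The last True sits at h = 3, which is the expected h-index for this input.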

[LeetCode] 644. Maximum Average Subarray II

Given an array consisting of n integers, find the contiguous subarray whose length is greater than or equal to k that has the maximum average value. And you need to output the maximum average value.

Example 1:

Input: [1,12,-5,-6,50,3], k = 4
Output: 12.75
Explanation:
when length is 5, maximum average value is 10.8,
when length is 6, maximum average value is 9.16667.
Thus return 12.75.

Note:

1 <= k <= n <= 10,000.
Elements of the given array will be in range [-10,000, 10,000].
The answer with the calculation error less than 10^-5 will be accepted.

class Solution:
    def findMaxAverage(self, nums: List[int], k: int) -> float:
        n = len(nums)

        def check(mid: float) -> bool:
            # Transform: sum(nums[i] - mid) >= 0
            pre = 0.0      # sum(b[0..i])
            pre_k = 0.0    # sum(b[0..i-k])
            min_pre = 0.0  # min(pre[0...i-k])

            # Initial window of size k
            for i in range(k):
                pre += nums[i] - mid
            if pre >= 0: return True

            # Sliding window with variable start
            for i in range(k, n):
                pre += nums[i] - mid
                pre_k += nums[i - k] - mid
                # Greedy: keep track of the smallest prefix sum seen so far
                # that allows for a subarray length >= k
                min_pre = min(min_pre, pre_k)
                
                if pre - min_pre >= 0:
                    return True
            return False

        # Range: Between the smallest and largest possible numbers
        l, r = min(nums), max(nums)
        eps = 1e-5 # Precision threshold

        # Binary search for the maximum feasible average
        while r - l > eps:
            mid = (l + r) / 2
            if check(mid):
                l = mid  # mid is feasible, try to increase it
            else:
                r = mid  # mid is too high, decrease it
        
        return l

1) How the `check` Function Works: Prefix Sums & Greedy Strategy

The check function validates the condition in $O(n)$ time using a combination of prefix sums and a greedy sliding window:

Pre-processing: Subtract mid from every element in the array ($nums[i] - mid$). This transforms the problem from finding an average to finding a subarray sum $\ge 0$.
Sliding Window:
- Maintain the current prefix sum pre (representing the sum from $0$ to $i$).
- Maintain min_pre (representing the minimum prefix sum encountered between index $0$ and $i-k$).
Greedy Validation:
- We are looking for a pair $(i, j)$ such that pre[i] - pre[j] $\ge 0$ (where $i - j \ge k$).
- To maximize this difference, we simply subtract the smallest prefix sum (min_pre) that occurred at least $k$ positions ago.
- If pre - min_pre $\ge 0$, we have successfully found a subarray of length $\ge k$ that satisfies the condition.
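
The same idea as a standalone function, so it can be tried on Example 1 in isolation. This is a sketch with hypothetical names (`has_average_at_least`, `lagged`, `min_lagged`); it folds the initial window and the sliding part into one loop but follows the prefix-sum reasoning above:

```python
from typing import List

def has_average_at_least(nums: List[int], k: int, mid: float) -> bool:
    prefix = 0.0      # prefix sum of (nums[t] - mid) for t in 0..i
    lagged = 0.0      # the same prefix sum, but only up to index i - k
    min_lagged = 0.0  # smallest lagged prefix seen so far (empty prefix = 0)
    for i, x in enumerate(nums):
        prefix += x - mid
        if i >= k:
            lagged += nums[i - k] - mid
            min_lagged = min(min_lagged, lagged)
        # Any subarray ending at i with length >= k has transformed sum
        # prefix - (some lagged prefix); the best choice is the minimum.
        if i >= k - 1 and prefix - min_lagged >= 0:
            return True
    return False

nums = [1, 12, -5, -6, 50, 3]
print(has_average_at_least(nums, 4, 12.75))  # True: [12, -5, -6, 50] averages 12.75
print(has_average_at_least(nums, 4, 13.0))   # False: no window of length >= 4 reaches 13
```

The binary search then only has to move `l` up when this returns True and `r` down when it returns False.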

2226. Maximum Candies Allocated to K Children

You are given a 0-indexed integer array candies. Each element in the array denotes a pile of candies of size candies[i]. You can divide each pile into any number of sub piles, but you cannot merge two piles together.

You are also given an integer k. You should allocate piles of candies to k children such that each child gets the same number of candies. Each child can be allocated candies from only one pile of candies and some piles of candies may go unused.

Return the maximum number of candies each child can get.

Example 1:

Input: candies = [5,8,6], k = 3
Output: 5
Explanation: We can divide candies[1] into 2 piles of size 5 and 3, and candies[2] into 2 piles of size 5 and 1. We now have five piles of candies of sizes 5, 5, 3, 5, and 1. We can allocate the 3 piles of size 5 to 3 children. It can be proven that each child cannot receive more than 5 candies.

Example 2:

Input: candies = [2,5], k = 11
Output: 0
Explanation: There are 11 children but only 7 candies in total, so it is impossible to ensure each child receives at least one candy. Thus, each child gets no candy and the answer is 0.

Constraints:

1 <= candies.length <= 10^5
1 <= candies[i] <= 10^7
1 <= k <= 10^12

# In binary search for the maximum value (TTTTFFFF), we move the left boundary (l = m + 1) when the condition is met to seek a larger valid answer, ultimately returning r as the last "True" position.
class Solution:
    def maximumCandies(self, candies: List[int], k: int) -> int:
        def ok(m):
            cnt = 0
            for x in candies:
                cnt += x // m
                if cnt >= k:
                    return True
            return False
        l = 1
        r = max(candies)
        while l <= r:
            m = (l + r) // 2
            if ok(m):
                l = m + 1
            else:
                r = m - 1
        return r

Binary Search on an Indirect Value

215. Kth Largest Element in an Array

Given an integer array nums and an integer k, return the kth largest element in the array.

Note that it is the kth largest element in the sorted order, not the kth distinct element.

Can you solve it without sorting?

Example 1:

Input: nums = [3,2,1,5,6,4], k = 2
Output: 5

Example 2:

Input: nums = [3,2,3,1,2,4,5,5,6], k = 4
Output: 4

Constraints:

1 <= k <= nums.length <= 10^5
-10^4 <= nums[i] <= 10^4

class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        l = min(nums)
        r = max(nums)
        while l <= r:
            m = (l + r) // 2
            cnt = sum(x >= m for x in nums)
            if cnt >= k:
                l = m + 1
            else:
                r = m - 1
        return r

// std::numeric_limits<int>::min()
// std::numeric_limits<int>::max()
class Solution {
public:
    int findKthLargest(vector<int>& nums, int k) {
        auto [mn, mx] = minmax_element(nums.begin(), nums.end());
        int l = *mn;
        int r = *mx;
        auto checkKLargest = [&](int m) {
            int cnt = 0;
            for (int x : nums) {
                if (x >= m) cnt++;
            }
            return cnt >= k; 
        };
        while (l <= r) {
            int m = l + (r - l) / 2;
            if (checkKLargest(m)) {
                l = m + 1;
            } else {
                r = m - 1;
            }
        }
        return r;
    }
};

[LeetCode] Search in a Sorted Array of Unknown Size

Given an integer array sorted in ascending order, write a function to search target in nums. If target exists, then return its index, otherwise return -1. However, the array size is unknown to you. You may only access the array using an ArrayReader interface, where ArrayReader.get(k) returns the element of the array at index k (0-indexed).

You may assume all integers in the array are less than 10000, and if you access the array out of bounds, ArrayReader.get will return 2147483647.

Note:

You may assume that all elements in the array are unique.
The value of each element in the array will be in the range [-9999, 9999].

# Exponential expansion is a method used to find a valid search boundary when the size of a sorted array is unknown or unbounded.
def doubling_search(nums, target):
    l = 0
    r = 1
    while True:
        try:
            if nums[r] < target:
                l = r
                r *= 2
            else:
                break
        except IndexError:
            break

    while l <= r:
        m = (l + r) // 2
        try:
            val = nums[m]
        except IndexError:
            r = m - 1
            continue

        if val > target:
            r = m - 1
        elif val == target:
            return m
        else:
            l = m + 1
    return -1
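
The function above relies on Python's IndexError, while the problem statement describes a reader that returns 2147483647 out of bounds. A sketch of the same doubling idea against that interface follows; the `ArrayReader` class here is a hypothetical stand-in built from a list (the real judge supplies its own), and the sample array is made up for illustration:

```python
from typing import List

class ArrayReader:
    """Hypothetical stand-in for the judge's reader, backed by a list."""
    OUT_OF_BOUNDS = 2147483647

    def __init__(self, data: List[int]):
        self._data = data

    def get(self, k: int) -> int:
        return self._data[k] if 0 <= k < len(self._data) else self.OUT_OF_BOUNDS

def search(reader: ArrayReader, target: int) -> int:
    # Phase 1: double r until reader.get(r) >= target (or r is out of bounds,
    # where the sentinel is larger than any valid target).
    l, r = 0, 1
    while reader.get(r) < target:
        l = r + 1
        r *= 2
    # Phase 2: ordinary closed-interval binary search on [l, r].
    while l <= r:
        m = l + (r - l) // 2
        val = reader.get(m)
        if val == target:
            return m
        if val > target:   # also true for out-of-bounds reads
            r = m - 1
        else:
            l = m + 1
    return -1

reader = ArrayReader([-1, 0, 3, 5, 9, 12])
print(search(reader, 9))  # 4
print(search(reader, 2))  # -1
```

Since the sentinel compares greater than every valid value, out-of-bounds positions behave like a run of very large elements and need no special case.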

The Kth Smallest/Biggest

668. Kth Smallest Number in Multiplication Table

Nearly everyone has used the Multiplication Table. The multiplication table of size m x n is an integer matrix mat where mat[i][j] == i * j (1-indexed).

Given three integers m, n, and k, return the kth smallest element in the m x n multiplication table.

Example 1:

Input: m = 3, n = 3, k = 5
Output: 3
Explanation: The 5th smallest number is 3.

Example 2:

Input: m = 2, n = 3, k = 6
Output: 6
Explanation: The 6th smallest number is 6.

Constraints:

1 <= m, n <= 3 * 10^4
1 <= k <= m * n

class Solution:
    def findKthNumber(self, m: int, n: int, k: int) -> int:
        def count(x):
            cnt = 0
            for i in range(1, m + 1):
                cnt += min(x // i, n)
            return cnt

        l = 0
        r = m * n
        while l <= r:
            mid = (l + r) // 2
            if count(mid) >= k:
                r = mid - 1
            else:
                l = mid + 1
        return l

class Solution {
public:
    int findKthNumber(int m, int n, int k) {
        int l = 0;
        int r = m * n;
        
        // Capture external variables by reference using &.
        auto count = [&](int x) {
            int cnt = 0;
            for (int i = 1; i <= m; i++) {
                cnt += min(x / i, n);
            }
            return cnt;
        }; // <--- IMPORTANT: Do not forget the semicolon after the lambda definition!

        while (l <= r) {
            int mid = l + (r - l) / 2;
            if (count(mid) >= k) {
                r = mid - 1;
            } else {
                l = mid + 1;
            }
        }
        return l;
    }
};

Algorithm Study Plan

Tue, 20 Jan 2026 00:00:00 GMT

Plan

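Practice problems by topic rather than at random. Within one topic, a single pattern solves many problems, so practice is efficient. It also lets you observe and think about the same algorithm from different angles, which leads to a deeper understanding of its essence. Spiral learning: first finish the problems with a difficulty rating ≤ 1700. Work through the basic problems of every problem list and every topic once, then move on to harder ones. Low-rated problems usually test a single concept, while high-rated problems test several concepts at once.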

https://leetcode.cn/discuss/post/3141566/ru-he-ke-xue-shua-ti-by-endlesscheng-q3yd/

Degree of Completion

Sliding Window: 2026.1.19 - 2026.1.23

At most k distinct elements (use len(cnt) to check whether the window is valid)
At most k occurrences of each element (use cnt[s[r]] > k to detect an invalid window; if adding s[r] makes the window invalid, shrink it from the left until it is valid again)
Exactly K Distinct Elements = At most(K) - At most(K-1)
Some variant questions:
- Windows that must contain exactly 5 vowels and no non-vowel characters: first divide the string at the non-vowel characters (divide and conquer), then apply the at-most technique to each piece
- exactly
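
The "exactly K = at most K minus at most K-1" identity can be sketched as follows, counting subarrays by their number of distinct values. The function names are illustrative, and the test array is just a small example:

```python
from collections import defaultdict
from typing import List

def at_most_k_distinct(nums: List[int], k: int) -> int:
    """Number of subarrays with at most k distinct values."""
    cnt = defaultdict(int)  # frequency of each value in the window
    left = 0
    total = 0
    for right, x in enumerate(nums):
        cnt[x] += 1
        while len(cnt) > k:          # window invalid, shrink from the left
            out = nums[left]
            cnt[out] -= 1
            if cnt[out] == 0:
                del cnt[out]
            left += 1
        total += right - left + 1    # every start in [left, right] is valid
    return total

def exactly_k_distinct(nums: List[int], k: int) -> int:
    return at_most_k_distinct(nums, k) - at_most_k_distinct(nums, k - 1)

print(exactly_k_distinct([1, 2, 1, 2, 3], 2))  # 7
```

The subtraction works because every subarray with at most k distinct values has either exactly k or at most k - 1 of them, and the two cases do not overlap.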

Sliding Window

Tue, 20 Jan 2026 00:00:00 GMT

Sliding Window is a way to look at a small part of data and move it forward one step at a time, instead of starting over each time.

Fixed-length sliding window

1456. Maximum Number of Vowels in a Substring of Given Length

Given a string s and an integer k, return the maximum number of vowel letters in any substring of s with length k.

Vowel letters in English are 'a', 'e', 'i', 'o', and 'u'.

Example 1:

Input: s = "abciiidef", k = 3
Output: 3
Explanation: The substring "iii" contains 3 vowel letters.

Example 2:

Input: s = "aeiou", k = 2
Output: 2
Explanation: Any substring of length 2 contains 2 vowels.

Example 3:

Input: s = "leetcode", k = 3
Output: 2
Explanation: "lee", "eet" and "ode" contain 2 vowels.

Constraints:

1 <= s.length <= 10^5
s consists of lowercase English letters.
1 <= k <= s.length

class Solution {
public:
    bool isVowel(char c) {
        return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
    }
    
    int maxVowels(string s, int k) {
        // const std::set<char> vowelSet{'a', 'e', 'i', 'o', 'u'};
        int n = s.size();
        int vowelCnt = 0;
        int ans = 0;
        for (int i = 0; i < n; i++) {
            if (isVowel(s[i])) vowelCnt++;
            if (i < k - 1) continue;
            ans = std::max(ans, vowelCnt);
            if (isVowel(s[i - k + 1])) vowelCnt--;
        }
        return ans;
    }
    
};

class Solution:
    def maxVowels(self, s: str, k: int) -> int:
        ans = 0
        vowel_cnt = 0

        for i, c in enumerate(s):
            if c in "aeiou":
                vowel_cnt += 1
            if i < k - 1:
                continue
            ans = max(ans, vowel_cnt)
            if s[i - k + 1] in "aeiou":
                vowel_cnt -= 1
        return ans

643. Maximum Average Subarray I

You are given an integer array nums consisting of n elements, and an integer k.

Find a contiguous subarray whose length is equal to k that has the maximum average value and return this value. Any answer with a calculation error less than 10^-5 will be accepted.

Example 1:

Input: nums = [1,12,-5,-6,50,3], k = 4
Output: 12.75000
Explanation: Maximum average is (12 - 5 - 6 + 50) / 4 = 51 / 4 = 12.75

Example 2:

Input: nums = [5], k = 1
Output: 5.00000

Constraints:

n == nums.length
1 <= k <= n <= 10^5
-10^4 <= nums[i] <= 10^4

class Solution {
public:
    double findMaxAverage(vector<int>& nums, int k) {
        double ans = -1e18;
        double sum = 0;
        int n = nums.size();
        for (int i = 0; i < n; i++) {
            sum += nums[i];
            if (i < k - 1) continue;
            ans = std::max(ans, sum / k);
            sum -= nums[i - k + 1];
        }
        return ans;
    }
};

class Solution:
    def findMaxAverage(self, nums: List[int], k: int) -> float:
        ans = float('-inf')
        window_sum = 0  # avoid shadowing the built-in sum()
        for i in range(len(nums)):
            window_sum += nums[i]
            if i < k - 1:
                continue
            ans = max(ans, window_sum / k)
            window_sum -= nums[i - k + 1]
        return ans

2461. Maximum Sum of Distinct Subarrays With Length K

You are given an integer array nums and an integer k. Find the maximum subarray sum of all the subarrays of nums that meet the following conditions:

The length of the subarray is k, and
All the elements of the subarray are distinct.

Return the maximum subarray sum of all the subarrays that meet the conditions. If no subarray meets the conditions, return 0.

A subarray is a contiguous non-empty sequence of elements within an array.

Example 1:

Input: nums = [1,5,4,2,9,9,9], k = 3
Output: 15
Explanation: The subarrays of nums with length 3 are:
- [1,5,4] which meets the requirements and has a sum of 10.
- [5,4,2] which meets the requirements and has a sum of 11.
- [4,2,9] which meets the requirements and has a sum of 15.
- [2,9,9] which does not meet the requirements because the element 9 is repeated.
- [9,9,9] which does not meet the requirements because the element 9 is repeated.
We return 15 because it is the maximum subarray sum of all the subarrays that meet the conditions

Example 2:

Input: nums = [4,4,4], k = 3
Output: 0
Explanation: The subarrays of nums with length 3 are:
- [4,4,4] which does not meet the requirements because the element 4 is repeated.
We return 0 because no subarrays meet the conditions.

Constraints:

1 <= k <= nums.length <= 10^5
1 <= nums[i] <= 10^5

class Solution {
public:
    long long maximumSubarraySum(vector<int>& nums, int k) {
        long long ans = 0;
        long long sum = 0;
        std::unordered_map<int, int> map; // stores the frequency of elements in the window
        int left = 0;
        int n = nums.size();
        for (int right = 0; right < n; right++) {
            int x = nums[right];
            map[x]++;
            sum += x;
            left = right - k + 1;
            if (left < 0) continue;
            if (map.size() == k) ans = std::max(ans, sum); // k distinct keys in a window of length k means no duplicates
            int o = nums[left];
            sum -= o;
            map[o]--;
            if (map[o] == 0) map.erase(o);
        }
        return ans;
    }
};

class Solution:
    def maximumSubarraySum(self, nums: List[int], k: int) -> int:
        ans, sum, left = 0, 0, 0
        d = defaultdict(int)

        for right, x in enumerate(nums):
            d[x] += 1
            sum += x
            left = right - k + 1
            if left < 0:
                continue
            if len(d) == k: # d stores the frequency of elements in the window
                ans = max(ans, sum)
            
            out = nums[left]
            sum -= out
            d[out] -= 1
            if d[out] == 0:
                del d[out] 
        return ans

30. Substring with Concatenation of All Words

You are given a string s and an array of strings words. All the strings of words are of the same length.

A concatenated string is a string that exactly contains all the strings of any permutation of words concatenated.

For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated strings. "acdbef" is not a concatenated string because it is not the concatenation of any permutation of words.

Return an array of the starting indices of all the concatenated substrings in s. You can return the answer in any order.

Example 1:

Input: s = "barfoothefoobarman", words = ["foo","bar"]

Output: [0,9]

Explanation:

The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words. The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]

Output: []

Explanation:

There is no concatenated substring.

Example 3:

Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]

Output: [6,9,12]

Explanation:

The substring starting at 6 is "foobarthe". It is the concatenation of ["foo","bar","the"]. The substring starting at 9 is "barthefoo". It is the concatenation of ["bar","the","foo"]. The substring starting at 12 is "thefoobar". It is the concatenation of ["the","foo","bar"].

Constraints:

1 <= s.length <= 10^4
1 <= words.length <= 5000
1 <= words[i].length <= 30
s and words[i] consist of lowercase English letters.

// Right endpoint excluded → half-open [l, r) → length is r - l (no +1). Right endpoint included → closed [l, r] → length is r - l + 1.
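
A quick check of the note above, as a minimal sketch: a half-open window [l, r) holds r - l elements, while a closed window [l, r] holds r - l + 1. The helper names `half_open_len` and `closed_len` are made up for illustration.

```python
def half_open_len(l: int, r: int) -> int:
    # [l, r): r itself is not part of the window, so no +1
    return r - l


def closed_len(l: int, r: int) -> int:
    # [l, r]: r is part of the window, so add 1
    return r - l + 1


s = "barfoothefoobarman"
l, r = 9, 15
# Python slices are half-open, so s[l:r] matches [l, r)
assert len(s[l:r]) == half_open_len(l, r) == 6
# Including index r gives the closed window [l, r]
assert len(s[l:r + 1]) == closed_len(l, r) == 7
```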

class Solution {
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        vector<int> ans;
        int n = s.size();
        int m = words.size();
        int wordLen = words[0].size();
        int winLen = m * wordLen;

        map<string, int> map1;
        for (const string& word : words) {
            map1[word]++;
        }

      
        for (int i = 0; i < wordLen; i++) {
            map<string, int> map2;

            for (int j = i; j + wordLen <= n; j += wordLen) {
                // 1. Enter the window
                string w = s.substr(j, wordLen);
                map2[w]++;

                // 2. The window is not full yet
                if (j + wordLen - i < winLen) continue;

                // 3. Check whether the window matches
                if (map1 == map2)
                    ans.push_back(j + wordLen - winLen);

                // 4. Leave the window
                string out = s.substr(j + wordLen - winLen, wordLen);
                map2[out]--;
                if (map2[out] == 0) map2.erase(out);
            }
        }

        return ans;
    }
};

class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        ans = []
        n = len(s)
        word_len = len(words[0])
        win_len = len(words) * word_len
        d1 = defaultdict(int)

        for w in words:
            d1[w] += 1

        for i in range(word_len):
            d2 = defaultdict(int)

            for j in range(i + word_len, n + 1, word_len):
                w = s[j-word_len:j]
                d2[w] += 1
                
                if j - i < win_len:
                    continue

                if d1 == d2:
                    ans.append(j - win_len)

                out = s[j - win_len : j - win_len + word_len]
                d2[out] -= 1
                if d2[out] == 0:
                    del d2[out]
        return ans

1004. Max Consecutive Ones III

Given a binary array nums and an integer k, return the maximum number of consecutive 1's in the array if you can flip at most k 0's.

Example 1:

Input: nums = [1,1,1,0,0,0,1,1,1,1,0], k = 2
Output: 6
Explanation: [1,1,1,0,0,1,1,1,1,1,1]
Bolded numbers were flipped from 0 to 1. The longest subarray is underlined.

Example 2:

Input: nums = [0,0,1,1,0,0,1,1,1,0,1,1,0,0,0,1,1,1,1], k = 3
Output: 10
Explanation: [0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1]
Bolded numbers were flipped from 0 to 1. The longest subarray is underlined.

Constraints:

1 <= nums.length <= 10^5
nums[i] is either 0 or 1.
0 <= k <= nums.length

class Solution:
    def longestOnes(self, nums: List[int], k: int) -> int:
        ans = 0
        l = 0
        cnt = 0
        for r, x in enumerate(nums):
            cnt += int(not x)
            while cnt > k:
                cnt -= int(not nums[l])
                l += 1
            ans = max(ans, r - l + 1)
        return ans

class Solution {
public:
    int longestOnes(vector<int>& nums, int k) {
        // Flipping at most k zeros means the window may contain at most k zeros (0, 1, ..., k).
        int ans = 0;
        int cnt = 0;
        int l = 0;
        int n = nums.size();
        for (int r = 0; r < n; r++) {
            // count the zeros inside the window
            cnt += !nums[r];
            while (cnt > k) {
                cnt -= !nums[l];
                l++;
            }
            ans = max(ans, r - l + 1);
        }
        return ans;
    }
};

Variable-length sliding window

Variable-length sliding windows are mainly divided into three categories: finding the longest subarray, finding the shortest subarray, and finding the number of subarrays.

A sliding window is equivalent to maintaining a queue. Moving the right pointer can be seen as enqueuing, and moving the left pointer can be seen as dequeuing.
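
The queue view can be made literal with `collections.deque`: `append` is the enqueue and `popleft` is the dequeue. The sketch below applies it to the at-most-k-zeros window from 1004. The name `longest_ones_queue` is an assumption for illustration, and a real solution would keep two indices instead of storing the elements.

```python
from collections import deque
from typing import List


def longest_ones_queue(nums: List[int], k: int) -> int:
    window = deque()  # elements currently inside the window
    zeros = 0         # number of zeros inside the window
    best = 0
    for x in nums:
        window.append(x)            # right pointer moves: enqueue
        zeros += (x == 0)
        while zeros > k:
            out = window.popleft()  # left pointer moves: dequeue
            zeros -= (out == 0)
        best = max(best, len(window))
    return best


print(longest_ones_queue([1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0], 2))  # 6
```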

< or ≤ → direct sliding window
== → at most K − at most (K − 1)
≥ → total − at most (K − 1)
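
The three conversions can be sketched with a single at-most helper. This example counts subarrays by their number of odd elements; `at_most`, `exactly`, and `at_least` are illustrative names, and the `k < 0` guard is an added assumption so that `at_most(nums, k - 1)` is safe when k is 0.

```python
from typing import List


def at_most(nums: List[int], k: int) -> int:
    """Number of subarrays containing at most k odd numbers."""
    if k < 0:
        return 0  # no subarray can contain a negative number of odds
    ans = l = odd = 0
    for r, x in enumerate(nums):
        odd += x & 1
        while odd > k:          # invalid: shrink from the left
            odd -= nums[l] & 1
            l += 1
        ans += r - l + 1        # subarrays ending at r that start in [l, r]
    return ans


def exactly(nums: List[int], k: int) -> int:
    # == K  ->  at most K - at most (K - 1)
    return at_most(nums, k) - at_most(nums, k - 1)


def at_least(nums: List[int], k: int) -> int:
    # >= K  ->  total - at most (K - 1)
    n = len(nums)
    total = n * (n + 1) // 2    # number of non-empty subarrays
    return total - at_most(nums, k - 1)


nums = [1, 1, 2, 1, 1]
print(at_most(nums, 3), exactly(nums, 3), at_least(nums, 3))  # 14 2 3
```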

At Most

:::important

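At Most

≤ K
Best monotonicity
First choice for a sliding window

“At most” has the best monotonicity because expanding and shrinking affect validity in exactly opposite, predictable directions: expanding can only break validity, and shrinking can only restore it.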

at most K distinct
at most K occurrences
sum ≤ K

:::
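
As a sketch of the monotonicity argument, take the third pattern, sum ≤ K. The code assumes every element is non-negative and k ≥ 0; with negative values, expanding could restore validity and the argument would no longer hold. The function name `longest_sum_at_most` is illustrative.

```python
from typing import List


def longest_sum_at_most(nums: List[int], k: int) -> int:
    """Longest subarray with sum <= k; assumes nums[i] >= 0 and k >= 0."""
    ans = l = total = 0
    for r, x in enumerate(nums):
        total += x                 # expanding can only increase the sum
        while total > k:           # shrinking can only decrease it
            total -= nums[l]
            l += 1
        ans = max(ans, r - l + 1)  # the window may be empty (length 0)
    return ans


print(longest_sum_at_most([2, 1, 5, 1, 3, 2], 7))  # 3
```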

1446. Consecutive Characters

The power of the string is the maximum length of a non-empty substring that contains only one unique character.

Given a string s, return the power of s.

Example 1:

Input: s = "leetcode"
Output: 2
Explanation: The substring "ee" is of length 2 with the character 'e' only.

Example 2:

Input: s = "abbcccddddeeeeedcba"
Output: 5
Explanation: The substring "eeeee" is of length 5 with the character 'e' only.

Constraints:

1 <= s.length <= 500
s consists of only lowercase English letters.

class Solution:
    def maxPower(self, s: str) -> int:
        ans = 0
        l = 0
        cnt = defaultdict(int)
        for r, c in enumerate(s):
            cnt[c] += 1
            while len(cnt) > 1:
                cnt[s[l]] -= 1
                # The loop condition checks len(cnt), so a key whose count reaches 0 must be deleted.
                # Otherwise len(cnt) never drops, l keeps advancing, and s[l] eventually goes out of range.
                if cnt[s[l]] == 0:
                    del cnt[s[l]]
                l += 1
            ans = max(ans, r - l + 1)
        return ans

class Solution {
public:
    int maxPower(string s) {
        int ans = 0;
        int l = 0;
        int n = s.size();
        map<char, int> cnt;
        for (int r = 0; r < n; r++) {
            cnt[s[r]]++;
            while (cnt.size() > 1) {
                cnt[s[l]]--;
                if (cnt[s[l]] == 0) {
                    cnt.erase(s[l]);
                }
                l++;
            }
            ans = max(ans, r - l + 1);
        }
        return ans;
    }
};

340. Longest Substring with At Most K Distinct Characters (at most k distinct elements)

Given a string, find the length of the longest substring T that contains at most k distinct characters.

Example 1:

Input: s = "eceba", k = 2
Output: 3
Explanation: T is "ece", which has length 3.

Example 2:

Input: s = "aa", k = 1
Output: 2
Explanation: T is "aa", which has length 2.

class Solution {
public:
    int lengthOfLongestSubstringKDistinct(string s, int k) {
        int ans = 0;
        map<char, int> cnt;
        int l = 0;
        int n = s.size();
        for (int r = 0; r < n; r++) {
            cnt[s[r]]++;
            while (cnt.size() > k) {
                cnt[s[l]]--;
                if (cnt[s[l]] == 0) {
                    cnt.erase(s[l]);
                }
                l++;
            }
            ans = max(ans, r - l + 1);
        }
        return ans;
    }
};

class Solution:
    def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
        ans = 0
        cnt = defaultdict(int)
        l = 0
        for r, c in enumerate(s):
            cnt[c] += 1 # cnt[c] is an integer (the frequency of character c)
            while len(cnt) > k:  # len(cnt) is the number of distinct characters currently in the window
                cnt[s[l]] -= 1
                if cnt[s[l]] == 0:
                    del cnt[s[l]]
                l += 1
            ans = max(ans, r - l + 1)
        return ans

904. Fruit Into Baskets (at most 2 distinct elements)

You are visiting a farm that has a single row of fruit trees arranged from left to right. The trees are represented by an integer array fruits where fruits[i] is the type of fruit the ith tree produces.

You want to collect as much fruit as possible. However, the owner has some strict rules that you must follow:

You only have two baskets, and each basket can only hold a single type of fruit. There is no limit on the amount of fruit each basket can hold.
Starting from any tree of your choice, you must pick exactly one fruit from every tree (including the start tree) while moving to the right. The picked fruits must fit in one of your baskets.
Once you reach a tree with fruit that cannot fit in your baskets, you must stop.

Given the integer array fruits, return the maximum number of fruits you can pick.

Example 1:

Input: fruits = [1,2,1]
Output: 3
Explanation: We can pick from all 3 trees.

Example 2:

Input: fruits = [0,1,2,2]
Output: 3
Explanation: We can pick from trees [1,2,2].
If we had started at the first tree, we would only pick from trees [0,1].

Example 3:

Input: fruits = [1,2,3,2,2]
Output: 4
Explanation: We can pick from trees [2,3,2,2].
If we had started at the first tree, we would only pick from trees [1,2].

Constraints:

1 <= fruits.length <= 10^5
0 <= fruits[i] < fruits.length

class Solution {
public:
    int totalFruit(vector<int>& fruits) {
        int ans = 0;
        map<int, int> cnt; // count each fruit type inside the window
        int l = 0;
        int n = fruits.size();
        for (int r = 0; r < n; r++) {
            cnt[fruits[r]]++; // do increment
            while (cnt.size() > 2) { // more than 2 fruit types: shrink the window from the left
                cnt[fruits[l]]--;
                if (cnt[fruits[l]] == 0) {
                    cnt.erase(fruits[l]);
                }
                l++;
            }
            ans = max(ans, r - l + 1);
        }
        return ans;
    }
};

class Solution:
    def totalFruit(self, fruits: List[int]) -> int:
        ans = 0
        cnt = defaultdict(int)
        l = 0
        for r, x in enumerate(fruits):
            cnt[x] += 1
            while len(cnt) > 2: # reflects number of fruit types
                cnt[fruits[l]] -= 1 
                if cnt[fruits[l]] == 0:
                    del cnt[fruits[l]]
                l += 1
            ans = max(ans, r - l + 1)
        return ans

3. Longest Substring Without Repeating Characters (at most 1 occurrence)

Given a string s, find the length of the longest substring without duplicate characters.

Example 1:

Input: s = "abcabcbb"
Output: 3
Explanation: The answer is "abc", with the length of 3. Note that "bca" and "cab" are also correct answers.

Example 2:

Input: s = "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.

Example 3:

Input: s = "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3.
Notice that the answer must be a substring, "pwke" is a subsequence and not a substring.

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        int ans = 0;
        int l = 0; // l is the left bound, r is the right bound
        map<char, int> cnt;
        int n = s.size();
        for (int r = 0; r < n; r++) {
            cnt[s[r]]++;
            
            // if the current character occurs more than once, the substring is invalid; shrink from the left until it is valid again
            while (cnt[s[r]] > 1) {
                cnt[s[l]]--;
                if (cnt[s[l]] == 0) {
                    cnt.erase(s[l]);
                }
                l++;
            }
            ans = max(ans, r - l + 1); 
            
        }
        return ans;
    }
};

class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        ans = 0
        l = 0
        cnt = defaultdict(int)
        for r, c in enumerate(s):
            cnt[c] += 1
            while cnt[c] > 1:
                cnt[s[l]] -= 1
                l += 1
            ans = max(ans, r - l + 1)
        return ans

3090. Maximum Length Substring With Two Occurrences (at most 2 occurrences)

Given a string s, return the maximum length of a substring such that it contains at most two occurrences of each character.

Example 1:

Input: s = "bcbbbcba"

Output: 4

Explanation:

The following substring has a length of 4 and contains at most two occurrences of each character: "bcbbbcba".

Example 2:

Input: s = "aaaa"

Output: 2

Explanation:

The following substring has a length of 2 and contains at most two occurrences of each character: "aaaa".

Constraints:

2 <= s.length <= 100
s consists only of lowercase English letters.

class Solution {
public:
    int maximumLengthSubstring(string s) {
        int ans = 0;
        map<char, int> cnt;
        int l = 0;
        int n = s.size();
        for (int r = 0; r < n; r++) {
            cnt[s[r]]++;
            while (cnt[s[r]] > 2) { // the count exceeds 2, so the window is invalid; shrink it from the left
                cnt[s[l]]--; // decrease the count of s[l]
                l++; // move the left boundary to the next position
            }
            ans = max(ans, r - l + 1);
        }
        return ans;
    }
};

class Solution:
    def maximumLengthSubstring(self, s: str) -> int:
        ans = 0
        d = defaultdict(int)
        l = 0
        for r, c in enumerate(s):
            d[c] += 1
            while d[c] > 2:
                d[s[l]] -= 1
                l += 1
            ans = max(ans, r - l + 1)
        return ans

Exactly K Distinct

:::important

Exactly K Distinct Elements

:::

1248. Count Number of Nice Subarrays

You are given an array of integers nums and an integer k. A contiguous subarray is called nice if it contains exactly k odd numbers.

Return the number of nice sub-arrays.

Example 1:

Input: nums = [1,1,2,1,1], k = 3
Output: 2
Explanation: The only sub-arrays with 3 odd numbers are [1,1,2,1] and [1,2,1,1].

Example 2:

Input: nums = [2,4,6], k = 1
Output: 0
Explanation: There are no odd numbers in the array.

Example 3:

Input: nums = [2,2,2,1,2,2,1,2,2,2], k = 2
Output: 16

Constraints:

1 <= nums.length <= 50000
1 <= nums[i] <= 10^5
1 <= k <= nums.length

class Solution {
public:
    int numberOfSubarrays(vector<int>& nums, int k) {
        return atMostNumberOfSubarrays(nums, k)
            - atMostNumberOfSubarrays(nums, k - 1);

    }

private:
    int atMostNumberOfSubarrays(vector<int>& nums, int k) {
        int ans = 0;
        int oddCnt = 0;
        int left = 0;
        int n = nums.size();
        for (int right = 0; right < n; right++) {
            oddCnt += nums[right] & 1;
            while (oddCnt > k) {
                oddCnt -= nums[left] & 1;
                left++;
            }
            ans += right - left + 1;
        }
        return ans;
    }
};

# Why exactly K cannot be handled directly with a sliding window
# Imagine the current window contains exactly k odd numbers.
# When the right pointer moves:
# If the new number is odd → the count becomes k + 1 (invalid)
# If the new number is even → the count stays k (still valid)
# When the left pointer moves:
# If the removed number is odd → the count becomes k − 1 (invalid)
# If the removed number is even → the count stays k (still valid)
class Solution:
    """
    exactly K = at most K - at most (k - 1)
    """

    def numberOfSubarrays(self, nums: List[int], k: int) -> int: 
        # atMost is a nested helper used only by this problem.
        # It reads nums from the enclosing scope through a closure.
        def atMost(k):
            ans = l = cnt = 0
            for r, x in enumerate(nums):
                cnt += x & 1
                while cnt > k:
                    cnt -= nums[l] & 1
                    l += 1
                # When the while loop finishes, cnt <= k, so the window [l, r] satisfies the constraint.
                # Every subarray that ends at r and starts anywhere in [l, r] is therefore valid: add r - l + 1.
                ans += r - l + 1
            return ans
        
        # Subarrays with exactly k odd numbers = subarrays with at most k odd numbers minus subarrays with at most k - 1.
        return atMost(k) - atMost(k - 1)

2062. Count Vowel Substrings of a String (must contain all 5 vowels; this differs from the other exactly-K problems)

:::note

For “contains all 5 vowels”, expanding r is monotonic (valid stays valid), but shrinking l can break validity, so the classic “invalid → shrink” template doesn’t directly apply.

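Can a sliding window still be used?

Yes, in a different form (shrink while valid):

Invalid → expand r until the window becomes valid
Valid → try shrinking l, and count answers

Expand until valid, shrink until just invalid, then keep expanding.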

:::
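
A minimal sketch of the shrink-while-valid form on a vowel-only string, checked against a brute-force count. Both function names are illustrative, and the input is assumed to contain only the letters a, e, i, o, u (splitting the word into vowel runs happens elsewhere).

```python
from collections import defaultdict


def count_all_vowels(run: str) -> int:
    """Count substrings of a vowel-only string that contain all 5 vowels."""
    ans = l = 0
    cnt = defaultdict(int)
    n = len(run)
    for r, c in enumerate(run):
        cnt[c] += 1               # expand r
        while len(cnt) == 5:      # valid: count, then shrink l
            ans += n - r          # [l, r], [l, r+1], ..., [l, n-1] are all valid
            cnt[run[l]] -= 1
            if cnt[run[l]] == 0:
                del cnt[run[l]]
            l += 1
    return ans


def brute(run: str) -> int:
    # O(n^3) reference: test every substring directly
    n = len(run)
    return sum(len(set(run[i:j])) == 5
               for i in range(n) for j in range(i + 1, n + 1))


print(count_all_vowels("aeiouu"), brute("aeiouu"))  # 2 2
```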

A substring is a contiguous (non-empty) sequence of characters within a string.

A vowel substring is a substring that only consists of vowels ('a', 'e', 'i', 'o', and 'u') and has all five vowels present in it.

Given a string word, return the number of vowel substrings in word.

Example 1:

Input: word = "aeiouu"
Output: 2
Explanation: The vowel substrings of word are as follows (underlined):
- "aeiouu"
- "aeiouu"

Example 2:

Input: word = "unicornarihan"
Output: 0
Explanation: Not all 5 vowels are present, so there are no vowel substrings.

Example 3:

Input: word = "cuaieuouac"
Output: 7
Explanation: The vowel substrings of word are as follows (underlined):
- "cuaieuouac"
- "cuaieuouac"
- "cuaieuouac"
- "cuaieuouac"
- "cuaieuouac"
- "cuaieuouac"
- "cuaieuouac"

Constraints:

1 <= word.length <= 100
word consists of lowercase English letters only.

class Solution {
public:
    bool isVowel(char c) {
        return c == 'a' || c == 'e' || c == 'i'
            || c == 'o' || c == 'u';
    }

    int countVowelSubstrings(string word) {
        int ans = 0;
        int n = word.size();
        
        int i = 0;
        while (i < n) {
            if (!isVowel(word[i])) {
                i++;
                continue;
            }

            int j = i;
            while (j < n && isVowel(word[j])) j++;

            if (j - i >= 5) {
                unordered_map<char, int> cnt;
                int l = i;
                for (int r = i; r < j; r++) {
                    cnt[word[r]]++;
                    while (cnt.size() == 5) {
                        ans += j - r; // [l, r] is a valid substring, so [l, r], [l, r+1], [l, r+2], ..., [l, j-1] are also valid.

                        cnt[word[l]]--;
                        if (cnt[word[l]] == 0)
                            cnt.erase(word[l]);
                        l++;
                    }
                }
            }
            i = j;
        }

        return ans;
    }
};

class Solution:
    def countVowelSubstrings(self, word: str) -> int:
        ans = 0
        vowels = set('aeiou')
        i = 0
        while i < len(word):
            if word[i] not in vowels:
                i += 1
                continue
            j = i
            while j < len(word) and word[j] in vowels:
                j += 1
            if j - i >= len(vowels):
                cnt = defaultdict(int)
                l = i
                for r in range(i, j):
                    cnt[word[r]] += 1
                    while len(cnt) == len(vowels):
                        ans += j - r
                        cnt[word[l]] -= 1
                        if cnt[word[l]] == 0:
                            del cnt[word[l]]
                        l += 1
            i = j

        return ans

class Solution:
    def countVowelSubstrings(self, word: str) -> int:
        """
        当窗口 [l, r] 已经包含 a e i o u：
        [l, r]
        [l, r+1]
        [l, r+2]
        ……
        全部都是合法子串
        所以是 len(s) - r
        """
        ans = 0
        for s in re.findall(r'[aeiou]+', word):
            if len(s) < 5:
                continue
            cnt = defaultdict(int)
            l = 0
            for r, c in enumerate(s):
                cnt[c] += 1
                while len(cnt) == 5:
                    ans += len(s) - r
                    cnt[s[l]] -= 1
                    if cnt[s[l]] == 0:
                        del cnt[s[l]]
                    l += 1
        return ans

992. Subarrays with K Different Integers (Exactly K Different Integers)

Given an integer array nums and an integer k, return the number of good subarrays of nums.

A good array is an array where the number of different integers in that array is exactly k.

For example, [1,2,3,1,2] has 3 different integers: 1, 2, and 3.

A subarray is a contiguous part of an array.

Example 1:

Input: nums = [1,2,1,2,3], k = 2
Output: 7
Explanation: Subarrays formed with exactly 2 different integers: [1,2], [2,1], [1,2], [2,3], [1,2,1], [2,1,2], [1,2,1,2]

Example 2:

Input: nums = [1,2,1,3,4], k = 3
Output: 3
Explanation: Subarrays formed with exactly 3 different integers: [1,2,1,3], [2,1,3], [1,3,4].

Constraints:

1 <= nums.length <= 2 * 10^4
1 <= nums[i], k <= nums.length

class Solution {
public:
    int subarrayAtMostKDistinct(vector<int>& nums, int k) {
        int n = nums.size();
        int ans = 0;
        map<int, int>cnt;
        int l = 0;
        for (int r = 0; r < n; r++) {
            cnt[nums[r]]++;
            while (cnt.size() > k) { // more than k distinct elements
                cnt[nums[l]]--;
                if (cnt[nums[l]] == 0) {
                    cnt.erase(nums[l]);
                }
                l++;
            }
            /**
            [l, r] contains at most k distinct elements, so do

            [l, r]
            [l+1, r]
            [l+2, r]
            ...
            [r, r]

            r - l + 1 subarrays in total
            **/
            ans += r - l + 1;
        }
        return ans;

    }
    int subarraysWithKDistinct(vector<int>& nums, int k) {
        return subarrayAtMostKDistinct(nums, k) - subarrayAtMostKDistinct(nums, k - 1);
    }
};

class Solution:
    def subarraysWithKDistinct(self, nums: List[int], k: int) -> int:
        def atMost(K):
            cnt = defaultdict(int)
            l = 0
            res = 0

            for r, x in enumerate(nums):
                cnt[x] += 1

                while len(cnt) > K:
                    cnt[nums[l]] -= 1
                    if cnt[nums[l]] == 0:
                        del cnt[nums[l]]
                    l += 1

                res += (r - l + 1)
            return res
        
        return atMost(k) - atMost(k - 1)

713. Subarray Product Less Than K (less than k, a form of at most)

Given an array of integers nums and an integer k, return the number of contiguous subarrays where the product of all the elements in the subarray is strictly less than k.

Example 1:

Input: nums = [10,5,2,6], k = 100
Output: 8
Explanation: The 8 subarrays that have product less than 100 are:
[10], [5], [2], [6], [10, 5], [5, 2], [2, 6], [5, 2, 6]
Note that [10, 5, 2] is not included as the product of 100 is not strictly less than k.

Example 2:

Input: nums = [1,2,3], k = 0
Output: 0

Constraints:

1 <= nums.length <= 3 * 10^4
1 <= nums[i] <= 1000
0 <= k <= 10^6

class Solution:
    def numSubarrayProductLessThanK(self, nums: List[int], k: int) -> int:
        # When k <= 1, no subarray can have product < k (every element is >= 1), so the answer is 0.
        if k <= 1:
            return 0
        ans, product, l = 0, 1, 0
        for r, x in enumerate(nums):
            product *= x
            while product >= k:
                product //= nums[l]  # use //, since / would turn product into a float
                l += 1
            ans += r - l + 1
        return ans

At Least

:::important

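At Least

≥ K
Unstable
Usually converted to at most

count satisfying ≥ K = count of all candidates − count not satisfying ≥ K (that is, ≤ K − 1)

at least K = total − at most (K − 1) (I have not yet run into a problem of this form)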

:::
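
One way to see the total − at most (K − 1) conversion in action is to apply it to the counting problem in 2962. This is a sketch under the assumption k ≥ 1 (as in that problem's constraints); the function name `count_at_least` is illustrative.

```python
from typing import List


def count_at_least(nums: List[int], k: int) -> int:
    """Subarrays in which max(nums) appears at least k times (k >= 1)."""
    mx = max(nums)
    n = len(nums)
    total = n * (n + 1) // 2      # all non-empty subarrays

    # Count subarrays with at most k - 1 occurrences of mx
    # using the classic "invalid -> shrink" window.
    at_most = l = cnt = 0
    for r, x in enumerate(nums):
        cnt += (x == mx)
        while cnt > k - 1:
            cnt -= (nums[l] == mx)
            l += 1
        at_most += r - l + 1

    return total - at_most


print(count_at_least([1, 3, 2, 3, 3], 2))  # 6
```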

2962. Count Subarrays Where Max Element Appears at Least K Times

You are given an integer array nums and a positive integer k.

Return the number of subarrays where the maximum element of nums appears at least k times in that subarray.

A subarray is a contiguous sequence of elements within an array.

Example 1:

Input: nums = [1,3,2,3,3], k = 2
Output: 6
Explanation: The subarrays that contain the element 3 at least 2 times are: [1,3,2,3], [1,3,2,3,3], [3,2,3], [3,2,3,3], [2,3,3] and [3,3].

Example 2:

Input: nums = [1,4,2,1], k = 3
Output: 0
Explanation: No subarray contains the element 4 at least 3 times.

Constraints:

1 <= nums.length <= 10^5
1 <= nums[i] <= 10^6
1 <= k <= 10^5

class Solution:
    def countSubarrays(self, nums: List[int], k: int) -> int:
        ans = 0
        l = 0
        cnt = defaultdict(int)
        mx = max(nums)
        n = len(nums)
        for r, x in enumerate(nums):
            cnt[x] += 1
            while cnt[mx] >= k:
                ans += n - r
                cnt[nums[l]] -= 1
                l += 1
        return ans

Greedy + Two Pointers + run-based scanning

1839. Longest Substring Of All Vowels in Order

A string is considered beautiful if it satisfies the following conditions:

Each of the 5 English vowels ('a', 'e', 'i', 'o', 'u') must appear at least once in it.
The letters must be sorted in alphabetical order (i.e. all 'a's before 'e's, all 'e's before 'i's, etc.).

For example, strings "aeiou" and "aaaaaaeiiiioou" are considered beautiful, but "uaeio", "aeoiu", and "aaaeeeooo" are not beautiful.

Given a string word consisting of English vowels, return the length of the longest beautiful substring of word. If no such substring exists, return 0.

A substring is a contiguous sequence of characters in a string.

Example 1:

Input: word = "aeiaaioaaaaeiiiiouuuooaauuaeiu"
Output: 13
Explanation: The longest beautiful substring in word is "aaaaeiiiiouuu" of length 13.

Example 2:

Input: word = "aeeeiiiioooauuuaeiou"
Output: 5
Explanation: The longest beautiful substring in word is "aeiou" of length 5.

Example 3:

Input: word = "a"
Output: 0
Explanation: There is no beautiful substring, so return 0.

class Solution {
public:
    int longestBeautifulSubstring(string word) {
        int ans = 0;
        int l = 0, r = 0;
        int n = word.size();
        while (r < n) {
            if (word[r] != 'a') {
                r++;
                continue;
            }
            l = r; // start of the candidate run; reuse the outer l instead of shadowing it
            r += 1;
            int type = 1;
            while (r < n && word[r] >= word[r - 1]) {
                if (word[r] > word[r - 1]) {
                    type++;
                }
                r++;
            }
            if (type == 5)
                ans = max(ans, r - l);
        }
        return ans;
    }
};

class Solution:
    def longestBeautifulSubstring(self, word: str) -> int:
        ans, l, r = 0, 0, 0
        n = len(word)
        while r < n:
            if word[r] != 'a':
                r += 1
                continue
            l = r
            r += 1
            type = 1 # initially we set the type to be 1, because the first char is 'a'
            while r < n and word[r] >= word[r - 1]: # if next char is greater than or equal to previous one, it means it is sorted in alphabetical order
                if word[r] != word[r - 1]:
                    type += 1
                r += 1

            if type == 5:
                ans = max(ans, r - l)
        return ans
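
The run-based idea can also be written with `itertools.groupby`: compress the word into maximal runs, then look for five consecutive runs that spell "aeiou". This is an alternative sketch, not the two-pointer solution above, and it assumes the word contains vowels only, as the problem states. The function name is illustrative.

```python
from itertools import groupby


def longest_beautiful_runs(word: str) -> int:
    # Compress the word into maximal runs: [(char, run_length), ...]
    runs = [(ch, len(list(g))) for ch, g in groupby(word)]
    ans = 0
    # A beautiful substring spans exactly five consecutive runs: a, e, i, o, u
    for i in range(len(runs) - 4):
        five = runs[i:i + 5]
        if "".join(ch for ch, _ in five) == "aeiou":
            ans = max(ans, sum(length for _, length in five))
    return ans


print(longest_beautiful_runs("aeiaaioaaaaeiiiiouuuooaauuaeiu"))  # 13
```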

Leetcode150

Sun, 18 Jan 2026 00:00:00 GMT

Array / String

88. Merge Sorted Array

You are given two integer arrays nums1 and nums2, sorted in non-decreasing order, and two integers m and n, representing the number of elements in nums1 and nums2 respectively.

Merge nums1 and nums2 into a single array sorted in non-decreasing order.

The final sorted array should not be returned by the function, but instead be stored inside the array nums1. To accommodate this, nums1 has a length of m + n, where the first m elements denote the elements that should be merged, and the last n elements are set to 0 and should be ignored. nums2 has a length of n.

Example 1:

Input: nums1 = [1,2,3,0,0,0], m = 3, nums2 = [2,5,6], n = 3
Output: [1,2,2,3,5,6]
Explanation: The arrays we are merging are [1,2,3] and [2,5,6].
The result of the merge is [1,2,2,3,5,6] with the underlined elements coming from nums1.

Example 2:

Input: nums1 = [1], m = 1, nums2 = [], n = 0
Output: [1]
Explanation: The arrays we are merging are [1] and [].
The result of the merge is [1].

Example 3:

Input: nums1 = [0], m = 0, nums2 = [1], n = 1
Output: [1]
Explanation: The arrays we are merging are [] and [1].
The result of the merge is [1].
Note that because m = 0, there are no elements in nums1. The 0 is only there to ensure the merge result can fit in nums1.

from typing import List

class Solution:
    def merge(self, nums1: List[int], m: int, nums2: List[int], n: int) -> None:
        """
        Do not return anything, modify nums1 in-place instead.
        """
        i, j, k = m - 1, n - 1, m + n - 1

        while i >= 0 and j >= 0:
            if nums1[i] <= nums2[j]:
                nums1[k] = nums2[j]
                j -= 1
            else:
                nums1[k] = nums1[i]
                i -= 1
            k -= 1

        if j >= 0:
            nums1[: k + 1] = nums2[: j + 1]
            
            

class Solution {
public:
    void merge(vector<int>& nums1, int m, vector<int>& nums2, int n) {
        int i = m - 1;
        int j = n - 1;
        int k = m + n - 1;
        while (i >= 0 && j >= 0) {
            if (nums1[i] <= nums2[j]) {
                nums1[k--] = nums2[j--]; 
            } else {
                nums1[k--] = nums1[i--];
            }
        }
        while (j >= 0) {
            nums1[k--] = nums2[j--];
        }
    }
};
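
The three examples from the problem statement can be checked with a standalone version of the backward three-pointer merge. This is a sketch rather than the submitted solution: the free function `merge` and the small `check` helper are names introduced here for illustration.

```python
from typing import List


def merge(nums1: List[int], m: int, nums2: List[int], n: int) -> None:
    """Merge nums2 into nums1 in place, filling nums1 from the back."""
    i, j, k = m - 1, n - 1, m + n - 1
    while i >= 0 and j >= 0:
        if nums1[i] > nums2[j]:
            nums1[k] = nums1[i]
            i -= 1
        else:
            nums1[k] = nums2[j]
            j -= 1
        k -= 1
    # If nums2 still has elements, they are the smallest ones and belong at
    # the front. When j == -1 this slice assignment is a no-op.
    nums1[: j + 1] = nums2[: j + 1]


def check(nums1: List[int], m: int, nums2: List[int], n: int) -> List[int]:
    """Hypothetical helper: run merge on a copy and return the result."""
    out = list(nums1)
    merge(out, m, nums2, n)
    return out


if __name__ == "__main__":
    print(check([1, 2, 3, 0, 0, 0], 3, [2, 5, 6], 3))  # [1, 2, 2, 3, 5, 6]
    print(check([1], 1, [], 0))                        # [1]
    print(check([0], 0, [1], 1))                       # [1]
```

Filling from the back is what makes the merge safe in place: position `k` is always at or beyond `i`, so no unread element of `nums1` is overwritten.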

BQs

Thu, 15 Jan 2026 00:00:00 GMT

https://www.mockquestions.com/position/Engineer/

Template

Why our company?

1) Emphasize tech-stack fit

I’m excited about your tech stack, especially ___, and I’d love to work on challenging engineering problems.

2) Emphasize growth and learning

I’m looking for a place where I can keep learning, improve my system design skills, and work with strong engineers.

3) Emphasize product impact (tech roles value this)

Your product solves a real problem for users, and I want my work to create meaningful value.

MockQuestions

Describe a difficult project and how you overcame it.

The GPU x project was a low-level and highly complex project. ==One major challenge for me was switching from Java to C++== while also learning OpenGL and GPU-related concepts at the same time.

Since the project involved low-level graphics and performance optimization, the learning curve was steep. ==To overcome this, I actively asked experienced teammates for guidance and collected high-quality internal technical documents within the company.== I combined these resources with official documentation and hands-on experiments to gradually build my understanding.

Through this process, I successfully adapted to C++ development and gained a much deeper understanding of low-level graphics systems and GPU optimization.

What was your greatest accomplishment as an Engineer?

Build Guides

Thu, 01 Jan 2026 00:00:00 GMT

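🚀 End-to-End Blog Build Guide

This guide covers the full process of building a blog with the Fuwari template and the Astro framework, from setting up the local environment to automated deployment on GitHub Pages.

1. Prerequisites

Before you start, make sure the following tools are installed:

Node.js: version 18 or later.
pnpm: the recommended package manager. Install it with: npm install -g pnpm
Git: for version control.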

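2. Initialize the Project

You can create the blog in one of two ways:

Option A (recommended): open the Fuwari template page and generate a new repository directly from it.
Option B (command line):

pnpm create fuwari@latest

Once the project exists, clone the repository to your machine (needed for Option A) and install the dependencies: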

pnpm install

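3. Local Development and Configuration

Personalize the site before publishing it.

Basic configuration: edit src/config.ts to change the site title, author, social links, and so on.
Preview: run pnpm dev.
Open http://localhost:4321 to see the result live.
This mode supports hot reloading: the browser refreshes automatically when you change a file.
Icons: to drop the default icon, replace favicon.svg under public/, or remove the relevant <Icon /> tags in src/components.

4. Writing Content

Fuwari stores posts as Markdown files under src/content/posts/.

Create a new post:

pnpm new-post <filename>

Configure the frontmatter: set the metadata at the top of the .md file: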

---
title: My First Post
published: 2026-01-22
description: Post description
image: ./cover.jpg
tags: [Tutorial, Astro]
category: Tech
draft: false
---

5. GitHub Pages Deployment

To make the site publicly reachable, set up automated deployment.

A. Update the Astro Configuration

Edit astro.config.mjs and make sure the paths are correct:

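import { defineConfig } from "astro/config";

export default defineConfig({
  site: "https://<your-username>.github.io",
  base: "/<repo-name>/", // If the repository is named <your-username>.github.io, omit this or set it to "/"
});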
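
The rule in the comment above (project repositories are served under a sub-path, while a repository named after the user's github.io domain is served at the root) can be sketched as a small function. The function name `pages_settings` and the sample user `octocat` and repository `my-blog` are hypothetical, chosen only to illustrate how `site` and `base` combine into the final URL.

```python
def pages_settings(username: str, repo: str) -> dict:
    """Derive the Astro `site`/`base` values for a GitHub Pages deployment.

    Assumption: default github.io hosting with no custom domain.
    """
    site = f"https://{username}.github.io"
    # A repository named "<username>.github.io" is served from the root.
    is_user_site = repo.lower() == f"{username.lower()}.github.io"
    base = "/" if is_user_site else f"/{repo}/"
    return {"site": site, "base": base, "url": site + base}


if __name__ == "__main__":
    print(pages_settings("octocat", "my-blog")["url"])
    # https://octocat.github.io/my-blog/
    print(pages_settings("octocat", "octocat.github.io")["url"])
    # https://octocat.github.io/
```

If `base` does not match the repository name, the deployed pages typically load but their assets and internal links resolve to the wrong path, so it is worth checking this value first when a deployed site looks unstyled.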

B. Create the Deployment Workflow (GitHub Actions)

Create .github/workflows/deploy.yml in the project root and paste in the following configuration:

name: Deploy to GitHub Pages

on:
  # Trigger the workflow every time you push to the `main` branch
  # If you use a different branch name, replace `main` with your branch's name
  push:
    branches: [ main ]
  # Allows you to run this workflow manually from the Actions tab on GitHub
  workflow_dispatch:

# Allow the job to clone the repo and create a page deployment
permissions:
  contents: read
  pages: write
  id-token: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout your repository using git
        uses: actions/checkout@v5
      - name: Install, build, and upload your site
        uses: withastro/action@v5
        # with:
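          # path: . # The root location of the Astro project inside the repository. (optional)
          # node-version: 20 # The specific Node.js version used to build the site; defaults to 20. (optional)
          # package-manager: pnpm@latest # The Node.js package manager used to install dependencies and build the site. Detected automatically from the lockfile in the repository. (optional)
          # build-cmd: pnpm run build # The command used to build the site. Runs the package's build script or task by default. (optional)
        # env:
          # PUBLIC_POKEAPI: 'https://pokeapi.co/api/v2' # Use single quotes for the variable value. (optional)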

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

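This workflow installs, builds, and publishes the site automatically on every push to main.

C. Enable GitHub Pages in the Repository Settings

Open Settings -> Pages in the GitHub repository.
Under Build and deployment, set Source to GitHub Actions.

6. Publish

Run the following commands to push the code to GitHub: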

git add .
git commit -m "Initial blog setup"
git push origin main

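Check progress: the Actions tab of the GitHub repository shows the deployment status. Once it finishes, the blog is served at https://<username>.github.io/<repo-name>/.

💡 Command Cheat Sheet

Command	Purpose
`pnpm dev`	Start the local development server
`pnpm build`	Build locally (generates the static `dist` folder)
`pnpm new-post <name>`	Create a new post from the template
`pnpm format`	Format the code automatically

💡 Markdown Syntax Cheat Sheet

Syntax	Purpose
`==content==`	Highlight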

Astro

Thu, 01 Jan 2026 00:00:00 GMT

Custom Command

Create a global command, for example np, that supports:

np "Double&Triple Pointers"

① Create the script

Create the file:

mkdir -p ~/.local/bin
touch ~/.local/bin/np

Contents:

#!/usr/bin/env bash
npm run -- new "$@"

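② Make it executable

chmod +x ~/.local/bin/np

③ Make sure it is on PATH

Add the following line to ~/.bashrc or ~/.zshrc:

export PATH="$HOME/.local/bin:$PATH"

Then reload the shell configuration (use ~/.bashrc if that is the file you edited). Run this in the terminal; do not put it in the rc file itself:

source ~/.zshrc

✅ From then on, from inside the blog project directory (npm run reads that project's package.json), you can run: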

np "Double&Triple Pointers"

mark/highlight

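1) Install the dependency the custom plugin needs

npm i unist-util-visit

2) Create the file src/plugins/remark-mark.mjs in your project (the path must match exactly)

Copy the following into it in full: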

import { visit } from "unist-util-visit";

export default function remarkMark() {
  return (tree, file) => {
    visit(tree, "text", (node, index, parent) => {
      if (!node.value.includes("==")) return;

      const parts = node.value.split(/(==)/g);
      const newNodes = [];
      let inMark = false;

      for (const part of parts) {
        if (part === "==") {
          if (inMark) {
            newNodes.push({ type: "html", value: "</mark>" });
          } else {
            newNodes.push({ type: "html", value: "<mark>" });
          }
          inMark = !inMark;
        } else if (part.length > 0) {
          newNodes.push({ type: "text", value: part });
        }
      }

      // Replace the current text node with the new nodes
      parent.children.splice(index, 1, ...newNodes);

      // Return the next index so visit continues after the nodes just inserted
      return index + newNodes.length;
    });
  };
}
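
To see what the plugin's splitting loop produces without running Astro, the same toggle logic can be sketched in Python. This is only an illustration of the algorithm above, not part of the plugin: `split_marks` is a hypothetical name, and nodes are modeled as simple `(type, value)` tuples instead of mdast objects.

```python
import re
from typing import List, Tuple


def split_marks(text: str) -> List[Tuple[str, str]]:
    """Split text on "==" and alternate between <mark> and </mark> nodes."""
    if "==" not in text:
        return [("text", text)]

    nodes: List[Tuple[str, str]] = []
    in_mark = False
    # The capturing group keeps the "==" delimiters in the result,
    # mirroring String.prototype.split(/(==)/g) in the plugin.
    for part in re.split(r"(==)", text):
        if part == "==":
            nodes.append(("html", "</mark>" if in_mark else "<mark>"))
            in_mark = not in_mark
        elif part:
            nodes.append(("text", part))
    return nodes


if __name__ == "__main__":
    print(split_marks("a ==b== c"))
    # [('text', 'a '), ('html', '<mark>'), ('text', 'b'), ('html', '</mark>'), ('text', ' c')]
```

One consequence of the simple toggle is that an odd number of `==` markers in a text node leaves a `<mark>` without its closing tag, so highlights should always be written in matched pairs.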

3) Open astro.config.mjs in the project root and add this line to the import section (it must come before export default defineConfig(...))

import remarkMark from "./src/plugins/remark-mark.mjs";

4) Add remarkMark to remarkPlugins (putting it first is recommended)

markdown: {
  remarkPlugins: [
    remarkMark,
    remarkMath,
    remarkReadingTime,
    remarkExcerpt,
    remarkGithubAdmonitionsToDirectives,
    remarkDirective,
    remarkSectionize,
    parseDirectiveNode,
  ],
  rehypePlugins: [
    // ...
  ],
},

Typora

Thu, 01 Jan 2026 00:00:00 GMT

typora-0.11.18 last free version

https://github.com/wyf9661/typora-free

Highlight

Preferences - Markdown - Syntax Support - check Highlight - restart Typora (if highlights still fail to render)
