I. Python `collections` Module — Complete Learning Manual

Python's collections module provides specialized container datatypes (特殊容器数据类型) that complement the built-in dict, list, tuple, and str. This manual covers seven of them: defaultdict, Counter, OrderedDict, deque, namedtuple, ChainMap, and the UserDict / UserList / UserString wrapper family. Each one removes a specific pain point of the built-ins with little overhead.

1. defaultdict — Default Value Dict (默认值字典)

A defaultdict (默认值字典) behaves exactly like a regular dict, except that looking up a missing key (缺失键) with d[key] automatically inserts it, using the value returned by calling default_factory (默认工厂函数). This eliminates KeyError handling and verbose setdefault() boilerplate.

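1) Constructor (构造函数)

```python
collections.defaultdict(default_factory=None, /, *args, **kwargs)
```

default_factory is any zero-argument callable, such as int, list, set, dict, or a custom lambda. It may also be None (the default), in which case missing keys raise KeyError exactly as in a plain dict. Any further positional or keyword arguments are passed on to the dict constructor.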

2) `defaultdict(int)` — Frequency counter (频率计数)

```python
from collections import defaultdict

text = "apple banana apple cherry banana apple"

freq = defaultdict(int)        # missing key → int() → 0

for word in text.split():
    freq[word] += 1            # no KeyError on first access

print(dict(freq))
# → {'apple': 3, 'banana': 2, 'cherry': 1}

# Compare with plain dict (more verbose):
freq2 = {}
for word in text.split():
    freq2[word] = freq2.get(word, 0) + 1   # needs .get() with a default
```

3) `defaultdict(list)` — Grouping (分组)

```python
from collections import defaultdict

students = [
    ("Alice", "Math"),
    ("Bob",   "Science"),
    ("Alice", "Science"),
    ("Carol", "Math"),
    ("Bob",   "Math"),
]

by_name = defaultdict(list)

for name, subject in students:
    by_name[name].append(subject)   # missing key → [] automatically

print(dict(by_name))
# → {'Alice': ['Math', 'Science'], 'Bob': ['Science', 'Math'], 'Carol': ['Math']}
```
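For comparison, here is the same grouping written with a plain dict and setdefault(), the boilerplate that defaultdict replaces. This minimal sketch reuses the data above:

```python
students = [
    ("Alice", "Math"),
    ("Bob",   "Science"),
    ("Alice", "Science"),
    ("Carol", "Math"),
    ("Bob",   "Math"),
]

by_name = {}
for name, subject in students:
    # setdefault() inserts [] only if name is missing, then returns the stored list
    by_name.setdefault(name, []).append(subject)

print(by_name)
# → {'Alice': ['Math', 'Science'], 'Bob': ['Science', 'Math'], 'Carol': ['Math']}
```

One small cost is easy to miss: the `[]` argument is evaluated on every setdefault() call, even when the key already exists. defaultdict calls its factory only for keys that are genuinely missing.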

4) `defaultdict(set)` — Unique grouping (去重分组)

```python
from collections import defaultdict

edges = [(1, 2), (1, 3), (2, 3), (1, 2)]   # duplicate edge (1,2)

graph = defaultdict(set)

for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

print(dict(graph))
# → {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}   (no duplicates)
```
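The auto-insert behavior has a side effect worth knowing. Simply reading graph[node] for a node that has no edges adds that node to the dict, while membership tests and get() leave the dict unchanged. A short sketch:

```python
from collections import defaultdict

graph = defaultdict(set)
graph[1].add(2)

print(4 in graph)      # → False  (membership tests never create keys)
print(graph.get(4))    # → None   (get() never creates keys either)
print(graph[4])        # → set()  (subscript access DOES create the key)
print(4 in graph)      # → True
print(dict(graph))     # → {1: {2}, 4: set()}
```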

5) `defaultdict(dict)` — Nested dict (嵌套字典)

```python
from collections import defaultdict

# defaultdict(dict) auto-creates only the OUTER level: the inner value is a
# plain dict, so matrix["row1"]["col1"] += 10 would raise KeyError.
# A factory returning defaultdict(int) makes both levels auto-create.
matrix = defaultdict(lambda: defaultdict(int))

matrix["row1"]["col1"] += 10
matrix["row1"]["col2"] += 20
matrix["row2"]["col1"] += 30

for row, cols in matrix.items():
    print(f"{row}: {dict(cols)}")
# → row1: {'col1': 10, 'col2': 20}
# → row2: {'col1': 30}
```
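When the nesting depth is not known in advance, a common idiom is a defaultdict whose factory is the function that creates it. The sketch below uses hypothetical helper names `tree` and `to_dict`. The second helper exists only to turn the nested defaultdicts into plain dicts so they print readably:

```python
from collections import defaultdict

def tree():
    """Hypothetical helper: a defaultdict whose factory is tree itself (any depth)."""
    return defaultdict(tree)

def to_dict(node):
    """Recursively convert nested defaultdicts into plain dicts for printing."""
    if isinstance(node, defaultdict):
        return {k: to_dict(v) for k, v in node.items()}
    return node

config = tree()
config["db"]["primary"]["host"] = "10.0.0.1"
config["db"]["primary"]["port"] = 5432
config["cache"]["ttl"] = 60

print(to_dict(config))
# → {'db': {'primary': {'host': '10.0.0.1', 'port': 5432}}, 'cache': {'ttl': 60}}
```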

6) Custom `default_factory` (自定义工厂函数)

```python
from collections import defaultdict

# Factory that returns a specific default value
dd = defaultdict(lambda: "N/A")
dd["name"] = "Alice"

print(dd["name"])     # → Alice
print(dd["age"])      # → N/A   (key created with "N/A")
print(dd["city"])     # → N/A

# Factory with a counter: each new key gets the next sequential ID
id_counter = [0]      # a mutable list lets next_id() update it without `global`
def next_id():
    id_counter[0] += 1
    return id_counter[0]

registry = defaultdict(next_id)
print(registry["alice"])   # → 1
print(registry["bob"])     # → 2
print(registry["alice"])   # → 1  (already exists)
```
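The mutable-list counter works, but itertools.count can produce the same sequential IDs without a helper function or shared state, because its bound `__next__` method is already a zero-argument callable. A sketch of that alternative:

```python
from collections import defaultdict
from itertools import count

# count(1).__next__ takes no arguments and returns 1, 2, 3, ... on successive calls
registry = defaultdict(count(1).__next__)

print(registry["alice"])   # → 1
print(registry["bob"])     # → 2
print(registry["alice"])   # → 1  (already exists)
```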

7) `default_factory` attribute — Inspect and change

```python
from collections import defaultdict

dd = defaultdict(list)
print(dd.default_factory)    # → <class 'list'>

dd.default_factory = set     # change factory at runtime
dd["new_key"].add(42)
print(dict(dd))              # → {'new_key': {42}}

dd.default_factory = None    # disable factory → KeyError on missing keys
try:
    _ = dd["missing"]
except KeyError as e:
    print(f"KeyError: {e}")  # → KeyError: 'missing'
```

8) `__missing__` — How defaultdict works internally

```python
class MyDefaultDict(dict):
    """Simplified re-implementation of defaultdict's lookup logic."""

    def __init__(self, factory=None):
        super().__init__()
        self.factory = factory

    def __missing__(self, key):
        # dict.__getitem__ calls this automatically when key is not found
        if self.factory is None:
            raise KeyError(key)        # mirrors defaultdict with default_factory=None
        value = self.factory()
        self[key] = value              # insert the new default...
        return value                   # ...and hand it back to the caller

d = MyDefaultDict(list)
d["x"].append(1)
d["x"].append(2)
d["y"].append(3)
print(dict(d))   # → {'x': [1, 2], 'y': [3]}
```

Note: __missing__ is only triggered by d[key] access, NOT by d.get(key). get() always returns None (or the provided default) without creating the key.
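default_factory is called with no arguments, so it cannot compute a default from the missing key itself. Overriding `__missing__` directly can. In this minimal sketch, the class name and the "uppercase the key" rule are illustrative assumptions:

```python
class DefaultFromKey(dict):
    """Hypothetical example: a missing key's default is derived from the key."""

    def __missing__(self, key):
        value = key.upper()    # default computed from the key (illustrative rule)
        self[key] = value      # store it, as defaultdict does
        return value

labels = DefaultFromKey(ok="Okay")
print(labels["ok"])        # → Okay  (stored value)
print(labels["warn"])      # → WARN  (derived from the key and inserted)
print(labels.get("err"))   # → None  (get() bypasses __missing__)
print(dict(labels))        # → {'ok': 'Okay', 'warn': 'WARN'}
```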

9) Inherits all `dict` methods

```python
from collections import defaultdict

dd = defaultdict(int, a=1, b=2)

# All standard dict methods work
print(dd.keys())              # → dict_keys(['a', 'b'])
print(dd.values())            # → dict_values([1, 2])
print(dd.items())             # → dict_items([('a', 1), ('b', 2)])
print(dd.get("x", 99))        # → 99  (no key created)
print("a" in dd)              # → True
dd.update({"c": 3})
print(dd.pop("a"))            # → 1
print(dict(dd))               # → {'b': 2, 'c': 3}
```

2. Counter — Multiset / Frequency Map (计数器)

A Counter (计数器) is a dict subclass for counting hashable objects (统计可哈希对象). Looking up a missing key returns 0 instead of raising KeyError, and the lookup does not insert the key. Counters also support arithmetic and set-like operations (算术运算) between one another.

1) Constructor — Three ways to create

```python
from collections import Counter

# From an iterable
c1 = Counter("abracadabra")
print(c1)   # → Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

# From a dict
c2 = Counter({"cats": 4, "dogs": 8})
print(c2)   # → Counter({'dogs': 8, 'cats': 4})

# From keyword arguments
c3 = Counter(red=3, blue=1, green=5)
print(c3)   # → Counter({'green': 5, 'red': 3, 'blue': 1})
```

2) Missing key → 0 (缺失键返回0)

```python
from collections import Counter

c = Counter("hello")
print(c["l"])     # → 2  (exists)
print(c["z"])     # → 0  (missing, no KeyError!)
print("z" in c)   # → False  (not stored, just returns 0)
```
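On this point Counter differs from defaultdict(int), which does insert the key on lookup. A small sketch that contrasts the two on the same counts:

```python
from collections import Counter, defaultdict

c  = Counter("hello")
dd = defaultdict(int, c)     # same counts, but stored in a defaultdict

print(c["z"], dd["z"])       # → 0 0
print("z" in c)              # → False  (Counter lookup does not insert)
print("z" in dd)             # → True   (defaultdict lookup inserted z: 0)
```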

3) `most_common(n)` — Top N elements (最高频N个元素)

```python
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the fox".split()
c = Counter(words)

print(c.most_common(3))
# → [('the', 3), ('fox', 2), ('quick', 1)]   (ties keep first-seen order)

print(c.most_common())          # all elements, sorted by frequency
print(c.most_common()[:-4:-1])  # 3 least common, least first (tail trick)
# → [('dog', 1), ('lazy', 1), ('over', 1)]
```

4) `elements()` — Expand back to iterable (展开为可迭代)

```python
from collections import Counter

c = Counter(a=3, b=1, c=2)

print(list(c.elements()))
# → ['a', 'a', 'a', 'b', 'c', 'c']  (in the order first encountered)

# Reconstruct a sorted list
print(sorted(c.elements()))
# → ['a', 'a', 'a', 'b', 'c', 'c']

# Elements with count ≤ 0 are excluded
c["x"] = -1
print(list(c.elements()))    # → ['a', 'a', 'a', 'b', 'c', 'c']  ('x' not included)
```
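Because elements() repeats each item according to its count, it fits naturally when a multiset has to be fed to a function that consumes every copy. The sketch below stores a number's prime factorization as a Counter and uses math.prod (Python 3.8+) to rebuild the number:

```python
import math
from collections import Counter

# Multiset of prime factors: 2² · 3³ · 17
prime_factors = Counter({2: 2, 3: 3, 17: 1})

product = math.prod(prime_factors.elements())   # 2·2·3·3·3·17
print(product)   # → 1836
```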

5) `subtract()` / `update()` — In-place operations (就地运算)

```python
from collections import Counter

inventory = Counter(apples=10, oranges=5, bananas=8)

# subtract: reduces counts (allows zero and negative results)
sold = Counter(apples=3, oranges=5, bananas=10)
inventory.subtract(sold)
print(inventory)
# → Counter({'apples': 7, 'oranges': 0, 'bananas': -2})

# update: adds counts (merges)
restocked = Counter(apples=5, bananas=15)
inventory.update(restocked)
print(inventory)
# → Counter({'bananas': 13, 'apples': 12, 'oranges': 0})
```
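Two behaviors here are easy to confuse with their look-alikes. Counter.update() adds counts, whereas dict.update() replaces values. subtract() keeps negative results, whereas the binary - operator drops them. A short sketch:

```python
from collections import Counter

# Counter.update() ADDS counts; dict.update() REPLACES values
c = Counter(x=1)
c.update({"x": 5})
print(c["x"])                      # → 6

d = {"x": 1}
d.update({"x": 5})
print(d["x"])                      # → 5

# subtract() keeps negative results; the binary - operator drops them
stock = Counter(apples=2)
print(stock - Counter(apples=5))   # → Counter()
stock.subtract(Counter(apples=5))
print(stock)                       # → Counter({'apples': -3})
```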

6) Arithmetic operators (算术运算符)

```python
from collections import Counter

a = Counter(x=4, y=2, z=0)
b = Counter(x=1, y=3, w=5)

print(a + b)    # add counts, drop results <= 0
# → Counter({'x': 5, 'y': 5, 'w': 5})

print(a - b)    # subtract, keep only positive results
# → Counter({'x': 3})

print(a & b)    # intersection: min of each count
# → Counter({'y': 2, 'x': 1})

print(a | b)    # union: max of each count
# → Counter({'w': 5, 'x': 4, 'y': 3})

# Unary operators
print(+a)       # keep only positive counts
# → Counter({'x': 4, 'y': 2})
print(-a)       # keep only negative counts, with the sign flipped
# → Counter()   (a has no negative counts)
```
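The way the - operator drops non-positive results gives a compact multiset-inclusion test. In the sketch below, `can_build()` is a hypothetical helper that checks whether one string can be spelled from the letters of another:

```python
from collections import Counter

def can_build(note: str, letters: str) -> bool:
    """True if `note` can be spelled using each character of `letters` at most once."""
    # Counter(note) - Counter(letters) keeps only the characters we are short of;
    # an empty Counter is falsy, so "nothing missing" means True.
    return not (Counter(note) - Counter(letters))

print(can_build("aab", "aaabbc"))   # → True
print(can_build("abcc", "aabbc"))   # → False  (one 'c' short)

# On Python 3.10+ the same check can be written as: Counter(note) <= Counter(letters)
```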

7) Total count and filtering (总计数与过滤)

```python
from collections import Counter

c = Counter(a=5, b=3, c=0, d=-2)

# Sum of ALL counts, negatives included (Python 3.10+)
print(c.total())       # → 6   (5 + 3 + 0 - 2)
print((+c).total())    # → 8   (positive counts only)

# Keep only positive counts
positive = +c
print(positive)        # → Counter({'a': 5, 'b': 3})

# Keep only negative counts, sign flipped (useful for "owed" quantities)
negative = -c
print(negative)        # → Counter({'d': 2})
```

8) Practical: anagram check, top-K, word frequency

```python
import re
from collections import Counter

# ── Anagram check (变位词检测) ──
def is_anagram(s1: str, s2: str) -> bool:
    return Counter(s1.lower()) == Counter(s2.lower())

print(is_anagram("listen", "silent"))   # → True
print(is_anagram("hello",  "world"))    # → False

# ── Character frequency difference ──
def missing_chars(have: str, need: str) -> Counter:
    deficit = Counter(need) - Counter(have)
    return deficit

print(missing_chars("aab", "aaabbc"))
# → Counter({'a': 1, 'b': 1, 'c': 1})

# ── Top-K frequent words ──
text = """To be or not to be that is the question
          whether tis nobler in the mind to suffer"""

words  = re.findall(r'\w+', text.lower())
top5   = Counter(words).most_common(5)
print(top5)
# → [('to', 3), ('be', 2), ('the', 2), ('or', 1), ('not', 1)]
```
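Counter equality also supports a classic sliding-window technique: finding every substring of a text that is an anagram of a pattern. Instead of recounting each substring from scratch, one window Counter is updated as a character enters and another leaves. In the sketch below, `find_anagrams` is a hypothetical name:

```python
from collections import Counter

def find_anagrams(text: str, pattern: str) -> list:
    """Return start indices of substrings of `text` that are anagrams of `pattern`."""
    k = len(pattern)
    if k == 0 or k > len(text):
        return []
    need = Counter(pattern)
    window = Counter(text[:k])
    hits = [0] if window == need else []
    for i in range(k, len(text)):
        window[text[i]] += 1           # character entering the window
        out = text[i - k]              # character leaving the window
        window[out] -= 1
        if window[out] == 0:
            del window[out]            # drop zeros so == also works on Python < 3.10
        if window == need:
            hits.append(i - k + 1)
    return hits

print(find_anagrams("cbaebabacd", "abc"))   # → [0, 6]
```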

9) Inherits all `dict` methods

```python
from collections import Counter

c = Counter("mississippi")

print(c.keys())           # → dict_keys(['m', 'i', 's', 'p'])
print(c.values())         # → dict_values([1, 4, 4, 2])
print(c.items())          # → dict_items([('m', 1), ('i', 4), ('s', 4), ('p', 2)])
print(c.get("i"))         # → 4
print(c.get("z"))         # → None   (get() returns None, not 0; use c["z"] for 0)

# del removes the key entirely; afterwards lookups return 0 again
del c["m"]
print("m" in c)           # → False
print(c["m"])             # → 0  (missing key returns 0)
```

3. OrderedDict — Ordered Dictionary (有序字典)

Since Python 3.7, a plain dict preserves insertion order. OrderedDict (有序字典) still offers a few distinct features: order-sensitive equality, move_to_end(), and popitem(last=...), which can remove items from either end. These make it a natural fit for an LRU Cache (LRU缓存) and similar structures.

1) Basic usage and order-sensitive equality

```python
from collections import OrderedDict

od = OrderedDict()
od["banana"] = 3
od["apple"]  = 5
od["cherry"] = 1

print(od)
# → OrderedDict([('banana', 3), ('apple', 5), ('cherry', 1)])   (Python 3.11 and earlier)
# → OrderedDict({'banana': 3, 'apple': 5, 'cherry': 1})         (Python 3.12+)

# Order-sensitive equality (顺序敏感的相等判断)
od1 = OrderedDict([("a", 1), ("b", 2)])
od2 = OrderedDict([("b", 2), ("a", 1)])
d1  = {"a": 1, "b": 2}

print(od1 == od2)   # → False  (same keys/values, different order)
print(od1 == d1)    # → True   (comparing with a plain dict ignores order)
```

2) `move_to_end(key, last=True)` — Reposition a key

```python
from collections import OrderedDict

od = OrderedDict.fromkeys("ABCDE")

od.move_to_end("B")              # move B to end (last=True default)
print(list(od))                  # → ['A', 'C', 'D', 'E', 'B']

od.move_to_end("E", last=False)  # move E to front
print(list(od))                  # → ['E', 'A', 'C', 'D', 'B']
```

3) `popitem(last=True)` — LIFO / FIFO removal

```python
from collections import OrderedDict

od = OrderedDict.fromkeys("ABCDE")

print(od.popitem(last=True))    # → ('E', None)  LIFO (like a stack)
print(od.popitem(last=False))   # → ('A', None)  FIFO (like a queue)
print(list(od))                 # → ['B', 'C', 'D']
```

4) LRU Cache implementation (LRU缓存实现)

```python
from collections import OrderedDict

class LRUCache:
    """
    Least Recently Used Cache (最近最少使用缓存)
    using OrderedDict for O(1) get and put.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache    = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)    # mark as recently used
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

cache = LRUCache(3)
cache.put(1, 10)
cache.put(2, 20)
cache.put(3, 30)
print(cache.get(1))   # → 10  (1 moved to end)
cache.put(4, 40)      # evicts key 2 (least recently used)
print(cache.get(2))   # → -1  (evicted)
print(cache.get(3))   # → 30
print(cache.get(4))   # → 40
```
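If the goal is only to memoize a function's return values, the standard library already provides an LRU cache in functools.lru_cache. The hand-written class above is mainly useful when you need direct control over keys and eviction. In this short sketch, `slow_square` is a hypothetical stand-in for an expensive computation:

```python
from functools import lru_cache

@lru_cache(maxsize=3)              # keeps the 3 most recently used results
def slow_square(n: int) -> int:
    return n * n                   # stands in for an expensive computation

slow_square(2)                     # miss: computed and cached
slow_square(2)                     # hit: served from the cache
slow_square(3)                     # miss
print(slow_square.cache_info())
# → CacheInfo(hits=1, misses=2, maxsize=3, currsize=2)
```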

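5) `reversed()` — Reverse iteration

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

for key in reversed(od):
    print(key, od[key])
# → c 3
# → b 2
# → a 1

# Plain dicts (and their views) also support reversed() since Python 3.8
print(list(reversed({"a": 1, "b": 2})))   # → ['b', 'a']
```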

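4. deque — Double-Ended Queue (双端队列)

A deque (双端队列) supports O(1) appends and pops at both ends. With a list, insert(0, x) and pop(0) are O(n) because every remaining element has to shift, so deque is the natural choice for queues (队列), stacks (栈), and sliding windows (滑动窗口). The trade-off is that indexed access is fast only near the ends and slows to O(n) in the middle.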

1) Constructor

```python
from collections import deque

d1 = deque()                         # empty
d2 = deque([1, 2, 3, 4, 5])          # from iterable
d3 = deque("abcde")                  # from string
d4 = deque(range(10), maxlen=5)      # bounded deque (固定长度)

print(d2)   # → deque([1, 2, 3, 4, 5])
print(d4)   # → deque([5, 6, 7, 8, 9], maxlen=5)  (first 5 discarded)
```

2) `append()` / `appendleft()` — Add to ends (两端添加)

```python
from collections import deque

d = deque([3, 4, 5])

d.append(6)         # right end  → deque([3, 4, 5, 6])
d.appendleft(2)     # left end   → deque([2, 3, 4, 5, 6])
d.appendleft(1)     #            → deque([1, 2, 3, 4, 5, 6])

print(d)            # → deque([1, 2, 3, 4, 5, 6])
```

3) `pop()` / `popleft()` — Remove from ends (两端弹出)

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])

print(d.pop())       # → 5   (right)  → deque([1, 2, 3, 4])
print(d.popleft())   # → 1   (left)   → deque([2, 3, 4])
print(d)             # → deque([2, 3, 4])
```
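Pairing append() with popleft() is what turns a deque into a proper FIFO queue. As an illustration, the sketch below performs a breadth-first traversal; the `bfs_order` name and the graph literal are hypothetical:

```python
from collections import deque

def bfs_order(graph: dict, start):
    """Breadth-first traversal: popleft() always takes the oldest discovered node."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()              # O(1) FIFO removal
        order.append(node)
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)           # O(1) enqueue at the right end
    return order

graph = {1: [2, 3], 2: [4], 3: [4, 5], 4: [6], 5: [6], 6: []}
print(bfs_order(graph, 1))   # → [1, 2, 3, 4, 5, 6]
```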

4) `extend()` / `extendleft()` — Batch add (批量添加)

```python
from collections import deque

d = deque([3, 4])

d.extend([5, 6, 7])          # right: → deque([3, 4, 5, 6, 7])
d.extendleft([2, 1, 0])      # left, each prepended individually
                             # 2 → [2,3..], 1 → [1,2,3..], 0 → [0,1,2,3..]
print(d)   # → deque([0, 1, 2, 3, 4, 5, 6, 7])
```

Note: extendleft([a, b, c]) results in [c, b, a, ...] because each element is prepended one by one — the iterable is effectively reversed.
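To prepend a batch while keeping its original order, reverse it before passing it in. A minimal sketch:

```python
from collections import deque

d = deque([3, 4])
d.extendleft(reversed([0, 1, 2]))   # prepends 2, then 1, then 0
print(d)   # → deque([0, 1, 2, 3, 4])
```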

5) `rotate(n)` — Circular rotation (循环旋转)

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])

d.rotate(2)     # rotate RIGHT by 2
print(d)        # → deque([4, 5, 1, 2, 3])

d.rotate(-2)    # rotate LEFT by 2 (undo)
print(d)        # → deque([1, 2, 3, 4, 5])

# Circular buffer simulation (循环缓冲区)
ring = deque(range(5))
for _ in range(8):
    print(ring[0], end=" ")
    ring.rotate(-1)
# → 0 1 2 3 4 0 1 2
```
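Combining rotate() with popleft() also gives a compact solution to the Josephus problem, in which every k-th person standing in a circle is eliminated. In the sketch below, `josephus()` is a hypothetical helper that returns the elimination order:

```python
from collections import deque

def josephus(n: int, k: int) -> list:
    """Elimination order when every k-th of people 1..n (in a circle) is removed."""
    circle = deque(range(1, n + 1))
    order = []
    while circle:
        circle.rotate(-(k - 1))          # move the k-1 skipped people to the back
        order.append(circle.popleft())   # the k-th person is now at the front
    return order

print(josephus(7, 3))   # → [3, 6, 2, 7, 5, 1, 4]
```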

6) `maxlen` — Bounded / sliding window (有界滑动窗口)

```python
from collections import deque

# Keep only the last 3 elements
window = deque(maxlen=3)

for i in range(7):
    window.append(i)
    print(f"added {i}: {list(window)}")
# → added 0: [0]
# → added 1: [0, 1]
# → added 2: [0, 1, 2]
# → added 3: [1, 2, 3]   ← 0 dropped automatically
# → added 4: [2, 3, 4]
# → added 5: [3, 4, 5]
# → added 6: [4, 5, 6]

# Moving average (滑动平均)
def moving_average(data, window_size):
    w = deque(maxlen=window_size)
    result = []
    for val in data:
        w.append(val)
        result.append(sum(w) / len(w))
    return result

print(moving_average([1, 2, 3, 4, 5, 6], 3))
# → [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
```
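The version above re-sums the whole window on every step, which costs O(window_size) per element. A running total avoids that. In the sketch below, `moving_average_fast` is a hypothetical name, and an explicit popleft() replaces maxlen because the code must know which value left the window so it can subtract it:

```python
from collections import deque

def moving_average_fast(data, window_size):
    """Same output as moving_average, but O(1) work per element via a running sum."""
    w = deque()
    total = 0.0
    result = []
    for val in data:
        w.append(val)
        total += val
        if len(w) > window_size:
            total -= w.popleft()      # subtract the value that just left the window
        result.append(total / len(w))
    return result

print(moving_average_fast([1, 2, 3, 4, 5, 6], 3))
# → [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
```

With floating-point inputs, a long-running sum can accumulate small rounding errors that the re-summing version does not.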

7) `insert()` / `remove()` / `count()` / `index()`

```python
from collections import deque

d = deque([1, 2, 3, 2, 4])

d.insert(2, 99)       # insert 99 at position 2
print(d)              # → deque([1, 2, 99, 3, 2, 4])

d.remove(99)          # remove first occurrence
print(d)              # → deque([1, 2, 3, 2, 4])

print(d.count(2))     # → 2  (occurrences of 2)
print(d.index(3))     # → 2  (first index of 3)
```
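These methods raise an error in two edge cases. insert() on a full bounded deque raises IndexError, and remove() of an absent value raises ValueError. append() on a full bounded deque does not raise; it silently drops an item from the opposite end. A short sketch:

```python
from collections import deque

d = deque([1, 2, 3], maxlen=3)

try:
    d.insert(1, 99)            # would grow the deque beyond maxlen
except IndexError:
    print("IndexError: insert() on a full bounded deque")

try:
    d.remove(42)               # value not present
except ValueError:
    print("ValueError: remove() of a missing value")

d.append(4)                    # full deque: the leftmost item (1) is dropped
print(d)                       # → deque([2, 3, 4], maxlen=3)
```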

8) `reverse()` / `copy()` / `clear()`

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])

d.reverse()
print(d)        # → deque([5, 4, 3, 2, 1])

d2 = d.copy()   # shallow copy
d2.append(0)
print(d)        # → deque([5, 4, 3, 2, 1])  (original unchanged)

d.clear()
print(d)        # → deque([])
print(len(d))   # → 0
```

9) Performance comparison vs list (与list性能对比)

```python
import timeit
from collections import deque

N = 20_000   # front insertions per run

def list_front_insert():
    lst = []
    for i in range(N):
        lst.insert(0, i)      # O(n): shifts every existing element right

def deque_front_insert():
    dq = deque()
    for i in range(N):
        dq.appendleft(i)      # O(1): no shifting

t_list  = timeit.timeit(list_front_insert,  number=5)
t_deque = timeit.timeit(deque_front_insert, number=5)

print(f"list.insert(0, x):   {t_list:.4f}s")
print(f"deque.appendleft(x): {t_deque:.4f}s")
# Absolute timings vary by machine. The list total grows quadratically with N,
# the deque total only linearly, so the gap widens as N grows.
```

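5. namedtuple — Immutable Record (具名元组)

namedtuple() is a factory function that creates a tuple subclass (元组子类) whose fields can be accessed by name as well as by index. Instances are immutable (不可变) and carry no per-instance __dict__, so they need no more memory than a plain tuple. Their printed form is also self-documenting.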

1) Factory function `namedtuple(typename, field_names)`

```python
from collections import namedtuple

# Three equivalent ways to define field names:
Point = namedtuple("Point", ["x", "y"])
Point = namedtuple("Point", "x y")
Point = namedtuple("Point", "x, y")

p = Point(3, 4)
print(p)           # → Point(x=3, y=4)
print(p.x, p.y)    # → 3 4       (by name)
print(p[0], p[1])  # → 3 4       (by index)
print(p == (3, 4)) # → True      (is a tuple subclass)
```
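Because a namedtuple is still a tuple, it unpacks like one, and its fields cannot be reassigned. A short sketch:

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
p = Point(3, 4)

x, y = p                      # unpacks like any tuple
print(x + y)                  # → 7

try:
    p.x = 10                  # fields are read-only
except AttributeError:
    print("AttributeError: fields cannot be reassigned")
```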

2) `_make()` — Create from iterable (从可迭代对象创建)

```python
import csv
import io
from collections import namedtuple

Employee = namedtuple("Employee", "name age department salary")

data = ["Alice", 30, "Engineering", 95000]
emp  = Employee._make(data)
print(emp)
# → Employee(name='Alice', age=30, department='Engineering', salary=95000)

# From a CSV row (csv yields strings, so age and salary stay str here)
csv_data = "Bob,25,Marketing,60000"
for row in csv.reader(io.StringIO(csv_data)):
    e = Employee._make(row)
    print(f"{e.name} in {e.department}")
# → Bob in Marketing
```

3) `_asdict()` — Convert to dict (转换为字典)

```python
import json
from collections import namedtuple

Point3D = namedtuple("Point3D", "x y z")
p = Point3D(1, 2, 3)

d = p._asdict()
print(d)            # → {'x': 1, 'y': 2, 'z': 3}
print(type(d))      # → <class 'dict'>  (Python 3.8+; earlier versions returned an OrderedDict)

# Serialize to JSON
print(json.dumps(p._asdict()))   # → {"x": 1, "y": 2, "z": 3}
```

4) `_replace()` — Create modified copy (创建修改副本)

```python
from collections import namedtuple

# namedtuple is IMMUTABLE — _replace() returns a new instance
Person = namedtuple("Person", "name age city")
alice  = Person("Alice", 30, "NYC")

# "update" one field
older_alice = alice._replace(age=31)
print(alice)        # → Person(name='Alice', age=30, city='NYC')  (unchanged)
print(older_alice)  # → Person(name='Alice', age=31, city='NYC')
```

5) `_fields` / `_field_defaults` — Introspection (内省)

```python
from collections import namedtuple

# defaults= (Python 3.7+) assigns default values to the fields
Config = namedtuple("Config", "host port timeout", defaults=["localhost", 8080, 30])

print(Config._fields)          # → ('host', 'port', 'timeout')
print(Config._field_defaults)  # → {'host': 'localhost', 'port': 8080, 'timeout': 30}

c1 = Config()                  # all defaults
c2 = Config("example.com")     # override host only
print(c1)  # → Config(host='localhost', port=8080, timeout=30)
print(c2)  # → Config(host='example.com', port=8080, timeout=30)
```
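When defaults lists fewer values than there are fields, the values bind to the rightmost fields, and the fields to their left remain required. A short sketch with a hypothetical Conn record:

```python
from collections import namedtuple

# One default for three fields: it applies to the LAST field (timeout)
Conn = namedtuple("Conn", "host port timeout", defaults=[30])

print(Conn._field_defaults)   # → {'timeout': 30}
print(Conn("db", 5432))       # → Conn(host='db', port=5432, timeout=30)

try:
    Conn("db")                # port has no default, so it is still required
except TypeError:
    print("TypeError: missing required field 'port'")
```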

6) `rename=True` — Auto-rename invalid field names

```python
from collections import namedtuple

# 'class' is a keyword and '2bad' starts with a digit: both are invalid field names
T = namedtuple("T", ["class", "2bad", "ok"], rename=True)
print(T._fields)   # → ('_0', '_1', 'ok')  (invalid names → _<index>)

t = T(1, 2, 3)
print(t._0, t._1, t.ok)   # → 1 2 3
```

7) Subclassing namedtuple — Adding methods

```python
import math
from collections import namedtuple

class Vector(namedtuple("Vector", "x y")):
    """Extend namedtuple with custom methods."""

    __slots__ = ()   # no per-instance __dict__, so instances stay lightweight

    def magnitude(self) -> float:
        return math.sqrt(self.x**2 + self.y**2)

    def dot(self, other: "Vector") -> float:
        return self.x * other.x + self.y * other.y

    def __add__(self, other):
        # overrides tuple concatenation with element-wise vector addition
        return Vector(self.x + other.x, self.y + other.y)

v1 = Vector(3, 4)
v2 = Vector(1, 2)

print(v1.magnitude())   # → 5.0
print(v1.dot(v2))       # → 11   (int inputs give an int result)
print(v1 + v2)          # → Vector(x=4, y=6)
```
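typing.NamedTuple offers a class-based syntax that produces the same kind of tuple subclass and lets you add type annotations and methods in a single definition. A sketch of a comparable Vector:

```python
import math
from typing import NamedTuple

class Vector(NamedTuple):
    """Class-syntax equivalent of namedtuple("Vector", "x y") plus a method."""
    x: float
    y: float

    def magnitude(self) -> float:
        return math.hypot(self.x, self.y)

v = Vector(3, 4)
print(v)               # → Vector(x=3, y=4)
print(v.magnitude())   # → 5.0
print(v == (3, 4))     # → True  (still a tuple subclass)
```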

6. ChainMap — Multi-scope Lookup (多层级查找映射)

A ChainMap (链式映射) groups several mappings into a single, updatable view. Lookups search the mappings from first to last and return the first match, while writes, updates, and deletions affect only the first mapping. This makes it a good model for nested variable scopes (变量作用域), similar to Python's own LEGB lookup rule.

1) Basic lookup (基本查找)

```python
from collections import ChainMap

defaults  = {"color": "red",  "user": "guest", "timeout": 30}
env_vars  = {"color": "blue", "debug": True}
cli_args  = {"timeout": 10}

# Priority: cli_args > env_vars > defaults
config = ChainMap(cli_args, env_vars, defaults)

print(config["color"])    # → blue   (from env_vars, overrides defaults)
print(config["user"])     # → guest  (only in defaults)
print(config["timeout"])  # → 10     (from cli_args, highest priority)
print(config["debug"])    # → True   (from env_vars)
```

2) Writes go to first map only (写入仅影响第一个映射)

```python
from collections import ChainMap

base    = {"x": 1, "y": 2}
overlay = {}

cm = ChainMap(overlay, base)

cm["x"] = 99      # written to overlay (first map)
cm["z"] = 0       # new key also goes to overlay

print(overlay)    # → {'x': 99, 'z': 0}
print(base)       # → {'x': 1, 'y': 2}   (unchanged!)
print(cm["x"])    # → 99   (overlay shadows base)
print(cm["y"])    # → 2    (from base)
```
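Deletion follows the same first-map-only rule. Deleting a key from the first map can expose a value from a later map. Deleting a key that exists only in a later map raises KeyError. A short sketch:

```python
from collections import ChainMap

base    = {"x": 1, "y": 2}
overlay = {"x": 99}
cm = ChainMap(overlay, base)

del cm["x"]          # deletes from overlay (the first map) only
print(cm["x"])       # → 1   (base value is visible again)

try:
    del cm["y"]      # "y" exists only in base, which del never touches
except KeyError:
    print("KeyError: 'y' is not in the first mapping")

print(base)          # → {'x': 1, 'y': 2}   (unchanged)
```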

3) `new_child(m=None)` — Push a new scope (推入新作用域)

```python
from collections import ChainMap

# Simulate nested scopes (模拟嵌套作用域)
global_scope = ChainMap({"x": 1, "y": 2})
local_scope  = global_scope.new_child({"x": 10, "z": 3})

print(local_scope["x"])   # → 10   (local shadows global)
print(local_scope["y"])   # → 2    (falls through to global)
print(local_scope["z"])   # → 3    (local only)

# Return to the parent scope (返回父作用域): .parents is a new ChainMap
# holding every map except the first one
parent_scope = local_scope.parents
print(parent_scope["x"])  # → 1    (original global value)
```

4) `maps` attribute — Access underlying dicts (访问底层字典列表)

```python
from collections import ChainMap

cm = ChainMap({"a": 1}, {"b": 2}, {"c": 3})

print(cm.maps)
# → [{'a': 1}, {'b': 2}, {'c': 3}]

# Modify underlying dicts directly
cm.maps[1]["b"] = 99
print(cm["b"])   # → 99
```
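When the maps share keys, len(), iteration, and dict() all count each key once and resolve its value with the first-map-wins rule. Converting to dict() is a simple way to flatten the layers into one snapshot. A short sketch:

```python
from collections import ChainMap

cm = ChainMap({"a": 1}, {"a": 2, "b": 3})

print(len(cm))     # → 2      (duplicate keys are counted once)
print(list(cm))    # → ['a', 'b']
print(dict(cm))    # → {'a': 1, 'b': 3}   (each key resolved first-map-wins)
```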

5) Practical: CLI argument + environment + defaults

```python
import os
from collections import ChainMap

def build_config(cli_args: dict) -> ChainMap:
    """Three-tier configuration (三层配置): CLI > ENV > defaults."""
    defaults = {
        "host":    "localhost",
        "port":    8080,
        "debug":   False,
        "workers": 4,
    }
    # APP_PORT=9000 → {"port": "9000"}; environment values are always strings
    env_config = {
        k[len("APP_"):].lower(): v       # strip only the leading prefix
        for k, v in os.environ.items()
        if k.startswith("APP_")
    }
    return ChainMap(cli_args, env_config, defaults)

config = build_config({"port": 9090, "debug": True})
print(config["host"])     # → localhost (from defaults, unless APP_HOST is set)
print(config["port"])     # → 9090      (from cli_args)
print(config["debug"])    # → True      (from cli_args)
```
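In real programs, the CLI layer usually comes from argparse. Options that the user did not pass come back as None, so they need to be filtered out before layering; otherwise a None value would shadow the default. In the sketch below, an explicit argument list is passed to parse_args() so the demo is reproducible, and the option names are hypothetical:

```python
import argparse
from collections import ChainMap

defaults = {"host": "localhost", "port": 8080}

parser = argparse.ArgumentParser()
parser.add_argument("--host")
parser.add_argument("--port", type=int)
ns = parser.parse_args(["--port", "9090"])     # explicit argv for a reproducible demo

cli = {k: v for k, v in vars(ns).items() if v is not None}   # drop options not given
config = ChainMap(cli, defaults)

print(config["host"], config["port"])   # → localhost 9090
```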

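7. UserDict / UserList / UserString — Custom Containers (自定义容器基类)

UserDict, UserList, and UserString are wrapper classes designed for safe subclassing; each keeps its contents in a .data attribute. Subclassing the built-in dict or list directly can silently skip your overrides, because their C-implemented methods (for example dict.__init__ and dict.update) do not call an overridden __setitem__. UserDict instead inherits update(), setdefault(), and its constructor's loading logic from MutableMapping, and those route through your Python-level __setitem__.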
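The difference shows up as soon as data arrives through the constructor or update(). In the sketch below, both hypothetical classes override only __setitem__ to uppercase keys:

```python
from collections import UserDict

class UpperDict(dict):
    """dict subclass: the override is bypassed by dict's C-level methods."""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

class UpperUserDict(UserDict):
    """UserDict subclass: every insertion path goes through __setitem__."""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

d1 = UpperDict(a=1)        # dict.__init__ skips the override
d1.update(b=2)             # so does dict.update
d1["c"] = 3                # only direct assignment uses it
print(d1)                  # → {'a': 1, 'b': 2, 'C': 3}

d2 = UpperUserDict(a=1)
d2.update(b=2)
d2["c"] = 3
print(d2)                  # → {'A': 1, 'B': 2, 'C': 3}
```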

1) `UserDict` — Custom dict with validation (带验证的自定义字典)

```python
from collections import UserDict

class TypedDict(UserDict):
    """A dict that only accepts string keys and int values."""

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError(f"Key must be str, got {type(key).__name__}")
        if not isinstance(value, int):
            raise TypeError(f"Value must be int, got {type(value).__name__}")
        super().__setitem__(key, value)   # delegate to UserDict

td = TypedDict()
td["score"] = 100
td["count"] = 42
print(td)            # → {'score': 100, 'count': 42}

try:
    td[123] = 10     # invalid key
except TypeError as e:
    print(f"Error: {e}")   # → Error: Key must be str, got int

try:
    td["x"] = "hello"  # invalid value
except TypeError as e:
    print(f"Error: {e}")   # → Error: Value must be int, got str
```

2) `UserList` — Custom list with constraints (带约束的自定义列表)

```python
from collections import UserList

class BoundedList(UserList):
    """A list that enforces a maximum length (最大长度限制)."""

    def __init__(self, maxlen: int, iterable=()):
        self.maxlen = maxlen
        super().__init__()             # sets self.data = []
        for item in iterable:
            self.append(item)          # reuse the length check below

    def append(self, item):
        if len(self.data) >= self.maxlen:
            raise OverflowError(f"List is full (max {self.maxlen})")
        self.data.append(item)

    def insert(self, index, item):
        if len(self.data) >= self.maxlen:
            raise OverflowError(f"List is full (max {self.maxlen})")
        self.data.insert(index, item)

    # Caveats: UserList.extend(), += and slice assignment write to self.data
    # directly, so they bypass these checks unless overridden as well. Slicing
    # and copy() rebuild results via self.__class__(data), which would pass the
    # data as maxlen here; UserList subclasses should accept a single iterable.

bl = BoundedList(3, [1, 2, 3])
print(bl)   # → [1, 2, 3]

try:
    bl.append(4)
except OverflowError as e:
    print(f"Error: {e}")   # → Error: List is full (max 3)
```

3) `UserString` — Custom string with transforms (带转换的自定义字符串)

```python
import re
from collections import UserString

class SlugString(UserString):
    """Auto-converts string to URL-safe slug (URL友好字符串)."""

    def __init__(self, seq=""):
        # lowercase, turn each run of non-alphanumerics into '-', trim edge dashes
        slug = re.sub(r'[^a-z0-9]+', '-', str(seq).lower()).strip('-')
        super().__init__(slug)

    def __add__(self, other):
        return SlugString(self.data + "-" + str(other))

s = SlugString("Hello World! This is a Test.")
print(s)           # → hello-world-this-is-a-test

s2 = s + "extra"
print(s2)          # → hello-world-this-is-a-test-extra
print(len(s))      # → 26   (len() and str methods work as on str)

# Caveat: methods such as upper() build their result with self.__class__(...),
# which re-runs __init__ and re-slugifies it:
print(s.upper())        # → hello-world-this-is-a-test
print(str(s).upper())   # → HELLO-WORLD-THIS-IS-A-TEST
```

8. Comparison Table (对比总结)

| Class | Based on | Missing key / index | Ordered | Mutable | Best use case |
|---|---|---|---|---|---|
| `defaultdict` | dict | auto-creates via default_factory | insertion | ✅ | Grouping, counting |
| `Counter` | dict | returns 0 (not inserted) | insertion | ✅ | Frequency, multiset ops |
| `OrderedDict` | dict | KeyError | insertion (reorderable) | ✅ | LRU cache, order-sensitive equality |
| `deque` | list-like | IndexError | yes | ✅ | Queue, stack, sliding window |
| `namedtuple` | tuple | AttributeError (unknown field) | yes | ❌ | Immutable records, CSV rows |
| `ChainMap` | view over several dicts | KeyError | first map wins on lookup | ✅ (first map only) | Config layers, scopes |
| `UserDict` | dict wrapper (.data) | KeyError | insertion | ✅ | Safe dict subclassing |

💡 One-line Takeaway
Use defaultdict to eliminate KeyError boilerplate, Counter for frequency analysis and multiset arithmetic, deque when you need O(1) operations on both ends, namedtuple for self-documenting immutable records, OrderedDict for LRU caches and order-sensitive comparisons, ChainMap for multi-tier configuration or scope simulation, and UserDict / UserList / UserString when your overrides must be respected by inherited methods.
