
I. msgspec.Struct — High-Performance Typed Data Structures
msgspec.Struct is a high-performance, type-safe alternative to data classes, provided by the msgspec library. It is designed as a faster, leaner replacement for Python dataclasses, attrs, and Pydantic models, with JSON / MessagePack serialization and validation implemented natively in C.

1. Installation & Import
    pip install msgspec

    import msgspec
    from msgspec import Struct, field

2. Defining a Struct
1) Basic Definition
    from msgspec import Struct

    class Point(Struct):
        x: float
        y: float

    p = Point(x=1.0, y=2.0)
    print(p)    # Point(x=1.0, y=2.0)
    print(p.x)  # 1.0

Like dataclasses, Struct instances are mutable by default; pass frozen=True to make them immutable. Unlike dataclasses, Struct is implemented in C, so constructing instances is typically much faster.

2) Fields with Default Values
    class User(Struct):
        name: str
        age: int = 0
        active: bool = True

    u = User(name="Alice")
    print(u)  # User(name='Alice', age=0, active=True)

3) field() — Advanced Field Configuration
    from msgspec import Struct, field

    class Config(Struct):
        tags: list[str] = field(default_factory=list)  # Mutable default
        name: str = field(default="unnamed")

For mutable defaults, use field(default_factory=list) so that each instance gets its own list, just like with Python dataclasses.

4) Nested Structs
    class Address(Struct):
        city: str
        country: str

    class Person(Struct):
        name: str
        address: Address

    p = Person(name="Bob", address=Address(city="NYC", country="US"))
    print(p.address.city)  # NYC

3. Struct Configuration Options
Pass options to the class definition via keyword arguments:
    class MyStruct(Struct, frozen=True, order=True, eq=True, kw_only=True):
        x: int
        y: int

1) Option Reference Table
| Option | Default | Effect |
|---|---|---|
| frozen=True | False | Makes the struct immutable: fields cannot be reassigned after creation |
| order=True | False | Enables <, >, <=, >= comparison operators |
| eq=True | True | Enables == / != based on field values |
| kw_only=True | False | All fields must be passed as keyword arguments |
| array_like=True | False | Serializes as an array [...] instead of an object {...} |
| gc=False | True | Disables garbage collector tracking; faster, but only safe for structs that never take part in reference cycles |
| weakref=True | False | Enables weak references to the struct instance |
| rename | None | Renames fields during (de)serialization, e.g., rename="camel" |
| tag | None | Adds a type tag for tagged unions |
| tag_field | "type" | The field name used to store the tag value |
2) frozen=True — Immutable Struct
    class ImmutablePoint(Struct, frozen=True):
        x: float
        y: float

    p = ImmutablePoint(x=1.0, y=2.0)
    p.x = 99.0  # ❌ AttributeError: immutable type: 'ImmutablePoint'

Scenario: Configuration objects, cache keys, and value objects that should never change.
3) order=True — Sortable Structs
    class Version(Struct, order=True):
        major: int
        minor: int
        patch: int

    versions = [Version(1, 2, 0), Version(1, 0, 5), Version(2, 0, 0)]
    print(sorted(versions))
    # [Version(major=1, minor=0, patch=5), Version(major=1, minor=2, patch=0), Version(major=2, minor=0, patch=0)]

Scenario: Sorting records, priority queues, range comparisons.
4) rename="camel" — Field Name Mapping
    import msgspec

    class ApiResponse(Struct, rename="camel"):
        user_name: str
        created_at: str

    obj = ApiResponse(user_name="Alice", created_at="2025-01-01")
    print(msgspec.json.encode(obj))
    # b'{"userName":"Alice","createdAt":"2025-01-01"}'

| rename value | Effect |
|---|---|
| "camel" | user_name → userName |
| "pascal" | user_name → UserName |
| "lower" | userName → username |
| dict | Explicit per-field mapping |

Scenario: Interoperating with REST APIs that use camelCase JSON keys.
4. Serialization & Deserialization
1) JSON Encoding
    import msgspec
    from msgspec import Struct

    class Order(Struct):
        id: int
        item: str
        price: float

    order = Order(id=1, item="book", price=9.99)

    # Encode to JSON bytes
    data = msgspec.json.encode(order)
    print(data)  # b'{"id":1,"item":"book","price":9.99}'

2) JSON Decoding with Type Validation
    # Decode + validate in one step
    order2 = msgspec.json.decode(data, type=Order)
    print(order2)           # Order(id=1, item='book', price=9.99)
    print(order2 == order)  # True

msgspec.json.decode() validates the payload against the requested type at decode time. If the JSON does not match the expected type, it raises msgspec.ValidationError with a descriptive message.

3) MessagePack Encoding (Binary Serialization)
    # Encode to binary MessagePack
    binary = msgspec.msgpack.encode(order)
    print(binary)  # b'\x83\xa2id\x01\xa4item\xa4book\xa5price\xcb@#\xeb...'

    # Decode from binary
    order3 = msgspec.msgpack.decode(binary, type=Order)

| Format | Function | Output |
|---|---|---|
| JSON | msgspec.json.encode/decode | Human-readable bytes |
| MessagePack | msgspec.msgpack.encode/decode | Compact binary |
4) array_like=True — Array Serialization
    class Point(Struct, array_like=True):
        x: float
        y: float

    p = Point(1.0, 2.0)
    print(msgspec.json.encode(p))  # b'[1.0,2.0]'

Scenario: Compact serialization for large volumes of records (matrices, time series, coordinate data).
5) Handling Validation Errors
    bad_json = b'{"id": "not-a-number", "item": "book", "price": 9.99}'

    try:
        msgspec.json.decode(bad_json, type=Order)
    except msgspec.ValidationError as e:
        print(e)  # Expected `int`, got `str` - at `$.id`

5. Type Annotations & Supported Types
1) Built-in Types
    class Example(Struct):
        a: int
        b: float
        c: str
        d: bool
        e: bytes
        f: None

2) Collections
    from typing import Optional

    class Collections(Struct):
        items: list[str]
        mapping: dict[str, int]
        pair: tuple[int, str]
        unique: set[int]
        maybe: Optional[str] = None  # equivalent to str | None

3) Optional and Union Types
    from typing import Union

    class Response(Struct):
        data: Union[str, int, None]  # Can be str, int, or None
        error: str | None = None     # Python 3.10+ shorthand

4) Literal Types — Constrained Values
    from typing import Literal

    class Status(Struct):
        state: Literal["pending", "running", "done", "failed"]

    s = Status(state="running")
    msgspec.json.decode(b'{"state":"invalid"}', type=Status)
    # ❌ ValidationError: Invalid enum value 'invalid' - at `$.state`

Scenario: Enforcing valid enum-like values without a full Enum class.
5) datetime, UUID, Decimal
    from datetime import datetime
    from uuid import UUID
    from decimal import Decimal

    class Event(Struct):
        id: UUID
        timestamp: datetime
        amount: Decimal

datetime is serialized as an ISO 8601 string, UUID as a string, and Decimal as a string holding the decimal number.

6. Tagged Unions — Polymorphic Types
1) Defining a Tagged Union
Use tag=True (or a custom tag string) to enable discriminated unions:
    from typing import Union

    class Cat(Struct, tag=True):
        name: str
        indoor: bool

    class Dog(Struct, tag=True):
        name: str
        breed: str

    Animal = Union[Cat, Dog]

When serialized, a "type" field is added automatically:
    cat = Cat(name="Whiskers", indoor=True)
    print(msgspec.json.encode(cat))
    # b'{"type":"Cat","name":"Whiskers","indoor":true}'

    dog = Dog(name="Rex", breed="Husky")
    print(msgspec.json.encode(dog))
    # b'{"type":"Dog","name":"Rex","breed":"Husky"}'

2) Decoding a Tagged Union
    data = b'{"type":"Dog","name":"Rex","breed":"Husky"}'
    animal = msgspec.json.decode(data, type=Animal)
    print(type(animal))  # <class '__main__.Dog'>
    print(animal.breed)  # Husky

Scenario: Event systems, polymorphic API responses, and command/event patterns where the same endpoint can return different shapes.
3) Custom Tag Values
    class Circle(Struct, tag="circle"):
        radius: float

    class Rectangle(Struct, tag="rect"):
        width: float
        height: float

    Shape = Union[Circle, Rectangle]

    c = Circle(radius=5.0)
    print(msgspec.json.encode(c))
    # b'{"type":"circle","radius":5.0}'

7. Utility Methods
1) msgspec.structs.asdict() — Convert to Dict
    from msgspec import structs

    p = Point(x=1.0, y=2.0)
    d = structs.asdict(p)
    print(d)  # {'x': 1.0, 'y': 2.0}

2) msgspec.structs.astuple() — Convert to Tuple
    t = structs.astuple(p)
    print(t)  # (1.0, 2.0)

3) msgspec.structs.replace() — Copy with Changes
Like dataclasses.replace(), this creates a new instance with some fields updated:
    p2 = structs.replace(p, x=99.0)
    print(p2)  # Point(x=99.0, y=2.0)
    print(p)   # Point(x=1.0, y=2.0), original unchanged

Scenario: Immutable update patterns, where a modified copy is created without mutating the original.
4) msgspec.structs.fields() — Inspect Field Definitions
    for f in structs.fields(Point):
        print(f.name, f.type, f.default)
    # x <class 'float'> NODEFAULT
    # y <class 'float'> NODEFAULT

Scenario: Writing generic serializers, validators, or introspection tools.
8. Inheritance
    class Animal(Struct):
        name: str
        age: int

    class Dog(Animal):
        breed: str

    d = Dog(name="Rex", age=3, breed="Husky")
    print(d)  # Dog(name='Rex', age=3, breed='Husky')

9. Performance Comparison
| Library | Construct | JSON Encode | JSON Decode+Validate |
|---|---|---|---|
| msgspec.Struct | ⚡ Fastest | ⚡ Fastest | ⚡ Fastest |
| dataclasses | Fast | Needs json.dumps | No validation |
| attrs | Fast | Needs extra lib | No validation |
| pydantic v2 | Medium | Fast | Fast (Rust core) |
| pydantic v1 | Slow | Slow | Slow |
As a rough guide, msgspec is typically 5–10× faster than Pydantic v1 and 2–3× faster than Pydantic v2 for both encoding and decoding, while using significantly less memory. The exact ratios vary with payload shape, library versions, and hardware.

10. Real-World Scenarios
1) FastAPI / HTTP API Request/Response Models
    import msgspec
    from msgspec import Struct

    class CreateUserRequest(Struct):
        username: str
        email: str
        age: int | None = None

    class UserResponse(Struct):
        id: int
        username: str
        email: str

    # Decoding incoming JSON body
    body = b'{"username":"alice","email":"alice@example.com"}'
    req = msgspec.json.decode(body, type=CreateUserRequest)

    # Encoding outgoing response
    resp = UserResponse(id=42, username=req.username, email=req.email)
    print(msgspec.json.encode(resp))
    # b'{"id":42,"username":"alice","email":"alice@example.com"}'

2) Config File Parsing with Validation
    import msgspec
    from msgspec import Struct
    from typing import Literal

    class ServerConfig(Struct):
        host: str = "0.0.0.0"
        port: int = 8080
        mode: Literal["debug", "production"] = "production"
        workers: int = 4

    config_json = b'{"host":"127.0.0.1","port":9000,"mode":"debug"}'
    config = msgspec.json.decode(config_json, type=ServerConfig)
    print(config.mode)  # debug

3) High-throughput MessagePack Messaging (e.g., vLLM, message queues)
    class InferenceRequest(Struct):
        request_id: str
        prompt: str
        max_tokens: int = 512
        temperature: float = 1.0

    class InferenceResponse(Struct):
        request_id: str
        output: str
        finish_reason: Literal["stop", "length", "error"]

    # Fast binary serialization for IPC / queue transport
    req = InferenceRequest(request_id="req-001", prompt="Hello!")
    binary = msgspec.msgpack.encode(req)

    # Decode and validate on the receiving side
    decoded_req = msgspec.msgpack.decode(binary, type=InferenceRequest)

4) Event / Command Pattern with Tagged Unions
    from typing import Union
    from msgspec import Struct
    import msgspec

    class StartJob(Struct, tag=True):
        job_id: str
        config: dict

    class StopJob(Struct, tag=True):
        job_id: str
        reason: str

    Command = Union[StartJob, StopJob]

    # Dispatcher
    def handle(data: bytes):
        cmd = msgspec.json.decode(data, type=Command)
        if isinstance(cmd, StartJob):
            print(f"Starting job {cmd.job_id}")
        elif isinstance(cmd, StopJob):
            print(f"Stopping job {cmd.job_id}: {cmd.reason}")

    handle(b'{"type":"StartJob","job_id":"abc","config":{}}')
    # Starting job abc

msgspec.Struct gives you the ergonomics of a dataclass, validation comparable to Pydantic's, and the serialization speed of hand-written C. Use frozen=True for immutable value objects, tag=True for polymorphic types, and rename="camel" for seamless REST API integration.