dataclass

I. Dataclass
Python's `dataclass` is a decorator that automatically generates special methods such as `__init__` and `__repr__` for classes used primarily to store data. It reduces boilerplate by letting you declare fields as class variables with type annotations, which makes the code more readable and maintainable because the repetitive method definitions disappear.
1. Basic Dataclass Definition
The `@dataclass` decorator automatically adds `__init__`, `__repr__`, and `__eq__` methods based on the class variables you define with type hints. Use it when you need a simple container for data without writing repetitive constructor code.
The `@dataclass` decorator auto-generates:
- `__init__(self, x, y)`: the constructor
- `__repr__`: a readable string representation
- `__eq__`: equality comparison
1) Basic Implementation

    from dataclasses import dataclass

    @dataclass
    class Person:
        name: str
        age: int
        email: str = "unknown@email.com"  # Default value

    # Usage example
    person1 = Person("Alice", 25, "alice@email.com")
    person2 = Person("Bob", 30)  # Uses default email

    print(person1)             # Automatically generated __repr__
    print(person1 == person2)  # Automatically generated __eq__

Note: Fields without default values must come before fields with default values; otherwise the decorator raises a TypeError when the class is defined.
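
The field-ordering rule in the note can be checked directly. The sketch below is a minimal illustration rather than part of the original example: `BrokenPerson` is a hypothetical class that deliberately puts a defaulted field first, and the error it triggers is raised by the decorator while the class is being defined, before any instance exists.

```python
from dataclasses import dataclass

error_name = None
try:
    @dataclass
    class BrokenPerson:  # hypothetical class, deliberately mis-ordered
        email: str = "unknown@email.com"  # field with a default comes first
        name: str                         # field without a default follows it
except TypeError as exc:
    # The decorator rejects the class while it is being defined.
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")
```

Because the failure happens at definition time, a mis-ordered dataclass is caught as soon as its module is imported rather than at the first call site.
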
2. Field Customization
The `field()` function provides fine-grained control over individual dataclass fields, allowing you to set default factories, exclude fields from comparisons, leave fields out of `__init__`, or hide fields from the generated `__repr__`.
1) Using field() with Parameters

    from dataclasses import dataclass, field
    import random
    from typing import List

    @dataclass
    class Student:
        name: str
        student_id: int = field(init=False)                # Not in __init__
        grades: List[int] = field(default_factory=list)    # Mutable default
        _internal_id: int = field(default=0, repr=False)   # Hidden in __repr__

        def __post_init__(self):
            # Runs right after the generated __init__ has set the fields
            self.student_id = random.randint(1000, 9999)
            self._internal_id = hash(self.name)

    # Usage example
    student = Student("Alice")
    student.grades.append(95)  # Works with mutable default
    print(student)  # Shows name, student_id, and grades, but not _internal_id

Note: Always use default_factory for mutable types like lists or dictionaries. Writing `grades: List[int] = []` raises a ValueError when the class is defined, because a bare list default would otherwise be shared by all instances.
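
The example above does not show a field being excluded from comparisons, so here is a minimal sketch of that option alongside `default_factory`. `Reading` and its field names are hypothetical and chosen only for illustration; `compare=False` is a standard `field()` parameter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reading:  # hypothetical class used only for this illustration
    sensor: str
    values: List[float] = field(default_factory=list)  # fresh list per instance
    comment: str = field(default="", compare=False)    # ignored by __eq__


a = Reading("s1", comment="morning run")
b = Reading("s1", comment="evening run")

a.values.append(21.5)
print(a.values)  # the appended value lives only on `a`
print(b.values)  # `b` still has its own, separate list

b.values.append(21.5)
# The compared fields (sensor, values) now match; the comments still differ.
print(a == b)
```

Each instance gets its own list from the factory, and the two objects compare equal even though their `comment` values differ.
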
3. Dataclass Parameters
The @dataclass decorator accepts parameters that control which methods are generated. Use frozen=True for immutable objects, order=True for sorting capabilities, and kw_only=True to enforce keyword arguments.
1) Configuration Options

    from dataclasses import dataclass

    @dataclass(frozen=True, order=True)
    class Point:
        x: int
        y: int

    @dataclass(kw_only=True)  # Python 3.10+
    class Configuration:
        host: str
        port: int = 8080

    # Usage examples
    p1 = Point(1, 2)
    p2 = Point(1, 3)
    # p1.x = 5  # This would raise FrozenInstanceError
    print(p1 < p2)  # Works because order=True

    # Must use keyword arguments
    config = Configuration(host="localhost", port=3000)
    # config = Configuration("localhost", 3000)  # This would fail with a TypeError

Note: With frozen=True the dataclass becomes immutable: you cannot modify attributes after creation. This is ideal for configuration objects or value objects.
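
To see what `order=True` and `frozen=True` do at runtime, the following sketch sorts a few instances and then attempts an assignment. `Version` is a hypothetical class introduced only for this illustration; `FrozenInstanceError` is the exception exported by the `dataclasses` module.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True, order=True)
class Version:  # hypothetical class for illustration
    major: int
    minor: int


releases = [Version(2, 0), Version(1, 4), Version(1, 10)]

# order=True compares instances field by field, like the tuple (major, minor).
ordered = sorted(releases)
print(ordered)

# frozen=True turns attribute assignment into an error.
try:
    ordered[0].minor = 99
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True
print(assignment_blocked)
```

Since comparison follows field declaration order, the most significant field should be declared first when sorting matters.
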
4. Inheritance with Dataclasses
Dataclasses support inheritance: fields from parent classes are combined with the fields of the child class. Use this when you need to extend a data container while keeping the automatic method generation.
1) Extending Dataclasses

    from dataclasses import dataclass

    @dataclass
    class Vehicle:
        brand: str
        model: str
        year: int

    @dataclass
    class Car(Vehicle):
        doors: int
        electric: bool = False

    # Usage example
    my_car = Car("Tesla", "Model 3", 2023, doors=4, electric=True)
    print(my_car)  # Includes all fields from both classes

Note: Field order matters when inheriting: child class fields are appended after the parent's fields. If any parent field has a default value, every field the child adds must also have a default (or be keyword-only), otherwise defining the child raises a TypeError.
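
The combined field order, and the way a parent default constrains the child, can both be inspected with a short sketch. `Vehicle` and `Car` repeat the classes from the example; `Appliance` and `Toaster` are hypothetical classes added only to trigger the ordering error, and `fields()` is the standard helper from the `dataclasses` module.

```python
from dataclasses import dataclass, fields


@dataclass
class Vehicle:
    brand: str
    model: str
    year: int


@dataclass
class Car(Vehicle):
    doors: int
    electric: bool = False


# fields() lists the combined fields in the order __init__ expects them.
car_field_names = [f.name for f in fields(Car)]
print(car_field_names)


@dataclass
class Appliance:  # hypothetical parent whose last field has a default
    name: str
    voltage: int = 230


conflict = None
try:
    @dataclass
    class Toaster(Appliance):  # hypothetical child adding a non-default field
        slots: int
except TypeError as exc:
    # A field without a default cannot follow the inherited default.
    conflict = str(exc)
print(conflict)
```
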
5. Comparison Table: Regular Class vs Dataclass
This table compares the boilerplate code required for a simple data container written as a regular class versus as a dataclass.
| Feature | Regular Class | Dataclass |
|---|---|---|
| Lines of Code | ~10-15 lines | ~3-5 lines |
| `__init__` method | Manual implementation | Auto-generated |
| `__repr__` method | Manual implementation | Auto-generated |
| `__eq__` method | Manual implementation | Auto-generated |
| Type hints | Optional in body | Required for fields |
| Default values | In `__init__` method | Direct field assignment |
| Mutable defaults | Safe with proper code | Must use default_factory |
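
The last row of the table can be illustrated with a small sketch. All three classes are hypothetical: `RegularBasket` uses the common `None` idiom for a safe mutable default, `DataBasket` uses `default_factory`, and `BadBasket` shows that the decorator refuses a bare list default.

```python
from dataclasses import dataclass, field
from typing import List


class RegularBasket:  # hypothetical regular class using the None idiom
    def __init__(self, items=None):
        # Create a new list per instance instead of sharing one default.
        self.items = [] if items is None else items


@dataclass
class DataBasket:  # hypothetical dataclass equivalent
    items: List[str] = field(default_factory=list)


rejected = False
try:
    @dataclass
    class BadBasket:  # hypothetical: bare list default
        items: List[str] = []
except ValueError:
    # dataclasses reject list, dict, and set defaults at definition time.
    rejected = True

first, second = DataBasket(), DataBasket()
first.items.append("apple")
print(second.items)  # unaffected by the append on `first`
print(rejected)
```
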
1) Code Comparison Example

    from dataclasses import dataclass

    # Regular class: roughly a dozen lines
    class RegularPerson:
        def __init__(self, name: str, age: int, email: str = "unknown"):
            self.name = name
            self.age = age
            self.email = email

        def __repr__(self):
            return f"RegularPerson(name='{self.name}', age={self.age}, email='{self.email}')"

        def __eq__(self, other):
            if not isinstance(other, RegularPerson):
                return False
            return (self.name, self.age, self.email) == (other.name, other.age, other.email)

    # Dataclass: 5 lines
    @dataclass
    class DataclassPerson:
        name: str
        age: int
        email: str = "unknown"

💡 One-line Takeaway
Python dataclasses automatically generate __init__, __repr__, and __eq__ from type-annotated fields, eliminating boilerplate code for simple data containers.