dataclass

I. Dataclass

Python's dataclass is a decorator that automatically generates special methods such as __init__ and __repr__ for classes used primarily to store data. You declare fields as class-level variables with type annotations, and the decorator writes the repetitive methods for you. The result is shorter code that is easier to read and maintain.

1. Basic Dataclass Definition

The @dataclass decorator reads the class variables you annotate with type hints and generates methods from them. Use it when you need a simple data container without writing a repetitive constructor.
The `@dataclass` decorator auto-generates:
  • __init__(self, ...): a constructor with one parameter per field
  • __repr__: a readable string representation
  • __eq__: field-by-field equality comparison

1) Basic Implementation

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    email: str = "unknown@email.com"  # Default value

# Usage example
person1 = Person("Alice", 25, "alice@email.com")
person2 = Person("Bob", 30)  # Uses default email
print(person1)  # Automatically generated __repr__
print(person1 == person2)  # Automatically generated __eq__

Note: Fields without default values must come before fields with default values; otherwise the decorator raises a TypeError when the class is defined.
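
The ordering rule in the note can be observed directly. The following is a minimal sketch, assuming a hypothetical class named BadPerson that is not part of the original example; the class is defined inside a function only so that the error can be caught and inspected.

```python
from dataclasses import dataclass


def define_badly_ordered_class() -> str:
    """Declare a non-default field after a default one and report the error."""
    try:
        @dataclass
        class BadPerson:  # hypothetical class, for illustration only
            name: str = "anonymous"
            age: int  # no default, but it follows a field that has one
    except TypeError as exc:
        # The decorator rejects the class at definition time
        return f"TypeError: {exc}"
    return "no error"


message = define_badly_ordered_class()
print(message)
```

The error is raised when the class is created, not when an instance is constructed, so a mis-ordered field list is caught as soon as the module is imported.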

2. Field Customization

The field() function gives fine-grained control over individual dataclass fields. With it you can supply a default factory, leave a field out of the generated __init__, exclude a field from comparisons, or hide a field from the generated __repr__.

1) Using field() with Parameters

from dataclasses import dataclass, field
import random
from typing import List

@dataclass
class Student:
    name: str
    student_id: int = field(init=False)  # Not a parameter of __init__
    grades: List[int] = field(default_factory=list)  # Fresh list per instance
    _internal_id: int = field(default=0, repr=False)  # Hidden in __repr__

    def __post_init__(self):
        # Called by the generated __init__ after the fields are assigned
        self.student_id = random.randint(1000, 9999)
        self._internal_id = hash(self.name)

# Usage example
student = Student("Alice")
student.grades.append(95)  # Safe: this list belongs to this instance only
print(student)  # Shows name, student_id, and grades, but not _internal_id

Note: Always use default_factory for mutable types such as lists or dictionaries. Writing grades: List[int] = [] is rejected with a ValueError when the class is defined, because a single list default would otherwise be shared by all instances.
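
Two of the field() options described in this section can be checked in a short sketch: default_factory, which gives each instance its own list, and compare=False, which keeps a field out of the generated __eq__. The Gradebook and Broken classes below are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gradebook:  # hypothetical class, for illustration only
    owner: str
    grades: List[int] = field(default_factory=list)  # new list per instance
    # compare=False keeps this field out of the generated __eq__
    note: str = field(default="", compare=False)


a = Gradebook("Alice")
b = Gradebook("Alice", note="transfer student")

a.grades.append(95)
print(a.grades, b.grades)  # the two lists are independent objects
print(a == b)              # unequal at this point because the grades differ

b.grades.append(95)
print(a == b)              # the differing note is ignored by __eq__


def try_literal_list_default() -> str:
    """Declare a field with a bare [] default and return the error type name."""
    try:
        @dataclass
        class Broken:  # hypothetical class, for illustration only
            grades: List[int] = []
    except ValueError as exc:
        return type(exc).__name__
    return "no error"


print(try_literal_list_default())
```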

3. Dataclass Parameters

The @dataclass decorator accepts parameters that control which methods are generated. Use frozen=True for immutable objects, order=True to generate the ordering methods (__lt__, __le__, __gt__, __ge__) so instances can be compared and sorted, and kw_only=True (Python 3.10+) to require keyword arguments.

1) Configuration Options

from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Point:
    x: int
    y: int

@dataclass(kw_only=True)  # Python 3.10+
class Configuration:
    host: str
    port: int = 8080

# Usage examples
p1 = Point(1, 2)
p2 = Point(1, 3)
# p1.x = 5  # This would raise FrozenInstanceError
print(p1 < p2)  # Works because order=True; fields are compared as (x, y) tuples

# Must use keyword arguments
config = Configuration(host="localhost", port=3000)
# config = Configuration("localhost", 3000)  # This would raise TypeError

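Note: With frozen=True, instances are immutable: assigning to an attribute after creation raises FrozenInstanceError. The protection is shallow, so a mutable object stored in a field, such as a list, can still be changed in place. Frozen dataclasses are a good fit for configuration objects and value objects.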
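
The behaviour of frozen=True and order=True can be explored with a short sketch. It assumes a hypothetical Version class that is not part of the original example, and uses dataclasses.replace to derive a modified copy, which is one common way to "change" a frozen instance.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True, order=True)
class Version:  # hypothetical class, for illustration only
    major: int
    minor: int


v1 = Version(1, 2)

try:
    v1.minor = 3  # assignment is blocked on a frozen instance
except FrozenInstanceError:
    blocked = True
else:
    blocked = False

# replace() builds a modified copy instead of mutating v1
v2 = replace(v1, minor=3)

# frozen=True together with the default eq=True makes instances hashable,
# so they can be stored in sets or used as dictionary keys
seen = {v1, v2, Version(1, 2)}

# order=True compares instances as tuples of their fields, in declaration order
ordered = sorted([Version(2, 0), v2, v1])

print(blocked, v2, len(seen), ordered[0])
```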

4. Inheritance with Dataclasses

Dataclasses support inheritance: a subclass decorated with @dataclass combines the fields of its parent classes with its own. Use this to extend a data container while keeping the automatic method generation.

1) Extending Dataclasses

from dataclasses import dataclass

@dataclass
class Vehicle:
    brand: str
    model: str
    year: int

@dataclass
class Car(Vehicle):
    doors: int
    electric: bool = False

# Usage example
my_car = Car("Tesla", "Model 3", 2023, doors=4, electric=True)
print(my_car)  # Includes all fields from both classes

Note: Field order matters when inheriting, because child class fields are appended after the parent's fields. If any parent field has a default value, every field the child adds must also have a default (or be keyword-only on Python 3.10+); otherwise the decorator raises a TypeError.
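
The field-order rule for inheritance can be illustrated with a small sketch. The Shape, Circle, and Square classes are hypothetical and exist only to show what happens when the parent already has a field with a default value.

```python
from dataclasses import dataclass, fields


@dataclass
class Shape:  # hypothetical classes, for illustration only
    name: str
    color: str = "black"  # parent field with a default


@dataclass
class Circle(Shape):
    radius: float = 1.0  # needs a default because color already has one


# Parent fields come first, followed by the fields the child adds
field_names = [f.name for f in fields(Circle)]
print(field_names)


def define_child_without_default() -> str:
    """Add a non-default field below an inherited default; return the error type."""
    try:
        @dataclass
        class Square(Shape):
            side: float  # no default after the inherited default: rejected
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


print(define_child_without_default())
```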

5. Comparison Table: Regular Class vs Dataclass

This table compares the boilerplate code required for a simple data container written as a regular class versus a dataclass.

| Feature | Regular Class | Dataclass |
| --- | --- | --- |
| Lines of code | ~10-15 lines | ~3-5 lines |
| __init__ method | Manual implementation | Auto-generated |
| __repr__ method | Manual implementation | Auto-generated |
| __eq__ method | Manual implementation | Auto-generated |
| Type hints | Optional in body | Required for fields |
| Default values | In __init__ method | Direct field assignment |
| Mutable defaults | Safe with proper code | Must use default_factory |

1) Code Comparison Example

from dataclasses import dataclass

# Regular class - 11 lines
class RegularPerson:
    def __init__(self, name: str, age: int, email: str = "unknown"):
        self.name = name
        self.age = age
        self.email = email
    def __repr__(self):
        return f"RegularPerson(name={self.name!r}, age={self.age}, email={self.email!r})"
    def __eq__(self, other):
        if not isinstance(other, RegularPerson):
            return NotImplemented  # Let Python try the other operand's __eq__
        return (self.name, self.age, self.email) == (other.name, other.age, other.email)

# Dataclass - 5 lines, including the decorator
@dataclass
class DataclassPerson:
    name: str
    age: int
    email: str = "unknown"

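
To confirm that the short dataclass version really provides the same three methods, the sketch below inspects the class at runtime. It repeats the DataclassPerson definition so that it runs on its own; the use of inspect.signature and vars() is an illustrative choice, not something the original example relies on.

```python
import inspect
from dataclasses import dataclass


@dataclass
class DataclassPerson:
    name: str
    age: int
    email: str = "unknown"


# Parameter names of the generated __init__, in field order
init_params = list(inspect.signature(DataclassPerson).parameters)
print(init_params)

# Methods that the decorator wrote directly onto the class
generated = [m for m in ("__init__", "__repr__", "__eq__") if m in vars(DataclassPerson)]
print(generated)

p = DataclassPerson("Alice", 25)
print(repr(p))                            # generated __repr__ lists every field
print(p == DataclassPerson("Alice", 25))  # generated __eq__ compares field values
```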
💡 One-line Takeaway
Python dataclasses automatically generate __init__, __repr__, and __eq__ from type-annotated fields, eliminating boilerplate code for simple data containers.
dataclass
https://lxy-alexander.github.io/blog/posts/python/oop/dataclass/
Author: Alexander Lee
Published at: 2026-03-19
License: CC BY-NC-SA 4.0