Data Classes

The Problem: The “Traditional” Class

Before dataclasses, if you wanted a simple class to hold some data, you had to write a lot of code yourself. Consider a simple class to represent a 2D point.

# The "old" way, without dataclasses
class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y
 
    # Without this, printing the object is unhelpful (<__main__.Point object at ...>)
    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"
 
    # Without this, p1 == p2 would compare object identity, not values
    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x and self.y == other.y
 
# --- Usage ---
p1 = Point(10, 20)
p2 = Point(10, 20)
 
print(p1)  # Output: Point(x=10, y=20)
print(p1 == p2) # Output: True

This is a lot of code for something so simple. We had to manually implement the initializer (__init__), the string representation (__repr__), and the equality logic (__eq__).

The Solution: The @dataclass Approach

The @dataclass decorator handles all of that for you. All you need to do is declare each field as a class-level attribute with a type annotation.

Here is the exact same Point class, rewritten as a dataclass:

from dataclasses import dataclass
 
# The "new" way, with a dataclass
@dataclass
class Point:
    x: int
    y: int
 
# --- Usage ---
p1 = Point(10, 20)
p2 = Point(10, 20)
 
print(p1)  # Output: Point(x=10, y=20)
print(p1 == p2) # Output: True

Look how much cleaner that is! By adding the @dataclass decorator, Python automatically generated the following methods for us behind the scenes:

  • __init__(self, x: int, y: int): The initializer.
  • __repr__(self): The string representation for printing.
  • __eq__(self, other): The logic for the == operator.
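
To make the generated methods a little more tangible, here is a minimal sketch that inspects the Point dataclass from above. It uses fields() from the standard dataclasses module; the variable names p and field_names are only illustrative.

```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: int
    y: int

# The generated __init__ accepts positional or keyword arguments
p = Point(x=1, y=2)

# fields() lists the declared fields in definition order
field_names = [f.name for f in fields(Point)]
print(field_names)       # Output: ['x', 'y']

# The generated __eq__ compares field values, and only between
# instances of the same class
print(p == Point(1, 2))  # Output: True
print(p == (1, 2))       # Output: False (a tuple is not a Point)
```

Note that the last comparison is False rather than an error: the generated __eq__ returns NotImplemented for other types, just like the handwritten version earlier.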

Advanced Features and Examples

The @dataclass decorator is highly customizable through its arguments and the field() function.

1. Default Values

You can provide default values for fields just like you would in a function signature.

from dataclasses import dataclass
 
@dataclass
class Book:
    title: str
    author: str
    pages: int
    genre: str = "Unknown" # Default value
 
book1 = Book("The Hobbit", "J.R.R. Tolkien", 310)
book2 = Book("Another Book", "Some Author", 500, genre="Fantasy")
 
print(book1)
# Output: Book(title='The Hobbit', author='J.R.R. Tolkien', pages=310, genre='Unknown')
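
The function-signature analogy also extends to field order: fields without a default cannot come after fields that have one. The sketch below shows the failure; BadBook and error_message are hypothetical names used only for illustration, and the exact wording of the error may differ between Python versions.

```python
from dataclasses import dataclass

error_message = None
try:
    @dataclass
    class BadBook:
        title: str
        genre: str = "Unknown"
        pages: int  # no default, but follows a field that has one
except TypeError as e:
    # Raised while the decorator builds __init__, before any instance exists
    error_message = str(e)
    print(f"Error: {error_message}")
```

The usual fix is simply to list all fields without defaults first, as the Book class above does.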

2. Mutable Default Values (Important!)

A common pitfall in Python is using a mutable type (like a list or dict) as a default argument. Dataclasses provide a safe way to handle this using default_factory.

from dataclasses import dataclass, field
from typing import List
 
@dataclass
class InventoryItem:
    name: str
    # Use default_factory to create a new list for each instance
    # This prevents all instances from sharing the same list
    tags: List[str] = field(default_factory=list)
 
item1 = InventoryItem("Laptop")
item1.tags.append("electronics")
 
item2 = InventoryItem("Desk")
 
print(item1) # Output: InventoryItem(name='Laptop', tags=['electronics'])
print(item2) # Output: InventoryItem(name='Desk', tags=[])

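Note that you cannot fall back to the naive version here: writing tags: List[str] = [] raises a ValueError when the class is defined, because dataclasses reject list, dict, and set defaults. This guards against every instance sharing the exact same list, which is almost never what you want.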
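
As a small sketch of how dataclasses guard against this pitfall: a bare list literal used as a field default is rejected at the moment the class is defined. BadItem and error_message are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List

error_message = None
try:
    @dataclass
    class BadItem:
        name: str
        tags: List[str] = []  # bare mutable default instead of default_factory
except ValueError as e:
    # The decorator refuses the class; no instance is ever created
    error_message = str(e)
    print(f"Error: {error_message}")
```

The error message itself points to default_factory as the fix.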

3. Immutable Instances (frozen=True)

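If you want to prevent an object's fields from being reassigned after creation, you can make the class “frozen”. This is great for creating simple, constant data structures. Keep in mind that frozen only blocks assignment to fields; a mutable value stored in a field (such as a list) can still be modified in place.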

from dataclasses import dataclass
 
@dataclass(frozen=True)
class User:
    user_id: int
    username: str
 
user = User(101, "alex")
print(user.username) # 'alex'
 
# Assigning to a field raises dataclasses.FrozenInstanceError,
# which is a subclass of AttributeError
try:
    user.username = "bob"
except AttributeError as e:
    print(f"Error: {e}")
    # Output: Error: cannot assign to field 'username'
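
Since a frozen instance cannot be modified, a common way to "change" it is to build a modified copy. The sketch below uses replace() from the standard dataclasses module for that, and also shows that frozen instances (with the default eq=True) are hashable; the names renamed and seen are only illustrative.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class User:
    user_id: int
    username: str

user = User(101, "alex")

# replace() returns a new instance with the given fields changed;
# the original object is left untouched
renamed = replace(user, username="bob")
print(user)     # Output: User(user_id=101, username='alex')
print(renamed)  # Output: User(user_id=101, username='bob')

# Frozen dataclasses get a generated __hash__, so equal instances
# collapse to a single entry in a set
seen = {user, User(101, "alex")}
print(len(seen))  # Output: 1
```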

4. Ordering and Comparison (order=True)

By default, the only comparison method a dataclass generates is __eq__. If you also want the ordering methods (__lt__, __le__, __gt__, __ge__) to be generated, set order=True. Instances are then compared as if they were tuples of their fields, in the order you define them, so the first field that differs decides the result.

from dataclasses import dataclass
 
@dataclass(order=True)
class Product:
    name: str
    price: float
 
p1 = Product("Apple", 0.50)
p2 = Product("Banana", 0.25)
 
print(p1 > p2) # Output: False (name is compared first, and "Apple" < "Banana")
print(p1 < p2) # Output: True
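
Because the comparison follows field order, sorting behaves the same way. The following sketch sorts a few Product instances first by the generated ordering and then explicitly by price; the sample products and the names by_fields and by_price are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(order=True)
class Product:
    name: str
    price: float

products = [
    Product("Banana", 0.25),
    Product("Apple", 0.50),
    Product("Apple", 0.30),
]

# Generated ordering: by name first, then by price to break ties
by_fields = sorted(products)
print([(p.name, p.price) for p in by_fields])
# Output: [('Apple', 0.3), ('Apple', 0.5), ('Banana', 0.25)]

# To sort by price instead, pass an explicit key
by_price = sorted(products, key=lambda p: p.price)
print([(p.name, p.price) for p in by_price])
# Output: [('Banana', 0.25), ('Apple', 0.3), ('Apple', 0.5)]
```

If price should always drive the ordering, another option is to declare price before name in the class.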

Summary Table

Feature          | Regular Class                          | @dataclass
Initialization   | Manual __init__                        | Automatic
Representation   | Manual __repr__                        | Automatic, clean
Equality         | Manual __eq__                          | Automatic == checking
Code Verbosity   | High                                   | Low
Best For         | Classes with complex methods and logic | Classes that primarily store data