5. Dataclasses In Python
By Bernd Klein. Last modified: 19 Feb 2024.
Journey through Dataclasses: Advanced Techniques with Python Dataclasses
We delve into the powerful world of Python dataclasses in this chapter of our Python tutorial. Dataclasses are an essential feature introduced in Python 3.7 to simplify the creation and management of classes primarily used to store data. The chapter begins by contrasting traditional class structures for representing data with dataclasses as an attractive alternative. We will explore the syntax and functionality of dataclasses and demonstrate how they improve code readability.
Readers will gain a comprehensive understanding of the principles underlying dataclasses and will be able to apply them in their Python projects.
Live Python training
Enjoying this page? We offer live Python training courses covering the content of this site.
First Examples
In our first example, we remain true to our beloved robot Marvin. So, we start with a "traditional" Python class Robot_traditional representing a robot:
class Robot_traditional:

    def __init__(self, model, serial_number, manufacturer):
        self.model = model
        self.serial_number = serial_number
        self.manufacturer = manufacturer
The boilerplate code within the __init__ method follows the same pattern in most class definitions; only the names assigned to the attributes vary. This becomes particularly tedious as the number of attributes grows.
If we have a look at the same example in dataclass notation, we see that the code gets a lot leaner. But before we can use dataclass as a decorator for our class, we have to import it from the module dataclasses.
from dataclasses import dataclass

@dataclass
class Robot:
    model: str
    serial_number: str
    manufacturer: str
Here, the dataclass decorator automates the generation of special methods like __init__, reducing the need for boilerplate code. The class definition is concise, making it clearer and more maintainable, especially as the number of attributes increases.
This example demonstrates how using dataclasses for a class primarily used to store data, such as a robot representation, offers a more streamlined and readable alternative to traditional class structures.
Initializing robots of both classes is the same:
x = Robot_traditional("NanoGuardian XR-2000", "234-76", "Cyber Robotics Co.")
y = Robot("MachinaMaster MM-42", "986-42", "Quantum Automations Inc.")
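As a small sketch of what the generated __init__ looks like from the outside, the following snippet inspects the constructor parameters of a dataclass and creates an instance with keyword arguments. It also illustrates that the type annotations are hints only and are not checked at runtime. The class is repeated so that the snippet runs on its own; using inspect.signature here is our choice for illustration.

```python
import inspect
from dataclasses import dataclass

@dataclass
class Robot:
    model: str
    serial_number: str
    manufacturer: str

# The generated __init__ takes the fields in the order of their definition
param_names = list(inspect.signature(Robot).parameters)
print(param_names)

# Keyword arguments work just like with a handwritten __init__
y = Robot(model="MachinaMaster MM-42",
          serial_number="986-42",
          manufacturer="Quantum Automations Inc.")
print(y.model)

# Annotations are not enforced: this call succeeds despite the "wrong" types
z = Robot(42, None, 3.14)
print(z.model)
```

If runtime validation is needed, it has to be added explicitly, for example in a __post_init__ method.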
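Yet, there are other differences between these classes. The class decorator dataclass has not only created the special method __init__ but also __repr__ and __eq__. A separate __ne__ is not needed, because Python derives it from __eq__ by default. A __hash__ method is only generated under certain conditions, for example for frozen dataclasses, as we will see further down; with the default settings, __hash__ is set to None and the instances are not hashable. These are methods which you would have to add manually to the class Robot_traditional.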
Let's have a look at __repr__ and compare it to the traditional class definition:
print(repr(x))
print(repr(y)) # uses __repr__
OUTPUT:
<__main__.Robot_traditional object at 0x7f21c0acb410>
Robot(model='MachinaMaster MM-42', serial_number='986-42', manufacturer='Quantum Automations Inc.')
We can see that __repr__ has also been implicitly created by dataclass. To get the same result for Robot_traditional, we have to implement the method explicitly:
class Robot_traditional:

    def __init__(self, model, serial_number, manufacturer):
        self.model = model
        self.serial_number = serial_number
        self.manufacturer = manufacturer

    def __repr__(self):
        return f"Robot_traditional(model='{self.model}', serial_number='{self.serial_number}', manufacturer='{self.manufacturer}')"

x = Robot_traditional("NanoGuardian XR-2000", "234-76", "Cyber Robotics Co.")
print(repr(x))
OUTPUT:
Robot_traditional(model='NanoGuardian XR-2000', serial_number='234-76', manufacturer='Cyber Robotics Co.')
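The generated __eq__ can be checked in the same way as __repr__. The following sketch compares two instances with identical attribute values for both kinds of classes; the variable names a1, a2, b1 and b2 are chosen only for this illustration.

```python
from dataclasses import dataclass

class Robot_traditional:

    def __init__(self, model, serial_number, manufacturer):
        self.model = model
        self.serial_number = serial_number
        self.manufacturer = manufacturer

@dataclass
class Robot:
    model: str
    serial_number: str
    manufacturer: str

a1 = Robot_traditional("NanoGuardian XR-2000", "234-76", "Cyber Robotics Co.")
a2 = Robot_traditional("NanoGuardian XR-2000", "234-76", "Cyber Robotics Co.")
b1 = Robot("NanoGuardian XR-2000", "234-76", "Cyber Robotics Co.")
b2 = Robot("NanoGuardian XR-2000", "234-76", "Cyber Robotics Co.")

# Without __eq__, Python falls back to comparing object identity
print(a1 == a2)   # False

# The dataclass compares the fields, one by one
print(b1 == b2)   # True

# != is derived from __eq__, so it works as well
print(b1 != b2)   # False
```

To get the field-wise comparison for Robot_traditional, an __eq__ method would have to be written by hand.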
Immutable Classes
In the chapter "Creating Immutable Classes in Python" of our tutorial, we discuss why immutability is needed and explore different ways of creating immutable classes.
It's very simple with dataclasses. All you have to do is to call the decorator with frozen set to True:
from dataclasses import dataclass

@dataclass(frozen=True)
class ImmutableRobot:
    name: str
    brandname: str
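To see what frozen=True means in practice, here is a minimal sketch that tries to change an attribute of such an instance. The assignment is expected to fail with dataclasses.FrozenInstanceError. The use of dataclasses.replace to obtain a modified copy is an addition of this sketch, not something the chapter relies on.

```python
from dataclasses import dataclass, FrozenInstanceError, replace

@dataclass(frozen=True)
class ImmutableRobot:
    name: str
    brandname: str

marvin = ImmutableRobot("Marvin", "NanoGuardian XR-2000")

rejected = False
try:
    marvin.name = "Marva"          # not allowed on a frozen instance
except FrozenInstanceError as err:
    rejected = True
    print("Assignment rejected:", err)

# replace() leaves the original untouched and returns a new instance
marva = replace(marvin, name="Marva")
print(marvin)
print(marva)
```

Since FrozenInstanceError is a subclass of AttributeError, code that already catches AttributeError will handle it as well.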
We can use this class and test two robots for equality. They are equal if all their attributes are the same. Remember that __eq__ has been created automatically.
x1 = ImmutableRobot("Marvin", "NanoGuardian XR-2000")
x2 = ImmutableRobot("Marvin", "NanoGuardian XR-2000")
print(x1 == x2)
OUTPUT:
True
Let's look at the hash values:
print(x1.__hash__(), x2.__hash__())
OUTPUT:
-7581403571593916930 -7581403571593916930
First, we can see that __hash__ was automatically created for the frozen class, and it was implemented correctly: robots which are equal produce identical hash values.
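The generated __hash__ depends on the frozen=True argument. As a sketch, the snippet below compares a dataclass with default settings to a frozen one. MutableRobot is a hypothetical name used only in this illustration.

```python
from dataclasses import dataclass

@dataclass
class MutableRobot:
    name: str
    brandname: str

@dataclass(frozen=True)
class ImmutableRobot:
    name: str
    brandname: str

# With the defaults (eq=True, frozen=False), __hash__ is set to None
print(MutableRobot.__hash__ is None)   # True

unhashable = False
try:
    hash(MutableRobot("Marvin", "NanoGuardian XR-2000"))
except TypeError as err:
    unhashable = True
    print("Not hashable:", err)

# Frozen instances are hashable, and equal instances hash alike
x1 = ImmutableRobot("Marvin", "NanoGuardian XR-2000")
x2 = ImmutableRobot("Marvin", "NanoGuardian XR-2000")
print(hash(x1) == hash(x2))            # True
```

The reasoning behind this default is that a hash based on fields which can still change would break sets and dictionaries.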
Now, let's implement a similar class without utilizing dataclass. In our previously defined class Robot_traditional, the instances are mutable, as we have the ability to modify the attributes. The following example is an immutable version. We can see immediately that there is a lot more coding involved. We have to implement __init__, the getter properties, __eq__ and __hash__:
class ImmutableRobot_traditional:

    def __init__(self, name: str, brandname: str):
        self._name = name
        self._brandname = brandname

    @property
    def name(self) -> str:
        return self._name

    @property
    def brandname(self) -> str:
        return self._brandname

    def __eq__(self, other):
        if not isinstance(other, ImmutableRobot_traditional):
            return False
        return self.name == other.name and self.brandname == other.brandname

    def __hash__(self):
        return hash((self.name, self.brandname))

x1 = ImmutableRobot_traditional("Marvin", "NanoGuardian XR-2000")
x2 = ImmutableRobot_traditional("Marvin", "NanoGuardian XR-2000")
print(x1 == x2)
OUTPUT:
True
print(x1.__hash__(), x2.__hash__())
OUTPUT:
-7581403571593916930 -7581403571593916930
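It is worth checking how far the immutability of the handwritten version goes. The following sketch uses a shortened copy of the class (without __eq__ and __hash__, which play no role here) and shows that the read-only properties reject assignments, while the underscore attributes are protected by convention only.

```python
class ImmutableRobot_traditional:

    def __init__(self, name: str, brandname: str):
        self._name = name
        self._brandname = brandname

    @property
    def name(self) -> str:
        return self._name

    @property
    def brandname(self) -> str:
        return self._brandname

x1 = ImmutableRobot_traditional("Marvin", "NanoGuardian XR-2000")

rejected = False
try:
    x1.name = "Marva"       # the property has no setter
except AttributeError as err:
    rejected = True
    print("Assignment rejected:", err)

# Nothing stops a direct assignment to the underlying attribute
x1._name = "Marva"
print(x1.name)              # Marva
```

So this class is immutable only as long as its users respect the leading underscore.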
Immutable Classes for mappings
Having an immutable class with a __hash__ method means that we can use our class in sets and dictionaries. We illustrate this in the following example:
from dataclasses import dataclass

@dataclass(frozen=True)
class ImmutableRobot:
    name: str
    brandname: str

robot1 = ImmutableRobot("Marvin", "NanoGuardian XR-2000")
robot2 = ImmutableRobot("R2D2", "QuantumTech Sentinel-7")
robot3 = ImmutableRobot("Marva", "MachinaMaster MM-42")

# we create a set of Robots:
robots = {robot1, robot2, robot3}

print("The robots in the set robots:")
for robo in robots:
    print(robo)

# now a dictionary with robots as keys:
activity = {robot1: 'activated', robot2: 'activated', robot3: 'deactivated'}

print("\nAll the activated robots:")
for robo, mode in activity.items():
    if mode == 'activated':
        print(f"{robo} is activated")
OUTPUT:
The robots in the set robots:
ImmutableRobot(name='Marva', brandname='MachinaMaster MM-42')
ImmutableRobot(name='R2D2', brandname='QuantumTech Sentinel-7')
ImmutableRobot(name='Marvin', brandname='NanoGuardian XR-2000')

All the activated robots:
ImmutableRobot(name='Marvin', brandname='NanoGuardian XR-2000') is activated
ImmutableRobot(name='R2D2', brandname='QuantumTech Sentinel-7') is activated
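Because __eq__ and __hash__ are both based on the field values, two separately created robots with the same data count as the same set element and the same dictionary key. A minimal sketch, in which robot1_copy is a name introduced only for this illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImmutableRobot:
    name: str
    brandname: str

robot1 = ImmutableRobot("Marvin", "NanoGuardian XR-2000")
robot1_copy = ImmutableRobot("Marvin", "NanoGuardian XR-2000")

# Two distinct objects ...
print(robot1 is robot1_copy)    # False

# ... which collapse into one element of a set
robots = {robot1, robot1_copy}
print(len(robots))              # 1

# ... and which are interchangeable as dictionary keys
activity = {robot1: 'activated'}
print(activity[robot1_copy])    # activated
```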
Summary
Using a dataclass provides several benefits over traditional class definitions in Python:
- Automatic Generation of Special Methods: With dataclass, you don't need to manually write special methods like __init__, __repr__, and __eq__. The dataclass decorator automatically generates these methods based on the class attributes you define. A __hash__ method is generated as well if the class is frozen (or if unsafe_hash=True is passed to the decorator).
- Concise Syntax: dataclass uses a concise syntax for defining classes, reducing boilerplate code. You only need to specify the attributes of the class, and the decorator takes care of the rest.
- Built-in Comparison Methods: dataclass automatically provides an implementation of __eq__ based on the attributes of the class, and __ne__ follows from it. The ordering methods __lt__, __le__, __gt__, and __ge__ are generated if the decorator is called with order=True.
- Immutable Instances: By specifying frozen=True in the dataclass decorator, you can make instances of the class immutable, which can help prevent accidental modification of data.
- Integration with Type Hinting: dataclass integrates seamlessly with Python's type hinting system, allowing you to specify the types of attributes for better code readability and static analysis.
- Default Values and Default Factories: You can specify default values for attributes directly within the class definition or by using factory functions, reducing the need for boilerplate code in the __init__ method.
- Inheritance Support: dataclass supports inheritance, allowing you to create subclasses with additional attributes or methods while inheriting the behavior of the parent class.
- Customization: Although dataclass provides automatic generation of special methods, you can still customize or override these methods if needed, giving you flexibility in defining class behavior.
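Default values, default factories and ordering appear in the list above but are not shown in the examples of this chapter. The following is a minimal sketch of how they can look; the field skills and the default "Unknown" are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Robot:
    model: str
    serial_number: str
    manufacturer: str = "Unknown"                # plain default value
    skills: list = field(default_factory=list)   # a fresh list per instance

r1 = Robot("MachinaMaster MM-42", "986-42")
r2 = Robot("NanoGuardian XR-2000", "234-76", "Cyber Robotics Co.")

r1.skills.append("welding")
print(r1.manufacturer)   # Unknown
print(r1.skills)         # ['welding']
print(r2.skills)         # []  (the lists are not shared)

# order=True compares the fields like a tuple, first field first
print(r1 < r2)           # True
print([r.model for r in sorted([r2, r1])])
```

A mutable default such as skills: list = [] is rejected by the decorator with a ValueError, which is why default_factory is used here.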
Overall, dataclass simplifies the process of creating classes in Python, making code more concise, readable, and maintainable, especially for classes that primarily serve as containers for data.
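Inheritance and customization, two of the points from the summary, can be sketched as follows. CleaningRobot and battery_hours are hypothetical names, and the check in __post_init__ is just one possible way of hooking into the generated __init__.

```python
from dataclasses import dataclass

@dataclass
class Robot:
    model: str
    serial_number: str

    def __post_init__(self):
        # Called by the generated __init__ after the fields have been set
        if not self.serial_number:
            raise ValueError("serial_number must not be empty")

    def __str__(self):
        # Custom method; the generated __repr__ stays available
        return f"{self.model} ({self.serial_number})"

@dataclass
class CleaningRobot(Robot):
    # The subclass inherits both fields and adds a third one
    battery_hours: float = 2.5

c = CleaningRobot("DustBuster DB-1", "555-01")
print(c)         # DustBuster DB-1 (555-01)
print(repr(c))   # CleaningRobot(model='DustBuster DB-1', serial_number='555-01', battery_hours=2.5)

try:
    CleaningRobot("DustBuster DB-1", "")
except ValueError as err:
    print("Rejected:", err)
```

Note that fields without default values cannot follow fields with defaults, which also holds across a class hierarchy. This is why battery_hours has a default here.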
Exercises
Exercise 1: Book Information
Create a Book class using dataclass to represent information about books. Each book should have the following attributes:
- Title
- Author
- ISBN (International Standard Book Number)
- Publication Year
- Genre
Write a program that does the following:
- Define the Book class using dataclass.
- Create instances of several books.
- Print out the details of each book, including its title, author, ISBN, publication year, and genre.
You can use this exercise to practice defining dataclasses, creating instances, and accessing attributes of dataclass objects. Additionally, you can explore how to add methods or customizations to the Book class, such as implementing a method to calculate the age of the book based on the publication year or adding validation for ISBN numbers.
Solutions
Solution 1
from dataclasses import dataclass

@dataclass
class Book:
    title: str
    author: str
    isbn: str
    publication_year: int
    genre: str

# Create instances of several books
book1 = Book("The Great Gatsby", "F. Scott Fitzgerald", "9780743273565", 1925, "Fiction")
book2 = Book("To Kill a Mockingbird", "Harper Lee", "9780061120084", 1960, "Fiction")
book3 = Book("1984", "George Orwell", "9780451524935", 1949, "Science Fiction")
# Print out the details of each book
print("Book 1:")
print("Title:", book1.title)
print("Author:", book1.author)
print("ISBN:", book1.isbn)
print("Publication Year:", book1.publication_year)
print("Genre:", book1.genre)
print("\nBook 2:")
print("Title:", book2.title)
print("Author:", book2.author)
print("ISBN:", book2.isbn)
print("Publication Year:", book2.publication_year)
print("Genre:", book2.genre)
print("\nBook 3:")
print("Title:", book3.title)
print("Author:", book3.author)
print("ISBN:", book3.isbn)
print("Publication Year:", book3.publication_year)
print("Genre:", book3.genre)
OUTPUT:
Book 1:
Title: The Great Gatsby
Author: F. Scott Fitzgerald
ISBN: 9780743273565
Publication Year: 1925
Genre: Fiction

Book 2:
Title: To Kill a Mockingbird
Author: Harper Lee
ISBN: 9780061120084
Publication Year: 1960
Genre: Fiction

Book 3:
Title: 1984
Author: George Orwell
ISBN: 9780451524935
Publication Year: 1949
Genre: Science Fiction
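The exercise text also suggests an age calculation and an ISBN validation. One possible sketch of these extensions follows. The method name age, its parameter current_year and the simplified 13-digit check are assumptions of this sketch; in particular, the ISBN check digit is not verified. The printing is done in a loop with dataclasses.asdict instead of repeating the print calls for every book.

```python
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class Book:
    title: str
    author: str
    isbn: str
    publication_year: int
    genre: str

    def __post_init__(self):
        # Simplified validation: 13 digits, hyphens are ignored
        digits = self.isbn.replace("-", "")
        if not (digits.isdigit() and len(digits) == 13):
            raise ValueError(f"not a 13-digit ISBN: {self.isbn!r}")

    def age(self, current_year=None):
        # Age in years; defaults to the current calendar year
        if current_year is None:
            current_year = date.today().year
        return current_year - self.publication_year

books = [
    Book("The Great Gatsby", "F. Scott Fitzgerald", "9780743273565", 1925, "Fiction"),
    Book("To Kill a Mockingbird", "Harper Lee", "9780061120084", 1960, "Fiction"),
    Book("1984", "George Orwell", "9780451524935", 1949, "Science Fiction"),
]

for number, book in enumerate(books, start=1):
    print(f"Book {number}:")
    # asdict() turns the instance into a dict of field names and values
    for name, value in asdict(book).items():
        print(f"{name}: {value}")
    print()

print(books[0].age(current_year=2024))   # 99
```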