A brand-new feature in Python 3.7 is “Data Classes”. Data classes are a way of automating the generation of boilerplate code for classes which store multiple properties.
They also carry the benefit of using Python 3’s type hints.
Data classes come in the new dataclasses module within the standard library in Python 3.7, and there are 2 important things you’ll need:
The dataclass decorator, for decorating a data class
The field function, for configuring fields
In the default setting, any dataclass will implement __init__, __repr__ and __eq__ for you. There is no separate __str__; calling str() on an instance falls back to the generated __repr__.
The __init__ method will accept keyword arguments with the same type annotations that are specified on the class.
The __eq__ method will compare the fields of two instances as tuples, in the order the fields are declared.
All fields are declared at the top of the class and type hinting is required.
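As a minimal sketch of the defaults described above, here is a dataclass with two fields. The class name Example is my own choice for illustration; the field names match the signature discussed next.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Both fields are declared at the top of the class, with type hints.
    field_a: int
    field_b: str


example = Example(field_a=1, field_b="x")

# The generated __repr__ lists the fields in declaration order.
print(example)  # Example(field_a=1, field_b='x')

# The generated __eq__ compares field values, not object identity.
print(example == Example(1, "x"))  # True
```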
For a dataclass with the fields field_a: int and field_b: str, the generated __init__ method will have a signature of (field_a: int, field_b: str) -> None. You can see this on an instance named example by typing print(inspect.signature(example.__init__)).
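A short sketch of that check, assuming a hypothetical Example class with the two fields above. Note that the signature is read from the bound method on an instance, which is why self does not appear.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Example:
    field_a: int
    field_b: str


example = Example(1, "x")

# Inspect the bound __init__ of the instance; self is already bound.
signature = inspect.signature(example.__init__)
print(signature)  # (field_a: int, field_b: str) -> None
```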
Quite importantly, the type hints are merely hints. So giving the wrong types doesn’t issue a warning or attempt a conversion.
Because type hinting is required (otherwise the field is ignored), if you don’t have a specific type, use the Any type from the typing module.
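To illustrate both points, the sketch below passes a wrong type, uses Any, and leaves one attribute un-annotated. The class Loose and its attribute names are made up for this example.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Loose:
    count: int
    payload: Any          # no specific type, but still a field
    ignored = "no hint"   # no annotation, so this is a plain class attribute


# The hints are not enforced: a str is accepted for the int field as-is.
item = Loose(count="three", payload=[1, 2])
print(item)  # Loose(count='three', payload=[1, 2])

# Only annotated attributes become fields.
print([f.name for f in fields(Loose)])  # ['count', 'payload']
```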
The dataclass decorator has a frozen argument, which is False by default. If set to True, fields will be “frozen”, i.e. read-only, and assigning to one raises FrozenInstanceError. If eq is also True, which it is by default, then the __hash__ magic method will be implemented and object instances will be hashable, so you can use them as dictionary keys or within a set.
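A small sketch of a frozen dataclass follows. Point is a hypothetical class used only to show the read-only behaviour and hashing.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10  # assignment to a frozen field is rejected
except FrozenInstanceError:
    print("cannot assign to a frozen field")

# frozen=True together with the default eq=True generates __hash__,
# so equal instances hash alike and work as dictionary keys.
lookup = {Point(1, 2): "first"}
print(lookup[p])  # first
```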
The core type in dataclasses is the Field type, which describes a single field of a dataclass.
By default, just declaring an annotated class attribute will create a Field on your class.
If you need to customise the behaviour, you can use the field factory function inside the dataclasses module.
The parameters to field() are:
default: If provided, this will be the default value for this field. This is needed because the field() call itself replaces the normal position of the default value.
default_factory: A 0-argument callable that will be called when a default value is needed for this field.
init: If true, the field is included as a parameter to the generated __init__ method.
repr: If true, the field is included in the string returned by the generated __repr__ method.
compare: If true, the field is included in the generated equality and comparison methods (__eq__, __gt__, et al.).
hash: If true, the field is included in the generated __hash__ method.
There is also another argument, metadata, a mapping that dataclasses itself does not use; it is there as an extension point for third-party code.
Similar to keyword arguments, fields with default values must be declared last.
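Here is a rough sketch of a few of these parameters in use. The Account class and its field names are invented for illustration; only field() and its default, repr and compare arguments come from the dataclasses module.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str  # no default, so it must be declared first
    password: str = field(default="", repr=False)      # hidden from __repr__
    last_seen: int = field(default=0, compare=False)   # skipped by __eq__


a = Account("ann", password="s3cret", last_seen=1)
b = Account("ann", password="s3cret", last_seen=2)

print(a)       # Account(name='ann', last_seen=1)
print(a == b)  # True
```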
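The default_factory argument is the way to give a field a mutable default such as a list, because each instance gets its own freshly built value.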
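A minimal sketch of default_factory, using a hypothetical Basket class. The built-in list is passed as the 0-argument callable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Basket:
    owner: str
    # list() is called once per instance, so baskets never share a list.
    items: List[str] = field(default_factory=list)


first = Basket("ann")
second = Basket("bob")
first.items.append("apple")

print(first.items)   # ['apple']
print(second.items)  # []
```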
You can declare a __post_init__ method, which will run after the auto-generated __init__.
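One common use is computing a derived value. The sketch below assumes a made-up Rectangle class whose area field is excluded from __init__ with init=False and filled in by __post_init__.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)  # computed, not an __init__ parameter

    def __post_init__(self):
        # Called by the generated __init__ once the fields are assigned.
        self.area = self.width * self.height


rect = Rectangle(2.0, 3.0)
print(rect)  # Rectangle(width=2.0, height=3.0, area=6.0)
```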
Inheritance works as you would expect. You need to wrap both the base class and the inheriting class in dataclass.
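A short sketch of the working case, with hypothetical Animal and Dog classes. The base class fields come first in the generated __init__, followed by the child's.

```python
from dataclasses import dataclass


@dataclass
class Animal:
    name: str


@dataclass
class Dog(Animal):
    breed: str = "unknown"


dog = Dog("Rex")
print(dog)  # Dog(name='Rex', breed='unknown')
```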
However, because you can’t declare a non-default field after a default one, a child class can’t add non-default fields once its base class has a field with a default value.
For example, a child class that declares a non-default field_a after a base class with a default field raises TypeError: non-default argument ‘field_a’ follows default argument
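The failure can be reproduced with a sketch like the one below. Base, Child and field_b are names I have made up; only field_a comes from the error message above. The exact wording of the message may differ slightly between Python versions, so the code just captures and prints it.

```python
from dataclasses import dataclass


@dataclass
class Base:
    field_b: str = "default"


message = None
try:
    @dataclass
    class Child(Base):
        field_a: int  # non-default, but Base already has a default field
except TypeError as error:
    # Raised while the decorator builds __init__, before any instance exists.
    message = str(error)

print(message)
```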
This is pretty annoying and probably going to stop people from using either inheritance or default fields too much.
All-in-all, this is a great feature and I’ll likely stop using attrs once Python 3.7 is released.
Check out my new course on Pluralsight for moving from Python 2 to 3.