Anthony Shaw

@anthonypjshaw

A brief tour of Python 3.7 data classes

A brand-new feature in Python 3.7 is “Data Classes”. Data classes are a way of automating the generation of boilerplate code for classes that store multiple properties.

They also carry the benefit of using Python 3’s new type hinting.

Data classes come in the new dataclasses module within the standard library in Python 3.7, and there are two important things you’ll need:

  1. The dataclass decorator, for decorating a data class
  2. The field function, for configuring fields

Default magic methods

In the default setting, any dataclass will implement __init__, __repr__ and __eq__ for you. No __str__ is generated; str() simply falls back to the generated __repr__.

The __init__ method will have keyword-arguments with the same type annotations that are specified on the class.

The __eq__ method will compare all dataclass attributes in order.

All fields are declared at the top of the class and type hinting is required.
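As a minimal sketch (SimpleDataObject and example are illustrative names, not anything defined by the dataclasses module), a data class with two fields might look like this:

```python
import inspect
from dataclasses import dataclass


@dataclass
class SimpleDataObject:
    # Fields are declared at the top of the class, each with a type hint.
    field_a: int
    field_b: str


example = SimpleDataObject(1, "b")

# __repr__ is generated from the field names and values.
print(example)  # SimpleDataObject(field_a=1, field_b='b')

# __eq__ compares the fields as a tuple, in declaration order.
print(example == SimpleDataObject(1, "b"))  # True

# The generated __init__ carries the annotations from the class.
print(inspect.signature(example.__init__))
```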

This __init__ method will have a signature of (field_a: int, field_b: str) -> None. You can see this by just typing print(inspect.signature(example.__init__))

Type hinting

Quite importantly, the type hints are merely hints. So giving the wrong types doesn’t issue a warning or attempt a conversion.

Because type hinting is required (otherwise the field is ignored), if you don’t have a specific type, use the Any type from the typing module.
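A short sketch of both points, using a hypothetical LooselyTyped class: a value of the wrong type is stored untouched, and an attribute without an annotation never becomes a field.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class LooselyTyped:
    field_a: int
    field_b: Any   # no specific type, but still registered as a field
    ignored = "x"  # no annotation, so this stays a plain class attribute


# A str is accepted for field_a: no warning, no conversion to int.
obj = LooselyTyped("not an int", [1, 2, 3])
print(type(obj.field_a))

# Only the annotated attributes show up as fields.
print([f.name for f in fields(obj)])
```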

Mutability

The dataclass decorator has a frozen argument, which is False by default. If set to True, fields will be “frozen”, i.e. read-only, and assigning to one raises FrozenInstanceError. If eq is also True, which it is by default, then the __hash__ magic method will be implemented and object instances will be hashable, so you can use them as dictionary keys or within a set.
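To make that concrete, here is a small sketch with a hypothetical Point class. It shows the failed assignment and then uses instances as a dictionary key and as set members.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10  # assignment to a frozen field is rejected
except FrozenInstanceError as exc:
    print("read-only:", exc)

# frozen=True with the default eq=True generates __hash__,
# so equal instances hash alike and work as keys and set members.
labels = {p: "start"}
print(labels[Point(1, 2)])
print(len({Point(1, 2), Point(1, 2)}))
```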

Customizing the fields

The core type in dataclasses is the Field type, which belongs to a dataclass.

By default, just declaring a class attribute with a type annotation will create a Field for it on your class.

If you need to customise the behaviour, you can use the field factory inside the dataclasses module.

The parameters to field() are:

  • default: If provided, this will be the default value for this field. This is needed because the field call itself replaces the normal position of the default value.
  • default_factory: A 0-argument callable that will be called when a default value is needed for this field.
  • init: If true (the default), the field is included as a parameter to the generated __init__ method.
  • repr: If true (the default), the field is included in the string returned by the generated __repr__ method.
  • compare: If true (the default), the field is included in the generated equality and comparison methods (__eq__, __gt__, et al.).
  • hash: Whether the field is included in the generated __hash__ method. By default this follows the value of compare.

There is also another argument, metadata, a mapping that data classes themselves do not use at all; it is there for third-party extensions.

Similar to keyword arguments, fields with default values must be declared last.
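A rough sketch of a few of these parameters in use. The User class and its field names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # Hidden from the repr and ignored by ==.
    password: str = field(default="", repr=False, compare=False)
    # Not an __init__ parameter; every instance starts at the default.
    login_count: int = field(default=0, init=False)


a = User("ann", "secret")
b = User("ann", "other")

print(a)       # User(name='ann', login_count=0)
print(a == b)  # True, because password is excluded from comparison
```

Note that name, the only field without a default, comes first, in line with the ordering rule above.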

Demonstrating the default factory argument,
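A minimal sketch, assuming a hypothetical Basket class: passing list as the factory gives every instance its own fresh list rather than one shared between instances.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Basket:
    owner: str
    # list() is called once per instance to build the default.
    items: List[str] = field(default_factory=list)


first = Basket("a")
second = Basket("b")
first.items.append("apple")

# Only the first basket changed; the lists are independent.
print(first.items, second.items)
```

Writing items: List[str] = [] instead is rejected by the decorator with a ValueError, which is why the factory exists.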

Post-Init Processing

You can declare a __post_init__ method, which will run after the auto-generated __init__.
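One common use is computing a derived field. The sketch below assumes a hypothetical Rectangle class whose area field is left out of __init__ and filled in afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: float
    height: float
    # Excluded from __init__; set in __post_init__ instead.
    area: float = field(init=False)

    def __post_init__(self):
        # Runs right after the generated __init__ has set width and height.
        self.area = self.width * self.height


r = Rectangle(2.0, 3.0)
print(r.area)  # 6.0
```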

Inheritance

Inheritance works as you would expect. You need to apply the dataclass decorator to both the base class and the child class definitions.
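A small sketch with hypothetical Base and Child classes: the child’s __init__ takes the base class fields first, followed by its own.

```python
from dataclasses import dataclass


@dataclass
class Base:
    field_0: str


@dataclass
class Child(Base):
    field_a: int


# Base fields come first in the generated __init__ and __repr__.
c = Child("zero", 1)
print(c)  # Child(field_0='zero', field_a=1)
```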

However, because you can’t declare a non-default field after a default one, a child class can’t add non-default fields when its base class has fields with defaults.
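A sketch of the failing case, with illustrative class names. The error is raised when the child class is defined, not when it is instantiated; the try/except is only there so the script runs to completion and can show the message.

```python
from dataclasses import dataclass


@dataclass
class SimpleBaseObject:
    field_0: str = "default"


try:
    @dataclass
    class SimpleDataObject(SimpleBaseObject):
        # No default, yet it follows field_0 in the generated __init__.
        field_a: int
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```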

This example raises TypeError: non-default argument 'field_a' follows default argument

This is pretty annoying and probably going to stop people from using either inheritance or default fields too much.

All in all, this is a great feature and I’ll likely stop using attrs once Python 3.7 is released.

