Providing attribute access to Python dictionary entries
Thanks to Asher Sterkin, BlackSwan Technologies SVP of Engineering, who proposed the idea for this project and provided invaluable guidance on its development. The project is an offshoot of BlackSwan’s work developing a Cloud AI Operating System, or CAIOS, which is intended to provide 10x productivity improvements when coding for cloud/serverless environments.
JavaScript has advantages over native Python when it comes to accessing attribute values in a dictionary object. In this article, we will demonstrate how to achieve the same level of usability and performance in Python as with JavaScript.
JavaScript Dictionary Access
With JavaScript, key/value pairs can be accessed directly from a dictionary object either through the indexer or as a property of the object.
var dict = {FirstName: "Chris", "one": 1, 1: "some value"};
// using indexer
var name = dict["FirstName"];
// as property
var name = dict.FirstName;
In other words, in JavaScript, one can use dict.x, dict['x'], or dict[y] (where y = 'x') interchangeably.
Python Dictionary
In Python, obj.attr notation reads the attributes of an object, but it does not give access to the entries of a dictionary.
In the dictionary, you can get the value using the following methods:
dict = {"Name": "Chris", "Age": 25}
dict['Name']
dict[x] where x = 'Name'
dict.get('Name', default) or dict.get(x, default)
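The difference can be checked directly. The snippet below is a minimal sketch with made-up values; the variable names are ours and not part of the original project. It shows the three access forms succeeding and attribute access failing on a plain dict.

```python
# Plain dict: index and .get() access work; attribute access does not.
person = {"Name": "Chris", "Age": 25}

key = "Name"
by_literal = person["Name"]                 # index with a literal key
by_variable = person[key]                   # index with a key held in a variable
with_default = person.get("Email", "n/a")   # missing key falls back to the default

try:
    person.Name                             # keys are not exposed as attributes
    attribute_access_works = True
except AttributeError:
    attribute_access_works = False

print(by_literal, by_variable, with_default, attribute_access_works)
```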
Web API and Configuration Files
When using Python, almost all external documents arrive in one of these forms and are converted into dictionaries: JSON/YAML configuration files, messages exchanged via Web API, or AWS Lambda events. XML is sometimes used as well.
AWS SDK
Our team often has to work with deeply nested data, such as responses from the AWS SDK or the event parameter of a Lambda function handler.
{
    'Buckets': [{
        'Name': 'string',
        'CreationDate': datetime(2015, 1, 1)
    }],
    'Owner': {
        'DisplayName': 'string',
        'ID': 'string'
    }
}
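To make the cost of nesting concrete, here is a small sketch that reads two values out of a response shaped like the one above. The response contents are invented placeholders, not real AWS output.

```python
from datetime import datetime

# Hypothetical response with the same shape as the structure above.
response = {
    "Buckets": [{"Name": "example-bucket", "CreationDate": datetime(2015, 1, 1)}],
    "Owner": {"DisplayName": "example-owner", "ID": "example-id"},
}

# Every level of nesting costs another pair of brackets and quotes.
first_bucket_name = response["Buckets"][0]["Name"]
owner_name = response["Owner"]["DisplayName"]
print(first_bucket_name, owner_name)

# The notation this article aims for would read instead:
#   response.Buckets[0].Name
#   response.Owner.DisplayName
```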
Code Write/Read Speed Optimization
The problem is work efficiency. JavaScript's dot notation adds a single character per access, whereas Python's index notation adds four (two brackets and two quotes), so dot notation carries roughly 75% less writing and reading overhead.
In order to provide non-trivial access to attributes in Python, one has to implement two magic methods: __getattr__ and __setattr__.
Based on the discussion above, we need to extend the behavior of the existing dict class with these two magic methods. The adapter design pattern accomplishes this task. There are two options to consider: Object Adapter or Class Adapter.
Applying the Object Adapter design pattern means wrapping the original dict object with an external one and implementing the required magic methods.
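As an illustration only, an object adapter could look like the sketch below. The class name DictWrapper and its layout are our assumptions for this example and do not come from the jdict code base.

```python
from typing import Any


class DictWrapper:
    """Hypothetical object adapter: holds a dict and forwards attribute access to it."""

    def __init__(self, data: dict) -> None:
        # Bypass our own __setattr__ so the wrapped dict is stored on the instance.
        object.__setattr__(self, "_data", data)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        # Attribute assignment writes through to the wrapped dict.
        self._data[name] = value

    def __getitem__(self, key: Any) -> Any:
        return self._data[key]


original = {"Name": "Chris"}
wrapped = DictWrapper(original)
wrapped.Age = 25
print(wrapped.Name, original["Age"], isinstance(wrapped, dict))
```

Note that the wrapper is not itself a dict, so code that checks isinstance(obj, dict) or relies on the full dictionary interface would need further forwarding methods.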
Python collections.abc
One possibility is to implement the Mapping and MutableMapping abstractions from the collections.abc module, then add the __getattr__ and __setattr__ magic methods to them. Indeed, that was how the initial version of jdict was implemented.
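For readers unfamiliar with collections.abc, the following is a rough sketch of what such an implementation involves. It is not the original jdict code; the class name MappingAdapter and the internal _store attribute are assumptions made for this example.

```python
from collections.abc import MutableMapping
from typing import Any, Iterator


class MappingAdapter(MutableMapping):
    """Hypothetical sketch of a MutableMapping-based adapter."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        # Bypass our own __setattr__ when storing the backing dict.
        object.__setattr__(self, "_store", dict(*args, **kwargs))

    # The five abstract methods that MutableMapping requires.
    def __getitem__(self, key: Any) -> Any:
        return self._store[key]

    def __setitem__(self, key: Any, value: Any) -> None:
        self._store[key] = value

    def __delitem__(self, key: Any) -> None:
        del self._store[key]

    def __iter__(self) -> Iterator[Any]:
        return iter(self._store)

    def __len__(self) -> int:
        return len(self._store)

    # Attribute access layered on top of the mapping interface.
    def __getattr__(self, name: str) -> Any:
        try:
            return self._store[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        self._store[name] = value


m = MappingAdapter(a=1)
m.b = 2
print(m.a, m["b"], len(m), isinstance(m, dict))
```

Every access passes through a Python-level method before reaching the backing dict, and the result is still not a dict subclass.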
This method turned out to be heavyweight and inefficient:
UserDict
UserDict is another possible form of Object Adapter for a Python dictionary. In this case, it comes from the Python standard library.
Using this option does not offer any significant advantage, since:
Named Tuples
Another idea was to make the dictionary behave like named tuples, which support attribute-level access.
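A quick sketch of the named tuple idea, using the standard collections.namedtuple and invented values:

```python
from collections import namedtuple

owner = {"DisplayName": "example-owner", "ID": "example-id"}

# Build a tuple type whose fields are the dictionary's keys.
Owner = namedtuple("Owner", owner.keys())
owner_nt = Owner(**owner)
display_name = owner_nt.DisplayName     # attribute access works

try:
    owner_nt.ID = "other"               # named tuples are immutable
    mutable = True
except AttributeError:
    mutable = False

print(display_name, mutable)
```

Two properties are visible even in this small example: every key must be a valid Python identifier, and the resulting object cannot be modified in place.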
This approach also turned out to be ineffective:
After completing our research, we came to the conclusion that applying the Class Adapter design pattern has the best potential.
The class adapter uses inheritance: it extends the base class and supplies additional functionality on top of it.
This is how our Class Adapter code looks:
from typing import Any
from copy import deepcopy
import json

class jdict(dict):
    """
    The class gives access to the dictionary through the attribute name.
    """
    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails, so dict methods still work.
        try:
            return self.__getitem__(name)
        except KeyError:
            raise AttributeError(name + ' not in dict')

    def __setattr__(self, key: str, value: Any) -> None:
        # Attribute assignment is stored as a dictionary entry.
        self.__setitem__(key, value)
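A short usage sketch follows. It restates the two methods above in condensed form so that it runs on its own; the keys and values are invented for illustration.

```python
from typing import Any


class jdict(dict):
    """Condensed restatement of the adapter above, so this snippet runs standalone."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name + " not in dict") from None

    def __setattr__(self, key: str, value: Any) -> None:
        self[key] = value


config = jdict(region="us-east-1")
config.timeout = 30                       # stored as a key, not as an instance attribute
print(config.region, config["timeout"])   # both notations read the same data
print(isinstance(config, dict))           # still a real dict for code that expects one
print(hasattr(config, "missing"))         # raising AttributeError keeps hasattr() working
```

Raising AttributeError rather than letting KeyError escape matters because hasattr() and getattr() with a default only catch AttributeError.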
__deepcopy__
    def __deepcopy__(self, memo):
        return jdict((k, deepcopy(v, memo)) for k, v in self.items())
We also added the __deepcopy__ method to the adapter. Without this magic method, calling deepcopy() on a jdict object will produce a dict object, thus losing the advantage of attribute-level access.
from caios.jdict import jdict
import copy

py_dict = dict(a=[1, 2, 3], b=7)
j_dict = jdict(a=[1, 2, 3], b=7)
py_copy = copy.deepcopy(py_dict)
j_copy = copy.deepcopy(j_dict)

print(type(py_copy))
# <class 'dict'>
print(type(j_copy))
# <class 'caios.jdict.jdict.jdict'>
While applying the Class Adapter design pattern turned out to be the optimal starting point, it still left open the question of how to deal with nested data structures. In other words, what should be done when a jdict contains another, plain dict?
In order to solve this problem, we need to consider separately JSON object deserialization and explicit creation of a dict somewhere in the underlying SDK.
JSON Decoding
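When working with data that we receive from external sources in JSON format, the following translations are performed by default when decoding in Python: object to dict, array to list, string to str, number to int or float, true and false to True and False, and null to None.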
An object_pairs_hook, if specified, will be called with the result of every JSON object decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders. If object_hook is also defined, the object_pairs_hook takes priority.
Thus, we utilize this hook in order to create jdict instead of dict during JSON decoding. This approach covers about 80% of the cases we have to deal with in practice.
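The hook can be tried with the standard json module. The sketch below uses a condensed stand-in for jdict so that it runs without the caios package, and the JSON text is an invented sample.

```python
import json
from typing import Any


class jdict(dict):
    """Condensed stand-in for the jdict adapter, so this snippet runs standalone."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name + " not in dict") from None

    def __setattr__(self, key: str, value: Any) -> None:
        self[key] = value


text = (
    '{"Owner": {"DisplayName": "example-owner", "ID": "example-id"},'
    ' "Buckets": [{"Name": "example-bucket"}]}'
)

# The hook receives a list of (key, value) pairs for every JSON object,
# including nested ones, so every level is built as a jdict.
data = json.loads(text, object_pairs_hook=jdict)
print(data.Owner.DisplayName, data.Buckets[0].Name)
print(type(data.Owner).__name__)
```

Because dict accepts an iterable of key/value pairs, jdict can be passed as the hook directly, with no wrapper function.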
Botocore Patch
The object pairs hook mentioned above, however, does not help with the Boto3 SDK. The reason for this is that some AWS service APIs, such as S3, return XML instead of JSON, and the results are parsed by the BaseXMLResponseParser, which creates and populates the dict object directly.
Structure of Python
Since the JSON hook does not help in this case, we need to look at automatically rewriting Python code before it is compiled.
To understand how Python works and how we can solve this problem, let's look at the full path of the program from source code to execution.
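The standard library exposes each stage of that path, which makes it easy to inspect. The following is a minimal sketch using only built-in functions and the ast module; the one-line source string is an invented example.

```python
import ast

source = "settings = {'debug': True}"

tree = ast.parse(source)                    # source text -> abstract syntax tree
dict_node = tree.body[0].value              # the Dict node for the literal
print(ast.dump(dict_node))

code = compile(tree, "<example>", "exec")   # AST -> code object (bytecode)
namespace = {}
exec(code, namespace)                       # bytecode -> execution
print(namespace["settings"])
```

Because compile() accepts an AST as well as source text, the tree can be modified between the parse and compile steps.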
Abstract Syntax Tree (AST)
Given that path, the place to intervene is the AST. By traversing the AST, we replace every regular dictionary literal with a jdict. Thus, the Boto3 SDK will return the jdict, as is required.
Below is the code of the class that walks through the abstract syntax tree and changes the Python dictionary to jdict.
import ast
from typing import Any

class jdictTransformer(ast.NodeTransformer):
    """
    The visitor class of the node that traverses the abstract syntax tree and calls the visitor function for each node found. Inherits from class NodeTransformer.
    """
    def visit_Module(self, node: Any) -> Any:
        node = self.generic_visit(node)
        # Prepend "from caios.jdict.jdict import jdict" so the rewritten calls resolve.
        import_node = ast.ImportFrom(module='caios.jdict.jdict',
                                     names=[ast.alias(name='jdict')],
                                     level=0)
        node.body.insert(0, import_node)
        return node

    def visit_Dict(self, node: Any) -> Any:
        # Visit children first so nested dict literals are wrapped as well.
        node = self.generic_visit(node)
        name_node = ast.Name(id='jdict', ctx=ast.Load())
        # Replace the literal {...} with the call jdict({...}).
        new_node = ast.Call(func=name_node, args=[node], keywords=[])
        return new_node
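To see the rewriting in action without botocore, here is a small self-contained sketch. It makes several assumptions that differ from the class above: jdict is a condensed stand-in, the transformer (named DictToJdict here) skips the import step and expects jdict to be supplied in the execution namespace, and the transform helper is hypothetical.

```python
import ast
from typing import Any


class jdict(dict):
    """Condensed stand-in for caios.jdict.jdict, so this snippet runs standalone."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name + " not in dict") from None


class DictToJdict(ast.NodeTransformer):
    """Simplified variant of the transformer above: wraps dict literals, adds no import."""

    def visit_Dict(self, node: ast.Dict) -> ast.AST:
        node = self.generic_visit(node)     # rewrite nested dict literals first
        name_node = ast.Name(id="jdict", ctx=ast.Load())
        return ast.Call(func=name_node, args=[node], keywords=[])


def transform(src: str) -> ast.Module:
    """Hypothetical helper: parse, rewrite, and fill in missing line information."""
    tree = DictToJdict().visit(ast.parse(src))
    return ast.fix_missing_locations(tree)


source = "def parse():\n    return {'Owner': {'ID': 'example-id'}}\n"
namespace = {"jdict": jdict}                # supply jdict directly instead of importing it
exec(compile(transform(source), "<example>", "exec"), namespace)

result = namespace["parse"]()
print(type(result).__name__, result.Owner.ID)
```

Only dict literals written as {...} are Dict nodes in the AST. Dictionaries built with dict(...) calls or dict comprehensions are different node types and are left untouched by a visit_Dict method.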
Patch Module
Using the AST, we created a patch for the botocore module that converts XML responses to jdict at runtime:
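import sys

def patch_module(module: str) -> None:
    # Re-execute an already imported module from its AST-rewritten source.
    parsers = sys.modules[module]
    filename = parsers.__dict__['__file__']
    with open(filename) as source_file:
        src = source_file.read()
    # transform() is not shown in this article; presumably it parses src and applies jdictTransformer.
    inlined = transform(src)
    code = compile(inlined, filename, 'exec')
    exec(code, vars(parsers))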
In this case, we are patching the botocore parsers file.
import boto3
import caios.jdict
caios.jdict.patch_module('botocore.parsers')
There are several limitations to the method above:
To Be Pursued
At the moment, our program does not use configuration files such as YAML (we do not need them yet). It also does not support CSV or tables. We are currently developing a program that will work with AWS tables.
While working on this project, we did not discover any suitable third-party libraries to use. At the time of final writing for this article, however, I did encounter several possibilities, namely:
In our project, we conceivably could use any of these options. All three are based on the idea of creating an adapter and overriding the dictionary functions in it. In addition, some of them add functionality that is not required for our work.