Python 3.7 is in Beta! It’s time to get testing… is it any faster?
Here are the major speed boosts you’ll get with Python 3.7 versus 3.6
Warning: some of the topics in this article are quite detailed and beyond the level I normally blog at. If you don’t know some of the terms or meanings, just download and explore the examples — keep reading and playing!
Time for some go faster stripes..
The title of this change is “Speedup method calls 1.2x”, which is a bit misleading..
There are many ways to change CPython, either by modifying the execution of the Opcode, or by adding new Opcodes. Adding new Opcodes requires a lot of discussion and testing and this change introduces new Opcodes. Opcodes are selected by the compilation process in CPython. Once your code is converted into an Abstract-Syntax-Tree, the compiler explores each branch and converts it to Opcodes. The execution of your code goes through the Opcodes in a massive switch statement inside a loop and calls different C-functions for each opcode.
For reference, Python 3.6 has 3 Opcodes for calling functions. All of these were either added or modified in Python 3.6.
CALL_FUNCTION
,CALL_FUNCTION_KW
,CALL_FUNCTION_EX
Python 3.7 adds 2 new Opcodes, [LOAD_METHOD](https://github.com/python/cpython/blob/master/Python/ceval.c#L3021-L3105)
and [CALL_METHOD](https://github.com/python/cpython/blob/master/Python/ceval.c#L3021-L3105)
for when the compiler sees x.method(...)
it uses these new Opcodes.
As an example calling 3 functions with different signatures:
Running this on Python 3.6 and Python 3.7 we can see no change in the resulting code, or performance.
Another example with bound methods (ie those belonging to an instance of a class),
The results of this show:
LOAD_METHOD
opcode replaces loading bound methods as attributes and just calling them as normal functions. Remember, LOAD_METHOD
and CALL_METHOD
are faster than CALL_FUNCTION
for instance methods.
LOAD_METHOD
replaces LOAD_ATTR
, which is essentially getting the BoundMethod instance on the object instance. [LOAD_METHOD](https://github.com/python/cpython/blob/master/Objects/object.c#L1074-L1155)
is a copy of the logic in LOAD_ATTR
but better optimised when the method hasn’t been overridden and it has positional arguments.
Coming out of this, you might have some questions
No, because this speed boost is to remove object-related slow-downs
Keyword arguments require special treatment in the execution loop because there is no equivalent in C (which CPython is written in), some extra code has to compile 2 tuples to pass to the method.
Variable arguments, whether positional or keyword also require special treatment.
This change should encourage you in class-design to follow the DRY (don’t repeat yourself) principle and add private methods that reduce duplication of logic across multiple public methods. Prior to 3.7, the performance hit would have been a strong consideration and copying+pasting code was an accepted practice where speed was required.
In future, we might see more scenarios undergo similar treatment.
Some unicode characters have an unfortunate issue when scanning a string for occurrences using str.find(x)
, seeing up to 25x slow down.
$ ./python -m perf timeit -s 's = "一丁丂七丄丅丆万丈三上下丌不与丏丐丑丒专且丕世丗丘丙业丛东丝丞丟丠両丢丣两严並丧丨丩个丫丬中丮丯丰丱串丳临丵丶丷丸丹为主丼丽举丿乀乁乂乃乄久乆乇么义乊之乌乍乐乑乒乓乔乕乖乗乘乙乚乛乜九乞也习乡乢乣乤乥书乧乨乩乪乫乬乭乮乯买乱乲乳乴乵乶乷乸乹乺乻乼乽乾乿亀亁亂亃亄亅了亇予争 亊事二亍于亏亐云互亓五井亖亗亘亙亚些亜亝亞亟亠亡亢亣交亥亦产亨亩亪享京亭亮亯亰亱亲亳亴亵亶亷亸亹人亻亼亽亾亿什仁仂仃仄仅仆仇仈仉今介仌仍从仏仐仑仒仓仔仕他仗付仙仚仛仜 仝仞仟仠仡仢代令以仦仧仨仩仪仫们仭仮仯仰仱仲仳仴仵件价仸仹仺任仼份仾仿"*100' -- 's.find("乎")'
Unpatched: Median +- std dev: 761 us +- 108 usPatched: Median +- std dev: 117 us +- 9 us
In Python 3.7, the expected Unicode code-point size is no longer hard-coded and the methods are optimised long (mostly unusual) characters.
These are still slower, but now 3x slower than ASCII characters instead of 25x!
The fwalk
function in the os
module (only in Python 3) is a directory-tree generator.
It behaves exactly like walk(), except that it yields a 4-tuple response (dirpath, dirnames, filenames, dirfd)
The change was to modify the implementation to use the scandir
method instead of listdir
, which is Operating-System optimised and much faster.
In the regular-expression module (re
) there is a method compile
which compiles a regular-expression string and an optional set of flags. These flags can be RegEx flags, passed to the RegEx library.
A change was made in Python 3.6 which slowed down this call when flags were passed which were integers. Python 3.7 “fixes” the slowdown but is still not as fast as Python 3.5
* Faster than 3.6
Per change change log
Matching and searching case-insensitive regular expressions is much slower than matching and searching case-sensitive regular expressions. Case-insensitivity requires converting every character in input string to lower case and disables some optimizations. But there are only 2669 cased characters (52 in ASCII mode). For all other characters in the pattern we can use case-sensitive matching.
The speed improvement is significant, if you’re matching ASCII characters you can see up to a 20x improvements in matching time since it’s now doing a lookup instead of running lower()
over each character.
Check out my new course on Pluralsight for moving from Python 2 to 3.