Anthony Shaw

@anthonypjshaw

5 speed improvements in Python 3.7

February 11th 2018
Python 3.7 is in Beta! It’s time to get testing… is it any faster?

Here are the major speed boosts you’ll get with Python 3.7 versus 3.6.

Warning: some of the topics in this article are quite detailed and beyond the level I normally blog at. If you don’t know some of the terms or meanings, just download and explore the examples — keep reading and playing!
Time for some go-faster stripes.

1. Calling methods faster (maybe)

The title of this change is “Speedup method calls 1.2x”, which is a bit misleading.

There are several ways to change how CPython runs your code, including modifying the execution of an existing Opcode or adding new Opcodes. Adding new Opcodes requires a lot of discussion and testing, and this change introduces new Opcodes. Opcodes are selected by the compilation process in CPython: once your code is converted into an Abstract Syntax Tree, the compiler explores each branch and converts it to Opcodes. At run time, the interpreter steps through those Opcodes in a massive switch statement inside a loop and calls different C functions for each Opcode.
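To make the “switch statement inside a loop” idea concrete, here is a toy sketch in Python. It is not CPython’s real evaluation loop (which is written in C); the run function and the tuple-based program format are made up for illustration, and only the opcode names are borrowed from CPython bytecode of that era.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []
    for opcode, arg in program:          # the "loop"
        if opcode == "LOAD_CONST":       # the "switch": one branch per opcode
            stack.append(arg)
        elif opcode == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unknown opcode: " + opcode)
    return None


# Roughly what "return 2 + 3" could look like for this toy machine.
program = [
    ("LOAD_CONST", 2),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
result = run(program)
print(result)
```

Adding a new Opcode in this picture means adding a new branch to the loop and teaching the compiler when to emit it.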

For reference, Python 3.6 has 3 Opcodes for calling functions. All of these were either added or modified in Python 3.6.

  • For calling positional argument-only functions: CALL_FUNCTION,
  • For calling positional and keyword functions: CALL_FUNCTION_KW,
  • For calling variable positional or keyword functions: CALL_FUNCTION_EX

Python 3.7 adds 2 new Opcodes, LOAD_METHOD and CALL_METHOD. When the compiler sees x.method(...), it uses these new Opcodes.

As an example, calling 3 functions with different signatures:
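A minimal sketch of such an example, using the standard dis module to list the call-related Opcodes the compiler emits. The function names (positional, keyword, variable, caller, call_opcodes) are hypothetical and chosen only for illustration.

```python
import dis


def positional(a, b):
    return a + b


def keyword(a, b=0):
    return a + b


def variable(*args, **kwargs):
    return len(args) + len(kwargs)


def caller():
    positional(1, 2)               # positional arguments only
    keyword(1, b=2)                # positional and keyword arguments
    variable(*(1, 2), **{"c": 3})  # unpacked (variable) arguments


def call_opcodes(func):
    """Return the names of call-related opcodes in func's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func) if "CALL" in ins.opname]


names = call_opcodes(caller)
print(names)
```

The exact names printed depend on the interpreter: the three Opcodes listed above are what Python 3.6 and 3.7 use, and newer CPython releases have renamed or reorganised their call Opcodes.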

Running this on Python 3.6 and Python 3.7, we can see no change in the resulting Opcodes or in performance.

Another example with bound methods (i.e. those belonging to an instance of a class):
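A minimal sketch of a bound-method example, again using dis. The Greeter class, its methods, and the load_and_call_opcodes helper are hypothetical names used only for illustration.

```python
import dis


class Greeter:
    def no_args(self):
        return "hi"

    def positional(self, name):
        return "hi " + name

    def keyword(self, name="world"):
        return "hi " + name


def caller(greeter):
    greeter.no_args()          # bound method, no arguments
    greeter.positional("a")    # bound method, positional argument
    greeter.keyword(name="b")  # bound method, keyword argument


def load_and_call_opcodes(func):
    """Return attribute/method-load and call opcode names from func's bytecode."""
    interesting = ("LOAD_ATTR", "LOAD_METHOD")
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname.startswith(interesting) or "CALL" in ins.opname
    ]


method_names = load_and_call_opcodes(caller)
print(method_names)
```

The output varies by interpreter version, so compare the list printed under 3.6 with the one printed under 3.7 rather than relying on fixed names.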

The results of this show:

  • The new LOAD_METHOD opcode replaces loading bound methods as attributes and just calling them as normal functions. Remember, LOAD_METHOD and CALL_METHOD are faster than CALL_FUNCTION for instance methods.
  • Bound methods with keyword arguments behave the same as in Python 3.6, so you won’t get any performance change.
  • Bound methods with no arguments are now faster.

LOAD_METHOD replaces LOAD_ATTR, which essentially gets the BoundMethod instance from the object instance. LOAD_METHOD is a copy of the logic in LOAD_ATTR, but better optimised for the case where the method hasn’t been overridden and the call uses positional arguments.
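The work that plain attribute access does can be seen from Python itself. The sketch below (with a hypothetical Greeter class) shows that, in CPython, each attribute access builds a new bound-method object, and that an instance attribute can shadow, or “override”, the method defined on the class. It illustrates the lookup behaviour only, not the Opcode internals.

```python
class Greeter:
    def hello(self):
        return "hello"


g = Greeter()

# Each attribute access creates a separate bound-method object...
first = g.hello
second = g.hello
distinct_objects = first is not second

# ...although the two objects compare equal, since they wrap the same
# function and the same instance.
equal_methods = first == second

# An instance attribute with the same name shadows the class's method.
g.hello = lambda: "overridden"
shadowed = g.hello()

print(distinct_objects, equal_methods, shadowed)
```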

Coming out of this, you might have some questions.

So if I put my functions into a class, will that make them faster?

No, because this speed boost only removes object-related slow-downs.

What is it about Keyword and Variable Arguments that they get special treatment?

Keyword arguments require special treatment in the execution loop because there is no equivalent in C (which CPython is written in), so some extra code has to build 2 tuples to pass to the method.

Variable arguments, whether positional or keyword, also require special treatment.

So many caveats! I like keyword arguments; will I see any difference?

This change should encourage you to follow the DRY (don’t repeat yourself) principle in class design and add private methods that reduce duplication of logic across multiple public methods. Prior to 3.7, the performance hit would have been a strong consideration, and copying and pasting code was an accepted practice where speed was required.

In future, we might see more scenarios undergo similar treatment.

2. str.find() is faster for some characters

Searching a string for some Unicode characters with str.find(x) had an unfortunate issue: it could be up to 25x slower than searching for ASCII characters.

$ ./python -m perf timeit -s 's = "一丁丂七丄丅丆万丈三上下丌不与丏丐丑丒专且丕世丗丘丙业丛东丝丞丟丠両丢丣两严並丧丨丩个丫丬中丮丯丰丱串丳临丵丶丷丸丹为主丼丽举丿乀乁乂乃乄久乆乇么义乊之乌乍乐乑乒乓乔乕乖乗乘乙乚乛乜九乞也习乡乢乣乤乥书乧乨乩乪乫乬乭乮乯买乱乲乳乴乵乶乷乸乹乺乻乼乽乾乿亀亁亂亃亄亅了亇予争 亊事二亍于亏亐云互亓五井亖亗亘亙亚些亜亝亞亟亠亡亢亣交亥亦产亨亩亪享京亭亮亯亰亱亲亳亴亵亶亷亸亹人亻亼亽亾亿什仁仂仃仄仅仆仇仈仉今介仌仍从仏仐仑仒仓仔仕他仗付仙仚仛仜 仝仞仟仠仡仢代令以仦仧仨仩仪仫们仭仮仯仰仱仲仳仴仵件价仸仹仺任仼份仾仿"*100' -- 's.find("乎")'

Unpatched: Median +- std dev: 761 us +- 108 us
Patched: Median +- std dev: 117 us +- 9 us

In Python 3.7, the expected Unicode code-point size is no longer hard-coded and the methods are optimised for long (mostly unusual) characters.

These are still slower, but now 3x slower than ASCII characters instead of 25x!
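If you would rather measure this from a script than from the perf command line, a rough sketch with the standard timeit module could look like this. The test data is an assumption (a block of CJK characters with “乎” left out, similar in spirit to the string above), and time_find is a hypothetical helper; the numbers depend entirely on your machine and Python version.

```python
import timeit

# Assumed test data: the characters U+4E00..U+4EFF without "乎", repeated 100 times.
cjk_text = "".join(chr(cp) for cp in range(0x4E00, 0x4F00) if chr(cp) != "乎") * 100
ascii_text = "a" * len(cjk_text)


def time_find(text, needle, number=1000):
    """Return the total seconds taken by `number` calls of text.find(needle)."""
    return timeit.timeit(lambda: text.find(needle), number=number)


# Both searches fail, so find() has to scan the whole string each time.
cjk_seconds = time_find(cjk_text, "乎")
ascii_seconds = time_find(ascii_text, "b")

print("CJK search:   %.6f s" % cjk_seconds)
print("ASCII search: %.6f s" % ascii_seconds)
```

Run the same script under 3.6 and 3.7 and compare the ratio between the two timings.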

3. os.fwalk is 2x faster

The fwalk function in the os module (only in Python 3) is a directory-tree generator.

It behaves exactly like walk(), except that it yields a 4-tuple response (dirpath, dirnames, filenames, dirfd).

The change was to modify the implementation to use scandir instead of listdir; scandir is Operating-System optimised and much faster.
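As a quick illustration of the 4-tuple, here is a small sketch that sums file sizes under a directory, using dirfd so that each stat call is made relative to the already-open directory. The total_file_size and demo functions are hypothetical names; os.fwalk is only available on Unix-like systems, so the demo is guarded.

```python
import os
import tempfile


def total_file_size(top):
    """Sum the sizes of all files under top, using os.fwalk's directory fds."""
    total = 0
    for dirpath, dirnames, filenames, dirfd in os.fwalk(top):
        for name in filenames:
            # dirfd is an open descriptor for dirpath, so stat can be relative to it.
            total += os.stat(name, dir_fd=dirfd).st_size
    return total


def demo():
    """Build a throwaway tree containing one 5-byte file and measure it."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "sub"))
        with open(os.path.join(top, "sub", "data.bin"), "wb") as handle:
            handle.write(b"12345")
        return total_file_size(top)


if hasattr(os, "fwalk"):
    print(demo())
```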

4. Regular expressions are faster*

In the regular-expression module (re) there is a function, compile, which takes a regular-expression string and an optional set of flags. The flags, such as re.IGNORECASE, are passed on to the RegEx library.

A change was made in Python 3.6 which slowed down this call when integer flags were passed. Python 3.7 “fixes” the slowdown, but is still not as fast as Python 3.5.

* Faster than 3.6
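A rough way to check this on your own interpreters is to time re.compile with and without flags. This is only a sketch: PATTERN and time_compile are hypothetical names, and because the re module caches compiled patterns, the loop mostly measures the overhead of the repeated call rather than a full compilation.

```python
import re
import timeit

PATTERN = r"\d+"


def time_compile(flags=0, number=10000):
    """Return the total seconds taken by `number` calls of re.compile(PATTERN, flags)."""
    return timeit.timeit(lambda: re.compile(PATTERN, flags), number=number)


plain_seconds = time_compile()
flagged_seconds = time_compile(re.IGNORECASE | re.MULTILINE)

print("no flags:   %.6f s" % plain_seconds)
print("with flags: %.6f s" % flagged_seconds)
```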

5. Regular expressions are faster for case-insensitive matching

Per the change log:

Matching and searching case-insensitive regular expressions is much slower than matching and searching case-sensitive regular expressions. Case-insensitivity requires converting every character in input string to lower case and disables some optimizations. But there are only 2669 cased characters (52 in ASCII mode). For all other characters in the pattern we can use case-sensitive matching.

The speed improvement is significant. If you’re matching ASCII characters, you can see up to a 20x improvement in matching time, since it’s now doing a lookup instead of running lower() over each character.
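To try this yourself, the sketch below times the same search with and without re.IGNORECASE. The pattern is made of digits and hyphens, which have no upper or lower case, so it is the kind of pattern the quoted change describes. The text, the pattern and the time_search helper are all assumptions made for illustration, and the timings depend on your machine.

```python
import re
import timeit

# Assumed sample data: a long haystack with the match at the very end.
text = "python 3.7 beta " * 1000 + "2018-02-11"

sensitive = re.compile(r"2018-02-11")
insensitive = re.compile(r"2018-02-11", re.IGNORECASE)


def time_search(compiled, number=1000):
    """Return the total seconds taken by `number` searches of text."""
    return timeit.timeit(lambda: compiled.search(text), number=number)


sensitive_seconds = time_search(sensitive)
insensitive_seconds = time_search(insensitive)

print("case-sensitive:   %.6f s" % sensitive_seconds)
print("case-insensitive: %.6f s" % insensitive_seconds)
```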

Still stuck on Python 2?

Check out my new course on Pluralsight for moving from Python 2 to 3.
