The FAT Python project was started by Victor Stinner in October 2015 to address the issues that defeated previous attempts at "static optimizers" for Python. Victor has produced a set of changes to CPython (Python Enhancement Proposals, or "PEPs"), some example optimizations, and benchmarks. We'll explore all three levels in this article.

### Optimizations in the AST

> Not sure what the AST is? [Then read my other article first](https://hackernoon.com/modifying-the-python-language-in-7-minutes-b94b0a99ce14?gi=70b2d55c18a5).

Transforming an Abstract Syntax Tree (AST) is a more natural level at which to optimize code than the bytecode. By staying at a higher level, you can make decisions by traversing up, down and across a branch. For example, some optimizations which can be implemented with an AST optimizer:

* [Copy propagation](https://en.wikipedia.org/wiki/Copy_propagation): replace `x = 1; y = x` with `x = 1; y = 1`
* [Constant folding](https://en.wikipedia.org/wiki/Constant_folding): replace `1 + 1` with `2`
* [Dead code elimination](https://en.wikipedia.org/wiki/Dead_code_elimination)

Using **_guards_** (see the next chapter), a much wider range of optimizations becomes possible. Examples:

* Simplify iterables: replace `range(3)` with `(0, 1, 2)` when used as an iterable
* [Loop unrolling](https://en.wikipedia.org/wiki/Loop_unrolling)
* Call pure builtins: replace `len("abc")` with `3`
* Copy used builtin symbols to constants

### How does FAT work?

FAT is a set of static optimizations built on three major changes to CPython. These changes may be merged into Python 3.7 or 3.8, but for now we have to compile them by hand. Why do we need to change CPython at all? CPython doesn't have an API for optimizing the AST; the AST is not pluggable. Enter _PEP 511_.

#### PEP 511

PEP 511 is a proposal to add a hook for optimizing an AST instance, the object-oriented representation of your code. An optimizer could be generic, applying general-purpose optimizations, or more bespoke. A bespoke optimizer could target a set of domain-specific changes, e.g. NumPy or Pandas "anti-patterns", and rewrite them in the syntax tree. Instead of a static linter that merely recommends changes, the optimizer could make those changes for you. PEP 511 sets out a change to the `sys` module for registering the optimizers to use, then defines the API that optimizer instances must implement. In future, optimizers could be shipped as Python packages and shared on PyPI (pretty cool, huh?). PEP 511 also assumes that, to be effective, optimizers will need some further changes to core CPython internals. Enter PEP 509.

#### PEP 509

In Python, the builtin dict type is used by many instructions. For example, the `LOAD_GLOBAL` instruction looks up a variable in the global namespace, or in the builtins namespace (two dict lookups). Python uses dicts for the builtins namespace, the globals namespace, type namespaces, instance namespaces, and so on. The local namespace (function namespace) is usually optimized to an array, but it can be a dict too. Python is hard to optimize because almost everything is **mutable**: builtin functions, function code, global variables, local variables, … can all be modified at runtime. Implementing optimizations that respect Python semantics requires detecting when "something changes"; we will call these checks "guards". The speedup an optimization can deliver depends on how cheap its guard checks are. PEP 509 proposes adding a **private** version tag to dictionaries, so that fast guards can be implemented on namespaces.
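To make the idea of a guard concrete, here is a toy, pure-Python sketch. The class name `NamespaceGuard` is my own invention, not part of the `fat` module: the real guards are implemented in C and compare PEP 509's per-dict version tag rather than comparing values as this sketch does.

```python
# Toy illustration only -- NamespaceGuard is a made-up name, not part of the
# fat module. The real guards compare PEP 509's per-dict version tag in C,
# a single integer comparison, instead of comparing values like this.
import builtins

class NamespaceGuard:
    def __init__(self, namespace, names):
        self.namespace = namespace
        # Snapshot the objects the watched names currently refer to.
        self.snapshot = {name: namespace[name] for name in names}

    def check(self):
        # Specialized code may only run while every watched name still
        # points to the same object it did at specialization time.
        return all(self.namespace.get(name) is value
                   for name, value in self.snapshot.items())

guard = NamespaceGuard(vars(builtins), ["len"])
print(guard.check())            # True: len is untouched

original_len = builtins.len
builtins.len = lambda obj: 42   # monkey-patch the builtin
print(guard.check())            # False: the guard notices the change
builtins.len = original_len     # restore the real len
```

Roughly speaking, PEP 509 reduces the equivalent check in the patched interpreter to comparing one integer (the dict's version tag), so guards stay cheap even though they run on every call.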
So how do guards actually get attached to a function? Enter PEP 510.

#### PEP 510

PEP 510 proposes adding a _public API_ to the Python C API for attaching specialized code objects, with guards, to a function. When the function is called, the specialized code is used if nothing has changed; otherwise the original bytecode runs.

> It's simpler to see all of this in action, so let's install those PEPs and test out an optimizer.

### Getting FAT running

Compile a copy of CPython 3.6 patched with [PEP 509](https://www.python.org/dev/peps/pep-0509/), [PEP 510](https://www.python.org/dev/peps/pep-0510) and [PEP 511](https://www.python.org/dev/peps/pep-0511/). You'll need a C compiler to do this; if you don't have one already, [GCC is the easy option](https://gcc.gnu.org/install/).

```
git clone https://github.com/haypo/cpython -b fatpython fatpython
cd fatpython
./configure CFLAGS='-O0' --enable-shared
make
```

> On my Mac, I had to copy `_sysconfigdata_dm_darwin_darwin.py` from the build directory to the lib directory.

Install `fat`, an implementation of specialized functions with guards built on PEP 510.

```
git clone https://github.com/haypo/fat
cd fat
../python setup.py build
cp -v build/lib*/fat.*so ../Lib
cd ..
```

> For OS X users, use `./python.exe` instead of `./python` throughout this article.

Install `fatoptimizer`, an implementation of PEP 511 with a set of optimizations.

```
git clone https://github.com/haypo/fatoptimizer
cd Lib
ln -s ../fatoptimizer/fatoptimizer .
```

Now we have a working CPython executable with FAT, and a simple test can show the guards described in PEP 510 in action. To compare speed, run a basic timeit, timing a call to a function that returns the length of a short string.

```
$ ./python.exe -X fat -m timeit -s 'def f(): return len("abc")' 'f()'
2000000 loops, best of 5: 115 nsec per loop
```

I tested this again (removing `-X fat`) and against stock CPython 3.6 and 2.7. That's a 24% improvement over 3.6. The optimizations themselves are described in the [fatoptimizer documentation](https://fatoptimizer.readthedocs.io/en/latest/optimizations.html).

### How does it work?

If you disassemble the basic test, it looks normal: load a builtin function, call it, return the value.

```
>>> def func(): return len("abc")
...
>>> import dis
>>> dis.dis(func)
  1           0 LOAD_GLOBAL              0 (len)
              3 LOAD_CONST               1 ('abc')
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 RETURN_VALUE
```

But with fat you can access the specialized (i.e. optimized) version of that function. The specialized code was generated when the function was compiled: fatoptimizer rewrote the AST and `fat` attached the optimized code to the function, while the original bytecode is preserved.

```
>>> import fat
>>> dis.dis(fat.get_specialized(func)[0][0])
  1           0 LOAD_CONST               1 (3)
              3 RETURN_VALUE
```

When we ran timeit over our function, it was executing the specialized code, which simply returns `3` instead of calling `len`. It's easy to see why that's faster. However, in Python you can replace anything with anything, including a builtin in the global namespace. If you replace `len` with something else, the guard detects the change, the specialization is dropped, and the original bytecode is executed.

### What next for FAT?

There are also connections to the FAST_METHODCALL work that is (I think) being reintroduced in 3.7, which reduces the overhead of calling methods, a significant cost in CPython. Combining these three PEPs, we could see both an implementation of guards and a range of optimizers published on PyPI.
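As a taste of what such an optimizer does, here is a minimal constant-folding pass built on the standard library's `ast.NodeTransformer`. This is my own sketch, not fatoptimizer's code; it needs none of the patches above (a recent Python, 3.9+, is assumed for `ast.unparse`) and no guards, since it only touches literal expressions.

```python
# A minimal constant-folding sketch using only the standard library.
# fatoptimizer implements a far more complete version of this idea, plus the
# guards needed to fold things like len("abc") safely.
import ast

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first, so 1 + 2 + 3 collapses fully
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            try:
                # Evaluate the literal-only expression once, at compile time.
                value = eval(compile(ast.Expression(body=node), "<fold>", "eval"))
            except Exception:
                return node  # leave anything that fails (e.g. 1 / 0) untouched
            return ast.copy_location(ast.Constant(value=value), node)
        return node

tree = ast.parse("x = 1 + 1")
tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
print(ast.unparse(tree))  # -> x = 2
```

Folding `1 + 1` needs no guard at all; it is the `len("abc")`-style folds that rely on PEP 509/510 guards to stay correct once the builtins can change under your feet.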