This paper is available on arxiv under CC 4.0 license.
Authors:
(1) JOANNA C. S. SANTOS, University of Notre Dame, USA;
(2) MEHDI MIRAKHORLI, University of Hawaii at Manoa, USA;
(3) ALI SHOKRI, Virginia Tech, USA.
Seneca: Taint-Based Call Graph Construction for Object Deserialization
Conclusions, Acknowledgment, and References
This section discusses relevant works related to object deserialization and call graph construction.
Call graphs are a core data structure for multiple analyses. Thus, previous works focused on devising algorithms for their construction. Among these works, we have CHA [Dean et al. 1995] and RTA [Bacon and Sweeney 1996], which are two well-known algorithms that over-approximates possible call paths by relying on methods’ signatures. Since these algorithms are overly conservative, multiple works discussed frameworks to make them more precise [Grove and Chambers 2001; Grove et al. 1997; Tip and Palsberg 2000]. Moreover, previous research also focused on creating application-only call graphs, that disregard unnecessary library classes, while keeping on the graph the nodes and edges that are important for the underlying analysis [Ali and Lhoták 2012]. In this paper, we focused on solving the challenge of computing call graphs that are sound concerning object serialization and deserialization.
Previous research on static analysis also explored the challenges involving supporting re"ection features [Bodden et al. 2011; Li et al. 2014, 2019; Smaragdakis et al. 2015], dynamic proxies [Fourtounis et al. 2018], enterprise frameworks [Antoniadis et al. 2020] and RMI-based programs [Sharp and Rountev 2006]. These approaches involve making assumptions when performing the analysis, to create analyses that are not overly imprecise. Unlike these prior works, however, we focused on object deserialization that has its own unique challenges, as described in Section 2.3.
Multiple characteristics of call graphs (e.g., precision, soundness, performance, and recall) have been widely studied in the past [Ali et al. 2019; Kummita et al. 2021; Murphy et al. 1998; Sui et al. 2020]. Murphy et al. [Murphy et al. 1998] studied multiple call graph construction approaches for C programs, finding discrepancies among the generated call graphs across different approaches. Sui et al. [Sui et al. 2018] focused on the support for dynamic language features, aiming to create a benchmark for dynamic features for Java.
There is a line of research that explored call graph’s soundness of Java (or JVM-like) programs [Ali et al. 2019; Reif et al. 2019, 2018]. In particular, recent empirical studies [Reif et al. 2019, 2018] show that although serialization-related features are widely used, they are not well-supported in existing approaches. Thus, we built an approach to enhance existing points-to analysis to support the construction of sound call graphs with respect to serialization-related features.
Many works explored the problem of performing pointer analysis of programs [Bastani et al. 2019; Feng et al. 2015; Heintze and Tardieu 2001; Hind 2001; Kastrinis and Smaragdakis 2013; Lhoták and Hendren 2006; Rountev et al. 2001; Smaragdakis and Kastrinis 2018]. These approaches focus on computing over- or under-approximations to improve one or more aspects of the analysis, such as its soundness, precision, performance, and scalability. Existing pointer analysis approaches make the sets finite such that the problem can be algorithmically solvable. In this paper, however, we focus on aiding points to analysis to soundly handle serialization-related features in a program, which are currently not well-supported because it relies on reflection [Reif et al. 2018].
More recently there were approaches published that aimed at detecting untrusted object deserialization for PHP [Koutroumpouchos et al. 2019; Shahriar and Haddad 2016] and .NET [Shcherbakov and Balliu 2021]. Shcherbakov and Balliu [Shcherbakov and Balliu 2021] described an approach to semi-automatically detect and exploit object injection vulnerabilities .NET applications. It relies on existing publicly available gadgets to perform the detection and exploitation. Koutroumpouchos et al. described ObjectMap [Koutroumpouchos et al. 2019] which is tool that performs black-box analysis of Web applications to pinpoint potential insecure deserialization vulnerabilities. It works by inserting payloads into the parameters of HTTP GET/POST requests and then monitoring the target web application for errors to infer whether the application is vulnerable or not.
Recent works [Cao et al. 2023; Haken 2018; Rasheed and Dietrich 2020] focused on deserialization vulnerabilities in Java programs. Rasheed and Dietrich [Rasheed and Dietrich 2020] described a hybrid approach that first performs a static analysis of a Java program to find potential call chains that can lead to sinks, where reflective method calls are made. It then uses the results of the static analysis to perform fuzzing in order to generate malicious objects.
Unlike these prior works, we aimed to create an approach that can create sound call graphs with respect to serialization-related features. Our call graph is intended to be used by downstream client analyses, including, but not limited to, vulnerability detection.
In the past few years, there was a spike of vulnerabilities associated with deserialization of objects [Cifuentes et al. 2015]. Thus, existing works also studied vulnerabilities rooted at untrusted deserialization vulnerabilities [Dietrich et al. 2017a; Peles and Hay 2015]. Pele et al. [Peles and Hay 2015] conducted an empirical investigation of deserialization of pointers that lead to vulnerabilities in Android applications and SDKs. Dietrich et al. [Dietrich et al. 2017a] demonstrated how seemingly innocuous objects trigger vulnerabilities when deserialized, leading to denial of service attacks. In this paper, we describe an approach that could help client analyses focused on detecting instances of untrusted object deserialization.