This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) JOANNA C. S. SANTOS, University of Notre Dame, USA;
(2) MEHDI MIRAKHORLI, University of Hawaii at Manoa, USA;
(3) ALI SHOKRI, Virginia Tech, USA.
Seneca: Taint-Based Call Graph Construction for Object Deserialization
Multiple programming languages (e.g., Ruby, Python, PHP, and Java) allow objects to be converted into an abstract representation, a process called object serialization (or “marshalling”). The process of reconstructing an object from its underlying abstract representation is called object deserialization (or “unmarshalling”). Serialization and deserialization of objects are widely used for inter-process communication and for improving a program’s performance by saving objects to be reused later (e.g., saving machine learning models [Ten 2023]).
During object serialization/deserialization, methods from the objects’ classes may be invoked. For instance, classes’ constructors, getter/setter methods, or methods with specific signatures may be invoked when reconstructing the object. These are the callback methods of the serialization/deserialization mechanism. Each programming language has its own serialization/deserialization protocol, abstract representation, and callback methods. Java’s default serialization and deserialization mechanism is thoroughly described in its specification [Oracle 2010]. We briefly present this mechanism in the next subsection.
Java’s default Serialization API converts a snapshot of an object graph into a byte stream. During this process, only data is serialized (i.e., the object’s fields), whereas the code associated with the object’s class (i.e., its methods) resides in the classpath of the receiver [Schneider and Muñoz 2016]. All non-transient and non-static fields are serialized by default.
The classes ObjectInputStream and ObjectOutputStream can be used for deserializing and serializing an object, respectively. They can only serialize/deserialize objects whose class implements the java.io.Serializable interface. If implemented by a Serializable class, the methods listed below can be invoked by Java during object serialization and/or deserialization:
• void writeObject(ObjectOutputStream): it customizes the serialization of the object’s state.
• Object writeReplace(): this method replaces the actual object that will be written in the stream.
• void readObject(ObjectInputStream): it customizes the retrieval of an object’s state from the stream.
• void readObjectNoData(): in the exceptional situation that a receiver has a subclass in its classpath but not its superclass, this method is invoked to initialize the object’s state.
• Object readResolve(): this is the inverse of writeReplace. It allows classes to replace a specific instance that is being read from the stream.
• void validateObject(): it validates an object after it is deserialized. For this callback to be invoked, the class has to also implement the ObjectInputValidation interface and register the validator by invoking the method registerValidation from the ObjectInputStream class.
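The callbacks listed above can be observed with a small round trip. The following is a minimal sketch, not code from the paper: the class names CallbackOrderDemo and Pet and the LOG list are hypothetical and exist only to record which callbacks the JDK invokes and in what order. readObjectNoData is omitted because it only fires in the exceptional class-hierarchy mismatch described above.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

class CallbackOrderDemo {
    // Records the order in which the serialization mechanism invokes callbacks.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical class used only for illustration.
    static class Pet implements Serializable, ObjectInputValidation {
        private final String name;

        Pet(String name) { this.name = name; }

        private Object writeReplace() throws ObjectStreamException {
            LOG.add("writeReplace");
            return this; // keep the same object in the stream
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            LOG.add("writeObject");
            out.defaultWriteObject();
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            LOG.add("readObject");
            in.defaultReadObject();
            // validateObject only runs if the validator is registered here.
            in.registerValidation(this, 0);
        }

        private Object readResolve() throws ObjectStreamException {
            LOG.add("readResolve");
            return this;
        }

        @Override
        public void validateObject() throws InvalidObjectException {
            LOG.add("validateObject");
            if (name == null) throw new InvalidObjectException("name is null");
        }
    }

    // Serializes and then deserializes a Pet in memory; returns the callback order.
    static List<String> roundTrip() {
        LOG.clear();
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Pet("Rex"));
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

On a standard JDK this is expected to print [writeReplace, writeObject, readObject, readResolve, validateObject]. Note that none of these methods is called explicitly by the program; they are all reached through writeObject and readObject on the stream classes.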
Figures 1 and 2 depict the sequence of these callback method invocations. As these figures show, during serialization of an object, the callback methods writeReplace and writeObject are invoked (if they are implemented by the class of the object being serialized). Similarly, during object deserialization, four callback methods can be invoked: readObject, readObjectNoData, readResolve, and validateObject.
Listing 1 has three serializable classes[2]: Dog, Cat and Shelter. Two of these classes have serialization callback methods (lines 5-10 and 13-14). The code at lines 21-26 serializes a Shelter object s1 into a file, whose path is provided as a program argument. The code instantiates a FileOutputStream and passes the instance to an ObjectOutputStream’s constructor during its instantiation. Then, it calls writeObject(s1), which serializes s1 as a byte stream and saves it into a file. Since the object s1 has a list field (pets) that contains two objects (a Cat and a Dog instance), the callback methods of these classes are invoked.
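The serialization steps described above can be sketched as follows. This is a hypothetical reconstruction in the spirit of Listing 1, not the authors' listing: the Pet base class, the choice of which callback each class implements, the LOG list, and the helper callbacksFor are all assumptions made for illustration.

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShelterDemo {
    // Records which callbacks were triggered while writing the object graph.
    static final List<String> LOG = new ArrayList<>();

    // Assumed common base class; the paper's listing may differ.
    static class Pet implements Serializable {
        protected String name;
        Pet(String name) { this.name = name; }
    }

    static class Cat extends Pet {
        Cat(String name) { super(name); }
        private void writeObject(ObjectOutputStream out) throws IOException {
            LOG.add("Cat.writeObject");
            out.defaultWriteObject();
        }
    }

    static class Dog extends Pet {
        Dog(String name) { super(name); }
        private Object writeReplace() {
            LOG.add("Dog.writeReplace");
            return this;
        }
    }

    static class Shelter implements Serializable {
        private final List<Pet> pets;
        Shelter(List<Pet> pets) { this.pets = new ArrayList<>(pets); }
    }

    // Wraps a FileOutputStream in an ObjectOutputStream and writes s1 to it.
    static List<String> serializeToFile(Shelter s1, String path) throws IOException {
        LOG.clear();
        try (FileOutputStream fos = new FileOutputStream(path);
             ObjectOutputStream out = new ObjectOutputStream(fos)) {
            out.writeObject(s1); // also serializes every Pet reachable from s1
        }
        return new ArrayList<>(LOG);
    }

    // Helper (hypothetical): serializes a shelter holding the given pets to a
    // temporary file and returns the callbacks that were invoked.
    static List<String> callbacksFor(Pet... pets) {
        try {
            Path file = Files.createTempFile("shelter", ".ser");
            try {
                return serializeToFile(new Shelter(Arrays.asList(pets)), file.toString());
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callbacksFor(new Cat("Tom"), new Dog("Rex")));
    }
}
```

In this sketch the program only calls writeObject on the Shelter, yet the callbacks of Cat and Dog run because instances of those classes are reachable through the pets field. If the list held only a Cat, the Dog callback would not be invoked.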
2.2.1 Untrusted Object Deserialization
To illustrate how a seemingly harmless mechanism can lead to serious vulnerabilities, consider the case that the program in Listing 1 contains two more serializable classes (CacheManager and Task), as shown in Listing 2. An attacker would create a CacheManager object (cm) as shown in Figure 3. Then, the attacker serializes and encodes this malicious object (cm) into a text file and specifies it as a program argument for the main method in Listing 1. When the program reads the object from the file, it triggers the chain of method calls depicted in Figure 3. This sequence of method calls ends in an execution sink (Runtime.getRuntime().exec() on line 8 of the Task class in Listing 2).
Listing 2. Gadget classes that can be used to exploit an untrusted object deserialization vulnerability
Although deserializing this malicious object results in a ClassCastException, the malicious command is executed anyway, because the type cast check occurs after the deserialization process has taken place. As we can see from this example, classes can be specially combined to create a chain of method calls. These classes are called “gadget classes” as they are used to bootstrap a chain of method calls that will end in an execution sink.
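The ordering of the callback and the type cast can be demonstrated safely. The following is a minimal sketch under stated assumptions: Gadget and Shelter are hypothetical stand-ins, and setting a boolean flag replaces the execution sink, so nothing dangerous is run.

```java
import java.io.*;

class CastAfterDeserializationDemo {
    // Harmless stand-in for the effect of an execution sink.
    static boolean sideEffectRan = false;

    // Hypothetical gadget: its readObject callback has a side effect.
    static class Gadget implements Serializable {
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffectRan = true;
        }
    }

    // The type the program expects to read (empty here for brevity).
    static class Shelter implements Serializable { }

    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if the cast failed AND the callback had already run.
    static boolean castFailsButCallbackRuns() {
        sideEffectRan = false;
        byte[] payload = serialize(new Gadget()); // what an attacker would supply
        boolean castFailed = false;
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(payload))) {
            // readObject() finishes (and runs Gadget.readObject) before the cast.
            Shelter s = (Shelter) in.readObject();
        } catch (ClassCastException e) {
            castFailed = true;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return castFailed && sideEffectRan;
    }

    public static void main(String[] args) {
        System.out.println(castFailsButCallbackRuns());
    }
}
```

The cast to Shelter is applied to the value that readObject() returns, so by the time the ClassCastException is thrown the gadget's callback has already completed.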
From the examples shown in Section 2.2, we observe two major challenges that should be handled by a static analyzer in order to construct a sound call graph with respect to serialization-related features: (i) the callback methods that are invoked during object serialization/deserialization; and (ii) the fields within the class can be allocated in unexpected ways, and they dictate which callbacks are invoked at runtime. For instance, if the code snippet in Listing 1 had only the cat object in the list (line 22), then the calls to readResolve/writeReplace methods in Dog would not be made.
Existing pointer analysis algorithms rely on allocation instructions (i.e., new T()) within the program to infer the possible runtime types for objects [Bastani et al. 2019; Feng et al. 2015; Heintze and Tardieu 2001; Hind 2001; Kastrinis and Smaragdakis 2013; Lhoták and Hendren 2006; Rountev et al. 2001; Smaragdakis and Kastrinis 2018]. However, as we demonstrated in the examples, the allocations of objects and their fields and the invocations of callback methods are made on-the-fly by Java’s serialization/deserialization mechanism. During static analysis, we can only pinpoint that there is an InputStream object that provides a stream of bytes from a source (e.g., a file, socket, etc.) to an ObjectInputStream instance, but the contents of this stream are uncertain. Hence, the deserialized object and its state (i.e., the allocations within its fields) are unknown. As a result, existing static analyses fail to support serialization-related features.
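The absence of an allocation site can be made concrete with a small experiment. This is an illustrative sketch, not part of the paper's approach: Config, readFrom, and the constructorCalls counter are hypothetical names. It relies on the fact that Java's default mechanism does not run the constructor of a Serializable class when reconstructing an instance.

```java
import java.io.*;

class NoAllocationSiteDemo {
    // Counts executions of the only `new Config(...)` path in this program.
    static int constructorCalls = 0;

    // Hypothetical serializable class.
    static class Config implements Serializable {
        final String value;
        Config(String value) {
            constructorCalls++;
            this.value = value;
        }
    }

    // All that is visible here statically is an InputStream and the type Object.
    static Object readFrom(InputStream source) {
        try (ObjectInputStream in = new ObjectInputStream(source)) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns how many times Config's constructor ran during deserialization.
    static int constructorCallsDuringRead() {
        byte[] bytes = serialize(new Config("x"));
        constructorCalls = 0; // ignore the allocation made while preparing the bytes
        Object o = readFrom(new ByteArrayInputStream(bytes));
        if (!(o instanceof Config)) throw new IllegalStateException("unexpected type");
        return constructorCalls;
    }

    public static void main(String[] args) {
        System.out.println(constructorCallsDuringRead());
    }
}
```

The object returned by readFrom is a Config, yet no new Config(...) instruction executes on that path, and the bytes could equally have come from a file or socket produced by another program. An analysis that infers runtime types only from allocation instructions therefore has nothing to anchor on at the readObject() call.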
[2] We only show their fields and callback methods due to space constraints.