Skip to content

Latest commit

 

History

History
461 lines (361 loc) · 21 KB

java.md

File metadata and controls

461 lines (361 loc) · 21 KB

CodeQL workshop for Java: Unsafe deserialization in Apache Struts

  • Analyzed language: Java
  • Difficulty level: 200

Overview

Problem statement

Serialization is the process of converting in memory objects to text or binary output formats, usually for the purpose of sharing or saving program state. This serialized data can then be loaded back into memory at a future point through the process of deserialization.

In languages such as Java, Python and Ruby, deserialization provides the ability to restore not only primitive data, but also complex types such as library and user defined classes. This provides great power and flexibility, but introduces a signficant attack vector if the deserialization happens on untrusted user data without restriction.

Apache Struts is a popular open-source MVC framework for creating web applications in Java. In 2017, a researcher from the predecessor of the GitHub Security Lab found CVE-2017-9805, an XML deserialization vulnerability in Apache Struts that would allow remote code execution.

The problem occurred because included as part of the Apache Struts framework is the ability to accept requests in multiple different formats, or content types. It provides a pluggable system for supporting these content types through the ContentTypeHandler interface, which provides the following interface method:

    /**
     * Populates an object using data from the input stream
     * @param in The input stream, usually the body of the request
     * @param target The target, usually the action class
     * @throws IOException If unable to write to the output stream
     */
    void toObject(Reader in, Object target) throws IOException;

New content type handlers are defined by implementing the interface and defining a toObject method which takes data in the specified content type (in the form of a Reader) and uses it to populate the Java object target, often via a deserialization routine. However, the in parameter is typically populated from the body of a request without sanitization or safety checks. This means it should be treated as "untrusted" user data, and only deserialized under certain safe conditions.

In this workshop, we will write a query to find CVE-2017-9805 in a database built from the known vulnerable version of Apache Struts.

Setup instructions for Visual Studio Code

To take part in the workshop you will need to follow these steps to get the CodeQL development environment setup:

  1. Install the Visual Studio Code IDE.
  2. Download and install the CodeQL extension for Visual Studio Code. Full setup instructions are here.
  3. Set up the starter workspace.
    • Important: Don't forget to git clone --recursive or git submodule update --init --remote, so that you obtain the standard query libraries.
  4. Open the starter workspace: File > Open Workspace > Browse to vscode-codeql-starter/vscode-codeql-starter.code-workspace.
  5. Download and unzip the apache_struts_cve_2017_9805.zip database.
  6. Choose this database in CodeQL (using Ctrl + Shift + P to open the command palette, then selecting "CodeQL: Choose Database").
  7. Create a new file in the codeql-custom-queries-java directory called UnsafeDeserialization.ql.

Documentation links

If you get stuck, try searching our documentation and blog posts for help and ideas. Below are a few links to help you get started:

Workshop

The workshop is split into several steps. You can write one query per step, or work with a single query that you refine at each step. Each step has a hint that describes useful classes and predicates in the CodeQL standard libraries for Java. You can explore these in your IDE using the autocomplete suggestions (Ctrl + Space) and the jump-to-definition command (F12).

Section 1: Finding XML deserialization

XStream is a Java framework for serializing Java objects to XML used by Apache Struts. It provides a method XStream.fromXML for deserializing XML to a Java object. By default, the input is not validated in any way, and is vulnerable to remote code execution exploits. In this section, we will identify calls to fromXML in the codebase.

  1. Find all method calls in the program.

    Hint
    • A method call is represented by the MethodAccess type in the CodeQL Java library.
    Solution
    import java
    
    from MethodAccess call
    select call
  2. Update your query to report the method being called by each method call.

    Hints
    • Add a CodeQL variable called method with type Method.
    • MethodAccess has a predicate called getMethod() for returning the method.
    • Add a where clause.
    Solution
    import java
    
    from MethodAccess call, Method method
    where call.getMethod() = method
    select call, method
  3. Find all calls in the program to methods called fromXML.

    Hint
    • Method.getName() returns a string representing the name of the method.
    Solution
    import java
    
    from MethodAccess fromXML, Method method
    where
      fromXML.getMethod() = method and
      method.getName() = "fromXML"
    select fromXML

    However, as we now want to report only the call itself, we can inline the temporary method variable like so:

    import java
    
    from MethodAccess fromXML
    where fromXML.getMethod().getName() = "fromXML"
    select fromXML
  4. The XStream.fromXML method deserializes the first argument (i.e. the argument at index 0). Update your query to report the deserialized argument.

    Hint
    • MethodCall.getArgument(int i) returns the argument at the i-th index.
    • The arguments are expressions in the program, represented by the CodeQL class Expr. Introduce a new variable to hold the argument expression.
    Solution
    import java
    
    from MethodAccess fromXML, Expr arg
    where
      fromXML.getMethod().getName() = "fromXML" and
      arg = fromXML.getArgument(0)
    select fromXML, arg
  5. Recall that predicates allow you to encapsulate logical conditions in a reusable format. Convert your previous query to a predicate which identifies the set of expressions in the program which are deserialized directly by fromXML. You can use the following template:

    predicate isXMLDeserialized(Expr arg) {
      exists(MethodAccess fromXML |
        // TODO fill me in
      )
    }

    exists is a mechanism for introducing temporary variables with a restricted scope. You can think of them as their own from-where-select. In this case, we use it to introduce the fromXML temporary variable, with type MethodAccess.

    Hint
    • Copy the where clause of the previous query.
    Solution
    import java
    
    predicate isXMLDeserialized(Expr arg) {
      exists(MethodAccess fromXML |
        fromXML.getMethod().getName() = "fromXML" and
        arg = fromXML.getArgument(0)
      )
    }
    
    from Expr arg
    where isXMLDeserialized(arg)
    select arg

Section 2: Find the implementations of the toObject method from ContentTypeHandler

Like predicates, classes in CodeQL can be used to encapsulate reusable portions of logic. Classes represent single sets of values, and they can also include operations (known as member predicates) specific to that set of values. You have already seen numerous instances of CodeQL classes (MethodAccess, Method etc.) and associated member predicates (MethodAccess.getMethod(), Method.getName(), etc.).

  1. Create a CodeQL class called ContentTypeHandler to find the interface org.apache.struts2.rest.handler.ContentTypeHandler. You can use this template:

    class ContentTypeHandler extends RefType {
      ContentTypeHandler() {
          // TODO Fill me in
      }
    }
    Hint
    • Use RefType.hasQualifiedName(string packageName, string className) to identify classes with the given package name and class name. For example:
      from RefType r
      where r.hasQualifiedName("java.lang", "String")
      select r
    • Within the characteristic predicate you can use the magic variable this to refer to the RefType
    Solution
    import java
    
    /** The interface `org.apache.struts2.rest.handler.ContentTypeHandler`. */
    class ContentTypeHandler extends RefType {
      ContentTypeHandler() {
        this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
      }
    }
  2. Create a CodeQL class called ContentTypeHandlerToObject for identfying Methods called toObject on classes whose direct super-types include ContentTypeHandler.

    Hint
    • Use Method.getName() to identify the name of the method.
    • To identify whether the method is declared on a class whose direct super-type includes ContentTypeHandler, you will need to:
      • Identify the declaring type of the method using Method.getDeclaringType().
      • Identify the super-types of that type using RefType.getASuperType()
      • Use instanceof to assert that one of the super-types is a ContentTypeHandler
    Solution
    /** A `toObject` method on a subtype of `org.apache.struts2.rest.handler.ContentTypeHandler`. */
    class ContentTypeHandlerToObject extends Method {
      ContentTypeHandlerToObject() {
        this.getDeclaringType().getASupertype() instanceof ContentTypeHandler and
        this.hasName("toObject")
      }
    }
  3. toObject methods should consider the first parameter as untrusted user input. Write a query to find the first (i.e. index 0) parameter for toObject methods.

    Hint
    • Use Method.getParameter(int index) to get the i-th index parameter.
    • Create a query with a single CodeQL variable of type ContentTypeHandlerToObject.
    Solution
    from ContentTypeHandlerToObject toObjectMethod
    select toObjectMethod.getParameter(0)

Section 3: Unsafe XML deserialization

We have now identified (a) places in the program which receive untrusted data and (b) places in the program which potentially perform unsafe XML deserialization. We now want to tie these two together to ask: does the untrusted data ever flow to the potentially unsafe XML deserialization call?

In program analysis we call this a data flow problem. Data flow helps us answer questions like: does this expression ever hold a value that originates from a particular other place in the program?

We can visualize the data flow problem as one of finding paths through a directed graph, where the nodes of the graph are elements in program, and the edges represent the flow of data between those elements. If a path exists, then the data flows between those two nodes.

Consider this example Java method:

int func(int tainted) {
   int x = tainted;
   if (someCondition) {
     int y = x;
     callFoo(y);
   } else {
     return x;
   }
   return -1;
}

The data flow graph for this method will look something like this:

drawing

This graph represents the flow of data from the tainted parameter. The nodes of graph represent program elements that have a value, such as function parameters and expressions. The edges of this graph represent flow through these nodes.

CodeQL for Java provides data flow analysis as part of the standard library. You can import it using semmle.code.java.dataflow.DataFlow. The library models nodes using the DataFlow::Node CodeQL class. These nodes are separate and distinct from the AST (Abstract Syntax Tree, which represents the basic structure of the program) nodes, to allow for flexibility in how data flow is modeled.

There are a small number of data flow node types – expression nodes and parameter nodes are most common.

In this section we will create a data flow query by populating this template:

/**
 * @name Unsafe XML deserialization
 * @kind problem
 * @id java/unsafe-deserialization
 */
import java
import semmle.code.java.dataflow.DataFlow

// TODO add previous class and predicate definitions here

class StrutsUnsafeDeserializationConfig extends DataFlow::Configuration {
  StrutsUnsafeDeserializationConfig() { this = "StrutsUnsafeDeserializationConfig" }
  override predicate isSource(DataFlow::Node source) {
    exists(/** TODO fill me in **/ |
      source.asParameter() = /** TODO fill me in **/
    )
  }
  override predicate isSink(DataFlow::Node sink) {
    exists(/** TODO fill me in **/ |
      /** TODO fill me in **/
      sink.asExpr() = /** TODO fill me in **/
    )
  }
}

from StrutsUnsafeDeserializationConfig config, DataFlow::Node source, DataFlow::Node sink
where config.hasFlow(source, sink)
select sink, "Unsafe XML deserialization"
  1. Complete the isSource predicate using the query you wrote for Section 2.

    Hint
    • You can translate from a query clause to a predicate by:
      • Converting the variable declarations in the from part to the variable declarations of an exists
      • Placing the where clause conditions (if any) in the body of the exists
      • Adding a condition which equates the select to one of the parameters of the predicate.
    • Remember to include the ContentTypeHandlerToObject class you defined earlier.
    Solution
      override predicate isSource(Node source) {
        exists(ContentTypeHandlerToObject toObjectMethod |
          source.asParameter() = toObjectMethod.getParameter(0)
        )
      }
  2. Complete the isSink predicate by using the final query you wrote for Section 1. Remember to use the isXMLDeserialized predicate!

    Hint
    • Complete the same process as above.
    Solution
      override predicate isSink(Node sink) {
        exists(Expr arg |
          isXMLDeserialized(arg) and
          sink.asExpr() = arg
        )
      }

You can now run the completed query. You should find exactly one result, which is the CVE reported by our security researchers in 2017!

For this result, it is easy to verify that it is correct, because both the source and sink are in the same method. However, for many data flow problems this is not the case.

We can update the query so that it not only reports the sink, but it also reports the source and the path to that source. We can do this by making these changes: The answer to this is to convert the query to a path problem query. There are five parts we will need to change:

  • Convert the @kind from problem to path-problem. This tells the CodeQL toolchain to interpret the results of this query as path results.
  • Add a new import DataFlow::PathGraph, which will report the path data alongside the query results.
  • Change source and sink variables from DataFlow::Node to DataFlow::PathNode, to ensure that the nodes retain path information.
  • Use hasFlowPath instead of hasFlow.
  • Change the select to report the source and sink as the second and third columns. The toolchain combines this data with the path information from PathGraph to build the paths.
  1. Convert your previous query to a path-problem query.

    Solution
    /**
    * @name Unsafe XML deserialization
    * @kind path-problem
    * @id java/unsafe-deserialization
    */
    import java
    import semmle.code.java.dataflow.DataFlow
    import DataFlow::PathGraph
    
    predicate isXMLDeserialized(Expr arg) {
      exists(MethodAccess fromXML |
        fromXML.getMethod().getName() = "fromXML" and
        arg = fromXML.getArgument(0)
      )
    }
    
    /** The interface `org.apache.struts2.rest.handler.ContentTypeHandler`. */
    class ContentTypeHandler extends RefType {
      ContentTypeHandler() {
        this.hasQualifiedName("org.apache.struts2.rest.handler", "ContentTypeHandler")
      }
    }
    
    /** A `toObject` method on a subtype of `org.apache.struts2.rest.handler.ContentTypeHandler`. */
    class ContentTypeHandlerToObject extends Method {
      ContentTypeHandlerToObject() {
        this.getDeclaringType().getASupertype() instanceof ContentTypeHandler and
        this.hasName("toObject")
      }
    }
    
    class StrutsUnsafeDeserializationConfig extends DataFlow::Configuration {
      StrutsUnsafeDeserializationConfig() { this = "StrutsUnsafeDeserializationConfig" }
      override predicate isSource(DataFlow::Node source) {
        exists(ContentTypeHandlerToObject toObjectMethod |
          source.asParameter() = toObjectMethod.getParameter(0)
        )
      }
      override predicate isSink(DataFlow::Node sink) {
        exists(Expr arg |
          isXMLDeserialized(arg) and
          sink.asExpr() = arg
        )
      }
    }
    
    from StrutsUnsafeDeserializationConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
    where config.hasFlowPath(source, sink)
    select sink, source, sink, "Unsafe XML deserialization"

For more information on how the vulnerability was identified, you can read the blog disclosing the original problem.

Although we have created a query from scratch to find this problem, it can also be found with one of our default security queries, UnsafeDeserialization.ql. You can see this on a vulnerable copy of Apache Struts that has been analyzed on LGTM.com, our free open source analysis platform.

What's next?