Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Document fixes for deserialization vulnerabilities by framework #11700

Conversation

JLLeitschuh
Copy link
Contributor

@JLLeitschuh JLLeitschuh commented Dec 14, 2022

Related #11603

@JLLeitschuh JLLeitschuh requested a review from a team as a code owner December 14, 2022 19:37
@JLLeitschuh JLLeitschuh force-pushed the doc/JLL/improve-java-unsafe-deserialization-documentation branch from c74e10b to 2efb5e9 Compare December 14, 2022 19:37
Comment on lines 95 to 103
<tr>
<td>Kryo</td>
<td>com.esotericsoftware:kryo and com.esotericsoftware:kryo5</td>
<td>com.esotericsoftware:kryo versions after 5.0 Yes; com.esotericsoftware:kryo5 Yes</td>
<td>Don't call <code>com.esotericsoftware.kryo(5).Kryo#setRegistrationRequired</code> with the argument <code>false</code>.</td>
</tr>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some research on this and Kryo is no longer vulnerable by default. It seems that sometime between the end of the 4.x release line and the latest release this was fixed to be secure by default.

Currently, the Kryo query still present in CodeQL checks for this incorrectly.

CodeQL currently attempts to detect calls to setRegistrationRequired(true) for the use case to be safe. But actually, the library currently requires the user to call setRegistrationRequired(false) to be vulnerable. The logic in CodeQL needs to be inverted to be correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to track down the exact version this was fixed. I've reached out to the maintainers here: EsotericSoftware/kryo#929

Copy link
Contributor Author

@JLLeitschuh JLLeitschuh Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with the maintainer that versions 5.0.0 and later are all secure-by-default.

Comment on lines 83 to 88
<tr>
<td>SnakeYAML</td>
<td>org.yaml:snakeyaml</td>
<td><a href="https://bitbucket.org/snakeyaml/snakeyaml/wiki/CVE%20&amp;%20NIST.md">No</a>. <a href="https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in">Maintainer response</a>.</td>
<td>Instantiate the <code>org.yaml.snakeyaml.Yaml</code> instance explicitly with an instance of <code>org.yaml.snakeyaml.constructor.SafeConstructor</code> as an argument.</td>
</tr>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been my pet project the past two weeks. I'm still working to convince the maintainer to make this secure by default. If this does end up happening, I'll update this documentation.

@JLLeitschuh JLLeitschuh force-pushed the doc/JLL/improve-java-unsafe-deserialization-documentation branch 2 times, most recently from e6a9c6b to 4e60f89 Compare December 14, 2022 20:01
@smowton
Copy link
Contributor

smowton commented Dec 16, 2022

@JLLeitschuh we would rather not have specific research re: when and how particular libs are vulnerable in qhelp files -- it would be better to get documentation such as this published by a security research organisation (e.g. integrated into https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html ?), then cite that from our qhelp.

@JLLeitschuh
Copy link
Contributor Author

This seems like a contradiction. All of the QHelp files have bad and good examples. How is this any different?

This knowledge is currently heavily scattered across the internet and very difficult to track down. Also, the current CodeQL deserialization QHelp is resulting in incorrect understandings by end users. As an example, the Oracle team incorrectly understood the SnakeYaml vulnerability due to CodeQLs documentation.

https://www.websec.ca/publication/Blog/CVE-2022-21404-Another-story-of-developers-fixing-vulnerabilities-unknowingly-because-of-CodeQL

@smowton
Copy link
Contributor

smowton commented Dec 16, 2022

@JLLeitschuh I think the goal is that qhelp should contain a general example of the class of vulnerability and how it should be handled, but not a library-by-library how-to guide, since the latter will get out of date and is better hosted by an organisation that's particularly in that business, e.g. OWASP. I think you should get it hosted by a reputable authority and then link it from our qhelp using phrasing like "...for specific advice about how to secure a particular library, see..."

@JLLeitschuh
Copy link
Contributor Author

I think the goal is that qhelp should contain a general example of the class of vulnerability and how it should be handled, but not a library-by-library how-to guide, since the latter will get out of date and is better hosted by an organization that's particularly in that business, e.g. OWASP.

This is exactly what CodeQL already does. As QL query authors, you have to encode in the query what "safe" means, otherwise the query will issue false positives all over the place. The problem here, is that that knowledge that is encoded in the query, is not captured in the documentation associated with the query. This seems wrong.

As an end user, why am I getting this alert? If you have two uses of some chunk of code in your codebase, why is one getting flagged, and the other isn't? Most likely, this is because of intentional filtering encoded into the ql query. Even as someone who writes ql, it can be quite complicated to figure out why one chunk of code is/isn't being flagged.

I think you should get it hosted by a reputable authority

GitHub has made itself this reputable authority by flagging code as vulnerable. It is the responsibility of the query authors to be able to explain, both in CodeQL, and in the documentation that query flags, why the vulnerability was flagged. CodeQL (either through the documentation or through the query results) should be able to communicate this per-library.

TL;DR: GitHub and CodeQL is already encoding this information in the query, even if it is a little out of date. If CodeQL has the responsibility of flagging a vulnerability, the documentation should be able to adequately describe why it is flagging it as vulnerable.

@JLLeitschuh
Copy link
Contributor Author

the documentation should be able to adequately describe why it is flagging it as vulnerable.

If you look at the queries, this is often because the user has/hasn't done something to make their use case vulnerable. That information about what they need to do/haven't done is essential to actually fixing the vulnerability. Again, CodeQL is capturing this in the query, but isn't articulating this to the end user.

To provide a concrete example:

exists(Method m | m = ma.getMethod() |
m instanceof ObjectInputStreamReadObjectMethod and
sink = ma.getQualifier() and
not exists(DataFlow::ExprNode node |
node.getExpr() = sink and
node.getTypeBound()
.(RefType)
.hasQualifiedName("org.apache.commons.io.serialization", "ValidatingObjectInputStream")
)
or

CodeQL has already encoded in the query that this is a safe way to do object deserialization, but the documentation doesn't reflect this.

Another example:

ma.getMethod() instanceof ObjectMapperReadMethod and
sink = ma.getArgument(0) and
(
exists(UnsafeTypeConfig config | config.hasFlowToExpr(ma.getAnArgument()))
or
exists(EnableJacksonDefaultTypingConfig config | config.hasFlowToExpr(ma.getQualifier()))
or
hasArgumentWithUnsafeJacksonAnnotation(ma)
) and

CodeQl has included in the query a search for the code that makes Jackson vulnerable, but this also doesn't get captured anywhere in the documentation.

I get that it's difficult to keep documentation and code in sync. If that's truly the problem then, maybe the why should be more explicitly communicated in the query results. But it should be somewhere.

Without this why end users are far more likely to simply consider the result a false positive. They can't be expected to do extensive research on every vulnerability. As an example, I've spent the past week learning about deserialization vulnerabilities and the various gadget chains that exist.

If I were able to wave a magic wand, I would completely re-write this query so that every framework is it's own query result and each framework is documented independently. As an end-user this would be the most clear because the documentation would be more narrowly focused on my particular use case.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 19, 2022

QHelp previews:

java/ql/src/Security/CWE/CWE-502/UnsafeDeserialization.qhelp

Deserialization of user-controlled data

Deserializing untrusted data using any deserialization framework that allows the construction of arbitrary serializable objects is easily exploitable and in many cases allows an attacker to execute arbitrary code. Even before a deserialized object is returned to the caller of a deserialization method a lot of code may have been executed, including static initializers, constructors, and finalizers. Automatic deserialization of fields means that an attacker may craft a nested combination of objects on which the executed initialization code may have unforeseen effects, such as the execution of arbitrary code.

There are many different serialization frameworks. This query currently supports Kryo, XmlDecoder, XStream, SnakeYaml, JYaml, JsonIO, YAMLBeans, HessianBurlap, Castor, Burlap, Jackson, Jabsorb, Jodd JSON, Flexjson, Gson and Java IO serialization through ObjectInputStream/ObjectOutputStream.

Recommendation

Avoid deserialization of untrusted data if at all possible. If the architecture permits it then use other formats instead of serialized objects, for example JSON or XML. However, these formats should not be deserialized into complex objects because this provides further opportunities for attack. For example, XML-based deserialization attacks are possible through libraries such as XStream and XmlDecoder.

Alternatively, a tightly controlled whitelist can limit the vulnerability of code, but be aware of the existence of so-called Bypass Gadgets, which can circumvent such protection measures.

Recommendations specific to particular frameworks supported by this query:

FastJson - com.alibaba:fastjson

  • Secure by Default: Partially
  • Recommendation: Call com.alibaba.fastjson.parser.ParserConfig#setSafeMode with the argument true before deserializing untrusted data.

FasterXML - com.fasterxml.jackson.core:jackson-databind

  • Secure by Default: Yes
  • Recommendation: Don't call com.fasterxml.jackson.databind.ObjectMapper#enableDefaultTyping and don't annotate any object fields with com.fasterxml.jackson.annotation.JsonTypeInfo passing either the CLASS or MINIMAL_CLASS values to the annotation. Read this guide.

Kryo - com.esotericsoftware:kryo and com.esotericsoftware:kryo5

  • Secure by Default: Yes for com.esotericsoftware:kryo5 and for com.esotericsoftware:kryo >= v5.0.0
  • Recommendation: Don't call com.esotericsoftware.kryo(5).Kryo#setRegistrationRequired with the argument false on any Kryo instance that may deserialize untrusted data.

ObjectInputStream - Java Standard Library

  • Secure by Default: No
  • Recommendation: Use a validating input stream, such as org.apache.commons.io.serialization.ValidatingObjectInputStream.

SnakeYAML - org.yaml:snakeyaml

  • Secure by Default: No
  • Recommendation: Pass an instance of org.yaml.snakeyaml.constructor.SafeConstructor to org.yaml.snakeyaml.Yaml's constructor before using it to deserialize untrusted data.

XML Decoder - Standard Java Library

  • Secure by Default: No
  • Recommendation: Do not use with untrusted user input.

Example

The following example calls readObject directly on an ObjectInputStream that is constructed from untrusted data, and is therefore inherently unsafe.

public MyObject {
  public int field;
  MyObject(int field) {
    this.field = field;
  }
}

public MyObject deserialize(Socket sock) {
  try(ObjectInputStream in = new ObjectInputStream(sock.getInputStream())) {
    return (MyObject)in.readObject(); // unsafe
  }
}

Rewriting the communication protocol to only rely on reading primitive types from the input stream removes the vulnerability.

public MyObject deserialize(Socket sock) {
  try(DataInputStream in = new DataInputStream(sock.getInputStream())) {
    return new MyObject(in.readInt());
  }
}

References

@atorralba
Copy link
Contributor

I made some minor changes so that the query help file renders correctly, I hope you don't mind @JLLeitschuh. That should help with the review to determine whether to accept this.

@coadaflorin
Copy link
Contributor

@JLLeitschuh thanks for gathering this data and creating this PR. The team and I discussed and we agree that this is helpful information for developers to have when trying to understand what the problem is and how to fix it. At the moment the way in which the table is formatted is creating horizontal scrolling and hiding some essential information from the user. Is it possible to try a format that requires less space horizontally?

We'll also plan some adjustments to avoid showing findings for libraries which are secure by default that developers will report as False Positives.

@JLLeitschuh
Copy link
Contributor Author

JLLeitschuh commented Dec 24, 2022

Hi All,

I'm on vaacation until the 28th so won't have a response/fix until then. Do you have any suggestions for alternative formatting? I'm open to anything. Unfortunately, AFAIK, there is still no way for external contributors to preview QHelp files in their rendered format. If this is now possible, please let me know and I'm happy to do some iterating to see if I can come up with something

@coadaflorin
Copy link
Contributor

There is a bot response that shows how the message would look like:
#11700 (comment)

Clicking the twisty should reveal the rendered response. I'll reach out to the internal documentation team and see if they have a suggestion for us.

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some style suggestions, independent of the question of table formatting

<td>Yes</td>
<td>
Don't call <code>com.fasterxml.jackson.databind.ObjectMapper#enableDefaultTyping</code> and don't annotate any object fields with <code>com.fasterxml.jackson.annotation.JsonTypeInfo</code> passing either the <code>CLASS</code> or <code>MINIMAL_CLASS</code> values to the annotation.
Read <a href="https://cowtowncoder.medium.com/jackson-2-10-safe-default-typing-2d018f0ce2ba">this guide</a>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we find a more authoritative source than a Medium post?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cowtowncoder is the primary maintainer of FasterJackson

@JLLeitschuh
Copy link
Contributor Author

Update: SnakeYaml is actually fixing the library so it's not insecure-by-default. 🎉

https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in

@JLLeitschuh
Copy link
Contributor Author

JLLeitschuh commented Jan 5, 2023

@coadaflorin the links in the table don't seem to be showing up in the commented rendere. Is that an issue with the markdown generator?

Also, is that comment live updating as the PR gets updated, or does someone from GitHub need to have it get regenerated?

@coadaflorin
Copy link
Contributor

coadaflorin commented Jan 6, 2023

I wrote a sample here with how we could turn that table into a list. I created a branch here. If you like this format, let's use it here.

As for the generated view, it's automatically by an action generated when a pull request is triggered against a .qhelp file. I don't think the action is triggered when we make changes. I generated a markdown for the file I wrote manually using the CLI. Instruction on how to do that here

Here's how the Recommendation would look like as a list versus a table:

As a list

Recommendation

Avoid deserialization of untrusted data if at all possible. If the architecture permits it then use other formats instead of serialized objects, for example JSON or XML. However, these formats should not be deserialized into complex objects because this provides further opportunities for attack. For example, XML-based deserialization attacks are possible through libraries such as XStream and XmlDecoder.

Alternatively, a tightly controlled whitelist can limit the vulnerability of code, but be aware of the existence of so-called Bypass Gadgets, which can circumvent such protection measures.

Recommendations specific to particular frameworks supported by this query:

FastJson - com.alibaba:fastjson

  • Secure by Default: Partially
  • Recommendation: Call com.alibaba.fastjson.parser.ParserConfig#setSafeMode with the argument true before deserializing untrusted data.

FasterXML - com.fasterxml.jackson.core:jackson-databind

  • Secure by Default: Yes
  • Recommendation: Don't call com.fasterxml.jackson.databind.ObjectMapper#enableDefaultTyping and don't annotate any object fields with com.fasterxml.jackson.annotation.JsonTypeInfo passing either the CLASS or MINIMAL_CLASS values to the annotation. Read this guide.

Kryo - com.esotericsoftware:kryo and com.esotericsoftware:kryo5

  • Secure by Default: Yes for com.esotericsoftware:kryo5 and for com.esotericsoftware:kryo >= v5.0.0
  • Recommendation: Don't call com.esotericsoftware.kryo(5).Kryo#setRegistrationRequired with the argument false on any Kryo instance that may deserialize untrusted data.

ObjectInputStream - Java Standard Library

  • Secure by Default: No
  • Recommendation: Use a validating input stream, such as org.apache.commons.io.serialization.ValidatingObjectInputStream.

SnakeYAML - org.yaml:snakeyaml

  • Secure by Default: No
  • Recommendation: Pass an instance of org.yaml.snakeyaml.constructor.SafeConstructor to org.yaml.snakeyaml.Yaml's constructor before using it to deserialize untrusted data.

XML Decoder - Standard Java Library

  • Secure by Defauly: No
  • Recommendation: Do not use with untrusted user input.
As a table

Recommendation

Avoid deserialization of untrusted data if at all possible. If the architecture permits it then use other formats instead of serialized objects, for example JSON or XML. However, these formats should not be deserialized into complex objects because this provides further opportunities for attack. For example, XML-based deserialization attacks are possible through libraries such as XStream and XmlDecoder.

Alternatively, a tightly controlled whitelist can limit the vulnerability of code, but be aware of the existence of so-called Bypass Gadgets, which can circumvent such protection measures.

Recommendations specific to particular frameworks supported by this query:

Project Maven Coordinates Secure by Default Recommendation
XMLDecoder Java Standard Library No Do not use with untrusted user input.
ObjectInputStream Java Standard Library No Use a validating input stream, such as org.apache.commons.io.serialization.ValidatingObjectInputStream.
FastJson com.alibaba:fastjson Partially Call com.alibaba.fastjson.parser.ParserConfig\#setSafeMode with the argument true before deserializing untrusted data.
SnakeYAML org.yaml:snakeyaml No (maintainer response) Pass an instance of org.yaml.snakeyaml.constructor.SafeConstructor to org.yaml.snakeyaml.Yaml's constructor before using it to deserialize untrusted data.
FasterXML jackson-databind com.fasterxml.jackson.core:jackson-databind Yes Don't call com.fasterxml.jackson.databind.ObjectMapper\#enableDefaultTyping and don't annotate any object fields with @JsonTypeInfo(CLASS) or @JsonTypeInfo(MINIMAL_CLASS) if untrusted data may be deserialized. Read this guide.
Kryo com.esotericsoftware:kryo and com.esotericsoftware:kryo5 com.esotericsoftware:kryo >= 5.0.0 and com.esotericsoftware:kryo5 Yes Don't call com.esotericsoftware.kryo(5).Kryo\#setRegistrationRequired with the argument false on any Kryo instance that may deserialize untrusted data.

@JLLeitschuh
Copy link
Contributor Author

I think I'm inclined to go with the list format you proposed. Do you want to create a PR against my branch and I'll merge it into this PR?

@coadaflorin
Copy link
Contributor

coadaflorin commented Jan 9, 2023

I think I managed to create the correct PR here
Hope that helps :)

@JLLeitschuh
Copy link
Contributor Author

It looks like your PR has conflicts as-is. Can you resolve those? After that, I'm happy to merge it

@coadaflorin
Copy link
Contributor

Looks like the list is in, I'm all good on my side.

@JLLeitschuh JLLeitschuh force-pushed the doc/JLL/improve-java-unsafe-deserialization-documentation branch from ce816c2 to 4c1c12d Compare January 10, 2023 16:18
atorralba
atorralba previously approved these changes Jan 12, 2023
@atorralba atorralba added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Jan 12, 2023
@smowton smowton merged commit 8aa2c23 into github:main Jan 12, 2023
@JLLeitschuh JLLeitschuh deleted the doc/JLL/improve-java-unsafe-deserialization-documentation branch January 17, 2023 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Java ready-for-doc-review This PR requires and is ready for review from the GitHub docs team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants