Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities. #29334

Closed
wants to merge 3 commits into from

Conversation

ScrapCodes
Copy link
Member

@ScrapCodes ScrapCodes commented Aug 3, 2020

What changes were proposed in this pull request?

Update pom, with the latest release of jackson to 2.9.10 and 2.9.10.5. Which have fixes for all the reported CVEs so far - as per github advisories page.

Why are the changes needed?

Currently Apache Spark branch-2.4 depends on jackson release, which is not maintained (as per comment in FasterXML/jackson-databind#2186 (comment)).

There are some CVEs reported on the issue SPARK-32495 , but my understanding is since the release is not a maintained version, the advisory info is not up to date.
And it is also possible, that there are vulnerabilities which apply to higher version and are not yet reported for the un-maintained release.

Does this PR introduce any user-facing change?

Yes, the downstream users of spark who continue to depend on the older jackson release, will have to upgrade too.
If the package is affected with high severity security vulnerability, it might still be a good step.

How was this patch tested?

Existing tests.

@SparkQA
Copy link

SparkQA commented Aug 3, 2020

Test build #126972 has finished for PR 29334 at commit 64ee209.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ScrapCodes
Copy link
Member Author

Hi @dongjoon-hyun @srowen @Fokko and @HyukjinKwon, you have been involved with jackson upgrade previously, and would like to have your opinion around this upgrade too.

Basically, 2.6.x is no longer maintained, and since we are planning to continue to release 2.4.x line, it might be good to upgrade jackson to a maintained release too.

Thanks !

@ScrapCodes
Copy link
Member Author

Once the discussion around this is settled, will start working on releasing 2.4.7.

@SparkQA
Copy link

SparkQA commented Aug 3, 2020

Test build #126975 has finished for PR 29334 at commit 8a2abac.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ScrapCodes You can read the thread here on updating the Spark 3.0 branch to the latest version of Jackson: #21596 There were already some issues with None handling in Scala.

With Jackson, it is important to check where the vulnerability lies. Sometimes, it is in code-paths that aren't used by Spark, for example, plugins that are not used. I think it is up to the Spark community to decide on this. But, looking at the behavioral changes mentioned in #21596 I think we should be cautious.

@@ -224,6 +224,10 @@
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-reflect</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this one needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required because, jackson-scala-module underwent this change:

FasterXML/jackson-module-scala@d6071ac#diff-fdc3abdfd754eeb24090dbd90aeec2ce

And Spark-core uses scala-reflect library at one place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

org.apache.spark.deploy.security.HBaseDelegationTokenProvider

@srowen
Copy link
Member

srowen commented Aug 3, 2020

Yeah, the problem is that there are behavior changes. You can see one in the UT failures. It outputs an explicit null for fields that are null. I think it can be disabled, but not sure what else we'd have to fix where to maintain behavior. I recall it wasn't trivial when I looked at it a while ago.

@dongjoon-hyun dongjoon-hyun changed the title [RFC][SPARK-32495] Update jackson versions to a maintained release, to fix various security vulnerabilities. [RFC][SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities. Aug 4, 2020
@dongjoon-hyun
Copy link
Member

I also agree with @srowen 's opinion. However, if we have a clean patch passing all UTs, we may reconsider.

@dongjoon-hyun dongjoon-hyun marked this pull request as draft August 4, 2020 05:03
@dongjoon-hyun dongjoon-hyun changed the title [RFC][SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities. [WIP][RFC][SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities. Aug 4, 2020
@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127034 has finished for PR 29334 at commit 9cd3d9a.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ScrapCodes
Copy link
Member Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127037 has finished for PR 29334 at commit 9cd3d9a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127039 has finished for PR 29334 at commit 7cff657.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ScrapCodes
Copy link
Member Author

Jenkins, retest this please.

Test build #127039 has finished for PR 29334 at commit 7cff657.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Failing due to corrupted .m2 directory.

@ScrapCodes
Copy link
Member Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127049 has finished for PR 29334 at commit 7cff657.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ScrapCodes
Copy link
Member Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127050 has finished for PR 29334 at commit 7cff657.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ScrapCodes
Copy link
Member Author

@shaneknapp and @HyukjinKwon I am sorry for calling for your attention, but jenkins is failing with corrupted m2 directory again. I was also wondering, why github could not trigger the UTs, does it think that this is just a build patch?

@ScrapCodes
Copy link
Member Author

@Fokko, @srowen and @dongjoon-hyun Thank for giving me the feedback. I agree, with you guys. But, I wanted to give this patch a try - can it be done in a clean way? This patch is almost the same as one mentioned at, #21596

Also, the behaviour change was due to a bug in jackson 2.7.x. I have tested, it worked fine.

Somehow the Jenkins, does not pass due to corrupted m2 cache. Do you think, that it is worth a try?

@ScrapCodes
Copy link
Member Author

Alright, one more FasterXML/jackson-databind#2798, Shall we consider 2.10.x ? change is the same and the later is free from whole store house of CVEs.

@ScrapCodes ScrapCodes marked this pull request as ready for review August 5, 2020 04:49
@ScrapCodes ScrapCodes changed the title [WIP][RFC][SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities. [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities. Aug 5, 2020
@HyukjinKwon
Copy link
Member

@ScrapCodes, yes, the m2 is corrupted in Jenkins machine. In the master, this dependency check is being skipped in Jenkins and GitHub Actions build runs instead.

In other branches, they are not done yet (see SPARK-32249). I was hesitant about porting GitHub actions back yet but maybe we should go and port it back.

For the meanwhile, maybe we can try to remove .m2 in Jenkins machine, @shaneknapp. This solves the problem temporarily IIRC.

@ScrapCodes
Copy link
Member Author

@HyukjinKwon Thanks for looking in to it, and it is my mistake, I did not know that github actions are not ported to other branches yet. I am not 100% sure that they should be ported or not, does it effect our total resource capacity on github actions?

@HyukjinKwon
Copy link
Member

Yes, I think it does. That was one of reasons why I was hesitant. FYI, there was a bit of discussions and updates about resources at SPARK-32264.

Given that the PRs (to branch-2.4 or branch-3.0) are not very frequent, I guess it might be okay ...

@HyukjinKwon
Copy link
Member

For the PR itself, I agree with @srowen's and @dongjoon-hyun comments at #29334 (comment) and #29334 (comment). Sorry for late responses.

@ScrapCodes
Copy link
Member Author

Alrighty, then I will skip this for 2.4.7 release, even though I still feel that this might be safe and good for people in general, provided jackson 2.6.7 had last release in 2017 (ignoring the micro CVE backport release in 2019).

If we do need to revisit this at a later point, we can always take care of it in the next release. Thanks !

Alright, one more FasterXML/jackson-databind#2798, Shall we consider 2.10.x ? change is the same and the later is free from whole store house of CVEs.

We can ignore this, as it does not include the code paths(org.jsecurity and com.pastdev.httpcomponents) spark uses (hopefully )

@ScrapCodes
Copy link
Member Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 11, 2020

Test build #127337 has finished for PR 29334 at commit 7cff657.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ScrapCodes
Copy link
Member Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 12, 2020

Test build #127378 has finished for PR 29334 at commit 7cff657.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ScrapCodes
Copy link
Member Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 13, 2020

Test build #127397 has finished for PR 29334 at commit 7cff657.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ScrapCodes
Copy link
Member Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 13, 2020

Test build #127398 has finished for PR 29334 at commit 7cff657.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ScrapCodes ScrapCodes closed this Aug 14, 2020
@ScrapCodes ScrapCodes reopened this Aug 14, 2020
@ScrapCodes
Copy link
Member Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 15, 2020

Test build #127476 has finished for PR 29334 at commit 7cff657.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ScrapCodes
Copy link
Member Author

@HyukjinKwon and @dongjoon-hyun , it seems the test passed, with just the basic changes which are very similar to what was done earlier to migrate to Jackson 2.9.x #21596. Do you want to reconsider your opinion now?

@srowen
Copy link
Member

srowen commented Aug 17, 2020

I'm still uneasy about it because client applications will inherit this dependency change too. Willing to be convinced otherwise but I know this was the root of all the reluctance to update this before 3.0

@ScrapCodes
Copy link
Member Author

Is it difficult to suggest, to upgrade to a maintained version of a library which has otherwise got, so many security vulnerabilities? Even if those security vulnerabilities have no impact, at least one could save himself form swarm of alert generate by so called "security analysing" software.

Currently used version(2.6), has not been released since 2017, except a micro release(in 2019) that fixed only known CVEs at that point.

Lastly, change required to upgrade to latest jackson is just (s/NON_NULL/NON_ABSENT), can we make it simpler for our client application to upgrade.

@srowen
Copy link
Member

srowen commented Aug 18, 2020

We can try shading it. That would remove the classpath issue at the cost of some complexity.
I don't know what the right answer is. I'm not so much looking to solve alerts from static analysis, though I agree that's an issue for some users. I'd rather ask whether the CVE affects Spark's usage and it's not clear. And then weigh that against the potential to change user app behavior.
At this point I'd just encourage people to adopt 3.0 as well, as a fix too.

@Fokko
Copy link
Contributor

Fokko commented Aug 19, 2020

Sorry for being late, kinda busy the last few weeks.

If we decide to shade Jackson, we must be very careful that it isn't part of any public APIs. As mentioned before, not all the alerts generated by Jackson, are vulnerabilities to Spark, since many of them are code paths that aren't used by Spark. I'm mostly with Sean and would like to encourage users to move to 3.0. I've checked and there aren't many follow up PR's that patch issues caused by the Jackson update, so maybe we're making this bigger than it is.

@cowtowncoder
Copy link

cowtowncoder commented Aug 22, 2020

A quick note that might be relevant on question of CVEs affecting Spark.
All dozens of CVEs against jackson-databind for past 2 years or so are related to the same feature, default (polymorphic) typing, something that is NOT enabled by default.

I wrote this a while ago:

https://medium.com/@cowtowncoder/on-jackson-cves-dont-panic-here-is-what-you-need-to-know-54cd0d6e8062

to give full description of these issues.
So unless Spark uses "Default Typing" (enable ALL values to use polymorphic typing), these are not relevant.
But I also understand that due to security tools' true/false settings, it is annoying to end users to get all the warnings even if there is no actual problem.

I will also note that:

  • 2.10.x and later are not affected (CVEs will not be applicable past 2.9.x)
  • I plan on release 2.9.10.6 within next couple of days (by end of August 2020); currently there are 4 reported issues for which CVE id is or will be allocated.

I hope this helps.

@ScrapCodes
Copy link
Member Author

ScrapCodes commented Aug 24, 2020

Thank you @cowtowncoder, @srowen and @Fokko. Indeed, the Security vulnerabilities seems to serve the purpose of generating the false alarm at least in this case, since we are not using those code paths that affected. And, if some client application depends on Spark and uses jackson-databind, they need to deal with security issues on their own.

Best thing to do is upgrade to 3.0, but it is sort of difficult to upgrade for folks who have recently upgraded to Spark 2.4.x . This is also the reason we are still maintaining the release version 2.4.x. Lot of great suggestions have chimed in, shading the jar comes with it's own set of complexity. I am not absolutely sure, but If we cannot upgrade as is, I had suggest we can re-consider this later.

Thanks again everyone for chiming in and providing valuable suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants