Conversation

@LucaCanali (Contributor) commented Feb 26, 2019

Running Spark in a Docker image with Alpine Linux 3.9.0 throws errors when using snappy.

The issue can be reproduced, for example, as follows: `Seq(1,2).toDF("id").write.format("parquet").save("DELETEME1")`
The key part of the error stack is: `SparkException: Task failed while writing rows. .... Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.7-2b4872f1-7c41-4b84-bda1-dbcb8dd0ce4c-libsnappyjava.so: Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /tmp/snappy-1.1.7-2b4872f1-7c41-4b84-bda1-dbcb8dd0ce4c-libsnappyjava.so)`

The source of the error appears to be that libsnappyjava.so needs ld-linux-x86-64.so.2 and looks for it in /lib, while in Alpine Linux 3.9.0 with libc6-compat version 1.1.20-r3, ld-linux-x86-64.so.2 is located in /lib64.
Note: this issue is not present with Alpine Linux 3.8 and libc6-compat version 1.1.19-r10.

## What changes were proposed in this pull request?

A possible workaround proposed with this PR is to modify the Dockerfile by adding a symbolic link between /lib and /lib64 so that ld-linux-x86-64.so.2 can be found in /lib. This is probably not the cleanest solution, but I have observed that this is effectively what already happens when using Alpine Linux 3.8.1 (a version of Alpine Linux that was not affected by the issue reported here).
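
To make the workaround concrete, here is a minimal sketch of the shape of the change. This is illustrative only: the base image and package list are assumptions, not the actual Dockerfile contents. The key point is that the symlink is created before `libc6-compat` is installed, so the loader that package ships ends up reachable under /lib.

```dockerfile
# Sketch only: base image and package list are placeholders.
FROM openjdk:8-alpine

RUN set -ex && \
    # Create /lib64 as a symlink to /lib *before* installing libc6-compat,
    # so that ld-linux-x86-64.so.2 (placed under /lib64 on Alpine 3.9.0)
    # is also visible in /lib, where libsnappyjava.so looks for it.
    ln -s /lib /lib64 && \
    apk add --no-cache bash libc6-compat
```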

## How was this patch tested?

Manually tested by running a simple workload with spark-shell, using Docker on a client machine and Spark on a Kubernetes cluster. The test workload is: `Seq(1,2).toDF("id").write.format("parquet").save("DELETEME1")`

Added a test to the KubernetesSuite / BasicTestsSuite.

@vanzin (Contributor) commented Feb 26, 2019

Any easy way to change an existing integration test to use snappy?

@vanzin (Contributor) commented Feb 26, 2019

(And, BTW, the PR title should explain the fix, not the problem.)

@LucaCanali changed the title from "[SPARK-26995][K8S] Running Spark in Docker image with Alpine Linux 3.9.0 throws errors when using snappy" to "[SPARK-26995][K8S] Fix Dockerfile to work around issue with Alpine Linux 3.9.0" on Feb 26, 2019
@LucaCanali (Contributor, Author) commented:

Thanks @vanzin for the comments. I have updated the PR title. I'll look at the test for this.

@vanzin (Contributor) commented Feb 26, 2019

Sorry, but the PR title still does not explain the fix.

e.g. "Make Alpine lib directory visible to snappy native library in docker image."

Or something.

@vanzin (Contributor) commented Feb 26, 2019

ok to test

@SparkQA commented Feb 26, 2019

Test build #102806 has finished for PR 23898 at commit 10b8a1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 26, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8311/

@felixcheung (Member) left a comment:

looks reasonable...

@LucaCanali changed the title from "[SPARK-26995][K8S] Fix Dockerfile to work around issue with Alpine Linux 3.9.0" to "[SPARK-26995][K8S] Make ld-linux-x86-64.so.2 visible to snappy native library under /lib in docker image with Alpine Linux" on Feb 27, 2019
@SparkQA commented Feb 28, 2019

Test build #102874 has finished for PR 23898 at commit 388dcf5.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 28, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8362/

@SparkQA commented Feb 28, 2019

Test build #102875 has finished for PR 23898 at commit e7241cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 28, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8363/

A Contributor left a review comment:

Can you use the existing SparkSQLExample for this? Seems overkill to add a new example just so you can run it in the k8s tests.

I was hoping there was something simple like setting spark.io.compression.codec=snappy, but I'm not sure SparkPi would actually hit that. (Maybe SparkRemoteFileTest, which is run already, does, though?)

If the above doesn't work, or SparkSQLExample is too slow or doesn't work for this, then I think in this case it's better to skip the test until we can separate examples from k8s integration tests.

@LucaCanali (Contributor, Author) replied:

Good point, thanks. Setting spark.io.compression.codec=snappy seems to work to test this.
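
For background on why that setting works as a regression check: `spark.io.compression.codec` selects the codec Spark uses for internal blocks (broadcast variables, shuffle outputs), so with it set to `snappy` even a trivial job must load the native `libsnappyjava.so`. A hedged sketch outside the integration-test harness; the local session below is illustrative only:

```scala
import org.apache.spark.sql.SparkSession

// Minimal local sketch: route internal IO compression through snappy so the
// job fails fast with UnsatisfiedLinkError if the native library cannot
// resolve ld-linux-x86-64.so.2.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("SPARK-26995-snappy-check")
  .config("spark.io.compression.codec", "snappy")
  .getOrCreate()

// Any stage that shuffles exercises the configured codec.
spark.sparkContext.parallelize(1 to 1000)
  .map(x => (x % 10, x))
  .reduceByKey(_ + _)
  .count()
spark.stop()
```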

@SparkQA commented Mar 1, 2019

Test build #102925 has finished for PR 23898 at commit e7490fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 1, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8397/

@LucaCanali force-pushed the dockerfileUpdateSPARK26995 branch from e7490fe to 10b8a1c on March 1, 2019 at 20:57
@SparkQA commented Mar 1, 2019

Test build #102928 has finished for PR 23898 at commit 10b8a1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 1, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8400/

@SparkQA commented Mar 1, 2019

Test build #102932 has finished for PR 23898 at commit a854a97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 1, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8403/

On the new test added in BasicTestsSuite, `test("Run SparkPi with spark.io.compression.codec=snappy for SPARK-26995.", k8sTestTag)`:
A Contributor left a comment:

Sorry to be a pain about these things, but could you instead change an existing test?

Integration tests take much longer to run than unit tests; the more we can test with the same invocation, the better.

@LucaCanali (Contributor, Author) replied:

Sure, no problem.
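
The eventual change folds the codec setting into an existing test rather than adding a new one. A hedged sketch of that shape, following the `sparkAppConf` / `runSparkPiAndVerifyCompletion` pattern used in `KubernetesSuite` (the test name and helpers follow that suite's general conventions and are not quoted from the final diff):

```scala
// Sketch under stated assumptions: an existing SparkPi test gains the snappy
// codec so the same invocation also covers SPARK-26995.
test("Run SparkPi with no resources", k8sTestTag) {
  // Forces libsnappyjava.so to load inside the Alpine-based image; without
  // the /lib64 symlink this fails with an UnsatisfiedLinkError.
  sparkAppConf.set("spark.io.compression.codec", "snappy")
  runSparkPiAndVerifyCompletion()
}
```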

@SparkQA commented Mar 3, 2019

Test build #102956 has finished for PR 23898 at commit c626067.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 3, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/8425/

@vanzin (Contributor) commented Mar 4, 2019

Merging to master.

@vanzin closed this in f13ea15 on Mar 4, 2019
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Mar 4, 2019
cgiraldo added a commit to Gradiant/dockerized-spark that referenced this pull request May 14, 2019
Udbhav30 pushed a commit to Udbhav30/spark that referenced this pull request May 22, 2019
@dongjoon-hyun (Member) commented:
Hi, All. I'll backport this to branch-2.4.

dongjoon-hyun pushed a commit that referenced this pull request Jul 25, 2019

This is a back port of #23898 to branch-2.4. Closes #25255 from dongjoon-hyun/SPARK-26995.
Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019