Bug when reading from Azure Data Lake Gen2 with delta format #195

Closed
morazow opened this issue Mar 17, 2022 · 1 comment · Fixed by #204
Labels
bug Unwanted / harmful behavior

Comments


morazow commented Mar 17, 2022

Situation

We get the following error when reading from Azure Data Lake Gen2 storage using delta format.

VM error: F-UDF-CL-LIB-1127: F-UDF-CL-SL-JAVA-1002: F-UDF-CL-SL-JAVA-1013:
com.exasol.ExaUDFException: F-UDF-CL-SL-JAVA-1080: Exception during run
com.google.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/ObjectMapper
com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2053)
com.google.common.cache.LocalCache.get(LocalCache.java:3966)
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4863)
org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:562)
org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:453)
com.exasol.cloudetl.bucket.Bucket.getPathsFromDeltaLog(Bucket.scala:85)
com.exasol.cloudetl.bucket.Bucket.getPaths(Bucket.scala:78)
com.exasol.cloudetl.emitter.FilesMetadataEmitter.<init>(FilesMetadataEmitter.scala:27)
com.exasol.cloudetl.scriptclasses.FilesMetadataReader$.run(FilesMetadataReader.scala:31)
com.exasol.cloudetl.scriptclasses.FilesMetadataReader.run(FilesMetadataReader.scala...

This happens because the org.codehaus.jackson:jackson-mapper-asl:1.9.13 dependency is excluded, and its replacement, com.fasterxml.jackson.core:jackson-databind:2.13.1, is not used in its place.
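A quick way to confirm which Jackson classes are actually visible at runtime is a small classpath probe. The following is only a minimal sketch, not part of the project: the class name `JacksonClasspathCheck` and the helper `isLoadable` are hypothetical, `org.codehaus.jackson.map.ObjectMapper` is taken from the stack trace above, and `com.fasterxml.jackson.databind.ObjectMapper` is assumed to be the corresponding class from jackson-databind.

```java
public class JacksonClasspathCheck {

    // Hypothetical helper: returns true if the named class can be loaded
    // by this class's class loader, without running its static initializers.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, JacksonClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class is missing or cannot be linked on this classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            // Jackson 1.x class named in the NoClassDefFoundError above.
            "org.codehaus.jackson.map.ObjectMapper",
            // Jackson 2.x counterpart (assumed to come from jackson-databind).
            "com.fasterxml.jackson.databind.ObjectMapper"
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "missing"));
        }
    }
}
```

Run with the same classpath as the assembled jar, this would report the Jackson 1.x class as missing if the exclusion is the cause of the error.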

Acceptance Criteria

  • Find and replace the vulnerable jackson-mapper dependency
  • Check whether there is a local Azure storage emulator (similar to LocalStack) so that we can include an integration test

morazow commented Apr 6, 2022

I have looked into this further.

The issue is with the Hadoop Azure library, which depends on the old Jackson dependency.

172.21.0.2:54518> Caused by: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/ObjectMapper
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.parseListFilesResponse(AbfsHttpOperation.java:528)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(AbfsHttpOperation.java:391)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:290)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:217)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:191)
172.21.0.2:54518> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:189)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:302)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1054)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1024)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1006)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:490)
172.21.0.2:54518> org.apache.spark.sql.delta.storage.HadoopFileSystemLogStore.listFrom(HadoopFileSystemLogStore.scala:83)
172.21.0.2:54518> org.apache.spark.sql.delta.SnapshotManagement.listFrom(SnapshotManagement.scala:62)
172.21.0.2:54518> org.apache.spark.sql.delta.SnapshotManagement.listFrom$(SnapshotManagement.scala:61)
172.21.0.2:54518> org.apache.spark.sql.delta.DeltaLog.listFrom(DeltaLog.scala:62)

There is an effort to replace the older Jackson versions, HADOOP-16908 (corresponding pull request: PR 3789), but it will only be included in the next Hadoop version, 3.4.0.

For now, we are going to include org.codehaus.jackson:jackson-mapper-asl:1.9.13 again and suppress its reported vulnerabilities.


Import query to reproduce the above exception:

IMPORT INTO TEST.TEST1
FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
  BUCKET_PATH     = 'abfss://container@storageaccount.dfs.core.windows.net/2m5/*'
  DATA_FORMAT     = 'DELTA'
  CONNECTION_NAME = 'AZURE_ABFS_CONNECTION'
  TRUNCATE_STRING = 'true'
  PARALLELISM     = 'nproc()*2';
