Skip to content

Conversation

@sravanigadey
Copy link

@sravanigadey sravanigadey commented May 31, 2022

Description of PR

Merging audit log files containing huge number of audit logs collected from a job like Hive or Spark job containing various S3 requests like list, head, get and put requests.

How was this patch tested?

Region : AP-South-1
Command used : mvn clean verify -Dparallel-tests -DtestsThreadCount=4 -Dscale
Getting these two errors while testing

[ERROR] testSTS(org.apache.hadoop.fs.s3a.ITestS3ATemporaryCredentials)  Time elapsed: 7.931 s  <<< ERROR!
com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.
	at com.amazonaws.client.builder.AwsClientBuilder.setRegion(AwsClientBuilder.java:462)
	at com.amazonaws.client.builder.AwsClientBuilder.configureMutableProperties(AwsClientBuilder.java:424)
	at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
	at org.apache.hadoop.fs.s3a.ITestS3ATemporaryCredentials.testSTS(ITestS3ATemporaryCredentials.java:130)
[ERROR] testDTUtilShell(org.apache.hadoop.fs.s3a.auth.delegation.ITestSessionDelegationInFileystem)  Time elapsed: 1.861 s  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<1>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
	at org.apache.hadoop.fs.s3a.auth.delegation.ITestSessionDelegationInFileystem.dtutil(ITestSessionDelegationInFileystem.java:739)
	at org.apache.hadoop.fs.s3a.auth.delegation.ITestSessionDelegationInFileystem.testDTUtilShell(ITestSessionDelegationInFileystem.java:750)

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@sravanigadey
Copy link
Author

When I run the test in trunk using same command as above i.e, 'mvn clean verify -Dparallel-tests -DtestsThreadCount=4 -Dscale' command, then also I got errors which are almost similar to the errors that I got when I run test in HADOOP-18258 branch.

[ERROR] testUnbufferBeforeRead(org.apache.hadoop.fs.contract.s3a.ITestS3AContractUnbuffer)  Time elapsed: 3.623 s  <<< FAILURE!
java.lang.AssertionError: failed to read expected number of bytes from stream. This may be transient expected:<1024> but was:<93>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.apache.hadoop.fs.contract.AbstractContractUnbufferTest.validateFileContents(AbstractContractUnbufferTest.java:139)
	at org.apache.hadoop.fs.contract.AbstractContractUnbufferTest.validateFullFileContents(AbstractContractUnbufferTest.java:132)
	at org.apache.hadoop.fs.contract.AbstractContractUnbufferTest.testUnbufferBeforeRead(AbstractContractUnbufferTest.java:63)
[ERROR] testSTS(org.apache.hadoop.fs.s3a.ITestS3ATemporaryCredentials)  Time elapsed: 12.976 s  <<< ERROR!
com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.
	at com.amazonaws.client.builder.AwsClientBuilder.setRegion(AwsClientBuilder.java:462)
	at com.amazonaws.client.builder.AwsClientBuilder.configureMutableProperties(AwsClientBuilder.java:424)
	at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
	at org.apache.hadoop.fs.s3a.ITestS3ATemporaryCredentials.testSTS(ITestS3ATemporaryCredentials.java:130)
[ERROR] testDTUtilShell(org.apache.hadoop.fs.s3a.auth.delegation.ITestSessionDelegationInFileystem)  Time elapsed: 8.354 s  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<1>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
	at org.apache.hadoop.fs.s3a.auth.delegation.ITestSessionDelegationInFileystem.dtutil(ITestSessionDelegationInFileystem.java:739)
	at org.apache.hadoop.fs.s3a.auth.delegation.ITestSessionDelegationInFileystem.testDTUtilShell(ITestSessionDelegationInFileystem.java:750)

Copy link
Contributor

@mehakmeet mehakmeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start, need more commenting on the AuditTool, make some of the hot paths log into DEBUG logs.

  • Creation of the Local Directory for the audit logs needs to be unique for the different logging directories provided in the argument.
  • We should be cleaning up the local files after we merge them up? since it'll become a large issue later on with big audit files.
  • There should be some validation that needs to be done for the audit files? For eg, if we provide a directory that doesn't contain any audit files, what should happen?

Copy link
Contributor

@mehakmeet mehakmeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments

@mehakmeet
Copy link
Contributor

few small nits here

  • You have added "." at the end of every line in javadocs, it should only be at the end of a sentence or end of the javadoc, not each line.
  • Some of the methods in AuditTool have wrong access modifiers, if you're not using a method outside this class just keep them private.
  • I have one doubt about putting the auditTool in the Hadoop shell script. Since, at the moment this is hadoop-aws specific tool, not sure if defining the command in hadoop-common is the right move. Can you check and see where s3guard tool's command is defined in hadoop-aws, that might be a more appropriate place for hadoop-aws specific shell commands.

After these, this LGTM.
CC: @mukund-thakur @steveloughran to review and merge.

@sravanigadey
Copy link
Author

sravanigadey commented Jun 22, 2022

Yeah changed the javadocs and access modifiers.
Yes, s3guard tool's command is defined in hadoop-aws in below shell script.
hadoop-tools/hadoop-aws/src/main/shellprofile.d/hadoop-s3guard.sh
I think we can add auditTool in the above shell script or we can create new shell script.
Which one is better?

@apache apache deleted a comment from hadoop-yetus Jun 22, 2022
@apache apache deleted a comment from hadoop-yetus Jun 22, 2022
@apache apache deleted a comment from hadoop-yetus Jun 22, 2022
@apache apache deleted a comment from hadoop-yetus Jun 22, 2022
@apache apache deleted a comment from hadoop-yetus Jun 22, 2022
@apache apache deleted a comment from hadoop-yetus Jun 22, 2022
@apache apache deleted a comment from hadoop-yetus Jun 22, 2022
@apache apache deleted a comment from hadoop-yetus Jun 22, 2022
Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good.

  1. make a subcommand of s3guard
  2. needs a test of the tool itself
  3. that transformation to parquet should be happening during the merge...this is going to break all the test assertions.

i don't think there is any need to download locally before the merge...you can just read the remote files line by line and convert and then save

this is essentially a very small map phase in a mapreduce job

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • best to read in say 2MB byte array and save to output stream passed in
  • you can use readFully, but this doesn't actually handle that final block of data.

Copy link
Contributor

@mukund-thakur mukund-thakur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start. Added some comments.
One thing to keep in mind while writing code for this is:
Not write only for S3A, think what if we have to add similar stuff for Azure, so what all methods and classes can be reused in future and add unit tests for those.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge files should return true/false

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is going to cause OOM for bigger files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if we are first copying all the files from S3 to local fs, we can try to parallelize this as well.

@sravanigadey
Copy link
Author

pushed the changes using -f to resolve conflicts

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 43s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 64m 35s trunk passed
+1 💚 compile 1m 0s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 0m 55s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 52s trunk passed
+1 💚 mvnsite 1m 1s trunk passed
+1 💚 javadoc 0m 48s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 0m 50s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 34s trunk passed
+1 💚 shadedclient 20m 41s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 21m 8s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 39s the patch passed
+1 💚 compile 0m 44s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 0m 44s the patch passed
+1 💚 compile 0m 38s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 38s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 27s the patch passed
+1 💚 mvnsite 0m 42s the patch passed
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
-1 ❌ javadoc 0m 32s /patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.
-1 ❌ spotbugs 1m 17s /new-spotbugs-hadoop-tools_hadoop-aws.html hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 20m 15s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 41s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 51s The patch does not generate ASF License warnings.
123m 32s
Reason Tests
SpotBugs module:hadoop-tools/hadoop-aws
A known null value is checked to see if it is an instance of org.apache.avro.util.Utf8 in org.apache.hadoop.fs.s3a.audit.AvroDataRecord.customDecode(ResolvingDecoder) At AvroDataRecord.java:checked to see if it is an instance of org.apache.avro.util.Utf8 in org.apache.hadoop.fs.s3a.audit.AvroDataRecord.customDecode(ResolvingDecoder) At AvroDataRecord.java:[line 2392]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/12/artifact/out/Dockerfile
GITHUB PR #4383
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle
uname Linux 334077fb170b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c622c55
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/12/testReport/
Max. process+thread count 553 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/12/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 38s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 14s trunk passed
+1 💚 compile 0m 54s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 0m 49s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 52s trunk passed
+1 💚 mvnsite 1m 1s trunk passed
+1 💚 javadoc 0m 48s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 0m 51s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 34s trunk passed
+1 💚 shadedclient 20m 45s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 21m 12s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 40s the patch passed
+1 💚 compile 0m 44s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 0m 44s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 37s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 27s the patch passed
+1 💚 mvnsite 0m 42s the patch passed
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 0m 31s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
-1 ❌ spotbugs 1m 18s /new-spotbugs-hadoop-tools_hadoop-aws.html hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 20m 33s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 44s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 51s The patch does not generate ASF License warnings.
97m 54s
Reason Tests
SpotBugs module:hadoop-tools/hadoop-aws
A known null value is checked to see if it is an instance of org.apache.avro.util.Utf8 in org.apache.hadoop.fs.s3a.audit.AvroDataRecord.customDecode(ResolvingDecoder) At AvroDataRecord.java:checked to see if it is an instance of org.apache.avro.util.Utf8 in org.apache.hadoop.fs.s3a.audit.AvroDataRecord.customDecode(ResolvingDecoder) At AvroDataRecord.java:[line 2392]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/13/artifact/out/Dockerfile
GITHUB PR #4383
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle
uname Linux e99f7c8dfa22 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d3865c2
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/13/testReport/
Max. process+thread count 723 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/13/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Jul 13, 2022
@apache apache deleted a comment from hadoop-yetus Jul 13, 2022
@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 27s trunk passed
+1 💚 compile 1m 2s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 0m 54s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 51s trunk passed
+1 💚 mvnsite 1m 1s trunk passed
+1 💚 javadoc 0m 48s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 0m 50s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 34s trunk passed
+1 💚 shadedclient 21m 12s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 21m 39s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 40s the patch passed
+1 💚 compile 0m 44s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 0m 44s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 37s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 25s the patch passed
+1 💚 mvnsite 0m 43s the patch passed
+1 💚 javadoc 0m 24s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 0m 33s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
-1 ❌ spotbugs 1m 16s /new-spotbugs-hadoop-tools_hadoop-aws.html hadoop-tools/hadoop-aws generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 shadedclient 20m 33s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 41s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 52s The patch does not generate ASF License warnings.
98m 13s
Reason Tests
SpotBugs module:hadoop-tools/hadoop-aws
A known null value is checked to see if it is an instance of org.apache.avro.util.Utf8 in org.apache.hadoop.fs.s3a.audit.AvroDataRecord.customDecode(ResolvingDecoder) At AvroDataRecord.java:checked to see if it is an instance of org.apache.avro.util.Utf8 in org.apache.hadoop.fs.s3a.audit.AvroDataRecord.customDecode(ResolvingDecoder) At AvroDataRecord.java:[line 2392]
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/14/artifact/out/Dockerfile
GITHUB PR #4383
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle
uname Linux 8cfd6f5a69ba 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / effb27b
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/14/testReport/
Max. process+thread count 559 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/14/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Copy link
Contributor

this isn't a bug, just inefficient, as the instanceof is always false if the argument is null.

because this is generated code, it's not something which can be fixed. so instead we tell it to ignore this possible bug NP_NULL_INSTANCEOF

  1. edit the file hadoop-tools/hadoop-aws/dev-support/findbugs-exclude.xml
  2. under the bottom match entry, add an XML element to tell spotbugs to be quiet
  <Match>
    <Class name="org.apache.hadoop.fs.s3a.audit.AvroDataRecord"/>
    <Bug pattern="NP_NULL_INSTANCEOF"/>
  </Match>

you can check the module without having to wait for yetus to do it.

mvn spotbugs:spotbugs

@sravanigadey
Copy link
Author

Yeah okay then will ignore auto-generated class by adding suggested XML element in hadoop-tools/hadoop-aws/dev-support/findbugs-exclude.xml.
Thanks.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 49s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 41m 31s trunk passed
+1 💚 compile 0m 52s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 0m 45s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 43s trunk passed
+1 💚 mvnsite 0m 53s trunk passed
+1 💚 javadoc 0m 40s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 0m 40s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 27s trunk passed
+1 💚 shadedclient 24m 6s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 24m 29s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 37s the patch passed
+1 💚 compile 0m 41s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 0m 41s the patch passed
+1 💚 compile 0m 35s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 35s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 23s the patch passed
+1 💚 mvnsite 0m 40s the patch passed
+1 💚 javadoc 0m 20s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 0m 28s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 13s the patch passed
+1 💚 shadedclient 23m 45s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 39s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 42s The patch does not generate ASF License warnings.
105m 33s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/15/artifact/out/Dockerfile
GITHUB PR #4383
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle
uname Linux 4aebf732dd35 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 258532e
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/15/testReport/
Max. process+thread count 531 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/15/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. some minor changes requested, but otherwise its ready to go in

int fileLength = (int) fileStatus.getLen();
byte[] byteBuffer = new byte[fileLength];
try (FSDataInputStream fsDataInputStream =
fileSystem.open(fileStatus.getPath())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the new openFile builder; there are examples such as
org.apache.hadoop.fs.AvroFSInput#AvroFSInput(org.apache.hadoop.fs.FileContext, org.apache.hadoop.fs.Path) to show the basic strategy.

if you an add the filestatus in .withFileStatus() we can save a HEAD request opening files on s3 and abfs, for faster reads

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified as suggested

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 47s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 48m 26s trunk passed
+1 💚 compile 0m 41s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 0m 33s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 0m 30s trunk passed
+1 💚 mvnsite 0m 41s trunk passed
+1 💚 javadoc 0m 26s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 29s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 1m 13s trunk passed
+1 💚 shadedclient 26m 22s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 26m 38s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 44s the patch passed
+1 💚 compile 0m 35s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 0m 35s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 0m 29s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 17s the patch passed
+1 💚 mvnsite 0m 35s the patch passed
+1 💚 javadoc 0m 15s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 23s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 1m 11s the patch passed
+1 💚 shadedclient 26m 37s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 30s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
115m 28s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/16/artifact/out/Dockerfile
GITHUB PR #4383
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle
uname Linux 4a270bfa10c8 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3ab4e4f
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/16/testReport/
Max. process+thread count 542 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/16/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 42s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 44m 19s trunk passed
+1 💚 compile 0m 45s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 0m 36s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 0m 36s trunk passed
+1 💚 mvnsite 0m 44s trunk passed
+1 💚 javadoc 0m 32s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 1m 17s trunk passed
+1 💚 shadedclient 23m 59s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 24m 18s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 46s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 31s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 0m 31s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 20s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 mvnsite 0m 35s the patch passed
+1 💚 javadoc 0m 17s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 1m 9s the patch passed
+1 💚 shadedclient 23m 33s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 38s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
106m 38s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/17/artifact/out/Dockerfile
GITHUB PR #4383
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle
uname Linux 25ce51b2b1c6 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0864c1a
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/17/testReport/
Max. process+thread count 576 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/17/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 40s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 43m 28s trunk passed
+1 💚 compile 0m 44s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 0m 39s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 checkstyle 0m 35s trunk passed
+1 💚 mvnsite 0m 44s trunk passed
+1 💚 javadoc 0m 32s trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 33s trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 1m 17s trunk passed
+1 💚 shadedclient 23m 37s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 23m 56s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 47s the patch passed
+1 💚 compile 0m 36s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 0m 36s the patch passed
+1 💚 compile 0m 31s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 0m 31s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 19s the patch passed
+1 💚 mvnsite 0m 35s the patch passed
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 spotbugs 1m 9s the patch passed
+1 💚 shadedclient 23m 28s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 39s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
105m 37s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/18/artifact/out/Dockerfile
GITHUB PR #4383
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle
uname Linux cc4f13fbc226 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b21304c
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/18/testReport/
Max. process+thread count 568 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4383/18/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@github-actions
Copy link
Contributor

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Nov 11, 2025
@github-actions github-actions bot closed this Nov 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants