HADOOP-19233 [branch-3.4]: ABFS: [FnsOverBlob] Implementing Rename and Delete APIs over Blob Endpoint #7392

bhattmanish98
Contributor

Description of PR:


This PR is part of the series of work tracked under the parent Jira: [HADOOP-19179]
Jira for this Patch: [HADOOP-19233]

Currently, rename and delete operations are supported only on the DFS endpoint. They are not supported on the Blob endpoint because the Blob endpoint has no notion of hierarchy, and the HDFS contracts must still be maintained when performing these operations. Renaming or deleting a directory over the Blob endpoint therefore requires the client to handle the orchestration itself and rename or delete every blob within the specified directory.
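
To illustrate why a directory rename needs client-side orchestration on a flat namespace, here is a minimal sketch. It is not the ABFS implementation: `FlatBlobStore`, `listUnder`, and `renameDirectory` are hypothetical names, and an in-memory sorted map stands in for the blob container. It assumes a "directory" is nothing more than a shared name prefix, so the rename becomes a copy followed by a delete for each blob under that prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a flat (non-hierarchical) blob container. */
class FlatBlobStore {
  // Blob name -> content. Names such as "a/b/y" are plain strings, not a tree.
  private final TreeMap<String, String> blobs = new TreeMap<>();

  void put(String name, String content) {
    blobs.put(name, content);
  }

  boolean exists(String name) {
    return blobs.containsKey(name);
  }

  String get(String name) {
    return blobs.get(name);
  }

  /** Returns a snapshot of every blob name that starts with dir + "/". */
  List<String> listUnder(String dir) {
    String prefix = dir + "/";
    // All keys in [prefix, prefix + '\uffff') share the prefix.
    return new ArrayList<>(
        blobs.subMap(prefix, prefix + Character.MAX_VALUE).keySet());
  }

  /**
   * Renames a "directory" by copying each blob to its new name and then
   * deleting the source. Returns the number of blobs moved.
   * The operation is not atomic: a failure midway leaves a partial rename.
   */
  int renameDirectory(String src, String dst) {
    List<String> children = listUnder(src);
    for (String child : children) {
      String target = dst + child.substring(src.length());
      blobs.put(target, blobs.get(child)); // copy step
      blobs.remove(child);                 // delete step
    }
    return children.size();
  }
}
```

Because each blob is moved individually, the sketch also shows why a partially completed rename is possible, which is the gap that rename status tracking is meant to cover.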
 
The task outlines the considerations for implementing rename and delete operations for the FNS-blob endpoint to ensure compatibility with HDFS contracts.

  • Blob Endpoint Usage: The task addresses the need for abstraction in the code to maintain HDFS contracts while performing rename and delete operations on the blob endpoint, which does not support hierarchy.
  • Rename Operations: The AzureBlobFileSystem#rename() method will use a RenameHandler instance to handle rename operations, with separate handlers for the DFS and blob endpoints. This method includes prechecks, destination adjustments, and orchestration of directory renaming for blobs.
  • Atomic Rename: A directory rename over the Blob endpoint is not inherently atomic, because the client must orchestrate a copy and a delete for each blob within the directory. A configuration will allow developers to specify directories for atomic renaming, with a JSON file to track the status of renames.
  • Delete Operations: Delete operations are simpler than renames, requiring fewer HDFS contract checks. For blob endpoints, the client must handle orchestration, including managing orphaned directories created by AzCopy.
  • Orchestration for Rename/Delete: Orchestration for rename and delete operations over blob endpoints involves listing blobs and performing actions on each blob. The process must be optimized to handle large numbers of blobs efficiently.
  • Need for Optimization: Optimization is crucial because the ListBlob API can return a maximum of 5000 blobs at once, necessitating multiple calls for large directories. The task proposes a producer-consumer model to handle blobs in parallel, thereby reducing processing time and memory usage.
  • Producer-Consumer Design: The proposed design includes a producer to list blobs, a queue to store the blobs, and a consumer to process them in parallel. This approach aims to improve efficiency and mitigate memory issues.
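
The producer-consumer design described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code in this patch: `ListAndProcessSketch`, `listPage`, and `processAll` are hypothetical names, an in-memory list stands in for the paginated ListBlob API, and "processing" a blob just records its name. The bounded queue is the point of the design: the producer blocks once a couple of pages are waiting, so the full listing is never held in memory at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ListAndProcessSketch {
  // Sentinel page (compared by identity) telling a consumer to stop.
  private static final List<String> END = new ArrayList<>();

  /** Hypothetical stand-in for one paginated list call. */
  static List<String> listPage(List<String> all, int offset, int pageSize) {
    int end = Math.min(offset + pageSize, all.size());
    return new ArrayList<>(all.subList(offset, end));
  }

  /** Lists blobs page by page and processes the pages in parallel. */
  static Set<String> processAll(List<String> allBlobs, int pageSize,
      int consumers) {
    // Bounded queue: at most two listed pages wait for a consumer.
    BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(2);
    Set<String> processed = ConcurrentHashMap.newKeySet();
    ExecutorService pool = Executors.newFixedThreadPool(consumers);

    for (int i = 0; i < consumers; i++) {
      pool.execute(() -> {
        try {
          while (true) {
            List<String> page = queue.take();
            if (page == END) {
              return;
            }
            // Placeholder for the per-blob rename or delete action.
            processed.addAll(page);
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }

    try {
      // Producer: list one page at a time and hand it to the consumers.
      for (int offset = 0; offset < allBlobs.size(); offset += pageSize) {
        queue.put(listPage(allBlobs, offset, pageSize));
      }
      // One sentinel per consumer so that every consumer terminates.
      for (int i = 0; i < consumers; i++) {
        queue.put(END);
      }
      pool.shutdown();
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("consumers did not finish");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while processing", e);
    }
    return processed;
  }
}
```

For example, `ListAndProcessSketch.processAll(names, 5000, 4)` would walk a directory listing in pages of 5000 with four consumers; the page size mirrors the ListBlob limit mentioned above, while the consumer count is an arbitrary choice for illustration.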

… over Blob Endpoint (apache#7265)

Contributed by Manish Bhatt.
Signed off by Anuj Modi, Anmol Asrani
@bhattmanish98 marked this pull request as ready for review February 17, 2025 05:54
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 6m 47s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 19 new or modified test files.
_ branch-3.4 Compile Tests _
+1 💚 mvninstall 26m 35s branch-3.4 passed
+1 💚 compile 0m 24s branch-3.4 passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 22s branch-3.4 passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 0m 22s branch-3.4 passed
+1 💚 mvnsite 0m 27s branch-3.4 passed
+1 💚 javadoc 0m 26s branch-3.4 passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 24s branch-3.4 passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 0m 43s branch-3.4 passed
+1 💚 shadedclient 18m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 20s the patch passed
+1 💚 compile 0m 19s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 19s the patch passed
+1 💚 compile 0m 15s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 javac 0m 15s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 13s hadoop-tools/hadoop-azure: The patch generated 0 new + 18 unchanged - 3 fixed = 18 total (was 21)
+1 💚 mvnsite 0m 19s the patch passed
+1 💚 javadoc 0m 16s hadoop-tools_hadoop-azure-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 generated 0 new + 10 unchanged - 1 fixed = 10 total (was 11)
+1 💚 javadoc 0m 18s hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 generated 0 new + 10 unchanged - 1 fixed = 10 total (was 11)
+1 💚 spotbugs 0m 44s the patch passed
+1 💚 shadedclient 18m 58s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 1s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 24s The patch does not generate ASF License warnings.
80m 25s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7392/1/artifact/out/Dockerfile
GITHUB PR #7392
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux af65e5716a93 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision branch-3.4 / 3c3c50f
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7392/1/testReport/
Max. process+thread count 552 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7392/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@bhattmanish98
Contributor Author

============================================================
HNS-OAuth

[WARNING] Tests run: 161, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 757, Failures: 0, Errors: 0, Skipped: 141
[WARNING] Tests run: 171, Failures: 0, Errors: 0, Skipped: 25
[WARNING] Tests run: 262, Failures: 0, Errors: 0, Skipped: 23

============================================================
HNS-SharedKey

[WARNING] Tests run: 161, Failures: 0, Errors: 0, Skipped: 4
[WARNING] Tests run: 757, Failures: 0, Errors: 0, Skipped: 93
[WARNING] Tests run: 171, Failures: 0, Errors: 0, Skipped: 25
[WARNING] Tests run: 262, Failures: 0, Errors: 0, Skipped: 10

============================================================
NonHNS-SharedKey

[WARNING] Tests run: 161, Failures: 0, Errors: 0, Skipped: 11
[WARNING] Tests run: 741, Failures: 0, Errors: 0, Skipped: 339
[WARNING] Tests run: 171, Failures: 0, Errors: 0, Skipped: 27
[WARNING] Tests run: 262, Failures: 0, Errors: 0, Skipped: 11

============================================================
AppendBlob-HNS-OAuth

[WARNING] Tests run: 161, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 757, Failures: 0, Errors: 0, Skipped: 146
[WARNING] Tests run: 171, Failures: 0, Errors: 0, Skipped: 49
[WARNING] Tests run: 262, Failures: 0, Errors: 0, Skipped: 23

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 22s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 20 new or modified test files.
_ branch-3.4 Compile Tests _
+1 💚 mvninstall 21m 22s branch-3.4 passed
+1 💚 compile 0m 24s branch-3.4 passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 23s branch-3.4 passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 0m 22s branch-3.4 passed
+1 💚 mvnsite 0m 27s branch-3.4 passed
+1 💚 javadoc 0m 27s branch-3.4 passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 23s branch-3.4 passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 0m 46s branch-3.4 passed
+1 💚 shadedclient 18m 48s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 17s the patch passed
+1 💚 compile 0m 18s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 18s the patch passed
+1 💚 compile 0m 15s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 javac 0m 15s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 12s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 1 new + 18 unchanged - 3 fixed = 19 total (was 21)
+1 💚 mvnsite 0m 18s the patch passed
+1 💚 javadoc 0m 16s hadoop-tools_hadoop-azure-jdkUbuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 generated 0 new + 10 unchanged - 1 fixed = 10 total (was 11)
+1 💚 javadoc 0m 17s hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 generated 0 new + 10 unchanged - 1 fixed = 10 total (was 11)
+1 💚 spotbugs 0m 42s the patch passed
+1 💚 shadedclient 19m 4s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 0s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 25s The patch does not generate ASF License warnings.
68m 52s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7392/2/artifact/out/Dockerfile
GITHUB PR #7392
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 8f33c8236655 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision branch-3.4 / 1f7b036
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7392/2/testReport/
Max. process+thread count 551 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7392/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@anujmodi2021 left a comment

+1
Backport from trunk PR#

@anujmodi2021 merged commit 302cf36 into apache:branch-3.4 Feb 17, 2025
3 checks passed