-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Azure Data Lake Gen2 connector for PinotFS #5116
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to add tests using Mockito? If not, we can add tests for utility functions such as convertAzureStylePathToUriStylePath
.
...ile-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/AzureGen2PinotFS.java
Show resolved
Hide resolved
...ile-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/AzureGen2PinotFS.java
Show resolved
Hide resolved
...ile-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/AzureGen2PinotFS.java
Outdated
Show resolved
Hide resolved
...ile-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/AzureGen2PinotFS.java
Show resolved
Hide resolved
...ile-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/AzureGen2PinotFS.java
Show resolved
Hide resolved
...ile-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/AzureGen2PinotFS.java
Outdated
Show resolved
Hide resolved
...ile-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/AzureGen2PinotFS.java
Show resolved
Hide resolved
...ile-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/AzureGen2PinotFS.java
Outdated
Show resolved
Hide resolved
...ile-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/AzureGen2PinotFS.java
Outdated
Show resolved
Hide resolved
...ile-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/AzureGen2PinotFS.java
Outdated
Show resolved
Hide resolved
...ile-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/AzureGen2PinotFS.java
Outdated
Show resolved
Hide resolved
6112a99
to
4d2a8ff
Compare
Codecov Report
@@ Coverage Diff @@
## master #5116 +/- ##
============================================
- Coverage 58.32% 57.55% -0.77%
Complexity 12 12
============================================
Files 1209 1184 -25
Lines 64541 62424 -2117
Branches 9484 9143 -341
============================================
- Hits 37643 35929 -1714
+ Misses 24143 23847 -296
+ Partials 2755 2648 -107
Continue to review full report at Codecov.
|
md5 computation time benchmark (on macbook with ssd): (md5 compute/ total time) <- total time = IO(read contents from file) + md5 hash computation So, |
1. Testing have been done by attaching ADLS Gen2 to the local deployment. 2. move() is implemented by copy & delete because of azure sdk issue with rename() API. Azure/azure-sdk-for-java#8761
_blobServiceClient = | ||
new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient(); | ||
_fileSystemClient = serviceClient.getFileSystemClient(fileSystemName); | ||
LOGGER.error("AzureGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this an error log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. I was debugging and forgot to turn it back.
if (e.getStatusCode() == ALREADY_EXISTS_STATUS_CODE && e.getErrorCode().equals(PATH_ALREADY_EXISTS_ERROR_CODE)) { | ||
return true; | ||
} | ||
LOGGER.error("Exception thrown while calling mkdir (uri = {})", uri, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to print the error status code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm including e
, which is exception object to the log. This should include the status code information as part of the exception stack. Do you think it's better to add status code explicitly along with uri
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether the status code is included in the exception, but it'd be good to show it in the log. :)
mkdir(newDst); | ||
} else { | ||
// If src is a file, we need to copy. | ||
copySucceeded |= copySrcToDst(currentSrc, newDst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if part of the files failed? It will still return true, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch :) I updated. I also made the change to GcsPinotFS
, which had the similar issue.
URI parentUri = Paths.get(dstUri).getParent().toUri(); | ||
mkdir(parentUri); | ||
try { | ||
Path parentPath = Paths.get(dstUri.getPath()).getParent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for testing the case when the scheme doesn't match then throw the exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a test to cover this case to LocalPinotFSTest
/** | ||
* Azure Data Lake Storage Gen2 implementation for the PinotFS interface. | ||
*/ | ||
public class AzureGen2PinotFS extends PinotFS { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any test for this class? We've noticed the code coverage is only 50% though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a good way to test. I've been testing this by hooking this up to the live ADLS Gen2. One way is to mock every single Azure SDK API that i'm calling using Mockhito but this doesn't really check much.
Another potential approach is to create the integration test by incorporating Azurite https://github.com/Azure/Azurite, which is Azure storage service emulator. But, this doesn't support Azure Datalake Gen2.
By the way, I did verify all the functions by hooking up the live Data Lake Gen2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address comments from @jackjlli before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the comments! In terms of the tests, it'd still be good to have it. Even if the code works fine today, somebody else may change it in the future. And it'd be hard for another guy to test his change without following the tests you've done. You can add them later or just add a todo there. You decide. Thanks!
@jackjlli Thanks for the comment. I added |
[BUG] srcUri is decoded and not encoded back for rename API (ADLS Gen2) Azure/azure-sdk-for-java#8761