TEZ-4514: Reduce Some FileSystem Calls. #309

Merged
merged 2 commits into from
Oct 3, 2023

Conversation

ayushtkn
Member

No description provided.

@ayushtkn changed the title from "Reduce Some FileSystem Calls." to "TEZ-4514: Reduce Some FileSystem Calls." on Sep 28, 2023
FileSystem fs = p.getFileSystem(conf);
p = fs.resolvePath(p.makeQualified(fs.getUri(), fs.getWorkingDirectory()));
FileSystem targetFS = p.getFileSystem(conf);
return targetFS.listFiles(p, false);
Contributor

@abstractdog abstractdog Oct 2, 2023

As far as I can tell, this single listFiles call can replace the "directory or file" check, which makes this method simpler. Looks good.
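To illustrate the point, here is a minimal sketch of a listing call that accepts either a file or a directory, so the caller needs no separate "is it a directory?" check. It uses `java.nio.file.Files.find` as a plain-JDK stand-in for Hadoop's `FileSystem.listFiles(path, false)`. The class and method names are hypothetical and this is not the Tez code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; java.nio stands in for the Hadoop FileSystem API.
final class ListFilesSketch {

  // Returns the regular files at `p`: the file itself when `p` is a file,
  // or its direct (non-recursive) children when `p` is a directory.
  static List<Path> listFiles(Path p) throws IOException {
    // Depth 1 visits `p` and, if it is a directory, its immediate entries.
    // The matcher receives attributes gathered during the walk.
    try (Stream<Path> found = Files.find(p, 1, (path, attrs) -> attrs.isRegularFile())) {
      return found.sorted().collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("listfiles-sketch");
    Path a = Files.createFile(dir.resolve("a.txt"));
    Files.createFile(dir.resolve("b.txt"));
    Files.createDirectory(dir.resolve("sub")); // sub-directories are skipped

    System.out.println("directory input: " + listFiles(dir));
    System.out.println("file input: " + listFiles(a));
  }
}
```

With a directory as input the result holds the two text files. With a single file as input it holds just that file.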

@@ -233,15 +230,11 @@ private static boolean addLocalResources(Configuration conf,
} else {
type = LocalResourceType.FILE;
}
RemoteIterator<LocatedFileStatus> fileStatuses = getListFilesFileStatus(configUri, conf);
Contributor

getListFilesFileStatus takes a "String fileName" param, but here we pass a "configUri". Can you unify the two and use whichever name is closer to the truth? Also, I can see that getListFilesFileStatus eventually creates a URI, so we could pass that in here instead, right?

Member Author

Done. The name says URI, but it is a string rather than a URI object. It is read from a conf entry whose name contains URI, so I kept the old name, configUri.

try {
fsStatus = fs.getFileStatus(stagingArea);
} catch (FileNotFoundException fnf) {
// Ignore
Contributor

What about returning early when fsStatus is null:

if (fsStatus == null) {
  return fs;
}

and leaving the rest of the method unindented?

Member Author

We can't return there: an else block below handles the case where fsStatus is null.

else {
  TezCommonUtils.mkDirForAM(fs, stagingArea);
}
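For readers following the exchange, the sketch below shows the pattern under discussion: one status lookup doubles as the existence check, and the missing-directory case is handled before anything else. It uses java.nio as a stand-in for the Hadoop FileSystem. The class name, method name, and the isDirectory validation are assumptions made for illustration, and `Files.createDirectories` merely plays the role of `TezCommonUtils.mkDirForAM`. An early return is shown only to illustrate that it would have to include the directory creation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical illustration; not the Tez implementation.
final class StagingDirSketch {

  // Returns true if the directory had to be created, false if it already existed.
  static boolean ensureStagingDir(Path stagingArea) throws IOException {
    BasicFileAttributes status = null;
    try {
      // A single lookup replaces a separate exists() call followed by a status call.
      status = Files.readAttributes(stagingArea, BasicFileAttributes.class);
    } catch (NoSuchFileException e) {
      // Missing path: handled right below.
    }
    if (status == null) {
      // Returning here is only safe because the creation happens first.
      Files.createDirectories(stagingArea);
      return true;
    }
    // Placeholder validation (assumption); the real method performs its own checks.
    if (!status.isDirectory()) {
      throw new IOException(stagingArea + " exists but is not a directory");
    }
    return false;
  }

  public static void main(String[] args) throws IOException {
    Path staging = Files.createTempDirectory("staging-sketch").resolve("am");
    System.out.println("created on first call: " + ensureStagingDir(staging));
    System.out.println("created on second call: " + ensureStagingDir(staging));
  }
}
```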

+ ", dagId=" + lastInProgressDAG.toString()
+ ", dagRecoveryFile=" + dagRecoveryFile
+ ", len=" + fileStatus.getLen());
LOG.info("Trying to recover dag from recovery file, dagId={}, dagRecoveryFile={}", lastInProgressDAG,
Contributor

fileStatus.getLen() was removed from the log message. Is that intentional?

Member Author

Yes. It was issuing an RPC just to log the file length, so I removed it.

Contributor

@abstractdog abstractdog Oct 3, 2023

What about leaving the useful info at DEBUG level? In that case we can log the full FileStatus, like:

LOG.info("Trying to recover dag from recovery file, dagId={}, dagRecoveryFile={}", lastInProgressDAG, 
    dagRecoveryFile);
if (LOG.isDebugEnabled()) {
    // extra RPC call
    FileStatus fileStatus = recoveryFS.getFileStatus(dagRecoveryFile);
    LOG.debug("Recovery file details: {}", fileStatus);
}
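The effect of the guard can be checked in isolation. The sketch below is an assumption-laden stand-in: it uses `java.util.logging` rather than the project's logger (`Level.FINE` plays the role of DEBUG), and `fetchFileStatus` is a hypothetical counter-backed substitute for `recoveryFS.getFileStatus(dagRecoveryFile)`. The path in `main` is made up.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration; not the Tez implementation.
final class DebugGuardSketch {
  private static final Logger LOG = Logger.getLogger(DebugGuardSketch.class.getName());

  // Counts how often the simulated remote call is made.
  static final AtomicInteger remoteCalls = new AtomicInteger();

  // Stand-in for the filesystem status lookup (one RPC per call in the real setting).
  static String fetchFileStatus(String path) {
    remoteCalls.incrementAndGet();
    return "FileStatus{path=" + path + "}";
  }

  static void logRecovery(String dagId, String recoveryFile) {
    // Always logged; needs no filesystem call.
    LOG.log(Level.INFO, "Trying to recover dag from recovery file, dagId={0}, dagRecoveryFile={1}",
        new Object[] {dagId, recoveryFile});
    // The lookup only runs when the finer level is enabled.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.log(Level.FINE, "Recovery file details: {0}", fetchFileStatus(recoveryFile));
    }
  }

  public static void main(String[] args) {
    LOG.setLevel(Level.INFO);
    logRecovery("dag_1", "/tmp/recovery/dag_1.recovery");
    int callsAtInfo = remoteCalls.get();

    LOG.setLevel(Level.FINE);
    logRecovery("dag_1", "/tmp/recovery/dag_1.recovery");
    System.out.println("calls at INFO: " + callsAtInfo
        + ", calls after enabling FINE: " + remoteCalls.get());
  }
}
```

At the default level the simulated lookup is never invoked, and it is invoked once per log statement after the finer level is enabled.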

+ ", path=" + summaryFile.toString()
+ ", len=" + summaryFileStatus.getLen()
+ ", lastModTime=" + summaryFileStatus.getModificationTime());
if (LOG.isDebugEnabled()) {
Contributor

@abstractdog abstractdog Oct 2, 2023

I believe we might want to keep this at INFO level.
Recovery is not a heavily used code path under normal circumstances, so one extra filesystem call from getFileStatus is fine. It matters most when we're deep in debugging a non-reproducible recovery issue, where we usually want to see the summary file info every time. At DEBUG level we would lose it, since we run at INFO level by default.

Member Author

reverted

@abstractdog
Contributor

Thanks for the patch, @ayushtkn. I left some comments.

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 14m 53s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 4m 56s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 5m 21s | master passed |
| +1 💚 | compile | 1m 1s | master passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | compile | 0m 57s | master passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| +1 💚 | checkstyle | 1m 14s | master passed |
| +1 💚 | javadoc | 1m 1s | master passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 48s | master passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| +0 🆗 | spotbugs | 0m 53s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 2m 4s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 8s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 0m 35s | the patch passed |
| +1 💚 | compile | 0m 38s | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javac | 0m 38s | the patch passed |
| +1 💚 | compile | 0m 33s | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| +1 💚 | javac | 0m 33s | the patch passed |
| +1 💚 | checkstyle | 0m 9s | tez-api: The patch generated 0 new + 21 unchanged - 1 fixed = 21 total (was 22) |
| +1 💚 | checkstyle | 0m 18s | The patch passed checkstyle in tez-dag |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 26s | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 25s | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| +1 💚 | findbugs | 1m 26s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 1m 58s | tez-api in the patch passed. |
| +1 💚 | unit | 4m 17s | tez-dag in the patch passed. |
| +1 💚 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 44m 28s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-309/5/artifact/out/Dockerfile |
| GITHUB PR | #309 |
| JIRA Issue | TEZ-4514 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 1b3b600ca92b 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 7855c1f |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-309/5/testReport/ |
| Max. process+thread count | 572 (vs. ulimit of 5500) |
| modules | C: tez-api tez-dag U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-309/5/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@abstractdog abstractdog self-requested a review October 3, 2023 15:36
Contributor

@abstractdog abstractdog left a comment

+1

@abstractdog abstractdog merged commit 2ad10b6 into apache:master Oct 3, 2023
4 checks passed