-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEZ-4514: Reduce Some FileSystem Calls. #309
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
FileSystem fs = p.getFileSystem(conf); | ||
p = fs.resolvePath(p.makeQualified(fs.getUri(), fs.getWorkingDirectory())); | ||
FileSystem targetFS = p.getFileSystem(conf); | ||
return targetFS.listFiles(p, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as far as I can understand, this single listFiles call can be used instead of a "directory or file" check, making this method simpler, looks good
@@ -233,15 +230,11 @@ private static boolean addLocalResources(Configuration conf, | |||
} else { | |||
type = LocalResourceType.FILE; | |||
} | |||
RemoteIterator<LocatedFileStatus> fileStatuses = getListFilesFileStatus(configUri, conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getListFilesFileStatus receives a "String fileName" param, and here we pass a "configUri", can you unify and use whatever is closer to the truth? also I can see that getListFilesFileStatus creates an URI eventually, we can pass it here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done thing, the name is URI, but it isn't a URI object but string, it is extracted from a conf which has name URI, so kept the name old as configURI
try { | ||
fsStatus = fs.getFileStatus(stagingArea); | ||
} catch (FileNotFoundException fnf) { | ||
// Ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about returning if
if (fsStatus == null) {
return fs;
}
and having the rest of the method unindented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't return, there is an else block below if fsStatus is null
else {
TezCommonUtils.mkDirForAM(fs, stagingArea);
}
+ ", dagId=" + lastInProgressDAG.toString() | ||
+ ", dagRecoveryFile=" + dagRecoveryFile | ||
+ ", len=" + fileStatus.getLen()); | ||
LOG.info("Trying to recover dag from recovery file, dagId={}, dagRecoveryFile={}", lastInProgressDAG, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed fileStatus.getLen() from the log message, is it intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it was shooting an RPC just for log and file length, so removed it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we about leaving the useful info on DEBUG level, but in that case, we can log the full FileStatus, like
LOG.info("Trying to recover dag from recovery file, dagId={}, dagRecoveryFile={}", lastInProgressDAG,
dagRecoveryFile);
if (LOG.isDebugEnabled()) {
// extra RPC call
FileStatus fileStatus = recoveryFS.getFileStatus(dagRecoveryFile);
LOG.debug("Recovery file details: {}", fileStatus);
}
+ ", path=" + summaryFile.toString() | ||
+ ", len=" + summaryFileStatus.getLen() | ||
+ ", lastModTime=" + summaryFileStatus.getModificationTime()); | ||
if (LOG.isDebugEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we might want to keep this on INFO level
recovery is not a heavily used code path under normal circumstances, and AN extra filesystem call due to getFileStatus is fine, especially if we're deep inside in debugging a non-reproducible recovery issue, where we usually want to see summary file info every time: which we would lose otherwise on DEBUG level, as by default we're on INFO level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted
thanks for the patch @ayushtkn , left some comments |
This comment was marked as outdated.
This comment was marked as outdated.
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
No description provided.