[SPARK-42379][SS] Use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists #39936
Conversation
cc. @cloud-fan @viirya
Although it's a small change, could you file a JIRA for traceability, please, @HeartSaVioR?
Done.
dongjoon-hyun
left a comment
I'd argue that the test case itself is incorrect. It is tightly bound to the internal implementation: it relies on a custom filesystem instance that returns true for all exists() calls, but the same does not hold for getFileStatus(), so the check ends up with "file does not exist". In other words, the test only passes if the filesystem behaves incorrectly. If we change the custom filesystem instance to also provide a dummy status for all getFileStatus() calls, the test case will fail.

I don't know how we can test the exception in an E2E query. As the exception class denotes, it mainly occurs from concurrent writes to the same offset/commit log file. The offset log file would have to be created concurrently, between "determining the start microbatch" and "writing the offset log after planning", which is super hard to do from outside of the query. I'd suggest just removing the test. WDYT?
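The race described in the comment above can be approximated on a local filesystem. The following is only a rough sketch under stated assumptions: `RenameRaceDemo`, `commitByRename`, and `simulateTwoWriters` are hypothetical names, and `java.nio.file.Files.move` stands in for whatever rename the checkpoint file manager performs. It is not Spark's code and does not raise Spark's FAILED_RENAME_PATH error.

```scala
import java.nio.file.{FileAlreadyExistsException, Files, Path}

object RenameRaceDemo {
  // Hypothetical helper: publish a temp file under its final name, refusing to overwrite.
  def commitByRename(tmp: Path, dst: Path): Either[String, Path] =
    try Right(Files.move(tmp, dst)) // no REPLACE_EXISTING option, so an existing dst is an error
    catch {
      case _: FileAlreadyExistsException => Left(s"destination already exists: $dst")
    }

  // Simulates two writers that both decided to write the log file for batch 0.
  def simulateTwoWriters(): (Either[String, Path], Either[String, Path]) = {
    val dir = Files.createTempDirectory("offsets")
    val dst = dir.resolve("0")
    val tmpA = Files.write(dir.resolve(".0.a.tmp"), "writer A".getBytes("UTF-8"))
    val tmpB = Files.write(dir.resolve(".0.b.tmp"), "writer B".getBytes("UTF-8"))
    // Writer A renames first and wins; writer B then finds the destination taken.
    (commitByRename(tmpA, dst), commitByRename(tmpB, dst))
  }
}
```

In this sketch the second rename fails only because the first writer got in between the decision to write and the rename itself, which is the window that is hard to hit from outside a running query.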
I am fine w/ removing it.
+1 to remove it.
Test removed. Will merge once the build passes.
LuciferYang
left a comment
+1, LGTM
Sounds reasonable to remove it.
dongjoon-hyun
left a comment
+1 with the final commit.
Thanks, merging to master.

What changes were proposed in this pull request?
This PR proposes to use FileSystem.exists in FileSystemBasedCheckpointFileManager.exists, which is consistent with the other methods in FileSystemBasedCheckpointFileManager.
This PR also removes the test case QueryExecutionErrorsSuite.FAILED_RENAME_PATH: rename when destination path already exists, because the test relies on an incorrect custom file system instance whose implementation is not symmetric between FileSystemBasedCheckpointFileManager.exists and FileSystem.exists. (See the detailed explanation in #39936 (comment).)
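To make the asymmetry concrete, here is a minimal, self-contained sketch. Everything in it is an assumption for illustration: `ToyFileSystem`, `ConsistentFs`, and `AlwaysExistsFs` are hypothetical stand-ins rather than Hadoop classes, and `existsViaStatus` is only a guess at the shape of an existence check built on getFileStatus(), not the actual previous implementation.

```scala
import java.io.FileNotFoundException

object ExistsAsymmetryDemo {
  // Hypothetical, minimal stand-in for the two filesystem calls discussed in this PR.
  trait ToyFileSystem {
    // Throws FileNotFoundException when the path is absent.
    def getFileStatus(path: String): String

    // Default exists() is derived from getFileStatus(), so the two calls agree.
    def exists(path: String): Boolean =
      try { getFileStatus(path) != null }
      catch { case _: FileNotFoundException => false }
  }

  // A consistent filesystem backed by a fixed set of known paths.
  class ConsistentFs(paths: Set[String]) extends ToyFileSystem {
    def getFileStatus(path: String): String =
      if (paths.contains(path)) s"status($path)"
      else throw new FileNotFoundException(path)
  }

  // The kind of instance the removed test relied on: exists() always says yes,
  // while getFileStatus() still reports missing paths.
  class AlwaysExistsFs(paths: Set[String]) extends ConsistentFs(paths) {
    override def exists(path: String): Boolean = true
  }

  // Existence check written against getFileStatus() (assumed shape, for illustration).
  def existsViaStatus(fs: ToyFileSystem, path: String): Boolean =
    try { fs.getFileStatus(path) != null }
    catch { case _: FileNotFoundException => false }

  // Existence check that simply delegates to exists(), as this PR proposes.
  def existsViaExists(fs: ToyFileSystem, path: String): Boolean = fs.exists(path)
}
```

On `ConsistentFs` the two checks always agree. On `AlwaysExistsFs` they disagree for any missing path, so a test built on such an instance depends on which of the two calls the code under test happens to use.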
Why are the changes needed?
Other methods in FileSystemBasedCheckpointFileManager already use FileSystem.exists whenever they check the existence of a path.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.