Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix occasional flakiness in preview action cancellation test #1070

Merged

Conversation

d-ronnqvist
Copy link
Contributor

@d-ronnqvist d-ronnqvist commented Oct 23, 2024

Bug/issue #, if applicable: rdar://137751315

Summary

This fixes an occasional flakiness issue in PreviewActionIntegrationTests/testCancelsConversion.

AFAICT there were a few different issues that this test could encounter:

  • Sometimes a date race in LogHandle/LogStorage caused the test to trap instead of accumulating log output
  • Sometimes a data race in starting or stopping the DirectoryMonitor caused the test to trap.
  • Sometimes a data race in updating the preview actions's logHandle variable caused the test to trap. This was fixed by not synchronizing access within the preview action and passing separate copies to the PreviewServer and ConvertAction.
  • The tread sanitizer reported an additional issue with the local loadError variable in SymbolGraphLoader
  • Sometimes the preview server failed to start with a NIOCore/SocketAddressError/unixDomainSocketPathTooLong error. This was fixed by creating a shorter temporary unix domain socket path for these tests and opting to skip the test if it can't create a short enough path.

After all those changes I revisited the removed testWatchRecoversAfterConversionErrors test and reimplemented it in the style of testCancelsConversion. AFACIT the in this PR addresses the flakiness in both tests.

Dependencies

None

Testing

  • Run these tests repeatedly using for i in {1..100}; do swift test --parallel --filter SwiftDocCUtilitiesTests.PreviewActionIntegrationTests; done
    • The tests should pass each time

I let this run over 5900 times (over night) without encountering any failures so I'm reasonably sure that most of the flakiness is gone 🤞

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added Updated tests
  • Ran the ./bin/test script and it succeeded
  • [ ] Updated documentation if necessary Not applicable

@d-ronnqvist d-ronnqvist force-pushed the fix-flaky-preview-cancellation-test branch from 13359b4 to cc7ad01 Compare October 24, 2024 11:58
@d-ronnqvist d-ronnqvist force-pushed the fix-flaky-preview-cancellation-test branch from cc7ad01 to 756cb7e Compare October 25, 2024 08:43
Copy link
Contributor

@patshaughnessy patshaughnessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! ...just a few questions.

@@ -51,7 +54,7 @@ public enum LogHandle: TextOutputStream {
case .file(let fileHandle):
fileHandle.write(Data(string.utf8))
case .memory(let storage):
storage.text.append(string)
storage._text.sync { $0.append(string) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to synchronize fileHandle in the same way? Or does macOS/Linux guarantee that writing to a file is thread safe - i.e. synchronized at a lower level?

Same question for stdout and stderr...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't ever encountered any issues with writing to a file and I've never seen a pattern of synchronizing around fileHandle.write or print but I can't say for sure.


public extension Lock {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to mark this extension as non-public? Could some client package be using it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's safe because it's extending an internal type so its API was never actually available to the public.

Because Lock was defined as typealias Lock = Synchronized<Void> (no access modifier) it was never publicly available so a public extension to Lock could never exceed the Lock types internal access. I feel that there should have been a warning about this but there wasn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only found this because I tried to use this API from SwiftDocCUtilities and got a build error that it could't be accessed because it was internal.

self.monitoredConvertTask?.cancel()
self.monitoredConvertTask = Task {
defer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we need this defer block any more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell it wasn't ever needed. The DirectoryMonitor already does a smarter automatic reloading when files are added to or removed from a monitored directory.

Removing the restart here means that we don't:

  • Restart (stop monitoring all files and directories and start monitoring them again) twice when a file is added removed (once here and once in the directory monitor itself).
  • Restart (stop monitoring all files and directories and start monitoring them again) when an already monitored file's content changed but no files were added or removed.

// If we couldn't create short enough socket path, skip the tests rather than fail them to avoid flakiness in the CI.
// The current implementation _should_ result in a path that's around 92 characters long, so the 11 character headroom
// should cover some amount of hypothetical changes to the `NSTemporaryDirectory` length and the `UUID().uuidString` length.
try XCTSkipIf(!isShortEnoughUnixDomainSocket(socketURL),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting XCTest feature: Skipping a test after it has started!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's typically done as the first step of a test but here I'm using it to stop before a detectable "failure" that's more caused by the environment in which the tests are executed than the test itself.

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist d-ronnqvist merged commit 90d64a8 into swiftlang:main Nov 8, 2024
2 checks passed
@d-ronnqvist d-ronnqvist deleted the fix-flaky-preview-cancellation-test branch November 8, 2024 10:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants