Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure no active threads in any threadpool for tests in the integrationTest package #4943

Merged
merged 16 commits into from
Dec 6, 2024

Conversation

cwperks
Copy link
Member

@cwperks cwperks commented Dec 3, 2024

Description

This PR adds an additional assertion on cluster stop to ensure there are no active threads in any threadpool except the management threadpool. Borrowing this idea from ISM which performs this check on test cleanup: https://github.com/opensearch-project/index-management/blob/main/src/test/kotlin/org/opensearch/indexmanagement/IndexManagementRestTestCase.kt#L358-L378

  • Category (Enhancement, New feature, Bug fix, Test fix, Refactoring, Maintenance, Documentation)

Test fix

Check List

  • New functionality includes testing
  • New functionality has been documented
  • New Roles/Permissions have a corresponding security dashboards plugin PR
  • API changes companion pull request created
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
@cwperks
Copy link
Member Author

cwperks commented Dec 3, 2024

Added a test that reproduces the issue

Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
@cwperks cwperks marked this pull request as ready for review December 3, 2024 22:14
@cwperks
Copy link
Member Author

cwperks commented Dec 3, 2024

@nibix This PR creates an integration test that simulates the issue found here: #4937 (comment)

It occurs on a document update request: https://opensearch.org/docs/latest/api-reference/document-apis/update-document/

Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
@cwperks
Copy link
Member Author

cwperks commented Dec 4, 2024

I'm seeing this error on the Windows runner:

2024-12-04T16:46:57.3431879Z 2024-12-04 16:46:56 opensearch[cluster_manager_2][generic][T#8] ERROR FsHealthService:214 - health check failed
2024-12-04T16:46:57.3432971Z java.lang.IllegalStateException: environment is not locked
2024-12-04T16:46:57.3434211Z 	at org.opensearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:1153) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
2024-12-04T16:46:57.3435893Z 	at org.opensearch.env.NodeEnvironment.nodeDataPaths(NodeEnvironment.java:901) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
2024-12-04T16:46:57.3437650Z 	at org.opensearch.monitor.fs.FsHealthService$FsHealthMonitor.monitorFSHealth(FsHealthService.java:212) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
2024-12-04T16:46:57.3439482Z 	at org.opensearch.monitor.fs.FsHealthService$FsHealthMonitor.run(FsHealthService.java:195) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
2024-12-04T16:46:57.3441494Z 	at org.opensearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:246) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
2024-12-04T16:46:57.3443538Z 	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:991) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
2024-12-04T16:46:57.3445678Z 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
2024-12-04T16:46:57.3447215Z 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
2024-12-04T16:46:57.3448470Z 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
2024-12-04T16:46:57.3449458Z 	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2024-12-04T16:46:57.3451118Z Caused by: java.nio.file.NoSuchFileException: D:\a\security\security\build\testrun\integrationTest\temp\local_cluster_local_cluster_318647815281639249569\cluster_manager_2\data\nodes\0\node.lock
2024-12-04T16:46:57.3452959Z 	at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:85) ~[?:?]
2024-12-04T16:46:57.3454222Z 	at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
2024-12-04T16:46:57.3455548Z 	at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108) ~[?:?]
2024-12-04T16:46:57.3467150Z 	at java.base/sun.nio.fs.WindowsFileAttributeViews$Basic.readAttributes(WindowsFileAttributeViews.java:53) ~[?:?]
2024-12-04T16:46:58.2448848Z 	at java.base/sun.nio.fs.WindowsFileAttributeViews$Basic.readAttributes(WindowsFileAttributeViews.java:38) ~[?:?]
2024-12-04T16:46:58.6622882Z 	at java.base/sun.nio.fs.WindowsFileSystemProvider.readAttributes(WindowsFileSystemProvider.java:197) ~[?:?]
2024-12-04T16:46:58.9648564Z 	at java.base/java.nio.file.Files.readAttributes(Files.java:1853) ~[?:?]
2024-12-04T16:46:59.0996413Z 	at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:177) ~[lucene-core-9.12.0.jar:9.12.0 e913796758de3d9b9440669384b29bec07e6a5cd - 2024-09-25 16:37:02]
2024-12-04T16:46:59.5786522Z 	at org.opensearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:1150) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
2024-12-04T16:46:59.7052891Z 	... 9 more

willyborankin
willyborankin previously approved these changes Dec 5, 2024
@cwperks
Copy link
Member Author

cwperks commented Dec 5, 2024

@willyborankin The windows issues are consistently reproducible on the windows Github runners. I cannot figure out how to resolve the issue.

Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
This reverts commit b30b733.

Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Copy link

codecov bot commented Dec 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.43%. Comparing base (64f4d5b) to head (28a93af).
Report is 3 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4943      +/-   ##
==========================================
- Coverage   71.43%   71.43%   -0.01%     
==========================================
  Files         334      334              
  Lines       22551    22552       +1     
  Branches     3590     3590              
==========================================
  Hits        16109    16109              
- Misses       4647     4649       +2     
+ Partials     1795     1794       -1     
Files with missing lines Coverage Δ
...urity/privileges/dlsfls/DlsFlsProcessedConfig.java 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

Copy link
Collaborator

@nibix nibix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! :)

@cwperks cwperks merged commit 7ae045c into opensearch-project:main Dec 6, 2024
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants