-
-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-39150] expose diagnostics across all the channels #122
Conversation
To be used by support-core, we need to be able to enumerate all active channels. We do this via WeakHashMap so that references get automatically garbage collected. Unclosed channel will remain in memory forever, which also helps us find those leaks.
... and see how they can be interpreted to aid diagnostics
JLS (http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.7) states that a write to non-volatile long variable can go in 32bit batch, so without it, read could retrieve a completely bogus value. There's no risk of writer contention here because that is serialized by the context in which it gets invoked.
Merging based on 🐝 given in #120 |
* that the other side has made to us but not yet returned yet. | ||
* | ||
* @since 2.26.3 | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the Javadocs @kohsuke 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not work in such way
Well, the Javadoc for the stable branch is not ideal. We are not supposed to provide new APIs in the stabilization branch, because it may cause binary incompatibilities for people upgrading from Jenkins LTS to later weekly releases. I'll fix it later before the release |
@@ -494,7 +515,8 @@ protected Channel(ChannelBuilder settings, CommandTransport transport) throws IO | |||
|
|||
transport.setup(this, new CommandReceiver() { | |||
public void handle(Command cmd) { | |||
updateLastHeard(); | |||
commandsReceived++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Risk of overflow?
Would it be worth adding JMX beans for this? |
#120 (comment) in the original PR has been also addressed improperly. Needs fix |
… not required after the merge of jenkinsci#122
The change introduces one serious bugfix (JENKINS-39596) and a bunch of various diagnostics improvements. Bugfixes: * [JENKINS-39596](https://issues.jenkins-ci.org/browse/JENKINS-39596) - Jenkins URL in `hudson.remoting.Engine` was always `null` since `3.0`. It was causing connection failures of Jenkins JNLP agents when using Java Web Start. ([PR jenkinsci#131](jenkinsci/remoting#131)) * [JENKINS-39617](https://issues.jenkins-ci.org/browse/JENKINS-39617) - `hudson.remoting.Engine` was failing to establish connection if one of the URLs parameter in parameters was malformed. ([PR jenkinsci#131](jenkinsci/remoting#131)) Improvements: * [JENKINS-39150](https://issues.jenkins-ci.org/browse/JENKINS-39150) - Add logic for dumping diagnostics across all the channels. ([PR jenkinsci#122](jenkinsci/remoting#122), [PR jenkinsci#125](jenkinsci/remoting#125)) * [JENKINS-39543](https://issues.jenkins-ci.org/browse/JENKINS-39543) - Improve the caller/callee correlation diagnostics in thread dumps. ([PR jenkinsci#119](jenkinsci/remoting#119)) * [JENKINS-39290](https://issues.jenkins-ci.org/browse/JENKINS-39290) - Add the `org.jenkinsci.remoting.nio.NioChannelHub.disabled` flag for disabling NIO (mostly for debugging purposes). ([PR jenkinsci#123](jenkinsci/remoting#123)) * [JENKINS-38692](https://issues.jenkins-ci.org/browse/JENKINS-38692) - Add extra logging to help diagnosing `IOHub` Thread spikes. ([PR #116](jenkinsci/remoting#116)) * [JENKINS-39289](https://issues.jenkins-ci.org/browse/JENKINS-39289) - When a proxy fails, report what caused the channel to go down. ([PR jenkinsci#128](jenkinsci/remoting#128))
The change introduces one serious bugfix (JENKINS-39596) and a bunch of various diagnostics improvements. Bugfixes: * [JENKINS-39596](https://issues.jenkins-ci.org/browse/JENKINS-39596) - Jenkins URL in `hudson.remoting.Engine` was always `null` since `3.0`. It was causing connection failures of Jenkins JNLP agents when using Java Web Start. ([PR #131](jenkinsci/remoting#131)) * [JENKINS-39617](https://issues.jenkins-ci.org/browse/JENKINS-39617) - `hudson.remoting.Engine` was failing to establish connection if one of the URLs parameter in parameters was malformed. ([PR #131](jenkinsci/remoting#131)) Improvements: * [JENKINS-39150](https://issues.jenkins-ci.org/browse/JENKINS-39150) - Add logic for dumping diagnostics across all the channels. ([PR #122](jenkinsci/remoting#122), [PR #125](jenkinsci/remoting#125)) * [JENKINS-39543](https://issues.jenkins-ci.org/browse/JENKINS-39543) - Improve the caller/callee correlation diagnostics in thread dumps. ([PR #119](jenkinsci/remoting#119)) * [JENKINS-39290](https://issues.jenkins-ci.org/browse/JENKINS-39290) - Add the `org.jenkinsci.remoting.nio.NioChannelHub.disabled` flag for disabling NIO (mostly for debugging purposes). ([PR #123](jenkinsci/remoting#123)) * [JENKINS-38692](https://issues.jenkins-ci.org/browse/JENKINS-38692) - Add extra logging to help diagnosing `IOHub` Thread spikes. ([PR #116](jenkinsci/remoting#116)) * [JENKINS-39289](https://issues.jenkins-ci.org/browse/JENKINS-39289) - When a proxy fails, report what caused the channel to go down. ([PR #128](jenkinsci/remoting#128))
* [JENKINS-37566] - FindBugs: Unclosed stream in hudson.remoting.Capability * [JENKINS-37566] - FindBugs: Stream not closed on Exception path in InitializeJarCacheMain#copyFile * [JENKINS-37566] - Fix Util#makeResource and remove obsolete hack * Extra Util#makeResource() polishing * Deprecate obsolete Util#mkdirs() * Exceptional case in setLastModifiedTime * Handle exception case during temp file deletion in FileSystemJarCache * Synchronize ProxyWriter#closed in write() * Synchronization of ProxyWriter#channel * Get rid of the obsolete collection in Channel#properties, fix synchronization * Checksum#calculateFor() - no need to put if absent since there is a check above * UWF_UNWRITTEN_FIELD in ExportTable$Entry * NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE in JnlpAgentEndpointResolver. Now we use a more agressive check * UWF_FIELD_NOT_INITIALIZED_IN_CONSTRUCTOR in org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort * UPM_UNCALLED_PRIVATE_METHOD - hudson.remoting.ChunkedOutputStream.frameSize() * FindBugs: Revert the stream closure at emoting Capability, just a design to be reworked * Unrealistic NP_NULL_ON_SOME_PATH in NioChannelHub * EI_EXPOSE_REP in DiagnosedStreamCorruptionException (diagnostics code) * SE_TRANSIENT_FIELD_NOT_RESTORED in PreloadJarTask, RemoteClassLoader, RemoteInvocationHandler and UserRequest * Leftover false-positive UG_SYNC_SET_UNSYNC_GET in Channel * EI_EXPOSE_REP in ChannelBuilder#properties * Suppress DMI_NONSERIALIZABLE_OBJECT_WRITTEN in ClassLoaderHolder#writeObject() * Better handling of Streams in hudson.remoting.Capability (still weird) * Fix the Util#makeResource() behavior for nested folders * FileSystemJarCache: tmp file may be renamed * JENKINS-37566 - Modification of Channel#dumpDiagnostics() actually is not required after the merge of #122 * Disable DP_DO_INSIDE_DO_PRIVILEGED as per feeback from @kohsue in #118 * Fix the FindBugs filter * [NP_NULL_ON_SOME_PATH] - Prevent NPE in ExportTable#diagnoseInvalidObjectId() * [RV_RETURN_VALUE_IGNORED_BAD_PRACTICE] - Propagate exceptions when FileSystemJarCache#retrieve() fails to retrieve the target * [DP_CREATE_CLASSLOADER_INSIDE_DO_PRIVILEGED] - Ignore the warning as per discussion in #118 * [VO_VOLATILE_INCREMENT] - Suppress warning in Channel#send() * FindBugs - Which#jarFile() should not suppress all exceptions * FindBugs - Suppress URF_UNREAD_FIELD in Pipewindow#Real * [JENKINS-37566] - Channel#properties should be a ConcurrentHashMap Reason - prevent possible delays and deadlocks in the getter method. Addresses the feedback from @stephenc * [JENKINS-37566] - DiagnosedStreamCorruptionException readAhead/readBack handlers should use a more efficient clone() command * [JENKINS-37566] - Suppress UG_SYNC_SET_UNSYNC_GET after switching to the ConcurrentHashMap * [JENKINS-37566] - hudson.remoting.Util should be tolerant against InvalidPathException * [JENKINS-37566] - make ProxyWriter more asynchronous (follow-up to @stephenc’s review) * [JENKINS-37566] - Add a comment about race condition as suggested by @stephenc
Continuing from #120
No code change since then, just re-targeting stable-2.x