Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

java.nio.channels.CancelledKeyException #3866

Closed
jjarokergc opened this issue Aug 25, 2023 · 4 comments
Closed

java.nio.channels.CancelledKeyException #3866

jjarokergc opened this issue Aug 25, 2023 · 4 comments

Comments

@jjarokergc
Copy link

jjarokergc commented Aug 25, 2023

I am attempting to troubleshoot the following problem and was hoping someone could point me in the right direction. Is this a problem in puppet dashboard or puppetdb?

Describe the Bug

Puppetdb is failing with the error below in a fresh installation of a 3-node setup of puppet master, db and db01 in a staging environment:

  • puppetdb version: 8.1.0
  • psql (PostgreSQL) 14.9 (Ubuntu 14.9-1.pgdg22.04+1)
  • puppetserver version: 8.2.0
  • Ubuntu 22

The same configuration is operating normally in my production environment:

  • puppetdb version: 7.13.1
  • psql (PostgreSQL) 11.21 (Ubuntu 11.21-1.pgdg20.04+1)
  • puppetserver version: 7.12.0
  • Ubuntu 20

This sometimes appears on puppet dashboard the following error. Other times, the 3 nodes are shown in "unreported" state (see below).

Internal Server Error
This error usually occurs because:
    We were unable to reach PuppetDB;
    The query to be executed was malformed resulting in an incorrectly encoded request.

Puppet server appears to work properly with no errors, so I am not sure if this defect is in puppet dashboard or puppetdb.

Steps to Reproduce

  1. New 3-node installation of server, db01, puppetdb, puppetdashboard (versions above)
  2. Accessing puppetdb web page fails

Additional Context

Unreported states
image

Stack Trace

2023-08-25T13:12:47.626-04:00 ERROR [p.p.middleware] #error {
 :cause nil
 :via
 [{:type java.nio.channels.CancelledKeyException
   :message nil
   :at [sun.nio.ch.SelectionKeyImpl ensureValid SelectionKeyImpl.java 75]}]
 :trace
 [[sun.nio.ch.SelectionKeyImpl ensureValid SelectionKeyImpl.java 75]
  [sun.nio.ch.SelectionKeyImpl interestOps SelectionKeyImpl.java 104]
  [java.nio.channels.spi.AbstractSelectableChannel register AbstractSelectableChannel.java 233]
  [java.nio.channels.SelectableChannel register SelectableChannel.java 260]
  [jdk.internal.reflect.GeneratedMethodAccessor19 invoke nil -1]
  [jdk.internal.reflect.DelegatingMethodAccessorImpl invoke DelegatingMethodAccessorImpl.java 43]
  [java.lang.reflect.Method invoke Method.java 568]
  [clojure.lang.Reflector invokeMatchingMethod Reflector.java 167]
  [clojure.lang.Reflector invokeInstanceMethod Reflector.java 102]
  [puppetlabs.puppetdb.query.monitor$stop_query_at_deadline_or_disconnect invokeStatic monitor.clj 328]
  [puppetlabs.puppetdb.query.monitor$stop_query_at_deadline_or_disconnect invoke monitor.clj 324]
  [puppetlabs.puppetdb.http.query$wrap_typical_query$fn__31076$fn__31078 invoke query.clj 434]
  [puppetlabs.puppetdb.http.query$wrap_typical_query$fn__31076 invoke query.clj 404]
  [bidi.ring$fn__35985 invokeStatic ring.cljc 25]
  [bidi.ring$fn__35985 invoke ring.cljc 21]
  [bidi.ring$fn__35970$G__35965__35979 invoke ring.cljc 16]
  [puppetlabs.puppetdb.middleware$fn__37279$make_pdb_handler__37288$fn__37291$fn__37293 invoke middleware.clj 429]
  [puppetlabs.puppetdb.middleware$wrap_with_illegal_argument_catch$fn__37121 invoke middleware.clj 100]
  [puppetlabs.puppetdb.middleware$verify_accepts_content_type$fn__37133 invoke middleware.clj 148]
  [puppetlabs.puppetdb.middleware$verify_content_type$fn__37145 invoke middleware.clj 181]
  [puppetlabs.puppetdb.middleware$verify_sync_version$fn__37229 invoke middleware.clj 363]
  [puppetlabs.puppetdb.middleware$wrap_with_metrics$fn__37190$fn__37199 invoke middleware.clj 281]
  [puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26528$fn__26529$fn__26530 invoke metrics.clj 23]
  [puppetlabs.puppetdb.utils.metrics.proxy$java.lang.Object$Callable$7da976d4 call nil -1]
  [com.codahale.metrics.Timer time Timer.java 101]
  [puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26528$fn__26529 invoke metrics.clj 23]
  [puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26528$fn__26529$fn__26530 invoke metrics.clj 23]
  [puppetlabs.puppetdb.utils.metrics.proxy$java.lang.Object$Callable$7da976d4 call nil -1]
  [com.codahale.metrics.Timer time Timer.java 101]
  [puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26528$fn__26529 invoke metrics.clj 23]
  [puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26528$fn__26529$fn__26530 invoke metrics.clj 23]
  [puppetlabs.puppetdb.utils.metrics.proxy$java.lang.Object$Callable$7da976d4 call nil -1]
  [com.codahale.metrics.Timer time Timer.java 101]
  [puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26528$fn__26529 invoke metrics.clj 23]
  [puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26528$fn__26529$fn__26530 invoke metrics.clj 23]
  [puppetlabs.puppetdb.utils.metrics.proxy$java.lang.Object$Callable$7da976d4 call nil -1]
  [com.codahale.metrics.Timer time Timer.java 101]
  [puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26528$fn__26529 invoke metrics.clj 23]
  [puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_ invokeStatic metrics.clj 26]
  [puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_ invoke metrics.clj 15]
  [puppetlabs.puppetdb.middleware$wrap_with_metrics$fn__37190 invoke middleware.clj 280]
  [puppetlabs.puppetdb.middleware$wrap_with_globals$fn__37116 invoke middleware.clj 94]
  [puppetlabs.puppetdb.middleware$wrap_with_exception_handling$fn__37125 invoke middleware.clj 115]
  [puppetlabs.puppetdb.http.server$build_app$fn__42708 invoke server.clj 80]
  [compojure.core$routing$fn__36589 invoke core.clj 151]
  [clojure.core$some invokeStatic core.clj 2718]
  [clojure.core$some invoke core.clj 2709]
  [compojure.core$routing invokeStatic core.clj 151]
  [compojure.core$routing doInvoke core.clj 148]
  [clojure.lang.RestFn invoke RestFn.java 423]
  [puppetlabs.puppetdb.pdb_routing$wrap_with_context$fn__46557 invoke pdb_routing.clj 32]
  [compojure.core$if_context$fn__36613 invoke core.clj 218]
  [compojure.core$routing$fn__36589 invoke core.clj 151]
  [clojure.core$some invokeStatic core.clj 2718]
  [clojure.core$some invoke core.clj 2709]
  [compojure.core$routing invokeStatic core.clj 151]
  [compojure.core$routing doInvoke core.clj 148]
  [clojure.lang.RestFn applyTo RestFn.java 139]
  [clojure.core$apply invokeStatic core.clj 669]
  [clojure.core$apply invoke core.clj 662]
  [compojure.core$routes$fn__36593 invoke core.clj 156]
  [compojure.core$routing$fn__36589 invoke core.clj 151]
  [clojure.core$some invokeStatic core.clj 2718]
  [clojure.core$some invoke core.clj 2709]
  [compojure.core$routing invokeStatic core.clj 151]
  [compojure.core$routing doInvoke core.clj 148]
  [clojure.lang.RestFn invoke RestFn.java 460]
  [puppetlabs.puppetdb.pdb_routing$pdb_app$fn__46568 invoke pdb_routing.clj 60]
  [compojure.core$if_context$fn__36613 invoke core.clj 218]
  [ring.middleware.params$wrap_params$fn__37084 invoke params.clj 67]
  [puppetlabs.puppetdb.middleware$wrap_with_certificate_cn$fn__37106 invoke middleware.clj 76]
  [puppetlabs.puppetdb.middleware$wrap_with_default_body$fn__37111 invoke middleware.clj 83]
  [puppetlabs.puppetdb.middleware$wrap_with_debug_logging$fn__37089 invoke middleware.clj 40]
  [puppetlabs.i18n.core$locale_negotiator$fn__272 invoke core.clj 361]
  [puppetlabs.trapperkeeper.services.webserver.jetty9_core$ring_handler$fn__48220 invoke jetty9_core.clj 460]
  [puppetlabs.trapperkeeper.services.webserver.jetty9_core.proxy$org.eclipse.jetty.server.handler.AbstractHandler$ff19274a handle nil -1]
  [org.eclipse.jetty.server.handler.HandlerWrapper handle HandlerWrapper.java 127]
  [org.eclipse.jetty.server.handler.ScopedHandler nextHandle ScopedHandler.java 235]
  [org.eclipse.jetty.server.handler.ContextHandler doHandle ContextHandler.java 1440]
  [org.eclipse.jetty.server.handler.ScopedHandler nextScope ScopedHandler.java 190]
  [org.eclipse.jetty.server.handler.ContextHandler doScope ContextHandler.java 1355]
  [org.eclipse.jetty.server.handler.ScopedHandler handle ScopedHandler.java 141]
  [org.eclipse.jetty.server.handler.ContextHandlerCollection handle ContextHandlerCollection.java 234]
  [org.eclipse.jetty.server.handler.HandlerCollection handle HandlerCollection.java 146]
  [org.eclipse.jetty.server.handler.gzip.GzipHandler handle GzipHandler.java 772]
  [org.eclipse.jetty.server.handler.RequestLogHandler handle RequestLogHandler.java 54]
  [com.puppetlabs.trapperkeeper.services.webserver.jetty9.utils.MDCRequestLogHandler handle MDCRequestLogHandler.java 36]
  [org.eclipse.jetty.server.handler.StatisticsHandler handle StatisticsHandler.java 181]
  [org.eclipse.jetty.server.handler.HandlerWrapper handle HandlerWrapper.java 127]
  [org.eclipse.jetty.server.Server handle Server.java 516]
  [org.eclipse.jetty.server.HttpChannel lambda$handle$1 HttpChannel.java 487]
  [org.eclipse.jetty.server.HttpChannel dispatch HttpChannel.java 732]
  [org.eclipse.jetty.server.HttpChannel handle HttpChannel.java 479]
  [org.eclipse.jetty.server.HttpConnection onFillable HttpConnection.java 277]
  [org.eclipse.jetty.io.AbstractConnection$ReadCallback succeeded AbstractConnection.java 311]
  [org.eclipse.jetty.io.FillInterest fillable FillInterest.java 105]
  [org.eclipse.jetty.io.ChannelEndPoint$1 run ChannelEndPoint.java 104]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill runTask EatWhatYouKill.java 338]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill doProduce EatWhatYouKill.java 315]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill tryProduce EatWhatYouKill.java 173]
  [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill run EatWhatYouKill.java 131]
  [org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread run ReservedThreadExecutor.java 409]
  [org.eclipse.jetty.util.thread.QueuedThreadPool runJob QueuedThreadPool.java 883]
  [org.eclipse.jetty.util.thread.QueuedThreadPool$Runner run QueuedThreadPool.java 1034]
  [java.lang.Thread run Thread.java 833]]}
@jjarokergc
Copy link
Author

jjarokergc commented Aug 26, 2023

This is additional information to help troubleshoot the problem.

  • The 500-error only appears when accessing the 'Overview' tab on puppet board. The other tabs show normal log messages, but the 500 error is always present when accessing 'Overview'
  • The Overview page loads correctly about 60% of the time.
  • When the overview page fails, the following message appears:
Internal Server Error
This error usually occurs because:
    We were unable to reach PuppetDB;
    The query to be executed was malformed resulting in an incorrectly encoded request.
Please have a look at the log output for further information.
  • I am not able to trigger the Overview page failure by refreshing or clicking between tabs. The overview page error appears randomly

@austb
Copy link
Contributor

austb commented Aug 29, 2023

Thanks @jjarokergc, we are going to start looking into this issue. Another user in the community Slack reported hitting this issue with Puppetboard as well.

I have not had the chance to spin up Puppetboard yet, but the user in Slack reported that setting PDB_PROMPTLY_END_QUERIES=false in their systemd unit file made the issue go away. That variable completely disables the query timeout code we just added, so if you set it I would recommend removing it after we release a fix for the bug.

rbrw added a commit to rbrw/puppetdb that referenced this issue Sep 5, 2023
Previously, any query a client issued after the first on a given
connection might provoke a CancelledKeyException when it tried to
register the query/socket/channel with the monitor because the
selection key for a given channel for a given selector is unique, and
a cancelled key is only cleaned up during the next call to select for
the relevant selector.

Any attempt to register a channel with a selector between the
cancellation of the previous key for that channel and the clean up in
the next select will provoke the CancelledKeyException.

To address the issue, adjust the registration code to retry whenever
it encounters that exception, until it succeeds.  While it's retrying,
the monitor select loop, which will have been woken up by the previous
query's forget, will issue a new select call, removing the canceled
key.

Fixes: puppetlabs#3866
rbrw added a commit to rbrw/puppetdb that referenced this issue Sep 7, 2023
Previously, any query a client issued after the first on a given
connection might provoke a CancelledKeyException when it tried to
register the query/socket/channel with the monitor because the
selection key for a given channel for a given selector is unique, and
a cancelled key is only cleaned up during the next call to select for
the relevant selector.

Any attempt to register a channel with a selector between the
cancellation of the previous key for that channel and the clean up in
the next select will provoke the CancelledKeyException.

To address the issue, adjust the registration code to retry whenever
it encounters that exception, until it succeeds.  While it's retrying,
the monitor select loop, which will have been woken up by the previous
query's forget, will issue a new select call, removing the canceled
key.

Fixes: puppetlabs#3866
rbrw added a commit to rbrw/puppetdb that referenced this issue Sep 7, 2023
Previously, any query a client issued after the first on a given
connection might provoke a CancelledKeyException when it tried to
register the query/socket/channel with the monitor because the
selection key for a given channel for a given selector is unique, and
a cancelled key is only cleaned up during the next call to select for
the relevant selector.

Any attempt to register a channel with a selector between the
cancellation of the previous key for that channel and the clean up in
the next select will provoke the CancelledKeyException.

To address the issue, adjust the registration code to retry whenever
it encounters that exception, until it succeeds.  While it's retrying,
the monitor select loop, which will have been woken up by the previous
query's forget, will issue a new select call, removing the canceled
key.

Fixes: puppetlabs#3866
rbrw added a commit to rbrw/puppetdb that referenced this issue Sep 7, 2023
Previously, any query a client issued after the first on a given
connection might provoke a CancelledKeyException when it tried to
register the query/socket/channel with the monitor because the
selection key for a given channel for a given selector is unique, and
a cancelled key is only cleaned up during the next call to select for
the relevant selector.

Any attempt to register a channel with a selector between the
cancellation of the previous key for that channel and the clean up in
the next select will provoke the CancelledKeyException.

To address the issue, adjust the registration code to retry whenever
it encounters that exception, until it succeeds.  While it's retrying,
the monitor select loop, which will have been woken up by the previous
query's forget, will issue a new select call, removing the canceled
key.

Fixes: puppetlabs#3866
rbrw added a commit to rbrw/puppetdb that referenced this issue Sep 7, 2023
Previously, any query a client issued after the first on a given
connection might provoke a CancelledKeyException when it tried to
register the query/socket/channel with the monitor because the
selection key for a given channel for a given selector is unique, and
a cancelled key is only cleaned up during the next call to select for
the relevant selector.

Any attempt to register a channel with a selector between the
cancellation of the previous key for that channel and the clean up in
the next select will provoke the CancelledKeyException.

To address the issue, adjust the registration code to retry whenever
it encounters that exception until it succeeds.  While it's retrying,
the monitor select loop, which will have been woken up by the previous
query's forget, will issue a new select call, removing the canceled
key.

Fixes: puppetlabs#3866
@austb austb closed this as completed in 9b986d9 Sep 7, 2023
@austb
Copy link
Contributor

austb commented Sep 14, 2023

The fix for this bug has been released in PuppetDB 8.1.1

@jjarokergc
Copy link
Author

Hello @austb,

Thank you. I just installed version 8.1.1 and Puppet board works without error messages in the logs.

Jon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants