Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Task] CreateRecording form does not recognize when targets reappear #294

Closed
jan-law opened this issue Sep 23, 2021 · 17 comments · Fixed by #496
Closed

[Task] CreateRecording form does not recognize when targets reappear #294

jan-law opened this issue Sep 23, 2021 · 17 comments · Fixed by #496
Assignees
Labels
bug Something isn't working good first issue Good for newcomers needs-documentation

Comments

@jan-law
Copy link
Contributor

jan-law commented Sep 23, 2021

If a target becomes briefly unreachable while filling out
the “Create recording” form, clicking the "Create" button will trigger
an expected HTTP 500 error. When the target becomes available again,
the Create button sometimes stays greyed out and the template dropdown
remains empty. Clicking the “refresh targets” button doesn’t update
the template dropdown. If you exit and re-enter the Create Recording
form then you can successfully create a recording.

Steps to reproduce:

  1. Install cryostat operator in an Openshift cluster
  2. Install a sample app to monitor:
oc new-app --docker-image=quay.io/andrewazores/quarkus-test:0.0.2
oc patch svc/quarkus-test -p '{"spec":{"$setElementOrder/ports":[{"port":9096},{"port":9999}],"ports":[{"name":"jfr-jmx","port":9096}]}}'
  1. Select your target quarkus-test and click Create Recording
  2. While viewing the form, restart the pod:
oc scale --replicas=0 deployment/quarkus-test
oc scale --replicas=1 deployment/quarkus-test
  1. Notice the 'target appeared' notification was received. The templates dropdown should still be empty. Alternatively, if you filled out the entire Create Recording form before restarting the pod, the "Create" button might still be blue. Clicking on the blue button doesn't do anything.
@jan-law jan-law added the bug Something isn't working label Sep 23, 2021
@andrewazores andrewazores moved this to Pushed to Next Release in 2.1.0 Release May 13, 2022
@andrewazores andrewazores moved this to Todo in 2.2.0 Release May 13, 2022
@andrewazores andrewazores changed the title CreateRecording form does not recognize when targets reappear [Task] CreateRecording form does not recognize when targets reappear May 17, 2022
@maxcao13 maxcao13 self-assigned this Aug 2, 2022
@maxcao13
Copy link
Member

maxcao13 commented Aug 2, 2022

Hmm... I deployed cryostat operator with the latest images of [cryostat] and [cryostat-web] on Openshift through crc and followed the steps thereafter but I don't get the bug. Or at least, I get something different. When I restart the pod as I am on the CreateRecording tab, a "target appeared" notification never happens, however somehow cryostat is able to see the "new" target, see here

INFO: Creating connection for service:jmx:rmi:///jndi/rmi://10.217.0.122:9096/jmxrmi
Aug 02, 2022 10:05:08 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:38650): GET /api/v1/targets/service%3Ajmx%3Armi%3A%2F%2F%2Fjndi%2Frmi%3A%2F%2F10.217.0.122%3A9096%2Fjmxrmi/templates 200 75ms
Aug 02, 2022 10:05:08 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:38652): GET /api/v1/targets/service%3Ajmx%3Armi%3A%2F%2F%2Fjndi%2Frmi%3A%2F%2F10.217.0.122%3A9096%2Fjmxrmi/recordingOptions 200 95ms
Aug 02, 2022 10:05:14 PM io.cryostat.core.log.Logger info
INFO: Creating connection for service:jmx:rmi:///jndi/rmi://10.217.0.123:9096/jmxrmi
Aug 02, 2022 10:05:14 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.111:48230): GET /api/v1/targets/10.217.0.123:9096/events 200 640ms
Aug 02, 2022 10:05:15 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.111:48230): GET /api/v1/targets/10.217.0.123:9096/templates 200 63ms

where it just continues to GET from that target.
Similarly a "LOST" event never notifies either, but when I press the refresh targets button, it correctly doesn't show the target anymore.

@jan-law
Copy link
Contributor Author

jan-law commented Aug 3, 2022

The templates dropdown should still be empty.

It's been a while since I filed this issue. It looks like the template dropdown is always populated now.

When I restart the pod as I am on the CreateRecording tab, a "target appeared" notification never happens, however somehow cryostat is able to see the "new" target

If you fill out the Create Recording form until the 'Create' button is blue, then restart the pod, Cryostat will see the new target as you've noticed, but when you click the 'Create' button, I don't think there's an error message indicating that the target you're trying to start a recording on is unreachable.

@maxcao13
Copy link
Member

maxcao13 commented Aug 3, 2022

Never mind I get the FOUND notifications, but still no LOST ones (for the correct target that is being scaled).
image

But it seems if I press create, after about 5 - 10 seconds, I get a 500 NoRouteToHost error, which seems like it works albeit slow to respond.
image

Maybe it was fixed from some previous PR or I am missing some steps, either way, the LOST targets shouldn't be showing anymore in the TargetSelect, but still does. However, it does fix itself if I press refresh targets. I think I will look into that.

@jan-law
Copy link
Contributor Author

jan-law commented Aug 3, 2022

Maybe it was fixed from some previous PR or I am missing some steps, either way, the LOST targets shouldn't be showing anymore in the TargetSelect, but still does. However, it does fix itself if I press refresh targets. I think I will look into that.

Sounds good, thanks!

@maxcao13
Copy link
Member

maxcao13 commented Aug 3, 2022

I've been trying various things but it seems to me that the back-end cryostat k8Client never notices that a pod disappears, through the Action action == "DELETED" case in KubeApiPlatformClient.java. However, everytime we run these to restart the pod

oc scale --replicas=0 deployment/quarkus-test
oc scale --replicas=1 deployment/quarkus-test

or

oc delete pod # with restartPolicy=always

it falls into the MODIFIED case and hence the two LOST and FOUND notifications in the first screenshot popup and a new Pod instance of the same deployment with a new host address is discovered. However the target in the old Pod instance is never notified as "LOST". Should it?

@andrewazores
Copy link
Member

There definitely should be a LOST notification if a deployment or pod is deleted or scaled to 0 and the actual JVM container instance of the target goes away. If that's missing now then there's some new bug. The MODIFIED case that you've noticed is a known wonky workaround to the way that Kubernetes/OpenShift tells us about those resources being changed (see comment here), but in the past it worked properly for us.

I wonder if perhaps the LOST/FOUND WebSocket notifications are being sent in the wrong order and that causes the visual bug of the outdated target definition remaining on the frontend?

@andrewazores
Copy link
Member

I just did a simple make deploy_bundle with the Operator to deploy cryostat-operator-bundle:2.2.0-dev, then created a Cryostat CR and waited for everything to get deployed.

After doing that, make sample_app and make sample_app_quarkus do successfully deploy the sample applications in the cluster/namespace, and make undeploy_sample_app/make undeploy_sample_app_quarkus to remove them. I can see the applications coming and going with oc get all.

The operator is aware of the sample applications because by using oc logs -f to follow the logs of the Cryostat container I can see that some client (must be the operator) is making requests against the new targets when they come online. oc get flightrecorders also reflects that the operator sees the target applications and ex. oc get -o yaml flightrecorder quarkus-test-76bb776845-4276k shows event type data that the operator retrieved from Cryostat.

However - the Cryostat container logs do not show that it ever noticed the target applications appearing or disappearing - there are no logs of WebSocket messages being sent, and the web-client instance also does not show any target discovery notifications. The target selection dropdown does not dynamically update at all, but clicking the refresh icon to cause a re-query does work.

oc describe pod/cryostat-sample-58c54d8966-ln6l4:

Containers:
  cryostat-sample:
    Container ID:   cri-o://f510ce16cf5c70a8a2242299c61ed7c652cb7f0397acf60ce871acc3dc3f3d49
    Image:          quay.io/cryostat/cryostat:latest
    Image ID:       quay.io/cryostat/cryostat@sha256:01e0dc1c020318e2ef448036dac987cf90f5e231400b900c4760264e6e1cd013

That sha256 corresponds to the latest as of ~1 hour ago (https://quay.io/repository/cryostat/cryostat/manifest/sha256:01e0dc1c020318e2ef448036dac987cf90f5e231400b900c4760264e6e1cd013), which comes from this CI run: https://github.com/cryostatio/cryostat/runs/7673498075?check_suite_focus=true . That's simply building and pushing cryostatio/cryostat@9119967 .

So it does look like something in or around KubeApiPlatformClient is currently broken in the Cryostat backend as of latest main. The next step is to determine what change broke this, exactly, and whether that's because of an actual code change in Cryostat (that file hasn't changed much lately: https://github.com/cryostatio/cryostat/commits/main/src/main/java/io/cryostat/platform/internal/KubeApiPlatformClient.java) or perhaps because the code there has become outdated and doesn't work as expected anymore with a newer OpenShift version used by crc that I am running.

@andrewazores
Copy link
Member

andrewazores commented Aug 4, 2022

BUNDLE_IMG=quay.io/cryostat/cryostat-operator-bundle:2.1.2-dev make deploy_bundle looks the same for me.

@andrewazores
Copy link
Member

BUNDLE_IMG=quay.io/cryostat/cryostat-operator-bundle:2.0.0 make deploy_bundle works as expected.

@andrewazores
Copy link
Member

BUNDLE_IMG=quay.io/cryostat/cryostat-operator-bundle:2.1.0 make deploy_bundle displays the broken behaviour.

@andrewazores
Copy link
Member

I'm working on a git bisect on Cryostat itself with tag v2.0.0 as a good commit and v2.1.0 as a bad commit.

@andrewazores
Copy link
Member

13:39 < thvo> aazores ebaron I could see it works fine on cluster-bot but not crc of the same version (i.e. 4.10.18)
13:40 < aazores> nice find
13:40 < aazores> I think we still need to determine the root cause of the difference between those kinds of clusters, but at least it doesn't sound like it's a direct regression on our side
13:41 < aazores> this also means I can stop trying to do this git bisect :)
13:41 < aazores> well... maybe not
13:41 < aazores> I'll keep working on that just to see what did change in cryostat that makes 2.0.0 work and 2.1.0 not work on the same crc instance
13:42 < thvo> yeh sounds good we will see how it goes :D I will check out crc and cluster-bot
13:49 < ebaron> I'll try with crc as well
14:27 < aazores> okay, this git bisect is not turning out to be fruitful. I'm getting the broken behaviour always, even when rebuilding 2.0.0
14:33 < ebaron> aazores, it's working for me on crc as well
14:35 < ebaron> I'm thinking what might be happening is that downtime that occurs on single-node clusters like CRC when we modify the API server config to add ourselves for CORS
14:35 < aazores> (╯°□°)╯︵ ┻━┻
14:35 < aazores> yea, you think that downtime takes out the resource watcher?
14:35 < aazores> could very well be
14:36 < ebaron> it definitely would, but would be surprising that no errors would appear
14:37 < ebaron> what I did on CRC is wait for a while after running `make create_cryostat_cr`. the API server should go down and back up
14:37 < ebaron> a few minutes should be enough, you can tell using 'oc get all' or similar. this will fail if the API server is down
14:38 < aazores> I normally have a "watch -n 5 oc get all" running in a second terminal too
14:39 < aazores> I just spun up a clusterbot instance and deployed the 2.2.0-dev bundle and it's working as expect there, so yea... this definitely seems like it has to do with crc or our interaction with it, and not an outright bug in the discovery mechanism
14:40 < aazores> I think you're probably right that the kubernetes API outage causes the running Cryostat's endpoints watch to get stopped/killed, and it just never restarts doing that
14:40 < aazores> pretty sure it's just a websocket connection to the k8s API server so if that goes out the websocket closes and we stop getting notifications
14:40 < ebaron> aazores, it shouldn't matter if the web client is open during this time would it?
14:41 < aazores> I don't think so
14:42 < aazores> the backend would still be listening for changes from k8s whether or not a web-client is connected or has ever connected
14:42 < ebaron> it worked for me in CRC after the API server downtime
14:42 < aazores> and the backend logs would still show the target LOST/FOUND events we emit, even if there are no connected web-clients or other cryostat websocket clients
14:42 < ebaron> so if the API server dying kills the watcher then it must have reconnected on its own
14:43 < aazores> yea, I did get it to work in crc once with that 2.0.0 bundle, so I wonder if that was just a timing quirk for my run on that setup
14:43 < ebaron> 2.0.0 didn't have the OAuth integration so no downtime
14:43 < aazores> our backend implementation of that watcher has really not changed at all in a long time at this point
14:43 < aazores> ah right...
14:43 < aazores> yea, that'd do it
14:46 < aazores> annoyingly, the Watch object returned by the fabric8 k8s client is just a Closeable. nothing to check if it's still open/connected, and no documentation about what kind of exception might be thrown on unexpected server side disconnect
14:47 < aazores> admittedly I didn't look too hard and scroll back into the cryostat container logs when running this in crc and observing the failures, and I've already shut down my crc instance. thvo or ebaron are yours still up?
14:47 < aazores> I wonder if there is a "watch closed" exception in the logs that I just didn't see
14:47 < ebaron> mine is still up
14:47 < thvo> mine too
14:48 < aazores> could you try to reproduce the broken behaviour using 2.2.0-dev (so that the stacktrace matches latest main branch) and see if any exception does appear indicating that the watch got killed?

Current theory/explanation: it isn't a regression in Cryostat itself. What appears to be happening is that after the OAuth integration in Cryostat 2.1.0, deploying in CRC results in a temporary OpenShift/k8s API server outage since we modify the API server config to add ourselves for CORS. In a cluster-bot instance or other real OpenShift deployments there are multiple replicas of the API server and a rolling redeploy occurs with no overall downtime, but in CRC there is only one replica and so the config update results in an outage. During this outage the Cryostat container is already running and has a k8s Watch open with an underlying WebSocket to the API server. In CRC this gets interrupted, whereas in the cluster-bot instance it survives.

We're not yet sure why that Watch WebSocket sometimes manages to reconnect to the API server and allow Cryostat to continue receiving Endpoints resource updates, and sometimes not.

In Cryostat 2.0.0 or prior there was no OAuth integration, no API server CORS config, and therefore no API server update rollout/outage, so the Watch remains uninterrupted.

@maxcao13
Copy link
Member

maxcao13 commented Aug 4, 2022

So, this is really just a crc issue then since usually real OpenShift deployments have multiple replicas of the API server. If the theory is true, this particular bug shouldn't be that important right?

@andrewazores
Copy link
Member

The bugged behaviour where the target discovery LOST/FOUND notifications aren't coming through seems like it's just a consequence of that single API server replica in crc. But, it may still be a bug on our end that our resource watcher isn't reconnecting to the API server when it does come back online.

I'm not sure if this is the same cause of the issue you described where target FOUND notifications came through but LOST did not. That might be something separate since both of those originate from the same watcher, so it seems you should either be getting both kinds or neither.

It also sounds like there is still a frontend bug where the recording creation form lets you click the blue creation button even if the selected target has disappeared after you selected it.

@andrewazores
Copy link
Member

Thinking more about it now I'm not sure how in quicklab/cluster-bot the Watch WebSocket wouldn't be interrupted. After all, even if there are multiple replicas, it seems the long-lived WebSocket connection would get routed to just one of those replicas at a time through the load balancing service. Eventually that replica will get replaced during the rolling update, so the WebSocket connection would still be interrupted. However, if the watcher on the Cryostat side does try to reconnect here, then it should find another live API server replica through the load balancer and be able to immediately reconnect on its next attempt. In crc with its limited resources and single replica, the watcher reconnection would probably fail multiple times over a relatively extended period of time until the API server replica is available again. Digging through the fabric8 kubernetes-client sources I do see that the watcher should retry connections and should do so indefinitely, but it has an exponential backoff interval each time it fails, where the retry time doubles on each reconnection failure. So maybe even in crc the watcher will eventually come back online, it just might take some time to do so?

@andrewazores
Copy link
Member

Importing certificate /truststore/operator/..2022_08_04_20_03_47.2480200380/cryostat-sample-ca.crt ...
Certificate was added to keystore
[Storing /opt/cryostat.d/truststore.p12]
+ exec java -XX:+CrashOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=9091 -Dcom.sun.management.jmxremote.rmi.port=9091 -Djavax.net.ssl.trustStore=/opt/cryostat.d/truststore.p12 -Djavax.net.ssl.trustStorePassword=WcqleLqoX09tpD8GM87khrU7zeFJV79V -Dcom.sun.management.jmxremote.autodiscovery=false -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.password.file=/tmp/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/tmp/jmxremote.access -Dcom.sun.management.jmxremote.ssl.need.client.auth=true -Djavax.net.ssl.keyStore=/var/run/secrets/operator.cryostat.io/cryostat-sample-tls/keystore.p12 -Djavax.net.ssl.keyStorePassword=7BR2SZMYIOky4Mtgg8pI -Dcom.sun.management.jmxremote.ssl=true -Dcom.sun.management.jmxremote.registry.ssl=true -cp '/app/resources:/app/classes:/app/libs/cryostat-core-2.12.0.jar:/app/libs/common-7.1.1.jar:/app/libs/encoder-1.2.2.jar:/app/libs/flightrecorder-7.1.1.jar:/app/libs/flightrecorder.rules-7.1.1.jar:/app/libs/flightrecorder.rules.jdk-7.1.1.jar:/app/libs/nashorn-core-15.4.jar:/app/libs/asm-7.3.1.jar:/app/libs/asm-commons-7.3.1.jar:/app/libs/asm-analysis-7.3.1.jar:/app/libs/asm-tree-7.3.1.jar:/app/libs/asm-util-7.3.1.jar:/app/libs/openshift-client-5.4.1.jar:/app/libs/openshift-model-5.4.1.jar:/app/libs/kubernetes-model-common-5.4.1.jar:/app/libs/jackson-annotations-2.11.2.jar:/app/libs/openshift-model-operator-5.4.1.jar:/app/libs/openshift-model-operatorhub-5.4.1.jar:/app/libs/openshift-model-monitoring-5.4.1.jar:/app/libs/openshift-model-console-5.4.1.jar:/app/libs/kubernetes-client-5.4.1.jar:/app/libs/kubernetes-model-core-5.4.1.jar:/app/libs/kubernetes-model-rbac-5.4.1.jar:/app/libs/kubernetes-model-admissionregistration-5.4.1.jar:/app/libs/kubernetes-model-apps-5.4.1.jar:/app/libs/kubernetes-model-autoscaling-5.4.1.jar:/app/libs/kubernetes-model-apiextensions-5.4.1.jar:/app/libs/kubernetes-model-batch-5.4.1.jar:/app/libs/kubernetes-model-certificates-5.4.1.jar:/app/libs/kubernetes-model-coordination-5.4.1.jar:/app/libs/kubernetes-model-discovery-5.4.1.jar:/app/libs/kubernetes-model-events-5.4.1.jar:/app/libs/kubernetes-model-extensions-5.4.1.jar:/app/libs/kubernetes-model-flowcontrol-5.4.1.jar:/app/libs/kubernetes-model-networking-5.4.1.jar:/app/libs/kubernetes-model-metrics-5.4.1.jar:/app/libs/kubernetes-model-policy-5.4.1.jar:/app/libs/kubernetes-model-scheduling-5.4.1.jar:/app/libs/kubernetes-model-storageclass-5.4.1.jar:/app/libs/kubernetes-model-node-5.4.1.jar:/app/libs/okhttp-3.12.12.jar:/app/libs/okio-1.15.0.jar:/app/libs/logging-interceptor-3.12.12.jar:/app/libs/slf4j-api-1.7.30.jar:/app/libs/jackson-dataformat-yaml-2.11.2.jar:/app/libs/snakeyaml-1.26.jar:/app/libs/jackson-datatype-jsr310-2.11.2.jar:/app/libs/jackson-databind-2.11.2.jar:/app/libs/jackson-core-2.11.2.jar:/app/libs/zjsonpatch-0.3.0.jar:/app/libs/generex-1.0.2.jar:/app/libs/automaton-1.11-8.jar:/app/libs/dagger-2.34.1.jar:/app/libs/javax.inject-1.jar:/app/libs/commons-lang3-3.12.0.jar:/app/libs/commons-codec-1.15.jar:/app/libs/commons-io-2.8.0.jar:/app/libs/commons-validator-1.7.jar:/app/libs/commons-beanutils-1.9.4.jar:/app/libs/commons-digester-2.1.jar:/app/libs/commons-logging-1.2.jar:/app/libs/commons-collections-3.2.2.jar:/app/libs/httpclient-4.5.13.jar:/app/libs/httpcore-4.4.13.jar:/app/libs/vertx-web-4.2.5.jar:/app/libs/vertx-web-common-4.2.5.jar:/app/libs/vertx-auth-common-4.2.5.jar:/app/libs/vertx-bridge-common-4.2.5.jar:/app/libs/vertx-core-4.2.5.jar:/app/libs/netty-common-4.1.74.Final.jar:/app/libs/netty-buffer-4.1.74.Final.jar:/app/libs/netty-transport-4.1.74.Final.jar:/app/libs/netty-handler-4.1.74.Final.jar:/app/libs/netty-codec-4.1.74.Final.jar:/app/libs/netty-tcnative-classes-2.0.48.Final.jar:/app/libs/netty-handler-proxy-4.1.74.Final.jar:/app/libs/netty-codec-socks-4.1.74.Final.jar:/app/libs/netty-codec-http-4.1.74.Final.jar:/app/libs/netty-codec-http2-4.1.74.Final.jar:/app/libs/netty-resolver-4.1.74.Final.jar:/app/libs/netty-resolver-dns-4.1.74.Final.jar:/app/libs/netty-codec-dns-4.1.74.Final.jar:/app/libs/vertx-web-client-4.2.5.jar:/app/libs/vertx-web-graphql-4.2.5.jar:/app/libs/graphql-java-17.3.jar:/app/libs/java-dataloader-3.1.0.jar:/app/libs/antlr4-runtime-4.9.2.jar:/app/libs/reactive-streams-1.0.3.jar:/app/libs/graphql-java-extended-scalars-17.0.jar:/app/libs/nimbus-jose-jwt-9.16.1.jar:/app/libs/jcip-annotations-1.0-1.jar:/app/libs/bcprov-jdk15on-1.69.jar:/app/libs/slf4j-jdk14-1.7.30.jar:/app/libs/gson-2.8.9.jar:/app/libs/caffeine-3.0.1.jar:/app/libs/jsoup-1.14.2.jar:/opt/cryostat.d/clientlib.d/*' @/app/jib-main-class-file
Aug 04, 2022 8:03:56 PM io.cryostat.core.log.Logger info
INFO: cryostat started.
Aug 04, 2022 8:03:56 PM io.cryostat.core.log.Logger info
INFO: Selected SSL KeyStore strategy with keystore /var/run/secrets/operator.cryostat.io/cryostat-sample-tls/keystore.p12
Aug 04, 2022 8:03:56 PM io.cryostat.core.log.Logger info
INFO: Local config path set as /opt/cryostat.d/conf.d
Aug 04, 2022 8:03:56 PM io.cryostat.core.log.Logger info
INFO: Selecting configured PlatformDetectionStrategy "io.cryostat.platform.internal.OpenShiftPlatformStrategy"
Aug 04, 2022 8:03:57 PM io.cryostat.core.log.Logger info
INFO: Selected OpenShift Platform Strategy
Aug 04, 2022 8:03:57 PM io.cryostat.core.log.Logger info
INFO: Selecting configured AuthManager "io.cryostat.net.openshift.OpenShiftAuthManager"
Aug 04, 2022 8:03:57 PM io.cryostat.core.log.Logger info
INFO: Local save path for flight recordings set as /opt/cryostat.d/recordings.d
Aug 04, 2022 8:03:57 PM io.cryostat.core.log.Logger info
INFO: HTTPS service running on https://cryostat-sample-myproject.apps-crc.testing:443
Aug 04, 2022 8:03:57 PM io.cryostat.core.log.Logger info
INFO: HTTP Server Verticle Started
Aug 04, 2022 8:03:57 PM io.cryostat.core.log.Logger info
INFO: Max concurrent WebSocket connections: 2147483647
Aug 04, 2022 8:03:57 PM io.cryostat.core.log.Logger info
INFO: MessagingServer Verticle Started
Aug 04, 2022 8:03:57 PM io.cryostat.core.log.Logger info
INFO: RuleProcessor Verticle Started
Aug 04, 2022 8:03:57 PM io.cryostat.core.log.Logger info
INFO: WebServer Verticle Started
Aug 04, 2022 8:04:07 PM io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener onFailure
WARNING: Exec Failure
java.net.SocketTimeoutException: Connect timed out
	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546)
	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
	at java.base/java.net.Socket.connect(Socket.java:633)
	at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:133)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.openshift.client.internal.OpenShiftOAuthInterceptor.intercept(OpenShiftOAuthInterceptor.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

Aug 04, 2022 8:04:17 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:50268): GET /health/liveness 204 68ms
Aug 04, 2022 8:04:18 PM io.cryostat.core.log.Logger info
INFO: Outgoing WS message: {"meta":{"category":"TargetJvmDiscovery","type":{"type":"application","subType":"json"},"serverTime":1659643458},"message":{"event":{"kind":"LOST","serviceRef":{"connectUrl":"service:jmx:rmi:///jndi/rmi://10.217.0.72:9091/jmxrmi","alias":"cryostat-sample-58c54d8966-pbvng","labels":{"app":"cryostat-sample","pod-template-hash":"58c54d8966","component":"cryostat","kind":"cryostat"},"annotations":{"platform":{"k8s.v1.cni.cncf.io/networks-status":"[{\n    \"name\": \"openshift-sdn\",\n    \"interface\": \"eth0\",\n    \"ips\": [\n        \"10.217.0.72\"\n    ],\n    \"default\": true,\n    \"dns\": {}\n}]","openshift.io/scc":"restricted","k8s.v1.cni.cncf.io/network-status":"[{\n    \"name\": \"openshift-sdn\",\n    \"interface\": \"eth0\",\n    \"ips\": [\n        \"10.217.0.72\"\n    ],\n    \"default\": true,\n    \"dns\": {}\n}]"},"cryostat":{"HOST":"10.217.0.72","PORT":"9091","NAMESPACE":"myproject","POD_NAME":"cryostat-sample-58c54d8966-pbvng"}}}}}}
Aug 04, 2022 8:04:18 PM io.cryostat.core.log.Logger info
INFO: Outgoing WS message: {"meta":{"category":"TargetJvmDiscovery","type":{"type":"application","subType":"json"},"serverTime":1659643458},"message":{"event":{"kind":"FOUND","serviceRef":{"connectUrl":"service:jmx:rmi:///jndi/rmi://10.217.0.72:9091/jmxrmi","alias":"cryostat-sample-58c54d8966-pbvng","labels":{"app":"cryostat-sample","pod-template-hash":"58c54d8966","component":"cryostat","kind":"cryostat"},"annotations":{"platform":{"k8s.v1.cni.cncf.io/networks-status":"[{\n    \"name\": \"openshift-sdn\",\n    \"interface\": \"eth0\",\n    \"ips\": [\n        \"10.217.0.72\"\n    ],\n    \"default\": true,\n    \"dns\": {}\n}]","openshift.io/scc":"restricted","k8s.v1.cni.cncf.io/network-status":"[{\n    \"name\": \"openshift-sdn\",\n    \"interface\": \"eth0\",\n    \"ips\": [\n        \"10.217.0.72\"\n    ],\n    \"default\": true,\n    \"dns\": {}\n}]"},"cryostat":{"HOST":"10.217.0.72","PORT":"9091","NAMESPACE":"myproject","POD_NAME":"cryostat-sample-58c54d8966-pbvng"}}}}}}
Aug 04, 2022 8:04:19 PM io.cryostat.core.log.Logger info
INFO: Creating connection for service:jmx:rmi:///jndi/rmi://10.217.0.72:9091/jmxrmi
Aug 04, 2022 8:04:20 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:48186): GET /api/v1/targets/10.217.0.72:9091/events 200 1138ms
Aug 04, 2022 8:04:20 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:48186): GET /api/v1/targets/10.217.0.72:9091/templates 200 379ms
Aug 04, 2022 8:04:20 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:48236): GET /api/v1/targets/10.217.0.72:9091/events 200 91ms
Aug 04, 2022 8:04:20 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:48236): GET /api/v1/targets/10.217.0.72:9091/templates 200 102ms
Aug 04, 2022 8:04:20 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:48244): GET /api/v1/targets/10.217.0.72:9091/events 200 62ms
Aug 04, 2022 8:04:21 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:48244): GET /api/v1/targets/10.217.0.72:9091/templates 200 118ms
Aug 04, 2022 8:04:21 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:48246): GET /api/v1/targets/10.217.0.72:9091/events 200 90ms
Aug 04, 2022 8:04:21 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:48246): GET /api/v1/targets/10.217.0.72:9091/templates 200 102ms
Aug 04, 2022 8:04:26 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:50512): GET /health/liveness 204 0ms
Aug 04, 2022 8:04:31 PM io.cryostat.core.log.Logger info
INFO: Removing cached connection for 10.217.0.72:9091: EXPIRED
Aug 04, 2022 8:04:31 PM io.cryostat.core.log.Logger info
INFO: Connection for service:jmx:rmi:///jndi/rmi://10.217.0.72:9091/jmxrmi closed
Aug 04, 2022 8:04:36 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:50780): GET /health/liveness 204 0ms
Aug 04, 2022 8:04:46 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:51052): GET /health/liveness 204 0ms
Aug 04, 2022 8:04:56 PM io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener onFailure
WARNING: Exec Failure
java.io.EOFException
	at okio.RealBufferedSource.require(RealBufferedSource.java:61)
	at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
	at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
	at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
	at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

Aug 04, 2022 8:04:56 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:53812): GET /health/liveness 204 0ms
Aug 04, 2022 8:04:57 PM io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener onFailure
WARNING: Exec Failure
java.net.ConnectException: Failed to connect to /10.217.4.1:443
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:249)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:133)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.openshift.client.internal.OpenShiftOAuthInterceptor.intercept(OpenShiftOAuthInterceptor.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:542)
	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
	at java.base/java.net.Socket.connect(Socket.java:633)
	at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
	... 30 more

Aug 04, 2022 8:04:59 PM io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener onFailure
WARNING: Exec Failure
java.net.ConnectException: Failed to connect to /10.217.4.1:443
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:249)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:133)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.openshift.client.internal.OpenShiftOAuthInterceptor.intercept(OpenShiftOAuthInterceptor.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:542)
	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
	at java.base/java.net.Socket.connect(Socket.java:633)
	at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
	... 30 more

Aug 04, 2022 8:05:03 PM io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener onFailure
WARNING: Exec Failure
java.net.ConnectException: Failed to connect to /10.217.4.1:443
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:249)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:133)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.openshift.client.internal.OpenShiftOAuthInterceptor.intercept(OpenShiftOAuthInterceptor.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:542)
	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
	at java.base/java.net.Socket.connect(Socket.java:633)
	at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
	... 30 more

Aug 04, 2022 8:05:06 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:34126): GET /health/liveness 204 0ms
Aug 04, 2022 8:05:11 PM io.cryostat.core.log.Logger warn
WARNING: Exception thrown
io.fabric8.kubernetes.client.WatcherException: too old resource version: 36366 (36679)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$TypedWatcherWebSocketListener.onMessage(WatchConnectionManager.java:103)
	at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
	at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
	at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
	at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 36366 (36679)
	... 11 more

Aug 04, 2022 8:05:17 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:35476): GET /health/liveness 204 1ms
Aug 04, 2022 8:05:26 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:35884): GET /health/liveness 204 0ms
Aug 04, 2022 8:05:36 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:36124): GET /health/liveness 204 0ms
Aug 04, 2022 8:05:46 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:36358): GET /health/liveness 204 0ms
Aug 04, 2022 8:05:56 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:36612): GET /health/liveness 204 0ms
Aug 04, 2022 8:06:06 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:36844): GET /health/liveness 204 0ms
Aug 04, 2022 8:06:16 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:37086): GET /health/liveness 204 0ms
Aug 04, 2022 8:06:27 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:37330): GET /health/liveness 204 1ms
Aug 04, 2022 8:06:36 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:37754): GET /health/liveness 204 0ms
Aug 04, 2022 8:06:46 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:37978): GET /health/liveness 204 0ms
Aug 04, 2022 8:06:56 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:38226): GET /health/liveness 204 0ms
Aug 04, 2022 8:07:06 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:38460): GET /health/liveness 204 0ms
Aug 04, 2022 8:07:16 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:38670): GET /health/liveness 204 0ms
Aug 04, 2022 8:07:26 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:38910): GET /health/liveness 204 0ms
Aug 04, 2022 8:07:35 PM io.cryostat.core.log.Logger info
INFO: Creating connection for service:jmx:rmi:///jndi/rmi://10.217.0.72:9091/jmxrmi
Aug 04, 2022 8:07:35 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:37000): GET /api/v1/targets/10.217.0.72:9091/events 200 219ms
Aug 04, 2022 8:07:35 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:37000): GET /api/v1/targets/10.217.0.72:9091/templates 200 26ms
Aug 04, 2022 8:07:35 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:37016): GET /api/v1/targets/10.217.0.72:9091/events 200 27ms
Aug 04, 2022 8:07:35 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.71:37016): GET /api/v1/targets/10.217.0.72:9091/templates 200 28ms
Aug 04, 2022 8:07:36 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:39144): GET /health/liveness 204 0ms
Aug 04, 2022 8:07:46 PM io.cryostat.core.log.Logger info
INFO: Removing cached connection for 10.217.0.72:9091: EXPIRED
Aug 04, 2022 8:07:46 PM io.cryostat.core.log.Logger info
INFO: Connection for service:jmx:rmi:///jndi/rmi://10.217.0.72:9091/jmxrmi closed
Aug 04, 2022 8:07:46 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:39380): GET /health/liveness 204 0ms
Aug 04, 2022 8:07:56 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:39686): GET /health/liveness 204 0ms
Aug 04, 2022 8:08:06 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:39902): GET /health/liveness 204 0ms
Aug 04, 2022 8:08:16 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:40118): GET /health/liveness 204 0ms
Aug 04, 2022 8:08:26 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:40344): GET /health/liveness 204 0ms
Aug 04, 2022 8:08:36 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:40562): GET /health/liveness 204 0ms
Aug 04, 2022 8:08:46 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:40792): GET /health/liveness 204 0ms
Aug 04, 2022 8:08:56 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:41044): GET /health/liveness 204 0ms
Aug 04, 2022 8:09:06 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:41264): GET /health/liveness 204 0ms
Aug 04, 2022 8:09:16 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:41476): GET /health/liveness 204 0ms
Aug 04, 2022 8:09:26 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:41708): GET /health/liveness 204 0ms
Aug 04, 2022 8:09:36 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:41930): GET /health/liveness 204 0ms
Aug 04, 2022 8:09:46 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:42148): GET /health/liveness 204 1ms
Aug 04, 2022 8:09:56 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:42396): GET /health/liveness 204 0ms
Aug 04, 2022 8:10:06 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:42618): GET /health/liveness 204 0ms
Aug 04, 2022 8:10:16 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:42860): GET /health/liveness 204 0ms
Aug 04, 2022 8:10:26 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:43090): GET /health/liveness 204 0ms
Aug 04, 2022 8:10:36 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:43306): GET /health/liveness 204 0ms
Aug 04, 2022 8:10:46 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:43526): GET /health/liveness 204 1ms
Aug 04, 2022 8:10:56 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:43832): GET /health/liveness 204 0ms
Aug 04, 2022 8:11:06 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:44052): GET /health/liveness 204 0ms
Aug 04, 2022 8:11:16 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:44258): GET /health/liveness 204 0ms
Aug 04, 2022 8:11:26 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:44488): GET /health/liveness 204 0ms
Aug 04, 2022 8:11:36 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:44704): GET /health/liveness 204 0ms
Aug 04, 2022 8:11:46 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:44916): GET /health/liveness 204 0ms
Aug 04, 2022 8:11:56 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:45164): GET /health/liveness 204 0ms
Aug 04, 2022 8:12:06 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:45384): GET /health/liveness 204 0ms
Aug 04, 2022 8:12:16 PM io.cryostat.core.log.Logger info
INFO: (10.217.0.1:45594): GET /health/liveness 204 0ms

Here's a recent log of 2.2.0-dev in crc.

Cryostat initially had a working watcher connection and soon after startup it even detected itself coming online and logged its own WebSocket notifications about that (no client was connected to actually send those messages to, but it logs regardless). Then we see some repeated io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener onFailure occurrences. There is an initial one at Aug 04, 2022 8:04:07 PM. The next at Aug 04, 2022 8:04:56 PM, then Aug 04, 2022 8:04:57 PM (+1s), Aug 04, 2022 8:04:59 PM (+2s), and Aug 04, 2022 8:05:03 PM (+4s). This looks like the exponential backoff interval logic that I found in the fabric8 sources.

Then we get:

Aug 04, 2022 8:05:11 PM io.cryostat.core.log.Logger warn
WARNING: Exception thrown
io.fabric8.kubernetes.client.WatcherException: too old resource version: 36366 (36679)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$TypedWatcherWebSocketListener.onMessage(WatchConnectionManager.java:103)
	at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
	at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
	at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
	at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 36366 (36679)
	... 11 more

after this there are no more logged exceptions that look like watcher reconnection attempts.

We can (and do) catch these exceptions, but currently they are only logged and nothing else happens. The WatcherException in kubernetes-client 5.4.1 that we are using has a method isShouldRetry(). I will try logging that as well and reproduce the failure. I expect that we will see true on the initial exceptions and false on the last one. If so, then if we get an exception with the isShouldRetry() == false, we can close the watcher object and attempt to re-open a new one. Hopefully this doesn't just immediately run into the same WatcherException: too old resource version: 36366 (36679).

@andrewazores
Copy link
Member

Seems relevant: https://stackoverflow.com/questions/61409596/kubernetes-too-old-resource-version

@ebaron I think the fix for upstream Cryostat here is just to detect what kind of watcher exception occurred and if the watcher will be retrying. If not, close it (it's probably already auto-closed but to be safe) and schedule a new one to be opened. WDYT?

Repository owner moved this from Pushed to Next Release to Done in 2.1.0 Release Aug 6, 2022
Repository owner moved this from Todo to Done in 2.2.0 Release Aug 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers needs-documentation
Projects
No open projects
Status: Done
Status: Done
3 participants