[Bug] Internal Server Error while creating recording #1516

Closed
rjbaucells opened this issue Jun 2, 2023 · 7 comments
Labels
bug Something isn't working high-priority

Comments

@rjbaucells

rjbaucells commented Jun 2, 2023

I have a Java-based k8s pod with the following configuration:

Java command-line args:

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9091
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.autodiscovery=true

pod specs:

ports:
  - name: jfr-jmx
    containerPort: 9091
    protocol: TCP

service specs:

ports:
  - name: jfr-jmx
    protocol: TCP
    port: 9091
    targetPort: jfr-jmx

K8s pod is successfully discovered and displayed in the Topology dashboard:

image

Clicking on the icon displays the pod details:

image

Clicking "Create Recording" displays the following message in the web app:

image

Cryostat pod logs the following exception:

Jun 02, 2023 9:08:55 PM org.slf4j.impl.JDK14LoggerAdapter fillCallerData
SEVERE: 10.242.148.69 - - [Fri, 2 Jun 2023 21:08:55 GMT] 26ms "GET /api/v2/targets/service%3Ajmx%3Armi%3A%2F%2F%2Fjndi%2Frmi%3A%2F%2F10-242-149-106.default.pod%3A9091%2Fjmxrmi/probes HTTP/1.1" 501 168 bytes "https://cryostat.dev.domain.com/topology" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
Jun 02, 2023 9:08:57 PM io.cryostat.core.log.Logger error
SEVERE: HTTP 500: Cannot invoke "org.openjdk.jmc.rjmx.IConnectionHandle.getServiceOrDummy(java.lang.Class)" because "connectionHandle" is null
io.vertx.ext.web.handler.HttpException: Internal Server Error
Caused by: java.lang.NullPointerException: Cannot invoke "org.openjdk.jmc.rjmx.IConnectionHandle.getServiceOrDummy(java.lang.Class)" because "connectionHandle" is null
	at org.openjdk.jmc.rjmx.ConnectionToolkit.getVMName(ConnectionToolkit.java:417)
	at org.openjdk.jmc.rjmx.ConnectionToolkit.isHotSpot(ConnectionToolkit.java:364)
	at org.openjdk.jmc.rjmx.services.jfr.internal.FlightRecorderServiceV2.isFlightRecorderCommercial(FlightRecorderServiceV2.java:111)
	at org.openjdk.jmc.rjmx.services.jfr.internal.FlightRecorderServiceV2.isFlightRecorderDisabled(FlightRecorderServiceV2.java:116)
	at org.openjdk.jmc.rjmx.services.jfr.internal.FlightRecorderServiceV2.<init>(FlightRecorderServiceV2.java:129)
	at org.openjdk.jmc.rjmx.services.jfr.internal.FlightRecorderServiceFactory.getServiceInstance(FlightRecorderServiceFactory.java:47)
	at io.cryostat.core.net.JFRJMXConnection.getService(JFRJMXConnection.java:140)
	at io.cryostat.net.web.http.api.v1.TargetRecordingsGetHandler.lambda$handleAuthenticated$0(TargetRecordingsGetHandler.java:128)
	at io.cryostat.net.TargetConnectionManager.executeConnectedTask(TargetConnectionManager.java:168)
	at io.cryostat.net.web.http.api.v1.TargetRecordingsGetHandler.handleAuthenticated(TargetRecordingsGetHandler.java:124)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:102)
	at io.cryostat.net.web.http.AbstractAuthenticatedRequestHandler.handle(AbstractAuthenticatedRequestHandler.java:72)
	at io.vertx.ext.web.impl.BlockingHandlerDecorator.lambda$handle$0(BlockingHandlerDecorator.java:48)
	at io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264)
	at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135)
	at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)
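
For anyone reading the access log line above: the long path segment after /api/v2/targets/ is the percent-encoded JMX service URL of the target. A minimal sketch of decoding it with the standard library follows. The class and method names here are made up for illustration and are not part of Cryostat.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Cryostat code: decodes the percent-encoded
// target ID that appears in the request path of the log line.
class TargetIdDecoder {

    // Returns the plain JMX service URL for an encoded target ID.
    static String decode(String encodedTargetId) {
        return URLDecoder.decode(encodedTargetId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded =
                "service%3Ajmx%3Armi%3A%2F%2F%2Fjndi%2Frmi%3A%2F%2F"
                        + "10-242-149-106.default.pod%3A9091%2Fjmxrmi";
        System.out.println(decode(encoded));
    }
}
```

The decoded value shows which host and port Cryostat is trying to reach over JMX, here the pod address on port 9091.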

Cryostat is able to create recordings of its own process without problems.

Using JDK Mission Control

Adding the following args to the Java application:

-Dcom.sun.management.jmxremote.rmi.port=9091
-Djava.rmi.server.hostname=127.0.0.1

I am able to start a recording and download the results using JDK Mission Control against the same application, over a port-forward:

kubectl -n default port-forward pod-name 9091:9091

image

image
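
The same check can be done without a GUI. Below is a minimal sketch, assuming the port-forward above is active and JMX authentication and SSL are disabled as in the flags shown. It connects over JMX and reports whether the JDK's FlightRecorder MXBean (jdk.management.jfr:type=FlightRecorder) is registered in the target. The class and method names are hypothetical, and this uses only the standard javax.management API rather than Cryostat or JMC code.

```java
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical diagnostic, not Cryostat code.
class JfrMBeanCheck {

    // Builds the conventional RMI-based JMX service URL for host:port.
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad host or port", e);
        }
    }

    // True if the target JVM exposes the FlightRecorder MXBean.
    static boolean hasFlightRecorder(MBeanServerConnection conn) throws Exception {
        return conn.isRegistered(new ObjectName("jdk.management.jfr:type=FlightRecorder"));
    }

    public static void main(String[] args) throws Exception {
        // Defaults assume: kubectl port-forward pod-name 9091:9091
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9091;
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            boolean present = hasFlightRecorder(connector.getMBeanServerConnection());
            System.out.println("FlightRecorder MBean registered: " + present);
        }
    }
}
```

If the connection itself fails here, the problem is at the JMX/RMI level (ports, hostname) rather than in JFR.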

But the pod is no longer discovered by Cryostat

Environment:

  • OS: CentOS 8
  • Environment: k8s 1.26

Cryostat

apiVersion: operator.cryostat.io/v1beta1
kind: Cryostat
metadata:
  name: cryostat
  namespace: default
spec:
  minimal: true
  enableCertManager: true
  serviceOptions:
    coreConfig:
      serviceType: NodePort
  storageOptions:
    pvc:
      spec:
        storageClassName: nfs
  networkOptions:
    coreConfig:
      annotations:
        alb.ingress.kubernetes.io/backend-protocol: HTTPS
        alb.ingress.kubernetes.io/certificate-arn: XXXXXX
        alb.ingress.kubernetes.io/group.name: internal
        alb.ingress.kubernetes.io/healthcheck-protocol: HTTPS
        alb.ingress.kubernetes.io/ip-address-type: dualstack
        alb.ingress.kubernetes.io/listen-ports: '[{ "HTTPS" : 443 }]'
        alb.ingress.kubernetes.io/load-balancer-name: k8s-internal-load-balancer
        alb.ingress.kubernetes.io/scheme: internal
        alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-Ext-2018-06
        alb.ingress.kubernetes.io/success-codes: '200'
        alb.ingress.kubernetes.io/target-type: instance
        kubernetes.io/ingress.class: alb
      ingressSpec:
        tls:
          - {}
        rules:
          - host: cryostat.dev.domain.com
            http:
              paths:
                - path: /
                  pathType: Prefix
                  backend:
                    service:
                      name: cryostat
                      port:
                        number: 8181
@rjbaucells added the bug (Something isn't working) and needs-triage (Needs thorough attention from code reviewers) labels Jun 2, 2023
@andrewazores
Member

Hi @rjbaucells, thanks for the report.

Could you provide some details about the target application you are trying to connect to? What JDK vendor and version is it running on?

@andrewazores added the question (Further information is requested) label and removed the needs-triage (Needs thorough attention from code reviewers) label Jun 2, 2023
@rjbaucells
Author

I just updated the original bug report with more information.

App information:

OpenJDK 64-Bit Server VM (17.0.7+7-LTS) for linux-amd64 JRE (17.0.7+7-LTS), built on Apr 14 2023 01:22:09 by "jenkins" with gcc 7.3.1 20180303 (Red Hat 7.3.1-5)

Full Args:

-Xshare:on -XX:MaxRAMPercentage=75 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9091 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.autodiscovery=true -Djava.util.concurrent.ForkJoinPool.common.parallelism=5

Host:

CPU: AMD Zen (HT) SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 SSE4A AMD64

Brand: AMD EPYC 7571, Vendor: AuthenticAMD
Family: Zen (0x17), Model: <unknown> (0x1), Stepping: 0x2
Ext. family: 0x8, Ext. model: 0x0, Type: 0x0, Signature: 0x00800f12
Features: ebx: 0x02040800, ecx: 0xfed83203, edx: 0x178bfbff
Ext. features: eax: 0x00800f12, ebx: 0x40000000, ecx: 0x004001f3, edx: 0x2fd3fbff
Supports: On-Chip FPU, Virtual Mode Extensions, Debugging Extensions, Page Size Extensions, Time Stamp Counter, Model Specific Registers, Physical Address Extension, Machine Check Exceptions, CMPXCHG8B Instruction, On-Chip APIC, Fast System Call, Memory Type Range Registers, Page Global Enable, Machine Check Architecture, Conditional Mov Instruction, Page Attribute Table, 36-bit Page Size Extension, CLFLUSH Instruction, Intel Architecture MMX Technology, Fast Float Point Save and Restore, Streaming SIMD extensions, Streaming SIMD extensions 2, Hyper Threading, Streaming SIMD Extensions 3, PCLMULQDQ, Supplemental Streaming SIMD Extensions 3, Fused Multiply-Add, CMPXCHG16B, Streaming SIMD extensions 4.1, Streaming SIMD extensions 4.2, MOVBE, Popcount instruction, AESNI, XSAVE, OSXSAVE, AVX, F16C, LAHF/SAHF instruction support, Core multi-processor leagacy mode, Advanced Bit Manipulations: LZCNT, SSE4A: MOVNTSS, MOVNTSD, EXTRQ, INSERTQ, Misaligned SSE mode, SYSCALL/SYSRET, Execute Disable Bit, RDTSCP, Intel 64 Architecture, Invariant TSC

Pod limits:

Memory: 512 MiB

@andrewazores
Member

andrewazores commented Jun 2, 2023

Thanks. That is JMC 8, I assume?

I believe this bug will be fixed by cryostatio/cryostat-core#223 / cryostatio/cryostat-core#228. We ran into the same stack trace recently in a different scenario: cryostatio/cryostat-core#222 (comment).

> But the pod is no longer discovered by Cryostat

I need to look into this again. One of my colleagues also ran into a problem where setting some of the JMX-related flags broke discovery. If the previous config consistently works, and adding -Dcom.sun.management.jmxremote.rmi.port=9091 -Djava.rmi.server.hostname=127.0.0.1 consistently breaks it, then that's a good starting point for me to dig deeper into the problem.

@rjbaucells
Author

rjbaucells commented Jun 2, 2023

Yes, JMC 8

image

@andrewazores
Member

I created #1517 to track the target discovery failure with the different JVM flag configurations you shared.

@rjbaucells
Author

rjbaucells commented Jun 3, 2023

More information on my tests:

  • To connect to the pod using JMC 8, the following two args are required:
      -Dcom.sun.management.jmxremote.rmi.port=9091
      -Djava.rmi.server.hostname=127.0.0.1
  • Removing -Djava.rmi.server.hostname=127.0.0.1 lets Cryostat discover the pod, but JMC 8 can no longer connect to JFR on the app. In that configuration, the bug is triggered when trying to start a recording from Cryostat.

@andrewazores
Member

cryostatio/cryostat-core#222 / cryostatio/cryostat-core#228 should have fixed the root cause and #1525 will pull that change into the Cryostat server soon, after which this bug should be fixed in the latest development images.

@andrewazores moved this from Todo to In Progress in 2.3.1 release Jun 13, 2023
@andrewazores moved this from In Progress to Todo in 2.3.1 release Aug 2, 2023
@github-project-automation bot moved this from Todo to Done in 2.3.1 release Sep 12, 2023
@github-project-automation bot moved this from Backlog to Done in 2.4.0 release Sep 12, 2023