Verification : verifying "adrianmo/pravega-operator:pr-145-1" build to validate Bookie restarted with exception "java.lang.OutOfMemoryError: Java heap space" Issue #150

Closed
vedanthh opened this issue Mar 29, 2019 · 3 comments
Labels: area/testing (Issue related to a test case or testing framework), kind/bug (Something isn't working), status/validated

Comments

@vedanthh

vedanthh commented Mar 29, 2019

Verification: verifying the adrianmo/pravega-operator:pr-145-1 build against the Bookie restart with exception "java.lang.OutOfMemoryError: Java heap space" (issues #141 and #144, PR #145).

Environment details: PKS / K8s with a medium cluster:

3 master: xlarge: 4 CPU, 16 GB RAM, 32 GB Disk
5 worker: 2xlarge: 8 CPU, 32 GB RAM, 64 GB Disk
Tier-1 storage is from a VSAN datastore
Tier-2 storage is carved out on the NFS client provisioner, with Isilon as the backend
Pravega version: zk-closed-client-issue-0.5.0-2161.60655bf
Zookeeper Operator : pravega/zookeeper-operator:0.2.1
Pravega Operator: adrianmo/pravega-operator:pr-145-1
@vedanthh changed the title on Mar 29, 2019
Old title: Verification : verifying adrianmo/pravega-operator:pr-145-1 build to validate Bookie restarted with exception "java.lang.OutOfMemoryError: Java heap space" Issue - #141 #144 PR #145
New title: Verification : verifying "adrianmo/pravega-operator:pr-145-1" build to validate Bookie restarted with exception "java.lang.OutOfMemoryError: Java heap space" Issue
@vedanthh
Author

Observed a Bookie restart after ~10 hrs of a longevity run with the adrianmo/pravega-operator:pr-145-1 build, using custom resource requests and limits.

# kubectl describe po/pravega-bookie-0
    State:          Running
      Started:      Tue, 26 Mar 2019 18:38:49 -0400
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Tue, 26 Mar 2019 07:47:32 -0400
      Finished:     Tue, 26 Mar 2019 18:38:49 -0400
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     2
      memory:  4Gi
    Requests:
      cpu:      1
      memory:   2Gi
# kubectl get po
NAME                                                 READY     STATUS    RESTARTS   AGE
pravega-bookie-0                                     1/1       Running   1          16h
pravega-bookie-1                                     1/1       Running   1          16h
pravega-bookie-2                                     1/1       Running   0          16h
pravega-bookie-3                                     1/1       Running   1          16h
pravega-operator-778b8c9485-tnhjl                    1/1       Running   1          16h
pravega-pravega-controller-c67d6b758-gcmht           1/1       Running   1          16h
pravega-pravega-controller-c67d6b758-j6btb           1/1       Running   1          16h
pravega-pravega-segmentstore-0                       1/1       Running   0          16h
pravega-pravega-segmentstore-1                       1/1       Running   0          16h
pravega-pravega-segmentstore-2                       1/1       Running   0          16h
pravega-pravega-segmentstore-3                       1/1       Running   0          16h
pravega-zk-0                                         1/1       Running   0          16h
pravega-zk-1                                         1/1       Running   0          16h
pravega-zk-2                                         1/1       Running   0          16h
shaka-zulu-nfs-client-provisioner-59d7f8f84c-xvktb   1/1       Running   0          5d
zookeeper-operator-685bfcbbc5-tjtjr                  1/1       Running   0          5d
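
For a long-running test it can help to flag restarted pods automatically rather than scanning the table by eye. The following is a minimal sketch, assuming the plain-text column layout of `kubectl get po` shown above; `restarted_pods` and `describe_exit_code` are hypothetical helper names, not part of kubectl or the operator. It also decodes the exit code from the `describe` output: by shell convention an exit code above 128 means the process was killed by signal (code - 128), so 137 corresponds to signal 9 (SIGKILL), which is consistent with the `OOMKilled` reason.

```python
# Sample rows copied from the `kubectl get po` output above.
SAMPLE = """\
NAME                                                 READY     STATUS    RESTARTS   AGE
pravega-bookie-0                                     1/1       Running   1          16h
pravega-bookie-2                                     1/1       Running   0          16h
pravega-pravega-segmentstore-0                       1/1       Running   0          16h
"""

# Small lookup of common signal numbers (assumption: Linux numbering).
SIGNAL_NAMES = {9: "SIGKILL", 15: "SIGTERM"}


def restarted_pods(table):
    """Return {pod name: restart count} for pods with at least one restart."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    restarts_col = header.index("RESTARTS")
    result = {}
    for line in lines[1:]:
        fields = line.split()
        count = int(fields[restarts_col])
        if count > 0:
            result[fields[name_col]] = count
    return result


def describe_exit_code(code):
    """Translate a container exit code into a short description."""
    if code > 128:
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, "signal %d" % signum)
        return "killed by " + name
    return "exited with status %d" % code


if __name__ == "__main__":
    print(restarted_pods(SAMPLE))
    print(describe_exit_code(137))
```

In practice the table would come from the live cluster (for example by piping `kubectl get po` into the script) instead of the embedded sample.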

Will validate this issue with adrianmo/pravega-operator:pr-145-2 (#151).

@adrianmo
Contributor

@vedanthh this issue was reported in #141 and fixed in #145. Please use the latest official operator, i.e., pravega/pravega-operator:0.3.2, which contains the fix for this.

@vedanthh
Author

The longevity run ran fine for ~20 hours with adrianmo/pravega-operator:pr-145-2, with no Bookie restart or OOM observed, hence closing the issue.

@sumit-bm added the kind/bug, area/testing, and status/validated labels on Mar 29, 2019