
Openshift Origin server 3.9 Failed to get system container stats for "/user.slice/user-0.slice/session-4.scope": failed to get cgroup stats for "/user.slice/user-0.slice/session-4.scope": failed to get container info for "/user.slice/user-0.slice/session-4.scope": unknown container "/user.slice/user-0.slice/session-4.scope" #19535

Closed · sadahamranawake opened this issue Apr 27, 2018 · 22 comments

Labels: component/containers, lifecycle/rotten, priority/P2, sig/containers

@sadahamranawake

I'm having issues when starting the OpenShift server. When I start it with
./openshift start (as instructed in the Origin docs)
it produces this error every time (see Current Result below).

Version

openshift v3.9.0+191fece
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

Steps To Reproduce
  1. ./openshift start
Current Result

I'm getting two errors.
[One]
E0427 12:19:34.473195 11668 summary.go:92] Failed to get system container stats for "/user.slice/user-0.slice/session-4.scope": failed to get cgroup stats for "/user.slice/user-0.slice/session-4.scope": failed to get container info for "/user.slice/user-0.slice/session-4.scope": unknown container "/user.slice/user-0.slice/session-4.scope"

[Two]
E0427 12:19:06.220993 11668 dnsmasq.go:105] unable to periodically refresh dnsmasq status: The name uk.org.thekelleys.dnsmasq was not provided by any .service files

Expected Result

Openshift all in one cluster up and running and can be accessible by web console

Additional Information

I'm a newbie to OpenShift, Docker, and Kubernetes, so I can't figure out what is generating this error. Please help.

@wenchma

wenchma commented May 3, 2018

I have the same issue:

Environment info

openshift-origin-server-v3.9.0-191fece-linux-64bit
# cat /etc/redhat-release 
CentOS Linux release 7.0.1406 (Core)
# uname -a
Linux i-lwfsocpz 4.16.6-1.el7.elrepo.x86_64 #1 SMP Sun Apr 29 16:50:56 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

Results

E0503 11:51:02.187057   13590 summary.go:92] Failed to get system container stats for "/user.slice/user-0.slice/session-189.scope": failed to get cgroup stats for "/user.slice/user-0.slice/session-189.scope": failed to get container info for "/user.slice/user-0.slice/session-189.scope": unknown container "/user.slice/user-0.slice/session-189.scope"
E0503 11:51:02.388686   13590 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
W0503 11:51:02.389072   13590 reflector.go:341] github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:86: watch of *v1beta1.PodSecurityPolicy ended with: The resourceVersion for the provided watch is too old.

@dove-young

I have the same issue as above, with exactly the same OpenShift version. The OS is RHEL 7.4 64-bit.

[root@OpenShift]# uname -a
Linux xxx-network-12 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@OpenShift]# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.4 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.4"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.4 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.4:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.4
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.4"
[root@OpenShift]# docker version
Client:
 Version:      17.06.2-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   cec0b72
 Built:        Tue Sep  5 19:59:06 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.2-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   cec0b72
 Built:        Tue Sep  5 20:00:25 2017
 OS/Arch:      linux/amd64
 Experimental: false
[root@OpenShift]# docker info |grep Cgroup
Cgroup Driver: systemd
WARNING: overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior.
         Reformat the filesystem with ftype=1 to enable d_type support.
         Running without d_type support will not be supported in future releases.

Error message goes

E0530 01:23:17.720369    9856 summary.go:92] Failed to get system container stats for "/user.slice/user-0.slice/session-4.scope": failed to get cgroup stats for "/user.slice/user-0.slice/session-4.scope": failed to get container info for "/user.slice/user-0.slice/session-4.scope": unknown container "/user.slice/user-0.slice/session-4.scope"
E0530 01:23:27.800522    9856 summary.go:92] Failed to get system container stats for "/user.slice/user-0.slice/session-4.scope": failed to get cgroup stats for "/user.slice/user-0.slice/session-4.scope": failed to get container info for "/user.slice/user-0.slice/session-4.scope": unknown container "/user.slice/user-0.slice/session-4.scope"
E0530 01:23:37.862243    9856 summary.go:92] Failed to get system container stats for "/user.slice/user-0.slice/session-4.scope": failed to get cgroup stats for "/user.slice/user-0.slice/session-4.scope": failed to get container info for "/user.slice/user-0.slice/session-4.scope": unknown container "/user.slice/user-0.slice/session-4.scope"
E0530 01:23:45.658357    9856 dnsmasq.go:105] unable to periodically refresh dnsmasq status: The name uk.org.thekelleys.dnsmasq was not provided by any .service files

@leoterry-ulrica

I have the same issue as above.

Failed to get system container stats for "/user.slice/user-0.slice/session-882.scope": failed to get cgroup stats for "/user.slice/user-0.slice/session-882.scope": failed to get container info for "/user.slice/user-0.slice/session-882.scope": unknown container "/user.slice/user-0.slice/session-882.scope"
E0603 17:36:08.269943 2033 dnsmasq.go:105] unable to periodically refresh dnsmasq status: The name uk.org.thekelleys.dnsmasq was not provided by any .service files

@slawekgh

I have the same issue.

@jamesdube

I'm facing the same issue while trying to install OpenShift. I'll dig around and try to come up with a solution.

@rishikapoor028

rishikapoor028 commented Jul 4, 2018

Did anybody manage to get this resolved? I am facing the same issue.

@tonylook

I have the same issue. Can anyone help?

@Mikedu1988

Same issue here, and quite disappointed.

@vladf3000

I had the same issue, then used 'sudo oc cluster up' instead, as recommended here.
That showed an error about an insecure registry, so I had to update /etc/docker/daemon.json as described here.
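For reference, the insecure-registry setting lives in /etc/docker/daemon.json. A minimal sketch (the 172.30.0.0/16 subnet is the service network that `oc cluster up` typically complains about; adjust it for your environment, and restart Docker after editing):

```json
{
  "insecure-registries": ["172.30.0.0/16"]
}
```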

@hsafe

hsafe commented Oct 13, 2018

I have the same issue, i.e. the /user.slice/user-0.slice/session-1.scope and dnsmasq error messages, when running the binary openshift start... :(
@openshift team, can you help us out? What the commenters above suggested putting in /etc/docker/daemon.json did not help me :(

@hsafe

hsafe commented Oct 13, 2018

By the way, this is a CentOS 7.5 minimal install with OpenShift 3.10.0. One hint: the service can be started successfully with oc cluster up, but that is not a desirable production setup. I can't help complaining that there is no single A-to-Z source for installing and setting up OpenShift 3.10 from the binary; the guides out there are either too old and not applicable to the modern approach, or they rely on oc cluster up, which honestly does not get you far.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2019
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 11, 2019
@lbrigman124

Same issue on 3.11 with CentOS 7.5, every 10 seconds:

origin-node: E0301 09:48:46.768464 6434 summary.go:102] Failed to get system container stats for "/system.slice/origin-node.service": failed to get cgroup stats for "/system.slice/origin-node.service": failed to get container info for "/system.slice/origin-node.service": unknown container "/system.slice/origin-node.service"

@lbrigman124

version:
oc v3.11.0+62803d0-1
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://example.com:8443
openshift v3.11.0+d0c29df-98
kubernetes v1.11.0+d4cacc0
uname -a
Linux example.com 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Steps to reproduce:
Single node standup with openshift-ansible playbooks deploy_cluster.yml

@icheko

icheko commented Mar 5, 2019

Is everyone getting this error when using XFS with d_type = 0?

I noticed @dove-young's error. I'm going to try to redeploy with d_type = 1 per the Docker docs (https://docs.docker.com/v17.09/engine/userguide/storagedriver/overlayfs-driver/#prerequisites) and see if it gets rid of the stats error.

[root@OpenShift]# docker info
...
WARNING: overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior.
         Reformat the filesystem with ftype=1 to enable d_type support.
         Running without d_type support will not be supported in future releases.
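You can check whether the backing XFS filesystem has d_type support by looking at the `ftype` field in `xfs_info` output. A quick sketch (the /var/lib/docker mount point and the sample output line below are assumptions for illustration):

```shell
# On the affected host you would run:
#   xfs_info /var/lib/docker | grep ftype
# ftype=1 means d_type is supported; ftype=0 means the filesystem would need
# to be recreated with `mkfs.xfs -n ftype=1` to clear the overlay warning.
# Demonstrated here against a sample xfs_info output line:
sample='naming   =version 2   bsize=4096   ascii-ci=0 ftype=0'
echo "$sample" | grep -o 'ftype=[01]'   # prints ftype=0
```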

@lbrigman124

lbrigman124 commented Mar 6, 2019

I'm getting the same error message as listed above but I don't have the above warning message from docker info.
In my case, a reboot of the node cleared the error message.

@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vanloswang

I have the same issue on 3.11. You should add some configuration below kubeletArguments in /etc/origin/node/node-config.yaml. My fix is as follows:

# vim /etc/origin/node/node-config.yaml
kubeletArguments:
  runtime-cgroups:
  - /systemd/system.slice
  kubelet-cgroups:
  - /systemd/system.slice

Do not forget to restart the node service.

vanloswang added a commit to vanloswang/openshift-ansible that referenced this issue Jun 20, 2019
@vanloswang

This issue should be fixed in openshift-ansible; refer to vanloswang/openshift-ansible@cabe815. I CANNOT make a pull request now because GitHub returns an Oops with code 500.

@lejeczek

lejeczek commented Apr 8, 2020

I see the same/similar when I try to deploy a cluster on Centos 7.x with:

openshift-ansible-playbooks-3.11.37-1.git.0.3b8b341.el7.noarch
openshift-ansible-roles-3.11.37-1.git.0.3b8b341.el7.noarch
openshift-ansible-docs-3.11.37-1.git.0.3b8b341.el7.noarch
openshift-ansible-3.11.37-1.git.0.3b8b341.el7.noarch

a cluster of KVM guests, 3 nodes of which 1 is the master; all three nodes show:
...
E0408 18:32:03.587431 30412 summary.go:102] Failed to get system container stats for "/system.slice/origin-node.service": failed to get cgroup stats for "/system.slice/origin-node.service": failed to get container info for "/system.slice/origin-node.service": unknown container "/system.slice/origin-node.service"
E0408 18:32:13.606530 30412 summary.go:102] Failed to get system container stats for "/system.slice/origin-node.service": failed to get cgroup stats for "/system.slice/origin-node.service": failed to get container info for "/system.slice/origin-node.service": unknown container "/system.slice/origin-node.service"
E0408 18:32:23.612086 30412 summary.go:102] Failed to get system container stats for "/system.slice/origin-node.service": failed to get cgroup stats for "/system.slice/origin-node.service": failed to get container info for "/system.slice/origin-node.service": unknown container "/system.slice/origin-node.service"

The above is logged while the deploy_cluster process is running, which spits out:
...
FAILED - RETRYING: Verify that the catalog api server is running (57 retries left).
FAILED - RETRYING: Verify that the catalog api server is running (56 retries left).
FAILED - RETRYING: Verify that the catalog api server is running (55 retries left).
FAILED - RETRYING: Verify that the catalog api server is running (54 retries left).
...

megian pushed a commit to vshn/openshift-ansible that referenced this issue May 29, 2020
rlaurika pushed a commit to CSCfi/pouta-openshift-cluster that referenced this issue May 17, 2022