Increase journald rate limiter when running test-end-to-end-docker.sh #13669
I've removed that offending deprecation warning using the recommended/new approach for |
Flake #13757 |
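@ncdc I'm baffled by this error. When I increase RateLimitInterval and RateLimitBurst in journald.conf and the end-to-end test gets to oc rsh dc/docker-registry cat config.yml, it hangs waiting for the connection. Looking closer at the instance, I've noticed that docker generally hangs after some time, and usually only a restart helps. Any ideas what might be causing this, or pointers on where I should look? I can't reproduce this manually when running oc cluster up and trying oc rsh. |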
I'm off today but will look either this weekend or Monday.
…On Fri, Apr 14, 2017 at 11:45 AM Maciej Szulik ***@***.***> wrote:
@ncdc <https://github.com/ncdc> I'm baffled by this error. The case is
when I increase RateLimitInterval and RateLimitBurst in journald.conf and
get to oc rsh dc/docker-registry cat config.yml the end-to-end test will
hang waiting for the connection. Looking closer at the instance I've
noticed that generally docker will hang after some time. Usually, only
restart will help. Any ideas/thoughts what might be causing this? Pointers
where should I look for it? I can't reproduce this manually when running oc
cluster up and trying oc rsh.
|
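For readers following along, the journald change discussed above can be sketched as a small shell helper. This is an illustration only: the function name write_journald_ratelimit, the target path, and the values are assumptions and are not taken from this PR. The only names that come from the thread are RateLimitInterval and RateLimitBurst, which belong in the [Journal] section of the journald configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): write a journald rate-limit
# override to the given file. Interval and burst values are illustrative.
write_journald_ratelimit() {
  local conf_file="$1" interval="$2" burst="$3"
  mkdir -p "$(dirname "${conf_file}")"
  printf '[Journal]\nRateLimitInterval=%s\nRateLimitBurst=%s\n' \
    "${interval}" "${burst}" > "${conf_file}"
}
```

A call such as write_journald_ratelimit /tmp/journald-test/ratelimit.conf 1s 10000 writes the fragment to a scratch location for inspection. Installing it under the real journald configuration and restarting systemd-journald.service is left out on purpose, since the restart is exactly what this thread found to be risky for docker.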
Sure, sure, no rush :) |
I wonder if we're starting to run into kubernetes/kubernetes#43922 |
I'll try to spin up a new env with that fix in, I'm hitting this quite consistently with this PR. |
I'm doubtful that's the issue but we'll see.
…On Tue, Apr 18, 2017 at 7:46 AM Maciej Szulik ***@***.***> wrote:
I'll try to spin up a new env with that fix in, I'm hitting this quite
consistently with this PR.
|
Nope, that PR didn't help with the problem 😢 My last resort is the current rebase, I guess. |
Can you get stack dumps from openshift and oc when this happens?
…On Wed, Apr 19, 2017 at 7:01 AM Maciej Szulik ***@***.***> wrote:
Nope, that PR didn't help with the problem 😢 My last resort is the
current rebase, I guess.
|
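Nope, the rebase does not help here, either. Sigh... @smarterclayton do you have any ideas why oc rsh dc/docker-registry cat config.yml in the end-to-end test hangs after increasing RateLimitInterval and RateLimitBurst in journald.conf? I've tried manually changing those values and running oc rsh and it worked perfectly, but for some reason it fails when run as part of end-to-end. |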
I really think we need stack dumps from openshift, oc, and possibly docker
when this happens. I don't remember the name of the environment variable
(maybe DEBUG?), but it's possible to enable the pprof http endpoint for
docker just like we have in openshift.
…On Wed, Apr 19, 2017 at 8:07 AM, Maciej Szulik ***@***.***> wrote:
Nope, the rebase does not help here, either. Sigh...
@smarterclayton <https://github.com/smarterclayton> do you have any ideas
why oc rsh dc/docker-registry cat config.yml in end-to-end test will hang
after increasing RateLimitInterval and RateLimitBurst in journald.conf?
I've tried manually changing those values and running oc rsh and it
worked perfectly, but for some reason it will fail when part of end-to-end.
|
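As a rough sketch of the stack-dump suggestion above: if a Go process exposes the standard net/http/pprof handlers, a full goroutine dump is available at /debug/pprof/goroutine?debug=2. The listen address used below and both function names are assumptions, and the thread does not say which environment variable enables the endpoint, so that part is deliberately left out.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the goroutine-dump URL for a Go pprof
# endpoint and fetch it. The address (host:port) is supplied by the caller.
pprof_goroutine_url() {
  local addr="$1"
  echo "http://${addr}/debug/pprof/goroutine?debug=2"
}

dump_goroutines() {
  # --max-time keeps the fetch itself from hanging on a wedged process.
  curl --silent --show-error --max-time 30 "$(pprof_goroutine_url "$1")"
}
```

For example, dump_goroutines 127.0.0.1:6060 > stacks.txt would save the dump for later reading, assuming something is actually listening on that address.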
Is there something waiting for a newline in the output? Docker hang or
origin hang?
On Wed, Apr 19, 2017 at 8:19 AM, Andy Goldstein <notifications@github.com>
wrote:
… I really think we need stack dumps from openshift, oc, and possibly docker
when this happens. I don't remember the name of the environment variable
(maybe DEBUG?), but it's possible to enable the pprof http endpoint for
docker just like we have in openshift.
|
The symptoms are as follows: the end-to-end test hangs (not reacting to any sort of input), while openshift seems to be working correctly. After some time the docker daemon hangs as well (not reacting to any docker commands), and only restarting the daemon helps. I've collected some stack dumps and I'll be going through them today, hoping to find something useful. |
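One way to tell a hung daemon apart from a failing one, sketched under assumptions: wrap a cheap command in GNU coreutils timeout and inspect the exit status, which is 124 when the deadline is hit. The helper name probe and the choice of command are hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with a deadline and classify the result.
# Prints "ok", "hung" (deadline exceeded) or "failed" (non-zero exit).
probe() {
  local seconds="$1"; shift
  timeout "${seconds}" "$@" >/dev/null 2>&1
  local rc=$?
  if [ "${rc}" -eq 0 ]; then
    echo "ok"
  elif [ "${rc}" -eq 124 ]; then
    # GNU timeout exits with 124 when the command ran out of time.
    echo "hung"
  else
    echo "failed"
  fi
}
```

Something like probe 10 docker info would then report "hung" for the symptom described above. A command that itself exits with 124 would be misclassified, which is acceptable for a quick diagnostic.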
Getting stable tests is p0. |
b816dd0 to b180270
It looks like the problem was that docker does not react well to journald being restarted, so I now restart docker after restarting journald as well. Seems to be working, let's give it a try. |
Fixed package typo in UPSTREAM cherry-pick commit. Waiting for green tests. |
Flake #14434. |
hack/test-end-to-end-docker.sh
Outdated
${USE_SUDO:+sudo} systemctl restart systemd-journald.service
# Docker has "some" problems when journald is restarted, so we need to
# restart docker, as well.
${USE_SUDO:+sudo} systemctl restart docker.service
This is fairly invasive. But I don't have a better place for it.
Also, please ensure that you're actually on a system with systemctl; macs, for example, don't have it.
Should be fixed, I've added os::util::ensure::system_binary_exists 'systemctl'
checks around these calls.
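A minimal sketch of the guarded restart described above, with one loudly stated assumption: it uses the portable command -v check as a stand-in for os::util::ensure::system_binary_exists, whose implementation is not shown in this thread. The function name restart_journald_and_docker is hypothetical; the two systemctl calls are the ones from the reviewed snippet.

```shell
#!/usr/bin/env bash
# Sketch only: restart journald and then docker, but skip the whole step on
# systems without systemctl (for example macs). `command -v` stands in for
# the project's own binary-existence helper.
restart_journald_and_docker() {
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "systemctl not found; skipping journald/docker restart" >&2
    return 0
  fi
  ${USE_SUDO:+sudo} systemctl restart systemd-journald.service
  # Docker has problems when journald is restarted, so restart it as well.
  ${USE_SUDO:+sudo} systemctl restart docker.service
}
```

Skipping quietly rather than failing is a design choice of this sketch; whether the real script should skip or abort on such systems is up to the maintainers.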
Removing LGTM, this breaks macs. |
Flake #10773. |
Evaluated for origin test up to b34ba02 |
LGTM |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/2003/) (Base Commit: 6473593) |
Going to the head of the line |
Does not conflict with the PR being merged right now. Moving to the head of the line to fix flakes. |
Nice |
\o/ |
Fixes #12558 by increasing rate limiter values while running test-end-to-end-docker.sh. Additionally, this re-enables the tests that were disabled due to that error.
@stevekuznetsov another try...
@mfojtik fyi