[v3.2] logs: k8s-file: restore poll sleep #10743

Closed
wants to merge 79 commits into from

Conversation

vrothberg
Member

Commit 84b55ee attempted to fix a race waiting for the container
died event. Previously, Podman slept for the duration of the polling
frequency, which I considered to be a mistake. As it turns out, I was
mistaken: the file logger will, in fact, NOT read until EOF and
then stop logging; it stops logging immediately after it wakes up.

[NO TESTS NEEDED] as the race condition cannot be hit reliably.

Fixes: #10675
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>

mheon and others added 30 commits May 18, 2021 09:42
One of the worst parts of a Podman release is writing the release
notes. It requires manually going through all merged commits
since the last release, figuring out what was actually done, and
writing a small blurb about what was fixed. The worst part of
this is the difficulty in finding the commits that were actually
included in previous releases - our extensive backports to prior
releases mean that there are usually dozens of commits that were
included in a prior release, but do not have a matching SHA (as
the original author did not do the backport, and often the commit
required massaging to cherry-pick in).

This script automates the job of finding commits in one release
branch that are not in another, with filtering to remove most
cherry-picked commits. It makes my life a lot easier during
releases, so I figured I'd include it in hack/ so anyone else
stuck with the enjoyable task of writing release notes can have a
slightly easier life.

The script is written in absolutely terrible Ruby and its
performance is absolutely terrible, but you only need to run it
once per major release and a 30-second wait to generate the list
of commits to include isn't bad.

Signed-off-by: Matthew Heon <mheon@redhat.com>
Missing the updated vendor bits, but the vendor dance is not yet
done.

Signed-off-by: Matthew Heon <mheon@redhat.com>
Last PR before 3.2.0-RC2

Signed-off-by: Matthew Heon <mheon@redhat.com>
Signed-off-by: Matthew Heon <mheon@redhat.com>
Update containers common to the latest HEAD.  Some bug fixes in libimage
forced us to have a clearer separation between ordinary images and
manifest lists.  Hence, when looking up manifest lists without recursing
into any of their instances, we need to use `LookupManifestList()`.

Also account for some other changes in c/common (e.g., the changed order
in the security labels).

Further vendor the latest HEAD from Buildah which is required to get the
bud tests to pass.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>

<MH: Stripped out vendor bits - just left remaining changes>

Signed-off-by: Matthew Heon <mheon@redhat.com>
[v3.2] Update vendors of container projects
Signed-off-by: Matthew Heon <mheon@redhat.com>
Signed-off-by: Matthew Heon <mheon@redhat.com>
When attempting to copy files into and out of running containers
within the host pidnamespace, the code was attempting to join the
host pidns again, and getting an error. This was causing the podman
cp command to fail. Since we are already in the host pid namespace,
we should not be attempting to join.  This PR adds a check to see if
the container is NOT in the host pid namespace, and only then attempts to
join.

Fixes: containers#9985

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
A conversation on the customer portal suggests adding an extra note
about the requirement that XDG_RUNTIME_DIR be set.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
libimage now supports events which `libpod.Runtime` now uses for image
events.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>

<MH: Removed vendor bits, kept other changes>

Signed-off-by: Matthew Heon <mheon@redhat.com>
[NO TESTS NEEDED]

* Log the routing table output at Trace vs. Debug level. Reduce noise
  in debugging output.
* Tweak SDNotify message to report Warn when it fails. Previously
  failures were silent.

Signed-off-by: Jhon Honce <jhonce@redhat.com>
We have race conditions where a container can be removed
by two different processes when running podman --remove rm.

It can be cleaned up in the API or by the conmon executing
podman container cleanup.

When we fail to remove a container that does not exist we should
not be printing errors or warnings, we should just debug the fact.

[NO TESTS NEEDED] Since this is a race condition it is difficult to
test.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: flouthoc <flouthoc.git@gmail.com>
Fixes: containers#9893

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
ErrOCIRuntimeNotFound error is misleading. Try to make it more
understandable to the user that the OCI runtime (i.e., crun or runc) is not
missing, but the command they attempted to run within the container is
missing.

[NO TESTS NEEDED] Regular tests should handle this.

Fixes: containers#10432

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Fixes: containers#10444

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Point to containers-certs.d(5) for details on the default paths, the
lookup logic and the structure of these directories.  Previously, the
man pages stated that the default path would be in `/etc/containers/...`
which is not entirely correct and a red herring for users (see containers#10116).

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
All of the tests assume that RunLsContainer and RunLsContainerInPod complete
the container before returning.  But since the container is running
in background mode, it could still be running when tools
attempt to remove it. Removing the "-d" from the command makes the
container match the assumption.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Make sure all containers exit after start

There is a race condition in that containers could still be running when
we attempt to remove them.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: Matthew Heon <mheon@redhat.com>
Signed-off-by: Matthew Heon <mheon@redhat.com>
Signed-off-by: Matthew Heon <mheon@redhat.com>
* pull: don't resolve short names on explicit docker:// reference

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
* store: ReloadIfChanged propagates errors from Modified()
* store: load additional image stores once
* store: fix graphLock reload

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Luap99 and others added 22 commits June 11, 2021 11:08
Podman machine is only intended for amd64 and arm64 architectures, set
the correct buildtags so that the `pkg/machine`, `pkg/machine/qemu` and
`pkg/machine/libvirt` packages compile correctly.

[NO TESTS NEEDED]

Fixes containers#10625

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
crun 0.20.1 changed an error message that we relied on. Deal
with it by accepting the old and new message.

Also (unrelated): sneak in some doc fixes to get rid of
nasty go-md2man warnings that have crept into man pages.

Signed-off-by: Ed Santiago <santiago@redhat.com>

<MH: Fixed cherry-pick conflicts>

Signed-off-by: Matthew Heon <mheon@redhat.com>
podman-remote build has to handle multiple different locations
for the Containerfile.  Currently this works in local mode but not
when using podman-remote.

Fixes: containers#9871

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
- fix network filters
- add prune filters
- pod create --share support comma separated namespaces

[NO TESTS NEEDED]

Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
Image prune --filter is fully implemented in the api (http api),
yet not connected with the cli execution. Users trying to use
filters do not see any effect. This commit adds glue code to enable
the use of --filter in prune in the cli execution.

Signed-off-by: Jakub Guzik <jakubmguzik@gmail.com>
Fix a race in the k8s-file logs driver.  When "following" the logs,
Podman will print the container's logs until the end.  Previously,
Podman logged until the state transitioned into something non-running
which opened up a race with the container still running, possibly in
the "stopping" state.

To fix the race, log until we've seen the wait event for the specific
container.  In that case, conmon will have finished writing all logs to
the file, and Podman will read it until EOF.

Further tweak the integration tests for testing `logs -f` on a running
container.  Previously, the test only checked for one of two lines
stating that there was a race.  Indeed the race was in using `run --rm`
where a log file may be removed before we could fully read it.

Fixes: containers#10596
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
When 127.0.0.53 is the only nameserver in /etc/resolv.conf assume
systemd-resolved is used. This is better because /etc/resolv.conf does
not have to be symlinked to /run/systemd/resolve/stub-resolv.conf in
order to use systemd-resolved.

[NO TESTS NEEDED]

Fixes: containers#10570

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
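
The heuristic in this commit message can be sketched in Go. The function name `usesSystemdResolved` and the choice to pass the file content as a string are assumptions made for illustration; the actual Podman code may be structured differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// usesSystemdResolved reports whether 127.0.0.53 is the only nameserver
// listed in the given resolv.conf content. Hypothetical helper.
func usesSystemdResolved(resolvConf string) bool {
	var nameservers []string
	scanner := bufio.NewScanner(strings.NewReader(resolvConf))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Comment lines start with '#' or ';' and never match "nameserver".
		if len(fields) >= 2 && fields[0] == "nameserver" {
			nameservers = append(nameservers, fields[1])
		}
	}
	return len(nameservers) == 1 && nameservers[0] == "127.0.0.53"
}

func main() {
	stub := "# example content\nnameserver 127.0.0.53\noptions edns0\n"
	mixed := "nameserver 127.0.0.53\nnameserver 8.8.8.8\n"
	fmt.Println(usesSystemdResolved(stub), usesSystemdResolved(mixed))
}
```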
The api doc used wrong response examples for both the compat and libpod
network prune endpoints. Change the doc so that it matches the actual
return values. Also fix the endpoints to return an empty array instead
of null when no networks are removed.

[NO TESTS NEEDED]

Fixes: containers#10564

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
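
The "empty array instead of null" part of this fix reflects how Go's `encoding/json` encodes slices: a nil slice is marshalled as `null`, while an empty non-nil slice is marshalled as `[]`. A minimal sketch, with the hypothetical helper `encodePruned` standing in for the endpoint's response encoding:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodePruned marshals the list of removed networks and makes sure that
// "nothing removed" is encoded as [] rather than null. Hypothetical helper.
func encodePruned(removed []string) (string, error) {
	if removed == nil {
		removed = []string{}
	}
	out, err := json.Marshal(removed)
	return string(out), err
}

func main() {
	var none []string
	raw, _ := json.Marshal(none) // nil slice
	fixed, _ := encodePruned(none)
	fmt.Println(string(raw), fixed) // prints: null []
}
```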
If a client closes the http connection during image pull, the
service should cancel the pull operation.

[NO TESTS NEEDED] I have no idea how we could test this reliably.

Fixes: containers#7558

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Certain event meta data was lost when converting the remote events to
libpod events and vice versa.  Enable the skipped system tests for
remote.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
There is a race condition in the remote client attach logic. Because the
resize api call was handled in an extra goroutine, the container was
started before the resize call happened. To fix this we have to call
resize in the same goroutine as attach. When the first resize is done,
start a goroutine to listen on SIGWINCH in the background and resize
again if the signal is received.

Fixes containers#9859

Signed-off-by: Paul Holzinger <pholzing@redhat.com>

<MH: Fixed cherry-pick conflicts>

Signed-off-by: Matthew Heon <mheon@redhat.com>
The endpoint returns an array and not a single entry.

Fixes containers#10494

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Fix a bug in remote events where only one event would be sent when
streaming is turned off.  The source of the bug was that the handler
attempted to implement the streaming logic and did it wrong.  The fix is
rather simple: remove this logic from the handler and let the events
backend handle streaming.

Fixes: containers#10529
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
a9cb824 changed the expectations of the
dockerfile parameter to be json data; however, it's a string. In order to
support both, let's attempt json and fall back to a string if the json
parsing fails.

Closes containers#10660

Signed-off-by: Alex Schultz <aschultz@redhat.com>
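
The "attempt json and fall back to a string" approach from this commit message can be sketched in Go. The helper name `parseDockerfileParam` and the assumption that the JSON form is an array of strings are illustrative only; the real handler may accept a different shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseDockerfileParam first tries to decode the parameter as a JSON array
// of strings and falls back to treating it as a single plain string.
// Hypothetical helper; the JSON shape is an assumption.
func parseDockerfileParam(param string) []string {
	var files []string
	if err := json.Unmarshal([]byte(param), &files); err == nil {
		return files
	}
	return []string{param}
}

func main() {
	fmt.Println(parseDockerfileParam(`["Containerfile","Dockerfile.test"]`))
	fmt.Println(parseDockerfileParam("Containerfile"))
}
```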
This reverts commit 9647d88. We
reverted the API bump (was a mistake, should have been left at
3.1.0) and now we need to revert the test changes.

Signed-off-by: Matthew Heon <mheon@redhat.com>
Network connect/disconnect has to call the cni plugins when the network
namespace is already configured. This is the case for `ContainerStateRunning`
and `ContainerStateCreated`. This is important otherwise the network is
not attached to this network namespace and libpod will throw errors like
`network inspection mismatch...` This problem happened when using
`docker-compose up` in attached mode.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
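
The state check described in this commit message can be sketched in Go. The `ContainerState` type and constants below are local stand-ins that reuse the names from the message, and `netnsConfigured` is a hypothetical helper; none of this is the libpod code.

```go
package main

import "fmt"

// ContainerState is a local stand-in for the libpod container state type.
type ContainerState int

const (
	ContainerStateConfigured ContainerState = iota
	ContainerStateCreated
	ContainerStateRunning
	ContainerStateStopped
)

// netnsConfigured reports whether the network namespace is already set up,
// in which case connect/disconnect would have to call the CNI plugins.
// Hypothetical helper.
func netnsConfigured(state ContainerState) bool {
	switch state {
	case ContainerStateCreated, ContainerStateRunning:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(netnsConfigured(ContainerStateCreated), netnsConfigured(ContainerStateStopped))
}
```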
Signed-off-by: Matthew Heon <mheon@redhat.com>
Also, revert minimum API version for the Libpod remote API to
v3.1.0.

Signed-off-by: Matthew Heon <mheon@redhat.com>
Signed-off-by: Matthew Heon <mheon@redhat.com>
Fix the spurious "Error: nil" messages.  Also add some more context to
logged error messages which makes error sources more obvious.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Commit 84b55ee attempted to fix a race waiting for the container
died event.  Previously, Podman slept for the duration of the polling
frequency, which I considered to be a mistake.  As it turns out, I was
mistaken: the file logger will, in fact, NOT read until EOF and
then stop logging; it stops logging immediately _after_ it wakes up.

[NO TESTS NEEDED] as the race condition cannot be hit reliably.

Fixes: containers#10675
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
@openshift-ci
Contributor

openshift-ci bot commented Jun 21, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 21, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 21, 2021

@vrothberg: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 21, 2021
@vrothberg vrothberg closed this Jun 21, 2021
@vrothberg
Member Author

Wrong branch.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

podman logs -f: misses final container output