podman crashes when trying to podman ps with base64 unmarshalling error #1283

Closed
jlebon opened this issue Aug 16, 2018 · 23 comments
Labels
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@jlebon
Contributor

jlebon commented Aug 16, 2018

/kind bug

Description

Running podman ps -a gives:

error creating libpod runtime: error retrieving all containers from state: error unmarshalling container c5e0871c45fbc1d5b15d486c146f5022c35b4a46a49d14622e6f90da469c65df config: ffjson error: (base64.CorruptInputError)illegal base64 data at input byte 3 offset=30325 line=1 char=30325

Running podman info also gives a similar error.

Steps to reproduce the issue:

  1. Not quite sure yet. Sounds like it might be a compat issue. I've just upgraded podman:
$ ros db diff 745023f80e89aab70cbb05147d02a78aa4575990169076491583fe6bffe89e3d 1b8e91c40dd34286b93ed898cc8df008545b02434bfd389526a1b0d551856523 | grep podman
  podman 0.7.4-4.git80612fb.fc28 -> 0.8.2.1-1.gitf38eb4f.fc28

Describe the results you received:

Error above.

Describe the results you expected:

No error.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

$ sudo podman version # (aside: this shouldn't require privileges, right?)
Version:       0.8.2.1
Go Version:    go1.10.3
OS/Arch:       linux/amd64
$ rpm -q podman
podman-0.8.2.1-1.gitf38eb4f.fc28.x86_64

Output of podman info:

could not get runtime: error retrieving all containers from state: error unmarshalling container c5e0871c45fbc1d5b15d486c146f5022c35b4a46a49d14622e6f90da469c65df config: ffjson error: (base64.CorruptInputError)illegal base64 data at input byte 3 offset=30325 line=1 char=30325

Additional environment details (AWS, VirtualBox, physical, etc.):

Fedora Silverblue laptop.

$ ros status -b
State: idle
AutomaticUpdates: check; rpm-ostreed-automatic.timer: no runs since boot
Deployments:
● ostree://fedora:fedora/28/x86_64/workstation
                   Version: 28.20180816.0 (2018-08-16 06:44:08)
                BaseCommit: 1b8e91c40dd34286b93ed898cc8df008545b02434bfd389526a1b0d551856523
              GPGSignature: Valid signature by 128CF232A9371991C8A65695E08E7E629DB62FB1
           LayeredPackages: chrome-gnome-shell compat-ffmpeg28 ffmpeg-libs gnome-tweak-tool krb5-workstation libvirt-client libvirt-nss mesa-vulkan-drivers mosh oci-kvm-hook pcsc-lite
                            rpmfusion-free-release strace tmux vim-X11 virt-install virt-manager xsel
@mheon
Member

mheon commented Aug 16, 2018

Can you revert to 0.7.4 to get a podman inspect of the container in question?

@mheon mheon added the bug label Aug 16, 2018
@jlebon
Contributor Author

jlebon commented Aug 16, 2018

@mheon
Member

mheon commented Aug 16, 2018

We changed some aspects of our DB storage backend between 0.7.4 and 0.8.2 - most notably a different JSON decoder. It seems that it's barfing on this particular container, though I'm not sure of an exact reason yet.

A bigger issue this is exposing is that a single invalid container JSON in the DB is causing Podman to completely barf. We need to get a fix in for this.

@jcpowermac

Also experiencing this issue

$ sudo rpm-ostree status -b
State: idle
AutomaticUpdates: ex-stage; rpm-ostreed-automatic.timer: last run 7h ago
Deployments:
● ostree://fedora-workstation:fedora/28/x86_64/workstation
                   Version: 28.20180816.0 (2018-08-16 06:44:08)
                BaseCommit: 1b8e91c40dd34286b93ed898cc8df008545b02434bfd389526a1b0d551856523
              GPGSignature: Valid signature by 128CF232A9371991C8A65695E08E7E629DB62FB1
           LayeredPackages: git sway tmux tmux-powerline zsh

Error:

$ sudo podman ps -a
error creating libpod runtime: error retrieving all containers from state: error unmarshalling container b5caaae654aa9d9641d06af308b911613f7024c8980183048bf73ecdf2bc008a config: ffjson error: (base64.CorruptInputError)illegal base64 data at input byte 3 offset=20444 line=1 char=20444

@mheon
Member

mheon commented Aug 16, 2018

I'll see about getting a partial fix (allowing Podman to start and be used, but not allowing anything to be done to these containers except removing them) in tomorrow's release. An actual fix for the JSON decode issue will take longer.

@jlebon
Contributor Author

jlebon commented Aug 17, 2018

@jcpowermac Note that for the time being you can:

$ curl -LO https://kojipkgs.fedoraproject.org//packages/podman/0.7.4/4.git80612fb.fc28/x86_64/podman-0.7.4-4.git80612fb.fc28.x86_64.rpm
$ rpm-ostree override replace podman-0.7.4-4.git80612fb.fc28.x86_64.rpm

@jcpowermac

Thanks @jlebon

@mheon
Member

mheon commented Aug 17, 2018

@jlebon @jcpowermac Can I get a /var/lib/containers/storage/libpod/bolt_state.db from someone for debugging? I'd like to dig under the hood and peek at the JSON.

Meanwhile, I am going to merge a fix that will make this nonfatal for the moment. You won't be able to see or interact with the container in question, but Podman should still be usable for creating and managing other containers.
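To illustrate what "nonfatal" means here, a minimal sketch in Go, assuming the stored configs can be treated as raw JSON blobs keyed by container ID. The `containerConfig` type and `loadAllContainers` function are hypothetical and are not libpod's actual code; the point is only that a decode failure is recorded per container and the loop continues, instead of aborting runtime creation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// containerConfig is a hypothetical, heavily trimmed stand-in for a
// container's stored configuration; it is not libpod's real type.
type containerConfig struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// loadAllContainers decodes every raw config and keeps going when one entry
// fails, returning the decode errors keyed by container ID instead of
// aborting on the first bad record.
func loadAllContainers(raw map[string][]byte) ([]containerConfig, map[string]error) {
	ids := make([]string, 0, len(raw))
	for id := range raw {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic order for output

	var ok []containerConfig
	failed := make(map[string]error)
	for _, id := range ids {
		var cfg containerConfig
		if err := json.Unmarshal(raw[id], &cfg); err != nil {
			failed[id] = fmt.Errorf("error unmarshalling container %s config: %w", id, err)
			continue // skip this container; the others remain usable
		}
		ok = append(ok, cfg)
	}
	return ok, failed
}

func main() {
	raw := map[string][]byte{
		"aaa": []byte(`{"id":"aaa","name":"good"}`),
		"bbb": []byte(`{"id":"bbb",`), // truncated, undecodable record
	}
	ok, failed := loadAllContainers(raw)
	fmt.Println(len(ok), len(failed)) // 1 1
}
```

The broken container is still reported (so it can be removed), but it no longer prevents the good ones from loading.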

@mheon
Member

mheon commented Aug 17, 2018

I think the real fix is to convince Go to use the old JSON decoding algorithm if ffjson fails, but this does not seem to be something that is possible.

@jcpowermac

@mheon
bolt_state.db.gz

@mheon
Member

mheon commented Aug 17, 2018

Thanks, looking now

@mheon
Member

mheon commented Aug 17, 2018

It looks like it's an error trying to unmarshal a net.IP: FFJSON wants to treat it as base64, but the standard-library (encoding/json) unmarshaller just makes it a string.

@mheon
Member

mheon commented Aug 17, 2018

However, trying on my local Podman - I'm producing basically identical fields in the DB when encoding net.IP?

@mheon
Member

mheon commented Aug 17, 2018

Oh, wait - I'm testing with 0.8.1 which doesn't have the FFJSON decode changes...

@mheon
Member

mheon commented Aug 17, 2018

Can replicate locally by upgrading 0.8.1 -> 0.8.2 with a container with a DNS server manually set

@mheon
Member

mheon commented Aug 17, 2018

I think it's actually base64-encoding the net.IP value (which is a []byte, not a struct) while the standard-library JSON implementation prints the IP as a string.

Easiest solution would be to store the IP as a string in the database. More correct solution would be to store the IP as a custom type based on net.IP implementing a correct MarshalJSON() method.

Not sure why ffjson is skipping the MarshalText() method of the actual type in question, though.

@mheon
Member

mheon commented Aug 17, 2018

It might be more correct to fix this in FFJSON: make it generate proper strings for net.IP, as the standard library does. I'll poke at this on Monday.

@mheon
Member

mheon commented Aug 17, 2018

The answer here seems to be that FFJSON does not respect MarshalJSON() and MarshalText() methods on types that are not structs. That is probably faster than what encoding/json does, but it is also incompatible in cases like net.IP (a []byte that implements MarshalText()).

@mheon
Member

mheon commented Aug 17, 2018

Potential alternative: switch from FFJSON to easyjson, which seems to respect the MarshalText() method and would resolve this problem.

@mheon
Member

mheon commented Aug 20, 2018

I think we can fix this in FFJSON without much difficulty. I'll open a PR upstream and see about potentially switching us over to a patched version until the fix is merged.

@mheon
Member

mheon commented Aug 20, 2018

I've spent several hours here, and I'm not really any closer to making ffjson do what we want. Going to investigate easyjson again.

@mheon
Member

mheon commented Aug 22, 2018

Opened #1322 to fix

@mheon
Member

mheon commented Aug 28, 2018

This should be fixed via #1322

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023
3 participants