Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pebble silently fails on a readOnlyRootFilesystem #462

Closed
berkayoz opened this issue Aug 1, 2024 · 4 comments · Fixed by #481
Closed

Pebble silently fails on a readOnlyRootFilesystem #462

berkayoz opened this issue Aug 1, 2024 · 4 comments · Fixed by #481
Assignees
Labels
Bug An undesired feature ;-)

Comments

@berkayoz
Copy link
Member

berkayoz commented Aug 1, 2024

Hello,

Kubernetes has a securityContext option called readOnlyRootFilesystem that makes a container's filesystem read-only. In this scenario, Pebble tries to create some state like /var/lib/pebble/default/.pebble.state.2NLvJGDxF8jn~ and silently fails/hangs at startup. The container reports as running, and no logs indicating an error are communicated back(since I assume pebble couldn't start properly).

A good number of projects on Kubernetes make use of this option by default in their manifests/helm charts. This is a pain point in the drop-in docker image replacement story of rocks. We are currently trying to work around this by disabling this option, which might not be accepted/appreciated by users.

Your help is much appreciated, thank you!

cc: @cjdcordeiro

@benhoyt
Copy link
Contributor

benhoyt commented Aug 2, 2024

I am concerned that it fails/hangs silently at startup. That's definitely problematic, and we can look into that (probably next week). It should fail loudly, or at the very least log an error.

In terms of the readonly issue, Pebble does need a non-readonly directory to write the state file into. Can you set the PEBBLE environment variable to point to a writable directory?

@benhoyt benhoyt added Bug An undesired feature ;-) 24.10 labels Aug 2, 2024
@benhoyt
Copy link
Contributor

benhoyt commented Aug 2, 2024

I've confirmed that we're hanging silently on startup when writing the state file fails (one can reproduce this easily by modifying overlordStateBackend.Checkpoint to always return an error, or by creating a directory only accessible by root, and then running Pebble as a non-root user).

@berkayoz
Copy link
Member Author

Thank you for triaging the issue @benhoyt

One possible workaround would be to mount an emptyDir volume from Kubernetes and point pebble there, this is still not ideal for drop-in replacement scenarios however it might be a more acceptable workaround to keep the security benefits. We'll try and see if this works. I'll keep updating here on the progress, thanks again!

@IronCore864
Copy link
Contributor

IronCore864 commented Aug 12, 2024

I've provided a quick fix to check if the state dir is writable before creating the backend.

Here are some of my other thoughts coming from debugging this issue:

  1. Hanging silently: It isn't stuck; there is a default 5 min timeout in State.Unlock. However, during this period, there is no logging at all. From the user's perspective, it seems it hangs forever, which is a bad user experience. In fact, after the time out, it panics, which I think is OK because it probably can't be recovered automatically later. To improve user experience, I suggest that we at least add a notice log if the error is related to os.PathError (permission denied).
  2. During the timeout period, because the sleeping function does not actively check for interrupts, the console doesn't respond to Ctrl+C, which can be annoying for users because all they can do is to exit the current terminal and start a new one. I fixed this issue here, although it might not be needed any more since we already had a writable check before State.Unlock.
  3. Now we check the pebble dir before using it, and we slightly improved the user experience. The ultimate question for me is: When creating the backend and before using it, should the backend itself check the state dir is writable?
    • From the standpoint of defensive programming, it is beneficial to add this check, for example, in State.New and in State.Unlock maybe? But in our case, since State.Unlock and backend.Checkpoint are probably called quite often, if we add a check every time we run it, it's too much.
    • From the standpoint of the backend, logically speaking, the backend only handles read/write, and it's the user's responsibility (in this case, the user is overlord, I think, which creates the backend) to make sure the destination exists. For example, in Terraform, if we specify an AWS S3 bucket as the backend, the backend assumes it exists and is readable/writable. The responsibility of making sure the destination exists lies in the users (modules that call the backend).

So, I did not add any checks on the last point.

Please review the PR @benhoyt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug An undesired feature ;-)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants