Debugging: mount seems successful but no files seen from bucket #155
Comments
That sounds like an issue with implicit directories:
Oh, great! I'll read up and test this out again tomorrow - will post an update (and hopefully be able to close the issue). Thank you!
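For readers unfamiliar with the term, here is a rough sketch of what "implicit directories" means for a gcsfuse-style mount. It is a simplified model, not the driver's actual code: a directory is "explicit" when the bucket holds a placeholder object whose name ends in `/`, and "implicit" when it only exists as a prefix of other object names. The function name and the bucket layout are hypothetical, chosen only for illustration.

```python
def visible_directories(object_names, implicit_dirs=False):
    """Return the directory paths a gcsfuse-style mount would list.

    Simplified model (an assumption, not the driver's implementation):
    - an explicit directory is a placeholder object whose name ends in "/"
    - with implicit_dirs=True, every "/"-separated prefix of an object
      name is treated as a directory as well
    """
    dirs = set()
    for name in object_names:
        if name.endswith("/"):
            dirs.add(name.rstrip("/"))
        if implicit_dirs:
            parents = name.rstrip("/").split("/")[:-1]
            for depth in range(1, len(parents) + 1):
                dirs.add("/".join(parents[:depth]))
    return dirs


# Hypothetical bucket layout: files uploaded under a prefix,
# with no placeholder objects for the "directories".
objects = [
    "snakemake-workflow/Snakefile",
    "snakemake-workflow/data/genome.fa",
]

print(visible_directories(objects))                      # nothing to list at the root
print(visible_directories(objects, implicit_dirs=True))  # both prefixes show up
```

In this model a listing at the mount root comes back empty unless implicit directories are enabled or placeholder objects exist, which would match the "mount works but no files" symptom.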
Okay this is great - making progress! I changed the working directory to be exactly where the workflow is, and then when I do a listing I see the contents!
And then I got a permissions error (still progress!):
Traceback (most recent call last):
File "/opt/micromamba/envs/snakemake/bin/snakemake", line 10, in <module>
sys.exit(main())
File "/opt/micromamba/envs/snakemake/lib/python3.10/site-packages/snakemake/__init__.py", line 2945, in main
success = snakemake(
File "/opt/micromamba/envs/snakemake/lib/python3.10/site-packages/snakemake/__init__.py", line 563, in snakemake
logger.setup_logfile()
File "/opt/micromamba/envs/snakemake/lib/python3.10/site-packages/snakemake/logging.py", line 307, in setup_logfile
os.makedirs(os.path.join(".snakemake", "log"), exist_ok=True)
File "/opt/micromamba/envs/snakemake/lib/python3.10/os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
File "/opt/micromamba/envs/snakemake/lib/python3.10/os.py", line 225, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '.snakemake'
Snakemake is trying to write a directory
# Read/write access for all users
gcs.csi.ofek.dev/dir-mode: "0777"
gcs.csi.ofek.dev/file-mode: "0777"
It looks like root has a strange id (the default that the storage uses)
🔒️ Working directory permissions:
total 3
-rw-rw-r-- 1 root 63147 233 Feb 10 22:57 Dockerfile
-rw-rw-r-- 1 root 63147 347 Feb 10 22:57 README.md
-rw-rw-r-- 1 root 63147 1144 Feb 10 22:57 Snakefile
-rw-rw-r-- 1 root 63147 203 Feb 10 22:57 environment.yaml
Although when I tried to change that to 0 or the user id, the mount didn't work, period, so I won't mess with that for now. So I double checked the user that needs to run the workflow:
And then tried: ...
gcs.csi.ofek.dev/gid: "1000"
gcs.csi.ofek.dev/uid: "1000"
gcs.csi.ofek.dev/dir-mode: "0755"
gcs.csi.ofek.dev/file-mode: "0664"
And then based on this issue I decided to try adding the implicit-dirs flag:
gcs.csi.ofek.dev/gid: "1000"
gcs.csi.ofek.dev/uid: "1000"
gcs.csi.ofek.dev/dir-mode: "0755"
gcs.csi.ofek.dev/file-mode: "0664"
implicit-dirs: "true"
Neither of those worked - I don't think I'm allowed to change the gid/uid because then the pvc stops working?
Do you have a suggestion for what I should try? In a nutshell, the container starts as root, and we do that for setup of things. The working directory of the run is the mounted directory. When the workflow is run, it's done by a "flux" user (on behalf of root). So I assume what is happening is that flux doesn't have permission to write there, but I don't totally understand why, because if I set permissions to 0777 for file/directory I'd expect anyone could write there.
Also, heads up: the "mount options" link for fuse at https://ofek.dev/csi-gcs/dynamic_provisioning/#extra-flags is a 404. Update: opened a PR with a quick fix #156
And I really love being able to define these as annotations! At least for my operator, the user is in control of annotations (in the custom resource definition) and it's nice I don't have to edit / redeploy my operator every time to try something new.
Update: also tried derivations of:
No luck yet, going to bring the cluster down for today and looking forward to hearing your feedback!
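To reason about the "0777 should let anyone write" expectation above, here is a minimal sketch of the classic Unix mode-bit check. It only models the owner/group/other decision and ignores the root bypass; a FUSE mount can apply further restrictions of its own, so this is not the whole story. The uid 1000 is taken from the annotations tried above and is assumed to be the flux user; the listing does not show the mode of the working directory itself, so the directory modes below are illustrative.

```python
import stat


def can_write(mode, owner_uid, owner_gid, uid, gids):
    """Mode-bit write check: exactly one of owner/group/other applies.

    Simplified: ignores root's bypass, ACLs, and any FUSE-level checks.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# The files in the listing above: -rw-rw-r--, owner root (0), group 63147.
print(stat.filemode(0o100664))                      # -rw-rw-r--

# Assumed flux user: uid 1000, only in group 1000.
print(can_write(0o664, 0, 63147, 1000, {1000}))     # False: "other" has no write bit
print(can_write(0o777, 0, 63147, 1000, {1000}))     # True: "other" may write
print(can_write(0o755, 1000, 1000, 1000, {1000}))   # True: the owner may write
```

If the mode bits say a write should succeed and it still fails, the cause probably sits outside the mode bits (for example in how the volume is mounted), which is worth checking separately.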
I tried running the workflow as root, and it looks like the permissions issue is gone, but it doesn't see any of the data in the subdirectories (nor does it see the subdirectories). I tried doing an "ls" so it would show up, and I also set implicit-dirs to true; neither made a difference.
Did you try setting the
Interesting - I can try that for the latter case (running as root) but I'm afraid if I change it to the flux user, root will no longer be able to write files to the config map locations (root sets things up for the workflow).
Still no go - I've tried both derivations of having things owned by the flux user and root, and the closest I can get is to have root own / run everything,
but I'm not actually able to see the subdirectory, it's like it doesn't exist. So the workflow fails.
Where can I ask for more help on this?
Hiya! I have been trying this a few days, and reached a point I thought I'd ask for help. I basically have an operator that is setting up this driver to mount to an existing Google Storage bucket, and everything seems to be working, but when I list the content of the directory (that should be bound) I don't see anything in the storage. I'll try to walk through what I can see carefully so you can help (and maybe this will help me to debug a bit too!).
Bucket
I have files for a Snakemake workflow in a subdirectory at the root of a bucket - I'm assuming that mounting the root of this bucket would allow me to see the subdirectory too? E.g.,
and in that directory:
Although that's probably not important yet because I can't ls at the root to see the subdirectory. I am wondering if permissions have something to do with it - e.g., I see these options:
But I haven't done anything like making everything public, because I have given the service account associated with the secret the Storage Admin and Storage Object Admin roles. Okay - so that's the storage bucket!
Secret
I created the service account with the above permissions, and followed instructions to generate the secret, e.g., a derivative of
One thing that I wasn't sure about in the instructions is when it says:
I added this as one of the roles:
but I'm not sure what encryption key this is talking about (and maybe this is the bug?) I couldn't figure out what else I was supposed to do from the getting started guide.
PVC and PV
My PVC and PV look okay? Here are the configs - these are created in Go and I'm outputting the kubectl output in yaml, so some of the settings here are defaults.
What sticks out to me as maybe erroneous is that although I made the capacity 25Gi, the spec resource -> requests is for 1Ki?
I'm actually a bit confused about this resource request, because in my code I set this to the same value as the capacity above, which should be 25:
If that is somehow not being set - where do I set it? Is there an annotation I should be using, and regardless, could that be the bug that the resource request is too small?
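To put the two numbers side by side, here is a small sketch that converts Kubernetes-style binary quantities to bytes. It handles only integer values with the binary suffixes `Ki`, `Mi`, `Gi`, and `Ti`, which is a deliberately reduced subset of the real quantity grammar, and it makes no claim about whether the mismatch is the cause of the empty mount.

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity):
    """Convert an integer quantity such as "25Gi" to bytes.

    Reduced subset of Kubernetes quantities: integer value, optional
    binary suffix. Decimal suffixes and fractions are not handled.
    """
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


capacity = quantity_to_bytes("25Gi")  # PV capacity from the post
request = quantity_to_bytes("1Ki")    # PVC spec.resources.requests from the post

print(capacity)             # 26843545600
print(request)              # 1024
print(capacity // request)  # 26214400
```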
For my PV, it also looks OK:
Note that a MiniCluster is a CRD with an indexed job, a few config maps, etc. It's what creates the indexed job. Should that parent attribute be something else?
I can also shell into one of the worker containers (that doesn't exit and fail because it's reliant on the main broker in the indexed job) and I see the volume at /workflow but it's empty.
And here they are listed:
And what a pod (for the indexed job) sees:
Note that mount looks ok (/workflow should be read write from data)!
And I know that (from the volume standpoint) there are no errors, because the indexed job runs, and the main issue is that it can't find the data files.
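One way to double check the mount from inside a pod is to look at the entry for /workflow in the mount table. The sketch below parses text in the `/proc/mounts` format (device, mount point, filesystem type, options) and returns the last entry for a mount point, since later entries shadow earlier ones. The sample line is hypothetical and stands in for real output; the bucket name and options in it are assumptions.

```python
def find_mount(mounts_text, mount_point):
    """Return the last /proc/mounts-style entry for mount_point, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = {
                "device": fields[0],
                "fstype": fields[2],
                "options": fields[3].split(","),
            }
    return found


# Hypothetical sample; inside a pod the text would come from /proc/mounts.
sample = (
    "overlay / overlay rw,relatime 0 0\n"
    "my-bucket /workflow fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0\n"
)

entry = find_mount(sample, "/workflow")
print(entry["fstype"])           # fuse
print("rw" in entry["options"])  # True
```

A read-write FUSE entry only confirms that the mount exists; it says nothing about whether the objects in the bucket are visible as files, so an empty listing is still consistent with this output.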
So - I think there might be some issue with either permissions, missing metadata somewhere (perhaps for that weird size?) or something to do with an encryption key that I need an instruction for? Any help you might provide would be greatly appreciated! I've brought up my testing cluster a few times in the last couple of days, and I'm trying to find other examples online, but I've reached the point I'm not sure what to try next (and I hope you have some ideas).