[BUG] - conda environments fail to build #1409

Closed
iameskild opened this issue Aug 19, 2022 · 4 comments · Fixed by #1419


@iameskild
Member

OS system and architecture in which you are running QHub

Ubuntu on GCP

Expected behavior

Creating a conda environment in the filesystem namespace (from the qhub-config.yaml) or my personal namespace should build my environment (provided that it is a valid env).

Actual behavior

When submitting a conda environment (in the filesystem namespace or in my personal namespace), it fails to build with the following error message:

(example of build failing in the filesystem ns)

Looking for: ['python==3.9.13', 'ipykernel==6.15.1', 'ipywidgets==7.7.1', 'qhub-dask==0.4.3', 'param==1.12.2', 'python-graphviz==0.20.1', 'matplotlib==3.3.2', 'panel==0.13.1', 'voila==0.3.6', 'streamlit==1.10.0', 'dash==2.6.1', 'cdsdashboards-singleuser==0.6.2']


Preparing transaction: ...working... failed

CondaError: Unable to create prefix directory '/home/conda/filesystem/7f7f767440c1987bc8eeacb1741b638c71c44f30ffb25d9e0503b6f2f4d9fe11-20220819-012441-874213-109-cds'.
Check that you have sufficient permissions.

How to Reproduce the problem?

Build any valid conda env from the conda-store endpoint or by adding it to the qhub-config.yaml, and it will fail to build.

Command output

No response

Versions and dependencies used.

qhub version: v0.4.4rc3
conda-store version: v0.4.9 or v0.4.11

Compute environment

No response

Integrations

No response

Anything else?

No response

@iameskild iameskild added the type: bug 🐛 Something isn't working label Aug 19, 2022
@iameskild
Member Author

@costrouc
Member

@iameskild this has to do with a change that I made in the container default uid/gid. I'll provide a fix tomorrow morning

@iameskild
Member Author

@costrouc @viniciusdc Moving our Slack conversation here for posterity.

CO: The issue is that conda-store (roughly 0.4.5+) now runs as user 1000 instead of 0 (root), so it no longer has permissions in that folder. Not sure what the best route is. Long term, conda-store should not be running as root. I might chmod + chown that directory for conda-store.

VC: I would say that long term each namespace/environment should use a permission uuid based on the Keycloak permission system (though that might be a lot harder). For now, some kind of auto-migration system in conda-store itself, to move any environments and update their permissions, would work, right?

VC: > chmod + chown that directory for conda-store
Could we have a conda-store group? Is that feasible? Then we wouldn't need to worry about user permissions.
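To make CO's point concrete, here is a minimal Python sketch of the simplified POSIX rule that decides whether a process may create a new entry (such as a conda prefix directory) inside a directory. The helper name `can_create_in` is hypothetical, the model ignores ACLs and capabilities other than root, and the root-owned `0o755` mode used in the example is an assumption about how the filesystem namespace directory looked, not something read from the cluster.

```python
from __future__ import annotations

import stat


def can_create_in(dir_uid: int, dir_gid: int, dir_mode: int,
                  proc_uid: int, proc_gids: set[int]) -> bool:
    """Return True if a process may create a new entry in a directory.

    Simplified POSIX model: uid 0 bypasses mode bits; otherwise exactly one
    permission class applies (owner, then group, then other), and creating
    an entry needs both write and search (execute) permission.
    """
    if proc_uid == 0:
        return True
    if proc_uid == dir_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif dir_gid in proc_gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (dir_mode & need) == need
```

With a root-owned `0o755` directory, user 1000 falls into the "other" class, which has no write bit, so `can_create_in(0, 0, 0o755, 1000, {1000})` is `False`, while the same call with `proc_uid=0` is `True`. That matches builds working while conda-store ran as root and failing once it switched to user 1000.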

I think it makes sense to restrict conda-store's permissions.

As for ensuring this isn't a breaking change, could we add an initContainer like the following to the conda-store worker deployment?

      initContainers:
      - command:
        - /bin/chown
        - -R
        - "1000:1000"
        - /home/conda
        image: busybox:latest
        name: chmod-er
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /home/conda
          mountPropagation: None
          name: storage
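For anyone who would rather script this one-off migration than run a busybox initContainer, the `chown -R 1000:1000 /home/conda` step can be approximated in Python. `chown_recursive` is a hypothetical helper, not part of conda-store or QHub, and it has to run as root (or with CAP_CHOWN) to hand files over to user 1000.

```python
import os


def chown_recursive(root: str, uid: int, gid: int) -> int:
    """Approximate `chown -R uid:gid root` and return the number of paths changed.

    os.walk does not follow symlinks by default, and os.lchown re-owns a
    symlink itself rather than its target.
    """
    os.lchown(root, uid, gid)
    count = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lchown(os.path.join(dirpath, name), uid, gid)
            count += 1
    return count
```

Run as root, `chown_recursive("/home/conda", 1000, 1000)` would mirror the initContainer's command.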

I tested this today on quansight-beta.qhub.dev, and it does appear to correctly change the ownership of the existing files/folders under /home/conda:

drwxr-xr-x 13 1000 1000  4096 Aug 22 23:03 eeriksen@quansight.com
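To confirm that nothing under /home/conda was missed, a small audit like the one below could list every path not owned by the expected uid:gid. `find_misowned` is a hypothetical helper; 1000:1000 is the target used by the initContainer above.

```python
from __future__ import annotations

import os


def find_misowned(root: str, uid: int, gid: int) -> list[str]:
    """List root and every path beneath it whose owner is not uid:gid."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    # lstat so symlinks are checked themselves, not their targets.
    return [p for p in paths if (os.lstat(p).st_uid, os.lstat(p).st_gid) != (uid, gid)]
```

An empty result from `find_misowned("/home/conda", 1000, 1000)` would indicate the migration covered everything.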

However, I run into another permissions issue whenever I try to create a new env. The group of the newly created directory still appears to be root:

drwxrwxr-x 13 1000 root 4096 Aug 22 23:04 6199e7747550f21efc268c887c71da3fc46117fe8f3b82876b2cfdfb14db7020-20220822-230259-522581-122-eae_test_5

Then when conda-store tries to change ownership, the following issue arises:

Logs from the conda-store-worker:

chown: changing ownership of '/home/conda/eeriksen@quansight.com/6199e7747550f21efc268c887c71da3fc46117fe8f3b82876b2cfdfb14db7020-20220822-230259-522581-122-eae_test_5': Operation not permitted
2022-08-22 23:04:15,296: WARNING/ForkPoolWorker-2] [CondaStoreWorker] ERROR | Command '['chown', '-R', '1000:1000', '/home/conda/eeriksen@quansight.com/6199e7747550f21efc268c887c71da3fc46117fe8f3b82876b2cfdfb14db7020-20220822-230259-522581-122-eae_test_5']' returned non-zero exit status 1.
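One possible reading of this error, sketched under assumptions: on Linux, an unprivileged process may only chown a file it owns, may not change the owner, and may only change the group to one it is a member of. The new directory is owned by 1000 but has group root, which suggests the worker's primary group is 0; if so, it is not a member of group 1000 and `chown 1000:1000` fails with EPERM. `chown_permitted` and the group sets in the example are hypothetical and model only these rules.

```python
from __future__ import annotations


def chown_permitted(proc_uid: int, proc_gids: set[int],
                    file_uid: int, file_gid: int,
                    new_uid: int, new_gid: int,
                    privileged: bool = False) -> bool:
    """Simplified model of the Linux chown(2) permission check."""
    if privileged:  # e.g. root / CAP_CHOWN
        return True
    if proc_uid != file_uid:  # only the owner may chown without privilege
        return False
    if new_uid != file_uid:  # giving the file to another user needs privilege
        return False
    # The owner may keep the current group or pick any group it belongs to.
    return new_gid == file_gid or new_gid in proc_gids
```

Under this model, `chown_permitted(1000, {0}, 1000, 0, 1000, 1000)` is `False`, and it becomes `True` once 1000 is in the process's groups.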

I was able to get around this by adding fsGroup: 1000 to the pod's securityContext:

securityContext:
  fsGroup: 1000
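According to the Kubernetes documentation, fsGroup helps in two ways: its gid is added as a supplementary group to the pod's container processes, and for volume types that support ownership management the kubelet makes fsGroup the owning group, sets the setgid bit so new files inherit that group, and ORs the permission bits with rw-rw----. The function below is a rough model of that per-entry change, not the kubelet's actual code; `apply_fsgroup` is hypothetical, and it applies setgid only to directories, which is where setgid controls group inheritance.

```python
from __future__ import annotations

import stat

RW_GROUP_MASK = 0o660  # "rw-rw----" from the fsGroup description


def apply_fsgroup(mode: int, is_dir: bool, fs_group: int) -> tuple[int, int]:
    """Return (new_gid, new_mode) for one entry in an fsGroup-managed volume.

    `mode` holds permission bits only (e.g. 0o755); other kubelet details
    are deliberately left out of this sketch.
    """
    new_mode = mode | RW_GROUP_MASK
    if is_dir:
        new_mode |= stat.S_ISGID  # new entries inherit the directory's group
    return fs_group, new_mode
```

For example, `apply_fsgroup(0o755, True, 1000)` gives `(1000, 0o2775)`, so directories created later under it would get group 1000 rather than root.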

@iameskild
Member Author

The above solution works when updating existing deployments but fails for new users signing in and for fresh deployments. Although the deployment scripts complete successfully, new conda envs can't be created due to permissions issues. This is due to how the initContainers (added by KubeSpawner) set the permissions for the mounted volumes (specifically the conda-store-mount), see here.

Changing this permission to anything other than root will then break existing deployments. A solution might be to add another initContainer that correctly sets the permissions for all files/folders in /home/conda before the other initContainers run. The last hurdle for this solution is making sure that this new initContainer is the first one executed.
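Kubernetes runs init containers sequentially, in the order they appear in `initContainers`, and each must complete successfully before the next starts. The ordering hurdle therefore reduces to placing the permission-fixing container at index 0. Below is a minimal sketch on a plain-dict pod spec; where QHub/KubeSpawner actually assembles the spec is not shown, and the helper name, the container name, and the volume name `conda-store` are hypothetical.

```python
from __future__ import annotations

import copy

# Hypothetical container mirroring the chown initContainer proposed earlier;
# the volume name "conda-store" is an assumption.
FIX_PERMISSIONS = {
    "name": "fix-conda-permissions",
    "image": "busybox:latest",
    "command": ["/bin/chown", "-R", "1000:1000", "/home/conda"],
    "securityContext": {"privileged": True},
    "volumeMounts": [{"name": "conda-store", "mountPath": "/home/conda"}],
}


def prepend_init_container(pod_spec: dict, container: dict) -> dict:
    """Return a copy of pod_spec with `container` first in initContainers.

    Any existing entry with the same name is dropped, so calling this
    twice does not duplicate the container.
    """
    spec = copy.deepcopy(pod_spec)
    others = [c for c in spec.get("initContainers", [])
              if c.get("name") != container["name"]]
    spec["initContainers"] = [container] + others
    return spec
```

Because the function returns a new spec and removes a same-named entry first, it stays safe to apply on every spawn.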
