Requester pays with --mount #275

Closed
ccario83 opened this issue Oct 17, 2023 · 5 comments

Comments

@ccario83

I'm trying to use dsub's --mount parameter to mount a read-only bucket hosted on GCP. I don't own this bucket, and the owner has requester pays enabled. I've provided --user-project, which I believe tells dsub to bill that project for all relevant Google API calls. Is this information not being passed to gcsfuse?

My command (some parameters omitted for clarity):

dsub \
        --provider google-cls-v2 \
        --user-project "${PROJECT}" \
        --project "${PROJECT}" \
  ...
        --mount FILES=gs://fc-aou-datasets-controlled

Output seems to indicate that the billable project (-u) isn't provided to gcsfuse.

Opening GCS connection...
Using mount point: /mnt/data/mount/gs/fc-aou-datasets-controlled

WARNING: gcsfuse invoked as root. This will cause all files to be owned by
root. If this is not what you intended, invoke gcsfuse as the user that will
be interacting with the file system.

Opening bucket...
Mounting file system...
WARNING, bucket doesn't appear to work:  googleapi: Error 400: Bucket is a requester pays bucket but no user project provided., required

Thanks for your attention

@mbookman
Contributor

Thanks for reporting this @ccario83 !

Yes, it looks like dsub needs to be updated to pass the gcsfuse --billing-project flag when the dsub --user-project flag has been provided. The change would be here:

      actions_to_add.extend([
          google_v2_pipelines.build_action(
              name='mount-{}'.format(bucket),
              enable_fuse=True,
              run_in_background=True,
              image_uri=_GCSFUSE_IMAGE,
              mounts=[mnt_datadisk],
              commands=[
                  '--implicit-dirs', '--foreground', '-o ro', bucket,
                  os.path.join(_DATA_MOUNT_POINT, mount_path)
              ]),

and would look something like:

      mount_command = ['--billing-project', user_project] if user_project else []
      mount_command.extend([
          '--implicit-dirs', '--foreground', '-o ro', bucket,
          os.path.join(_DATA_MOUNT_POINT, mount_path)
      ])

      actions_to_add.extend([
          google_v2_pipelines.build_action(
              name='mount-{}'.format(bucket),
              enable_fuse=True,
              run_in_background=True,
              image_uri=_GCSFUSE_IMAGE,
              mounts=[mnt_datadisk],
              commands=mount_command),

We'll look to get that into the next release.
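
For anyone who wants to check the resulting argument list before patching dsub, here is a minimal standalone sketch of the same logic. The function name `build_gcsfuse_command` and the constant `DATA_MOUNT_POINT` are hypothetical stand-ins (the value is inferred from the mount point in the log output above), so treat this as an illustration rather than the actual dsub code.

```python
import os

# Assumed stand-in for dsub's internal _DATA_MOUNT_POINT; the value is
# inferred from the "Using mount point" line in the log, not from dsub itself.
DATA_MOUNT_POINT = '/mnt/data/mount'


def build_gcsfuse_command(bucket, mount_path, user_project=None):
  """Returns the gcsfuse argument list for a read-only bucket mount.

  The billing flag is only prepended when a user project is supplied, so
  mounts of ordinary (non requester-pays) buckets are unchanged.
  """
  command = ['--billing-project', user_project] if user_project else []
  command.extend([
      '--implicit-dirs', '--foreground', '-o ro', bucket,
      os.path.join(DATA_MOUNT_POINT, mount_path)
  ])
  return command
```

Calling `build_gcsfuse_command('fc-aou-datasets-controlled', 'gs/fc-aou-datasets-controlled', 'my-project')` (with `my-project` as a placeholder) should put `--billing-project my-project` at the front of the list, ahead of the existing flags.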

@ccario83
Author

Awesome, thank you for the support!

@KarlKeat

Just to add to this, I manually inserted a "--billing-project" flag, with my project ID hardcoded, into that section of the dsub code and was able to successfully mount the same bucket (fc-aou-datasets-controlled) on a dsub VM. However, trying to list the contents of the directory leads to an Input/output error. I'm assuming it's permissions related, but I'm not really sure how to proceed.

2023-11-13 20:18:20 INFO: mkdir -m 777 -p /mnt/data/mount/gs/fc-aou-datasets-controlled
Opening GCS connection...
Using mount point: /mnt/data/mount/gs/fc-aou-datasets-controlled
Opening bucket...

WARNING: gcsfuse invoked as root. This will cause all files to be owned by
root. If this is not what you intended, invoke gcsfuse as the user that will
be interacting with the file system.

Mounting file system...
File system has been successfully mounted.
ls: reading directory /mnt/data/mount/gs/fc-aou-datasets-controlled: Input/output error

@mbookman
Contributor

Thanks for the report @KarlKeat.

I think you are correct that this is permissions related, though I'm not sure what the cause might be either. I've tested the proposed change myself on a requester pays bucket (all outside of the AoU environment), and access seems to work fine.

One thing I'd recommend is to test the following in your --script or --command:

  • gsutil ls gs://fc-aou-datasets-controlled, and
  • gsutil cp gs://fc-aou-datasets-controlled/<some-file> .

I'd expect gsutil to be better at surfacing the underlying permissions issues than gcsfuse.

But do you have gsutil in your --image?

Lastly, as suggested in this issue, I'd recommend adding the gcsfuse --debug_fuse flag in the hope of surfacing more detailed errors.
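
If it helps, the gsutil checks above could be wrapped in a small Python helper like the sketch below. It assumes the requester-pays project is passed through gsutil's top-level `-u` option; the function names `build_gsutil_ls` and `run_check` are made up for illustration, and the helper returns `None` when gsutil is not on the PATH of the image.

```python
import shutil
import subprocess


def build_gsutil_ls(bucket_url, user_project=None):
  """Builds a `gsutil ls` command, billing user_project when one is given."""
  command = ['gsutil']
  if user_project:
    # -u is gsutil's top-level option naming the project to bill.
    command += ['-u', user_project]
  return command + ['ls', bucket_url]


def run_check(bucket_url, user_project=None):
  """Runs the listing and returns (exit code, stderr), or None without gsutil."""
  if shutil.which('gsutil') is None:
    return None  # gsutil is not installed in this image
  result = subprocess.run(
      build_gsutil_ls(bucket_url, user_project),
      capture_output=True, text=True, check=False)
  return result.returncode, result.stderr
```

For example, `run_check('gs://fc-aou-datasets-controlled', 'my-project')` (with `my-project` as a placeholder) returns the exit code together with whatever gsutil wrote to stderr, which is where a permissions error would be expected to show up.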

@mbookman
Contributor

Hi @ccario83 and @KarlKeat !

We have released 0.4.10, which includes support for passing the user project to mounted buckets.

When you get the chance, please confirm whether it resolves your issues.

Thanks!


4 participants