-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better error message for s3fs-mapped files that are in glacier (C4-747) #23
base: master
Are you sure you want to change the base?
Conversation
* Better error diagnostics for S3FS-mounted files that are glaciated | ||
if the ``CGAP_S3FS_UPLOAD_BUCKETS`` and ``CGAP_S3FS_UPLOAD_DIR`` environment variables are set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Script should be updated to set these values.
metadata = s3.head_object(Bucket=mapped_bucket, Key=mapped_key) | ||
storage_class = metadata['StorageClass'] | ||
if storage_class not in AVAILABLE_S3_STORAGE_CLASSES: | ||
show(f"The file {filename} is mapped via S3FS to {storage_class} storage.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might consider expanding this to include more information. I'm not sure we should assume this statement alone is enough to convey the problem (and the solution). I would just say something like:
show(f"The file {filename} is mapped via S3FS to {storage_class} storage."
" Use awscli or equivalent to restore this file to a storage class that it can be retrieved from: {AVAILABLE_S3_STORAGE_CLASSES}")
Before leaving, Sarah left a review comment on PR #25, which I'm going to close (because the part of it that was not this PR is already merged to master):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're moving forward with this PR, I think we should move s3fs-related documentation here instead of the portal and update the documentation in the portal to point here instead. There may be a couple of changes required to make the process functional for Windows machines and encrypted accounts.
mapped_dir = upload_dir.rstrip('/') | ||
pattern = f"^(?:{re.escape(mapped_dir)}|{re.escape(os.path.expanduser(mapped_dir))})/(.*)$" | ||
m = re.match(pattern, filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work for Windows-style paths? It'd be good to add some to the tests here.
candidates = bash_enumeration(upload_buckets) | ||
for mapped_bucket in candidates: | ||
try: | ||
s3.head_object(Bucket=mapped_bucket, Key=mapped_key) # an error means we're failing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this call require S3 encryption kwargs?
upload_buckets = os.environ.get("CGAP_S3FS_UPLOAD_BUCKETS") | ||
upload_dir = os.environ.get("CGAP_S3FS_UPLOAD_DIR") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should change these to be more generic since the process here is not specific to s3fs but rather to files mounted from S3. For example, one could use goofys to mount S3 files instead with a similar script, and the same process of checking the files' storage class would apply.
This addresses SubmitCGAP vague error from awscli (C4-747) by making a specific error message relating to
s3fs
. However, to do that, it requires that the environment variablesCGAP_S3FS_UPLOAD_DIR
andCGAP_S3FS_UPLOAD_BUCKETS
are set.@sbreiff, best would be if we could get the setup script for s3fs into SubmitCGAP as well, so that it can arrange to set those variables. From discussion on the ticket, with env vars adjusted:
The env vars this PR uses are slightly different, so that would have to be adjusted, too. The interaction looks like:
The interaction looks like: