Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/863 - Improve error message in metaflow.S3 class when DATATOOLS_S3ROOT is not configured. #1491

Merged
merged 2 commits into from
Aug 28, 2023

Conversation

tfurmston
Copy link
Contributor

I recently encountered the issue raised in Issue 863.

I note that there is an existing PR on this issue. However, it is very old and I assume now dead. As a result, I have made a new PR that addresses the comments in the original PR.

@nflx-mf-bot
Copy link
Collaborator

Testing[469] @ b89cae6

@saikonen
Copy link
Collaborator

could you give some more context on what the setup looks like where you ran into the issue? I'm trying to see how widely this issue might've been affecting users.

So far the two cases I've tested with seem to be the only plausible ones. The first one is from the original issue where a user has no config at all. The second is a case where the config is specifically set to not have "METAFLOW_DEFAULT_DATASTORE": "s3" and the flow tries S3 access.
In other cases as soon as the datastore is set to S3, the validation in cli.py ensures that the config is correct.

@tfurmston
Copy link
Contributor Author

tfurmston commented Aug 27, 2023

could you give some more context on what the setup looks like where you ran into the issue? I'm trying to see how widely this issue might've been affecting users.

So far the two cases I've tested with seem to be the only plausible ones. The first one is from the original issue where a user has no config at all. The second is a case where the config is specifically set to not have "METAFLOW_DEFAULT_DATASTORE": "s3" and the flow tries S3 access. In other cases as soon as the datastore is set to S3, the validation in cli.py ensures that the config is correct.

When one runs a flow locally assignments of the form, self.x = 1, save the data artefacts to local files and not S3. So I was, perhaps naively given the name of the class, testing whether the S3 client has analogous behaviour. Such behaviour would be useful for quick local code development as it doesn't require the user to sign into AWS. (Within our infrastructure there is a bit of overhead to signing into AWS in order to use things such as S3.) Under your description my use case would fall under case 1, i.e., no config at all.

It turns out this functionality is not possible. However, instead of a clear error message I got the one mentioned in the issue, which was quite confusing.

I'm not sure of any other cases in which this error would arise, to be honest.

@savingoyal savingoyal merged commit 7addbe8 into Netflix:master Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants