
Write HDF and push to cloud? #240

Closed
AllenDowney opened this issue Jul 18, 2022 · 1 comment

Comments

@AllenDowney

I've tried a few ways to write an HDF file. This works:

    with pd.HDFStore(filename) as store:
        df.to_hdf(store, setname)

But because cloudpathlib didn't open the file, it doesn't get written to the cloud, so when I try to read it back, I get an OverwriteNewerLocalError.

The error message says that one option is to push the local version to the cloud, but I don't see where I can do this explicitly.

Thanks for any help!

@jayqi
Member

jayqi commented Jul 18, 2022

This situation is a case of the problem described in #128.

Basically, under the hood, pd.HDFStore is likely using open(filename, ...) in write mode. Unfortunately, while cloudpathlib supports open, it only works as expected in read mode, not write mode. In both modes, open operates on the local cache file. Reads are fine, because cloudpathlib syncs the file down from the cloud before opening. Writes do not work, because the changes are only written to the local cache file and cloudpathlib does not sync them back up. The OverwriteNewerLocalError you get later reflects this: your local cache file has diverged from the cloud version.

One workaround is to read and write to a manually set local path, and then use CloudPath.upload_from to push the file up to the cloud explicitly.
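A minimal sketch of that workaround, assuming an S3 bucket; the URI `s3://my-bucket/data.h5`, the key `"dataset"`, and the helper name are illustrative, not anything from this issue. The only cloudpathlib call used is `CloudPath.upload_from`, which the comment above refers to.

```python
# Sketch of the workaround: write HDF locally, then push up explicitly.
# Bucket URI, key, and function name are hypothetical examples.
from pathlib import Path

import pandas as pd
from cloudpathlib import CloudPath


def write_hdf_to_cloud(df: pd.DataFrame, cloud_uri: str, key: str) -> None:
    """Write df to a manually chosen local HDF file, then upload it."""
    cloud_path = CloudPath(cloud_uri)
    local_path = Path("/tmp") / cloud_path.name  # manually set local path
    with pd.HDFStore(local_path) as store:
        df.to_hdf(store, key=key)  # ordinary pandas write, local only
    cloud_path.upload_from(local_path)  # explicit sync up to the cloud


# usage (requires cloud credentials):
# write_hdf_to_cloud(df, "s3://my-bucket/data.h5", "dataset")
```

Because the upload is explicit, the cloud copy and the local file stay in sync, so a later read through cloudpathlib does not hit OverwriteNewerLocalError.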
