rsync over ssh: supported? deprecated? #8839
Rsync is not known to be broken if you are using an S3 datastore with the label 's3' (which is probably not noted in the guides). The TODOs in the code (at least the ones I know of) are about relaxing that restriction. That said, it may well be broken: I'm not aware of anyone using it in v5.11, so if you find a specific issue, feel free to report it. FWIW, as alternatives: the direct-upload option for S3 stores allows parallel upload of files into the 100 GB range and requires no special configuration beyond the S3 store itself. In the near term, we're also planning to support Globus transfer to an S3 store, although this requires additional setup and will not (at least initially) support restrict or embargo for files in such a store.
@golsch I appreciate your flexibility in possible solutions. 😄 The easiest would be to mark the rsync feature as outdated, as you said. However, we still have a lot of code and docs that assume the rsync feature works (and I agree with @qqmyers that rsync isn't definitively known to be broken). The server setup is flagged as experimental, but the User Guide doesn't do a good job of setting expectations about whether it works. I'm glad you think the rsync feature is a useful approach. Please do let us know about the specific problems you're having; that may help us assess how broken the feature is. 😞 Related:
Hi, for me there is no direct relation between rsync and S3. I had already seen that the rsync-over-SSH output on the dataset page requires a store labeled 's3'. That is not a satisfying solution for me, because I don't use S3 and have no idea what side effects it might have.
@golsch Yes, I agree, there is no relationship between rsync and S3. From the Dataverse perspective, when we worked on #4946 we updated the Data Capture Module (the experimental component we've been talking about) to support S3 ( https://github.com/sbgrid/data-capture-module/blob/0.6/doc/aws-s3.md ). But you're not an S3 person, so I wouldn't worry about any of that! How can we help? You like the idea of rsync support, and we've declared it experimental. Do you want us to help you try to get it working? Can we interest you in non-rsync solutions? @qqmyers mentioned an upcoming Globus integration (#5994), but I'm not sure whether it's of interest to you. Should we update the User Guide to better set expectations about rsync support? We're open to your ideas!
Closing in favor of this issue... which I just re-titled to be more specific about updating the guides to remove rsync. It's marked "hacktoberfest" and has screenshots showing where to remove references to rsync.
Overview of the Feature Request
Rsync over SSH is advertised in the docs as a feature of Dataverse. I struggled through the docs to configure it. In the end, I found that the feature is either no longer supported or still has open TODOs in the code.
Is there a workaround to make it work? In general, it should either be supported again, since it is a useful approach for big data, or at least be marked as outdated.