Documentation on using Globus for accessing NESE storage in production #268

Merged: 51 commits, Jun 17, 2024

@landreev (Collaborator) commented Apr 25, 2024

What this PR does / why we need it:

This very new and still experimental functionality is already in place in production. The setup has already been used to transfer a few TB of research data to tape storage at NESE, which had to happen fast due to a contractual obligation. As of now there is no end-user documentation, either for data depositors (who we expect to handle the uploads themselves going forward) or for the users who will download the data once it is published. The process is not completely intuitive and relies on extra components, some of which are still in active development, so this documentation is badly needed.

Which issue(s) this PR closes:

Closes #260

Special notes for your reviewer:

This is a documentation-only PR and should be treated as such for the purposes of review.

Suggestions on how to test this:

Normally, our QA for documentation-only PRs is essentially the same as review. For this PR, a bit more needs to be done. Specifically, I would like somebody to follow the instructions and, in this order, first download some of the data already deposited to NESE in prod., then follow the upload instructions and deposit some data (see below for details). If you have never used Globus, that's actually a plus, since we should assume that this may be the case for some of the users for whom these instructions are intended.

  1. Downloads: https://github.com/IQSS/dataverse.harvard.edu/blob/260-globus-nese-documentation/doc/globus/download.md
    Please try to follow the instructions to download one of the (smaller) files in this prod. dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KXJCIU. The dataset is still unpublished. If you don't have an admin account in prod., please let me know (everybody on the team needs to have one, of course!).
  2. Uploads: https://github.com/IQSS/dataverse.harvard.edu/blob/260-globus-nese-documentation/doc/globus/upload.md
    Same thing: please follow the instructions, as a data depositor, and send some data to NESE. There's no need to transfer TBs of data, of course; a few GB-sized files should be enough. You will need to create a collection and configure it to use the Globus-enabled store NESEtape, then create a dataset in it (and, possibly, give yourself access to it as a non-superuser?). I'm not 100% sure yet whether using the prod. tape storage for test uploads is a great idea. I think it should be OK, as long as we delete the files while they are still unpublished (that should delete the physical files from NESE storage). Alternatively, I can configure demo to replicate this setup, but with another tape endpoint at NESE that's specifically set up for testing. Let's do one thing at a time: please let me know once you're done testing the downloads part, and then we'll finalize this plan.

Please keep in mind that the QA on our end is specifically for this documentation. The Borealis dataverse-globus app that the process relies on is a third-party component that is still a work in progress and not necessarily super user-friendly. If any serious bugs are found in it, we will pass them to Borealis to fix. But generally, any complaints along the lines of "the app ain't pretty/user-friendly, etc." should only be made in the context of "the app ain't user-friendly in such and such way... therefore, we need to specifically document such and such for the users so that they know how to work around it".

@landreev added the labels "Size: 33" (a percentage of a sprint) and "GREI 5" (use cases) Apr 25, 2024
@qqmyers self-requested a review April 29, 2024 15:18

@qqmyers (Member) left a comment

Looks good to me

@landreev self-assigned this Jun 14, 2024
landreev and others added 24 commits June 17, 2024 13:04
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>
@landreev (Collaborator, Author) commented:

OK, I think I'm done with it.
I committed most of @pdurbin's suggestions verbatim and slightly rewrote the rest, taking into account the comments from everybody.
Thanks for the detailed feedback.
The dataverse-globus app section header is looking like this now:

## 4. dataverse-globus App by Borealis/GDCC.

Hope this is OK with everybody (?). @qqmyers, feel free to add more credit to GDCC if needed!

@landreev removed their assignment Jun 17, 2024
@stevenwinship merged commit 404e750 into master Jun 17, 2024
@stevenwinship removed their assignment Jun 17, 2024
@pdurbin deleted the 260-globus-nese-documentation branch July 1, 2024 20:28
Linked issue: Prod. globus infrastructure, next phase: Create documentation for data uploaders