Uploading images to TACC with SFTP #3709

Closed
Jegelewicz opened this issue Jul 2, 2021 · 3 comments
Labels
Function-Media · Priority-High (Needed for work): High because this is causing a delay in important collection work.

Comments

@Jegelewicz
Member

Jegelewicz commented Jul 2, 2021

SABI Goal
Enable better media handling to improve bulkloading and accessing data.

Had a call today with @gracz-UNL and we uncovered a couple of issues.

  1. We need a folder in the arctos project for hwml (there should be one for every Arctos institution - who has the ability to set them up?)
  2. We need better documentation of how image uploads work. If I upload a single image, it ends up in "my" personal folder, which is not helpful for managing images by collection or institution. Then files get moved around somehow and end up in odd places, like this UTEP photo that is in the UAF folder: https://web.corral.tacc.utexas.edu/UAF/arctos/mediaUploads/20170607/4_827.jpg While I may not care where an image resides on TACC, there are definitely collections and people who do. How do files get moved around, and who does it? FWIW, it seems like any UTEP media I uploaded ended up in UAF; I am not sure why.
  3. @gracz-UNL is experiencing issues uploading to TACC - I will let him post the error he gets in this thread.
@gracz-UNL

Thanks, Teresa, for starting this issue.

One addition:
I realized there can be issues with file naming conventions during the SFTP process. When media files are uploaded using the Arctos web form (single file upload) or the zip file method, they are automatically renamed, and non-compliant characters (e.g. "." and "-") are automatically replaced. With the SFTP method, file names are not checked. It is possible that file names are checked and corrected when Arctos "ingests" or "processes" a folder. When the Media Metadata Bulkload Template is created, files with non-compliant names should be tagged, or the user should be warned that certain file names have been changed.

@dustymc
Contributor

dustymc commented Jul 6, 2021

If this can wait until #3641 is completely resolved (no idea when that will be), it probably should. If not, please coordinate with both Chris and me: that server is going to go away at some point, perhaps with short notice.

If filepath/URL is critical, the shared storage (or certain tools that write to it) might not be the best choice. 3641 may provide a mechanism to change that, but I doubt we'll be able to do much with that opportunity without significant additional resources. Something like #2863 would probably produce arbitrary URLs as well.

I can't interrupt an SCP/SFTP stream, and 3641 will remove any post-upload "opportunity" (or giant security hole, depending on your perspective...) to manipulate file names.

I changed https://handbook.arctosdb.org/documentation/media.html#media-uri from "Try to avoid..." to "Files containing characters other than A-Z, a-z, 0-9, and _ are not eligible for scripting. Please sanitize any file names before uploading."
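
For anyone preparing files locally before an SFTP upload, the rule quoted above could be applied with something like the following. This is only a minimal sketch, not anything Arctos runs: the function names `sanitize_filename` and `plan_renames` are hypothetical, replacing disallowed characters with `_` is an assumption, and so is keeping the final file extension (the handbook text does not say how the `.` before the extension is treated, although existing URLs such as `4_827.jpg` suggest it is accepted).

```python
import re
from pathlib import PurePosixPath

# Anything other than A-Z, a-z, 0-9, and _ is treated as disallowed.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_]")


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Return a cleaned copy of a bare file name (no directory part).

    Assumption: the rule applies to the name without its final extension,
    and the extension itself is kept (minus any disallowed characters).
    """
    path = PurePosixPath(name)
    clean_stem = _DISALLOWED.sub(replacement, path.stem)
    clean_ext = _DISALLOWED.sub("", path.suffix[1:])
    return f"{clean_stem}.{clean_ext}" if clean_ext else clean_stem


def plan_renames(names):
    """Map each original name to its cleaned form, only where they differ."""
    renames = {}
    for name in names:
        cleaned = sanitize_filename(name)
        if cleaned != name:
            renames[name] = cleaned
    return renames


if __name__ == "__main__":
    # Example names are made up for illustration.
    for old, new in plan_renames(["4_827.jpg", "UTEP-123.v2.jpg"]).items():
        print(f"{old} -> {new}")
```

Reviewing the mapping before renaming anything would also show whether two different files collapse to the same cleaned name, which this sketch does not guard against.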

Jegelewicz added the Function-Media, Priority-High (Needed for work), and TACC labels Jul 9, 2021
dustymc added this to the Needs Discussion milestone Jul 12, 2021
@dustymc
Contributor

dustymc commented Aug 30, 2021

dustymc closed this as completed Aug 30, 2021