Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[data_release/document_repository] Make upload directory configurable #5815

Merged
merged 11 commits into from
Dec 20, 2019

Conversation

ridz1208
Copy link
Collaborator

@ridz1208 ridz1208 commented Nov 29, 2019

Brief summary of changes

This gives the option to the users to configure where the default upload directory of data_release and document_repository modules will be. this also removes the burden on LORIS to ensure these locations are readable/writeable and follows the standard established by the other modules uploading data to loris.

testing

  • before checking out the branch, make sure you are able to upload a file to doc repo and data release. If it is not the case, you should resolve the issue as it is not due to this change.
  • check out this branch and ensure you are still able to upload and download files to the 2 modules (the sql patch for existing projects is designed to be backwards compatible)
  • change the upload paths from the configuration module and make sure the new paths exist on your machine and are apache writable
  • try uploading downloading other documents to both modules to ensure that the functionality is unaffected.

Important note for release

Projects should be encouraged to move their files out of the old directories and into new directories somewhere else on their data mount/drive.

Related

@ridz1208 ridz1208 added Release: Add to release notes PR whose changes should be highlighted in the release notes Category: Feature PR or issue that aims to introduce a new feature Cleanup PR or issue introducing/requiring at least one clean-up operation Language: SQL PR or issue that update SQL code labels Nov 29, 2019
@@ -94,5 +94,7 @@ INSERT INTO `Config` (`ID`, `ConfigID`, `Value`) VALUES (97,70,'/data-raisinbrea
INSERT INTO `Config` (`ID`, `ConfigID`, `Value`) VALUES (98,93,'V1');
INSERT INTO `Config` (`ID`, `ConfigID`, `Value`) VALUES (99,101,'');
INSERT INTO `Config` (`ID`, `ConfigID`, `Value`) VALUES (102,19,'false');
INSERT INTO `Config` (`ID`, `ConfigID`, `Value`) VALUES (103,102,'/var/www/loris/modules/document_repository/user_uploads/');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can change the default to be outside of the module now that the install script doesn't create it.

@johnsaigle johnsaigle added the Blocking PR should be prioritized because it is blocking the progress of another task label Dec 2, 2019
@PapillonMcGill
Copy link
Contributor

@ridz1208 Could we do the same thing with project/tables_sql/ and project/instruments/?

This way, in no situation, Apache would require writing inside the loris directory.

$DB = \Database::singleton();
$user = \User::singleton();
$DB = \Database::singleton();
$user = \User::singleton();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal but could you update these to use the Factory like the line below?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i could... and i wanted to... I just decided to avoid "unrelated changes". I'm gonna do it though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to leave code cleaner than you found it, especially when it's an open issue. I think you're safe here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sold man stop selling

@ridz1208
Copy link
Collaborator Author

ridz1208 commented Dec 2, 2019

@PapillonMcGill The 2 directories you mentioned are much more complicated. Projects track and commit the data in these directories generally with git. it would be very tricky directory to decide which files get read from where if a project is using both LINST and tracked php instruments. same goes for tables_sql, some generated patches are commited to the repo

@@ -152,8 +144,6 @@ elif [[ " ${redhat[*]} " =~ " $os_distro " ]]; then
else
echo "$os_distro Linux distribution detected. We currently do not support this. "
echo "Please manually change subdirectory ownership and permissions to ensure the web server can read *and write* in the following: "
echo "../modules/data_release/user_uploads "
echo "../modules/document_repository/user_uploads "
Copy link
Contributor

@johnsaigle johnsaigle Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 120 in this file should be removed

Changing group to 'www-data' or 'apache' to give permission to create directories in Document Repository module

driusan
driusan previously requested changes Dec 4, 2019
@@ -75,6 +75,8 @@ INSERT INTO ConfigSettings (Name, Description, Visible, AllowMultiple, DataType,
INSERT INTO ConfigSettings (Name, Description, Visible, AllowMultiple, DataType, Parent, Label, OrderNumber) SELECT 'publication_uploads', 'Path to uploaded publications', 1, 0, 'web_path', ID, 'Publications', 10 FROM ConfigSettings WHERE Name="paths";
INSERT INTO ConfigSettings (Name, Description, Visible, AllowMultiple, DataType, Parent, Label, OrderNumber) SELECT 'publication_deletions', 'Path to deleted publications', 1, 0, 'web_path', ID, 'Deleted Publications', 11 FROM ConfigSettings WHERE Name="paths";
INSERT INTO ConfigSettings (Name, Description, Visible, AllowMultiple, DataType, Parent, Label, OrderNumber) SELECT 'MINCToolsPath', 'Path to the MINC tools', 1, 0, 'web_path', ID, 'Path to the MINC tools', 12 FROM ConfigSettings WHERE Name="paths";
INSERT INTO ConfigSettings (Name, Description, Visible, AllowMultiple, DataType, Parent, Label, OrderNumber) SELECT 'documentRepositoryPath', 'Path to uploaded document repository files', 1, 0, 'text', ID, 'Document Repository Upload Path', 13 FROM ConfigSettings WHERE Name="paths";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these be of type web_path, not text, so that the config module verifies they exist?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely. That's a great catch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ouuh i forgot about that

@@ -190,6 +192,8 @@ INSERT INTO Config (ConfigID, Value) SELECT ID, "/data/uploads/" FROM ConfigSett
INSERT INTO Config (ConfigID, Value) SELECT ID, "/data/publication_uploads/" FROM ConfigSettings WHERE Name="publication_uploads";
INSERT INTO Config (ConfigID, Value) SELECT ID, "/data/publication_uploads/to_be_deleted/" FROM ConfigSettings WHERE Name="publication_deletions";
INSERT INTO Config (ConfigID, Value) SELECT ID, "%MINCToolsPath%" FROM ConfigSettings WHERE Name="MINCToolsPath";
INSERT INTO Config (ConfigID, Value) SELECT ID, "/data/document_repository_uploads/" FROM ConfigSettings WHERE Name="documentRepositoryPath";
INSERT INTO Config (ConfigID, Value) SELECT ID, "/data/data_release_uploads/" FROM ConfigSettings WHERE Name="dataReleasePath";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we leave the default empty and update the code to have reasonable defaults if the value is null?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an idea about what those constants would be? Where would be a reasonable place on a linux server for these files to go? Maybe somewhere under /srv/?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what constants you're talking about.. I'm suggesting leaving them null.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@driusan but you suggested to update the code to have reasonable defaults. what do you mean by that ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"defaults" was the wrong word.. I meant it should behave reasonably if the value is unset (ie. provide an error message pointing to the config setting if it's required or disable the feature if it's an optional thing similarly to how to the instrument_manager disables uploading if it doesn't have write access.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okai so if i understand correctly;

  • in RB set the paths to "ideal" values /data/...
  • in 0000 conf files do not add an entry into the Config table and let projects set a validpath
  • in the patch for existing projects, maintain status quo and discuss in the release the recommendation of moving the directories.

agreed ?

Copy link
Contributor

@johnsaigle johnsaigle Dec 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though we'll need to provide some detail about also moving the existing files from one destination to another, not just changing the path.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what the status quo is so I can't comment on the last point, but that sounds right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. if there's modules/document_repository/user_uploads/admin/meme.jpg then the admin/ folder will need to be moved to the new location or else it will be inaccessible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes for sure but that would not be an issue in the current state because existing projects will have their path point to the old path.

it should only be a recommendation on our end

@maltheism
Copy link
Member

@ridz1208 what about the genomic_browser module?

@ridz1208
Copy link
Collaborator Author

ridz1208 commented Dec 5, 2019

@maltheism what about it ?

@maltheism
Copy link
Member

maltheism commented Dec 5, 2019

@ridz1208 you can upload a file in that module and it complains about a directory if one doesn't exist. Seems similar to what this PR is trying to get rid of & solve.

@ridz1208
Copy link
Collaborator Author

ridz1208 commented Dec 5, 2019

@maltheism no, genomics seems to have an already configurable path

@johnsaigle johnsaigle added the Needs formatting PR requires formatting fixes to meet our coding standards (PHPCS, phan, etc.) label Dec 9, 2019
@ridz1208
Copy link
Collaborator Author

@johnsaigle @driusan please review

@ridz1208 ridz1208 removed the Needs formatting PR requires formatting fixes to meet our coding standards (PHPCS, phan, etc.) label Dec 13, 2019
Copy link
Contributor

@johnsaigle johnsaigle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small wording changes for the errors. I'll approve after these changes are made.

modules/data_release/ajax/FileUpload.php Outdated Show resolved Hide resolved
modules/data_release/ajax/FileUpload.php Outdated Show resolved Hide resolved
Co-Authored-By: John Saigle <4022790+johnsaigle@users.noreply.github.com>
Co-Authored-By: John Saigle <4022790+johnsaigle@users.noreply.github.com>
@ridz1208
Copy link
Collaborator Author

@driusan @johnsaigle back to you

@@ -205,7 +209,7 @@ class Files extends \NDB_Page
$Notifier->notify($msg_data);
}

$path = __DIR__ . "/../user_uploads/$dataDir";
$path = $path.$dataDir;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's safe to assume that path is defined with a / at the end of it (same for the other places the filename is concatenated to it.) It should explicitly add it (and throw an exception or otherwise die beforehand it path is empty for security reasons.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just followed the assumption made in other uploading forms. If i make changes to this one do i make it to the others too ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean other forms in this module or other forms in LORIS?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loris. Like media for example

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, just fix these ones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm we could temporarily I suppose. A better solution would be to rework modules to use the new File Uploader class because that has all of that validation built in. But that's a more substantial change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a good idea to more consistently handle this with something like os.path.join (Python) or filepath.Join (Go), but that's beyond the scope of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think I'm agreeing with you just doing the PHP equivalent. I think this works (adapted from some WordPress code):

    return rtrim( $string, '/\\' ) . '/';

We could wrap it in a Utility function just so that we have a standard way of doing this. not necessary though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets talk after i have questions

Copy link
Collaborator

@driusan driusan Dec 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know PHP doesn't have a built in equivalent. I think we should write one with appropriate unit tests, but I don't think this PR is the right place to do it. That's why I think these ones should just be fixed locally and then another issue/PR can tackle doing it "right" throughout LORIS

@ridz1208 ridz1208 added the State: Needs work PR awaiting additional work by the author to proceed label Dec 17, 2019
@ridz1208
Copy link
Collaborator Author

@driusan @johnsaigle is that sufficient ?

@@ -368,7 +373,7 @@ class Files extends \NDB_Page
$fileSize = $uploadedFile->getSize();
$fileName = $uploadedFile->getClientFileName();
$fileType = pathinfo($fileName, PATHINFO_EXTENSION);
$uploadPath = "$base/modules/document_repository/user_uploads/$name/";
$uploadPath = $path.$name."/";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be using the new Utility function too

@ridz1208 ridz1208 force-pushed the 2019_11_29_config_for_uploads branch from 77f52eb to 0eec4b8 Compare December 17, 2019 21:57
@johnsaigle johnsaigle removed the State: Needs work PR awaiting additional work by the author to proceed label Dec 18, 2019
@driusan driusan merged commit 6c290e4 into aces:master Dec 20, 2019
lingma pushed a commit to lingma/Loris that referenced this pull request Jan 6, 2020
…aces#5815)

This gives the option to the users to configure where the default upload directory of data_release and document_repository modules will be. this also removes the burden on LORIS to ensure these locations are readable/writeable and follows the standard established by the other modules uploading data to loris.
@ridz1208 ridz1208 added this to the 23.0.0 milestone Jan 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocking PR should be prioritized because it is blocking the progress of another task Category: Feature PR or issue that aims to introduce a new feature Cleanup PR or issue introducing/requiring at least one clean-up operation Language: SQL PR or issue that update SQL code Release: Add to release notes PR whose changes should be highlighted in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document Repository and Data Release should not store uploaded files in the codebase
5 participants