-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[document_repository] Fixed multiple files with same name when uploading to document repository #3700
[document_repository] Fixed multiple files with same name when uploading to document repository #3700
Conversation
// Setup Database object. | ||
$config =& \NDB_Config::singleton(); | ||
$db_config = $config->getSetting('database'); | ||
$db = \Database::singleton( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a initialization to the db class with parameters? Can't it be done just with
$db =& \Database::singleton();
?
Tested it out on my instance, and I was getting an error due to that line, and when replaced with the above, the added feature worked fine.
$base_path = __DIR__ . "/../user_uploads/"; | ||
$fileBase = $puser . "/" . $fileName; | ||
$base_path = __DIR__ . '/../user_uploads/'; | ||
$fileBase = $puser . '/docs_repo/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason why you are nesting into an additional subdirectory?
Would it not a simpler to leave it as /../user_uploads/$puser/$fileName/$uuid/$fileName
, instead of
/../user_uploads/$puser/docs_repo/$fileName/$uuid/$fileName
. From my understanding, the user_uploads
folder is already intended for the docs_repo, so I don't think there is a need to nest it further. Would like to hear your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user_uploads is used elsewhere besides the docs_repo. So the extra nest is beneficial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @um4r12
the user_uploads folder is used in other modules but the folder is a sub-directory of document_repository
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@intralizee There are two user_upload
folders: one a subdir of doc_repo and the other a subdir of data_release
(which was based on doc_repo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, every other module (that stores data) stores data under /data/.../...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnsaigle @ZainVirani @um4r12 I'll get back to guys with a more concise response but one of the modules I tested before working on document_repository had uploaded a file or folder to the user_upload directory and resulted in a conflict when uploading a file from document_repository with the same name of the folder or file previously created. I may be wrong as it has been awhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I must have been mistaken previously. Next commit removes the docs_repo as unnecessary. @johnsaigle @ZainVirani @um4r12
@intralizee If you update 0000-00-schema.sql , you need to update the testing data also. |
@intralizee Please updating your new data here "test/RBdata.sql" with the new document_repository table line 1676. |
$target_path = $base_path . $fileBase; | ||
// Create filename directory /base_path/user/fileName | ||
if (!file_exists($base_path . $puser . '/' . $fileName)) { | ||
mkdir($base_path . $puser . '/' . $fileName, 0777); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*/ | ||
function uuid4() | ||
{ | ||
if (function_exists('com_create_guid') === true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't support Windows so I think this is a potentially confusing comment.
* | ||
* @return string $version | ||
*/ | ||
function uuid4() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're trying to include return type declarations for new code.
Could you add String
as a return type here?
Hit a snag with “There was an error uploading the file” but it's probably an issue with my permissions RE: move_uploaded_file method as discussed. Will update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After updating permissions on user_uploads/ I was able to test this PR.
Was able to upload duplicate files, and confirmed unique UUID's for each upload in the table.
SQL/0000-00-00-schema.sql
Outdated
@@ -864,6 +864,7 @@ CREATE TABLE `document_repository` ( | |||
`EARLI` tinyint(1) DEFAULT '0', | |||
`hide_video` tinyint(1) DEFAULT '0', | |||
`File_category` int(3) unsigned DEFAULT NULL, | |||
`uuid` varchar(36) DEFAULT '', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PapillonMcGill recently set up a standard for all new additions to our SQL database (https://github.com/aces/Loris/blob/19.0-dev/docs/SQLModelingStandard.md). I think uuid should maybe be UUID? melanie, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zaliqarosli @PapillonMcGill I think the capitalized UUID is great. uuid is abbreviation for universally unique identifier. We could also decide on Universally_Unique_Identifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zaliqarosli @intralizee
The standards say that The acronym should only contain upper case letter.
So capitalized is the way to go. UUID might not be fully known by neuroimaging community (yet) but software developer within knows it well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@intralizee i had to look up what UUID meant 😝but i think it works here
updated sql column uuid to UUID for updated standard specs
@@ -0,0 +1 @@ | |||
ALTER TABLE document_repository ADD `UUID` varchar(36) DEFAULT ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this patch should go into just the Archive directory? we're now on the 20 release. also, patches usually get moved to the version folders after a release if on minor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we are at this, when would you submit the patches into the New_patches
dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@um4r12 when you are on a branch that has these new directories. i'm going to bring this up in tomorrow's meeting :)
$fileBase = $puser . '/' | ||
. $fileName | ||
. '/' . $uuid | ||
. '/' . $fileName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$filename is used twice. Is it supposed to be like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the the first $fileName
is referring to the directory, whereas the second one is referring to the actual file.
ex (from above). user_uploads/admin/text.png/{uuid}/text.png
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PapillonMcGill um4r12 is correct and so it's supposed to be like that.
$base_path = realpath(__DIR__ . '/..') . '/user_uploads/'; | ||
$fileBase = $puser . '/' | ||
. $fileName | ||
. '/' . $uuid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put all the front-slashes at the front of the line or the back of the line for consistency.
// Cleanup empty directories | ||
set_error_handler( | ||
function () { | ||
// Silence the E_WARNING when files exist in the directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this is the best way to do this. Shouldn't we delete the files first in this case? I'm a little worried about this silencing useful warnings.
} | ||
); | ||
$rm_directory = __DIR__ . '/../user_uploads/' | ||
. substr($dataDir, 0, strlen($dataDir)-(strlen($fileName)+1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you trying to chop off the filename from $dataDir
here? If so, just use dirname()
;
|
||
$fileSize = $_FILES["file"]["size"]; | ||
$fileName = $_FILES["file"]["name"]; | ||
$fileType = end((explode(".", $fileName))); | ||
$fileType = ''; | ||
// Handle retrieving the file type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better done using pathinfo()
It's also more accurate to say that you are grabbing the uploaded file's path extension rather than its actual type. Properly getting the file type would involve looking at the file signatures.
|
||
// __DIR__ is the document_repository ajax directory | ||
// when this script is executing. Go up a level to the | ||
// document_repository module directory, and use a | ||
// user_uploads directory as a base for user uploads | ||
$base_path = __DIR__ . "/../user_uploads/"; | ||
$fileBase = $puser . "/" . $fileName; | ||
$base_path = realpath(__DIR__ . '/..') . '/user_uploads/'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind changing this to $upload_path
? The "base path" in LORIS typically refers to the Config setting corresponding to the web root.
Note: this code should not go live before
I just wanted to leave a note here to the effect that these branches need to be synchronized. |
This PR is tagged as a Bug Fix yet issued to minor, but in my opinion constitutes a new feature. @intralizee can you weigh in here as to where you think this PR should go? |
should we maybe add a 'Blocked' label to this PR until the needed merging happens? @intralizee |
Sorry to be jumping into this so late but.. why are we even using UUIDs for this? A few bytes of random data encoded in some way (hex, base64, whatever) should be enough to fix the issue that this is trying to solve. |
(Another--probably more robust solution-- would be to use a content addressable naming convention on the backend and include a Content-Disposition header to associate it to a friendly filename when a user tries to download it from the frontend.) |
@driusan I +1 the idea of using another approach. uuid feels like overkill. My personal preference is to use hashing.
If I understand correctly, you mean something like storing the hash of the file contents mapped to a friendly name in the back-end, right? E.g. The Database:
Frontend:
Backend
|
It seems like some discussion is still required on this PR, so I also want to bring the comments on the previous [Closed] PR into the mix Even @driusan's idea seems a bit much to me. IMO the smoothest solution is a standard file directory solution of appending _(x) to the file, where x would be a simple counter of how many of those files already exist. This makes it easily identifiable on the front end as well. If for whatever reason the user needs a specific file name when they download it, they can rename it themselves. |
@ZainVirani (we talked this over i person but I wanted to reply here so others could see). I don't like the counter solution because it makes it very easy for users to upload duplicates and waste space. The hashing method has a built-in ease of detecting duplicate files (if the hash is the same, it's a duplicate, reject the upload) With hashing, the users can still upload files of the same name as their contents will differ. So long as they use the categorization feature, files of different types will be separated into the right trees so they won't get mixed up. If users need multiple revisions of the same document and the same name, they can use data_release which has more robust versioning. I feel like the counter scheme by contrast leads to messiness. |
@driusan @johnsaigle @ZainVirani |
@intralizee Let's discuss it at a LORIS meeting, either today or next week. |
@johnsaigle yes, that's exactly what I meant by a content addressable naming convention. @intralizee It's not the approach of using a random part of the path that I think is overkill, it's just the version 4 UUIDs (which comes with the overhead of having to explain what UUIDs are to anyone who looks at the code, ensure that the generation/formatting/etc is compliant with the standard, decide what to do if someone uses a different kind of UUID, etc) when any meaningless random number would do (and I doubt we need a whole 16 bytes to avoid collisions.) |
@ZainVirani Your approach of having a counter would result in race conditions, while @intralizee's random approach doesn't have that problem (if the randomness source has enough entropy.) |
85a416e
to
bab69e7
Compare
Since changes need to be made to this PR, I am removing the |
0770 | ||
); | ||
} | ||
$target_path = $base_path . $fileBase; | ||
|
||
if (move_uploaded_file($_FILES["file"]["tmp_name"], $target_path)) { | ||
$success = $DB->insert( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$db
(didn't check out the code, just noticing you changed from $DB
to $db
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and same with line 174 & 176)
Note that this will not be able to be merged when #3971 goes through. However we should still have this feature at some point. |
I think I'll close this PR for now because of the PR involving the react rewrite. That PR has made notable changes and where this PR is obsolete. I'll see what can be done after that PR is merged into LORIS. |
This pull request allow files with duplicate name to exist and when user uploads a file to the document repository.
I made the mysql column "uuid" in the table "document_repository" for files to be stored in unique directories.
The uuid and file_name are used for the system directory structure when storing the files.
System Directory Structure Examples:
Note: {uuid} is unique per upload.
user_uploads/admin/text.png/{uuid}/text.png
user_uploads/admin/text.txt/{uuid}/text.txt
user_uploads/admin/text.png/{uuid}/text.png
user_uploads/admin/text.txt/{uuid}/text.txt
See also: Bug #10847 and Bug #11243