-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core, Media] Interface for safe file downloading #3881
Conversation
This PR is working and ready for code review even if it's failing Travis! I just need to do PHPCS and function documentation. |
// Verify file exists in the database. | ||
if (!$downloader->isFileInDatabase($partialPath)) { | ||
error_log("ERROR Requested file is not in the database"); | ||
http_response_code(400); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a file not being there should be a 404
* We must be sure that a relative path is not being used maliciously to | ||
* download a forbidden file. | ||
*/ | ||
function sanitizePath(string $fullPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be part of the interface, it should be part of the implementation.
/** | ||
* This serves as a generic base class for other modules in LORIS to extend as | ||
* needed. It implements the FileDownloader interface to provide core download | ||
* functionality and security. It does not actually provide any functiaonlity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/functiaonlity/functionality/
* This serves as a generic base class for other modules in LORIS to extend as | ||
* needed. It implements the FileDownloader interface to provide core download | ||
* functionality and security. It does not actually provide any functiaonlity | ||
* related to retrieiving additional file information from the database as in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/retrieiving/retrieving
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the whole, I like the concept and think it's a step forward, except for some implementation details that need fixing.
*/ | ||
function __construct(string $basePath) | ||
{ | ||
$this->realBasePath = realpath($basePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will break on symlinks under basePath if they intentionally point outside of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you're right but I was unable to find any place in the LORIS core where we create symlinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The symlinks wouldn't be created from LORIS, they would be created by someone (ie. a sysadmin) intentionally trying to point to a different location without changing LORIS (ie. a different mount point)
* | ||
* @return none | ||
*/ | ||
public function downloadFile(string $userPath): void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably take a User as a parameter, so it can determine if the user has access to the file (and return a 401 instead of a 403 if they're anonymous.). It should also probably return a PSR7 StreamInterface rather than printing to stdout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we actually have a separate file-access permission layer for LORIS front-end users in any case I can think of. Instead there is a combination of the module permissions and the UNIX permissions.
Effectively if you are able to access the module then you are able to access the files in the module. Any variation in this is done in an ad-hoc way by the modules.
Alternatively, we could add a function to the interface where a class must implement LORIS-user-level permission checking per module. What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on the StreamInterface bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@driusan I'm not exactly sure how to go about the StreamInterface thing. I'm not very familiar with PSR7.
Is there an existing implementation of the StreamInterface that I can instantiate here? Is StringStream appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to use this to work off of:
Loris/src/Router/ModuleFileRouter.php
Line 85 in 24e633a
$resp = (new \Zend\Diactoros\Response) |
Let me know if this makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean:
Loris/src/Router/ModuleFileRouter.php
Line 87 in 24e633a
->withBody(new \Zend\Diactoros\Stream($fullpath)); |
but yes, I think it makes sense to use the Zend implementation
* | ||
* @return string | ||
*/ | ||
public function getFileNameFromDatabase(int $id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this needs to be declared because of the interface but you're not using then either:
- This function (and class) should be abstract or
- This shouldn't be in the interface, since even your first attempt to implement the interface didn't need it.
error_log("$realUserPath equals $this->realBasePath"); | ||
error_log("ERROR: Likely path traversal attempt in file download."); | ||
http_response_code(400); | ||
exit(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you always trying to exit? CLASSES OR LIBRARIES SHOULD NEVER CALL EXIT. Ever. Full stop. If you need to abort normal control flow, you throw an exception, you don't kill the entire server, test suite, script, or anything else that might be innocently trying to use the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing download implementations in LORIS exit which is why I did it here.
I agree with throwing an exception instead.
// Get mime type and do the file download. | ||
$mimeType = mime_content_type($fullPath); | ||
header("Content-type: $mimeType"); | ||
header("Content-Disposition: attachment; filename=$fileName"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are the responsibility of the PSR7 handler. Maybe the interface should have a GetContentType(string $filename) : string
(or similar) method to make it possible for a real middleware to inject the header.
@@ -0,0 +1,155 @@ | |||
<?php | |||
/** | |||
* Class to provide "middleware" for file downloading functionality, taking care |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't middleware, it's just a class.
@driusan Thanks for all the feedback, it's all very helpful and I appreciate the perspective. I'll get to fixing this ASAP. |
Still todo:
|
); | ||
return ''; | ||
} | ||
return mime_content_type($fullPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also use sanitized path. If not, it could be used to query and probe for existence of specific file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still the line 126 is not using the sanitized path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sanitizePath is written to cause an exception if the path is bad -- it doesn't return anything. So line 118 will halt the program execution on a dangerous path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, it will raise an exception if the file doesn't exist and log an error if the file contain ".." but will no break in that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah you're right it works differently after I made the changes to allow for sym links. This needs more work
@@ -114,6 +114,7 @@ class GenericDownloader implements FileDownloader | |||
*/ | |||
function getContentType(string $fullPath): string | |||
{ | |||
sanitizePath($fullPath); | |||
if (!is_file($fullPath)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see how is_file() is now using this sanitized path.
); | ||
return ''; | ||
} | ||
return mime_content_type($fullPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still the line 126 is not using the sanitized path.
@PapillonMcGill I did a lot of refactoring and I believe I addressed your concerns. |
@driusan I believe I've incorporated all the feedback you gave when I first issued this PR including:
As for PSR stuff, I'm not sure if I've really done best practices here because I am still learning how PSR/Diactoros works. I'd really appreciate some guidance here so please let me know if there are changes you'd like. |
); | ||
} | ||
|
||
if (!is_readable($path)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will never be true as it would be catch by line 57
echo "Bad request. A valid path must be specified."; | ||
} else { | ||
$downloadBasePath = $lorisRoot . 'modules/document_repository/user_uploads/'; | ||
$fullPath = $downloadBasePath . $_GET['File']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$fullPath is never used.
@johnsaigle sorry I missed your comment until now.. did you forget to push? It still appears to be using side-effects instead of a PSR7 response. |
I just threw together a plausible looking interface/example of using Psr7. @ridz1208 or @xlecours might have comments on it: /**
* A DownloadableFile represents a file that a user
* can download.
*/
interface DownloadableFile /* Maybe extends ETagCalculator being added in the projects API endpoint PR? */ {
/**
* Returns a StreamInterface to download the file
*/
public function getContent() : \Psr\Http\Message\StreamInterface
/**
* Returns the Content-Type being returned, if
* known. The empty string should be interpreted
* as "unknown"
*/
public function getContentType() : string
/**
* Returns the filename for this file that should
* be presented to the end user, if known, otherwise
* the empty string
*/
public function getFilename() : string
/**
* Returns true if the file being downloaded exists
*/
public function exists() : bool;
/**
* Return true if the user can access this file
*/
public function hasAccess(\User $user) : bool;
}
/**
* FileDownloadValidator downloads files and does some validation.
* It does reasonable things for the above interface where applicable
* and leaves the rest abstract for a concrete implementation to
* enforce such as a module checking permissions.
*/
abstract class FileDownloadValidator implements FileDownloader {
protected $filepath;
public function __construct(string $filepath) {
$this->filepath = $filepath;
}
public function getContent() : \Psr\Http\Message\StreamInterface {
// an example of a reasonable thing to do here
return new \Zend\Diactoros\Stream($this->filepath);
}
abstract public function getContentType() : string;
// other functions go here and do reasonable things where applicable.
}
// Later, likely inside some module's handler:
public function handle(ServerRequestInterface $request) : ResponseInterface {
// After deciding that a file should be downloaded based, most likely based on
// $request->getURI()->getPath().
// Also assume the $user is available somewhere.
$file = new FileDownloadValidator("somefilename.png");
$resp = new \LORIS\Http\Response();
if ($file->hasAccess($user) === false) {
// FIXME: 401 or 403 depending on if they're logged in.
// The PageDecorationMiddleware adds page template boilerplate based on the
// content type
return $resp
->withStatusCode(403)
->withBody(new \LORIS\Http\StringStream("<h6>Permission denied</h6>");
->withContentType("text/html");
}
if ($file->exists() === false) {
return $resp
->withStatusCode(404)
->withBody(new \LORIS\Http\StringStream("<h6>File not found</h6>");
->withContentType("text/html");
}
// Fixme: only do these things if the content type and filename are not empty,
// but this is only an example..
return $resp->withBody($file->getContent())
->withHeader("Content-Type", $file->getContentType())
->withHeader('Content-Disposition', 'attachment; filename=' . $this->getFilename());
} |
@johnsaigle any further development on this one ? |
Will resume development on this after #4163 gets merged. Otherwise it's too painful because every error gets swallowed by the 404 handling. |
#4163 was merged. |
@driusan Yep, just been busy with other work and this is a slightly lower priority. |
This will be rewritten to closely conform to the structure in #4181 |
To those assigning PRs (@ridz1208, @driusan): I'm going to hold off developing this class until #4181 is approved and merged. In my view these classes should conform to each other as closely as possible and I'm happier with the approach I took there than here. Feel free to close this PR if you're looking to organize the repo. We can re-open it when the other is merged. If you don't want to do that, rest assured that I have not forgotten about this PR and will be working on it as soon as the other one goes through. |
Superceded by #4181 |
Summary of changes
The idea is that all file downloading in LORIS will be as short as the new
GetFile
code with baked-in validation and security.Details
There are many places in LORIS where file downloading has been implemented in an ad-hoc way. While functional, these implementations are coded with different methodologies and styles making it very time-consuming to test from a security perspective. At the same time, many of these implementations have been copied partially from each other and so share(d) security vulnerabilities.
This PR adds:
AttachmentResponse
class suitable for doing straight-forward file downloads in the majority of casesThis resolves issue...
Consolidates common functionality in one place so that security and basic verification can be handled in a universal way.
Use an OO approach to allow individual classes to extend and different implement retrieval function to allow flexibility where needed.
Abstracts the following boilerplate code from file downloads:
Related PRs:
Similar idea:
Related security issues
Further work
Refactor existing modules to use (or extend) AttachmentResponse.
Future modules requiring attachment downloads should also use AttachmentResponse
To test this change...