Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core, Media] Interface for safe file downloading #3881

Closed
wants to merge 35 commits into from
Closed

[Core, Media] Interface for safe file downloading #3881

wants to merge 35 commits into from

Conversation

johnsaigle
Copy link
Contributor

@johnsaigle johnsaigle commented Aug 15, 2018

Summary of changes

The idea is that all file downloading in LORIS will be as short as the new GetFile code with baked-in validation and security.

Details

There are many places in LORIS where file downloading has been implemented in an ad-hoc way. While functional, these implementations are coded with different methodologies and styles making it very time-consuming to test from a security perspective. At the same time, many of these implementations have been copied partially from each other and so share(d) security vulnerabilities.

This PR adds:

  • The AttachmentResponse class suitable for doing straight-forward file downloads in the majority of cases
  • An extension of the above class for use in the Doc Repo to demonstrate how to use it.
  • A small refactor to the doc_repo file downloader as it has some quirks but is simple enough to illustrate the changes.

This resolves issue...

Consolidates common functionality in one place so that security and basic verification can be handled in a universal way.

Use an OO approach to allow individual classes to extend and different implement retrieval function to allow flexibility where needed.

Abstracts the following boilerplate code from file downloads:

  • Checking permissions
  • Checking file exists
  • Checking file is a file
  • Attach the appropriate headers (mime type, length, file attachment)
  • Ensuring no path traversal

Related PRs:

Similar idea:

Related security issues

Further work

  • Refactor existing modules to use (or extend) AttachmentResponse.

  • Future modules requiring attachment downloads should also use AttachmentResponse

To test this change...

  • Ensure that you are able to download files from the doc_repo.
  • Make sure the download dialog contains accurate information about the content length and MIME type

@johnsaigle johnsaigle added State: Discussion required PR or issue that requires the resolution of a discussion with the relevant parties to proceed [branch] major labels Aug 15, 2018
@johnsaigle johnsaigle added this to the 21.0.0 milestone Aug 15, 2018
@johnsaigle johnsaigle added the State: Needs work PR awaiting additional work by the author to proceed label Aug 15, 2018
@johnsaigle
Copy link
Contributor Author

This PR is working and ready for code review even if it's failing Travis!

I just need to do PHPCS and function documentation.

@johnsaigle johnsaigle removed the State: Needs work PR awaiting additional work by the author to proceed label Aug 15, 2018
// Verify file exists in the database.
if (!$downloader->isFileInDatabase($partialPath)) {
error_log("ERROR Requested file is not in the database");
http_response_code(400);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a file not being there should be a 404

* We must be sure that a relative path is not being used maliciously to
* download a forbidden file.
*/
function sanitizePath(string $fullPath);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be part of the interface, it should be part of the implementation.

/**
* This serves as a generic base class for other modules in LORIS to extend as
* needed. It implements the FileDownloader interface to provide core download
* functionality and security. It does not actually provide any functiaonlity
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/functiaonlity/functionality/

* This serves as a generic base class for other modules in LORIS to extend as
* needed. It implements the FileDownloader interface to provide core download
* functionality and security. It does not actually provide any functiaonlity
* related to retrieiving additional file information from the database as in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/retrieiving/retrieving

Copy link
Collaborator

@driusan driusan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the whole, I like the concept and think it's a step forward, except for some implementation details that need fixing.

*/
function __construct(string $basePath)
{
$this->realBasePath = realpath($basePath);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will break on symlinks under basePath if they intentionally point outside of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you're right but I was unable to find any place in the LORIS core where we create symlinks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The symlinks wouldn't be created from LORIS, they would be created by someone (ie. a sysadmin) intentionally trying to point to a different location without changing LORIS (ie. a different mount point)

*
* @return none
*/
public function downloadFile(string $userPath): void
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably take a User as a parameter, so it can determine if the user has access to the file (and return a 401 instead of a 403 if they're anonymous.). It should also probably return a PSR7 StreamInterface rather than printing to stdout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we actually have a separate file-access permission layer for LORIS front-end users in any case I can think of. Instead there is a combination of the module permissions and the UNIX permissions.

Effectively if you are able to access the module then you are able to access the files in the module. Any variation in this is done in an ad-hoc way by the modules.

Alternatively, we could add a function to the interface where a class must implement LORIS-user-level permission checking per module. What are your thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on the StreamInterface bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@driusan I'm not exactly sure how to go about the StreamInterface thing. I'm not very familiar with PSR7.

Is there an existing implementation of the StreamInterface that I can instantiate here? Is StringStream appropriate?

Copy link
Contributor Author

@johnsaigle johnsaigle Aug 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to use this to work off of:

$resp = (new \Zend\Diactoros\Response)

Let me know if this makes sense

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean:

->withBody(new \Zend\Diactoros\Stream($fullpath));

but yes, I think it makes sense to use the Zend implementation

*
* @return string
*/
public function getFileNameFromDatabase(int $id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this needs to be declared because of the interface but you're not using then either:

  1. This function (and class) should be abstract or
  2. This shouldn't be in the interface, since even your first attempt to implement the interface didn't need it.

error_log("$realUserPath equals $this->realBasePath");
error_log("ERROR: Likely path traversal attempt in file download.");
http_response_code(400);
exit(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you always trying to exit? CLASSES OR LIBRARIES SHOULD NEVER CALL EXIT. Ever. Full stop. If you need to abort normal control flow, you throw an exception, you don't kill the entire server, test suite, script, or anything else that might be innocently trying to use the class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing download implementations in LORIS exit which is why I did it here.

I agree with throwing an exception instead.

// Get mime type and do the file download.
$mimeType = mime_content_type($fullPath);
header("Content-type: $mimeType");
header("Content-Disposition: attachment; filename=$fileName");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are the responsibility of the PSR7 handler. Maybe the interface should have a GetContentType(string $filename) : string (or similar) method to make it possible for a real middleware to inject the header.

@@ -0,0 +1,155 @@
<?php
/**
* Class to provide "middleware" for file downloading functionality, taking care
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't middleware, it's just a class.

@johnsaigle
Copy link
Contributor Author

@driusan Thanks for all the feedback, it's all very helpful and I appreciate the perspective. I'll get to fixing this ASAP.

@johnsaigle johnsaigle self-assigned this Aug 21, 2018
@johnsaigle johnsaigle added the State: Needs work PR awaiting additional work by the author to proceed label Sep 10, 2018
@johnsaigle
Copy link
Contributor Author

johnsaigle commented Sep 10, 2018

Still todo:

  • Use middleware to determine the mime type. I think I did it?

  • Make sure symlinks will work after path sanitization

@johnsaigle johnsaigle added the Category: Feature PR or issue that aims to introduce a new feature label Sep 10, 2018
);
return '';
}
return mime_content_type($fullPath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also use sanitized path. If not, it could be used to query and probe for existence of specific file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still the line 126 is not using the sanitized path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitizePath is written to cause an exception if the path is bad -- it doesn't return anything. So line 118 will halt the program execution on a dangerous path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it will raise an exception if the file doesn't exist and log an error if the file contain ".." but will no break in that case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah you're right it works differently after I made the changes to allow for sym links. This needs more work

@@ -114,6 +114,7 @@ class GenericDownloader implements FileDownloader
*/
function getContentType(string $fullPath): string
{
sanitizePath($fullPath);
if (!is_file($fullPath)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see how is_file() is now using this sanitized path.

);
return '';
}
return mime_content_type($fullPath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still the line 126 is not using the sanitized path.

@johnsaigle johnsaigle removed the State: Needs work PR awaiting additional work by the author to proceed label Sep 17, 2018
@johnsaigle
Copy link
Contributor Author

@PapillonMcGill I did a lot of refactoring and I believe I addressed your concerns.

@johnsaigle
Copy link
Contributor Author

johnsaigle commented Sep 17, 2018

@driusan I believe I've incorporated all the feedback you gave when I first issued this PR including:

  • Simplifying the interface (I removed it)
  • Preventing an issue where my method would prevent symlinks from working
  • Using more PSR 7 stuff generally
  • Used a chunked-download approach to (maybe?) help with downloading very large files

As for PSR stuff, I'm not sure if I've really done best practices here because I am still learning how PSR/Diactoros works. I'd really appreciate some guidance here so please let me know if there are changes you'd like.

@johnsaigle johnsaigle removed the State: Discussion required PR or issue that requires the resolution of a discussion with the relevant parties to proceed label Sep 17, 2018
);
}

if (!is_readable($path)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will never be true as it would be catch by line 57

echo "Bad request. A valid path must be specified.";
} else {
$downloadBasePath = $lorisRoot . 'modules/document_repository/user_uploads/';
$fullPath = $downloadBasePath . $_GET['File'];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$fullPath is never used.

PapillonMcGill
PapillonMcGill previously approved these changes Sep 19, 2018
@driusan
Copy link
Collaborator

driusan commented Oct 11, 2018

@johnsaigle sorry I missed your comment until now.. did you forget to push? It still appears to be using side-effects instead of a PSR7 response.

@driusan
Copy link
Collaborator

driusan commented Oct 11, 2018

I just threw together a plausible looking interface/example of using Psr7. @ridz1208 or @xlecours might have comments on it:

/**
 * A DownloadableFile represents a file that a user
 * can download.
 */
interface DownloadableFile /* Maybe extends ETagCalculator being added in the projects API endpoint PR? */ {
    /**
     * Returns a StreamInterface to download the file
     */
    public function getContent() : \Psr\Http\Message\StreamInterface
    
    /**
     * Returns the Content-Type being returned, if
     * known. The empty string should be interpreted
     * as "unknown"
     */
    public function getContentType() : string
    
    /**
     * Returns the filename for this file that should
     * be presented to the end user, if known, otherwise
     * the empty string
     */
    public function getFilename() : string
    
    /**
     * Returns true if the file being downloaded exists
     */
    public function exists() : bool;
    
    /**
     * Return true if the user can access this file
     */
    public function hasAccess(\User $user) : bool;

}

/**
 * FileDownloadValidator downloads files and does some validation.
 * It does reasonable things for the above interface where applicable
 * and leaves the rest abstract for a concrete implementation to
 * enforce such as a module checking permissions. 
 */
abstract class FileDownloadValidator implements FileDownloader {
    protected $filepath;
    
    public function __construct(string $filepath) {
        $this->filepath = $filepath;
    }
    
    public function getContent() : \Psr\Http\Message\StreamInterface {
        // an example of a reasonable thing to do here 
        return new \Zend\Diactoros\Stream($this->filepath);
    }
    
    abstract public function getContentType() : string;
    // other functions go here and do reasonable things where applicable.
}

// Later, likely inside some module's handler:

 public function handle(ServerRequestInterface $request) : ResponseInterface {
     // After deciding that a file should be downloaded based, most likely based on
     // $request->getURI()->getPath().
     // Also assume the $user is available somewhere.
     $file = new FileDownloadValidator("somefilename.png");
     $resp = new \LORIS\Http\Response();
     if ($file->hasAccess($user) === false) {
         // FIXME: 401 or 403 depending on if they're logged in.
         // The PageDecorationMiddleware adds page template boilerplate based on the
         // content type
         return $resp
             ->withStatusCode(403)
             ->withBody(new \LORIS\Http\StringStream("<h6>Permission denied</h6>");
             ->withContentType("text/html");
     }
     if ($file->exists() === false) {
         return $resp
             ->withStatusCode(404)
             ->withBody(new \LORIS\Http\StringStream("<h6>File not found</h6>");
             ->withContentType("text/html");
     }
     // Fixme: only do these things if the content type and filename are not empty,
     // but this is only an example..
     return $resp->withBody($file->getContent())
         ->withHeader("Content-Type", $file->getContentType())
         ->withHeader('Content-Disposition', 'attachment; filename=' . $this->getFilename());
}

@johnsaigle johnsaigle added State: Needs work PR awaiting additional work by the author to proceed Category: Security PR or issue that aims to improve security labels Oct 19, 2018
@ridz1208
Copy link
Collaborator

ridz1208 commented Nov 5, 2018

@johnsaigle any further development on this one ?

@johnsaigle
Copy link
Contributor Author

Will resume development on this after #4163 gets merged. Otherwise it's too painful because every error gets swallowed by the 404 handling.

@johnsaigle johnsaigle added the State: Blocked PR or issue awaiting an external event such as the merge or another PR to proceed label Dec 4, 2018
@driusan
Copy link
Collaborator

driusan commented Dec 18, 2018

#4163 was merged.

@johnsaigle
Copy link
Contributor Author

@driusan Yep, just been busy with other work and this is a slightly lower priority.

@johnsaigle johnsaigle removed the State: Blocked PR or issue awaiting an external event such as the merge or another PR to proceed label Dec 18, 2018
@johnsaigle
Copy link
Contributor Author

This will be rewritten to closely conform to the structure in #4181

@johnsaigle
Copy link
Contributor Author

To those assigning PRs (@ridz1208, @driusan): I'm going to hold off developing this class until #4181 is approved and merged. In my view these classes should conform to each other as closely as possible and I'm happier with the approach I took there than here.

Feel free to close this PR if you're looking to organize the repo. We can re-open it when the other is merged. If you don't want to do that, rest assured that I have not forgotten about this PR and will be working on it as soon as the other one goes through.

@johnsaigle johnsaigle removed the State: Discussion required PR or issue that requires the resolution of a discussion with the relevant parties to proceed label Jan 28, 2019
@driusan
Copy link
Collaborator

driusan commented Feb 1, 2019

Superceded by #4181

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category: Feature PR or issue that aims to introduce a new feature Category: Security PR or issue that aims to improve security State: Needs work PR awaiting additional work by the author to proceed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants