Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for server side generated filenames (custom filename prefix for AWS S3 upload) #6518

Closed
mman opened this issue Mar 18, 2020 · 9 comments

Comments

@mman
Copy link
Contributor

mman commented Mar 18, 2020

Current Situation when uploading PFFileObjects

I'm using Parse Server to upload files from the client apps to S3. This works great. In the default configuration client SDK asks to upload file.txt and Parse Server generates a random server side filename by generating a random prefix string. This is done here:

filename = randomHexString(32) + '_' + filename;

As a result, Parse Server will then pass the generated filename, say abcdefgd_file.txt to a concrete instance of FilesAdapter and will store a file somewhere, storing a generated file name abcdefgd_file.txt in the fields that reference this file.

Later when a file needs to be downloaded, the filename abcdefgd_file.txt is passed to the files adapter to generate a download URL, and this is passed to the client.

The Problem

When using S3 backed storage for uploaded files, all uploaded files end up in one bucket and there is no easy way to organize the files and implement regular maintenance, cleanup, etc. AWS API does not support listing files by modification date (aws/aws-cli#1104). The solution is to prepend filenames uploaded to S3 with a date, timestamp, or any other mechanism that works for you so that you can later quickly analyze files with aws s3 ls yourbucket/fileprefix.

Regular maintenance can then be done for example like this.

# Delete files uploaded in 2018, i.e. all files whose names start with prefix 2018
$ aws rm mybucket/2018

This seems to be the recommended practice for S3 anyway and prefix based file listing is recommended and supported everywhere [citation needed].

Alternatives

Parse Server S3 Adapter already supports custom generateKey option that can alter the name of the file before uploading it to S3. This is documented here: https://github.com/parse-community/parse-server-s3-adapter/blob/bc41281a1c9dc24ae853ff4ff50300252a728e99/README.md#L71

This can be used to create a filename on the fly before uploading the file to S3, like shown for example here:

                    generateKey: (filename) => {
                        return `${Date.now()}_${filename}`; // unique prefix for every filename
                    }

At the moment, a filename generated this way is used only by the S3 adapter and is not propagated back from the S3 adapter to correctly store it in the database fields referencing that file. This requires to turn on another option on the Parse Server itself. It is called preserveFileName and when turned on, it will let the client SDK choose a filename and will never alter it. This is a recommended solution per Parse Server S3 Adapter, however it leaves all the responsibility on generating correct filenames on the client, and it is bad and will fail quickly when two clients choose the same filename.

Proposed Solution

We need a way to make the Parse Server to take the client provided filename, alter/generate a safe variant from it, upload it, and handle everywhere.

For me the easiest way to do this is to reuse the FilesController.validateFilename function, which is supposed to validate a passed filename and return an error if there is any problem.

This function is called immediately after POST /files/filename, and is asking the configured Files Adapter to do the validation.

Modifying the function to not return error, but rather to generate a safe filename that will be used later will allow the file adapter to generate whatever filenames work best and propagate them back to the database.

Another option would be to add a new hook, in the spirit of sanitizeFilename, generateRealFilename, and pass that hook down to the Adapter to later work with a newly generated filename everywhere.

The fix to this will involve changes in both Parse Server itself as well as in the S3 Adapter. I'm filing a new issue there as well.

@acinader
Copy link
Contributor

@mpatnode any chance you have the bandwidth to think about and review this??

@mpatnode
Copy link

I don't understand this comment:

At the moment, a filename generated this way is used only by the S3 adapter and is not propagated back from the S3 adapter to correctly store it in the database fields referencing that file

The final file name chosen by Parse should be in the returned file object (and URL) no?

@mman
Copy link
Contributor Author

mman commented Mar 18, 2020

If I read the code correctly, and per my testing, parse generates a file name here https://github.com/parse-community/parse-server/blob/master/src/Controllers/FilesController.js#L30 and that file name, which is either the one passed by client add, or one prefixed with random string is then stored in the database. So any overrides happening later for example in S3 adapter are not taken into action.

Perhaps I am missing something but this is what I observed.

@mman
Copy link
Contributor Author

mman commented Mar 18, 2020

Additionally, here https://github.com/parse-community/parse-server/blob/master/src/Controllers/FilesController.js#L33 the final file location is computed and used in the final promise resolve even before the file is actually uploaded.

@mman
Copy link
Contributor Author

mman commented Mar 18, 2020

So without sounding too confusing, what I am trying to say is that we are storing to the database a file name before we give a chance to the adapter to actually upload file and potentially adjust the file name. And if we use the S3 generateKey function to adjust the name before actual,upload, we actually break the references to those files in the dB.

@mpatnode
Copy link

OK. I see now. Yes, in my implementation, I generated the filename on the client. If you're saying the results of the generateKey method aren't propagating back to the FilesController, that's bad and should be fixed.

Ideally, we'd just pick the final filename out of the createFile() response in the returned Promise. Is it not there in the S3 object response? If not, can we just add it?

@mpatnode
Copy link

Ya: Calling getFileLocation() before the file has been created by the adapter is bad.

@mman
Copy link
Contributor Author

mman commented Mar 19, 2020

On the line https://github.com/parse-community/parse-server-s3-adapter/blob/master/index.js#L113 the S3 response object actually contains a response.Key (documented here: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property). This is the final filename that was uploaded and should be stored in the database.

@mpatnode indeed calling the Adapter.getFileLocation before the file is uploaded is a bug as get file location's primary purpose is to compose a final URL from a given filename.

So the easiest fix, still touching both parse server and all the file adapters would be to remove the line https://github.com/parse-community/parse-server/blob/master/src/Controllers/FilesController.js#L33 and extract the location from the createFile response somehow.

Pros:

  1. super clean fix
  2. fixes s3 generateKey to work as expected

Cons:

  1. Re-release of all file adapters to adopt to new semantics

Since this requires modification/re-release of all the file adapters along with parse-server, my thinking was more along the lines of adding a custom hook to the parse-server's FileAdapter config, something like generateFilename(filename: String) -> String that would be called directly before the line

const error = filesController.validateFilename(filename);

Pros:

  1. Require modification and release only on the parse-server, adapters can stay untouched.
  2. Leave untouched existing semantics of validateFilename call.

Cons:

  1. another option in the parse config to document, test, handle
  2. s3 adapter's generateKey still remains buggy

@mman
Copy link
Contributor Author

mman commented Apr 3, 2020

Closing, fixed by using beforeSave file trigger from #6344

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants