Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should StoredUpload use a base FileSystemStorage object based on FILE_STORE_PATH? #31

Closed
jcohen02 opened this issue Jan 17, 2020 · 25 comments
Labels
bug Something isn't working
Milestone

Comments

@jcohen02
Copy link
Collaborator

The update merged in #30 and discussed in #29 replaces the string used to specify the location of a stored upload with a FileField object.

At present there is no specific FileSystemStorage object set for this FileField so it's using the default storage object.

Since users of the library will set DJANGO_DRF_FILEPOND_FILE_STORE_PATH in their application settings to define the base location of the file store for stored uploads, having a FileSystemStorage object relative to this location should enforce that uploads are stored below this location and may be beneficial from a security perspective?

@pasvein, I'm fixing the tests to take account of the changes to the StoredUpload model and adding in this alteration. I'll raise a PR just for this fix initially and it would be great if you can check that this doesn't cause any issues for your use case.

Thanks.

@pasevin
Copy link
Contributor

pasevin commented Jan 21, 2020

Thanks, will do as soon as I can.

@pasevin
Copy link
Contributor

pasevin commented Feb 18, 2020

@jcohen02 everything seems to be working fine!

@jcohen02
Copy link
Collaborator Author

Great! Many thanks @pasevin.

@pasevin
Copy link
Contributor

pasevin commented Feb 18, 2020

I've spoken too early. There's a bug in your code:

FilePondStoredSystemStorage is using same FILEPOND_UPLOAD_TMP as in FilePondUploadSystemStorage

@jcohen02 jcohen02 reopened this Feb 18, 2020
@pasevin
Copy link
Contributor

pasevin commented Feb 18, 2020

I'm getting No such file or directory for all my files because the path is pointing to temporary folder now.

@jcohen02
Copy link
Collaborator Author

I'm investigating - this should have been covered in the tests so while what you're saying does look correct, I'm just checking this out because the tests were passing...

pasevin added a commit to pasevin/django-drf-filepond that referenced this issue Feb 18, 2020
@jcohen02
Copy link
Collaborator Author

Thanks for spotting this @pasevin. I've fixed the issue and added a test (which should have been there already!) to cover this. Update committed to dev branch and I'll be preparing things for an v0.4.0 release based on the dev branch soon.

@pasevin
Copy link
Contributor

pasevin commented Feb 20, 2020

Hey, I'm still getting an error with this implementation.
I'm using django-storages for remote storage on S3.

When I try to present my uploaded files in django admin and via template on my custom page, I get file not found error, because for some reason django is trying to access the file from my local media directory. I have no DJANGO_DRF_FILEPOND_FILE_STORE_PATH set in my settings as per instructions because I'm using remote storage.

Not sure if I'm doing something wrong here, but before storage=stored_storage change it was working ok.

@pasevin
Copy link
Contributor

pasevin commented Feb 20, 2020

I think

@deconstructible
class FilePondStoredSystemStorage(FileSystemStorage):
    """
    Subclass FileSystemStorage to prevent creation of new migrations when
    using a file store location passed to FileSystemStorage using the
    location attribute. Instead the location is applied dynamically on
    creation of the subclass avoiding detection of changes by the migration
    system.

    Addresses #13. Fix is based on fix for similar issue in
    https://github.com/julen/pootle/commit/fd7800050172549e9f31544843b986691290ddc2
    """

    def __init__(self, **kwargs):
        kwargs.update({
            'location': FILEPOND_FILE_STORE_PATH,
        })
        super(FilePondStoredSystemStorage, self).__init__(**kwargs)

somehow should take into account the case when remote storage is used, but I'm not clear yet how.

You subclass 'FileSystemStorage' only right now.

@jcohen02
Copy link
Collaborator Author

jcohen02 commented Feb 20, 2020

Sorry to hear this is still causing issues. I think this must be a result of switching from using a string for the file attribute in StoredUpload to using a FileField (in 4933728).

The use of a string provided a straightforward way of representing the file path that was agnostic to the storage medium. However, I think you said the use of a FileField was so that you could access a file URL so we should maintain this.

I don't think using a FileField that doesn't point to a real file is a great idea so we should decide whether we stick with the use of file as a FileField specifically for local files and do something different for remote files, or whether we do something else. I can't recall if django-storages can help in resolving this. I'll investigate and have this sorted before next week.

Let me know what you think.

@jcohen02 jcohen02 reopened this Feb 20, 2020
@pasevin
Copy link
Contributor

pasevin commented Feb 20, 2020

Hmm... this is getting a little bit too complicated for me right now. I'm on a tight deadline on my own project right now. I will try to find time to give this a thought as soon as I can.

@pasevin
Copy link
Contributor

pasevin commented Feb 20, 2020

I couldn't find anything helpful in django-storages...
As you mentioned, maybe it's not such a bad idea to differentiate between local and remote backend implementations.

From the top of my head:

from django.core.files.storage import  default_storage

FILEPOND_STORAGES_BACKEND = getattr(local_settings, 'STORAGES_BACKEND', None)

stored_storage = default_storage if FILEPOND_STORAGES_BACKEND \
    else FilePondStoredSystemStorage

@jcohen02
Copy link
Collaborator Author

I've taken another look at this and I think I see what's causing the issues here. We're trying to use a FileField to store the file attribute of StoredUpload. In the database the FileField object is just stored as avarchar or similar character field and it just stores the path of the stored file.

Presumably when a StoredUpload object is recreated from the data in the DB, a new FileField object is instantiated as a wrapper around the path in the DB. However, it's using the FilePondStoredSystemStorage as the base storage object for this.

The actual storage of the file seems to be working OK because this is just storing a string representing a path - in fact this is probably wrong, or at least not very clear, and is a leftover from before the file field in the database was changed from string to FileField.

I guess when you remove storage=stored_storage, the FileField then uses whatever the default storage object is and this is presumably the one from django-storages.

I'm going to run through this again to make sure any fix takes into account both local and remote storage - I'll get this fixed and rolled out tomorrow.

@pasevin
Copy link
Contributor

pasevin commented Feb 21, 2020

Yes, this is exactly what is happening. Thanks for being so responsive on this issue!

@jcohen02
Copy link
Collaborator Author

No problem, just looking at the best way to be able to dynamically select the storage backend - I'm wondering if a custom storage backend class might be the way to go. This will then look up what type of storage backend we're using and hand off the file storage or retrieval to the relevant backend.

I think one issue that we encountered previously here is that you can't access dynamic application settings in the models since the app has not been fully initialised at this stage. I think a custom storage backend will work around this.

jcohen02 added a commit that referenced this issue Feb 21, 2020
…load and the underlying file storage used by FileField.
@jcohen02
Copy link
Collaborator Author

Ok, so I think the latest updates should resolve this. If you could test the code, that would be fantastic. The updates are in the PR #36 and the (new) fix/issue31 branch.

I've set the default storage for StoredUpload to a new class which is a LazyObject that I think is initialised on first use - it wraps the correct storage class depending on the application settings.

I've tested it with storing and loading both binary and text data with a local storage backend and a remote S3 backend. Both worked successfully for me but it would be great if you can test and verify if this resolves the issues.

One concern I have is whether this might re-introduce the previous issue we had with new migrations being created due to the dynamic storage location of the different backends. A new migration has been added to add the new storage backend class as the storage class for FileField. I tried changing things and seeing whether this triggered the creation of further migrations and it didn't seem to but a little unsure about whether this is OK as is.

@jcohen02 jcohen02 added the bug Something isn't working label Feb 21, 2020
@pasevin
Copy link
Contributor

pasevin commented Feb 25, 2020

Looks good to me, and it Fixes #34 as well in a better way I guess :)

So far no crashes on my set up with remote storage.

@jcohen02
Copy link
Collaborator Author

Many thanks @pasevin, great that this is looking like it resolves the issue and that it also fixes #34 too. I'll get this merged into master and get a new release out soon that includes this.

I'll close this now but any further issues, please reopen and let me know.

@pasevin
Copy link
Contributor

pasevin commented Feb 29, 2020

Hi, I've tried to install this branch on a clear machine today and noticed Django output:

Your models have changes that are not yet reflected in a migration, and so won't be applied.Run 'manage.py makemigrations' to make new migrations, and then re-run 'manage.py migrate' to apply them.

If I do that, it creates a new migration inside package:

class Migration(migrations.Migration):

    dependencies = [
        ('django_drf_filepond', '0007_altered_stored_upload_file'),
    ]

    operations = [
        migrations.AlterField(
            model_name='storedupload',
            name='file',
            field=models.FileField(max_length=2048, storage=storages.backends.s3boto3.S3Boto3Storage(), upload_to=''),
        ),
    ]

I see your migration 0007 is:

class Migration(migrations.Migration):

    dependencies = [
        ('django_drf_filepond', '0006_permupload_mods'),
    ]

    operations = [
        migrations.AlterField(
            model_name='storedupload',
            name='file',
            field=models.FileField(max_length=2048, storage=django_drf_filepond.models.FilePondLocalStoredStorage(), upload_to=''),
        ),
    ]

Shouldn't it be:

field=models.FileField(max_length=2048, storage=django_drf_filepond.models.DrfFilePondStoredStorage(), upload_to=''),

@jcohen02
Copy link
Collaborator Author

Hi, yes, I agree the migration doesn't look right. In fact, I think there must have been some confusion with this on my part so apologies for that - my build directory seemed to end up with two 0007 migrations in it, one of which specifies storage=django_drf_filepond.models.DrfFilePondStoredStorage(), even though the source itself only has the one 0007_altered_stored_upload_file which looks incorrect. I've resolved this and pushed the revision to the fix branch.

I note, however, that if I delete the 0007 migration and then recreate with python manage.py makemigrations it creates the migration with the storage class pointing to FilePondLocalStoredStorage.

One thing I can't do is to get it to tell me that the models have changed, as per the message at the top of your previous comment. Can you highlight the steps you're using to get this?

I have a test application (based on the tutorial in the docs that I'm using to test this. If I delete the test application's database altogether and then (running in a virtualenv) delete django-drf-filepond, I can then reinstall it with python setup.py install from the fix/issue31 branch and then regenerate the database with python manage.py migrate. I never see a message telling me that the models have changed.

I can switch my app configuration back and forth between the S3 backend and local file storage and delete and recreate the DB (and reinstall the library) each time and it never tells me the migrations are out of date. It would be useful to be able to recreate this so I can see if I can find a better solution, or at least verify for sure whether this has been resolved.

@jcohen02 jcohen02 reopened this Feb 29, 2020
@pasevin
Copy link
Contributor

pasevin commented Mar 2, 2020

Looks like the latest commit fixed the issue with python manage.py makemigrations triggering unwanted migration creation.

I can't remember now, at which point I saw this Your models have changes that are not yet reflected in a migration, and so won't be applied.... message. I think it was when I tried to execute python manage.py runserver, after I ran makemigrations and haven't noticed that this migration has been created.

I will let you know if I notice any other issues regarding this :)

@jcohen02
Copy link
Collaborator Author

jcohen02 commented Mar 2, 2020

Thanks for that, sorry that this hasn't been a straightforward fix! Hopefully the issue is now resolved - I'll leave this open for now, I'm intending to do some further tests prior to the next release.

@pasevin
Copy link
Contributor

pasevin commented Mar 3, 2020

No problem, I really appreciate you being so active with this project! Keep up the good work :)

@jcohen02
Copy link
Collaborator Author

The updates related to this issue are included in v0.3.0.

For now, I'll leave this open for a little while in case there are any further problems relating to this.

@jcohen02
Copy link
Collaborator Author

This has been present in v0.3.0 for some time now so I'm going to close this issue but please re-open if there are any further issues relating to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants