Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not destroy instances of AttachmentData for attachments marked as deleted #3835

Conversation

floehopper
Copy link
Contributor

There are two ways that AttachmentsController#show can respond with a 404 Not Found for a deleted attachment:

A. If the AttachmentData does not exist the call to AttachmentData.find raises an ActiveRecord::RecordNotFound exception which is translated into a 404 Not Found in production.

B. If the Attachment is marked as deleted, its edition is not included in AttachmentVisibility#edition_ids and thus AttachmentVisibility#visible? will return false and this line in AttachmentsController#fail will be invoked.

Previously, for scenarios involving deleted attachments†, the application was only relying on mechanism A when a single attachment was deleted from a document which had never been published. Even in the very similar scenario when a document which has never been published is discarded, the AttachmentData instances associated with its attachments are not destroyed.

The changes in this commit mean that the application now relies on mechanism B for all scenarios involving deleted attachments, i.e. it makes things more consistent.

I did consider using mechanism A in all scenarios, but this would've meant a lot more changes and it would mean we were potentially deleting useful data.

In some ways this is a step backwards in terms of marking attachment assets as deleted in Asset Manager, but there was already work to do on that front and now it can start from a more consistent starting point.

† Mechanism A is still used if the AttachmentData ID in the URL path does not exist.

This demonstrates the current behaviour when an attachment is deleted
from a draft edition which has never been published.

* `Admin::AttachmentsController#destroy` calls `Attachment#destroy`.

* `Attachment#destroy` overrides the standard
`ActiveRecord::Persistence#destroy` method so that it only marks the
attachment as deleted, but it still runs `:destroy` callbacks [1].

* `FileAttachment`, a subclass of `Attachment` has an `after_destroy`
callback which will destroy the associated `AttachmentData` only if
no other non-deleted attachments are using it [2].

* In this case, because there's only a single edition of the document,
there is only one instance of `Attachment` for the relevant
`AttachmentData` and thus the latter is destroyed.

* This in turn fires an `after_commit, on: :destroy` callback which is
added by CarrierWave which eventually results in a call to
`Whitehall::AssetManagerStorage::File#delete` which calls
`AssetManagerDeleteAssetWorker.perform_async` to queue a job to delete
the corresponding asset in Asset Manager.

* The fact that the `AttachmentData` has actually been deleted results
in an `ActiveRecord::RecordNotFound` exception being raised in
`AttachmentsController#attachment_data`. In production this will result
in a `404 Not Found` response [3].

* In the case where an `AttachmentData` is associated with more than one
edition, the deletion of an `Attachment` will only result in the
`Attachment` being marked as deleted; the `AttachmentData` will not be
destroyed`. However, requests to the attachment URL will still result in
a `404 Not Found`, because `AttachmentVisibility#edition_ids` [4] does
not consider deleted attachments and thus
`AttachmentVisibility#visible?` will always return `false` and this line
[5] in `AttachmentsController#fail` will be invoked.

[1]:
https://github.com/alphagov/whitehall/blob/32bb13cf3ae3ae9ca71fa14bd4c620c631660419/app/models/attachment.rb#L130-L132
[2]:
https://github.com/alphagov/whitehall/blob/32bb13cf3ae3ae9ca71fa14bd4c620c631660419/app/models/file_attachment.rb#L33-L38
[3]:
http://guides.rubyonrails.org/v5.0.6/action_controller_overview.html#rescue
[4]:
https://github.com/alphagov/whitehall/blob/32bb13cf3ae3ae9ca71fa14bd4c620c631660419/app/models/attachment_visibility.rb#L116-L118
[5]:
https://github.com/alphagov/whitehall/blob/32bb13cf3ae3ae9ca71fa14bd4c620c631660419/app/controllers/attachments_controller.rb#L31
This demonstrates the current behaviour when a draft edition, which has
never been published and which has an attachment, is discarded.

* `EditionDeleter#fire_transition!` calls
`Attachable#delete_all_attachments` on the edition.

* This in turn marks each attachment as deleted, but it does *not* call
`Attachment#destroy`.

* This means that even though the only `Attachment` referring to the
relevant instance of `AttachmentData` has been marked as deleted, the
`AttachmentData` is not destroyed.

* So although requests for the attachment URL will result in a `404 Not
 Found`, because of the logic in `AttachmentVisibility#edition_ids` and
`AttachmentsController#fail` mentioned in the previous commit, the
corresponding asset in Asset Manager will not be deleted.

* I think the above somewhat inconsistent behaviour was an oversight in
this commit [1] and I plan to rectify it in a subsequent commit.

[1]:
1832d02
There are two ways that AttachmentsController#show can respond with a
404 Not Found for a deleted attachment:

A. If the AttachmentData does not exist the call to
AttachmentData.find raises an ActiveRecord::RecordNotFound exception
which is translated into a 404 Not Found in production [1].

B. If the Attachment is marked as deleted, its edition is not included
in AttachmentVisibility#edition_ids [2] and thus
AttachmentVisibility#visible? will return `false` and this line [3] in
AttachmentsController#fail will be invoked.

Previously, for scenarios involving deleted attachments†, the application
was only relying on mechanism A when a single attachment was deleted
from a document which had never been published. Even in the very similar
scenario when a document which has never been published is discarded,
the AttachmentData instances associated with its attachments are not
destroyed.

The changes in this commmit mean that the application now relies on
mechanism B for *all* scenarios involving deleted attachments, i.e. it
makes things more consistent.

I did consider using mechanism A in all scenarios, but this would've
meant a lot more changes and it would mean we were potentially deleting
useful data.

In some ways this is a step backwards in terms of marking attachment
assets as deleted in Asset Manager, but there was already work to do on
that front and now it can start from a more consistent starting point.

† Mechanism A is still used if the AttachmentData ID in the URL path
does not exist.

[1]:
http://guides.rubyonrails.org/v5.0.6/action_controller_overview.html#rescue
[2]:
https://github.com/alphagov/whitehall/blob/32bb13cf3ae3ae9ca71fa14bd4c620c631660419/app/models/attachment_visibility.rb#L116-L118
[3]:
https://github.com/alphagov/whitehall/blob/32bb13cf3ae3ae9ca71fa14bd4c620c631660419/app/controllers/attachments_controller.rb#L31
Attachment#destroy already includes the behaviour within the each block,
so this is effectively duplicate code.

Also Attachment#destroy fires the relevant ActiveRecord callbacks which
seems more sensible, even though we removed the last use of this in the
previous commit.
Copy link
Contributor

@chrislo chrislo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I think it's an improvement to make this consistent.

Does the oversight you mentioned in the second commit mean that there are assets in Whitehall / Asset Manager that should have been deleted but have not been? Do you think we should do anything about that (not as part of this PR, but later)?

@floehopper
Copy link
Contributor Author

@chrislo

Thanks for reviewing.

Does the oversight you mentioned in the second commit mean that there are assets in Whitehall / Asset Manager that should have been deleted but have not been? Do you think we should do anything about that (not as part of this PR, but later)?

I assume you are referring to this comment in that commit note:

So although requests for the attachment URL will result in a 404 Not Found, because of the logic in AttachmentVisibility#edition_ids and AttachmentsController#fail mentioned in the previous commit, the corresponding asset in Asset Manager will not be deleted.

I plan to address this in alphagov/asset-manager#509 and alphagov/asset-manager#469. The idea of this commit is to make those changes easier, because they'll be able to start from a consistent set of behaviour. My plan is not to rely on the AttachmentData being destroyed in order to delete corresponding assets in Asset Manager, but instead to hook in to the relevant model/workflow lifecycle events and do queries involving the Attachment#deleted flag to determine whether the corresponding asset in Asset Manager should be deleted.

I hope that makes sense. I'm going to merge this now.

@floehopper floehopper merged commit e90c756 into master Mar 7, 2018
@floehopper floehopper deleted the do-not-destroy-attachment-data-for-attachments-marked-as-deleted branch March 7, 2018 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants