Introduces the use of GoodJob::Batch for CopyProjectJob #15054

Merged
merged 12 commits into from
Apr 5, 2024

Conversation

mereghost
Contributor

@mereghost mereghost commented Mar 20, 2024

This PR addresses the following Work Packages: OP#53036 and OP#53035

What is this?

It all started when we decided to add OneDrive/Sharepoint as part of the File Storages offering… It was a rainy night at a dive bar in downtown Ber

Supporting OneDrive/Sharepoint (OD for brevity's sake) has presented us with new challenges, from what the actual concept of a FileStorage is to implementation details such as permission handling. It works so differently from our other use case (Nextcloud) that we had to adapt our codebase and software behavior to accommodate both.

This is one of those places: copying a project or using it as a template. This PR aims to implement this feature for OD.

Why do we need this?

In OD, copying folders is an asynchronous call. We ask for the copy, OD gives us a URL to check the status of the copy job, and at the end it hands us the new resource_id. Compared to the synchronous nature of Nextcloud, this was quite the change.

So the code would need to be bent and reshaped to our needs: we'd need to poll the OD provided URL until it is done, which of course presents its own challenges.
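The polling idea can be sketched in plain Ruby. This is only an illustration, not the code in this PR: FakeCopyStatusEndpoint, poll_for_resource_id and the :status / :resource_id keys are hypothetical names, and the fake endpoint stands in for the OD-provided status URL.

```ruby
# Hypothetical stand-in for the remote copy-status URL; not the real OneDrive API.
class FakeCopyStatusEndpoint
  def initialize(polls_until_done:, resource_id:)
    @remaining = polls_until_done
    @resource_id = resource_id
  end

  # Returns a hash describing the current state of the copy job.
  def fetch
    @remaining -= 1
    if @remaining.positive?
      { status: :in_progress }
    else
      { status: :completed, resource_id: @resource_id }
    end
  end
end

# Polls until the copy completes or the attempt budget runs out.
# Returns the new resource id, or nil if the copy never finished in time.
def poll_for_resource_id(endpoint, max_attempts:, wait: 0)
  max_attempts.times do
    result = endpoint.fetch
    return result[:resource_id] if result[:status] == :completed

    sleep(wait)
  end
  nil
end

endpoint = FakeCopyStatusEndpoint.new(polls_until_done: 3, resource_id: "new-folder-id")
new_id = poll_for_resource_id(endpoint, max_attempts: 5)
```

A blocking loop like this ties up a worker while it waits, which is one of the challenges that makes a retrying background job more attractive.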

During this time we had GoodJobPavel merged, giving us the ability to run jobs in batches, which fits this type of work like a glove.

How does it work?

Part 1: Batches 101

To make a long topic short: GoodJob::Batch coordinates the jobs enqueued under it. It also makes it possible to share some state between those jobs through Batch#properties, and to add other jobs as callbacks for batch events (such as finished, discarded etc).

To enqueue a new batch:

GoodJob::Batch.enqueue(on_finish: CallBackJob, some_key: 'this becomes part of the batch properties') do
  MyCoolJob.perform_later(my_awesome: job_arguments)
end

# or alternatively the slow build

batch = GoodJob::Batch.new
batch.on_finish = 'CallBackJob' # callback job, referenced by its class name
batch.properties[:some_key] = 'this becomes part of the batch properties'
batch.add { MyCoolJob.perform_later(my_awesome: job_arguments) }
batch.save
batch.enqueue

The batch properties are serialized by ActiveJob::Serializers, so anything that's safe to pass to a job is safe to store in them.

From inside a job, if you have included GoodJob::ActiveJobExtensions::Batches you can access the batch by simply calling batch.

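class SomeJob < ActiveJob::Base
  include GoodJob::ActiveJobExtensions::Batches

  def perform
    # Read a property set when the batch was created
    batch.properties[:some_key]
    # Add a new property and persist it
    batch.properties[:featuring] = 'David Bowie'
    batch.save

    # Add another job to the running batch
    batch.enqueue { OtherCoolJob.perform_later(:under_pressure) }
  end
end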

But that's not all! Batches can be updated at runtime, adding new jobs or properties to them as demonstrated above. Cool, huh?

Part 2: The current state

Here is a small list of all the pieces that already exist that needed some tuning or revamps.

  1. Project::CopyService: This is where the bulk of the action happens. Project::CopyService uses the structure of the DependentServices to handle all its intricacies.
  2. CopyProjectJob: This is where we ensure that everything non-copying is done, like making sure we respect the user's localization and sending e-mails. It is also the job that gets polled by the UI for copy completion.
  3. StoragesDependentService, StorageProjectFoldersDependentService, FileLinksDependentService: This is where most of the work for Storages happens, making sure that the particularities of each storage and project storage mode are respected.

These pieces worked together to provide the entirety of the Copy Folder feature, taking into account the Nextcloud integration.

Part 3: What changed (aka the cool part) ⚠️

The first thing was to break most of the behavior for copying project folders out of CopyService. This created 2 services: ProjectStorages::CopyProjectFoldersService and FileLinks::CopyFileLinksService. Those 2 new services take over from the old Dependent services but provide basically the same functionality.

But CopyService uses the .copy_dependencies method as a base to tell the UI what can be copied, so here comes the first jury-rig of the PR: .copyable_dependencies was overridden and the entries relating to FileLinks and ProjectFolders were added by hand (aka copy-pasted) from the now deprecated services.

But now we have to deal with "maybe polling", so a new background job comes in: CopyProjectFoldersJob. The main point of this job is to retry itself if a PollingRequired exception is raised. This will happen if the result of the CopyFolderCommand indicates that it requires polling.
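The retry-on-exception idea can be shown with a small plain Ruby sketch. It does not use ActiveJob or GoodJob, and it is not the job from this PR: the PollingRequired class is redefined here for illustration, and FakeCopyFolderCommand and run_with_retries are hypothetical stand-ins for the command and the job runner.

```ruby
# Redefined here for illustration; the PR's own PollingRequired is not shown in this description.
class PollingRequired < StandardError; end

# Hypothetical command: signals that polling is needed until the remote copy finishes.
class FakeCopyFolderCommand
  def initialize(polls_needed)
    @polls_needed = polls_needed
  end

  def call
    if @polls_needed.positive?
      @polls_needed -= 1
      raise PollingRequired, "copy still running"
    end
    "copied-folder-id"
  end
end

# Stand-in for a job runner: re-runs the block whenever PollingRequired is raised,
# up to max_attempts. Returns the block's result and the number of attempts used.
def run_with_retries(max_attempts:)
  attempts = 0
  begin
    attempts += 1
    [yield, attempts]
  rescue PollingRequired
    retry if attempts < max_attempts
    raise
  end
end

command = FakeCopyFolderCommand.new(2)
folder_id, attempts = run_with_retries(max_attempts: 5) { command.call }
```

In ActiveJob terms this would likely map to a retry_on declaration for the exception, though the exact configuration used by CopyProjectFoldersJob is best checked in the code itself.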

So now we need a way to coordinate the CopyProjectJob with one or more CopyProjectFoldersJobs. That's where GoodJob::Batch comes in. Most of the changes affect the CopyProjectJob. I'll add some snippets, but please check out the code.

CopyProjectJob Batch edition

After cleaning up the instance variables a bit (it had like 11 million of them), we hook it into the batch system. We also want to enqueue the CopyProjectFoldersJobs from within it, adding them to the batch. But how? Luckily CopyProject keeps a state with a lot of information about what has transpired internally so that other Dependent services can rely on previous data.

So right after the copy process concludes, we tap into that state (conveniently called #state), and look for the key that we are interested in: copied_project_storages. This carries the source and copied ProjectStorage pairs, so looping through this list we add a new CopyProjectFoldersJob for each copied ProjectStorage, passing along some extra needed information.
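As a rough illustration of that loop, here is a plain Ruby sketch. The ProjectStoragePair struct, its source/target field names and the string ids are assumptions made for the example; only the copied_project_storages key comes from the description above, and the resulting hashes merely stand in for the arguments a real job would receive.

```ruby
# Hypothetical shape of one entry; the real ProjectStorage records are richer.
ProjectStoragePair = Struct.new(:source, :target, keyword_init: true)

# Hypothetical copy state, as it might look once the copy service is done.
state = {
  copied_project_storages: [
    ProjectStoragePair.new(source: "source-ps-1", target: "copy-ps-1"),
    ProjectStoragePair.new(source: "source-ps-2", target: "copy-ps-2")
  ]
}

# Stand-in for adding one CopyProjectFoldersJob per pair to the batch:
# here we only collect the arguments each job would be given.
enqueued = state.fetch(:copied_project_storages, []).map do |pair|
  { job: "CopyProjectFoldersJob", source: pair.source, target: pair.target }
end
```

Using fetch with an empty default means a project without any copied storages simply enqueues nothing.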

Now that we have the batch, we can rely on it to store the polling URLs and the state of the polling process without the need for error-prone Thread.current values.

We also only want to e-mail the user once the entire process completes, so another job was born: SendCopyProjectStatusEmailJob. This one is added as a callback that is called once the entire batch finishes (successfully or not). Sounds easy enough, but it needs a bunch of information (errors, copied project name etc), so we push all of it into Batch#properties, making it easy for this new job (or any other job in the batch) to access what it needs.
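To make the shared-properties-plus-callback pattern concrete, here is a tiny in-memory sketch. TinyBatch is a made-up class for illustration and is not GoodJob::Batch; the property names (target_project_name, polling_url, errors) and the lambda standing in for the e-mail job are assumptions as well.

```ruby
# Minimal in-memory stand-in for a batch; it is NOT GoodJob::Batch.
class TinyBatch
  attr_reader :properties

  def initialize(on_finish:, **properties)
    @on_finish = on_finish
    @properties = properties
    @jobs = []
  end

  # Registers a job (a block that receives the batch).
  def add(&job)
    @jobs << job
  end

  # Runs every job, recording failures instead of aborting,
  # then fires the finish callback exactly once.
  def run
    @jobs.each do |job|
      begin
        job.call(self)
      rescue StandardError => e
        (properties[:errors] ||= []) << e.message
      end
    end
    @on_finish.call(self)
  end
end

sent_emails = []
# Stand-in for the status e-mail job: it reads everything it needs from the batch properties.
send_status_email = lambda do |batch|
  error_count = batch.properties.fetch(:errors, []).size
  sent_emails << "Copied #{batch.properties[:target_project_name]} with #{error_count} error(s)"
end

batch = TinyBatch.new(on_finish: send_status_email, target_project_name: "Demo copy")
batch.add { |b| b.properties[:polling_url] = "https://example.invalid/status" }
batch.add { |_b| raise "folder copy failed" }
batch.run
```

The callback fires whether or not a job failed, mirroring the "successfully or not" behavior described above.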

Ok, cool. What are the gotchas here?

Since our API doesn't know how to poll for Batch completion, the user will be redirected once the CopyProjectJob is done, even if the CopyProjectFoldersJobs are still enqueued or running. The completion email will only be sent once everything is done.

Changing this would not be trivial, so I left it as a future improvement (future can be tomorrow 🤣). Storages also aren't writing to the errors right now, as we still need to decide how to deal with commands that might error successfully (or operations where some steps fail but the overall result is considered a success).

@mereghost mereghost self-assigned this Mar 20, 2024
@mereghost mereghost changed the base branch from dev to impl/split-storage-jobs March 20, 2024 15:50
@mereghost mereghost force-pushed the impl/split-storage-jobs branch from 1391c36 to 8856f81 Compare March 21, 2024 13:13
@mereghost mereghost force-pushed the impl/batch_copy_project_job branch 2 times, most recently from 5185bfb to cb0fe34 Compare March 21, 2024 16:33
@mereghost mereghost marked this pull request as ready for review March 22, 2024 17:57
@mereghost mereghost force-pushed the impl/batch_copy_project_job branch 4 times, most recently from 5725a25 to 8d7a24e Compare March 25, 2024 15:18
@mereghost mereghost force-pushed the impl/split-storage-jobs branch from 8856f81 to 1aa09a2 Compare March 25, 2024 15:36
@mereghost mereghost force-pushed the impl/batch_copy_project_job branch from 8d7a24e to 5bdd1ca Compare March 25, 2024 16:07
@mereghost mereghost changed the base branch from impl/split-storage-jobs to dev March 25, 2024 16:09
@mereghost
Contributor Author

Well... this dev rebase was intense.

Guy sweating

@mereghost mereghost force-pushed the impl/batch_copy_project_job branch from d25eb60 to 6145d8d Compare March 29, 2024 09:01
@mereghost mereghost force-pushed the impl/batch_copy_project_job branch from 6145d8d to 761cc4b Compare April 2, 2024 16:53
Member

@akabiru akabiru left a comment


✨ Amazing feat @mereghost 👏🏾 and as always, thank you for the detailed PR description! 👍🏾


✅ Code looks great to me- added Qs and discussion points
📆 Will reach out to you for some smoke testing

@mereghost mereghost merged commit 2c7e76e into dev Apr 5, 2024
9 checks passed
@mereghost mereghost deleted the impl/batch_copy_project_job branch April 5, 2024 09:48