-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Allow Bulkrax to assign queue based on number of rows in CSV #671
base: main
Are you sure you want to change the base?
[WIP] Allow Bulkrax to assign queue based on number of rows in CSV #671
Conversation
Basic way of switching queues
Working Tests Set bulk on collections
Most tests working Import Mode Tests
9950d29
to
0f5520c
Compare
module HykuAddons | ||
module ImportMode | ||
extend ActiveSupport::Concern | ||
|
||
def queue_name | ||
return super if non_tenant_job? | ||
switch do | ||
return super unless Flipflop.enabled?(:import_mode) | ||
puts "Job: #{self.class}. Bulk is #{bulk?}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These logger/puts statements are very useful for debugging but must be removed before this is merged.
@@ -0,0 +1,15 @@ | |||
# frozen_string_literal: true | |||
# rubocop:disable Rails/Output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This disable statement should be removed, along with the puts statement before merge
@@ -0,0 +1,15 @@ | |||
# frozen_string_literal: true | |||
# rubocop:disable Rails/Output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This disable statement should be removed, along with the puts statement before merge
@@ -0,0 +1,14 @@ | |||
# frozen_string_literal: true | |||
# rubocop:disable Rails/Output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This disable statement should be removed, along with the puts statement before merge
@@ -0,0 +1,18 @@ | |||
# frozen_string_literal: true | |||
# rubocop:disable Rails/Output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This disable statement should be removed, along with the puts statement before merge
end | ||
end | ||
|
||
private | ||
|
||
def works |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes clean up the class a little and allow the new code to share logic with the existing code. However they are not strictly needed for this update. Only the total_entries
method is needed and that could be refactored if you do not wish to add all these refactors.
Note: In addition to this description a Wiki article has been created to expand on this information
This PR replaces the
import_mode
feature flipper with code that checks how many works are being imported as part of an import run and switches to bulk processing queues if a threshold has passed.#Methodology
This adapts the currently logic for import mode, and essentially flips the feature on based on a database lookup. The
ImportMode
module is included onActiveJob::Base
.The
portable_object
For any arbitrary job we need a way of fetch the work or collection the job is operating on (which we call the
portable_object
) and looking up the parser class associated with that object in order to fetch the total number of objects being imported. There are a number of different ways a job may refer to aportable_object
. Some jobs will be initialised with the UUID of a work or collection. From that UUID we would be able to find whether it is currently being updated by Bulkrax, and if so, how many objects are in the importer run. In some cases the work or collection may not yet exist (perhaps because the job will create it), but aBulkrax::Entry
will exist. Once we have theBulkrax::Entry
we can again fetch the parser and count the total number of objectsThe strategy classes
There are four strategies the jobs use to identify the
portable_object
. The most simple is thePortableBulkraxEntryBehavior
which finds theBulkrax::Entry
from the id passed to the job. ThePortableActiveFedoraBehavior
fetches anActiveFedora
record and finds an associatedBulkrax::Entry
either from thesource_id
or theid
itself. ThePortableGenericBehavior
is used when the job is passed the ActiveFedora object itself rather than itsid
. The final strategy is thePortableBulkraxImporterBehavior
. This strategy applies only to the job which kicks off the import (Bulkrax::ImportJob
), which is called on neither a work nor a collection, but a table internal to Bulkrax.Import mode
Once we have identified the
portable_object
, we look up its parser and return thetotal
attribute. If this is above a threshold then import mode is triggered. The tests for this can proceed as previously. We create a generic job class and includeImportMode
. All we need to add is some mocks which will return the count of the number of works being imported. Which mocks we setup depends on which strategy is used. The spec loops through the four strategies and sets up slightly different mocks.TODO
Whilst the tests demonstrate in principle that the queues are being assigned correctly in practice it appears that the web process picks up most of the Bulkrax jobs, at least in the development environment. This is because the jobs are not being called asynchronously at all, but are called with perform now. Thus it is difficult to tell whether this PR will do what we expect without manual testing on a separate test environment.
Warning: This PR needs to be thoroughly tested before it is merged, and the change will need to be monitored for several weeks after deployment. It risks breaking import mode which will severely impact the business's ability to onboard new clients.