Releases: lithictech/sidekiq-amigo
1.8.0: Better retry
What's Changed
- Retry: Better exception messages and structure by @rgalanakis in #15
- publish matcher can accept a matcher as a topic by @rgalanakis in #14
- Audit logger level is overrideable by @rgalanakis in 74f4fad
Full Changelog: 1.6.1...1.8.0
Ability to disable cron splay behavior
`Amigo::ScheduledJob` supports a `splay` option to avoid 'thundering herd' issues. For jobs that occur very frequently, however, the splay itself can be a problem. Use `splay nil` to disable the splay behavior.
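For example, a minimal sketch of a frequent job with splay disabled (the class name, schedule, and published event are hypothetical; `cron`, `splay`, and `_perform` follow the usual `Amigo::ScheduledJob` conventions):

```ruby
class EmitHeartbeatJob
  include Amigo::ScheduledJob

  # Runs every minute, so even a small random splay is a meaningful
  # fraction of the period.
  cron "* * * * *"
  # Disable the randomized delay entirely.
  splay nil

  def _perform
    Amigo.publish("app.heartbeat", Time.now.to_i)
  end
end
```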
What's Changed
- Support `splay nil` to disable splay behavior by @rgalanakis in #13
Full Changelog: 1.6.0...1.6.1
Autoscaler: Add 'scale down' support, and add example Heroku autoscaler
What's Changed
Full Changelog: 1.5.0...1.6.0
Features
Add 'latency restored' callback to Amigo::Autoscaler
You can now provide `latency_restored_handlers` (and other params) to get a callback when the high-latency event is finished. See 29ffd51 for more details.
Add Amigo::Autoscaler::Heroku
Example implementation of up/down autoscaling via Heroku.
Usage:

```ruby
heroku_scaler = Amigo::Autoscaler::Heroku.new(
  heroku: PlatformAPI.connect_oauth(ENV['APP_HEROKU_OAUTH_TOKEN']),
)
autoscaler = Amigo::Autoscaler.new(
  handlers: [heroku_scaler.alert_callback],
  latency_restored_handlers: [heroku_scaler.restored_callback],
)
autoscaler.start
```
Autoscaler supports a custom `log` method
Use like:

```ruby
log: ->(level, message, context) { Sidekiq.logger.send(level, "#{message}: #{context.to_json}") }
```
The `context` argument contains structured logging keywords; see `Amigo.log` for more info.
Bugfixes
Fix bug with wrong interval usage
We were using the poll interval rather than the alerting interval,
which by default would result in alerting more often than expected
since the poll interval is usually shorter than the alert interval.
`Amigo::Job#pattern` can be a `Regexp`
What's Changed
- Allow regular expression patterns in jobs by @DeeTheDev in #11
In addition to `on "x.y.*"`, which uses fnmatch (so it matches x.y.z and x.y.z.w), you can use `on /^x\.y\.[a-z]+$/` to match only x.y.z, for example.
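A short sketch of a job using a regular expression pattern (the class name, topic names, and handler body are illustrative; `on` and `_perform` follow the usual `Amigo::Job` conventions):

```ruby
class UserEventAuditor
  include Amigo::Job

  # Matches "user.created" and "user.updated", but not "user.created.email".
  on /^user\.[a-z]+$/

  def _perform(event)
    Sidekiq.logger.info("user_event: #{event.name}")
  end
end
```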
New Contributors
- @DeeTheDev made their first contribution in #11
Full Changelog: 1.4.2...1.5.0
Use `Amigo::Retry::Quit` to jump out of a job
Use it when you don't want to retry or die, but just want to jump out of the job.
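A minimal sketch (the `Subscription` lookup and payload shape are hypothetical):

```ruby
class SyncSubscriptionJob
  include Amigo::Job

  on "subscription.updated"

  def _perform(event)
    subscription = Subscription[event.payload.first]
    # Nothing to do and nothing went wrong: jump out without retrying or dying.
    raise Amigo::Retry::Quit if subscription.nil?
    subscription.sync!
  end
end
```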
Full Changelog: 1.4.1...1.4.2
`Amigo::Retry` errors store error information on the job
The jobs did not store the `error_class` or `error_message` (since they didn't hit Sidekiq's retry system). Now whatever the caller provides is stored on the error, so you can provide context like `raise Amigo::Retry::Retry.new(5, "rate limited")` to see that the job was rate limited, or `raise Amigo::Retry::Die.new("invalid token")` to kill a job because it cannot auth.
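A sketch showing both in context (the `api_client` and status checks are illustrative):

```ruby
class SyncAccountJob
  include Amigo::Job

  on "account.updated"

  def _perform(event)
    # api_client is a hypothetical HTTP client.
    resp = api_client.get("/accounts/#{event.payload.first}")
    # Retry in 5 seconds; "rate limited" shows up as the error message on the job.
    raise Amigo::Retry::Retry.new(5, "rate limited") if resp.status == 429
    # Send the job to the dead set with a descriptive error.
    raise Amigo::Retry::Die.new("invalid token") if resp.status == 401
    # ... handle the successful response ...
  end
end
```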
Add Amigo::Autoscaler
When queues reach a latency that is too high, take some action.
You should start this up at Sidekiq or web application startup:

```ruby
# puma.rb
Amigo::Autoscaler.new.start
```
Right now, this is pretty simple: we alert any time there is a latency over a threshold.
In the future, we can:

- actually autoscale rather than just alert (this may take the form of a POST to a configurable endpoint), and
- become more sophisticated about how we detect latency growth.
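A hedged sketch of wiring in an alert handler (the handler argument shown here, a hash of queue names to latencies in seconds, is an assumption; check the `Amigo::Autoscaler` docs for the exact signature):

```ruby
autoscaler = Amigo::Autoscaler.new(
  handlers: [
    # Log a warning whenever high latency is detected.
    ->(names_and_latencies) { Sidekiq.logger.warn("high_latency_queues: #{names_and_latencies.to_json}") },
  ],
)
autoscaler.start
```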
1.3.0: Better handling of failure cases
What's Changed
Better handling of errors in subscribers
We were not handling subscriber errors aggressively enough. The idea was that, because publish and subscribe are decoupled, a failure in a subscriber should NOT be a critical error in the app; it should be a failure of just that subscriber, and we logged the error out of convenience.
This makes sense in theory, in a truly distributed pub/sub system, but in practice clients depend on subscribers working when they publish; for example, if model events fail to publish (because, say, they enqueue a job into Redis), many critical components of a system will not work.
Instead, the default behavior should be to raise an error if any subscriber fails. This causes code that fails to publish to raise an error on the `publish` call, returning a 500, failing the job, etc.
To continue the old behavior, you can use `Amigo.on_publish_error = proc {}`.
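For example, a sketch that reports subscriber errors instead of raising (assuming the handler is invoked with the raised exception):

```ruby
# Swallow subscriber errors, but report them, instead of raising on publish.
Amigo.on_publish_error = proc do |err|
  Sidekiq.logger.error("subscriber_error: #{err.class}: #{err.message}")
end
```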
This also improves the logging so that we log the full event details, rather than just the representation of the object. This means that, if something critical fails to publish, there is still a record of the event for future use in repairs.
Handle audit logger perform_async failure
If the audit logger job cannot `perform_async`, we would get an error in the subscriber. This is not great, since it means we don't get the event audited.
Instead, if we cannot `.perform_async`, we can `.new.perform`. This ensures we still get the event audited, even if Redis is down.
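A simplified sketch of the fallback pattern (not the library's exact code; the `Amigo::AuditLogger` name and the event serialization are assumptions):

```ruby
begin
  # Normal path: enqueue the audit log job into Redis.
  Amigo::AuditLogger.perform_async(event.as_json)
rescue StandardError
  # Redis is unavailable: run the job inline so the event is still audited.
  Amigo::AuditLogger.new.perform(event.as_json)
end
```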
See #7
Full Changelog: v1.2.2...v1.3.0
v1.2.2
What's Changed
- Test publisher isolation should be recursive #4
- Amigo::Event can turn into Sidekiq strict arg JSON #4
- Automatically register jobs #3 (actually version 1.2.1)
Full Changelog: v1.2.0...v1.2.2
New features, and job auto-registration
What's Changed
- Automatically register jobs when Job is extended by @rgalanakis in #3
- Add new jobs and features by @rgalanakis in #2
- `Retry` can be used for custom retry behavior, where the code being run, and not the queue, needs to specify how something should retry and die.
- `QueueBackoffJob` can be used to reschedule a job before it runs if another queue has latency (avoid saturating all worker threads with a single queue).
- `RateLimitedErrorHandler` is used to avoid handling the same error many times over within the same process, to avoid error spam.
- `SemaphoreBackoffJob` is used to run a job under a semaphore key to avoid more concurrency than desired (i.e. avoid a single user saturating all worker threads), as sketched below.
- Spec helpers for working with a real Redis
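A hedged sketch of `SemaphoreBackoffJob` (the `semaphore_key` and `semaphore_size` methods are assumptions based on the feature description; check the class documentation for the exact contract):

```ruby
class GenerateReportJob
  include Sidekiq::Job
  include Amigo::SemaphoreBackoffJob

  def perform(report_id)
    # ... heavy work ...
  end

  # All GenerateReportJob instances count against one semaphore key.
  # The key could also incorporate something like a user id to cap per-user concurrency.
  def semaphore_key
    "semaphore-generate-report"
  end

  # At most 2 reports run concurrently; additional jobs are rescheduled.
  def semaphore_size
    2
  end
end
```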