Add middleware system for jobs #584

brandur · 2024-09-09T01:46:14Z

Here, experiment with a middleware-like system that adds middleware
functions to job lifecycles, so that they're invoked during specific
phases of a job, such as when it's being inserted or worked.

The most obvious unlock for this is telemetry (e.g. logging, metrics),
but it also acts as a building block for features like encrypted jobs.

brandur · 2024-09-09T01:49:52Z

rivertype/river_type.go

+	//
+	// InsertBegin is *not* invoked on a batch insertion with InsertMany or
+	// InsertManyTx. Integrations should implement InsertManyBegin separately.
+	// InsertBegin(ctx context.Context, params *JobLifecycleInsertParams) (context.Context, error)


So I originally tried to write this system as "hooks" instead of middleware, and you can see what the old entrypoints looked like here (left them in as gravestones for the time being). They almost worked, but I found that I was running into two major problems with them compared to middleware:

It'd be a common thing to want to put something in context in the "begin" part, then extract that thing in the "end" part (e.g. for metrics or whatever). I accomplished that by returning a context here as you can see, but it was fairly awkward, and I don't think users would've liked it.

The bigger issue is that it was hard to know how the "end" functions should be called (or not called) under various error conditions like a panic or error returned. We would've either needed to send back parameters for all of (1) successful return, (2) possible error, (3) panic val, OR just have totally separate functions for when errors occurred, but both those options were extremely awkward. With middleware, the return values are right there, and it's up to the caller to just do whatever they want with them.
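
To make the contrast with hooks concrete, here is a minimal Go sketch of the middleware shape described above. It is illustrative only: `workFunc`, `timingMiddleware`, and `outcomes` are hypothetical names rather than River's API, and converting a recovered panic into an error is just one possible choice. The point is that the start time, the returned error, and the panic value are all visible inside one function, with no context round trip between a "begin" and an "end" hook.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// workFunc stands in for the inner job execution (hypothetical name).
type workFunc func(ctx context.Context) error

// timingMiddleware returns a middleware that times the inner call and reports
// how it ended. State such as the start time lives in a local variable, so
// nothing has to be passed through context.
func timingMiddleware(record func(outcome string, elapsed time.Duration)) func(ctx context.Context, doInner workFunc) error {
	return func(ctx context.Context, doInner workFunc) (err error) {
		start := time.Now()
		defer func() {
			outcome := "ok"
			if r := recover(); r != nil {
				// Assumption: this sketch turns a panic into an error.
				outcome = "panic"
				err = fmt.Errorf("job panicked: %v", r)
			} else if err != nil {
				outcome = "error"
			}
			record(outcome, time.Since(start))
		}()
		return doInner(ctx)
	}
}

// outcomes runs the middleware over a success, an error, and a panic, and
// returns the outcome recorded for each.
func outcomes() []string {
	var seen []string
	mw := timingMiddleware(func(outcome string, _ time.Duration) {
		seen = append(seen, outcome)
	})
	ctx := context.Background()
	_ = mw(ctx, func(context.Context) error { return nil })
	_ = mw(ctx, func(context.Context) error { return errors.New("boom") })
	_ = mw(ctx, func(context.Context) error { panic("kaboom") })
	return seen
}

func main() {
	fmt.Println(outcomes())
}
```

With a hook pair, each of those three outcomes would need its own parameter or its own callback; here the wrapper decides what to do with whatever `doInner` produced.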

Makes sense, I think this is probably the cleaner & more flexible option.

brandur · 2024-09-09T01:52:40Z

client.go

@@ -1150,7 +1155,7 @@ func (c *Client[TTx]) ID() string {
 	return c.config.ID
 }

-func insertParamsFromConfigArgsAndOptions(archetype *baseservice.Archetype, config *Config, args JobArgs, insertOpts *InsertOpts) (*riverdriver.JobInsertFastParams, *dbunique.UniqueOpts, error) {


So not 100% sure on this one yet, but the "insert" part of the middleware needs to receive a type that's not a JobRow because we don't have a job row yet. I basically took JobInsertParams, duplicated it, and promoted it to rivertype. The types are different for now, but they can be type converted to one another because they have the same fields. I basically did this because I like the naming of JobInsertParams, but they could also be the same type or even slightly different types with a couple fields dropped (CreatedAt for example, which is only needed for time injection).

Either way, JobInsertParams stays internal for now, so it should leave refactoring flexibility ...
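
The type conversion mentioned above relies on a Go rule: two struct types whose fields have identical names, types, and order can be converted to one another directly. A small sketch follows, assuming made-up field sets; only `CreatedAt` is named in this thread, and `internalInsertParams`, `PublicInsertParams`, `toPublic`, and `convertDemo` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// internalInsertParams plays the role of the internal params type.
// The field list is an assumption for illustration.
type internalInsertParams struct {
	CreatedAt   *time.Time
	EncodedArgs []byte
	Kind        string
	Queue       string
}

// PublicInsertParams plays the role of the duplicate promoted to a public
// package. It must list the same fields in the same order.
type PublicInsertParams struct {
	CreatedAt   *time.Time
	EncodedArgs []byte
	Kind        string
	Queue       string
}

// toPublic converts the pointer without copying; both pointers refer to the
// same underlying struct value.
func toPublic(p *internalInsertParams) *PublicInsertParams {
	return (*PublicInsertParams)(p)
}

// convertDemo shows that a write through the converted pointer is visible
// through the original one.
func convertDemo() (publicKind, internalQueue string) {
	in := &internalInsertParams{Kind: "email", Queue: "default"}
	out := toPublic(in)
	out.Queue = "priority"
	return out.Kind, in.Queue
}

func main() {
	fmt.Println(convertDemo())
}
```

The conversion stops compiling as soon as the two field lists diverge, for example if `CreatedAt` is dropped from one of them, which is the trade-off to weigh against using a single shared type.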

What do you think about pulling arg encoding out of this helper and passing the middleware functions a type that includes the raw, unencoded JobArgs? My thought is that this unlocks more dynamic behavior, because middleware could then do type assertions against JobArgs, including checking for interface implementations.

The downside is they lose the ability to directly access the encoded JSON bytes, but then I'm not sure I know of any cases where that's desirable. For metadata, sure, but not for args.
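
As a rough illustration of what passing raw args would enable, the sketch below has an insert middleware check whether the args implement an optional interface. Everything here is an assumption made for the example: `JobArgs` is a local stand-in for River's args interface, and `insertParams`, `Encryptable`, `markEncrypted`, and `wouldEncrypt` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// JobArgs is a local stand-in for the args interface.
type JobArgs interface{ Kind() string }

// insertParams is a hypothetical params type that keeps the unencoded args.
type insertParams struct {
	Args     JobArgs
	Metadata map[string]string
}

// Encryptable is a hypothetical optional interface args can implement to opt in.
type Encryptable interface{ EncryptArgs() bool }

type EmailArgs struct{ To string }

func (EmailArgs) Kind() string { return "email" }

type SecretArgs struct{ Token string }

func (SecretArgs) Kind() string      { return "secret" }
func (SecretArgs) EncryptArgs() bool { return true }

// markEncrypted inspects the concrete args type before the insert proceeds.
// This only works because the middleware sees JobArgs rather than JSON bytes.
func markEncrypted(ctx context.Context, params *insertParams, doInner func(context.Context) error) error {
	if e, ok := params.Args.(Encryptable); ok && e.EncryptArgs() {
		if params.Metadata == nil {
			params.Metadata = map[string]string{}
		}
		params.Metadata["encrypted"] = "true"
	}
	return doInner(ctx)
}

// wouldEncrypt runs the middleware with a no-op insert and reports the flag.
func wouldEncrypt(args JobArgs) bool {
	params := &insertParams{Args: args}
	_ = markEncrypted(context.Background(), params, func(context.Context) error { return nil })
	return params.Metadata["encrypted"] == "true"
}

func main() {
	fmt.Println(wouldEncrypt(EmailArgs{To: "a@example.com"}), wouldEncrypt(SecretArgs{Token: "t"}))
}
```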

brandur · 2024-09-09T01:55:40Z

@bgentry Wanted to get a minimal POC out there just so I don't spend too much time on this in case there's backpressure, so didn't write tests or anything like that. Rough thoughts?

bgentry

Awesome progress. I have a few thoughts/concerns about whether this design may need some tweaks, but I'm generally good to move forward with it.

bgentry · 2024-09-09T20:09:46Z

middleware_defaults.go

+func (l *JobMiddlewareDefaults) Insert(ctx context.Context, params *rivertype.JobInsertParams, doInner func(ctx context.Context) (*rivertype.JobInsertResult, error)) (*rivertype.JobInsertResult, error) {
+	return doInner(ctx)
+}
+
+func (l *JobMiddlewareDefaults) InsertMany(ctx context.Context, manyParams []*rivertype.JobInsertParams, doInner func(ctx context.Context) (int, error)) (int, error) {
+	return doInner(ctx)
+}
+
+func (l *JobMiddlewareDefaults) Work(ctx context.Context, job *rivertype.JobRow, doInner func(ctx context.Context) error) error {
+	return doInner(ctx)
+}
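
For readers unfamiliar with the defaults pattern in this hunk, here is a hedged sketch of how a pass-through defaults struct is typically used in Go: a middleware embeds it and overrides only the phase it cares about. The types are local stand-ins for the rivertype ones, the `JobMiddleware` interface shape is inferred from the hunk (with `InsertMany` left out for brevity), and `workLogger` and `demo` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// Local stand-ins for the types named in the diff; fields are assumptions.
type JobInsertParams struct{ Kind string }
type JobInsertResult struct{}
type JobRow struct{ Kind string }

// JobMiddleware is an assumed interface shape, not the library's definition.
type JobMiddleware interface {
	Insert(ctx context.Context, params *JobInsertParams, doInner func(ctx context.Context) (*JobInsertResult, error)) (*JobInsertResult, error)
	Work(ctx context.Context, job *JobRow, doInner func(ctx context.Context) error) error
}

// JobMiddlewareDefaults passes every phase straight through.
type JobMiddlewareDefaults struct{}

func (d *JobMiddlewareDefaults) Insert(ctx context.Context, params *JobInsertParams, doInner func(ctx context.Context) (*JobInsertResult, error)) (*JobInsertResult, error) {
	return doInner(ctx)
}

func (d *JobMiddlewareDefaults) Work(ctx context.Context, job *JobRow, doInner func(ctx context.Context) error) error {
	return doInner(ctx)
}

// workLogger embeds the defaults and overrides only Work.
type workLogger struct {
	JobMiddlewareDefaults
	lines []string
}

func (l *workLogger) Work(ctx context.Context, job *JobRow, doInner func(ctx context.Context) error) error {
	l.lines = append(l.lines, "start "+job.Kind)
	err := doInner(ctx)
	l.lines = append(l.lines, "done "+job.Kind)
	return err
}

// Compile-time check that the embedded defaults supply the missing methods.
var _ JobMiddleware = (*workLogger)(nil)

// demo calls both phases and reports the log lines plus whether the
// pass-through Insert reached the inner function.
func demo() (lines []string, inserted bool) {
	logger := &workLogger{}
	var mw JobMiddleware = logger
	ctx := context.Background()
	_, _ = mw.Insert(ctx, &JobInsertParams{Kind: "email"}, func(context.Context) (*JobInsertResult, error) {
		inserted = true
		return &JobInsertResult{}, nil
	})
	_ = mw.Work(ctx, &JobRow{Kind: "email"}, func(context.Context) error { return nil })
	return logger.lines, inserted
}

func main() {
	fmt.Println(demo())
}
```

Embedding also means that adding a new method to the interface later does not break existing middleware, as long as the defaults struct gains a pass-through for it.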


I think for this to be the most useful it would need to be customizable on a per-job basis. I'm wondering what that looks like in practice with this design. Like what if I wanted to add a middleware that uses some aspect of the worker or args to dynamically determine what to do? (maybe some optional interface gets fulfilled by either of those types to indicate to the middleware what it should do).

The problem with trying to do that here is the args have already been encoded, so there's no longer any access to the underlying JobArgs type. Is there any path to potentially having the middleware stack get called before the JSON encoding part? That could more easily enable dynamic behavior based on the type.

Additionally, this might be further exposing the somewhat confusing split between JobArgs and Worker implementations. We had some recent customer feedback about it being a little weird that, e.g., the timeout must be customized on the Worker and can't easily be tweaked at insertion time via the args. In this case though you mentioned potentially allowing for middleware to be configured at the JobArgs level, which seems fine for insert time but IMO doesn't make any sense for the Work() middleware. I don't want to have two separate middleware stacks/concepts, but it does feel a bit odd to have both of these on a single interface given the way this split is designed today 🤔

I ended up putting JobArgs into the struct in my uniqueness PR #590 and I think it's a great thing to have available. Can still encode args in advance but just keep the original around for introspection.

bgentry · 2024-09-09T20:10:29Z

rivertype/river_type.go

Makes sense, I think this is probably the cleaner & more flexible option.

bgentry · 2024-09-13T03:52:14Z

I rebased this branch on top of #589 now that it's merged. I also refactored our bulk insert methods so they use the same underlying code aside from a narrow adapter function for each query. I think it can be improved further, but IMO it's a good start: https://github.com/riverqueue/river/compare/bg-lifecycle-hooks-on-insert-many-refactor?expand=1

Finally, I realized that, given the goal of aligning our Insert and InsertMany APIs on a single code path (each supporting the same set of features too), maybe we don't want to introduce a middleware interface that differentiates between the two. I made that change in my above branch as well.
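
One way to picture a middleware interface that doesn't differentiate between single and batch inserts is sketched below: a single insert is treated as a batch of one, so the same stack runs for both. This is an illustration under assumptions, not the code in the linked branch; `InsertParams`, `InsertMiddleware`, `runInsert`, and `batchSizes` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// InsertParams is a hypothetical stand-in for the insert params type.
type InsertParams struct{ Kind string }

// InsertMiddleware is one function shape that covers both single and batch
// inserts, because it always receives a slice.
type InsertMiddleware func(ctx context.Context, params []*InsertParams, doInner func(context.Context) (int, error)) (int, error)

// runInsert wraps the final insert in the stack, outermost middleware first.
func runInsert(ctx context.Context, stack []InsertMiddleware, params []*InsertParams, insert func(context.Context) (int, error)) (int, error) {
	doInner := insert
	for i := len(stack) - 1; i >= 0; i-- {
		mw, next := stack[i], doInner
		doInner = func(ctx context.Context) (int, error) { return mw(ctx, params, next) }
	}
	return doInner(ctx)
}

// batchSizes runs a single insert and a batch insert through the same stack
// and returns the batch size the middleware saw each time.
func batchSizes() []int {
	var sizes []int
	stack := []InsertMiddleware{
		func(ctx context.Context, params []*InsertParams, doInner func(context.Context) (int, error)) (int, error) {
			sizes = append(sizes, len(params))
			return doInner(ctx)
		},
	}
	// store fakes the database write and reports n rows inserted.
	store := func(n int) func(context.Context) (int, error) {
		return func(context.Context) (int, error) { return n, nil }
	}
	ctx := context.Background()
	one := []*InsertParams{{Kind: "a"}}
	many := []*InsertParams{{Kind: "a"}, {Kind: "b"}, {Kind: "c"}}
	_, _ = runInsert(ctx, stack, one, store(len(one)))
	_, _ = runInsert(ctx, stack, many, store(len(many)))
	return sizes
}

func main() {
	fmt.Println(batchSizes())
}
```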

bgentry · 2024-09-30T01:35:24Z

I know we talked about potentially wanting to have the database transaction available as part of the middleware interface, but I think I've talked myself out of that. With the driver concept it becomes pretty tough to do that cleanly, especially given we don't want the driver interfaces to be considered stable. I also don't think it's needed for anything I'm doing at the moment (#627 seems like the way for me to do database-related customizations).

bgentry · 2024-10-05T20:20:04Z

A version of this was merged in #632 and will be in the next release.

bgentry mentioned this pull request Sep 15, 2024: Bulk unique insertion, uniqueness with subset of args #590 (merged, 9 tasks)

bgentry mentioned this pull request Sep 23, 2024: Unify all bulk inserts to a single code path #610 (merged)

This was referenced Oct 4, 2024: remove deprecated advisory lock uniqueness, consolidate insert logic #614 (merged); Add middleware system for jobs #632 (merged)

bgentry closed this Oct 5, 2024

bgentry deleted the brandur-lifecycle-hooks branch October 5, 2024 20:20
