-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve job list API with more fetching capabilities #16415
Improve job list API with more fetching capabilities #16415
Conversation
cd91a7d
to
a49f985
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review here since I have a question about the API change itself.
@@ -3769,6 +3769,9 @@ components: | |||
$ref: "#/components/schemas/JobConfigType" | |||
configId: | |||
type: string | |||
includingJobId: | |||
description: If the job with this ID exists for the specified connection, returns all jobs created after and including this job, or the full pagination pagesize if that list is smaller than a page. Otherwise, this field is ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels hard to use from a client perspective.
If I call the API using this attribute, the only way to know that the jobId doesn't exist is to check all the results to see if the jobId appears.
I am not sure about your use cases, but returning an empty list if the job doesn't exists feels more straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use case is that someone can link to a particular job by creating a url such as https://cloud.airbyte.io/workspaces/a385d3d9-4512-4a3a-9240-cea3d237c822/connections/aa3b8ef1-2f39-48b8-bad4-dd12b0d3876d/status#575674::0 -- notice the #575674::0
at the end; the first number is the job ID (second is the attempt ID). This type of link can also be generated by clicking the "link" icon button on a job in the UI.
My reasoning here was that if someone correctly supplies a job ID that is in the connection, this will work as expected. Otherwise, they are specifying a job ID that doesn't exist in the connection, so there is nothing for us to do with that #jobId::attemptId
suffix, and we should therefore just ignore that and return the normal job list.
However, your comment has me thinking that it might be a better user experience if we instead showed a specific message indicating that the linked job could not be found, with a button to return them to the normal job list page. With that approach, I think your suggestion of returning an empty list of the job cannot be found makes sense, so I can look into making that change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this to return an empty list if the target job cannot be found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply. Can I voice the opposite opinion here? :D
I feel if we can't find the ID that's passed in, we should just return the specified page size instead and ignore trying to find that job. I think that would play nicer together with the frontend, since we otherwise might want to add additional logic in the FE.
The ID will just be part of the fragment of the URL, a user might accidentally change that to an invalid ID. If you try to load the page like that, we'd now just see an empty page with a "Load more" button. That feels like a bit weird behavior to me, and I think we rather want to just load the 25 first jobs still in this case. We can of course address that in the FE by making another API call if we see the list is empty, but had a totalCount that's > 0, but I feel it would be easier specifying this API in a way that it will return the regular page size if the ID isn't found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better UX. Ignore my comment then
@Override | ||
public Long getJobCount(final Set<ConfigType> configTypes, final String connectionId) throws IOException { | ||
final Result<Record> result = jobDatabase.query(ctx -> ctx.fetch( | ||
"SELECT COUNT(*) FROM jobs WHERE CAST(config_type AS VARCHAR) in " + Sqls.toSqlInFragment(configTypes) + " AND scope = '" + connectionId |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we are not using jooq to generate the query here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not any particularly good reason -- I was mainly just following the trend of the other job query implementations. But using jooq is definitely more maintainable, so I can switch to that instead 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored the full queries that I added into jooq and they look a lot better now. Thanks for the suggestion!
8f080cf
to
d5a946c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -3769,6 +3769,9 @@ components: | |||
$ref: "#/components/schemas/JobConfigType" | |||
configId: | |||
type: string | |||
includingJobId: | |||
description: If the job with this ID exists for the specified connection, returns all jobs created after and including this job, or the full pagination pagesize if that list is smaller than a page. Returns an empty list if this job is specified and cannot be found in this connection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename it starting jobId? It sounds more descriptive than includingJobId
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I actually like startingJobId, I think that is better. I'll make that change 👍
} | ||
|
||
// list of jobs up to target job is larger than pagesize, so return that list | ||
final String jobsSubquery = "(SELECT * FROM jobs WHERE CAST(config_type AS VARCHAR) in " + Sqls.toSqlInFragment(configTypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we convert that to jooq as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that one is a bit harder to convert to jooq, because it is using the jobSelectAndJoin()
method which takes in the subquery and inserts that into the SELECT statement string that it builds. However, I may be able to build the query with jooq and then call something like toSql()
to convert it into a string that can be passed to that method. I'll play around with that for a bit but no guarantees on if I can get it working properly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to convert these to jooq using getSql(ParamType.INLINED)
, so I have pushed that up to this PR as well! Thanks for the suggestion, I think it looks a lot cleaner now
@@ -353,6 +362,39 @@ public List<Job> listJobs(final Set<ConfigType> configTypes, final String config | |||
jobSelectAndJoin(jobsSubquery) + ORDER_BY_JOB_TIME_ATTEMPT_TIME))); | |||
} | |||
|
|||
@Override | |||
public List<Job> listJobsIncludingId(final Set<ConfigType> configTypes, final String connectionId, final long targetJobId, final int pagesize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have use case for listJobsWithStatusIncludingJobId
? I am asking because it feels like we may just want to have a configurable query function rather than specific queries each time.
Some parts such as pagination should be common for example.
If this make sense, we can track it as a task later, doesn't have to be in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe we have a use case for that today -- currently the only thing using listJobsWithStatus
is the JobCleaner which I don't think is even running, and it doesn't have the same need of directly linking to a specific job that the frontend has
c57a400
to
ed71403
Compare
* start implementation of new persistence method * add includingJobId and totalJobCount to job list request * format * update local openapi as well * refactor queries into JOOQ and return empty list if target job cannot be found * fix descriptions and undo changes from other branch * switch including job to starting job * fix job history handler tests * rewrite jobs subqueries in jooq * fix multiple config type querying * remove unnecessary casts * switch back to 'including' and return multiple of page size necessary to include job * undo webapp changes * fix test description * format
* start implementation of new persistence method * add includingJobId and totalJobCount to job list request * format * update local openapi as well * refactor queries into JOOQ and return empty list if target job cannot be found * fix descriptions and undo changes from other branch * switch including job to starting job * fix job history handler tests * rewrite jobs subqueries in jooq * fix multiple config type querying * remove unnecessary casts * switch back to 'including' and return multiple of page size necessary to include job * undo webapp changes * fix test description * format
What
Relates to #16369
As described in the issue linked above, the frontend currently does not have a great way to fetch all jobs up to a specific job ID with the current pagination implementation in the list jobs API endpoint.
This PR aims to solve this by adding a couple of fields to the list jobs API:
includingJobId
is added to theJobListRequestBody
, which will be used to retrieve all jobs created after and including the specified job, so that specific jobs can be linked directly with a URL and the frontend can retrieve all necessary jobs with a single requesttotalJobCount
is added to theJobReadList
response, which contains the total count of jobs in the connection. This solves a tangentially-related issue where the frontend does not know if more jobs can be retrieved for a connection until all of them have been retrieved. Adding a total count value to the list jobs response is a standard approach in pagination to help the frontend understand if there are more jobs to load, and therefore decide whether or not to show theLoad more
button.How
Add the above fields to the open API specification, and implement the logic for them through queries in the DefaultJobPersistence class. The only additional queries being added here are queries to retrieve a single job record and retrieve the count of jobs for a connection, which should both be very quick.
Recommended reading order
airbyte-api/src/main/openapi/config.yaml
- the API changesDefaultJobPersistence.java
- the new query implementationsJobHistoryHandler.java
- where the new queries are used to form the new response🚨 User Impact 🚨
No breaking changes for users - just some new fields being added to the list jobs API which API users can decide to use if they would like.