Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: save dependantJobs as array of stepIds rather than job instances #30

Merged

Conversation

connors511
Copy link
Contributor

I recently started having concerns about the payload size when saving dependantJobs with full job payloads and today I got my first exception for workflow_jobs.job column being too small.
Average size of the job column for my project while testing is almost 20kb, with jobs regularly creeping up towards 40-50kb and this is just for the small workflows/jobs in the project.
This PR reduces the size of the job column by 85% on average for me, but upwards of 95% on my current test data.

What are your thoughts on this?

Signed-off-by: Matthias Larsen <mlarsen@gammamedia.org>
@ksassnowski
Copy link
Owner

I've only looked at this briefly, but it seems like a pretty reasonable change. I'll have a more thorough look at it soon 👍

Just out of curiosity, what was the exception you were getting? Because the job column is a text column, so a few kilobytes really shouldn't be a problem. Or is it Horizon or Redis crashing because the payload is too large?

@connors511
Copy link
Contributor Author

SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'job' at row 1 (SQL: insert into `workflow_jobs` (`job`, `name`, `uuid`, `edges`, `workflow_id`) values

The payload of each job is a bit large, which then quickly escalates when the Workflow contains 20 - 30 jobs and I expect some of my workflows to contain a lot more than 30 jobs.

@connors511
Copy link
Contributor Author

I think there might be an issue with this PR and nested workflow functionality.

A job's dependencies gets saved as $workflowId . '.' . $jobId, resulting in jobs having their list of dependencies as "fully qualified names", where as Workflow@onStepFinished just saves $job->jobId in the finished_jobs array..
Maybe change the finished_jobs to use a Job's stepId instead?

@connors511
Copy link
Contributor Author

A less invasive way might be to change ->withDependencies($this->graph->getDependencies($id)) in WorkflowDefinition@build to something like ->withDependencies(Arr::last(explode('.', $this->graph->getDependencies($id)))) to just get the jobId part 🤔

Signed-off-by: Matthias Larsen <mlarsen@gammamedia.org>
Signed-off-by: Matthias Larsen <mlarsen@gammamedia.org>
@connors511
Copy link
Contributor Author

The latest commit works and jobs are dispatched as before. Should probably add a test or two for the $job->dependencies and $workflow->finished_jobs matching up as expected. I'll see if I can get to it tomorrow.

Signed-off-by: Matthias Larsen <mlarsen@gammamedia.org>
@ksassnowski
Copy link
Owner

I’ll try to have a look at this during the weekend. I haven’t forgotten about this, I’m just swamped with work right now 😅

@connors511
Copy link
Contributor Author

Totally fair, I know the feeling :D
I haven't had a chance to look at #29 yet either

$workflow->onStepFinished($job1);

assertEquals(['::job-id::'], $workflow->refresh()->finished_jobs);
});
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these two tests should be the same test but using a data provider instead. Something like this (untested)

it('stores a finished job\'s id', function ($job, string $expectedJobId) {
    $workflow = createWorkflow([
        'job_count' => 1,
        'jobs_processed' => 0,
    ]);

    $workflow->onStepFinished($job);

    assertEquals([$expectedJobId], $workflow->refresh()->finished_jobs);
})->with([
    'no job id should default to class name' => [
        new TestJob1(),
        TestJob1::class,
    ],
    'use existing job id' => [
        (new TestJob1())->withJobId('::job-id::'),
        '::job-id::',
    |
]);


$model->onStepFinished($job);

assertEquals(NestedWorkflow::class . '.' . TestJob1::class, $job->jobId);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this assertion is unnecessary because you also added another test somewhere that specifically tests for this (ids of nested jobs get namespaced properly)

@ksassnowski
Copy link
Owner

The one thing I don't quite understand about this PR is that there are quite a few changes regarding the jobId. I can't quite figure out that relates back to storing the dependent jobs as stepIds instead of instances. Are these unrelated things or am I missing something?

@ksassnowski
Copy link
Owner

Oh I see you mentioned this in a previous comment. I've also been wondering for the last couple of minutes why I save the jobId instead of the stepId in finished_jobs 😅

Let me see if I can fix this in the main branch so we can hopefully simplify this PR a bit

@ksassnowski ksassnowski added the enhancement New feature or request label May 3, 2021
@ksassnowski ksassnowski self-assigned this May 3, 2021
@ksassnowski ksassnowski added this to the 3.2.0 milestone May 3, 2021
@ksassnowski
Copy link
Owner

ksassnowski commented May 11, 2021

Can you take a look at my last two comments (regarding the tests)? I think I’m going to merge this for now and refactor to using the stepId later.

Signed-off-by: Matthias Larsen <mlarsen@gammamedia.org>
@ksassnowski
Copy link
Owner

Thanks a lot for the contribution! I’ll merge this for now and do a few manual tests locally before tagging a new release 👍

@ksassnowski ksassnowski merged commit ffd637e into ksassnowski:master May 13, 2021
@ksassnowski
Copy link
Owner

This is part of the 3.1.1 release. I had to make one additional change to not break backwards compatibility for steps that were created before this version but hadn’t run yet (cf7fea8)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants