Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MaybeBug/MaybeDoc] Unclear expectations about dependent jobs #108

Open
nulltoken opened this issue Oct 29, 2024 · 17 comments · Fixed by #124
Open

[MaybeBug/MaybeDoc] Unclear expectations about dependent jobs #108

nulltoken opened this issue Oct 29, 2024 · 17 comments · Fixed by #124
Labels
bug Something isn't working

Comments

@nulltoken
Copy link
Collaborator

Describe the bug
Running the same job on different schedules, each schedule having different sub dependent jobs doesn't seem to work.

I've tried to expose this through a test that seems to put the "issue/expectation" under the light.

⚠️ I've been fighting a lot with the integration test infrastructure (ChannelWriter , ...). And I often lost. My understanding of it is very light. So that test may be badly written or failing for the wrong reasons.

I've tried to model something that may exist in real life: Regular internal reporting vs longer delayed external reporting.

As a side note, I've haven't been able to model a test simulating a daily vs a monthly execution and would be interested to know more about how to achieve this.

[Fact]
public async Task Hrmpf()
{
    string constantly = "* * * * *";
    string monthly = "0 0 1 * *";

    FakeTimer.SetUtcNow(DateTimeOffset.Parse("2024-10-25T16:55Z", CultureInfo.InvariantCulture));

    var storage = new DepStorage();
    ServiceCollection.AddSingleton(storage);
    ServiceCollection.AddNCronJob(n =>
    {
        n.AddJob<AnalysisJob>(b => b.WithCronExpression(constantly))
            .ExecuteWhen(success: s => s.RunJob<ReportToProductOwnerJob>());

        n.AddJob<AnalysisJob>(b => b.WithCronExpression(monthly))
            .ExecuteWhen(success: s => s.RunJob<ReportToStakeholdersJob>());
    });

    var provider = CreateServiceProvider();
    await provider.GetRequiredService<IHostedService>().StartAsync(CancellationToken);

    FakeTimer.Advance(TimeSpan.FromMinutes(2));

    await Task.WhenAll(GetCompletionJobs(2));
    string.Join(' ', storage.InvokedJobs).ShouldNotContain(nameof(ReportToStakeholdersJob));
}

private sealed class DepStorage
{
    public ConcurrentBag<string> InvokedJobs { get; } = [];
}

private sealed class AnalysisJob(ChannelWriter<object> writer, DepStorage storage) : IJob
{
    public async Task RunAsync(IJobExecutionContext context, CancellationToken token)
    {
        storage.InvokedJobs.Add(context.JobType!.Name);
        await writer.WriteAsync(true, token);
    }
}

private sealed class ReportToProductOwnerJob(ChannelWriter<object> writer, DepStorage storage) : IJob
{
    public async Task RunAsync(IJobExecutionContext context, CancellationToken token)
    {
        storage.InvokedJobs.Add(context.JobType!.Name);
        await writer.WriteAsync(true, token);
    }
}

private sealed class ReportToStakeholdersJob(ChannelWriter<object> writer, DepStorage storage) : IJob
{
    public async Task RunAsync(IJobExecutionContext context, CancellationToken token)
    {
        storage.InvokedJobs.Add(context.JobType!.Name);
        await writer.WriteAsync(true, token);
    }
}

Current behavior
ReportToProductOwnerJob is invoked every minute.
ReportToStakeholdersJob is invoked every minute.

Expected behavior
I can think of 3 options (from "best" to "worse" from the user standpoint):

  1. Make it work

    • ReportToProductOwnerJob is invoked every minute.
    • ReportToStakeholdersJob is invoked every month.
  2. Prevent it from happening

    • Make AddNCronJob() throw a NotSupportedException bearing an explicit message when this kind of usage is detected
  3. Document that it's not currently supported

    • Add a "Known gotchas" section in the documentation
    • Propose an alternative design that would allow this kind of use case

Although "Make it work" would obviously be the preferred option, my current endeavor in #106 leads me to think that this might not be a one-liner fix.

Version information

  • NCronJobVersion: 3.3.2
  • .NET runtime version: irrelevant
@linkdotnet linkdotnet added the bug Something isn't working label Oct 29, 2024
@linkdotnet
Copy link
Member

That is an interesting case and for sure a bug! Your first given option should be what is happening. That is what the user configured.

I understand the test-setup is rather complex. The issue at the moment is, that we don't have any means of looking inside the state (when in a test). Therefore, the channel writes something, and the tests listen. So that we know for sure, something was triggered.

@linkdotnet
Copy link
Member

I guess I do have an idea for fix:
Your example can even be more convoluted:

Services.AddNCronJob(s => s.AddJob<...>(p => p.WithCronExpression(...).WithCronExpression(...)).ExecuteWhen()

Here I want to execute the job for both cron expressions

@linkdotnet
Copy link
Member

Plus:

Servuces.AddNCronJob(s => s.AddJob<..>().ExecuteWhen();

It should also work. So if the user doesn't define any parameters, it should work either way (think of Instant Jobs),

@nulltoken
Copy link
Collaborator Author

I guess I do have an idea for fix:

I was also toying with something on my side.

We currently link by Job type.

Now that all the interesting pieces are more or less in JobRegistry, we could link by JobDefinition instance (and use it as the key to the dictionary of DependentJobRegistryEntrys.

@linkdotnet
Copy link
Member

I currently have a working version that does exactly that. But that leads to the following failing test:

[Fact]
public async Task WhenJobWasSuccessful_DependentAnonymousJobShouldRun()
{
    Func<ChannelWriter<object>, JobExecutionContext, Task> execution = async (writer, context) => await writer.WriteAsync($"Parent: {context.ParentOutput}");
    ServiceCollection.AddNCronJob(n => n.AddJob<PrincipalJob>()
        .ExecuteWhen(success: s => s.RunJob(execution)));
    var provider = CreateServiceProvider();
    await provider.GetRequiredService<IHostedService>().StartAsync(CancellationToken);

    provider.GetRequiredService<IInstantJobRegistry>().ForceRunInstantJob<PrincipalJob>(true);

    var result = await CommunicationChannel.Reader.ReadAsync(CancellationToken) as string;
    result.ShouldBe("Parent: Success");
}

Basically: A job that does get triggered directly. Still, at least that is my assumption, dependent jobs, should be executed.
Of course we could also argue, that isn't part of the deal - but that would be a breaking change and I would like to keep the old behavior.

@linkdotnet
Copy link
Member

By the way, a very simple test in regards to your original message might be:

[Fact]
public async Task ConfiguringDifferentDependentJobsForSchedulesShouldResultInIndependentRuns()
{
    ServiceCollection.AddNCronJob(n =>
    {
        n.AddJob<PrincipalJob>(s => s.WithCronExpression("1 0 1 * *").WithParameter(true))
            .ExecuteWhen(s => s.RunJob((ChannelWriter<object> writer) => writer.WriteAsync("1").AsTask()));
        n.AddJob<PrincipalJob>(s => s.WithCronExpression("1 0 2 * *").WithParameter(true))
            .ExecuteWhen(s => s.RunJob((ChannelWriter<object> writer) => writer.WriteAsync("2").AsTask()));
    });
    var provider = CreateServiceProvider();
    await provider.GetRequiredService<IHostedService>().StartAsync(CancellationToken);
    using var timeoutToken = new CancellationTokenSource(3000);
    using var linkedToken = CancellationTokenSource.CreateLinkedTokenSource(timeoutToken.Token, CancellationToken);

    FakeTimer.Advance(TimeSpan.FromMinutes(1));

    var content = await CommunicationChannel.Reader.ReadAsync(linkedToken.Token);
    content.ShouldBe("1");

    FakeTimer.Advance(TimeSpan.FromDays(1));
    content = await CommunicationChannel.Reader.ReadAsync(linkedToken.Token);
    content.ShouldBe("2");
}

@linkdotnet
Copy link
Member

Down to one failing test

@linkdotnet
Copy link
Member

This one -.-

[Fact]
public async Task CanBuildAChainOfDependentJobs()
{
    ServiceCollection.AddNCronJob(n =>
    {
        n.AddJob<PrincipalJob>().ExecuteWhen(success: s => s.RunJob<DependentJob>());
        n.AddJob<DependentJob>().ExecuteWhen(success: s => s.RunJob<DependentDependentJob>());
    });
    var provider = CreateServiceProvider();
    await provider.GetRequiredService<IHostedService>().StartAsync(CancellationToken);

    provider.GetRequiredService<IInstantJobRegistry>().ForceRunInstantJob<PrincipalJob>(true);

    using var timeoutToken = new CancellationTokenSource(2000);
    using var linkedToken = CancellationTokenSource.CreateLinkedTokenSource(timeoutToken.Token, CancellationToken);
    var result = await CommunicationChannel.Reader.ReadAsync(linkedToken.Token) as string;
    result.ShouldBe("Me:  Parent: Success");
    FakeTimer.Advance(TimeSpan.FromMinutes(1));
    result = await CommunicationChannel.Reader.ReadAsync(linkedToken.Token) as string;
    result.ShouldBe("Dependent job did run");
}

But I could move some stuff out of the JobRegistry thanks to your work

@nulltoken
Copy link
Collaborator Author

nulltoken commented Oct 29, 2024

Basically: A job that does get triggered directly. Still, at least that is my assumption, dependent jobs, should be executed.
Of course we could also argue, that isn't part of the deal - but that would be a breaking change and I would like to keep the old behavior.

⚠️ Thinking out loud

I'm starting to wonder whether Job might be a too generic concept with regards to this library.

n.AddJob<AnalysisJob>(b => b.WithCronExpression(constantly))
      .ExecuteWhen(success: s => s.RunJob<ReportToProductOwnerJob>());

n.AddJob<AnalysisJob>(b => b.WithCronExpression(monthly))
      .ExecuteWhen(success: s => s.RunJob<ReportToStakeholdersJob>());

In the example above, we've got one Job and two schedules.
That generates two JobDefinitions (an internal concept that the user knows nothing about).
Each of them linking to other Jobs...

2024-10-29 20_03_58-Woody And Buzz Lightyear Everywhere Meme Template — Kapwing — Mozilla Firefox

When triggering provider.GetRequiredService<IInstantJobRegistry>().ForceRunInstantJob<AnalysisJob>();, which JobDefinition should we actually activate?

Should we blindly trigger one of them, all of them?
Should we prevent the user from triggering this when there's an ambiguity?
Would we need a way (naming?) to let the user disambiguate those two convoys?

Untitled

😉

@linkdotnet
Copy link
Member

Yes - I am running down the same rabbit hole! There are plenty of similar questions in my head (mainly all of them very exceptional / edge cases):

ServiceCollection.AddNCronJob(n =>
{
    n.AddJob<PrincipalJob>().ExecuteWhen(success: s => s.RunJob<DependentJob>("PARAM"));
    n.AddJob<DependentJob>(p => p.WithCronExpression("*/2 * * * *")).ExecuteWhen(success: s => s.RunJob<DependentDependentJob>());
    n.AddJob<DependentJob>(p => p.WithCronExpression("* * * * *")).ExecuteWhen(success: s => s.RunJob<AnotherJob>());
});

If I run PrincipalJob - what jobs are triggered here? Only DependentJob? And after that DependentDependentJob?
Or not?

If I just add a n.AddJob<DependentJob>().ExecuteWhen(s => s.RunJob<TotallyDifferent>)()... then for sure this one, not? :D

The approach via typeof was simple and nice :D

@linkdotnet
Copy link
Member

The simplest way to cover almost all, is to allow a standalone ExecuteWhen that takes a job name!
Job names have to be unique - but they wouldn't cover instant jobs for example.

@nulltoken
Copy link
Collaborator Author

nulltoken commented Oct 29, 2024

Considering most of those are edge cases and as of now everything lives in the JobRegistry, there could be a path where the lib would analyze the trees of potential executions and issue some warnings when ambiguities are identified.

With regards to instant jobs, as their definition live outside of AddNCronJob, there's little the analyzer could do about them.

However, a breaking change in the API may potentially restrict what a InstantJob execution could accept and (maybe) help resolve this.

  • Pure delegates would of course work
  • "named" convoys (that would be defined in AddNCronJob)

This whole thing somehow reminds me of the constraints of a DI container...

  • AddNCronjob 'building' it and ensuring it's safe and sound.
  • And InstantJob execution being the resolving side of it.

When one register two services through their interfaces ("A job with different cron expressions or twice the same job with different ExecuteWhen"), how does a container react when it's supposed to instantiate a type that only accepts an interface as a constructor parameter? I believe this somehow to answer that question that dotnet came around with the keyed services (hence the "named convoy" thingie above).

(Please keep in mind that my knowledge of the lib and all of its use cases is very superficial. So please consider everything I write with a grain of salt)

@linkdotnet
Copy link
Member

Good points overall - I am still struggling what is the most intuitive way for people. What are they really expecting?
How would chaining work? ...

But maybe keyed services might help here. Thanks for raising that issue. It kept me thinking alot

@nulltoken
Copy link
Collaborator Author

FWIW, it's not a "production" issue on my side. Just one thing I've stumbled upon while working on #106.

Once JobRegistry was assembled, this typed based approach downside somehow raised its head and drove me in "whatifs" scenarios.

Knowing a bit more of the inner workings, would I need to achieve this kind of model, I'd know how to temporarily walk around this current dark corner.

So, no pressure 😉

@linkdotnet
Copy link
Member

I love that you make your thoughts for yourself and basically gift us your precious time! Really appreciated.

I really have to think about the issue and/or if I just document this current behavior at the very least.

@nulltoken
Copy link
Collaborator Author

Reopened as only partially addressed through #124

@nulltoken
Copy link
Collaborator Author

nulltoken commented Nov 3, 2024

A follow up on #128 (comment) which belongs more to this issue:

The context of dependent jobs should not be considered in the scope of AddNCronJob() only, but also with the RuntimeRegistry in mind. Some additional corner cases may pop up in that light.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants