-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Retry Mechanism and Concurrency Control for Jobs #21
Conversation
A new `TestCancellationJob` class has been added to simulate a long-running job and handle job cancellation. Everything was made to support async structures to better support error handling and cancellation support. Added preliminary parallel execution support.
… beyond the loop scope
Thanks for that PR! Really good job. Before going on a code base, I wanted to address some of the features. I very much like the features you provided but my experience pushes me into a different direction on sensible defaults than you took. So the following is just about that - not about the features itself, they are very welcome.
In your current implementation retry is "always-on" without an option to opt-out. In general I would argue making this feature opt-in is the more sensible choice. That is how many of the .NET objects work and what I would expect as well.
That is an interesting take, because my gut feeling would also go the opposite direction. Make every job concurrent by default and opt-out of this feature. And here I lean towards the same argumentation as above, it feels more natural (to me). |
I agree with you about the defaults, I did not integrate configuration bindings as this was more of an initial concept. However most orchestration systems default to retries because they assume the system is deterministic and Jobs are idempotent. Since neither of those are ensured here we can opt-in instead.
I understand the natural concurrency for tasks like generating reports or sending e-mails, but this is more about risk-mitigation. The default non-concurrent setting is a safer approach in a non-deterministic environment where shared resources or external states are involved. This reduces the risks of race conditions and unintended side effects. Ensuring that concurrency is only enabled after a developer explicitly verifies it helps prevent complex debugging issues, making the system more robust and easier to maintain. Funny enough the reason I added this to this PR was actually because I hit debugging issues while testing the cancellation support and having many of the same type Jobs running in parallel. |
That is a fair point. Let's go with your approach and if we see that the usage is way different, we can still flip it in the next major version. |
If you prefer I can add a global optIn optOut approach. For example: services.AddNCronJob(builder =>
{
// Use this if you assume most jobs are safe to run concurrently and exceptions are explicitly handled
builder.OptInConcurrency();
});
services.AddNCronJob(builder =>
{
// Use this if you assume most jobs should not run concurrently and only specific jobs are safe to handle it
builder.OptOutConcurrency();
}); or services.AddNCronJob(builder =>
{
...
}).OptOutConcurrency(); But this would have a minor breaking change since AddNCronJob would not return an Then decorate the Job according to how you set your global: [SupportsConcurrency(true)]
public class ConcurrentJob : IJob
{
// Implementation allowing concurrency
}
[SupportsConcurrency(false)]
public class NonConcurrentJob : IJob
{
// Implementation disallowing concurrency
} So if you use |
I do prefer the attribute (without any boolean flag) over the other given approaches. We can think of configuring this via the configuration builder: Services.AddNCronJob(o =>
{
o.AddJob<MySampleJob>().WithCronExpression("...").AllowMultipleInstances();
}); |
@linkdotnet The challenge with that approach is that the dev can easily do the following (because you're not making the concurrency support inherently a part of the Job): builder.Services.AddNCronJob(o =>
{
o.AddJob<MySampleJob>().WithCronExpression("*/10 * * * *").AllowMultipleInstances();
o.AddJob<MySampleJob>().WithCronExpression("*/25 * * * *");
}); It's the same |
That is fair. Given that we already settled on the attribute, I'd propose utilizing this to control the concurrency level: [SupportsConcurrency(MaxDegreeOfParallelism = 3)]
public class MyJob : IJob That would also remove the need for the |
I think you're mixing the global concurrency with the Job specific concurrency (basically it's 1 job with many triggers). Your suggestion of putting it in the attribute means it is specific only to that job. However the way I had coded the ConcurrencyConfig.MaxDegreeOfParallelism was to manage the concurrency of the Scheduler within the application, not the Job. Either way I think properly supporting concurrency for both individual Jobs and globally can be a little challenging, but I think I have something you can test |
Indeed I did - thanks for the clarification. |
All in all, this PR goes in a really good direction - I really appreciate your work and discussion, @falvarez1 ! My open ToDo's for now:
Also thanks for keeping the API stable so we don't have to do a major version bump. If you have time you can also extend the |
Glad you said that because I already had that in mind to submit once this was approved ;-) I'll include it in this one instead.
I agree. I don't know the best way to isolate each use case as a separate example without literally creating a separate sample project. I'll clean it up as best I can but open to suggestions.
Certainly, just waiting to finalize the design first. Don't want to keep changing them while it's in flux.
I'll give it shot (both README and CHANGELOG).
No worries. I do have other ideas but they may require changes to the API definition. Can discuss separately. |
I do agree - it might be tricky. I do think we need a separate documentation page.
That makes absolutely sense - quite frankly I do think we are almost there.
Keep it coming ;) - anyway thanks again for the hard work! |
f7a5315
to
a49372b
Compare
…nal changes to job execution, retry logic, and concurrency management within the system: - **Job and Retry Logic Refactoring:** - Moved `retryHandler` from `CronScheduler` to `JobExecutor` to centralize execution logic. - Consolidated all retry policy-related code into the new `RetryPolicies` folder. - Added `RetryPolicy` attributes with two strategies: exponential backoff and fixed interval. - **Concurrency Enhancements:** - Introduced `SupportsConcurrency` attribute with a `MaxDegreeOfParallelism` parameter to finely control job execution. - Replaced `ConcurrencyConfig` class with `ConcurrencySettings`, incorporating a `MaxDegreeOfParallelism` property. - Updated `CronScheduler` and `NCronJobOptionBuilder` to utilize `ConcurrencySettings`. - Added preliminary support for priority-based queuing of jobs to improve execution efficiency. - **Testing and Samples:** - Added new jobs in `NCronJobSample` project to showcase concurrency capabilities. - Updated sample project configuration, including removal of machine-specific settings and adding `ConcurrencySettings` to `.editorconfig`. - Enhanced `TestCancellationJob` to support a `MaxDegreeOfParallelism` of 10 and added multiple job instances for testing. - **Project Structure and Cleanup:** - Moved job definition files into a dedicated `Jobs` folder. - Removed obsolete pragma directives and updated project settings to reflect new structure and focus. - **Additional Features and Fixes:** - Added auto-detection of precision in `WithCronExpression` when the second parameter is omitted. - Added default values to `SupportsConcurrencyAttribute` to simplify job configuration. - Implemented thread safety in `CronScheduler.cs` to ensure reliable operation under concurrent access. - CronRegistry.RunInstantJob now has a void return type, aligning with expected behavior.
…tionalities and associated tests. Key changes include: - **Retry Attributes**: Minor updates to retry attributes to improve reliability. - **Retry Logic Cleanup**: Cleaned up the retry code and added two new tests to ensure functionality. - **Time Provider Fix**: Resolved issues in TimeProvider that affected tests, ensuring accurate time simulation. - **Retry Issue Resolution**: Addressed specific bugs in the retry mechanisms that affected process stability. - **Job Execution Context**: Changed JobExecutionContext to be an instance of Job Run rather than the Job itself, clarifying execution context. - **Thread Safety in JobOptionBuilder**: Enhanced thread safety and added a test to verify concurrency handling in JobOptionBuilder. - **CronExpression Tests**: Fixed previously broken tests related to CronExpressions and added additional tests to cover new cases. - **optimize scheduler: use precomputed values for cron occurrence
…CronJob-Dev#24) * Use GetRequiredService instead of GetService and added tests * Removed redundant test * Added entry in changelog * Remove empty line in NCronJobIntegrationTests.cs * Update CHANGELOG.md Co-authored-by: Steven Giesel <stgiesel35@gmail.com> --------- Co-authored-by: Steven Giesel <stgiesel35@gmail.com>
…astructure: - **Test Fixes and Enhancements:** - Fixed all existing tests to ensure they are passing and accurately reflecting intended behavior. - Introduced a new version of `WaitForJobsOrTimeout` that allows time advancement for each run, facilitating more precise control over test timing and behavior.
a49372b
to
fd8186c
Compare
There are small ordering things inside the README.md I'd like to move around (Migration not in between features for example) - but nothing that blocks merging. If there are no open points from your side, I'd probably Squash and merge that thing. I'd directly release a preview version for folks to test. |
I agree, I think this needs a little more user testing. I also want to note that some other features I plan to contribute build upon this work, so figuring out a clear path forward would be beneficial for everyone. I suggest creating a
Would it also help to put a ToC at the top that links to the sections? |
Basically,
That would be one thing to do - I plan on having a bit more detailed documentation that gets pushed to the GitHub Page (or buying a proper domain name in the long term - if there is an interest in the library). |
@falvarez1 I would merge the changes into |
No other points, I'm good with the merge, thanks. |
Overview
This pull request introduces two major enhancements to NCronJob:
Details
SupportsConcurrency
attribute introduced to mark job classes that can run concurrently.Justification