The Map-based job repository was never intended for production use. However, even though this is clearly documented, people use (or, more precisely, misuse) it in production and complain about thread-safety and performance issues.
When there is no need to persist metadata, we have always recommended using the JDBC-based job repository with an in-memory database. See the following items on Stack Overflow:
The Map-based job repository suffers from many drawbacks:
1. Lack of Thread Safety
Even though thread-safe data structures back some Map-based DAOs, this job repository is not safe to use in a multi-threaded job with splits, as mentioned in the Javadoc.
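For reference, a split is exactly the kind of configuration that triggers this issue: two flows updating the same job execution from different threads. A minimal sketch (bean names and the two sub-flows are placeholders, not from the original report):

```java
// Hypothetical split configuration: flow1 and flow2 run concurrently.
// With the Map-based job repository, the concurrent updates to the shared
// job execution coming from both branches are not safe.
@Bean
public Job parallelJob(JobBuilderFactory jobs, Flow flow1, Flow flow2) {
    Flow splitFlow = new FlowBuilder<SimpleFlow>("splitFlow")
            .split(new SimpleAsyncTaskExecutor()) // each branch gets its own thread
            .add(flow1, flow2)
            .build();
    return jobs.get("parallelJob")
            .start(splitFlow)
            .end()
            .build();
}
```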
2. Poor Performance
The Map-based job repository is very slow in a partitioned step. This is due to the jobRepository.saveAll(stepExecutions) call in StepExecutionSplitter, which takes 20+ minutes (and fails with OOM, even with -Xmx8g) for 5000 partitions, versus only 0.42 seconds when using the JDBC job repository with an embedded H2 database (see the attached benchmark [1]). It is also due to the creation of several copies of step and job execution data through reflection, as well as the serialization and deserialization of execution contexts.

Using a partitioned step is a very common use case, and many people have been hit by this performance issue (for example, see "Step initialization time too long using Partitioner in Spring-Batch?").
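For context, the manager-step shape that triggers that saveAll call looks roughly like the following sketch (grid size and bean names are illustrative, not from the benchmark):

```java
// Illustrative partitioned step: the step execution splitter creates one
// StepExecution per partition and the job repository persists them all
// up front, before any worker step runs.
@Bean
public Step managerStep(StepBuilderFactory steps, Partitioner partitioner, Step workerStep) {
    return steps.get("managerStep")
            .partitioner("workerStep", partitioner)
            .gridSize(5000) // one StepExecution per partition, all saved at once
            .step(workerStep)
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .build();
}
```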
3. Inflexibility
Since the Map-based job repository is the default, some people continue using it in a 24/7 running JVM with all jobs running in it. This leads to huge memory consumption by the job repository, and people start cleaning metadata older than a given date or removing the metadata for a specific job.
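With a JDBC-backed repository, that kind of targeted cleanup is plain SQL against the metadata tables. A sketch using JdbcTemplate (table and column names come from the standard Spring Batch schema; the deletion order is an assumption made to satisfy the foreign-key constraints):

```java
// Sketch: remove metadata for job executions created before a cutoff date.
// Child tables are cleared first so that foreign-key constraints hold.
public void purgeOldMetadata(JdbcTemplate jdbcTemplate, LocalDateTime cutoff) {
    Timestamp ts = Timestamp.valueOf(cutoff);
    jdbcTemplate.update("DELETE FROM BATCH_STEP_EXECUTION_CONTEXT WHERE STEP_EXECUTION_ID IN " +
            "(SELECT SE.STEP_EXECUTION_ID FROM BATCH_STEP_EXECUTION SE " +
            " JOIN BATCH_JOB_EXECUTION JE ON SE.JOB_EXECUTION_ID = JE.JOB_EXECUTION_ID " +
            " WHERE JE.CREATE_TIME < ?)", ts);
    jdbcTemplate.update("DELETE FROM BATCH_STEP_EXECUTION WHERE JOB_EXECUTION_ID IN " +
            "(SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?)", ts);
    jdbcTemplate.update("DELETE FROM BATCH_JOB_EXECUTION_CONTEXT WHERE JOB_EXECUTION_ID IN " +
            "(SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?)", ts);
    jdbcTemplate.update("DELETE FROM BATCH_JOB_EXECUTION_PARAMS WHERE JOB_EXECUTION_ID IN " +
            "(SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?)", ts);
    jdbcTemplate.update("DELETE FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?", ts);
    // Finally, drop job instances that no longer have any executions.
    jdbcTemplate.update("DELETE FROM BATCH_JOB_INSTANCE WHERE JOB_INSTANCE_ID NOT IN " +
            "(SELECT JOB_INSTANCE_ID FROM BATCH_JOB_EXECUTION)");
}
```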
This is impossible with the Map-based job repository, as it provides a single clear() method that wipes the entire entity graph. It is, however, possible with an in-memory database (you can get a handle to the datasource and run any deletion query).
4. Incompatibility with Spring Boot
In Spring Batch, we claim that you can run a job without a datasource. However, Spring Boot requires one.
This inconsistency is confusing and leads to a poor user experience on start.spring.io: people come from the batch world (with the datasource being optional in mind), want to migrate to Boot, download a project with only the Batch dependency, and expect things to work out of the box. Unfortunately, this is not the case; see example 1 and example 2.
5. Confusing Configuration
@EnableBatchProcessing does a good job of setting up batch artifacts, including the default Map-based job repository. However, it does so only when the application context does not contain a DataSource bean. As soon as you have a datasource but do not want to use it for batch metadata, things become complicated and confusing to many people, even though the documentation says to use a custom BatchConfigurer in this case.
"How can I use the Map-based job repository with an application context that contains a DataSource?" is one of the most frequently asked questions on Stack Overflow/GitHub/Gitter:
It is concerning that people end up with an ugly empty setter for the datasource:
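The workaround that circulates looks something like the following (a sketch of the anti-pattern, not a recommendation; the class name is illustrative):

```java
// Anti-pattern seen in the wild: extend DefaultBatchConfigurer and swallow
// the datasource, so that the Map-based job repository is created even
// though the application context contains a DataSource bean.
@Configuration
public class NoPersistenceBatchConfigurer extends DefaultBatchConfigurer {

    @Override
    public void setDataSource(DataSource dataSource) {
        // intentionally left empty: ignore the application datasource
        // so that batch metadata is not persisted to it
    }
}
```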
Conclusion
In sum, the Map-based job repository (and all the Map-based DAOs behind it) are creating more problems than they solve.
For all these reasons, we plan to deprecate them in v4.3 and remove them in v5.
Now, what is the alternative? The alternative is to use the JDBC-based job repository with an in-memory database. For production, this should not be an issue: any production-grade application should already define a DataSource that can be used for batch processing. If you have no need to persist or use batch metadata, you can always define another embedded datasource and use it for batch (Spring Boot provides @BatchDataSource to make doing so easy) or provide a "NoOp" implementation of the JobRepository interface (as long as it honors the contract). For testing and prototyping, you can use an embedded database (with Boot, this is as simple as putting one of the supported embedded databases on the classpath) or a containerized one (using testcontainers.org, for instance).
[1] Job repository benchmark: JobRepositoryBenchmark.zip
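As a sketch of the Boot-friendly alternative, a second embedded database can be dedicated to batch metadata via Spring Boot's @BatchDataSource, leaving the main DataSource for business data (bean name and explicit schema script are illustrative; Boot can also initialize the schema itself):

```java
// Sketch: dedicate an embedded H2 database to batch metadata. The
// @BatchDataSource qualifier tells Spring Boot to use this datasource
// for the job repository instead of the main application DataSource.
@Bean
@BatchDataSource
public DataSource batchDataSource() {
    return new EmbeddedDatabaseBuilder()
            .setType(EmbeddedDatabaseType.H2)
            .addScript("/org/springframework/batch/core/schema-h2.sql") // standard metadata schema
            .generateUniqueName(true)
            .build();
}
```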