
Deprecate the Map-based JobRepository/JobExplorer implementations #3780

Closed
fmbenhassine opened this issue Sep 16, 2020 · 0 comments

The Map-based job repository was never intended for production use. However, even though this is clearly documented, people use (or, more precisely, misuse) it in production and complain about thread-safety and performance issues.

When there is no need to persist metadata, we have always recommended using the JDBC-based job repository with an in-memory database; this is a recurring recommendation in our answers on Stack Overflow.
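As an illustration, here is a minimal sketch of that recommendation (the class and bean names are ours; `JobRepositoryFactoryBean` and the `schema-h2.sql` script are the ones shipped with Spring Batch):

```java
import javax.sql.DataSource;

import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class InMemoryJdbcRepositoryConfiguration {

    // embedded H2 database initialized with the batch metadata schema
    @Bean
    public DataSource batchDataSource() {
        return new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.H2)
                .addScript("/org/springframework/batch/core/schema-h2.sql")
                .generateUniqueName(true)
                .build();
    }

    // JDBC-based job repository on top of the in-memory database:
    // metadata stays in memory, but goes through the supported JDBC code path
    @Bean
    public JobRepository jobRepository(DataSource batchDataSource,
            PlatformTransactionManager transactionManager) throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(batchDataSource);
        factory.setTransactionManager(transactionManager);
        factory.afterPropertiesSet();
        return factory.getObject();
    }
}
```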

The Map-based job repository suffers from many drawbacks:

1. Lack of Thread Safety

Even though some of the Map-based DAOs are backed by thread-safe data structures, this job repository is not safe to use in a multi-threaded job with splits, as mentioned in its Javadoc.

2. Poor Performance

The Map-based job repository is very slow in a partitioned step. This is due to the jobRepository.saveAll(stepExecutions) call in StepExecutionSplitter, which takes 20+ minutes (and fails with an OOM error, even with -Xmx8g) for 5000 partitions, versus only 0.42 seconds with the JDBC job repository and an embedded H2 database (see attached benchmark [1]). It is also due to the creation of several copies of step and job execution data through reflection, and to the serialization and deserialization of execution contexts.

Using a partitioned step is a very common use case, and many people have been hit by this performance issue (for example, see "Step initialization time too long using Partitioner in Spring-Batch?").

3. Inflexibility

Since the Map-based job repository is the default, some people keep using it in a 24/7 running JVM with all their jobs in it. This leads to huge memory consumption by the job repository, so people eventually want to clean up metadata older than a given date or remove the metadata of a specific job.

This is impossible with the Map-based job repository, as it provides a single clear() method that wipes the entire entity graph. It is, however, possible with an in-memory database (you can get a handle to the DataSource and run any deletion query).
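For example, removing the metadata of a single job is a plain JDBC matter once a DataSource is available. A hedged sketch (the class and method names are ours; the BATCH_ table names and their parent/child relationships come from the standard Spring Batch schema):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;

public class BatchMetadataCleaner {

    // child tables first, so the foreign keys of the standard BATCH_ schema
    // are honored; the remaining child tables (BATCH_STEP_EXECUTION_CONTEXT,
    // BATCH_STEP_EXECUTION, BATCH_JOB_EXECUTION_CONTEXT,
    // BATCH_JOB_EXECUTION_PARAMS) would be cleaned the same way before the
    // two statements below
    static final String[] DELETE_STATEMENTS = {
        "DELETE FROM BATCH_JOB_EXECUTION WHERE JOB_INSTANCE_ID IN "
            + "(SELECT JOB_INSTANCE_ID FROM BATCH_JOB_INSTANCE WHERE JOB_NAME = ?)",
        "DELETE FROM BATCH_JOB_INSTANCE WHERE JOB_NAME = ?"
    };

    // deletes all metadata of the given job from the batch DataSource
    public static void deleteJobMetadata(DataSource dataSource, String jobName) throws Exception {
        try (Connection connection = dataSource.getConnection()) {
            for (String sql : DELETE_STATEMENTS) {
                try (PreparedStatement statement = connection.prepareStatement(sql)) {
                    statement.setString(1, jobName);
                    statement.executeUpdate();
                }
            }
        }
    }
}
```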

4. Incompatibility with Spring Boot

In Spring Batch, we claim that you can run a job without a data source. However, Spring Boot requires a DataSource.

This inconsistency is confusing and leads to a poor user experience on start.spring.io: people coming from the batch world (where the DataSource is optional) who want to migrate to Boot download a project with only the Batch dependency and expect things to work out of the box. Unfortunately, this is not the case; see example 1 and example 2.

5. Confusing Configuration

@EnableBatchProcessing does a good job of setting up batch artifacts, including the default Map-based job repository. However, it does so only when the application context does not contain a DataSource bean. As soon as you have a DataSource but do not want to use it for batch metadata, things become complicated and confusing for many people, even though the documentation says to use a custom BatchConfigurer in this case.
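A sketch of that documented approach (the class name is ours; DefaultBatchConfigurer and its DataSource constructor are from Spring Batch 4):

```java
import org.springframework.batch.core.configuration.annotation.DefaultBatchConfigurer;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;

// routes batch metadata to a dedicated embedded database, leaving the
// application's main DataSource untouched
@Configuration
public class DedicatedBatchConfigurer extends DefaultBatchConfigurer {

    public DedicatedBatchConfigurer() {
        super(new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.H2)
                .addScript("/org/springframework/batch/core/schema-h2.sql")
                .build());
    }
}
```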

"How can I use the Map-based job repository with an application context that contains a DataSource?" is one of the most frequently asked questions on Stack Overflow, GitHub, and Gitter.

It is concerning that people end up resorting to an ugly empty setter for the DataSource.
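For reference, that workaround typically looks like the following sketch (the class name is ours): overriding DefaultBatchConfigurer.setDataSource with an empty body so that no DataSource ever reaches the batch configuration.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.DefaultBatchConfigurer;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MapRepositoryBatchConfigurer extends DefaultBatchConfigurer {

    @Override
    public void setDataSource(DataSource dataSource) {
        // intentionally left empty: with no DataSource set, the Map-based
        // job repository is used even though the context contains a
        // DataSource bean
    }
}
```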

Conclusion

In sum, the Map-based job repository (and all the Map-based DAOs behind it) creates more problems than it solves.
For all these reasons, we plan to deprecate it in v4.3 and remove it in v5.

Now what is the alternative? The alternative is to use the JDBC-based job repository with an in-memory database. For production, this should not be an issue: any production-grade application should already define a DataSource that can be used for batch processing. If you have no need to persist or use batch metadata, you can always define another, embedded DataSource and use it for batch (Spring Boot provides @BatchDataSource to make doing so easy), or provide a "NoOp" implementation of the JobRepository interface (as long as it honors the contract). For testing and prototyping, you can use an embedded database (with Boot, this is as simple as putting one of the supported embedded databases on the classpath) or a containerized one (using testcontainers.org, for instance).
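With Boot, the dedicated-DataSource option can be sketched as follows (the class and bean names are ours; @BatchDataSource marks the DataSource that Boot's batch auto-configuration should use, and it assumes the application also defines a primary DataSource for its own data):

```java
import javax.sql.DataSource;

import org.springframework.boot.autoconfigure.batch.BatchDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;

@Configuration
public class BatchDataSourceConfiguration {

    // secondary, embedded DataSource reserved for batch metadata;
    // the application's primary DataSource remains free of BATCH_ tables
    @Bean
    @BatchDataSource
    public DataSource batchDataSource() {
        return new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.H2)
                .addScript("/org/springframework/batch/core/schema-h2.sql")
                .generateUniqueName(true)
                .build();
    }
}
```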


[1] Job repository benchmark: JobRepositoryBenchmark.zip
