Replies: 1 comment
Apologies for the late reply on this; I was on a long leave. I saw your original edited message. The processor should run for every item in the chunk, and the writer should run once per chunk. Now in your example, an item is defined as a list of items, so you should expect the processor to be executed once per list (i.e. per item) and the writer once a list of lists has been accumulated (depending on the chunk size). If there are fewer items in the datasource than the chunk size, then you should have a single chunk. Does this clarify things?

Edit: I added a quick sample to illustrate things:

```java
package org.springframework.batch.samples.helloworld;

import java.util.Arrays;
import java.util.List;

import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;
import org.springframework.jdbc.support.JdbcTransactionManager;

@Configuration
@EnableBatchProcessing
public class GHD4593 {

    @Bean
    public Job job(JobRepository jobRepository, JdbcTransactionManager transactionManager) {
        return new JobBuilder("job", jobRepository)
                .start(new StepBuilder("step", jobRepository)
                        // the item type is List<Integer> and the chunk size is 10
                        .<List<Integer>, List<Integer>>chunk(10, transactionManager)
                        // the datasource contains two items, each of which is a list
                        .reader(new ListItemReader<>(Arrays.asList(Arrays.asList(1), Arrays.asList(2))))
                        // called once per item, ie once per list
                        .processor(item -> {
                            System.out.println("processing item " + item);
                            return item;
                        })
                        // called once per chunk
                        .writer(items -> {
                            System.out.println("writing items " + items);
                            items.forEach(System.out::println);
                        })
                        .build())
                .build();
    }

    @Bean
    public DataSource dataSource() {
        return new EmbeddedDatabaseBuilder().setType(EmbeddedDatabaseType.HSQL)
                .addScript("/org/springframework/batch/core/schema-hsqldb.sql")
                .build();
    }

    @Bean
    public JdbcTransactionManager transactionManager(DataSource dataSource) {
        return new JdbcTransactionManager(dataSource);
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(GHD4593.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }

}
```

which prints something like the following (the exact rendering of the chunk in the writer's line comes from `Chunk#toString` and may differ slightly across Spring Batch versions):
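```
processing item [1]
processing item [2]
writing items [items=[[1], [2]], skips=[]]
[1]
[2]
```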
There are two items of type list in the datasource and the chunk size is set to 10. In this case, the processor is called once for each list (i.e. each item) and the writer is called once for both lists (which together represent the first and only chunk). Hopefully this clarifies the behaviour.
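For comparison, here is a minimal sketch of the same step with the chunk size lowered to 1 (reusing the imports and the infrastructure beans from the sample above); each list then forms its own chunk, so the writer runs once per item:

```java
@Bean
public Job job(JobRepository jobRepository, JdbcTransactionManager transactionManager) {
    return new JobBuilder("job", jobRepository)
            .start(new StepBuilder("step", jobRepository)
                    // chunk size 1: each List<Integer> item forms its own chunk
                    .<List<Integer>, List<Integer>>chunk(1, transactionManager)
                    .reader(new ListItemReader<>(Arrays.asList(Arrays.asList(1), Arrays.asList(2))))
                    // still called once per item
                    .processor(item -> {
                        System.out.println("processing item " + item);
                        return item;
                    })
                    // now called twice, once per single-item chunk
                    .writer(items -> System.out.println("writing items " + items))
                    .build())
            .build();
}
```

With this configuration the writer line is printed twice, once for the chunk containing `[1]` and once for the chunk containing `[2]`.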
zz