Skip to content

Partitioning enablement (SPI). [BATCH-677] #2899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
spring-projects-issues opened this issue Jun 18, 2008 · 6 comments
Closed

Partitioning enablement (SPI). [BATCH-677] #2899

spring-projects-issues opened this issue Jun 18, 2008 · 6 comments

Comments

@spring-projects-issues
Copy link
Collaborator

Wayne Lund opened BATCH-677 and commented

Want to ensure that it runs on dm Server but should run on any platform.


Affects: 2.0.0.RC2

Sub-tasks:

  • BATCH-53 Aggregation of execution context when partitioning

  • BATCH-875 Pull jobRepository.save() out of Step implementations and put it in the driving Job

Referenced from: commits 1cf7cda, 0826db6, 6185714, f6487f6

@spring-projects-issues
Copy link
Collaborator Author

Dave Syer commented

Added initial draft of partitioning SPI. Main strategy is PartitionHandler. Jobs use it through the PartitionStep.

@spring-projects-issues
Copy link
Collaborator Author

Lucas Ward commented

This is a minor point, but in the TaskExecutorPartitionHandler, there is the following code:

taskExecutor.execute(new Runnable() {
public void run() {
task.run();
}
});

where task is a FutureTask. I'm a little confused, since FutureTask also implements the Runnable interface, which is the run being called on the task. Is this just a harmless leftover of refactoring, or was there some reason for doing it? It's effectively harmless, but for my own knowledge I wanted to see if there was a reasoning behind it.

@spring-projects-issues
Copy link
Collaborator Author

Lucas Ward commented

I just saw another potential issue in PatritionStep:

aggregator.aggregate(stepExecution, executions);
if (stepExecution.getStatus()!=BatchStatus.COMPLETED) {
throw new JobExecutionException("Partition handler returned an incomplete step");
}

I understand how the aggregator works, (using max in the BathStatus was very clever, btw) effectively returning the 'highest' status out of the entire list. So if one is Failed, the whole StepExecution is FAILED. However, given the new contract of Step, it seems like it should return regardless, and not throw an exception. Perhaps this is another case where the new flow work would be nice. In sequential Job, a status of failed would indicate the step stopped (and thus the job too), but in a FlowJob, the ExitStatus could indicate an incomplete remote execution, which would potentially allow for a particular compensation step, rather than restarting the step, depending upon business use.

@spring-projects-issues
Copy link
Collaborator Author

Lucas Ward commented

Another quick question based on review:

// If this is a restart we must retain the same grid size, ignoring the
// one passed in...
int splitSize = getSplitSize(stepExecution, gridSize);

I think we've discussed this before, but if someone changes the configuration between runs, should we just use the one from the ExecutionContext, or should we fail the job? In general, I'm in favor of figuring out the right thing to do, and doing it, but I wonder if someone would up the gridSize, and be confused in a restart scenario, with the results they see.

@spring-projects-issues
Copy link
Collaborator Author

Lucas Ward commented

Another comment regarding the Patritioner interface:

public interface Partitioner {

Map<String, ExecutionContext> partition(int gridSize);

}

It seems like it could just as easily be:

List<String> partition(int gridSize);

I can't imagine an implementation doing anything but returning new ExecutionContext() for each string, which is what SimplePartitioner does. If not, it would make much more sense for the Partitioner to do the restart check, rather than the splitter. Although, I don't have any strong feelings about who does that check.

@spring-projects-issues
Copy link
Collaborator Author

Dave Syer commented

Wow, lots of comments.

#2 you are perfectly right and it was an oversight
#3 not sure need to check. I might prefer just to throw an exception and let the AbstractStep deal with how it wants to interpret it.
#4 the grid size in general might not be configured - it could well be dynamically determined from the grid itself. There is no point trying to play catch up - the execution contexts are too important on a restart to ignore some of them, or to try and guess what the missing values are if the grid size has increased.
#5 I think it must be getting late there. Partitioner has to create the input values for the step executions, so most implementations will actually return very important information in the ExecutionContext (like primary key ranges, file locations etc.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant