Partitioning enablement (SPI). [BATCH-677] #2899
Dave Syer commented Added initial draft of partitioning SPI. The main strategy is PartitionHandler; jobs use it through the PartitionStep.
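The strategy described above can be sketched as follows. This is a minimal illustration, not the framework's code: `StepExecution` and `StepExecutionSplitter` here are local stubs standing in for the Spring Batch types, and the handler shown simply splits and returns the partitions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class PartitionHandlerSketch {
    // Stub standing in for Spring Batch's StepExecution.
    static class StepExecution {
        final String name;
        StepExecution(String name) { this.name = name; }
    }

    // Stub: splits a master execution into gridSize partition executions.
    interface StepExecutionSplitter {
        List<StepExecution> split(StepExecution master, int gridSize);
    }

    // The strategy interface shape: fan a step execution out into
    // partitions and return the resulting executions.
    interface PartitionHandler {
        Collection<StepExecution> handle(StepExecutionSplitter splitter,
                                         StepExecution masterExecution) throws Exception;
    }

    // Demo wiring: a splitter that labels partitions, and a trivial
    // handler that "executes" them by returning them unchanged.
    static Collection<StepExecution> demo(int gridSize) throws Exception {
        StepExecutionSplitter splitter = (master, n) -> {
            List<StepExecution> list = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                list.add(new StepExecution(master.name + ":partition" + i));
            }
            return list;
        };
        PartitionHandler handler = (s, master) -> s.split(master, gridSize);
        return handler.handle(splitter, new StepExecution("step1"));
    }
}
```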
Lucas Ward commented This is a minor point, but in the TaskExecutorPartitionHandler there is the following code: taskExecutor.execute(new Runnable() { where task is a FutureTask. I'm a little confused, since FutureTask also implements the Runnable interface, and run() is just being called on the task inside. Is this a harmless leftover of refactoring, or was there some reason for doing it? It's effectively harmless, but for my own knowledge I wanted to see if there was reasoning behind it.
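To illustrate the point being made: `java.util.concurrent.FutureTask` implements `Runnable`, so it can be handed to an `Executor` directly, without the anonymous wrapper. A minimal demo (using a same-thread executor, purely for illustration):

```java
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

public class FutureTaskRunnableDemo {
    public static String runDirectly() throws Exception {
        FutureTask<String> task = new FutureTask<>(() -> "done");
        Executor executor = Runnable::run; // same-thread executor for the demo

        // Redundant form (what the comment describes):
        //   executor.execute(new Runnable() { public void run() { task.run(); } });
        // Equivalent direct form, since FutureTask is itself a Runnable:
        executor.execute(task);

        return task.get(); // the result of the Callable
    }
}
```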
Lucas Ward commented I just saw another potential issue in PartitionStep: aggregator.aggregate(stepExecution, executions); I understand how the aggregator works (using max in the BatchStatus was very clever, btw), effectively returning the 'highest' status out of the entire list. So if one is FAILED, the whole StepExecution is FAILED. However, given the new contract of Step, it seems like it should return regardless, and not throw an exception. Perhaps this is another case where the new flow work would be nice. In a sequential Job, a status of FAILED would indicate the step stopped (and thus the job too), but in a FlowJob, the ExitStatus could indicate an incomplete remote execution, which would potentially allow for a particular compensation step, rather than restarting the step, depending upon the business use.
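The "max" aggregation being praised here can be sketched with a toy enum. The names mirror Spring Batch's BatchStatus, but this is an illustration of the idea (pick the most severe status by enum ordinal), not the framework class, and the severity ordering shown is simplified.

```java
public class StatusAggregationDemo {
    // Toy status enum: severity increases with ordinal.
    enum Status {
        COMPLETED, STOPPED, FAILED;

        // "Max" of two statuses = the more severe one.
        static Status max(Status a, Status b) {
            return a.ordinal() > b.ordinal() ? a : b;
        }
    }

    // Aggregate a list of partition statuses: one FAILED makes the
    // whole aggregate FAILED, as described in the comment above.
    static Status aggregate(Status... statuses) {
        Status result = Status.COMPLETED;
        for (Status s : statuses) {
            result = Status.max(result, s);
        }
        return result;
    }
}
```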
Lucas Ward commented Another quick question based on review: // If this is a restart we must retain the same grid size, ignoring the I think we've discussed this before, but if someone changes the configuration between runs, should we just use the value from the ExecutionContext, or should we fail the job? In general I'm in favor of figuring out the right thing to do and doing it, but I wonder whether someone might up the gridSize and be confused, in a restart scenario, by the results they see.
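The restart rule under discussion can be sketched like this. The key name "GRID_SIZE" and the plain `Map` standing in for Spring Batch's ExecutionContext are assumptions for illustration; the point is only the policy: on a restart, reuse the stored grid size and ignore the reconfigured value.

```java
import java.util.Map;

public class GridSizeRestartSketch {
    static final String KEY = "GRID_SIZE"; // hypothetical context key

    public static int resolveGridSize(int configured, Map<String, Object> executionContext) {
        Object stored = executionContext.get(KEY);
        if (stored != null) {
            // Restart: retain the original grid size, ignoring the
            // (possibly changed) configured value.
            return (Integer) stored;
        }
        // First run: record the grid size so restarts can reuse it.
        executionContext.put(KEY, configured);
        return configured;
    }
}
```

The alternative raised in the comment would be to compare `configured` against the stored value and fail the job on a mismatch, surfacing the confusion instead of silently ignoring the change.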
Lucas Ward commented Another comment regarding the Partitioner interface: public interface Partitioner {
Map<String, ExecutionContext> partition(int gridSize);
} It seems like it could just as easily be: List<String> partition(int gridSize); I can't imagine an implementation doing anything but returning new ExecutionContext() for each string, which is what SimplePartitioner does. If not, it would make much more sense for the Partitioner to do the restart check, rather than the splitter. Although, I don't have any strong feelings about who does that check.
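The trivial behavior described above (one empty context per partition) can be sketched as follows. A plain `Map<String, Object>` stands in for Spring Batch's ExecutionContext here, and the "partition" key prefix is used for illustration; the real SPI returns Map<String, ExecutionContext>.

```java
import java.util.HashMap;
import java.util.Map;

public class SimplePartitionerSketch {
    static final String PARTITION_KEY = "partition";

    // One empty context per partition, keyed "partition0" .. "partitionN-1".
    public static Map<String, Map<String, Object>> partition(int gridSize) {
        Map<String, Map<String, Object>> map = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            map.put(PARTITION_KEY + i, new HashMap<>()); // empty context
        }
        return map;
    }
}
```

Lucas's suggestion amounts to collapsing this to the key set alone (`List<String>`), since the values carry no information in the trivial case.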
Dave Syer commented Wow, lots of comments. On #2 you are perfectly right, and it was an oversight.
Wayne Lund opened BATCH-677 and commented
We want to ensure that it runs on dm Server, but it should run on any platform.
Affects: 2.0.0.RC2
Sub-tasks:
BATCH-53 Aggregation of execution context when partitioning
BATCH-875 Pull jobRepository.save() out of Step implementations and put it in the driving Job
Referenced from: commits 1cf7cda, 0826db6, 6185714, f6487f6