-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Custom namespace for Job and related (Step, and maybe some *Reader/Writer) [BATCH-63] #3515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Dave Syer commented Technical hitch - inner beans cannot be scoped... This is turning into a documentation issue - no-one has time / technology to fix it properly. |
Dave Syer commented The technical hitch is a bug SPR-3800 which should be fixed in 2.1-m4 (N.B. inner beans can be scoped - they were just not registering their destruction callbacks). But this whole issue is still not looking like it is going to be fixed for 1.0. Documentation covered elsewhere (BATCH-120). |
Lucas Ward commented Dave, Can you attach the example namespace we came up with last time I was in SH, for others to be able to review? I seem to have lost it in all my computer changes. |
Lucas Ward commented I did a quick review of the schema that Ben attached recently. Everything looks good except for the database ItemReader. I see a Sql reader(should probably be jdbc reader), with a type of cursor or driving, and an iBatis reader. My issue is that the iBatis reader is really a driving reader. More specifically, it's a driving query reader with an iBatis generator. Normally, it wouldn't be too bad, but I think the difference between the two types is big enough to warrant a more explicit separation. This is because the cursor item reader returns the whole item, whereas it's generally expected that the driving one will only return the key, requiring more information to be loaded via another dao. Everything else looks fine, although I'm curious how you plan to handle the column mappings in file input, since I don't see them in the prototype. Lastly, one use case that needs to be maintained is the ability to set default settings for batch jobs that may be overridden. We currently do this by using Abstract beans with the default values, and child beans that can override them. I'm not sure how to best handle this with a namespace, but in my mind it's an extremely important use case to keep. If someone has even as few as 10 batch jobs, you wouldn't want to wire in exception handler (for logging out skipped records) separately for each job. it becomes even worse in the very common case where 50+ distinct batch jobs exist. |
nebhale commented (In reply to comment #4)
We opted for SQL since from the perspective of the user, they don't actually deal with JDBC. This actually was discussed a bit and Dave and I both agree that SQL is more appropriate in this case.
What kind of separation do you have in mind between a driving reader and a cursor reader? Basically from a configuration perspective, both of them require a query and then we take a look at the flag to determine which type of ItemReader to actually instantiate. The reason that iBatis is kept separately is because it doesn't actually require a query at all, it requires an iBatis named query. If you think this is still too confusing I'd be game for removing the <ibatis-item-reader/> tag completely as I think it's a pretty small use case. In the end the data being loaded by a separate DAO is actually just an implementation detail of your transformer rather than of the reader, so I'm pretty happy with the way this is configured now.
You're absolutely right on this. I do plan on letting this shake out as I actually do the implementations of those projects.
Well, I think that this is more of an issue in cases where we use very verbose configurations (like bean definitions). I think that simply having a ref to a bean name in a number of places is fine in the case. At the very least, I'd like to go to 1.0 without explicit support for abstract namespace elements and if in practice people dislike configuring each time (again, just a ref to a separate bean, not a full bean definition) then we can add it in. I'm very concerned about the backwards compatibility of the namespace, so I want to go with a minimal set of support in this case. Dave, do you have any comments on any of this? |
nebhale commented Working version of Spring Batch namespace. You'll need either Mac or Windows Safari to open it. |
Scott Wintermute commented Ben, do you mind exporting the Spring Batch namespace as pdf for those of us who aren't using a Mac or Safari. I would prefer not to load another browser and Confluence should let you do an easy export of the page in PDF. Let me know if this is an issue. |
nebhale commented The reason that I chose the other way is because the confluence export isn't very good. I've attached it though. |
Lucas Ward commented
I disagree with this, since it's not used commonly in spring. For example, if you use JdbcTemplate, you don't have to work with Jdbc directly either, and certainly no more than you do with the JdbcCursorItemReader. I also disagree with the comments about Driving vs. Cursor, there is a huge difference between the usage: Cursor returns an item, and driving returns a key (expecting you to create an item from it) It's a HUGE difference in usage. Furthermore, the difference between the two types should be understood by anyone wanting to read in from a database, as they could have a lot of impacts depending upon which database type you're using. I would much rather have two separate tags for driving and cursor than one with a 'switch' for the two. Another option may be just leaving ItemReaders out of the namespace for release 1.0. Regarding 'abstract configuration. I have very strong feelings about this one, since we already have a lot of projects, some in production, that would be unable to use the namespace if support for this is not added. It is an extremely important use-case to allow projects to do things like set a default strategy for logging out skipped items (take a look at the chunkedStep for more details on that) |
Douglas C. Kaminsky commented I don't see the value in including the infrastructure components (i.e. ItemReaders and ItemWriters) as schema elements at all as opposed to just using beans of the required type (enforced via the tools namespace). Here's my thinking:
|
Douglas C. Kaminsky commented Just realized I posted before commenting on the abstraction issue - using abstract beans is EXTRAORDINARILY important for managing configuration. I just came off of a project where the projected number of batch jobs numbers in the thousands. Without the ability to abstract common aspects of the configurations, Spring Batch becomes exponentially less useful. |
nebhale commented (In reply to comment #9)
This was actually a recommendation from Juergen himself. He said that from his perspective, it was an SQL only situation and he'd change it in Spring if he could. The first rev of the schema actually had data-source-reader there and when presented with jdbc-reader or sql-reader, he chose the latter.
But this is an implementation detail. From the configuration perspective (and the left side of the chunk) the only thing that is required is a query and whether it is driving or not. When it comes to what is returned from the ItemReader, the code doesn't really care until after it's passed to the transformer. Since the transformer is written by the user, that detail is easily dealt with there. The configuration never has to care.
This was considered for the 1.0 release, but as most of the configuration would then lie in the readers, we opted to have some sort of simplification.
The problem is of course that the concept of abstract namespace elements isn't really supported. The concept of the abstract bean is supported at the container level but a namespace element doesn't map one-to-one with a bean, so it doesn't really work out.
This I think gets more to the crux of the argument. What about the addition of a <default-step-configuration> element that talks about default strategies rather than the concept of an abstract bean? |
Lucas Ward commented Ben, "This I think gets more to the crux of the argument. What about the addition of a <default-step-configuration> element that talks about default strategies rather than the concept of an abstract bean?" That is completely fine, I didn't mean that it had to support abstract beans, just that the ability to create default settings is an important use case. Being able to create a 'default step' is much much better actually.
"But this is an implementation detail. From the configuration perspective (and the left side of the chunk) the only thing that is required is a query and whether it is driving or not. When it comes to what is returned from the ItemReader, the code doesn't really care until after it's passed to the transformer. Since the transformer is written by the user, that detail is easily dealt with there. The configuration never has to care." I understand your point about the 'Writers' or 'Readers' dealing with the output of the readers as a separate concern from the configuration. However, my point is that there is a huge domain difference between the 'driving' and 'cursor' based readers. It's so important in fact, that I think it should be separated. I'm extremely concerned that someone will try and use a type of 'driving' but then use the same type of sql statement and RowMapper that they would use for cursor (i.e. for returning the item rather than a key) This would lead to every issue that Rob highlighted so well at TSE, since the DrivingQueryItemReader holds the keys in a list. I understand your point about trying to make them less verbose, but I'm not sure that trying to combine the different types in configuration is a good idea. I would instead prefer to keep them separated, as they are in the code by driving vs. cursor. and I would definitely like to have a '<driving-query-item-reader> that has a <key generator> tag. I know that documentation can help with some of these concerns, but I still feel like it's much better to be explicit. |
nebhale commented (In reply to comment #10)
This had actually been considered.
Well, we split the item-reader/writers off into another schema just so it could change quickly. It has been pointed out that a good factory could hide a lot of the complexity of the implementations thus negating the need for a namespace at all.
Well, I'd say it's driven by community demand. In one of the earlier comments I mentioned that I would not be averse to removing the ibatis support as it is a pretty small minority of users.
We actually don't. We discussed having the ability to define an in-line bean, but to stay with the pattern set out by Spring, we were only going to allow references to other beans. In fact, Juergen pointed out that pretty much the only namespace that allows in-line bean definitions is 'beans'.
Fair point. The issue is that with a namespace we can more clearly enumerate (and in some cases create a more concise definition for) the dependencies. That being said, I'm not against this.
And that's why :) |
Douglas C. Kaminsky commented I envisioned something to the effect of batch:config in the execution environment config file to address these concerns and more use of the <bean> tag in the actual job config to support inheritance. |
nebhale commented (In reply to comment #13)
We'll go with that then.
Couldn't the same argument be made that they would create a <driving-query-reader> and then put in the cursor query? If the problem is a mistaken choice, I'm not sure that having separate elements is any more helpful. |
Wayne Lund commented Lucas and Doug have already stated their positions really well but I wanted to weigh in on the issue. I would also prefer to see two separate tags for driving queries and cursor handling. I also agree with the issue of naming and would argue that Spring does the same thing. In other words, even though iBatis and jdbc both expose SQL the support classes are named after the implementing technology. I think we need to do the same. However, Doug highlights the problems of trying to be too detailed this release in our namespace. I looked at the webarchive provided and have the same concerns that Doug expressed on including the detailed infrastructure items for the same reasons he's nicely pointed out. We do think there are going to need to be corrections on line items that include cursor | driving because they do behave quite differently. Probably sounds like we're ganging up on these three issues but our large batch client had 5500 jobs and an AbstractConfiguration would have been extremely helpful, although we didn't really have that notion at the time. Summary:
|
nebhale commented (In reply to comment #18)
I still haven't heard a compelling argument that two tags is actually less prone to error than a single tag. The same decision errors occur in both situations.
The chief architect of Spring feels we should go the other way, though... does that sway your opinion at all?
I'd like to point out that because we've broken the other namespace out (and heck, even if we hadn't) no one has to use the reader/writer namespace. It's there as a convince for elements that that are used in very common ways. There is actually no requirement that you define readers using the namespace, beans:bean works just fine.
I'd like Dave to weigh in on this thread before I go changing it, but as I said earlier, I've got no problems with a default-step-configuration element. |
Douglas C. Kaminsky commented Per previous comments, this is how I saw the job config schema looking -- note this is not intended to be a completed product, it was just a first stab at what I was personally thinking. The usage of the tool schema may also not be appropriate as I don't have much experience with it. |
Lucas Ward commented It sounds like we're all in agreement with the 'Default-Step' that Ben mentioned. it's much betters solution than the abstract one anyway, since the Abstract/Parent bean definition approach was just a hack to support a 'default step'. Re: Sql* vs. Jdbc* Certainly, Juergen's opinion is extremely important, but I think my earlier point (along with wayne's) was that spring uses Jdbc*, so I would prefer to be consistent with spring proper, but I'm not opposed to calling it sql (I originally called it that when I named the classes) It's a fairly minor issue though. The only really important thing to me would be naming the actually classes behind the tags consistently (i.e. SqlCursorItemReader) @ Driving vs. Cursor Perhaps I erred when using the 'error prone' argument in the first place. I fundamentally have two issues with it:
I would still very much prefer a separate driving query tag with a key generator tag: <driving-query-reader> |
nebhale commented (In reply to comment #20)
This is actually what my first cut before the SpringSource meeting looked like pretty much. But to stay in line with the namespaces that are being created in spring, we decided to eliminate the in-line bean definitions. Looking at it now, there aren't really any significant uses of in-line bean definitions. |
Dave Syer commented I don't care much about sql- vs jdbc-, and I also don't care if the class names are Jdbc* and the XML is sql-*. The data source item readers are confusing (too many features), and would they benefit from a simpler, single-element config, even if that means we simplify the features? That is the tough question, it seems. Remember also that we don't have to cover all the available Java options in the namespace - I don't like to leave features out, and force people to use beans:*/, but it is an option (e.g. not to expose the restart row mapper in the multi-column case). Why does a driving query return a primary key and a cursor a domain object (please don't call it an "item" - they are all items)? Is there any reason why both shouldn't do the same thing? Or why we can't go to press with at least an XML schema that is consistent between the two, and leaves more advanced features for beans: customizations. |
Lucas Ward commented "I don't care much about sql- vs jdbc-, and I also don't care if the class names are Jdbc* and the XML is sql-*. " I agree, it's a minor issue, but I would still like to be consistent. Let's do either Sql* or Jdbc* everywhere. "Why does a driving query return a primary key and a cursor a domain object (please don't call it an "item" - they are all items)? Is there any reason why both shouldn't do the same thing? Or why we can't go to press with at least an XML schema that is consistent between the two, and leaves more advanced features for beans: customizations." I originally tried to make the two be 'symmetrical' by having the driving query also return the 'details' about each item. However, if you think it's complex now, it would be 4x so if we tried to do that. Furthermore, the details are generally very domain specific and likely already have a dao for them. I also had a lot of projects that were forgoing it and letting developers just wire a dao in. It also makes a lot of sense if you think about an asynchronous processing case. It would be much more efficient to give a chunk that's full of keys to be written, rather than reading back the rest of the details on the client side, which takes time, and passing that out. I understand your point about trying to simplify it. Afterall, without any attempt at simplification there's honestly very little value to the tags. However, I think it's too much confusion for too little simplification. |
Lucas Ward commented I wanted to add another comment about the namespace as well. The following is a bean definition for the JdbcCursorItemReader with the beans namespace:
with the proposed namespace it would be: <sql-reader id="customerSqlItemReader" data-source="dataSource" type="cursor" mapper="creditMapper"> |
Dave Syer commented After some chats with Lucas... I think I sort of agree - there is much to risk by putting out the item/infrastructure namespace too early, and not much to be gained (most of the encapsulation we are looking for is in the step and job implementations). Maybe we should can the idea of an infrastructure namespace and put more effort into JobLauncher (which is pretty simple, but ubiquitously used?) |
nebhale commented (In reply to comment #25)
Remember that often the value of a namespace has very little do with actual XML reduction; in many cases that is simply a side-effect. The real value of a namespace typically lies in the domain specificity of the declaration. Talking about the domain elements rather than properties of Java classes is what a namespace buys you. In addition there is value is the development time validation that schema validation gives you. With out use of setters everywhere for properties, it's nearly impossible to get any useful validation before run/test-time. |
nebhale commented
We had talked about making sure that we actually injected factories into the step's rather than ItemReader(Writer)s. Could we get a similar effect to a namespace where we create an ItemReader(Writer)Factory interface that simplifies all of the wiring for us? I'd be totally OK if we went that way and eliminated the Item namespace.
Sounds like a plan contingent on the previous. I'll start working that way now. |
Lucas Ward commented Ben, Regarding the value of a namespace, I understand your point that it's not just for 'line reduction', and has other values, but I guess Dave said it better than I: the disadvantages outweigh the advantages right now.
It sounds like from your last comment that you're abandoning the infrastructure tags and creating some tags to help with the JobLauncher (and I assume the repo as well). What's the status on this? I haven't seen any commits go through, so I was curious. |
Dave Syer commented this is relevant to the ongoing discussion in BATCH-364. In my opinion Chunker cannot be as good a gerenal purpose factory for ItemReaders as the BeanFactory. With step scope we get the best object factory in the business, and no need for every implementation of ItemReader to have its own factory implementation. |
Lucas Ward commented Ben, I've seen a couple of updates go by, what's the current status of the namespace? |
nebhale commented The <job-repository/> work is done, the <tasklet-step/> stuff is done. Working on <step/> (we decided to go with that instead of <chunked-step/>) and <job/>. I haven't forgotten about the default step configuration stuff, but I need to talk with Juergen about the best way to do it. |
nebhale commented Moving this to the sandbox for future reference and possible implementation... |
Dave Syer commented Re-opened for 2.0 |
Dave Syer commented Attached updated XML examples |
Dave Syer opened BATCH-63 and commented
StepScope needs to be validated - certain collaborators (like implementations of InputSource) have to be scope="step" and it is quite hard to validate. Lots of docos needed. A minimal custom namespace just for Tasklet configuration might be the most practical solution.
Affects: 1.0-m2
Attachments:
Referenced from: commits 447b1f0, 190b14b, 3b0331a, c99fa10, 5274b55, ef00a5e, 0d4777e, 816c4b2, 13734f0, f8a62d6, d49cc44, 97ce100, a21f0a2, a5107fb, 8a1eae3, 24eab0a, 79c8465, ac6493b, 760b040, 315f08a, dc656d2, 37bfe92, dfa783a, c714494, feeb5bd
The text was updated successfully, but these errors were encountered: