-
Notifications
You must be signed in to change notification settings - Fork 4.1k
STORM-2358: Update storm hdfs spout to remove specific implementation handling #1942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… handling As part of this change we have removed specific handling in code for TextFileReader & SequenceFileReader also made AbstractFileReader as public
|
Can someone take a look at this? |
| try { | ||
| Class<?> clsType = Class.forName(readerType); | ||
| Constructor<?> constructor = clsType.getConstructor(FileSystem.class, Path.class, Map.class, String.class); | ||
| Constructor<?> constructor = readerType.getConstructor(FileSystem.class, Path.class, Map.class, String.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes a ctor with expected number and type of params is available and tries to invoke it via reflection. Adding an init/open with required params in the FileReader is much cleaner IMO and avoids having to check and throw exceptions at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same code is being there before my changes, I can create JIRA for this. ATM I am just trying to get rid of specific implementations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean by "get rid of specific implementations". I see that you are mainly cleaning up and refactoring the code and so its better if you do it with this patch.
| // Reader type config | ||
| if ( readerType==null && conf.containsKey(Configs.READER_TYPE) ) { | ||
| readerType = conf.get(Configs.READER_TYPE).toString(); | ||
| String className = (String) conf.get(Configs.READER_TYPE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the need for this when the spout accepts the readerType via setReaderType ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was already there so I updated as per requirement.
| } | ||
|
|
||
| public HdfsSpout setReaderType(String readerType) { | ||
| public HDFSSpout setReaderType(Class<? extends FileReader> readerType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
How about backward compatibility ? Changing the signature will break existing clients. May be you can deprecate the earlier method and add a new one like
setFileReader(Class<? extends FileReader> reader) -
How would one initialize a
HDFSSpoutvia Flux? Earlier since the method accepted a String it was pretty trivial. Try it out and add some example, or add an additional method likesetFileReader(String className)that takes the class name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I will deprecate this. However adding deprecation will add more code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done please review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you are removing support for aliases. Is that right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have deprecated as per comments, not removed.
roshannaik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this PR seems to be guided by misplaced ideas ... and complete disregard for backward compatibility.
| import org.apache.storm.tuple.Fields; | ||
|
|
||
| public class HdfsSpout extends BaseRichSpout { | ||
| public class HDFSSpout extends BaseRichSpout { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you realize what you are doing here ?
|
First of all, thanks for the contribution. I'm seeing backward incompatible as @roshannaik stated, and like renaming class is not strictly necessary to. Huge part of maintaining open source is fighting with backward compatibility. What you change will affect number of developers. It's joyful thing for open source developers but also huge pain point. Unless there are critical places so need to change or get rid of, breaking backward compatibility would not be accepted, and even it is really needed, it would be better to discuss widely. |
|
@HeartSaVioR @roshannaik I have reverted class renaming. Other than this I don't see break in backward compatibility |
|
@HeartSaVioR @roshannaik @arunmahadevan Can you review this? Let me know if there is any issue you see with backward compatibility. |
|
Deprecating/removing aliasing support IS naturally a backward compat issue. You have expressed your concern that you don't like registering aliases in HdfsSpout. But removing aliasing is not the solution to address your concern. Like I said before, If you know how to implement aliasing without registering in HdfsSpout then it would address your concern and we can look into this PR.... otherwise there is nothing left to look into here. |
|
@roshannaik Like I said even on Jira STORM-2358. I don't see point having code just to specifically handle alias. Code should be generic to handle all cases. If you want to push to have specific handling I don't think its right contribution & we already have this discussion on STORM-2358. I would like to see view from community. Let me know from your point of view on backward compatibility. I know moving away from alias will have issues but to keep specific code just to support this is not worth |
|
@pasalkarsachin1 @roshannaik
cc. @arunmahadevan ps. Keeping backward compatibility makes it bad here. If 1 is not true, we should keep method signature as it is, then we can't deprecate the alias usage even we want to get rid of it, since we're not changing method signature or even overload method here. |
Agree and pointed out in one of the earlier comments. One suggestion was to have two methods,
Right now the method accepts special strings "text", "seq" or class name, which looks confusing and not type safe. The proposed patch addresses this by accepting a FileReader class. I am not sure why cant we directly pass the FileReader implementation instead of the class (like The current FileReader implementation is expected to have constructor with a specific signature, which cannot be enforced at compile time. IMO, this should also be fixed along with this patch by introducing some init method in the FileReader interface. Regarding backward compatibility, if we are fine to break it in the next major version (2.0) we can keep this out of 1.x branch. |
It enforces implementation to be |
|
@HeartSaVioR However... the author's concern is quite the opposite ... it is not from a user perspective. He is trying to make code more generic (as he puts it) by removing aliasing ... because we need to make the alias known to Hdfs Spout.... Which means no enums, no text() or sequence() etc. Side note: This approach of using aliases OR class names in the same place has precedence in other projects as well... Apache Flume uses this pattern a fair amount. |
|
@roshannaik |
|
Clink on this link : in file based config also dual methods/settings can be used to keep aliases and class names separate just like in code. |
|
For points made by @HeartSaVioR For point 2, I have various implementations in mind to add to open source one of them is implemented by megha10 for ZipTextFileReader another can be anySepratorReader like (pipe,comma). If we add enum for every implementation contributor has to do other extra work to enable it rather just adding reader. Also if user adds his class where he can add it? He has to provide FQCN which comes back to same implementation which I did. |
I am thinking to work on this after this patch we can have init() api which verifies things before it proceeds.
Once we provide init method we are providing user a way to initialize the non Serializable member. |
For point 2 I suggest adding shortcut, not removing previous one. Providing FQDN is not easiest way to do so we can make some implementations taking precedence and support it via easiest way. IMHO providing shortcut method is easiest (or similar to providing enum), and providing enum is good enough, and receiving only FQDN is not convenient, but just has to use. First two things make IDE providing various hints, and the last thing doesn't. Adding it to enum or even providing shortcut method will be done in 10 lines. I don't think it gives contributors more extra works, but we might need to guide it properly.
There're various ways to do it. |
My intention was never to use FQCN it was just to use className rather than Constants. Even you look at my code I am just using class name. setReader(Class<? extends FileReader> reader)However what you are suggesting is in HdfsSpout I should add something like below public hdfsSpout useTextFileReader() {
this.readerType = TextFileReader.class;
return this;
}
public hdfsSpout useSequenceFileReader() {
this.readerType = SequenceFileReader.class;
}Also have a way to provide way to support FQCN, Correct? If everyone agrees I will do the change. |
|
@HeartSaVioR @roshannaik @arunmahadevan are you OK with this approach? |
We are closing stale Pull Requests to make the list more manageable. Please re-open any Pull Request that has been closed in error. Closes apache#608 Closes apache#639 Closes apache#640 Closes apache#648 Closes apache#662 Closes apache#668 Closes apache#692 Closes apache#705 Closes apache#724 Closes apache#728 Closes apache#730 Closes apache#753 Closes apache#803 Closes apache#854 Closes apache#922 Closes apache#986 Closes apache#992 Closes apache#1019 Closes apache#1040 Closes apache#1041 Closes apache#1043 Closes apache#1046 Closes apache#1051 Closes apache#1078 Closes apache#1146 Closes apache#1164 Closes apache#1165 Closes apache#1178 Closes apache#1213 Closes apache#1225 Closes apache#1258 Closes apache#1259 Closes apache#1268 Closes apache#1272 Closes apache#1277 Closes apache#1278 Closes apache#1288 Closes apache#1296 Closes apache#1328 Closes apache#1342 Closes apache#1353 Closes apache#1370 Closes apache#1376 Closes apache#1391 Closes apache#1395 Closes apache#1399 Closes apache#1406 Closes apache#1410 Closes apache#1422 Closes apache#1427 Closes apache#1443 Closes apache#1462 Closes apache#1468 Closes apache#1483 Closes apache#1506 Closes apache#1509 Closes apache#1515 Closes apache#1520 Closes apache#1521 Closes apache#1525 Closes apache#1527 Closes apache#1544 Closes apache#1550 Closes apache#1566 Closes apache#1569 Closes apache#1570 Closes apache#1575 Closes apache#1580 Closes apache#1584 Closes apache#1591 Closes apache#1600 Closes apache#1611 Closes apache#1613 Closes apache#1639 Closes apache#1703 Closes apache#1711 Closes apache#1719 Closes apache#1737 Closes apache#1760 Closes apache#1767 Closes apache#1768 Closes apache#1785 Closes apache#1799 Closes apache#1822 Closes apache#1824 Closes apache#1844 Closes apache#1874 Closes apache#1918 Closes apache#1928 Closes apache#1937 Closes apache#1942 Closes apache#1951 Closes apache#1957 Closes apache#1963 Closes apache#1964 Closes apache#1965 Closes apache#1967 Closes apache#1968 Closes apache#1971 Closes apache#1985 Closes apache#1986 Closes apache#1998 Closes apache#2031 Closes apache#2032 Closes apache#2071 Closes apache#2076 Closes apache#2108 Closes apache#2119 Closes apache#2128 Closes apache#2142 Closes apache#2174 Closes apache#2206 Closes apache#2297 Closes apache#2322 Closes apache#2332 Closes apache#2341 Closes apache#2377 Closes apache#2414 Closes apache#2469
We are closing stale Pull Requests to make the list more manageable. Please re-open any Pull Request that has been closed in error. Closes apache#608 Closes apache#639 Closes apache#640 Closes apache#648 Closes apache#662 Closes apache#668 Closes apache#692 Closes apache#705 Closes apache#724 Closes apache#728 Closes apache#730 Closes apache#753 Closes apache#803 Closes apache#854 Closes apache#922 Closes apache#986 Closes apache#992 Closes apache#1019 Closes apache#1040 Closes apache#1041 Closes apache#1043 Closes apache#1046 Closes apache#1051 Closes apache#1078 Closes apache#1146 Closes apache#1164 Closes apache#1165 Closes apache#1178 Closes apache#1213 Closes apache#1225 Closes apache#1258 Closes apache#1259 Closes apache#1268 Closes apache#1272 Closes apache#1277 Closes apache#1278 Closes apache#1288 Closes apache#1296 Closes apache#1328 Closes apache#1342 Closes apache#1353 Closes apache#1370 Closes apache#1376 Closes apache#1391 Closes apache#1395 Closes apache#1399 Closes apache#1406 Closes apache#1410 Closes apache#1422 Closes apache#1427 Closes apache#1443 Closes apache#1462 Closes apache#1468 Closes apache#1483 Closes apache#1506 Closes apache#1509 Closes apache#1515 Closes apache#1520 Closes apache#1521 Closes apache#1525 Closes apache#1527 Closes apache#1544 Closes apache#1550 Closes apache#1566 Closes apache#1569 Closes apache#1570 Closes apache#1575 Closes apache#1580 Closes apache#1584 Closes apache#1591 Closes apache#1600 Closes apache#1611 Closes apache#1613 Closes apache#1639 Closes apache#1703 Closes apache#1711 Closes apache#1719 Closes apache#1737 Closes apache#1760 Closes apache#1767 Closes apache#1768 Closes apache#1785 Closes apache#1799 Closes apache#1822 Closes apache#1824 Closes apache#1844 Closes apache#1874 Closes apache#1918 Closes apache#1928 Closes apache#1937 Closes apache#1942 Closes apache#1951 Closes apache#1957 Closes apache#1963 Closes apache#1964 Closes apache#1965 Closes apache#1967 Closes apache#1968 Closes apache#1971 Closes apache#1985 Closes apache#1986 Closes apache#1998 Closes apache#2031 Closes apache#2032 Closes apache#2071 Closes apache#2076 Closes apache#2108 Closes apache#2119 Closes apache#2128 Closes apache#2142 Closes apache#2174 Closes apache#2206 Closes apache#2297 Closes apache#2322 Closes apache#2332 Closes apache#2341 Closes apache#2377 Closes apache#2414 Closes apache#2469
As part of this change I have removed specific handling in code for TextFileReader & SequenceFileReader also made AbstractFileReader as public