Skip to content

Conversation

@pasalkarsachin1
Copy link
Contributor

As part of this change I have removed specific handling in code for TextFileReader & SequenceFileReader also made AbstractFileReader as public

… handling

As part of this change we have removed specific handling in code for TextFileReader & SequenceFileReader also made AbstractFileReader as public
@pasalkarsachin1
Copy link
Contributor Author

Can someone take a look at this?

try {
Class<?> clsType = Class.forName(readerType);
Constructor<?> constructor = clsType.getConstructor(FileSystem.class, Path.class, Map.class, String.class);
Constructor<?> constructor = readerType.getConstructor(FileSystem.class, Path.class, Map.class, String.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes a ctor with expected number and type of params is available and tries to invoke it via reflection. Adding an init/open with required params in the FileReader is much cleaner IMO and avoids having to check and throw exceptions at runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same code is being there before my changes, I can create JIRA for this. ATM I am just trying to get rid of specific implementations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean by "get rid of specific implementations". I see that you are mainly cleaning up and refactoring the code and so its better if you do it with this patch.

// Reader type config
if ( readerType==null && conf.containsKey(Configs.READER_TYPE) ) {
readerType = conf.get(Configs.READER_TYPE).toString();
String className = (String) conf.get(Configs.READER_TYPE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the need for this when the spout accepts the readerType via setReaderType ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was already there so I updated as per requirement.

}

public HdfsSpout setReaderType(String readerType) {
public HDFSSpout setReaderType(Class<? extends FileReader> readerType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. How about backward compatibility ? Changing the signature will break existing clients. May be you can deprecate the earlier method and add a new one like setFileReader(Class<? extends FileReader> reader)

  2. How would one initialize a HDFSSpout via Flux? Earlier since the method accepted a String it was pretty trivial. Try it out and add some example, or add an additional method like setFileReader(String className) that takes the class name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I will deprecate this. However adding deprecation will add more code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done please review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you are removing support for aliases. Is that right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have deprecated as per comments, not removed.

Copy link
Contributor

@roshannaik roshannaik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this PR seems to be guided by misplaced ideas ... and complete disregard for backward compatibility.

import org.apache.storm.tuple.Fields;

public class HdfsSpout extends BaseRichSpout {
public class HDFSSpout extends BaseRichSpout {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you realize what you are doing here ?

@HeartSaVioR
Copy link
Contributor

First of all, thanks for the contribution.

I'm seeing backward incompatible as @roshannaik stated, and like renaming class is not strictly necessary to. Huge part of maintaining open source is fighting with backward compatibility. What you change will affect number of developers. It's joyful thing for open source developers but also huge pain point.

Unless there are critical places so need to change or get rid of, breaking backward compatibility would not be accepted, and even it is really needed, it would be better to discuss widely.

@pasalkarsachin1
Copy link
Contributor Author

pasalkarsachin1 commented Feb 17, 2017

@HeartSaVioR @roshannaik I have reverted class renaming. Other than this I don't see break in backward compatibility

@pasalkarsachin1
Copy link
Contributor Author

@HeartSaVioR @roshannaik @arunmahadevan Can you review this? Let me know if there is any issue you see with backward compatibility.

@roshannaik
Copy link
Contributor

Deprecating/removing aliasing support IS naturally a backward compat issue.

You have expressed your concern that you don't like registering aliases in HdfsSpout. But removing aliasing is not the solution to address your concern.

Like I said before, If you know how to implement aliasing without registering in HdfsSpout then it would address your concern and we can look into this PR.... otherwise there is nothing left to look into here.

@pasalkarsachin1
Copy link
Contributor Author

pasalkarsachin1 commented Feb 20, 2017

@roshannaik Like I said even on Jira STORM-2358. I don't see point having code just to specifically handle alias. Code should be generic to handle all cases. If you want to push to have specific handling I don't think its right contribution & we already have this discussion on STORM-2358. I would like to see view from community.

Let me know from your point of view on backward compatibility. I know moving away from alias will have issues but to keep specific code just to support this is not worth

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Feb 20, 2017

@pasalkarsachin1 @roshannaik
Two things from me...

  1. We should check that it works with Flux. In other words, if Flux doesn't support instantiating Class object and passing it, the change seriously breaks backward compatibility.

  2. To be honest, accepting alias and class in one method doesn't look intuitive. Users should memorize the alias (TEXT, SEQ) to use. Instead of doing this, why not adding shortcut methods so that IDE can help for us? Like text(), sequence() or a bit more longer name if it doesn't feel enough. Other options we can create Enum for known implementations and full class name for others, but the former looks simpler unless we're aliasing too much.

cc. @arunmahadevan

ps. Keeping backward compatibility makes it bad here. If 1 is not true, we should keep method signature as it is, then we can't deprecate the alias usage even we want to get rid of it, since we're not changing method signature or even overload method here.

@arunmahadevan
Copy link
Contributor

We should check that it works with Flux. In other words, if Flux doesn't support instantiating Class object and passing it, the change just breaks backward compatibility.

Agree and pointed out in one of the earlier comments. One suggestion was to have two methods,

  1. setReader(Class<? extends FileReader> reader)
  2. setReaderClass(String className)

To be honest, accepting alias and class in one method doesn't look intuitive. Users should memorize the alias (TEXT, SEQ) to use.

Right now the method accepts special strings "text", "seq" or class name, which looks confusing and not type safe. The proposed patch addresses this by accepting a FileReader class. I am not sure why cant we directly pass the FileReader implementation instead of the class (like setReader(FileReader reader)) and I believe flux will be able to handle this.

The current FileReader implementation is expected to have constructor with a specific signature, which cannot be enforced at compile time. IMO, this should also be fixed along with this patch by introducing some init method in the FileReader interface.

Regarding backward compatibility, if we are fine to break it in the next major version (2.0) we can keep this out of 1.x branch.

@HeartSaVioR
Copy link
Contributor

@arunmahadevan

I am not sure why cant we directly pass the FileReader implementation instead of the class (like setReader(FileReader reader)) and I believe flux will be able to handle this.

It enforces implementation to be Serializable, which HDFS classes (fields in FileReader implementations) are not. That seems why HdfsSpout instantiates FileReader in open method.

@roshannaik
Copy link
Contributor

@HeartSaVioR
I am OK with that line of thinking of handling aliases, as you are looking to make it more intuitive to user. .. although we may not have a real problem to justify public interface changes.

However... the author's concern is quite the opposite ... it is not from a user perspective. He is trying to make code more generic (as he puts it) by removing aliasing ... because we need to make the alias known to Hdfs Spout.... Which means no enums, no text() or sequence() etc.

Side note: This approach of using aliases OR class names in the same place has precedence in other projects as well... Apache Flume uses this pattern a fair amount.

@HeartSaVioR
Copy link
Contributor

@roshannaik
Interesting. Could you point out where Apache Flume uses that approach? Just curious. :)
Btw, if it's from configuration backed by text file, we might have no option. But I think there're still some options to choose since we're assigning it from codebase.

@roshannaik
Copy link
Contributor

roshannaik commented Feb 20, 2017

Clink on this link :
http://flume.apache.org/FlumeUserGuide.html
and search for 'class name'

in file based config also dual methods/settings can be used to keep aliases and class names separate just like in code.

@pasalkarsachin1
Copy link
Contributor Author

pasalkarsachin1 commented Feb 20, 2017

For points made by @HeartSaVioR

For point 2, I have various implementations in mind to add to open source one of them is implemented by megha10 for ZipTextFileReader another can be anySepratorReader like (pipe,comma). If we add enum for every implementation contributor has to do other extra work to enable it rather just adding reader.

Also if user adds his class where he can add it? He has to provide FQCN which comes back to same implementation which I did.

@pasalkarsachin1
Copy link
Contributor Author

The current FileReader implementation is expected to have constructor with a specific signature, which cannot be enforced at compile time. IMO, this should also be fixed along with this patch by introducing some init method in the FileReader interface.`

I am thinking to work on this after this patch we can have init() api which verifies things before it proceeds.

It enforces implementation to be Serializable, which HDFS classes (fields in FileReader implementations) are not. That seems why HdfsSpout instantiates FileReader in open method.`

Once we provide init method we are providing user a way to initialize the non Serializable member.

@HeartSaVioR
Copy link
Contributor

HeartSaVioR commented Feb 20, 2017

@pasalkarsachin1

For point 2, I have various implementations in mind to add to open source one of them is implemented by megha10 for ZipTextFileReader another can be anySepratorReader like (pipe,comma). If we add enum for every implementation contributor has to do other extra work to enable it rather just adding reader.
Also if user adds his class where he can add it? He has to provide FCQN which comes back to same implementation which I did.

For point 2 I suggest adding shortcut, not removing previous one. Providing FQDN is not easiest way to do so we can make some implementations taking precedence and support it via easiest way.

IMHO providing shortcut method is easiest (or similar to providing enum), and providing enum is good enough, and receiving only FQDN is not convenient, but just has to use. First two things make IDE providing various hints, and the last thing doesn't.

Adding it to enum or even providing shortcut method will be done in 10 lines. I don't think it gives contributors more extra works, but we might need to guide it properly.

The current FileReader implementation is expected to have constructor with a specific signature, which cannot be enforced at compile time. IMO, this should also be fixed along with this patch by introducing some init method in the FileReader interface.`
I am thinking to work on this after this patch we can have init() api which verifies things before it proceeds.
Once we provide init method we are providing user a way to initialize the non Serializable member.

There're various ways to do it.
First of all, your suggestion is valid and I've been using that approach.
One other approach is adding method which receives builder instance and build the instance in open() method. Yes builder instance must be serializable, but if it only has configurations it is not that hard.

@pasalkarsachin1
Copy link
Contributor Author

pasalkarsachin1 commented Feb 20, 2017

@HeartSaVioR

For point 2 I suggest adding shortcut, not removing previous one. Providing FQDN is not easiest way to do so we can make some implementations taking precedence and support it via easiest way.
IMHO providing shortcut method is easiest (or similar to providing enum), and providing enum is good enough, and receiving only FQDN is not convenient, but just has to use. First two things make IDE providing various hints, and the last thing doesn't.
Adding it to enum or even providing shortcut method will be done in 10 lines. I don't think it gives contributors more extra works, but we might need to guide it properly.`

My intention was never to use FQCN it was just to use className rather than Constants. Even you look at my code I am just using class name.

setReader(Class<? extends FileReader> reader)

However what you are suggesting is in HdfsSpout I should add something like below

public hdfsSpout useTextFileReader() {
this.readerType = TextFileReader.class;
return this;
}

public hdfsSpout useSequenceFileReader() {
this.readerType = SequenceFileReader.class;
}

Also have a way to provide way to support FQCN, Correct?

If everyone agrees I will do the change.

@pasalkarsachin1
Copy link
Contributor Author

@HeartSaVioR @roshannaik @arunmahadevan are you OK with this approach?

d2r pushed a commit to d2r/storm that referenced this pull request Oct 16, 2018
We are closing stale Pull Requests to make the list more manageable.

Please re-open any Pull Request that has been closed in error.

Closes apache#608
Closes apache#639
Closes apache#640
Closes apache#648
Closes apache#662
Closes apache#668
Closes apache#692
Closes apache#705
Closes apache#724
Closes apache#728
Closes apache#730
Closes apache#753
Closes apache#803
Closes apache#854
Closes apache#922
Closes apache#986
Closes apache#992
Closes apache#1019
Closes apache#1040
Closes apache#1041
Closes apache#1043
Closes apache#1046
Closes apache#1051
Closes apache#1078
Closes apache#1146
Closes apache#1164
Closes apache#1165
Closes apache#1178
Closes apache#1213
Closes apache#1225
Closes apache#1258
Closes apache#1259
Closes apache#1268
Closes apache#1272
Closes apache#1277
Closes apache#1278
Closes apache#1288
Closes apache#1296
Closes apache#1328
Closes apache#1342
Closes apache#1353
Closes apache#1370
Closes apache#1376
Closes apache#1391
Closes apache#1395
Closes apache#1399
Closes apache#1406
Closes apache#1410
Closes apache#1422
Closes apache#1427
Closes apache#1443
Closes apache#1462
Closes apache#1468
Closes apache#1483
Closes apache#1506
Closes apache#1509
Closes apache#1515
Closes apache#1520
Closes apache#1521
Closes apache#1525
Closes apache#1527
Closes apache#1544
Closes apache#1550
Closes apache#1566
Closes apache#1569
Closes apache#1570
Closes apache#1575
Closes apache#1580
Closes apache#1584
Closes apache#1591
Closes apache#1600
Closes apache#1611
Closes apache#1613
Closes apache#1639
Closes apache#1703
Closes apache#1711
Closes apache#1719
Closes apache#1737
Closes apache#1760
Closes apache#1767
Closes apache#1768
Closes apache#1785
Closes apache#1799
Closes apache#1822
Closes apache#1824
Closes apache#1844
Closes apache#1874
Closes apache#1918
Closes apache#1928
Closes apache#1937
Closes apache#1942
Closes apache#1951
Closes apache#1957
Closes apache#1963
Closes apache#1964
Closes apache#1965
Closes apache#1967
Closes apache#1968
Closes apache#1971
Closes apache#1985
Closes apache#1986
Closes apache#1998
Closes apache#2031
Closes apache#2032
Closes apache#2071
Closes apache#2076
Closes apache#2108
Closes apache#2119
Closes apache#2128
Closes apache#2142
Closes apache#2174
Closes apache#2206
Closes apache#2297
Closes apache#2322
Closes apache#2332
Closes apache#2341
Closes apache#2377
Closes apache#2414
Closes apache#2469
d2r pushed a commit to d2r/storm that referenced this pull request Oct 16, 2018
We are closing stale Pull Requests to make the list more manageable.

Please re-open any Pull Request that has been closed in error.

Closes apache#608
Closes apache#639
Closes apache#640
Closes apache#648
Closes apache#662
Closes apache#668
Closes apache#692
Closes apache#705
Closes apache#724
Closes apache#728
Closes apache#730
Closes apache#753
Closes apache#803
Closes apache#854
Closes apache#922
Closes apache#986
Closes apache#992
Closes apache#1019
Closes apache#1040
Closes apache#1041
Closes apache#1043
Closes apache#1046
Closes apache#1051
Closes apache#1078
Closes apache#1146
Closes apache#1164
Closes apache#1165
Closes apache#1178
Closes apache#1213
Closes apache#1225
Closes apache#1258
Closes apache#1259
Closes apache#1268
Closes apache#1272
Closes apache#1277
Closes apache#1278
Closes apache#1288
Closes apache#1296
Closes apache#1328
Closes apache#1342
Closes apache#1353
Closes apache#1370
Closes apache#1376
Closes apache#1391
Closes apache#1395
Closes apache#1399
Closes apache#1406
Closes apache#1410
Closes apache#1422
Closes apache#1427
Closes apache#1443
Closes apache#1462
Closes apache#1468
Closes apache#1483
Closes apache#1506
Closes apache#1509
Closes apache#1515
Closes apache#1520
Closes apache#1521
Closes apache#1525
Closes apache#1527
Closes apache#1544
Closes apache#1550
Closes apache#1566
Closes apache#1569
Closes apache#1570
Closes apache#1575
Closes apache#1580
Closes apache#1584
Closes apache#1591
Closes apache#1600
Closes apache#1611
Closes apache#1613
Closes apache#1639
Closes apache#1703
Closes apache#1711
Closes apache#1719
Closes apache#1737
Closes apache#1760
Closes apache#1767
Closes apache#1768
Closes apache#1785
Closes apache#1799
Closes apache#1822
Closes apache#1824
Closes apache#1844
Closes apache#1874
Closes apache#1918
Closes apache#1928
Closes apache#1937
Closes apache#1942
Closes apache#1951
Closes apache#1957
Closes apache#1963
Closes apache#1964
Closes apache#1965
Closes apache#1967
Closes apache#1968
Closes apache#1971
Closes apache#1985
Closes apache#1986
Closes apache#1998
Closes apache#2031
Closes apache#2032
Closes apache#2071
Closes apache#2076
Closes apache#2108
Closes apache#2119
Closes apache#2128
Closes apache#2142
Closes apache#2174
Closes apache#2206
Closes apache#2297
Closes apache#2322
Closes apache#2332
Closes apache#2341
Closes apache#2377
Closes apache#2414
Closes apache#2469
@asfgit asfgit closed this in #2880 Oct 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants