-
Notifications
You must be signed in to change notification settings - Fork 4.1k
STORM-828 HdfsBolt takes a lot of configuration, need good defaults #668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any guarantees that this bolt actually writes out comma separated values? I don't see that, which makes the class name somewhat misleading.
Similar observation about the TSV bolt further down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I could set the RecordDefaultDelimiter to ",". Yes, thanks for pointing it out, I will make the changes and it up soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would take much more work than that and a clearer definition of what the CSVFileBolt does, IMO. What should the user expect from the behavior of CSVBolt.emit() ? Will it take arbitrary values from a tuple and join them together with commas? Will it escape characters? etc.
Another approach would be to create a separate bolt that does nothing but generate CSV output and then passes the results to any other bolt the user wanted. I often create pre-processing bolts like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For futher clarification, I was further thinking over this issue and I presume currently the spout emits tuples in the form of fields and the csv or tsv bolt joins them with the specified record format delimiter. If it is a ",", all tuples will be appended by a comma. The HdfsBolt actually does this implementation when the execute is called upon it. The TSV or CSV are just abstractions to get the intended values based on the record delimiter. Can you please let me know what exactly has to be done with an example, I thought I got your point but right now I cannot clearly picture it. It will be great if you can get back with a reply. Thanks a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If CSV and TSV formats are a bit controversial perhaps we can file a separate JIRA to implement them properly, with escaping, etc. and remove them from here.
|
@redsanket why we need CSVBolt and TSVBolt when there is DelimitedRecordFormat which users can configure |
|
ok that seems to be true, I will change that. Mostly it will be down to refactoring a piece of code, we might not have to make it more specific then. Thanks |
|
@redsanket I am not sure why this patch needs to change all these files. Why can't we just add default values to the variables here https://github.com/apache/storm/blob/master/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/bolt/AbstractHdfsBolt.java#L53 similar to what we have in hive connector https://github.com/apache/storm/blob/master/external/storm-hive/src/main/java/org/apache/storm/hive/common/HiveOptions.java#L33 |
|
Yes I have made the changes, I will update my pull request soon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra blank line.
|
Sorry it took me so long to review this. For the most part things look good. I made a few comments on the HdfBolt that I would like to see reflected in the SequenceFileBolt too. Also an upmerge would be good. Thank you for being so patient with us. |
|
@redsanket Are you still working on this? |
We are closing stale Pull Requests to make the list more manageable. Please re-open any Pull Request that has been closed in error. Closes apache#608 Closes apache#639 Closes apache#640 Closes apache#648 Closes apache#662 Closes apache#668 Closes apache#692 Closes apache#705 Closes apache#724 Closes apache#728 Closes apache#730 Closes apache#753 Closes apache#803 Closes apache#854 Closes apache#922 Closes apache#986 Closes apache#992 Closes apache#1019 Closes apache#1040 Closes apache#1041 Closes apache#1043 Closes apache#1046 Closes apache#1051 Closes apache#1078 Closes apache#1146 Closes apache#1164 Closes apache#1165 Closes apache#1178 Closes apache#1213 Closes apache#1225 Closes apache#1258 Closes apache#1259 Closes apache#1268 Closes apache#1272 Closes apache#1277 Closes apache#1278 Closes apache#1288 Closes apache#1296 Closes apache#1328 Closes apache#1342 Closes apache#1353 Closes apache#1370 Closes apache#1376 Closes apache#1391 Closes apache#1395 Closes apache#1399 Closes apache#1406 Closes apache#1410 Closes apache#1422 Closes apache#1427 Closes apache#1443 Closes apache#1462 Closes apache#1468 Closes apache#1483 Closes apache#1506 Closes apache#1509 Closes apache#1515 Closes apache#1520 Closes apache#1521 Closes apache#1525 Closes apache#1527 Closes apache#1544 Closes apache#1550 Closes apache#1566 Closes apache#1569 Closes apache#1570 Closes apache#1575 Closes apache#1580 Closes apache#1584 Closes apache#1591 Closes apache#1600 Closes apache#1611 Closes apache#1613 Closes apache#1639 Closes apache#1703 Closes apache#1711 Closes apache#1719 Closes apache#1737 Closes apache#1760 Closes apache#1767 Closes apache#1768 Closes apache#1785 Closes apache#1799 Closes apache#1822 Closes apache#1824 Closes apache#1844 Closes apache#1874 Closes apache#1918 Closes apache#1928 Closes apache#1937 Closes apache#1942 Closes apache#1951 Closes apache#1957 Closes apache#1963 Closes apache#1964 Closes apache#1965 Closes apache#1967 Closes apache#1968 Closes apache#1971 Closes apache#1985 Closes apache#1986 Closes apache#1998 Closes apache#2031 Closes apache#2032 Closes apache#2071 Closes apache#2076 Closes apache#2108 Closes apache#2119 Closes apache#2128 Closes apache#2142 Closes apache#2174 Closes apache#2206 Closes apache#2297 Closes apache#2322 Closes apache#2332 Closes apache#2341 Closes apache#2377 Closes apache#2414 Closes apache#2469
We are closing stale Pull Requests to make the list more manageable. Please re-open any Pull Request that has been closed in error. Closes apache#608 Closes apache#639 Closes apache#640 Closes apache#648 Closes apache#662 Closes apache#668 Closes apache#692 Closes apache#705 Closes apache#724 Closes apache#728 Closes apache#730 Closes apache#753 Closes apache#803 Closes apache#854 Closes apache#922 Closes apache#986 Closes apache#992 Closes apache#1019 Closes apache#1040 Closes apache#1041 Closes apache#1043 Closes apache#1046 Closes apache#1051 Closes apache#1078 Closes apache#1146 Closes apache#1164 Closes apache#1165 Closes apache#1178 Closes apache#1213 Closes apache#1225 Closes apache#1258 Closes apache#1259 Closes apache#1268 Closes apache#1272 Closes apache#1277 Closes apache#1278 Closes apache#1288 Closes apache#1296 Closes apache#1328 Closes apache#1342 Closes apache#1353 Closes apache#1370 Closes apache#1376 Closes apache#1391 Closes apache#1395 Closes apache#1399 Closes apache#1406 Closes apache#1410 Closes apache#1422 Closes apache#1427 Closes apache#1443 Closes apache#1462 Closes apache#1468 Closes apache#1483 Closes apache#1506 Closes apache#1509 Closes apache#1515 Closes apache#1520 Closes apache#1521 Closes apache#1525 Closes apache#1527 Closes apache#1544 Closes apache#1550 Closes apache#1566 Closes apache#1569 Closes apache#1570 Closes apache#1575 Closes apache#1580 Closes apache#1584 Closes apache#1591 Closes apache#1600 Closes apache#1611 Closes apache#1613 Closes apache#1639 Closes apache#1703 Closes apache#1711 Closes apache#1719 Closes apache#1737 Closes apache#1760 Closes apache#1767 Closes apache#1768 Closes apache#1785 Closes apache#1799 Closes apache#1822 Closes apache#1824 Closes apache#1844 Closes apache#1874 Closes apache#1918 Closes apache#1928 Closes apache#1937 Closes apache#1942 Closes apache#1951 Closes apache#1957 Closes apache#1963 Closes apache#1964 Closes apache#1965 Closes apache#1967 Closes apache#1968 Closes apache#1971 Closes apache#1985 Closes apache#1986 Closes apache#1998 Closes apache#2031 Closes apache#2032 Closes apache#2071 Closes apache#2076 Closes apache#2108 Closes apache#2119 Closes apache#2128 Closes apache#2142 Closes apache#2174 Closes apache#2206 Closes apache#2297 Closes apache#2322 Closes apache#2332 Closes apache#2341 Closes apache#2377 Closes apache#2414 Closes apache#2469
Removing configs from HDFSFileTopology example. I have made neccessary config and constructor changes as per my understanding. It will be nice to know if there is anything I need to look at.