Line reader #30
<configuration>
<failBuildInCaseOfConflict>true</failBuildInCaseOfConflict>
</configuration>
</plugin>
why is this removed?
@kokosing The LineReader class is duplicated and this plugin would throw error when building.
But you excluded this file here: https://github.com/prestodb/presto-hadoop-apache2/pull/30/files#diff-600376dffeb79835ede4a0b285078036R546.
This plugin was added here for a reason to catch that situation. If class is duplicated, then which one will be used?
That exclude doesn't work. I don't see what maven-duplicate-finder-plugin is used for here: there are no other duplicated classes. Similarly, the presto-hive-apache pom.xml doesn't include maven-duplicate-finder-plugin either. However, I added it back but changed the configuration to
true
@electrum what do you think?
The shade plugin does a duplicate check itself. Unfortunately, it's only a warning and doesn't fail the build, but shading is complicated and doesn't change often, so it seems reasonable that anyone working on it or reviewing can review the build output. (there are lots of other things you have to review manually when shading, this is one of them)
Having the duplicate checker is problematic because it doesn't use the exclusion config from the shade plugin. It has to be configured separately, and it can be misleading: you might configure only the duplicate checker and not the shade plugin, and would be led to believe the build was safe.
Hence, my recommendation is to remove it.
Thanks for the explanation.
Can you submit a PR (or issue) to Hadoop to get this change merged upstream, and then reference it in the code? Basically, say, "Remove this class once xxx is resolved in Hadoop".
3ca2406
to
56d2f8b
Compare
56d2f8b
to
ddd14ec
Compare
The last commit title is too long (notice how GitHub truncates it). Please see guidelines here: https://chris.beams.io/posts/git-commit/
Can you add test for |
pom.xml
@@ -340,6 +340,7 @@
<version>6.2.1</version>
<scope>test</scope>
</dependency>

Nit: this looks like an accidental change
@@ -240,12 +246,19 @@ private int readDefaultLine(Text str, int maxLineLength, int maxBytesToConsume)
appendLength = maxLineLength - txtLength;
}
if (appendLength > 0) {
int newTxtLength = txtLength + appendLength;
if (str.getBytes().length < newTxtLength && Math.max(newTxtLength, txtLength << 1) > MAX_ARRAY_SIZE) {
Do we need to worry about overflow of txtLength << 1? Same question for the one below.
In that case txtLength << 1 wraps around to a negative value, so Math.max(newTxtLength, txtLength << 1) evaluates to newTxtLength, which is <= maxLineLength and won't overflow. This is the same logic as in Text.setCapacity().
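The overflow argument in this reply can be checked with a small standalone sketch. The class and method names below are hypothetical, and the value of MAX_ARRAY_SIZE is an assumption, since the diff does not show how it is defined.

```java
public class GrowthOverflowSketch {
    // Assumed value; the PR excerpt does not show the real definition.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Capacity the buffer would grow to: at least newTxtLength, preferably double.
    static int targetCapacity(int txtLength, int appendLength) {
        int newTxtLength = txtLength + appendLength;
        // If txtLength << 1 overflows, it becomes negative and max() ignores it.
        return Math.max(newTxtLength, txtLength << 1);
    }

    public static void main(String[] args) {
        int txtLength = 1 << 30; // large enough that doubling overflows an int
        int appendLength = 16;

        System.out.println(txtLength << 1);                          // negative after wrap-around
        System.out.println(targetCapacity(txtLength, appendLength)); // falls back to newTxtLength
        System.out.println(targetCapacity(txtLength, appendLength) > MAX_ARRAY_SIZE);
    }
}
```

For any non-negative int whose doubling overflows, bit 30 is set, so the shifted value has the sign bit set and is negative. That is why the max falls back to newTxtLength in this sketch.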
ddd14ec
to
2886247
Compare
The shade plugin already checks for duplicate classes.
34edac2
to
95bc0d1
Compare
See comments, otherwise looks good
Can you also add a test for |
import static java.nio.charset.StandardCharsets.UTF_8;
import static org.testng.Assert.assertEquals;

public class TestLineReader {
Nit: formatting for this is off
95bc0d1
to
e847f66
Compare
e847f66
to
6e6032e
Compare
6e6032e
to
ecff33c
Compare
// There should not be any bytes read into str because the maxLineLength is 0
reader.readLine(str, 0, 30);
assertEquals(str.getLength(), 0);
assertEquals(str.getBytes(), "".getBytes());
This second test will never be hit, because the first one would fail first if the string is non-empty. It would be a bit better to replace both tests with this:
assertEquals(str, new Text());
This will use the equals method of Text, which works correctly. The advantage is that on a failure, it will show the results of toString for the actual and expected values, which makes it easier to debug (compared to just showing that the length is different).
The same goes for the other tests.
assertEquals(str.getLength(), 0);
assertEquals(str.getBytes(), "".getBytes());
}
catch (IOException e) {
This catch is not needed. Just declare the checked exception for the test method. If it throws, the test will fail.
Same goes for the rest of the methods.
// The LineReader does not store the new line character into str,
// and the str was doubled 3 times so its size is 32, with 5 0-valued
// bytes at the end, so we need to compare just the first 27 characters.
assertEquals(str.getLength(), 27);
I would write this as
String input = "Hello world! Goodbye world!\n";
...
assertEquals(str, new Text(input));
You want to make tests as simple as possible, both so they are easy to read and to avoid bugs in tests.
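To illustrate why comparing whole values is simpler than comparing lengths and raw byte arrays, here is a minimal sketch. GrowableText is a hypothetical stand-in written for this example, not Hadoop's Text. It only assumes, as the test comment in the diff describes, that the backing array can be longer than the logical content.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BackingArraySketch {
    // Hypothetical stand-in for a growable byte buffer; NOT the real Text class.
    static final class GrowableText {
        private byte[] bytes = new byte[0];
        private int length;

        void append(byte[] src) {
            int needed = length + src.length;
            if (bytes.length < needed) {
                // Grow to at least double, which can leave unused zero bytes at the end.
                bytes = Arrays.copyOf(bytes, Math.max(needed, bytes.length << 1));
            }
            System.arraycopy(src, 0, bytes, length, src.length);
            length = needed;
        }

        byte[] getBytes() { return bytes; } // backing array, may exceed getLength()
        int getLength() { return length; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof GrowableText)) {
                return false;
            }
            GrowableText other = (GrowableText) o;
            // Compare only the logical content, ignoring spare capacity.
            return Arrays.equals(Arrays.copyOf(bytes, length),
                    Arrays.copyOf(other.bytes, other.length));
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(Arrays.copyOf(bytes, length));
        }

        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        GrowableText grown = new GrowableText();
        grown.append("Hello".getBytes(StandardCharsets.UTF_8));
        grown.append(" wor".getBytes(StandardCharsets.UTF_8)); // triggers doubling

        GrowableText exact = new GrowableText();
        exact.append("Hello wor".getBytes(StandardCharsets.UTF_8));

        System.out.println(grown.equals(exact));                                // logical content matches
        System.out.println(Arrays.equals(grown.getBytes(), exact.getBytes())); // raw arrays differ
    }
}
```

In this sketch a single equals comparison needs no knowledge of how the buffer grew, while a raw byte-array comparison depends on capacity details.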
Fail the read when the text file line is over the maxLineLength limit.
ecff33c
to
2716e01
Compare
Throw IOException when the text line is too big and
Case 1) would cause OOM, and the fix (the same as the second commit in this PR)
was submitted as the following PR to Apache Hadoop:
apache/hadoop#414
The fix for cases 2) and 3) is a behavior change: when a line exceeds the configured
maxLineLength limit (assuming that limit is below the VM array size limit), the LineReader
throws an IOException so that the query fails loudly. The original LineReader behavior
was to silently discard the content of a line past the maxLineLength limit.
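
The fail-loudly behavior described above can be sketched as follows. This is a simplified, hypothetical reader written for illustration only. It is not the patched Hadoop LineReader, and the exact boundary condition and exception message are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StrictLineReaderSketch {
    // Reads one '\n'-terminated line, or returns null at end of stream.
    // Throws instead of silently dropping bytes past maxLineLength.
    static String readLine(InputStream in, int maxLineLength) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean sawAnyByte = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAnyByte = true;
            if (b == '\n') {
                return new String(line.toByteArray(), StandardCharsets.UTF_8);
            }
            if (line.size() >= maxLineLength) {
                // Assumed message format; the real message is not shown in the PR text.
                throw new IOException("Line exceeds maxLineLength of " + maxLineLength + " bytes");
            }
            line.write(b);
        }
        // Last line without a trailing newline, or null if the stream was empty.
        return sawAnyByte ? new String(line.toByteArray(), StandardCharsets.UTF_8) : null;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "short\nthis line is too long\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLine(in, 10)); // first line fits within the limit
        try {
            readLine(in, 10);                 // second line is over the limit
        }
        catch (IOException e) {
            System.out.println("failed loudly: " + e.getMessage());
        }
    }
}
```

A caller that previously received truncated lines would instead see an exception, which matches the stated intent of failing the query rather than returning partial data.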