[MSHARED-1453] Canonicalize properties files #77
pw.println(l);
for (String line : lines) {
writer.write(line);
writer.write( '\n' );
System.lineSeparator()?
Moot. This code can be deleted now that we're using properties.store
@@ -71,8 +71,8 @@ private void createPropertiesFile(Properties properties, Path outputFile, boolea
return;
}

try (PrintWriter pw = new PrintWriter(outputFile.toFile(), StandardCharsets.ISO_8859_1.name());
StringWriter sw = new StringWriter()) {
try ( Writer writer = Files.newBufferedWriter(outputFile, StandardCharsets.ISO_8859_1);
The loadPropertiesFile method lets the Properties class decide the charset used (so it's using ISO_8859_1, as it's the default). I think we should use an OutputStream here and pass it to properties.store() without the charset.
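A minimal sketch of that suggestion, assuming a helper named writeProperties (hypothetical, not the method under review): handing Properties.store() an OutputStream leaves the ISO 8859-1 encoding to the JDK, so no charset appears in the caller.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class StoreViaOutputStream {
    // Hypothetical helper for illustration; not the plugin's actual method.
    static void writeProperties(Properties properties, Path outputFile) {
        try (OutputStream out = Files.newOutputStream(outputFile)) {
            // store(OutputStream, ...) encodes as ISO 8859-1 on its own,
            // so the caller never names a charset.
            properties.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Properties properties = new Properties();
        properties.setProperty("artifactId", "example");
        Path file = Files.createTempFile("pom", ".properties");
        writeProperties(properties, file);
        // Prints a date comment line followed by artifactId=example.
        Files.readAllLines(file, StandardCharsets.ISO_8859_1).forEach(System.out::println);
    }
}
```

Note that store() still writes a date comment and uses hash order, which is what the rest of this thread is about.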
good idea. That makes this much simpler.
The old code sorted the properties and removed comments. I've dropped that for now. Was it necessary for reproducible builds or something?
Ah, could be. Having a stable output is really important imho.
OK. I'm going to add some tests to this too. The class is sorely lacking in them.
pw.println(l);
}
try (OutputStream out = Files.newOutputStream(outputFile)) {
properties.store(out, null);
I don't think we can use properties.store here.
The removal of comments and the ordering is definitely important for reproducible builds.
We could refactor the code to use BufferedReader.lines(), filter out comments, sort, and print using the streams api.
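A rough sketch of that refactoring, under the assumption that the properties are first stored into an in-memory buffer (the class and method names are hypothetical): the JDK does the escaping, and the stream pipeline drops the comment line and sorts the rest.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

class CanonicalProperties {
    // Hypothetical helper: returns the stored entries without comments, sorted.
    static List<String> canonicalLines(Properties properties) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            properties.store(buffer, null); // the JDK handles all escaping
            String raw = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
            try (BufferedReader reader = new BufferedReader(new StringReader(raw))) {
                return reader.lines()
                        // A key that starts with '#' is stored backslash-escaped,
                        // so only comment lines begin with a bare '#'.
                        .filter(line -> !line.startsWith("#"))
                        .sorted()
                        .collect(Collectors.toList());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("version", "1.0");
        properties.setProperty("artifactId", "example");
        canonicalLines(properties).forEach(System.out::println);
    }
}
```

This sorts the escaped lines rather than the raw keys, which is a simplification; the caller would still write the lines out with a fixed line terminator.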
for (String key : sortedPropertyNames) {
out.write(key);
out.write(": ");
out.write(unsortedProperties.getProperty(key));
I think that's wrong. We need escaping for both key and value here. See https://github.com/openjdk/jdk/blob/8c2b4f62714f26ab3bc4808c734502af632a1eef/src/java.base/share/classes/java/util/Properties.java#L686-L738
I'll add this.
Now I see why the original code wrote out a properties file and read it back in. That avoided the need to reimplement the escaping.
However, that might not always work. It relies on the Properties format being consistent across VMs and Java versions and it doesn't have to be. We're accounting for comments, separator character, and ordering here, but there are other possible differences that can occur. It's risky to assume that Eclipse Temurin 21 is going to produce the same output as OpenJDK 8.
I don't think it's risky. That part is very clearly specified. Changing the way properties files are stored would be a big breakage.
See https://docs.oracle.com/javase/8/docs/api/java/util/Properties.html#store-java.io.Writer-java.lang.String- for an exact explanation of what is written.
The way key / separator / value are written is clearly specified and has not changed from JDK 8 to JDK 24.
Then every entry in this Properties table is written out, one per line. For each entry the key string is written, then an ASCII =, then the associated element string. For the key, all space characters are written with a preceding \ character. For the element, leading space characters, but not embedded or trailing space characters, are written with a preceding \ character. The key and element characters #, !, =, and : are written with a preceding backslash to ensure that they are properly loaded.
Feel free to re-implement this mechanism, but I'm not really sure it's worth it.
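To make the quoted rules concrete, here is a small sketch (the storedLine helper is hypothetical) that stores a single entry and returns the line the JDK writes for it, so the different treatment of spaces in keys and values is visible.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class EscapingDemo {
    // Hypothetical helper: the single entry line that store() writes.
    static String storedLine(String key, String value) {
        try {
            Properties properties = new Properties();
            properties.setProperty(key, value);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            properties.store(buffer, null);
            String raw = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
            try (BufferedReader reader = new BufferedReader(new StringReader(raw))) {
                return reader.lines()
                        .filter(line -> !line.startsWith("#")) // skip the date comment
                        .findFirst()
                        .orElseThrow(() -> new IllegalStateException("no entry written"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Key: every space escaped. Value: only the leading space escaped.
        System.out.println(storedLine("a key", " x y"));  // a\ key=\ x y
        // Separator and comment characters are escaped in both key and value.
        System.out.println(storedLine("k:1", "v=2#"));    // k\:1=v\=2\#
    }
}
```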
To guarantee the output, I think we are going to need to reimplement the escaping here. This could be tricky. What are the rules about copy-pasting OpenJDK code into an Apache project? We probably can't do that since it's GPL, not Apache licensed.
And also:
[...] the input/output stream is encoded in ISO 8859-1 character encoding. Characters that cannot be directly represented in this encoding can be written using Unicode escapes as defined in section 3.3 (https://docs.oracle.com/javase/specs/jls/se24/html/jls-3.html#jls-3.3) of The Java Language Specification; only a single 'u' character is allowed in an escape sequence.
To guarantee the output, I think we are going to need to reimplement the escaping here. This could be tricky. What are the rules about copy-pasting OpenJDK code into an Apache project? We probably can't do that since it's GPL, not Apache licensed.
The output is already guaranteed by the javadoc of the Properties.store() method. I really don't see what additional guarantees you're looking for.
try (Writer out = Files.newBufferedWriter(outputFile, StandardCharsets.ISO_8859_1)) {
for (String key : sortedPropertyNames) {
out.write(key);
out.write(": ");
The separator used in properties.store() is = without spaces.
Colon is allowed. Per wikipedia "There are 3 delimiting characters: equal ('='), colon (':') and whitespace (' ', '\t' and '\f')." but I'll change it. Note the test case passes.
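Both points can hold at once: Properties.load() accepts several separators, which is presumably why the test case passes, while store() always writes =. A short sketch of the reading side (the parse helper is hypothetical):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class SeparatorDemo {
    // Hypothetical helper: parse properties text held in a string.
    static Properties parse(String text) {
        try {
            Properties properties = new Properties();
            properties.load(new StringReader(text));
            return properties;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One entry per separator style: '=', ':' and plain whitespace.
        Properties properties = parse("a=1\nb: 2\nc 3\n");
        System.out.println(properties.getProperty("a")); // 1
        System.out.println(properties.getProperty("b")); // 2
        System.out.println(properties.getProperty("c")); // 3
    }
}
```

So a file written with ": " loads fine, but it will not be byte-identical to what store() produces for the same entries.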
I think the assumption driving this issue is wrong. All JDKs output the exact same file for a given input, apart from the comment date and the order of properties.
The javadoc from JDK 1.4 is actually more concise, because it only supported ISO 8859-1 at that time:
|
I'd go for something like:
|
}
}
}

private static String escape(String s) {
String escaped = StringEscapeUtils.escapeJava(s);
That's wrong. It's not a general Java escaping mechanism. The escaping is specific to the properties file format. There are rules for spaces and for the separators = and :, and the rules are slightly different for keys and for values.
Yes, I know. I'm working on the additional pieces. That's just the quickest way to handle the Unicode part.
Well, the quickest way is to not reimplement the whole thing.
Please look at https://github.com/apache/maven-archiver/pull/79/files
Unicode encoding is clearly not well supported in the current code, but the PR I pointed above fixes it. The reason is that calling Properties.store(Writer) bypasses the encoding, while calling Properties.store(OutputStream) correctly supports Unicode.
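A small comparison of the two overloads, as a sketch with hypothetical helper names: the same entry is stored once through an OutputStream and once through a Writer, and only the stream variant escapes the non-ASCII characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class StoreEncodingDemo {
    // store(OutputStream): non-ASCII characters come out as Unicode escapes.
    static String viaStream(Properties properties) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            properties.store(out, null);
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // store(Writer): characters are passed to the writer unescaped.
    static String viaWriter(Properties properties) {
        try {
            StringWriter out = new StringWriter();
            properties.store(out, null);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("foo", "a\u00e9\u00fcb");
        System.out.println(viaStream(properties)); // entry is written in escaped form
        System.out.println(viaWriter(properties)); // entry keeps the raw characters
    }
}
```

With the Writer overload, what ends up on disk depends on the charset of the writer the caller supplies.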
I do remember @hboutemy doing this already somewhere...
// Now read the file directly to check for alphabetical order and encoding
List<String> contents = Files.readAllLines(pomPropertiesFile, StandardCharsets.ISO_8859_1);
assertEquals(4, contents.size());
assertEquals("a\\ key\\ with\\\twhitespace=value\\ with\\\twhitespace", contents.get(0));
The test is wrong, afaik. The mechanism is different for keys and values:
Then every entry in this Properties table is written out, one per line. For each entry the key string is written, then an ASCII =, then the associated element string. For the key, all space characters are written with a preceding \ character. For the element, leading space characters, but not embedded or trailing space characters, are written with a preceding \ character. The key and element characters #, !, =, and : are written with a preceding backslash to ensure that they are properly loaded.
if (Character.isWhitespace(c) || c == '#' || c == '!' || c == '=' || c == ':') { // backslash escape
sb.append('\\');
sb.append(c);
} else if (c < 256) { // 8859-1
The check is wrong.
OK, this one I don't see. What's wrong here?
The following test succeeds:
Properties p = new Properties();
p.put("foo", "aéüb");
ByteArrayOutputStream baos = new ByteArrayOutputStream();
p.store(baos, null);
String s = baos.toString();
assertTrue(s.contains("foo=a\\u00E9\\u00FCb"));
So characters outside plain ASCII should be Unicode-escaped, even those below 256.
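For comparison with the c < 256 check, here is a sketch of the range that the Properties.store(OutputStream) javadoc describes: characters below 0x20 or above 0x7E are escaped. The class and method names are hypothetical, and a complete implementation would handle tab, newline, carriage return and form feed before reaching this branch.

```java
class UnicodeEscapeCheck {
    // Range from the store(OutputStream) javadoc: outside printable ASCII.
    static boolean needsUnicodeEscape(char c) {
        return c < 0x20 || c > 0x7e;
    }

    // Illustration only: escapes by range and ignores the separate
    // backslash rules for whitespace and separator characters.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (needsUnicodeEscape(c)) {
                // Four uppercase hex digits, matching what the test above expects.
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two accented letters are below 256 but still get escaped.
        System.out.println(escapeNonAscii("a\u00e9\u00fcb"));
    }
}
```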
I don't want to spend more time on re-implementing something from the JDK for no benefit really...
Me as well.