Resolve more java field accessor name conflicts #8198

philsttr · 2021-01-10T22:40:14Z

Previously, some proto field names would cause the java code generator to generate accessor names that conflict with method names from the message super classes/interfaces, leading to java code that would not compile.
A list of field names that cause such conflicts previously existed, but the list did not contain every field name that would cause a conflict.
Additionally, only snake_case field names would be detected. If the field name was in camelCase or began with a leading underscore, the conflict would not be detected.

This change adds the complete set of field names that will cause accessor name conflicts, and detects conflicts in snake_case, _snake_case (with a leading underscore), and camelCase field names.
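To illustrate why such accessor names fail to compile, here is a minimal sketch that uses reflection to check whether a candidate accessor name matches a final, zero-argument method on java.lang.Object. The class and method names (AccessorClashDemo, clashesWithFinalObjectMethod) are hypothetical, and Object is used only to keep the example self-contained; the real conflicts also come from the message super classes/interfaces.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical demo class; not part of protobuf.
final class AccessorClashDemo {

    /** Returns true if Object has a public, final, zero-argument method with this name. */
    static boolean clashesWithFinalObjectMethod(String accessorName) {
        for (Method m : Object.class.getMethods()) {
            if (m.getName().equals(accessorName)
                    && m.getParameterCount() == 0
                    && Modifier.isFinal(m.getModifiers())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A proto field named "class" would naively yield a getter named getClass(),
        // which cannot be declared because Object.getClass() is final.
        System.out.println(clashesWithFinalObjectMethod("getClass"));
        // A proto field named "int" yields getInt(), which does not clash.
        System.out.println(clashesWithFinalObjectMethod("getInt"));
    }
}
```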

Fixes #8142

Ignore java/lite/target

Previously, some proto field names would cause the java code generator to generate accessor names that conflict with method names from the message super classes/interfaces. A list of field names that cause such conflicts previously existed, but the list did not contain every field name that would cause a conflict. Additionally, only snake_case field names would be detected. If the field name was in camelCase, the conflict would not be detected. This change adds the complete set of field names that will cause accessor name conflicts, and detects conflicts in both snake_case and camelCase field names. Fixes protocolbuffers#8142

philsttr · 2021-02-05T17:33:13Z

Can someone review this PR?

Side note, I can't add the required labels (release notes: yes, java) since I do not have permission.

JonathanLeech · 2021-02-17T20:27:15Z

Does this fix a field named _class?

…res. Previously, some protobuf field names beginning with leading underscores (e.g. _class) would cause uncompilable java code to be generated due to accessor name conflicts. Now, non-conflicting java accessor method names are created for those fields.

philsttr · 2021-02-19T01:11:10Z

@JonathanLeech It didn't before, but I just pushed an update to this PR so that it now handles fields named _class (and also for other conflicting names beginning with a leading underscore)

shaod2 · 2021-06-07T17:49:41Z

Thanks for the fix!

Since we're changing the field names somehow, can you also add a log warning somewhere (e.g. "conflicting field names will be changed")? I know there wasn't one before, but I think it'd be good to add one here.

philsttr · 2021-06-07T18:20:44Z

That's a good idea.

I might not be able to get to it for a little while. Would you consider merging this bug-fix PR as-is and creating a separate enhancement request for the warning message? I can submit a new PR for that when I get a chance.

elharo · 2021-11-08T15:35:57Z

java/core/src/main/java/com/google/protobuf/DescriptorMessageInfoFactory.java

+
+    String suffix = specialFieldNames.contains(upperCamelCaseName)
+                  // For field names that match the specialFieldNames,
+                  // append "__" to prevent field accessor method names from


Why both __ and _? Is there a conflict if both cases use a single underscore?

The single and double underscores referenced here are simply to match the java code that is currently generated by the protoc java compiler. Notice the previous code used the single and double underscore. In other words, my change does not change the existing semantics for single and double underscores.

The protoc java compiler currently (prior to my change) uses a single or double underscore depending on the proto field name.

The protoc java compiler always appends at least one underscore to java field names. This prevents those field names from clashing with java keywords. For example, if the proto field name was int, then the protoc java compiler would generate a java field name of int_.

The protoc java compiler does not include the last trailing underscore in accessor method names for these fields. For example, the getter method for the proto field named int would be getInt().

In addition, the protoc java compiler adds another underscore (for a total of two underscores) to the java field name for names that would result in accessor method names that clash with other method names. For example, if the proto field name was class, the java field name would be class__ (double underscore). And again, the last trailing underscore is not included in the accessor method names, so the getter method for the proto field named class would be getClass_().

This behavior existed prior to my change. My change simply adds more detection of field names that result in accessor method name clashes. Previously, only cached_size, serialized_size, and class were detected. Unfortunately, those are only a subset of the proto field names that would actually cause java accessor method name conflicts. My change adds the complete set of proto field names that cause java accessor method name conflicts.

Again, I'm not changing the semantics of single or double underscores. I am also not going to refactor the protoc compiler to change its current semantics for single and double underscores. I'm simply fixing a bug in the protoc compiler where it was generating non-compilable java code due to missing detection of field names that cause conflicts.
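The single/double underscore semantics described above can be sketched as follows. This is a simplified illustration and not the protoc implementation: the class UnderscoreSuffixSketch, its helper methods, and the three-entry SPECIAL set are assumptions, and the lookup compares raw snake_case names the way the previous implementation did.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not part of protobuf.
final class UnderscoreSuffixSketch {

    // Stand-in for the generator's list of conflicting names; only the
    // three names cited in this thread are included.
    private static final Set<String> SPECIAL =
            new HashSet<>(Arrays.asList("class", "cached_size", "serialized_size"));

    /** Drops underscores and capitalizes the character that follows each one. */
    static String toCamel(String snake, boolean capitalizeFirst) {
        StringBuilder sb = new StringBuilder();
        boolean capNext = capitalizeFirst;
        for (char c : snake.toCharArray()) {
            if (c == '_') {
                capNext = true;
            } else if (capNext) {
                sb.append(Character.toUpperCase(c));
                capNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Always appends "_"; appends "__" when the accessor name would clash. */
    static String javaFieldName(String protoName) {
        String suffix = SPECIAL.contains(protoName) ? "__" : "_";
        return toCamel(protoName, false) + suffix;
    }

    /** The getter name omits only the last trailing underscore of the java field name. */
    static String getterName(String protoName) {
        String field = javaFieldName(protoName);
        String withoutLast = field.substring(0, field.length() - 1);
        return "get" + Character.toUpperCase(withoutLast.charAt(0)) + withoutLast.substring(1);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"int", "class", "cached_size"}) {
            System.out.println(name + " -> " + javaFieldName(name) + ", " + getterName(name) + "()");
        }
    }
}
```

With these assumptions, int maps to int_ and getInt(), while class maps to class__ and getClass_(), matching the examples in the comment above.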

elharo · 2021-11-08T15:37:04Z

java/core/src/main/java/com/google/protobuf/DescriptorMessageInfoFactory.java

+                  // append "__" to prevent field accessor method names from
+                  // clashing with other methods.
+                  ? "__"
+                  // For other field names, append "_" to prevent field names


Should field names here be field method accessor names as well?

No, the protoc compiler appends a single underscore specifically to the java field names to prevent clashes with java keywords. The protoc compiler does not include this last underscore in accessor method names.

elharo · 2021-11-08T15:40:34Z

java/core/src/main/java/com/google/protobuf/DescriptorMessageInfoFactory.java

+                  ? "__"
+                  // For other field names, append "_" to prevent field names
+                  // from clashing with java keywords.
+                  : "_";


Don't split the ternary ?: operator across these lines. I found this really hard to read. Possibly this method should be split up, one for handling method names and one for handling field names. As is, I'm finding it very hard to grok. I did not initially understand what it was doing.

I removed the ternary operator and improved the comments. As stated in my other comment, I'm not going to refactor the semantics of how the protoc compiler uses single and double underscores. I'm also not going to refactor how accessor method names are currently generated from field names. Both would be great improvements, but they are outside of the scope of this bugfix.

For comparison, take a look at the previous implementation of this method...

protobuf/java/core/src/main/java/com/google/protobuf/DescriptorMessageInfoFactory.java

Lines 592 to 598 in 3a4d931

static String getFieldName(FieldDescriptor fd) {
  String name = (fd.getType() == FieldDescriptor.Type.GROUP)
      ? fd.getMessageType().getName()
      : fd.getName();
  String suffix = specialFieldNames.contains(name) ? "__" : "_";
  return snakeCaseToCamelCase(name) + suffix;
}

I believe the comments in the new implementation have made it more clear, and are sufficient for this bugfix.

zhangskz · 2021-11-09T18:02:52Z

Thanks for the contribution! This change seems alright, but it does differ from similar existing behavior in protobuf/src/google/protobuf/compiler/java/java_context.cc (line 163 in d37dcf9), which appends field numbers rather than underscores to handle clashes:

info.capitalized_name += StrCat(field->number());

I'd like to hold off a bit before approving to take a closer look at how best to handle this as there are some pros/cons to each solution, but ultimately we'd probably want a consistent approach for this type of scenario.
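For readers comparing the two approaches, here is a purely illustrative sketch of how each would disambiguate a clashing accessor name. Neither method is protobuf's actual code; the class ClashResolutionSketch, both helper names, and the field number 3 are hypothetical.

```java
// Hypothetical sketch; not part of protobuf.
final class ClashResolutionSketch {

    /** Field-number strategy: append the proto field number to the capitalized name. */
    static String withFieldNumber(String capitalizedName, int fieldNumber) {
        return capitalizedName + fieldNumber;
    }

    /** Underscore strategy: append an underscore to the capitalized name. */
    static String withUnderscore(String capitalizedName) {
        return capitalizedName + "_";
    }

    public static void main(String[] args) {
        // Assumes a proto field named "class" with the (made-up) field number 3.
        System.out.println("get" + withFieldNumber("Class", 3) + "()");
        System.out.println("get" + withUnderscore("Class") + "()");
    }
}
```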

philsttr · 2021-11-09T18:22:59Z

I would like to reiterate that the underscores are existing behavior as well. For example, the existing code (prior to my change) will generate a java field named class__ and a getter method named getClass_() for a proto field named class. And similarly for proto fields named cached_size and serialized_size.

This change just adds more detection of field names that cause conflicts, and utilizes the existing behavior to resolve those conflicts.

I understand the desire for collapsing the two previously existing behaviors though. But note, you will not be able to change the existing behavior for fields named class, cached_size, and serialized_size without breaking compatibility.

philsttr · 2022-01-12T17:20:12Z

Hi @zhangskz, what were the results of your closer look?

zhangskz · 2022-02-02T22:49:53Z

I haven't had the opportunity to get through a complete audit of how we handle escaping across conflict types and languages. This would actually be a large undertaking that would need to be approached carefully since modifying this would indeed break compatibility. In any case, this shouldn't affect your fix since it doesn't really make the incompatibilities much worse -- we already use both underscores to escape in some cases.

I'll review your CL shortly (aka tomorrow). Thanks for the patience!

zhangskz · 2022-02-03T15:53:54Z

java/core/src/main/java/com/google/protobuf/DescriptorMessageInfoFactory.java

+
+    // convert to UpperCamelCase for comparison to the specialFieldNames
+    // (which are in UpperCamelCase)
+    String upperCamelCaseName = snakeCaseToUpperCamelCase(name);


I think snakeCaseToUpperCamelCase is only ever used for checking against specialFieldNames. Can we just keep specialFieldNames in lower camel case instead? This would simplify our code and IMO is a bit more intuitive anyways.

The specialFieldNames are in UpperCamelCase to handle proto fields named _my_field, since _my_field becomes MyField when it is converted to camel case (either upper or lower). It seemed more straightforward to keep the special field names in UpperCamelCase for comparison (since that handles proto fields named my_field, _my_field, and __my_field alike), rather than adding additional logic to handle proto fields starting with _.

Note the tests that use ForbiddenWordsLeadingUnderscoreMessage test this case (pun intended ;)).
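The reasoning above can be sketched as follows. This is a simplified approximation of the conversion, not the code in DescriptorMessageInfoFactory: the class UpperCamelLookupSketch, the toCamel and isSpecial helpers, and the three-entry set are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not part of protobuf.
final class UpperCamelLookupSketch {

    // Stand-in for specialFieldNames, stored in UpperCamelCase.
    private static final Set<String> SPECIAL_UPPER_CAMEL =
            new HashSet<>(Arrays.asList("Class", "CachedSize", "SerializedSize"));

    /** Drops underscores and capitalizes the character that follows each one. */
    static String toCamel(String name, boolean capitalizeFirst) {
        StringBuilder sb = new StringBuilder();
        boolean capNext = capitalizeFirst;
        for (char c : name.toCharArray()) {
            if (c == '_') {
                capNext = true;
            } else if (capNext) {
                sb.append(Character.toUpperCase(c));
                capNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Normalizes to UpperCamelCase before the lookup, so all spellings compare equal. */
    static boolean isSpecial(String protoName) {
        return SPECIAL_UPPER_CAMEL.contains(toCamel(protoName, true));
    }

    public static void main(String[] args) {
        // A leading underscore capitalizes the first letter even in "lower" camel case.
        System.out.println(toCamel("my_field", false));
        System.out.println(toCamel("_my_field", false));
        for (String name : new String[] {"class", "_class", "__class", "cachedSize", "name"}) {
            System.out.println(name + " special? " + isSpecial(name));
        }
    }
}
```

Under this sketch, a lowerCamelCase list would need extra logic for names that start with an underscore, whereas the UpperCamelCase comparison treats class, _class, and __class identically.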

philsttr added 2 commits January 10, 2021 14:28

Remove javanano from .gitignore.

af6ce22

Ignore java/lite/target

google-cla bot added the cla: yes label Jan 10, 2021

acozzette added java release notes: yes labels Feb 5, 2021

protobuf-kokoro removed the kokoro:run label Feb 5, 2021

TeBoring requested a review from shaod2 June 1, 2021 22:30

Merge branch 'master' into resolve_java_field_name_accessor_conflict

1f8baae

elharo added the kokoro:run label Oct 1, 2021

elharo self-requested a review October 1, 2021 13:11

protobuf-kokoro removed the kokoro:run label Oct 1, 2021

acozzette added the kokoro:force-run label Oct 13, 2021

protobuf-kokoro removed the kokoro:force-run label Oct 13, 2021

elharo added kokoro:force-run labels Oct 24, 2021

protobuf-kokoro removed kokoro:run labels Oct 24, 2021

elharo reviewed Oct 24, 2021

philsttr added 2 commits October 24, 2021 11:04

Improve comments/documentation for conversion from snake case to came…

4038de8

…l case Rename snakeCaseToCamelCase to snakeCaseToLowerCamelCase Add snakeCaseToUpperCamelCase Add clarifying in-line comments for field name generation Remove explicit version numbers from references.

Merge branch 'resolve_java_field_name_accessor_conflict' of github.co…

7c0c1fc

…m:philsttr/protobuf into resolve_java_field_name_accessor_conflict

philsttr requested a review from elharo October 24, 2021 18:11

elharo reviewed Nov 5, 2021

Fix indents and typo

9f88f0c

Unnest <pre> tag

8663cff

philsttr requested a review from elharo November 5, 2021 22:35

elharo added kokoro:force-run labels Nov 8, 2021

protobuf-kokoro removed kokoro:run labels Nov 8, 2021

elharo reviewed Nov 8, 2021

philsttr added 2 commits November 8, 2021 09:19

improve grammar in comments

bbd504b

are colliding -> collide

Remove ternary operator and improve comments

ce16ade

elharo approved these changes Nov 8, 2021

elharo added kokoro:force-run labels Nov 8, 2021

protobuf-kokoro removed kokoro:run labels Nov 8, 2021

Fix typo in comment

7427439

zhangskz self-requested a review November 9, 2021 17:49

acozzette added the kokoro:run label Feb 2, 2022

protobuf-kokoro removed the kokoro:run label Feb 2, 2022

zhangskz reviewed Feb 3, 2022

zhangskz approved these changes Feb 3, 2022

zhangskz merged commit 3be4648 into protocolbuffers:master Feb 3, 2022
