Go: extract and expose struct tags, interface method IDs #17357

Open: wants to merge 4 commits into base: main

Contributor

@smowton smowton commented Sep 3, 2024

This enables us to distinguish all database types in QL. Previously structs with the same field names and types but differing tags, and interface types with matching method names and at least one non-exported method but declared in differing packages, were impossible or only sometimes possible to distinguish in QL. With this change these types can be distinguished, and queries can also examine struct field tags, e.g. to read JSON field name associations.

This is a prerequisite for (some approaches to) dealing with Go 1.23's more direct exposure of type aliases, since it enables us to distinguish in QL all types that are distinct in the database, and therefore to implement up-to-aliasing type matching, known in the Go spec as identical types.

Contributor

@owen-mc owen-mc left a comment

Longer review to follow.

Comment on lines +4 to +5
* Added methods `StructTag.hasOwnFieldWithTag` and `Field.getTag`, which enable CodeQL queries to examine struct field tags.
* Added method `InterfaceType.getMethodTypeById`, which enables CodeQL queries to distinguish interfaces with matching non-exported method names that are declared in different packages, and are therefore incompatible.
Contributor

Suggested change
- * Added methods `StructTag.hasOwnFieldWithTag` and `Field.getTag`, which enable CodeQL queries to examine struct field tags.
- * Added method `InterfaceType.getMethodTypeById`, which enables CodeQL queries to distinguish interfaces with matching non-exported method names that are declared in different packages, and are therefore incompatible.
+ * Added member predicates `StructTag.hasOwnFieldWithTag` and `Field.getTag`, which enable CodeQL queries to examine struct field tags.
+ * Added member predicate `InterfaceType.getMethodTypeById`, which enables CodeQL queries to distinguish interfaces with matching non-exported method names that are declared in different packages, and are therefore incompatible.

@@ -1150,6 +1150,20 @@ var ComponentTypesTable = NewTable("component_types",
EntityColumn(TypeType, "tp"),
).KeySet("parent", "index")

// ComponentTagsTable is the table associating composite types with their component types' tags
var ComponentTagsTable = NewTable("component_tags",
EntityColumn(CompositeType, "parent"),
Contributor

Suggested change
- EntityColumn(CompositeType, "parent"),
+ EntityColumn(StructType, "parent"),

Tags only exist on fields of structs, so I don't see why we should make this table more general than that. (Various names should change as well, of course.)

Member

@mbg mbg left a comment

Broadly looks good! Thank you for improving this and moving it out into its own PR. I just have a few suggestions in addition to @owen-mc's comments, which also make sense.

Also, to sanity check: in the PR description you discuss that part of the motivation here is to be able to distinguish types better. That makes sense and I found the relevant part of the Go specification for this in https://go.dev/ref/spec#Type_identity. For structs:

Two struct types are identical if they have the same sequence of fields, and if corresponding fields have the same names, and identical types, and identical tags. Non-exported field names from different packages are always different.

For interfaces:

Two interface types are identical if they define the same type set.

Looking over the tests here, I can see that the tests exercise the new functionality and that seems to behave as expected. Do the tests cover the new ability to decide (in)equality that you are hoping for? Could you comment on how the tests cover that?
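As a point of reference for the struct rule quoted above, here is a minimal Go sketch using the standard `go/types` package. It is not taken from this PR's tests; the package path `example.com/a` and the helper names `newPoint` and `sameStruct` are made up for illustration. It builds two one-field struct types that differ only in their tag and asks `types.Identical` whether they are the same type.

```go
package main

import (
	"fmt"
	"go/token"
	"go/types"
)

// newPoint builds struct{ X int `<tag>` } as if declared in pkg.
// (Hypothetical helper, for illustration only.)
func newPoint(pkg *types.Package, tag string) *types.Struct {
	field := types.NewField(token.NoPos, pkg, "X", types.Typ[types.Int], false)
	return types.NewStruct([]*types.Var{field}, []string{tag})
}

// sameStruct reports whether two structs that differ at most in their
// field tag are identical according to go/types.
func sameStruct(tagA, tagB string) bool {
	pkg := types.NewPackage("example.com/a", "a") // hypothetical package path
	return types.Identical(newPoint(pkg, tagA), newPoint(pkg, tagB))
}

func main() {
	fmt.Println(sameStruct(`json:"x"`, `json:"x"`)) // true: same tags
	fmt.Println(sameStruct(`json:"x"`, `json:"y"`)) // false: tags differ
}
```

A QL test for the new tables would presumably want the analogous pair: two structs that agree on everything except a tag, expected to come out as distinct types.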

Comment on lines +1566 to +1567
// meth.Id() will be equal to meth.Name() for an exported method, or
// package-qualified otherwise.
Member

Question: My understanding is that, in Go, exported methods always start with an upper-case character. Why did you go for this meth.Id() != meth.Name() check rather than checking the first character of the name? Did that not work or do you think this method is more reliable or has other advantages?
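To make the two checks in this question concrete, here is a small Go sketch against the standard `go/types` and `go/token` packages. It does not reproduce the extractor code under review; the package path and the helpers `methodID` and `agreesWithIsExported` are assumptions made for illustration. It compares `Id() != Name()` with a first-character test via `token.IsExported`.

```go
package main

import (
	"fmt"
	"go/token"
	"go/types"
)

// methodID returns the go/types Id() of a parameterless method called name,
// as if it were declared in a package with import path pkgPath.
// (Hypothetical helper, for illustration only.)
func methodID(pkgPath, name string) string {
	pkg := types.NewPackage(pkgPath, "p")
	sig := types.NewSignatureType(nil, nil, nil, nil, nil, false)
	return types.NewFunc(token.NoPos, pkg, name, sig).Id()
}

// agreesWithIsExported reports whether "Id() differs from Name()" and
// "name is not exported" give the same answer for the given name.
func agreesWithIsExported(name string) bool {
	idDiffers := methodID("example.com/a", name) != name
	return idDiffers == !token.IsExported(name)
}

func main() {
	fmt.Println(methodID("example.com/a", "Exported"))    // Exported
	fmt.Println(methodID("example.com/a", "notExported")) // example.com/a.notExported
	fmt.Println(agreesWithIsExported("Exported"), agreesWithIsExported("notExported"))
}
```

In this sketch the two checks agree for both names, so the choice between them looks like one of convenience: `Id()` also yields the package-qualified string that distinguishes same-named private methods across packages.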

StringColumn("tag"),
).KeySet("parent", "index")

// InterfacePrivateMethodIdsTable is the table associating interface types with their private method ids
Member

If I understand correctly, this table includes entries for methods (by index in the interface) which are private. If so, I don't think the comment makes that very clear. The current comment suggests that every interface type has a private method id. How about:

Suggested change
- // InterfacePrivateMethodIdsTable is the table associating interface types with their private method ids
+ // InterfacePrivateMethodIdsTable is the table associating interface types with the indices of their private methods.

Contributor

I agree it should be clearer. This is slightly more informative:

Suggested change
- // InterfacePrivateMethodIdsTable is the table associating interface types with their private method ids
+ // InterfacePrivateMethodIdsTable is the table associating interface types with the indices and ids of their private methods.

* different packages defines two distinct types, but they appear identical according to
* `getMethodType`.
*/
Type getMethodTypeById(string id) {
Member

This is probably more a complaint about Go's terminology than anything else, but id doesn't seem like the best choice of name for this.

Comment on lines +766 to +768
* For example, `interface { Exported() int; notExported() int }` declared in two
* different packages defines two distinct types, but they appear identical according to
* `getMethodType`.
Member

It may be good to extend the example to include the ids of the methods (with sample packages) to show explicitly what you mean in the previous paragraph.

Member

Should this file be called InterfaceMethodIds.ql?

Contributor

owen-mc commented Sep 3, 2024

Tests failing:

The following files need to be reformatted using gofmt or have compilation errors:
./ql/test/library-tests/semmle/go/Types/pkg2/tst.go
Error: make: *** [Makefile:15: check-formatting] Error 1
./ql/test/library-tests/semmle/go/Types/struct_tags.go
Error: Process completed with exit code 2.

@smowton smowton changed the base branch from rc/3.15 to main September 3, 2024 15:45
Contributor Author

smowton commented Sep 3, 2024

Retargeted this against main because we're currently not expecting to need this for rc/3.15 if we go for a simpler alias-erasing approach in the interim

Contributor

@owen-mc owen-mc left a comment

Good work spotting these problems and fixing them. A few small suggestions for improvement.

Also, shouldn't the label for struct types include the tag of each field? Since differing tags make it a different struct type? Ideally this would have a test as well. This could be done as a follow-up, but it also fits in pretty naturally with this PR.

@@ -1547,6 +1547,7 @@ func extractType(tw *trap.Writer, tp types.Type) trap.Label {
name = ""
}
extractComponentType(tw, lbl, i, name, field.Type())
dbscheme.ComponentTagsTable.Emit(tw, lbl, i, tp.Tag(i))
Contributor

I think it would be slightly better to only emit this line when the tag is non-empty. This would save some space, at the cost of some slightly more complicated QL to access the tags. (Note that the spec says that empty tag is equivalent to no tag at all.) It would also make the upgrade script simpler - no need to make a table and fill it with (i, "") relations.
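A rough Go sketch of this suggestion, under stated assumptions: it does not use the extractor's `trap.Writer` or `dbscheme` tables, and instead collects the would-be rows into a plain map. The helpers `twoFields`, `nonEmptyTags` and `emptyTagEqualsNoTag` and the package path are hypothetical. It also checks, via `go/types`, that an empty tag and a missing tag give identical struct types.

```go
package main

import (
	"fmt"
	"go/token"
	"go/types"
)

// twoFields builds struct{ A int; B string } with the given tags.
// tags may be nil or shorter than the field list.
// (Hypothetical helper, for illustration only.)
func twoFields(tags []string) *types.Struct {
	pkg := types.NewPackage("example.com/a", "a") // hypothetical package path
	fields := []*types.Var{
		types.NewField(token.NoPos, pkg, "A", types.Typ[types.Int], false),
		types.NewField(token.NoPos, pkg, "B", types.Typ[types.String], false),
	}
	return types.NewStruct(fields, tags)
}

// nonEmptyTags stands in for the emit loop: it records a row only for
// fields whose tag is non-empty, keyed by field index.
func nonEmptyTags(st *types.Struct) map[int]string {
	rows := make(map[int]string)
	for i := 0; i < st.NumFields(); i++ {
		if tag := st.Tag(i); tag != "" {
			rows[i] = tag
		}
	}
	return rows
}

// emptyTagEqualsNoTag checks that explicit empty tags and absent tags
// produce identical struct types.
func emptyTagEqualsNoTag() bool {
	return types.Identical(twoFields(nil), twoFields([]string{"", ""}))
}

func main() {
	st := twoFields([]string{`json:"a"`}) // only field A is tagged
	fmt.Println(len(nonEmptyTags(st)))    // 1
	fmt.Println(emptyTagEqualsNoTag())    // true
}
```

On the QL side, this design would mean treating a missing row as the empty tag when reading a field's tag.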
