Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C#/Java: Content based model generation improvements. #17363

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

michaelnebel
Copy link
Contributor

@michaelnebel michaelnebel commented Sep 3, 2024

If we apply the content based model generator to

  • The Java SDK we get around 80k models.
  • The .NET 8 Runtime we get around 500k models.

Many of these models are undesirable, primarily because of one of the following reasons

  1. Due to the lack of type constraints, we get lots of models as "data" could have originated from lots of different implementations (and conversely for stores).
  2. A model reads from a synthetic field that no other model writes to (and conversely).

With the changes proposed in this PR, applying the content based model generator to

  • The Java SDK we get around 4k models.
  • The .NET 8 Runtime we get around 5k models.

The issues are addressed in the following way:

  1. We make a rough filtering on the APIs: Content based model generation is only considered for an API if the formula below holds for that particular API (the reasoning behind this is also listed as a code comment and the magic numbers 2 and 3 in the formula below are definitely up for debate. If another approach should be taken - e.g. maybe something like filtering APIs based on input and return types then we can discuss this as well). Applying this constraint to the target APIs from the Java SDK removes approximately 2% og the possible target APIs for summary generation, which indicates that this is not an especially invasive constraint.
    • # summaries <= 2 * #number of parameters + 3
  2. We only include a summary for an API that reads or stores into a synthetic field, if there exists a "chain" of APIs with summaries such that data originates from non-synthetic content and targets non-synthetic content. The canonical example for this are get and set methods that uses a private backing field X. In this case we would like to generate the summaries set : Argument[0] -> Argument[this].SyntheticField[X] and get : Argument[this].SyntheticField[X] -> ReturnValue, but we only want to do this in case both methods exists. If only the get method exists, then it retrieves information from a "dead" synthetic field. The synthetic chaining is currently based on access path equality but that restriction could potentially be loosened a bit, but I suspect that this only has limited impact on the generated summaries.

Note that the changes in this PR doesn't change the production version of the model generation and only impacts model generated by the recently introduced --with-contentbased-summaries.
Further changes are still required for this to be production ready, but the changes in this PR are self contained.

@michaelnebel michaelnebel force-pushed the modelgen/fieldbasedimprovements branch 3 times, most recently from b5456a3 to b7d00a4 Compare September 9, 2024 11:11
@michaelnebel michaelnebel force-pushed the modelgen/fieldbasedimprovements branch 4 times, most recently from 4521dfd to 30f3923 Compare September 10, 2024 09:38
@michaelnebel michaelnebel added the no-change-note-required This PR does not need a change note label Sep 10, 2024
@michaelnebel michaelnebel marked this pull request as ready for review September 10, 2024 14:16
@michaelnebel michaelnebel requested review from a team as code owners September 10, 2024 14:16
@michaelnebel
Copy link
Contributor Author

DCA looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant