Skip to content

C++: Update static call target resolution semantics in dataflow #19500

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

MathiasVP
Copy link
Contributor

Up until now C/C++ has only had the following two kinds of functions:

  • Functions represented by source code in the database
  • Manually created summaries

And with only those two cases it was easy to choose which function to dispatch to in dataflow: If there was a summary always use that one (since it was always manually created and thus it probably captured all the desired behavior).

However, now that we're adding lots of generated models we need to rethink this since generated models are often less precise. So now we have:

  • Functions represented by source code in the database
  • Manually created summaries
  • Autogenerated summaries

This PR changes the semantics to the following:

  • If there is a manually created summary, always use that one.
  • If there is no manually created summary, but there is a source definition, use the source definition.
  • If there is a generated summary and no source definition, use the generated summary.

This PR is likely to be a no-op right now since we don't actually have any generated summaries. However, I've added a testcase to check that we capture the right behavior.

Commit-by-commit review recommended

@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label May 15, 2025
@github-actions github-actions bot added the C++ label May 15, 2025
@MathiasVP MathiasVP marked this pull request as ready for review May 15, 2025 16:57
@Copilot Copilot AI review requested due to automatic review settings May 15, 2025 16:57
@MathiasVP MathiasVP requested a review from a team as a code owner May 15, 2025 16:57
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the static call-target resolution in C/C++ dataflow so that manual summaries take priority, then source definitions, then generated summaries. It adds new tests and model entries to validate this order and revises the core DataFlow logic accordingly.

  • Add new test functions and calls in test.cpp to cover manual vs. generated summaries with and without bodies
  • Extend steps.ext.yml and flow.ext.yml with manual and generated model entries
  • Update expected output files to reflect the new dispatch order
  • Change getStaticCallTarget() in DataFlowPrivate.qll to implement the prioritized selection

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
cpp/ql/test/library-tests/dataflow/external-models/test.cpp Add declarations and test calls for manual & generated summaries
cpp/ql/test/library-tests/dataflow/external-models/steps.ext.yml Register new manual and generated summaries for test functions
cpp/ql/test/library-tests/dataflow/external-models/flow.ext.yml Register new manual and generated flow models
cpp/ql/test/library-tests/dataflow/external-models/*.expected Update expected sources, sinks, and flows for new resolution order
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll Implement prioritized dispatch in getStaticCallTarget()
Comments suppressed due to low confidence (1)

cpp/ql/test/library-tests/dataflow/external-models/test.cpp:17

  • [nitpick] Variable names like y, z, y2, y3 can be ambiguous; consider using descriptive names such as manualResult or generatedResult for clarity.
int y = ymlStepManual(x);

Comment on lines +1169 to +1173
* - If there is a manual summary then we always use the manual summary.
* - If there is a source callable and we only have generated summaries
* we use the source callable.
* - If there is no source callable then we use the summary regardless of
* whether is it manual or generated.
Copy link
Preview

Copilot AI May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider rephrasing for clarity, e.g. 'If there is a source callable and no manual summary exists, use the source callable.'

Suggested change
* - If there is a manual summary then we always use the manual summary.
* - If there is a source callable and we only have generated summaries
* we use the source callable.
* - If there is no source callable then we use the summary regardless of
* whether is it manual or generated.
* - If there is a manual summary, then we always use the manual summary.
* - If there is a source callable and no manual summary exists, use the source callable.
* - If there is no source callable, then we use the summary regardless of
* whether it is manual or generated.

Copilot uses AI. Check for mistakes.

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MathiasVP MathiasVP merged commit cadcb20 into github:main May 16, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants