C++: Update static call target resolution semantics in dataflow #19500

MathiasVP · 2025-05-15T14:39:26Z

Up until now C/C++ has only had the following two kinds of functions:

Functions represented by source code in the database
Manually created summaries

And with only those two cases it was easy to choose which function to dispatch to in dataflow: If there was a summary always use that one (since it was always manually created and thus it probably captured all the desired behavior).

However, now that we're adding lots of generated models we need to rethink this since generated models are often less precise. So now we have:

Functions represented by source code in the database
Manually created summaries
Autogenerated summaries

This PR changes the semantics to the following:

If there is a manually created summary, always use that one.
If there is no manually created summary, but there is a source definition, use the source definition.
If there is a generated summary and no source definition, use the generated summary.

This PR is likely to be a no-op right now since we don't actually have any generated summaries. However, I've added a testcase to check that we capture the right behavior.

Commit-by-commit review recommended

… when there are model-generated summaries AND source definitions.

Copilot

Pull Request Overview

This PR updates the static call-target resolution in C/C++ dataflow so that manual summaries take priority, then source definitions, then generated summaries. It adds new tests and model entries to validate this order and revises the core DataFlow logic accordingly.

Add new test functions and calls in test.cpp to cover manual vs. generated summaries with and without bodies
Extend steps.ext.yml and flow.ext.yml with manual and generated model entries
Update expected output files to reflect the new dispatch order
Change getStaticCallTarget() in DataFlowPrivate.qll to implement the prioritized selection

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
cpp/ql/test/library-tests/dataflow/external-models/test.cpp	Add declarations and test calls for manual & generated summaries
cpp/ql/test/library-tests/dataflow/external-models/steps.ext.yml	Register new manual and generated summaries for test functions
cpp/ql/test/library-tests/dataflow/external-models/flow.ext.yml	Register new manual and generated flow models
cpp/ql/test/library-tests/dataflow/external-models/*.expected	Update expected sources, sinks, and flows for new resolution order
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll	Implement prioritized dispatch in `getStaticCallTarget()`

Comments suppressed due to low confidence (1)

cpp/ql/test/library-tests/dataflow/external-models/test.cpp:17

[nitpick] Variable names like y, z, y2, y3 can be ambiguous; consider using descriptive names such as manualResult or generatedResult for clarity.

int y = ymlStepManual(x);

Copilot · 2025-05-15T17:00:08Z

cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll

+   * - If there is a manual summary then we always use the manual summary.
+   * - If there is a source callable and we only have generated summaries
+   * we use the source callable.
+   * - If there is no source callable then we use the summary regardless of
+   * whether is it manual or generated.


[nitpick] Consider rephrasing for clarity, e.g. 'If there is a source callable and no manual summary exists, use the source callable.'

Suggested change

* - If there is a manual summary then we always use the manual summary.

* - If there is a source callable and we only have generated summaries

* we use the source callable.

* - If there is no source callable then we use the summary regardless of

* whether is it manual or generated.

* - If there is a manual summary, then we always use the manual summary.

* - If there is a source callable and no manual summary exists, use the source callable.

* - If there is no source callable, then we use the summary regardless of

* whether it is manual or generated.

cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll

jketema

LGTM

MathiasVP added 3 commits May 15, 2025 15:05

C++: Add tests to exercise the upcoming behavior of function dispatch…

c6df950

… when there are model-generated summaries AND source definitions.

C++: Update semantics of picking the static call target in dataflow.

69a1a87

C++: Accept test changes.

e75dcd2

MathiasVP added the no-change-note-required This PR does not need a change note label May 15, 2025

github-actions bot added the C++ label May 15, 2025

MathiasVP marked this pull request as ready for review May 15, 2025 16:57

Copilot AI review requested due to automatic review settings May 15, 2025 16:57

MathiasVP requested a review from a team as a code owner May 15, 2025 16:57

Copilot AI reviewed May 15, 2025

View reviewed changes

C++: Small refactoring.

d31ddad

jketema approved these changes May 16, 2025

View reviewed changes

MathiasVP merged commit cadcb20 into github:main May 16, 2025
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

C++: Update static call target resolution semantics in dataflow #19500

C++: Update static call target resolution semantics in dataflow #19500

Uh oh!

MathiasVP commented May 15, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI May 15, 2025

Uh oh!

Uh oh!

jketema left a comment

Uh oh!

Uh oh!

Uh oh!

C++: Update static call target resolution semantics in dataflow #19500

C++: Update static call target resolution semantics in dataflow #19500

Uh oh!

Conversation

MathiasVP commented May 15, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI May 15, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jketema left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!