Skip to content

Go: mass-convert taint-flow models to models-as-data format (with viableParamArgSpecific hook)#12750

Merged
smowton merged 76 commits intogithub:mainfrom
smowton:smowton/admin/add-dataflow-viableParamArgSpecific-hook
Apr 12, 2023
Merged

Go: mass-convert taint-flow models to models-as-data format (with viableParamArgSpecific hook)#12750
smowton merged 76 commits intogithub:mainfrom
smowton:smowton/admin/add-dataflow-viableParamArgSpecific-hook

Conversation

@smowton
Copy link
Contributor

@smowton smowton commented Apr 3, 2023

This is a variation on #12562 that uses a new data-flow library language-specific hook instead of trying to achieve the golang QL lib's prior behaviour by trickery using argument/parameter numbering as that PR does. See #12562 for background and details.

@aschackmull
Copy link
Contributor

filter out receiver argument flow to Golang interface dispatch candidates

I'm confused about why you want to do this. And also wondering why this needs a hook into the dataflow library. I looked briefly at the referenced PR, but didn't find explanation/motivation there either (although I didn't dig deep into that PR and all of its comments).

@smowton
Copy link
Contributor Author

smowton commented Apr 12, 2023

Why: maintain current behaviour, which I didn't write but I guess is there because of flows like

tainted := getInterfaceTypedTaint()
tainted.Method()

...

func (derived *DerivedType) Method() {
  sink(derived.field)
}

So the hypothesis is "if the source ever made a DerivedType then we'd have taint flow", contrast:

tainted := getParameterTypedTaint();
dispatch := getInterfaceTypedThing();
dispatch.Method(tainted)

where at least it isn't the source itself that is hypothesised to create a DerivedType object (it is non-source getInterfaceTypedThing, though).

I agree it's not very principled, and to get full detail we'd have to grill Max on the original decision-making, but my guess is that Go's simple handling of low-confidence dispatch + the absence of data-flow type pruning (not that that would help us in this particular example) meant we saw too many FPs resembling this example.

Why do it via a data-flow lib hook: because this is currently implemented by suppressing argument nodes entirely for interface dispatch receivers, but that no longer works once summaries are implemented by synthetic callables, and therefore we do need an argument node, but that argument node must flow only to the interface summary, not to derived-type synthetic or concrete implementatons. Note this can't be done by eliminating a parameter node either, because the implementations should accept flow from high-confidence dispatch.

@github github deleted a comment Apr 12, 2023
smowton added 11 commits April 12, 2023 14:15
This means that when a function has a real body and a summary (usually because it has a real definition in source, and implements an interface that has a model), two callables are created and dispatch considers both possible paths.

This specifically overcomes the difficulty with ParameterNodes when the real callable, if any, may or may not define an SsaNode, either because the real parameter is unused or because it is anonymous. Now the synthetic callable will always have parameter nodes, while the real one may or may not depending on whether a definition is present and
whether or not it names or uses its parameter.
This is 1x path changes without result changes, and 1x expected change since the Encode function is no longer modelled using TaintTracking::FunctionModel
smowton added 13 commits April 12, 2023 14:19
It appears at present the Go standard library imports the deprecated io/ioutil package internally on some platforms but not others. Therefore I add a test explicitly using it to make the test behave more uniformly.
This reverts commit 12eaedc.

We can't do this now, because there is nothing to guarantee an interface has actually been extracted, and therefore whether a model will get applied. Therefore explicitly modelling methods that may be interface implementations where the interface is in a different package may still make a difference to behaviour.
In some cases multiple return value outputs can be coalesced, and in others we had accidentally conflated two independent flows (e.g. Arg1 -> Arg2 | Arg3 -> Arg4 led to accidentally introducing Arg1 -> Arg4 and Arg3 -> Arg2)
This referred to a private type
These are cheap and frequently-used, and magicking them with respect to `interpretPackage` was yielding expensive, unnecessary regex operations.
This was addressed by adding `getAPackageWithSummarizedCallables`
…interface dispatch candidates.

Other langauges stub the callback.
@smowton smowton added the no-change-note-required This PR does not need a change note label Apr 12, 2023
@smowton smowton force-pushed the smowton/admin/add-dataflow-viableParamArgSpecific-hook branch from 29b75ec to 7eefa43 Compare April 12, 2023 13:34
Copy link
Contributor

@aschackmull aschackmull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data flow hook LGTM.

These are caused by nodes being hidden by github#12783
@smowton smowton merged commit d049b11 into github:main Apr 12, 2023
owen-mc added a commit to owen-mc/codeql that referenced this pull request Apr 18, 2023
This was introduced in github#12750 but
the relevant tests that should have caught it weren't run.
owen-mc added a commit to owen-mc/codeql that referenced this pull request Apr 18, 2023
This was introduced in github#12750 but
the relevant tests that should have caught it weren't run.
owen-mc added a commit to owen-mc/codeql that referenced this pull request Apr 18, 2023
These were introduced in github#12750 but
the relevant tests that should have caught it weren't run.
owen-mc added a commit to owen-mc/codeql that referenced this pull request Apr 19, 2023
These were introduced in github#12750 but
the relevant tests that should have caught it weren't run.
owen-mc added a commit to owen-mc/codeql that referenced this pull request Apr 19, 2023
These were introduced in github#12750 but
the relevant tests that should have caught it weren't run.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants