Python: Shared dataflow: Field flow #3830

yoff · 2020-06-29T05:41:19Z

This is a continuation of the integration of the shared data flow library into the Python library (#3701).
This PR focuses on field flow.

Only have one type of callable, but have an extra type of call. A constructor call directs to an init callable (should also handle `call` overrides at some point).
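As a rough illustration of the situation described here, the following minimal Python sketch shows a constructor call whose argument is stored in a field by `__init__` and later read back. The names `MyObj`, `SINK`, `observed`, and the value bound to `SOURCE` are hypothetical and chosen only for illustration; they are not taken from the PR's test files.

```python
# Hypothetical example program; not taken from the PR's test suite.
SOURCE = "source"  # stand-in for a value we want to track

observed = []  # records everything that reaches the sink


def SINK(value):
    # Stand-in for a sink; it only records what it receives.
    observed.append(value)


class MyObj:
    def __init__(self, foo):
        # Field store: the argument is written to the attribute `foo`.
        self.foo = foo


# Constructor call: evaluating MyObj(...) runs MyObj.__init__.
obj = MyObj(SOURCE)

# Field read: the stored value is read back and passed to the sink.
SINK(obj.foo)
```

Following the value from `SOURCE` to `SINK` in a program like this needs three ingredients: the call `MyObj(SOURCE)` has to be resolved to `__init__`, the assignment to `self.foo` has to count as a store, and `obj.foo` has to count as a read of the same field.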


tausbn

This is just a partial review -- I haven't looked at the test files yet.
I figured this was enough to start with, however.

python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll

tausbn · 2020-09-22T13:54:42Z

python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll

-/** A data flow node which should have an associated post-update node. */
-abstract class PreUpdateNode extends Node { }
+/** A data flow node for which we should synthesise an associated pre-update node. */
+abstract class NeedsSyntheticPreUpdateNode extends Node {


I'm not terribly happy about the `Needs` prefix, but I can't immediately think of a better way of writing this. I'm mostly bringing this up here in case others have ideas for a better name (or think the current name is just fine).

python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll

python/ql/src/experimental/dataflow/internal/DataFlowPublic.qll

Co-authored-by: Taus <tausbn@github.com>

yoff · 2020-09-22T21:50:16Z

It may have been only a partial review, but the code is already much nicer, thanks!

tausbn

A few more bits and bobs to address. Overall this looks really nice!
I've also looked at the tests a bit, and they seem sensible to me, although there is a lot of noise in there. I don't know how useful it is to have tests recording, say, the local flow in these cases. Perhaps we would be better off using the inline test expectations? At present, it's a lot of jumping back and forth to see what nodes go where.

python/ql/test/experimental/dataflow/coverage/argumentRouting1.ql

tausbn · 2020-09-22T14:52:31Z

python/ql/test/experimental/dataflow/coverage/dataflow.expected

 | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | datamodel.py:73:18:73:23 | ControlFlowNode for SOURCE | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | <message> |
 | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | datamodel.py:80:20:80:25 | ControlFlowNode for SOURCE | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | <message> |
 | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | datamodel.py:81:20:81:25 | ControlFlowNode for SOURCE | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | <message> |
+| datamodel.py:159:6:159:17 | ControlFlowNode for Attribute | datamodel.py:13:10:13:17 | ControlFlowNode for Str | datamodel.py:159:6:159:17 | ControlFlowNode for Attribute | <message> |


I was puzzled by this line until I realised that the query selects `sink.getNode(), source, sink`. It looks to me as if the latter `sink` contains the same information as `sink.getNode()`, so why include the sink twice? (Also, what's up with the constant string `<message>`?)

(I realise this is probably to make it yield nicer output for the extension, but I don't feel like this is a good idea for test files. After all, the extension (sadly) doesn't provide an interface for .expected files.)
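To make the redundancy concrete, here is a small Python sketch (not part of the PR) that splits one row of the `.expected` output into cells. The helper name `split_expected_row` is hypothetical, the row is copied from the diff above, and the grouping of two cells (location, then label) per selected value is an assumption based on how that row reads.

```python
def split_expected_row(row):
    """Split one '|'-delimited row of test output into stripped cells."""
    # Drop a leading diff marker ('+') and padding, then split on pipes.
    cells = [cell.strip() for cell in row.lstrip("+ ").split("|")]
    # The outer pipes produce empty cells at both ends; discard them.
    # (This would also discard genuinely empty cells; fine for this row.)
    return [cell for cell in cells if cell]


row = (
    "+| datamodel.py:159:6:159:17 | ControlFlowNode for Attribute "
    "| datamodel.py:13:10:13:17 | ControlFlowNode for Str "
    "| datamodel.py:159:6:159:17 | ControlFlowNode for Attribute "
    "| <message> |"
)

columns = split_expected_row(row)

# Assumed grouping: each selected value occupies a (location, label) pair.
sink_node = columns[0:2]  # sink.getNode()
source = columns[2:4]     # source
sink = columns[4:6]       # sink
message = columns[6]      # constant message string

print(sink_node == sink)
```

Under that assumption, the first and third selected values carry identical text in this row, which is the duplication the comment points out.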

This format is simply taken from the help. I never returned to it to make it fit for our purpose.

It is nice that you can run the query in the extension and inspect the paths in the path viewer. When I am looking at the `.expected` files, I just look at the first two columns. I guess "<message>" should at least be changed to "Flow found".

> It is nice that you can run the query in the extension and inspect the paths in the path viewer.

Right, but this strikes me more as a "debug" query that would be nice to have around for debugging. I think for test output that is primarily meant to be read as a (diffed) text file, it's useful to keep the test output as succinct as possible (without ruining the purpose of the test).

If we had an easy way of accessing the underlying database for a particular test, my opinion would be different. Currently, if you want to stay in VSCode, the best way is to make the test fail, import the resulting database, and then run the same query again. This is tedious and brittle. 🙁

> When I am looking at the `.expected` files, I just look at the first two columns.

Oh, interesting! I guess you're used to seeing flow backwards (sink, then source), then. 🤔

python/ql/test/experimental/dataflow/fieldflow/allLocalFlow.ql

python/ql/test/experimental/dataflow/fieldflow/dataflowExplore.ql

python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll

python/ql/src/experimental/dataflow/internal/DataFlowPublic.qll

python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll

Co-authored-by: Taus <tausbn@github.com>

yoff · 2020-09-25T11:02:43Z

> there is a lot of noise in there. I don't know how useful it is to have tests recording, say, the local flow in these cases. Perhaps we would be better off using the inline test expectations? At present, it's a lot of jumping back and forth to see what nodes go where.

I agree, many of these tests have a debug flavour and should probably be removed or relegated to meta/debug. I do like the idea that changes to the underpinning predicates will be flagged up, even if the effect is not visible or obvious at the level of flow paths. But the current output is not conveniently readable, and neither are diffs unless they are small.

The extra hits in `test.py` seen in `globalStep.expected` are due to the removal of manual filtering code. (That code was from when dataflow had many strange things in it.)

yoff · 2020-09-25T11:39:46Z

Good suggestions, thanks. I made the flow-step tests include all test files in the directory. That postpones the work of deciding which test files should actually be there, which I consider out of scope for this PR. I also left the path queries in their current format, changing only the message.

yoff · 2020-09-25T11:48:17Z

I updated "Dataflow ready for use" (#143) to reflect the need to revisit tests.

tausbn

Awesome stuff! I am happy to merge this once the tests pass.

I agree that cleaning up the tests is better done in a different PR. 👍

yoff mentioned this pull request Jun 30, 2020

Python: Start using the shared data flow libraries #3701

Merged

7 tasks

tausbn added the Python label Jul 3, 2020

RasmusWL changed the title ~~Shared dataflow: Field flow~~ Python: Shared dataflow: Field flow Jul 6, 2020

adityasharad changed the base branch from master to main August 14, 2020 18:34

yoff force-pushed the SharedDataflow_FieldFlow branch 6 times, most recently from 9604641 to 14a77fa September 17, 2020 15:18

yoff marked this pull request as ready for review September 18, 2020 20:11

yoff requested a review from a team as a code owner September 18, 2020 20:11

yoff added 8 commits September 19, 2020 22:27

Python: Tests for field flow

a2d006f

Python: Implement field-stores, -reads, and -content

27b2556

Python: Add malloc nodes

aa28167

Python: Add explorative test

e50b665

Python: Add missing .expected file

e132361

Python: update test expectations

b2f1c43

Python: class callable -> class call

9aa0cfb

Only have one type of callable, but have an extra type of call. A constructor call directs to an init callable (should also handle `call` overrides at some point).

Python: Make constructor calls post-update nodes

73d2d9b

yoff force-pushed the SharedDataflow_FieldFlow branch from bae12f6 to 73d2d9b September 21, 2020 15:32

yoff added 4 commits September 21, 2020 17:44

Python: Update test annotation

08b51e6

Merge branch 'main' of github.com:github/codeql into SharedDataflow_F…

3e2331c

…ieldFlow

Python: Fixup comments after merge

b065d87

Python: Fix compilation error

131cf8d

tausbn requested changes Sep 22, 2020

yoff and others added 2 commits September 22, 2020 22:33

Apply suggestions from code review

aece0ff

Co-authored-by: Taus <tausbn@github.com>

Python: Address review comments

ef4461c

yoff requested a review from tausbn September 23, 2020 07:39

tausbn requested changes Sep 24, 2020

Apply suggestions from code review

c56ff98

Co-authored-by: Taus <tausbn@github.com>

yoff added 2 commits September 25, 2020 13:35

Python: Modify tests based on review

88bba46

The extra hits in `test.py` seen in `globalStep.expected` are due to the removal of manual filtering code. (That code was from when dataflow had many strange things in it.)

Python: fix QL format

4621e6d

yoff requested a review from tausbn September 25, 2020 11:39

tausbn approved these changes Sep 25, 2020

tausbn merged commit fc84286 into github:main Sep 25, 2020
