-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Python: Shared dataflow: Field flow #3830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9604641 to
14a77fa
Compare
Only have one type of callable, but have an extra type of call. A constructor call directs to an init callable (should also handle `call` overrides at some point).
bae12f6 to
73d2d9b
Compare
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a partial review -- I haven't looked at the test files yet.
I figured this was enough to start with, however.
python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
| /** A data flow node which should have an associated post-update node. */ | ||
| abstract class PreUpdateNode extends Node { } | ||
| /** A data flow node for which we should synthesise an associated pre-update node. */ | ||
| abstract class NeedsSyntheticPreUpdateNode extends Node { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not terribly happy about the Needs prefix, but I can't immediately think of a better way of writing this. I'm mostly bringing this up here in case others have ideas for a better name (or think the current name is just fine).
python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/dataflow/internal/DataFlowPublic.qll
Outdated
Show resolved
Hide resolved
Co-authored-by: Taus <tausbn@github.com>
|
It may have been only a partial review, but the code is already much nicer, thanks! |
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more bits and bobs to address. Overall this looks really nice!
I've also looked at the tests a bit, and they seem sensible to me, although there is a lot of noise in there. I don't know how useful it is to have tests recording, say, the local flow in these cases. Perhaps we would be better off using the inline test expectations? At present, it's a lot of jumping back and forth to see what nodes go where.
python/ql/test/experimental/dataflow/coverage/argumentRouting1.ql
Outdated
Show resolved
Hide resolved
| | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | datamodel.py:73:18:73:23 | ControlFlowNode for SOURCE | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | <message> | | ||
| | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | datamodel.py:80:20:80:25 | ControlFlowNode for SOURCE | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | <message> | | ||
| | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | datamodel.py:81:20:81:25 | ControlFlowNode for SOURCE | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | <message> | | ||
| | datamodel.py:159:6:159:17 | ControlFlowNode for Attribute | datamodel.py:13:10:13:17 | ControlFlowNode for Str | datamodel.py:159:6:159:17 | ControlFlowNode for Attribute | <message> | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was puzzled by this line until I realised that the query selects sink.getNode(), source, sink. It looks to me as if the latter sink contains the same information as sink.getNode(), so why include the sink twice? (Also, what's up with the constant string <message>?)
(I realise this is probably to make it yield nicer output for the extension, but I don't feel like this is a good idea for test files. After all, the extension (sadly) doesn't provide an interface for .expected files.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This format is simply taken from the help. I never returned to it to make it fit for our purpose.
It is nice that you can run the query in the extension and inspect the paths in the path viewer. When I am looking at the .expected-files, I just look at the two first columns. I guess "<message>" should at least be changed to "Flow found".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is nice that you can run the query in the extension and inspect the paths in the path viewer.
Right, but this strikes me more as a "debug" query that would be nice to have around for debugging. I think for test output that is primarily meant to be read as a (diffed) text file, it's useful to keep the test output as succinct as possible (without ruining the purpose of the test).
If we had an easy way of accessing the underlying database for a particular test, my opinion would be different. Currently, if you want to stay in VSCode, the best way is to make the test fail, import the resulting database, and then run the same query again. This is tedious and brittle. 🙁
When I am looking at the .expected-files, I just look at the two first columns.
Oh, interesting! I guess you're used to seeing flow backwards (sink, then source), then. 🤔
python/ql/test/experimental/dataflow/fieldflow/dataflowExplore.ql
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/src/experimental/dataflow/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
Co-authored-by: Taus <tausbn@github.com>
I agree, many of these tests have a debug flavour and should probably be removed or relegated to |
The extra hist in `test.py` seen in `globalStep.expected` are due to the removal of manual filtering code. (That code was from when dataflow had many strange things in it.)
|
Good suggestions, thanks. I made the flow-step tests inclusive towards all test files in the directory. I thus postponed the work of cleaning up which test files should actually be there, deciding it out of scope for this PR. I also left the path queries in their format, changing only the message. |
|
I updated Dataflow ready for use #143 to reflect the need to revisit tests. |
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome stuff! I am happy to merge this once the tests pass.
I agree that cleaning up the tests is better done in a different PR. 👍
This is a continuation of the integration of the shared data flow library into the Python library (#3701).
This PR focuses on field flow.