Skip to content

Conversation

@yoff
Copy link
Contributor

@yoff yoff commented Jun 29, 2020

This is a continuation of the integration of the shared data flow library into the Python library (#3701).
This PR focuses on field flow.

@tausbn tausbn added the Python label Jul 3, 2020
@RasmusWL RasmusWL changed the title Shared dataflow: Field flow Python: Shared dataflow: Field flow Jul 6, 2020
@adityasharad adityasharad changed the base branch from master to main August 14, 2020 18:34
@yoff yoff force-pushed the SharedDataflow_FieldFlow branch 6 times, most recently from 9604641 to 14a77fa Compare September 17, 2020 15:18
@yoff yoff marked this pull request as ready for review September 18, 2020 20:11
@yoff yoff requested a review from a team as a code owner September 18, 2020 20:11
@yoff yoff force-pushed the SharedDataflow_FieldFlow branch from bae12f6 to 73d2d9b Compare September 21, 2020 15:32
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a partial review -- I haven't looked at the test files yet.
I figured this was enough to start with, however.

/** A data flow node which should have an associated post-update node. */
abstract class PreUpdateNode extends Node { }
/** A data flow node for which we should synthesise an associated pre-update node. */
abstract class NeedsSyntheticPreUpdateNode extends Node {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not terribly happy about the Needs prefix, but I can't immediately think of a better way of writing this. I'm mostly bringing this up here in case others have ideas for a better name (or think the current name is just fine).

yoff and others added 2 commits September 22, 2020 22:33
@yoff
Copy link
Contributor Author

yoff commented Sep 22, 2020

It may have been only a partial review, but the code is already much nicer, thanks!

@yoff yoff requested a review from tausbn September 23, 2020 07:39
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more bits and bobs to address. Overall this looks really nice!
I've also looked at the tests a bit, and they seem sensible to me, although there is a lot of noise in there. I don't know how useful it is to have tests recording, say, the local flow in these cases. Perhaps we would be better off using the inline test expectations? At present, it's a lot of jumping back and forth to see what nodes go where.

| datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | datamodel.py:73:18:73:23 | ControlFlowNode for SOURCE | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | <message> |
| datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | datamodel.py:80:20:80:25 | ControlFlowNode for SOURCE | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | <message> |
| datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | datamodel.py:81:20:81:25 | ControlFlowNode for SOURCE | datamodel.py:81:6:81:26 | ControlFlowNode for Attribute() | <message> |
| datamodel.py:159:6:159:17 | ControlFlowNode for Attribute | datamodel.py:13:10:13:17 | ControlFlowNode for Str | datamodel.py:159:6:159:17 | ControlFlowNode for Attribute | <message> |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was puzzled by this line until I realised that the query selects sink.getNode(), source, sink. It looks to me as if the latter sink contains the same information as sink.getNode(), so why include the sink twice? (Also, what's up with the constant string <message>?)

(I realise this is probably to make it yield nicer output for the extension, but I don't feel like this is a good idea for test files. After all, the extension (sadly) doesn't provide an interface for .expected files.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This format is simply taken from the help. I never returned to it to make it fit for our purpose.

It is nice that you can run the query in the extension and inspect the paths in the path viewer. When I am looking at the .expected-files, I just look at the two first columns. I guess "<message>" should at least be changed to "Flow found".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is nice that you can run the query in the extension and inspect the paths in the path viewer.

Right, but this strikes me more as a "debug" query that would be nice to have around for debugging. I think for test output that is primarily meant to be read as a (diffed) text file, it's useful to keep the test output as succinct as possible (without ruining the purpose of the test).

If we had an easy way of accessing the underlying database for a particular test, my opinion would be different. Currently, if you want to stay in VSCode, the best way is to make the test fail, import the resulting database, and then run the same query again. This is tedious and brittle. 🙁

When I am looking at the .expected-files, I just look at the two first columns.

Oh, interesting! I guess you're used to seeing flow backwards (sink, then source), then. 🤔

Co-authored-by: Taus <tausbn@github.com>
@yoff
Copy link
Contributor Author

yoff commented Sep 25, 2020

there is a lot of noise in there. I don't know how useful it is to have tests recording, say, the local flow in these cases. Perhaps we would be better off using the inline test expectations? At present, it's a lot of jumping back and forth to see what nodes go where.

I agree, many of these tests have a debug flavour and should probably be removed or relegated to meta/debug. I do like the idea that changes to the underpinning predicates will be flagged up, even if the effect is not visible or obvious at the level of flow paths. But the current output is not conveniently readable, and neither are diffs unless they are small

The extra hist in `test.py` seen in `globalStep.expected`
are due to the removal of manual filtering code.
(That code was from when dataflow had many strange things in it.)
@yoff
Copy link
Contributor Author

yoff commented Sep 25, 2020

Good suggestions, thanks. I made the flow-step tests inclusive towards all test files in the directory. I thus postponed the work of cleaning up which test files should actually be there, deciding it out of scope for this PR. I also left the path queries in their format, changing only the message.

@yoff yoff requested a review from tausbn September 25, 2020 11:39
@yoff
Copy link
Contributor Author

yoff commented Sep 25, 2020

I updated Dataflow ready for use #143 to reflect the need to revisit tests.

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff! I am happy to merge this once the tests pass.

I agree that cleaning up the tests is better done in a different PR. 👍

@tausbn tausbn merged commit fc84286 into github:main Sep 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants