Skip to content

Conversation

@yoff
Copy link
Contributor

@yoff yoff commented Jun 12, 2020

  • Pull the identical files into codeql/python/ql/src/semmle/code/python/dataflow, note that we do not currently have a directory called code under semmle, but the other users of the shared libraries use this location.

  • Implement enough stubs to have all files compile while keeping the identical files identical.

  • Record the Python components in codeql/config/identical-files.json.

  • Fill in the stubs with references to the PointsTo library

  • Test that we get desired data flows

  • Switch to new CFG implementation based on type tracking

  • Test that we get desired data flows

@yoff yoff added the Python label Jun 12, 2020
@tausbn
Copy link
Contributor

tausbn commented Jun 12, 2020

Just a few quick thoughts (as you've somewhat preempted the issue I'm writing 🙂):

  • This should perhaps go in experimental until we've vetted it.
  • We'll probably want to set up some automatic synchronisation of these libraries with the main copy. this is what Go does.

@aschackmull
Copy link
Contributor

  • We'll probably want to set up some automatic synchronisation of these libraries with the main copy. this is what Go does

Go needs a separate sync mechanism because it's in a separate repo. For python simply using the identical-files.json-mechanism should be fine (that's how C/C++, Java, and C# sync).

@aschackmull
Copy link
Contributor

  • Pull the identical files into codeql/python/ql/src/semmle/code/python/dataflow, note that we do not currently have a directory called code under semmle, but the other users of the shared libraries use this location.

You can use whatever path prefix you want - you don't have to use the code dir if it doesn't match up with what you currently have.

@yoff
Copy link
Contributor Author

yoff commented Jun 12, 2020

  • This should perhaps go in experimental until we've vetted it.

Ah, I have yet to discover experimental, that sounds very appropriate :-)

yoff added 3 commits June 15, 2020 08:01
Perhaps a not-python-specific version of this
could go into the shared implementation.
@yoff
Copy link
Contributor Author

yoff commented Jun 15, 2020

@aschackmull thanks for the meeting. It has resulted in readme.md, feel free to have a quick look and see if I have misunderstood/misremembered anything.

@intrigus-lgtm
Copy link
Contributor

I find all of this very interesting, so if you can, keep the documentation as detailed as it is now :)

@yoff
Copy link
Contributor Author

yoff commented Jun 16, 2020

@intrigus-lgtm Do you mean readme.md? I am not sure I have written much documentation otherwise (although I have stolen heaps from the other language implementations).

need to actually have `OutNode`s
@intrigus-lgtm
Copy link
Contributor

@intrigus-lgtm Do you mean readme.md? I am not sure I have written much documentation otherwise (although I have stolen heaps from the other language implementations).

Yeah, readme.md and just looking at the separate commits that implement data flow step by step is interesting :)

@yoff
Copy link
Contributor Author

yoff commented Jun 17, 2020

Yeah, readme.md and just looking at the separate commits that implement data flow step by step is interesting :)

Ok, I am afraid I have made some messy progress lately :-) I will try to add my more stable findings to the readme so there will be some sort of thread..

@yoff
Copy link
Contributor Author

yoff commented Jun 30, 2020

As part of my attempt at implementing field flow, I did some refactoring in a recent commit to #3830.

I mainly got rid of Node::asCfgNode and Node::asVariable, replacing them with casts to (new) subclasses.

If this is an improvement, I can do the same here..

@RasmusWL
Copy link
Member

RasmusWL commented Jul 1, 2020

I mainly got rid of Node::asCfgNode and Node::asVariable, replacing them with casts to (new) subclasses.

yep, sounds like the right approach!

I haven't had the time to look everything over yet, but this one sounds like a win 👍

@tausbn
Copy link
Contributor

tausbn commented Jul 2, 2020

I agree with RasmusWL's comment above. Splitting it out into separate classes is very reminiscent of how it's done in the JavaScript libraries, and (I think) greatly increases the readability.

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Have some minor suggestions, but otherwise this looks like a good first start 🎉


Some thoughts after reading through the PR:

  • getEnclosingCallable doesn't really fit the way Python programs are executed, since you can execute things in the module scope. I guess the most straightforward way of shoehorning Python into the existing framework would be to define a "fake" callable for the module scope.
  • ReturnKind might be useful for differentiating between the return values of normal functions, generators (using yield), and async function returns.

What happens with named arguments? What does C# do?

In C# using named arguments is always interchangeable with using positional arguments, so the situation is not quite the same. Notice that in Python these are called keyword arguments (as opposed to named arguments in C#). I have been digging into our handling of keyword arguments recently, so would be happy to chat about this 👍

yoff and others added 5 commits July 3, 2020 13:30
…ate.qll

Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
RasmusWL
RasmusWL previously approved these changes Jul 3, 2020
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Looks good to me now 👍 (but let's get approval from Taus as well before merging)

The part about documentation for toString actually applied to multiple member-predicate definitions, I guess I should have called that out 😅

Lastly I have to admit that I didn't look too closely on the tests and expected output, since it seemed quite cumbersome. I realize I forgot to mention this, and if there is anything in particular you want me to look at, let me know :) Would be great if we could get some way to annotate the test code so it's easy to spot in the .expected files if anything is not working as we expect it to.

@yoff
Copy link
Contributor Author

yoff commented Jul 3, 2020

Ah, I thought I saw another toString and got worried that I had not pulled the merged suggestion, but then I could not find it again :-)

The tests should be annotated, yes, and some are perhaps more debug than tests at the moment.

It might be nice to have tests for all the classes of nodes in the interface, then we can quickly see if we change the meaning of those..

@tausbn
Copy link
Contributor

tausbn commented Jul 3, 2020

I am happy to merge the PR as it is.

Regarding toString, I think we need go through all of our documentation at some point anyway, to ensure consistency.

Regarding test output, I also didn't study it in detail, but did a bit of random sampling here and there. I think improving the test setup for this should be a fairly high-priority item.

@tausbn tausbn merged commit 01c4852 into github:master Jul 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants