
Remove DataFlow.jl, use IRTools #30

Closed
MikeInnes opened this issue Sep 16, 2019 · 13 comments

Comments

@MikeInnes
Member

MikeInnes commented Sep 16, 2019

IRTools is generally much more powerful and easy to use than DataFlow.jl, and more importantly it's maintained (DataFlow.jl is effectively deprecated at this point). It should be pretty easy to port over the parts of this package that use it.

One thing IRTools does not do as well just yet is printing of expressions, but that's coming along (FluxML/IRTools.jl#25), and since ONNX.jl doesn't have any control flow yet, the port should be particularly straightforward.

@AriMKatz

AriMKatz commented Nov 24, 2019

I'm thinking about refactoring/forking ONNX.jl for stability and for export of graphs (ultimately trying to interface with TF Lite). How would IRTools compare to https://github.com/phipsgabler/DynamicComputationGraphs.jl as a foundation for this work? I'm looking at ease of use, robustness, and accessibility of the code. The thought of working with IR feels intimidating, and DCG seems like it's a bit closer in abstraction to the destination.

I'd also be curious to hear whether @phipsgabler has any thoughts on this.

@phipsgabler

phipsgabler commented Nov 25, 2019

Well, I don't think DCG is stable enough, yet. And I don't really know in what way you are using IR-transforming generated functions here. But maybe there's an intermediate solution, which I have already been speculating about implementing.

Namely, when you think about the IR transformation code of DCG, it's basically a little, very specialized reimplementation of Cassette's overdub -- just with the important difference that I am also "overdubbing" branches (I even have a simplified context system now).

It would certainly be possible to factor out the Cassette-like overloading part cleanly, and implement DCG on top of it. Maybe the same principle works for ONNX as well, but I don't feel qualified to answer that.

@phipsgabler

But if I look at it more closely, ONNX only emits IR, and does not need to change existing IR, right? In that case, just push!ing to an IR object is certainly the simplest way to go.

DCG's purpose is just to record computation graphs of calls to existing functions, in the form of "extended Wengert lists".
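
For illustration, a hedged sketch of the "just push!ing to an IR object" route, assuming IRTools.jl is installed and following its documented IR-building functions (`IR`, `argument!`, `xcall`, `return!`, `evalir`):

```julia
import IRTools

# Build a tiny IR fragment from scratch, statement by statement.
ir = IRTools.IR()                            # empty piece of IR
x  = ir |> IRTools.argument!                 # give it one argument
y  = push!(ir, IRTools.xcall(:*, x, x))      # emit the statement `x * x`
IRTools.return!(ir, y)                       # and return its value

# The fragment can be evaluated directly, without a surrounding function:
IRTools.evalir(ir, 4)
```

An ONNX emitter could follow the same pattern, pushing one statement per ONNX node.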

@DrChainsaw
Contributor

@AriMKatz
In case you are still pursuing this, I recently had the need to export ONNX models and I whipped this thing together:
https://github.com/DrChainsaw/ONNXmutable.jl

I was also quite intimidated by the output from IRTools and ended up with a non-metaprogramming/non-compiler-hacking solution which instead uses (or abuses) multiple dispatch, and this worked out for the cases I'm interested in.

@phipsgabler
I guess for the export you want to take some arbitrary Julia function and translate it to an ONNX computation graph, and this seems to be somewhat in the scope of DCG.

Uhm, also, sorry for derailing topic....

I'm interested in the original issue, since my package will break the day DataFlow.jl breaks. Is there any plan to do this, and if not, will you accept a PR for it (given that it will probably touch almost the whole codebase)?

@AriMKatz

AriMKatz commented Mar 1, 2020

@DrChainsaw Unfortunately I had to leave the Julia world for the project which spurred my comment, so I won't be getting to this anytime soon... But I'd be excited to see it happen anyway.

@phipsgabler

@DrChainsaw But to get the ONNX graph, you'd just need (as far as I understand) to recurse once over the IR of the function you want to serialize, get all the original IR down to "primitives", and translate it -- a static computation graph. IRTracker (formerly DCG) will give you a new function with new IR that tracks the execution of the original function every time it is called -- a dynamic computation graph.

As an analogy: you can just take a TensorFlow graph and analyze/transform its structure, since it's fixed over multiple executions. In contrast, you wouldn't like to do that in a PyTorch model, where you define a new, possibly different graph every time you execute something (which is similar to what IRTracker does).

It's not that I want to avoid work, I just think that my project does not do the things needed for this use case :) But I can quickly throw together a recursive static tracker some time.

@DrChainsaw
Contributor

@phipsgabler Understood. It was basically just the tracing functionality along with the context hooks (e.g. an OnnxContext which collects recognized ops in a protobuffer) which seemed applicable. I understand that the tracing is not the main portion of what IRTracker does, and this (along with the immaturity warning) is why I chose to roll my own tracing mechanism in ONNXmutable.

FYI, what PyTorch offers in terms of ONNX export for its dynamic part is basically what you describe: run the graph once, and whatever path is taken that time gets recorded. Not much else you can do, I guess, given that ONNX graphs are basically static.

This is the same approach I have taken as well: run the function once with a tracer (AbstractProbe) as input, and whatever functions the tracer recognizes get recorded.

ONNX has some support for control flow (loops and conditionals), and to make those work in my approach one would need to put them in a special function which is recognized by the tracing mechanism (i.e. function branchcondition(x, condition, path1, path2)). It would be awesome if one could build some IR-hacking tool which translates Julia control flow to the ONNX equivalent. I'm very inexperienced with IR hacking, but to me this seems identical to language transpiling, which afaik is a very hard problem.
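
A minimal sketch of that probe-style, multiple-dispatch tracing in plain Julia (the name `TraceProbe` and the recorded op symbols are hypothetical, not ONNXmutable's actual API):

```julia
# A "probe" that records every recognized op it flows through.
struct TraceProbe
    ops::Vector{Symbol}   # recorded op names, in call order
end
TraceProbe() = TraceProbe(Symbol[])

# Overload the ops we recognize; each records itself and returns the probe
# so tracing continues through the rest of the function.
Base.:+(p::TraceProbe, ::Any) = (push!(p.ops, :Add); p)
Base.:*(p::TraceProbe, ::Any) = (push!(p.ops, :Mul); p)

# Any plain Julia function can now be traced by running it once on a probe:
f(x) = x * 3 + 1
p = TraceProbe()
f(p)
p.ops  # [:Mul, :Add]
```

A real exporter would record shapes and attributes into a protobuffer instead of symbols, but the dispatch mechanism is the same.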

@phipsgabler

Well, I must take back my hesitations :D

Turns out I had a complete error in reasoning -- it's the recursion that poses a problem. We of course can't track IR recursively in a static form, since the selection of methods happens only at runtime. I implemented this, which made me realize my mistake. The dynamo records the IR of each method down to predefined primitives. It should work as long as control flow is only used at the same level as primitives -- otherwise, you can track only one path, instead of all possible ones.

I think to really generalize this approach, one would need a concolic execution tool making sure every function is called with enough different arguments that all branches are taken. If the one-path solution is enough, then IRTracker could in fact work for this purpose (and would be mature enough, I guess, at least compared to some ad-hoc IR-hacking code).
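
For reference, the dynamo mentioned above can be quite small. A hedged sketch, assuming IRTools.jl is installed: an identity dynamo that fetches a called method's IR and runs it unchanged (a real tracker would rewrite the IR before returning it; the name `passthrough` is made up here):

```julia
using IRTools: @dynamo, IR

# An identity dynamo: grab the IR of the called method and return it as-is.
# Returning `nothing` (when no IR is available, e.g. for intrinsics) makes
# the call fall back to ordinary dispatch.
@dynamo function passthrough(a...)
    ir = IR(a...)
    ir === nothing && return
    return ir
end

mul(a, b) = a * b
passthrough(mul, 2, 3)   # behaves like mul(2, 3)
```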

@DrChainsaw
Contributor

Thanks a lot, @phipsgabler!

I'm currently pursuing other interests, but I will revisit your example and see if I can refactor my approach, if nothing else then for the sake of learning.

@AriMKatz

AriMKatz commented Apr 29, 2020

@DrChainsaw have you seen: https://github.com/MikeInnes/Mjolnir.jl/issues/1

It seems to have the right level of abstraction and functionality for refactoring this package, and also your computation-graph packages. It uses an abstract tracing approach that can already return a graph of operations. It seems more powerful and general, yet still more user-friendly, than any other approach, as it's based on IRTools but higher level.

I plan to play around with it some more in the coming days.

@DrChainsaw
Contributor

@AriMKatz
Thanks for sharing! That tracing example indeed makes it look simple. I still haven't found the time/energy to experiment with @phipsgabler's example above, so now there are two things to try out when the current approach has run its course.

It would also be awesome to clear out all those vertices from the NaiveXX packages and just work with generic Julia code (although I'm not sure the level of usefulness of those packages motivates the effort).

Btw, if you are still looking into ONNX export, then feel free to take whatever you need from ONNXmutable. That package should anyway have the exporter parts split out into a separate package, as they are not dependent on NaiveNASflux.

@MikeInnes
Member Author

> one would need a concolic execution tool making sure every function is called with enough different arguments so that all branches are taken

This seems like a description of an abstract interpreter; i.e. the tool that both Julia's type inference and Mjolnir use to handle both control flow and recursion.

@ToucheSir
Member

Closing this now that we're using https://github.com/dfdx/Ghost.jl. CC @dfdx in case there are any useful insights in this thread.
