
RFC: Functions, not sessions in 2.0 #20

Merged (15 commits) on Nov 16, 2018

Conversation

asimshankar (Contributor) commented Sep 20, 2018

Review period closes 2018-10-04

Functions, not sessions in 2.0

Status: Proposed
Author(s): ashankar@google.com, joshl@google.com
Sponsor: apassos@google.com
Updated: 2018-09-18

Proposal to make TensorFlow more "Pythonic" in 2.0. In four bullet points (a brief before/after sketch follows the list):

  • Encourage the encapsulation of graph computation as Python functions
    (where the graph is executed when the function is invoked, instead of via Session)
  • Align "state" in the TensorFlow runtime (e.g., resource tensors like those that back tf.Variable objects) with state in the Python program (e.g., Python objects corresponding to the runtime state with lifetimes attached to each other).
  • Make it easy to export these encapsulations to a GraphDef+Checkpoint and/or SavedModel.
  • Enable eager execution by default.
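
A minimal before-and-after sketch of the first two points, assuming the decorator ends up spelled tf.function as several commenters below suggest (the RFC text itself uses defun); the exact API here is illustrative, not final:

import tensorflow as tf

# 1.x style: build a graph, then execute it through a Session.
#   x = tf.placeholder(tf.float32)
#   y = x * x
#   with tf.Session() as sess:
#     print(sess.run(y, feed_dict={x: 3.0}))

# Proposed 2.0 style: the computation is an ordinary Python function; its
# backing graph is built on first call and executed whenever it is invoked.
@tf.function
def square(x):
  return x * x

print(square(tf.constant(3.0)))  # runs the graph directly, no Session needed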

ewilderj added the "RFC: Proposed" (RFC Design Document) and "2.0" (TensorFlow 2.0 development) labels on Sep 20, 2018
ngc92 commented Sep 20, 2018

For Python 3 users, it would be nice if, instead of specifying expected types and shapes as an argument to defun, Python's own type annotations were parsed. So instead of

@tf.defun(input_signature=((tf.float32, [None, 5]),))
def f(x):
  pass

we could write

@tf.defun
def f(x: tf.TensorSpec[tf.float32, (None, 5)]):
  pass

This could not be used in TensorFlow itself (unless there were a build step that, when targeting Python 2, converted the latter into the former), but it would give projects not concerned with backward compatibility a nicer syntax, and give linters the ability to detect some API misuses before the program is run.
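
A hypothetical sketch of how such a decorator could derive the input signature from annotations; the typed_defun helper below is invented for illustration (shown with the tf.TensorSpec and tf.function spellings that eventually shipped) and is not part of the RFC:

import inspect
import tensorflow as tf

def typed_defun(fn):
  # Hypothetical: collect TensorSpec annotations in declaration order and
  # use them as the traced function's input signature.
  params = inspect.signature(fn).parameters.values()
  specs = tuple(p.annotation for p in params
                if isinstance(p.annotation, tf.TensorSpec))
  return tf.function(fn, input_signature=specs)

@typed_defun
def f(x: tf.TensorSpec(shape=[None, 5], dtype=tf.float32)):
  return tf.reduce_sum(x, axis=1)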

carlthome commented:

Definitely prefer tf.function over tf.defun! Why would we de-fun TensorFlow programs? 🙃

omoindrot commented:

It also makes sense to have @tf.function if we have @tf.method (and not @tf.defmethod).

seanpmorgan (Member) commented:

Strongly agree with @tf.function or @tf.func. On first look, tf.contrib.defun read to me like some kind of defunct function. Since the decorator will be so commonly used, it's worth making sure the name is intuitive at first glance and doesn't need to be looked up.

Looking at common Python decorators, I find the clearest ones use either:

  • an understood noun (@classmethod, @property, @app_endpoint())
  • a written-out action that wraps it (@autograph.convert(), @login_required, @atexit.register)

seanpmorgan (Member) commented:

Separately, will we need to handle tf.defun wrapping a function that uses tf.py_func? It seems like, if nothing else, it may need to throw an error, since it won't be serialized in the graph.

Related to tensorflow/tensorflow#10282
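
For reference, a small sketch of the pattern being asked about, written with the names that eventually shipped (tf.function and tf.py_function); whether tracing this should work silently, warn, or raise on export is exactly the open question:

import tensorflow as tf

def add_one_in_python(x):
  # Arbitrary Python: runs outside the graph and cannot be serialized into
  # a GraphDef, which is why exporting such a function is problematic.
  return x.numpy() + 1.0

@tf.function
def f(x):
  return tf.py_function(add_one_in_python, inp=[x], Tout=tf.float32)

print(f(tf.constant(1.0)))  # works in-process; the Python body is not in the graph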

ngc92 commented Sep 21, 2018

How does this interact with device selection? I.e.

@tf.defun
def f(...):
    ...

with tf.device("/gpu:0"):
  f()
with tf.device("/gpu:1"):
  f()

The different calls to f have the same signature here, but to execute them on different devices we need two different graphs.

samjabrahams commented:

At a high level, I really like the idea of pushing towards a more functional/composable API. For me, having to figure out a clean way to reuse components of a graph while passing in different inputs (e.g., dataset inputs during training, then exporting a graph with placeholders for inference) has always been an annoyance. I will likely have more comments (I'm currently traveling and have limited time to type things up), but there are a couple of concerns I want to bring up:

  • Currently, the Session API takes a few runs to "warm up". That is, the first couple of runs will run slower. I suppose this is more of a question about Eager in general, but are there risks of things always running slowly by not using a session? Or will there actually be less overhead due to not needing to traverse a graph to find what operations to run?
  • I'm concerned about this change being a half-measure. I haven't done user research myself, but I think that the difficulty of new-user on-boarding is more than just the fact that the graph is run lazily. It's also that the API is massive and thus prone to make new users' eyes glaze over. Not only that, but there is limited guidance on what idiomatic TensorFlow code should look like. This leads to confusion and a lot of people having to learn the same lessons over and over. I believe that one of the largest opportunities of 2.0 is the ability to remove cruft and imagine what TensorFlow could be if it were built from the ground up. I understand that there are needs in terms of enabling backwards compatibility to prevent alienating the userbase, but right now I think this risks adding bloat in the form of "there's yet another way to run a graph!".
    • I'm not saying that this API is exactly what we want; I need more time to digest it. What I would say is that either the scope should be much larger, or it should be reduced so that it doesn't require as many awkward compromises/workarounds in order to kind-of work with existing semantics.
  • I think there should be more weight given to, and conversation about, diverging the Python vs. non-Python APIs. I know that most of the people using the C++/Go APIs may be considered "power users", and thus better equipped to deal with additional things to learn, but one of the strengths of TensorFlow is that the semantics of your Python/training code are similar to those in deployment. It's hard to argue against "Python code should be Pythonic", but I wonder if there is a way to keep the APIs unified at the same time.

asimshankar (Contributor, Author) commented:

@samjabrahams, thanks for the comments. Some responses:

  • "warm up": There should be no risk of running slowly by not using a session. Any warmup (e.g., creating the kernels etc.) may happen on the first time the function is executed, but there should be no overhead beyond that. As you guessed, there may even be less overhead (the ~10us or so sometimes spent in looking up the exact graph to execute based on feeds and fetches).

  • This change is just one of many towards 2.0. You'll see talk about reducing the API surface in other RFCs, like "RFC: Unify RNN interface" (#15), "RFC: TensorFlow API symbols and namespaces" (#16), "RFC: Sunset tf.contrib" (#18), and others coming. Is there something specific to this proposal that you think can be improved?

  • Point taken about the other language APIs. I'll follow up on that.
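
A rough way to observe the warm-up behavior from the first bullet, again assuming the tf.function spelling; the first call pays for tracing and kernel creation, later calls should not:

import time
import tensorflow as tf

@tf.function
def matmul(a, b):
  return tf.matmul(a, b)

a = tf.random.normal([256, 256])
b = tf.random.normal([256, 256])

for i in range(3):
  start = time.time()
  matmul(a, b)
  print("call %d: %.4fs" % (i, time.time() - start))  # call 0 includes the one-time cost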

asimshankar (Contributor, Author) commented:

@ngc92, re #20 (comment): yes, different graphs will be created for those. This happens with tf.contrib.eager.defun today, as the context of execution is implicitly a part of the input signature. I'll clarify that in the document too.

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Oct 18, 2018
(In the spirit of tensorflow/community#20)

PiperOrigin-RevId: 217627136
goldiegadde self-assigned this on Oct 30, 2018
goldiegadde (Contributor) commented:

@asimshankar, could you please post the notes from the design review meeting into this thread?

Also, are there any updates to this document?

Once both of those are resolved we can proceed to merge this.

asimshankar (Contributor, Author) commented:

@goldiegadde: I updated the document in place based on the meeting notes.

goldiegadde (Contributor) commented:

> @goldiegadde: I updated the document in place based on the meeting notes.

Thanks @asimshankar. Can you push a version of the doc with the Status as "Accepted"? We can then proceed to merge it.

asimshankar (Contributor, Author) commented:

@goldiegadde: Done. Thanks.

goldiegadde (Contributor) commented:

Thanks everyone for reviewing the RFC. I am going to merge this request now.

Labels: RFC: Accepted (RFC Design Document: Accepted by Review), 2.0 (TensorFlow 2.0 development)