RFC: Sunset tf.contrib #18

Merged
merged 5 commits into from
Dec 17, 2018
Merged

RFC: Sunset tf.contrib #18

merged 5 commits into from
Dec 17, 2018

Conversation

@martinwicke (Member) commented Sep 7, 2018

The feedback phase will be open for four weeks until 2018-10-01.

Sunsetting tf.contrib

| Status | Proposed |
| Author(s) | Martin Wicke (wicke@tensorflow.org) |
| Sponsor | Edd Wilder-James (ewj@tensorflow.org) |
| Updated | 2018-09-07 |

Summary

The tf.contrib module plays several important roles in the TensorFlow
ecosystem: It has made it easy for members of the community to contribute to
TensorFlow, and have their contributions tested and maintained. It is also used
as a staging ground to test early-stage and experimental features in TensorFlow.

However, as the community has grown, the lack of scalability of the current
approach for maintaining and supporting tf.contrib has become apparent.

This RFC is a proposal to sunset the present tf.contrib, and replace its
important functions with more maintainable alternatives. Note that it also
affects some non-contrib code which is not part of the tensorflow module.

@martinwicke added the RFC: Proposed (RFC Design Document) and 2.0 (TensorFlow 2.0 development) labels on Sep 7, 2018
@seanpmorgan (Member)

The removal of tech debt is palpable 😄
As far as new names for tensorflow/contrib, some ideas:

  • tensorflow/extensions
  • tensorflow/supplemental
  • tensorflow/complementary

| hvx | satok16 | delete (redundant with NNAPI) |
| image | | partial move to tensorflow/contrib? |
| input_pipeline | rohan100jain | delete |
| integrate | shoyer *mcoram | delete? |

Let's also put tf.contrib.integrate down for moving to tensorflow/scientific.

@martinwicke (Member, Author):

Oops, my bad. Yes.

@bhack (Contributor) commented Sep 7, 2018

Can we maintain something like a PyPI classifier for every module in contrib, plus ownership metadata like in Helm/Charts, to quickly expose the maturity level and flag maintainers who have gone MIA?

@martinwicke (Member, Author)

Well, given that contrib is going away, I'm not sure we want to do this for contrib. But we are thinking about something like this for different projects in the tensorflow org, @ewilderj FYI.

@bhack (Contributor) commented Sep 7, 2018

Yeah, I meant for the contrib projects moving into the separate repository (i.e., the ones not moving into core or being deleted).

@martinwicke (Member, Author) commented Sep 8, 2018 via email

| timeseries | gsundeep karmel | move to tensorflow/estimator |
| tpu | saeta | move to core |
| training | ebrevdo sguada joel-shor | |
| util | | delete (no owner), or move to tools |


There's one function in this project that has proved particularly useful for static optimizations in libraries built on top of TensorFlow, and as far as I know there is no alternative: tf.contrib.util.constant_value.

Considering the future of this module is still undecided, would it be possible to ensure that at least this particular function, or an alternative to it, is available in TensorFlow 2.0?

@martinwicke (Member, Author):

I am not sure how interesting this function still is with eager execution. @mrry FYI.


I'm asking precisely not for eager execution, but for graph mode. In our use cases eager execution is simply not useful at all. We need a graph we can export as a file and run on other platforms, for example game engines running in real time on Windows using the C or C++ APIs. Reimplementing the operations there, even if we had the weights, is not an option.

@martinwicke (Member, Author):

Eager execution in 2.0 still allows you to have graphs (though we'd prefer you think of them as functions), and those can still be exported, executed remotely, etc. constant_value is not particularly useful with eager execution since you can simply look at any tensor (its value will be there), or run any function (which will produce a value).

Or is the capability you need the implicit estimate of whether the value can be computed cheaply?

@asimshankar


Glad to read about graphs in 2.0. If you happen to have any links where I can learn more about this new design, I'll be very happy to take a look so I can make my comments more useful and targeted in the future.

Answering your question in TF 1.x terms, we need the ability to quickly and statically access the value of constant tensors and statically known shapes without involving a session or a tensor evaluation. I'm not fully sure how this would translate to TF 2.0, but we need something faster than tf.Tensor.eval().
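For concreteness, here is a minimal TF 1.x graph-mode sketch of the kind of static access being asked for (an editorial illustration, not from the thread; it uses the existing tf.contrib.util.constant_value, which returns a numpy value when one can be inferred statically and None otherwise):

```python
import tensorflow as tf  # TF 1.x, graph mode

a = tf.constant([1.0, 2.0, 3.0])
b = tf.placeholder(tf.float32, shape=[None, 3])

# Statically known value: returns a numpy array, no Session or eval() needed.
print(tf.contrib.util.constant_value(a))  # array([1., 2., 3.], dtype=float32)

# Value only known at runtime: returns None instead of trying to evaluate.
print(tf.contrib.util.constant_value(b))  # None

# Statically known shape information is likewise available without evaluation.
print(b.get_shape().as_list())  # [None, 3]
```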

@asimshankar (Contributor):

I'm hoping to publish a proposal for eager and graphs in TF 2.0 soon (I'm working on the text, PR hopefully this week). So keep an eye out for that.


@asimshankar
Thanks, I'll definitely keep an eye.

@martinwicke
I think we're on the same page here, but let me confirm.

"I believe for Tensors which you know are constants, the value information
is simply always available, same for shapes, so I do think this function is
redundant."

Is that the case too if I don't know if a tensor is constant? For example, if I'm writing some function and I get a tensor, can I simply try to "statically" get its value? (which I expect would work if it's constant or somehow statically evaluable, but fail otherwise)

From what you say it looks like I probably can, and if that's the case then this function would indeed become redundant, but just making sure.


@martinwicke
Just so as not to leave this pending: will it be possible to check whether a given tensor is constant in TF 2.x? That would also help solve the problem. In 1.x you could check the type of the op the tensor belongs to, but I don't know whether that will be possible in TF 2.x.
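For reference, the 1.x-style check described here is just a type test on the producing op (a sketch, not from the thread):

```python
import tensorflow as tf  # TF 1.x, graph mode

t = tf.constant(42)
u = tf.placeholder(tf.int32, shape=[])

# Every graph tensor records the op that produced it, so "is this a constant?"
# reduces to comparing the op type.
print(t.op.type)  # "Const"
print(u.op.type)  # "Placeholder"
print(t.op.type == "Const")  # True
```

In TF 2.x, tf.get_static_value ended up playing essentially the same role as tf.contrib.util.constant_value: it returns the statically inferred value when possible and None otherwise, which covers the "try statically, fail gracefully" pattern asked about above.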

(Contributor):

(FYI, re #18 (comment) see #20)

@AndreasMadsen commented Sep 8, 2018

Hi, author of sparsemax here. While I understand some contrib modules are hard to maintain, sparsemax shouldn't be, as it is implemented in Python. There was a C++/CUDA implementation before, but I was asked to replace it for the sake of maintainability. I would therefore hope sparsemax will make it into tensorflow/tensorflow.

It was actually the TensorFlow team who originally asked me to implement it. It also appears to be used quite a bit; see for example the past interest in making it NaN- and infinity-safe, tensorflow/tensorflow#15564 (PR: tensorflow/tensorflow#21183).

@perfinion (Member)

Very much support this! In the dependency-unbundling work I've been doing, there are quite a few deps that are only used by some random tf.contrib module but nowhere else in the entire build, so it's been a lot of work to deal with for not much gain.

A tensorflow-contrib package could even be published to PyPI eventually with whatever is left after the deletions, so for people who do want it, pip install tensorflow-contrib would make import tf.contrib work the same as before. Are the version numbers of the new repos going to match the main TF release, or are they going to be independent?

Another thing I'd ask: when doing this migration, please preserve the git history as much as possible. That makes things much easier to follow in the new repos. Something like cloning the entire TF repo, deleting everything that isn't moving, committing, and then pushing to the new GitHub repo.

@facaiy (Member) commented Sep 10, 2018

I'm a little worried that the new tensorflow/contrib repo might later run into the same maintenance challenges that TensorFlow core faces today.

@bhack (Contributor) commented Sep 10, 2018

@facaiy That's why I proposed tracking/updating the maturity and owners of each sub-API. But it seems that will be defined in the self-organization process of the new SIG.

@seanpmorgan (Member)

@perfinion I second the idea of preserving git history, but I'm torn on keeping the same module name. There will likely be no compatibility guarantee between the new "contrib" repo and the still-living 1.x module, so I worry it'll create confusion.

As far as versioning goes, I think it would be nice to match the 2.x versions, but it may be unrealistic for the SIG to go through a release cycle at the same cadence as the TF team.

@dmitrievanthony commented Sep 10, 2018

Hi, let me remind you that I currently have two open pull requests adding two new modules to contrib:

  • Apache Ignite Dataset (#22210) which introduces new data source.
  • Apache Ignite File System (#22194) which introduces new file system.

I really hope they will be accepted and merged soon. At the same time, if we are talking about 2.0, I think it makes sense to consider these modules as well. So, @martinwicke, please don't forget about them.

@facaiy (Member) commented Sep 10, 2018

> @bhack: That's why I proposed tracking/updating the maturity and owners of each sub-API. But it seems that will be defined in the self-organization process of the new SIG.

Sounds great to me.

Does it make sense to create a new organization, say tensorflow-thirdparty, and let the SIG manage how to migrate those modules into it: move them all into one big repository, or split them into separate small repositories? (I like the second way, though it might be an exhausting task.)

I mean this for all of them: tensorflow/contrib, tensorflow/scientific, and tensorflow/io. All SIGs would share the new organization and namespace. Is it worthwhile?

@bhack (Contributor) commented Sep 12, 2018

Are you moving tf.slim into the models repository just for backward compatibility of existing models, effectively deprecating it, or do you plan to release new models with this API? Having models that rely on the core repository API will help enforce the quality of the API and give it implicit testing. For example, the high-level API was often not in good shape partly because many reference models exercised slim more heavily.

@yongtang (Member)

@dmitrievanthony I think Apache Ignite would be a good fit for tensorflow/io.

There are two pending PRs, tensorflow/tensorflow#18224 (Avro) and tensorflow/tensorflow#19461 (Parquet), that may also fit tensorflow/io. I think the Avro support may take additional time. The Parquet support is awaiting the upstream Bazel http_archive issue (bazelbuild/bazel#5932), which should be resolved in Bazel 0.17.1 (bazelbuild/bazel#5059).

Bazel 0.17.1 is planned to be released tomorrow (bazelbuild/bazel#5059 (comment)).

/cc @fraudies @galv

yongtang added a commit to yongtang/tensorflow that referenced this pull request Sep 13, 2018
Add myself so that issues or PRs could be assigned to me.

Note contrib/{kafka,kinesis} might be moved:
tensorflow/community#18

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
| image | | partial move to tensorflow/contrib? |
| input_pipeline | rohan100jain | delete |
| integrate | shoyer *mcoram | move to tensorflow/scientific? |
| kafka | yongtang (mrry) | move to tensorflow/io? |
@dmitrievanthony commented Sep 17, 2018

Let's add ignite (#22210) and igfs (#22194) here, under tensorflow/io?

@seanpmorgan (Member)

Do we foresee a path to core from "contrib"?

Will there be any coordinated additions to core and subsequent removals from "contrib" or is it best to have the repos live independently? I worry that having the same functionality in two places will cause problems (though as it exists today this still happens).

@kingspp commented Sep 19, 2018

@martinwicke This might help in sunsetting contrib package - https://tf-contrib-analyzer.herokuapp.com

@martinwicke (Member, Author) commented Sep 19, 2018 via email

| nccl | (tobyboyd) | move essential parts to core |
| nearest_neighbor | | delete |
| nn | | partial move to tensorflow/contrib? |
| opt | *joshburkart apassos | move to tensorflow/contrib? |
(Member):

@martinwicke Hi, I'm interested in the opt module. My colleague and I contributed code for the AdaMax optimizer, the elastic average optimizer, and agn_optimizer. I would love to help if needed.

| lookup | ysuematsu (ebrevdo) | move to core |
| losses | | partial move to tensorflow/contrib |
| makefile | petewarden | delete (RPI build now uses bazel) |
| memory_stats | wujingyue | delete |


Is there an alternative way to query memory usage (especially on GPUs)?

@martinwicke (Member, Author):

You can ask Session.run to fill out StepStats, which includes information on memory usage.
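As a rough illustration of that route (a TF 1.x sketch, not from the thread), the usual pattern is to pass RunOptions and a RunMetadata proto to Session.run and then read step_stats:

```python
import tensorflow as tf  # TF 1.x

x = tf.random_normal([1024, 1024])
y = tf.matmul(x, x)

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(y, options=run_options, run_metadata=run_metadata)

# step_stats is organized per device, then per executed node; each node
# carries AllocatorMemoryUsed entries with memory information.
for dev_stats in run_metadata.step_stats.dev_stats:
    for node_stats in dev_stats.node_stats:
        for mem in node_stats.memory:
            print(dev_stats.device, node_stats.node_name,
                  mem.allocator_name, mem.allocator_bytes_in_use)
```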


It does not support the functionality of the "MaxBytesInUseOp" in contrib, which returns AllocatorStats.max_bytes_in_use. The closest thing in StepStats seems to be AllocatorMemoryUsed.allocator_bytes_in_use, which corresponds to AllocatorStats.bytes_in_use instead.
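For comparison, the contrib op referred to here is essentially a one-liner (a sketch; device placement determines which allocator it reports on):

```python
import tensorflow as tf  # TF 1.x

x = tf.random_normal([4096, 4096])
y = tf.matmul(x, x)
# Reads AllocatorStats.max_bytes_in_use for the device the op is placed on.
peak = tf.contrib.memory_stats.MaxBytesInUse()

with tf.Session() as sess:
    sess.run(y)
    print(sess.run(peak))  # global peak bytes in use on that device so far
```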

@martinwicke (Member, Author):

Is AllocatorMemoryUsed.peak_bytes equivalent to that?


If I'm not mistaken, those are per-node statistics. However the "MaxBytesInUse" op in contrib gives global peak memory usage.


(Following the response in the main thread.)
I tried it, and the sum of peak_bytes over all nodes always gives a much larger result than the MaxBytesInUse op, sometimes close to 2x larger.
There is little documentation on what these fields mean, but I guess the sum of per-node peak memory is expected to be much larger than the overall memory consumption (which is what I'm interested in), since nodes can have non-overlapping lifetimes.
Conceptually, I think the peak memory consumption of a whole system cannot be obtained as the sum of any per-node statistics.

@yuefengz (Contributor) commented Sep 26, 2018

The AllocatorStats.max_bytes_in_use should give you the peak memory per device. That op queries that field. You can also use another way to query that field: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/clusters/single_machine.cc#L237

But there is no python interface for that currently. : )

@martinwicke (Member, Author):

I think @ppwwyyxx is interested in querying this from Python. We don't have a way to do that anymore. Should the peak device memory usage be added to StepStats (or something in there)?

(Contributor):

We can add a new field to StepStats to hold global peak memory per allocator if necessary.

@byronyi (Contributor) commented Sep 25, 2018

For the new repos under the TensorFlow org, will contributors be required to sign the same CLA as in the main repo? And does the license (Apache 2) remain unchanged, or can it not be changed?

@martinwicke (Member, Author)

@byronyi Yes, we will keep Apache 2 (if nothing else, for the practical reason that it allows us to move code around between repos, since they are all licensed the same), and we are required to have the CLA signed by all contributors to any repos under the TensorFlow org.

@martinwicke (Member, Author) commented Sep 25, 2018 via email

@martinwicke (Member, Author) commented Sep 25, 2018 via email

@mrry commented Sep 25, 2018

@martinwicke I think @yuefengz added the code for tracking memory usage via StepStats, so he'd probably know best.

(As an aside: in 2.0 without Session.run(), do we have a plan to expose StepStats?)

@martinwicke (Member, Author) commented Sep 25, 2018 via email

@alextp (Contributor) commented Sep 25, 2018

@mrry Currently the TFE context exposes StepStats, though the API is not super nice: https://github.com/tensorflow/tensorflow/blob/153578f3c90ca423501151adcbaf6b81e05e2440/tensorflow/python/eager/context.py#L587

Essentially you call context.context().enable_run_metadata() once and then call context.context().export_run_metadata() to get a proto with what happened since you last called export.
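A minimal sketch of that flow (it relies on the internal tensorflow.python.eager API described above, so names may change):

```python
import tensorflow as tf
from tensorflow.python.eager import context  # internal API, subject to change

tf.enable_eager_execution()  # implicit once 2.0 defaults to eager

context.context().enable_run_metadata()

# ... run some eager ops or defuns ...
x = tf.random_normal([512, 512])
y = tf.matmul(x, x)

run_metadata = context.context().export_run_metadata()  # RunMetadata proto
# Everything executed since enable_run_metadata() shows up in step_stats.
for dev_stats in run_metadata.step_stats.dev_stats:
    print(dev_stats.device, len(dev_stats.node_stats))
```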

@asimshankar (Contributor)

@mrry @alextp @martinwicke : Also, as per #20, for any function call you can provide RunOptions and RunMetadata, so we can obtain StepStats for individual defuns as well.

| ffmpeg | fredbertsch | delete |
| framework | | partially move to core, delete the rest |
| fused_conv | | delete |
| gan | joel-shor | move to separate repo |
(Member):

@galeone TFGAN will be moved to a separate repo. You can find more details here.


Awesome. Thank you for this reference!

@foxik commented Dec 12, 2018

Hi, we use tf.contrib.crf quite a lot and it would be great to have it in TensorFlow, but it does not seem to have an owner -- I would be willing to help with the merge (i.e., become an owner if there is none).

@martinwicke (Member, Author) commented Dec 12, 2018 via email

@martinwicke merged commit 6619a99 into master on Dec 17, 2018
@martinwicke deleted the rfc-contrib branch on December 17, 2018 at 22:01
@ewilderj added the RFC: Accepted (RFC Design Document: Accepted by Review) label and removed the RFC: Proposed (RFC Design Document) label on Dec 17, 2018
@byronyi mentioned this pull request on Apr 4, 2019
@drheatherwalker

Is there any update on the fate of WALSMatrixFactorization and WALSModel in tensorflow 2.0?
