RFC: Sunset tf.contrib #18
Conversation
The removal of tech debt is palpable 😄
rfcs/20180907-contrib-sunset.md (outdated)
| hvx | satok16 | delete (redundant with NNAPI) | |
| image | | partial move to tensorflow/contrib? | |
| input_pipeline | rohan100jain | delete | |
| integrate | shoyer *mcoram | delete? | |
Let's also put tf.contrib.integrate down for moving to tensorflow/scientific.
Oops, my bad. Yes.
Can we maintain something like a PyPI classifier for every module in contrib, plus ownership as in Helm/Charts, to quickly expose each module's maturity level and notify maintainers who are MIA?
Well, given that contrib is going away, I'm not sure we want to do this for contrib. But we are thinking about something like this for different projects in the tensorflow org, @ewilderj FYI.
Yeah, I meant for contrib projects moving into the separate repository (i.e. the ones not moved into core or deleted).
If all goes according to plan, there may be 3 repositories afterwards: tensorflow/contrib (I also like Sean's suggestions), tensorflow/scientific, and tensorflow/io. I don't really want to distinguish projects inside those, but in the end, that should be up to the SIG which maintains the repo in question.
rfcs/20180907-contrib-sunset.md (outdated)
| timeseries | gsundeep karmel | move to tensorflow/estimator | |
| tpu | saeta | move to core | |
| training | ebrevdo sguada joel-shor | | |
| util | | delete (no owner), or move to tools | |
There's one function in this project that has proved particularly useful for static optimizations in libraries built on top of TensorFlow, and as far as I know there is no alternative: tf.contrib.util.constant_value.
Considering that the future of this project is still undecided, would it be possible to ensure that at least this particular function, or an alternative to it, is available in TensorFlow 2.0?
I am not sure how interesting this function still is with eager execution. @mrry FYI.
I'm asking precisely not for eager execution, but for graph mode. In our use cases eager execution is simply not useful at all. We need a graph we can export as a file and run on other platforms, for example game engines running in real time on Windows using the C or C++ APIs. Reimplementing the operations there, even if we had the weights, is not an option.
Eager execution in 2.0 still allows you to have graphs (though we'd prefer you think of them as functions), and those can still be exported, executed remotely, etc. constant_value is not particularly useful with eager execution since you can simply look at any tensor (its value will be there), or run any function (which will produce a value).
Or is the capability you need the implicit estimate of whether the value can be computed cheaply?
Glad to read about graphs in 2.0. If you happen to have any links where I can learn more about this new design, I'll be very happy to take a look so I can make my comments more useful and targeted in the future.
Answering your question in TF 1.x terms, we need the ability to quickly and statically access the value of constant tensors and statically known shapes without involving a session or a tensor evaluation. I'm not fully sure how this would translate to TF 2.0, but we need something faster than tf.Tensor.eval().
I'm hoping to publish a proposal for eager and graphs in TF 2.0 soon (I'm working on the text; PR hopefully this week). So keep an eye out for that.
@asimshankar
Thanks, I'll definitely keep an eye.
@martinwicke
I think we're on the same page here, but let me confirm.
"I believe for Tensors which you know are constants, the value information
is simply always available, same for shapes, so I do think this function is
redundant."
Is that the case too if I don't know if a tensor is constant? For example, if I'm writing some function and I get a tensor, can I simply try to "statically" get its value? (which I expect would work if it's constant or somehow statically evaluable, but fail otherwise)
From what you say it looks like I probably can, and if that's the case then this function would indeed become redundant, but just making sure.
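For what it's worth, this pattern can be sketched with tf.get_static_value, the public successor to tf.contrib.util.constant_value (available in recent 1.x releases and in 2.x). It returns the value as a NumPy array when it can be determined without executing the graph, and None otherwise, which matches the "try statically, fail gracefully" behavior asked about above. A minimal sketch, assuming a TF 2.x install:

```python
# Sketch assuming TensorFlow 2.x is installed. tf.get_static_value is the
# public alias of tf.contrib.util.constant_value: it returns the tensor's
# value as a NumPy array when it is statically known, and None otherwise.
import tensorflow as tf

print(tf.get_static_value(tf.constant([1, 2, 3])))  # statically known

@tf.function
def f(x):
    # Inside a traced function, x is a symbolic tensor with no static value.
    static = tf.get_static_value(x)               # None for symbolic inputs
    known = tf.get_static_value(tf.constant(7))   # constants stay static
    print("x static value:", static, "| constant:", known)
    return x

f(tf.constant([1.0, 2.0]))
```

Note the caller never needs to know in advance whether the tensor is constant; the None return is the "fail otherwise" case.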
@martinwicke
Just so as not to leave this pending: will it be possible to check whether a given tensor is constant in TF 2.x? That should also help solve the problem. In 1.x you could check the type of the op the tensor belongs to, but I don't know whether that will be possible in TF 2.x.
(FYI, re #18 (comment) see #20)
Hi, author here. It was actually the TensorFlow team who originally asked me to implement it. It also appears to be used quite a bit; see for example the past interest in making it NaN- and infinity-safe: tensorflow/tensorflow#15564 (PR: tensorflow/tensorflow#21183).
Very much support this! In the unbundling-dependencies work I've been doing, there are quite a few deps that are only used by some random tf.contrib module but nowhere else in the entire build, so it's been a lot of work to deal with for not much gain.

A tensorflow-contrib package could even be published to PyPI eventually with whatever is left after deleting, so for the people that do want it, pip install tensorflow-contrib would make import tf.contrib work the same as before. Are version numbers of the new repos going to match the main TF release, or are they going to be independent?

Another thing I'd ask: when doing this migration, preserve the git history as much as possible. That makes things much easier to follow in the new repos. Something like: clone the entire TF repo, then delete everything else, commit, and push to the new GitHub repo.
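The clone-then-filter approach described above can be sketched on a throwaway local repo. Everything here (paths, module names, commit messages) is illustrative, not the actual migration; git filter-branch's --subdirectory-filter rewrites history so only commits touching the chosen subtree survive, with their original messages and order:

```shell
# Demo of a history-preserving subtree split on a scratch repo.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # filter-branch is deprecated but works
tmp=$(mktemp -d)
cd "$tmp"
git init -q monorepo
cd monorepo
git config user.email you@example.com
git config user.name you
# Two commits touch the (hypothetical) contrib/kafka subtree:
mkdir -p tensorflow/contrib/kafka
echo 'kafka dataset op' > tensorflow/contrib/kafka/ops.py
echo 'unrelated core code' > core.py
git add -A && git commit -qm 'add contrib/kafka'
echo 'bugfix' >> tensorflow/contrib/kafka/ops.py
git add -A && git commit -qm 'fix contrib/kafka op'
# Rewrite history so the repo contains only the contrib/kafka subtree,
# keeping both commits that touched it:
git filter-branch -f --prune-empty --subdirectory-filter tensorflow/contrib/kafka HEAD
git log --oneline
```

After the rewrite, `ops.py` sits at the repo root and both commits remain, ready to be pushed to the new repo's remote.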
I'm a little worried that tensorflow/contrib might later meet the same maintenance challenges as TensorFlow core faces today.
@facaiy That's why I proposed tracking/updating the maturity and owners of each sub-API. But it seems that will be defined in the self-organization process of the new SIG.
@perfinion I second the idea of preserving git history, but I'm torn on keeping the same module name. There will likely be no compatibility guarantee between the new "contrib" repo and the still-living 1.x module, so I worry it'll create confusion. As far as versioning goes, I think it would be nice to match the 2.x versions, but it may be unrealistic for the SIG to go through a release cycle at the same cadence as the TF team.
Hi, let me remind you that I currently have two open pull requests adding two new modules. I really hope they will be accepted and merged soon. At the same time, if we are talking about 2.0, I think it makes sense to consider these modules as well. So, @martinwicke, let me ask you not to forget about them.
Sounds great to me. Does it make sense to create a new organization, say tensorflow-thirdparty, and let the SIGs manage how to migrate those modules into the new organization: move them wholesale into one big repository, or split them into separate small repositories? (I like the second way, though it might be an exhausting task.) I mean this for all of them: tensorflow/contrib, tensorflow/scientific, and tensorflow/io. All SIGs would share the new organization and namespace. Is it worthwhile?
Are you moving tf.slim into models just for retro-compatibility of existing models (effectively deprecating it), or do you plan to release new models with this API? Having models that rely on the core repository API will help enforce the quality of the API and its implicit testing. I.e., the high-level API was often not in shape partly because many reference models stressed slim more.
@dmitrievanthony I think Apache Ignite would be a good fit for tensorflow/io. There are two pending PRs, tensorflow/tensorflow#18224 (Avro) and tensorflow/tensorflow#19461 (Parquet), that may also fit tensorflow/io. I think the Avro support may take additional time. The Parquet support is awaiting the upstream Bazel http_archive issue (bazelbuild/bazel#5932), which should be resolved in Bazel 0.17.1 (bazelbuild/bazel#5059). Bazel 0.17.1 is planned to be released tomorrow (bazelbuild/bazel#5059 (comment)).
Add myself so that issues or PRs could be assigned to me. Note contrib/{kafka,kinesis} might be moved: tensorflow/community#18 Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
rfcs/20180907-contrib-sunset.md (outdated)
| image | | partial move to tensorflow/contrib? | |
| input_pipeline | rohan100jain | delete | |
| integrate | shoyer *mcoram | move to tensorflow/scientific? | |
| kafka | yongtang (mrry) | move to tensorflow/io? | |
Do we foresee a path to core from "contrib"? Will there be any coordinated additions to core with subsequent removals from "contrib", or is it best to have the repos live independently? I worry that having the same functionality in two places will cause problems (though this already happens today).
@martinwicke This might help in sunsetting the contrib package: https://tf-contrib-analyzer.herokuapp.com
This is awesome.
| nccl | (tobyboyd) | move essential parts to core | |
| nearest_neighbor | | delete | |
| nn | | partial move to tensorflow/contrib? | |
| opt | *joshburkart apassos | move to tensorflow/contrib? | |
@martinwicke Hi, I'm interested in the opt module. My colleague and I contributed code for the adamax optimizer, the elastic average optimizer, and the agn_optimizer. I would love to help if needed.
| lookup | ysuematsu (ebrevdo) | move to core | |
| losses | | partial move to tensorflow/contrib | |
| makefile | petewarden | delete (RPI build now uses bazel) | |
| memory_stats | wujingyue | delete | |
Is there an alternative way to query memory usage (especially on GPUs)?
You can ask Session.run to fill out StepStats, which includes information on memory usage.
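The suggestion above can be sketched as follows, assuming TF 2.x with the v1 compatibility API (the StepStats/AllocatorMemoryUsed field names are from the real protos; the graph itself is just a placeholder workload):

```python
# Sketch: request a full trace from Session.run and read per-node allocator
# statistics out of the returned StepStats.
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.random_normal([256, 256])
y = tf.linalg.matmul(x, x)  # placeholder workload

opts = tf.compat.v1.RunOptions(trace_level=tf.compat.v1.RunOptions.FULL_TRACE)
meta = tf.compat.v1.RunMetadata()
with tf.compat.v1.Session() as sess:
    sess.run(y, options=opts, run_metadata=meta)

# step_stats groups NodeExecStats by device; each node records its allocators'
# usage, including peak_bytes and allocator_bytes_in_use.
for dev in meta.step_stats.dev_stats:
    for node in dev.node_stats:
        for mem in node.memory:
            print(dev.device, node.node_name, mem.allocator_name,
                  mem.peak_bytes, mem.allocator_bytes_in_use)
```

As the rest of this thread discusses, these are per-node numbers, not the global per-device peak that contrib's MaxBytesInUse op reports.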
It does not support the functionality of the MaxBytesInUseOp in contrib, which returns AllocatorStats.max_bytes_in_use. The closest thing in StepStats seems to be AllocatorMemoryUsed.allocator_bytes_in_use, which corresponds to AllocatorStats.bytes_in_use instead.
Is AllocatorMemoryUsed.peak_bytes equivalent to that?
If I'm not mistaken, those are per-node statistics. However the "MaxBytesInUse" op in contrib gives global peak memory usage.
(Following the response in the main thread)
I tried it, and the sum of peak_bytes over all nodes always gives me a much larger result than the MaxBytesInUse op, sometimes close to 2x larger.
There is little documentation on what these fields mean, but I would guess the sum of per-node peak memory is expected to be much larger than the overall memory consumption (which is what I'm interested in), since nodes can have non-overlapping lifetimes.
Conceptually, I think the peak memory consumption of a whole system cannot be the sum of any per-node statistics.
AllocatorStats.max_bytes_in_use should give you the peak memory per device; that op queries that field. You can also query it another way: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/clusters/single_machine.cc#L237 — but there is no Python interface for that currently. : )
I think @ppwwyyxx is interested in querying this from Python. We don't have a way to do that any more. Should the peak device memory usage be added to StepStats (or to something in there)?
We can add a new field to StepStats to hold global peak memory per allocator if necessary.
For the new repos under the TensorFlow org, will contributors be required to sign the same CLA as for the main repo? And does the license (Apache 2) remain unchanged, or can it not be changed?
@byronyi Yes, we will keep Apache 2 (if nothing else, for the practical reason that it allows us to move code around between repos since they are all licensed the same), and we are required to have the CLA signed by all contributors to any repos under the TensorFlow org.
That's correct, you would have to add them up manually (or is there something I'm missing?). If I'm right about this, I don't think the separate ops are worth maintaining to avoid that summation.
That makes sense. I would be fine with adding the equivalent of MaxBytesInUse to StepStats. @mrry, do you have a sense of why we never did that / whether there's a reason it may be hard?
@martinwicke I think @yuefengz added the code for tracking memory usage via (As an aside: in 2.0 without
I believe @alextp has a plan, yes. Not sure how exactly.
@mrry Currently the TFE context exposes StepStats, though the API is not super nice: https://github.com/tensorflow/tensorflow/blob/153578f3c90ca423501151adcbaf6b81e05e2440/tensorflow/python/eager/context.py#L587 Essentially you call
@mrry @alextp @martinwicke: Also, as per #20, for any function call you can provide
| ffmpeg | fredbertsch | delete | |
| framework | | partially move to core, delete the rest | |
| fused_conv | | delete | |
| gan | joel-shor | move to separate repo | |
@galeone TFGAN will be moved to a separate repo. You can find more details here.
Awesome. Thank you for this reference!
Hi, we use
Please reach out and join addons@tensorflow.org. I think crf is likely to end up a part of tensorflow/addons.
Is there any update on the fate of WALSMatrixFactorization and WALSModel in TensorFlow 2.0?
The feedback phase will be open for four weeks until 2018-10-01.

Sunsetting tf.contrib

Summary

The tf.contrib module plays several important roles in the TensorFlow ecosystem: it has made it easy for members of the community to contribute to TensorFlow, and to have their contributions tested and maintained. It is also used as a staging ground to test early-stage and experimental features in TensorFlow. However, as the community has grown, the lack of scalability of the current approach for maintaining and supporting tf.contrib has become apparent.

This RFC is a proposal to sunset the present tf.contrib, and replace its important functions with more maintainable alternatives. Note that it also affects some non-contrib code which is not part of the tensorflow module.