-
Notifications
You must be signed in to change notification settings - Fork 74.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tf.vectorized_maps
resolve fallbacks
#55639
Comments
On the TF side I add also keras-team/keras-cv#258 (comment) as it will be the next one. |
Any feedback on what to do next? |
I've just updated the fallback list from the current Keras-cv master: WARNING tensorflow:pfor.py:1082 Using a while_loop for converting AdjustContrastv2 cause Input "contrast_factor" of op 'AdjustContrastv2' expected to be loop invariant.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting AdjustHue cause Input "delta" of op 'AdjustHue' expected to be loop invariant.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting AdjustSaturation cause Input "scale" of op 'AdjustSaturation' expected to be loop invariant.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting Bitcast cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting DepthwiseConv2dNative cause Input "filter" of op 'DepthwiseConv2dNative' expected to be not loop invariant.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting DynamicPartition cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting DynamicStitch cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting HistogramFixedWidth cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting RandomShuffle cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting StatelessRandomGetKeyCounter cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting StatelessRandomUniformFullIntV2 cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.
WARNING tensorflow:pfor.py:1082 Using a while_loop for converting StridedSlice cause Input "input" of op 'StridedSlice' expected to be not loop invariant. |
We discussed it in our meeting but still don't have an owner for |
@wangpengmit Thank you but I need to wait that we have a resource allocated to I want also try to minimize the risk that at some point I find out that it will no longer be actively supported or it is superseded by something else as we don't have a public visibility on TF roadmaps. Please notify me on this ticket when you have a new resource allocated and I could help. |
CCing @ishark . |
Sorry for the late reply on this. Can you please help prioritize which ops would you like to have converters for first? |
I think that we need to understand a little bit of things before looking at converters priorities: I suppose As you know the origin of these are from Keras-cv so I'am more worried about the And my guess (it could be wrong) is that we are hitting an hard limit of E.g. see: @ishark What do you think? |
Thanks for the prompt reply @bhack and for providing the context! Agreed, it makes sense to understand the whole context before diving into solution. So the way I understand, there are 2 main issues:
In my opinion, 1 has a more straightforward solution, unless the op is not vectorizable. For 2 on the other hand, things can be a bit tricky and might depend on the usecase. One possible approach here might be to provide way to extend the vectorization converters to users, if general solution is not suitable. Can you confirm that in your delta needs to be a variant and actually has a different value for each call? Or is it coming from keras cv implementation? |
Obviously 1 is always necessary as a pre-requisite but in this case I think we will end up in 2 by design. Our variant parameters are mainly inherited by some design choices in the early days of the Keras-cv repo using a within the batch randomization of the TF op parameters. To have more context please check in the original thread with the few messages around keras-team/keras-cv#146 (comment). See also my last checklist in: |
Just to clarify with another example the within the batch approach vs the nature of the TF native op: tensorflow/tensorflow/python/ops/parallel_for/pfor.py Lines 1713 to 1717 in d8ce9f9
|
The same is happening for RandAugment. Without RandAugment each epoch takes around 35s on my machine. With RandAugment it takes about 2mins 25 seconds. Any resolution on the roadmap? |
hello! please provide repro code. Are you using it in a sequential model? You should use it in the tf.data pipeline |
@LukeWood I am using it within the tf.data pipeline. See snippets below.
|
I see. It’s not clear what the expected performance is of RandAugment; this might not be a bug but rather the cost of doing more computation - especially if your neural network is tiny. What’s your model code? |
Yes It could still be a too small input/network that makes the forward and backward pass faster then the augmentation task and so the augmentation is the bottleneck but we had reported a quite heavy overhead also just with a single augmentation like Random contrast with EfficientnetB0 with CIFAR (ok not a so huge input size) at keras-team/keras-cv#581 The main problem is that we have never benchmarked in Keras-cv the performance drop of these vectorized map fallbakcs and our original choice/policy to randomize within the batch that it is one of the root cause of these fallbacks. I haven't still any specific reference about a paper that sustain why this training regime has gain on the loss/epoch. What I can claim for sure is that we have a performance drop that need to be benchmarked. Then with @ishark we could check if there is any action item or we don't have too much pratical solutions if we want to maintain the within the batch augmentation policy other that changing the All this with an orphan |
'But other then an API break it will requires kernels and XLA lowerings/bridge rewriting' Possibly. Depends on how the low level op(s) in question is(are) affected. |
Thanks @atlanticstarr1 and @bhack for adding details to the bug. Regarding the code snippet for RandAugment, I did not see vectorized_map being used in the snippet. Tracing through the dataset.map method briefly, it seems like it uses custom dataset ops and not vectorized_map under the hood. @atlanticstarr1, could you please confirm the slowdown is related to vectorized_map? Otherwise we might want to track it as a separate issue. Will look into a solution for invariant issue for pfor in this quarter and update. Thanks! |
@ishark You can find it if you look at the base class of the image preprocessing layers: |
With the within the batch augmentation policy we have (imho wrongly) tried to use If we are still talking about covering the missing converter with invariant I don't think that we will solve this cases. |
Thanks for the link to keras preprocessing layer @bhack. I meant I would investigate the original issue of tf.vectorized_map fallbacks occurred during a few of the Keras cv preprocessing layers and check which ones we can resolve in the short term. It may not solve all the issues related to Keras CV + pfor usage, however, hopefully it will move us further along. |
Hopefully we can discuss it publicly as soon as you have an idea. My hypothesis is obviously that in this specific case the lack of converters is just a "facade" problem |
Please check my Colab performance test at keras-team/keras-cv#581 (comment) |
/cc @martin-gorner |
Thank you @bhack! This is very helpful. |
I confirm this is a deliberate product decision: data augmentation techniques that augment an entire batch of images in a similar way are potentially harmful. Many models train better with good stochasticity of the training data. Even inadequate shuffling of data within a batch can show up on learning curves. Data augmentation is supposed to increase the variance of input images. But having an entire batch augmented in the same way works against that. These effects on training can be more or less visible, sometimes very subtle and therefore very difficult to pinpoint and debug. That's why we want to make sure this is a problem users of KerasCV augmentation layer do not have to worry about. This discussion is not so much about numerical impact (of batches augmented in a similar way vs. with good intra-batch augmentation stochasticity) on this or that model. There might be evidence that this does or does not matter for some models. Model vary and the evidence will always be debated. This product decision is about making this debate a moot point by offering a design that is always safe out of the box. |
Yes I think we all understand this but we also know that if in the design phase we don't have planned the resources to rewrite the required TF ops and Kernels (+XLA) the performance gap was assured as we need to launch a kernel for every image in the batch. Also in some specific known conditions it is hard to plan at all to refactor these specific API (see tensorflow/community#412 (comment)) In the early design days, as re-mentioned in keras-team/keras-cv#372 (comment), I've tried to argue, in the case we didn't plan to rewrite these ops/kernels, a sort of compromise between stochasticity and the time required for a training step: Can we interpret the mentioned paper by randomizing the augmenting params in sub-batches repeating the same set of images? In this way we could populate the batch reducing the number of launched kernels and also reduce the IO traffic on the filesystem. Maybe you have discussed this internally but in the "community" this has never been addressed. On the practical side, if this idea does not convince the team, what other solution do we have in hand besides rewriting the ops in TF? |
I'm hearing you. Performance is an important consideration too. We will be discussing the tradeoff. |
@martin-gorner Thanks, I hope that you could keep the discussion open in the community also if sometimes I understand it could be slower than an internal thread. Currently I don't have the Cloud resource available to compile again TF and prepare some testing PRs. tensorflow/tensorflow/core/kernels/image/adjust_contrast_op.cc Lines 381 to 386 in 7d9c767
Always considering that we are not going to accumulate a combinatorial explosion of TF ops to modify especially with the new 3d augmentation API and that we have at least enough resources to review PRs. I think that instead for the missing converters it is a minor problem that can be handled more safely by the |
@martin-gorner In the case it could help https://arxiv.org/abs/2210.06441 |
@martin-gorner What do you think about keras-team/keras-cv#1331 ? |
I'm seeing the same issue with PopulationCount |
Hi, Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information for this issue may not be relevant with the current state of the code base. The Tensorflow team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TensorFlow version with the latest compatible hardware configuration which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings, with all the debugging information which could help us investigate. Please follow the release notes to stay up to date with the latest developments which are happening in the Tensorflow space. |
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you. |
This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further. |
Mirroring keras-team/keras-cv#291 for the TF core component.
System information
master
I don't know. If I have a clear contribution path/pin-pointer probably.
Describe the feature and the current behavior/state.
Missing converters PFOR and eventually mitigate other fallbacks
Will this change the current api? How?
No
Who will benefit with this feature?
Performance on fallback
Any Other info.
/cc @wangpengmit
The text was updated successfully, but these errors were encountered: