
Fix an issue that caused Lightning to extract the batch size even though it was set by the user in LightningModule.log #10408

Merged (41 commits) on Nov 19, 2021

Conversation

rohitgr7
Contributor

@rohitgr7 rohitgr7 commented Nov 8, 2021

What does this PR do?

Fixes: #10349 (comment)
Fixes: #10576

After we added the warning that checks batch_size for ambiguous batches, several users noticed that the batch_size used internally wasn't correct. We always make a call to extract the batch size, even when the user sets it explicitly via self.log(batch_size=...). In my opinion that call isn't required in that case: the recursive extraction does unnecessary work, and it raises a warning even though the user already specified the value. With this change, the call is made inside .log only when batch_size is not specified.

In short: this fixes an issue that caused Lightning to extract the batch size even though it was set by the user in LightningModule.log.

So an overview of the solution is:
Track the current batch and use it to extract batch_size inside the self.log call, but only when batch_size is not specified explicitly and the metric is logged with on_epoch=True (so it will be averaged at the end of the epoch).

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @Borda @carmocca @edward-io @ananthsub

@rohitgr7 rohitgr7 changed the title from "Update to extract batch size only when its required" to "Fix the call to extract_batch_size only when its required" Nov 9, 2021
@rohitgr7 rohitgr7 added bug Something isn't working data handling Generic data-related topic labels Nov 9, 2021
@rohitgr7 rohitgr7 added this to the 1.5.x milestone Nov 9, 2021
@rohitgr7
Contributor Author

> Hi, @rohitgr7 Thanks a lot for tackling this issue! I left a few comments in the code; feel free to let me know if you have any questions. A general idea from my point of view: I think extract_batch_size() is the kind of function we could convert into a hook and delegate to the user to implement. Because they know their batch, it would be easy for them to define the batch_size or how to get it. Here we make a lot of assumptions about what the batch is going to look like, yet we can still only extract a valid batch_size in very limited situations. Anyway, I don't want to interrupt this PR discussion thread; I will open an issue to start the discussion. cc: @tchaton @ananthsub

If we do this, then we will have two different ways for users to define their batch size: this hook, and self.log(..., batch_size=). They can still define methods within their LightningModule to retrieve the batch_size and call them when using self.log. Plus, we already have too many hooks, and this seems more like a utility method to me. Just a personal opinion :)
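The helper-method alternative described above might look like the following sketch. `_ModuleStub` stands in for `pl.LightningModule`, and `batch_size_of` is a hypothetical user-chosen name; neither is part of Lightning's API.

```python
class _ModuleStub:
    """Stand-in for pl.LightningModule: just enough of .log() to demo."""

    def log(self, name, value, batch_size=None, on_epoch=False):
        # Record what was logged so the pattern is observable.
        self.last_log = {"name": name, "value": value, "batch_size": batch_size}


class MyModel(_ModuleStub):
    def batch_size_of(self, batch):
        # The user knows their own batch structure, so a plain method
        # suffices; no dedicated Lightning hook is required.
        return len(batch["inputs"])

    def training_step(self, batch, batch_idx):
        loss = 0.5  # placeholder for the real loss computation
        self.log(
            "train_loss",
            loss,
            batch_size=self.batch_size_of(batch),
            on_epoch=True,
        )
        return loss
```

The model passes an explicit batch_size on every log call, so Lightning never needs to fall back to inferring it from the batch.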

@tchaton
Contributor

tchaton commented Nov 19, 2021

To add to @rohitgr7 point,

self.log(..., batch_size=) was more of a rescue solution, and I personally don't find it elegant to have to provide the batch size, especially if you are logging outside of the *_step methods of the LightningModule, where the current batch is unavailable. But this might affect only a minority of users.

> Plus we already have too many hooks

This is entirely true, and we want to enforce only one way to do anything within Lightning. I believe we would have to weigh the hook against self.log(..., batch_size=) in terms of usability and simplicity.

@carmocca @awaelchli @SeanNaren Any thoughts?

@rohitgr7
Contributor Author

@tchaton let's discuss more here: #10635

@mergify mergify bot added the ready PRs ready to be merged label Nov 19, 2021
@carmocca carmocca enabled auto-merge (squash) November 19, 2021 16:16
@carmocca carmocca merged commit ec27313 into master Nov 19, 2021
@carmocca carmocca deleted the fix/batch_size branch November 19, 2021 16:48
rohitgr7 added a commit that referenced this pull request Nov 20, 2021
…g` (#10408)

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Raalsky pushed a commit to neptune-ai/pytorch-lightning that referenced this pull request Nov 23, 2021
…g` (Lightning-AI#10408)

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
awaelchli pushed a commit that referenced this pull request Nov 24, 2021
…g` (#10408)

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
lexierule pushed a commit that referenced this pull request Nov 24, 2021
…g` (#10408)

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
@rohitgr7 rohitgr7 mentioned this pull request Feb 7, 2022
12 tasks
JoeLambourne added a commit to AutodeskAILab/BRepNet that referenced this pull request May 25, 2022
# Why?

In #11 and #12 we see that updates to the occwl library introduced breaking changes which were not spotted in the example notebooks. Thank you to [@cteqeu](https://github.com/cteqeu) for reporting the issues and [@akashaero](https://github.com/akashaero) for sharing the workaround.

This PR addresses the problems

# What?

- A new version of occwl has been released ([see here](https://github.com/AutodeskAILab/occwl/releases/tag/v1.0.0)). We now pin occwl to this version to avoid further unexpected breakage.
- In [jupyter_segmentation_viewer.py](visualization/jupyter_segmentation_viewer.py) we remove the calls to the removed functions.
- All the notebooks are rerun and retested.
- Some warnings were firing in [pytorch-lightning](Lightning-AI/pytorch-lightning#10408). To ensure that all metrics are computed correctly, we pass the number of faces to the logger as the batch size. Note that the IoU and accuracy metrics reported in [the paper](https://arxiv.org/abs/2104.00706) are per-face rather than per-solid.
- In [scale_utils.py](utils/scale_utils.py) we make use of the scaling function in occwl rather than re-implementing it here.
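Passing the face count as the batch size matters because `on_epoch=True` aggregates logged step values as a mean weighted by `batch_size`. The sketch below illustrates that weighting; it is a simplification for intuition, not Lightning's actual aggregation code.

```python
def weighted_epoch_mean(step_values, batch_sizes):
    """Epoch aggregation as a batch_size-weighted mean of step values.

    Logging the number of faces as batch_size therefore turns the
    epoch metric into a per-face average rather than a per-solid one.
    """
    total = sum(v * b for v, b in zip(step_values, batch_sizes))
    return total / sum(batch_sizes)


# Two solids with 10 and 30 faces and per-solid accuracy 0.9 and 0.5:
# the larger solid dominates the per-face epoch accuracy.
epoch_acc = weighted_epoch_mean([0.9, 0.5], [10, 30])
```

If a uniform batch size were used instead, the epoch value would be the plain mean of the per-solid values, which is not what the paper's per-face metrics report.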
Labels
bug Something isn't working logging Related to the `LoggerConnector` and `log()` ready PRs ready to be merged
Development

Successfully merging this pull request may close these issues.

  • Make extract_batch_size() easier to redefine
  • Trying to infer the batch_size from an ambiguous collection
5 participants