
Experimental hf logger #1456

Merged
merged 23 commits into from
Jun 13, 2023
Conversation

Wauplin
Contributor

@Wauplin Wauplin commented Apr 27, 2023

Following this slack thread (private) I started to prototype a logger to push training data to the Hub.

EDIT: I removed the Mixin and only added an HFSummaryWriter that does the job. Let's optimize for our use case first and revisit later if we want to generalize. The logger inherits from tensorboardX.SummaryWriter.

EDIT 2: updated again to integrate CommitScheduler. No need for regular .push_to_hub calls, the logger automatically takes care of it. This also makes the commits more robust to concurrent reads/writes. Uploads happen asynchronously in a background thread and are queued to avoid concurrency issues.
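The queued, asynchronous upload pattern described above can be sketched in plain Python. This is a conceptual illustration only (the class and method names here are hypothetical, not the CommitScheduler API): log writes enqueue jobs, and a single background worker drains the queue so no two uploads ever race each other.

```python
import queue
import threading

class BackgroundUploader:
    """Hypothetical sketch: serialize uploads through one background thread."""

    def __init__(self, push_fn):
        # push_fn stands in for whatever actually uploads a folder to the Hub.
        self._push_fn = push_fn
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def schedule(self, payload):
        # Called from the training loop; returns immediately.
        self._jobs.put(payload)

    def _drain(self):
        while True:
            payload = self._jobs.get()
            if payload is None:  # sentinel: stop the worker
                break
            self._push_fn(payload)  # one upload at a time, in FIFO order

    def close(self):
        # Flush remaining jobs, then stop the worker.
        self._jobs.put(None)
        self._worker.join()
```

Because a single worker owns the queue, the training loop never blocks on network I/O and commits are applied in order.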

I ran an MNIST training as a toy example. I took the script from this article and simply replaced

writer = SummaryWriter()

by

writer = HFSummaryWriter(repo_id="test-logger-MNIST", commit_every=1)

And that's it! It got me this tensorboard on the Hub.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Apr 27, 2023

The documentation is not available anymore as the PR was closed or merged.

Member

@julien-c julien-c left a comment


quite cool!

One suggestion is that maybe we should not just publish the mixin but also directly the mixed-in implem(s) for the most famous loggers, i.e. also export the

class TensorBoardHFLogger(HFLoggerMixin, SummaryWriter):
    pass

WDYT?
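The mixed-in composition suggested above can be sketched with stand-ins. Everything here is illustrative (HFLoggerMixin and push_to_hub are hypothetical names from this discussion, and the SummaryWriter below is a minimal stub, not tensorboardX): the mixin layers Hub-pushing onto any logger via cooperative inheritance, and the concrete class is exported ready to use.

```python
class SummaryWriter:
    """Minimal stub standing in for tensorboardX.SummaryWriter."""

    def __init__(self):
        self.scalars = []

    def add_scalar(self, tag, value, step):
        self.scalars.append((tag, value, step))

class HFLoggerMixin:
    """Hypothetical mixin adding Hub-pushing on top of any logger."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.pushed = []

    def push_to_hub(self):
        # In a real logger this would upload the log directory to the Hub;
        # here we just record a snapshot of the logged scalars.
        self.pushed.append(list(self.scalars))

# The ready-made class exported alongside the mixin, as suggested:
class TensorBoardHFLogger(HFLoggerMixin, SummaryWriter):
    pass
```

Power users can still mix HFLoggerMixin into whichever logger they prefer, while most users import the pre-composed class.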


Example:
```py
from lightning.pytorch.loggers import TensorBoardLogger
```
Member


i think maybe better to use the "official" tensorboard one as an example?

Member


(i think it's SummaryWriter? not sure anymore tbh, it's been a while)

Contributor Author


Yeah, that's definitely where I need feedback from people who actually know about it! I'm up for integrating it with any logger now that the base work is done.

@Wauplin
Contributor Author

Wauplin commented Apr 27, 2023

One suggestion is that maybe we should not just publish the mixin but also directly the mixed-in implem(s) for the most famous loggers

Yup that would make sense, agree we should do that. I wanted to at first but then realized that both lightning.pytorch.loggers.TensorBoardLogger and lightning.fabric.loggers.TensorBoardLogger exist and have slightly different usage so I let the user decide. But if we can select a few (or even 1) loggers that are popular, that's even better (and why not expose the Mixin for power users if they really want it).

@Wauplin
Contributor Author

Wauplin commented Apr 28, 2023

@julien-c I've updated the PR description to showcase how it works with the base SummaryWriter (and it works exactly the same, out-of-the-box). Here is the generated tensorboard: https://huggingface.co/Wauplin/test_hf_logger/tensorboard. I think exposing both the main HFTensorBoardLogger (which encapsulates SummaryWriter) and the mixin would be best, to let users decide if they want a default solution or a flexible one.

Before moving forward on this PR, it'd be very nice to extensively test it on real trainings. I'm a bit "afraid" of how it can behave when the log_dir becomes large. There is also the limitation of running in the main thread vs. in the background. @thomwolf can I let you see with your team and get back to me if you're interested/if someone's testing it?

@julien-c
Member

I think exposing both the main HFTensorBoardLogger (that encapsulate SummaryWriter) and the mixin would be best

100% agree

@Wauplin
Contributor Author

Wauplin commented May 19, 2023

Made an update to use run_as_future=True (#1458). Data is now pushed in the background without stopping the training.

@codecov

codecov bot commented May 19, 2023

Codecov Report

Patch coverage: 6.81% and project coverage change: +29.92% 🎉

Comparison is base (8eabd33) 51.76% compared to head (73f868a) 81.68%.

❗ Current head 73f868a differs from pull request most recent head e5477c0. Consider uploading reports for the commit e5477c0 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1456       +/-   ##
===========================================
+ Coverage   51.76%   81.68%   +29.92%     
===========================================
  Files          56       56               
  Lines        6047     5957       -90     
===========================================
+ Hits         3130     4866     +1736     
+ Misses       2917     1091     -1826     
Impacted Files Coverage Δ
src/huggingface_hub/__init__.py 75.75% <ø> (ø)
src/huggingface_hub/_tensorboard_logger.py 0.00% <0.00%> (ø)
src/huggingface_hub/utils/__init__.py 100.00% <ø> (ø)
src/huggingface_hub/utils/_runtime.py 55.71% <40.00%> (+4.60%) ⬆️
src/huggingface_hub/utils/_experimental.py 100.00% <100.00%> (+25.00%) ⬆️

... and 37 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

def log_dir(self) -> str:
    return self.logdir

logger = HFTensorBoardLogger("training/results", repo_id="test_hf_logger", path_in_repo="tensorboard")


Would it be weird not to assign a log_dir? I think it's confusing that we need to point both somewhere on the hub and somewhere on the disk. Shouldn't it write on a temporary disk instead? so that basically the hub acts as the source of truth?

Copy link
Contributor Author


I have not thought of this use case before TBH but it makes total sense. At first I wanted the logger to be as close as possible to the default logger (i.e. write locally as well). But yes we can make log_dir optional and if not provided we create a temporary directory.

Copy link
Contributor Author


@thomasw21 actually by default SummaryWriter creates a local directory under /runs. I will keep this behavior as it is instead of creating a temporary directory by default. If we really want a tmp dir in the future, we can always add it as a new feature.
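The default-log_dir decision above can be sketched as a small helper. This is an assumption-laden illustration (resolve_log_dir is a hypothetical name, not part of the PR): when the caller passes no log_dir, fall back to a timestamped folder under runs/, mirroring SummaryWriter's default behavior rather than a temporary directory.

```python
import datetime
import os
import socket

def resolve_log_dir(log_dir=None, root="runs"):
    """Hypothetical sketch: keep an explicit log_dir, else default to runs/."""
    if log_dir is not None:
        return log_dir
    # Timestamp + hostname, similar to SummaryWriter's default folder naming.
    stamp = datetime.datetime.now().strftime("%b%d_%H-%M-%S")
    return os.path.join(root, f"{stamp}_{socket.gethostname()}")
```

A tmp-dir default could later be added as an opt-in feature without breaking this behavior.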

@Wauplin Wauplin changed the title [Draft] Experimental hf logger Experimental hf logger May 31, 2023
@Wauplin Wauplin marked this pull request as ready for review May 31, 2023 11:12
@Wauplin Wauplin merged commit b4e01b0 into main Jun 13, 2023
@Wauplin Wauplin deleted the experimental-hf-logger branch June 13, 2023 10:18