log details to metadata for run analytics #992

Merged on Mar 23, 2024 (67 commits)

@angel-ruiz7 angel-ruiz7 commented Feb 23, 2024

This PR uses the MosaicMLLogger to write the following keys to a run's metadata for analytics purposes:

  • model_name: string
  • script: 'Training', 'Eval'
  • train_task_type: PRETRAIN, CONTINUED_PRETRAIN, INSTRUCTION_FINETUNE
  • train_loader_name: string
  • train_dataset_hf_name: string
  • eval_loader_name: string
  • eval_dataset_hf_name: string
  • tokenizer_name: string
  • n_heads: number
  • d_model: int
  • callbacks: string[]
  • train_loader_workers: int
  • eval_loader_workers: int
  • gauntlet_configured: boolean
  • icl_configured: boolean
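
For illustration only, the key list above could be assembled into a single metadata dictionary along these lines. This is a minimal sketch, not the code merged in this PR: the helper names `_get` and `build_run_metadata`, and the config layout (`model.name`, `train_loader.dataset.hf_name`, `eval_gauntlet`, `icl_tasks`, and so on), are assumptions made for the example. Because the task subtype is still an open question, `train_task_type` is passed in explicitly rather than inferred.

```python
from typing import Any, Dict, Optional


def _get(cfg: Dict[str, Any], *path: str) -> Any:
    """Walk nested dict keys, returning None if any key is missing."""
    node: Any = cfg
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def build_run_metadata(
    cfg: Dict[str, Any],
    script: str,
    train_task_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect the analytics keys from a (hypothetical) config layout."""
    metadata = {
        'script': script,  # 'Training' or 'Eval'
        'train_task_type': train_task_type,  # e.g. 'PRETRAIN'
        'model_name': _get(cfg, 'model', 'name'),
        'tokenizer_name': _get(cfg, 'tokenizer', 'name'),
        'n_heads': _get(cfg, 'model', 'n_heads'),
        'd_model': _get(cfg, 'model', 'd_model'),
        'train_loader_name': _get(cfg, 'train_loader', 'name'),
        'train_dataset_hf_name': _get(cfg, 'train_loader', 'dataset', 'hf_name'),
        'train_loader_workers': _get(cfg, 'train_loader', 'num_workers'),
        'eval_loader_name': _get(cfg, 'eval_loader', 'name'),
        'eval_dataset_hf_name': _get(cfg, 'eval_loader', 'dataset', 'hf_name'),
        'eval_loader_workers': _get(cfg, 'eval_loader', 'num_workers'),
        # Assumed: callbacks are configured as a mapping keyed by name.
        'callbacks': sorted((cfg.get('callbacks') or {}).keys()),
        # Assumed: these top-level keys signal gauntlet / ICL configuration.
        'gauntlet_configured': 'eval_gauntlet' in cfg,
        'icl_configured': 'icl_tasks' in cfg,
    }
    # Drop keys that could not be resolved so absent values are not logged.
    return {k: v for k, v in metadata.items() if v is not None}
```

The resulting dictionary would then be handed to the logger in one call; keys that cannot be resolved from the config are omitted rather than logged as empty values.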

Screenshots

Using the Quickstart example: [screenshot, 2024-03-22 at 4:39:18 PM]

Using the gpt2-small example: [screenshot, 2024-03-22 at 4:45:16 PM]

angel-ruiz7 commented Feb 23, 2024

I still need to add the subtype. What's the best approach for this? Currently, we classify a run by searching its command for the names of specific YAML or Python files. What would be the best way to do this inside train.py?

cc @aspfohl @irenedea

@aspfohl aspfohl left a comment

More of an ask for foundry team: Is there anything else that would be useful for analytics?

Should we make this configurable so it can be turned on or off? Or is the presence of MosaicMLLogger enough? (Users could always turn it off via the MOSAICML_PLATFORM env var.)
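
As a rough sketch of the second option, the check could look something like the following. The function name `analytics_logging_enabled` is hypothetical, and treating the variable as enabled only when it equals `true` (case-insensitive, defaulting to off when unset) is an assumption for this example rather than confirmed behavior.

```python
import os
from typing import Mapping, Optional


def analytics_logging_enabled(
    logger_present: bool,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True only if a MosaicMLLogger is attached and the platform flag is on."""
    env = os.environ if env is None else env
    # Assumption: the flag is considered on only when set to 'true'.
    platform_on = env.get('MOSAICML_PLATFORM', 'false').lower() == 'true'
    return logger_present and platform_on
```

With a gate like this, no separate config option would be needed; a dedicated flag would only matter if users wanted the logger active but analytics metadata off.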

Four review threads on scripts/eval/eval.py (outdated, resolved)

@irenedea irenedea left a comment

Can you include evidence of this working in the PR description? Some manual tests and screenshots would be good.

@irenedea irenedea left a comment

🚀 LGTM! Just one super super tiny formatting comment :) Thanks Angel! Will be great to have more logging and data 😄

One review thread on llmfoundry/utils/__init__.py (outdated, resolved)
@angel-ruiz7 angel-ruiz7 merged commit 31e4879 into main Mar 23, 2024
10 checks passed
KuuCi pushed a commit that referenced this pull request Apr 18, 2024