save the list of new model failures #31013

Merged
ydshieh merged 1 commit into main from save_full_new_failures on May 24, 2024

@ydshieh ydshieh commented May 24, 2024

What does this PR do?

We finally got back the missing new model failure table and list in the Slack channels, but those are truncated versions.
The full table is already saved as an artifact, but the list is not. This PR saves the full list of new model failures as an artifact too, so we can access the information outside Slack.

This is particularly useful when a CI run has many new failures.
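As a rough illustration of the idea (not the code in this PR), the full list could be written to a file in a directory that the CI workflow later uploads as an artifact. The function name `save_new_failures`, the file name `new_model_failures.json`, and the dictionary layout below are all assumptions made for this sketch.

```python
import json
import os


def save_new_failures(failures, output_dir, filename="new_model_failures.json"):
    """Write the untruncated failure list to `output_dir` and return the file path.

    `failures` is assumed (hypothetically) to map a category name to a list of
    entries, each holding the device, the job link, and the failing test name.
    """
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(failures, fp, indent=4, ensure_ascii=False)
    return path


# Example entry built from the list shown in this description.
example_failures = {
    "fsdp": [
        {
            "device": "multi",
            "job_link": "https://github.com/huggingface/transformers/actions/runs/9194245573/job/25287588257",
            "test": "tests/fsdp/test_fsdp.py::TrainerIntegrationFSDP::test_basic_run_full_shard_bf16",
        }
    ]
}
```

Because the file holds every entry, it stays complete even when the Slack message has to be shortened.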

table: (just some numbers)

Changed model modules failures:

Single PT |  Multi PT | Single TF |  Multi TF |     Other | Category
        0 |         0 |         0 |         0 |        +7 | fsdp
        0 |         0 |         0 |         0 |        +1 | models_cohere
        0 |       +22 |         0 |        +1 |         0 | models_deit

list: (has the test names and job links)

<https://github.com/huggingface/transformers/actions/runs/9194245573/job/25287588257|multi> gpu
tests/fsdp/test_fsdp.py::TrainerIntegrationFSDP::test_basic_run_full_shard_bf16

<https://github.com/huggingface/transformers/actions/runs/9194245573/job/25287588257|multi> gpu
tests/fsdp/test_fsdp.py::TrainerIntegrationFSDP::test_basic_run_shard_grad_op_bf16
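The entries above use Slack's `<url|label>` link syntax. If someone wanted to consume the saved list outside Slack, a small parser along these lines might help. This is a sketch under the assumption that each entry is a link line followed by a test-name line, as in the sample above; `parse_failure_entry` is a hypothetical helper, not part of the repository.

```python
import re

# Slack link markup: <url|label>
SLACK_LINK = re.compile(r"<(?P<url>[^|>]+)\|(?P<label>[^>]+)>")


def parse_failure_entry(link_line, test_line):
    """Split one two-line entry into its job link, device label, and test name."""
    match = SLACK_LINK.search(link_line)
    if match is None:
        raise ValueError(f"no Slack-style link found in: {link_line!r}")
    return {
        "job_link": match.group("url"),
        "device": match.group("label"),
        "test": test_line.strip(),
    }


entry = parse_failure_entry(
    "<https://github.com/huggingface/transformers/actions/runs/9194245573/job/25287588257|multi> gpu",
    "tests/fsdp/test_fsdp.py::TrainerIntegrationFSDP::test_basic_run_shard_grad_op_bf16",
)
```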

@ydshieh ydshieh requested a review from amyeroberts May 24, 2024 12:56

@amyeroberts amyeroberts left a comment

Thanks for adding!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh ydshieh merged commit a3cdff4 into main May 24, 2024
8 checks passed
@ydshieh ydshieh deleted the save_full_new_failures branch May 24, 2024 13:20
@ydshieh ydshieh mentioned this pull request May 31, 2024
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jun 11, 2024
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>