Add a CI GitHub Action running the Hugging Face Transformers test suite against the XPU backend. Test goals:

* Catch regressions coming from the PyTorch XPU backend which affect Transformers
* Catch new features coming from Transformers which require implementation effort in PyTorch XPU
The design approach is to stay as close to the Transformers CI environment as possible. See Dockerfile and self-push.yml for reference.
Set up the following test triggers:

* Per opened PR modifying the GitHub Action workflow file with the test (or any file in the repo which the workflow file depends on)
* Per manual trigger event, optionally specifying the PyTorch XPU nightly build to test (default: the latest nightly)
Set up the environment as follows (T - required by Transformers tests):

* `linux.idc.xpu` runners
* `apt-get install git-lfs && git lfs install` (T)
* `apt-get install espeak-ng` (T: `tests/models/wav2vec2_phoneme/test_tokenization_wav2vec2_phoneme.py::Wav2Vec2PhonemeCTCTokenizerTest::test_batch_encode_plus_padding` in v4.47.0)
* `apt-get install pkg-config libavformat-dev libavcodec-dev libavdevice-dev libavutil-dev libavfilter-dev libswscale-dev libswresample-dev` (T)
* `conda create -y -n venv python=3.10` (Python 3.12 fails during pip install with an av/logging.pyx build error, see PyAV-Org/PyAV#1140)
* Transformers `v4.47.0` (https://github.com/huggingface/transformers/tree/v4.47.0)
* `pip install -e .`
* `pip install -e .[dev-torch,testing,video]`
* `TRANSFORMERS_TEST_DEVICE_SPEC=spec.py` (see the sketch right after this list)
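The actual spec.py content is not reproduced here; below is a minimal sketch of what an XPU device spec could look like, assuming the `TRANSFORMERS_TEST_DEVICE_SPEC` convention of a Python module exposing `DEVICE_NAME` plus optional backend hooks:

```python
# spec.py - sketch of an XPU device spec, assuming Transformers reads DEVICE_NAME
# to select the test device and picks up the optional *_FN backend hooks.
# The file actually used by the workflow may differ.
import torch

DEVICE_NAME = "xpu"

MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count
```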
Run Transformers tests as follows (G - test group), adding `-rsf --make-reports=<testgroup>` to each command:

* `python -m pytest tests/*.py` (G)
* `python -m pytest tests/benchmark` (G)
* `python -m pytest tests/generation` (G)
* `python -m pytest tests/models` (G)
* `python -m pytest tests/models -k backbone` (G), [Test] Add transformers test #1175
* `python -m pytest tests/pipelines` (G), ci/transformers: add pipeline and trainer tests #1185
* `python -m pytest tests/trainer` (G), ci/transformers: add pipeline and trainer tests #1185
* `python -m pytest tests/utils` (G)
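A minimal driver sketch for the list above (the report ids passed to `--make-reports` are illustrative, not the final workflow naming):

```python
# Sketch: loop over the test groups and produce one report directory per group.
import subprocess

TEST_GROUPS = {
    "tests_misc": "tests/*.py",
    "tests_benchmark": "tests/benchmark",
    "tests_generation": "tests/generation",
    "tests_models": "tests/models",
    "tests_models_backbone": "tests/models -k backbone",
    "tests_pipelines": "tests/pipelines",
    "tests_trainer": "tests/trainer",
    "tests_utils": "tests/utils",
}

for report_id, target in TEST_GROUPS.items():
    # shell=True so the tests/*.py glob and the -k expression are expanded as on the command line
    cmd = f"python -m pytest -rsf --make-reports={report_id} {target}"
    result = subprocess.run(cmd, shell=True)
    print(f"{report_id}: pytest returned {result.returncode}")
```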
Mark the test as passed or failed according to the baseline expectations defined for each test group (see below).

At the moment we still have some features not implemented for the PyTorch XPU backend which affect Transformers tests, and some porting is needed in the tests themselves. For convenience we break the tests into groups and define baseline expectations for each group separately. In the future we will likely switch to running just `python -m pytest tests`. Baseline expectations are:
| Test group | Errors | Failed |
| --- | --- | --- |
| tests/*.py | 0 | 8 |
| tests/benchmark | 0 | 0 |
| tests/generation | 0 | 18 |
| tests/models | 0 | TBD |
| tests/models -k backbone | 0 | 0 |
| tests/pipelines | 0 | 9 |
| tests/trainer | 0 | 3 |
| tests/utils | 0 | 1 |
The test should check the baseline as follows:

* For groups with 0/0 expectations - check the pytest return status code (expect it to be 0)
* For groups with non-zero expected failures - ignore the pytest return status code and check that:
  * the number of errors matches (is 0)
  * the number of failed cases matches
  * the one-line `failures_line.txt` output from `--make-reports` (or the list of failed cases) matches
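A sketch of how this check could be scripted, assuming the `reports/<testgroup>/` layout written by `--make-reports` (with `stats.txt` holding the final pytest summary line); the report ids and the helper are illustrative, not the final implementation:

```python
# Sketch of the baseline check described above. Assumes --make-reports wrote
# reports/<report_id>/stats.txt with the final pytest summary line; a fuller
# version would also diff reports/<report_id>/failures_line.txt against a
# stored baseline list of failed cases.
import re
import sys
from pathlib import Path

# Baseline expectations from the table above: (errors, failed) per report id.
# tests/models is TBD in the table and therefore skipped here.
BASELINES = {
    "tests_misc": (0, 8),
    "tests_benchmark": (0, 0),
    "tests_generation": (0, 18),
    "tests_models_backbone": (0, 0),
    "tests_pipelines": (0, 9),
    "tests_trainer": (0, 3),
    "tests_utils": (0, 1),
}

def summary_count(summary: str, kind: str) -> int:
    """Extract '<N> failed' / '<N> error(s)' from a pytest summary line."""
    match = re.search(rf"(\d+) {kind}", summary)
    return int(match.group(1)) if match else 0

mismatches = []
for report_id, (exp_errors, exp_failed) in BASELINES.items():
    stats = (Path("reports") / report_id / "stats.txt").read_text()
    errors, failed = summary_count(stats, "error"), summary_count(stats, "failed")
    if (errors, failed) != (exp_errors, exp_failed):
        mismatches.append(
            f"{report_id}: got {errors} errors / {failed} failed, "
            f"expected {exp_errors} / {exp_failed}"
        )

if mismatches:
    sys.exit("Baseline mismatch:\n" + "\n".join(mismatches))
```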
The following artifacts should be made available after test execution:

* List of PyPI packages installed in the Conda environment and their versions (running `pip list` and dumping to the generic log output is fine)
* List of available GPU device IDs (running `cat /sys/class/drm/render*/device/device` and dumping to the generic log output is fine)
* Reports produced by the `--make-reports` command
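If it is more convenient to gather these from Python than from shell commands, a small helper along these lines would do (illustrative only):

```python
# Sketch: dump installed packages and GPU device IDs to the job log.
import glob
import subprocess

# PyPI packages installed in the active (conda) environment
print(subprocess.run(["pip", "list"], capture_output=True, text=True).stdout)

# PCI device IDs of the available render nodes
for path in sorted(glob.glob("/sys/class/drm/render*/device/device")):
    with open(path) as f:
        print(path, f.read().strip())
```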
CC: @juliusshufan, @chuanqi129, @RUIJIEZHONG66166