[CI] Add a test of PyTorch XPU with Huggingface Transformers #1165

Open · 18 of 32 tasks
dvrogozh opened this issue Dec 13, 2024 · 1 comment

dvrogozh commented Dec 13, 2024

CC: @juliusshufan, @chuanqi129, @RUIJIEZHONG66166

Add a CI GitHub Actions test running the Huggingface Transformers test suite against the XPU backend. Test goals:

  1. Catch regressions coming from the PyTorch XPU backend which affect Transformers
  2. Catch new features coming from Transformers which require implementation effort in PyTorch XPU

The design approach is to stay as close to the Transformers CI environment as possible. See Dockerfile and self-push.yml for references.

Set up the following test triggers (a minimal workflow sketch follows the list):

  • Per opened PR modifying the GitHub Actions workflow file with the test (or any file in the repo on which the workflow file depends)
  • Per manual trigger event, optionally specifying the PyTorch XPU nightly build to test (default: latest nightly)
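A sketch of what these triggers could look like in the workflow file, using standard GitHub Actions syntax; the workflow path and the workflow_dispatch input name below are illustrative assumptions, not taken from the actual repo:

# Hypothetical trigger section (path and input name are assumptions):
on:
  pull_request:
    paths:
      - '.github/workflows/huggingface-transformers-tests.yml'  # assumed path of this workflow file
      # add any other files in the repo the workflow depends on here
  workflow_dispatch:
    inputs:
      pytorch_nightly:  # hypothetical input name
        description: 'PyTorch XPU nightly build to test'
        required: false
        default: 'latest'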

Set up the environment as follows (T - required by Transformers tests):

  • Use linux.idc.xpu runners
  • Use Ubuntu based hosts (22.04 or later)
  • Install: apt-get install git-lfs && git lfs install (T)
  • Install: apt-get install espeak-ng (T)
    • tests/models/wav2vec2_phoneme/test_tokenization_wav2vec2_phoneme.py::Wav2Vec2PhonemeCTCTokenizerTest::test_batch_encode_plus_padding (v4.47.0)
  • Install: apt-get install pkg-config libavformat-dev libavcodec-dev libavdevice-dev libavutil-dev libavfilter-dev libswscale-dev libswresample-dev (T)
  • Use a Conda virtual environment with Python 3.10: conda create -y -n venv python=3.10 (with Python 3.12 the pip install step fails to build due to an av/logging.pyx error, see PyAV-Org/PyAV#1140)
  • Clone Transformers v4.47.0 (https://github.com/huggingface/transformers/tree/v4.47.0)
  • Install Transformers: pip install -e .
  • Install Transformers test dependencies: pip install -e .[dev-torch,testing,video]
  • Install the XPU device-specific test configuration file and point the tests at it via TRANSFORMERS_TEST_DEVICE_SPEC=spec.py (see file content below)
$ cat spec.py
import torch
DEVICE_NAME = 'xpu'
MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count
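With the spec file in place, a quick sanity check of the environment could look like this (a sketch; assumes the venv environment created above and an XPU-capable host):

$ conda activate venv
$ export TRANSFORMERS_TEST_DEVICE_SPEC=$(pwd)/spec.py
$ python -c "import torch; print(torch.xpu.is_available())"
True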

Run Transformers tests as follows (G - test group):

At the moment some features are still not implemented in the PyTorch XPU backend, which affects Transformers tests, and some porting is needed in the tests themselves. For convenience we break the tests into groups, defining baseline expectations for each group separately. In the future we will likely switch to running just python -m pytest tests. Baseline expectations are:

Test group                Errors  Failed
tests/*.py                0       8
tests/benchmark           0       0
tests/generation          0       18
tests/models              0       TBD
tests/models -k backbone  0       0
tests/pipelines           0       9
tests/trainer             0       3
tests/utils               0       1

The test should check the baseline as follows (a scripted sketch follows the list):

  • For groups with 0/0 expectations - check the pytest return status code (expect it to be 0)
  • For groups with non-zero failed cases - ignore the pytest return status code and check:
    • The number of errors should match (be 0)
    • The number of failed cases should match
    • The one-line failures_line.txt outputs from --make-reports (i.e. the failed cases) should match
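A sketch of how one group's baseline check could be scripted; the reports/<id>/ layout and file names are assumptions (except failures_line.txt, named above), and the expected counts come from the baseline table:

# Hypothetical baseline check for the tests/generation group:
EXPECTED_ERRORS=0
EXPECTED_FAILED=18

python -m pytest tests/generation --make-reports=generation || true

# Count non-empty lines in the one-line report files (names are assumed):
errors=$(grep -c . reports/generation/errors.txt 2>/dev/null || true)
failed=$(grep -c . reports/generation/failures_line.txt 2>/dev/null || true)

if [ "${errors:-0}" -ne "$EXPECTED_ERRORS" ] || [ "${failed:-0}" -ne "$EXPECTED_FAILED" ]; then
  echo "Baseline mismatch: errors=${errors:-0} failed=${failed:-0}"
  exit 1
fi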

The following artifacts should be made available after test execution:

  • List of PyPI packages installed in the Conda environment and their versions (run pip list; dumping to the generic log output is fine)
  • List of available GPU device IDs (run cat /sys/class/drm/render*/device/device; dumping to the generic log output is fine)
  • Logs from running each pytest command
  • Archived reports from the --make-reports command
  • Table report with annotations (versions of key packages, env variables, etc.); see "ci: print annotations for key package versions in transformers test" #1184
  • Table report with number of passed/failed/skipped cases
  • Table report with failed cases
  • Table report with skipped cases and reasons
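For instance, the first two artifacts are plain command dumps, and the reports can be archived for upload (reports/ as the output directory of --make-reports is an assumption):

# Dump installed packages and visible GPU device IDs into the job log:
pip list
cat /sys/class/drm/render*/device/device

# Archive the pytest reports for upload as a workflow artifact:
tar -czf reports.tar.gz reports/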
dvrogozh commented:

First version of the test available in:
