$ python -m pytest -n 3 --dist=loadfile -s -v ./tests/test_optimization.py
=============================== test session starts ===============================
platform linux -- Python 3.9.7, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- .../python
cachedir: .pytest_cache
rootdir: .../transformers_manuel, configfile: setup.cfg
plugins: xdist-2.4.0, dash-2.0.0, forked-1.3.0, timeout-2.0.1
[gw0] linux Python 3.9.7 cwd: .../transformers_manuel
[gw1] linux Python 3.9.7 cwd: .../transformers_manuel
[gw2] linux Python 3.9.7 cwd: .../transformers_manuel
[gw0] Python 3.9.7 (default, Sep 16 2021, 13:09:58) -- [GCC 7.5.0]
[gw2] Python 3.9.7 (default, Sep 16 2021, 13:09:58) -- [GCC 7.5.0]
[gw1] Python 3.9.7 (default, Sep 16 2021, 13:09:58) -- [GCC 7.5.0]
gw0 [5] / gw1 [5] / gw2 [5]
scheduling tests via LoadFileScheduling

tests/test_optimization.py::OptimizationTest::test_adafactor
[gw0] PASSED tests/test_optimization.py::OptimizationTest::test_adafactor
tests/test_optimization.py::OptimizationTest::test_adam_w
[gw0] PASSED tests/test_optimization.py::OptimizationTest::test_adam_w
tests/test_optimization.py::OptimizationTest::test_compare_adamw_no_weight_decay
[gw0] FAILED tests/test_optimization.py::OptimizationTest::test_compare_adamw_no_weight_decay
tests/test_optimization.py::OptimizationTest::test_compare_adamw_with_weight_decay
[gw0] FAILED tests/test_optimization.py::OptimizationTest::test_compare_adamw_with_weight_decay
tests/test_optimization.py::ScheduleInitTest::test_schedulers
[gw0] PASSED tests/test_optimization.py::ScheduleInitTest::test_schedulers

==================================== FAILURES =====================================
_______________ OptimizationTest.test_compare_adamw_no_weight_decay _______________
[gw0] linux -- Python 3.9.7 .../python

self =

    def test_compare_adamw_no_weight_decay(self):
>       self.util_adamw_comparison(weight_decay=0)

tests/test_optimization.py:120:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = , weight_decay = 0

    def util_adamw_comparison(self, weight_decay):
        import torch
        import numpy as np

        model_size =1024
        lr = 0.1
        betas=(0.9, 0.999)
        eps = 1e-01

        rng_state = torch.get_rng_state()
        device = "cpu"
        torch.manual_seed(56)
        param_torch = torch.nn.Parameter(torch.randn(model_size, device=device))
        torch.set_rng_state(rng_state)
        torch.manual_seed(56)
        param_transf = torch.nn.Parameter(torch.randn(model_size, device=device))

        optimizer_torch = torch.optim.AdamW([param_torch], lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        optimizer_transf = AdamW(params=[param_transf], lr=lr, betas=betas, eps=eps, weight_decay=weight_decay, correct_bias=True)

        for i in range(100):
            rng_state = torch.get_rng_state()
            param_torch.grad = torch.randn(model_size, device=device)
            torch.set_rng_state(rng_state)
            param_transf.grad = torch.randn(model_size, device=device)
            optimizer_torch.step()
            optimizer_transf.step()

        atol=1e-3
        val_torch = param_torch.detach().numpy()
        val_transf = param_transf.detach().numpy()
>       np.testing.assert_allclose(val_transf, val_torch, err_msg="Mismatch between AdamW implementations!", rtol=0, atol=atol)
E       AssertionError:
E       Not equal to tolerance rtol=0, atol=0.001
E       Mismatch between AdamW implementations!
E       Mismatched elements: 1022 / 1024 (99.8%)
E       Max absolute difference: 1.0594273
E       Max relative difference: 36.50006
E        x: array([-0.841293, -0.970589,  0.211384, ..., -0.593096,  0.516756,
E                2.300092], dtype=float32)
E        y: array([-0.945812, -1.075232,  0.213893, ..., -0.710561,  0.584081,
E                2.978668], dtype=float32)

tests/test_optimization.py:116: AssertionError
______________ OptimizationTest.test_compare_adamw_with_weight_decay ______________
[gw0] linux -- Python 3.9.7 .../python

self =

    def test_compare_adamw_with_weight_decay(self):
>       self.util_adamw_comparison(weight_decay=0.5)

tests/test_optimization.py:123:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = , weight_decay = 0.5

    def util_adamw_comparison(self, weight_decay):
        import torch
        import numpy as np

        model_size =1024
        lr = 0.1
        betas=(0.9, 0.999)
        eps = 1e-01

        rng_state = torch.get_rng_state()
        device = "cpu"
        torch.manual_seed(56)
        param_torch = torch.nn.Parameter(torch.randn(model_size, device=device))
        torch.set_rng_state(rng_state)
        torch.manual_seed(56)
        param_transf = torch.nn.Parameter(torch.randn(model_size, device=device))

        optimizer_torch = torch.optim.AdamW([param_torch], lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        optimizer_transf = AdamW(params=[param_transf], lr=lr, betas=betas, eps=eps, weight_decay=weight_decay, correct_bias=True)

        for i in range(100):
            rng_state = torch.get_rng_state()
            param_torch.grad = torch.randn(model_size, device=device)
            torch.set_rng_state(rng_state)
            param_transf.grad = torch.randn(model_size, device=device)
            optimizer_torch.step()
            optimizer_transf.step()

        atol=1e-3
        val_torch = param_torch.detach().numpy()
        val_transf = param_transf.detach().numpy()
>       np.testing.assert_allclose(val_transf, val_torch, err_msg="Mismatch between AdamW implementations!", rtol=0, atol=atol)
E       AssertionError:
E       Not equal to tolerance rtol=0, atol=0.001
E       Mismatch between AdamW implementations!
E       Mismatched elements: 1004 / 1024 (98%)
E       Max absolute difference: 0.23363012
E       Max relative difference: 10.406861
E        x: array([-0.295148,  0.150579,  0.081928, ...,  0.06077 ,  0.209031,
E                0.247049], dtype=float32)
E        y: array([-0.390061,  0.191684,  0.104576, ...,  0.083256,  0.267295,
E                0.319532], dtype=float32)

tests/test_optimization.py:116: AssertionError
================================= warnings summary ================================
.../lib/python3.9/site-packages/flatbuffers/compat.py:19
.../lib/python3.9/site-packages/flatbuffers/compat.py:19
.../lib/python3.9/site-packages/flatbuffers/compat.py:19
.../lib/python3.9/site-packages/flatbuffers/compat.py:19
  .../lib/python3.9/site-packages/flatbuffers/compat.py:19: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

tests/test_optimization.py::ScheduleInitTest::test_schedulers
  .../lib/python3.9/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
    warnings.warn("To get the last learning rate computed by the scheduler, "

tests/test_optimization.py::ScheduleInitTest::test_schedulers
  .../lib/python3.9/site-packages/torch/optim/lr_scheduler.py:129: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
    warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================== short test summary info ============================
FAILED tests/test_optimization.py::OptimizationTest::test_compare_adamw_no_weight_decay - AssertionError:
FAILED tests/test_optimization.py::OptimizationTest::test_compare_adamw_with_weight_decay - AssertionError:
==================== 2 failed, 3 passed, 6 warnings in 15.24s =====================
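
For reference, the failing comparison can be pulled out of pytest and run on its own. The sketch below is reconstructed from the traceback above, not copied from the repository's test file; the wrapper name compare_adamw and the top-level "from transformers import AdamW" import are assumptions, while the hyperparameters (seed 56, lr=0.1, betas=(0.9, 0.999), eps=1e-01, 100 steps, atol=1e-3) are taken from the log. Run as-is, it should trip the same assertion as the two FAILED tests.

    # Standalone sketch of the failing AdamW comparison, reconstructed from the
    # traceback above. The function name and import path are assumptions; the
    # hyperparameters mirror the log.
    import numpy as np
    import torch
    from transformers import AdamW  # transformers' AdamW (supports correct_bias)


    def compare_adamw(weight_decay, model_size=1024, steps=100,
                      lr=0.1, betas=(0.9, 0.999), eps=1e-01, atol=1e-3):
        device = "cpu"

        # Identical initial parameters for both optimizers.
        torch.manual_seed(56)
        param_torch = torch.nn.Parameter(torch.randn(model_size, device=device))
        torch.manual_seed(56)
        param_transf = torch.nn.Parameter(torch.randn(model_size, device=device))

        optimizer_torch = torch.optim.AdamW(
            [param_torch], lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        optimizer_transf = AdamW(
            params=[param_transf], lr=lr, betas=betas, eps=eps,
            weight_decay=weight_decay, correct_bias=True)

        for _ in range(steps):
            # Save/restore the RNG state so both parameters see the same gradient.
            rng_state = torch.get_rng_state()
            param_torch.grad = torch.randn(model_size, device=device)
            torch.set_rng_state(rng_state)
            param_transf.grad = torch.randn(model_size, device=device)
            optimizer_torch.step()
            optimizer_transf.step()

        np.testing.assert_allclose(
            param_transf.detach().numpy(), param_torch.detach().numpy(),
            rtol=0, atol=atol, err_msg="Mismatch between AdamW implementations!")


    if __name__ == "__main__":
        compare_adamw(weight_decay=0.0)  # test_compare_adamw_no_weight_decay
        compare_adamw(weight_decay=0.5)  # test_compare_adamw_with_weight_decay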