
retry version guard fix #679

Merged (6 commits, Aug 14, 2024)

Conversation

msaroufim (Member)

@msaroufim msaroufim commented Aug 14, 2024

EDIT: There were many version conflict issues, so I'm reopening this PR (#485).

The only interesting changes to review are in torchao/utils.py. The version guard check is a tad complex right now, but that's because we're insisting on not having any dependencies for torchao.

This PR solves two issues:

  1. Our current version guard checks were incorrect: TORCH_VERSION_AFTER_2_3 would return False for torch 2.3.0 itself (see below for more detail), and appending .dev was obscuring the actual intent.
  2. The name we were using was confusing, so TORCH_VERSION_AFTER_X_X has been renamed to TORCH_VERSION_AT_LEAST_X_X.
import torch
torch.__version__ # prints 2.3.0
from torchao.utils import torch_version_at_least
torch_version_at_least("2.3.0.dev") # was returning False for torch 2.3.0 but True for 2.3.1

But after changes in this PR

torch_version_at_least("2.3.0.dev") # True
torch_version_at_least("2.3.0") # True
import torch
from torchao.utils import torch_version_at_least

torch.__version__ = "2.5.0.dev20240708+cu121"

print(torch.__version__) # 2.5.0.dev20240708+cu121

print(torch_version_at_least("2.4.0.dev")) # Return True
print(torch_version_at_least("2.4.0")) # Return True


print(torch_version_at_least("2.5.0.dev")) # Return True
print(torch_version_at_least("2.5.0")) # Return True


torch.__version__ = "2.5.0"
print(torch_version_at_least("2.4.0.dev")) # Return True
print(torch_version_at_least("2.4.0")) # Return True


print(torch_version_at_least("2.5.0.dev")) # Return True
print(torch_version_at_least("2.5.0")) # Return True

torch.__version__ = "2.4.0"

print(torch_version_at_least("2.4.0.dev")) # Return True
print(torch_version_at_least("2.4.0")) # Return True
print(torch_version_at_least("2.5.0.dev")) # Return False
print(torch_version_at_least("2.5.0")) # Return False
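The comparison underlying these results can be sketched as a dependency-free parser that strips suffixes like '+cu121' and '.devYYYYMMDD' before comparing numeric components. This is a simplified standalone sketch based on the helpers shown in the diff later in this thread; the real torchao/utils.py helper reads torch.__version__ directly rather than taking it as an argument:

```python
def parse_version(version_string):
    # Strip suffixes like '+cu121' or '.dev20240708' before comparing
    version = version_string.split('+')[0].split('.dev')[0]
    return [int(x) for x in version.split('.')]

def compare_versions(v1, v2):
    # Component-wise comparison, padding the shorter version with zeros;
    # returns 1 if v1 > v2, -1 if v1 < v2, and 0 if equal.
    v1_parts, v2_parts = parse_version(v1), parse_version(v2)
    for i in range(max(len(v1_parts), len(v2_parts))):
        a = v1_parts[i] if i < len(v1_parts) else 0
        b = v2_parts[i] if i < len(v2_parts) else 0
        if a != b:
            return 1 if a > b else -1
    return 0

def torch_version_at_least(current, min_version):
    # Sketch only: the real helper takes just min_version and reads
    # torch.__version__ itself.
    return compare_versions(current, min_version) >= 0

print(torch_version_at_least("2.5.0.dev20240708+cu121", "2.5.0"))  # True
print(torch_version_at_least("2.4.0", "2.5.0"))                    # False
```

Note that stripping the '.dev' suffix is what makes a nightly of 2.5.0 count as "at least" 2.5.0, which is the intentional semantics discussed later in the thread.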

In practice this change didn't matter too much, but the tests below are now failing because they were intended to run only on versions strictly greater than 2.3, not 2.3 and above, so I skipped them:

  FAILED test/integration/test_integration.py::TestAutoQuant::test_autoquant_compile_11_cuda - NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: inductor._reinterpret_tensor.default
  FAILED test/integration/test_integration.py::TestAutoQuant::test_autoquant_compile_14_cuda - NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: inductor._reinterpret_tensor.default
  FAILED test/integration/test_integration.py::TestAutoQuant::test_autoquant_compile_17_cuda - NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: inductor._reinterpret_tensor.default
  FAILED test/integration/test_integration.py::TestAutoQuant::test_autoquant_kwargs_11_cuda - NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: inductor._reinterpret_tensor.default
  FAILED test/integration/test_integration.py::TestAutoQuant::test_autoquant_kwargs_14_cuda - NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: inductor._reinterpret_tensor.default
  FAILED test/integration/test_integration.py::TestAutoQuant::test_autoquant_kwargs_17_cuda - NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: inductor._reinterpret_tensor.default
  FAILED test/integration/test_integration.py::TestAutoQuant::test_autoquant_manual_3_cuda - NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: inductor._reinterpret_tensor.default
  FAILED test/integration/test_integration.py::TestAutoQuant::test_autoquant_manual_4_cuda - NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: inductor._reinterpret_tensor.default
  FAILED test/integration/test_integration.py::TestAutoQuant::test_autoquant_manual_5_cuda - NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: inductor._reinterpret_tensor.default
  FAILED test/integration/test_integration.py::TestUtils::test_get_model_size_autoquant_3_cuda - torch._dynamo.exc.TorchRuntimeError: Failed running 


pytorch-bot bot commented Aug 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/679

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1650f9d with merge base 6199f89:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 14, 2024
@msaroufim msaroufim marked this pull request as draft August 14, 2024 21:06
@msaroufim msaroufim marked this pull request as ready for review August 14, 2024 22:21
@msaroufim msaroufim merged commit 1acd710 into main Aug 14, 2024
13 of 14 checks passed
@msaroufim msaroufim deleted the msaroufim/guard branch August 14, 2024 23:47
crcrpar (Contributor)

crcrpar commented Aug 15, 2024

This does not seem to support source builds of PyTorch without PYTORCH_BUILD_VERSION set.
For example, in my local environment the PyTorch version is 2.5.0a0+git9f17037, and importing torchao fails as follows:

# python -c 'import torch; import torchao'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torchao/__init__.py", line 31, in <module>
    from torchao.quantization import (
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/__init__.py", line 7, in <module>
    from .smoothquant import *  # noqa: F403
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/smoothquant.py", line 18, in <module>
    from .utils import (
  File "/usr/local/lib/python3.10/dist-packages/torchao/quantization/utils.py", line 11, in <module>
    from torchao.utils import find_multiple
  File "/usr/local/lib/python3.10/dist-packages/torchao/utils.py", line 306, in <module>
    TORCH_VERSION_AT_LEAST_2_5 = torch_version_at_least("2.5.0")
  File "/usr/local/lib/python3.10/dist-packages/torchao/utils.py", line 304, in torch_version_at_least
    return is_fbcode() or compare_versions(torch.__version__, min_version) >= 0
  File "/usr/local/lib/python3.10/dist-packages/torchao/utils.py", line 288, in compare_versions
    v1_parts = parse_version(v1)
  File "/usr/local/lib/python3.10/dist-packages/torchao/utils.py", line 285, in parse_version
    return [int(x) for x in version.split('.')]
  File "/usr/local/lib/python3.10/dist-packages/torchao/utils.py", line 285, in <listcomp>
    return [int(x) for x in version.split('.')]
ValueError: invalid literal for int() with base 10: '0a0'
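The failure mode in the traceback is straightforward to reproduce: the parser strips '+local' and '.dev' suffixes, splits on '.', and calls int() on each component, but the '0a0' component of a source-build version string is not an integer. A minimal repro:

```python
version = "2.5.0a0+git9f17037"

# Mimic the suffix stripping done by torchao's parse_version at the time:
# drop the '+git9f17037' local segment, and any '.dev' suffix.
cleaned = version.split('+')[0].split('.dev')[0]  # -> "2.5.0a0"

try:
    parts = [int(x) for x in cleaned.split('.')]
except ValueError as e:
    print(e)  # invalid literal for int() with base 10: '0a0'
```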

I'm wondering why a custom version parser is implemented instead of using packaging.version.parse. IIUC, PyTorch includes packaging in its install_requires. An example diff would be:

diff --git a/setup.py b/setup.py
index 47f7714..b2ac860 100644
--- a/setup.py
+++ b/setup.py
@@ -131,6 +131,7 @@ setup(
     package_data={
         "torchao.kernel.configs": ["*.pkl"],
     },
+    install_requires=["packaging"],
     ext_modules=get_extensions() if use_cpp != "0" else None,
     extras_require={"dev": read_requirements("dev-requirements.txt")},
     description="Package for applying ao techniques to GPU models",
diff --git a/torchao/utils.py b/torchao/utils.py
index 47227b1..c3e2785 100644
--- a/torchao/utils.py
+++ b/torchao/utils.py
@@ -7,6 +7,7 @@ import torch.nn.utils.parametrize as parametrize
 import itertools
 import time
 import warnings
+from packaging.version import parse

 __all__ = [
     "benchmark_model",
@@ -279,23 +280,16 @@ def unwrap_tensor_subclass(model, filter_fn=None):
         unwrap_tensor_subclass(child)
     return model

-def parse_version(version_string):
-    # Remove any suffixes like '+cu121' or '.dev'
-    version = version_string.split('+')[0].split('.dev')[0]
-    return [int(x) for x in version.split('.')]
-
 def compare_versions(v1, v2):
-    v1_parts = parse_version(v1)
-    v2_parts = parse_version(v2)
-
-    for i in range(max(len(v1_parts), len(v2_parts))):
-        v1_part = v1_parts[i] if i < len(v1_parts) else 0
-        v2_part = v2_parts[i] if i < len(v2_parts) else 0
-        if v1_part > v2_part:
-            return 1
-        elif v1_part < v2_part:
-            return -1
-    return 0
+    v1_version = parse(v1)
+    v2_version = parse(v2)
+
+    if v1_version == v2_version:
+        return 0
+    elif v1_version > v2_version:
+        return 1
+    else:
+        return -1

 def is_fbcode():
     return not hasattr(torch.version, "git_version")
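For reference, packaging.version.parse understands the PEP 440 pre-release and local-version segments that the custom parser chokes on; a quick check (assuming the packaging library is installed):

```python
from packaging.version import parse

# A PyTorch source-build version string: "a0" is a pre-release segment,
# "+git9f17037" is a local-version segment.
v = parse("2.5.0a0+git9f17037")

print(v.release)           # (2, 5, 0)
print(v.is_prerelease)     # True
print(v < parse("2.5.0"))  # True: pre-releases sort before the final release
```

Note the last line: under strict PEP 440 ordering a pre-release of 2.5.0 is less than 2.5.0 itself, which is relevant to the semantics discussion below.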

msaroufim (Member, Author)

msaroufim commented Aug 15, 2024

We've been bitten before by having dependencies in general; you can see these two threads for more context. The short of it is that forcing dependencies was causing forced version downgrades for our partners, and most of our dependencies did not really seem necessary.

Unfortunately PyTorch does not depend on packaging (https://gist.github.com/msaroufim/09c316d10cd314e189e1b2ad28823c3b), but I wish it did.

Regarding the version check logic specifically: thank you for catching that, this does indeed bother me.

I did try applying your diff, but the resulting check is unfortunately also incorrect. If you have a forward fix that solves this case as well as the test case I mention in the PR description, then we should be good. I'll poke around a bit more myself as well.

import torch
from torchao.utils import torch_version_at_least

torch.__version__ = "2.5.0a0+git9f17037" # source-build version

print(torch_version_at_least("2.5.0")) # Returns False
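One hypothetical way to support source builds like 2.5.0a0+git9f17037 without adding a dependency is to compare only the leading numeric release components, so that any dev/pre-release/source build of X.Y.Z counts as "at least" X.Y.Z. This is an illustrative sketch, not the approach the PR took; release_tuple and version_at_least are made-up names, not torchao APIs:

```python
import re

def release_tuple(version_string):
    # Keep only the leading numeric release components, so
    # "2.5.0a0+git9f17037" -> (2, 5, 0) and "2.5.0.dev20240708+cu121"
    # -> (2, 5, 0); everything after the first non-numeric char is dropped.
    match = re.match(r"(\d+(?:\.\d+)*)", version_string)
    return tuple(int(x) for x in match.group(1).split('.'))

def version_at_least(current, min_version):
    # Tuples compare lexicographically, so any build of 2.5.0
    # (nightly, alpha, source) satisfies "at least 2.5.0".
    return release_tuple(current) >= release_tuple(min_version)

print(version_at_least("2.5.0a0+git9f17037", "2.5.0"))  # True
print(version_at_least("2.4.0", "2.5.0"))               # False
```

The trade-off is deliberate: this treats pre-releases of a version as equivalent to the release itself, which is the nightly-friendly semantics the maintainers wanted, at the cost of the strict PEP 440 ordering ptrblck describes below.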

ptrblck

ptrblck commented Aug 15, 2024

NIT:
One of the shown examples looks wrong:

torch.__version__ = "2.5.0.dev20240708+cu121"
...
print(torch_version_at_least("2.5.0")) # Return True

Release 2.5.0.dev is not "at least" 2.5.0, as it's a pre-release (assuming "at least" is interpreted as ">=").
packaging also shows this correctly:

import torch
from packaging import version

torch.__version__ = "2.5.0.dev20240708+cu121"

version.parse(torch.__version__) >= version.parse("2.4.0.dev") # True
version.parse(torch.__version__) >= version.parse("2.4.0") # True

version.parse(torch.__version__) >= version.parse("2.5.0.dev") # True
version.parse(torch.__version__) >= version.parse("2.5.0") # False !!!! HERE !!!!

torch.__version__ = "2.5.0"
version.parse(torch.__version__) >= version.parse("2.4.0.dev") # True
version.parse(torch.__version__) >= version.parse("2.4.0") # True

version.parse(torch.__version__) >= version.parse("2.5.0.dev") # True
version.parse(torch.__version__) >= version.parse("2.5.0") # True

torch.__version__ = "2.4.0"
version.parse(torch.__version__) >= version.parse("2.4.0.dev") # True
version.parse(torch.__version__) >= version.parse("2.4.0") # True

version.parse(torch.__version__) >= version.parse("2.5.0.dev") # False
version.parse(torch.__version__) >= version.parse("2.5.0") # False

msaroufim (Member, Author)

@ptrblck I agree this is somewhat confusing but here's the reasoning #684 (comment)

@mobicham mobicham mentioned this pull request Aug 15, 2024