
Torchao version check changes/BC import of TensorCoreTiledLayout #1812

Merged · 5 commits · Oct 12, 2024

Conversation

@ebsmothers (Contributor) commented Oct 11, 2024

  • Move torchao version check utilities out of modules and into utils (that's where they should've been all along)
  • Update the nightly version check to match the latest torchao nightly version format
    • We also no longer need all the extra importlib machinery now that all recent ao versions have `__version__` defined
  • Define the variable `_NEW_TENSOR_CORE_TILED_LAYOUT_API` based on the following condition:
    • Not fbcode and (ao version >= 0.7.0 or (ao version is nightly and ao nightly date >= "2024-10-10"))
  • Gate the import of `TensorCoreTiledLayoutType` in `training/quantization.py` and alias it to `TensorCoreTiledLayout` (the new API name) in either case
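A minimal, self-contained sketch of the gating condition described above (the helper name and parsing details are illustrative assumptions, not the merged code):

```python
# Hypothetical helper mirroring the bullets above; the real change
# lives in torchtune's import-guard code and may differ in detail.
def use_new_tensor_core_tiled_layout_api(ao_version: str, is_fbcode: bool) -> bool:
    # fbcode builds always use the old API name
    if is_fbcode:
        return False
    if "dev" in ao_version:
        # Nightly versions look like '0.7.0.dev20241011+cu124';
        # compare the embedded build date against 2024-10-10
        return ao_version.split("dev")[1][:8] >= "20241010"
    # Stable versions: require ao >= 0.7.0
    major, minor = (int(x) for x in ao_version.split("+")[0].split(".")[:2])
    return (major, minor) >= (0, 7)

print(use_new_tensor_core_tiled_layout_api("0.7.0.dev20241011+cu124", False))  # True
print(use_new_tensor_core_tiled_layout_api("0.5.0", False))                    # False
```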

Test plan

Test quantization recipe on both stable and nightly torchao versions. Prereq: download Llama2 7B:

tune download meta-llama/Llama-2-7b-hf --output-dir /tmp/Llama-2-7b-hf

Test on torchao 0.5

$ conda create -n ao-testing-stable python=3.11 -y 
$ conda activate ao-testing-stable
$ pip install torch torchvision torchao
$ pip install -e ".[dev]"
$ pip list | grep torchao
torchao                   0.5.0
$ tune run quantize --config quantization quantizer=torchtune.training.quantization.Int4WeightOnlyQuantizer quantizer.groupsize=128
...
INFO:torchtune.utils._logging:Model is initialized with precision torch.bfloat16.
INFO:torchtune.utils._logging:Time for quantization: 0.24 sec
INFO:torchtune.utils._logging:Memory used: 13.95 GB
INFO:torchtune.utils._logging:Model checkpoint of size 3.79 GB saved to /tmp/Llama-2-7b-hf/pytorch_model-00001-of-00002-4w.pt

Test on torchao nightly

$ conda create -n ao-testing-nightly python=3.11 -y 
$ conda activate ao-testing-nightly
$ pip install --pre torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu124
$ pip install -e ".[dev]"
$ pip list | grep torchao
torchao                   0.7.0.dev20241011+cu124
$ tune run quantize --config quantization quantizer=torchtune.training.quantization.Int4WeightOnlyQuantizer quantizer.groupsize=128
...
INFO:torchtune.utils._logging:Model is initialized with precision torch.bfloat16.
INFO:torchtune.utils._logging:Time for quantization: 0.37 sec
INFO:torchtune.utils._logging:Memory used: 13.95 GB
INFO:torchtune.utils._logging:Model checkpoint of size 3.79 GB saved to /tmp/Llama-2-7b-hf/pytorch_model-00001-of-00002-4w.pt

pytorch-bot bot commented Oct 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1812

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 024b812 with merge base c5b7386:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Oct 11, 2024

torchao_version = _get_torchao_version()

_NEW_TENSOR_CORE_TILED_LAYOUT_API = not _is_fbcode() and (
Contributor:

mind adding a quick comment explaining this similar to flex attention above?

Contributor:

also nit: can you name this as something that implies a yes/no true/false answer (like _USE_NEW_TENSOR_CORE_TILED_LAYOUT_API)

@RdoubleA (Contributor) left a comment:

few minor comments, but no concerns

return not hasattr(torch.version, "git_version")


def _nightly_version_ge(ao_version_str: str, date: str) -> bool:
Contributor:

This doesn't generalize to PyTorch nightly versions? If it's ao-specific, let's include "ao" in the function name.

ebsmothers (Contributor, author):

Actually it should generalize since PyTorch versions use the same format

Contributor:

if so, should we change the variable name to indicate general use?
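Since torch and torchao nightlies share the same `X.Y.Z.devYYYYMMDD+…` version format, the date comparison can indeed be sketched generically (parsing details here are an assumption for illustration):

```python
def nightly_version_ge(version_str: str, date: str) -> bool:
    """True if the build date embedded in a nightly version string
    (e.g. '0.7.0.dev20241011+cu124') is on or after date ('YYYY-MM-DD')."""
    build_date = version_str.split("dev")[1][:8]
    return build_date >= date.replace("-", "")

# Works for both torchao and torch nightly strings:
print(nightly_version_ge("0.7.0.dev20241011+cu124", "2024-10-10"))  # True
print(nightly_version_ge("2.6.0.dev20240901+cu124", "2024-10-10"))  # False
```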


# We can only use flex attention / BlockMask if torch version >= 2.5.0 and GPU is Turing / SM75 and above
_SUPPORTS_FLEX_ATTENTION = (
torch_version_ge("2.5.0")
and torch.cuda.is_available()
and torch.cuda.get_device_capability() >= (7, 5)
)

torchao_version = _get_torchao_version()
@RdoubleA (Contributor) commented Oct 11, 2024:

couldn't this return None if is_fbcode? a bit awkward to return either a version string or None, but then you check for fbcode again below

ebsmothers (Contributor, author):

Yeah good point. Just did a check and in fbcode `torchao.__version__ == 'unknown'`, so it at least won't throw an error. Then I can just explicitly say `ao_version = torchao.__version__` in `_import_guard.py`, gate behind `_is_fbcode()` in `_USE_NEW_TENSOR_CORE_TILED_LAYOUT_API`, and delete `_get_torchao_version` altogether


@codecov-commenter commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 73.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 25.72%. Comparing base (54673b7) to head (024b812).
Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
torchtune/training/quantization.py 60.00% 2 Missing ⚠️
torchtune/utils/_version.py 66.66% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1812       +/-   ##
===========================================
- Coverage   67.05%   25.72%   -41.34%     
===========================================
  Files         305      304        -1     
  Lines       15937    16000       +63     
===========================================
- Hits        10687     4116     -6571     
- Misses       5250    11884     +6634     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jerryzh168 (Contributor):

ah, our diff train is not landed yet, should we land first? it will break internal torchtune I think

@RdoubleA (Contributor):

> ah, our diff train is not landed yet, should we land first? it will break internal torchtune I think

Yeah, that sounds good, we can wait. I don't have full context on whether this is blocking anything

@ebsmothers (Contributor, author):

@jerryzh168 can you clarify? I don’t fully understand why this PR would break internal. It checks fbcode and uses the old API, which is what I currently see in internal

@jerryzh168 (Contributor):

> @jerryzh168 can you clarify? I don’t fully understand why this PR would break internal. It checks fbcode and uses the old API, which is what I currently see in internal

oh what I meant is that torchao internal is not updated with the new name yet

If this PR can fix both internal and external, maybe this one should land first.

@ebsmothers ebsmothers merged commit 7744608 into pytorch:main Oct 12, 2024
17 checks passed
mori360 pushed a commit to mori360/torchtune that referenced this pull request Oct 14, 2024