Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi Pin Bumps across PT/AO/tune/ET #1367

Merged
merged 42 commits into from
Dec 14, 2024
Merged

Multi Pin Bumps across PT/AO/tune/ET #1367

merged 42 commits into from
Dec 14, 2024

Conversation

Jack-Khuu
Copy link
Contributor

@Jack-Khuu Jack-Khuu commented Nov 12, 2024

Accounts for:


Update:
Merging while accepting the errors in

Copy link

pytorch-bot bot commented Nov 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1367

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 1 Pending

As of commit 9579f18 with merge base 570aebc (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 12, 2024
@swolchok
Copy link
Contributor

Could not find a version that satisfies the requirement torchvision==0.20.0.dev20241111

this looks accurate; according to https://download.pytorch.org/whl/nightly/torchvision/ there are only windows builds for that day. 20241112 appears to have both linux and windows.

@Jack-Khuu Jack-Khuu changed the title Bump PyTorch pin to 20241111 Bump PyTorch pin to 20241112 Nov 12, 2024
@swolchok
Copy link
Contributor

initial debugging shows the test-cpu-aoti segfault is within aoti_torch_cpu_cat, which is automatically generated by https://github.com/pytorch/pytorch/blob/7e86a7c0155295539996e0cf422883571126073e/torchgen/gen_aoti_c_shim.py . digging up the generated source now.

@@ -96,6 +96,7 @@ def _load_checkpoints_from_storage(
checkpoint_path,
map_location=builder_args.device,
mmap=True,
weight_only=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it needs false? All LLMs should be loadable with weights_only, shouldn't they? (Also, there are no such option as weight_only (or so I hope :P ))

Suggested change
weight_only=False,
weights_only=True,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the typo;

As for setting it to False: I'd rather keep it behavior consistent in a pin bump PR; we can flip in a separate PR

@@ -238,7 +238,7 @@ def _to_core_aten(
raise ValueError(
f"Expected passed in model to be an instance of fx.GraphModule, got {type(model)}"
)
core_aten_ep = export(model, example_inputs, dynamic_shapes=dynamic_shapes)
core_aten_ep = export_for_training(model, example_inputs, dynamic_shapes=dynamic_shapes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what we are doing here, but shouldn't TorchChat be exporting for inference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was picked up from @tugsbayasgalan's PR migrating away from export(), but export_for_inference does sound more in line with what we want

@tugsbayasgalan Can you share info on the new APIs?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep the intended use for inference IR is that user will export to a training IR and call run_decompositions() to lower to inference IR. In this flow, after core_aten_ep, there is to_edge call which lowers to inference. Export team is moving the IR to non-functional training IR so export_for_training will exist as an alias to official export. After we actually migrate official export, we will replace this call with export.

@swolchok
Copy link
Contributor

digging up the generated source now.

generated source looks OK. here's what doesn't look OK in the generated inductor .cpp file:

    AtenTensorHandle buf0_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
    RAIIAtenTensorHandle buf0(buf0_handle);
    AtenTensorHandle buf1_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf1_handle));
    RAIIAtenTensorHandle buf1(buf1_handle);
    cpp_fused_div_remainder_0((const uint8_t*)(self___model_tok_embeddings__buffers__weight.data_ptr()), (uint8_t*)(buf0.data_ptr()), (uint8_t*)(buf1.data_ptr()));
    // Topologically Sorted Source Nodes: [weight_unpacked], Original ATen: [aten.stack]
    static constexpr int64_t int_array_0[] = {32000LL, 144LL, 1LL};
    static constexpr int64_t int_array_1[] = {144LL, 1LL, 0LL};
    auto tmp_tensor_handle_0 = reinterpret_tensor_wrapper(buf0, 3, int_array_0, int_array_1, 0LL);
    auto tmp_tensor_handle_1 = reinterpret_tensor_wrapper(buf1, 3, int_array_0, int_array_1, 0LL);
    const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)};
    AtenTensorHandle buf3_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_cat(var_array_0, 2, -1LL, &buf3_handle));

The problem seems to be const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)}; -- this is creating RAIIATenTensorHandles, whose operator ATenTensorHandle is immediately called, and then they're destroyed (which decrements the refcount), so the net effect is (I think) to create dangling ATenTensorHandles.

@swolchok
Copy link
Contributor

@desertfire any change the above is a quick fix for you?

@swolchok
Copy link
Contributor

actually we might just need pytorch/pytorch#139411

@swolchok
Copy link
Contributor

no torchvision nightly again today. I'm guessing we could probably use torchvision from yesterday with torch from today?

@Jack-Khuu
Copy link
Contributor Author

Jack-Khuu commented Nov 13, 2024

I had issues with Vision nightlies requiring the corresponding PT nightly few weeks back, I'll give it another go

Update: yup, vision is strict; will need to wait again

@swolchok
Copy link
Contributor

_convert_weight_to_int4pack breakage appears to be from pytorch/pytorch#139611; I guess it's now called _convert_weight_to_int4pack_for_cpu .

@Jack-Khuu
Copy link
Contributor Author

Jack-Khuu commented Nov 14, 2024

Best me to it; luckily AO has a fix so we'll need a bump there too: pytorch/ao#1278

@Jack-Khuu
Copy link
Contributor Author

pytorch/pytorch#139411 Also got reverted on pt/pt so that's fun

@desertfire
Copy link
Contributor

pytorch/pytorch#139411 Also got reverted on pt/pt so that's fun

pytorch/pytorch#139411 is relanded.

@Jack-Khuu Jack-Khuu changed the title Multi Pin Bumps across PT/AO/tune Multi Pin Bumps across PT/AO/tune/ET Dec 6, 2024
@Jack-Khuu
Copy link
Contributor Author

Jack-Khuu commented Dec 6, 2024

Missing omp.h for aarch is a regression on pt/pt, we're looking into it and will pinbump to a fix when found

pytorch/pytorch#142266

@@ -9,23 +9,17 @@ on:

jobs:
test-cuda:
uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between these two?

Copy link
Contributor Author

@Jack-Khuu Jack-Khuu Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Former is being replaced in the new wheel build: pytorch/pytorch#123649

@Jack-Khuu
Copy link
Contributor Author

Jack-Khuu commented Dec 7, 2024

Note that the changes to gguf_loader are suspicious at best and an attempt to recover previous behavior prior to pytorch/pytorch#139611

Depending on the remaining CI, I may disable gguf-compile temporarily if it is the remaining failure

#1404 Looks into this

@Jack-Khuu
Copy link
Contributor Author

Missing omp.h for aarch is a regression on pt/pt, we're looking into it and will pinbump to a fix when found

pytorch/pytorch#142266

Hmmm looks like the wheels aren't fixed with the changes...

@Jack-Khuu
Copy link
Contributor Author

Jack-Khuu commented Dec 13, 2024

gpu-aoti errors are from byte alignment (compiled as 16-byte, but input is not aligned) related to pytorch/pytorch#140664, that we will fix in a later bump pytorch/pytorch#143236

@Jack-Khuu Jack-Khuu merged commit bb72b09 into main Dec 14, 2024
48 of 53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants