Upstream bump 1109 #2172

Merged on Nov 13, 2022 (1,368 commits)
Conversation

@jjsjann123 (Collaborator) commented Nov 9, 2022

Upstream bump to devel. Merged upstream/viable/strict commit fca6ed0.

Updated upstream/master in PR #2173

pmeier and others added 30 commits November 2, 2022 14:04
Fixes pytorch/torchdynamo#1708

Our FX subgraph partitioner works by taking all of the original output nodes from a subgraph and replacing them with a new `call_module` node in the graph.

If the original subgraph outputs had fake tensors and other metadata stored in their `.meta` attribute, though, this information was getting lost when we spliced in the subgraph.

Losing metadata on an FX graph also seems like an easy trap to fall into, so I'm wondering if there are any better guardrails we can add. I ended up fixing it in this PR by adding an optional kwarg to propagate meta info directly in `fx.Node.replace_all_uses_with`, since propagating metadata seems like a pretty core thing.
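A rough sketch of the intended usage (the kwarg name `propagate_meta` is my assumption about what the PR adds; everything else is standard `torch.fx`):

```python
import torch
import torch.fx as fx

def f(x):
    return x.relu() + 1

gm = fx.symbolic_trace(f)
relu_node = next(n for n in gm.graph.nodes if n.op == "call_method" and n.target == "relu")
relu_node.meta["val"] = "metadata worth keeping"

# Build a replacement node, then redirect all users of the old node to it.
# The assumed kwarg `propagate_meta=True` copies the old node's .meta dict onto
# the replacement instead of silently dropping it during the splice.
with gm.graph.inserting_after(relu_node):
    new_node = gm.graph.call_function(torch.relu, (relu_node.args[0],))
relu_node.replace_all_uses_with(new_node, propagate_meta=True)

gm.graph.erase_node(relu_node)
gm.recompile()
print(new_node.meta)  # the "val" entry survives the replacement
```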

Pull Request resolved: pytorch#87255
Approved by: https://github.com/wconstab, https://github.com/SherlockNoMad
as_strided_scatter's derivative formula was broken - instead of making a "mask" of 1's and 0's, it would effectively make a mask of 1's and uninitialized memory.
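For intuition, a small sketch of the masking behavior the formula should produce (not code from the PR, just the expected result):

```python
import torch

base = torch.randn(4, requires_grad=True)
src = torch.randn(2, requires_grad=True)

# Overwrite the strided window base[0:2] with src.
out = torch.as_strided_scatter(base, src, size=(2,), stride=(1,), storage_offset=0)
out.sum().backward()

# The gradient w.r.t. base must be zeroed exactly where src was scattered in,
# never left as uninitialized memory.
print(base.grad)  # expected: tensor([0., 0., 1., 1.])
print(src.grad)   # expected: tensor([1., 1.])
```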

Fixes pytorch#88105

Pull Request resolved: pytorch#87646
Approved by: https://github.com/albanD
…#87874)

* Wiring to allow the user to pass event names to the profiler and reflect the counts in the chrometrace
* If not used, the runtime and size overhead should be negligible
* For now, the primary user will be KinetoEdgeCPUProfiler, but the implementation does not assume that
* Not exposed to Python yet

Differential Revision: [D40238032](https://our.internmc.facebook.com/intern/diff/D40238032/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D40238032/)!
Pull Request resolved: pytorch#87874
Approved by: https://github.com/SS-JIA
…87876)

* Add support in lite_predictor benchmark binary to select event lists
* Uses Linux perf through Kineto profiler

Differential Revision: [D39837216](https://our.internmc.facebook.com/intern/diff/D39837216/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39837216/)!
Pull Request resolved: pytorch#87876
Approved by: https://github.com/SS-JIA
…rch#87877)

* Runs an existing model and checks whether an aten op gets perf events generated in the chrometrace
* Doesn't check for exact values since that's hard to do in a hardware-independent way

Differential Revision: [D40474957](https://our.internmc.facebook.com/intern/diff/D40474957/)
Pull Request resolved: pytorch#87877
Approved by: https://github.com/SS-JIA
Add a stack of start counter values, and attribute each disable to the last enable

Differential Revision: [D40539212](https://our.internmc.facebook.com/intern/diff/D40539212/)
Pull Request resolved: pytorch#87904
Approved by: https://github.com/SS-JIA
…87905)

Reports total counts (which include time spent in all children); self counts can be calculated manually.
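In other words (a trivial illustration, not code from the diff):

```python
# Self count for an event = its total count minus the totals of its direct children.
def self_count(total: int, child_totals: list) -> int:
    return total - sum(child_totals)

print(self_count(1000, [300, 250]))  # 450
```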

Differential Revision: [D40282770](https://our.internmc.facebook.com/intern/diff/D40282770/)
Pull Request resolved: pytorch#87905
Approved by: https://github.com/SS-JIA
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#88315
Approved by: https://github.com/soumith
Fixes pytorch#87313

Our ONNX pipelines do not run with BUILD_CAFFE2=0, so the tests for operator_export_type ONNX_ATEN and ONNX_ATEN_FALLBACK are not fully exercised, allowing regressions to happen again.

We need to run the same set of tests for both BUILD_CAFFE2=0 and 1.
Pull Request resolved: pytorch#87735
Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao
Use `std::vector` to store tensor shapes and automatically free them when the array goes out of scope.

Pull Request resolved: pytorch#88307
Approved by: https://github.com/kulinseth
This reverts commit f9d7985.

Reverted pytorch#87646 on behalf of https://github.com/huydhn: sorry for reverting your PR, but I think this one or one of the PRs in the stack breaks bionic-cuda11.7 on trunk https://hud.pytorch.org/pytorch/pytorch/commit/70782981f06a042796d4604df2ec1491f4f5b194
After conda, this consolidates all macOS pip dependencies so that every dependency macOS CI needs is cached. Two small issues were found along the way in the `_mac-test-mps` workflow:

* It didn't have an `Install macOS homebrew dependencies` step to install libomp like the regular `_mac-test` workflow
* It didn't install `scipy`, thus silently skipping some `signal.windows` tests

Both are fixed in this PR
Pull Request resolved: pytorch#88071
Approved by: https://github.com/malfet
I'm not quite sure why GitHub gets flaky when we try to upload usage_log.txt to it (500 Internal Server Error). But we can live without it, so let's just ignore this for now and follow up on it later.

The failures all come from M1 runner, so it seems to point to a connectivity issue between AWS and GitHub:

* https://github.com/pytorch/pytorch/actions/runs/3373976793/jobs/5599310905
* https://github.com/pytorch/pytorch/actions/runs/3372858660/jobs/5597033598
* https://github.com/pytorch/pytorch/actions/runs/3371548201/jobs/5594274444
* https://github.com/pytorch/pytorch/actions/runs/3370877990/jobs/5592709210
* https://github.com/pytorch/pytorch/actions/runs/3370609384/jobs/5592008430

Pull Request resolved: pytorch#88288
Approved by: https://github.com/clee2000
Today, this doesn't work and dynamo errors out in a very non-obvious way (see:
https://gist.github.com/suo/dde04830372ab51a4a34ea760f14200a).

Here, we detect the error early and exit with a nicer message. Also add a
config option to just no-op dynamo (which is needed to unblock internal
enablement).
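For context, a sketch of how such a kill switch could be used (the module path and flag name below are assumptions based on the present-day `torch._dynamo.config.disable` flag, not necessarily what this PR introduces):

```python
import torch
import torch._dynamo as dynamo

# Assumed kill switch: make dynamo a no-op globally so models that hit the
# unsupported case can keep running while the underlying issue is fixed.
dynamo.config.disable = True

@dynamo.optimize("eager")
def fn(x):
    return x + 1

print(fn(torch.ones(2)))  # runs eagerly; dynamo never traces the function
```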

Pull Request resolved: pytorch#87797
Approved by: https://github.com/yf225, https://github.com/soumith, https://github.com/jansel
This calls `lspci`, `lsmod`, and `modinfo nvidia` before and after the installation to gather more data about the "No GPU available" transient issue on G5 runner, i.e. https://hud.pytorch.org/pytorch/pytorch/commit/59fe272c1e698989228af5ad197bdd2985e4e9b9

This also handles the `nvidia-smi` call and tries to re-install the driver if the first call fails, i.e. `No devices were found` https://hud.pytorch.org/pytorch/pytorch/commit/8ea19c802e38c061e79176360c1ecaa81ce2088a
Pull Request resolved: pytorch#88168
Approved by: https://github.com/clee2000, https://github.com/malfet
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting the above variable to `OFF` during the build, or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

These are follow-up changes per the comment in PR pytorch#81790, comment [link](pytorch#81790 (comment))

Pull Request resolved: pytorch#84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
Fixes: pytorch#88205

The `CreationMeta::NO_GRAD_MODE` path in handle_view_on_rebase wrongly assumes that the tensor is a leaf, because tensors created in no_grad are always leaf tensors. However, due to creation_meta propagation, a view of a view created in no_grad also has `CreationMeta::NO_GRAD_MODE`, but DOES have a grad_fn.
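A minimal reproduction of the described scenario (a sketch assuming standard autograd view semantics; the exact error text depends on the fix):

```python
import torch

x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    v = x.view(3)   # view created in no_grad: tagged CreationMeta::NO_GRAD_MODE, no grad_fn
vv = v.view(3)      # view of that view, created with grad enabled: creation_meta propagation
                    # keeps NO_GRAD_MODE, yet vv DOES have a grad_fn

try:
    vv.mul_(2)      # in-place write routes through handle_view_on_rebase
except RuntimeError as e:
    # Expected: the usual "view was created in no_grad mode" error,
    # not a failure of the old leaf-only assumption.
    print(e)
```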
Pull Request resolved: pytorch#88243
Approved by: https://github.com/albanD
Previously the permute function was extended to behave like the `order`
function for first-class dimensions. However, unlike `permute`,
`order` doesn't have a keyword argument `dims`, and there is no way to add
one that lets both `permute` and `order` keep the same behavior. So this
change just removes the extra functionality of `permute`,
which wasn't documented anyway. Fixes pytorch#88187
Pull Request resolved: pytorch#88226
Approved by: https://github.com/zou3519
Taylor Robie and others added 19 commits November 8, 2022 21:48
This PR unifies and rationalizes some of the input representation in Result. The current approach of storing separate types in separate vectors is tedious for two types (Tensors and scalars), but would be even more annoying with the addition of TensorLists. A similar disconnection exists with sizes and strides which the user is also expected to zip with tensor_metadata.

I simplified things by moving inputs to a variant and moving sizes and strides into TensorMetadata. This also forced collection of sizes and strides in python tracer which helps to bring it in line with op profiling. Collection of TensorLists is fairly straightforward; `InputOutputEncoder` already has a spot for them (I actually collected them in the original TorchTidy prototype) so it was just a matter of plumbing things through.

Differential Revision: [D40734451](https://our.internmc.facebook.com/intern/diff/D40734451/)
Pull Request resolved: pytorch#87825
Approved by: https://github.com/slgong-fb, https://github.com/chaekit
…ytorch#88640)

I think this is the final resolution to the issue caused by
pytorch#87797. The nvfuser issue that PR tripped over arose because, even
though we're correctly disabling torchdynamo via a `DisableContext`, the
nested fx trace check was still firing. This PR properly narrows it to only
fire if we're not disabled.

Pull Request resolved: pytorch#88640
Approved by: https://github.com/yf225
…6802)

There are multiple ways to identify that a Tensor is a gradient (a subset of which also give additional context), so to start off I've made a utility to handle that determination.

Differential Revision: [D39920730](https://our.internmc.facebook.com/intern/diff/D39920730/)
Pull Request resolved: pytorch#86802
Approved by: https://github.com/chaekit
)

Summary:
X-link: pytorch/torchrec#759

Since the remove_duplicate flag was added to named_buffers in D39493161 (pytorch@c12f829), this adds the same flag to named_parameters.
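A small usage sketch (the module below is made up for illustration):

```python
import torch.nn as nn

class TiedModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(4, 4, bias=False)
        self.head = self.embed  # same submodule registered twice -> shared parameters

m = TiedModule()
print([name for name, _ in m.named_parameters()])
# ['embed.weight']  (duplicates removed by default)
print([name for name, _ in m.named_parameters(remove_duplicate=False)])
# ['embed.weight', 'head.weight']
```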

Test Plan:
python test/test_nn.py -k test_buffers_and_named_buffers

OSS Tests

Differential Revision: D40801899

Pull Request resolved: pytorch#88090
Approved by: https://github.com/albanD
…rs. (pytorch#88585)

Summary: This diff modifies the implementation of the select operator so that slices of the irregular dimension can be selected (e.g. `nt[:,0,:]`).
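For illustration, a minimal sketch (assuming the public `torch.nested.nested_tensor` constructor; not the tests from the diff):

```python
import torch

# Two components with different lengths along dim 1, the irregular dimension.
nt = torch.nested.nested_tensor([torch.randn(3, 4), torch.randn(5, 4)])

# Select index 0 along the irregular dimension, i.e. the nt[:, 0, :] case
# from the summary: the first row of each component.
row0 = nt.select(1, 0)
```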

Test Plan:
Added new unit tests to test that the new functions work as intended (see them in diff). To test,
`buck test mode/dev-nosan //caffe2/test:nested`

Differential Revision: D41083993

Pull Request resolved: pytorch#88585
Approved by: https://github.com/cpuhrsch
…torch#88651)

Summary:
Today, when we transform the captured graph in the last step of export(aten_graph=True), we construct a new graph which doesn't have all the metadata that should be preserved, for example node.meta["val"].
meta["val"] is important for writing passes and analyses on the graph later in the pipeline, so we want to preserve it on placeholder nodes.
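A rough sketch of the kind of pass-through this enables (the helper below is hypothetical, not the export code itself):

```python
import torch.fx as fx

def copy_placeholder_val(src_gm: fx.GraphModule, dst_gm: fx.GraphModule) -> None:
    """Hypothetical helper: carry node.meta['val'] (e.g. fake-tensor values) from
    the placeholders of the captured graph onto the matching placeholders of a
    newly constructed graph, so later passes can still rely on it."""
    src = [n for n in src_gm.graph.nodes if n.op == "placeholder"]
    dst = [n for n in dst_gm.graph.nodes if n.op == "placeholder"]
    for s, d in zip(src, dst):
        if "val" in s.meta:
            d.meta["val"] = s.meta["val"]
```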

Test Plan: test_export.py:test_export_meta_val

Differential Revision: D41110864

Pull Request resolved: pytorch#88651
Approved by: https://github.com/tugsbayasgalan, https://github.com/jansel
Also, add an explicit cudart dependency to `torch_cuda` if Kineto is used with GPU support (it used to be somehow inherited from a wrong `gloo` setup)
Pull Request resolved: pytorch#88530
Approved by: https://github.com/osalpekar
For some reason bernoulli uses the legacy memory format; see the linked issue.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#88676
Approved by: https://github.com/SherlockNoMad
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#88675
Approved by: https://github.com/SherlockNoMad
Also, handle non-default alpha correctly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#88678
Approved by: https://github.com/SherlockNoMad, https://github.com/albanD
…h#88528)

In pytorch/torch/_C/__init__.pyi, Graph.addInput has signature
```python
  def addInput(self, name: str) -> Value: ...
```
which doesn't match the corresponding function
```cpp
  Value* addInput(const std::string& name = "") {
    return block_->addInput(name);
  }

```

in python_ir.cpp. This PR aligns the bound function on both the C++ and Python sides. Without this PR, mypy will complain whenever a change contains calls to `addInput`; for example,
![image](https://user-images.githubusercontent.com/3524474/200092086-429b8d63-9321-4d03-b0d6-f4c9bd361756.png)
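Aligning the stub with the C++ default presumably means making `name` optional (a sketch of the corrected signature, not necessarily the exact diff):

```python
def addInput(self, name: str = "") -> Value: ...
```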

Pull Request resolved: pytorch#88528
Approved by: https://github.com/davidberard98
This allows one to SSH in faster rather than having to wait for the repo clone to finish.

I.e., right now one usually has to wait a few minutes before the PyTorch clone is finished, but with this change you can SSH in ahead of time (thanks to `setup-ssh` being a composite action).

Pull Request resolved: pytorch#88715
Approved by: https://github.com/clee2000, https://github.com/izaitsevfb
## Description
Support lowering of channel shuffle in FX by adding its module and functional op to `is_copy_node` list in `torch/ao/quantization/fx/_lower_to_native_backend.py`
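For reference, a minimal FX graph-mode quantization flow exercising `ChannelShuffle` (a sketch using the standard `torch.ao.quantization` APIs, not the tests added in this PR):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = nn.Sequential(
    nn.Conv2d(4, 4, kernel_size=1),
    nn.ChannelShuffle(2),  # should now be lowered as a "copy node"
    nn.ReLU(),
).eval()

example = torch.randn(1, 4, 8, 8)
prepared = prepare_fx(model, get_default_qconfig_mapping("fbgemm"), example_inputs=(example,))
prepared(example)                  # calibration
quantized = convert_fx(prepared)   # lowering; channel shuffle remains in the quantized graph
print(quantized(example).shape)
```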

## Validation
UTs added to test
- correctness of quantized `ChannelShuffle` module.
- FX lowering of `ChannelShuffle` module and functional `channel_shuffle`.

Pull Request resolved: pytorch#83731
Approved by: https://github.com/jerryzh168
@jjsjann123 mentioned this pull request Nov 9, 2022
@jjsjann123 requested review from csarofeen and removed the review request for IvanYashchuk and mruberry on Nov 9, 2022 16:50
@jjsjann123 (Collaborator, Author) commented:

CI looks wrong.. let me remove those
