Upstream bump 1109 #2172

Merged on Nov 13, 2022 (1,368 commits)
Conversation

@jjsjann123 (Collaborator) commented Nov 9, 2022

Upstream bump to devel. Merged upstream/viable/strict commit fca6ed0.

Updated upstream/master in PR #2173

pmeier and others added 30 commits November 2, 2022 14:04
Fixes pytorch/torchdynamo#1708

Our FX subgraph partitioner works by taking all of the original output nodes from a subgraph and replacing them with a new `call_module` node in the graph.

If the original subgraph outputs had fake tensors and other metadata stored in their `.meta` attribute, though, this information was getting lost when we spliced in the subgraph.

Losing metadata on an FX graph also seems like an easy trap to fall into, so I'm wondering if there are any better guardrails we can add. I ended up fixing it in this PR by adding an optional kwarg to propagate meta info directly in `fx.Node.replace_all_uses_with`, since propagating metadata seems like a pretty core thing.
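A rough sketch of the intended usage (the kwarg name `propagate_meta` is my assumption about what the PR adds; everything else is standard `torch.fx`):

```python
import torch
import torch.fx as fx

def f(x):
    return x.relu() + 1

gm = fx.symbolic_trace(f)
relu_node = next(n for n in gm.graph.nodes if n.op == "call_method" and n.target == "relu")
relu_node.meta["val"] = "metadata worth keeping"

# Build a replacement node, then redirect all users of the old node to it.
# The assumed kwarg `propagate_meta=True` copies the old node's .meta dict onto
# the replacement instead of silently dropping it during the splice.
with gm.graph.inserting_after(relu_node):
    new_node = gm.graph.call_function(torch.relu, (relu_node.args[0],))
relu_node.replace_all_uses_with(new_node, propagate_meta=True)

gm.graph.erase_node(relu_node)
gm.recompile()
print(new_node.meta)  # the "val" entry survives the replacement
```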

Pull Request resolved: pytorch#87255
Approved by: https://github.com/wconstab, https://github.com/SherlockNoMad
as_strided_scatter's derivative formula was broken - instead of making a "mask" of 1's and 0's, it would effectively make a mask of 1's and uninitialized memory.
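For intuition, a small sketch of the masking behavior the formula should produce (not code from the PR, just the expected result):

```python
import torch

base = torch.randn(4, requires_grad=True)
src = torch.randn(2, requires_grad=True)

# Overwrite the strided window base[0:2] with src.
out = torch.as_strided_scatter(base, src, size=(2,), stride=(1,), storage_offset=0)
out.sum().backward()

# The gradient w.r.t. base must be zeroed exactly where src was scattered in,
# never left as uninitialized memory.
print(base.grad)  # expected: tensor([0., 0., 1., 1.])
print(src.grad)   # expected: tensor([1., 1.])
```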

Fixes pytorch#88105

Pull Request resolved: pytorch#87646
Approved by: https://github.com/albanD
…#87874)

* Wiring to allow the user to pass event names to the profiler and reflect the counts in the chrometrace
* If not used, the runtime and size overhead should be negligible
* For now, the primary user will be KinetoEdgeCPUProfiler, but the implementation does not assume that
* Not exposed to Python yet

Differential Revision: [D40238032](https://our.internmc.facebook.com/intern/diff/D40238032/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D40238032/)!
Pull Request resolved: pytorch#87874
Approved by: https://github.com/SS-JIA
…87876)

* Add support in lite_predictor benchmark binary to select event lists
* Uses Linux perf through Kineto profiler

Differential Revision: [D39837216](https://our.internmc.facebook.com/intern/diff/D39837216/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39837216/)!
Pull Request resolved: pytorch#87876
Approved by: https://github.com/SS-JIA
…rch#87877)

* Runs an existing model and checks whether an aten op gets perf events generated in the chrometrace
* Doesn't check for exact values since that's hard to do in a hardware-independent way

Differential Revision: [D40474957](https://our.internmc.facebook.com/intern/diff/D40474957/)
Pull Request resolved: pytorch#87877
Approved by: https://github.com/SS-JIA
Add a stack of start counter values, and attribute each disable to the last enable

Differential Revision: [D40539212](https://our.internmc.facebook.com/intern/diff/D40539212/)
Pull Request resolved: pytorch#87904
Approved by: https://github.com/SS-JIA
…87905)

Reports total counts (which include time spent in all children); self counts can be calculated manually.
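In other words (a trivial illustration, not code from the diff):

```python
# Self count for an event = its total count minus the totals of its direct children.
def self_count(total: int, child_totals: list) -> int:
    return total - sum(child_totals)

print(self_count(1000, [300, 250]))  # 450
```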

Differential Revision: [D40282770](https://our.internmc.facebook.com/intern/diff/D40282770/)
Pull Request resolved: pytorch#87905
Approved by: https://github.com/SS-JIA
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#88315
Approved by: https://github.com/soumith
Fixes pytorch#87313

Our ONNX pipelines do not run with BUILD_CAFFE2=0, so the tests for operator_export_type ONNX_ATEN and ONNX_ATEN_FALLBACK are not fully exercised, allowing regressions to happen again.

We need to run the same set of tests for both BUILD_CAFFE2=0 and 1.
Pull Request resolved: pytorch#87735
Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao
Use `std::vector` to store tensor shapes and automatically free them when the array goes out of scope.

Pull Request resolved: pytorch#88307
Approved by: https://github.com/kulinseth
This reverts commit f9d7985.

Reverted pytorch#87646 on behalf of https://github.com/huydhn: sorry for reverting your PR, but I think this one or one of the PRs in the stack breaks bionic-cuda11.7 on trunk https://hud.pytorch.org/pytorch/pytorch/commit/70782981f06a042796d4604df2ec1491f4f5b194
After conda, this consolidates all macOS pip dependencies so that every dependency macOS CI needs is cached. Two small issues were found along the way in the `_mac-test-mps` workflow:

* It didn't have an `Install macOS homebrew dependencies` step to install libomp like the regular `_mac-test` workflow
* It didn't install `scipy`, thus silently skipping some `signal.windows` tests

Both are fixed in this PR
Pull Request resolved: pytorch#88071
Approved by: https://github.com/malfet
I'm not quite sure why GitHub gets flaky when we try to upload usage_log.txt to it (500 Internal Server Error). But we can live without it, so let's just ignore this for now and follow up on it later.

The failures all come from M1 runner, so it seems to point to a connectivity issue between AWS and GitHub:

* https://github.com/pytorch/pytorch/actions/runs/3373976793/jobs/5599310905
* https://github.com/pytorch/pytorch/actions/runs/3372858660/jobs/5597033598
* https://github.com/pytorch/pytorch/actions/runs/3371548201/jobs/5594274444
* https://github.com/pytorch/pytorch/actions/runs/3370877990/jobs/5592709210
* https://github.com/pytorch/pytorch/actions/runs/3370609384/jobs/5592008430

Pull Request resolved: pytorch#88288
Approved by: https://github.com/clee2000
Today, this doesn't work and dynamo errors out in a very non-obvious way (see:
https://gist.github.com/suo/dde04830372ab51a4a34ea760f14200a).

Here, we detect the error early and exit with a nicer message. Also add a
config option to just no-op dynamo (which is needed to unblock internal
enablement).
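For context, a sketch of how such a kill switch could be used (the module path and flag name below are assumptions based on the present-day `torch._dynamo.config.disable` flag, not necessarily what this PR introduces):

```python
import torch
import torch._dynamo as dynamo

# Assumed kill switch: make dynamo a no-op globally so models that hit the
# unsupported case can keep running while the underlying issue is fixed.
dynamo.config.disable = True

@dynamo.optimize("eager")
def fn(x):
    return x + 1

print(fn(torch.ones(2)))  # runs eagerly; dynamo never traces the function
```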

Pull Request resolved: pytorch#87797
Approved by: https://github.com/yf225, https://github.com/soumith, https://github.com/jansel
This calls `lspci`, `lsmod`, and `modinfo nvidia` before and after the installation to gather more data about the "No GPU available" transient issue on G5 runner, i.e. https://hud.pytorch.org/pytorch/pytorch/commit/59fe272c1e698989228af5ad197bdd2985e4e9b9

This also handles the `nvidia-smi` call and tries to re-install the driver if the first call fails, i.e. `No devices were found` https://hud.pytorch.org/pytorch/pytorch/commit/8ea19c802e38c061e79176360c1ecaa81ce2088a
Pull Request resolved: pytorch#88168
Approved by: https://github.com/clee2000, https://github.com/malfet
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting the above variable to `OFF` during the build, or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

These are follow-up changes per the comment in PR pytorch#81790, comment [link](pytorch#81790 (comment))

Pull Request resolved: pytorch#84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
Fixes: pytorch#88205

The `CreationMeta::NO_GRAD_MODE` path in handle_view_on_rebase wrongly assumes that the tensor is a leaf, because tensors created in no_grad are always leaf tensors. However, due to creation_meta propagation, a view of a view created in no_grad also has `CreationMeta::NO_GRAD_MODE`, but DOES have a grad_fn.
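A minimal reproduction of the described scenario (a sketch assuming standard autograd view semantics; the exact error text depends on the fix):

```python
import torch

x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    v = x.view(3)   # view created in no_grad: tagged CreationMeta::NO_GRAD_MODE, no grad_fn
vv = v.view(3)      # view of that view, created with grad enabled: creation_meta propagation
                    # keeps NO_GRAD_MODE, yet vv DOES have a grad_fn

try:
    vv.mul_(2)      # in-place write routes through handle_view_on_rebase
except RuntimeError as e:
    # Expected: the usual "view was created in no_grad mode" error,
    # not a failure of the old leaf-only assumption.
    print(e)
```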
Pull Request resolved: pytorch#88243
Approved by: https://github.com/albanD
Previously the permute function was extended to behave like the `order`
function for first-class dimensions. However, unlike `permute`,
`order` doesn't have a keyword argument `dims`, and there is no way to add
one that lets both `permute` and `order` keep the same behavior. So this
change just removes the extra functionality of `permute`,
which wasn't documented anyway. Fixes pytorch#88187
Pull Request resolved: pytorch#88226
Approved by: https://github.com/zou3519
Taylor Robie and others added 19 commits November 8, 2022 21:48
This PR unifies and rationalizes some of the input representation in Result. The current approach of storing separate types in separate vectors is tedious for two types (Tensors and scalars), but would be even more annoying with the addition of TensorLists. A similar disconnection exists with sizes and strides which the user is also expected to zip with tensor_metadata.

I simplified things by moving inputs to a variant and moving sizes and strides into TensorMetadata. This also forced collection of sizes and strides in python tracer which helps to bring it in line with op profiling. Collection of TensorLists is fairly straightforward; `InputOutputEncoder` already has a spot for them (I actually collected them in the original TorchTidy prototype) so it was just a matter of plumbing things through.

Differential Revision: [D40734451](https://our.internmc.facebook.com/intern/diff/D40734451/)
Pull Request resolved: pytorch#87825
Approved by: https://github.com/slgong-fb, https://github.com/chaekit
…ytorch#88640)

I think this is the final resolution to the issue caused by
pytorch#87797. The nvfuser issue that PR tripped over arose because, even
though we're correctly disabling torchdynamo via a `DisableContext`, the
nested fx trace check was still firing. This PR properly narrows it to only
fire if we're not disabled.

Pull Request resolved: pytorch#88640
Approved by: https://github.com/yf225
…6802)

There are multiple ways to identify that a Tensor is a gradient (a subset of which also give additional context), so to start off I've made a utility to handle that determination.

Differential Revision: [D39920730](https://our.internmc.facebook.com/intern/diff/D39920730/)
Pull Request resolved: pytorch#86802
Approved by: https://github.com/chaekit
)

Summary:
X-link: pytorch/torchrec#759

Since the remove_duplicate flag was added to named_buffers in D39493161 (pytorch@c12f829), this adds the same flag to named_parameters.
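A small usage sketch (the module below is made up for illustration):

```python
import torch.nn as nn

class TiedModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(4, 4, bias=False)
        self.head = self.embed  # same submodule registered twice -> shared parameters

m = TiedModule()
print([name for name, _ in m.named_parameters()])
# ['embed.weight']  (duplicates removed by default)
print([name for name, _ in m.named_parameters(remove_duplicate=False)])
# ['embed.weight', 'head.weight']
```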

Test Plan:
python test/test_nn.py -k test_buffers_and_named_buffers

OSS Tests

Differential Revision: D40801899

Pull Request resolved: pytorch#88090
Approved by: https://github.com/albanD
…rs. (pytorch#88585)

Summary: This diff modifies the implementation of the select operator so that slices of the irregular dimension can be selected (e.g. `nt[:,0,:]`).
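For illustration, a minimal sketch (assuming the public `torch.nested.nested_tensor` constructor; not the tests from the diff):

```python
import torch

# Two components with different lengths along dim 1, the irregular dimension.
nt = torch.nested.nested_tensor([torch.randn(3, 4), torch.randn(5, 4)])

# Select index 0 along the irregular dimension, i.e. the nt[:, 0, :] case
# from the summary: the first row of each component.
row0 = nt.select(1, 0)
```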

Test Plan:
Added new unit tests to test that the new functions work as intended (see them in diff). To test,
`buck test mode/dev-nosan //caffe2/test:nested`

Differential Revision: D41083993

Pull Request resolved: pytorch#88585
Approved by: https://github.com/cpuhrsch
…torch#88651)

Summary:
Today, when we transform the captured graph in the last step of export(aten_graph=True), we construct a new graph which doesn't have all the metadata that should be preserved, for example node.meta["val"].
meta["val"] is important for writing passes and analyses on the graph later in the pipeline, so we want to preserve it on placeholder nodes.
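A rough sketch of the kind of pass-through this enables (the helper below is hypothetical, not the export code itself):

```python
import torch.fx as fx

def copy_placeholder_val(src_gm: fx.GraphModule, dst_gm: fx.GraphModule) -> None:
    """Hypothetical helper: carry node.meta['val'] (e.g. fake-tensor values) from
    the placeholders of the captured graph onto the matching placeholders of a
    newly constructed graph, so later passes can still rely on it."""
    src = [n for n in src_gm.graph.nodes if n.op == "placeholder"]
    dst = [n for n in dst_gm.graph.nodes if n.op == "placeholder"]
    for s, d in zip(src, dst):
        if "val" in s.meta:
            d.meta["val"] = s.meta["val"]
```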

Test Plan: test_export.py:test_export_meta_val

Differential Revision: D41110864

Pull Request resolved: pytorch#88651
Approved by: https://github.com/tugsbayasgalan, https://github.com/jansel
Also, add an explicit cudart dependency to `torch_cuda` if Kineto is used with GPU support (it used to be somehow inherited from a wrong `gloo` setup)
Pull Request resolved: pytorch#88530
Approved by: https://github.com/osalpekar
For some reason bernoulli uses the legacy memory format; see the linked issue.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#88676
Approved by: https://github.com/SherlockNoMad
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#88675
Approved by: https://github.com/SherlockNoMad
Also, handle non-default alpha correctly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#88678
Approved by: https://github.com/SherlockNoMad, https://github.com/albanD
…h#88528)

In pytorch/torch/_C/__init__.pyi, Graph.addInput has signature
```python
  def addInput(self, name: str) -> Value: ...
```
which doesn't match the corresponding function
```cpp
  Value* addInput(const std::string& name = "") {
    return block_->addInput(name);
  }

```

in python_ir.cpp. This PR aligns the bound function on both the C++ and Python sides. Without this PR, mypy will complain whenever a change contains calls to `addInput`; for example,
![image](https://user-images.githubusercontent.com/3524474/200092086-429b8d63-9321-4d03-b0d6-f4c9bd361756.png)
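Aligning the stub with the C++ default presumably means making `name` optional (a sketch of the corrected signature, not necessarily the exact diff):

```python
def addInput(self, name: str = "") -> Value: ...
```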

Pull Request resolved: pytorch#88528
Approved by: https://github.com/davidberard98
This allows one to SSH in faster rather than having to wait for the repo clone to finish.

I.e., right now one usually has to wait a few minutes before the PyTorch clone is finished, but with this change you can SSH in ahead of time (thanks to `setup-ssh` being a composite action).

Pull Request resolved: pytorch#88715
Approved by: https://github.com/clee2000, https://github.com/izaitsevfb
## Description
Support lowering of channel shuffle in FX by adding its module and functional op to `is_copy_node` list in `torch/ao/quantization/fx/_lower_to_native_backend.py`
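For reference, a minimal FX graph-mode quantization flow exercising `ChannelShuffle` (a sketch using the standard `torch.ao.quantization` APIs, not the tests added in this PR):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = nn.Sequential(
    nn.Conv2d(4, 4, kernel_size=1),
    nn.ChannelShuffle(2),  # should now be lowered as a "copy node"
    nn.ReLU(),
).eval()

example = torch.randn(1, 4, 8, 8)
prepared = prepare_fx(model, get_default_qconfig_mapping("fbgemm"), example_inputs=(example,))
prepared(example)                  # calibration
quantized = convert_fx(prepared)   # lowering; channel shuffle remains in the quantized graph
print(quantized(example).shape)
```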

## Validation
UTs added to test
- correctness of quantized `ChannelShuffle` module.
- FX lowering of `ChannelShuffle` module and functional `channel_shuffle`.

Pull Request resolved: pytorch#83731
Approved by: https://github.com/jerryzh168
@jjsjann123 mentioned this pull request Nov 9, 2022
@jjsjann123 requested review from csarofeen and removed the review request for IvanYashchuk and mruberry on Nov 9, 2022 16:50
@jjsjann123 (Collaborator, Author) commented:

CI looks wrong.. let me remove those
