forked from pytorch/pytorch
Upstream merge 0803 #1887
Merged
Conversation
Pull Request resolved: pytorch#82049 Approved by: https://github.com/ezyang
Pull Request resolved: pytorch#82051 Approved by: https://github.com/eellison, https://github.com/ezyang
Pull Request resolved: pytorch#82052 Approved by: https://github.com/ezyang
This reverts commit 30ed427. Reverted pytorch#82052 on behalf of https://github.com/Chillee due to broke build on master
…ted data types (pytorch#82183) This is a continuation of the TestConsistency fixes for the MPS backend. * Add error messages for unsupported matmul ops * Add error handling for int inputs for the linear op Pull Request resolved: pytorch#82183 Approved by: https://github.com/razarmehr
The Docker docs say: "For other items (files, directories) that do not require ADD's tar auto-extraction capability, you should always use COPY": https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#add-or-copy I found this by running https://github.com/hadolint/hadolint This is a follow-up after pytorch#81944 Pull Request resolved: pytorch#82151 Approved by: https://github.com/huydhn, https://github.com/jeffdaily, https://github.com/ZainRizvi
### Description We need to make sure the int overload of expand gets redispatched to the same device; otherwise, at::native::expand just calls a bunch of lower-level ops. Pull Request resolved: pytorch#82264 Approved by: https://github.com/bdhirsh
Pull Request resolved: pytorch#82278 Approved by: https://github.com/huydhn
Pull Request resolved: pytorch#82269 Approved by: https://github.com/kit1980
…"" (pytorch#82287) This reverts commit e519dd3. Pull Request resolved: pytorch#82287 Approved by: https://github.com/ezyang
Per title; unfortunately, testing invalid reads with the caching allocator is hard. Pull Request resolved: pytorch#82272 Approved by: https://github.com/cpuhrsch
Implements linspace with arange, and logspace with linspace. - Implements a more precise path in linspace's ref when dtype is integral, to avoid off-by-one issues when the output of the computation is cast to int. The trade-off is an increased chance of overflow. - Files several issues (pytorch#82242, pytorch#82230, pytorch#81996) on preexisting problems with linspace and logspace, mainly concerning integral dtypes; the affected tests are xfailed in this PR. - Fixes the check that the reference implementation is closer to the precise implementation than the torch implementation, so that it also updates the dtype kwarg to the precise dtype. TODO: - ~support negative bases~ (not in this PR) - ~support complex. Since arange does not support complex but linspace does, one solution is to call linspace separately on the real and imag components and sum the results at the end~ (not in this PR) - ~default dtypes need to be explicitly handled since computation is done in a different dtype than the result~ (done) Pull Request resolved: pytorch#81826 Approved by: https://github.com/ngimel
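The integral-dtype trade-off described above can be pictured in plain Python (a hedged sketch with hypothetical helper names, not the actual PyTorch ref implementation): keeping the arithmetic in exact integers and dividing last avoids truncation-induced off-by-one errors, at the cost of larger intermediate products that could overflow a fixed-width dtype.

```python
def linspace_int_precise(start, end, steps):
    # Exact integer path: compute each point as a weighted average and divide
    # last, so there is no float rounding and no off-by-one after the cast to
    # int; the products start * (steps - 1 - i) and end * i are the overflow risk.
    if steps == 1:
        return [start]
    return [(start * (steps - 1 - i) + end * i) // (steps - 1)
            for i in range(steps)]


def linspace_int_naive(start, end, steps):
    # The arange-style float path: start + i * step, truncated to int at the
    # end. Float rounding in `step` can shift a point across an integer
    # boundary, producing an off-by-one result.
    if steps == 1:
        return [int(start)]
    step = (end - start) / (steps - 1)
    return [int(start + i * step) for i in range(steps)]
```

With small inputs such as `start=0, end=10, steps=5` both paths agree on `[0, 2, 5, 7, 10]`; they diverge when `(end - start) / (steps - 1)` is not exactly representable in floating point.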
### Description Add a compiler function to dump the forward, backward, and joint graphs. The partitioner is the default partition. The input metadata for each dumped graph is also dumped as a pickle file. Example usage:
```
save_fx_func = graph_dumper_aot(current_name, folder_name, dump_example_input=False)
optimize_ctx = torchdynamo.optimize(save_fx_func)

with torch.enable_grad():
    with optimize_ctx:
        result = forward_and_backward_pass(model, example_inputs)
```
Pull Request resolved: pytorch#82184 Approved by: https://github.com/Chillee
…1522) Move the aten.native_batch_norm_backward decomposition from https://github.com/pytorch/functorch/blob/main/functorch/_src/decompositions.py#L148. Changed it to not recompute mean and invstd, and added a type cast. In functorch, changed `@register_decomposition_for(aten.native_batch_norm_backward)` to `@register_decomposition_for_jvp(aten.native_batch_norm_backward)`. Passing `pytest test/test_decomp.py -k norm`. Note that when the output mask is False for grad_weight and grad_bias, we should return None to be consistent with the non-decomposed operator's behavior. But None doesn't work with vjp, so the version of the decomposition in functorch used zeros. See https://github.com/pytorch/pytorch/blob/b33c1f7dd4a4d30ebc912f555e56d105ae66aa84/functorch/functorch/_src/decompositions.py#L210. Pull Request resolved: pytorch#81522 Approved by: https://github.com/Chillee
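The output-mask behavior can be illustrated with a minimal single-channel sketch (assumed names, not the actual decomposition): grad_bias reduces grad_output, grad_weight reduces grad_output against the normalized input rebuilt from the saved mean/invstd, and masked-off outputs are None in the eager operator but zeros in the vjp-friendly functorch variant.

```python
def bn_backward_params(grad_out, x, mean, invstd, output_mask, use_zeros=False):
    # 1-D, single-channel sketch of the grad_weight / grad_bias terms of
    # native_batch_norm_backward, reusing the saved mean and invstd instead
    # of recomputing them from x.
    xhat = [(xi - mean) * invstd for xi in x]  # normalized input
    masked = 0.0 if use_zeros else None  # zeros keep vjp happy; None matches eager
    grad_weight = (sum(g * xh for g, xh in zip(grad_out, xhat))
                   if output_mask[1] else masked)
    grad_bias = sum(grad_out) if output_mask[2] else masked
    return grad_weight, grad_bias
```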
allows for benchmarking of ops Differential Revision: [D38081129](https://our.internmc.facebook.com/intern/diff/D38081129/) Pull Request resolved: pytorch#82123 Approved by: https://github.com/SS-JIA
benchmarking of the add op Differential Revision: [D38118138](https://our.internmc.facebook.com/intern/diff/D38118138/) Pull Request resolved: pytorch#82124 Approved by: https://github.com/SS-JIA
As migration from Jenkins to GHA is complete. Pull Request resolved: pytorch#82280 Approved by: https://github.com/huydhn
benchmarking of conv2d regular op Differential Revision: [D38118137](https://our.internmc.facebook.com/intern/diff/D38118137/) Pull Request resolved: pytorch#82125 Approved by: https://github.com/SS-JIA
Differential Revision: [D38119585](https://our.internmc.facebook.com/intern/diff/D38119585/) Pull Request resolved: pytorch#82127 Approved by: https://github.com/SS-JIA
Differential Revision: [D38119586](https://our.internmc.facebook.com/intern/diff/D38119586/) Pull Request resolved: pytorch#82128 Approved by: https://github.com/SS-JIA
Differential Revision: [D38153928](https://our.internmc.facebook.com/intern/diff/D38153928/) Pull Request resolved: pytorch#82221 Approved by: https://github.com/SS-JIA
Differential Revision: [D38153929](https://our.internmc.facebook.com/intern/diff/D38153929/) Pull Request resolved: pytorch#82222 Approved by: https://github.com/SS-JIA
benchmarking of div op Differential Revision: [D38154700](https://our.internmc.facebook.com/intern/diff/D38154700/) Pull Request resolved: pytorch#82225 Approved by: https://github.com/SS-JIA
Based off pytorch#80511 with extra changes: - Update pybind to the latest release as it contains some needed fixes - Extend the compat header to do reduce changes in code Pull Request resolved: pytorch#81242 Approved by: https://github.com/malfet, https://github.com/mattip
…ch#82215) Fixes pytorch#82150. Pull Request resolved: pytorch#82215 Approved by: https://github.com/amjames, https://github.com/cpuhrsch
…ion of new parameter (pytorch#82273) ### Description PR pytorch#80336 introduced a new parameter to the Sparse Adam optimizer. The new parameter is accessed inside the `step` method of the optimizer. If we try to deserialize and run an older version of the optimizer before this change was introduced, it fails in the step that tries to access the missing parameter. I have added a workaround to set a default value in case the parameter is unavailable in the optimizer. ### Issue <!-- Link to Issue ticket or RFP --> ### Testing * Testing on PyTorch CI * Manual validation against existing serialized models to make sure they continue to work Pull Request resolved: pytorch#82273 Approved by: https://github.com/mehtanirav, https://github.com/albanD
…elper (pytorch#81828) Introduce a _DistWrapper class that wraps a process group and provides functional variants of collectives; it works without c10d enabled and is exception-robust. Introduce tensor_narrow_n, which handles narrowing over multiple dimensions. Pull Request resolved: pytorch#81828 Approved by: https://github.com/wanchaol
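Narrowing over multiple dimensions can be pictured as one slice per dimension (a minimal sketch with assumed names; the real tensor_narrow_n helper operates on torch tensors via `Tensor.narrow`):

```python
def narrow_n_slices(offsets, lengths):
    # Build one slice per leading dimension; applying the resulting tuple to
    # an indexable array narrows it over all those dimensions at once.
    return tuple(slice(o, o + l) for o, l in zip(offsets, lengths))


def narrow_2d(mat, offsets, lengths):
    # Hand-rolled 2-D version for plain nested lists, for illustration.
    (r, c), (h, w) = offsets, lengths
    return [row[c:c + w] for row in mat[r:r + h]]
```

For example, narrowing a 3x3 matrix with offsets (0, 1) and lengths (2, 2) keeps the top-right 2x2 block.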
It looks like the DEBUG macro is never actually set anywhere; see pytorch#82276. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: pytorch#82277 Approved by: https://github.com/malfet
### Description Improve the incremental build process on ROCm by eliminating unnecessary file changes. ### Issue N/A ### Testing 1. Run `python tools/amd_build/build_amd.py --out-of-place-only` multiple times, and ensure the file `third_party/gloo/cmake/Modules/Findrccl.cmake` does not contain patterns like `RCCL_LIBRARY_PATH_PATH` 2. Run `python tools/amd_build/build_amd.py; USE_ROCM=1 python3 setup.py develop` twice, and confirm the second run does not trigger the compilation of thousands of files. Pull Request resolved: pytorch#82190 Approved by: https://github.com/jithunnair-amd, https://github.com/ezyang
The next PR up in the stack requires this for lintrunner to be happy. There are no logical changes; the file was autoformatted via the following: ``` mv functorch/codegen/gen_vmap_plumbing.py torchgen/gen_vmap_plumbing.py lintrunner torchgen/gen_vmap_plumbing.py -a mv torchgen/gen_vmap_plumbing.py functorch/codegen/gen_vmap_plumbing.py ``` Test Plan: - build functorch Differential Revision: [D38171956](https://our.internmc.facebook.com/intern/diff/D38171956) Pull Request resolved: pytorch#82246 Approved by: https://github.com/kit1980
### Description We forgot that the `<` was for comments in markdown. Also added a link to the wiki to the start-land-checks message, so users can see why their PR is taking extra time to land. ### Issue n/a ### Testing n/a Pull Request resolved: pytorch#82649 Approved by: https://github.com/janeyx99, https://github.com/ZainRizvi
fixes pytorch#81457 fixes pytorch#81216 fixes pytorch#81212 fixes pytorch#81207 fixes pytorch#81206 fixes pytorch#81218 fixes pytorch#81203 fixes pytorch#81202 fixes pytorch#81214 fixes pytorch#81220 fixes pytorch#81205 fixes pytorch#81200 fixes pytorch#81204 fixes pytorch#81221 fixes pytorch#81209 fixes pytorch#81210 fixes pytorch#81215 fixes pytorch#81217 fixes pytorch#81222 fixes pytorch#81211 fixes pytorch#81201 fixes pytorch#81208 As part of this PR I'm also re-enabling all of the functionalization tests that got marked as flaky in CI (they're not actually flaky - I think they got marked because a PR that should have changed their expect-test output made it to master without the changes. I'll let CI run on this PR to confirm though). reland of pytorch#80897 Pull Request resolved: pytorch#82407 Approved by: https://github.com/ezyang
Adds the dispatch boilerplate for MPS backend. Pull Request resolved: pytorch#82612 Approved by: https://github.com/malfet
…e case (pytorch#82441) - Refactor SchemaInfo to be able to handle cases where other variables besides running_mean and running_var mutate due to training = true - Add special case rrelu_with_noise to fix pytorch#82434 - Tested by running SchemaInfo tests Pull Request resolved: pytorch#82441 Approved by: https://github.com/davidberard98
This reverts commit 714669e. Reverted pytorch#82626 on behalf of https://github.com/zengk95 due to This looks like its breaking trunk
…uts (pytorch#82176)" This reverts commit 1dfcad8. Reverted pytorch#82176 on behalf of https://github.com/zengk95 due to This looks like it's breaking functorch tests on master
…ytorch#82552)"" (pytorch#82599) This reverts commit 532b8a9. Pull Request resolved: pytorch#82599 Approved by: https://github.com/albanD
…orch#82556)"" (pytorch#82600) This reverts commit ab8e5e6. Pull Request resolved: pytorch#82600 Approved by: https://github.com/janeyx99
This should reduce the prevalence of pytorch#82324 Differential Revision: [D38325919](https://our.internmc.facebook.com/intern/diff/D38325919) Pull Request resolved: pytorch#82596 Approved by: https://github.com/goldenxuett
Update production ops (7/28). This is only for calculating mobile op test coverage. Meta employees can update it using ``` python test/mobile/model_test/update_production_ops.py ~/fbsource/xplat/pytorch_models/build/all_mobile_model_configs.yaml ``` Pull Request resolved: pytorch#82444 Approved by: https://github.com/kit1980
Fixes pytorch#82531 Pull Request resolved: pytorch#82650 Approved by: https://github.com/kulinseth
thanks to @atalman for catching this https://github.com/pytorch/pytorch/actions/runs/2778770227 Pull Request resolved: pytorch#82672 Approved by: https://github.com/atalman
Re-lands pytorch#81558, which got reverted due to failing tests. The failure happened because of a test that I designed poorly. [The loop here](https://github.com/pytorch/pytorch/pull/81558/files#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3837) does `cache_enabled=False` and then `cache_enabled=True`; in that loop, the graph from the previous iteration (the `False` case) conflicts with the next one (the `True` case). I redesigned the test so that it does not loop: the new test does separate function calls with different argument values. Pull Request resolved: pytorch#81896 Approved by: https://github.com/ngimel
This moves first-class dimensions, as prototyped in https://github.com/facebookresearch/torchdim, into the functorch build. This makes them more easily available for use in PrimTorch. Pull Request resolved: pytorch#82454 Approved by: https://github.com/ezyang, https://github.com/zou3519
Differential Revision: D38368525 Pull Request resolved: pytorch#82676 Approved by: https://github.com/ngimel
Currently, if we run softmax_backward/logsoftmax_backward not along the last dim, the calculation falls back to a [scalar version](https://github.com/pytorch/pytorch/blob/32593ef2dd26e32ed44d3c03d3f5de4a42eb149a/aten/src/ATen/native/SoftMax.cpp#L220-L287). We found that we can actually vectorize the calculation along the inner_size dim. Changes made: use the vectorized softmax_backward_kernel/log_softmax_backward_kernel instead of host_softmax_backward when not along the last dim. We collected benchmark data for softmax_backward and logsoftmax_backward for the BFloat16 and Float32 data types using PyTorch's operator_benchmark tool on an Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz, 24 cores (1 socket). [softmax_benchmark_32593ef.log](https://github.com/pytorch/pytorch/files/8962956/softmax_benchmark_32593ef.log) [softmax_benchmark_the_pr.log](https://github.com/pytorch/pytorch/files/8962958/softmax_benchmark_the_pr.log) Pull Request resolved: pytorch#80114 Approved by: https://github.com/frank-wei
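Whether computed by the scalar fallback or the vectorized kernel, the softmax backward applies the same per-element formula; a plain-Python reference for one row (a sketch, not the kernel code) is grad_in_i = out_i * (grad_out_i - sum_j grad_out_j * out_j), where out is the saved softmax output.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def softmax_backward(grad_out, out):
    # grad_in_i = out_i * (grad_out_i - sum_j grad_out_j * out_j)
    dot = sum(g * o for g, o in zip(grad_out, out))
    return [o * (g - dot) for g, o in zip(grad_out, out)]
```

Vectorizing along inner_size amounts to evaluating this formula for many independent rows at once with SIMD lanes, instead of one scalar element at a time.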
Summary: no functional changes, just testing to make sure this is working. Test Plan: python test/test_ao_sparsity.py TestFxComposability Pull Request resolved: pytorch#82204 Approved by: https://github.com/supriyar
…h#81802) Summary: Needed to refactor this PR to add tests for some new layers without copy-pasting the entirety of the code. It's basically just a helper that does exactly what the other tests did, since they were essentially copies of one another. It's possible to do something similar with the quantized kernels test, but it's different enough that it seemed more effort than it was worth. Also a bugfix: line 150, I believe, was originally wrong since model.weight was never used, though the only effect was that the specific weight wasn't used. Test Plan: python test/test_ao_sparsity.py TestQuantizedSparseLayers Pull Request resolved: pytorch#81802 Approved by: https://github.com/supriyar
`torch.cuda.is_bf16_supported()` returns False on ROCm, which is not correct, since BF16 is supported on all AMD GPU arches: gfx906, gfx908, and gfx90a. cc @jithunnair-amd Pull Request resolved: pytorch#80410 Approved by: https://github.com/jeffdaily, https://github.com/malfet
This reverts commit 23b9004. Reverted pytorch#82454 on behalf of https://github.com/zengk95 due to this is breaking mac jobs on trunk https://hud.pytorch.org/pytorch/pytorch/commit/23b90044dac04d23f43c1a0f518bdbf95efd3b47
…2688) Need to use `ASSERT_FLOAT_EQ` for floats. Right now the test often fails internally like this: ``` xplat/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/test/fully-connected-operator-tester.h:362 Expected equality of these values: output_dynamic[i * outputChannels() + c] Which is: -601.09 ((float)accumulators[i * outputChannels() + c] * requantization_scales[c]) + float(bias[c]) Which is: -601.09 at 0, 18: reference = -601.0899658203125, optimized = -601.09002685546875 ``` ``` xplat/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/test/fully-connected-operator-tester.h:362 Expected equality of these values: output_dynamic[i * outputChannels() + c] Which is: -65.6251 ((float)accumulators[i * outputChannels() + c] * requantization_scales[c]) + float(bias[c]) Which is: -65.6251 at 0, 7: reference = -65.625106811523438, optimized = -65.625099182128906 ``` Pull Request resolved: pytorch#82688 Approved by: https://github.com/mehtanirav
csarofeen approved these changes (Aug 4, 2022)
LGTM
jjsjann123 added a commit that referenced this pull request (Aug 29, 2022)
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Code changes include:
- codegen improvements:
  1. removes unnecessary sync from redundant thread compute analysis
  2. symmetric API for BestEffortReplay
  3. support merge on trivial reductions
  4. Ampere async copy improvements
- bug fixes:
  1. vectorization bug fixes
  2. type inference patch: fixes upstream pytorch#81725
  3. segmenter bug fix with deterministic iteration ordering
- parser update:
  1. added leaky_relu
- scheduler:
  1. normalization scheduler clean up
  2. simplifies matmul scheduling with new transform propagator
  3. merge all dimensions in PW scheduler
  4. various gemm related improvements
- debuggability:
  1. nsight compute support
  2. debug dump for InlinePropagator
  3. add `UnaryOpType::Print`

Squashed commits to WAR the GitHub API. Commits actually in this PR from the devel branch:
```
dfe02f3 Merge remote-tracking branch 'csarofeen/devel' into HEAD
1617373 Add `TensorViewBuilder::shape(std::vector<Val*> shape)` (#1884)
7cfb779 Merge pull request #1887 from csarofeen/upstream_merge_0803
3399f6d Merge remote-tracking branch 'origin/viable/strict' into HEAD
01208f5 Add `UnaryOpType::Print` which can be helpful for debugging (#1878)
0646522 Remove redundant TORCH_INTERNAL_ASSERT in lower_magic_zero.cpp (#1881)
7bc76aa Fix most inlined propagator for mismatched dims (#1875)
501f4aa Nonaffine swizzle formulation ep.2: Loop swizzle variant. (#1826)
d863d69 Ampere async copy ep.2: circular buffering extension to support pipelined matmul operand load (#1827)
e0ae11a Larger sized mma instructions to support full vectorization (#1824)
9bb4cf7 fragment iteration to support fully unrolled mma ops (#1823)
a48270a Merge all dims in pointwise scheduler (#1872)
172fb36 Make MostInlined and BestEffort inline propagation no longer assert replayed (#1868)
a64462a Allow trivial reduction to be merged (#1871)
440102b Symmetric API for BestEffortReplay (#1870)
d1caf33 Some misc cleanups/refactor split out from #1854 (#1867)
1013eda Remove some welford specific logic. (#1864)
51589d3 Some cleanups on tests and heuristics params (#1866)
a6b3e70 Segmenter bug fix, and deterministic iteration ordering. (#1865)
1b665b9 Add nullptr checks to IrBuilder (#1861)
1cd9451 Simplify matmul scheduling with the new transform propagator. (#1817)
bbc1fb9 Add leaky_relu operation (#1852)
e842a9b Minor cleanup in pointwise scheduler (#1858)
9ee850c Fix stringstream usage (#1857)
20a36c1 Improve nsight compute support (#1855)
4059103 Remove debugging `true ||` from getPointwiseHeuristics (#1822)
01117bf Misc cleanup (#1853)
5cc6494 Apply the magic-zero protection to each indexed domain individually for predicate indexing (#1846)
92e6f02 Cleanup normalization scheduler (#1845)
db89c65 Type inference patch (#1848)
102fe93 Add debug dump for InlinePropagator (#1847)
b7a4d93 Redundant thread compute analysis to avoid un-necessary sync insertion (#1687)
942be5b Upstream ci build fixes (#1842)
0b83645 Fix vectorization bug introduced in #1831 (#1840)
63630f1 Move MaxProducerPosUpdater into InlinePropagator::tearDown (#1825)
9135a96 Fix transpose benchmark dtype (#1839)
2c9a6c0 Add extra configurability to `parallelizeAllLike` (#1831)
```
RUN_TORCHBENCH: nvfuser Differential Revision: [D38543000](https://our.internmc.facebook.com/intern/diff/D38543000) Pull Request resolved: pytorch#83067 Approved by: https://github.com/davidberard98
merging upstream/master into csarofeen/devel
Upstream master commit: 9647bec
Corresponding PR to bump our master branch: #1886