
Develop upstream sync 231030 #2287

Merged
merged 247 commits into develop-upstream from develop-upstream-sync-231030 on Nov 7, 2023

Conversation

rahulbatra85

No description provided.

bartchr808 and others added 30 commits October 23, 2023 04:44
In the compact printing mode, any custom attributes set by a user of StableHLO do not appear. This change makes sure they are printed when present.

PiperOrigin-RevId: 575778539
Imported from GitHub PR openxla/xla#6475

`bitcast_dtypes_expander_test` failed when executed in Docker container
```
//xla/service:bitcast_dtypes_expander_test                               FAILED in 3 out of 3 in 0.7s
```

Reason: swapped variable names in the `S64toS32` test
```
%shift-right-logical.11 -> %shift-right-logical.12
%constant.12 -> %constant.11
```
To resolve the issue, we can use pattern-based variable names instead.
```
%[[VAL_10:.*]]
%[[VAL_11:.*]]
```
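
The fix can be illustrated outside of FileCheck with a small Python sketch (the HLO snippets below are hypothetical, shortened stand-ins for the real test output): a check that pins an exact SSA suffix breaks when the numbering swaps between runs, while a pattern that captures whatever suffix appears matches both.

```python
import re

# Hypothetical, shortened stand-ins for two runs of the same HLO dump,
# where the numeric suffixes of the SSA names swapped between runs.
run_a = "%shift-right-logical.11 = s32[] ...\n%constant.12 = s32[] ..."
run_b = "%shift-right-logical.12 = s32[] ...\n%constant.11 = s32[] ..."

# Pinning the exact suffix (like the old CHECK lines) matches only one run.
exact = re.compile(r"%shift-right-logical\.11 ")
assert exact.search(run_a) is not None
assert exact.search(run_b) is None

# A pattern-based name, analogous to FileCheck's %[[VAL_10:.*]], captures
# whatever suffix appears and therefore matches both runs.
flexible = re.compile(r"%shift-right-logical\.(\d+) ")
assert flexible.search(run_a) is not None
assert flexible.search(run_b) is not None
```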

Copybara import of the project:

--
b99ed813b39c6b53f1aa0981dbaae060d8cb5aa8 by Alexander Pivovarov <pivovaa@amazon.com>:

Fix flaky test bitcast_dtypes_expander_test

Merging this change closes tensorflow#6475

PiperOrigin-RevId: 575793343
Updates LLVM usage to match
[e558be51bab0](llvm/llvm-project@e558be51bab0)

PiperOrigin-RevId: 575794929
… cuda 12.3 dependencies.

Otherwise, when pulled, the 12.3 packages would change the /usr/local/cuda alias to point to 12.3, causing failures.

PiperOrigin-RevId: 575799576
Log the op attributes on `--vmodule=xla_call_module_op=3`.

PiperOrigin-RevId: 575812286
PiperOrigin-RevId: 575813013
Fixed IrEmitterTriton "swallowing" NaNs in minimum and maximum.

Also fixed ElementalIrEmitter "swallowing" NaNs in the case of minimum(x, NaN) on GPU. That was likely caused by an LLVM error.
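
The "swallowing" can be reproduced in a few lines of plain Python (a minimal sketch of the semantics, not XLA's actual lowering): a compare-and-select minimum silently drops a NaN depending on operand order, while a NaN-propagating minimum returns NaN whenever either operand is NaN.

```python
import math

def select_min(x, y):
    # Naive compare-and-select lowering of minimum. Any comparison with
    # NaN is false, so the result depends on operand order and a NaN can
    # be silently dropped.
    return x if x < y else y

def nan_propagating_min(x, y):
    # NaN-propagating minimum: NaN wins regardless of operand order.
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return min(x, y)

nan = math.nan
assert math.isnan(select_min(1.0, nan))     # NaN happens to survive here...
assert select_min(nan, 1.0) == 1.0          # ...but is swallowed here.
assert math.isnan(nan_propagating_min(nan, 1.0))
assert math.isnan(nan_propagating_min(1.0, nan))
```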

PiperOrigin-RevId: 575817645
PiperOrigin-RevId: 575850088
…mand buffers

Before this CL, the set of libraries supported by GPU graphs was determined by an integer level. This CL gives the user fine-grained control over whether a library is enabled in graphs.

Example usage: XLA_FLAGS=--xla_gpu_command_buffer_command_types=FUSION,CUBLAS
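
Parsing such a flag might look like the following sketch (the `CommandType` enum, its members, and the parser are illustrative, not XLA's actual definitions):

```python
from enum import Enum, auto

class CommandType(Enum):
    # Hypothetical subset of command types selectable by the flag.
    FUSION = auto()
    CUBLAS = auto()
    CUDNN = auto()

def parse_command_types(flag_value):
    """Parse a comma-separated flag value like "FUSION,CUBLAS" into a set."""
    return {CommandType[name.strip()]
            for name in flag_value.split(",") if name.strip()}

types = parse_command_types("FUSION,CUBLAS")
assert types == {CommandType.FUSION, CommandType.CUBLAS}
```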

PiperOrigin-RevId: 575856977
Consolidate implementations of requantize and use the same one as TF quantizer.

PiperOrigin-RevId: 575864178
PiperOrigin-RevId: 575866461
…dependencies, as part of a thunk clean up, and updated the necessary directories pointing to this thunk. tensorflow#5758

PiperOrigin-RevId: 575867727
PiperOrigin-RevId: 575881215
…en instruction, and reports the total # of such nodes.

PiperOrigin-RevId: 575887322
This guarantees that the iterators created inside the parallel interleave iterator are destroyed before the parallel interleave iterator itself.
Otherwise, `CancelThreads(true)` might finish before `element`'s destructor is called, because of `outstanding_threads_finished_cond_var_.wait(l);`

PiperOrigin-RevId: 575888988
…lag.

The flag could have a different value in different processes, which would invalidate the inferred shape. The dependency also inflates the size of the op library, by adding a dependency on a kernel library (and transitively on MLIR), and pulls in unrelated ops which complicates wrapper generation.

PiperOrigin-RevId: 575900618
PiperOrigin-RevId: 575904949
In preparation to create `xla_protos_all` to simplify further

PiperOrigin-RevId: 575911998
…ow/core/tpu/ops:sparse_core_ops.

These dependencies are unnecessary, and they can introduce duplicate op wrappers when we generate Python bindings.

PiperOrigin-RevId: 575921758
Trying to get these in sync with the current jobs, which are failing,
probably because of the wrong .so being used.

PiperOrigin-RevId: 575978433
tdanyluk and others added 14 commits October 30, 2023 11:43
…ach fusion

This is needed for xla_gpu_deterministic_ops to function reliably - as tested by determinism_test.cc.

PiperOrigin-RevId: 577906200
PiperOrigin-RevId: 577906242
… SetPjRtCBufferToTensor).

PiperOrigin-RevId: 577915709
PiperOrigin-RevId: 577915784
CreateTensorProto() that allows data to be loaded from
non-std::vector sources, which is often more efficient.

std::vector<bool> makes the implementation of internal::CreateTensorProto
a bit messier than it would otherwise be, because it is not
representable as an absl::Span.

PiperOrigin-RevId: 577926897
In StreamExecutor we rely on a thread-local hack to pass the workspace to cuBLAS; long term, we need to make the workspace explicit in all BLAS APIs.

Without user-managed workspace cuBLAS adds memory allocation nodes when captured into CUDA graphs, and that triggers graph update bugs in CUDA.

PiperOrigin-RevId: 577928686
… the dot

1. Reduce block sizes if they are bigger than NextPowerOf2(tensor size).
2. Reduce split_k to NextPowerOf2(k)/block_k.
3. Remove duplicates created by the previous steps.

This CL is not intended to cause any performance regression.
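
The three steps above can be sketched as follows (the `(block_m, split_k)` config tuples and function names are hypothetical simplifications of the real autotuner configs):

```python
def next_power_of_2(n):
    # Smallest power of two >= n, for n >= 1.
    return 1 << (n - 1).bit_length()

def prune_configs(configs, m, k, block_k):
    # configs: list of (block_m, split_k) tuples; a hypothetical,
    # simplified stand-in for the real autotuner config struct.
    clamped = []
    for block_m, split_k in configs:
        # Step 1: reduce block sizes bigger than NextPowerOf2(tensor size).
        block_m = min(block_m, next_power_of_2(m))
        # Step 2: reduce split_k to NextPowerOf2(k) / block_k.
        split_k = min(split_k, next_power_of_2(k) // block_k)
        clamped.append((block_m, split_k))
    # Step 3: remove duplicates created by the previous steps, keeping order.
    return list(dict.fromkeys(clamped))

result = prune_configs([(128, 16), (64, 16), (64, 4)], m=64, k=256, block_k=32)
assert result == [(64, 8), (64, 4)]
```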

PiperOrigin-RevId: 577941843
@rahulbatra85 rahulbatra85 force-pushed the develop-upstream-sync-231030 branch from 669ff45 to 34f29b8 on November 2, 2023 18:03
```
@@ -496,7 +497,7 @@ def f(x):
}
}
"""

'''
```

@i-chaochen i-chaochen Nov 3, 2023


Could you use test.is_built_with_rocm() instead of just commenting the code out? Also, in your tracking ticket https://github.com/ROCmSoftwarePlatform/frameworks-internal/issues/6264, could you add the newly added upstream commit for tracking? Thanks
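
The suggested guard could look like the sketch below, with a stand-in for TensorFlow's `test.is_built_with_rocm()` (the test case itself is hypothetical):

```python
import unittest

def is_built_with_rocm():
    # Stand-in for TensorFlow's test.is_built_with_rocm(); in a real TF
    # build this reflects whether the binary was compiled for ROCm.
    return False

class ExampleTest(unittest.TestCase):
    def test_gpu_feature(self):
        # Skip on ROCm builds instead of commenting the test body out, so
        # the test keeps running on other platforms.
        if is_built_with_rocm():
            self.skipTest("Feature not yet supported on ROCm builds.")
        self.assertEqual(2 + 2, 4)

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest))
assert result.wasSuccessful()
```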

Rahul Batra added 3 commits November 3, 2023 16:03
	//tensorflow/compiler/tests:xla_call_module_no_platform_check_test_gpu
	//tensorflow/compiler/tests:xla_call_module_no_shape_assertions_check_test_gpu
	//tensorflow/compiler/tests:xla_call_module_test_gpu
	//tensorflow/python/eager:context_test_gpu
	//tensorflow/python/ops/memory_tests:custom_gradient_memory_test_gpu
@rahulbatra85 rahulbatra85 force-pushed the develop-upstream-sync-231030 branch from 34f29b8 to 0c62bc3 on November 3, 2023 16:06
@rahulbatra85
Author

retest Ubuntu-CPU please!

1 similar comment
@rahulbatra85
Author

rahulbatra85 commented Nov 3, 2023

retest Ubuntu-CPU please!

@i-chaochen

retest Ubuntu-CPU please

@rahulbatra85 rahulbatra85 merged commit 5351bd5 into develop-upstream Nov 7, 2023
2 checks passed