Callback #1
Conversation
PiperOrigin-RevId: 261242655
…ariable mismatch. PiperOrigin-RevId: 261245922
PiperOrigin-RevId: 261246913
PiperOrigin-RevId: 261246974
RELNOTES=In TensorFlow 2, layers now default to float32 and automatically cast their inputs to the layer's dtype. If you have a model that used float64, it will probably silently use float32 in TensorFlow 2, and a warning will be issued that starts with "Layer <layer-name> is casting an input tensor from dtype float64 to the layer's dtype of float32". To fix this, either set the default dtype to float64 with `tf.keras.backend.set_floatx('float64')`, or pass `dtype='float64'` to each of the Layer constructors. See `tf.keras.layers.Layer` for more information. PiperOrigin-RevId: 261249415
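As a quick illustration of the two fixes named above (the model shape and layer sizes here are invented for the example; either fix alone is sufficient, both are shown for completeness):

```python
import tensorflow as tf

# Option 1: change the global default dtype back to float64.
tf.keras.backend.set_floatx('float64')

# Option 2: pass dtype='float64' to each Layer constructor explicitly.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', dtype='float64'),
    tf.keras.layers.Dense(1, dtype='float64'),
])

# float64 inputs are now kept in float64 instead of being cast to float32.
x = tf.random.normal([8, 16], dtype=tf.float64)
print(model(x).dtype)  # <dtype: 'float64'>
```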
…response, get the status from the response and clean up all remaining calls in the queue. PiperOrigin-RevId: 261252523
For the fully_connected layer, we have seen FakeQuant* ops with 16 bits used in training models, so we should add this to the op type constraint in order to quantize these models. PiperOrigin-RevId: 261253563
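For reference, a minimal sketch of a 16-bit fake-quant op at the TensorFlow Python level (the tensor values and min/max bounds are made up for the demo; `tf.quantization.fake_quant_with_min_max_args` accepts num_bits from 2 to 16):

```python
import tensorflow as tf

# Example weights feeding a fully-connected layer (values are arbitrary).
weights = tf.constant([[-1.5, 0.25], [0.75, 1.5]])

# FakeQuant with num_bits=16 simulates 16-bit quantization during training;
# ops like this are what the op type constraint above needs to accept.
fq = tf.quantization.fake_quant_with_min_max_args(
    weights, min=-1.5, max=1.5, num_bits=16)
print(fq)
```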
PiperOrigin-RevId: 261267068
…rite call. In a recent change, this rewrite pattern was moved to the second greedy pattern rewrite call, but the TF FakeQuant ops are constant-folded in the first greedy pattern rewrite call, so the narrow_range and bit_width attributes from the TF FakeQuant ops were missing. We failed to test this case because the default values in the following passes match the TOCO requirement. This patch restores the original behavior and uses the TFL quantize and dequantize ops to preserve this information. This patch also fixes a related bug where the generated patterns were not applied in the second greedy pattern rewrite call, so the TF transpose/reshape ops were not lifted to enable constant folding; the generated patterns are now added to the second greedy pattern rewrite call. At the same time, two lifting rules were added so the TFL quantize and dequantize ops are handled. This patch also improves the implementation of the TFL QDQ-inserting pattern. Some related tests are simplified to only check the necessary invariants. PiperOrigin-RevId: 261267567
PiperOrigin-RevId: 261268533
This CL sets narrow_range to true to avoid the value -128 in int8 quantization, so the weight values range only over [-127, 127]. This enables faster runtime arithmetic kernels on ARM NEON. For uint8 quantization, 128 is subtracted from the quantized values and zero points, after which the int8 kernels can be used, so narrow_range for weights is set to true there as well. Note that the FakeQuant* ops for "weights" inserted in all the existing models already have narrow_range set to true, so this CL just makes it consistent for all the weights in the model. TOCO implements the same logic in its ensure_uint8_weights_safe_for_fast_int8_kernels pass. This optimization is very specific to the ARM architecture, so a TODO is added to make it configurable. Activations shouldn't use narrow_range; they should use the full range instead. PiperOrigin-RevId: 261269551
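A minimal numeric sketch of the narrow-range scheme described above, in NumPy with made-up weight values (not the actual TOCO/TFLite code path):

```python
import numpy as np

weights = np.array([-0.5, 0.0, 0.25, 0.5], dtype=np.float32)

# Symmetric int8 quantization with narrow_range=True: quantized values
# are restricted to [-127, 127]; the value -128 is never produced.
scale = np.abs(weights).max() / 127.0
q_int8 = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# For uint8 quantization, subtracting 128 from the quantized values and
# the zero point maps them onto int8, so the fast int8 kernels can be
# reused; the narrow range keeps the shifted values inside int8 bounds.
zero_point = 128
q_uint8 = (q_int8.astype(np.int16) + zero_point).astype(np.uint8)
q_roundtrip = (q_uint8.astype(np.int16) - 128).astype(np.int8)

assert (q_roundtrip == q_int8).all()
print(q_int8, q_uint8)
```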
PiperOrigin-RevId: 261269625
…rands with types matching the output type. This will allow instances not following these requirements to be printed so that they can be read back correctly. Also, updated the parser to parse the long form as well as the short form. A similar pattern of marking all but the first two operands as control inputs exists for other ops like NextIterationSinkOp, SwitchOp, and SwitchNOp, but those ops expect only two data operands, so no changes are required for them. PiperOrigin-RevId: 261272248
…MergeOp verifier. Variant types may have opaque subtype info that needs to match. Also, added a constraint that all data operands and the output of the MergeOp are of tensor type. PiperOrigin-RevId: 261277322
PiperOrigin-RevId: 261284396
PiperOrigin-RevId: 261284400
PiperOrigin-RevId: 261289882
Note that the cache key contains PyObject* and is therefore not easily reusable from other languages.

CPU

| Benchmark                       | Before (calls/sec) | After (calls/sec) |
|---------------------------------|--------------------|-------------------|
| benchmark_add_float_scalars     | 96697.1650772      | 122549.093512     |
| benchmark_add_int_scalars       | 100551.000642      | 124905.320251     |
| benchmark_create_float_constant | 269135.927106      | 368643.600035     |
| benchmark_create_int32_constant | 250023.088998      | 347383.13732      |

GPU

| Benchmark                       | Before (calls/sec) | After (calls/sec) |
|---------------------------------|--------------------|-------------------|
| benchmark_add_float_scalars     | 9478.74450315      | 17181.8063021     |
| benchmark_add_int_scalars       | 99584.0439651      | 117965.869066     |
| benchmark_create_float_constant | 275277.007219      | 381577.874818     |

Notes:
* The CPU and GPU timings are not comparable because they were measured on different hardware.
* I suspect that benchmark_add_int_scalars on GPU does the addition on CPU and copies to GPU afterwards, hence the gap between *_add_float_* and *_add_int_*.

PiperOrigin-RevId: 261293772
PiperOrigin-RevId: 261294904
PiperOrigin-RevId: 261296066
PiperOrigin-RevId: 261306157
and wrong opt set (RUY_OPT_INTRINSICS, not RUY_OPT_ASM, there is no asm here). PiperOrigin-RevId: 261314107
PiperOrigin-RevId: 261318771
Don't use nullness of local_packed or packing_status array pointers to determine whether a side is pre-packed: use params->is_prepacked for that. Make local_packed a member of TrMulTask so we don't need to pass it around explicitly. PiperOrigin-RevId: 261319667
in the single-thread case. PiperOrigin-RevId: 261320407
As the signed index is verified to be >= 0 at the point of comparison with the unsigned size, we can make the comparison explicitly unsigned by casting the index. This also avoids a -Wsign-compare warning where enabled. PiperOrigin-RevId: 261321178
…ounters. Saves a store-release and a load-acquire (total ~100 cycles) per matmul. PiperOrigin-RevId: 261321407
The AffineDataCopyGeneration pass relied on command-line flags for internal logic in several places, which made it unusable in a library context (i.e. outside a standalone mlir-opt binary that does the command-line parsing). Define the configuration flags in the constructor instead, and initialize them to the command-line-based defaults to maintain the original behavior. PiperOrigin-RevId: 261322364
PiperOrigin-RevId: 261806721
…tly pass it through in convolution kernel. PiperOrigin-RevId: 261808345
PiperOrigin-RevId: 261809474
PiperOrigin-RevId: 261816030
… directive. This allows for proper forward declaration, as opposed to leaking the internal implementation via a using directive. This also allows for all pattern building to go through 'insert' methods on the OwningRewritePatternList, replacing uses of 'push_back' and 'RewriteListBuilder'. PiperOrigin-RevId: 261816316
PiperOrigin-RevId: 261816763
PiperOrigin-RevId: 261816972
PiperOrigin-RevId: 261820730
The input_length arg is passed as the maximum_iterations arg to tf.while_loop, which adds a LogicalAnd to the loop condition; this is slow on GPU. PiperOrigin-RevId: 261822039
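For context, a hedged sketch of the `tf.while_loop` API involved (the loop body and bounds here are invented for the example):

```python
import tensorflow as tf

i0 = tf.constant(0)

# Plain loop: the condition is just `i < 10`.
r_plain = tf.while_loop(lambda i: i < 10, lambda i: i + 1, [i0])

# With maximum_iterations, TensorFlow also tracks an iteration counter and
# ANDs its bound into the loop condition (the LogicalAnd mentioned above),
# which is what was observed to be slow on GPU.
r_capped = tf.while_loop(lambda i: i < 10, lambda i: i + 1, [i0],
                         maximum_iterations=5)
```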
PiperOrigin-RevId: 261823955
PiperOrigin-RevId: 261828878
PiperOrigin-RevId: 261840270
Many LLVM transformations benefit from knowing the target. This enables optimizations, especially in a JIT context where the target is (generally) well known. Closes tensorflow/mlir#49 COPYBARA_INTEGRATE_REVIEW=tensorflow/mlir#49 from dcaballe:dcaballe/tti ab02f72eb326f660945696e5dadeeb983cf263b3 PiperOrigin-RevId: 261840617
It is now tensorflow/core/platform:platform PiperOrigin-RevId: 261843350
No functionality changes. Lessons learned:
1. Some protobuf messages are large, e.g. FunctionDef.
   - Solution: allocate them on the heap instead.
2. Sometimes the compiler inlines functions, so an inlined function's stack frame gets merged into the caller's stack frame, and we end up with a caller function with a large stack frame. This is caught by inspecting the assembly code.
   - Solution: add TF_ATTRIBUTE_NOINLINE to those inlined functions.
PiperOrigin-RevId: 261851076
PiperOrigin-RevId: 2618565
PiperOrigin-RevId: 261857381
…rnal to Google. Most tests were already being run with XLA. This primarily ensures that any new tests will also be run with XLA in the future. Some contrib/ tests are disabled for XLA because there are no guarantees on contrib being supported. PiperOrigin-RevId: 261861931
PiperOrigin-RevId: 261867752
PiperOrigin-RevId: 261867753
PiperOrigin-RevId: 261887312
This CL modifies the LowerLinalgToLoopsPass to use RewritePattern. This will make it easier to inline Linalg generic functions and regions when emitting to loops in a subsequent CL. PiperOrigin-RevId: 261894120
This CL extends the Linalg GenericOp with an alternative way of specifying the body of the computation based on a single block region. The "fun" attribute becomes optional. Either a SymbolRef "fun" attribute or a single block region must be specified to describe the side-effect-free computation. Upon lowering to loops, the new region body is inlined in the innermost loop. The parser, verifier and pretty printer are extended. Appropriate roundtrip, negative and lowering to loop tests are added. PiperOrigin-RevId: 261895568
fsx950223 pushed a commit that referenced this pull request on May 8, 2021:
Prototype showed significant dispatch performance improvements from the new backend. This is the first of a series of commits to add a new PJRT backend. The intention is to eventually replace the existing StreamExecutor-based CPU backend. PiperOrigin-RevId: 367514967 Change-Id: I16c9523b604445015125ad2e42fd8822ec0c38c5
fsx950223 pushed a commit that referenced this pull request on Nov 28, 2023:
On some CI nodes (typically those with higher CPU core counts, 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermittent basis. When it does fail, the failure manifests as a segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points to a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN-based Eigen contraction kernels (for ROCm) seems to make the crash go away.

related JIRA ticket - https://ontrack-internal.amd.com/browse/SWDEV-313684

A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit-test only in the CPU unit-tests CI job (for the same reason). That commit cannot be reverted, because this commit disables the MKLDNN-based Eigen contraction kernels *only* for the ROCm build.

```
Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault.
[Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))]
0x00007fff54530000 in ?? ()
(gdb) where
#0  0x00007fff54530000 in ?? ()
#1  0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#2  0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#3  0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#4  0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#5  0x00007fffd52fe983 in dnnl_sgemm () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#6  0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) ()
#7  0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) ()
#8  0x000055555718f327 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) ()
#9  0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) ()
#10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) ()
#11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```