BeamSearchDecoder segmentation fault (on GPU)

**System information**
- OS Platform and Distribution: Manjaro Linux testing
- TensorFlow version: pypi tf-nightly 2.2.0.dev20200218
- TensorFlow-Addons version: pypi tfa-nightly 0.9.0.dev20200219
- Python version: 3.7.6
- Is GPU used? (yes/no): yes, Nvidia Titan XP

**Describe the bug**

Calling a BeamSearchDecoder results in a _Segmentation fault (core dumped)_.

**Code to reproduce the issue**
```python
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.python.ops import array_ops
import os

os.environ['CUDA_VISIBLE_DEVICES'] = '0'
# os.environ['AUTOGRAPH_VERBOSITY'] = '10'

cell = tf.keras.layers.LSTMCell(3)
mechanism = tfa.seq2seq.LuongAttention(units=3)
cell = tfa.seq2seq.AttentionWrapper(
    cell=cell,
    attention_mechanism=mechanism)

embedding_layer = tf.keras.layers.Embedding(
            input_dim=3,
            output_dim=3)

decoder = tfa.seq2seq.BeamSearchDecoder(
    cell=cell,
    beam_width=10,
    embedding_fn=embedding_layer,
    maximum_iterations=8)

dataset = tf.data.Dataset.from_tensor_slices(tf.ones((100, 7, 3))).batch(2)
my_iterator = iter(dataset)

@tf.function
def decode(it):

    data = next(it)
    bs = tf.shape(data)[0]

    tiled_memory = tfa.seq2seq.tile_batch(data, multiplier=10)
    mechanism.setup_memory(tiled_memory)

    attention_state = cell.get_initial_state(batch_size=bs*10, dtype=tf.float32)

    return decoder(
        embedding=None,
        start_tokens=array_ops.fill([2], 1),
        end_token=2,
        initial_state=attention_state,
    )
print(10*'=' + ' start ' + 10*'=')
print(decode(my_iterator))
print(10*'=' + ' end ' + 10*'=')

```

**Other info / logs**

[Code on colab](https://colab.research.google.com/drive/1WNMCyMiVK4l9vce2qaWbUm0MZ-n6VJ0g)
Not sure if this is related to https://github.com/tensorflow/addons/issues/990, as the issue targets tensorflow-cpu.

In eager mode the code still crashes with the following output:
```
========== start ==========
(FinalBeamSearchDecoderOutput(predicted_ids=<tf.Tensor: shape=(2, 8, 10), dtype=int32, numpy=
[...................]
========== end ==========
corrupted size vs. prev_size
Aborted (core dumped)
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

BeamSearchDecoder segmentation fault (on GPU) #1109

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

BeamSearchDecoder segmentation fault (on GPU) #1109

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions