To align with the HuggingFace transformer layer that requires dense inputs, we convert ragged inputs to dense before calling the TransformerBlock. As a result of this conversion, the outputs are also dense.
This approach can be costly because logit scores are computed for all positions, including the padded ones. The cost is most visible in the weight-tying step, where the hidden representation at every position is multiplied by the embeddings of all items.
It would be helpful to convert the output of the transformer block to a ragged format, which would eliminate the need for padding and avoid unnecessary computation.
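To illustrate the saving, here is a minimal NumPy sketch, not the library's implementation. It assumes right-padded sequences and known per-row lengths; the function name `tied_logits_without_padding` and all shapes are hypothetical. It drops padded positions before the weight-tied projection and returns row offsets so the result can be read as a ragged batch.

```python
import numpy as np


def tied_logits_without_padding(hidden, seq_lengths, item_embeddings):
    """Project only the non-padded positions onto the item embeddings.

    hidden:          (batch, max_len, dim) dense transformer output
    seq_lengths:     (batch,) true length of each sequence (right-padding assumed)
    item_embeddings: (num_items, dim) weight-tied embedding table
    """
    max_len = hidden.shape[1]
    # mask[b, t] is True when position t of row b holds a real item.
    mask = np.arange(max_len)[None, :] < seq_lengths[:, None]
    # Boolean indexing flattens the valid positions to (total_valid, dim).
    valid_hidden = hidden[mask]
    # Weight tying: one matmul over valid positions only.
    logits = valid_hidden @ item_embeddings.T
    # Row offsets, so rows row_splits[b]:row_splits[b + 1] belong to sequence b.
    row_splits = np.concatenate([[0], np.cumsum(seq_lengths)])
    return logits, row_splits


rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 3))        # batch of 2, padded to length 4
seq_lengths = np.array([2, 4])             # first sequence has 2 padded slots
item_embeddings = rng.normal(size=(5, 3))  # 5 items

logits, row_splits = tied_logits_without_padding(hidden, seq_lengths, item_embeddings)
```

Here the projection runs over 6 positions instead of 8. In TensorFlow, `tf.RaggedTensor.from_tensor(dense, lengths=seq_lengths)` performs a comparable dense-to-ragged conversion; whether it fits at the output of the TransformerBlock is an assumption to verify.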