
Parallelize detensorizing token IDs during tree search #3730

Merged 1 commit into master from tok-id-speedup on Jun 18, 2021

Conversation

@EricMichaelSmith (Contributor) commented on Jun 16, 2021

Patch description
Currently, during any kind of tree search, the token ID tensor will be detensorized one item at a time, which is expensive. This PR rewrites that to detensorize the whole batch at once.
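For illustration, a minimal sketch of the general pattern (not the actual ParlAI diff; the tensor name and shapes here are assumptions):

```python
import torch

# Hypothetical batch of token IDs produced during tree search:
# (batch size, sequence length). Names and shapes are illustrative.
token_ids = torch.randint(0, 100, (32, 20))

# Before: detensorize one item at a time -- one .item() call
# (and, on GPU, one host/device sync) per token.
slow = [
    [token_ids[i, j].item() for j in range(token_ids.size(1))]
    for i in range(token_ids.size(0))
]

# After: detensorize the whole batch with a single .tolist() call.
fast = token_ids.tolist()

assert slow == fast
```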

Performance on 1000 generations (`-mf zoo:blender/blender_90M/model -t blended_skill_talk -ne 1000`), on an otherwise unoccupied devfair:

  • Original code, trial 1: median elapsed time 783 ms, mean 797 ms
  • Original code, trial 2: median 770 ms, mean 785 ms
  • This PR, trial 1: median 660 ms, mean 671 ms
  • This PR, trial 2: median 656 ms, mean 667 ms

That is, roughly a 15% reduction in both median and mean elapsed time.

Testing steps
CI checks

@stephenroller (Contributor) left a comment:


so good.

@stephenroller (Contributor)

Do we need to port this change to TorchScript?

@EricMichaelSmith (Contributor, Author) commented on Jun 17, 2021

> Do we need to port this change to TorchScript?

@stephenroller Hmm, doesn't look like it, because TorchScript export currently only supports greedy search.

@EricMichaelSmith merged commit d02d864 into master on Jun 18, 2021
@EricMichaelSmith deleted the tok-id-speedup branch on Jun 18, 2021 at 20:28