
Update to barracuda 1.3.3 and changes to the model inputs and outputs for LSTM #5236

Merged
merged 10 commits into from
Apr 13, 2021

Conversation

vincentpierre
Contributor

@vincentpierre vincentpierre commented Apr 8, 2021

Proposed change(s)

In Barracuda 1.3.3-preview, the LSTM module no longer uses the special `_c` and `_h` inputs and outputs. This means we now have to feed the LSTM data manually, like any other input/output.
This PR contains:

  • Changes to the LSTM module in PyTorch (additional transposes for the ONNX export)
  • Changes to the memory applier and generator
  • A new model version called `MLAgents_2_0_Recurrent`
  • Changes to the `ModelLoader` to use this new model serialization
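To make the first point concrete, here is a minimal sketch of what "feed the LSTM data like any other input/output" means on the inference side. The tensor names `recurrent_in`/`recurrent_out` and the dummy model are hypothetical placeholders, not the actual ML-Agents identifiers:

```python
import numpy as np

def inference_step(run_model, obs, memories):
    # With the special "_h"/"_c" inputs gone, the recurrent state is fed
    # as an ordinary named input and read back as an ordinary named output.
    outputs = run_model({"obs_0": obs, "recurrent_in": memories})
    # The caller stores the returned memories and feeds them back next step.
    return outputs["action"], outputs["recurrent_out"]

def dummy_model(feeds):
    # Stand-in for Barracuda execution: zero actions, memories passed through
    # with an increment so the round-trip is observable.
    return {"action": np.zeros(2), "recurrent_out": feeds["recurrent_in"] + 1}

action, mem = inference_step(dummy_model, np.zeros(4), np.zeros((1, 1, 8)))
```

The key point is that the generator/applier pair now owns this round-trip explicitly instead of relying on Barracuda's implicit state handling.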

There is a bug in 1.3.3 with `Profiler.BeginSample`; we will probably need to use the next version for this change.
This change makes it impossible to run new models with 1.3.2. We need to address that.

Future releases of Barracuda will break backwards compatibility, meaning that old LSTM models will no longer run with the latest Barracuda (and ML-Agents).

It seems the next release will need to break compatibility both ways: new models with old C#, and new C# with old models.

This table summarizes the compatibility between Barracuda and ML-Agents:

| Generated with ML-Agents | Barracuda 1.3.2 | Barracuda 1.3.3 | Barracuda 1.4.0 | Barracuda 2.0 |
| ------------------------ | --------------- | --------------- | --------------- | ------------- |
| 1.9.0                    | ok              | bad             | ok              | bad           |
| 2.0                      | bad             | ok\* \*\*       | ok              | ok\*          |

\* with this PR's change only
\*\* has console errors due to the profiler
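Purely as an illustration, the table can be encoded as a lookup so the compatibility rules are checkable; the function name and version strings below are placeholders, and the `ok*` entries assume this PR's changes are in:

```python
# Compatibility matrix from the table above (True = ok, False = bad).
COMPAT = {
    ("1.9.0", "1.3.2"): True,
    ("1.9.0", "1.3.3"): False,
    ("1.9.0", "1.4.0"): True,
    ("1.9.0", "2.0"): False,
    ("2.0", "1.3.2"): False,
    ("2.0", "1.3.3"): True,   # with this PR only; profiler console errors
    ("2.0", "1.4.0"): True,
    ("2.0", "2.0"): True,     # with this PR only
}

def is_compatible(mlagents_model_version, barracuda_version):
    # Hypothetical helper: look up whether a model generated with the given
    # ML-Agents version runs under the given Barracuda version.
    return COMPAT[(mlagents_model_version, barracuda_version)]
```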

KNOWN ISSUE: Updating to Barracuda 1.3.3 generates spam error messages in the console when using LSTM. This issue will be resolved in 1.4.0; we need to update to Barracuda 1.4.0 BEFORE the next release.

Useful links (GitHub issues, JIRA tickets, ML-Agents forum threads, etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@vincentpierre vincentpierre requested a review from chriselion April 8, 2021 21:41
@vincentpierre vincentpierre self-assigned this Apr 8, 2021
@@ -206,7 +207,16 @@ def forward(
# We don't use torch.split here since it is not supported by Barracuda
h0 = memories[:, :, : self.hidden_size].contiguous()
c0 = memories[:, :, self.hidden_size :].contiguous()

if exporting_to_onnx.is_exporting():
Contributor

Is the comment above about torch.split still accurate?

Contributor Author

Yes, this corresponds to a slice operator, not a split.

Contributor

I actually think split was added to recent versions of Barracuda, and we could use it here if we're breaking compat with old versions anyway. https://docs.unity3d.com/Packages/com.unity.barracuda@1.3/manual/SupportedOperators.html#Split
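For reference, the slice-vs-split distinction being discussed is easy to check numerically: slicing the packed memory tensor on its last axis (as the `h0`/`c0` code in the hunk does) produces exactly the halves a two-way split would. This is a numpy sketch, not the actual PyTorch/Barracuda code path:

```python
import numpy as np

hidden_size = 4
# Packed memories: (num_layers, batch, 2 * hidden_size), h then c.
memories = np.arange(2 * 1 * 2 * hidden_size, dtype=np.float32).reshape(2, 1, -1)

# Two contiguous slices, mirroring the h0/c0 lines in the diff above.
h0 = memories[:, :, :hidden_size]
c0 = memories[:, :, hidden_size:]

# Equivalent two-way split along the last axis.
h0_split, c0_split = np.split(memories, 2, axis=-1)
```

If old Barracuda versions no longer need to be supported, a single Split op could indeed replace the pair of Slice ops.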

@@ -19,8 +19,9 @@ internal enum ModelApiVersion
{
MLAgents1_0 = 2,
MLAgents2_0 = 3,
MLAgents2_0_Recurrent = 4,
Contributor

Since we haven't released a version that uses MLAgents2_0, I'd be in favor of just defining MLAgents2_0 to contain the LSTM changes too.

Either way, we should describe the differences between the versions here, just for our own internal knowledge.


Yeah, you might consider calling MLAgents2_0 => MLAgents1_9 if that is the model version that went out with the last release.

@vincentpierre vincentpierre marked this pull request as ready for review April 9, 2021 22:51
@vincentpierre vincentpierre changed the title Experiment barr 1.3.3 Update to barracuda 1.3.3 and changes to the model inputs and outputs for LSTM Apr 9, 2021
Co-authored-by: Chris Elion <chris.elion@unity3d.com>
@vincentpierre vincentpierre requested a review from ervteng April 13, 2021 00:18
@@ -643,6 +643,7 @@ def forward(
At this moment, torch.onnx.export() doesn't accept None as tensor to be exported,
so the size of return tuple varies with action spec.
"""

Contributor

Extra line?

# This transpose is needed both at input and output of the LSTM when
# exporting because ONNX will expect (sequence_len, batch, memory_size)
# instead of (batch, sequence_len, memory_size)
h0 = torch.transpose(h0, 0, 1)
Contributor

@ervteng ervteng Apr 13, 2021

I think we should transpose it before the split into (h0, c0) as it will be marginally faster and will be symmetrical with the transpose on the output below. But it's not a dealbreaker.
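A quick sanity check of this suggestion (a numpy sketch, not the PR's code): the export transpose swaps axes 0 and 1 while the `(h0, c0)` slice acts on the last axis, so the two operations commute and transposing before the split gives the same tensors:

```python
import numpy as np

hidden_size = 4
rng = np.random.default_rng(0)
# (batch, num_layers, 2 * hidden_size) packed memories.
memories = rng.standard_normal((2, 3, 2 * hidden_size))

# Transpose first (one op on the packed tensor), then take the h half...
transposed_then_sliced = np.transpose(memories, (1, 0, 2))[:, :, :hidden_size]
# ...versus slicing first and transposing the half afterwards.
sliced_then_transposed = np.transpose(memories[:, :, :hidden_size], (1, 0, 2))
```

Transposing once before the split also means one transpose instead of two, which is the marginal speedup mentioned above.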

@@ -38,7 +39,7 @@ different sizes using the same model. For a summary of the interface changes, pl

### Minor Changes
#### com.unity.ml-agents / com.unity.ml-agents.extensions (C#)
- The `.onnx` models input names have changed. All input placeholders will now use the prefix `obs_` removing the distinction between visual and vector observations. Models created with this version will not be usable with previous versions of the package (#5080)
- The `.onnx` models input names have changed. All input placeholders will now use the prefix `obs_` removing the distinction between visual and vector observations. Models created with this version will not be usable with previous versions of the package (#5080, #5236)
Contributor

Maybe add something about LSTM as well

Contributor

@ervteng ervteng left a comment

Minor comments but otherwise looks fine

@vincentpierre vincentpierre merged commit c6c28b7 into main Apr 13, 2021
@delete-merged-branch delete-merged-branch bot deleted the experiment-barr-1.3.3 branch April 13, 2021 20:36
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 14, 2022