[Development] Batch renderer integration #1200

Merged
merged 35 commits into from
Apr 4, 2023

Contributor

@0mdc 0mdc commented Mar 14, 2023

Motivation and Context

This changeset adds a bare-bones integration of the batch renderer.

Context:

To scale up training, multiple concurrent workers are instantiated. Right now, each worker has its own isolated renderer, and these renderers independently load the graphics assets they require. In most cases, the assets loaded by the workers overlap significantly, which fills up GPU memory to the point that it limits how many concurrent simulators can run. Loading time also adds up to a significant proportion of training time.

The batch renderer aims to circumvent this by rendering all environments simultaneously. Instead of loading assets independently, all graphics assets that will be used during a roll-out are pre-loaded exactly once, at the beginning of training. This leads to smaller GPU memory usage and faster episode loading time.

Batch rendering also increases performance by rendering more efficiently (fewer draw calls, more instancing, ...), leveraging data locality and minimizing concurrent GPU contexts.

How it works:

Internally, the system is a replay renderer, meaning that it renders gfx-replay keyframes emitted by simulators.
When batch rendering, simulators produce keyframes and add them to observations. In "post_step", the renderer aggregates these observations, reconstitutes each graphical state, then renders them all simultaneously.

  1. Step:
    • Simulators step physics and skip rendering.
    • Each simulator records a gfx-replay keyframe and adds it to its observations.
  2. Post-step:
    • The batch renderer aggregates observations from all workers associated with its GPU.
    • It uses gfx-replay keyframes to reconstruct the graphical state.
    • The batch renderer renders all sensors simultaneously and emplaces results in observations.
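
The two phases above can be sketched with toy stand-ins. This is only an illustration of the data flow, not the code in this PR: `ToySimulator`, `ToyBatchRenderer`, and the `"keyframe"` and `"rgb"` observation keys are hypothetical names, and a placeholder string stands in for a rendered image.

```python
from typing import Any, Dict, List

KEYFRAME_KEY = "keyframe"  # hypothetical observation key, not the real constant
RGB_KEY = "rgb"            # hypothetical name for the single color sensor


class ToySimulator:
    """Stand-in for a simulator worker: steps its state and skips rendering."""

    def __init__(self, env_id: int) -> None:
        self.env_id = env_id
        self.step_count = 0

    def step(self) -> Dict[str, Any]:
        self.step_count += 1
        # A real keyframe is a gfx-replay record; a small dict stands in here.
        keyframe = {"env": self.env_id, "step": self.step_count}
        return {KEYFRAME_KEY: keyframe}


class ToyBatchRenderer:
    """Stand-in for the batch renderer serving all workers on one GPU."""

    def post_step(self, observations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # 1. Aggregate the keyframes from all workers, removing them from observations.
        keyframes = [obs.pop(KEYFRAME_KEY) for obs in observations]
        # 2. "Render" every environment in one pass (placeholder string per env).
        images = [f"image(env={kf['env']}, step={kf['step']})" for kf in keyframes]
        # 3. Emplace the results back into the observations.
        for obs, image in zip(observations, images):
            obs[RGB_KEY] = image
        return observations


sims = [ToySimulator(i) for i in range(3)]
renderer = ToyBatchRenderer()
observations = renderer.post_step([sim.step() for sim in sims])
```

The point of the split is that the simulators never touch the GPU: they only emit keyframes, and a single renderer consumes them all at once.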

How Has This Been Tested:

Tested locally and on CI.

Notes:

Supersedes: #863

Current limitations:

  • Only GPU-to-CPU works.
  • Only one color sensor is supported.
  • Not yet integrated with training.

Depends on:

Checklist

  • My code follows the code style of this project.
  • I have updated the documentation if required.
  • I have read the CONTRIBUTING document.
  • I have completed my CLA (see CONTRIBUTING)
  • I have added tests to cover my changes if required.

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Mar 14, 2023
Contributor

@eundersander eundersander left a comment

I did a quick skim and added some comments on this draft as you requested! Looking good so far!

@0mdc 0mdc marked this pull request as ready for review March 16, 2023 20:55
Contributor

@eundersander eundersander left a comment

I left some comments. Looking good so far!

@@ -1462,6 +1472,7 @@ class HabitatConfig(HabitatBaseConfig):
    task: TaskConfig = MISSING
    dataset: DatasetConfig = MISSING
    gym: GymConfig = GymConfig()
    batch_renderer: BatchRendererConfig = BatchRendererConfig()
Contributor

This is oddly specific. @vincentpierre can you weigh in?

Maybe a more generic RendererConfig would be better? For example, I need a place to add a bunch of tuning knobs related to SSAO soon (a graphical effect).

Or perhaps all this should go inside SimulatorConfig? Nowadays, we usually think of the sim and renderer as separate, but from a Gym perspective they are both part of the "simulator" (including camera sensor simulation, i.e. rendering).

There are a bunch of tuning knobs already related to rendering inside SimulatorConfig, right? e.g. camera sensor fov and resolution.

Contributor Author

@0mdc 0mdc Mar 17, 2023

I think that it would make sense to nest RendererConfig into SimulatorConfig, and include renderer-specific parameters in there.
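
A minimal sketch of that nesting, using plain dataclasses. The `renderer` field name and the defaults are assumptions made for illustration; `enable_batch_renderer` and `classic_replay_renderer` are the options discussed in this PR, and the real `SimulatorConfig` has many more fields than shown here.

```python
from dataclasses import dataclass, field


@dataclass
class RendererConfig:
    # Renderer-specific options live here; defaults are assumptions.
    enable_batch_renderer: bool = False
    classic_replay_renderer: bool = False


@dataclass
class SimulatorConfig:
    # Only the nested renderer config is shown (hypothetical field name).
    renderer: RendererConfig = field(default_factory=RendererConfig)


config = SimulatorConfig()
config.renderer.enable_batch_renderer = True
```

Using `default_factory` gives each `SimulatorConfig` its own `RendererConfig` instance, so changing one config does not affect another.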

    ) as envs:
        envs.initialize_batch_renderer(configs[0])
        observations = envs.reset()
        envs.post_step(observations)
Contributor

Ah, ok, so you expect the owner of VectorEnv (the calling code) to remember to call post_step in all the right places? And that code isn't in this PR?

Contributor

It looks like my old draft PR did the same thing, and I was calling post_step in ppo_trainer.py:
https://github.com/facebookresearch/habitat-lab/pull/863/files#diff-2e3ecd82d2bf278a7c6b6b6e50f6e189c2a7d35ce1e947c97b6d8bf82cea3b8b

Is this planned for a follow-up PR? We can't do any training with this until calls to post_step are integrated, right?

Contributor Author

> Ah, ok, so you expect the owner of VectorEnv (the calling code) to remember to call post_step in all the right places?

Yes. An alternative would be to call post_step automatically after reset or step, but the user would still need to remember to call post_step manually when the reset_at or step_at variants are used. I'm not a fan of this because we would need to unwrap the env outputs to get the observations, then pack them again after post-processing.

I opted to keep the call separated, and refactor later if we think that this is a better idea.

> Is this planned for a follow-up PR? We can't do any training with this until calls to post_step are integrated, right?

Yes, I'm planning to add integration with rearrange sim and ppo_trainer.py immediately after.
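
For reference, the automatic alternative would look roughly like the sketch below. It assumes each env returns an `(observations, reward, done, info)` tuple from `step`; `step_with_post_step` and the fake callables are hypothetical helpers, not part of this PR.

```python
from typing import Any, Callable, Dict, List, Tuple

Observation = Dict[str, Any]
# Assumed per-env step output: (observations, reward, done, info).
StepResult = Tuple[Observation, float, bool, Dict[str, Any]]


def step_with_post_step(
    step_fn: Callable[[], List[StepResult]],
    post_step_fn: Callable[[List[Observation]], List[Observation]],
) -> List[StepResult]:
    """Step all envs, then run post_step on the observations only."""
    results = step_fn()
    # Unwrap: pull the observations out of each result tuple.
    observations = [result[0] for result in results]
    observations = post_step_fn(observations)
    # Repack: rebuild each tuple around the post-processed observations.
    return [(obs, *rest) for obs, (_, *rest) in zip(observations, results)]


def fake_step() -> List[StepResult]:
    return [({"keyframe": "k0"}, 1.0, False, {}), ({"keyframe": "k1"}, 0.5, True, {})]


def fake_post_step(obs_list: List[Observation]) -> List[Observation]:
    return [{"rgb": "img:" + obs["keyframe"]} for obs in obs_list]


out = step_with_post_step(fake_step, fake_post_step)
```

The unwrap and repack steps are the overhead referred to above; keeping `post_step` as an explicit call avoids them.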

Contributor

@eundersander eundersander Mar 20, 2023

> if the reset_at or step_at variants are used

I don't think these are widely used. We should print a warning or fatal error if folks try to use these with the batch renderer.

@vincentpierre @mathfac what do you think about marking reset_at and step_at as deprecated? These are not batch-friendly and we should discourage their use.
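
A rough sketch of what such a guard could look like, assuming a flag on the vector env that records whether the batch renderer is active. `ToyVectorEnv`, `_batch_renderer_enabled`, and `_guard_single_env_call` are hypothetical names used only for illustration.

```python
import warnings


class ToyVectorEnv:
    """Toy stand-in for VectorEnv, showing only the guard logic."""

    def __init__(self, batch_renderer_enabled: bool) -> None:
        self._batch_renderer_enabled = batch_renderer_enabled

    def _guard_single_env_call(self, name: str) -> None:
        if self._batch_renderer_enabled:
            # Fatal error: per-env calls bypass the batched post_step.
            raise RuntimeError(f"{name} is not supported with the batch renderer")
        # Otherwise only discourage use, since the call is not batch-friendly.
        warnings.warn(
            f"{name} is deprecated; it is not batch-friendly",
            DeprecationWarning,
            stacklevel=3,
        )

    def step_at(self, index: int, action: int) -> str:
        self._guard_single_env_call("step_at")
        return f"stepped env {index} with action {action}"
```

`stacklevel=3` makes the warning point at the caller of `step_at` rather than at the guard itself.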

Contributor

Also, I'm okay with requiring user code to call post_step explicitly.

@0mdc 0mdc force-pushed the batch-renderer branch 2 times, most recently from 4c69e94 to 4383288 Compare March 21, 2023 18:58
Contributor Author

0mdc commented Mar 21, 2023

The build failures are due to the batch renderer context being leaked between tests. I'm investigating this.

Edit: Fixed here: facebookresearch/habitat-sim#2043

Contributor

@vincentpierre vincentpierre left a comment

I would like someone with more knowledge of the sim batch renderer to weigh in on this PR. The code looks good to me otherwise. A lot of comments, which I appreciate.

    :property classic_replay_renderer: For debugging. Create a ClassicReplayRenderer instead of BatchReplayRenderer when enable_batch_renderer is active.
    """

    enable_batch_renderer: bool = False
Contributor

Is the goal to have this be True at some point?

Contributor Author

Yes. However, we'll reach feature parity with the legacy renderer before doing so.

@@ -290,6 +291,97 @@ def test_rl_vectorized_envs(gpu2gpu):
), "dones should be true after max_episode steps"


@pytest.mark.parametrize("classic_replay_renderer", [False, True])
@pytest.mark.parametrize("gpu2gpu", [False])
def test_rl_vectorized_envs_batch_renderer(
Contributor

Love this test. Would it be possible to add an end-to-end test as well? I am thinking of adding batch_rendering to some of the training tests we have in test_baselines_training.py.

Contributor Author

I'll address this in the following PRs that will introduce training.

        will _always_ be false after :meth:`reset` or :meth:`get_observations_at` as neither of those
        result in an action (step) being taken.
        """
        return self._prev_sim_obs.get("collided", False)

    def add_keyframe_to_observations(self, observations):
Contributor

Not a fan of adding batch_rendering specific code to habitat_simulator. But I guess we have to.

Contributor Author

Another option would be to include this in habitat-sim's simulator.py class. However, this relates to functionality that entirely resides within habitat-lab, and may cause headaches to the uninitiated.

Contributor

@eundersander eundersander left a comment

Looks like you addressed all my earlier comments, except the one where I commented below just now. Try to resolve that and then this looks good to me!

Contributor

@eundersander eundersander left a comment

Another minor comment.

@0mdc 0mdc merged commit 4d9a3b9 into main Apr 4, 2023
@0mdc 0mdc deleted the batch-renderer branch May 29, 2023 14:28
dannymcy pushed a commit to dannymcy/habitat-lab that referenced this pull request Jul 8, 2024
* Add batch renderer configuration fields.

* Set simulator configuration flags for batch rendering.

* Add BatchRenderVectorEnv.

* Add BatchRenderVectorEnv creation.

* Add type annotations.

* Add batch renderer class and load it.

* Add render_state to sim observations when batch rendering.

* Add batch rendered vector env test.

* Add gpu-to-cpu batch rendering implementation.

* Make test_rl_batch_render_envs execute gpu-to-cpu only flow only.

* Clean up batch renderer.

* Code clean-up.

* Simplify transfer buffer.

* Update docstrings.

* Create a file containing batch rendering constants. Move hardcoded string to constants.

* Formatting fix.

* Change render function name to match new habitat-sim api.

* Rename render state to keyframe. Rename batch vector env render function.

* Assert config instead of automatically correcting it.

* Remove BatchRendererVectorEnv. Directly use batch renderer from VectorEnv instead.

* Format fix

* Create batch renderer config. Move composite file specification in it.

* Formatting fixes.

* Rename BatchRenderer to EnvBatchRenderer and BatchRendererConfig to RendererConfig. Add asserts.

* Move enable_batch_renderer into RendererConfig.

* Formatting fixes.

* Fix composite file config path change.

* Compare rgb image produced by reset in batch env test.

* Add classic replay renderer option.

* Add classic replay renderer test.

* Review pass.

* Change replayer renderer render call to match main.

* Fix package import for new directory.

* Fix CI module import issue.

* Change condition for assertion.
HHYHRHY pushed a commit to SgtVincent/habitat-lab that referenced this pull request Aug 31, 2024