[Development] Batch renderer integration #1200

0mdc · 2023-03-14T20:58:28Z

Motivation and Context

This changeset adds a bare-bones integration of the batch renderer.

Context:

To scale up training, multiple concurrent workers are instantiated. Right now, each worker has its own isolated renderer, and each renderer independently loads the graphics assets it requires. In most cases, the assets loaded by the workers overlap significantly, which fills up GPU memory to the point that it limits how many concurrent simulators can run. Loading time also adds up to a significant proportion of training time.

The batch renderer aims to circumvent this by rendering all environments simultaneously. Instead of loading assets independently, all graphics assets that will be used during a roll-out are pre-loaded exactly once, at the beginning of training. This leads to lower GPU memory usage and faster episode loading.

Batch rendering also increases performance by rendering more efficiently (fewer draw calls, more instancing, ...), leveraging data locality, and minimizing concurrent GPU contexts.

How it works:

Internally, the system is a replay renderer, meaning that it renders gfx-replay keyframes emitted by simulators.
When batch rendering, simulators produce keyframes and add them to observations. In "post_step", the renderer aggregates these observations, reconstitutes each graphical state, then renders them simultaneously.

Step:
- Simulators step physics and skip rendering.
- Each simulator records a gfx-replay keyframe and adds it to its observations.
Post-step:
- The batch renderer aggregates observations from all workers associated with its GPU.
- It uses the gfx-replay keyframes to reconstruct the graphical state.
- The batch renderer renders all sensors simultaneously and emplaces the results in the observations.
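
The step / post-step flow above can be illustrated with a small toy sketch. It does not use habitat-lab or habitat-sim; `ToySimulator`, `ToyBatchRenderer`, and the `toy_keyframe` observation key are hypothetical names introduced only for illustration, and strings stand in for real keyframes and images.

```python
from typing import Any, Dict, List

# Hypothetical observation key; the real key name is not shown in this description.
KEYFRAME_KEY = "toy_keyframe"


class ToySimulator:
    """Stand-in for a simulator worker that steps physics and skips rendering."""

    def __init__(self, env_index: int) -> None:
        self.env_index = env_index
        self.step_count = 0

    def step(self) -> Dict[str, Any]:
        # Advance the "physics" and record a keyframe instead of rendering.
        self.step_count += 1
        keyframe = f"keyframe(env={self.env_index}, step={self.step_count})"
        return {KEYFRAME_KEY: keyframe}


class ToyBatchRenderer:
    """Stand-in for a batch renderer shared by all workers on one GPU."""

    def post_step(self, observations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # 1. Aggregate the keyframes from every environment.
        keyframes = [obs[KEYFRAME_KEY] for obs in observations]
        # 2. "Render" all environments in a single pass (strings stand in for images).
        images = [f"image<{keyframe}>" for keyframe in keyframes]
        # 3. Emplace the results back into the observations.
        for obs, image in zip(observations, images):
            obs["rgb"] = image
        return observations


sims = [ToySimulator(i) for i in range(3)]
renderer = ToyBatchRenderer()
observations = renderer.post_step([sim.step() for sim in sims])
```

The point of the sketch is the division of labour: simulators only produce keyframes, and a single renderer turns all of them into sensor outputs in one call.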

How Has This Been Tested:

Tested locally and on CI.

Notes:

Supersedes: #863

Current limitations:

Only GPU-to-CPU works.
Only one color sensor is supported.
Not yet integrated with training.

Depends on:

Checklist

My code follows the code style of this project.
I have updated the documentation if required.
I have read the CONTRIBUTING document.
I have completed my CLA (see CONTRIBUTING)
I have added tests to cover my changes if required.

habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py

eundersander

I did a quick skim and added some comments on this draft as you requested! Looking good so far!

habitat-lab/habitat/core/batch_renderer.py

habitat-lab/habitat/core/vector_env.py

habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py

habitat-lab/habitat/core/vector_env.py

eundersander

I left some comments. Looking good so far!

habitat-lab/habitat/core/batch_renderer.py

test/test_habitat_env.py

eundersander · 2023-03-17T02:11:42Z

habitat-lab/habitat/config/default_structured_configs.py

@@ -1462,6 +1472,7 @@ class HabitatConfig(HabitatBaseConfig):
    task: TaskConfig = MISSING
    dataset: DatasetConfig = MISSING
    gym: GymConfig = GymConfig()
+    batch_renderer: BatchRendererConfig = BatchRendererConfig()


This is oddly specific. @vincentpierre can you weigh in?

Maybe a more generic RendererConfig would be better? For example, I need a place to add a bunch of tuning knobs related to SSAO soon (a graphical effect).

Or perhaps all this should go inside SimulatorConfig? Nowadays, we usually think of the sim and renderer as separate, but from a Gym perspective they are both part of the "simulator" (including camera sensor simulation, i.e. rendering).

There are a bunch of tuning knobs already related to rendering inside SimulatorConfig, right? e.g. camera sensor fov and resolution.

I think that it would make sense to nest RendererConfig into SimulatorConfig, and include renderer-specific parameters in there.
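
As a rough illustration of the nesting proposed above, here is a minimal dataclass sketch. It is not the actual contents of default_structured_configs.py: the `renderer` field name is an assumption, the two flags are taken from the docstring quoted later in this review, and every other `SimulatorConfig` field is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class RendererConfig:
    # Flag names taken from the docstring quoted in this review.
    enable_batch_renderer: bool = False
    classic_replay_renderer: bool = False


@dataclass
class SimulatorConfig:
    # Hypothetical field name; all other simulator options are left out.
    renderer: RendererConfig = field(default_factory=RendererConfig)


sim_config = SimulatorConfig()
sim_config.renderer.enable_batch_renderer = True
```

With this layout, future renderer-specific knobs (for example SSAO settings) would have an obvious home without adding another top-level config entry.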

habitat-lab/habitat/config/default_structured_configs.py

habitat-baselines/habitat_baselines/common/construct_vector_env.py

habitat-lab/habitat/core/batch_renderer.py

eundersander · 2023-03-17T02:29:26Z

test/test_habitat_env.py

+    ) as envs:
+        envs.initialize_batch_renderer(configs[0])
+        observations = envs.reset()
+        envs.post_step(observations)


Ah, ok, so you expect the owner of VectorEnv (the calling code) to remember to call post_step in all the right places? And that code isn't in this PR?

It looks like my old draft PR did the same thing, and I was calling post_step in ppo_trainer.py:
https://github.com/facebookresearch/habitat-lab/pull/863/files#diff-2e3ecd82d2bf278a7c6b6b6e50f6e189c2a7d35ce1e947c97b6d8bf82cea3b8b

Is this planned for a follow-up PR? We can't do any training with this until calls to post_step are integrated, right?

> Ah, ok, so you expect the owner of VectorEnv (the calling code) to remember to call post_step in all the right places?

Yes. An alternative would be to automatically call post_step after reset or step, but the user would still need to remember to call post_step manually if the reset_at or step_at variants are used. I'm not a fan of this because we would need to unwrap env outputs to get observations, then pack them again after post-processing.

I opted to keep the call separated, and refactor later if we think that this is a better idea.

> Is this planned for a follow-up PR? We can't do any training with this until calls to post_step are integrated, right?

Yes, I'm planning to add integration with rearrange sim and ppo_trainer.py immediately after.

> if the reset_at or step_at variants are used

I don't think these are widely used. We should print a warning or fatal error if folks try to use these with the batch renderer.

@vincentpierre @mathfac what do you think about marking reset_at and step_at as deprecated? These are not batch-friendly and we should discourage their use.

Also, I'm okay with requiring user code to call post_step explicitly.
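
The two ideas in this thread (an explicit post_step call by the caller, and a warning when the per-environment variants are used with batch rendering) could look roughly like the toy sketch below. `ToyVectorEnv` is a hypothetical stand-in, not habitat-lab's VectorEnv, and the warning text is made up for illustration.

```python
import warnings
from typing import Any, Dict, List


class ToyVectorEnv:
    """Hypothetical stand-in for a vectorized env with optional batch rendering."""

    def __init__(self, num_envs: int, batch_rendering: bool) -> None:
        self.num_envs = num_envs
        self.batch_rendering = batch_rendering

    def step(self) -> List[Dict[str, Any]]:
        # Batch-friendly path: one observation dict per environment.
        return [{"env": i} for i in range(self.num_envs)]

    def step_at(self, index: int) -> Dict[str, Any]:
        # Per-environment path: warn when combined with batch rendering.
        if self.batch_rendering:
            warnings.warn(
                "step_at is not batch-friendly; prefer step() followed by post_step().",
                DeprecationWarning,
                stacklevel=2,
            )
        return {"env": index}

    def post_step(self, observations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Called explicitly by the owner of the env after step() or reset().
        for obs in observations:
            obs["rgb"] = f"image for env {obs['env']}"
        return observations


envs = ToyVectorEnv(num_envs=2, batch_rendering=True)
observations = envs.post_step(envs.step())
```

Keeping post_step explicit avoids unwrapping and repacking env outputs inside step, at the cost of relying on the caller to remember the call.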

0mdc · 2023-03-21T22:31:59Z

The build failures are due to the batch renderer context being leaked between tests. I'm investigating this.

Edit: Fixed here: facebookresearch/habitat-sim#2043

vincentpierre

I would like someone with more knowledge of the sim batch renderer to weigh in on this PR. The code looks good to me otherwise. A lot of comments, which I appreciate.

vincentpierre · 2023-03-23T20:28:01Z

habitat-lab/habitat/config/default_structured_configs.py

+    :property classic_replay_renderer: For debugging. Create a ClassicReplayRenderer instead of BatchReplayRenderer when enable_batch_renderer is active.
+    """
+
+    enable_batch_renderer: bool = False


Is the goal to have this be True at some point?

Yes. However, we'll reach feature parity with the legacy renderer before doing so.

habitat-lab/habitat/core/env_batch_renderer.py

vincentpierre · 2023-03-23T20:50:21Z

test/test_habitat_env.py

@@ -290,6 +291,97 @@ def test_rl_vectorized_envs(gpu2gpu):
                ), "dones should be true after max_episode steps"


+@pytest.mark.parametrize("classic_replay_renderer", [False, True])
+@pytest.mark.parametrize("gpu2gpu", [False])
+def test_rl_vectorized_envs_batch_renderer(


Love this test. Would it be possible to add an end-to-end test as well? I am thinking of adding batch_rendering to some of the training tests we have in test_baselines_training.py.

I'll address this in the following PRs that will introduce training.

habitat-lab/habitat/core/env_batch_renderer.py

habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py

vincentpierre · 2023-03-23T21:04:40Z

habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py

            will _always_ be false after :meth:`reset` or :meth:`get_observations_at` as neither of those
            result in an action (step) being taken.
        """
        return self._prev_sim_obs.get("collided", False)
+
+    def add_keyframe_to_observations(self, observations):


Not a fan of adding batch_rendering specific code to habitat_simulator. But I guess we have to.

Another option would be to include this in habitat-sim's simulator.py class. However, this relates to functionality that entirely resides within habitat-lab, and may cause headaches to the uninitiated.

eundersander

Looks like you addressed all my earlier comments, except the one where I commented below just now. Try to resolve that and then this looks good to me!

test/test_habitat_env.py

eundersander

Another minor comment.

habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py

* Add batch renderer configuration fields.
* Set simulator configuration flags for batch rendering.
* Add BatchRenderVectorEnv.
* Add BatchRenderVectorEnv creation.
* Add type annotations.
* Add batch renderer class and load it.
* Add render_state to sim observations when batch rendering.
* Add batch rendered vector env test.
* Add gpu-to-cpu batch rendering implementation.
* Make test_rl_batch_render_envs execute gpu-to-cpu only flow only.
* Clean up batch renderer.
* Code clean-up.
* Simplify transfer buffer.
* Update docstrings.
* Create a file containing batch rendering constants. Move hardcoded string to constants.
* Formatting fix.
* Change render function name to match new habitat-sim api.
* Rename render state to keyframe. Rename batch vector env render function.
* Assert config instead of automatically correcting it.
* Remove BatchRendererVectorEnv. Directly use batch renderer from VectorEnv instead.
* Format fix
* Create batch renderer config. Move composite file specification in it.
* Formatting fixes.
* Rename BatchRenderer to EnvBatchRenderer and BatchRendererConfig to RendererConfig. Add asserts.
* Move enable_batch_renderer into RendererConfig.
* Formatting fixes.
* Fix composite file config path change.
* Compare rgb image produced by reset in batch env test.
* Add classic replay renderer option.
* Add classic replay renderer test.
* Review pass.
* Change replayer renderer render call to match main.
* Fix package import for new directory.
* Fix CI module import issue.
* Change condition for assertion.

facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Mar 14, 2023

eundersander reviewed Mar 14, 2023

0mdc force-pushed the batch-renderer branch from 49e53a1 to b0a76bc Compare March 16, 2023 03:11

0mdc marked this pull request as ready for review March 16, 2023 20:55

0mdc requested review from mathfac, vincentpierre, aclegg3 and ASzot March 16, 2023 20:56

eundersander reviewed Mar 17, 2023

0mdc force-pushed the batch-renderer branch 2 times, most recently from 4c69e94 to 4383288 Compare March 21, 2023 18:58

vincentpierre approved these changes Mar 23, 2023

0mdc force-pushed the batch-renderer branch from 675ff66 to 97eef9a Compare March 27, 2023 22:33

0mdc added 15 commits March 27, 2023 18:33

Add batch renderer configuration fields.

f6bb80f

Set simulator configuration flags for batch rendering.

4e3c091

Add BatchRenderVectorEnv.

3745b59

Add BatchRenderVectorEnv creation.

acc09e7

Add type annotations.

59231ce

Add batch renderer class and load it.

aa5680e

Add render_state to sim observations when batch rendering.

48bfa22

Add batch rendered vector env test.

74b4886

Add gpu-to-cpu batch rendering implementation.

07c355a

Make test_rl_batch_render_envs execute gpu-to-cpu only flow only.

03a6607

Clean up batch renderer.

d5e862d

Code clean-up.

a068cd9

Simplify transfer buffer.

39ad565

Update docstrings.

c94a89d

Create a file containing batch rendering constants. Move hardcoded string to constants.

ec9ef73

0mdc and others added 17 commits March 27, 2023 18:33

Formatting fix.

d89c87a

Change render function name to match new habitat-sim api.

d04f809

Rename render state to keyframe. Rename batch vector env render function.

988e0a7

Assert config instead of automatically correcting it.

1770b4b

Remove BatchRendererVectorEnv. Directly use batch renderer from VectorEnv instead.

1177353

Format fix

7bfa257

Create batch renderer config. Move composite file specification in it.

0668add

Formatting fixes.

1855dcd

Rename BatchRenderer to EnvBatchRenderer and BatchRendererConfig to RendererConfig. Add asserts.

0d55286

Move enable_batch_renderer into RendererConfig.

35ba958

Formatting fixes.

5eb470d

Fix composite file config path change.

a0a3f0b

Compare rgb image produced by reset in batch env test.

1edfab5

Add classic replay renderer option.

127466c

Add classic replay renderer test.

f7eb9b3

Review pass.

83b7b31

Change replayer renderer render call to match main.

eaa205d

0mdc force-pushed the batch-renderer branch from 97eef9a to eb06290 Compare March 27, 2023 22:34

Fix package import for new directory.

eb042c7

0mdc force-pushed the batch-renderer branch from eb06290 to eb042c7 Compare April 1, 2023 14:52

Fix CI module import issue.

b08ee5e

eundersander approved these changes Apr 3, 2023

eundersander reviewed Apr 3, 2023

Change condition for assertion.

c06f18a

0mdc merged commit 4d9a3b9 into main Apr 4, 2023

0mdc deleted the batch-renderer branch May 29, 2023 14:28


