-
Notifications
You must be signed in to change notification settings - Fork 494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Development] Batch renderer integration #1200
Conversation
habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick skim and added some comments on this draft as you requested! Looking good so far!
habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py
Outdated
Show resolved
Hide resolved
habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments. Looking good so far!
@@ -1462,6 +1472,7 @@ class HabitatConfig(HabitatBaseConfig): | |||
task: TaskConfig = MISSING | |||
dataset: DatasetConfig = MISSING | |||
gym: GymConfig = GymConfig() | |||
batch_renderer: BatchRendererConfig = BatchRendererConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is oddly specific. @vincentpierre can you weigh in?
Maybe a more generic RendererConfig
would be better? For example, I need a place to add a bunch of tuning knobs related to SSAO soon (a graphical effect).
Or perhaps all this should go inside SimulatorConfig
? Nowadays, we usually think of the sim and renderer as separate, but from a Gym perspective they are both part of the "simulator" (including camera sensor simulation, i.e. rendering).
There are a bunch of tuning knobs already related to rendering inside SimulatorConfig
, right? e.g. camera sensor fov and resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it would make sense to nest RendererConfig
into SimulatorConfig
, and include renderer-specific parameters in there.
test/test_habitat_env.py
Outdated
) as envs: | ||
envs.initialize_batch_renderer(configs[0]) | ||
observations = envs.reset() | ||
envs.post_step(observations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, so you expect the owner of VectorEnv
(the calling code) to remember to call post_step
in all the right places? And that code isn't in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like my old draft PR did the same thing, and I was calling post_step
in ppo_trainer.py
:
https://github.com/facebookresearch/habitat-lab/pull/863/files#diff-2e3ecd82d2bf278a7c6b6b6e50f6e189c2a7d35ce1e947c97b6d8bf82cea3b8b
Is this planned for a follow-up PR? We can't do any training with this until calls to post_step
are integrated, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, so you expect the owner of VectorEnv (the calling code) to remember to call post_step in all the right places?
Yes. An alternative would be to automatically call post_step
after reset
or step
, but the user needs to remember manually calling post_step
if the reset_at
or step_at
variants are used. I'm not a fan of this because we need to unwrap env outputs to get observations, then pack them again after post processing.
I opted to keep the call separated, and refactor later if we think that this is a better idea.
Is this planned for a follow-up PR? We can't do any training with this until calls to post_step are integrated, right?
Yes, I'm planning to add integration with rearrange sim and ppo_trainer.py
immediately after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the reset_at or step_at variants are used
I don't think these are widely used. We should print a warning or fatal error if folks try to use these with the batch renderer.
@vincentpierre @mathfac what do you think about marking reset_at
and step_at
as deprecated? These are not batch-friendly and we should discourage their use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm okay with requiring user code to call post_step
explicitly.
4c69e94
to
4383288
Compare
The build failures are due to the batch renderer context being leaked between tests. I'm investigating this. Edit: Fixed here: facebookresearch/habitat-sim#2043 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like someone with more knowledge of the sim batch renderer with weight in on this PR. The code looks good to me otherwise. A lot of comments, which I appreciate.
:property classic_replay_renderer: For debugging. Create a ClassicReplayRenderer instead of BatchReplayRenderer when enable_batch_renderer is active. | ||
""" | ||
|
||
enable_batch_renderer: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the goal to have this be True at some point ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. However, we'll reach feature parity with the legacy renderer before doing so.
@@ -290,6 +291,97 @@ def test_rl_vectorized_envs(gpu2gpu): | |||
), "dones should be true after max_episode steps" | |||
|
|||
|
|||
@pytest.mark.parametrize("classic_replay_renderer", [False, True]) | |||
@pytest.mark.parametrize("gpu2gpu", [False]) | |||
def test_rl_vectorized_envs_batch_renderer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this test. Would it be possible to add an end to end test as well? I am thinking adding batch_rendering to some of the training tests we have in test_baselines_training.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll address this in the following PRs that will introduce training.
habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py
Outdated
Show resolved
Hide resolved
will _always_ be false after :meth:`reset` or :meth:`get_observations_at` as neither of those | ||
result in an action (step) being taken. | ||
""" | ||
return self._prev_sim_obs.get("collided", False) | ||
|
||
def add_keyframe_to_observations(self, observations): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of adding batch_rendering specific code to habitat_simulator. But I guess we have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to include this in habitat-sim
's simulator.py
class. However, this relates to functionality that entirely resides within habitat-lab
, and may cause headaches to the uninitiated.
…ring to constants.
…endererConfig. Add asserts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you addressed all my earlier comments, except the one where I commented below just now. Try to resolve that and then this looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another minor comment.
habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py
Outdated
Show resolved
Hide resolved
* Add batch renderer configuration fields. * Set simulator configuration flags for batch rendering. * Add BatchRenderVectorEnv. * Add BatchRenderVectorEnv creation. * Add type annotations. * Add batch renderer class and load it. * Add render_state to sim observations when batch rendering. * Add batch rendered vector env test. * Add gpu-to-cpu batch rendering implementation. * Make test_rl_batch_render_envs execute gpu-to-cpu only flow only. * Clean up batch renderer. * Code clean-up. * Simplify transfer buffer. * Update docstrings. * Create a file containing batch rendering constants. Move hardcoded string to constants. * Formatting fix. * Change render function name to match new habitat-sim api. * Rename render state to keyframe. Rename batch vector env render function. * Assert config instead of automatically correcting it. * Remove BatchRendererVectorEnv. Directly use batch renderer from VectorEnv instead. * Format fix * Create batch renderer config. Move composite file specification in it. * Formatting fixes. * Rename BatchRenderer to EnvBatchRenderer and BatchRendererConfig to RendererConfig. Add asserts. * Move enable_batch_renderer into RendererConfig. * Formatting fixes. * Fix composite file config path change. * Compare rgb image produced by reset in batch env test. * Add classic replay renderer option. * Add classic replay renderer test. * Review pass. * Change replayer renderer render call to match main. * Fix package import for new directory. * Fix CI module import issue. * Change condition for assertion.
* Add batch renderer configuration fields. * Set simulator configuration flags for batch rendering. * Add BatchRenderVectorEnv. * Add BatchRenderVectorEnv creation. * Add type annotations. * Add batch renderer class and load it. * Add render_state to sim observations when batch rendering. * Add batch rendered vector env test. * Add gpu-to-cpu batch rendering implementation. * Make test_rl_batch_render_envs execute gpu-to-cpu only flow only. * Clean up batch renderer. * Code clean-up. * Simplify transfer buffer. * Update docstrings. * Create a file containing batch rendering constants. Move hardcoded string to constants. * Formatting fix. * Change render function name to match new habitat-sim api. * Rename render state to keyframe. Rename batch vector env render function. * Assert config instead of automatically correcting it. * Remove BatchRendererVectorEnv. Directly use batch renderer from VectorEnv instead. * Format fix * Create batch renderer config. Move composite file specification in it. * Formatting fixes. * Rename BatchRenderer to EnvBatchRenderer and BatchRendererConfig to RendererConfig. Add asserts. * Move enable_batch_renderer into RendererConfig. * Formatting fixes. * Fix composite file config path change. * Compare rgb image produced by reset in batch env test. * Add classic replay renderer option. * Add classic replay renderer test. * Review pass. * Change replayer renderer render call to match main. * Fix package import for new directory. * Fix CI module import issue. * Change condition for assertion.
Motivation and Context
This changeset adds a barebone integration of the batch renderer.
Context:
To scale up training, multiple concurrent workers are instantiated. Right now, each worker has their own isolated renderer. These renderers independently load their required graphics assets. In most cases, there's a significant overlap in the assets loaded by the workers, which fills up GPU memory to the point that it bounds how many concurrent simulators can run. Loading time also adds up to a significant proportion of training time.
The batch renderer aims to circumvent this by rendering all environments simultaneously. Instead of loading assets independently, all graphics assets that will be used during a roll-out are pre-loaded exactly once, at the beginning of training. This leads to smaller GPU memory usage and faster episode loading time.
Batching rendering also increases performance by rendering more efficiently (less drawcalls, more instancing, ...), leveraging data locality and minimizing concurrent GPU contexts.
How it works:
Internally, the system is a replay renderer, meaning that it renders gfx-replay keyframes emitted by simulators.
When batch rendering, simulators produce keyframes and add them to observations. In "post_step", the renderer aggregates these observations, reconstitutes each graphical states then renders them simultaneously.
How Has This Been Tested:
Tested locally and on CI.
Notes:
Supersedes: 863
Current limitations:
Depends on:
Checklist