Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools/sim: Implement MetaDrive simulator #27373

Closed
wants to merge 32 commits into from

Conversation

jon-chuang
Copy link

@jon-chuang jon-chuang commented Feb 17, 2023

Description
Add MetaDrive simulator.

Runs at 10 FPS on 1928 X 1208 inputs on an iGPU (Ryzen 6850H - Laptop).

We disable render screen.

Misc notes:

  • Refactored bridge.py into clean SimulatorBridge abstraction that shares common logic between CarlaBridge and MetaDriveBridge
  • bridge.py now accepts --simulator, --ticks_per_frame
  • Monkey patches MetaDrive methods for improved performance
  • We have to force WideCameraOnly for performance, in particular, needing to run in ExperimentalMode
  • We use similar fov=160 as the comma three wide camera & CARLA simulator camera

Verification
Adding an integration test similar to carla simulator's

First part of: #27284

Next steps:

  1. Implement streaming img/obs through Docker? (if openpilot UI already spawns from inside Docker, we don't need to)
  2. More e2e tests (CI: e2e tests with CARLA #26215)

tools/sim/bridge.py Outdated Show resolved Hide resolved
self.params.put_bool("WideCameraOnly", not arguments.dual_camera)
self.params.put_bool("DisengageOnAccelerator", True)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add this...?

Hmmm... But anw we can't throttle_out while openpilot selfdrive is running...

@adeebshihadeh
Copy link
Contributor

adeebshihadeh commented Feb 17, 2023

Nice job! Some quick comments until I have some time to properly run and review this:

  • let's split out all the MetaDrive stuff into its own directory: tools/sim/metadrive/
  • common stuff shared with CARLA can go into something like tools/sim/common.py, though I wouldn't put much effort into abstractions for CARLA right now
  • I wouldn't really worry about the full e2e test until this whole thing is super solid, but a simple integration test like CARLA has is definitely welcome

This is my rough plan for openpilot simulator support:

  1. make a simple simulator experience that's extremely reliable (completely ignore CARLA for now)
  2. cleanup openpilot abstractions (e.g. replace the SIMULATOR environment variable checks with something cleaner)
  3. fix up CARLA integration using the nice simulator abstractions from the previous step
  4. do some cool higher-end features in CARLA, like traffic lights and stop signs
  5. use the simulators for e2e tests

We're still in step 1, which I expect will take a bit of time.

tools/sim/bridge.py Outdated Show resolved Hide resolved
@adeebshihadeh adeebshihadeh added PC Issues related to running openpilot on PC simulation running openpilot in environments like CARLA labels Feb 18, 2023
img = np.frombuffer(image.raw_data, dtype=np.dtype("uint8"))
img = np.reshape(img, (H, W, 4))
img = img[:, :, [0, 1, 2]].copy()
def cam_send_yuv_wide_road(self, yuv):
Copy link
Author

@jon-chuang jon-chuang Feb 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't really need this anymore... (plus there's no point sending an image twice - except maybe to keep openpilot happy? Getting some CommIssue alerts now and again...

@jon-chuang jon-chuang force-pushed the jon/monkey-patch-bridge branch from 8ffa275 to 1899b41 Compare February 18, 2023 08:41
@jon-chuang
Copy link
Author

jon-chuang commented Feb 18, 2023

let's split out all the MetaDrive stuff into its own directory: tools/sim/metadrive/

common stuff shared with CARLA can go into something like tools/sim/common.py, though I wouldn't put much effort into abstractions for CARLA right now

Done, done. CARLA bridge is already factored.

I wouldn't really worry about the full e2e test until this whole thing is super solid, but a simple integration test like CARLA has is definitely welcome

The integration test is not going perfectly on local due to controlsdLagging and commIssue possibly due to my slow GPU. I will test out on my desktop once I get the AIO installed.

cleanup openpilot abstractions (e.g. replace the SIMULATOR environment variable checks with something cleaner)

Could you explain this in greater detail?


My plan:

Regarding MetaDrive simulator:

  1. Next PR implements integration tests, hopefully we can sort out the commIssue or controlsdLagging issues.We will also factor out common test logic into test_helper.py.
  2. Next, we'll make the simulator available as an experimental feature for people to play with on their own machines. We can collect some feedback on:
    • performance
    • choice of map (the current one has pretty sharp turns and openpilot can fail if turning at too high speeds - see also: Lateral over-correcting in longer/harder corners, riding close or on top of line #26677)
    • calibration: accuracy of controls, accuracy of speed display, realtime-ness of the simulation (we should tune for more realtime and be synced up with the rendered display on better devices, but it should be very playable even at low frame rates on low-spec devices like mine).
  3. Once metadrive has released 0.2.7 with their dependencies fixed, we can use that in pyproject.toml (currently uses a patched branch).
  4. Once we are confident, we can enable metadrive by default, rewrite the sim/README.md to emphasize metadrive over CARLA, reorganize scripts, and work on dockerizing.
  5. Concurrently, we should be writing more e2e tests that leverage the capabilities of openpilot. We can start by testing longitudinal control. For metadrive, interactions with other vehicles and with junctions would be nice. At this point, we will enable more features like Radar.

Blocked on:

  1. Issues blocking Integration tests (next PR). Need some assistance on this part.
{"event": "commIssue", "error": true, "invalid": ["liveLocationKalman", "liveParameters", "liveTorqueParameters"], "not_alive": ["testJoystick", "roadCameraState", "driverCameraState"], "not_freq_ok": ["testJoystick", "roadCameraState", "driverCameraState"], "can_rcv_timeout": false}

@@ -1,553 +1,24 @@
#!/usr/bin/env python3
Copy link
Author

@jon-chuang jon-chuang Feb 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I rename this file run_bridge.py, the line-by-line changes between bridge.py/common.py would be easier to review. The reviewer can let me know if they prefer this. Then when we merge, we can rename to bridge.py.

Copy link
Author

@jon-chuang jon-chuang Feb 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to run_bridge.py for now. Let's close this thread once we rename to back to bridge.py

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rmb to do this before merge.

@jon-chuang
Copy link
Author

jon-chuang commented Feb 18, 2023

Possibly failing due to optional dependencies. (just add pylint) Done.

I'd also like to re-enable the Carla integration test since we've refactored it (probably with errors) and I can't verify locally...

@jon-chuang
Copy link
Author

didn't really drive out of the box. seems there's an issue with modeld

It was working perfectly at time of submission, and then stopped working so well after merging main a couple of times. I'll debug this.

@jon-chuang
Copy link
Author

jon-chuang commented Mar 6, 2023

only runs at 10FPS on my machine

You will have to tweak the --ticks_per_frame parameter to possible 1 or 2. The default setting of 10 FPS (--ticks-per-frame=10) was for low-spec devices... (it would be nice to multiplex based on whether a dedicated GPU is detected on the machine).

Part of the bottleneck is running the model with OnnxCpuEP and the other half is the generation of the image data from the simulator.

I'll see if there are any panda3D settings that can help further with the latter.

@adeebshihadeh
Copy link
Contributor

@adeebshihadeh did you try mine? https://github.com/MoreTore/openpilot/tree/meta

I haven't. The diff is massive (500+ files) on that branch, so it's not easy to see what I'd be running.

Oh and it was my understanding that the bounty is locked when it's merged.

You're right. Though any after this one will have to be significantly less work to be merged since we're putting in the effort to review the first PR.

If comma clones metadrive I can make a PR to it to add openpilot support.

What do you mean? What needs to go in there that's openpilot specific?

@MoreTore
Copy link
Contributor

MoreTore commented Mar 6, 2023

I haven't. The diff is massive (500+ files) on that branch, so it's not easy to see what I'd be running.

If comma clones metadrive I can make a PR to it to add openpilot support.

I added metadrive to the base dir of openpilot. That was the simplest way for me to develop the changes. It would be added as a submodule in the end. The changes I made include the custom controller, the custom camera adjuster, and the IPC streamer. The streamer increases multi-thread performance.
https://github.com/MoreTore/openpilot/tree/meta/metadrive

@jon-chuang
Copy link
Author

jon-chuang commented Mar 6, 2023

The problem for me actually seems to be the CL kernel now. It is sometimes takine 200ms, and 400ms.

I don't recall seeing this initially.

In the worst case, I even get:

KERNEL_PORTION took 4708.432ms
yuv kernel took 4714.306ms

4.7 seconds to run the OpenCL kernel.

It seems that modeld's use of OpenCL is interfering. See for instance:
https://github.com/commaai/openpilot/blob/master/selfdrive/modeld/transforms/transform.h
https://github.com/commaai/openpilot/blob/59ed74ae4a7a51fcfa673393947b755d7b29c323/selfdrive/modeld/models/commonmodel.h

However, I don't understand what is clogging up the queue so badly. It may be some recent changes to master. @adeebshihadeh do you have any idea of recent changes that may have increased the usage of OpenCL kernels by openpilot?

Oops, turns out that not running git submodule update --recursive meant that certain errors occured due to miscompiled modules causing some sort of slowdown of opencl somewhere. All good now.

Oops, turns out that `LOGPRINT=debug` was actually causing the issue since a voluminous amount of logs is captured and printed on every frame... the observer effect is real!

@jon-chuang
Copy link
Author

didn't really drive out of the box. seems there's an issue with modeld

This should be fixed now. Just needed to merge master.

only runs at 10FPS on my machine (threadripper + 2080)

Let me know how it goes with --frames_per_tick=2 (50 FPS)

Btw, could you be more specific about what is the base spec we are targetting? Are we targetting laptops with no dGPU (like my current unfortunate setup?)

the output is super spammy, it's hard to see what's going on

Cleaned up the stdout

@jon-chuang
Copy link
Author

renamed to run_bridge.py for now. Let's close this thread once we rename to back to bridge.py

Reminder

@adeebshihadeh
Copy link
Contributor

@jon-chuang we can break up the first stage of the bounty/project, so $500 of the $2k for this PR to be merged with the following goals:

  • runs comfortably at 20fps on my high-end workstation with a real GPU, i.e. doesn't need to work on a laptop yet
  • drives within the lane for at least 5s (i.e. it should do something)
  • simple CI test similar to test_carla_integration.py

Most of the parts are here, just needs to be rebased and cleaned up a bit. I just merged the MetaDrive package in (#29736), so that should reduce the diff a bit.

@jon-chuang
Copy link
Author

Hello @adeebshihadeh , I'm no longer working on this project @MoreTore can take over if he wants.

@adeebshihadeh
Copy link
Contributor

Alright, thanks for kicking this off!

@jnewb1
Copy link
Contributor

jnewb1 commented Sep 21, 2023

took stuff from this PR and upstreamed it here: #29935

thanks again!

@jnewb1 jnewb1 closed this Sep 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PC Issues related to running openpilot on PC simulation running openpilot in environments like CARLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants