My attempt to understand how smart openpilot is. Inspired by this blog post: How open pilot works (2017), which is still somewhat relevant but misses all the interesting details about the vision system.
Before being sent to the model, the image taken from the camera goes through a few pre-processing steps. There is a lot of undocumented low-level OpenCL code transforming the image, and it is quite difficult to understand what is happening there 😱 It looks like they do two main steps:
- Transform the image to the road plane (transform.c)
- Convert it to YUV 4:2:0 (vision.cc). YUV420 has different dimensions for different channels (U and V are half the size of Y), and it took me a while to understand how they feed it to a NN, which must expect all channels to be of the same size. I think after this they do the channel transformation described in this Efficient Pre-Processor for CNN paper.
I cannot confirm the last step, as the OpenCL code is very hard to follow. But the fact that they ended up with a 6x128x256 input layer for their CNN makes it very likely. The main purpose is to reduce computation on the device. My experiment also kind of confirms this, as I managed to get a proper result out of their model after doing this channel trick.
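To illustrate the channel trick, here is a minimal numpy sketch of the scheme from the paper (my own code, not their OpenCL): the full-resolution Y plane is split into four half-resolution planes (even/odd rows and columns) and stacked with the already half-resolution U and V planes, so every channel ends up the same size.

```python
import numpy as np

def yuv420_to_6ch(y, u, v):
    """Pack a YUV 4:2:0 frame into a 6 x H/2 x W/2 tensor.

    y: (H, W) luma plane; u, v: (H/2, W/2) chroma planes.
    The Y plane is subsampled into 4 half-resolution planes so that
    all 6 channels share the same spatial size (e.g. 256x512 -> 6x128x256).
    """
    return np.stack([
        y[0::2, 0::2],  # Y: even rows, even cols
        y[0::2, 1::2],  # Y: even rows, odd cols
        y[1::2, 0::2],  # Y: odd rows, even cols
        y[1::2, 1::2],  # Y: odd rows, odd cols
        u,              # U: already half resolution
        v,              # V: already half resolution
    ])

# a 256x512 luma plane gives exactly the 6x128x256 input the model expects
y = np.zeros((256, 512), dtype=np.float32)
u = v = np.zeros((128, 256), dtype=np.float32)
print(yuv420_to_6ch(y, u, v).shape)  # (6, 128, 256)
```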
They use the Qualcomm Neural Processing SDK (SNPE), a deep learning library optimized for Snapdragon chips. As I understand it, it can only do inference. It stores models in the proprietary *.dlc format.
The SDK allows importing models from Caffe, ONNX and TF, but does not allow exporting them into other formats! Essentially this means that stealing their model is not very straightforward. When you think about it, it kind of makes sense: the SDK is meant to be used on smartphones, where the model is shipped as a resource within an APK and can essentially be taken by anyone. I found an interesting talk and slides on DL model security.
I have written a simple Python wrapper around the SNPE SDK so that I can use it from Jupyter.
The model looks very custom-made. Open the HTML file for more details.
Some kind of ResNet/Inception-style encoder with 1 dense layer on top produces a 1x512 feature vector.
The latter is concatenated with a 1x8 "desire" vector. Not sure what it is; they always pass NULL at the moment. My guess is that it's related to the route planner, e.g. telling the model where you want to go 🛣️.
That extended 1x520 feature vector goes into an LSTM (it's optional, but enabled in the code at the moment).
The output of the LSTM goes into 4 separate dense 3-layer MDNs. The road coordinate system is: x→forward, y→left, z→up. They predict probability distributions over the path and lane lines:
Number of elements | Meaning |
---|---|
192 | Y coordinate of the predicted path |
192 | std of each coordinate from above |
192 | Y coordinate of the left lane |
192 | std of each point above |
1 | confidence of the left lane prediction |
192 | Y coordinate of the right lane |
192 | std of each point above |
1 | confidence of the right lane prediction |
58 | Lead car prediction, see below |
512 | LSTM cell state |
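To make the layout concrete, this is roughly how I slice the flat output vector in my notebook. The field names are mine, and the offsets simply follow the table above:

```python
import numpy as np

PATH_POINTS = 192  # number of points per predicted curve

def split_model_output(out):
    """Split the flat model output into named chunks, following the table above."""
    out = np.asarray(out).ravel()
    idx = 0

    def take(n):
        nonlocal idx
        chunk = out[idx:idx + n]
        idx += n
        return chunk

    result = {
        "path_y":     take(PATH_POINTS),  # y coordinate of the predicted path
        "path_std":   take(PATH_POINTS),
        "left_y":     take(PATH_POINTS),
        "left_std":   take(PATH_POINTS),
        "left_prob":  take(1),
        "right_y":    take(PATH_POINTS),
        "right_std":  take(PATH_POINTS),
        "right_prob": take(1),
        "lead":       take(58),           # lead car MDN, decoded separately
        "lstm_state": take(512),          # fed back into the model on the next frame
    }
    assert idx == out.size, f"unexpected output size: {out.size}"
    return result
```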
I think I managed to get a reasonable result from the model, except that path prediction does not work at all (perhaps it needs the RNN input):
The lead (the last 1x58 output) seems to be an MDN of size 5 which estimates the location of the car we are following.
// Every output distribution from the MDN includes the probabilities
// of it representing a current lead car, a lead car in 2s
// or a lead car in 4s
They use the following structure:
# | Field | Meaning |
---|---|---|
0 | dist | Distance to the car (0, 140.0) |
1 | y | I guess it's a horizontal offset |
2 | v | Relative velocity (0, 10.0) |
3 | a | Angle, I assume |
4 | std(dist) | softplus |
5 | std(y) | softplus |
6 | std(v) | softplus |
7 | std(a) | softplus |
8 | ? | Mixture params. The Gaussian with the max value here is used for the lead car |
9 | ? | Mixture params. The Gaussian with the max value here is used for the future lead car |
10 | ? | |
0-10 | ... | The 11 values above are repeated 5 times, once per mixture component (55 values) |
55 | prob lead | ? |
56 | prob future lead | ? |
57 | ? | ? |
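As a sanity check, this is roughly how I decode the 1x58 lead output under the interpretation above. Again, this is my own sketch and the field names are guesses:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def decode_lead(lead, horizon="current"):
    """Pick the most likely mixture component out of the 5 in the 1x58 lead output.

    Assumed layout: 5 components x 11 values, then
    [55] prob of a lead car, [56] prob of a future lead car, [57] unknown.
    """
    lead = np.asarray(lead).ravel()
    components = lead[:55].reshape(5, 11)
    # column 8 ~ weight for the current lead, column 9 ~ weight for the future lead
    weight_col = 8 if horizon == "current" else 9
    best = components[np.argmax(components[:, weight_col])]
    return {
        "dist": best[0],
        "y":    best[1],
        "v":    best[2],
        "a":    best[3],
        "std":  softplus(best[4:8]),   # stds of dist, y, v, a
        "prob_lead":        lead[55],
        "prob_future_lead": lead[56],
    }
```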
I haven't seen the lead prediction being used anywhere except for visualization in the UI module.
Planning is split into 2 parts: longitudinal (planner.py) and lateral (pathplanner.py, lane_planner.py).
The points and stds from the model are used to fit a degree-4 polynomial for the path, the left lane and the right lane. As far as I can see, only these polynomials are used later for path planning.
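A minimal sketch of what such a fit could look like, assuming np.polyfit weighted by the inverse of the predicted std (the x spacing here is a placeholder; the real code maps the points to meters ahead of the car):

```python
import numpy as np

def fit_curve_poly(y_points, y_std, degree=4):
    """Fit a polynomial to the 192 predicted y offsets of a path or lane line.

    Points the model is less sure about (large std) get a smaller weight.
    """
    x = np.arange(len(y_points), dtype=float)       # placeholder longitudinal spacing
    weights = 1.0 / (np.asarray(y_std) + 1e-6)      # avoid division by zero
    return np.polyfit(x, y_points, degree, w=weights)  # highest-order coefficient first
```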
The longitudinal planner provides velocity and acceleration. There seem to be 4 sources for predicting those:
- Cruise control of the car
- Path predicted by the DL model; its curvature is used to estimate the maximum feasible speed
- Two different MPC solvers. Not sure what the difference between them is
The final controls are chosen using some kind of heuristic. It seems to prefer the “slowest” source, which makes sense for safety and comfort reasons.
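In pseudocode, my understanding of that selection boils down to something like this (the source names are illustrative):

```python
def choose_longitudinal_solution(solutions):
    """Pick the most conservative (slowest) longitudinal solution.

    `solutions` maps a source name, e.g. "cruise", "model", "mpc1", "mpc2",
    to a (target_speed, target_accel) tuple; the lowest target speed wins.
    """
    source = min(solutions, key=lambda name: solutions[name][0])
    return source, solutions[source]

# example: the model limits speed in a curve, so it wins over cruise control
print(choose_longitudinal_solution({
    "cruise": (30.0, 0.0),
    "model":  (22.0, -0.5),
    "mpc1":   (25.0, -0.2),
    "mpc2":   (24.5, -0.3),
}))
```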
Choosing the path to follow:
The result is transmitted to the control module as a PathPlan. There is a lot of code that calculates accelerations and steering angles and checks braking; it is very complex and hard to read. In general, it seems that the control module executes the path via a PID, LQR or INDI controller.
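For reference, the PID variant is the simplest of the three. A textbook PID step looks like this (generic code, not openpilot's implementation):

```python
class PIDController:
    """Minimal textbook PID controller, e.g. for tracking a target steering angle."""

    def __init__(self, kp, ki, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```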
> make notebook
It takes some time to build the Docker image.
- Try multiple frames to test what the LSTM is doing. I think this is needed to get a proper path prediction from the model.
- They are working on lane change logic, according to this video. There seems to be a place in the network to ingest an intent. I am curious to try it when they release it.