Interesting paper: they apply RL to assembly, a contact-rich setting (so, hard for open-loop control or trajectory optimization?), by exploiting the prior knowledge that we normally have a CAD design of whatever we're manufacturing. They then show how the assembly task can be framed as a reinforcement learning problem.
(Sorry, I didn't get all the details on a first read.)