Hi! First of all, thank you very much for publishing this impressive work. I would like to ask a question: during the model's visual encoding process, does it use only the point cloud information, and not the RGB image information? If it does use the RGB image information, in which specific step is it used? I hope you can reply soon. Thank you very much.