[UPDATED!] 2024-03-30 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | Denoising Monte Carlo Renders With Diffusion Models | 使用扩散模型对蒙特卡洛渲染进行去噪 | Vaibhav Vavilala, Rahul Vasanth, David Forsyth | http://arxiv.org/pdf/2404.00491v1 | null |
| 2024-03-30 | DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans | DiffHuman:概率真实感 3D 人体重建 | Akash Sengupta, Thiemo Alldieck, Nikos Kolotouros, Enric Corona, Andrei Zanfir, Cristian Sminchisescu | http://arxiv.org/pdf/2404.00485v1 | null |
| 2024-03-30 | Score-Based Diffusion Models for Photoacoustic Tomography Image Reconstruction | 用于光声断层扫描图像重建的基于分数的扩散模型 | Sreemanti Dey, Snigdha Saha, Berthy T. Feng, Manxiu Cui, Laure Delisle, Oscar Leong, Lihong V. Wang, Katherine L. Bouman | http://arxiv.org/pdf/2404.00471v1 | null |
| 2024-03-30 | Towards Variable and Coordinated Holistic Co-Speech Motion Generation | 实现可变且协调的整体语音动作生成 | Yifei Liu, Qiong Cao, Yandong Wen, Huaiguang Jiang, Changxing Ding | http://arxiv.org/pdf/2404.00368v1 | null |
| 2024-03-30 | Spread Your Wings: A Radial Strip Transformer for Image Deblurring | 张开翅膀:用于图像去模糊的径向条形变压器 | Duosheng Chen, Shihao Zhou, Jinshan Pan, Jinglei Shi, Lishen Qu, Jufeng Yang | http://arxiv.org/pdf/2404.00358v1 | null |
| 2024-03-30 | Grid Diffusion Models for Text-to-Video Generation | 用于文本到视频生成的网格扩散模型 | Taegyeong Lee, Soyeong Kwon, Taehwan Kim | http://arxiv.org/pdf/2404.00234v1 | null |
| 2024-03-30 | Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space | 潜在水印:在潜在扩散空间中注入和检测水印 | Zheling Meng, Bo Peng, Jing Dong | http://arxiv.org/pdf/2404.00230v1 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs | SceneGraphLoc:3D 场景图上的跨模态粗略视觉定位 | Yang Miao, Francis Engelmann, Olga Vysotska, Federico Tombari, Marc Pollefeys, Dániel Béla Baráth | http://arxiv.org/pdf/2404.00469v1 | null |
| 2024-03-30 | MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text | MaGRITTe:从图像、俯视图和文本中进行操作和生成 3D 实现 | Takayuki Hara, Tatsuya Harada | http://arxiv.org/pdf/2404.00345v1 | null |
| 2024-03-30 | Learned Scanpaths Aid Blind Panoramic Video Quality Assessment | 学习的扫描路径有助于盲式全景视频质量评估 | Kanglong Fan, Wen Wen, Mu Li, Yifan Peng, Kede Ma | http://arxiv.org/pdf/2404.00252v1 | null |
| 2024-03-30 | Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training | 根据需要进行设计:利用视觉问答进行多模式预训练 | Tongkun Su, Jun Li, Xi Zhang, Haibo Jin, Hao Chen, Qiong Wang, Faqin Lv, Baoliang Zhao, Yin Hu | http://arxiv.org/pdf/2404.00226v1 | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting | 3DGSR:使用 3D 高斯泼溅进行隐式表面重建 | Xiaoyang Lyu, Yang-Tian Sun, Yi-Hua Huang, Xiuzhe Wu, Ziyi Yang, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi | http://arxiv.org/pdf/2404.00409v1 | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation | 协调潜在专业知识:通过多级监督和反向自我蒸馏推进在线持续学习 | HongWei Yan, Liyuan Wang, Kaisheng Ma, Yi Zhong | http://arxiv.org/pdf/2404.00417v1 | null |
| 2024-03-30 | TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | TTD:文本标签自蒸馏增强 CLIP 中的图像文本对齐,以减轻单标签偏差 | Sanghyun Jo, Soohyun Ryu, Sungyub Kim, Eunho Yang, Kyungsu Kim | http://arxiv.org/pdf/2404.00384v1 | link |
| 2024-03-30 | Long-Tailed Recognition on Binary Networks by Calibrating A Pre-trained Model | 通过校准预训练模型进行二元网络长尾识别 | Jihun Kim, Dahyun Kim, Hyungrok Jung, Taeil Oh, Jonghyun Choi | http://arxiv.org/pdf/2404.00285v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation | DHR:弱监督语义分割的类间和类内区域的双特征驱动的层次再平衡 | Sanghyun Jo, Fei Pan, In-Jae Yu, Kyungsu Kim | http://arxiv.org/pdf/2404.00380v1 | link |
| 2024-03-30 | The Devil is in the Edges: Monocular Depth Estimation with Edge-aware Consistency Fusion | 魔鬼在边缘:具有边缘感知一致性融合的单目深度估计 | Pengzhi Li, Yikang Ding, Haohan Wang, Chengshuai Tang, Zhiheng Li | http://arxiv.org/pdf/2404.00373v1 | null |
| 2024-03-30 | Efficient Multi-branch Segmentation Network for Situation Awareness in Autonomous Navigation | 用于自主导航态势感知的高效多分支分割网络 | Guan-Cheng Zhou, Chen Chengb, Yan-zhou Chena | http://arxiv.org/pdf/2404.00366v1 | null |
| 2024-03-30 | Rethinking Attention-Based Multiple Instance Learning for Whole-Slide Pathological Image Classification: An Instance Attribute Viewpoint | 重新思考基于注意力的多实例学习用于全幻灯片病理图像分类:实例属性观点 | Linghan Cai, Shenjin Huang, Ye Zhang, Jinpeng Lu, Yongbing Zhang | http://arxiv.org/pdf/2404.00351v1 | null |
| 2024-03-30 | YNetr: Dual-Encoder architecture on Plain Scan Liver Tumors (PSLT) | YNetr:平扫肝脏肿瘤 (PSLT) 的双编码器架构 | Wen Sheng, Zhong Zheng, Jiajun Liu, Han Lu, Hanyuan Zhang, Zhengyong Jiang, Zhihong Zhang, Daoping Zhu | http://arxiv.org/pdf/2404.00327v1 | null |
| 2024-03-30 | CLIP-driven Outliers Synthesis for few-shot OOD detection | 用于小样本 OOD 检测的 CLIP 驱动的离群值合成 | Hao Sun, Rundong He, Zhongyi Han, Zhicong Lin, Yongshun Gong, Yilong Yin | http://arxiv.org/pdf/2404.00323v1 | null |
| 2024-03-30 | Instrument-tissue Interaction Detection Framework for Surgical Video Understanding | 用于手术视频理解的仪器-组织相互作用检测框架 | Wenjun Lin, Yan Hu, Huazhu Fu, Mingming Yang, Chin-Boon Chng, Ryo Kawasaki, Cheekong Chui, Jiang Liu | http://arxiv.org/pdf/2404.00322v1 | null |
| 2024-03-30 | Bayesian Exploration of Pre-trained Models for Low-shot Image Classification | 用于低样本图像分类的预训练模型的贝叶斯探索 | Yibo Miao, Yu Lei, Feng Zhou, Zhijie Deng | http://arxiv.org/pdf/2404.00312v1 | null |
| 2024-03-30 | HSIMamba: Hyperpsectral Imaging Efficient Feature Learning with Bidirectional State Space for Classification | HSIMamba:使用双向状态空间进行分类的超光谱成像高效特征学习 | Judy X Yang, Jun Zhou, Jing Wang, Hui Tian, Alan Wee Chung Liew | http://arxiv.org/pdf/2404.00272v1 | null |
| 2024-03-30 | Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation | 通过基础模型进行图像到图像匹配:开放词汇语义分割的新视角 | Yuan Wang, Rui Sun, Naisong Luo, Yuwen Pan, Tianzhu Zhang | http://arxiv.org/pdf/2404.00262v1 | null |
| 2024-03-30 | YOLOOC: YOLO-based Open-Class Incremental Object Detection with Novel Class Discovery | YOLOOC:基于 YOLO 的开放类增量对象检测与新类发现 | Qian Wan, Xiang Xiang, Qinhao Zhou | http://arxiv.org/pdf/2404.00257v1 | null |
| 2024-03-30 | Attention-based Shape-Deformation Networks for Artifact-Free Geometry Reconstruction of Lumbar Spine from MR Images | 基于注意力的形状变形网络,用于从 MR 图像中进行腰椎无伪影几何重建 | Linchen Qian, Jiasong Chen, Linhai Ma, Timur Urakov, Weiyong Gu, Liang Liang | http://arxiv.org/pdf/2404.00231v1 | null |

GNN

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | Constrained Layout Generation with Factor Graphs | 使用因子图生成约束布局 | Mohammed Haroon Dupty, Yanfei Dong, Sicong Leng, Guoji Fu, Yong Liang Goh, Wei Lu, Wee Sun Lee | http://arxiv.org/pdf/2404.00385v1 | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | Reusable Architecture Growth for Continual Stereo Matching | 用于持续立体匹配的可重用架构增长 | Chenghao Zhang, Gaofeng Meng, Bin Fan, Kun Tian, Zhaoxiang Zhang, Shiming Xiang, Chunhong Pan | http://arxiv.org/pdf/2404.00360v1 | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout | SVGCraft:超越单一对象文本到 SVG 合成,具有全面的画布布局 | Ayan Banerjee, Nityanand Mathur, Josep Lladós, Umapada Pal, Anjan Dutta | http://arxiv.org/pdf/2404.00412v1 | null |
| 2024-03-30 | Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation | 通过程序生成的 3D 场景表示,使用大型语言和视觉模型与机器人一起探索看不见的环境 | Arjun P S, Andrew Melnik, Gora Chand Nandi | http://arxiv.org/pdf/2404.00318v1 | null |
| 2024-03-30 | ST-LLM: Large Language Models Are Effective Temporal Learners | ST-LLM:大型语言模型是有效的时间学习者 | Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li | http://arxiv.org/pdf/2404.00308v1 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | Multiway Point Cloud Mosaicking with Diffusion and Global Optimization | 具有扩散和全局优化的多路点云镶嵌 | Shengze Jin, Iro Armeni, Marc Pollefeys, Daniel Barath | http://arxiv.org/pdf/2404.00429v1 | null |
| 2024-03-30 | SGDFormer: One-stage Transformer-based Architecture for Cross-Spectral Stereo Image Guided Denoising | SGDFormer:基于变压器的一级架构,用于跨光谱立体图像引导去噪 | Runmin Zhang, Zhu Yu, Zehua Sheng, Jiacheng Ying, Si-Yuan Cao, Shu-Jie Chen, Bailin Yang, Junwei Li, Hui-Liang Shen | http://arxiv.org/pdf/2404.00349v1 | null |
| 2024-03-30 | Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration | 看到看不见的东西:用于图像恢复的频率提示引导变压器 | Shihao Zhou, Jinshan Pan, Jinglei Shi, Duosheng Chen, Lishen Qu, Jufeng Yang | http://arxiv.org/pdf/2404.00288v1 | null |
| 2024-03-30 | Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration | 跳跃前环顾四周:用于图像恢复的高频注入变压器 | Shihao Zhou, Duosheng Chen, Jinshan Pan, Jufeng Yang | http://arxiv.org/pdf/2404.00279v1 | null |
| 2024-03-30 | IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images | IPoD:使用点扩散进行隐式场学习,用于从单个 RGB-D 图像重建可泛化的 3D 对象 | Yushuang Wu, Luyue Shi, Junhao Cai, Weihao Yuan, Lingteng Qiu, Zilong Dong, Liefeng Bo, Shuguang Cui, Xiaoguang Han | http://arxiv.org/pdf/2404.00269v1 | null |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | Monocular Identity-Conditioned Facial Reflectance Reconstruction | 单目身份条件面部反射率重建 | Xingyu Ren, Jiankang Deng, Yuhao Cheng, Jia Guo, Chao Ma, Yichao Yan, Wenhan Zhu, Xiaokang Yang | http://arxiv.org/pdf/2404.00301v1 | null |
| 2024-04-02 | HOI-M3:Capture Multiple Humans and Objects Interaction within Contextual Environment | HOI-M3:捕捉情境环境中的多个人与物体的交互 | Juze Zhang, Jingyan Zhang, Zining Song, Zhanhe Shi, Chengfeng Zhao, Ye Shi, Jingyi Yu, Lan Xu, Jingya Wang | http://arxiv.org/pdf/2404.00299v2 | null |

Learning Paradigms

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | Continual Learning for Autonomous Robots: A Prototype-based Approach | 自主机器人的持续学习:基于原型的方法 | Elvin Hajizada, Balachandran Swaminathan, Yulia Sandamirskaya | http://arxiv.org/pdf/2404.00418v1 | null |
| 2024-04-02 | InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning | InfLoRA:用于持续学习的无干扰低阶适应 | Yan-Shuo Liang, Wu-Jun Li | http://arxiv.org/pdf/2404.00228v2 | link |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-03-30 | 94% on CIFAR-10 in 3.29 Seconds on a Single GPU | 在单个 GPU 上,CIFAR-10 在 3.29 秒内达到 94% | Keller Jordan | http://arxiv.org/pdf/2404.00498v1 | null |
| 2024-03-30 | Extracting Manifold Information from Point Clouds | 从点云中提取流形信息 | Patrick Guidotti | http://arxiv.org/pdf/2404.00427v1 | null |
| 2024-03-30 | Do Vision-Language Models Understand Compound Nouns? | 视觉语言模型能理解复合名词吗? | Sonal Kumar, Sreyan Ghosh, S Sakshi, Utkarsh Tyagi, Dinesh Manocha | http://arxiv.org/pdf/2404.00419v1 | null |
| 2024-03-30 | STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario | STBA:针对查询受限的黑盒场景评估 DNN 的鲁棒性 | Renyang Liu, Kwok-Yan Lam, Wei Zhou, Sixing Wu, Jun Zhao, Dongting Hu, Mingming Gong | http://arxiv.org/pdf/2404.00362v1 | null |
| 2024-03-30 | Learing Trimaps via Clicks for Image Matting | 通过点击图像抠图来学习 Trimap | Chenyi Zhang, Yihan Hu, Henghui Ding, Humphrey Shi, Yao Zhao, Yunchao Wei | http://arxiv.org/pdf/2404.00335v1 | null |
| 2024-03-30 | Memory-Scalable and Simplified Functional Map Learning | 内存可扩展且简化的功能图学习 | Robin Magnet, Maks Ovsjanikov | http://arxiv.org/pdf/2404.00330v1 | null |
| 2024-03-30 | Harmonizing Light and Darkness: A Symphony of Prior-guided Data Synthesis and Adaptive Focus for Nighttime Flare Removal | 协调光明与黑暗:预先引导的数据合成和夜间耀斑去除的自适应聚焦的交响乐 | Lishen Qu, Shihao Zhou, Jinshan Pan, Jinglei Shi, Duosheng Chen, Jufeng Yang | http://arxiv.org/pdf/2404.00313v1 | null |
| 2024-03-30 | LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion | LAKE-RED:通过潜在背景知识检索增强扩散生成伪装图像 | Pancheng Zhao, Peng Xu, Pengda Qin, Deng-Ping Fan, Zhicheng Zhang, Guoli Jia, Bowen Zhou, Jufeng Yang | http://arxiv.org/pdf/2404.00292v1 | null |
| 2024-03-30 | Exploiting Self-Supervised Constraints in Image Super-Resolution | 利用图像超分辨率中的自监督约束 | Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu | http://arxiv.org/pdf/2404.00260v1 | null |