Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | Capacity Control is an Effective Memorization Mitigation Mechanism in Text-Conditional Diffusion Models | 文本条件扩散模型中的容量控制是有效的记忆缓解机制 | Raman Dutt, Pedro Sanchez, Ondrej Bohdal, Sotirios A. Tsaftaris, Timothy Hospedales | http://arxiv.org/pdf/2410.22149v1 | link |
2024-10-29 | PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement | PACA:基于视角感知的跨注意力表示,用于零样本场景重排 | Shutong Jin, Ruiyu Wang, Kuangyi Chen, Florian T. Pokorny | http://arxiv.org/pdf/2410.22059v1 | null |
2024-10-29 | FANCL: Feature-Guided Attention Network with Curriculum Learning for Brain Metastases Segmentation | 基于课程学习的特征引导注意力网络在脑转移瘤分割中的应用 | Zijiang Liu, Xiaoyu Liu, Linhao Qu, Yonghong Shi | http://arxiv.org/pdf/2410.22057v1 | null |
2024-10-29 | PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference | PrefPaint:将图像修复扩散模型与人类偏好对齐 | Kendong Liu, Zhiyu Zhu, Chuanhao Li, Hui Liu, Huanqiang Zeng, Junhui Hou | http://arxiv.org/pdf/2410.21966v1 | null |
2024-10-29 | CT to PET Translation: A Large-scale Dataset and Domain-Knowledge-Guided Diffusion Approach | CT到PET转换:大规模数据集和领域知识引导的扩散方法 | Dac Thai Nguyen, Trung Thanh Nguyen, Huu Tien Nguyen, Thanh Trung Nguyen, Huy Hieu Pham, Thanh Hung Nguyen, Thao Nguyen Truong, Phi Le Nguyen | http://arxiv.org/pdf/2410.21932v1 | link |
2024-10-29 | Enhancing Learned Image Compression via Cross Window-based Attention | 通过跨窗口注意力增强学习图像压缩 | Priyanka Mudgal, Feng Liu | http://arxiv.org/pdf/2410.21144v2 | null |
2024-10-29 | EEG-Driven 3D Object Reconstruction with Color Consistency and Diffusion Prior | 基于EEG驱动的具有颜色一致性和扩散先验的3D物体重建 | Xin Xiang, Wenhui Zhou, Guojun Dai | http://arxiv.org/pdf/2410.20981v2 | null |
2024-10-29 | Fractal and Turbulent Feature Extraction and NFT Label Generation for Pollock Style Migration Paintings Based on VGG19 | 基于VGG19的波洛克风格迁移画作分形和湍流特征提取及NFT标签生成 | Yiquan Wang, Xu Wang, Jiazhuo Pan | http://arxiv.org/pdf/2410.20519v2 | link |
2024-10-29 | Towards Visual Text Design Transfer Across Languages | 跨语言视觉文本设计迁移 | Yejin Choi, Jiwan Chung, Sumin Shim, Giyeong Oh, Youngjae Yu | http://arxiv.org/pdf/2410.18823v2 | null |
2024-10-29 | Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization | 泛化一致性策略至视觉强化学习:优先近端经验正则化 | Haoran Li, Zhennan Jiang, Yuhui Chen, Dongbin Zhao | http://arxiv.org/pdf/2410.00051v2 | null |
2024-10-29 | Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images | 多假设条件点云扩散:遮挡图像的3D人体重建 | Donghwan Kim, Tae-Kyun Kim | http://arxiv.org/pdf/2409.18364v3 | link |
2024-10-29 | Aligning Machine and Human Visual Representations across Abstraction Levels | 在抽象层次上对机器和人类视觉表示进行对齐 | Lukas Muttenthaler, Klaus Greff, Frieda Born, Bernhard Spitzer, Simon Kornblith, Michael C. Mozer, Klaus-Robert Müller, Thomas Unterthiner, Andrew K. Lampinen | http://arxiv.org/pdf/2409.06509v3 | null |
2024-10-29 | Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images | 基于扩散模型的少样本X射线图像地标检测自监督预训练 | Roberto Di Via, Francesca Odone, Vito Paolo Pastore | http://arxiv.org/pdf/2407.18125v2 | null |
2024-10-29 | GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction | 基于视图引导的高斯溅射扩散三维重建 | Yuxuan Mu, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofeng Wu, Songcen Xu, Peng Dai, Youliang Yan, Li Cheng | http://arxiv.org/pdf/2407.04237v4 | null |
2024-10-29 | FastDrag: Manipulate Anything in One Step | 快速拖拽:一步操作操纵任何对象 | Xuanjia Zhao, Jian Guan, Congyi Fan, Dongli Xu, Youtian Lin, Haiwei Pan, Pengming Feng | http://arxiv.org/pdf/2405.15769v3 | null |
2024-10-29 | SMART: Scalable Multi-agent Real-time Generation via Next-token Prediction | SMART:基于下一标记预测的可扩展多智能体实时生成 | Wei Wu, Xiaoxin Feng, Ziyan Gao, Yuheng Kan | http://arxiv.org/pdf/2405.15677v2 | link |
2024-10-29 | PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher | PaGoDA:从低分辨率扩散教师逐步成长的一步生成器 | Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon | http://arxiv.org/pdf/2405.14822v2 | null |
2024-10-29 | Benchmarking Counterfactual Image Generation | 反事实图像生成基准测试 | Thomas Melistas, Nikos Spyrou, Nefeli Gkouti, Pedro Sanchez, Athanasios Vlontzos, Yannis Panagakis, Giorgos Papanastasiou, Sotirios A. Tsaftaris | http://arxiv.org/pdf/2403.20287v3 | link |
2024-10-29 | A Probabilistic Hadamard U-Net for MRI Bias Field Correction | 概率Hadamard U-Net用于MRI偏场校正 | Xin Zhu, Hongyi Pan, Yury Velichko, Adam B. Murphy, Ashley Ross, Baris Turkbey, Ahmet Enis Cetin, Ulas Bagci | http://arxiv.org/pdf/2403.05024v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising | 多模态专家型视频检索系统:用于情境广告的ContextIQ | Ashutosh Chaubey, Anoubhav Agarwaal, Sartaki Sinha Roy, Aayush Agarwal, Susmita Ghose | http://arxiv.org/pdf/2410.22233v1 | null |
2024-10-29 | ADAM: An Embodied Causal Agent in Open-World Environments | ADAM:开放世界环境中的具身因果代理 | Shu Yu, Chaochao Lu | http://arxiv.org/pdf/2410.22194v1 | null |
2024-10-29 | Are VLMs Really Blind | VLMs真的盲目吗? | Ayush Singh, Mansi Gupta, Shivank Garg | http://arxiv.org/pdf/2410.22029v1 | null |
2024-10-29 | Feature distribution Adaptation Network for Speech Emotion Recognition | 语音情感识别的特征分布自适应网络 | Shaokai Li, Yixuan Ji, Peng Song, Haoqin Sun, Wenming Zheng | http://arxiv.org/pdf/2410.22023v1 | null |
2024-10-29 | A Survey on RGB, 3D, and Multimodal Approaches for Unsupervised Industrial Anomaly Detection | RGB、3D和多模态方法在无监督工业异常检测中的应用综述 | Yuxuan Lin, Yang Chang, Xuan Tong, Jiawen Yu, Antonio Liotta, Guofan Huang, Wei Song, Deyu Zeng, Zongze Wu, Yan Wang, et al. | http://arxiv.org/pdf/2410.21982v1 | null |
2024-10-29 | Spatio-temporal Transformers for Action Unit Classification with Event Cameras | 基于事件相机的动作单元分类时空变换器 | Luca Cultrera, Federico Becattini, Lorenzo Berlincioni, Claudio Ferrari, Alberto Del Bimbo | http://arxiv.org/pdf/2410.21958v1 | null |
2024-10-29 | AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | AutoBench-V:大型视觉语言模型能否自我基准测试? | Han Bao, Yue Huang, Yanbo Wang, Jiayi Ye, Xiangqi Wang, Xiuying Chen, Mohamed Elhoseiny, Xiangliang Zhang | http://arxiv.org/pdf/2410.21259v2 | link |
2024-10-29 | Non-rigid Relative Placement through 3D Dense Diffusion | 三维密集扩散实现的非刚性相对定位 | Eric Cai, Octavian Donca, Ben Eisner, David Held | http://arxiv.org/pdf/2410.19247v2 | null |
2024-10-29 | Visual Robustness Benchmark for Visual Question Answering (VQA) | 视觉问答(VQA)视觉鲁棒性基准 | Md Farhan Ishmam, Ishmam Tashdeed, Talukder Asir Saadat, Md Hamjajul Ashmafee, Abu Raihan Mostofa Kamal, Md. Azam Hossain | http://arxiv.org/pdf/2407.03386v5 | link |
2024-10-29 | M$^2$IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension | M$^2$IST:多模态交互式边调优高效指代表达理解 | Xuyang Liu, Ting Liu, Siteng Huang, Yi Xin, Yue Hu, Quanjun Yin, Donglin Wang, Honggang Chen | http://arxiv.org/pdf/2407.01131v2 | null |
2024-10-29 | MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs | MMDU:多轮多图像对话理解基准与用于LVLMs的指令微调数据集 | Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, et al. | http://arxiv.org/pdf/2406.11833v2 | link |
2024-10-29 | No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | 没有指数级数据就无法实现“零样本”:预训练概念频率决定多模态模型性能 | Vishaal Udandarao, Ameya Prabhu, Adhiraj Ghosh, Yash Sharma, Philip H. S. Torr, Adel Bibi, Samuel Albanie, Matthias Bethge | http://arxiv.org/pdf/2404.04125v3 | link |
2024-10-29 | VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark | VLKEB:一个大规模视觉-语言模型知识编辑基准 | Han Huang, Haitian Zhong, Tao Yu, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan | http://arxiv.org/pdf/2403.07350v3 | link |
2024-10-29 | GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | 全球文档:面向真实世界文档图像检索与分类的多模态视觉-语言框架 | Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickaël Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós | http://arxiv.org/pdf/2309.05756v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos | MoDGS:从随意捕获的单目视频中动态高斯散布 | Qingming Liu, Yuan Liu, Jiepeng Wang, Xianqiang Lyv, Peng Wang, Wenping Wang, Junhui Hou | http://arxiv.org/pdf/2406.00434v2 | null |
2024-10-29 | DOGS: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus | 基于高斯一致性的大规模3D重建的分布式高斯散点法 | Yu Chen, Gim Hee Lee | http://arxiv.org/pdf/2405.13943v2 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | PF3plat:免姿态前馈3D高斯喷溅 | Sunghwan Hong, Jaewoo Jung, Heeseong Shin, Jisang Han, Jiaolong Yang, Chong Luo, Seungryong Kim | http://arxiv.org/pdf/2410.22128v1 | link |
2024-10-29 | ActiveSplat: High-Fidelity Scene Reconstruction through Active Gaussian Splatting | ActiveSplat:通过主动高斯Splatting实现的高保真场景重建 | Yuetao Li, Zijia Kuang, Ting Li, Guyue Zhou, Shaohui Zhang, Zike Yan | http://arxiv.org/pdf/2410.21955v1 | null |
2024-10-29 | OmniGS: Fast Radiance Field Reconstruction using Omnidirectional Gaussian Splatting | 全向高斯溅射:快速辐射场重建 | Longwei Li, Huajian Huang, Sai-Kit Yeung, Hui Cheng | http://arxiv.org/pdf/2404.03202v4 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | Effective Guidance for Model Attention with Simple Yes-no Annotations | 有效利用简单是/否标注进行模型注意力的指导 | Seongmin Lee, Ali Payani, Duen Horng Chau | http://arxiv.org/pdf/2410.22312v1 | null |
2024-10-29 | Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets | 多级特征蒸馏:基于不同图像数据集联合训练的教师模型 | Adrian Iordache, Bogdan Alexe, Radu Tudor Ionescu | http://arxiv.org/pdf/2410.22184v1 | link |
2024-10-29 | HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation | HRPVT:适用于中尺度和小尺度人体姿态估计的高分辨率金字塔视觉Transformer | Zhoujie Xu | http://arxiv.org/pdf/2410.22079v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier | 多类文本逆转换秘密生成语义无关分类器 | Kai Wang, Fei Yang, Bogdan Raducanu, Joost van de Weijer | http://arxiv.org/pdf/2410.22317v1 | link |
2024-10-29 | Guide3D: A Bi-planar X-ray Dataset for 3D Shape Reconstruction | 双平面X射线数据集:用于三维形状重建的Guide3D | Tudor Jianu, Baoru Huang, Hoan Nguyen, Binod Bhattarai, Tuong Do, Erman Tjiputra, Quang Tran, Pierre Berthet-Rayne, Ngan Le, Sebastiano Fichera, et al. | http://arxiv.org/pdf/2410.22224v1 | null |
2024-10-29 | MAPUNetR: A Hybrid Vision Transformer and U-Net Architecture for Efficient and Interpretable Medical Image Segmentation | MAPUNetR:一种混合视觉Transformer和U-Net架构的高效且可解释医学图像分割 | Ovais Iqbal Shah, Danish Raza Rizvi, Aqib Nazir Mir | http://arxiv.org/pdf/2410.22223v1 | null |
2024-10-29 | Active Learning for Vision-Language Models | 视觉-语言模型中的主动学习 | Bardia Safaei, Vishal M. Patel | http://arxiv.org/pdf/2410.22187v1 | null |
2024-10-29 | Lighten CARAFE: Dynamic Lightweight Upsampling with Guided Reassemble Kernels | 轻量级CARAFE:带引导重组核的动态轻量上采样 | Ruigang Fu, Qingyong Hu, Xiaohu Dong, Yinghui Gao, Biao Li, Ping Zhong | http://arxiv.org/pdf/2410.22139v1 | link |
2024-10-29 | Lightweight Frequency Masker for Cross-Domain Few-Shot Semantic Segmentation | 轻量级跨域小样本语义分割频率掩码器 | Jintao Tong, Yixiong Zou, Yuhua Li, Ruixuan Li | http://arxiv.org/pdf/2410.22135v1 | null |
2024-10-29 | RankUp: Boosting Semi-Supervised Regression with an Auxiliary Ranking Classifier | RankUp:通过辅助排序分类器提升半监督回归 | Pin-Yen Huang, Szu-Wei Fu, Yu Tsao | http://arxiv.org/pdf/2410.22124v1 | link |
2024-10-29 | Hyperspectral Imaging-Based Perception in Autonomous Driving Scenarios: Benchmarking Baseline Semantic Segmentation Models | 基于高光谱成像的自动驾驶场景感知:基准语义分割模型基准测试 | Imad Ali Shah, Jiarong Li, Martin Glavin, Edward Jones, Enda Ward, Brian Deegan | http://arxiv.org/pdf/2410.22101v1 | null |
2024-10-29 | DINeuro: Distilling Knowledge from 2D Natural Images via Deformable Tubular Transferring Strategy for 3D Neuron Reconstruction | DINeuro:通过可变形管状迁移策略从2D自然图像中提炼知识以实现3D神经元重建 | Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Yui Lo, Yuqian Chen, Lauren J. O'Donnell, Weidong Cai | http://arxiv.org/pdf/2410.22078v1 | null |
2024-10-29 | Benchmarking Human and Automated Prompting in the Segment Anything Model | 段任意模型中的人机提示基准测试 | Jorge Quesada, Zoe Fowler, Mohammad Alotaibi, Mohit Prabhushankar, Ghassan AlRegib | http://arxiv.org/pdf/2410.22048v1 | null |
2024-10-29 | A Machine Learning-Based Secure Face Verification Scheme and Its Applications to Digital Surveillance | 基于机器学习的安全人脸验证方案及其在数字监控中的应用 | Huan-Chih Wang, Ja-Ling Wu | http://arxiv.org/pdf/2410.21993v1 | null |
2024-10-29 | From Explicit Rules to Implicit Reasoning in an Interpretable Violence Monitoring System | 从显式规则到可解释暴力监控系统的隐式推理 | Wen-Dong Jiang, Chih-Yung Chang, Hsiang-Chuan Chang, Diptendu Sinha Roy | http://arxiv.org/pdf/2410.21991v1 | null |
2024-10-29 | BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays | BenchX:胸部X光片医学视觉-语言预训练的统一基准框架 | Yang Zhou, Tan Li Hui Faith, Yanyu Xu, Sicong Leng, Xinxing Xu, Yong Liu, Rick Siow Mong Goh | http://arxiv.org/pdf/2410.21969v1 | link |
2024-10-29 | FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection | 伪前驱:高效、基于漏洞的Transformer,用于泛化型深度伪造检测 | Dat Nguyen, Marcella Astrid, Enjie Ghorbel, Djamila Aouada | http://arxiv.org/pdf/2410.21964v1 | null |
2024-10-29 | Multi-step feature fusion for natural disaster damage assessment on satellite images | 多步特征融合用于卫星图像上的自然灾害损害评估 | Mateusz Żarski, Jarosław Adam Miszczak | http://arxiv.org/pdf/2410.21901v1 | link |
2024-10-29 | Advancing Efficient Brain Tumor Multi-Class Classification -- New Insights from the Vision Mamba Model in Transfer Learning | 推进高效脑肿瘤多类别分类——视觉Mamba模型在迁移学习中的新见解 | Yinyi Lai, Anbo Cao, Yuan Gao, Jiaqi Shang, Zongyu Li, Jia Guo | http://arxiv.org/pdf/2410.21872v1 | null |
2024-10-29 | HRGR: Enhancing Image Manipulation Detection via Hierarchical Region-aware Graph Reasoning | HRGR:通过分层区域感知图推理增强图像操纵检测 | Xudong Wang, Yuezun Li, Huiyu Zhou, Jiaran Zhou, Junyu Dong | http://arxiv.org/pdf/2410.21861v1 | null |
2024-10-29 | Paved or unpaved? A Deep Learning derived Road Surface Global Dataset from Mapillary Street-View Imagery | Classification of Road Surface Materials Using Deep Learning Techniques from Mapillary Street-View Imagery | Sukanya Randhawa, Eren Aygun, Guntaj Randhawa, Benjamin Herfort, Sven Lautenbach, Alexander Zipf | http://arxiv.org/pdf/2410.19874v2 | null |
2024-10-29 | Efficient Neural Network Training via Subset Pretraining | 通过子集预训练的高效神经网络训练 | Jan Spörer, Bernhard Bermeitinger, Tomas Hrycej, Niklas Limacher, Siegfried Handschuh | http://arxiv.org/pdf/2410.16523v2 | null |
2024-10-29 | An Integrated Deep Learning Model for Skin Cancer Detection Using Hybrid Feature Fusion Technique | 基于混合特征融合技术的集成深度学习皮肤癌检测模型 | Maksuda Akter, Rabea Khatun, Md. Alamin Talukder, Md. Manowarul Islam, Md. Ashraf Uddin | http://arxiv.org/pdf/2410.14489v2 | null |
2024-10-29 | Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition | 分层域自适应:场景文字识别的渐进式自训练方法 | Kha Nhat Le, Hoang-Tuan Nguyen, Hung Tien Tran, Thanh Duc Ngo | http://arxiv.org/pdf/2410.09913v3 | link |
2024-10-29 | Spatial-Aware Conformal Prediction for Trustworthy Hyperspectral Image Classification | 空间感知的符合预测在可靠的高光谱图像分类中的应用 | Kangdao Liu, Tianhao Sun, Hao Zeng, Yongshan Zhang, Chi-Man Pun, Chi-Man Vong | http://arxiv.org/pdf/2409.01236v2 | link |
2024-10-29 | SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation | SeTAR:基于选择性低秩近似的分布外检测 | Yixia Li, Boya Xiong, Guanhua Chen, Yun Chen | http://arxiv.org/pdf/2406.12629v3 | link |
2024-10-29 | CAMS: Convolution and Attention-Free Mamba-based Cardiac Image Segmentation | CAMS:卷积和无注意力Mamba基于的心脏图像分割 | Abbas Khan, Muhammad Asad, Martin Benning, Caroline Roney, Gregory Slabaugh | http://arxiv.org/pdf/2406.05786v3 | link |
2024-10-29 | Dissecting Query-Key Interaction in Vision Transformers | 视觉Transformer中的查询-键交互剖析 | Xu Pan, Aaron Philip, Ziqian Xie, Odelia Schwartz | http://arxiv.org/pdf/2405.14880v3 | null |
2024-10-29 | PointCompress3D: A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems | PointCompress3D:智能交通系统中路边激光雷达点云压缩框架 | Walter Zimmer, Ramandika Pranamulia, Xingcheng Zhou, Mingyu Liu, Alois C. Knoll | http://arxiv.org/pdf/2405.01750v2 | null |
2024-10-29 | Texture, Shape and Order Matter: A New Transformer Design for Sequential DeepFake Detection | 纹理、形状和顺序至关重要:用于序列深度伪造检测的新型Transformer设计 | Yunfei Li, Yuezun Li, Xin Wang, Baoyuan Wu, Jiaran Zhou, Junyu Dong | http://arxiv.org/pdf/2404.13873v3 | null |
2024-10-29 | Location-Free Scene Graph Generation | 无位置场景图生成 | Ege Özsoy, Felix Holm, Mahdi Saleh, Tobias Czempiel, Chantal Pellegrini, Nassir Navab, Benjamin Busam | http://arxiv.org/pdf/2303.10944v2 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | Structured Analysis and Comparison of Alphabets in Historical Handwritten Ciphers | 历史手写密文字母结构分析与比较 | Martín Méndez, Pau Torras, Adrià Molina, Jialuo Chen, Oriol Ramos-Terrades, Alicia Fornés | http://arxiv.org/pdf/2410.21913v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models | MamMIL:基于状态空间模型的整张切片图像的多实例学习 | Zijie Fang, Yifeng Wang, Ye Zhang, Zhi Wang, Jian Zhang, Xiangyang Ji, Yongbing Zhang | http://arxiv.org/pdf/2403.05160v2 | link |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | Active Event Alignment for Monocular Distance Estimation | 单目距离估计中的主动事件对齐 | Nan Cai, Pia Bideau | http://arxiv.org/pdf/2410.22280v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | Local Policies Enable Zero-shot Long-horizon Manipulation | 本地策略实现零样本长时程操纵 | Murtaza Dalal, Min Liu, Walter Talbott, Chen Chen, Deepak Pathak, Jian Zhang, Ruslan Salakhutdinov | http://arxiv.org/pdf/2410.22332v1 | null |
2024-10-29 | Natural Language Inference Improves Compositionality in Vision-Language Models | 自然语言推理提升视觉-语言模型中的组合性 | Paola Cascante-Bonilla, Yu Hou, Yang Trista Cao, Hal Daumé III, Rachel Rudinger | http://arxiv.org/pdf/2410.22315v1 | null |
2024-10-29 | Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective | 面向视觉时代统一理解和生成:从自回归视角的综述 | Shenghao Xie, Wenqiang Zu, Mingyang Zhao, Duo Su, Shilong Liu, Ruohua Shi, Guoqi Li, Shanghang Zhang, Lei Ma | http://arxiv.org/pdf/2410.22217v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention | 多目标3D地面定位:动态模块与语言信息驱动的空间注意力 | Haomeng Zhang, Chiao-An Yang, Raymond A. Yeh | http://arxiv.org/pdf/2410.22306v1 | null |
2024-10-29 | Emotion-Guided Image to Music Generation | 情感引导的图像到音乐生成 | Souraja Kundu, Saket Singh, Yuji Iwahori | http://arxiv.org/pdf/2410.22299v1 | null |
2024-10-29 | NCA-Morph: Medical Image Registration with Neural Cellular Automata | NCA-Morph:基于神经网络细胞自动机的医学图像配准 | Amin Ranem, John Kalkhof, Anirban Mukhopadhyay | http://arxiv.org/pdf/2410.22265v1 | null |
2024-10-29 | MotionBooth: Motion-Aware Customized Text-to-Video Generation | 动态感知定制化文本到视频生成 | Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen | http://arxiv.org/pdf/2406.17758v3 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | TractShapeNet: Efficient Multi-Shape Learning with 3D Tractography Point Clouds | TractShapeNet:基于3D束形轨迹点云的高效多形状学习 | Yui Lo, Yuqian Chen, Dongnan Liu, Jon Haitz Legarreta, Leo Zekelman, Fan Zhang, Jarrett Rushmore, Yogesh Rathi, Nikos Makris, Alexandra J. Golby, et al. | http://arxiv.org/pdf/2410.22099v1 | null |
2024-10-29 | FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives | FreeGaussian:无需指导的可控3D高斯块与流导数 | Qizhi Chen, Delin Qu, Yiwen Tang, Haoming Song, Yiting Zhang, Dong Wang, Bin Zhao, Xuelong Li | http://arxiv.org/pdf/2410.22070v1 | null |
2024-10-29 | Micro-Structures Graph-Based Point Cloud Registration for Balancing Efficiency and Accuracy | 基于微观结构图点云配准的效率与精度平衡 | Rongling Zhang, Li Yan, Pengcheng Wei, Hong Xie, Pinzhuo Wang, Binbing Wang | http://arxiv.org/pdf/2410.21857v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-10-29 | Task Vectors are Cross-Modal | 跨模态的任务向量 | Grace Luo, Trevor Darrell, Amir Bar | http://arxiv.org/pdf/2410.22330v1 | null |
2024-10-29 | Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset | 机器人预训练机器人:以操作为中心的机器人表示从大规模机器人数据集 | Guangqi Jiang, Yifei Sun, Tao Huang, Huanyu Li, Yongyuan Liang, Huazhe Xu | http://arxiv.org/pdf/2410.22325v1 | null |
2024-10-29 | Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving | Senna:连接大型视觉-语言模型与端到端自动驾驶 | Bo Jiang, Shaoyu Chen, Bencheng Liao, Xingyu Zhang, Wei Yin, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang | http://arxiv.org/pdf/2410.22313v1 | null |
2024-10-29 | Motion Graph Unleashed: A Novel Approach to Video Prediction | 视频预测的全新方法:运动图技术解禁 | Yiqi Zhong, Luming Liang, Bohan Tang, Ilya Zharkov, Ulrich Neumann | http://arxiv.org/pdf/2410.22288v1 | null |
2024-10-29 | LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues | LiVisSfM:基于激光雷达和视觉线索的精确且鲁棒的位姿估计 | Hanqing Jiang, Liyang Zhou, Zhuang Zhang, Yihao Yu, Guofeng Zhang | http://arxiv.org/pdf/2410.22213v1 | null |
2024-10-29 | Shining a Light on Hurricane Damage Estimation via Nighttime Light Data: Pre-processing Matters | 借助夜间灯光数据揭示飓风损害评估:预处理至关重要 | Nancy Thomas, Saba Rahimi, Annita Vapsi, Cathy Ansell, Elizabeth Christie, Daniel Borrajo, Tucker Balch, Manuela Veloso | http://arxiv.org/pdf/2410.22150v1 | null |
2024-10-29 | 4D-based Robot Navigation Using Relativistic Image Processing | 基于4D的机器人导航采用相对图像处理 | Simone Müller, Dieter Kranzlmüller | http://arxiv.org/pdf/2410.22087v1 | null |
2024-10-29 | Analyzing Noise Models and Advanced Filtering Algorithms for Image Enhancement | 分析噪声模型和高级滤波算法以增强图像 | Sahil Ali Akbar, Ananya Verma | http://arxiv.org/pdf/2410.21946v1 | link |
2024-10-29 | ReMix: Training Generalized Person Re-identification on a Mixture of Data | ReMix:基于数据混合训练通用行人重识别 | Timur Mamedov, Anton Konushin, Vadim Konushin | http://arxiv.org/pdf/2410.21938v1 | null |
2024-10-29 | A Longitudinal Analysis of Racial and Gender Bias in New York Times and Fox News Images and Articles | 纽约时报与福克斯新闻图像和文章中的种族和性别偏见纵向分析 | Hazem Ibrahim, Nouar AlDahoul, Syed Mustafa Ali Abbasi, Fareed Zaffar, Talal Rahwan, Yasir Zaki | http://arxiv.org/pdf/2410.21898v1 | null |
2024-10-29 | Self-Relaxed Joint Training: Sample Selection for Severity Estimation with Ordinal Noisy Labels | 自松弛联合训练:具有序数噪声标签的严重程度估计样本选择 | Shumpei Takezaki, Kiyohito Tanaka, Seiichi Uchida | http://arxiv.org/pdf/2410.21885v1 | link |
2024-10-29 | Search Wide, Focus Deep: Automated Fetal Brain Extraction with Sparse Training Data | 广泛搜索,深度聚焦:稀疏训练数据下的胎儿大脑自动提取 | Javid Dadashkarimi, Valeria Pena Trujillo, Camilo Jaimes, Lilla Zöllei, Malte Hoffmann | http://arxiv.org/pdf/2410.20532v2 | null |
2024-10-29 | Integration of Communication and Computational Imaging | 通信与计算成像的融合 | Zhenming Yu, Liming Cheng, Hongyu Huang, Wei Zhang, Liang Lin, Kun Xu | http://arxiv.org/pdf/2410.19415v2 | null |
2024-10-29 | Contrastive Sequential-Diffusion Learning: Non-linear and Multi-Scene Instructional Video Synthesis | 对比序列扩散学习:非线性和多场景指令视频生成 | Vasco Ramos, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes | http://arxiv.org/pdf/2407.11814v2 | null |
2024-10-29 | Chemical Shift Encoding based Double Bonds Quantification in Triglycerides using Deep Image Prior | 基于化学位移编码的甘油三酯中双键定量分析:深度图像先验 | Chaoxing Huang, Ziqiang Yu, Zijian Gao, Qiuyi Shen, Queenie Chan, Vincent Wai-Sun Wong, Winnie Chiu-Wing Chu, Weitian Chen | http://arxiv.org/pdf/2407.01926v4 | null |
2024-10-29 | NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing | 纳瑞康:融合扩散先验的自然精炼规范图像用于视频编辑 | Ting-Hsuan Chen, Jiewen Chan, Hau-Shiang Shiu, Shih-Han Yen, Chang-Han Yeh, Yu-Lun Liu | http://arxiv.org/pdf/2406.06523v2 | link |
2024-10-29 | GO4Align: Group Optimization for Multi-Task Alignment | GO4Align:多任务对齐的分组优化 | Jiayi Shen, Cheems Wang, Zehao Xiao, Nanne Van Noord, Marcel Worring | http://arxiv.org/pdf/2404.06486v2 | link |
2024-10-29 | What Makes ImageNet Look Unlike LAION | 图像网与LAION有何不同之处 | Ali Shirali, Moritz Hardt | http://arxiv.org/pdf/2306.15769v2 | link |