
2024-06-24.md


[UPDATED!] 2024-06-24 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-06-24 | FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models | FreeTraj:视频传播模型中的无调节轨迹控制 | Haonan Qiu, Zhaoxi Chen, Zhouxia Wang, Yingqing He, Menghan Xia, Ziwei Liu | http://arxiv.org/pdf/2406.16863v1 | link |
| 2024-06-24 | Dreamitate: Real-World Visuomotor Policy Learning via Video Generation | Dreamitate:通过视频生成进行现实世界的视觉运动策略学习 | Junbang Liang, Ruoshi Liu, Ege Ozguroglu, Sruthi Sudhakar, Achal Dave, Pavel Tokmakov, Shuran Song, Carl Vondrick | http://arxiv.org/pdf/2406.16862v1 | null |
| 2024-06-24 | DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | DreamBench++:个性化图像生成的人机对齐基准 | Yuang Peng, Yuxin Cui, Haomiao Tang, Zekun Qi, Runpei Dong, Jing Bai, Chunrui Han, Zheng Ge, Xiangyu Zhang, Shu-Tao Xia | http://arxiv.org/pdf/2406.16855v1 | link |
| 2024-06-24 | Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image | Portrait3D:从单个野生肖像图像生成 3D 头部 | Jinkun Hao, Junshu Tang, Jiangning Zhang, Ran Yi, Yijia Hong, Moran Li, Weijian Cao, Yating Wang, Lizhuang Ma | http://arxiv.org/pdf/2406.16710v1 | null |
| 2024-06-24 | Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling | 通过 3D 一致性噪声和梯度一致性建模实现几何感知分数蒸馏 | Min-Seop Kwak, Donghoon Ahn, Ines Hyeonsu Kim, Jin-wha Kim, Seungryong Kim | http://arxiv.org/pdf/2406.16695v1 | null |
| 2024-06-24 | Repulsive Score Distillation for Diverse Sampling of Diffusion Models | 扩散模型多样化采样的排斥分数蒸馏 | Nicolas Zilberstein, Morteza Mardani, Santiago Segarra | http://arxiv.org/pdf/2406.16683v1 | null |
| 2024-06-24 | Do As I Do: Pose Guided Human Motion Copy | 照我做:姿势引导人体动作复制 | Sifan Wu, Zhenguang Liu, Beibei Zhang, Roger Zimmermann, Zhongjie Ba, Xiaosong Zhang, Kui Ren | http://arxiv.org/pdf/2406.16601v1 | null |
| 2024-06-24 | FASTC: A Fast Attentional Framework for Semantic Traversability Classification Using Point Cloud | FASTC:使用点云进行语义可遍历性分类的快速注意力框架 | Yirui Chen, Pengjin Wei, Zhenhuan Liu, Bingchao Wang, Jie Yang, Wei Liu | http://arxiv.org/pdf/2406.16564v1 | link |
| 2024-06-24 | EvalAlign: Evaluating Text-to-Image Models through Precision Alignment of Multimodal Large Models with Supervised Fine-Tuning to Human Annotations | EvalAlign:通过对多模态大型模型进行精确对齐并对人工注释进行监督微调来评估文本到图像模型 | Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Mengping Yang, Cheng Zhang, Hao Li | http://arxiv.org/pdf/2406.16562v1 | link |
| 2024-06-24 | GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization | GIM:用于生成图像处理检测和定位的百万级基准 | Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, et al. | http://arxiv.org/pdf/2406.16531v1 | null |
| 2024-06-24 | DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution | DaLPSR:利用降级对齐语言提示实现真实世界图像超分辨率 | Aiwen Jiang, Zhi Wei, Long Peng, Feiqiang Liu, Wenbo Li, Mingwen Wang | http://arxiv.org/pdf/2406.16477v1 | null |
| 2024-06-24 | ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance | ResMaster:通过结构化和细粒度指导掌握高分辨率图像生成 | Shuwei Shi, Wenbo Li, Yuechen Zhang, Jingwen He, Biao Gong, Yinqiang Zheng | http://arxiv.org/pdf/2406.16476v1 | null |
| 2024-06-24 | Improving Generative Adversarial Networks for Video Super-Resolution | 改进视频超分辨率的生成对抗网络 | Daniel Wen | http://arxiv.org/pdf/2406.16359v1 | null |
| 2024-06-24 | Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models | 快速一致性图像生成 (PCIG):集成 LLM、知识图谱和可控扩散模型的统一框架 | Yichen Sun, Zhixuan Chu, Zhan Qin, Kui Ren | http://arxiv.org/pdf/2406.16333v1 | null |
| 2024-06-24 | YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals | YouDream:生成解剖学上可控制的一致文本到 3D 动物 | Sandeep Mishra, Oindrila Saha, Alan C. Bovik | http://arxiv.org/pdf/2406.16273v1 | null |
| 2024-06-24 | Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement | 通过注意力引导特征增强修复文本到图像扩散模型中的灾难性忽视 | Zhiyuan Chang, Mingyang Li, Junjie Wang, Yi Liu, Qing Wang, Yang Liu | http://arxiv.org/pdf/2406.16272v1 | null |
| 2024-06-24 | Video-Infinity: Distributed Long Video Generation | Video-Infinity:分布式长视频生成 | Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao Wang | http://arxiv.org/pdf/2406.16260v1 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-06-24 | Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models | 重新审视大型多模态模型时代的指称表达理解评估 | Jierun Chen, Fangyun Wei, Jinjing Zhao, Sizhe Song, Bohuai Wu, Zhuoxuan Peng, S. -H. Gary Chan, Hongyang Zhang | http://arxiv.org/pdf/2406.16866v1 | link |
| 2024-06-24 | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs | Cambrian-1:全面开放、以视觉为中心的多模态法学硕士探索 | Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, et al. | http://arxiv.org/pdf/2406.16860v1 | null |
| 2024-06-24 | Long Context Transfer from Language to Vision | 从语言到视觉的长上下文迁移 | Peiyuan Zhang, Kaichen Zhang, Bo Li, Guangtao Zeng, Jingkang Yang, Yuanhan Zhang, Ziyue Wang, Haoran Tan, Chunyuan Li, Ziwei Liu | http://arxiv.org/pdf/2406.16852v1 | link |
| 2024-06-24 | Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts | 在图像大海捞针:视觉语言模型在短距离和长距离语境中容易分心 | Aditya Sharma, Michael Saxon, William Yang Wang | http://arxiv.org/pdf/2406.16851v1 | null |
| 2024-06-24 | From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking | 从完美到嘈杂的世界模拟:用于 SLAM 鲁棒性基准测试的可定制体现多模态扰动 | Xiaohao Xu, Tianyi Zhang, Sibo Wang, Xiang Li, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Xiaonan Huang | http://arxiv.org/pdf/2406.16850v1 | link |
| 2024-06-24 | Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment | 视觉语言一致性引导的多模式提示学习,用于盲人人工智能生成的图像质量评估 | Jun Fu, Wei Zhou, Qiuping Jiang, Hantao Liu, Guangtao Zhai | http://arxiv.org/pdf/2406.16641v1 | null |
| 2024-06-24 | OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | OmAgent:一种用于复杂视频理解的具有任务分而治之的多模式代理框架 | Lu Zhang, Tiancheng Zhao, Heting Ying, Yibo Ma, Kyusong Lee | http://arxiv.org/pdf/2406.16620v1 | null |
| 2024-06-24 | Multi-Modal Vision Transformers for Crop Mapping from Satellite Image Time Series | 利用卫星图像时间序列进行农作物制图的多模态视觉变换器 | Theresa Follath, David Mickisch, Jan Hemmerling, Stefan Erasmi, Marcel Schwieder, Begüm Demir | http://arxiv.org/pdf/2406.16513v1 | null |
| 2024-06-24 | InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection | InterCLIP-MEP:用于多模态讽刺检测的交互式 CLIP 和记忆增强预测器 | Junjie Chen, Subin Huang | http://arxiv.org/pdf/2406.16464v1 | link |
| 2024-06-24 | EmoLLM: Multimodal Emotional Understanding Meets Large Language Models | EmoLLM:多模态情感理解与大型语言模型的结合 | Qu Yang, Mang Ye, Bo Du | http://arxiv.org/pdf/2406.16442v1 | null |
| 2024-06-24 | Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks | 定向域微调:为特定训练任务定制单独的模式 | Daniel Wen, Nafisa Hussain | http://arxiv.org/pdf/2406.16346v1 | null |
| 2024-06-24 | VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models | VideoHallucer:评估大型视频语言模型中的内在和外在幻觉 | Yuxuan Wang, Yueqian Wang, Dongyan Zhao, Cihang Xie, Zilong Zheng | http://arxiv.org/pdf/2406.16338v1 | null |
| 2024-06-24 | Priorformer: A UGC-VQA Method with content and distortion priors | Priorformer:具有内容和失真先验的 UGC-VQA 方法 | Yajing Pei, Shiyu Huang, Yiting Lu, Xin Li, Zhibo Chen | http://arxiv.org/pdf/2406.16297v1 | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-06-24 | Articulate your NeRF: Unsupervised articulated object modeling via conditional view synthesis | 清晰表达你的 NeRF:通过条件视图合成实现无监督清晰对象建模 | Jianning Deng, Kartic Subr, Hakan Bilen | http://arxiv.org/pdf/2406.16623v1 | null |
| 2024-06-24 | Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction | 众包 NeRF:从生产车辆收集数据以进行 3D 街景重建 | Tong Qin, Changze Li, Haoyang Ye, Shaowei Wan, Minzhen Li, Hongwei Liu, Ming Yang | http://arxiv.org/pdf/2406.16289v1 | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-06-24 | ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians | ClotheDreamer:使用 3D 高斯算法生成文本引导的服装 | Yufei Liu, Junshu Tang, Chu Zheng, Shijie Zhang, Jinkun Hao, Junwei Zhu, Dongjin Huang | http://arxiv.org/pdf/2406.16815v1 | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-06-24 | Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition | 在不确定性中寻求确定性:解决动态面部表情识别不确定性的双阶段统一框架 | Haoran Wang, Xinji Mai, Zeng Tao, Xuan Tong, Junxiong Lin, Yan Wang, Jiawen Yu, Boyang Wang, Shaoqi Yan, Qing Zhao, et al. | http://arxiv.org/pdf/2406.16473v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-06-24 | Unsupervised Domain Adaptation for Pediatric Brain Tumor Segmentation | 儿童脑肿瘤分割的无监督域自适应 | Jingru Fu, Simone Bendazzoli, Örjan Smedby, Rodrigo Moreno | http://arxiv.org/pdf/2406.16848v1 | null |
| 2024-06-24 | The Progression of Transformers from Language to Vision to MOT: A Literature Review on Multi-Object Tracking with Transformers | Transformer 从语言到视觉再到 MOT 的进展:使用 Transformer 进行多目标跟踪的文献综述 | Abhi Kamboj | http://arxiv.org/pdf/2406.16784v1 | null |
| 2024-06-24 | Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation | 半监督 3D 实例分割的实例一致性正则化 | Yizheng Wu, Zhiyu Pan, Kewei Wang, Xingyi Li, Jiahao Cui, Liwen Xiao, Guosheng Lin, Zhiguo Cao | http://arxiv.org/pdf/2406.16776v1 | link |
| 2024-06-24 | μ-Net: A Deep Learning-Based Architecture for μ-CT Segmentation | μ-Net:基于深度学习的 μ-CT 分割架构 | Pierangela Bruno, Edoardo De Rose, Carlo Adornetto, Francesco Calimeri, Sandro Donato, Raffaele Giuseppe Agostino, Daniela Amelio, Riccardo Barberi, Maria Carmela Cerra, Maria Caterina Crocco, et al. | http://arxiv.org/pdf/2406.16724v1 | null |
| 2024-06-24 | Demystifying the Effect of Receptive Field Size in U-Net Models for Medical Image Segmentation | 揭秘医学图像分割 U-Net 模型中感受野大小的影响 | Vincent Loos, Rohit Pardasani, Navchetan Awasthi | http://arxiv.org/pdf/2406.16701v1 | null |
| 2024-06-24 | Feature Fusion for Human Activity Recognition using Parameter-Optimized Multi-Stage Graph Convolutional Network and Transformer Models | 使用参数优化的多阶段图卷积网络和 Transformer 模型进行特征融合以实现人类活动识别 | Mohammad Belal, Taimur Hassan, Abdelfatah Ahmed, Ahmad Aljarah, Nael Alsheikh, Irfan Hussain | http://arxiv.org/pdf/2406.16638v1 | null |
| 2024-06-24 | The Championship-Winning Solution for the 5th CLVISION Challenge 2024 | 第五届 CLVISION 挑战赛 2024 冠军解决方案 | Sishun Pan, Tingmin Li, Yang Yang | http://arxiv.org/pdf/2406.16615v1 | null |
| 2024-06-24 | Toward Fairer Face Recognition Datasets | 迈向更公平的人脸识别数据集 | Alexandre Fournier-Mongieux, Michael Soumm, Adrian Popescu, Bertrand Luvison, Hervé Le Borgne | http://arxiv.org/pdf/2406.16592v1 | null |
| 2024-06-24 | Personalized federated learning based on feature fusion | 基于特征融合的个性化联邦学习 | Wolong Xing, Zhenkui Shi, Hongyan Peng, Xiantao Hu, Xianxian Li | http://arxiv.org/pdf/2406.16583v1 | null |
| 2024-06-24 | Improving robustness to corruptions with multiplicative weight perturbations | 利用乘性权重扰动提高对腐败的鲁棒性 | Trung Trinh, Markus Heinonen, Luigi Acerbi, Samuel Kaski | http://arxiv.org/pdf/2406.16540v1 | null |
| 2024-06-24 | Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization | 角色适配器:提示引导区域控制,实现高保真角色定制 | Yuhang Ma, Wenting Xu, Jiji Tang, Qinfeng Jin, Rongsheng Zhang, Zeng Zhao, Changjie Fan, Zhipeng Hu | http://arxiv.org/pdf/2406.16537v1 | null |
| 2024-06-24 | Vision Mamba-based autonomous crack segmentation on concrete, asphalt, and masonry surfaces | 基于 Vision Mamba 的混凝土、沥青和砖石表面裂缝自动分割 | Zhaohui Chen, Elyas Asadi Shamsabadi, Sheng Jiang, Luming Shen, Daniel Dias-da-Costa | http://arxiv.org/pdf/2406.16518v1 | null |
| 2024-06-24 | LOGCAN++: Local-global class-aware network for semantic segmentation of remote sensing images | LOGCAN++:用于遥感图像语义分割的局部-全局类感知网络 | Xiaowen Ma, Rongrong Lian, Zhenkai Wu, Hongbo Guo, Mengting Ma, Sensen Wu, Zhenhong Du, Siyang Song, Wei Zhang | http://arxiv.org/pdf/2406.16502v1 | link |
| 2024-06-24 | UNICAD: A Unified Approach for Attack Detection, Noise Reduction and Novel Class Identification | UNICAD:一种用于攻击检测、降噪和新类别识别的统一方法 | Alvaro Lopez Pellicer, Kittipos Giatgong, Yi Li, Neeraj Suri, Plamen Angelov | http://arxiv.org/pdf/2406.16501v1 | null |
| 2024-06-24 | Improving Quaternion Neural Networks with Quaternionic Activation Functions | 使用四元数激活函数改进四元数神经网络 | Johannes Pöppelbaum, Andreas Schwung | http://arxiv.org/pdf/2406.16481v1 | null |
| 2024-06-24 | Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration | 评估视觉和文化解释:K-Viscuit 基准与 Human-VLM 协作 | Yujin Baek, ChaeHun Park, Jaeseok Kim, Yu-Jung Heo, Du-Seong Chang, Jaegul Choo | http://arxiv.org/pdf/2406.16469v1 | null |
| 2024-06-24 | SLOctolyzer: Fully automatic analysis toolkit for segmentation and feature extracting in scanning laser ophthalmoscopy images | SLOctolyzer:用于扫描激光检眼镜图像分割和特征提取的全自动分析工具包 | Jamie Burke, Samuel Gibbon, Justin Engelmann, Adam Threlfall, Ylenia Giarratano, Charlene Hamid, Stuart King, Ian J. C. MacCormick, Tom MacGillivray | http://arxiv.org/pdf/2406.16466v1 | null |
| 2024-06-24 | Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments | 探索不断变化的环境中物体检测的测试时间适应性 | Shilei Cao, Yan Liu, Juepeng Zheng, Weijia Li, Runmin Dong, Haohuan Fu | http://arxiv.org/pdf/2406.16439v1 | null |
| 2024-06-24 | Multi-threshold Deep Metric Learning for Facial Expression Recognition | 用于面部表情识别的多阈值深度度量学习 | Wenwu Yang, Jinyi Yu, Tuo Chen, Zhenguang Liu, Xun Wang, Jianbing Shen | http://arxiv.org/pdf/2406.16434v1 | null |
| 2024-06-24 | Dynamic Pseudo Label Optimization in Point-Supervised Nuclei Segmentation | 点监督核分割中的动态伪标签优化 | Ziyue Wang, Ye Zhang, Yifeng Wang, Linghan Cai, Yongbing Zhang | http://arxiv.org/pdf/2406.16427v1 | null |
| 2024-06-24 | Exploring Cross-Domain Few-Shot Classification via Frequency-Aware Prompting | 通过频率感知提示探索跨领域小样本分类 | Tiange Zhang, Qing Cai, Feng Gao, Lin Qi, Junyu Dong | http://arxiv.org/pdf/2406.16422v1 | link |
| 2024-06-24 | Lesion-Aware Cross-Phase Attention Network for Renal Tumor Subtype Classification on Multi-Phase CT Scans | 用于多期 CT 扫描中肾肿瘤亚型分类的病变感知跨期注意网络 | Kwang-Hyun Uhm, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko | http://arxiv.org/pdf/2406.16322v1 | null |
| 2024-06-24 | Artistic-style text detector and a new Movie-Poster dataset | 艺术风格文本检测器和新的电影海报数据集 | Aoxiang Ning, Yiting Wei, Minglong Xue, Senming Zhong | http://arxiv.org/pdf/2406.16307v1 | null |
| 2024-06-24 | SegNet4D: Effective and Efficient 4D LiDAR Semantic Segmentation in Autonomous Driving Environments | SegNet4D:自动驾驶环境中有效且高效的 4D LiDAR 语义分割 | Neng Wang, Ruibin Guo, Chenghao Shi, Hui Zhang, Huimin Lu, Zhiqiang Zheng, Xieyuanli Chen | http://arxiv.org/pdf/2406.16279v1 | link |
| 2024-06-24 | Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation | 特征提示 GBMSeg:用于肾小球基底膜分割的一次性参考引导无训练提示工程 | Xueyu Liu, Guangze Shi, Rui Wang, Yexin Lai, Jianan Zhang, Lele Sun, Quan Yang, Yongfei Wu, MIng Li, Weixia Han, et al. | http://arxiv.org/pdf/2406.16271v1 | null |

OCR

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-06-24 | StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal | StableNormal:减少扩散方差以实现稳定和尖锐的正常状态 | Chongjie Ye, Lingteng Qiu, Xiaodong Gu, Qi Zuo, Yushuang Wu, Zilong Dong, Liefeng Bo, Yuliang Xiu, Xiaoguang Han | http://arxiv.org/pdf/2406.16864v1 | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-06-24 | GPT-4V Explorations: Mining Autonomous Driving | GPT-4V 探索:采矿自动驾驶 | Zixuan Li | http://arxiv.org/pdf/2406.16817v1 | null |
| 2024-06-24 | MIRReS: Multi-bounce Inverse Rendering using Reservoir Sampling | MIRReS:使用储层采样进行多反射逆向渲染 | Yuxin Dai, Qi Wang, Jingsen Zhu, Dianbing Xi, Yuchi Huo, Chen Qian, Ying He | http://arxiv.org/pdf/2406.16360v1 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-06-24 | UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos | UBiSS:视频双模态语义摘要的统一框架 | Yuting Mei, Linli Yao, Qin Jin | http://arxiv.org/pdf/2406.16301v1 | link |

3D/CG

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-06-24 | Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation | 超越赞/踩:解决文本到图像生成的细粒度反馈的挑战 | Katherine M. Collins, Najoung Kim, Yonatan Bitton, Verena Rieser, Shayegan Omidshafiei, Yushi Hu, Sherol Chen, Senjuti Dutta, Minsuk Chang, Kimin Lee, et al. | http://arxiv.org/pdf/2406.16807v1 | null |
| 2024-06-24 | Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution | 抑制盲超分辨率退化估计中的不确定性 | Junxiong Lin, Zeng Tao, Xuan Tong, Xinji Mai, Haoran Wang, Boyang Wang, Yan Wang, Qing Zhao, Jiawen Yu, Yuxuan Lin, et al. | http://arxiv.org/pdf/2406.16459v1 | null |

Other

| Publish Date | Title | Title_CN | Authors | PDF | Code |
| --- | --- | --- | --- | --- | --- |
| 2024-06-24 | The MRI Scanner as a Diagnostic: Image-less Active Sampling | MRI 扫描仪作为诊断手段:无图像主动采样 | Yuning Du, Rohan Dharmakumar, Sotirios A. Tsaftaris | http://arxiv.org/pdf/2406.16754v1 | null |
| 2024-06-24 | Sampling Strategies in Bayesian Inversion: A Study of RTO and Langevin Methods | 贝叶斯反演中的采样策略:RTO 和朗之万方法的研究 | Remi Laumont, Yiqiu Dong, Martin Skovgaard Andersen | http://arxiv.org/pdf/2406.16658v1 | null |
| 2024-06-24 | MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network | MLAAN:利用多层跳跃增强辅助网络扩展监督局部学习 | Yuming Zhang, Shouxin Zhang, Peizhe Wang, Feiyu Zhu, Dongzhi Guan, Jiabin Liu, Changpeng Cai | http://arxiv.org/pdf/2406.16633v1 | null |
| 2024-06-24 | When Invariant Representation Learning Meets Label Shift: Insufficiency and Theoretical Insights | 当不变表征学习遇到标签转移:不足与理论见解 | You-Wei Luo, Chuan-Xian Ren | http://arxiv.org/pdf/2406.16608v1 | null |
| 2024-06-24 | Measuring the Recyclability of Electronic Components to Assist Automatic Disassembly and Sorting Waste Printed Circuit Boards | 测量电子元件的可回收性,协助自动拆卸和分类废弃印刷电路板 | Muhammad Mohsin, Xianlai Zeng, Stefano Rovetta, Francesco Masulli | http://arxiv.org/pdf/2406.16593v1 | null |
| 2024-06-24 | Hierarchical B-frame Video Coding for Long Group of Pictures | 针对长组图像的分层 B 帧视频编码 | Ivan Kirillov, Denis Parkhomenko, Kirill Chernyshev, Alexander Pletnev, Yibo Shi, Kai Lin, Dmitry Babin | http://arxiv.org/pdf/2406.16544v1 | null |
| 2024-06-24 | Evaluating and Analyzing Relationship Hallucinations in LVLMs | 评估和分析 LVLM 中的关系幻觉 | Mingrui Wu, Jiayi Ji, Oucheng Huang, Jiale Li, Yuhang Wu, Xiaoshuai Sun, Rongrong Ji | http://arxiv.org/pdf/2406.16449v1 | null |
| 2024-06-24 | High-resolution open-vocabulary object 6D pose estimation | 高分辨率开放词汇对象 6D 姿态估计 | Jaime Corsetti, Davide Boscaini, Francesco Giuliari, Changjae Oh, Andrea Cavallaro, Fabio Poiesi | http://arxiv.org/pdf/2406.16384v1 | null |