[UPDATED!] 2024-04-21 (Publish Time)

Generative Models

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-04-21 | Universal Fingerprint Generation: Controllable Diffusion Model with Multimodal Conditions | 通用指纹生成:多模态条件下的可控扩散模型 | Steven A. Grosz, Anil K. Jain | http://arxiv.org/pdf/2404.13791v1 | null |
| 2024-04-21 | Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control | 文本到图像生成中的对象属性绑定:评估和控制 | Maria Mihaela Trusca, Wolf Nuyts, Jonathan Thomm, Robert Honig, Thomas Hofmann, Tinne Tuytelaars, Marie-Francine Moens | http://arxiv.org/pdf/2404.13766v1 | null |
| 2024-04-21 | ArtNeRF: A Stylized Neural Field for 3D-Aware Cartoonized Face Synthesis | ArtNeRF:用于 3D 感知卡通化人脸合成的风格化神经场 | Zichen Tang, Hongyu Yang | http://arxiv.org/pdf/2404.13711v1 | null |
| 2024-04-21 | Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models | 规避扩散模型中概念抑制的概念算法 | Vitali Petsiuk, Kate Saenko | http://arxiv.org/pdf/2404.13706v1 | null |
| 2024-04-21 | Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis | Hyper-SD:用于高效图像合成的轨迹分段一致性模型 | Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao | http://arxiv.org/pdf/2404.13686v1 | null |
| 2024-04-21 | A Dataset and Model for Realistic License Plate Deblurring | 真实车牌去模糊的数据集和模型 | Haoyan Gong, Yuzheng Feng, Zhenrong Zhang, Xianxu Hou, Jingxin Liu, Siqi Huang, Hongbin Liu | http://arxiv.org/pdf/2404.13677v1 | null |
| 2024-04-21 | Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap | 探索 AIGC 视频质量:关注视觉和谐、视频文本一致性和域分布差距 | Bowen Qu, Xiaoyu Liang, Shangkun Sun, Wei Gao | http://arxiv.org/pdf/2404.13573v1 | null |
| 2024-04-21 | Exploring Diverse Methods in Visual Question Answering | 探索视觉问答的多样化方法 | Panfeng Li, Qikai Yang, Xieming Geng, Wenjing Zhou, Zhicheng Ding, Yi Nian | http://arxiv.org/pdf/2404.13565v1 | null |
| 2024-04-21 | Motion-aware Latent Diffusion Models for Video Frame Interpolation | 用于视频帧插值的运动感知潜在扩散模型 | Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, Yaowei Wang, Wenming Yang | http://arxiv.org/pdf/2404.13534v1 | null |

Multimodal

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-04-21 | Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images | 迭代提示多模态大语言模型再现自然和人工智能生成的图像 | Ali Naseh, Katherine Thai, Mohit Iyyer, Amir Houmansadr | http://arxiv.org/pdf/2404.13784v1 | null |
| 2024-04-21 | PEMMA: Parameter-Efficient Multi-Modal Adaptation for Medical Image Segmentation | PEMMA:用于医学图像分割的参数高效多模态自适应 | Nada Saadi, Numan Saeed, Mohammad Yaqub, Karthik Nandakumar | http://arxiv.org/pdf/2404.13704v1 | null |
| 2024-04-21 | A Complete System for Automated 3D Semantic-Geometric Mapping of Corrosion in Industrial Environments | 用于工业环境中腐蚀的自动 3D 语义几何绘图的完整系统 | Rui Pimentel de Figueiredo, Stefan Nordborg Eriksen, Ignacio Rodriguez, Simon Bøgh | http://arxiv.org/pdf/2404.13691v1 | null |
| 2024-04-21 | FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization | FiLo:通过细粒度描述和高质量定位进行零样本异常检测 | Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang | http://arxiv.org/pdf/2404.13671v1 | null |
| 2024-04-21 | LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing | LMFNet:一种用于高分辨率遥感语义分割的高效多模态融合方法 | Tong Wang, Guanzhou Chen, Xiaodong Zhang, Chenxi Liu, Xiaoliang Tan, Jiaqi Wang, Chanjuan He, Wenlin Zhou | http://arxiv.org/pdf/2404.13659v1 | null |
| 2024-04-21 | Video sentence grounding with temporally global textual knowledge | 具有时间全局文本知识的视频句子基础 | Cai Chen, Runzhong Zhang, Jianjun Gao, Kejun Wu, Kim-Hui Yap, Yi Wang | http://arxiv.org/pdf/2404.13611v1 | null |
| 2024-04-21 | MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning | MARVEL:通过视觉评估和学习进行多维抽象和推理 | Yifan Jiang, Jiarui Zhang, Kexuan Sun, Zhivar Sourati, Kian Ahrabian, Kaixin Ma, Filip Ilievski, Jay Pujara | http://arxiv.org/pdf/2404.13591v1 | null |
| 2024-04-21 | Listen Then See: Video Alignment with Speaker Attention | 先听后看:视频与演讲者注意力对齐 | Aviral Agrawal, Carlos Mateo Samudio Lezcano, Iqui Balam Heredia-Marin, Prabhdeep Singh Sethi | http://arxiv.org/pdf/2404.13530v1 | null |

NeRF

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-04-21 | Generalizable Novel-View Synthesis using a Stereo Camera | 使用立体相机进行可推广的新颖视图合成 | Haechan Lee, Wonjoon Jin, Seung-Hwan Baek, Sunghyun Cho | http://arxiv.org/pdf/2404.13541v1 | null |

3DGS

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-04-21 | GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal | GScream:学习 3D 几何和特征一致的高斯泼溅以去除对象 | Yuxin Wang, Qianyi Wu, Guofeng Zhang, Dan Xu | http://arxiv.org/pdf/2404.13679v1 | null |

Model Compression/Optimization

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-04-21 | Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation | 强制公平表示学习和因果图像生成的条件独立性 | Jensen Hwa, Qingyu Zhao, Aditya Lahiri, Adnan Masood, Babak Salimi, Ehsan Adeli | http://arxiv.org/pdf/2404.13798v1 | null |
| 2024-04-21 | EncodeNet: A Framework for Boosting DNN Accuracy with Entropy-driven Generalized Converting Autoencoder | EncodeNet:利用熵驱动的广义转换自动编码器提高 DNN 准确性的框架 | Hasanul Mahmud, Kevin Desai, Palden Lama, Sushil K. Prasad | http://arxiv.org/pdf/2404.13770v1 | null |
| 2024-04-21 | A Nasal Cytology Dataset for Object Detection and Deep Learning | 用于目标检测和深度学习的鼻细胞学数据集 | Mauro Camporeale, Giovanni Dimauro, Matteo Gelardi, Giorgia Iacobellis, Mattia Sebastiano Ladisa, Sergio Latrofa, Nunzia Lomonte | http://arxiv.org/pdf/2404.13745v1 | null |
| 2024-04-21 | Data-independent Module-aware Pruning for Hierarchical Vision Transformers | 分层视觉 Transformer 的数据独立模块感知剪枝 | Yang He, Joey Tianyi Zhou | http://arxiv.org/pdf/2404.13648v1 | null |

Classification/Detection/Recognition/Segmentation/...

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-04-21 | AnyPattern: Towards In-context Image Copy Detection | AnyPattern:面向上下文图像副本检测 | Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang | http://arxiv.org/pdf/2404.13788v1 | null |
| 2024-04-21 | BC-MRI-SEG: A Breast Cancer MRI Tumor Segmentation Benchmark | BC-MRI-SEG:乳腺癌 MRI 肿瘤分割基准 | Anthony Bilic, Chen Chen | http://arxiv.org/pdf/2404.13756v1 | null |
| 2024-04-21 | Semantic-Rearrangement-Based Multi-Level Alignment for Domain Generalized Segmentation | 基于语义重排的多级对齐领域广义分割 | Guanlong Jiao, Chenyangguang Zhang, Haonan Yin, Yu Mo, Biqing Huang, Hui Pan, Yi Luo, Jingxian Liu | http://arxiv.org/pdf/2404.13701v1 | null |
| 2024-04-21 | PV-S3: Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images | PV-S3:使用电致发光图像的半监督语义分割推进自动光伏缺陷检测 | Abhishek Jha, Yogesh Rawat, Shruti Vyas | http://arxiv.org/pdf/2404.13693v1 | null |
| 2024-04-21 | MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition | MathNet:一种以数据为中心的印刷数学表达式识别方法 | Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, Alireza Darvishy | http://arxiv.org/pdf/2404.13667v1 | null |
| 2024-04-21 | Attack on Scene Flow using Point Clouds | 使用点云攻击场景流 | Haniyeh Ehsani Oskouie, Mohammad-Shahram Moin, Shohreh Kasaei | http://arxiv.org/pdf/2404.13621v1 | null |
| 2024-04-21 | Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence | Turb-Seg-Res:大气湍流动态视频的分段然后恢复管道 | Ripon Kumar Saha, Dehao Qin, Nianyi Li, Jinwei Ye, Suren Jayasuriya | http://arxiv.org/pdf/2404.13605v1 | null |
| 2024-04-21 | Rethink Arbitrary Style Transfer with Transformer and Contrastive Learning | 使用 Transformer 和对比学习重新思考任意风格迁移 | Zhanjie Zhang, Jiakai Sun, Guangyuan Li, Lei Zhao, Quanwei Zhang, Zehua Lan, Haolin Yin, Wei Xing, Huaizhong Lin, Zhiwen Zuo | http://arxiv.org/pdf/2404.13584v1 | null |
| 2024-04-21 | I2CANSAY:Inter-Class Analogical Augmentation and Intra-Class Significance Analysis for Non-Exemplar Online Task-Free Continual Learning | I2CANSAY:非范例在线无任务持续学习的类间类比增强和类内显着性分析 | Songlin Dong, Yingjie Chen, Yuhang He, Yuhan Jin, Alex C. Kot, Yihong Gong | http://arxiv.org/pdf/2404.13576v1 | null |
| 2024-04-21 | Cell Phone Image-Based Persian Rice Detection and Classification Using Deep Learning Techniques | 使用深度学习技术进行基于手机图像的波斯大米检测和分类 | Mahmood Saeedi kelishami, Amin Saeidi Kelishami, Sajjad Saeedi Kelishami | http://arxiv.org/pdf/2404.13555v1 | null |
| 2024-04-21 | Dynamic in Static: Hybrid Visual Correspondence for Self-Supervised Video Object Segmentation | 静态中的动态:用于自监督视频对象分割的混合视觉对应 | Gensheng Pei, Yazhou Yao, Jianbo Jiao, Wenguan Wang, Liqiang Nie, Jinhui Tang | http://arxiv.org/pdf/2404.13505v1 | null |
| 2024-04-21 | Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News | 真实的情绪映射:真实新闻中面部表情的基准测试 | Qixuan Zhang, Zhifeng Wang, Yang Liu, Zhenyue Qin, Kaihao Zhang, Sabrina Caldwell, Tom Gedeon | http://arxiv.org/pdf/2404.13493v1 | null |

GNN

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-04-21 | Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces | Graph4GUI:用于表示图形用户界面的图神经网络 | Yue Jiang, Changkong Zhou, Vikas Garg, Antti Oulasvirta | http://arxiv.org/pdf/2404.13521v1 | null |

Image Understanding

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-04-21 | MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions | MLP:用于未修剪 3D 人体运动中时间句子定位的运动标签先验 | Sheng Yan, Mengyuan Liu, Yong Wang, Yang Liu, Chen Chen, Hong Liu | http://arxiv.org/pdf/2404.13657v1 | null |

LLM

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-04-21 | SVGEditBench: A Benchmark Dataset for Quantitative Assessment of LLM's SVG Editing Capabilities | SVGEditBench:定量评估 LLM 的 SVG 编辑能力的基准数据集 | Kunato Nishina, Yusuke Matsui | http://arxiv.org/pdf/2404.13710v1 | null |
| 2024-04-21 | Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers | 迷失在空间:探索视觉和语言重采样器中的细粒度空间理解 | Georgios Pantazopoulos, Alessandro Suglia, Oliver Lemon, Arash Eshghi | http://arxiv.org/pdf/2404.13594v1 | null |
| 2024-04-21 | LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation | LASER:无需调整的 LLM 驱动的注意力控制,可实现高效的文本条件图像到动画 | Haoyu Zheng, Wenqiao Zhang, Yaoke Wang, Hao Zhou, Jiang Liu, Juncheng Li, Zheqi Lv, Siliang Tang, Yueting Zhuang | http://arxiv.org/pdf/2404.13558v1 | null |

Transformer

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-04-21 | PoseAnimate: Zero-shot high fidelity pose controllable character animation | PoseAnimate:零样本高保真姿势可控角色动画 | Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Yu-Gang Jiang, Guo-Jun Qi | http://arxiv.org/pdf/2404.13680v1 | null |
| 2024-04-21 | Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer | 超越对齐:通过解析引导的时间相干 Transformer 进行盲视频人脸恢复 | Kepeng Xu, Li Xu, Gang He, Wenxin Yu, Yunsong Li | http://arxiv.org/pdf/2404.13640v1 | null |
| 2024-04-21 | LTOS: Layout-controllable Text-Object Synthesis via Adaptive Cross-attention Fusions | LTOS:通过自适应交叉注意融合进行布局可控的文本对象合成 | Xiaoran Zhao, Tianhao Wu, Yu Lai, Zhiliang Tian, Zhen Huang, Yahui Liu, Zejiang He, Dongsheng Li | http://arxiv.org/pdf/2404.13579v1 | null |
| 2024-04-21 | Masked Latent Transformer with the Random Masking Ratio to Advance the Diagnosis of Dental Fluorosis | 具有随机掩蔽比的掩蔽潜在 Transformer 促进氟牙症的诊断 | Yun Wu, Hao Xu, Maohua Gu, Zhongchuan Jiang, Jun Xu, Youliang Tian | http://arxiv.org/pdf/2404.13564v1 | link |
| 2024-04-21 | Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes | Pointsoup:用于大规模点云场景的高性能且极低解码延迟的学习几何编解码器 | Kang You, Kai Liu, Li Yu, Pan Gao, Dandan Ding | http://arxiv.org/pdf/2404.13550v1 | null |

Others

| Publish Date | Title | Title_CN | Authors | PDF | Code |
|---|---|---|---|---|---|
| 2024-04-21 | Autonomous Robot for Disaster Mapping and Victim Localization | 用于灾害测绘和受害者定位的自主机器人 | Michael Potter, Rahil Bhowal, Richard Zhao, Anuj Patel, Jingming Cheng | http://arxiv.org/pdf/2404.13767v1 | null |
| 2024-04-21 | Elucidating the Design Space of Dataset Condensation | 阐明数据集压缩的设计空间 | Shitong Shao, Zikai Zhou, Huanran Chen, Zhiqiang Shen | http://arxiv.org/pdf/2404.13733v1 | null |
| 2024-04-21 | A sustainable development perspective on urban-scale roof greening priorities and benefits | 从可持续发展的角度看城市屋顶绿化的优先事项和效益 | Jie Shao, Wei Yao, Lei Luo, Linzhou Zeng, Zhiyi He, Puzuo Wang, Huadong Guo | http://arxiv.org/pdf/2404.13692v1 | null |
| 2024-04-21 | Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition | 通过高低频分解包围图像恢复和增强 | Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan | http://arxiv.org/pdf/2404.13537v1 | null |