Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-21 | Universal Fingerprint Generation: Controllable Diffusion Model with Multimodal Conditions | 通用指纹生成:多模态条件下的可控扩散模型 | Steven A. Grosz, Anil K. Jain | http://arxiv.org/pdf/2404.13791v1 | null |
2024-04-21 | Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control | 文本到图像生成中的对象属性绑定:评估和控制 | Maria Mihaela Trusca, Wolf Nuyts, Jonathan Thomm, Robert Honig, Thomas Hofmann, Tinne Tuytelaars, Marie-Francine Moens | http://arxiv.org/pdf/2404.13766v1 | null |
2024-04-21 | ArtNeRF: A Stylized Neural Field for 3D-Aware Cartoonized Face Synthesis | ArtNeRF:用于 3D 感知卡通化人脸合成的风格化神经场 | Zichen Tang, Hongyu Yang | http://arxiv.org/pdf/2404.13711v1 | null |
2024-04-21 | Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models | 规避扩散模型中概念抑制的概念算术 | Vitali Petsiuk, Kate Saenko | http://arxiv.org/pdf/2404.13706v1 | null |
2024-04-21 | Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis | Hyper-SD:用于高效图像合成的轨迹分段一致性模型 | Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao | http://arxiv.org/pdf/2404.13686v1 | null |
2024-04-21 | A Dataset and Model for Realistic License Plate Deblurring | 真实车牌去模糊的数据集和模型 | Haoyan Gong, Yuzheng Feng, Zhenrong Zhang, Xianxu Hou, Jingxin Liu, Siqi Huang, Hongbin Liu | http://arxiv.org/pdf/2404.13677v1 | null |
2024-04-21 | Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap | 探索 AIGC 视频质量:关注视觉和谐、视频文本一致性和域分布差距 | Bowen Qu, Xiaoyu Liang, Shangkun Sun, Wei Gao | http://arxiv.org/pdf/2404.13573v1 | null |
2024-04-21 | Exploring Diverse Methods in Visual Question Answering | 探索视觉问答的多样化方法 | Panfeng Li, Qikai Yang, Xieming Geng, Wenjing Zhou, Zhicheng Ding, Yi Nian | http://arxiv.org/pdf/2404.13565v1 | null |
2024-04-21 | Motion-aware Latent Diffusion Models for Video Frame Interpolation | 用于视频帧插值的运动感知潜在扩散模型 | Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, Yaowei Wang, Wenming Yang | http://arxiv.org/pdf/2404.13534v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-21 | Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images | 迭代提示多模态大语言模型以再现自然图像和人工智能生成的图像 | Ali Naseh, Katherine Thai, Mohit Iyyer, Amir Houmansadr | http://arxiv.org/pdf/2404.13784v1 | null |
2024-04-21 | PEMMA: Parameter-Efficient Multi-Modal Adaptation for Medical Image Segmentation | PEMMA:用于医学图像分割的参数高效多模态自适应 | Nada Saadi, Numan Saeed, Mohammad Yaqub, Karthik Nandakumar | http://arxiv.org/pdf/2404.13704v1 | null |
2024-04-21 | A Complete System for Automated 3D Semantic-Geometric Mapping of Corrosion in Industrial Environments | 用于工业环境中腐蚀的自动 3D 语义几何绘图的完整系统 | Rui Pimentel de Figueiredo, Stefan Nordborg Eriksen, Ignacio Rodriguez, Simon Bøgh | http://arxiv.org/pdf/2404.13691v1 | null |
2024-04-21 | FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization | FiLo:通过细粒度描述和高质量定位进行零样本异常检测 | Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang | http://arxiv.org/pdf/2404.13671v1 | null |
2024-04-21 | LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing | LMFNet:一种用于高分辨率遥感语义分割的高效多模态融合方法 | Tong Wang, Guanzhou Chen, Xiaodong Zhang, Chenxi Liu, Xiaoliang Tan, Jiaqi Wang, Chanjuan He, Wenlin Zhou | http://arxiv.org/pdf/2404.13659v1 | null |
2024-04-21 | Video sentence grounding with temporally global textual knowledge | 利用时间全局文本知识的视频句子定位 | Cai Chen, Runzhong Zhang, Jianjun Gao, Kejun Wu, Kim-Hui Yap, Yi Wang | http://arxiv.org/pdf/2404.13611v1 | null |
2024-04-21 | MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning | MARVEL:通过视觉评估和学习进行多维抽象和推理 | Yifan Jiang, Jiarui Zhang, Kexuan Sun, Zhivar Sourati, Kian Ahrabian, Kaixin Ma, Filip Ilievski, Jay Pujara | http://arxiv.org/pdf/2404.13591v1 | null |
2024-04-21 | Listen Then See: Video Alignment with Speaker Attention | 先听后看:视频与演讲者注意力对齐 | Aviral Agrawal, Carlos Mateo Samudio Lezcano, Iqui Balam Heredia-Marin, Prabhdeep Singh Sethi | http://arxiv.org/pdf/2404.13530v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-21 | Generalizable Novel-View Synthesis using a Stereo Camera | 使用立体相机进行可推广的新颖视图合成 | Haechan Lee, Wonjoon Jin, Seung-Hwan Baek, Sunghyun Cho | http://arxiv.org/pdf/2404.13541v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-21 | GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal | GScream:学习 3D 几何和特征一致的高斯泼溅以去除对象 | Yuxin Wang, Qianyi Wu, Guofeng Zhang, Dan Xu | http://arxiv.org/pdf/2404.13679v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-21 | Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation | 强制公平表示学习和因果图像生成的条件独立性 | Jensen Hwa, Qingyu Zhao, Aditya Lahiri, Adnan Masood, Babak Salimi, Ehsan Adeli | http://arxiv.org/pdf/2404.13798v1 | null |
2024-04-21 | EncodeNet: A Framework for Boosting DNN Accuracy with Entropy-driven Generalized Converting Autoencoder | EncodeNet:利用熵驱动的广义转换自动编码器提高 DNN 准确性的框架 | Hasanul Mahmud, Kevin Desai, Palden Lama, Sushil K. Prasad | http://arxiv.org/pdf/2404.13770v1 | null |
2024-04-21 | A Nasal Cytology Dataset for Object Detection and Deep Learning | 用于目标检测和深度学习的鼻细胞学数据集 | Mauro Camporeale, Giovanni Dimauro, Matteo Gelardi, Giorgia Iacobellis, Mattia Sebastiano Ladisa, Sergio Latrofa, Nunzia Lomonte | http://arxiv.org/pdf/2404.13745v1 | null |
2024-04-21 | Data-independent Module-aware Pruning for Hierarchical Vision Transformers | 分层视觉 Transformer 的数据无关模块感知剪枝 | Yang He, Joey Tianyi Zhou | http://arxiv.org/pdf/2404.13648v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-21 | AnyPattern: Towards In-context Image Copy Detection | AnyPattern:面向上下文图像副本检测 | Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang | http://arxiv.org/pdf/2404.13788v1 | null |
2024-04-21 | BC-MRI-SEG: A Breast Cancer MRI Tumor Segmentation Benchmark | BC-MRI-SEG:乳腺癌 MRI 肿瘤分割基准 | Anthony Bilic, Chen Chen | http://arxiv.org/pdf/2404.13756v1 | null |
2024-04-21 | Semantic-Rearrangement-Based Multi-Level Alignment for Domain Generalized Segmentation | 基于语义重排的多级对齐领域广义分割 | Guanlong Jiao, Chenyangguang Zhang, Haonan Yin, Yu Mo, Biqing Huang, Hui Pan, Yi Luo, Jingxian Liu | http://arxiv.org/pdf/2404.13701v1 | null |
2024-04-21 | PV-S3: Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images | PV-S3:使用电致发光图像的半监督语义分割推进自动光伏缺陷检测 | Abhishek Jha, Yogesh Rawat, Shruti Vyas | http://arxiv.org/pdf/2404.13693v1 | null |
2024-04-21 | MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition | MathNet:一种以数据为中心的印刷数学表达式识别方法 | Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, Alireza Darvishy | http://arxiv.org/pdf/2404.13667v1 | null |
2024-04-21 | Attack on Scene Flow using Point Clouds | 使用点云攻击场景流 | Haniyeh Ehsani Oskouie, Mohammad-Shahram Moin, Shohreh Kasaei | http://arxiv.org/pdf/2404.13621v1 | null |
2024-04-21 | Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence | Turb-Seg-Res:大气湍流动态视频的分段然后恢复管道 | Ripon Kumar Saha, Dehao Qin, Nianyi Li, Jinwei Ye, Suren Jayasuriya | http://arxiv.org/pdf/2404.13605v1 | null |
2024-04-21 | Rethink Arbitrary Style Transfer with Transformer and Contrastive Learning | 使用 Transformer 和对比学习重新思考任意风格迁移 | Zhanjie Zhang, Jiakai Sun, Guangyuan Li, Lei Zhao, Quanwei Zhang, Zehua Lan, Haolin Yin, Wei Xing, Huaizhong Lin, Zhiwen Zuo | http://arxiv.org/pdf/2404.13584v1 | null |
2024-04-21 | I2CANSAY: Inter-Class Analogical Augmentation and Intra-Class Significance Analysis for Non-Exemplar Online Task-Free Continual Learning | I2CANSAY:非范例在线无任务持续学习的类间类比增强和类内显著性分析 | Songlin Dong, Yingjie Chen, Yuhang He, Yuhan Jin, Alex C. Kot, Yihong Gong | http://arxiv.org/pdf/2404.13576v1 | null |
2024-04-21 | Cell Phone Image-Based Persian Rice Detection and Classification Using Deep Learning Techniques | 使用深度学习技术进行基于手机图像的波斯大米检测和分类 | Mahmood Saeedi kelishami, Amin Saeidi Kelishami, Sajjad Saeedi Kelishami | http://arxiv.org/pdf/2404.13555v1 | null |
2024-04-21 | Dynamic in Static: Hybrid Visual Correspondence for Self-Supervised Video Object Segmentation | 静态中的动态:用于自监督视频对象分割的混合视觉对应 | Gensheng Pei, Yazhou Yao, Jianbo Jiao, Wenguan Wang, Liqiang Nie, Jinhui Tang | http://arxiv.org/pdf/2404.13505v1 | null |
2024-04-21 | Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News | 真实的情绪映射:真实新闻中面部表情的基准测试 | Qixuan Zhang, Zhifeng Wang, Yang Liu, Zhenyue Qin, Kaihao Zhang, Sabrina Caldwell, Tom Gedeon | http://arxiv.org/pdf/2404.13493v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-21 | Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces | Graph4GUI:用于表示图形用户界面的图神经网络 | Yue Jiang, Changkong Zhou, Vikas Garg, Antti Oulasvirta | http://arxiv.org/pdf/2404.13521v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-21 | MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions | MLP:用于未修剪 3D 人体运动中时间句子定位的运动标签先验 | Sheng Yan, Mengyuan Liu, Yong Wang, Yang Liu, Chen Chen, Hong Liu | http://arxiv.org/pdf/2404.13657v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-21 | SVGEditBench: A Benchmark Dataset for Quantitative Assessment of LLM's SVG Editing Capabilities | SVGEditBench:定量评估大语言模型 SVG 编辑能力的基准数据集 | Kunato Nishina, Yusuke Matsui | http://arxiv.org/pdf/2404.13710v1 | null |
2024-04-21 | Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers | 迷失在空间:探索视觉和语言重采样器中的细粒度空间理解 | Georgios Pantazopoulos, Alessandro Suglia, Oliver Lemon, Arash Eshghi | http://arxiv.org/pdf/2404.13594v1 | null |
2024-04-21 | LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation | LASER:无需调整的 LLM 驱动的注意力控制,可实现高效的文本条件图像到动画 | Haoyu Zheng, Wenqiao Zhang, Yaoke Wang, Hao Zhou, Jiang Liu, Juncheng Li, Zheqi Lv, Siliang Tang, Yueting Zhuang | http://arxiv.org/pdf/2404.13558v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-21 | PoseAnimate: Zero-shot high fidelity pose controllable character animation | PoseAnimate:零样本高保真姿势可控角色动画 | Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Yu-Gang Jiang, Guo-Jun Qi | http://arxiv.org/pdf/2404.13680v1 | null |
2024-04-21 | Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer | 超越对齐:通过解析引导的时间一致 Transformer 进行盲视频人脸修复 | Kepeng Xu, Li Xu, Gang He, Wenxin Yu, Yunsong Li | http://arxiv.org/pdf/2404.13640v1 | null |
2024-04-21 | LTOS: Layout-controllable Text-Object Synthesis via Adaptive Cross-attention Fusions | LTOS:通过自适应交叉注意融合进行布局可控的文本对象合成 | Xiaoran Zhao, Tianhao Wu, Yu Lai, Zhiliang Tian, Zhen Huang, Yahui Liu, Zejiang He, Dongsheng Li | http://arxiv.org/pdf/2404.13579v1 | null |
2024-04-21 | Masked Latent Transformer with the Random Masking Ratio to Advance the Diagnosis of Dental Fluorosis | 采用随机掩码比例的掩码潜在 Transformer 推进氟牙症诊断 | Yun Wu, Hao Xu, Maohua Gu, Zhongchuan Jiang, Jun Xu, Youliang Tian | http://arxiv.org/pdf/2404.13564v1 | link |
2024-04-21 | Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes | Pointsoup:用于大规模点云场景的高性能且极低解码延迟的学习几何编解码器 | Kang You, Kai Liu, Li Yu, Pan Gao, Dandan Ding | http://arxiv.org/pdf/2404.13550v1 | null |
Publish Date | Title | Title_CN | Authors | PDF | Code |
---|---|---|---|---|---|
2024-04-21 | Autonomous Robot for Disaster Mapping and Victim Localization | 用于灾害测绘和受害者定位的自主机器人 | Michael Potter, Rahil Bhowal, Richard Zhao, Anuj Patel, Jingming Cheng | http://arxiv.org/pdf/2404.13767v1 | null |
2024-04-21 | Elucidating the Design Space of Dataset Condensation | 阐明数据集压缩的设计空间 | Shitong Shao, Zikai Zhou, Huanran Chen, Zhiqiang Shen | http://arxiv.org/pdf/2404.13733v1 | null |
2024-04-21 | A sustainable development perspective on urban-scale roof greening priorities and benefits | 从可持续发展的角度看城市屋顶绿化的优先事项和效益 | Jie Shao, Wei Yao, Lei Luo, Linzhou Zeng, Zhiyi He, Puzuo Wang, Huadong Guo | http://arxiv.org/pdf/2404.13692v1 | null |
2024-04-21 | Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition | 基于高低频分解的包围曝光图像恢复与增强 | Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan | http://arxiv.org/pdf/2404.13537v1 | null |