ECCV 2024 decisions are now available!
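Note 1: Contributions are welcome! Please open an issue to share ECCV 2024 papers and open-source projects.
Note 2: For papers from past top CV conferences, other high-quality CV papers, and annual roundups, see: https://github.com/amusi/daily-paper-computer-vision
To follow ECCV 2024 and the latest, most comprehensive top-conference work, scan the QR code to join the [CVer Academic Exchange Group], the largest computer vision AI Knowledge Planet (知识星球) community! It is updated daily and shares the latest cutting-edge learning materials on computer vision, deep learning, autonomous driving, medical imaging, AIGC, and more as soon as they appear. Start learning!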
- 3DGS (Gaussian Splatting)
- Mamba / SSM
- Avatars
- Backbone
- CLIP
- MAE
- Embodied AI
- GAN
- GNN
- Multimodal Large Language Models (MLLM)
- Large Language Models (LLM)
- NAS
- OCR
- NeRF
- DETR
- Prompt
- Diffusion Models
- ReID (Re-Identification)
- Long-Tail Distribution
- Vision Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Object Detection
- Anomaly Detection
- Visual Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image
- Medical Image Segmentation
- Video Object Segmentation
- Video Instance Segmentation
- Referring Image Segmentation
- Image Matting
- Image Editing
- Low-level Vision
- Super-Resolution
- Denoising
- Deblurring
- Autonomous Driving
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Semantic Scene Completion
- 3D Registration
- 3D Human Pose Estimation
- 3D Human Mesh Estimation
- Image Generation
- Video Generation
- 3D Generation
- Video Understanding
- Action Recognition
- Action Detection
- Text Detection
- Knowledge Distillation
- Model Pruning
- Image Compression
- 3D Reconstruction
- Depth Estimation
- Trajectory Prediction
- Lane Detection
- Image Captioning
- Visual Question Answering
- Sign Language Recognition
- Video Prediction
- Novel View Synthesis
- Zero-Shot Learning
- Stereo Matching
- Feature Matching
- Scene Graph Generation
- Counting
- Implicit Neural Representations
- Image Quality Assessment
- Video Quality Assessment
- Datasets
- New Tasks
- Others
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
- Project: https://donydchen.github.io/mvsplat
- Paper: https://arxiv.org/abs/2403.14627
- Code: https://github.com/donydchen/mvsplat
CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
- Paper: https://arxiv.org/abs/2404.01133
- Code: https://github.com/DekuLiuTesla/CityGaussian
FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
- Project: https://zehaozhu.github.io/FSGS/
- Paper: https://arxiv.org/abs/2312.00451
- Code: https://github.com/VITA-Group/FSGS
VideoMamba: State Space Model for Efficient Video Understanding
ZIGMA: A DiT-style Zigzag Mamba Diffusion Model
- Paper: https://arxiv.org/abs/2403.13802
- Code: https://taohu.me/zigma/
Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
- Paper: https://arxiv.org/abs/2407.07764
- Code: https://github.com/SJTU-DeepVisionLab/PosFormer
Fully Sparse 3D Occupancy Prediction
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
- Project: https://nerf-mae.github.io/
- Paper: https://arxiv.org/pdf/2404.01300
- Code: https://github.com/zubair-irshad/NeRF-MAE
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
ControlCap: Controllable Region-level Captioning
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
- Paper: https://arxiv.org/abs/2403.16394
- Code: https://github.com/zdxdsw/skewed_relations_T2I
The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
- Project: https://ut-mao.github.io/noise.github.io/
- Paper: https://arxiv.org/abs/2312.08872
- Code: https://github.com/UT-Mao/Initial-Noise-Construction
GiT: Towards Generalist Vision Transformer through Universal Language Interface
GalLoP: Learning Global and Local Prompts for Vision-Language Models
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
- Paper: https://arxiv.org/abs/2407.11699v1
- Code: https://github.com/xiuqhou/Relation-DETR
- Dataset: https://huggingface.co/datasets/xiuqhou/SA-Det-100k
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
- Project: http://yuqianfu.com/CDFSOD-benchmark/
- Paper: https://arxiv.org/pdf/2402.03094
- Code: https://github.com/lovelyqian/CDFSOD-benchmark
Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging
FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
- Project: https://ophai.hms.harvard.edu/datasets/harvard-fairdomain20k
- Paper: https://arxiv.org/abs/2407.08813
- Dataset: https://drive.google.com/drive/u/1/folders/1huH93JVeXMj9rK6p1OZRub868vv0UK0O
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain
ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
- Project: https://scribbleprompt.csail.mit.edu/
- Paper: https://arxiv.org/abs/2312.07381
- Code: https://github.com/halleewong/ScribblePrompt
AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures
- Paper: https://arxiv.org/abs/2407.14754
- Code: https://github.com/cbmi-group/FFM-Multi-Decoder-Network
DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries
- Project: https://zhang-tao-whu.github.io/projects/DVIS_DAQ/
- Paper: https://arxiv.org/abs/2404.00086
- Code: https://github.com/zhang-tao-whu/DVIS_Plus
milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
4D Contrastive Superflows are Dense 3D Representation Learners
- Paper: https://arxiv.org/abs/2407.06190
- Code: https://github.com/Xiangxu-0103/SuperFlow
3D Small Object Detection with Dynamic Spatial Pruning
- Project: https://xuxw98.github.io/DSPDet3D/
- Paper: https://arxiv.org/abs/2305.03716
- Code: https://github.com/xuxw98/DSPDet3D
Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
- Project: https://tencentarc.github.io/BrushNet/
- Paper: https://arxiv.org/abs/2403.06976
- Code: https://github.com/TencentARC/BrushNet
Restoring Images in Adverse Weather Conditions via Histogram Transformer
- Paper: https://arxiv.org/abs/2407.10172
- Code: https://github.com/sunshangquan/Histoformer
OneRestore: A Universal Restoration Framework for Composite Degradation
- Project: https://gy65896.github.io/projects/ECCV2024_OneRestore
- Paper: https://arxiv.org/abs/2407.04621
- Code: https://github.com/gy65896/OneRestore
Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization
- Project: https://kaminyou.com/Dense-Normalization/
- Paper: https://arxiv.org/abs/2407.04245
- Code: https://github.com/Kaminyou/Dense-Normalization
VideoStudio: Generating Consistent-Content and Multi-Scene Videos
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders
Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation
- Paper: http://arxiv.org/abs/2407.09853
- Code: https://github.com/qingshi9974/ECCV2024-AdpatICMH
Zero-shot Object Counting with Good Exemplars
Multi-branch Collaborative Learning Network for 3D Visual Grounding
PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
- Paper: https://arxiv.org/abs/2407.04538
- Code: https://github.com/ananthu-aniraj/pdiscoformer
SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
- Project: https://fraunhoferhhi.github.io/spvloc/
- Paper: https://arxiv.org/abs/2404.10527
- Code: https://github.com/fraunhoferhhi/spvloc
REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices