Commit: renew
teowu committed Jul 31, 2023
1 parent cb38328 commit 7857960
Showing 16 changed files with 101 additions and 296 deletions.
65 changes: 23 additions & 42 deletions README.md
@@ -1,11 +1,10 @@
# DOVER

Official Codes, Demos, Models for the [Disentangled Objective Video Quality Evaluator (DOVER)](https://arxiv.org/abs/2211.04894v2).
Official Codes, Demos, Models for the [Disentangled Objective Video Quality Evaluator (DOVER)](https://arxiv.org/abs/2211.04894v3), state-of-the-art in UGC-VQA.

- 9 Feb, 2022: **DOVER-Mobile** is available! Evaluate on CPU with High Speed!
- 16 Jan, 2022: Full Training Code Available (include LVBS). See below.
- 19 Dec, 2022: Training Code for *Head-only Transfer Learning* is ready!! See [training](https://github.com/QualityAssessment/DOVER#training-adapt-dover-to-your-video-quality-dataset).
- 18 Dec, 2022: Thanks to 媒矿工厂 for providing a third-party Chinese explanation of this paper: [WeChat article](https://mp.weixin.qq.com/s/NZlyTwT7FAPkKhZUNc-30w).
- 17 Jul, 2023: DOVER has been accepted by ICCV 2023. We will release the DIVIDE-3k dataset to train DOVER++ via fully-supervised LVBS soon.
- 9 Feb, 2023: **DOVER-Mobile** is available! Evaluate on CPU at very high speed!
- 16 Jan, 2023: Full Training Code Available (including LVBS). See below.
- 10 Dec, 2022: Now the evaluation tool can directly predict a fused score for any video. See [here](https://github.com/QualityAssessment/DOVER#new-get-the-fused-quality-score-for-use).


@@ -31,18 +30,20 @@ Official Codes, Demos, Models for the [Disentangled Objective Video Quality Eval
Corresponding video results can be found [here](https://github.com/QualityAssessment/DOVER/tree/master/figs).

The first attempt to disentangle the VQA problem into aesthetic and technical quality evaluations.
Official code for ArXiv Preprint Paper *"Disentangling Aesthetic and Technical Effects for Video Quality Assessment of User Generated Content"*.
Official code for the [ICCV 2023] paper *"Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives"*.



## Introduction

### Problem Definition

*In-the-wild UGC-VQA entangles aesthetic and technical perspectives, which may lead to different opinions on the term **QUALITY**.*

![Fig](figs/problem_definition.png)

### The Proposed DOVER

*This inspires us to propose a simple and effective way to disentangle the two perspectives from **EXISTING** UGC-VQA datasets.*

![Fig](figs/approach.png)

@@ -219,53 +220,33 @@ Or, just take a look at our training curves that are made public:
and you are welcome to reproduce them!


## Results

### Score-level Fusion

Directly training on LSVQ and testing on other datasets:

| | PLCC@LSVQ_1080p | PLCC@LSVQ_test | PLCC@LIVE_VQC | PLCC@KoNViD | MACs | config | model |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| DOVER | 0.830 | 0.889 | 0.855 | 0.883 | 282G | [config](dover.yml) | [github](https://github.com/teowu/DOVER/releases/download/v0.1.0/DOVER.pth) |
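
How the two branch predictions are fused into the single reported score: below is a minimal sketch, assuming normalized branch outputs and the default branch weight of 0.5 that the dataset options use (`opt.get("weight", 0.5)`); the exact normalization and weights for the released checkpoint are defined in the repository's evaluation script, not here.

```python
# Illustrative score-level fusion (a sketch, not the released evaluator's exact recipe):
# combine the aesthetic-branch and technical-branch predictions with a convex weight.
def fuse_scores(aesthetic: float, technical: float, weight: float = 0.5) -> float:
    """Weighted fusion of the two branch scores into one overall quality score."""
    return weight * aesthetic + (1.0 - weight) * technical

print(fuse_scores(0.72, 0.64))  # 0.5 * 0.72 + 0.5 * 0.64 = 0.68
```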

### Representation-level Fusion

Transfer learning on smaller datasets (as reproduced by the current training code):

| | KoNViD-1k | CVD2014 | LIVE-VQC | YouTube-UGC |
| ---- | ---- | ---- | ---- | ---- |
| SROCC | 0.905 (0.906 in paper) | 0.894 | 0.855 (0.858 in paper) | 0.888 (0.880 in paper) |
| PLCC | 0.905 (0.909 in paper) | 0.908 | 0.875 (0.874 in paper) | 0.884 (0.874 in paper) |

LVBS is introduced in the representation-level fusion.
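
SROCC and PLCC in the tables above are the Spearman rank and Pearson linear correlations between predicted and ground-truth scores (papers often apply a logistic fit before computing PLCC). A minimal sketch of how they can be computed with SciPy on toy data:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Toy predicted scores and ground-truth mean opinion scores (illustrative numbers only).
pred = np.array([3.1, 2.4, 4.0, 1.8, 3.6])
mos = np.array([3.0, 2.5, 4.2, 2.0, 3.3])

srocc, _ = spearmanr(pred, mos)  # rank (monotonic) correlation
plcc, _ = pearsonr(pred, mos)    # linear correlation
print(f"SROCC={srocc:.3f}, PLCC={plcc:.3f}")
```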



## Acknowledgement

Thanks to [Annan Wang](https://github.com/AnnanWangDaniel) for developing the interfaces for the subjective studies.
Thanks for every participant of the studies!
Thanks to every participant of the subjective studies!

## Citation

Should you find our works interesting and would like to cite them, please feel free to add these in your references!
Should you find our work interesting and would like to cite it, please feel free to add the following to your references!

```bibtex
@article{wu2022disentanglevqa,
title={Disentangling Aesthetic and Technical Effects for Video Quality Assessment of User Generated Content},
author={Wu, Haoning and Liao, Liang and Chen, Chaofeng and Hou, Jingwen and Wang, Annan and Sun, Wenxiu and Yan, Qiong and Lin, Weisi},
journal={arXiv preprint arXiv:2211.04894},
year={2022}
}

@article{wu2022fastquality,
```bibtex
%fastvqa
@inproceedings{wu2022fastvqa,
title={FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling},
author={Wu, Haoning and Chen, Chaofeng and Hou, Jingwen and Liao, Liang and Wang, Annan and Sun, Wenxiu and Yan, Qiong and Lin, Weisi},
journal={Proceedings of European Conference of Computer Vision (ECCV)},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
year={2022}
}
%dover
@inproceedings{wu2023dover,
title={Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives},
author={Wu, Haoning and Zhang, Erli and Liao, Liang and Chen, Chaofeng and Hou, Jingwen and Wang, Annan and Sun, Wenxiu and Yan, Qiong and Lin, Weisi},
year={2023},
booktitle={International Conference on Computer Vision (ICCV)},
}
@misc{end2endvideoqualitytool,
title = {Open Source Deep End-to-End Video Quality Assessment Toolbox},
author = {Wu, Haoning},
2 changes: 1 addition & 1 deletion dover/datasets/.ipynb_checkpoints/__init__-checkpoint.py
@@ -1,3 +1,3 @@
## API for DOVER and its variants
from .basic_datasets import *
from .fusion_datasets import *
from .dover_datasets import *
Another changed file (filename not shown):
@@ -30,8 +30,19 @@ def get_spatial_fragments(
random=False,
random_upsample=False,
fallback_type="upsample",
upsample=-1,
**kwargs,
):
if upsample > 0:
old_h, old_w = video.shape[-2], video.shape[-1]
if old_h >= old_w:
w = upsample
h = int(upsample * old_h / old_w)
else:
h = upsample
w = int(upsample * old_w / old_h)

video = get_resized_video(video, h, w)
size_h = fragments_h * fsize_h
size_w = fragments_w * fsize_w
## video: [C,T,H,W]
@@ -56,7 +67,7 @@ def get_spatial_fragments(
video / 255.0, scale_factor=randratio, mode="bilinear"
)
video = (video * 255.0).type_as(ovideo)

assert dur_t % aligned == 0, "Please provide match vclip and align index"
size = size_h, size_w

@@ -231,6 +242,7 @@ def spatial_temporal_view_decomposition(
video[stype] = torch.stack(imgs, 0).permute(3, 0, 1, 2)
del ovideo
else:
decord.bridge.set_bridge("torch")
vreader = VideoReader(video_path)
### Avoid duplicated video decoding!!! Important!!!!
all_frame_inds = []
@@ -319,6 +331,9 @@ def __init__(self, opt):

self.weight = opt.get("weight", 0.5)

self.fully_supervised = opt.get("fully_supervised", False)
print("Fully supervised:", self.fully_supervised)

self.video_infos = []
self.ann_file = opt["anno_file"]
self.data_prefix = opt["data_prefix"]
@@ -362,8 +377,11 @@ def __init__(self, opt):
with open(self.ann_file, "r") as fin:
for line in fin:
line_split = line.strip().split(",")
filename, _, _, label = line_split
label = float(label)
filename, a, t, label = line_split
if self.fully_supervised:
label = float(a), float(t), float(label)
else:
label = float(label)
filename = osp.join(self.data_prefix, filename)
self.video_infos.append(dict(filename=filename, label=label))
except:
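
The `upsample` branch added in this file resizes the clip so that its shorter side matches `upsample` pixels while preserving the aspect ratio, before fragments are cropped; the actual resizing is delegated to the repository's `get_resized_video` helper. A standalone sketch of just the size computation:

```python
# Sketch of the aspect-ratio-preserving target size used by the new `upsample` option.
# The shorter side of a [C, T, H, W] clip becomes `target`; the longer side scales accordingly.
def upsampled_size(old_h: int, old_w: int, target: int) -> tuple[int, int]:
    if old_h >= old_w:  # width is the shorter (or equal) side
        return int(target * old_h / old_w), target
    return target, int(target * old_w / old_h)  # height is the shorter side

print(upsampled_size(1080, 1920, 224))  # (224, 398): the short side is scaled to 224
```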
2 changes: 1 addition & 1 deletion dover/datasets/__init__.py
@@ -1,3 +1,3 @@
## API for DOVER and its variants
from .basic_datasets import *
from .fusion_datasets import *
from .dover_datasets import *
Binary file modified dover/datasets/__pycache__/fusion_datasets.cpython-38.pyc
Binary file not shown.
Another changed file (filename not shown):
@@ -30,8 +30,19 @@ def get_spatial_fragments(
random=False,
random_upsample=False,
fallback_type="upsample",
upsample=-1,
**kwargs,
):
if upsample > 0:
old_h, old_w = video.shape[-2], video.shape[-1]
if old_h >= old_w:
w = upsample
h = int(upsample * old_h / old_w)
else:
h = upsample
w = int(upsample * old_w / old_h)

video = get_resized_video(video, h, w)
size_h = fragments_h * fsize_h
size_w = fragments_w * fsize_w
## video: [C,T,H,W]
@@ -56,7 +67,7 @@ def get_spatial_fragments(
video / 255.0, scale_factor=randratio, mode="bilinear"
)
video = (video * 255.0).type_as(ovideo)

assert dur_t % aligned == 0, "Please provide match vclip and align index"
size = size_h, size_w

@@ -231,6 +242,7 @@ def spatial_temporal_view_decomposition(
video[stype] = torch.stack(imgs, 0).permute(3, 0, 1, 2)
del ovideo
else:
decord.bridge.set_bridge("torch")
vreader = VideoReader(video_path)
### Avoid duplicated video decoding!!! Important!!!!
all_frame_inds = []
@@ -319,6 +331,9 @@ def __init__(self, opt):

self.weight = opt.get("weight", 0.5)

self.fully_supervised = opt.get("fully_supervised", False)
print("Fully supervised:", self.fully_supervised)

self.video_infos = []
self.ann_file = opt["anno_file"]
self.data_prefix = opt["data_prefix"]
@@ -362,8 +377,11 @@ def __init__(self, opt):
with open(self.ann_file, "r") as fin:
for line in fin:
line_split = line.strip().split(",")
filename, _, _, label = line_split
label = float(label)
filename, a, t, label = line_split
if self.fully_supervised:
label = float(a), float(t), float(label)
else:
label = float(label)
filename = osp.join(self.data_prefix, filename)
self.video_infos.append(dict(filename=filename, label=label))
except:
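
With the `fully_supervised` option added in this file, each annotation line is expected to carry an aesthetic score, a technical score, and an overall score after the filename; without it, only the overall score is kept. A small sketch of that parsing, assuming a comma-separated annotation file of the form `filename,aesthetic,technical,overall` (the file name below is hypothetical):

```python
import os.path as osp

def parse_annotation_line(line: str, data_prefix: str, fully_supervised: bool):
    """Parse one `filename,aesthetic,technical,overall` annotation line."""
    filename, a, t, overall = line.strip().split(",")
    label = (float(a), float(t), float(overall)) if fully_supervised else float(overall)
    return osp.join(data_prefix, filename), label

# Hypothetical example line:
print(parse_annotation_line("clip_001.mp4,3.2,2.8,3.0", "videos", True))
# ('videos/clip_001.mp4', (3.2, 2.8, 3.0))
```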
1 change: 0 additions & 1 deletion dover/models/.ipynb_checkpoints/__init__-checkpoint.py
@@ -9,7 +9,6 @@
"VQABackbone",
"IQABackbone",
"VQAHead",
"MaxVQAHead",
"IQAHead",
"VARHead",
"BaseEvaluator",
4 changes: 4 additions & 0 deletions dover/models/.ipynb_checkpoints/conv_backbone-checkpoint.py
@@ -4,6 +4,9 @@
from timm.models.layers import trunc_normal_, DropPath
from timm.models.registry import register_model

from open_clip import CLIP3D
import open_clip

class GRN(nn.Module):
""" GRN (Global Response Normalization) layer
"""
@@ -635,6 +638,7 @@ def convnextv2_huge(**kwargs):
return model




if __name__ == "__main__":
