cff-version: 1.2.0
license: MIT
title: vp-suite
abstract: >-
  A general framework for video prediction in PyTorch.
type: software
message: >-
  Please consider citing if you find our findings or
  our repository helpful.
authors:
  - family-names: Boltres
    given-names: Andreas
    affiliation: >-
      Institute for Autonomous Intelligent Systems
      (AIS), University of Bonn
preferred-citation:
  type: article
  year: 2022
  title: >-
    Video Prediction at Multiple Scales with
    Hierarchical Recurrent Networks
  abstract: >-
    Autonomous systems not only need to understand their
    current environment, but should also be able to predict
    future actions conditioned on past states, for instance
    based on captured camera frames. For certain tasks,
    detailed predictions such as future video frames are
    required in the near future, whereas for others it is
    beneficial to also predict more abstract representations
    for longer time horizons. However, existing video
    prediction models mainly focus on forecasting detailed
    possible outcomes for short time-horizons, hence being
    of limited use for robot perception and spatial reasoning.
    We propose Multi-Scale Hierarchical Prediction (MSPred),
    a novel video prediction model able to forecast future
    possible outcomes of different levels of granularity at
    different time-scales simultaneously. By combining
    spatial and temporal downsampling, MSPred is able to
    efficiently predict abstract representations such as human
    poses or object locations over long time horizons, while
    still maintaining a competitive performance for video
    frame prediction. In our experiments, we demonstrate that
    our proposed model accurately predicts future video
    frames as well as other representations (e.g. keypoints
    or positions) on various scenarios, including bin-picking
    scenes or action recognition datasets, consistently
    outperforming popular approaches for video frame
    prediction. Furthermore, we conduct an ablation study
    to investigate the importance of the different modules
    and design choices in MSPred. In the spirit of
    reproducible research, we open-source VP-Suite, a general
    framework for deep-learning-based video prediction, as
    well as pretrained models to reproduce our results.
  message: >-
    Please consider citing if you find our findings or
    our repository helpful.
  authors:
    - given-names: Ani
      family-names: Karapetyan
      affiliation: >-
        Institute for Autonomous Intelligent Systems
        (AIS), University of Bonn
    - given-names: Angel
      family-names: Villar-Corrales
      affiliation: >-
        Institute for Autonomous Intelligent Systems
        (AIS), University of Bonn
    - given-names: Andreas
      family-names: Boltres
      affiliation: >-
        Institute for Autonomous Intelligent Systems
        (AIS), University of Bonn
    - given-names: Sven
      family-names: Behnke
      affiliation: >-
        Institute for Autonomous Intelligent Systems
        (AIS), University of Bonn