To learn multiple images and videos with a single network, we use a coordinate MLP with a learned latent code concatenated to the input. We use selected videos from the WAIC-TSR dataset, showing only our video results below.
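The conditioning itself is simple: the latent code acts as extra input features alongside the coordinates. Below is a minimal sketch of the idea in PyTorch; `CoordMLP`, the layer sizes, and `latent_dim=64` are illustrative assumptions rather than our actual settings, and `coords` would be the positionally encoded coordinates described next.

```python
import torch
import torch.nn as nn

class CoordMLP(nn.Module):
    """Coordinate MLP conditioned on a per-signal learned latent code.
    (Hypothetical sketch; not the original implementation.)"""
    def __init__(self, in_dim=3, latent_dim=64, hidden=256, depth=4, out_dim=3):
        super().__init__()
        layers = [nn.Linear(in_dim + latent_dim, hidden), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        layers.append(nn.Linear(hidden, out_dim))  # RGB output
        self.net = nn.Sequential(*layers)

    def forward(self, coords, latent):
        # coords: (N, in_dim), e.g. encoded (x, y, t); latent: (latent_dim,)
        z = latent.expand(coords.shape[0], -1)  # broadcast code to all points
        return self.net(torch.cat([coords, z], dim=-1))

# One trainable code per video, optimized jointly with the MLP weights.
codes = nn.Embedding(num_embeddings=4, embedding_dim=64)
```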
We use the same positional encoding scheme as the 2D image MLP.
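For reference, a common form of this encoding in coordinate-MLP work is random Fourier features; the sketch below assumes a Gaussian frequency matrix `B`, and the hyperparameters (`sigma`, `num_freqs`) are placeholders that may differ from what we actually use. Note that only the coordinates are encoded; the latent code is concatenated unencoded.

```python
import math
import torch

def fourier_encode(coords, B):
    # coords: (N, d) in [0, 1]; B: (num_freqs, d) fixed random frequencies
    proj = 2.0 * math.pi * coords @ B.T            # (N, num_freqs)
    return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

# Gaussian frequency matrix; sigma controls the bandwidth of the encoding.
sigma, num_freqs, d = 10.0, 128, 3
B = torch.randn(num_freqs, d) * sigma
```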
*[Two video result grids: Ground Truth | No Pos. Enc. | …]*
As in our image experiments, we also interpolate between latent codes and show the results below:
*[Latent interpolation grid, No Pos. Enc.: blend weights 0.0 | 0.25 | 0.5 | 0.75 | 1.0]*
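Concretely, the interpolation is just a convex combination of two learned codes, decoded at every coordinate. A minimal sketch, reusing the hypothetical `CoordMLP` and `codes` from above; the grid's column headers correspond to `alphas`.

```python
import torch

@torch.no_grad()
def interpolate_codes(model, codes, i, j, coords,
                      alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    z_i, z_j = codes.weight[i], codes.weight[j]
    outputs = []
    for a in alphas:
        z = (1.0 - a) * z_i + a * z_j      # convex combination of two codes
        outputs.append(model(coords, z))   # decode every coordinate at z
    return outputs
```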
Curiously, the interpolations after learning four videos are of higher quality, with more faithful color and shape reconstruction than those above:
*[Latent interpolation grid after learning four videos, No Pos. Enc.: blend weights 0.0 | 0.25 | 0.5 | 0.75 | 1.0]*
To compare against the multi-video scenario, we also train MLPs that learn only a single video at a time.
*[Single-video results: Ground Truth | No Pos. Enc. | …]*
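For concreteness, here is a minimal single-video training sketch; no latent code is needed when the network fits only one signal. It reuses the hypothetical `fourier_encode` and `B` from the encoding sketch above, and the step count, batch size, and layer widths are placeholders rather than our actual settings.

```python
import torch
import torch.nn as nn

def fit_single_video(video, steps=2000, lr=1e-3, batch=8192):
    # video: (T, H, W, 3) float tensor with values in [0, 1]
    T, H, W, _ = video.shape
    axes = [torch.linspace(0, 1, s) for s in (T, H, W)]
    grid = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1)
    coords = fourier_encode(grid.reshape(-1, 3), B)  # encode (t, y, x)
    targets = video.reshape(-1, 3)

    model = nn.Sequential(nn.Linear(coords.shape[-1], 256), nn.ReLU(),
                          nn.Linear(256, 256), nn.ReLU(),
                          nn.Linear(256, 3))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randint(0, coords.shape[0], (batch,))  # random pixel batch
        loss = ((model(coords[idx]) - targets[idx]) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```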