To learn multiple images and videos with a single network, we use a coordinate MLP with a learned latent code concatenated to the input. We use selected videos from the WAIC-TSR dataset, showing only our video results below.
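The conditioning itself is simple: the latent code acts as extra input features alongside the coordinates. Below is a minimal sketch of the idea in PyTorch; `CoordMLP`, the layer sizes, and `latent_dim=64` are illustrative assumptions rather than our actual settings, and `coords` would be the positionally encoded coordinates described next.

```python
import torch
import torch.nn as nn

class CoordMLP(nn.Module):
    """Coordinate MLP conditioned on a per-signal learned latent code.
    (Hypothetical sketch; not the original implementation.)"""
    def __init__(self, in_dim=3, latent_dim=64, hidden=256, depth=4, out_dim=3):
        super().__init__()
        layers = [nn.Linear(in_dim + latent_dim, hidden), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        layers.append(nn.Linear(hidden, out_dim))  # RGB output
        self.net = nn.Sequential(*layers)

    def forward(self, coords, latent):
        # coords: (N, in_dim), e.g. encoded (x, y, t); latent: (latent_dim,)
        z = latent.expand(coords.shape[0], -1)  # broadcast code to all points
        return self.net(torch.cat([coords, z], dim=-1))

# One trainable code per video, optimized jointly with the MLP weights.
codes = nn.Embedding(num_embeddings=4, embedding_dim=64)
```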
We use the same positional encoding scheme as the 2D image MLP.
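For reference, a common form of this encoding in coordinate-MLP work is random Fourier features; the sketch below assumes a Gaussian frequency matrix `B`, and the hyperparameters (`sigma`, `num_freqs`) are placeholders that may differ from what we actually use. Note that only the coordinates are encoded; the latent code is concatenated unencoded.

```python
import math
import torch

def fourier_encode(coords, B):
    # coords: (N, d) in [0, 1]; B: (num_freqs, d) fixed random frequencies
    proj = 2.0 * math.pi * coords @ B.T            # (N, num_freqs)
    return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

# Gaussian frequency matrix; sigma controls the bandwidth of the encoding.
sigma, num_freqs, d = 10.0, 128, 3
B = torch.randn(num_freqs, d) * sigma
```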
*[Two video result grids: Ground Truth | No Pos. Enc. | …]*
As in our image experiments, we also interpolate between latent codes and show the results below:
*[Latent interpolation grid, No Pos. Enc.: blend weights 0.0 | 0.25 | 0.5 | 0.75 | 1.0]*
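Concretely, the interpolation is just a convex combination of two learned codes, decoded at every coordinate. A minimal sketch, reusing the hypothetical `CoordMLP` and `codes` from above; the grid's column headers correspond to `alphas`.

```python
import torch

@torch.no_grad()
def interpolate_codes(model, codes, i, j, coords,
                      alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    z_i, z_j = codes.weight[i], codes.weight[j]
    outputs = []
    for a in alphas:
        z = (1.0 - a) * z_i + a * z_j      # convex combination of two codes
        outputs.append(model(coords, z))   # decode every coordinate at z
    return outputs
```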
Curiously, the interpolations after learning four videos are of higher quality, with more faithful color and shape reconstruction than those above:
*[Latent interpolation grid after learning four videos, No Pos. Enc.: blend weights 0.0 | 0.25 | 0.5 | 0.75 | 1.0]*
To compare against the multi-video scenario, we also train MLPs that learn only a single video at a time.
*[Single-video results: Ground Truth | No Pos. Enc. | …]*
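For concreteness, here is a minimal single-video training sketch; no latent code is needed when the network fits only one signal. It reuses the hypothetical `fourier_encode` and `B` from the encoding sketch above, and the step count, batch size, and layer widths are placeholders rather than our actual settings.

```python
import torch
import torch.nn as nn

def fit_single_video(video, steps=2000, lr=1e-3, batch=8192):
    # video: (T, H, W, 3) float tensor with values in [0, 1]
    T, H, W, _ = video.shape
    axes = [torch.linspace(0, 1, s) for s in (T, H, W)]
    grid = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1)
    coords = fourier_encode(grid.reshape(-1, 3), B)  # encode (t, y, x)
    targets = video.reshape(-1, 3)

    model = nn.Sequential(nn.Linear(coords.shape[-1], 256), nn.ReLU(),
                          nn.Linear(256, 256), nn.ReLU(),
                          nn.Linear(256, 3))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randint(0, coords.shape[0], (batch,))  # random pixel batch
        loss = ((model(coords[idx]) - targets[idx]) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```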