
Local-Global INRs - Towards Croppable Implicit Neural Representations

Abstract

Implicit Neural Representations (INRs) have piqued interest in recent years due to their ability to encode natural signals using neural networks. While INRs allow for useful applications such as interpolating new coordinates and signal compression, their black-box nature makes it difficult to modify them post-training. In this paper we explore the idea of editable INRs, and specifically focus on the widely used cropping operation. To this end, we present Local-Global SIRENs -- a novel INR architecture that supports cropping by design. Local-Global SIRENs are based on combining local and global feature extraction for signal encoding. What makes their design unique is the ability to effortlessly remove specific portions of an encoded signal, with a proportional weight decrease. This is achieved by eliminating the corresponding weights from the network, without the need for retraining. We further show how this architecture can be used to support the straightforward extension of previously encoded signals. Beyond signal editing, we examine how the Local-Global approach can accelerate training, enhance encoding on various signals, improve downstream performance, and be applied to modern INRs such as INCODE, highlighting its potential and flexibility.

Technological Overview and Sources

  • Most of our codebase is based on the official SIREN implementation.
  • Audio and video data samples under the data directory are taken from the official SIREN repository.
  • INCODE experiments are based on the official INCODE implementation.
  • The DIV2K assets used in the experiments in the paper are under data/DIV2K
    • data/DIV2K/DIV2K_subset.txt is the list of 25 randomly selected images, used as a subset for the image encoding experiments.
      • Due to the supplementary material size limitation, we are not able to attach the images themselves.
      • However, the full DIV2K dataset is available at DIV2K, and the subset can be created by choosing the images provided in the list.
    • data/DIV2K/denoising contains the image used in the denoising experiment.
    • data/DIV2K/superresolution contains the image used in the super-resolution experiment.
    • The license agreement for DIV2K is stated on their website (academic use only).
  • The image used for CT reconstruction is data/img_377_ct_reconstruction.png
  • The Lucy shape used for 3D encoding is under data/preprocessed_lucy.npy
  • For configuration management, we use pyrallis.
  • WandB (Weights and Biases) is integrated into the code for experiment tracking and visualization. You can disable it by passing --use_wandb False.
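
For example, to disable WandB logging on the Local-Global SIREN image encoding run shown later in this README, append the flag to the command (a sketch; all other arguments are unchanged from the image encoding example below):

# Same arguments as the image encoding example below, with WandB disabled
PYTHONPATH=../ python train_img.py --experiment_name test_image_lg --lr 5e-4 --num_epochs 1001 --hidden_features 3584 --epochs_til_ckpt 1000 --mode lg --global_hidden_features 84 --downsample [16,16] --use_wandb False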

Installation

Run the following commands to set up the environment:

conda env create -f environment.yml
conda activate lgsirens

Training on Images, Audio and Videos

Refer to the appendix of the paper for the hyperparameters used for training.

  • The type of network is passed using --mode. The options are lg for Local-Global SIREN, lc for SIREN-per-Partition, and mlp for SIREN.
  • Global hidden features are passed using --global_hidden_features.
  • Local hidden features are passed using --hidden_features.
    • Note that the script expects the total number of local hidden features, i.e., the per-partition count multiplied by the number of partitions. For example, to use 14 local hidden features per partition with 16*16=256 partitions, pass --hidden_features 3584 (14*256=3584).
  • The number of partitions is passed using --downsample. For example:
    • For an image encoding task, if you want to use 16*16=256 partitions, you should pass --downsample [16,16].
    • For an audio encoding task, if you want to use 32 partitions, you should pass --downsample [1,32,1].
    • For a video encoding task, if you want to use 5*16*16=1280 partitions, you should pass --downsample [5,16,16].
  • The number of overlapping coordinates in each dimension is passed using --overlaps. For example:
    • For a video encoding task, if you want to sample from 2 adjacent frames and from 1 pixel in each spatial dimension, you should pass --overlaps [2,1,1] (this is what we used in the paper).
  • For video encoding tasks, the fraction of sampled pixels in each iteration is passed using --sample_frac. For example, if you want to sample 2% of the pixels in each iteration, you should pass --sample_frac 0.02.
  • Experiment scripts are run from the experiment_scripts directory:

We present examples for image, audio, and video encoding tasks. The examples are based on the hyperparameters used in the paper.

Example - Image Encoding

To train the model to encode an image, run one of the following commands:

# From the experiment_scripts directory

# Local-Global SIREN
PYTHONPATH=../ python train_img.py --experiment_name test_image_lg --lr 5e-4 --num_epochs 1001 --hidden_features 3584 --epochs_til_ckpt 1000 --mode lg --global_hidden_features 84 --downsample [16,16]

# SIREN-per-Partition
PYTHONPATH=../ python train_img.py --experiment_name test_image_lc --lr 5e-4 --num_epochs 1001 --hidden_features 3840 --epochs_til_ckpt 1000 --mode lc --downsample [16,16]

# SIREN
PYTHONPATH=../ python train_img.py --experiment_name test_image_mlp --lr 5e-4 --num_epochs 1001 --hidden_features 256 --epochs_til_ckpt 1000 --mode mlp

Example - Audio Encoding

To train the model to encode an audio file, run one of the following commands:

# From the experiment_scripts directory

# Local-Global SIREN
PYTHONPATH=../ python train_img.py --experiment_name test_audio_lg --lr 1e-4 --num_epochs 1001 --hidden_features 1344 --epochs_til_ckpt 1000 --mode lg --global_hidden_features 72 --downsample [1,32,1]

# SIREN-per-Partition
PYTHONPATH=../ python train_img.py --experiment_name test_audio_lc --lr 1e-4 --num_epochs 1001 --hidden_features 1440 --epochs_til_ckpt 1000 --mode lc --downsample [1,32,1]

# SIREN
PYTHONPATH=../ python train_img.py --experiment_name test_audio_siren --lr 1e-4 --num_epochs 1001 --hidden_features 256 --epochs_til_ckpt 1000 --mode mlp

Example - Video Encoding

To train the model to encode a video file, run one of the following commands:

# From the experiment_scripts directory

# Local-Global SIREN
python train_video.py --experiment_name test_video_lg --num_epochs 5001 --epochs_til_ckpt 1000 --mode lg --dataset cat --downsample [5,8,8] --overlaps [1,2,2] --hidden_features 17600 --global_hidden_features 180 --sample_frac 0.02

# SIREN-per-Partition
python train_video.py --experiment_name test_video_lc --num_epochs 5001 --epochs_til_ckpt 1000 --mode lc --dataset cat --downsample [5,8,8] --overlaps [1,2,2] --hidden_features 17920 --sample_frac 0.02

# SIREN
python train_video.py --experiment_name test_video_siren --num_epochs 5001 --steps_til_summary 1000 --epochs_til_ckpt 5000 --mode mlp --dataset cat --hidden_features 1030

Encoding Images with Automatic Partitioning

There are three parameters that control the automatic partitioning logic:

  • First and foremost, the --partition_size argument sets the desired partition size.
  • The --auto_total_number_of_parameters argument sets the required total number of parameters in the network.
    • It is set to 200k by default; the resulting parameter count might deviate by roughly 5-10%.
    • This is equivalent to an MLP with 3 hidden layers and 256 hidden units.
  • The --auto_global_weights_factor argument sets the ratio of global weights to the total number of weights in the network.
    • It is set to 0.1 by default; the resulting ratio might deviate by roughly 5-10%.
    • We use the same ratio for the global weights in all image encoding experiments in the paper.
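
As a rough worked example of the defaults (an illustration of the arithmetic only, not the exact allocation logic in the code):

# --auto_total_number_of_parameters 200k, --auto_global_weights_factor 0.1:
#   global network:           ~0.1 * 200,000 = ~20,000 parameters
#   local partition networks: ~0.9 * 200,000 = ~180,000 parameters, split across the partitions
# Both counts may deviate by roughly 5-10%, as noted above.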

Example

To encode a DIV2K image with automatic partitioning, downsampling the image by a factor of 4 before training, run the following command:

# From the experiment_scripts directory

# Local-Global SIREN
PYTHONPATH=../ python train_img.py --experiment_name test_image_lg --dataset_path <PATH_TO_DIV2K_IMAGE> --image_resolution_factor 4 --num_epochs 2001 --steps_til_summary 200 --epochs_til_ckpt 1000 --mode lg --partition_size [32,32]

Debug Outputs

During training, debug outputs are logged to the logs/ directory by default (this can be changed using the --logging_root flag). Both the metrics and the visualizations are logged using TensorBoard. When --use_wandb is passed, the metrics are also logged to Weights and Biases.
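
For example, the TensorBoard logs can be inspected with (assuming TensorBoard is available in the environment):

tensorboard --logdir logs/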

Decoding and Cropping Videos

To decode a video using the trained model, use a similar configuration to the one used for training, but pass the --decode flag and the --checkpoint_path flag. For example:

# From the experiment_scripts directory

# Local-Global SIREN
python decode_video.py --experiment_name test_video_lg --mode lg --dataset cat --downsample 5 8 8 --overlaps 2 1 1 --hidden_features 17600 --global_hidden_features 180 --decode --checkpoint_path model.pth

To crop specific partitions of the video, use any one (or more) of the following flags:

  • --crop_entire_dim_values: A list of three lists. Crops partitions based on their indices in specific dimensions, across the entire signal. For example:
    • To crop the entire spatial border of the video (across all frames), assuming the video was downsampled by a factor of 8 in each spatial dimension, pass --crop_entire_dim_values [[],[0,7],[0,7]].
    • To crop the second and third partitions in the temporal dimension (across all spatial locations), pass --crop_entire_dim_values [[1,2],[],[]].
  • --crop_partition_indices: A list of three-element lists. Crops specific partitions based on their indices (in this case, each partition is indexed by three coordinates). For example:
    • To crop the partition at the start of the video, in the bottom-right corner (assuming the video was downsampled by a factor of 8 in each spatial dimension), pass --crop_partition_indices [[0,7,7]].
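
For example, cropping can be combined with decoding; the following is a sketch that reuses the decode command and the flag format shown above, removing the entire spatial border of the video:

# From the experiment_scripts directory

# Decode the video while cropping the entire spatial border (a sketch based on the decode command above)
python decode_video.py --experiment_name test_video_lg --mode lg --dataset cat --downsample 5 8 8 --overlaps 2 1 1 --hidden_features 17600 --global_hidden_features 180 --decode --checkpoint_path model.pth --crop_entire_dim_values [[],[0,7],[0,7]]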

3D Shape Encoding

The 3D shape encoding code is based on the official INCODE implementation. There is a notebook available under the experiment_scripts directory called train_3d.ipynb that can be used to encode Lucy, as seen in the paper.
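
For example, the notebook can be opened with (assuming Jupyter is installed in the environment):

# From the experiment_scripts directory
jupyter notebook train_3d.ipynb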

Local-Global INCODE and Downstream Tasks

All downstream task code is available in the incode_experiments directory.

Note: running the INCODE experiments requires additional packages, listed in the incode_experiments/requirements.txt file.
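
For example, they can be installed from the repository root with:

pip install -r incode_experiments/requirements.txt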

The code is based on the official INCODE implementation. There are three notebooks, one for each downstream task:

  • denoising.ipynb: Denoising experiment
  • superresolution.ipynb: Super-resolution experiment
  • ct_reconstruction.ipynb: CT reconstruction experiment

The notebooks are ready to run, and the results can be reproduced by following them.

Citation

If you use this code or paper for your research, please cite the following:

@misc{ashkenazi2024croppableimplicitneuralrepresentations,
      title={Towards Croppable Implicit Neural Representations}, 
      author={Maor Ashkenazi and Eran Treister},
      year={2024},
      eprint={2409.19472},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.19472}, 
}
