NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction

NeuroClips is a novel framework for fMRI-to-video decoding (NeurIPS 2024 Oral). If you like our project, please give us a star ⭐.


🛠️ Method

(Figure: overview of the NeuroClips model architecture.)

📣 News

  • Dec. 3, 2024: Full code release.
  • Nov. 30, 2024: Pre-processed code and dataset release.
  • Sep. 26, 2024: Accepted by NeurIPS 2024 as an oral presentation.
  • May 24, 2024: Project release.

Data Preprocessing

We use the public cc2017 (Wen et al.) dataset, available from https://purr.purdue.edu/publications/2809. You can download it and follow the official preprocessing pipeline for your fMRI data; only movie_fmri_data_processing.m and movie_fmri_reproducibility.m are needed. Note that we select voxels with a looser threshold (Bonferroni correction, P < 0.05) than the original pipeline (Bonferroni correction, P < 0.01), so more voxels are retained.
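As a rough illustration of the effect of the two thresholds (this is not the authors' MATLAB pipeline; the per-voxel p-values and variable names below are assumptions), the voxel selection reduces to a Bonferroni-corrected comparison:

import numpy as np

# Hypothetical per-voxel p-values; in the real pipeline these come from
# the official MATLAB preprocessing scripts.
p_values = np.random.uniform(size=60000)   # one p-value per cortical voxel
n_voxels = p_values.size

# Bonferroni correction: a voxel survives if p < alpha / n_voxels.
alpha_loose = 0.05    # threshold used for NeuroClips
alpha_strict = 0.01   # threshold used in the original cc2017 pipeline

mask_loose = p_values < alpha_loose / n_voxels
mask_strict = p_values < alpha_strict / n_voxels

# The looser threshold always keeps at least as many voxels.
print(mask_loose.sum(), mask_strict.sum())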

We also provide our pre-processed fMRI data and the video frames sampled for training NeuroClips; you can download them directly from Huggingface NeuroClips.

You can use python src/caption.py to generate the captions.

Installation

We recommend using a separate virtual environment for NeuroClips training and inference (keyframes and blurry videos) from the one used for the pre-trained T2V diffusion model, to avoid package version conflicts between the two environments.

For NeuroClips:

. src/setup.sh

For the pre-trained AnimateDiff, you can follow these steps:

conda create -n animatediff python==3.10
conda activate animatediff
cd AnimateDiff
pip install -r requirements.txt

Train Semantic Reconstructor

We suggest training the backbone first and then the prior to obtain a better Semantic Reconstructor.

conda activate neuroclips
python src/train_SR.py --subj 1 --batch_size 240 --num_epochs 30 --mixup_pct 1.0 --max_lr 1e-4 --use_text
python src/train_SR.py --subj 1 --batch_size 64 --num_epochs 150 --mixup_pct 0.0 --max_lr 3e-4 --use_prior --use_text

Train Perception Reconstructor

python src/train_PR.py --subj 1 --batch_size 40 --mixup_pct 0.0 --num_epochs 80

Reconstruct Keyframe

python src/recon_keyframe.py --subj 1

After the keyframes are generated, you can use BLIP-2 (python src/caption.py) to get captions of the keyframes.
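For reference, here is a minimal sketch of what BLIP-2 captioning of a single keyframe looks like with the HuggingFace transformers API; the checkpoint name and image path are assumptions, and src/caption.py in this repository remains the supported route.

import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda"  # assumes a GPU is available

# Assumed BLIP-2 checkpoint; src/caption.py may use a different one.
name = "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(name)
model = Blip2ForConditionalGeneration.from_pretrained(name, torch_dtype=torch.float16).to(device)

# Hypothetical path to one reconstructed keyframe.
image = Image.open("keyframes/subj01_0000.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)

generated_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)  # this caption is then used as the text prompt for the T2V model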

Reconstruct Blurry Video

python src/recon_blurry.py --subj 1

Reconstruct Videos

After preparing all the inputs, you can reconstruct the videos. You can use any pre-trained T2V or V2V model. We use the pre-trained T2V model AnimateDiff here, specifically with SparseCtrl for first-frame guidance.

conda activate animatediff
cd AnimateDiff
python -m scripts.neuroclips --config configs/NeuroClips/control.yaml

The pre-trained weights you need to prepare are listed here.

🖼️ Reconstruction Demos

Human Behavior

(Demo videos: ground truth (GT) vs. our reconstructions.)

Animals

(Demo videos: ground truth (GT) vs. our reconstructions.)

Traffic

(Demo videos: ground truth (GT) vs. our reconstructions.)

Natural Scene

(Demo videos: ground truth (GT) vs. our reconstructions.)

Multi-fMRI Fusion

With the help of NeuroClips' SR, we explore the generation of longer videos for the first time. Since long video generation is still an immature research area, we chose a straightforward fusion strategy that requires no additional GPU training. During inference, we consider the semantic similarity of the two reconstructed keyframes from two neighboring fMRI samples (here we simply check whether they depict the same object class, e.g., both are jellyfish). If they are semantically similar, we replace the keyframe of the latter fMRI with the tail frame of the former fMRI's reconstructed video, which is then used as the first frame when generating the latter fMRI's video. A minimal sketch of this rule is shown below.
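The sketch below only illustrates the fusion rule described above; the sample data structure and the same_class helper are hypothetical, and NeuroClips' actual inference code may organize this differently.

def fuse_neighboring_clips(samples, same_class):
    """samples: list of dicts with 'keyframe' and 'video' (a list of frames),
    ordered by fMRI acquisition time. same_class(a, b) -> bool decides whether
    two keyframes depict the same object class (e.g., both jellyfish)."""
    for prev, curr in zip(samples, samples[1:]):
        if same_class(prev["keyframe"], curr["keyframe"]):
            # Reuse the tail frame of the previous reconstructed video as the
            # first-frame guidance for the current clip, so neighboring clips
            # join into one longer, smoother video.
            curr["keyframe"] = prev["video"][-1]
    return samples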

(Figure: multi-fMRI fusion results.)

Fail Cases

Overall, the failure cases fall into two categories: in some cases the semantics are not accurate enough, and in others scene transitions affect the generated results.

Pixel Control & Semantic Deficit

In the cc2017 dataset, the video clips in the test movie differ from those in the training movie, and some object categories do not appear in the training set at all. However, thanks to NeuroClips' Perception Reconstructor, we can still reconstruct the video at a low, perceptual level of vision.

(Demo videos: ground truth (GT) vs. our reconstructions.)

Scene Transitions

Due to the low temporal resolution of fMRI (i.e., 2 s), a single fMRI segment may span two video scenes, leading to semantic confusion in the reconstruction, or even a blend of semantics and perception. In the example below, a jellyfish transitioning to the moon ultimately generates a jellyfish on a black background.

(Demo videos: ground truth (GT) vs. our reconstructions.)

BibTeX

@article{gong2024neuroclips,
  title={NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction},
  author={Gong, Zixuan and Bao, Guangyin and Zhang, Qi and Wan, Zhongwei and Miao, Duoqian and Wang, Shoujin and Zhu, Lei and Wang, Changwei and Xu, Rongtao and Hu, Liang and others},
  journal={arXiv preprint arXiv:2410.19452},
  year={2024}
}

Acknowledgement

We sincerely thank the following authors; NeuroClips builds on their excellent open-source projects and ideas.

T2V diffusion: https://github.com/guoyww/AnimateDiff

Excellent Backbone: https://github.com/MedARC-AI/MindEyeV2

Temporal Design: https://arxiv.org/abs/2304.08818

Keyframe Captioning: https://github.com/salesforce/LAVIS/tree/main/projects/blip2

Dataset and Pre-processed code: https://purr.purdue.edu/publications/2809
