Questions about the data process #5

Open
SecondHupuJR opened this issue Aug 12, 2022 · 5 comments

@SecondHupuJR

Congrats on the great work! I have a couple of questions I hope you can help clarify. At line 77 of utils.py, new_t is divided by interval, which is (total_end - total_start) / num_frame, and num_frame here corresponds to num_bins (set to 16) according to the code.

idx = np.floor(new_t / interval).astype(int)

If my understanding is correct, some of the C channels of the event tensor (N, 2C, H, W) will contain only zeros, because ts is always smaller than total_end. Is that the case?
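
For concreteness, here is a minimal sketch of the binning I am describing (the values are made up for illustration, and the names follow my reading of utils.py rather than the exact code):

import numpy as np
num_bins = 16
total_start, total_end = 0.0, 1.0  # exposure window of the blurry frame
interval = (total_end - total_start) / num_bins
new_t = np.array([0.02, 0.10, 0.31, 0.55])  # event timestamps relative to total_start
idx = np.floor(new_t / interval).astype(int)
print(idx)  # [0 1 4 8], so bins 9..15 receive no events and stay all-zero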

Besides, the event tensor fed to the model will have shape (batchsize×49, 16×2, H, W), which is quite big. Is this the reason that the GoPro experiments are conducted at 160x320? Thank you for the good work!

@XiangZ-0
Owner

Hello SecondHupuJR, thanks for your interest in our work.
You are right about the event tensors: each is constructed with a bin width of (total_end - total_start) / num_frame, so some bins can indeed be all zeros. This design ensures the same temporal resolution across different event tensors, which is important since we use weight-sharing LDI networks. It also allows an arbitrary choice of the target timestamp without changing the input shape of the event tensor.
For the GoPro experiments, it is feasible to work at the full image size of the original REDS dataset; for example, we can crop the input data to 256x256 or 128x128 for training even though the event tensor is large. We use 160x320 simply for efficient training and testing, since the amount of simulated event data at full image size is quite large.
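
Just to illustrate the cropping idea (the names and shapes below are only illustrative, not taken from the released code), the event tensor and the blurry image are cropped with the same window so they stay spatially aligned:

import numpy as np
def random_crop(event_tensor, blur_img, crop_h=256, crop_w=256):
    # event_tensor: (49, 2*num_bins, H, W), blur_img: (C, H, W) -- illustrative shapes
    _, _, H, W = event_tensor.shape
    y = np.random.randint(0, H - crop_h + 1)
    x = np.random.randint(0, W - crop_w + 1)
    # use the same window for events and image so they remain aligned
    return (event_tensor[..., y:y + crop_h, x:x + crop_w],
            blur_img[..., y:y + crop_h, x:x + crop_w])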

@SecondHupuJR
Author

Hi XiangZ-0, many thanks for the reply!

Regarding my second question, what I meant is that the event tensor's shape is (batchsize×49, 16×2, H, W), which is quite large. For the GoPro images (1280x720), is it possible to run inference on the full image? In that case H and W would be 720 and 1280, respectively.
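
For a rough sense of scale (my own back-of-the-envelope numbers, assuming float32, num_bins = 16, and the 49 latent frames):

num_latents, num_bins = 49, 16
for h, w in [(160, 320), (720, 1280)]:
    n_floats = num_latents * num_bins * 2 * h * w
    print(h, w, round(n_floats * 4 / 2**20), "MiB")
# ~306 MiB per sample at 160x320 vs ~5512 MiB (about 5.4 GiB) at 720x1280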

@XiangZ-0
Owner

XiangZ-0 commented Aug 16, 2022

Actually, event tensors of shape (batchsize×49, num_bins×2×4, H, W) are only needed for training; the number 49 means that we recover 49 latent frames simultaneously for computing the blur-sharp loss. During inference, it is fine to process 1280x720 images, since each LDI network only takes (batchsize (usually set to 1), num_bins×2, H, W) as input, so that would not be a problem.
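
As a shape-only illustration (torch.nn.Identity stands in for an LDI network here; it is just a placeholder, not the actual class in our code):

import torch
num_bins = 16
ldi_net = torch.nn.Identity()  # placeholder for one weight-sharing LDI network
train_in = torch.zeros(1 * 49, num_bins * 2 * 4, 160, 320)  # training: 49 latent targets stacked in the batch dim
test_in = torch.zeros(1, num_bins * 2, 720, 1280)           # inference: one target at a time, full resolution is fine
out = ldi_net(test_in)
print(train_in.shape, test_in.shape)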
Thanks for your question.

@SecondHupuJR
Author

Oh, right!

Thank you for the reply! This is really nice work. Besides, I'm wondering: if the model is first trained on synthetic data with ground truth in a supervised fashion and then fine-tuned on real event data in an unsupervised fashion, would the result be better?

@XiangZ-0
Owner

XiangZ-0 commented Sep 1, 2022

Probably yes. We trained EVDI with ground-truth sharp images on the GoPro dataset a while ago, and I remember the supervised EVDI model surpassed its self-supervised counterpart by around 1-2 dB in PSNR thanks to the strong supervision signal from the GT images. You are also welcome to validate this yourself :-)
