Questions about the data process #5

Open
SecondHupuJR opened this issue Aug 12, 2022 · 5 comments

@SecondHupuJR

Congrats on the great work! I have a couple of questions I hope you can help clarify. At line 77 of utils.py, new_t is divided by interval, which is (total_end - total_start) / num_frame, and num_frame here corresponds to num_bins (set to 16) according to the code.

idx = np.floor(new_t / interval).astype(int)

If my understanding is correct, some of the C channels of the event tensor (N, 2C, H, W) will contain only zeros, because ts is always smaller than total_end. Is that the case?
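
For concreteness, here is a minimal sketch of the binning I am describing (the values are made up for illustration, and the names follow my reading of utils.py rather than the exact code):

import numpy as np
num_bins = 16
total_start, total_end = 0.0, 1.0  # exposure window of the blurry frame
interval = (total_end - total_start) / num_bins
new_t = np.array([0.02, 0.10, 0.31, 0.55])  # event timestamps relative to total_start
idx = np.floor(new_t / interval).astype(int)
print(idx)  # [0 1 4 8], so bins 9..15 receive no events and stay all-zero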

Besides, the event tensor fed to the model will have shape (batchsize×49, 16×2, H, W), which is quite big. Is this the reason that the GoPro experiments are conducted at 160x320? Thank you for the good work!

@XiangZ-0
Owner

Hello SecondHupuJR, thanks for your interest in our work.
You are right about the event tensors: each is constructed with a bin width of (total_end - total_start) / num_frame, so some bins can indeed be all zeros. This design ensures the same temporal resolution across different event tensors, which is important since we use weight-sharing LDI networks. It also allows an arbitrary choice of the target timestamp without changing the input shape of the event tensor.
For the GoPro experiments, it is feasible to work at the full image size of the original REDS dataset; for example, we can crop the input data to 256x256 or 128x128 for training even though the event tensor is large. We use 160x320 simply for efficient training and testing, since the amount of simulated event data at full image size is quite large.
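
Just to illustrate the cropping idea (the names and shapes below are only illustrative, not taken from the released code), the event tensor and the blurry image are cropped with the same window so they stay spatially aligned:

import numpy as np
def random_crop(event_tensor, blur_img, crop_h=256, crop_w=256):
    # event_tensor: (49, 2*num_bins, H, W), blur_img: (C, H, W) -- illustrative shapes
    _, _, H, W = event_tensor.shape
    y = np.random.randint(0, H - crop_h + 1)
    x = np.random.randint(0, W - crop_w + 1)
    # use the same window for events and image so they remain aligned
    return (event_tensor[..., y:y + crop_h, x:x + crop_w],
            blur_img[..., y:y + crop_h, x:x + crop_w])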

@SecondHupuJR
Author

Hi XiangZ-0, many thanks for the reply!

Regarding my second question, what I meant is that the event tensor's shape is (batchsize×49, 16×2, H, W), which is quite large. For the GoPro images (1280x720), is it possible to run inference on the full image? In that case H and W would be 720 and 1280, respectively.
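
For a rough sense of scale (my own back-of-the-envelope numbers, assuming float32, num_bins = 16, and the 49 latent frames):

num_latents, num_bins = 49, 16
for h, w in [(160, 320), (720, 1280)]:
    n_floats = num_latents * num_bins * 2 * h * w
    print(h, w, round(n_floats * 4 / 2**20), "MiB")
# ~306 MiB per sample at 160x320 vs ~5512 MiB (about 5.4 GiB) at 720x1280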

@XiangZ-0
Owner

XiangZ-0 commented Aug 16, 2022

Actually, event tensors of shape (batchsize×49, num_bins×2×4, H, W) are only needed for training; the number 49 means that we recover 49 latent frames simultaneously for computing the blur-sharp loss. During inference, it is fine to process 1280x720 images, since each LDI network only takes (batchsize (usually set to 1), num_bins×2, H, W) as input, so that would not be a problem.
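
As a shape-only illustration (torch.nn.Identity stands in for an LDI network here; it is just a placeholder, not the actual class in our code):

import torch
num_bins = 16
ldi_net = torch.nn.Identity()  # placeholder for one weight-sharing LDI network
train_in = torch.zeros(1 * 49, num_bins * 2 * 4, 160, 320)  # training: 49 latent targets stacked in the batch dim
test_in = torch.zeros(1, num_bins * 2, 720, 1280)           # inference: one target at a time, full resolution is fine
out = ldi_net(test_in)
print(train_in.shape, test_in.shape)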
Thanks for your question.

@SecondHupuJR
Author

Oh, right!

Thank you for the reply! This is really nice work. Besides, I'm wondering: if the model is first trained on synthetic data with ground truth in a supervised fashion and then fine-tuned on real event data in an unsupervised fashion, would the result be better?

@XiangZ-0
Owner

XiangZ-0 commented Sep 1, 2022

Probably yes. We trained EVDI with ground-truth sharp images on the GoPro dataset a while ago, and I remember the supervised EVDI model surpassed its self-supervised counterpart by around 1-2 dB in PSNR thanks to the strong supervision signal from the GT images. You are also welcome to validate this yourself :-)
