Feature request: Downsample inputs for faster analysis #26
Comments
What type of downsampling are you talking about? If it's about raw video and spatial or temporal downsampling, then it might be better to use OpenCV for that.
I'm suggesting spatial downsampling. Yes, it would definitely work with OpenCV or ffmpeg. A max-pooling layer in the network itself may be faster (is this accurate?) and more convenient, but it's definitely not the only way to make it happen. Thanks!
This is definitely possible, but we would want to avoid adding too much complexity to the code. The easiest approach is probably to add an option for this. A lot of the overhead of processing time during inference is actually transferring the images into GPU memory, so I'm not sure how much faster this would be compared to preprocessing the frames with opencv. However, even if this isn't faster, it would make using the code much simpler, as everything is self-contained within the model. That being said, this would also be useful for adjusting image resolution to a power of 2 (for downsampling and upsampling within the model) and could allow for variably sized images. I originally thought zero padding was the best way, but this seems like the better option.
Thanks Jake. So you prefer incorporating the downsampling in the model? If transferring to the GPU is the major bottleneck, would downsampling (with opencv) in the generator before transferring to the GPU increase the speed? One point on MaxPooling2D vs. more clever layers: those tracking mouse whiskers (or anything approaching 1 pixel in thickness) might prefer max pooling, as it is more likely to preserve very thin features. Probably not super important, but perhaps worth considering. Would the pooling layer automatically result in power-of-2 dimensions? I implemented zero padding in my branch; it would be nice to get rid of this, as it slows things down a bit. Thanks again!
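As an aside, the zero-padding step mentioned above is cheap to do in numpy. The sketch below is only illustrative; the `pad_to_multiple` helper name and the multiple of 32 are assumptions for this example, not the actual implementation in the branch:

```python
import numpy as np

def pad_to_multiple(image, multiple=32):
    """Zero-pad height and width up to the next multiple of `multiple`.

    Hypothetical helper, not the branch implementation referenced above.
    """
    h, w = image.shape[:2]
    new_h = int(np.ceil(h / multiple)) * multiple
    new_w = int(np.ceil(w / multiple)) * multiple
    # Pad only the bottom and right edges; leave any channel axes untouched.
    pad = [(0, new_h - h), (0, new_w - w)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad, mode="constant")

image = np.zeros((600, 900, 3), dtype=np.uint8)
padded = pad_to_multiple(image, 32)
# padded.shape == (608, 928, 3)
```

Keypoint coordinates stay valid after this kind of padding, since the original pixels keep their positions in the top-left corner.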
(Below is a shameless self-PR)
I ran some tests, and it looks like this is probably not worth implementing. The opencv resize function appears to be significantly faster on all counts; there's just a ton of overhead in moving the images into GPU memory. Zero padding is cheap, so it's probably best to implement padding as the solution for odd-sized images.

```python
import cv2
cv2.setNumThreads(1)  # test without parallelism
import tensorflow as tf
import numpy as np

tfl = tf.keras.layers

ORIGINAL = (1024, 1024)
RESIZED = (512, 512)

# Option 1: resize on the CPU with opencv, then run a minimal TF model
class CVResize:
    def __init__(self):
        inputs = tfl.Input((None, None, 3), dtype=tf.uint8)
        outputs = inputs[:, :32, :3, 0]  # simulate keypoint outputs
        self.tf_model = tf.keras.Model(inputs, outputs)

    def __call__(self, images, size=RESIZED, batch_size=1):
        images = np.stack([cv2.resize(image, size, interpolation=cv2.INTER_NEAREST)
                           for image in images])
        return self.tf_model.predict(images, batch_size=batch_size)

cv_resize = CVResize()

# Option 2: resize inside the model with tf.image.resize
inputs = tfl.Input((None, None, 3), dtype=tf.uint8)
resized = tf.image.resize(inputs, RESIZED, method='nearest')
outputs = resized[:, :32, :3, 0]  # simulate keypoint outputs
tf_resize = tf.keras.Model(inputs, outputs)

# Option 3: downsample inside the model with MaxPooling2D
inputs = tfl.Input((None, None, 3), dtype=tf.uint8)
resized = tfl.MaxPooling2D(ORIGINAL[0] // RESIZED[0])(inputs)
outputs = resized[:, :32, :3, 0]  # simulate keypoint outputs
tf_maxpool = tf.keras.Model(inputs, outputs)

images = np.random.randint(0, 255, (256, ORIGINAL[0], ORIGINAL[1], 3), dtype=np.uint8)

%timeit cv_resize(images, batch_size=1)
%timeit cv_resize(images, batch_size=128)
%timeit tf_resize.predict(images, batch_size=1)
%timeit tf_resize.predict(images, batch_size=128)
%timeit tf_maxpool.predict(images, batch_size=1)
%timeit tf_maxpool.predict(images, batch_size=128)
```
This is great, thanks for running these tests. Do you have any plans to implement an opencv resizing option, e.g. in the DataGenerator, along with automatic rescaling of the network outputs? If not, I'll hack something together on my end. Relatedly, I'm finding that deepposekit underperforms deeplabcut when there are long-range spatial contingencies. See the image here, where the left and right paw in the top view get swapped. The bottom view is useful here for resolving ambiguities in the top view; I think the deeper networks may have an easier time with these long-range contingencies due to greater receptive field size at the outputs. I'm thinking spatial downsampling of the inputs may actually increase accuracy for deepposekit by effectively increasing receptive field size. Let me know if there are any other parameters I can play with that may help deepposekit perform better under conditions like these. Thanks again!
Shouldn't be too difficult to add, but it's not high priority at the moment. I'll need to think about how best to accomplish this; if you want to submit a PR, I'm happy to work on it with you. Do you mean performance between networks within DPK or between the two software packages? Swapping issues might be due to erroneous or overly-aggressive augmentation, especially if the …
Thanks! I'll open a new issue and let you know if I end up implementing the resizing. |
I may try to implement a re-scaling option. If you have time (this isn't super high priority for me either), can you let me know if the following strategy seems alright?
Using opencv to resize images doesn't require any interaction with the …
Hi,
Many people collect videos at much higher spatial resolution than is necessary to perform accurate tracking (myself included). It would be great to have optional MaxPooling2D layer(s) at the input of DPK, which would downsample the input and cause the inference to be (way) faster. The output coordinates would need to be scaled up, etc. I think many would really benefit from the increased speed. What do you think?
Thanks,
Rick
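For concreteness, the idea above could look something like the numpy sketch below: max-pool the input by an integer factor, run inference at the lower resolution, and scale the predicted coordinates back up. Everything here is a hedged illustration; `FACTOR`, the function names, and the keypoint format are assumptions for this example, not DPK's actual API:

```python
import numpy as np

FACTOR = 2  # hypothetical downsampling factor (illustrative, not a DPK parameter)

def maxpool_downsample(image, factor=FACTOR):
    # Spatially downsample with max pooling via reshaping.
    # Assumes height and width are divisible by `factor`.
    h, w = image.shape[:2]
    return image.reshape(h // factor, factor, w // factor, factor, -1).max(axis=(1, 3))

def upscale_keypoints(coords, factor=FACTOR):
    # Map (x, y) predictions made on the pooled image back to
    # full-resolution pixel coordinates.
    return np.asarray(coords, dtype=float) * factor

image = np.arange(16).reshape(4, 4, 1)
pooled = maxpool_downsample(image)        # shape (2, 2, 1)
coords = upscale_keypoints([[1.0, 1.5]])  # -> [[2.0, 3.0]]
```

Max pooling (rather than averaging or nearest-neighbor resizing) matches the point raised earlier in the thread: it is more likely to preserve very thin, bright features such as whiskers.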