RuntimeError: Calculated padded input size per channel: (2 x 2). Kernel size: (3 x 3). Kernel size can't be greater than actual input size #27
Maybe your input size is not 192x192.
Should I resize the input video to 192x192 before executing inference.py, or is this handled in inference.py? If we have to resize the input video, then how do we resize it? I have found one FFmpeg command.
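(For reference, a resize can be done with FFmpeg's crop and scale filters. This is one reasonable approach, not the repo's official preprocessing; the filenames are placeholders.)

```shell
# Square-crop to the shorter side, then scale to 192x192.
# Audio is copied through unchanged (-c:a copy).
ffmpeg -i input.mp4 -vf "crop='min(iw,ih)':'min(iw,ih)',scale=192:192" -c:a copy input_192.mp4
```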
No, you should set it up in hparams.py.
I have changed the img_size from 288 to 192 in hparams.py, but I still get the same error.
Yeah, I am also facing a similar issue even after changing img_size to 192 in hparams.py. Any solution, @primepake?
It could be your hyper-parameter setup.
```
# Default hyperparameters
...
)
```
This is my hparams.py; I just changed img_size to 192 from your code.
Have you modified the model's hidden layers? My repo is only for a 288x288 input size.
So you're saying that if I convert my input video to 288x288, it will work? Let me try that. Thanks.
@Unmesh28 Is there a good way to reach out to you besides here? I was planning on trying to make an AVSpeech checkpoint, so we might as well pool our efforts and resources together.
Hi @primepake, I have not changed any layers. I used the training code as-is.
I also used the 288x288 training code with all the preprocessing you mentioned here: #21 (comment), and I am getting the same error.
I tried with the original img_size = 288 and also with 192 in hparams.py; I get the same error for both. @primepake, please let me know if you know the reason or if I am doing anything wrong here.
@primepake Any suggestions on this?
@primepake The issue is not resolved yet.
So far I've recreated the issue with his sample checkpoint. Inside the forward function, a 3x3 kernel is trying to run over a tensor of shape (128, 512, 2, 2), resulting in the error. This occurs at the self.conv_block(x) line in the Conv2d module during forward. There actually isn't any spot I've directly observed in the project where an img_size/resolution other than 96 pixels is used, regardless of the hparams-specified value (args.img_size is hardcoded to 96). A few things I'm trying to figure out:
- What shape of input is a pretrained checkpoint supposed to expect? From what I read, x seems to be part of the face embedding (coming out of self.face_encoder_blocks).
- Are the hardcoded 96 pixels and the lack of direct references to the hparams img_size leading to mismatched configs?
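(The arithmetic behind the error can be checked without PyTorch. The sketch below assumes the face encoder downsamples with stride-2, kernel-3, padding-1 convolutions, which is the usual pattern; the exact layer list is an assumption, not the repo's code. A 96px input shrinks to 2x2 after six such blocks, at which point a valid 3x3 convolution cannot fit, matching the error message.)

```python
import math

def conv2d_out(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a square Conv2d: floor((n + 2p - k) / s) + 1."""
    return math.floor((size + 2 * padding - kernel) / stride) + 1

def downsample(size, n_blocks):
    """Apply n_blocks stride-2, kernel-3, padding-1 downsampling convs."""
    for _ in range(n_blocks):
        size = conv2d_out(size, kernel=3, stride=2, padding=1)
    return size

print(downsample(96, 6))         # 96 -> 48 -> 24 -> 12 -> 6 -> 3 -> 2
print(downsample(288, 6))        # 288 -> 144 -> 72 -> 36 -> 18 -> 9 -> 5
print(conv2d_out(2, kernel=3))   # 0: a 3x3 kernel cannot fit a 2x2 input
```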
Replace args.img_size=96 with 288.
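(A minimal sketch of what that change looks like, assuming inference.py defines img_size via argparse with a hardcoded default of 96; the flag name and help text here are illustrative, not copied from the repo. The value must match the resolution the checkpoint was trained on.)

```python
import argparse

parser = argparse.ArgumentParser()
# Default changed from the hardcoded 96 to 288 to match the 288x288 checkpoint.
parser.add_argument('--img_size', type=int, default=288,
                    help='Face crop resolution; must equal hparams.img_size')
args = parser.parse_args([])
print(args.img_size)  # 288
```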
I have trained wloss_hq_wav2lip_train.py and used the checkpoint checkpoint_step000003000.pth for inference.