Confusion about ResNet50 encoder for global descriptor es #31
Comments
I'll double-check later. I thought I could avoid SPADE because it is tied to the super-resolution training stage. There is slightly more than vanilla there: I had to rip some things apart and use custom code to restore the weights. The generator and discriminator do actually train and spit out images. I still need to get the warping and cropping in order, and add more videos and a dynamic driving video. It's working at 512x512 for the time being.
With my latest PR merge, it seems to be training.
FYI: see #36. We may not need es.
Yeah, I saw that note. Still, VASA-1's encoder generates an "identity code" alongside vapp / z_dyn. I am guessing we still need some vector to represent identity if true disentanglement is the aim, but the identity code could be extracted directly from a few more conv layers that orthographically project the volume into 2D after Eapp's 3D resblocks, instead of from a separate ResNet50.
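To make the suggestion above concrete, here is a rough shape-only sketch of such an identity head. It assumes the projection is done by folding the depth axis into channels, followed by shape-preserving 2D convs and global average pooling. The function name, the stage labels, and the example sizes are all hypothetical and not taken from the repo or the papers.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def identity_head_shapes(volume: Tuple[int, int, int, int, int],
                         code_dim: int) -> List[Tuple[str, Shape]]:
    """Trace tensor shapes through a hypothetical identity head.

    volume is (N, C, D, H, W), i.e. the output of the 3D resblocks.
    Assumption: the 2D convs keep H and W unchanged (stride 1, "same" padding).
    """
    n, c, d, h, w = volume
    # Orthographic projection modelled as a reshape: depth is folded into channels.
    folded = (n, c * d, h, w)
    # A few 2D convs map the folded channels to code_dim feature maps.
    features = (n, code_dim, h, w)
    # Global average pooling removes H and W, leaving one vector per image.
    code = (n, code_dim)
    return [
        ("volume", volume),
        ("folded", folded),
        ("conv2d", features),
        ("identity_code", code),
    ]


if __name__ == "__main__":
    # Example sizes are made up for illustration only.
    for name, shape in identity_head_shapes((1, 96, 16, 64, 64), code_dim=512):
        print(f"{name:>14}: {shape}")
```

The point is only that the head ends in one vector per image, the same kind of output a separate ResNet50 would give, without a second backbone.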
When I worked on the emote paper, they used something similar, but maybe better. I could never get the motion frames to concatenate in the channel dimension.
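Channel-wise concatenation usually fails because one frame tensor differs in batch size or spatial size, since `torch.cat` along `dim=1` requires every other dimension to match. A small stdlib-only checker along these lines can point at the offending tensor before the real call. The helper name and the example shapes are hypothetical.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, int, int, int]  # (batch, channels, height, width)


def concat_channels_shape(shapes: Sequence[Shape]) -> Shape:
    """Return the NCHW shape that concatenating along dim=1 would produce.

    Raises ValueError naming the first tensor whose batch or spatial size
    differs from the first one, which is the condition under which a
    channel-dimension concatenation is rejected.
    """
    if not shapes:
        raise ValueError("need at least one shape")
    n, _, h, w = shapes[0]
    total_channels = 0
    for i, (ni, ci, hi, wi) in enumerate(shapes):
        if (ni, hi, wi) != (n, h, w):
            raise ValueError(
                f"tensor {i} has (N, H, W)=({ni}, {hi}, {wi}), "
                f"expected ({n}, {h}, {w})"
            )
        total_channels += ci
    return (n, total_channels, h, w)


if __name__ == "__main__":
    # Three RGB motion frames of equal size stack into 9 channels.
    print(concat_channels_shape([(2, 3, 64, 64)] * 3))
```

If the check fails on spatial size, resizing the motion frames to the reference resolution before concatenating is the usual fix.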
The appendix says the encoder for es uses a ResNet50 with the custom resblock from Figure 11c, which is the resblock that contains the SPADE norm. I saw that you implemented SPADE with an avatar embedding, but in Eapp it seems you just used a vanilla ResNet50 with no custom resblocks?
I am also a bit confused about whether the appendix is referring to the custom block in Figure 10c or 11c. 11c is used for avatar-specific distillation in the student model, but es is obviously a general latent embedding that can accept arbitrary input. Unless there is an additional mapping somewhere else, it is not possible to feed an out-of-distribution identity into Eapp if that ResNet50 block also uses SPADE.
Plus, the authors mention "where 𝑛 denotes the dimension of a convolutional layer (either 2D or 3D) and x denotes the number of output channels", and only Figure 10c fits that description.