
Confusion about ResNet-50 encoder for global descriptor es #31

Closed · hazard-10 opened this issue Jun 1, 2024 · 5 comments

Comments


hazard-10 commented Jun 1, 2024

The appendix says the encoder for es uses a ResNet-50 with the custom resblock from Figure 11c, which is the resblock that contains the SPADE norm. I saw you implemented SPADE with the avatar embedding, but in Eapp it looks like you just used a vanilla ResNet-50 with no custom resblocks?

I am also a bit confused about whether the appendix is referring to the custom block in Figure 10c or 11c. Figure 11c is used for avatar-specific distillation in the student model, whereas es is obviously a general latent embedding that has to accept arbitrary input. Unless there is additional mapping somewhere else, it would not be possible to feed an out-of-distribution identity into Eapp if that ResNet-50 block also used SPADE. Plus, the authors state "where n denotes the dimension of a convolutional layer (either 2D or 3D) and x denotes the number of output channels", and only Figure 10c fits that description.
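
For anyone else following along, here is a minimal sketch (PyTorch, with made-up channel sizes rather than the paper's) of what a SPADE-conditioned resblock in the style of Figure 11c looks like, versus the plain blocks a vanilla ResNet-50 already has:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Normalize x, then modulate it with gamma/beta maps predicted from a conditioning input."""
    def __init__(self, channels, cond_channels, hidden=128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.shared = nn.Sequential(nn.Conv2d(cond_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x, cond):
        cond = F.interpolate(cond, size=x.shape[-2:], mode='nearest')
        h = self.shared(cond)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

class SPADEResBlock(nn.Module):
    """Resblock whose normalizations are SPADE layers driven by an avatar/conditioning embedding."""
    def __init__(self, channels, cond_channels):
        super().__init__()
        self.spade1 = SPADE(channels, cond_channels)
        self.spade2 = SPADE(channels, cond_channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, cond):
        h = self.conv1(F.relu(self.spade1(x, cond)))
        h = self.conv2(F.relu(self.spade2(h, cond)))
        return x + h

block = SPADEResBlock(channels=256, cond_channels=64)
out = block(torch.randn(1, 256, 32, 32), torch.randn(1, 64, 8, 8))  # cond could be an avatar embedding map
```

If es is supposed to be identity-agnostic, the extra conditioning input here is exactly what makes the 11c block look out of place in its encoder.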

johndpope (Owner) commented Jun 1, 2024

I'll double-check later - I thought I could avoid SPADE since it was related to the super-resolution training stage. There is slightly more than a vanilla ResNet-50: I had to rip apart some things and use custom code to restore the weights.
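
Restoring pretrained weights into a trunk that is slightly more than vanilla usually comes down to a filtered load with strict=False. A minimal sketch under my own assumptions - torchvision's resnet50 plus a hypothetical CustomResNet50 wrapper, not the repo's actual restore code:

```python
import torch
import torch.nn as nn
import torchvision

class CustomResNet50(nn.Module):
    """Hypothetical 'slightly more than vanilla' ResNet-50: trunk kept, head swapped for a 512-d output."""
    def __init__(self):
        super().__init__()
        self.trunk = torchvision.models.resnet50(weights=None)
        self.trunk.fc = nn.Linear(2048, 512)  # stand-in for whatever custom blocks get swapped in

    def forward(self, x):
        return self.trunk(x)

custom = CustomResNet50()
pretrained = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V2
).state_dict()

# Keep only tensors whose (prefixed) names and shapes still match; the swapped parts stay at init.
own = custom.state_dict()
filtered = {f"trunk.{k}": v for k, v in pretrained.items()
            if f"trunk.{k}" in own and own[f"trunk.{k}"].shape == v.shape}
result = custom.load_state_dict(filtered, strict=False)
print(f"restored {len(filtered)} tensors, {len(result.missing_keys)} left at their fresh init")
```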

The generator/discriminator does actually train and spit out images. I still need to get the warping/cropping in order, as well as add more videos and a dynamic driving video. It's working at 512x512 for the time being.

#27

johndpope (Owner) commented

My latest PR merge seems like it's training. The MetaPortrait codebase has a super-resolution + SPADE module.

johndpope (Owner) commented

FYI - #36

May not need es

hazard-10 (Author) commented

> FYI - #36
>
> May not need es

Yeah, I saw that note. Still, VASA-1's encoder generates an "identity code" alongside v_app / z_dyn. I'm guessing we still need some vector to represent identity if true disentanglement is the aim, but the identity code could be extracted directly from a few more conv layers that orthographically project the volume into 2D after Eapp's 3D resblocks, instead of from a separate ResNet-50.
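
A minimal sketch of that idea, with all shapes and layer widths made up just to show the wiring: take the (B, C, D, H, W) volume from Eapp's 3D resblocks, collapse the depth axis as a cheap orthographic projection, and run a couple of 2D convs plus pooling to get an identity vector:

```python
import torch
import torch.nn as nn

class IdentityHead(nn.Module):
    """Hypothetical head: identity code taken from Eapp's 3D feature volume instead of a separate ResNet-50."""
    def __init__(self, volume_channels=96, depth=16, id_dim=512):
        super().__init__()
        # "Orthographic projection": fold the depth axis into channels, then mix with a 1x1 conv.
        self.project = nn.Conv2d(volume_channels * depth, 256, kernel_size=1)
        self.convs = nn.Sequential(
            nn.Conv2d(256, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, id_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, volume):               # volume: (B, C, D, H, W) from the 3D resblocks
        b, c, d, h, w = volume.shape
        x = volume.reshape(b, c * d, h, w)   # flatten depth into channels
        x = self.project(x)
        return self.convs(x).flatten(1)      # (B, id_dim) identity code

head = IdentityHead()
e_id = head(torch.randn(2, 96, 16, 64, 64))
print(e_id.shape)  # torch.Size([2, 512])
```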

johndpope (Owner) commented

When I worked on the Emote paper, they used something similar but maybe better:

https://github.com/johndpope/Emote-hack/blob/7ee104354d52a5461504c27b9f38d269eac86893/Net.py#L56

I could never get the motion frames to concatenate in the channel dimension.
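
For concreteness, a minimal sketch of concatenating motion frames along the channel dimension (frame count and conv sizes are placeholders, not the Emote-hack code at the link above):

```python
import torch
import torch.nn as nn

num_motion_frames = 4
frames = [torch.randn(2, 3, 256, 256) for _ in range(num_motion_frames)]  # each frame: (B, 3, H, W)

# Concatenate on dim=1 so the first conv sees 3 * N input channels.
x = torch.cat(frames, dim=1)                                              # (2, 12, 256, 256)
stem = nn.Conv2d(3 * num_motion_frames, 64, kernel_size=7, stride=2, padding=3)
print(stem(x).shape)                                                      # torch.Size([2, 64, 128, 128])
```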
