
Confusion about ResNet-50 encoder for global descriptor es #31

Closed · hazard-10 opened this issue Jun 1, 2024 · 5 comments

Comments


hazard-10 commented Jun 1, 2024

The appendix says the encoder for es uses a ResNet-50 with the custom resblock from Figure 11c, which is the resblock that contains the SPADE norm. I saw you implemented SPADE with the avatar embedding, but in Eapp it looks like you just used a vanilla ResNet-50 with no custom resblocks?

I am also a bit confused about whether the appendix is referring to the custom block in Figure 10c or 11c. Figure 11c is used for avatar-specific distillation in the student model, whereas es is obviously a general latent embedding that has to accept arbitrary input. Unless there is additional mapping somewhere else, it would not be possible to feed an out-of-distribution identity into Eapp if that ResNet-50 block also used SPADE. Plus, the authors state "where n denotes the dimension of a convolutional layer (either 2D or 3D) and x denotes the number of output channels", and only Figure 10c fits that description.
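
For anyone else following along, here is a minimal sketch (PyTorch, with made-up channel sizes rather than the paper's) of what a SPADE-conditioned resblock in the style of Figure 11c looks like, versus the plain blocks a vanilla ResNet-50 already has:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Normalize x, then modulate it with gamma/beta maps predicted from a conditioning input."""
    def __init__(self, channels, cond_channels, hidden=128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.shared = nn.Sequential(nn.Conv2d(cond_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x, cond):
        cond = F.interpolate(cond, size=x.shape[-2:], mode='nearest')
        h = self.shared(cond)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

class SPADEResBlock(nn.Module):
    """Resblock whose normalizations are SPADE layers driven by an avatar/conditioning embedding."""
    def __init__(self, channels, cond_channels):
        super().__init__()
        self.spade1 = SPADE(channels, cond_channels)
        self.spade2 = SPADE(channels, cond_channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, cond):
        h = self.conv1(F.relu(self.spade1(x, cond)))
        h = self.conv2(F.relu(self.spade2(h, cond)))
        return x + h

block = SPADEResBlock(channels=256, cond_channels=64)
out = block(torch.randn(1, 256, 32, 32), torch.randn(1, 64, 8, 8))  # cond could be an avatar embedding map
```

If es is supposed to be identity-agnostic, the extra conditioning input here is exactly what makes the 11c block look out of place in its encoder.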

johndpope (Owner) commented Jun 1, 2024

I'll double-check later - I thought I could avoid SPADE since it was related to the super-resolution training stage. There is slightly more than a vanilla ResNet-50: I had to rip apart some things and use custom code to restore the weights.
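
Restoring pretrained weights into a trunk that is slightly more than vanilla usually comes down to a filtered load with strict=False. A minimal sketch under my own assumptions - torchvision's resnet50 plus a hypothetical CustomResNet50 wrapper, not the repo's actual restore code:

```python
import torch
import torch.nn as nn
import torchvision

class CustomResNet50(nn.Module):
    """Hypothetical 'slightly more than vanilla' ResNet-50: trunk kept, head swapped for a 512-d output."""
    def __init__(self):
        super().__init__()
        self.trunk = torchvision.models.resnet50(weights=None)
        self.trunk.fc = nn.Linear(2048, 512)  # stand-in for whatever custom blocks get swapped in

    def forward(self, x):
        return self.trunk(x)

custom = CustomResNet50()
pretrained = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V2
).state_dict()

# Keep only tensors whose (prefixed) names and shapes still match; the swapped parts stay at init.
own = custom.state_dict()
filtered = {f"trunk.{k}": v for k, v in pretrained.items()
            if f"trunk.{k}" in own and own[f"trunk.{k}"].shape == v.shape}
result = custom.load_state_dict(filtered, strict=False)
print(f"restored {len(filtered)} tensors, {len(result.missing_keys)} left at their fresh init")
```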

The generator/discriminator does actually train and spit out images. I still need to get the warping/cropping in order, as well as add more videos and a dynamic driving video. It's working at 512x512 for the time being.

#27

johndpope (Owner) commented

My latest PR merge seems like it's training. The MetaPortrait codebase has a super-resolution + SPADE module.

johndpope (Owner) commented

FYI - #36

May not need es

hazard-10 (Author) commented

> FYI - #36
>
> May not need es

Yeah, I saw that note. Still, VASA-1's encoder generates an "identity code" alongside v_app / z_dyn. I'm guessing we still need some vector to represent identity if true disentanglement is the aim, but the identity code could be extracted directly from a few more conv layers that orthographically project the volume into 2D after Eapp's 3D resblocks, instead of from a separate ResNet-50.
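
A minimal sketch of that idea, with all shapes and layer widths made up just to show the wiring: take the (B, C, D, H, W) volume from Eapp's 3D resblocks, collapse the depth axis as a cheap orthographic projection, and run a couple of 2D convs plus pooling to get an identity vector:

```python
import torch
import torch.nn as nn

class IdentityHead(nn.Module):
    """Hypothetical head: identity code taken from Eapp's 3D feature volume instead of a separate ResNet-50."""
    def __init__(self, volume_channels=96, depth=16, id_dim=512):
        super().__init__()
        # "Orthographic projection": fold the depth axis into channels, then mix with a 1x1 conv.
        self.project = nn.Conv2d(volume_channels * depth, 256, kernel_size=1)
        self.convs = nn.Sequential(
            nn.Conv2d(256, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, id_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, volume):               # volume: (B, C, D, H, W) from the 3D resblocks
        b, c, d, h, w = volume.shape
        x = volume.reshape(b, c * d, h, w)   # flatten depth into channels
        x = self.project(x)
        return self.convs(x).flatten(1)      # (B, id_dim) identity code

head = IdentityHead()
e_id = head(torch.randn(2, 96, 16, 64, 64))
print(e_id.shape)  # torch.Size([2, 512])
```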

johndpope (Owner) commented

When I worked on the Emote paper, they used something similar but maybe better:

https://github.com/johndpope/Emote-hack/blob/7ee104354d52a5461504c27b9f38d269eac86893/Net.py#L56

I could never get the motion frames to concatenate in the channel dimension.
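
For concreteness, a minimal sketch of concatenating motion frames along the channel dimension (frame count and conv sizes are placeholders, not the Emote-hack code at the link above):

```python
import torch
import torch.nn as nn

num_motion_frames = 4
frames = [torch.randn(2, 3, 256, 256) for _ in range(num_motion_frames)]  # each frame: (B, 3, H, W)

# Concatenate on dim=1 so the first conv sees 3 * N input channels.
x = torch.cat(frames, dim=1)                                              # (2, 12, 256, 256)
stem = nn.Conv2d(3 * num_motion_frames, 64, kernel_size=7, stride=2, padding=3)
print(stem(x).shape)                                                      # torch.Size([2, 64, 128, 128])
```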
