
Accidentally used Encoder instead of Decoder? #10

Closed
DerEchteFeuerpfeil opened this issue Feb 12, 2024 · 2 comments


@DerEchteFeuerpfeil

Hey there, great project!

There's just one thing I'm not sure whether I misunderstood or you did. You wanted to reproduce the architecture of IconShop, but to my understanding, IconShop is a decoder-only transformer (like a language model).
Excerpt from their paper:
[screenshot: excerpt from the IconShop paper]

At first I thought it was just a minor mistake in your blog post, but there, too, you wrote encoder-only:
[screenshot: excerpt from the blog post]

An encoder-only autoregressive transformer is self-contradictory, isn't it?

Also, here you used xFormerEncoderConfig instead of the (imo) correct xFormerDecoderConfig:

fontogen/model/model.py, lines 201 to 227 in de1e50b:

```python
x_config = xFormerEncoderConfig(
    num_layers=num_layers,
    dim_model=d_model,
    feedforward_config={
        "name": "MLP",
        "dropout": 0.1,
        "activation": "gelu",
        "hidden_layer_multiplier": 4,
    },
    multi_head_config={
        "num_heads": nhead,
        "attention": BlockSparseAttention(
            num_heads=nhead,
            layout=BigBirdSparsityConfig(
                num_heads=nhead,
                block_size=block_size,
                num_global_blocks=1,
                num_random_blocks=2,
                num_sliding_window_blocks=int(3 * max_glyph_tokens / block_size),
            ).make_layout(seq_len=self.seq_len),
            block_size=block_size,
            dropout=0.1,
            causal=True,
        )
    },
)
self.transformer = xFormer(x_config)
```

I haven't fully thought through what the implications are. You correctly used causal attention, so there shouldn't be issues on that front. There might not even be an issue at all, since decoders (without cross-attention) are essentially just encoders with causal attention masking, but I don't know the xformers library and its inner workings. Just wanted to set the mental model straight.
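To illustrate what I mean, here's a rough sketch in plain PyTorch (not your code, and the sizes are placeholders): an "encoder" stack plus a causal mask behaves like a GPT-style decoder-only stack, because the only thing a decoder layer adds on top is cross-attention.

```python
import torch
import torch.nn as nn

d_model, nhead, seq_len = 512, 8, 16
layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

x = torch.randn(1, seq_len, d_model)
# Upper-triangular -inf mask: position i may only attend to positions <= i.
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
out = encoder(x, mask=causal_mask)  # autoregressive, i.e. decoder-only in effect
```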

Thanks for releasing your code and project!

@SerCeMan
Owner

SerCeMan commented Feb 12, 2024

Hey, @DerEchteFeuerpfeil! The article should indeed say decoder-only. Thank you for this, I'll fix it up!

Implementation-wise, if I understand correctly, there is no other way to implement a decoder-only transformer in xFormers, at least until facebookresearch/xformers#617 lands.

@DerEchteFeuerpfeil
Author

Alright, makes sense! It doesn't exactly speak well for the library that a decoder-only architecture issue has been open for >1y 😅

When re-implementing parts of this for my project, I used x_transformers by lucidrains instead.
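For reference, something roughly like this (a sketch only; the hyperparameters here are placeholders, not the ones I actually used):

```python
import torch
from x_transformers import TransformerWrapper, Decoder

model = TransformerWrapper(
    num_tokens=8192,       # vocabulary size (placeholder)
    max_seq_len=1024,      # context length (placeholder)
    attn_layers=Decoder(   # decoder-only: causal self-attention, no cross-attention
        dim=512,
        depth=8,
        heads=8,
    ),
)

tokens = torch.randint(0, 8192, (1, 1024))
logits = model(tokens)     # shape: (1, 1024, 8192)
```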

I'll close this issue for now, as it has served its purpose. Cheers!
