
ModelHubMixin config support throws error #2379

Closed
joelburget opened this issue Jul 7, 2024 · 4 comments
Labels
bug Something isn't working

Comments


joelburget commented Jul 7, 2024

Describe the bug

I created a notebook which tries to use PyTorchModelHubMixin in a way very similar to that described in the docs and #2001. As you can see, when I try to instantiate it with MyModel.from_pretrained I get AttributeError: 'dict' object has no attribute 'hidden_size'. AutoModel.from_pretrained fails with AttributeError: 'NoneType' object has no attribute 'get'. Neither error makes the root cause clear.

Reproduction

https://gist.github.com/joelburget/623a13c71129044c661009a56b2cf46d is self-contained

Logs

No response

System info

- huggingface_hub version: 0.22.2
- Platform: macOS-14.5-x86_64-i386-64bit
- Python version: 3.11.9
- Running in iPython ?: Yes
- iPython shell: ZMQInteractiveShell
- Running in notebook ?: Yes
- Running in Google Colab ?: No
- Token path ?: /Users/joel/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: joelb
- Configured git credential helpers: osxkeychain, store
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.1.2
- Jinja2: 3.1.3
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.3.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: N/A
- aiohttp: 3.9.3
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /Users/joel/.cache/huggingface/hub
- HF_ASSETS_CACHE: /Users/joel/.cache/huggingface/assets
- HF_TOKEN_PATH: /Users/joel/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
joelburget added the bug label Jul 7, 2024

Wauplin commented Jul 8, 2024

Hi @joelburget, the problem in your example is that you are serializing the config object into a dictionary (config=config.to_dict()), so when you reload it you get a dictionary back. Your class forwards that config dictionary to GPTNeoBlock, which expects a transformers.configuration_utils.PretrainedConfig instance. This is why you get AttributeError: 'dict' object has no attribute 'hidden_size'. From what I can see, your notebook is not actually very similar to the docs you've linked. In the docs, we showcase that having proper parameters with type annotations, like hidden_size: int (instead of a config object), works great. You can have a look at this guide for more details.
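The failure mode described above can be sketched with plain Python and the standard json module. FakeConfig and FakeBlock below are purely illustrative stand-ins for GPTNeoConfig and GPTNeoBlock:

```python
import json

class FakeConfig:
    """Stand-in for transformers' GPTNeoConfig (illustrative only)."""
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    def to_dict(self):
        return {"hidden_size": self.hidden_size}

class FakeBlock:
    """Stand-in for GPTNeoBlock: expects a config *object*, not a dict."""
    def __init__(self, config):
        self.hidden_size = config.hidden_size  # attribute access on the config

config = FakeConfig(hidden_size=768)

# Serializing to a dict (what config=config.to_dict() does) round-trips
# through JSON as a plain dict...
reloaded = json.loads(json.dumps(config.to_dict()))

# ...so passing it where an object is expected fails:
try:
    FakeBlock(reloaded)
except AttributeError as e:
    print(e)  # 'dict' object has no attribute 'hidden_size'

# Rebuilding a config object from the dict first works:
block = FakeBlock(FakeConfig(**reloaded))
print(block.hidden_size)  # 768
```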

In general, I'm not sure I understand what you are trying to achieve. PyTorchModelHubMixin is a class that facilitates exporting/importing torch models. It has no link to transformers. If you want to customize/adapt a transformers model, it's better to check there how to do it :)

joelburget (Author) commented

Hi @Wauplin, thanks for looking into this.

the problem in your example is that you are serializing the config object into a dictionary

I first tried model.push_to_hub("joelb/my-awesome-model", config=config), but this fails with TypeError: Object of type GPTNeoConfig is not JSON serializable.

In the docs, we showcase that having proper parameters with type annotations like hidden_size: int (instead of config) works great... If you want to customize/adapt a transformers model, it's better to check there how to do it

You're right. I was basing this on the fact that all transformers models take a config object rather than only ints (which is all the linked docs show). I can check over at the transformers repo.
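For context, the typed-kwargs pattern the docs recommend can be illustrated without torch or huggingface_hub. ToyHubMixin below is a hypothetical sketch (not the real huggingface_hub code) of the mechanism: capture JSON-serializable __init__ kwargs like hidden_size: int, store them, and replay them on reload:

```python
import inspect
import json

class ToyHubMixin:
    """Illustrative sketch of how a mixin can persist simple,
    JSON-serializable __init__ kwargs and replay them on reload.
    Not the actual PyTorchModelHubMixin implementation."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        original_init = cls.__init__

        def init_and_record(self, *args, **kw):
            # Bind the call to the signature so defaults are captured too
            bound = inspect.signature(original_init).bind(self, *args, **kw)
            bound.apply_defaults()
            self._hub_config = {k: v for k, v in bound.arguments.items() if k != "self"}
            original_init(self, *args, **kw)

        cls.__init__ = init_and_record

    def save_config(self):
        # This is why kwargs must be JSON-serializable (ints, strs, dicts...)
        return json.dumps(self._hub_config)

    @classmethod
    def from_config(cls, payload):
        return cls(**json.loads(payload))

class MyModel(ToyHubMixin):
    def __init__(self, hidden_size: int = 64, num_layers: int = 2):
        self.hidden_size = hidden_size
        self.num_layers = num_layers

m = MyModel(hidden_size=128)
reloaded = MyModel.from_config(m.save_config())
print(reloaded.hidden_size, reloaded.num_layers)  # 128 2
```

This also shows why a full GPTNeoConfig as an __init__ argument trips the mixin up: it is not JSON-serializable out of the box.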


joelburget commented Jul 9, 2024

For anyone else trying to do something similar:

import torch.nn as nn
from transformers import AutoConfig, AutoModel
from transformers.models.gpt_neo.modeling_gpt_neo import GPTNeoBlock, GPTNeoModel

hf_model = AutoModel.from_pretrained("EleutherAI/gpt-neo-125M")

class MyModel(GPTNeoModel):
    def __init__(self, config):
        super().__init__(config)
        # Replace the full stack of blocks with a single GPTNeoBlock
        self.h = nn.ModuleList([GPTNeoBlock(config, 0)])

# Shrink the config to match the single-layer model
config = AutoConfig.from_pretrained("EleutherAI/gpt-neo-125M")
config.num_layers = 1
config.attention_layers = config.attention_layers[:1]
config.attention_types = [[['global'], 1]]

model = MyModel(config)
model.push_to_hub("joelb/my-awesome-model", config=config)


Wauplin commented Jul 9, 2024

I first tried model.push_to_hub("joelb/my-awesome-model", config=config), but this fails with TypeError: Object of type GPTNeoConfig is not JSON serializable.

Glad you've found a workaround for your use case @joelburget :) Just for your info, this error tells you that the mixin doesn't know how to serialize your GPTNeoConfig object as JSON. What you can do is provide encoder and decoder methods when defining your class, as explained in this section of the guide. However, your solution, which does not involve PyTorchModelHubMixin, is much better, as it relies only on the transformers library, which is better suited to handle transformers objects.
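The encoder/decoder idea from the guide can be illustrated with plain Python. FakeGPTNeoConfig is a hypothetical stand-in; the encode/decode pair plays the role of the custom coders the guide describes:

```python
import json

class FakeGPTNeoConfig:
    """Stand-in for GPTNeoConfig (illustrative only)."""
    def __init__(self, hidden_size=768, num_layers=12):
        self.hidden_size = hidden_size
        self.num_layers = num_layers

# An encoder/decoder pair like the ones the mixin guide describes:
def encode_config(cfg):
    return {"hidden_size": cfg.hidden_size, "num_layers": cfg.num_layers}

def decode_config(data):
    return FakeGPTNeoConfig(**data)

cfg = FakeGPTNeoConfig(num_layers=1)

# json.dumps alone fails, which is the reported TypeError:
try:
    json.dumps(cfg)
except TypeError as e:
    print(e)  # Object of type FakeGPTNeoConfig is not JSON serializable

# With an explicit encoder/decoder, the round trip works:
restored = decode_config(json.loads(json.dumps(encode_config(cfg))))
print(restored.num_layers)  # 1
```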
