
[Bug] tokenizer.model_max_length is different when loading model from shortcut or local path #14561

@ShaneTian

Description

To reproduce

When loading the model from a local path, tokenizer.model_max_length is VERY_LARGE_INTEGER, whereas loading from the shortcut name gives the expected 1024:

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
print(tokenizer.model_max_length)
# 1024

tokenizer = GPT2Tokenizer.from_pretrained("path/to/local/gpt2")
print(tokenizer.model_max_length)
# 1000000000000000019884624838656

Related code

# Set max length if needed
if pretrained_model_name_or_path in cls.max_model_input_sizes:
    # if we're using a pretrained model, ensure the tokenizer
    # won't index sequences longer than the number of positional embeddings
    model_max_length = cls.max_model_input_sizes[pretrained_model_name_or_path]
    if model_max_length is not None and isinstance(model_max_length, (int, float)):
        init_kwargs["model_max_length"] = min(init_kwargs.get("model_max_length", int(1e30)), model_max_length)

max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
    "gpt2": 1024,
    "gpt2-medium": 1024,
    "gpt2-large": 1024,
    "gpt2-xl": 1024,
    "distilgpt2": 1024,
}
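
Since a local path like path/to/local/gpt2 is never a key in PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES, the branch above is skipped and model_max_length keeps the int(1e30) default. Until this is fixed, one possible workaround (a sketch, assuming the local copy was saved from the gpt2 checkpoint, whose context size is 1024) is to pass model_max_length explicitly, since from_pretrained forwards extra kwargs to the tokenizer's __init__:

from transformers import GPT2Tokenizer

# Workaround sketch: override the default explicitly when loading from a
# local path. 1024 is the known context size of the original gpt2 checkpoint.
tokenizer = GPT2Tokenizer.from_pretrained("path/to/local/gpt2", model_max_length=1024)
print(tokenizer.model_max_length)
# 1024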

Expected behavior

The correct model_max_length should be assigned when loading the model from a local path:

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("path/to/local/gpt2")
print(tokenizer.model_max_length)
# 1024
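
Another way to get this behavior today (a sketch, assuming a writable local copy; "path/to/local/gpt2" is a placeholder for the actual saved tokenizer directory) is to record model_max_length in the local tokenizer_config.json, since that file's contents are loaded into init_kwargs by from_pretrained:

import json
import os

# Placeholder local path; adjust to the actual saved tokenizer directory.
config_path = os.path.join("path/to/local/gpt2", "tokenizer_config.json")

# Load the existing config (or start fresh if the file is missing).
config = {}
if os.path.isfile(config_path):
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

# Persist the known gpt2 context size so from_pretrained picks it up.
config["model_max_length"] = 1024
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

After this one-time patch, GPT2Tokenizer.from_pretrained("path/to/local/gpt2") reports model_max_length == 1024 without any extra kwargs.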
