don't load state_dict twice when using low_cpu_mem_usage in from_pretrained #16602

patil-suraj · 2022-04-05T10:36:58Z

What does this PR do?

In from_pretrained, the state_dict is loaded twice when low_cpu_mem_usage is True. The second load is unnecessary, because the state_dict has already been loaded earlier, as we can see here:

transformers/src/transformers/modeling_utils.py

Lines 1795 to 1797 in 21decb7

    if not is_sharded:
        # Time to load the checkpoint
        state_dict = load_state_dict(resolved_archive_file)

So it's fine to remove the second load: when is_sharded=False, the state_dict is already loaded at line 1797.
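To make the redundancy concrete, here is a minimal, simplified sketch of the control flow for a non-sharded checkpoint. It is not the actual modeling_utils.py code: load_state_dict is a hypothetical stub that only counts calls, load_before_fix and load_after_fix are hypothetical helpers, and "pytorch_model.bin" is just an illustrative filename.

```python
from typing import Dict, List

# Hypothetical stub standing in for the real load_state_dict: instead of reading
# a checkpoint from disk, it records each call so the reads can be counted.
load_calls: List[str] = []


def load_state_dict(checkpoint_file: str) -> Dict[str, list]:
    load_calls.append(checkpoint_file)
    return {"weight": [0.0, 1.0]}


def load_before_fix(resolved_archive_file: str, low_cpu_mem_usage: bool) -> Dict[str, list]:
    # Non-sharded checkpoint: time to load the checkpoint.
    state_dict = load_state_dict(resolved_archive_file)
    if low_cpu_mem_usage:
        # Redundant: the same file is read a second time.
        state_dict = load_state_dict(resolved_archive_file)
    return state_dict


def load_after_fix(resolved_archive_file: str, low_cpu_mem_usage: bool) -> Dict[str, list]:
    # Non-sharded checkpoint: time to load the checkpoint.
    state_dict = load_state_dict(resolved_archive_file)
    if low_cpu_mem_usage:
        # Reuse the state_dict loaded above instead of reading the file again.
        pass
    return state_dict


load_calls.clear()
load_before_fix("pytorch_model.bin", low_cpu_mem_usage=True)
print(len(load_calls))  # 2

load_calls.clear()
load_after_fix("pytorch_model.bin", low_cpu_mem_usage=True)
print(len(load_calls))  # 1
```

With low_cpu_mem_usage=True, the sketch reads the checkpoint twice before the change and once after it, while returning the same state_dict contents.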

HuggingFaceDocBuilderDev · 2022-04-05T10:49:52Z

The documentation is not available anymore as the PR was closed or merged.

sgugger

To complete the description, it's fine to remove it there because when is_sharded=False, the state_dict is loaded at line 1797.

load state_dict once

bcb927e

patil-suraj requested review from sgugger, LysandreJik and patrickvonplaten April 5, 2022 11:36

sgugger approved these changes Apr 5, 2022

LysandreJik approved these changes Apr 5, 2022

patil-suraj merged commit 47c5c05 into huggingface:main Apr 6, 2022

patil-suraj deleted the load-sd-once branch April 6, 2022 09:43