Finetuning and Config used for HPS #17
Can you check whether the training is actually being run? If not, why is it skipping the training loop? |
It's working, but I think it's only using a few examples from the prompt file. |
I think random.choice is uniform over all prompts, so I'm not sure what the bug is here. If you find it, let me know. |
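For reference, a minimal sketch of how uniform sampling with random.choice behaves over a prompt list. The dummy prompt strings, batch size, and epoch count here are assumptions for illustration, not values from the repo:

```python
import random

# Stand-in for the 750 prompts in hps_v2_all.txt (dummy strings so the sketch
# runs without the file).
prompts = [f"prompt {i}" for i in range(750)]

# random.choice is uniform over the whole list, so every prompt is equally
# likely -- but each call returns only ONE prompt.
batch = [random.choice(prompts) for _ in range(2)]  # one batch of 2 (assumed)
print(batch)

# Because sampling is with replacement and the per-epoch sample count is small,
# a short run only ever touches a fraction of the 750 prompts:
seen = {random.choice(prompts) for _ in range(50 * 2 * 4)}  # 50 epochs x batch 2 x 4 accum steps (assumed)
print(f"distinct prompts seen: {len(seen)} / {len(prompts)}")
```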
Yeah, sure! |
Also, while I was trying to train with a custom loss function, the model seems to collapse very early unless I adjust the learning rate. Is this the expected behaviour? |
@mihirp1998 Sorry to tag you again, but can you let me know how much time it took per epoch on your 4 A100 GPUs? Clear Skies! |
On Aesthetics, 2-3 minutes per epoch.
|
I wanted to confirm: are you using all 750 prompts in the hps_v2_all.txt file for a single epoch? |
Hi @mihirp1998, this is where I am confused: so you used the step() function to do 1 training step per epoch?

```python
def train(self, epochs: Optional[int] = None):
    """
    Train the model for a given number of epochs
    """
    global_step = 0
    if epochs is None:
        epochs = self.config.num_epochs
    for epoch in range(self.first_epoch, epochs):
        global_step = self.step(epoch, global_step)
```

And here in the step() function, it only seems to finetune on num_gpus * batch_size * train_gradient_accumulation_steps images per epoch. Am I missing something? What if someone used just 1 GPU to train?

```python
def step(self, epoch: int, global_step: int):
    info = defaultdict(list)
    print(f"Epoch: {epoch}, Global Step: {global_step}")
    self.sd_pipeline.unet.train()
    for _ in range(self.config.train_gradient_accumulation_steps):
        with self.accelerator.accumulate(self.sd_pipeline.unet), self.autocast(), torch.enable_grad():
            prompt_image_pairs = self._generate_samples(
                batch_size=self.config.train_batch_size,
            )
```
|
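For what it's worth, a back-of-the-envelope sketch of the per-epoch sample count implied by the step() loop quoted above. The concrete values below are assumptions for illustration, not the repo's defaults:

```python
# Images processed in one call to step(), per the loop above:
#   num_processes x train_batch_size x train_gradient_accumulation_steps
num_processes = 1                       # e.g. a single GPU (assumed)
train_batch_size = 2                    # assumed
train_gradient_accumulation_steps = 4   # assumed

images_per_epoch = num_processes * train_batch_size * train_gradient_accumulation_steps
print(f"images per epoch: {images_per_epoch}")        # 8 with these assumptions

# On 1 GPU with batch size 2, 50 epochs would therefore generate only
# 50 * 8 = 400 samples, drawn with replacement from the 750 prompts.
print(f"samples over 50 epochs: {50 * images_per_epoch}")
```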
@mihirp1998
I was trying to finetune Stable Diffusion 1.5 using your HPS reward function and the hps.sh training script. I used a batch size of 1, but the training still seems to finish very quickly: 50 epochs took just 2-4 minutes.
And here you seem to be using only batch_size prompts per step? I am using a batch_size of 2 on 1 A100 GPU to test the script.
Your help will mean a lot!
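One way to check this empirically is to count the prompts a run actually uses. This is a hypothetical sketch, not the repo's API: the assumption that prompts enter the trainer through a callable returning (prompt, metadata) is labeled as such below.

```python
import random
from collections import Counter

# Hypothetical helper: wrap whatever callable supplies prompts so that every
# prompt drawn during a run is counted.  The (prompt, metadata) return shape
# is an assumption about how prompts reach the trainer.
def with_prompt_counter(prompt_fn):
    counter = Counter()

    def wrapped(*args, **kwargs):
        prompt, metadata = prompt_fn(*args, **kwargs)
        counter[prompt] += 1
        return prompt, metadata

    return wrapped, counter

# Dummy stand-in prompt function for illustration:
prompts = [f"prompt {i}" for i in range(750)]
dummy_prompt_fn = lambda: (random.choice(prompts), {})

counted_fn, counter = with_prompt_counter(dummy_prompt_fn)
for _ in range(50 * 2 * 4):   # 50 epochs x batch 2 x 4 accumulation steps (assumed)
    counted_fn()
print(f"{len(counter)} distinct prompts used out of {len(prompts)}")
```

If the distinct count comes out as a small fraction of 750, that would confirm the run only ever touches a handful of prompts.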