Reproducing fine-tuned tokens (textual inversion) #24
Comments
Hello bansh123,

Thanks for bringing this to my attention. I'll revisit this experiment to determine why you are seeing this behavior.

In the meantime, for the latest paper update on OpenReview, we used fine_tune_upstream.py, initializing with the class name of the object and training 4 vectors per concept in the dataset. We trained these tokens with a batch size of 8, Stable Diffusion 1.4, 2000 gradient descent steps, and a learning rate of 5.0e-04, scaled by the effective batch size.

Which figure are you working to reproduce?

-Brandon
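For reference, a minimal sketch of the learning-rate scaling described above. The variable names here are illustrative and the accumulation/process counts are assumptions; the actual arguments in fine_tune_upstream.py may differ.

```python
# Illustrative sketch: scale the base learning rate by the effective batch size.
base_learning_rate = 5.0e-04       # value quoted above
train_batch_size = 8               # per-device batch size quoted above
gradient_accumulation_steps = 1    # assumed
num_processes = 1                  # assumed single GPU

# effective batch size = per-device batch * accumulation steps * processes
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_processes

# learning rate actually passed to the optimizer
learning_rate = base_learning_rate * effective_batch_size
print(learning_rate)  # 4.0e-03 under these assumptions
```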
Thank you for your response.
Hi, I am trying to reproduce Figure 5. Could you provide some hints (or scripts) for the visualization, since there are multiple methods to evaluate together?
I've tried using the PASCAL tokens provided by the author, but I can't reproduce the reported performance. Thank you.
Hi @brandontrabucco, would it be possible to share via Google Drive the new embeddings obtained by initializing with the original class name and 4 num_vectors? Recomputing the embeddings requires a lot of compute for more classes. Thanks in advance for considering this request.
@brandontrabucco
Some original class names consist of multiple words and therefore tokenize to more than one token, e.g. "pink primrose" in the Flowers102 dataset.
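For instance, a quick check with the Hugging Face CLIP tokenizer that Stable Diffusion 1.4 ships with illustrates the multi-token issue (shown only as an example; the exact BPE split depends on the vocabulary):

```python
from transformers import CLIPTokenizer

# Tokenizer used by Stable Diffusion 1.4's text encoder
tokenizer = CLIPTokenizer.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="tokenizer"
)

# A multi-word class name from Flowers102 maps to more than one token,
# so it cannot serve directly as a single initializer token.
ids = tokenizer("pink primrose", add_special_tokens=False)["input_ids"]
print(len(ids), tokenizer.convert_ids_to_tokens(ids))
# prints the number of tokens and the subword pieces for "pink primrose"
```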
@bansh123 Did you manage to reproduce the results? I am having trouble getting the results presented in the paper (Figure 5, specifically). Do you have any tips (or scripts) you can share to help me? Thank you.
@jsw6872 Did you manage to reproduce the results?
How did you resolve the issue? I would really appreciate your help. Thanks!
I have attempted to reproduce the results of the few-shot classification task on the PASCAL VOC dataset.
I managed to achieve comparable outcomes when utilizing the fine-tuned tokens you previously shared via the Google Drive link.
However, I was unsuccessful in reproducing the fine-tuned tokens.
However, when running fine_tune.py and aggregate_embeddings.py with the provided scripts, I obtained inferior tokens, resulting in significantly lower accuracy (roughly a 10% gap in the 1-shot setting).
Am I overlooking something?
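In case it helps with debugging, here is a minimal sketch of how the shared fine-tuned tokens could be loaded back into the text encoder before evaluation. It assumes the file is a PyTorch dict mapping each placeholder token string to its learned embedding tensor; the file name and the actual format produced by aggregate_embeddings.py are assumptions and may differ.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Assumed format: {"<placeholder-token>": tensor of shape (num_vectors, dim)}.
# Adjust to the real output of aggregate_embeddings.py.
learned = torch.load("pascal-tokens.pt", map_location="cpu")  # hypothetical file

tokenizer = CLIPTokenizer.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="text_encoder")

for token, embedding in learned.items():
    embedding = embedding.reshape(-1, embedding.shape[-1])  # (num_vectors, dim)
    # Register one placeholder token per learned vector: <token>, <token>_1, ...
    names = [token if i == 0 else f"{token}_{i}" for i in range(embedding.shape[0])]
    tokenizer.add_tokens(names)
    text_encoder.resize_token_embeddings(len(tokenizer))
    ids = tokenizer.convert_tokens_to_ids(names)
    with torch.no_grad():
        for i, tid in enumerate(ids):
            # Copy each learned vector into the corresponding embedding row
            text_encoder.get_input_embeddings().weight[tid] = embedding[i]
```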