
Clarification on the provided training log file #11

Open
gu6225ha-s opened this issue Feb 23, 2024 · 4 comments

@gu6225ha-s

Hi @MCC-WH. First, thanks for making your training code publicly available. I'm trying to reproduce your training results and have some questions about the provided log file. The README states that it is from a 200-epoch schedule, but it only seems to contain 100 epochs?

Also, I'm wondering whether the log is from the rSfM120k or the AugrSfM120k experiment. With a batch size of 256 there should be 91642 / 256 ≈ 358 batches per epoch for rSfM120k and 274926 / 256 ≈ 1074 for AugrSfM120k. However, the log file indicates that training was run on 765 batches per epoch. Was a different batch size used?

@MCC-WH
Owner

MCC-WH commented Feb 23, 2024

In fact, we use gradient accumulation during training, so the equivalent batch size is larger than the one inside the script. In our experimental experience, the performance is stable when the equivalent batch size is larger than 500. You can try adjusting the number of gradient accumulation steps to get different equivalent batch sizes.
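
For reference, a minimal sketch of the gradient-accumulation pattern in PyTorch (the toy model, loader, and update_every value below are assumptions for illustration, not the repository's actual training loop):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy setup just to illustrate the accumulation pattern.
model = nn.Linear(8, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = DataLoader(TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,))), batch_size=8)

update_every = 2  # accumulation steps; equivalent batch size = 8 * 2 = 16

optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = criterion(model(x), y)
    (loss / update_every).backward()  # scale so accumulated gradients match one large batch
    if (i + 1) % update_every == 0:
        optimizer.step()
        optimizer.zero_grad()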

@gu6225ha-s
Author

gu6225ha-s commented Feb 23, 2024

Okay, so you're saying I could increase the --update_every parameter, which would result in a larger effective batch size? I'll try that if I don't get good results when running experiment_rSfm120k.sh as it is.
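
For what it's worth, my assumption (based on how gradient accumulation usually works, not on anything stated in the repository) is that the effective batch size is batch_size * update_every. With the script's batch size of 256 that would give:

update_every = 2  ->  effective batch size = 256 * 2 = 512   (> 500)
update_every = 4  ->  effective batch size = 256 * 4 = 1024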

@gu6225ha-s
Author

Unfortunately the performance was not as expected when training a model with the experiment_rSfm120k.sh script. Here is the output from test.sh:

>> Test Dataset: roxford5k *** fist-stage >>
>> gl18-tl-resnet101-gem-w: mAP Eeay: 84.42, Medium: 67.31, Hard: 44.26
>> gl18-tl-resnet101-gem-w: mP@k[1, 5, 10] Easy: [97.06 91.76 87.04], Medium: [95.71 90.29 84.57], Hard: [87.14 70.29 59.57]

>> Test Dataset: roxford5k *** rerank-top1024 >>
>> gl18-tl-resnet101-gem-w: mAP Eeay: 84.36, Medium: 67.12, Hard: 41.85
>> gl18-tl-resnet101-gem-w: mP@k[1, 5, 10] Easy: [91.18 89.93 86.63], Medium: [90.   85.05 80.6 ], Hard: [77.14 69.5  57.68]

>> Test Dataset: rparis6k *** fist-stage >>
>> gl18-tl-resnet101-gem-w: mAP Eeay: 92.83, Medium: 80.5, Hard: 61.36
>> gl18-tl-resnet101-gem-w: mP@k[1, 5, 10] Easy: [98.57 96.   95.29], Medium: [100.    98.    96.86], Hard: [97.14 93.43 90.43]

>> Test Dataset: rparis6k *** rerank-top1024 >>
>> gl18-tl-resnet101-gem-w: mAP Eeay: 94.1, Medium: 84.25, Hard: 67.98
>> gl18-tl-resnet101-gem-w: mP@k[1, 5, 10] Easy: [95.71 96.29 95.43], Medium: [95.71 98.   97.  ], Hard: [94.29 94.29 90.71]

I trained on a single GPU, though. Should I adjust the batch size or learning rate in any way to compensate?
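
In case it's relevant: if the usual linear scaling rule applies here (an assumption on my part, not something stated in the repository), the adjustment would be

lr_new = lr_base * (effective_batch_new / effective_batch_base)

i.e. halving the effective batch size would suggest halving the learning rate.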

@gu6225ha-s
Author

Sorry to bother you again, @MCC-WH, but I also have a question about Table 5 in your paper. What values of K and L did you use to compute the mAP for the Affinity Feature (second row)? Also, do you L2-normalize the affinity features before re-ranking?
