
Unable to reproduce published results #13

Open
sheikhomar opened this issue Feb 23, 2020 · 7 comments


@sheikhomar

Hi,

Thank you for a great paper and for publishing the code for d-SNE.

I am trying to reproduce the MNIST-USPS results in Table 1 in your paper but so far I have been unable to do so.


To reproduce the results, I did the following:

  • Forked your Git repository.
  • Wrote a script to download MNIST and USPS datasets.
  • Attempted to run ./scripts/digits-mt-us-su-s.sh
  • Fixed several runtime errors and ran ./scripts/digits-mt-us-su-s.sh again. Please see the PR for the changes I made.
  • Wrote a script to parse the logs and save the results to CSV files.
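For the log-parsing step, here is a minimal sketch of the kind of script I mean. The log line format shown is an assumption for illustration; the actual d-SNE log format may differ, so the regex would need adjusting.

```python
import csv
import io
import re

# Hypothetical log excerpt -- the real d-SNE log format may differ.
LOG = """\
Epoch[0] Eval-Acc-Src: 0.912 Eval-Acc-Tgt: 0.385
Epoch[1] Eval-Acc-Src: 0.934 Eval-Acc-Tgt: 0.401
"""

PATTERN = re.compile(r"Epoch\[(\d+)\].*Eval-Acc-Tgt:\s*([0-9.]+)")

def parse_log(text):
    """Extract (epoch, target accuracy) pairs from training-log text."""
    return [(int(m.group(1)), float(m.group(2)))
            for m in PATTERN.finditer(text)]

rows = parse_log(LOG)

# Write the parsed results to CSV (in-memory here; use a file in practice).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["epoch", "eval_acc_tgt"])
writer.writerows(rows)
print(rows)
```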

In my analysis, I get the following Eval-Acc-Tgt accuracies when performing domain adaptation from MNIST to USPS:

|       | 0   | 1    | 3    | 5    | 7    |
| ----- | --- | ---- | ---- | ---- | ---- |
| d-SNE | N/A | 41.1 | 38.5 | 40.1 | 43.7 |

What did I do wrong? Can you please provide some guidance?

@KevinDocel

I've been trying to reproduce the results for a long time, but I still can't get them either.

I hope the authors could provide the exact model architecture and hyper-parameters.

Also, I found that the model in the MNIST(2000)->USPS experiment is not the same as the one in the CCSA Keras implementation.

@sheikhomar
Author

@KevinDocel, you are right that the model is not the same as the one in CCSA. To make the comparison fair, I have implemented the CCSA model in d-SNE's codebase; please see commit f98083.

@KevinDocel

@sheikhomar, the differences lie not only in the feature dimension (which you changed), but also in the first two conv layers, including their kernel sizes and channel counts.

BTW, the input size of the CCSA Keras model is 16x16, while the authors' implementation seems to use 32x32. However, simply changing the input size from 32x32 to 16x16 in the CCSA Keras model does not reproduce the results either: when I tried it, it led to heavy overfitting in the CCSA Keras model.
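For anyone experimenting with this input-size mismatch, here is a minimal sketch (pure Python, no framework dependencies, assuming nearest-neighbour resizing is acceptable for a first test) of upsampling a 16x16 USPS-style digit to 32x32. In a real pipeline you would use a proper resize such as PIL's `Image.resize`; this only illustrates the shape change being discussed.

```python
def upsample_2x(img):
    """Nearest-neighbour 2x upsampling: each pixel becomes a 2x2 block."""
    out = []
    for row in img:
        # Repeat every pixel horizontally...
        doubled = [v for v in row for _ in (0, 1)]
        # ...and every row vertically.
        out.append(doubled)
        out.append(list(doubled))
    return out

# A dummy 16x16 "digit" with distinct pixel values.
img_16 = [[r * 16 + c for c in range(16)] for r in range(16)]
img_32 = upsample_2x(img_16)

print(len(img_16), len(img_16[0]), len(img_32), len(img_32[0]))  # 16 16 32 32
```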

@ShownX
Contributor

ShownX commented Mar 14, 2020

Hi,
I have to apologize that the code is not in good shape.
BTW, I quickly checked the numbers you got and they seem too low. I spent some personal time doing a quick cleanup of the original codebase here, though it still lacks some details (I plan to adapt the code further to fully reproduce the results when I have time; right now I am mainly focused on other projects). In any case, it should give you a better starting point. Have fun!

@joshuacwnewton

joshuacwnewton commented Apr 1, 2020

Hi @ShownX. Thank you for taking the time to develop a new starter codebase. It contains a functional MNIST -> MNIST-M test (using the MT-MM.yaml config) which I am happy about. :)

Note: Should this new MNIST-MNISTM discussion go in a separate issue? Happy to move if it would be helpful, as this is distinct from the conversation above.


I modified the MAX_EPOCH to 5 for curiosity's sake and ran a test. I have attached the output in a file -- if you have a moment, could you verify that these values are as expected?

MT-MM_MAX-EPOCH-5_output.txt


Additionally, looking at the table in the arXiv publication for MT-MM, for d-SNE with 10 target images/class, it appears I should get ~88% accuracy.


However, in the attached log, I noticed that the accuracy seems to stall around 11% at the beginning, then abruptly starts to increase for a number of iterations, then overfits. (See: Test accuracy: 81%, Target/Source accuracy: 100%/99%)

This loss/accuracy evolution seems less than ideal. I wonder whether the network's initialization is a concern (it could explain the initial stall at 11%), and whether the chosen hyperparameters are a concern (they could explain the sharp increase followed by overfitting).
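For context, a stall near 11% is essentially the random-guess baseline for a 10-class digit problem, which supports the idea that the network is not learning at all during that phase. A quick simulated sanity check (plain Python, not part of the d-SNE codebase):

```python
import random

random.seed(0)
n_classes = 10
n_samples = 10_000

# Simulate a classifier that guesses uniformly at random among 10 classes.
labels = [random.randrange(n_classes) for _ in range(n_samples)]
guesses = [random.randrange(n_classes) for _ in range(n_samples)]
accuracy = sum(l == g for l, g in zip(labels, guesses)) / n_samples

print(f"random-guess accuracy: {accuracy:.3f}")  # ~0.10, i.e. ~10-11%
```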


Do you have any information about the configuration used to produce the 88% reported accuracy? Also, for the MT-MM.yaml test, why is "LenetPlus" chosen as the default architecture? I can't seem to find it mentioned anywhere else, but VGG-16 and ResNet-101 are mentioned throughout the publication, so I am a bit confused there.

Thank you kindly for your time and effort. :)

@lambdaofgod

@sheikhomar How did you get MNIST-M and the other digits datasets? In your loader I only see code for MNIST and USPS.

@joshuacwnewton

joshuacwnewton commented Apr 17, 2020

@lambdaofgod


The original source of the MNIST-M dataset can be found on Yaroslav Ganin's page under the "Unsupervised Domain Adaptation by Backpropagation" subheading.

Be sure to use the "unpacked version of MNIST-M" link, as the packed version of the dataset contains LMDB files that need to be read using the lmdb library, which d-SNE does not use.
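As I understand it, the unpacked archive ships image directories plus plain-text files mapping image filenames to labels (one `filename label` pair per line). The exact filenames and layout below are assumptions; adjust them to match the actual archive. A minimal parsing sketch:

```python
# Hedged sketch: parse an MNIST-M-style label file, where each line is
# assumed to look like "00000000.png 5". Verify against the real archive.
def load_label_file(text):
    """Parse 'filename label' lines into a list of (filename, int label)."""
    pairs = []
    for line in text.strip().splitlines():
        name, label = line.split()
        pairs.append((name, int(label)))
    return pairs

sample = "00000000.png 5\n00000001.png 0\n"
print(load_label_file(sample))  # [('00000000.png', 5), ('00000001.png', 0)]
```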


The SVHN dataset can be found here, as well.
