
Unable to reproduce published results #13

Open
sheikhomar opened this issue Feb 23, 2020 · 7 comments


@sheikhomar

Hi,

Thank you for a great paper and for publishing the code for d-SNE.

I am trying to reproduce the MNIST-USPS results in Table 1 in your paper but so far I have been unable to do so.


To reproduce the results, I did the following:

  • Forked your Git repository.
  • Wrote a script to download MNIST and USPS datasets.
  • Attempted to run ./scripts/digits-mt-us-su-s.sh
  • Fixed several runtime errors and ran ./scripts/digits-mt-us-su-s.sh again. Please see the PR for the changes I made.
  • Wrote a script to parse the logs and save the results to CSV files.
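For the log-parsing step, here is a minimal sketch of the kind of script I mean. The log line format shown is an assumption for illustration; the actual d-SNE log format may differ, so the regex would need adjusting.

```python
import csv
import io
import re

# Hypothetical log excerpt -- the real d-SNE log format may differ.
LOG = """\
Epoch[0] Eval-Acc-Src: 0.912 Eval-Acc-Tgt: 0.385
Epoch[1] Eval-Acc-Src: 0.934 Eval-Acc-Tgt: 0.401
"""

PATTERN = re.compile(r"Epoch\[(\d+)\].*Eval-Acc-Tgt:\s*([0-9.]+)")

def parse_log(text):
    """Extract (epoch, target accuracy) pairs from training-log text."""
    return [(int(m.group(1)), float(m.group(2)))
            for m in PATTERN.finditer(text)]

rows = parse_log(LOG)

# Write the parsed results to CSV (in-memory here; use a file in practice).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["epoch", "eval_acc_tgt"])
writer.writerows(rows)
print(rows)
```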

In my analysis, I get the following Eval-Acc-Tgt accuracies when performing domain adaptation from MNIST to USPS:

|       | 0   | 1    | 3    | 5    | 7    |
| ----- | --- | ---- | ---- | ---- | ---- |
| d-SNE | N/A | 41.1 | 38.5 | 40.1 | 43.7 |

What did I do wrong? Can you please provide some guidance?

@KevinDocel

I've been trying to reproduce the results for a long time, but I still can't get them either.

I hope the authors could provide the exact model architecture and hyper-parameters.

Also, I found that the model in the MNIST(2000)->USPS experiment is not the same as the one in the CCSA Keras implementation.

@sheikhomar
Author

@KevinDocel, you are right that the model is not the same as the one in CCSA. To make the comparison fair, I have implemented the CCSA model in d-SNE's codebase; please see commit f98083.

@KevinDocel

@sheikhomar, the differences lie not only in the feature dimension (which you changed), but also in the first two conv layers, including their kernel sizes and channel counts.

BTW, the input size of the CCSA Keras model is 16x16, while the authors' implementation seems to use 32x32. However, simply changing the input size from 32x32 to 16x16 in the CCSA Keras model does not reproduce the results either: when I tried it, it led to heavy overfitting in the CCSA Keras model.
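For anyone experimenting with this input-size mismatch, here is a minimal sketch (pure Python, no framework dependencies, assuming nearest-neighbour resizing is acceptable for a first test) of upsampling a 16x16 USPS-style digit to 32x32. In a real pipeline you would use a proper resize such as PIL's `Image.resize`; this only illustrates the shape change being discussed.

```python
def upsample_2x(img):
    """Nearest-neighbour 2x upsampling: each pixel becomes a 2x2 block."""
    out = []
    for row in img:
        # Repeat every pixel horizontally...
        doubled = [v for v in row for _ in (0, 1)]
        # ...and every row vertically.
        out.append(doubled)
        out.append(list(doubled))
    return out

# A dummy 16x16 "digit" with distinct pixel values.
img_16 = [[r * 16 + c for c in range(16)] for r in range(16)]
img_32 = upsample_2x(img_16)

print(len(img_16), len(img_16[0]), len(img_32), len(img_32[0]))  # 16 16 32 32
```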

@ShownX
Contributor

ShownX commented Mar 14, 2020

Hi,
I have to apologize that the code is not in good shape.
BTW, I quickly checked the numbers you got and they seem too low. I spent some personal time doing a quick cleanup of the original codebase here, though it still lacks some details (I plan to adapt the code further to fully reproduce the results when I have time; right now I am mainly focused on other projects). In any case, it should give you a better starting point. Have fun!

@joshuacwnewton

joshuacwnewton commented Apr 1, 2020

Hi @ShownX. Thank you for taking the time to develop a new starter codebase. It contains a functional MNIST -> MNIST-M test (using the MT-MM.yaml config) which I am happy about. :)

Note: Should this new MNIST-MNISTM discussion go in a separate issue? Happy to move if it would be helpful, as this is distinct from the conversation above.


I modified the MAX_EPOCH to 5 for curiosity's sake and ran a test. I have attached the output in a file -- if you have a moment, could you verify that these values are as expected?

MT-MM_MAX-EPOCH-5_output.txt


Additionally, looking at the table in the arXiv publication for MT-MM, for d-SNE with 10 target images/class, it appears I should get ~88% accuracy.


However, in the attached log, I noticed that the accuracy seems to stall around 11% at the beginning, then abruptly starts to increase for a number of iterations, then overfits. (See: Test accuracy: 81%, Target/Source accuracy: 100%/99%)

This loss/accuracy evolution seems less than ideal. I wonder whether the network's initialization is a concern (it could explain the initial stall at 11%), and whether the chosen hyperparameters are a concern (they could explain the sharp increase followed by overfitting).
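For context, a stall near 11% is essentially the random-guess baseline for a 10-class digit problem, which supports the idea that the network is not learning at all during that phase. A quick simulated sanity check (plain Python, not part of the d-SNE codebase):

```python
import random

random.seed(0)
n_classes = 10
n_samples = 10_000

# Simulate a classifier that guesses uniformly at random among 10 classes.
labels = [random.randrange(n_classes) for _ in range(n_samples)]
guesses = [random.randrange(n_classes) for _ in range(n_samples)]
accuracy = sum(l == g for l, g in zip(labels, guesses)) / n_samples

print(f"random-guess accuracy: {accuracy:.3f}")  # ~0.10, i.e. ~10-11%
```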


Do you have any information about the configuration used to produce the 88% reported accuracy? Also, for the MT-MM.yaml test, why is "LenetPlus" chosen as the default architecture? I can't seem to find it mentioned anywhere else, but VGG-16 and ResNet-101 are mentioned throughout the publication, so I am a bit confused there.

Thank you kindly for your time and effort. :)

@lambdaofgod

@sheikhomar How did you get MNIST-M and the other digits datasets? In your loader I only see code for MNIST and USPS.

@joshuacwnewton

joshuacwnewton commented Apr 17, 2020

@lambdaofgod


The original source of the MNIST-M dataset can be found on Yaroslav Ganin's page under the "Unsupervised Domain Adaptation by Backpropagation" subheading.

Be sure to use the "unpacked version of MNIST-M" link, as the packed version of the dataset contains LMDB files that need to be read using the lmdb library, which d-SNE does not use.
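As I understand it, the unpacked archive ships image directories plus plain-text files mapping image filenames to labels (one `filename label` pair per line). The exact filenames and layout below are assumptions; adjust them to match the actual archive. A minimal parsing sketch:

```python
# Hedged sketch: parse an MNIST-M-style label file, where each line is
# assumed to look like "00000000.png 5". Verify against the real archive.
def load_label_file(text):
    """Parse 'filename label' lines into a list of (filename, int label)."""
    pairs = []
    for line in text.strip().splitlines():
        name, label = line.split()
        pairs.append((name, int(label)))
    return pairs

sample = "00000000.png 5\n00000001.png 0\n"
print(load_label_file(sample))  # [('00000000.png', 5), ('00000001.png', 0)]
```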


The SVHN dataset can be found here, as well.
