Submit report for S1erHoR5t7 (#10) #135
base: master
Conversation
Hi, please find below a review submitted by one of the reviewers:

Score: 4

In detail: the author describes some of the image pre-processing and data loading steps they follow, though without enough detail (random seeds, etc.) for the work to be fully reproducible. For context, the original paper clearly states the seed value it used. Furthermore, the author of the reproducibility report does not comment on whether these pre-processing procedures match the ones used in the original publication, or whether they might introduce systematic discrepancies.

The report clearly states the layer composition of the architectures tested in this study (see Table 1), though it is left to the reader to work out whether these match any of the architectures in the original paper. As far as I can tell, what the author of the report calls DCGAN 32x32 is equivalent to what the original paper calls the Standard CNN. The DCGAN 64x64, DCGAN 128x128, and DCGAN 256x256 architectures used in the original paper are not re-implemented, which is understandable and acceptable given limited computing resources. On the other hand, the author builds and tests a ResNet architecture instead, which helps to further explore the generality of the method proposed in the original work.

On top of the metric used to report results in the original paper, the report also provides results under a second metric, giving readers more information with which to digest the results. The original paper, however, mentions that its choice of one metric over the other was dictated by the former having a stronger correlation with image quality than the latter. This issue is not addressed in the reproducibility report; any discrepancy in the interpretation of the results under the two metrics might therefore be due to the second metric being inappropriate for the task, and may not actually provide further useful insight into the performance of the method proposed in the original paper.

Although the report enumerates the various hyperparameters used for training (the ratio of discriminator to generator updates, optimizer parameters, etc.), it never states whether these match the values used in the original paper, making it hard to cross-check what stays constant and what changes between the two experimental setups. Since the meaning and interpretation of the results depend on whether the hyperparameters tested in this work are identical to those in the original work, it would be key for the author to state the differences and similarities more clearly.

Overall, this report highlights that the choice of normalization can significantly affect the convergence properties (and therefore the results) of the training runs presented in the original paper. It also points out that it is unclear whether the R-formulation can be expected to succeed at training GANs under any architecture choice, as shown by the experiments comparing DCGAN to ResNet. Note, though, that the contradictory results may arise for a variety of reasons involving the coupling between dataset and architecture choices, their mutual suitability, and more, and may not be indicative of the effectiveness of the method proposed in the original paper. A more in-depth discussion of the suspected causes of variation would be useful to the reader.
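For readers who want to pin down what the review's "R-formulation" refers to: the core idea of the original paper is to have the discriminator judge whether real data is more realistic than generated data, rather than scoring each sample in isolation. Below is a minimal PyTorch sketch of the relativistic standard GAN (RSGAN) losses, with an explicit seed of the kind the reviewer asks for; the names `rsgan_losses`, `critic_real`, and `critic_fake` are illustrative and are not taken from the report's code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # an explicit seed; the review notes the original paper reported theirs

def rsgan_losses(critic_real, critic_fake):
    """Relativistic standard GAN (RSGAN) losses, assuming critic_real and
    critic_fake are the critic's raw (pre-sigmoid) outputs C(x_r) and C(x_f)
    on matched batches of real and generated images."""
    # D is trained to judge real samples as more realistic than fakes:
    # -log sigmoid(C(x_r) - C(x_f))
    d_loss = F.binary_cross_entropy_with_logits(
        critic_real - critic_fake, torch.ones_like(critic_real))
    # G is trained on the reversed relation: -log sigmoid(C(x_f) - C(x_r))
    g_loss = F.binary_cross_entropy_with_logits(
        critic_fake - critic_real, torch.ones_like(critic_fake))
    return d_loss, g_loss
```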
Problem statement: The report shows sufficient understanding of the problem addressed by the solution proposed in the original paper and concisely, though not masterfully, summarizes it in the introductory paragraph.

Code: The author reimplements the original code in TensorFlow and shares it in a new repository, with instructions on how to run it. This new implementation, however, lives within a larger repository of reimplemented models for video and image super-resolution, and heavily depends on multiple classes and functions implemented within that larger framework, introducing the meta-problem of having to validate and verify another, not necessarily straightforward, implementation and source code. This makes it hard to debug, and I cannot attest that this implementation is bug-free. On the positive side, the code is clean, neat, and easy to follow and read. Having an implementation in TensorFlow, on top of the original PyTorch implementation, can be beneficial for the community.

Communication with the original authors: The author communicated their findings to the original author on OpenReview, but no in-depth analysis of the agreements and disagreements ensued. The author did not use OpenReview to try to clarify details that may be undocumented in the original publication yet crucial for reproducibility.

Hyperparameter search: No real hyperparameter search is performed, either to verify the optimality of the values picked in the original paper or to measure the sensitivity of the results to these variables.

Ablation study: The author tries three different types of normalization (no normalization, batch norm, spectral norm), augmenting the original author's choice of batch norm in the generator and spectral norm in the discriminator for the CIFAR-10 experiments; these three normalization choices can return wildly different results (a sketch of such an ablation follows this review). No other ablation studies are present.

Discussion on results: The discussion of the results and their implications can be found in Section 2.5. It is more an analysis of the results obtained in this round of experiments than a detailed discussion of the state of reproducibility of the original paper.

Recommendations for reproducibility: The report makes no suggestions on how the original author could have improved their paper or code release to ease reproducibility efforts.

Overall organization and clarity: This report would strongly benefit from further proofreading and grammar review by a native English speaker.
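The ablation mentioned above swaps normalization schemes in otherwise identical networks. As a rough illustration (not the report's actual code), here is how one discriminator convolution block might be parameterized over the three choices the reviewer lists, using PyTorch's built-in spectral-norm wrapper; `conv_block` and the layer sizes are placeholders.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def conv_block(in_ch, out_ch, norm="spectral"):
    """One discriminator conv block under the three normalization choices
    compared in the ablation: none, batch norm, or spectral norm."""
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
    if norm == "spectral":  # the original paper's choice for the discriminator
        return nn.Sequential(spectral_norm(conv), nn.LeakyReLU(0.2))
    if norm == "batch":     # the original paper's choice for the generator
        return nn.Sequential(conv, nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2))
    return nn.Sequential(conv, nn.LeakyReLU(0.2))  # no normalization
```

Training the same network under each setting isolates the effect of normalization, which is what lets the report attribute convergence differences to that single factor.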
Hi, please find below a review submitted by one of the reviewers:

Score: 4
Confidence: 5
Hi, please find below a review submitted by one of the reviewers:

Score: 7

Problem statement
Code
Communication with original authors
Hyperparameter search
Ablation study
Discussion on results
Recommendations for reproducibility
Overall organization and clarity
The reproducibility report for RGAN (#10)