
[JOSS REVIEW] Functionality Review #597

Closed
16 tasks done
adam-m-jcbs opened this issue Jul 22, 2020 · 17 comments
@adam-m-jcbs

adam-m-jcbs commented Jul 22, 2020

This issue is part of a JOSS review, tracked in-project with issue #479 (paper) and #478 (library and features).

To streamline reviewing, I'm creating this meta-issue that addresses all of the functionality criteria for a JOSS review. Specific functionality issues will also be created that link back to this.

Once this meta-issue closes, all functionality requirements for the JOSS review will be met.

  • Install works as outlined

    • Simple (no CUDA or OMP) build and install on Arch Linux
    • Full build and install with CUDA and OMP in Arch Linux
    • Simple build and install in Windows
  • Software achieves functionality claims
    I'm mostly evaluating this by verifying examples and tests.

    • integrated test suite performs as expected/documented (without CUDA)
    • integrated test suite performs as expected/documented (with CUDA)
    • test_install builds and runs with simple reference build
    • simple-solver runs and gives reasonable output in line with expectations
    • minimal-cuda-solver runs and gives reasonable output in line with expectations
    • performance-debugging runs and gives reasonable output in line with expectations
    • preconditioned solvers run and give reasonable output in line with expectations
      • The residual norm on my machine is off by orders of magnitude from the reference output, but still small. The developers see this as well. Given the small size, this is acceptable for a functionality review, but I recommend the developers look into it and either resolve it or update the published expected outputs.
    • nine-pt-stencil-solver (and thus its dependencies) runs and gives reasonable output in line with expectations
  • Performance - any performance claims made are verified

    • Execute ctest and other provided tests to see basic performance matches reference performance given within variation of machines
    • Produce local benchmarking to compare qualitatively with ginkgo-provided benchmarks (this is only a basic check, clearly my local machine will not have the power and hardware of the machines ginkgo benchmarking is run on)
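For concreteness, the simple reference build and test cycle from the checklist above can be sketched roughly as follows. This is a hedged sketch, not authoritative install instructions: the `GINKGO_BUILD_*` option names follow Ginkgo's CMake conventions but should be checked against the INSTALL documentation for the version under review.

```shell
# Rough sketch of a simple (no CUDA, no OMP) reference build and test run.
# Option names are assumptions based on Ginkgo's GINKGO_BUILD_* convention;
# consult the INSTALL documentation for the authoritative flags.
git clone https://github.com/ginkgo-project/ginkgo.git
cd ginkgo && mkdir build && cd build
cmake -DGINKGO_BUILD_REFERENCE=ON \
      -DGINKGO_BUILD_OMP=OFF \
      -DGINKGO_BUILD_CUDA=OFF ..
make -j4
ctest            # integrated test suite
make install     # may require elevated rights, depending on the prefix
```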
@adam-m-jcbs
Author

Hi devs! I have completed a review of functionality based on JOSS criteria. I had success with the basic reference build that doesn't include OMP and CUDA.

The issues that came up, perhaps unsurprisingly, are related to CUDA. I have verified that my machine meets the stated requirements and has an NVIDIA graphics card, but I am running into build issues. I will file a dedicated issue to help resolve this; it may simply be a configuration problem on my machine, as I run a less common Linux distribution. I use Arch Linux and see this as a good chance to test Ginkgo's buildability in a potentially new Linux environment.

Overall, the functionality of the basic reference install is robust and behaves as expected. I feel confident the same will be true of the build with OMP and CUDA once we resolve the errors that come up.

@upsj
Member

upsj commented Jul 22, 2020

Thanks for the review!

[ ] preconditioned solvers run and give reasonable output in line with expectations

  • residual norm is off on my machine by orders of magnitude from the reference output, but still small. OK?

I can reproduce this (1e-8 instead of 1e-16) and will investigate further :)

The issues that came up, perhaps unsurprisingly, are related to CUDA. I have verified that my machine has the stated requirements and it does have an NVIDIA graphics card, but am having some build issues.

Can you take a look at #579 and see whether your build issues are related and/or whether the fix (explicitly specifying an older gcc version) works for you? This seems to be a known issue with Arch Linux and CUDA.
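For other Arch users hitting this: the workaround of explicitly specifying an older gcc can be applied at configure time. This is only a sketch under assumptions — the compiler path depends on which older gcc package is installed, and `CMAKE_CUDA_HOST_COMPILER` is the standard CMake variable for pointing nvcc at a host compiler.

```shell
# Sketch: pin nvcc to an older host compiler at configure time.
# The g++ path is an assumption -- use whichever older gcc your system provides.
cmake -DGINKGO_BUILD_CUDA=ON \
      -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-9 ..
```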

@pratikvn pratikvn added the reg:paper This is related to a publication. label Jul 22, 2020
@pratikvn
Member

Thank you for the review, @adam-m-jcbs. Thanks @upsj for the quick response!

@adam-m-jcbs
Author

adam-m-jcbs commented Jul 22, 2020

I've updated the OP to reflect completion of more aspects of the functionality review with some fixes described in #598 .

While the CUDA build is successful, the execution on the GPU is having issues still. See #598 for more.

@adam-m-jcbs
Author

#598 was successfully resolved! I'm moving forward with the last few components of the functionality review.

@adam-m-jcbs
Author

I was able to do a basic (no CUDA) build and install on Windows! For the purposes of this review, I believe verifying a basic Windows install is sufficient. My Windows partition has limited space for full CUDA testing, but I can verify CUDA components in Linux.

I created #600 due to the need for minor manual intervention during a Windows build + install, but I suspect this will be relatively quick to resolve.

To finish up the functionality review, I will go back into Linux to verify the 27 pt stencil and the given performance claims.

@adam-m-jcbs
Author

Quick update: I'm moving #600 into the documentation review, as the Windows build does install and function; it just needs a small addition to the documentation, which I believe is already being made. I'm marking the Windows install as successful, completing the verification of installation functionality.

@adam-m-jcbs
Author

Just a quick FYI: I had to focus on some other things for a bit, but I plan to complete the functionality review and file any issues for the other reviews (Docs and Paper) this week (apologies for taking longer on functionality than hoped). In my experience, the doc and paper reviews should be a bit quicker, as I will have less need to build, run, analyze, and profile code.

@adam-m-jcbs
Author

adam-m-jcbs commented Aug 4, 2020

Unfortunately for my hopes of wrapping things up, I have run into an unexpected new CUDA-related build issue that has delayed me. It is possible my CUDA installation was updated since my last tests, so I have been checking whether I created the problem myself. I have traced the build error to the use of a deprecated cuSPARSE function in Ginkgo's CUDA executor. I have filed an issue (#613) to track this and provide more details.

@adam-m-jcbs
Author

As mentioned in #613, since this issue was introduced by a recent release of CUDA, I think it is reasonable for me to downgrade and finish the functionality review with CUDA 10.x (the version that was current upon JOSS submission).

Happily, bringing ginkgo into compliance with CUDA 11 is already being tracked in PR #603 .

@adam-m-jcbs
Author

adam-m-jcbs commented Aug 6, 2020

I was able to get things going with CUDA 10.x. Tests passed, the GPU is exercised, and things generally look functional as before.

I was able to do some minor performance profiling using the benchmarking facilities provided by Ginkgo and could qualitatively compare the results with your published performance benchmarks. While the process could be better documented and made somewhat more usable, it is also, in my experience with codes like Ginkgo, relatively unique to support such robust user benchmarking and performance profiling, and to provide real, comparable results from machines that may be difficult for all users to access. I really appreciate it and encourage its further development.

Based on this, I am judging the performance to be consistent with claims.
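For context, my local benchmark runs were invoked roughly like this. The script lives in Ginkgo's `benchmark/` directory; the environment variable names here are assumptions on my part and should be checked against the benchmarking documentation:

```shell
# Rough sketch of a local benchmark invocation (variable names are
# assumptions; see Ginkgo's benchmark documentation for the real list).
cd benchmark
EXECUTOR=cuda SYSTEM_NAME=my-machine ./run_all_benchmarks.sh
```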

The only issue I ran into is that my results for the 27 pt stencil are significantly off from the reference results. I get:

The average relative error is 0.197422

The expected value, however, is 1.87318e-15. Any guidance on what's happening here? It may just be a documentation issue, but I want to make sure it is not a functional one. That's the only item left on my list for the functionality review.

@adam-m-jcbs
Author

I did a rebuild to confirm, and still find

./twentyseven-pt-stencil-solver 10
...
0.557092 0.502154 0.505285 0.525146 0.558927 0.60887 0.678399 0.772797 0.907416 1.17498 
0.731815 0.72313 0.734387 0.761826 0.806322 0.871221 0.960838 1.08097 1.24232 1.46524 
0.782408 0.792321 0.810668 0.841941 0.889996 0.958951 1.05318 1.17745 1.33715 1.5367 
0.821682 0.83914 0.86159 0.895201 0.94491 1.01523 1.11043 1.23455 1.39097 1.58097 
0.872535 0.893499 0.918155 0.953107 1.00366 1.07449 1.16979 1.29323 1.44755 1.63329 
0.943486 0.966094 0.991769 1.02733 1.07824 1.14924 1.24449 1.36754 1.52083 1.70468 
1.03999 1.06299 1.08875 1.1243 1.17518 1.24618 1.34147 1.46453 1.61773 1.80119 
1.16654 1.18863 1.21354 1.24843 1.29889 1.36975 1.46519 1.58876 1.74284 1.9274 
1.32686 1.34631 1.36919 1.40267 1.45228 1.52279 1.61848 1.74304 1.89912 2.0868 
1.5232 1.53728 1.55667 1.58785 1.63601 1.70582 1.80166 1.92756 2.08684 2.2808 
:

The average relative error is 0.197422
The runtime is 767.310821 ms

My CMakeCache.txt.

@adam-m-jcbs
Author

adam-m-jcbs commented Aug 7, 2020

@upsj Perhaps this is related to the preconditioned solver result that changed? In this case, though, the error ends up nowhere near zero (not even at single precision), unlike the residual norms for the preconditioned solvers.

@pratikvn
Member

pratikvn commented Aug 7, 2020

@adam-m-jcbs , I observe this too. We are working on fixing it.

@pratikvn
Member

@adam-m-jcbs , we decided to remove the 27-pt-stencil solver example for now, as it did not showcase any specific aspect of Ginkgo that the 9-pt and 3-pt examples did not already show, and it had a difficult-to-track-down bug. We will probably rewrite it in the future, but for now I think we can proceed with the review. Thank you for bringing this to our attention.

Please refer to #612.

@adam-m-jcbs
Author

@pratikvn thanks for bringing this to my attention, and apologies that I haven't had time to validate it until today.

I have confirmed that #612 resolves the only remaining functionality issue. Given that documentation and the paper have already been approved in my review, my recommendation will be for acceptance, as you satisfy all JOSS criteria! I will update the other issues associated with this review shortly.

Also, I appreciate the improvements to the benchmarking documentation. More codes should provide such robust tools for evaluating, comparing, and validating performance. This adds tremendous value to your public codebase and sets an example for others developing openly, in my view.

@pratikvn
Member

@adam-m-jcbs , thank you. We greatly appreciate your thorough review and your constructive comments on various aspects of the library. It has definitely helped us improve Ginkgo.

tcojean added a commit that referenced this issue Aug 24, 2020
Adds more detailed information on how to benchmark Ginkgo.

Thanks Adam Jacobs for his useful review process for the JOSS paper which pointed to a lack of documentation on how to benchmark Ginkgo.

Summary:
+ Detail a bit how to use `ssget` and what to watch out for.
+ Add a benchmark overview section with the most important options.
+ Optionally detail a little how to interact with the GPE after obtaining benchmark results as well as what to watch out for.
+ Detail a little how to obtain more detailed information as well as how to debug Ginkgo through loggers.
+ Update the available options in the script.
+ In addition, I try to change the way we generate the `run_all_benchmarks.sh` script in order to give the proper rights to this file by default (execute `x` permission).

Related issue: #597
Related PR: #619
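The last point of the commit summary (giving the generated script the execute bit by default) can be illustrated with a toy stand-in for the generated file; the script contents here are hypothetical:

```shell
# Create a hypothetical stand-in for the generated script, then grant
# the execute (x) bit so it can be invoked directly.
cat > run_all_benchmarks.sh <<'EOF'
#!/bin/sh
echo "benchmarks would run here"
EOF
chmod +x run_all_benchmarks.sh
./run_all_benchmarks.sh   # prints: benchmarks would run here
```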