Nix Package for NTPoly #231

Closed
maxwell-gisborne opened this issue Feb 27, 2024 · 7 comments

Comments

@maxwell-gisborne

Hi,
I am trying to package ntpoly-v2.3.1 into a nix flake, and I am having some difficulty.

I hope it is okay for me to open an issue here about this; I apologise if it's not.

I have managed to get it to compile now, but it's failing tests 1-11.

I used the Linux.cmake config, but to get it to compile I had to remove the -openmp option from the CXX flags, as it seemed to be confusing cc1plus. The compiler is provided by mpicxx, so I suppose it already knows it should link to MPI, but maybe this is causing problems.
As seen later, I believe the errors are a failure to link properly with MPI.
Since adding the -openmp flag seems to break the C++ compiler, I'm not sure how this is supposed to be done.

The CMAKE_TOOLCHAIN_FILE I am using is this:

    # Build file for a gcc, linux system.
    set(CMAKE_SYSTEM_NAME Linux)
    set(CMAKE_C_COMPILER mpicc)
    set(CMAKE_Fortran_COMPILER mpif90)
    set(CMAKE_CXX_COMPILER mpicxx)
    set(CMAKE_CXX_FLAGS "")

    # Library Files
    set(TOOLCHAIN_LIBS "-lblas")

    # Release suggestions
    set(CXX_TOOLCHAINFLAGS_RELEASE "-O3 -lgomp")
    set(F_TOOLCHAINFLAGS_RELEASE "-O3 -cpp")

    # Debug suggestions
    set(CXX_TOOLCHAINFLAGS_DEBUG "-O0 -Wall")
    set(F_TOOLCHAINFLAGS_DEBUG "-O0 -cpp -fcheck=all -Wall")

    #set(NOSWIG "yes")
    set(CMAKE_BUILD_TYPE "Debug")
    set(CMAKE_Fortran_FLAGS "-fallow-argument-mismatch")
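
For reference, GCC spells its OpenMP switch `-fopenmp`. The `-openmp` spelling comes from older Intel compilers, and GCC reads it as `-o penmp` (an output file name). A minimal sketch of how the relevant lines might look, assuming `mpicxx` wraps `g++` and the rest of the toolchain file stays as above; this is an illustration, not the project's shipped Linux.cmake:

```cmake
# Assumption: mpicxx wraps g++, which expects -fopenmp (not -openmp).
set(CMAKE_CXX_FLAGS "-fopenmp")

# When -fopenmp is also present at link time, GCC links libgomp itself,
# so an explicit -lgomp in the release flags should not be needed.
set(CXX_TOOLCHAINFLAGS_RELEASE "-O3")
```

Leaving the flag out altogether, as in the file above, is also a valid choice if the project detects OpenMP on its own during configuration.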

When I run make test, tests 1-11 fail, while the rest succeed.

The output of the failed tests is:

hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.                                                                                                                                                           
--------------------------------------------------------------------------                                                                                                                                                                    
The value of the MCA parameter "plm_rsh_agent" was set to a path                                                       
that could not be found:                                                                                               

  plm_rsh_agent: ssh : rsh                                                                                             

Please either unset the parameter, or check that the path is correct                                                                                                                                                                          
--------------------------------------------------------------------------                                                                                                                                                                    
[localhost:01345] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error plm_rsh_component.c(335)                                                                                                                                       
<end of output>                                                                                                        
Test time =   0.71 sec                                                                                                 
----------------------------------------------------------                                                             
Test Failed.                                                                                                           
"Regression111" end time: Feb 27 18:43 UTC                                                                             
"Regression111" time elapsed: 00:00:00                                                                                 
----------------------------------------------------------                                                             

2/26 Testing: Regression211                                                                                            
2/26 Test: Regression211                                                                                               
Command: "/nix/store/6payx2da66dbjl6vg15csxfb5hpf3df4-bash-5.2-p15/bin/bash" "/build/source/Build/bin/RunTest.sh" "2" "1" "1" "2"                                                                                                             
Directory: /build/source/Build/UnitTests                                                                               
"Regression211" start time: Feb 27 18:43 UTC                                                                           
Output:                                                                                                                
----------------------------------------------------------                                                             
hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.                                                                                                                                                           
--------------------------------------------------------------------------                                                                                                                                                                    
The value of the MCA parameter "plm_rsh_agent" was set to a path                                                       
that could not be found:                                                                                               

  plm_rsh_agent: ssh : rsh                                                                                             

Please either unset the parameter, or check that the path is correct                                                                                                                                                                          
--------------------------------------------------------------------------                                                                                                                                                                    
[localhost:01347] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error plm_rsh_component.c(335)                                                                                                                                       
<end of output>                                                                                                        
Test time =   0.11 sec                                                                                                 
----------------------------------------------------------                                                             
Test Failed.                                                                                                           
"Regression211" end time: Feb 27 18:43 UTC                                                                             
"Regression211" time elapsed: 00:00:00                                                                                 
----------------------------------------------------------                                                             

3/26 Testing: Regression121                                                                                            
3/26 Test: Regression121                                                                                               
Command: "/nix/store/6payx2da66dbjl6vg15csxfb5hpf3df4-bash-5.2-p15/bin/bash" "/build/source/Build/bin/RunTest.sh" "1" "2" "1" "2"                                                                                                             
Directory: /build/source/Build/UnitTests                                                                               
"Regression121" start time: Feb 27 18:43 UTC                                                                           
Output:                                                                                                                
----------------------------------------------------------                                                             
hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.                                                                                                                                                           
--------------------------------------------------------------------------                                                                                                                                                                    
The value of the MCA parameter "plm_rsh_agent" was set to a path                                                       
that could not be found:                                                                                               

  plm_rsh_agent: ssh : rsh                                                                                             

Please either unset the parameter, or check that the path is correct                                                                                                                                                                          
--------------------------------------------------------------------------                                                                                                                                                                    
[localhost:01349] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error plm_rsh_component.c(335)                                                                                                                                       
<end of output>                                                                                                        
Test time =   0.11 sec                                                                                                 
----------------------------------------------------------                                                             
Test Failed.                                                                                                           
"Regression121" end time: Feb 27 18:43 UTC                                                                             
"Regression121" time elapsed: 00:00:00                                                                                 
----------------------------------------------------------                                                             

4/26 Testing: Regression112                                                                                            
4/26 Test: Regression112                                                                                               
Command: "/nix/store/6payx2da66dbjl6vg15csxfb5hpf3df4-bash-5.2-p15/bin/bash" "/build/source/Build/bin/RunTest.sh" "1" "1" "2" "2"                                                                                                             
Directory: /build/source/Build/UnitTests                                                                               
"Regression112" start time: Feb 27 18:43 UTC                                                                           
Output:                                                                                                                
----------------------------------------------------------                                                             
hwloc/linux: failed to find sysfs cpu topology directory, aborting linux discovery.                                                                                                                                                           
--------------------------------------------------------------------------                                                                                                                                                                    
The value of the MCA parameter "plm_rsh_agent" was set to a path                                                       
that could not be found:                                                                                               

  plm_rsh_agent: ssh : rsh                                                                                             

Please either unset the parameter, or check that the path is correct                                                                                                                                                                          
--------------------------------------------------------------------------                                                                                                                                                                    
[localhost:01351] [[INVALID],INVALID] FORCE-TERMINATE AT Not found:-13 - error plm_rsh_component.c(335)                                                                                                                                       
<end of output>                                                                                                        
Test time =   0.11 sec                                                                                                 
----------------------------------------------------------                                                             
Test Failed.                                                                                                           

Any help would be greatly appreciated.

@william-dawson
Owner

Thanks for working on the nix flake.

  1. If the first set of tests works, there is probably no issue with openmp linking, so I wouldn't worry about it. In fact, cmake is set to search for openmp itself if the flag is not provided (you might see some useful output about this during the cmake configure step).
  2. For nix, is the build done in a docker container? The error sounds something like this one: (Can we use OpenMpi on Docker Container? open-mpi/ompi#3625). Maybe it can be fixed by installing ssh in the container.
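
As a generic illustration of the kind of OpenMP detection mentioned in point 1, here is a minimal sketch using CMake's bundled FindOpenMP module. It is not NTPoly's actual CMakeLists.txt; the project name `openmp_probe`, the target `example_lib`, and the source file `example.cpp` are hypothetical:

```cmake
cmake_minimum_required(VERSION 3.9)
project(openmp_probe CXX)

# FindOpenMP ships with CMake; it probes the compiler for a working flag
# and reports the result during the configure step.
find_package(OpenMP)

add_library(example_lib example.cpp)  # hypothetical target and source

if(OpenMP_CXX_FOUND)
  # The imported target carries both the compile flag and the link library.
  target_link_libraries(example_lib PUBLIC OpenMP::OpenMP_CXX)
else()
  message(STATUS "OpenMP not found; building without it")
endif()
```

With detection like this, the toolchain file does not need to name an OpenMP flag at all, which is consistent with the build succeeding after the flag was removed.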

@maxwell-gisborne
Author

maxwell-gisborne commented Feb 28, 2024

Thank you for replying.

(1) It is tests 1 to 11 that are failing, and tests 12 to 26 that are passing. So I suppose that means it's the first set which is failing.

(2) I am not using a docker container, but Nix containerizes its build environments, so perhaps it's the same problem.

@maxwell-gisborne
Author

After adding openssh to the build environment, the same tests are failing, but now with a different error message.

They now report

At line 7 of file dense_includes/CheckMemoryPoolValidity.f90
Fortran runtime error: Allocatable argument 'this' is not allocated

repeated a few times followed by

Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.

@william-dawson
Owner

Great. Installing openssh seems to have helped, because now we're actually getting into the code.

It seems like there is actually a bug in the 2.7.1 version; I can reproduce this on my machine. Fortunately, the bug doesn't exist in the 3.0 series. My guess is that #188 fixed it. I will backport whatever fix was needed and release a v2.7.2 for you. Sorry for the trouble and thanks for finding this.

@maxwell-gisborne
Author

Okay, thanks.

I would like to package a version compatible with bigdft. Should I choose 3.0.0 or 3.1.0_bigdft? What is the difference?

@maxwell-gisborne
Author

I've installed v3.0.0 with all tests passing :)

Thank you for your help.

@william-dawson
Owner

For the latest release of BigDFT (1.9.4) it is using NTPoly 3.0.0, so I recommend that you start with that. The _bigdft version was a prerelease so I could test out some new features.

Thank you for your contributions! I'm looking forward to there being a BigDFT nix package!
