new version gives error on parsing *.JSON inputs (cuda/10.2.89) #99

Closed
Jaberh opened this issue May 27, 2020 · 53 comments

@Jaberh commented May 27, 2020

I am trying to integrate AMGX into an industrial code. If I build it in debug mode there are no parsing issues, but in release mode it gives the following error; the input file is AGGREGATION_JACOBI.json:
Converting config string to current config version
Error parsing parameter string: Incorrect config entry (number of equal signs is not 1) : "config_version": 2
To make stuff more interesting, if I use AMGX_config_create(&m_config, "config_version=2,algorithm=AGGREGATION,selector=SIZE_2,print_grid_stats=1,max_iters=1000,monitor_residual=1,obtain_timings=1,print_solve_stats=1,print_grid_stats=1");
it refuses to print grid and solver stats but accepts the rest of the inputs.

@marsaev (Collaborator) commented Jun 4, 2020

Hi @Jaberh,
For every config (file or string) it first tries to parse it as JSON, and if that fails it then tries to process it as a 'plain text' config (see https://github.com/NVIDIA/AMGX/blob/master/doc/AMGX_Reference.pdf). I strongly recommend sticking to the JSON configs as they are simpler to handle.

Converting config string to current config version
Error parsing parameter string: Incorrect config entry (number of equal signs is not 1) : "config_version": 2

It seems that it cannot parse your config as JSON for some reason, and then it cannot parse it as a plain text config either (because it is not one), which results in a failure. Now it's weird that Release/Debug differ in this manner. Which host compiler do you use? Do you wish to use config files or config strings in your final code?

To make stuff more interesting, if I use AMGX_config_create(&m_config, "config_version=2,algorithm=AGGREGATION,selector=SIZE_2,print_grid_stats=1,max_iters=1000,monitor_residual=1,obtain_timings=1,print_solve_stats=1,print_grid_stats=1");
it refuses to print grid and solver stats but accepts the rest of the inputs.

This happens because AMGX parsed this config as plain text, which has slightly different syntax: the grid and solver stats parameters need to be explicitly specified for a solver scope. Again, I strongly recommend using JSON configs, but if you wish I can explain how to use plain text configs.
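
For illustration only, a scoped plain-text equivalent (using the same scope syntax that appears later in this thread with solver(s1)=FGMRES) might look roughly like the sketch below; the exact parameter names should be checked against AMGX_Reference.pdf:

    /* Sketch only: the same options as the string above, but with the stats flags
       attached to a named solver scope "s1", as the plain-text format expects. */
    AMGX_config_create(&m_config,
        "config_version=2,"
        "solver(s1)=AMG,"
        "s1:algorithm=AGGREGATION,"
        "s1:selector=SIZE_2,"
        "s1:max_iters=1000,"
        "s1:monitor_residual=1,"
        "s1:obtain_timings=1,"
        "s1:print_solve_stats=1,"
        "s1:print_grid_stats=1");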

@Jaberh (Author) commented Jun 4, 2020

Hi Marsaev, my first choice was also the JSON file. The issue is that when I integrate AMGX into the bigger code using a JSON file, it does not iterate. I used the config file from your example: standalone it works perfectly, but added to the bigger code the number of iterations stays at 0, with the same JSON file that works standalone. I think by now I have almost memorized your entire AMGX_Reference.pdf. I am using gcc/8.2.0 with mpich/3.2.1 for MPI and cuda/10.2.89.

@marsaev (Collaborator) commented Jun 5, 2020

You can also provide the JSON as a string. Here is an example extension of the amgx_capi example that reads a JSON file and provides its contents to the AMGX_config_create() function; it works identically to using AMGX_config_create_from_file() for me:

    {
        FILE* fjson = fopen(argv[pidz + 1], "r");
        const size_t config_max_len = 4096;
        char config[config_max_len];
        config[0] = '\0';                   /* keep the buffer a valid string even for an empty file */

        char* read_ptr = config;
        size_t len = config_max_len - 1;

        /* append the file contents line by line into the fixed-size buffer */
        while(fgets(read_ptr, len, fjson)) {
            len = MAX(config_max_len - strlen(config) - 1, 0);
            read_ptr = config + strlen(config);
        }
        fclose(fjson);

        AMGX_SAFE_CALL(AMGX_config_create(&cfg, config));
    }
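
If a config might exceed the fixed 4096-byte buffer above, a variant that sizes the buffer from the file itself could look like the sketch below (read_whole_file is an illustrative helper, not part of AMGX; error handling is kept minimal). The returned buffer can be passed to AMGX_config_create and freed afterwards.

    /* Sketch: read the entire JSON file into a heap buffer of the right size. */
    #include <stdio.h>
    #include <stdlib.h>

    static char* read_whole_file(const char* path)
    {
        FILE* f = fopen(path, "rb");
        if (!f) return NULL;
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        fseek(f, 0, SEEK_SET);
        char* buf = (char*)malloc((size_t)size + 1);
        if (buf) {
            size_t got = fread(buf, 1, (size_t)size, f);
            buf[got] = '\0';   /* NUL-terminate for AMGX_config_create */
        }
        fclose(f);
        return buf;
    }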

You can also hardcode the config. For example, here I copy-pasted the AGGREGATION_JACOBI.json config and passed it to AMGX_config_create:

    {
        const char config[] = 
            "{                                                  "        
            "    \"config_version\": 2,                         "                                    
            "    \"determinism_flag\": 1,                       "                                    
            "    \"solver\": {                                  "                        
            "        \"print_grid_stats\": 1,                   "                                        
            "        \"algorithm\": \"AGGREGATION\",            "                                                
            "        \"obtain_timings\": 1,                     "                                        
            "        \"solver\": \"AMG\",                       "                                    
            "        \"smoother\": \"BLOCK_JACOBI\",            "                                                
            "        \"print_solve_stats\": 1,                  "                                        
            "        \"presweeps\": 2,                          "                                
            "        \"selector\": \"SIZE_2\",                  "                                        
            "        \"convergence\": \"RELATIVE_MAX_CORE\",    "                                                        
            "        \"coarsest_sweeps\": 2,                    "                                        
            "        \"max_iters\": 100,                        "                                    
            "        \"monitor_residual\": 1,                   "                                        
            "        \"min_coarse_rows\": 2,                    "                                        
            "        \"relaxation_factor\": 0.75,               "                                            
            "        \"scope\": \"main\",                       "                                    
            "        \"max_levels\": 1000,                      "                                    
            "        \"postsweeps\": 2,                         "                                    
            "        \"tolerance\": 0.1,                        "                                    
            "        \"norm\": \"L1\",                          "                                
            "        \"cycle\": \"V\"                           "                                
            "    }                                              "            
            "}                                                  ";

        AMGX_SAFE_CALL(AMGX_config_create(&cfg, config));
    }

and the output is also identical to the above. For everything here I used the same gcc and a Release build.

Would any of those options work for you in your app?

@Jaberh (Author) commented Jun 5, 2020

Interesting, I copied your hard-coded version and the error is still the same as before:
Error parsing parameter string: Incorrect config entry (number of equal signs is not 1) : { "config_version": 2

AMGX ERROR: file *.cu line 215
AMGX ERROR: Incorrect amgx configuration provided.
The only different thing I do is that I don't manually call lib_handle = amgx_libopen("libamgxsh.so"); although I use the dynamic lib. I don't think this should matter, but all else is pretty much according to the manual.
The debug version reads the config successfully but does zero iterations; here is the output:
AMG Grid:
Number of Levels: 6
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 10000 49600 0.000496 0.000848
1(D) 4717 27857 0.00125 0.00083
2(D) 2145 13539 0.00294 0.000397
3(D) 990 6506 0.00664 0.000189
4(D) 456 3002 0.0144 8.73e-05
5(D) 213 1387 0.0306 3.74e-05
--------------------------------------------------------------
Grid Complexity: 1.8521
Operator Complexity: 2.05425
Total Memory Usage: 0.00238962 GB
--------------------------------------------------------------
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.909485 2.199999e+03
--------------------------------------------------------------
Total Iterations: 0
Avg Convergence Rate: 1.000000
Final Residual: 2.199999e+03
Total Reduction in Residual: 1.000000e+00
Maximum Memory Usage: 0.909 GB
--------------------------------------------------------------
Total Time: 0.00571098
setup: 0.00561062 s
solve: 0.000100352 s
solve(per iteration): 0 s

@marsaev (Collaborator) commented Jun 5, 2020

Just to check, when you build Release AMGX, can you check that RAPIDJSON_DEFINED is in the build C flags? (make VERBOSE=1 for any AMGX library file)

@Jaberh (Author) commented Jun 5, 2020

I will check it now, but I remember the CMakeLists.txt enforces it to one; I will double-check. By the way, thanks (spasiba) for all your help. We did a fresh build; now for a simple test case the first option works and the second one still gives the same error. The only change I made to your config file is to add "store_res_history": 1. With your proposed alternatives and the new build I do not get the read error.
However, using any JSON file from your examples, such as the following,
    {
        "config_version": 2,
        "determinism_flag": 1,
        "solver": {
            "print_grid_stats": 1,
            "algorithm": "AGGREGATION",
            "obtain_timings": 1,
            "solver": "AMG",
            "smoother": "BLOCK_JACOBI",
            "print_solve_stats": 1,
            "presweeps": 2,
            "selector": "SIZE_2",
            "convergence": "RELATIVE_MAX_CORE",
            "coarsest_sweeps": 2,
            "max_iters": 100,
            "monitor_residual": 1,
            "min_coarse_rows": 2,
            "relaxation_factor": 0.75,
            "scope": "main",
            "max_levels": 1000,
            "postsweeps": 2,
            "tolerance": 0.1,
            "norm": "L1",
            "cycle": "V",
            "store_res_history": 1
        }
    }
I get zero iterations:
AMG Grid:
Number of Levels: 7
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 12831 87115 0.000529 0.00135
1(D) 5900 52052 0.0015 0.00142
2(D) 2770 29606 0.00386 0.000785
3(D) 1314 15616 0.00904 0.000407
4(D) 621 7745 0.0201 0.000201
5(D) 297 3695 0.0419 9.58e-05
6(D) 140 1658 0.0846 4.13e-05
--------------------------------------------------------------
Grid Complexity: 1.86057
Operator Complexity: 2.26697
Total Memory Usage: 0.00430242 GB
--------------------------------------------------------------
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.909485 1.071363e+05
--------------------------------------------------------------
Total Iterations: 0
Avg Convergence Rate: 1.000000
Final Residual: 1.071363e+05
Total Reduction in Residual: 1.000000e+00
Maximum Memory Usage: 0.909 GB
--------------------------------------------------------------
Total Time: 0.0183321
setup: 0.0182201 s
solve: 0.000112064 s
solve(per iteration): 0 s

However if I use,
AMGX_config_create(&m_config,"config_version=2, solver(s1)=FGMRES, s1:preconditioner=BLOCK_JACOBI ,s1:max_iters=100,s1:convergence=RELATIVE_INI_CORE ,s1:norm=L2, s1:tolerance=1e-3 ,s1:monitor_residual=1,s1:gmres_n_restart=20");
it iterates and converges as follows:
CLASSICAL is not supported in AMGX_read_system_maps_one_ring. (This is also interesting, as for FGMRES there is no AMG to choose, classical or aggregation.)
res( 1 )11.2807
res( 2 )6.69336
res( 3 )4.74958
res( 4 )3.69872
res( 5 )2.9753
res( 6 )2.25479
res( 7 )1.69783
res( 8 )1.29488
res( 9 )0.989278
res( 10 )0.742286
res( 11 )0.579798
res( 12 )0.47657
res( 13 )0.311112
res( 14 )0.195809
which definitely reduces the residual. Again, standalone both work perfectly; the problem occurs when I integrate it into the real code.
I hope this helps with tracking down the issue. I can also talk on Zoom or whatever you prefer if you think it might help. To me it seems that, for whatever reason, the iteration kernel is not being launched.

@Jaberh (Author) commented Jun 17, 2020

Did you get a chance to look at the above issue?

@marsaev (Collaborator) commented Jun 19, 2020

Hey Jaberh,

I suspect none of your configs are parsed as expected. It would definitely help to print the exact solver components during solver construction just to confirm the solver structure, but there is no such functionality at the moment. Can we try to identify the issue step by step?
Can you try running the built-in example amgx_capi on the 2cubes_sphere.mtx matrix from https://suitesparse-collection-website.herokuapp.com/MM/Um/2cubes_sphere.tar.gz ? This would be the expected output:

$ examples/amgx_capi -m /tmp/2cubes_sphere/2cubes_sphere.mtx -c ../core/configs/AGGREGATION_JACOBI.json 
AMGX version 2.1.0.131-opensource
Built on Jun  5 2020, 16:50:40
Compiled with CUDA Runtime 10.2, using CUDA driver 11.0
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
AMG Grid:
         Number of Levels: 10
            LVL         ROWS               NNZ    SPRSTY       Mem (GB)
         --------------------------------------------------------------
           0(D)       101492           1647264   0.00016         0.0214
           1(D)        48283            834307  0.000358         0.0208
           2(D)        23049            429889  0.000809         0.0106
           3(D)        11038            221914   0.00182        0.00545
           4(D)         5313            114457   0.00405        0.00279
           5(D)         2557             58789   0.00899        0.00143
           6(D)         1242             29810    0.0193       0.000722
           7(D)          602             14574    0.0402       0.000353
           8(D)          294              6970    0.0806       0.000169
           9(D)          143              3261     0.159       7.72e-05
         --------------------------------------------------------------
         Grid Complexity: 1.91161
         Operator Complexity: 2.0405
         Total Memory Usage: 0.0638129 GB
         --------------------------------------------------------------
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini            0.900269   1.014920e+05
              0            0.900269   1.633251e+03         0.0161
         --------------------------------------------------------------
         Total Iterations: 1
         Avg Convergence Rate: 		         0.0161
         Final Residual: 		   1.633251e+03
         Total Reduction in Residual: 	   1.609241e-02
         Maximum Memory Usage: 		          0.900 GB
         --------------------------------------------------------------
Total Time: 0.0139016
    setup: 0.0120689 s
    solve: 0.00183274 s
    solve(per iteration): 0.00183274 s

@marsaev (Collaborator) commented Jun 19, 2020

For the same config and input data, but for two ranks it should be:

$ mpirun -n 2  examples/amgx_mpi_capi -m /tmp/2cubes_sphere/2cubes_sphere.mtx -c ../core/configs/AGGREGATION_JACOBI.json 
Process 0 selecting device 0
Process 1 selecting device 1
AMGX version 2.1.0.131-opensource
Built on Jun 19 2020, 20:14:42
Compiled with CUDA Runtime 10.2, using CUDA driver 11.0
Warning: No mode specified, using dDDI by default.
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: exception_handling=1 ; 
Warning: No mode specified, using dDDI by default.
Using Normal MPI (Hostbuffer) communicator...
Reading matrix dimensions in file: /tmp/2cubes_sphere/2cubes_sphere.mtx
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
Using Normal MPI (Hostbuffer) communicator...
Using Normal MPI (Hostbuffer) communicator...
Using Normal MPI (Hostbuffer) communicator...
AMG Grid:
         Number of Levels: 9
            LVL         ROWS               NNZ    SPRSTY       Mem (GB)
         --------------------------------------------------------------
           0(D)       101492           1647264   0.00016          0.022
           1(D)        48168            832024  0.000359         0.0213
           2(D)        22955            428089  0.000812         0.0109
           3(D)        10967            220573   0.00183         0.0056
           4(D)         5277            113673   0.00408        0.00288
           5(D)         2541             57753   0.00894        0.00147
           6(D)         1221             28739    0.0193       0.000734
           7(D)          588             13920    0.0403        0.00036
           8(D)          285              6605    0.0813       0.000166
         --------------------------------------------------------------
         Grid Complexity: 1.9065
         Operator Complexity: 2.03285
         Total Memory Usage: 0.0653328 GB
         --------------------------------------------------------------
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini            0.902222   1.014920e+05
              0            0.902222   1.637347e+03         0.0161
         --------------------------------------------------------------
         Total Iterations: 1
         Avg Convergence Rate: 		         0.0161
         Final Residual: 		   1.637347e+03
         Total Reduction in Residual: 	   1.613277e-02
         Maximum Memory Usage: 		          0.902 GB
         --------------------------------------------------------------
Total Time: 0.0273681
    setup: 0.0240261 s
    solve: 0.00334202 s
    solve(per iteration): 0.00334202 s

@Jaberh (Author) commented Jun 19, 2020

Hi Marsaev,
Thank you for the detailed example.
I will give this a try. However, the standalone code works fine with numerous examples and configs; I verified those about a month ago. The issue is that when I integrate the interface and AMGX into a real industrial-scale code, for whatever reason, in parallel mode it does not perform iterations; the same config works in the unit test with no problem. I am trying to schedule a meeting with NVIDIA through our company, as I think it is easier to demonstrate what is going on in a meeting. I will let you know once I test this later today.

@marsaev (Collaborator) commented Jun 22, 2020

Is it possible to load this example matrix into your industrial code, to check whether the same solve can be repeated in your app?
I.e. if you had something like this on each rank:

AMGX_config_create_from_file( ... ,"AGGREGATION_JACOBI.json")
AMGX_matrix_upload_distributed(...)
AMGX_vector_upload(...)
AMGX_solver_setup(...)
AMGX_solver_solve(...)

replace it with:

AMGX_config_create_from_file( ... ,"AGGREGATION_JACOBI.json")
AMGX_read_system("2cubes_sphere.mtx")
AMGX_solver_setup(...)
AMGX_solver_solve(...)
AMGX_finalize(...)
exit(0)

and check that each rank solves this system identically and that each rank's AMGX output is similar to the output of the standalone example. If your distributed setup is homogeneous, there shouldn't be any major differences (IIRC a possible difference is the parallel reduction result, but it shouldn't affect the solve drastically).
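
In C-API terms the swap might look roughly like the sketch below (cfg, rsrc, A, b, x and solver stand for whatever handles the application already creates; only the upload step changes):

    AMGX_SAFE_CALL(AMGX_config_create_from_file(&cfg, "AGGREGATION_JACOBI.json"));
    /* ... create resources, matrix A, vectors b/x and the solver exactly as before ... */
    AMGX_SAFE_CALL(AMGX_read_system(A, b, x, "2cubes_sphere.mtx"));  /* replaces the app's own upload calls */
    AMGX_SAFE_CALL(AMGX_solver_setup(solver, A));
    AMGX_SAFE_CALL(AMGX_solver_solve(solver, b, x));
    AMGX_SAFE_CALL(AMGX_finalize());
    exit(0);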

If the output is somehow different - can you try to log every AMGX API call, just to check the order of calls and parameters? Sorry, there is no built-in logging of API calls, but if you wrap AMGX calls with error checking you can do something like:

#define AMGX_SAFE_CALL(rc) \
    std::cout << #rc << std::endl;    \
    if (AMGX_RC_OK != (rc)) .....


AMGX_SAFE_CALL( AMGX_config_create_from_file(...) );

@Jaberh (Author) commented Jun 23, 2020

Hi Marsaev,
I will work on this today. I have printed out all the return values from every function as per your suggestion; they are all 0s.
I will try the above matrix ASAP.

@marsaev (Collaborator) commented Jun 23, 2020

I have printed out all the return values as per your suggestion from every function, they are all 0's

It's great that there are no errors, but it would be great to see the actual order of the calls with their parameters, hence the macro expansion with the stringizing #rc.

@Jaberh (Author) commented Jun 23, 2020

Here is the output with parameters:
Warning: using only 1 of 2 available GPUs
AMGX_config_create_from_file(&m_config, param_file)
AMGX_resources_create(&m_resources, m_config, static_cast<void*>(&m_amgx_comm), m_max_device_per_host, (const int*)(&m_device_id))
AMGX_matrix_create(&m_matrix, m_resources, m_mode)
AMGX_vector_create(&m_rhs, m_resources, m_mode)
AMGX_vector_create(&m_solution, m_resources, m_mode)
AMGX_solver_create(&m_solver, m_resources, m_mode, m_config)
AMGX_matrix_comm_from_maps_one_ring(m_matrix, m_allocated_halo_depth, m_num_nbrs, m_nbrs, m_send_size, m_send_map, m_recv_size, m_recv_map)
AMGX_matrix_upload_all(m_matrix, m_n, m_nnz, m_block_dimx, m_block_dimy, (const int*)row_ptrs, (const int*)col_indices, (const double*)data, (const double*)diag_data)
AMGX_vector_bind(m_rhs, m_matrix)
AMGX_vector_bind(m_solution, m_matrix)
AMGX_vector_upload(m_rhs, m_n_plus_ghost, m_block_size, rhs)
AMGX_vector_upload(m_solution, m_n_plus_ghost, m_block_size, rhs)
AMGX_solver_setup(m_solver, m_matrix)
AMG Grid:
Number of Levels: 6
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 10000 49600 0.000496 0.000848
1(D) 4730 25612 0.00114 0.000781
2(D) 2182 13218 0.00278 0.000392
3(D) 1000 6524 0.00652 0.00019
4(D) 464 3058 0.0142 8.88e-05
5(D) 211 1365 0.0307 3.68e-05
--------------------------------------------------------------
Grid Complexity: 1.8587
Operator Complexity: 2.00357
Total Memory Usage: 0.00233688 GB
--------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.909485 9.969297e-02
--------------------------------------------------------------
Total Iterations: 0
Avg Convergence Rate: 1.000000
Final Residual: 9.969297e-02
Total Reduction in Residual: 1.000000e+00
Maximum Memory Usage: 0.909 GB
--------------------------------------------------------------
Total Time: 0.0314893
setup: 0.0293952 s
solve: 0.00209405 s
solve(per iteration): 0 s
again "0" iters

And here is the same case called from a unit test, and its output:
AMGX version 2.1.0.131-opensource
Built on Jun 6 2020, 20:32:19
Compiled with CUDA Runtime 10.2, using CUDA driver 10.2
m_rank 0
m_nRank 1
m_ndevice 2
m_nHost 1
Warning: using only 1 of 2 available GPUs
AMGX_config_create_from_file(&m_config, param_file)
AMGX_resources_create(&m_resources, m_config, static_cast<void*>(&m_amgx_comm), m_max_device_per_host, (const int*)(&m_device_id))
AMGX_matrix_create(&m_matrix, m_resources, m_mode)
AMGX_vector_create(&m_rhs, m_resources, m_mode)
AMGX_vector_create(&m_solution, m_resources, m_mode)
AMGX_solver_create(&m_solver, m_resources, m_mode, m_config)
sqrt 1
AMGX_matrix_comm_from_maps_one_ring(m_matrix, m_allocated_halo_depth, m_num_nbrs, m_nbrs, m_send_size, m_send_map, m_recv_size, m_recv_map)
AMGX_matrix_upload_all(m_matrix, m_n, m_nnz, m_block_dimx, m_block_dimy, (const int*)row_ptrs, (const int*)col_indices, (const double*)data, (const double*)diag_data)
AMGX_vector_bind(m_rhs, m_matrix)
AMGX_vector_bind(m_solution, m_matrix)
AMGX_vector_upload(m_rhs, m_n_plus_ghost, m_block_size, rhs)
AMGX_vector_upload(m_solution, m_n_plus_ghost, m_block_size, rhs)
AMGX_solver_setup(m_solver, m_matrix)
AMG Grid:
Number of Levels: 6
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 10000 49600 0.000496 0.000848
1(D) 4730 25612 0.00114 0.000781
2(D) 2182 13218 0.00278 0.000392
3(D) 1000 6524 0.00652 0.00019
4(D) 464 3058 0.0142 8.88e-05
5(D) 211 1365 0.0307 3.68e-05
--------------------------------------------------------------
Grid Complexity: 1.8587
Operator Complexity: 2.00357
Total Memory Usage: 0.00233688 GB
--------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.909485 9.969297e-02
0 0.909485 9.867779e-02 0.9898
1 0.9095 3.703747e-02 0.3753
2 0.9095 1.228826e-02 0.3318
3 0.9095 4.846841e-03 0.3944
4 0.9095 2.128492e-03 0.4392
5 0.9095 9.672328e-04 0.4544
6 0.9095 4.243034e-04 0.4387
7 0.9095 1.947642e-04 0.4590
8 0.9095 8.401128e-05 0.4313
9 0.9095 3.381623e-05 0.4025
10 0.9095 1.696738e-05 0.5018
11 0.9095 7.558131e-06 0.4455
12 0.9095 3.316206e-06 0.4388
13 0.9095 1.404204e-06 0.4234
14 0.9095 6.186404e-07 0.4406
15 0.9095 2.787829e-07 0.4506
16 0.9095 1.260950e-07 0.4523
17 0.9095 4.818015e-08 0.3821
18 0.9095 1.409761e-08 0.2926
19 0.9095 8.047044e-09 0.5708
20 0.9095 5.687245e-09 0.7067
21 0.9095 3.133990e-09 0.5511
22 0.9095 1.435876e-09 0.4582
23 0.9095 5.912854e-10 0.4118
24 0.9095 2.313654e-10 0.3913
25 0.9095 9.721457e-11 0.4202
26 0.9095 3.978193e-11 0.4092
27 0.9095 1.710693e-11 0.4300
28 0.9095 7.397181e-12 0.4324
29 0.9095 3.755933e-12 0.5078
30 0.9095 2.206292e-12 0.5874
31 0.9095 1.030078e-12 0.4669
32 0.9095 4.370271e-13 0.4243
33 0.9095 1.771387e-13 0.4053
34 0.9095 7.296272e-14 0.4119
--------------------------------------------------------------
Total Iterations: 35
Avg Convergence Rate: 0.4501
Final Residual: 7.296272e-14
Total Reduction in Residual: 7.318742e-13
Maximum Memory Usage: 0.909 GB
--------------------------------------------------------------
Total Time: 0.159129
setup: 0.0357844 s
solve: 0.123344 s
solve(per iteration): 0.00352412 s
stat of solve 0
AMGX_vector_download(m_solution, dest)
err 8.38997e-05
As you can see, the grid data as well as the initial residual are identical.

@marsaev (Collaborator) commented Jun 23, 2020

Just to check - are there AMGX_initialize() and AMGX_finalize() calls in the code?

@Jaberh (Author) commented Jun 24, 2020

Yes, a singleton class is responsible for initialization and cleanup; as this library will be used by other developers, I wanted to prevent multiple calls. I did not wrap initialize and finalize in the safe call, which is why they do not show up there.
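
A minimal sketch of such a guard (class and method names are illustrative, not taken from the user's code) could look like this:

    #include <amgx_c.h>

    class AmgxRuntime {
    public:
        static AmgxRuntime& instance() {   // first use initializes, process teardown finalizes
            static AmgxRuntime s;
            return s;
        }
    private:
        AmgxRuntime()  { AMGX_initialize(); AMGX_initialize_plugins(); }
        ~AmgxRuntime() { AMGX_finalize_plugins(); AMGX_finalize(); }
        AmgxRuntime(const AmgxRuntime&) = delete;
        AmgxRuntime& operator=(const AmgxRuntime&) = delete;
    };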

@marsaev (Collaborator) commented Jun 24, 2020

Alright, those API calls look good.

I made some progress today. There are nested try/catch blocks inside the code, and the real issue is caught within the code, but the error reported to the C API is incorrectly identified.

@Jaberh (Author) commented Jun 24, 2020

Do you suspect that this might be an issue related to something lower level than the programming, such as the build, CUDA installation, drivers, ...?

@marsaev (Collaborator) commented Jun 25, 2020

@Jaberh sorry, yesterday for some reason I couldn't add comments to this thread. I pushed a fix 7b4d431 to the v2.1.x branch. Can you try the update?

Answering your question - no, this is purely an AMGX bug.

@Jaberh (Author) commented Jun 25, 2020

Hi Marsaev, I am git pulling it right now and will let you know.
Thanks for all your support and the replies.

@Jaberh (Author) commented Jun 25, 2020

I built this one, which has the latest commit. Unfortunately nothing changed: the unit test still works fine and the integrated version does not iterate. Same output as above.

commit 7b4d431
Author: Marat Arsaev marsaev@nvidia.com
Date: Thu Jun 25 23:01:31 2020 +0300

Disabling deferred tasks

@marsaev (Collaborator) commented Jun 26, 2020

Got it.

Can you check what solver status is returned by AMGX_solver_get_status(solver, &status); after AMGX_solver_solve?
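
A minimal sketch of that check (status values per the AMGX_SOLVE_* enum in amgx_c.h):

    AMGX_SOLVE_STATUS status;
    AMGX_SAFE_CALL(AMGX_solver_solve(m_solver, m_rhs, m_solution));
    AMGX_SAFE_CALL(AMGX_solver_get_status(m_solver, &status));
    if (status == AMGX_SOLVE_SUCCESS)
        printf("solve reported success\n");
    else if (status == AMGX_SOLVE_DIVERGED)
        printf("solve diverged\n");
    else
        printf("solve did not converge (status %d)\n", (int)status);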

Also, can you try adding "solver_verbose": 1 for both the solver and the smoother in the config, so something like this for AGGREGATION_JACOBI:

{
    "config_version": 2, 
    "determinism_flag": 1, 
    "solver": {
        "print_grid_stats": 1, 
        "algorithm": "AGGREGATION", 
        "obtain_timings": 1, 
        "solver": "AMG", 
        "smoother": {
            "solver" : "BLOCK_JACOBI",
            "scope" : "jacobi",
            "solver_verbose" : 1
        },
        "print_solve_stats": 1, 
        "presweeps": 2, 
        "selector": "SIZE_2", 
        "convergence": "RELATIVE_MAX_CORE", 
        "coarsest_sweeps": 2, 
        "max_iters": 100, 
        "monitor_residual": 1, 
        "min_coarse_rows": 2, 
        "relaxation_factor": 0.75, 
        "scope": "main", 
        "max_levels": 1000, 
        "postsweeps": 2, 
        "tolerance": 0.1, 
        "norm": "L1", 
        "cycle": "V",
        "solver_verbose" : 1
    }
}

you should see something like this in the output:

---------------------------------------------------------------------------
Parameters for solver: AMG with scope name: main


AMG solver settings:
cycle_iters = 2
norm = L1
presweeps = 2
postsweeps = 2
max_levels = 1000
coarsen_threshold = 1
min_fine_rows = 1
min_coarse_rows = 2
coarse_solver_d: DENSE_LU_SOLVER with scope name default
coarse_solver_h: DENSE_LU_SOLVER with scope name default
max_iters = 100
scaling = NONE
norm = L1
convergence = RELATIVE_MAX_CORE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 1
print_grid_stats = 1
print_vis_data = 0
monitor_residual = 1
store_res_history = 0
obtain_timings = 1
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi

relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------

The info for jacobi should be repeated for each AMG level.

@Jaberh (Author) commented Jun 26, 2020

Hi Marat, sure. I have been checking the solver return value, which is 0. Here is the output using your new config file:
global initilizer called

AMGX version 2.1.0.131-opensource
Built on Jun 25 2020, 15:44:52
Compiled with CUDA Runtime 10.2, using CUDA driver 10.2
 m_rank  0
 m_nRank  1
 m_ndevice  2
 m_nHost  1
Warning:  using only 1 of 2 available GPUs
AMGX_config_create_from_file(&m_config, param_file)
AMGX_resources_create(&m_resources, m_config, static_cast<void*>(&m_amgx_comm), m_max_device_per_host, (const int*)(&m_device_id))
AMGX_matrix_create(&m_matrix, m_resources, m_mode)
AMGX_vector_create(&m_rhs, m_resources, m_mode)
AMGX_vector_create(&m_solution, m_resources, m_mode)
AMGX_solver_create(&m_solver, m_resources, m_mode, m_config)
 sqrt 1
AMGX_matrix_comm_from_maps_one_ring(m_matrix, m_allocated_halo_depth, m_num_nbrs, m_nbrs, m_send_size, m_send_map, m_recv_size, m_recv_map)
AMGX_matrix_upload_all(m_matrix, m_n, m_nnz, m_block_dimx, m_block_dimy, (const int*)row_ptrs, (const int*)col_indices, (const double*)data, (const double*)diag_data)
AMGX_vector_bind(m_rhs, m_matrix)
AMGX_vector_bind(m_solution, m_matrix)
AMGX_vector_upload(m_rhs, m_n_plus_ghost, m_block_size, rhs)
AMGX_vector_upload(m_solution, m_n_plus_ghost, m_block_size, rhs)
AMGX_solver_setup(m_solver, m_matrix)
---------------------------------------------------------------------------
Parameters for solver: AMG with scope name: main


AMG solver settings:
cycle_iters = 2
norm = L1
presweeps = 2
postsweeps = 2
max_levels = 1000
coarsen_threshold = 1
min_fine_rows = 1
min_coarse_rows = 2
coarse_solver_d: DENSE_LU_SOLVER with scope name default
coarse_solver_h: DENSE_LU_SOLVER with scope name default
max_iters = 100
scaling = NONE
norm = L1
convergence = RELATIVE_MAX_CORE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 1
print_grid_stats = 1
print_vis_data = 0
monitor_residual = 1
store_res_history = 0
obtain_timings = 1
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi

relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi

relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi

relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi

relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi

relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------

AMG Grid:
         Number of Levels: 6
            LVL         ROWS               NNZ    SPRSTY       Mem (GB)
         --------------------------------------------------------------
           0(D)        10000             49600  0.000496       0.000848
           1(D)         4730             25612   0.00114       0.000781
           2(D)         2182             13218   0.00278       0.000392
           3(D)         1000              6524   0.00652        0.00019
           4(D)          464              3058    0.0142       8.88e-05
           5(D)          211              1365    0.0307       3.68e-05
         --------------------------------------------------------------
         Grid Complexity: 1.8587
         Operator Complexity: 2.00357
         Total Memory Usage: 0.00233688 GB
         --------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini            0.911438   7.998657e+00
         --------------------------------------------------------------
         Total Iterations: 0
         Avg Convergence Rate: 		       1.000000
         Final Residual: 		   7.998657e+00
         Total Reduction in Residual: 	   1.000000e+00
         Maximum Memory Usage: 		          0.911 GB
         --------------------------------------------------------------
Total Time: 0.0246834
    setup: 0.0222985 s
    solve: 0.00238486 s
    solve(per iteration): 0 s
 **stat of solve 0**
AMGX_vector_download(m_solution, dest)
err 1.00176
 global destructor called


@marsaev (Collaborator) commented Jun 26, 2020

Looks all right.
You are saying that it does iterate with the same config with the same matrix in the unit test, but not in the app code, right?

@Jaberh (Author) commented Jun 26, 2020

Exactly.
Here is the same config for the unit test:

global initializer called.

AMGX version 2.1.0.131-opensource
Built on Jun 25 2020, 15:44:52
Compiled with CUDA Runtime 10.2, using CUDA driver 10.2
 m_rank  0
 m_nRank  1
 m_ndevice  2
 m_nHost  1
Warning:  using only 1 of 2 available GPUs
AMGX_config_create_from_file(&m_config, param_file)
AMGX_resources_create(&m_resources, m_config, static_cast<void*>(&m_amgx_comm), m_max_device_per_host, (const int*)(&m_device_id))
AMGX_matrix_create(&m_matrix, m_resources, m_mode)
AMGX_vector_create(&m_rhs, m_resources, m_mode)
AMGX_vector_create(&m_solution, m_resources, m_mode)
AMGX_solver_create(&m_solver, m_resources, m_mode, m_config)
 sqrt 1
AMGX_matrix_comm_from_maps_one_ring(m_matrix, m_allocated_halo_depth, m_num_nbrs, m_nbrs, m_send_size, m_send_map, m_recv_size, m_recv_map)
AMGX_matrix_upload_all(m_matrix, m_n, m_nnz, m_block_dimx, m_block_dimy, (const int*)row_ptrs, (const int*)col_indices, (const double*)data, (const double*)diag_data)
AMGX_vector_bind(m_rhs, m_matrix)
AMGX_vector_bind(m_solution, m_matrix)
AMGX_vector_upload(m_rhs, m_n_plus_ghost, m_block_size, rhs)
AMGX_vector_upload(m_solution, m_n_plus_ghost, m_block_size, rhs)
AMGX_solver_setup(m_solver, m_matrix)
---------------------------------------------------------------------------
Parameters for solver: AMG with scope name: main


AMG solver settings:
cycle_iters = 2
norm = L1
presweeps = 2
postsweeps = 2
max_levels = 1000
coarsen_threshold = 1
min_fine_rows = 1
min_coarse_rows = 2
coarse_solver_d: DENSE_LU_SOLVER with scope name default
coarse_solver_h: DENSE_LU_SOLVER with scope name default
max_iters = 100
scaling = NONE
norm = L1
convergence = RELATIVE_MAX_CORE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 1
print_grid_stats = 1
print_vis_data = 0
monitor_residual = 1
store_res_history = 0
obtain_timings = 1
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi

relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi

relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi

relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi

relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi

relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------

AMG Grid:
         Number of Levels: 6
            LVL         ROWS               NNZ    SPRSTY       Mem (GB)
         --------------------------------------------------------------
           0(D)        10000             49600  0.000496       0.000848
           1(D)         4730             25612   0.00114       0.000781
           2(D)         2182             13218   0.00278       0.000392
           3(D)         1000              6524   0.00652        0.00019
           4(D)          464              3058    0.0142       8.88e-05
           5(D)          211              1365    0.0307       3.68e-05
         --------------------------------------------------------------
         Grid Complexity: 1.8587
         Operator Complexity: 2.00357
         Total Memory Usage: 0.00233688 GB
         --------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini            0.911438   7.998657e+00
              0            0.911438   8.579094e+00         1.0726
              1              0.9114   8.166217e+00         0.9519
              2              0.9114   7.458036e+00         0.9133
              3              0.9114   6.721179e+00         0.9012
              4              0.9114   6.026819e+00         0.8967
              5              0.9114   5.392025e+00         0.8947
              6              0.9114   4.818757e+00         0.8937
              7              0.9114   4.304031e+00         0.8932
              8              0.9114   3.842848e+00         0.8928
              9              0.9114   3.430228e+00         0.8926
             10              0.9114   3.061298e+00         0.8924
             11              0.9114   2.731515e+00         0.8923
             12              0.9114   2.436909e+00         0.8921
             13              0.9114   2.173792e+00         0.8920
             14              0.9114   1.938851e+00         0.8919
             15              0.9114   1.729142e+00         0.8918
             16              0.9114   1.541975e+00         0.8918
             17              0.9114   1.374959e+00         0.8917
             18              0.9114   1.225957e+00         0.8916
             19              0.9114   1.093047e+00         0.8916
             20              0.9114   9.744872e-01         0.8915
             21              0.9114   8.687460e-01         0.8915
             22              0.9114   7.744420e-01         0.8914
         --------------------------------------------------------------
         Total Iterations: 23
         Avg Convergence Rate: 		         0.9035
         Final Residual: 		   7.744420e-01
         Total Reduction in Residual: 	   9.682150e-02
         Maximum Memory Usage: 		          0.911 GB
         --------------------------------------------------------------
Total Time: 0.0520445
    setup: 0.0229698 s
    solve: 0.0290748 s
    solve(per iteration): 0.00126412 s
 stat of solve 0
AMGX_vector_download(m_solution, dest)
err 0.0657659
 global destructor called

@marsaev (Collaborator) commented Jun 26, 2020

I don't have any ideas why this might be happening without hands-on access to the code, considering that the standalone execution functions properly.

Does it happen for both Release and Debug? If it happens in Debug - can you try tracing through the solve process with gdb and see where the early exit happens? The function of interest would be

    AMGX_STATUS Solver<TConfig>::solve(Vector<TConfig> &b, Vector<TConfig> &x,

and, in particular, this solve iteration loop:

    for (m_curr_iter = 0; m_curr_iter < m_max_iters && !done; ++m_curr_iter)
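
If stepping through the integrated build is awkward, temporary prints around that loop would show the same information (a sketch; the member names are taken from the loop above, the rest of the body is unchanged):

    printf("entering solve loop: m_max_iters=%d, done=%d\n", m_max_iters, (int)done);
    for (m_curr_iter = 0; m_curr_iter < m_max_iters && !done; ++m_curr_iter)
    {
        printf("iteration %d, done=%d\n", m_curr_iter, (int)done);
        /* ... existing loop body ... */
    }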

@Jaberh (Author) commented Jun 26, 2020

OK, this is very odd to me as well; I will build a debug version and monitor that function! It might be beneficial to share this with a build specialist at NVIDIA as well. I am building a debug version and digging into this. Thanks (spasiba).

@marsaev (Collaborator) commented Jun 26, 2020

For debugging purposes you can set the coarse solver and smoother to NOSOLVER, so that only the AMG solver goes through this code path. AMG would still iterate in that case, but the residual should not decrease - this should be enough to debug the issue of it not iterating at all. For example:

{
    "config_version": 2, 
    "determinism_flag": 1, 
    "solver": {
        "print_grid_stats": 1, 
        "algorithm": "AGGREGATION", 
        "obtain_timings": 1, 
        "solver": "AMG", 
        "smoother": {
            "solver" : "NOSOLVER",
            "scope" : "jacobi"
        },
        "coarse_solver" : 
        {
            "solver" : "NOSOLVER",
            "scope" : "dense"
        },
        "print_solve_stats": 1, 
        "presweeps": 2, 
        "selector": "SIZE_2", 
        "convergence": "RELATIVE_MAX_CORE", 
        "coarsest_sweeps": 2, 
        "max_iters": 5, 
        "monitor_residual": 1, 
        "min_coarse_rows": 2, 
        "relaxation_factor": 0.75, 
        "scope": "main", 
        "max_levels": 1000, 
        "postsweeps": 2, 
        "tolerance": 0.1, 
        "norm": "L1", 
        "cycle": "V"
    }
}

@Jaberh (Author) commented Jun 26, 2020

I quickly tried this, no luck. I will try this in the debugger as well; I was going to put a breakpoint in the loop you mentioned above.

         Number of Levels: 12
            LVL         ROWS               NNZ    SPRSTY       Mem (GB)
         --------------------------------------------------------------
           0(D)        10000             49600  0.000496       0.000848
           1(D)         4730             25612   0.00114       0.000781
           2(D)         2182             13218   0.00278       0.000392
           3(D)         1000              6524   0.00652        0.00019
           4(D)          464              3058    0.0142       8.88e-05
           5(D)          211              1365    0.0307       3.98e-05
           6(D)           97               609    0.0647       1.79e-05
           7(D)           45               269     0.133       8.03e-06
           8(D)           21               117     0.265       3.58e-06
           9(D)           10                50       0.5       1.59e-06
          10(D)            5                21      0.84       7.15e-07
          11(D)            2                 4         1       1.79e-07
         --------------------------------------------------------------
         Grid Complexity: 1.8767
         Operator Complexity: 2.02514
         Total Memory Usage: 0.00237192 GB
         --------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini            0.802063   7.998657e+00
         --------------------------------------------------------------
         Total Iterations: 0
         Avg Convergence Rate: 		       1.000000
         Final Residual: 		   7.998657e+00
         Total Reduction in Residual: 	   1.000000e+00
         Maximum Memory Usage: 		          0.802 GB
         --------------------------------------------------------------
Total Time: 0.158886
    setup: 0.14185 s
    solve: 0.0170352 s
    solve(per iteration): 0 s
 stat of solve 0

@Jaberh (Author) commented Jun 29, 2020

Hi Marat.
So I am trying to build this in debug mode so I can track down the iterations in cuda-gdb, but I get the following error:

/lib/../lib64/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_GOTPCREL against undefined symbol `__gmon_start__'
tools/centos/6/gcc/8.2.0/lib/gcc/x86_64-pc-linux-gnu/8.2.0/crtbeginS.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libamgxsh.so
crtstuff.c:(.text+0x16): relocation truncated to fit: R_X86_64_GOTPCREL against undefined symbol `_ITM_deregisterTMCloneTable'
/tools/centos/6/gcc/8.2.0/lib/gcc/x86_64-pc-linux-gnu/8.2.0/crtbeginS.o: In function `register_tm_clones':
crtstuff.c:(.text+0x33): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x3a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libamgxsh.so
crtstuff.c:(.text+0x57): relocation truncated to fit: R_X86_64_GOTPCREL against undefined symbol `_ITM_registerTMCloneTable'
/tools/centos/6/gcc/8.2.0/lib/gcc/x86_64-pc-linux-gnu/8.2.0/crtbeginS.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x72): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x7d): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /lib64/libc.so.6
crtstuff.c:(.text+0x8d): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /tools/centos/6/gcc/8.2.0/lib/gcc/x86_64-pc-linux-gnu/8.2.0/crtbeginS.o
crtstuff.c:(.text+0x99): additional relocation overflows omitted from the output
libamgxsh.so: PC-relative offset overflow in PLT entry for `_ZN9__gnu_cxx13new_allocatorISt13_Rb_tree_nodeISt4pairIKPN4amgx11CWrapHandleIP25AMGX_vector_handle_structNS3_6VectorINS3_14TemplateConfigIL16AMGX_MemorySpace1EL17AMGX_VecPrecision0EL17AMGX_MatPrecision1EL17AMGX_IndPrecision2EEEEEEESt10shared_ptrISF_EEEE8allocateEmPK

@marsaev (Collaborator) commented Jun 30, 2020

It seems that the produced code is too large for the linker to process.

There are a number of ways to reduce the amount of generated code, but that might be a little too adventurous :) I can try to provide you a debug build on centos:centos6.9 with devtoolset-8 - would that work? Which CUDA/MPI are you compiling against?
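
One low-effort way to shrink the generated code, if it helps, is to build the Debug library for a single GPU architecture only (a sketch; this assumes the CUDA_ARCH option described in the AMGX README, and "70" should be replaced by the architecture of the target GPU):

    cmake .. -DCMAKE_BUILD_TYPE=Debug -DCUDA_ARCH="70" -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++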

@Jaberh (Author) commented Jun 30, 2020

openmpi/3.1.3, cuda/10.2.89

@Jaberh (Author) commented Jun 30, 2020

Here is the comparison of what the unit test and the real code output when calling the same method:

AMG solver settings:
cycle_iters = 2
norm = L2
presweeps = 2
postsweeps = 2
max_levels = 1000
coarsen_threshold = 1
min_fine_rows = 1
min_coarse_rows = 2
coarse_solver_d: DENSE_LU_SOLVER with scope name default
coarse_solver_h: DENSE_LU_SOLVER with scope name default
max_iters = 2
scaling = NONE
norm = L2
convergence = RELATIVE_MAX_CORE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 1
print_grid_stats = 1
print_vis_data = 0
monitor_residual = 1
store_res_history = 1
obtain_timings = 1
---------------------------------------------------------------------------

AMG Grid:
         Number of Levels: 6
            LVL         ROWS               NNZ    SPRSTY       Mem (GB)
         --------------------------------------------------------------
           0(D)        10000             49600  0.000496       0.000848
           1(D)         4730             25612   0.00114       0.000781
           2(D)         2182             13218   0.00278       0.000392
           3(D)         1000              6524   0.00652        0.00019
           4(D)          464              3058    0.0142       8.88e-05
           5(D)          211              1365    0.0307       3.68e-05
         --------------------------------------------------------------
         Grid Complexity: 1.8587
         Operator Complexity: 2.00357
         Total Memory Usage: 0.00233688 GB
         --------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini            0.911438   9.969297e-02
 max iters ???????????? 2
called converged() line 295  monitor convergence 1
 converged():  1
 **done 1**
 ----------------------------------------------------------------------------------
AMG solver settings:
cycle_iters = 2 
norm = L2
presweeps = 2 
postsweeps = 2 
max_levels = 1000
coarsen_threshold = 1 
min_fine_rows = 1 
min_coarse_rows = 2 
coarse_solver_d: DENSE_LU_SOLVER with scope name default
coarse_solver_h: DENSE_LU_SOLVER with scope name default
max_iters = 2 
scaling = NONE
norm = L2
convergence = RELATIVE_MAX_CORE
solver_verbose= 1
use_scalar_norm = 0 
print_solve_stats = 1 
print_grid_stats = 1 
print_vis_data = 0 
monitor_residual = 1 
store_res_history = 1 
obtain_timings = 1 
---------------------------------------------------------------------------

AMG Grid:
         Number of Levels: 6
            LVL         ROWS               NNZ    SPRSTY       Mem (GB)
         --------------------------------------------------------------
           0(D)        10000             49600  0.000496       0.000848
           1(D)         4730             25612   0.00114       0.000781
           2(D)         2182             13218   0.00278       0.000392
           3(D)         1000              6524   0.00652        0.00019
           4(D)          464              3058    0.0142       8.88e-05
           5(D)          211              1365    0.0307       3.68e-05
         --------------------------------------------------------------
         Grid Complexity: 1.8587
         Operator Complexity: 2.00357
         Total Memory Usage: 0.00233688 GB
         --------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
           iter      Mem Usage (GB)       residual           rate
         --------------------------------------------------------------
            Ini            0.911438   9.969297e-02
 max iters ???????????? 2
called converged()  monitor convergence 1
 converged():  0
 **done 0**

I also noticed that this function

template<class TConfig>
bool AbsoluteConvergence<TConfig>::convergence_update_and_check(const PODVec_h &nrm, const PODVec_h &nrm_ini)
{
    printf("Check tolerance: %16.16lf norm_size %d\n", this->m_tolerance,nrm.size());
    bool res_converged = true;
    bool res_converged_rel = true;

    for (int i = 0; i < nrm.size(); i++)
    {   
        bool conv = nrm[i] < this->m_tolerance;
        res_converged = res_converged && conv;
        bool conv_rel = nrm[i] < Epsilon_conv<ValueTypeB>::value() * nrm_ini[i];
        res_converged_rel = res_converged_rel && conv_rel;
        printf("nrm %lf nrm_ini %lf Epsilon_conv %lf  \n", nrm[i], Epsilon_conv<ValueTypeB>::value(),nrm_ini[i]);
    }   

     //   printf("res_converged_rel %d  \n", res_converged_rel);
    
    if (res_converged_rel)
    {   
        std::stringstream ss; 
        ss << "Relative residual has reached machine precision" << std::endl;
        amgx_output(ss.str().c_str(), static_cast<int>(ss.str().length()));
        return true;
    }   

    return res_converged;
}
for both cases reports
Check tolerance: 0.0000000000000000 norm_size 0
Additionally, this method
`m_convergence->convergence_update_and_check(m_nrm, m_nrm_ini)`
returns true for the real code and false for the unit test with the same inputs.

Assuming that the initial residual is the same for both cases, converged() should return the same boolean for both; for the real code it considers the solve converged although the initial residual is huge.
If I remove the !done from the iteration loop, and hence force the given number of iterations, it converges to the desired tolerance.
Let me know what you think, thanks.

@Jaberh (Author) commented Jun 30, 2020

solver_verbose does not print out the tolerance, so I tracked down why that function returns different results. Apparently,
the tolerance is not set correctly while reading: it shows up as "tol1e+298", whereas the unit test reads it correctly as tol 1e-10.
Everything else is read correctly. I think it would be a good idea to add the tolerance to the solver_verbose items, since that would catch errors like this easily. That said, I am not sure why only this parameter is read incorrectly. If I hard-code the tolerance (this->m_tolerance = 1.e-10;) it converges.
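
Something along these lines would have caught it right away - a sketch only (the verbose flag name and the exact spot in AMGX are my assumptions; m_tolerance and amgx_output are the names from the snippet above):

    // Hypothetical: echo the tolerance the convergence object ended up with
    // when solver_verbose is on, so a mis-parsed value (1e+298 instead of
    // 1e-10) is visible immediately.
    if (verbose)   // assumed flag corresponding to solver_verbose
    {
        std::stringstream ss;
        ss << "tolerance = " << this->m_tolerance << std::endl;
        amgx_output(ss.str().c_str(), static_cast<int>(ss.str().length()));
    }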

@marsaev
Copy link
Collaborator

marsaev commented Jun 30, 2020

Great progress! Still, it would be good to understand where the incorrect read comes from - from the JSON parser, or whether the value is modified somewhere else. I have sent you a debug binary if you are willing to track this issue further using gdb.

@Jaberh
Copy link
Author

Jaberh commented Jun 30, 2020

Sure, I am happy to help debug the issue, as I am well past due to get this working.
Did you send the email?

@Jaberh
Copy link
Author

Jaberh commented Jul 1, 2020

Basically all the doubles are wrong - tolerance and relaxation. It should be related to
Type AMG_Config::getParameter(const string &name, const string &current_scope),
which returns doubles with extremely large exponents (e307, ...).
The JSON parser appears to parse correctly, since the issue is still there when I hard-code the config parameters.
Here is the output from the imported JSON object (the read itself goes through):
name relaxation_factor value 7.5e+307
Parsing parameter with name "max_levels" of type Number
Parsing as int
Parsing parameter with name "postsweeps" of type Number
Parsing as int
Parsing parameter with name "tolerance" of type Number
Parsing as double
name tolerance value 1e+298

    double GetDouble() const {
        RAPIDJSON_ASSERT(IsNumber());
        if ((flags_ & kDoubleFlag) != 0) return data_.n.d;           // exact type, no conversion
        if ((flags_ & kIntFlag)    != 0) return data_.n.i.i;         // int -> double
        if ((flags_ & kUintFlag)   != 0) return data_.n.u.u;         // unsigned -> double
        if ((flags_ & kInt64Flag)  != 0) return (double)data_.n.i64; // int64_t -> double (may lose precision)
        RAPIDJSON_ASSERT((flags_ & kUint64Flag) != 0);
        return (double)data_.n.u64;                                  // uint64_t -> double (may lose precision)
    }
Just to confirm that the parser input is OK, I printed the params string that is being processed in
AMGX_ERROR AMG_Config::parse_json_file(const char *filename); it is identical for both the unit test and the real code:
AMGX_config_create_from_file(&m_config, param_file)
 params {
    "config_version": 2, 
    "determinism_flag": 1, 
    "solver": {
        "print_grid_stats": 1, 
        "algorithm": "AGGREGATION", 
        "obtain_timings": 1, 
        "solver": "AMG", 
        "smoother": "BLOCK_JACOBI", 
        "print_solve_stats": 1, 
        "presweeps": 2, 
        "selector": "SIZE_2", 
        "convergence": "RELATIVE_MAX_CORE", 
        "coarsest_sweeps": 2, 
        "max_iters": 2, 
        "monitor_residual": 1, 
        "min_coarse_rows": 2, 
        "relaxation_factor": 0.75, 
        "scope": "main", 
        "max_levels": 1000, 
        "postsweeps": 2, 
        "tolerance":1e-10, 
        "norm": "L2", 
        "use_scalar_norm": 1,
        "cycle": "V",
         "store_res_history": 1,
        "solver_verbose" : 1 
    }
}

@Jaberh
Copy link
Author

Jaberh commented Jul 2, 2020

Hi Marat,
I have been debugging this further, and I noticed that the c_value being passed here is wrong:
void AMG_Config::setNamedParameter(const string &name, const double &c_value, const std::string &current_scope, const std::string &new_scope, ParamDesc::iterator &param_desc_iter).
The params string in parse_json_file is correct, so the problem is related either to json_parser.Parse<0>(params.c_str()) or to import_json_object(json_parser, true).

@marsaev
Copy link
Collaborator

marsaev commented Jul 2, 2020

Sorry, was away for a holiday.

I got your email from the GitHub commit logs, but I think it is wrong, since I got a not-delivered notification. You can get the binary here: https://drive.google.com/file/d/1qB1Q5SpqtsG54JVJrOst6S3lmFw56Lcd/view?usp=sharing

@marsaev
Copy link
Collaborator

marsaev commented Jul 2, 2020

So, the value in rapidjson::Value in import_json_object() is correct, but the actual value c_value that is passed to the setNamedParameter<double>() is wrong?

@Jaberh
Copy link
Author

Jaberh commented Jul 2, 2020

Here is the issue with rapidjson.

Below is the buggy function in rapidjson; I guess we have to debug third-party libs as well.
inline double Pow10(int n) {
static const double e[] = { // 1e-308...1e308: 617 * 8 bytes = 4936 bytes
1e-308,1e-307,1e-306,1e-305,1e-304,1e-303,1e-302,1e-301,1e-300,
1e-299,1e-298,1e-297,1e-296,1e-295,1e-294,1e-293,1e-292,1e-291,1e-290,1e-289,1e-288,1e-287,1e-286,1e-285,1e-284,1e-283,1e-282,1e-281,1e-280,
1e-279,1e-278,1e-277,1e-276,1e-275,1e-274,1e-273,1e-272,1e-271,1e-270,1e-269,1e-268,1e-267,1e-266,1e-265,1e-264,1e-263,1e-262,1e-261,1e-260,
1e-259,1e-258,1e-257,1e-256,1e-255,1e-254,1e-253,1e-252,1e-251,1e-250,1e-249,1e-248,1e-247,1e-246,1e-245,1e-244,1e-243,1e-242,1e-241,1e-240,
1e-239,1e-238,1e-237,1e-236,1e-235,1e-234,1e-233,1e-232,1e-231,1e-230,1e-229,1e-228,1e-227,1e-226,1e-225,1e-224,1e-223,1e-222,1e-221,1e-220,
1e-219,1e-218,1e-217,1e-216,1e-215,1e-214,1e-213,1e-212,1e-211,1e-210,1e-209,1e-208,1e-207,1e-206,1e-205,1e-204,1e-203,1e-202,1e-201,1e-200,
1e-199,1e-198,1e-197,1e-196,1e-195,1e-194,1e-193,1e-192,1e-191,1e-190,1e-189,1e-188,1e-187,1e-186,1e-185,1e-184,1e-183,1e-182,1e-181,1e-180,
1e-179,1e-178,1e-177,1e-176,1e-175,1e-174,1e-173,1e-172,1e-171,1e-170,1e-169,1e-168,1e-167,1e-166,1e-165,1e-164,1e-163,1e-162,1e-161,1e-160,
1e-159,1e-158,1e-157,1e-156,1e-155,1e-154,1e-153,1e-152,1e-151,1e-150,1e-149,1e-148,1e-147,1e-146,1e-145,1e-144,1e-143,1e-142,1e-141,1e-140,
1e-139,1e-138,1e-137,1e-136,1e-135,1e-134,1e-133,1e-132,1e-131,1e-130,1e-129,1e-128,1e-127,1e-126,1e-125,1e-124,1e-123,1e-122,1e-121,1e-120,
1e-119,1e-118,1e-117,1e-116,1e-115,1e-114,1e-113,1e-112,1e-111,1e-110,1e-109,1e-108,1e-107,1e-106,1e-105,1e-104,1e-103,1e-102,1e-101,1e-100,
1e-99, 1e-98, 1e-97, 1e-96, 1e-95, 1e-94, 1e-93, 1e-92, 1e-91, 1e-90, 1e-89, 1e-88, 1e-87, 1e-86, 1e-85, 1e-84, 1e-83, 1e-82, 1e-81, 1e-80,
1e-79, 1e-78, 1e-77, 1e-76, 1e-75, 1e-74, 1e-73, 1e-72, 1e-71, 1e-70, 1e-69, 1e-68, 1e-67, 1e-66, 1e-65, 1e-64, 1e-63, 1e-62, 1e-61, 1e-60,
1e-59, 1e-58, 1e-57, 1e-56, 1e-55, 1e-54, 1e-53, 1e-52, 1e-51, 1e-50, 1e-49, 1e-48, 1e-47, 1e-46, 1e-45, 1e-44, 1e-43, 1e-42, 1e-41, 1e-40,
1e-39, 1e-38, 1e-37, 1e-36, 1e-35, 1e-34, 1e-33, 1e-32, 1e-31, 1e-30, 1e-29, 1e-28, 1e-27, 1e-26, 1e-25, 1e-24, 1e-23, 1e-22, 1e-21, 1e-20,
1e-19, 1e-18, 1e-17, 1e-16, 1e-15, 1e-14, 1e-13, 1e-12, 1e-11, 1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1e+0,
1e+1, 1e+2, 1e+3, 1e+4, 1e+5, 1e+6, 1e+7, 1e+8, 1e+9, 1e+10, 1e+11, 1e+12, 1e+13, 1e+14, 1e+15, 1e+16, 1e+17, 1e+18, 1e+19, 1e+20,
1e+21, 1e+22, 1e+23, 1e+24, 1e+25, 1e+26, 1e+27, 1e+28, 1e+29, 1e+30, 1e+31, 1e+32, 1e+33, 1e+34, 1e+35, 1e+36, 1e+37, 1e+38, 1e+39, 1e+40,
1e+41, 1e+42, 1e+43, 1e+44, 1e+45, 1e+46, 1e+47, 1e+48, 1e+49, 1e+50, 1e+51, 1e+52, 1e+53, 1e+54, 1e+55, 1e+56, 1e+57, 1e+58, 1e+59, 1e+60,
1e+61, 1e+62, 1e+63, 1e+64, 1e+65, 1e+66, 1e+67, 1e+68, 1e+69, 1e+70, 1e+71, 1e+72, 1e+73, 1e+74, 1e+75, 1e+76, 1e+77, 1e+78, 1e+79, 1e+80,
1e+81, 1e+82, 1e+83, 1e+84, 1e+85, 1e+86, 1e+87, 1e+88, 1e+89, 1e+90, 1e+91, 1e+92, 1e+93, 1e+94, 1e+95, 1e+96, 1e+97, 1e+98, 1e+99, 1e+100,
1e+101,1e+102,1e+103,1e+104,1e+105,1e+106,1e+107,1e+108,1e+109,1e+110,1e+111,1e+112,1e+113,1e+114,1e+115,1e+116,1e+117,1e+118,1e+119,1e+120,
1e+121,1e+122,1e+123,1e+124,1e+125,1e+126,1e+127,1e+128,1e+129,1e+130,1e+131,1e+132,1e+133,1e+134,1e+135,1e+136,1e+137,1e+138,1e+139,1e+140,
1e+141,1e+142,1e+143,1e+144,1e+145,1e+146,1e+147,1e+148,1e+149,1e+150,1e+151,1e+152,1e+153,1e+154,1e+155,1e+156,1e+157,1e+158,1e+159,1e+160,
1e+161,1e+162,1e+163,1e+164,1e+165,1e+166,1e+167,1e+168,1e+169,1e+170,1e+171,1e+172,1e+173,1e+174,1e+175,1e+176,1e+177,1e+178,1e+179,1e+180,
1e+181,1e+182,1e+183,1e+184,1e+185,1e+186,1e+187,1e+188,1e+189,1e+190,1e+191,1e+192,1e+193,1e+194,1e+195,1e+196,1e+197,1e+198,1e+199,1e+200,
1e+201,1e+202,1e+203,1e+204,1e+205,1e+206,1e+207,1e+208,1e+209,1e+210,1e+211,1e+212,1e+213,1e+214,1e+215,1e+216,1e+217,1e+218,1e+219,1e+220,
1e+221,1e+222,1e+223,1e+224,1e+225,1e+226,1e+227,1e+228,1e+229,1e+230,1e+231,1e+232,1e+233,1e+234,1e+235,1e+236,1e+237,1e+238,1e+239,1e+240,
1e+241,1e+242,1e+243,1e+244,1e+245,1e+246,1e+247,1e+248,1e+249,1e+250,1e+251,1e+252,1e+253,1e+254,1e+255,1e+256,1e+257,1e+258,1e+259,1e+260,
1e+261,1e+262,1e+263,1e+264,1e+265,1e+266,1e+267,1e+268,1e+269,1e+270,1e+271,1e+272,1e+273,1e+274,1e+275,1e+276,1e+277,1e+278,1e+279,1e+280,
1e+281,1e+282,1e+283,1e+284,1e+285,1e+286,1e+287,1e+288,1e+289,1e+290,1e+291,1e+292,1e+293,1e+294,1e+295,1e+296,1e+297,1e+298,1e+299,1e+300,
1e+301,1e+302,1e+303,1e+304,1e+305,1e+306,1e+307,1e+308
};
RAPIDJSON_ASSERT(n <= 308);
return n < -308 ? 0.0 : e[n + 308];
}
Simply change this to pow(10, n) and it works.
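For reference, a minimal sketch of that replacement (my suggestion only, assuming <cmath> is available where pow10.h is included, and keeping the table version's behavior of returning 0.0 for exponents below -308):

#include <cmath>  // std::pow

inline double Pow10(int n) {
    RAPIDJSON_ASSERT(n <= 308);
    // Exponents below -308 underflow to 0.0, exactly like the lookup table did.
    return n < -308 ? 0.0 : std::pow(10.0, n);
}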

@marsaev
Copy link
Collaborator

marsaev commented Jul 2, 2020

Was the N passed to that function wrong? I wonder why it worked in one case but not another.

@Jaberh
Copy link
Author

Jaberh commented Jul 2, 2020

No, the real code uses a lot of static global state, so I think at some point it runs into trouble. I don't like the lookup table there; it's bad practice.
N is passed correctly - look at the fix, it does not alter N.

@marsaev
Copy link
Collaborator

marsaev commented Jul 28, 2020

I'm glad that you were able to identify the issue. I'm still not sure about the real reason for what's happening, but I agree that in our case the possible performance benefit of the lookup table is negligible and we can safely use pow. Since rapidjson is 3rd-party code, let me clarify a few things about making changes and perform a few tests.

@Jaberh
Copy link
Author

Jaberh commented Aug 6, 2020

Thanks for the follow-up. I have one more issue to resolve: in certain cases some of my ranks have 0 elements, which leads to a failure at matrix construction. What is the best way to go about this?
I can generate a communicator and include only the ranks that have non-zero elements, which adds some collective-call overhead,
or I can simulate it by pretending that a rank with zero elements has only one neighbor - itself. Since I don't know enough about AMGX under the hood, I would like your opinion on this; it happens frequently in our simulations due to lots of refinement/de-refinement. Also, does AMGX support ILU(k) (I see "ilu_sparsity_level")? I wonder if it has ILU by threshold as well?
Spasibo for your help.

By the way, the following might be better than good old pow:

#include <string>   // std::string, std::to_string, std::stod

inline double Pow10(int n) {
    RAPIDJSON_ASSERT(n <= 308);
    RAPIDJSON_ASSERT(n >= -308);
    // Build the literal "1e<n>" and let the standard library convert it.
    std::string tmp = "1e" + std::to_string(n);
    return std::stod(tmp, nullptr);
}

@Jaberh
Copy link
Author

Jaberh commented Aug 7, 2020

One more question: is it possible to disable the printout of "Using Normal MPI (Hostbuffer) communicator..."?
It is unnecessary for realistic big-case runs and just clutters the log file. Thanks again for your support.

@marsaev
Copy link
Collaborator

marsaev commented Aug 7, 2020

Yep, will move it to higher verbosity level.

Thanks,

@Jaberh
Copy link
Author

Jaberh commented Aug 10, 2020

Hi Marat,
I had one more question from the previous comment, the most important one:
I have one more issue to resolve: for certain cases some of my ranks have 0 elements, which leads to a failure at matrix construction. What is the best way to go about this?
I can generate a communicator and include only the ranks that have non-zero elements, which adds some collective-call overhead,
or I can simulate it by pretending that a rank with zero elements has only one neighbor - itself. Since I don't know enough about AMGX under the hood, I would like your opinion on this; it happens frequently in our simulations due to lots of refinement/de-refinement. Can AMGX handle solving several disconnected graphs? In my example it deadlocks. I could get it to work by defining a new MPI communicator, though!
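
For reference, a minimal sketch (outside AMGX, with hypothetical variable names) of the communicator approach - only the ranks that own rows end up in the new communicator and would create AMGX resources:

#include <mpi.h>

// local_num_rows is an assumed per-rank row count from the application.
int my_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

MPI_Comm active_comm = MPI_COMM_NULL;
int color = (local_num_rows > 0) ? 0 : MPI_UNDEFINED;   // empty ranks get MPI_COMM_NULL
MPI_Comm_split(MPI_COMM_WORLD, color, my_rank, &active_comm);

if (active_comm != MPI_COMM_NULL)
{
    // ... create AMGX resources on active_comm, upload the local matrix, solve ...
    MPI_Comm_free(&active_comm);
}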

@marsaev
Copy link
Collaborator

marsaev commented Aug 11, 2020

We don't handle such cases specifically, so the result would be unpredictable.
Are there many such ranks? How often does the set of ranks with zero elements change?
I would guess that the cumulative performance penalty, and where those penalties occur, depends on the specifics of your problem and the way you call AMGX, but both approaches should work! If you need a working solution right now, it would probably be a good idea to try both suggestions from outside AMGX. But you are right - the more balanced the amount of data per GPU, the more throughput is achievable - which is one of the reasons why solving wells/reservoir together in a single matrix would be a bad idea.
My wild guess is that no critical infrastructure changes are needed to support this case, but some scoping is still required to see what's going on. If support for this case is ever implemented, it will be transparent to the user - just provide 0 for the rank's number of elements.

@Jaberh
Copy link
Author

Jaberh commented Aug 11, 2020

I think the most robust way to handle this is via communicators; it happens a lot in internal combustion engine simulations,
as the mesh size in different phases changes drastically. Thanks for your feedback and support.
We will soon try some realistic cases on a leadership-class cluster.

@marsaev
Copy link
Collaborator

marsaev commented Aug 11, 2020

Yes, this definitely would be robust with respect to the solver; however, there would also be a cost for creating/releasing communicators for each such change. We also have plans to experiment with some additional logic behind communications and exploiting topology - this might add extra load to the initialization for each MPI communicator.

@marsaev
Copy link
Collaborator

marsaev commented Aug 12, 2020

I'll close this issue for now. Please let me know if you have other questions.

@marsaev marsaev closed this as completed Aug 12, 2020
@Jaberh
Copy link
Author

Jaberh commented Aug 13, 2020

I have one more question.
So, the solver supports multiple independent solves (for example, solving the solid and the fluid in the same setup - this is not related to CUDA multi-streaming). I have an interface class and therefore construct several objects using different configs. The solution is correct, but now that I free the resources (although they belong to different instances of the same class), I get an error in the clean-up phase, as follows.
If I have a single object there are no issues.

AMGX_solver_destroy() 
!!! detected some memory leaks in the code: trying to free non-empty temporary device pool !!!

If I comment out AMGX_SAFE_CALL(AMGX_resources_destroy(m_resources)); then the error changes to the following, which makes sense, as it detects the non-freed resources handle.

*** Process received signal ***
 Signal: Segmentation fault (11)
Signal code:  (128)
Failing at address: (nil)
[ 0] /lib64/libpthread.so.0(+0xf630)[0x7f85af5b5630]
[ 1] /lib64/libcuda.so.1(+0x1f3b8d)[0x7f856b8e1b8d]
[ 2] /lib64/libcuda.so.1(+0x1ddbc7)[0x7f856b8cbbc7]
[ 3] /lib64/libcuda.so.1(+0xf9b4b)[0x7f856b7e7b4b]
[ 4] /lib64/libcuda.so.1(cuEventDestroy_v2+0x59)[0x7f856b969ae9]
[ 5] centos/7/nvidia/cuda/10.2.89/lib64/libcublas.so.10(+0x5e8dd0)[0x7f858178cdd0]
[ 6] /tools/centos/7/nvidia/cuda/10.2.89/lib64/libcublas.so.10(+0x61cea4)[0x7f85817c0ea4]
[ 7] /tools/centos/7/nvidia/cuda/10.2.89/lib64/libcublas.so.10(+0x29aad)[0x7f85811cdaad]
[ 8] /tools/centos/7/nvidia/cuda/10.2.89/lib64/libcublas.so.10(+0x2ae16)[0x7f85811cee16]
[ 9] /centos/7/nvidia/cuda/10.2.89/lib64/libcublas.so.10(cublasDestroy_v2+0xe7)[0x7f858125cf77]
[10] libamgxsh.so(_ZN4amgx6Cublas14destroy_handleEv+0x25)[0x7f8588d15085]
[11] libamgxsh.so(_ZN4amgx9ResourcesD1Ev+0x5d)[0x7f8588d14ead]
[12] lib/libamgxsh.so(_ZNSt15_Sp_counted_ptrIPN4amgx9ResourcesELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv+0x12)[0x7f8587dcff32]
[13] libamgxsh.so(_ZNSt15_Sp_counted_ptrIPN4amgx11CWrapHandleIP28AMGX_resources_handle_structNS0_9ResourcesEEELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv+0xba)[0x7f8587dd308a]
[14] libamgxsh.so(_ZNSt8_Rb_treeIPN4amgx11CWrapHandleIP28AMGX_resources_handle_structNS0_9ResourcesEEESt4pairIKS6_St10shared_ptrIS5_EESt10_Select1stISB_ESt4lessIS6_ESaISB_EE8_M_eraseEPSt13_Rb_tree_nodeISB_E+
[15] libamgxsh.so(_ZN4amgx10MemManagerIJNS_11CWrapHandleIP28AMGX_resources_handle_structNS_9ResourcesEEEEED1Ev+0x2c)[0x7f8587e42e5c]
[16] /lib64/libc.so.6(+0x39ce9)[0x7f8586377ce9]
[17] /lib64/libc.so.6(+0x39d37)[0x7f8586377d37]
[18] /lib64/libc.so.6(__libc_start_main+0xfc)[0x7f858636055c]
[19]
*** End of error message ***

Hopefully this is the last test case; I would like your advice on this before debugging further.
I highly doubt that the order of destruction is the issue, since that would also affect the single-object case, but I am listing it here for the sake of completeness:

      AMGX_SAFE_CALL(AMGX_vector_destroy(m_rhs));
      AMGX_SAFE_CALL(AMGX_vector_destroy(m_solution));
      AMGX_SAFE_CALL(AMGX_matrix_destroy(m_matrix));
      AMGX_SAFE_CALL(AMGX_solver_destroy(m_solver));
      AMGX_SAFE_CALL(AMGX_resources_destroy(m_resources));
      AMGX_SAFE_CALL(AMGX_config_destroy(m_config));

It is probably related to:

      size_t n_erased = get_mode_bookkeeper<Envelope>().erase(envl);
      bool flag = get_mem_manager<LetterW>().template free<LetterW>(letter);
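
For context, here is a simplified sketch of the multi-object setup described above (hypothetical handle and file names, per-instance members flattened into locals); with more than one object, the per-object clean-up at the end is where the error above appears:

AMGX_config_handle    cfg_fluid, cfg_solid;
AMGX_resources_handle rsrc_fluid, rsrc_solid;
AMGX_solver_handle    slv_fluid, slv_solid;

AMGX_SAFE_CALL(AMGX_config_create_from_file(&cfg_fluid, "FLUID.json"));
AMGX_SAFE_CALL(AMGX_config_create_from_file(&cfg_solid, "SOLID.json"));
AMGX_SAFE_CALL(AMGX_resources_create_simple(&rsrc_fluid, cfg_fluid));
AMGX_SAFE_CALL(AMGX_resources_create_simple(&rsrc_solid, cfg_solid));
AMGX_SAFE_CALL(AMGX_solver_create(&slv_fluid, rsrc_fluid, AMGX_mode_dDDI, cfg_fluid));
AMGX_SAFE_CALL(AMGX_solver_create(&slv_solid, rsrc_solid, AMGX_mode_dDDI, cfg_solid));

// ... matrix/vector setup and solves for both systems ...

// clean-up of the first object
AMGX_SAFE_CALL(AMGX_solver_destroy(slv_fluid));
AMGX_SAFE_CALL(AMGX_resources_destroy(rsrc_fluid));
AMGX_SAFE_CALL(AMGX_config_destroy(cfg_fluid));
// clean-up of the second object - with more than one object, this phase
// produces the "non-empty temporary device pool" message / segfault above
AMGX_SAFE_CALL(AMGX_solver_destroy(slv_solid));
AMGX_SAFE_CALL(AMGX_resources_destroy(rsrc_solid));
AMGX_SAFE_CALL(AMGX_config_destroy(cfg_solid));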
