Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nvc++: integer conversion resulted in a change of sign #229

Closed
marioroy opened this issue Jan 6, 2024 · 9 comments
Closed

nvc++: integer conversion resulted in a change of sign #229

marioroy opened this issue Jan 6, 2024 · 9 comments

Comments

@marioroy
Copy link

marioroy commented Jan 6, 2024

I tried nvc++ (NVIDIA HPC SDK) and thought to pass this along. I ran git pull and have the latest from master.

$ nvc++ -o llil4hmap -std=c++20 -fopenmp -Wall -O3 llil4hmap.cc -I./parallel-hashmap
"./parallel-hashmap/parallel_hashmap/phmap.h", line 434: warning: integer conversion resulted in a change of sign [integer_sign_change]
          auto special = _mm_set1_epi8(static_cast<uint8_t>(kSentinel));
                                       ^
Remark: individual warnings can be suppressed with "--diag_suppress <warning-name>"

"./parallel-hashmap/parallel_hashmap/phmap.h", line 442: warning: integer conversion resulted in a change of sign [integer_sign_change]
          auto special = _mm_set1_epi8(static_cast<uint8_t>(kSentinel));
                                       ^

The llil4hmap.cc demonstration is found in my gist repo.

https://gist.github.com/marioroy/3924c48e140f8330f25f67cd98a815ef

@marioroy
Copy link
Author

marioroy commented Jan 9, 2024

I reached out to NVIDIA. The parallel-hashmap usage in llil4hmap is not a typical pattern, but something I tried out of curiosity at the time. I replied to try llil4map.cc instead for investigating why nvc++ is noticeably slower compared to clang++.

https://forums.developer.nvidia.com/t/nvfortran-23-9-problem/270618/13?u=marioeroy

@greg7mdp
Copy link
Owner

greg7mdp commented Jan 9, 2024

Thank you @marioroy , I'm very busy right now, but I'll look into this when I can I promise.

@easypickings
Copy link

Have met the same warnings with nvcc. Seems simply including the header will trigger the warning:

#include <phmap.h>

int main() {}

Compiled with nvcc -Iinclude/parallel_hashmap test.cu -o test. Warnings:

include/parallel_hashmap/phmap.h(434): warning #68-D: integer conversion resulted in a change of sign

include/parallel_hashmap/phmap.h(442): warning #68-D: integer conversion resulted in a change of sign

include/parallel_hashmap/phmap.h(434): warning #68-D: integer conversion resulted in a change of sign

include/parallel_hashmap/phmap.h(442): warning #68-D: integer conversion resulted in a change of sign

@marioroy
Copy link
Author

marioroy commented Jan 25, 2024

I tried the suggestion, emitted by nvc++. Individual warnings can be suppressed with --diag_suppress <warning-name>.

nvc++ --diag_suppress=integer_sign_change ...

I also tried nvc++ to build a Python extension, Accelerating Python on GPUs with nvc++ and Cython.

nvc++ --diag_suppress=declared_but_not_referenced,identifier_not_keyword,set_but_not_used ...

greg7mdp added a commit that referenced this issue Feb 4, 2024
@greg7mdp
Copy link
Owner

greg7mdp commented Feb 4, 2024

Warnings are fixed with latest version.

@greg7mdp greg7mdp closed this as completed Feb 4, 2024
@marioroy marioroy changed the title integer conversion resulted in a change of sign nvc++: integer conversion resulted in a change of sign Mar 30, 2024
@marioroy
Copy link
Author

marioroy commented Mar 30, 2024

Hi, @greg7mdp

The NVIDIA nvc++ compiler is interesting. I fixed the three map demonstrations llil4map.cc, llil4hmap.cc, and llil4emh.cc. No more slowness using nvc++.

  1. Changed std::mutex to included spinlock_mutex. Noticeably better performance. However, this resolves nvc++ taking greater than 40 seconds to compile llil4hmap.cc and llil4emh.cc.
  2. Fixed out_properties. This ran poorly using nvc++, taking > 4 seconds. The program built with clang++ outputs in 0.7 seconds. It turns out that allocating a basic string inside a parallel loop causes nvc++ OpenMP to perform worst than non-parallel.
#ifdef MAX_STR_LEN_L
      // std::basic_string<char> s { it->first.data(), MAX_STR_LEN_L };
      // str.append(s.c_str());
         str.append(it->first.data());
#else
         str.append(it->first.data(), it->first.size());
#endif

Results:

llil4map consumes the least memory. The llil4hmap demonstration is similar to llil4map, but computes the hash_value one time per key. This was done out of curiosity. The llil4emh demonstration uses emhash7::HashMap for comparison.

$ NUM_THREADS=60 ./llil4map in/big* in/big* in/big* | cksum
llil4map (fixed string length=12) start
use OpenMP
use boost sort        g++            clang++        nvc++
get properties         7.864 secs     7.866 secs     7.973 secs
map to vector          1.140 secs     0.881 secs     0.876 secs
vector stable sort     1.681 secs     1.098 secs     1.105 secs
write stdout           0.624 secs     0.595 secs     0.590 secs
total time            11.311 secs    10.441 secs    10.546 secs
    count lines     970195200
    count unique    200483043
2057246516 1811140689

$ NUM_THREADS=60 ./llil4hmap in/big* in/big* in/big* | cksum
llil4hmap (fixed string length=12) start
use OpenMP
use boost sort        g++            clang++        nvc++
get properties         7.245 secs     7.197 secs     7.617 secs
map to vector          1.179 secs     0.841 secs     0.809 secs
vector stable sort     1.682 secs     1.103 secs     1.113 secs
write stdout           0.647 secs     0.573 secs     0.571 secs
total time            10.754 secs     9.715 secs    10.112 secs
    count lines     970195200
    count unique    200483043
2057246516 1811140689

$ NUM_THREADS=60 ./llil4emh in/big* in/big* in/big* | cksum
llil4emh (fixed string length=12) start
use OpenMP
use boost sort        g++            clang++        nvc++
get properties         6.150 secs     6.250 secs     6.392 secs
map to vector          1.016 secs     0.877 secs     0.832 secs
vector stable sort     1.681 secs     1.127 secs     1.094 secs
write stdout           0.608 secs     0.561 secs     0.595 secs
total time             9.457 secs     8.817 secs     8.914 secs
    count lines     970195200
    count unique    200483043
2057246516 1811140689

Thank you for the tip to reclaim memory.

MyMap().swap(map); // swap map with an empty temporary, which is immediately destroyed

@greg7mdp
Copy link
Owner

Thanks, interesting data, emhash seems to be pretty good. Personally, I'm not really focusing on benchmarks, but more on having the hash map perform well in most cases.

@marioroy
Copy link
Author

marioroy commented Mar 30, 2024

Previously, the program ran poorly using nvc++, particularly "write stdout". The spinlock_mutex sped up "get properties".

$ ./llil4hmap /data1/input/big* | cksum
llil4hmap (fixed string length=12) start
use OpenMP
use boost sort        Before          After
get properties         3.804 secs      2.797 secs
hmap to vector         0.697 secs      0.792 secs
vector stable sort     1.104 secs      1.148 secs
write stdout           5.071 secs      0.564 secs
total time            10.678 secs      5.302 secs
    count lines     323398400
    count unique    200483043
701308064 1804347429

@greg7mdp
Copy link
Owner

That's great, thanks for doing that. Maybe I should add the spinlock_mutex to phmap so people can easily try it out?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants