Fix limiter_node decrementer #647

Merged
merged 8 commits into from
Dec 13, 2021

@Iliamish Iliamish commented Nov 10, 2021

Description

There is a logical race when one node calls try_put on the limiter_node at the same time as another node calls try_put on the limiter_node's decrementer. To handle it, we need to count the attempts to decrease my_count that arrived while the corresponding messages were still in the my_tries stage.
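
A minimal sketch of this accounting, for illustration only. It is not the oneTBB implementation: the class name limiter_counters, the methods begin_forward, decrement and count, and the member my_threshold are assumptions made for the example, and std::mutex stands in for the spin_mutex used by the node. Only my_count, my_tries and my_future_decrement are names taken from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the counters of limiter_node; not the oneTBB class.
class limiter_counters {
    std::mutex my_mutex;                  // the real node uses a spin_mutex
    const std::size_t my_threshold;       // assumed name for the limit
    std::size_t my_count = 0;             // forwarded messages not yet decremented
    std::size_t my_tries = 0;             // forwards currently in flight
    std::size_t my_future_decrement = 0;  // decrements that arrived too early

public:
    explicit limiter_counters(std::size_t threshold) : my_threshold(threshold) {}

    // Called before forwarding a message: reserve a slot if the limit allows it.
    bool begin_forward() {
        std::lock_guard<std::mutex> lock(my_mutex);
        if (my_count + my_tries >= my_threshold) return false;
        ++my_tries;
        return true;
    }

    // Called when the successor accepted the message.
    void account_successful_forward() {
        std::lock_guard<std::mutex> lock(my_mutex);
        assert(my_tries > 0);
        ++my_count;
        if (my_future_decrement) {
            // Apply decrements that were recorded while this forward was in flight.
            if (my_count > my_future_decrement) {
                my_count -= my_future_decrement;
                my_future_decrement = 0;
            } else {
                my_future_decrement -= my_count;
                my_count = 0;
            }
        }
        --my_tries;
    }

    // Called through the decrementer port.
    void decrement(std::size_t delta) {
        std::lock_guard<std::mutex> lock(my_mutex);
        if (delta > my_count) {
            // Part of the decrement may refer to forwards that are still in
            // flight: remember it instead of dropping it.
            if (my_tries > 0) my_future_decrement += delta - my_count;
            my_count = 0;
        } else {
            my_count -= delta;
        }
    }

    std::size_t count() {
        std::lock_guard<std::mutex> lock(my_mutex);
        return my_count;
    }
};
```

In this model, a decrement that arrives between begin_forward and account_successful_forward is remembered in my_future_decrement and applied once the forward completes, so the limiter does not stay stuck at its threshold.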

Fixes #634

  • git commit message contains appropriate signed-off-by string (see CONTRIBUTING.md for details)

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add respective label(s) to PR if you have permissions

  • bug fix - change which fixes an issue
  • new feature - change which adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and for some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information

Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>
Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>
Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>
Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>
@Iliamish Iliamish self-assigned this Nov 11, 2021
Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>

@aleksei-fedotov aleksei-fedotov left a comment

I think two questions need to be resolved before this patch gets merged:

  1. Align the whitespace so that it matches the style of the surrounding code.
  2. Resolve the binary compatibility questions that we discussed offline.

Otherwise, the patch looks good to me.

Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>
@Iliamish Iliamish force-pushed the dev/iliamish/limiter_node_decrementer_fix branch from 75f1373 to f51fb90 Compare November 12, 2021 11:09
@phprus
phprus commented Nov 13, 2021

@alexey-katranov
This PR also fixes issues #342 and #489.

Workaround is not needed:

Interesting correlation, what gcc version have you tried? I do not think gcc generated incorrect code due to some race condition in the code.

@phprus
phprus commented Nov 17, 2021

This PR also fixes issues #342 and #489.
Workaround is not needed:

Interesting correlation, what gcc version have you tried? I do not think gcc generated incorrect code due to some race condition in the code.

All tests pass after replacing the workaround with:

template< typename Sender, typename Receiver >
void make_edge_impl(Sender& sender, Receiver& receiver){
//#if __GNUC__ < 12 && !TBB_USE_DEBUG
//    // Seemingly, GNU compiler generates incorrect code for the call of limiter.register_successor in release (-O3)
//    // The function pointer to make_edge workarounds the issue for unknown reason
//    auto make_edge_ptr = tbb::flow::make_edge<int>;
//    make_edge_ptr(sender, receiver);
//#else
    tbb::flow::make_edge(sender, receiver);
//#endif
}

Compilers:

  • gcc version 7.5.0 (SUSE Linux)
  • gcc version 8.2.1 20180831 [gcc-8-branch revision 264010] (SUSE Linux)

Without this PR, ~100% of launches freeze.

Serov, Vladimir added 2 commits November 22, 2021 09:29
Signed-off-by: Serov, Vladimir <vladimir.serov@intel.com>
Signed-off-by: Serov, Vladimir <vladimir.serov@intel.com>
@phprus
phprus commented Nov 22, 2021

Commit: f620936
Branch: dev/iliamish/limiter_node_decrementer_fix

GCC versions:

phprus@phprus:~/devel/oneTBB/oneTBB-f620936cff7573a6d631fc57c152f1f48374463a/build> g++-7 -v
Using built-in specs.
COLLECT_GCC=g++-7
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/7/lto-wrapper
OFFLOAD_TARGET_NAMES=hsa:nvptx-none
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,ada,go --enable-offload-targets=hsa,nvptx-none, --without-cuda-driver --enable-checking=release --disable-werror --with-gxx-include-dir=/usr/include/c++/7 --enable-ssp --disable-libssp --disable-libvtv --disable-libcc1 --disable-plugin --with-bugurl=https://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --with-slibdir=/lib64 --with-system-zlib --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-7 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
gcc version 7.5.0 (SUSE Linux)
phprus@phprus:~/devel/oneTBB/oneTBB-f620936cff7573a6d631fc57c152f1f48374463a/build> g++-8 -v
Using built-in specs.
COLLECT_GCC=g++-8
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=hsa:nvptx-none
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,fortran,ada,go --enable-offload-targets=hsa,nvptx-none=/usr/nvptx-none, --without-cuda-driver --enable-checking=release --disable-werror --with-gxx-include-dir=/usr/include/c++/8 --enable-ssp --disable-libssp --disable-libvtv --disable-cet --disable-libcc1 --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-8 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
gcc version 8.2.1 20180831 [gcc-8-branch revision 264010] (SUSE Linux)
phprus@phprus:~/devel/oneTBB/oneTBB-f620936cff7573a6d631fc57c152f1f48374463a/build> g++-9 -v
Using built-in specs.
COLLECT_GCC=g++-9
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/9/lto-wrapper
OFFLOAD_TARGET_NAMES=hsa:nvptx-none
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,fortran,ada,go --enable-offload-targets=hsa,nvptx-none=/usr/nvptx-none, --without-cuda-driver --disable-werror --with-gxx-include-dir=/usr/include/c++/9 --enable-ssp --disable-libssp --disable-libvtv --disable-cet --disable-libcc1 --disable-plugin --with-bugurl=https://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --with-slibdir=/lib64 --with-system-zlib --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-9 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
gcc version 9.3.1 20200406 [revision 6db837a5288ee3ca5ec504fbd5a765817e556ac2] (SUSE Linux)
phprus@phprus:~/devel/oneTBB/oneTBB-f620936cff7573a6d631fc57c152f1f48374463a/build> g++-10 -v
Using built-in specs.
COLLECT_GCC=g++-10
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,ada,go,d --enable-offload-targets=nvptx-none=/usr/nvptx-none,amdgcn-amdhsa=/usr/amdgcn-amdhsa, --without-cuda-driver --enable-checking=release --disable-werror --with-gxx-include-dir=/usr/include/c++/10 --enable-ssp --disable-libssp --disable-libvtv --enable-cet=auto --disable-libcc1 --disable-plugin --with-bugurl=https://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --with-slibdir=/lib64 --with-system-zlib --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-libphobos --enable-version-specific-runtime-libs --with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-10 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.1 20200825 [revision c0746a1beb1ba073c7981eb09f55b3d993b32e5c] (SUSE Linux)
phprus@phprus:~/devel/oneTBB/oneTBB-f620936cff7573a6d631fc57c152f1f48374463a/build>

CMake (3.17.0):

CC=gcc-7  CXX=g++-7  cmake -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=17 ../..
CC=gcc-8  CXX=g++-8  cmake -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=17 ../..
CC=gcc-9  CXX=g++-9  cmake -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=17 ../..
CC=gcc-10 CXX=g++-10 cmake -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=17 ../..

11 runs of each test set with each compiler.
All completed without errors.

@phprus
phprus commented Dec 2, 2021

Do you have any updates?

@vlserov vlserov assigned vlserov and unassigned Iliamish Dec 7, 2021

@aleksei-fedotov aleksei-fedotov left a comment

The code looks good to me!
Consider implementing the suggestion, though.

Comment on lines 2104 to 2116
    spin_mutex::scoped_lock lock(my_mutex);
    ++my_count;
    if ( my_future_decrement ) {
        if ( my_count > my_future_decrement ) {
            my_count -= my_future_decrement;
            my_future_decrement = 0;
        }
        else {
            my_future_decrement -= my_count;
            my_count = 0;
        }
    }
    --my_tries;

This part of the code is a complete (both semantic and syntactic) duplicate of lines 1951-1963 in the newer version of the same file. I would recommend moving it into a separate limiter_node-specific helper method with a corresponding name, e.g., account_successful_forward.

There is also similar (slightly incomplete) duplication in the handling of a forwarding failure: see lines 2094-2101 and 1983-1994 in the newer version of the file. Please consider moving that code as well.

Of course, this is mainly for better readability and maintainability in the future, and not a showstopper for now. If there is no time or desire to implement this "small" improvement, consider leaving a TODO comment, something like: "// TODO: consider moving duplicated logic about processing of successful and unsuccessful task forward into separate functions."
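
A rough sketch of what the suggested helper could look like, under stated assumptions: limiter_state is a hypothetical plain struct used here only to make the example self-contained, and the caller is assumed to already hold my_mutex. The body mirrors the lines commented on above; both duplicated call sites would then reduce to a single call.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plain-struct view of the three counters; in the real code
// these are members of limiter_node.
struct limiter_state {
    std::size_t my_count;
    std::size_t my_tries;
    std::size_t my_future_decrement;
};

// Assumed precondition: the caller holds the node's mutex.
// Accounts for one forward that the successor accepted and applies any
// decrements that were recorded while the forward was still in flight.
void account_successful_forward(limiter_state& s) {
    assert(s.my_tries > 0);
    ++s.my_count;
    if (s.my_future_decrement) {
        if (s.my_count > s.my_future_decrement) {
            s.my_count -= s.my_future_decrement;
            s.my_future_decrement = 0;
        } else {
            s.my_future_decrement -= s.my_count;
            s.my_count = 0;
        }
    }
    --s.my_tries;
}
```

A matching helper for the failure path could follow the same pattern; it is not sketched here because that code is not quoted in this review.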

@vlserov vlserov merged commit 0815661 into master Dec 13, 2021
@vlserov vlserov deleted the dev/iliamish/limiter_node_decrementer_fix branch December 13, 2021 08:59
kboyarinov pushed a commit that referenced this pull request Dec 27, 2021
* Fix limiter_node decrementer

Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>

* Fix align

Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>

* add new variable for check unused decrements

Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>

* remove iostream

Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>

* remove unnecessary cast

Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>

* align whitespaces

Signed-off-by: Mishin, Ilya <ilya.mishin@intel.com>

* Fix whitespace alignment

Signed-off-by: Serov, Vladimir <vladimir.serov@intel.com>

* Revert workaround for GCC

Signed-off-by: Serov, Vladimir <vladimir.serov@intel.com>

Co-authored-by: Serov, Vladimir <vladimir.serov@intel.com>
5 participants