Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set call_finished_ with true for each call inside callFinished #2074

Merged

Conversation

iuhilnehc-ynos
Copy link
Contributor

@iuhilnehc-ynos iuhilnehc-ynos commented Oct 12, 2020

Fixes #2057

Signed-off-by: Chen Lihui lihui.chen@sony.com

@iuhilnehc-ynos iuhilnehc-ynos force-pushed the topic-set-flag-immediately branch from 57c9143 to 1bc6761 Compare October 12, 2020 09:55
Copy link
Contributor

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have actual problem to fix with this PR? or any related issue?

clients/roscpp/src/libros/service_server_link.cpp Outdated Show resolved Hide resolved
and also set current_call flags when failure happens

Signed-off-by: Chen Lihui <lihui.chen@sony.com>
@iuhilnehc-ynos iuhilnehc-ynos force-pushed the topic-set-flag-immediately branch from c250d4c to 929a405 Compare October 13, 2020 07:42
@iuhilnehc-ynos
Copy link
Contributor Author

@fujitatomoya @Barry-Xu-2018

Could you help review this PR?

@Barry-Xu-2018
Copy link
Contributor

Could you help review this PR?

LGTM.

Comment on lines 201 to 204
{
boost::mutex::scoped_lock lock(call_queue_mutex_);
current_call_->call_finished_ = true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this fix is not exactly related to the deadlock with #2057. since ros_comm is in maintenance, It would be better to keep the fix related to the problem only. If this is the another bug, I would create a different PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I'll remove it.

@@ -262,6 +266,7 @@ void ServiceServerLink::callFinished()

current_call_->finished_ = true;
current_call_->finished_condition_.notify_all();
current_call_->call_finished_ = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks reasonable to change call_finished_ into true once current call in the queue is finished.

but is this supposed to be moved from caller thread? adding call_finished_ = true here means, it should check if call_finished_ is true on caller thread instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but is this supposed to be moved from caller thread?

It should keep here. There are some cases, such as,

TEST(SrvCall, callSrvWithWrongType)
{
test_roscpp::BadTestStringString::Request req;
test_roscpp::BadTestStringString::Response res;
ASSERT_TRUE(ros::service::waitForService("service_adv"));
for ( int i = 0; i < 4; ++i )
{
bool call_result = ros::service::call("service_adv", req, res);
ASSERT_FALSE(call_result);
}
}

in this case, ServiceServerLink::callFinished will not be called, and then if ServiceServerLink exit,

ServiceServerLink::~ServiceServerLink -> ... -> ServiceServerLink::cancelCall

{
boost::mutex::scoped_lock lock(local->finished_mutex_);
local->finished_ = true;
local->finished_condition_.notify_all();
}
if (boost::this_thread::get_id() != info->caller_thread_id_)
{
while (!local->call_finished_)
{
boost::this_thread::yield();
}
}

stuck in while (!local->call_finished_) if we remove

info->call_finished_ = true;

adding call_finished_ = true here means, it should check if call_finished_ is true

Okay

Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
@iuhilnehc-ynos
Copy link
Contributor Author

iuhilnehc-ynos commented Oct 15, 2020

It seems there exists another potential bug, at least from some CI building.
http://build.ros.org/job/Npr__ros_comm__ubuntu_focal_amd64/387/

also happened at
http://build.ros.org/job/Npr__ros_comm__ubuntu_focal_amd64/359/

Updated:
This test is https://github.com/ros/ros_comm/blob/58ce3595e9b131cfb41abe39a4fdf1dcc8eb140e/test/test_rospy/test/rostest/on_shutdown.test

while not rospy.is_shutdown() and not self.success and time.time() < timeout_t:
will block until some conditions are false ( test case expected to receive message "I'm dead" from https://github.com/ros/ros_comm/blob/58ce3595e9b131cfb41abe39a4fdf1dcc8eb140e/test/test_rospy/nodes/publish_on_shutdown.py

Do you think that

import time
time.sleep(2.0)

should be moved after
rospy.signal_shutdown('test done')

@dirk-thomas
Copy link
Member

@ros-pull-request-builder retest this please

@dirk-thomas
Copy link
Member

It seems there exists another potential bug, at least from some CI building.

I think the test is flaky. The latest triggered CI build passed.

@dirk-thomas
Copy link
Member

Thank you for the patch.

Does this patch need to be backported to Melodic / Kinetic?

@dirk-thomas dirk-thomas merged commit a9bac52 into ros:noetic-devel Oct 15, 2020
@iuhilnehc-ynos
Copy link
Contributor Author

iuhilnehc-ynos commented Oct 16, 2020

Does this patch need to be backported to Melodic / Kinetic?

Yes.

@dirk-thomas

kinetic backport: #2079
melodic backport: #2080

iuhilnehc-ynos pushed a commit to iuhilnehc-ynos/ros_comm that referenced this pull request Oct 16, 2020
)

* set call_finished_ with true for each call inside callFinished

and also set current_call flags when failure happens

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* Update based on review

Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* set the flag with true again even if it is true already

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
iuhilnehc-ynos pushed a commit to iuhilnehc-ynos/ros_comm that referenced this pull request Oct 16, 2020
)

* set call_finished_ with true for each call inside callFinished

and also set current_call flags when failure happens

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* Update based on review

Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* set the flag with true again even if it is true already

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
@dirk-thomas
Copy link
Member

Yes.

I added the "consider-backport" label so that the change will be picked up during the next backport pull requests. See #1496 for a descriptions of the process.

jacobperron pushed a commit that referenced this pull request Oct 16, 2020
* set call_finished_ with true for each call inside callFinished

and also set current_call flags when failure happens

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* Update based on review

Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* set the flag with true again even if it is true already

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
jacobperron pushed a commit that referenced this pull request Oct 22, 2020
* set call_finished_ with true for each call inside callFinished

and also set current_call flags when failure happens

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* Update based on review

Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* set the flag with true again even if it is true already

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
jacobperron pushed a commit that referenced this pull request Oct 22, 2020
* set call_finished_ with true for each call inside callFinished

and also set current_call flags when failure happens

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* Update based on review

Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>

* set the flag with true again even if it is true already

Signed-off-by: Chen Lihui <lihui.chen@sony.com>

Co-authored-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
nim65s added a commit to nim65s/robotpkg that referenced this pull request Mar 12, 2021
Because DEPEND_ABI.ros-comm.noetic?= ros-comm>=1.15

1.15.9 (2020-10-16)
-------------------
* Fix deadlock when service connection is dropped (ros/ros_comm#2074)
* Update maintainers (ros/ros_comm#2075)
* Fix case where accessing cached parameters shuts down another node (ros/ros_comm#2068)
* Fix spelling (ros/ros_comm#2066)
* Fix Lost Wake Bug in ROSOutAppender (ros/ros_comm#2033)
* Fix compatibility issue with boost 1.73 and above (ros/ros_comm#2023)
* Gracefully stop recording upon SIGTERM and SIGINT (ros/ros_comm#2038)
* Use heapq.merge instead of custom merge sort code (ros/ros_comm#2017)
* Fix handling of single quotes in command arguments on Windows (ros/ros_comm#2051)
* Clearer error message (ros/ros_comm#2035)
* Ignore underscores when parsing literal numeric values for Python 3 compatibility (ros/ros_comm#2022)
* Clear cached URI for a node that has gone offline (ros/ros_comm#2010)
* Add skip_cache parameter to rosnode_ping() (ros/ros_comm#2009)
* Install advertisetest (ros/ros_comm#2046)
* Use range instead of xrange for Python 3 compatibility (ros/ros_comm#2013)
* Fix to address CVE-2020-16124 (ros/ros_comm#2065)
* Fix XmlRpcValue::_doubleFormat being unused (ros/ros_comm#2003)

1.15.8 (2020-07-23)
-------------------
* change is_async_connected to use epoll when available (ros/ros_comm#1983)
* allow mixing latched and unlatched publishers (ros/ros_comm#1991)
* remove not existing NodeProxy from rospy __all_\_ (ros/ros_comm#2007)
* fix typo in topics.py (ros/ros_comm#1977)
* fix bad relative import (still Python 2 style) (ros/ros_comm#1973)
* improve shutdown message with duplicate node name (ros/ros_comm#1992)
* remove dependency on rostopic from rostest package (ros/ros_comm#2002)
* fix missing reload() function in Python 3 (ros/ros_comm#1968)
* add latch param to throttle (ros/ros_comm#1944)
* add const versions of XmlRpcValue converting operators (ros/ros_comm#1978)

1.15.7 (2020-05-28)
-------------------
* fix Windows build break (ros/ros_comm#1961)
* fix NameError in launch error handling (ros/ros_comm#1965)

1.15.6 (2020-05-21)
-------------------
* fix a bug that using a destroyed connection object (ros/ros_comm#1950)

1.15.5 (2020-05-15)
-------------------
* check if async socket connect is success or failure before TransportTCP::read() and TransportTCP::write() (ros/ros_comm#1954)
* fix bug that connection drop signal related funtion throw a bad_weak exception (ros/ros_comm#1940)
* multiple latched publishers per process on the same topic (ros/ros_comm#1544)
* fix negative numbers in ros statistics (ros/ros_comm#1531)
* remove extra \n in ROS_DEBUG (ros/ros_comm#1925)
* add option to repeat latched messages at the start of bag splits (ros/ros_comm#1850)
* fix bag migration failures caused by typo in connection_header assignment (ros/ros_comm#1952)
* fix brief description comments after members (ros/ros_comm#1920)
* add --sigint-timeout and --sigterm-timeout parameters (ros/ros_comm#1937)
* roslaunch-check: search dir recursively (ros/ros_comm#1914)
* sort printed nodes by namespace alphabetically (ros/ros_comm#1934)
* remove pycrypto import (not used) (ros/ros_comm#1922)
* avoid infinite recursion in rosrun tab completion when rosbash is not installed (ros/ros_comm#1948)
* fix bare pointer in topic_tools::ShapeShifter (ros/ros_comm#1722)
* clear message queue on simtime jumping back (ros/ros_comm#1518)
* use undefined dynamic_lookup on macOS (ros/ros_comm#1923)
* check if enough FDs are free, instead counting the total free FDs (ros/ros_comm#1929)

1.15.4 (2020-03-19)
-------------------
* restrict boost dependencies to components used (ros/ros_comm#1871)
* add exception for ConnectionAbortedError (ros/ros_comm#1908)
* fix mac trying to use epoll instead of kqueue (ros/ros_comm#1907)
* fix AttributeError: __exit__ (ros/ros_comm#1915, regression from 1.14.4)

1.15.3 (2020-02-28)
-------------------
* remove Boost version check since Noetic only targets platforms with 1.67+ (ros/ros_comm#1903)

1.15.2 (2020-02-25)
-------------------
* export missing Boost dependency (ros/ros_comm#1898)
* add timestamp formatting for rosconsole (ros/ros_comm#1892)

1.15.1 (2020-02-24)
-------------------
* fix missing boost dependencies (ros/ros_comm#1895)
* use setuptools instead of distutils (ros/ros_comm#1870)
* increase time limit of advertisetest/publishtest.test to reduce flakyness (ros/ros_comm#1897)

1.15.0 (2020-02-21)
-------------------
* fix dictionary changed size during iteration (ros/ros_comm#1894)
* update test to pass with old and new yaml (ros/ros_comm#1893)

Packaging changes:
- removed patch-an, as there are no more boost version checks
- updated patch-ao
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mutex deadlock on service connection drop.
4 participants