Give the total number of subscribers when a publisher advertises #21

Closed
blaroche opened this issue Nov 15, 2012 · 6 comments

Comments

@blaroche

This is an enhancement issue based on the discussions about this question: http://answers.ros.org/question/11167/how-do-i-publish-exactly-one-message

The problem that my colleague and I experienced is that the first (few) messages published were "lost". In our case, we wanted to publish exactly one message and not more. The problem with ROS is that the publisher starts sending its messages as soon as it exists, without waiting for the subscribers to establish a connection. This is acceptable in some cases, but not in many others.

Here is my suggestion:

Calling NodeHandle::advertise eventually gets to line 403 of topic_manager.cpp: master::execute("registerPublisher", args, result, payload, true); Currently, the variable result is ignored, but it should contain the total number of subscribers for the given topic. I would also add an optional duration parameter to advertise(). Then, when a publisher is created, it would wait for the subscribers to connect, up to the given duration. Any call to Publisher::publish before that would queue the message.

This solution is not fail-proof, but a timeout option would increase reliability when it is required, while still allowing callers not to wait at all (by giving zero as the timeout duration) or not to wait too long.
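
The "wait up to a given duration" part of this suggestion can be approximated in user code by polling. The following is a minimal sketch, assuming a publisher type that exposes getNumSubscribers() (as ros::Publisher does in roscpp). The names waitForSubscribers and FakePublisher and the poll interval are hypothetical and introduced only for illustration; they are not part of roscpp. The sketch counts subscribers that have already connected, not the number registered with the master, and it does not queue messages.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper, not part of roscpp: block until `pub` reports at least
// `expected` connected subscribers, or until `timeout` elapses.
// Returns true if the expected count was reached in time.
template <typename PublisherT>
bool waitForSubscribers(const PublisherT& pub,
                        std::uint32_t expected,
                        std::chrono::milliseconds timeout,
                        std::chrono::milliseconds poll_interval =
                            std::chrono::milliseconds(10))
{
  assert(poll_interval.count() > 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (pub.getNumSubscribers() < expected) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // best effort: give up and let the caller decide
    }
    std::this_thread::sleep_for(poll_interval);
  }
  return true;
}

// Stand-in publisher so the sketch compiles without ROS (hypothetical).
// Every call to getNumSubscribers() simulates one more subscriber
// connecting, until `max_subscribers` is reached.
struct FakePublisher {
  explicit FakePublisher(std::uint32_t max) : max_subscribers(max) {}

  std::uint32_t getNumSubscribers() const {
    if (connected < max_subscribers) {
      ++connected;
    }
    return connected;
  }

  std::uint32_t max_subscribers;
  mutable std::uint32_t connected = 0;
};
```

With a zero timeout the helper checks once and returns immediately, which matches the "do not wait" case described above. A caller would publish its single message only after the helper returns true, and treat false as the early warning.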

@ahendrix

Interesting. If a new publisher is notified of the number of subscribers that are registered with the master when it starts, the advertise call could block until all subscribers have connected.

If we don't see connections from all of the subscribers we expect within a given timeout, it's probably a sign that there's something wrong with the system, either a network configuration problem or a ROS client that isn't behaving properly. We could then give the user an early notification that there's something wrong with their system, rather than letting it fail silently later on.

This approach fails if a subscriber crashes and fails to deregister its subscriptions from the master.

@moesenle
Contributor

It also fails if the subscriber is shutting down (even cleanly) right after the publisher got the number of subscribers but before the subscriber actually establishes the connection. Not sure how likely this is to happen though.

@blaroche
Author

I like the idea of an early warning. advertise already takes AdvertiseOptions, but it could also give back AdvertiseResults, which would contain for example the total number of subscribers, the ones that have connected, status messages about pending connections, etc.

Like I said, it's not fail-proof, mainly for the two reasons that you mentioned. I would rather call this a best-effort method, which would be acceptable if described as such in the documentation. Users would then have to take this into account in their application logic.
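
To make the AdvertiseResults idea concrete, here is a minimal sketch of what such a type could look like. Everything in it is hypothetical: roscpp has no AdvertiseResults, and the field names, makeResults, allConnected and pending are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical type: advertise() in roscpp returns a ros::Publisher, not this.
struct AdvertiseResults {
  std::uint32_t total_subscribers = 0;       // registered with the master at advertise time
  std::uint32_t connected_subscribers = 0;   // connections established so far
  std::vector<std::string> status_messages;  // e.g. notes about pending connections

  // Best effort only: a crashed subscriber that never deregistered
  // is still counted in total_subscribers.
  bool allConnected() const {
    return connected_subscribers >= total_subscribers;
  }

  std::uint32_t pending() const {
    return total_subscribers > connected_subscribers
               ? total_subscribers - connected_subscribers
               : 0;
  }
};

// Hypothetical factory that fills in a status message for pending connections.
AdvertiseResults makeResults(std::uint32_t total, std::uint32_t connected) {
  assert(connected <= total);  // assumption of this sketch only
  AdvertiseResults r;
  r.total_subscribers = total;
  r.connected_subscribers = connected;
  if (!r.allConnected()) {
    r.status_messages.push_back(std::to_string(r.pending()) +
                                " subscriber connection(s) pending");
  }
  return r;
}
```

Application logic could then check allConnected() before publishing a one-off message and log the status messages otherwise, keeping in mind the two failure cases mentioned above.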

@dirk-thomas
Member

The publish-subscribe paradigm implemented for ROS explicitly decouples both sides. It is intentional that the number of subscribers is not known. Adding such a feature bypassing this decoupling is currently not planned for the client libraries.

The initially described issue of "lost" messages during startup of the graph is inherent in the current design and could only be resolved by a major redesign.

@blaroche
Author

I like the fact that ROS has very low coupling between publishers and subscribers, but hiding information does not help with coupling. Coupling is the dependence of a module on another, and having more information wouldn't increase this dependence; it would just mean that there is more information available if necessary.

Having said that, while the ROS platform should continue to have low coupling, user applications themselves may require higher coupling. By receiving the number of subscribers, and even the IP or other information, the publisher node would be much more flexible in its behavior. It would open up the ROS platform to a much broader range of applications, without taking anything away from the original design.

Given that the question on answers.ros.org was viewed over 900 times and that the same question is asked on different forums, I think that this problem is not well understood (or desired) by the community. I would really suggest trying to find some kind of solution, rather than staying with the status quo.

@dirk-thomas
Member

I am not implying that ROS should stay with the status quo.

But the initially stated problem of being unable to reliably send a single message, because the subscribers are not yet connected during start-up of the ROS graph, is a very fundamental design problem. This can neither be resolved with a simple patch nor should it be worked around by, e.g., waiting until a hard-coded number of subscribers has connected.

The feature to provide a deterministic start-up behavior is a very important one for ROS in the future. We will work on this in the "next generation of ROS" SIG. But changes like this will alter ROS significantly and will not land in one of the next ROS releases.

contradict pushed a commit to contradict/ros_comm that referenced this issue Aug 12, 2016
lsolanka added a commit to ros-hunter/ros_comm that referenced this issue Dec 10, 2017
…ilures)

Issues:
- some roswtf tests fail - not sure what the problem is

- Due to use of global variables there are issues with destruction of
global objects in various translation units, during the process
shutdown. This is related to Subscriber and Publisher and happens only
in executables where they are defined as global variables outside of
main. Example: some topic_tools components such as throttle

This is most likely related to the fact that now various components are
statically linked instead of dynamically. And for instance if an
executable depends on rosconsole, and we define a publisher as a global
object, when the program exits, the publisher is getting destroyed, but
other global variables get destroyed before that:

> Core was generated by `devel/lib/topic_tools/throttle messages /input 5'.
> Program terminated with signal SIGABRT, Aborted.
> #0  0x00007f277238c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
> 54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> [Current thread is 1 (Thread 0x7f2773b90780 (LWP 24620))]
> (gdb) bt
> #0  0x00007f277238c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
> #1  0x00007f277238e02a in __GI_abort () at abort.c:89
> #2  0x00007f2772ccf7dd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3  0x00007f2772ccd6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4  0x00007f2772ccd701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #5  0x00007f2772cce23f in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #6  0x0000000000719a64 in boost::system::error_code::message[abi:cxx11]() const (this=0x1fb5f00)
>     at .../include/boost/system/error_code.hpp:477
> #7  0x000000000071a0ed in boost::system::system_error::what (this=0x1fb5ef0) at .../include/boost/system/system_error.hpp:70
> #8  0x00007f2772ccf805 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #9  0x00007f2772ccd6b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #10 0x00007f2772ccc6a9 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #11 0x00007f2772ccd005 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #12 0x00007f2772730f83 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
> #13 0x00007f2772731487 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
> #14 0x000000000071a557 in boost::mutex::lock (this=0xbb7fa0 <ros::console::g_locations_mutex>)
>     at .../include/boost/thread/pthread/mutex.hpp:119
> #15 0x000000000071c723 in boost::unique_lock<boost::mutex>::lock (this=0x7fff031f4950)
>     at .../include/boost/thread/lock_types.hpp:346
> #16 0x000000000071bdeb in boost::unique_lock<boost::mutex>::unique_lock (this=0x7fff031f4950, m_=...)
>     at .../include/boost/thread/lock_types.hpp:124
> #17 0x0000000000718dae in ros::console::initializeLogLocation (loc=0xbb5c70 <ros::Publisher::Impl::~Impl()::__rosconsole_define_location__loc>, name="ros.roscpp",
>     level=ros::console::levels::Debug) at .../ros_comm/tools/rosconsole/src/rosconsole/rosconsole.cpp:632
> #18 0x00000000007278b6 in ros::Publisher::Impl::~Impl (this=0x1fb6730, __in_chrg=<optimised out>) at .../ros_comm/clients/roscpp/src/libros/publisher.cpp:40
> #19 0x00000000007299a9 in boost::detail::sp_ms_deleter<ros::Publisher::Impl>::destroy (this=0x1fb6728)
>     at .../include/boost/smart_ptr/make_shared_object.hpp:59
> #20 0x0000000000729b14 in boost::detail::sp_ms_deleter<ros::Publisher::Impl>::operator() (this=0x1fb6728)
>     at .../include/boost/smart_ptr/make_shared_object.hpp:93
> #21 0x0000000000729a53 in boost::detail::sp_counted_impl_pd<ros::Publisher::Impl*, boost::detail::sp_ms_deleter<ros::Publisher::Impl> >::dispose (this=0x1fb6710)
>     at .../include/boost/smart_ptr/detail/sp_counted_impl.hpp:172
> #22 0x0000000000707ab5 in boost::detail::sp_counted_base::release (this=0x1fb6710)
>     at .../include/boost/smart_ptr/detail/sp_counted_base_std_atomic.hpp:110
> #23 0x0000000000707b41 in boost::detail::shared_count::~shared_count (this=0xbb7d28 <g_pub+8>, __in_chrg=<optimised out>)
>     at .../include/boost/smart_ptr/detail/shared_count.hpp:426
> #24 0x0000000000707ef4 in boost::shared_ptr<ros::Publisher::Impl>::~shared_ptr (this=0xbb7d20 <g_pub>, __in_chrg=<optimised out>)
>     at .../include/boost/smart_ptr/shared_ptr.hpp:341
> #25 0x0000000000727bce in ros::Publisher::~Publisher (this=0xbb7d20 <g_pub>, __in_chrg=<optimised out>)
>     at .../ros_comm/clients/roscpp/src/libros/publisher.cpp:74
> #26 0x00007f2772390ff8 in __run_exit_handlers (status=0, listp=0x7f277271b5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
> #27 0x00007f2772391045 in __GI_exit (status=<optimised out>) at exit.c:104
> #28 0x00007f2772377837 in __libc_start_main (main=0x706886 <main(int, char**)>, argc=4, argv=0x7fff031f4bf8, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>,
>     stack_end=0x7fff031f4be8) at ../csu/libc-start.c:325
> #29 0x0000000000705bc9 in _start ()
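
The trace shows ros::Publisher::~Publisher for the global g_pub running from __run_exit_handlers and reaching a rosconsole mutex during process shutdown. One common way to sidestep this class of problem is to keep such objects in a local scope (for example inside main) instead of at namespace scope, so that their destructors run before any object with static storage duration is torn down. The following is a minimal sketch with stand-in types; Logger, logger(), Publisher and runNode here are hypothetical mocks, not the roscpp or rosconsole classes, and this is not necessarily the fix adopted in the referenced commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a global logging facility (hypothetical; not rosconsole).
struct Logger {
  std::vector<std::string> lines;
  void log(const std::string& msg) { lines.push_back(msg); }
};

// Function-local static: constructed on first use.
Logger& logger() {
  static Logger instance;
  return instance;
}

// Stand-in for a publisher whose destructor logs, like the one in the trace.
struct Publisher {
  explicit Publisher(std::string t) : topic(std::move(t)) {
    assert(!topic.empty());
    logger().log("advertise " + topic);
  }
  ~Publisher() { logger().log("shutdown " + topic); }

  std::string topic;
};

// The publisher lives in a local scope, so it is destroyed at the closing
// brace, while every object with static storage duration is still alive.
// Returns the log lines produced by this call.
std::vector<std::string> runNode() {
  const std::size_t first = logger().lines.size();
  {
    Publisher pub("/output");
    logger().log("publish on " + pub.topic);
  }  // ~Publisher runs here, not from an exit handler
  const std::vector<std::string>& all = logger().lines;
  return std::vector<std::string>(
      all.begin() + static_cast<std::ptrdiff_t>(first), all.end());
}
```

In a real node, main would play the role of runNode: the publisher would be created after initialization and go out of scope before main returns.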
lsolanka added a commit to ros-hunter/ros_comm that referenced this issue Jun 3, 2018
johnsonshih added a commit to johnsonshih/ros_comm that referenced this issue Sep 29, 2018
dirk-thomas pushed a commit that referenced this issue Jan 30, 2019
)

* normalize the string to utf-8 before passing to environment block.

* convert from unicode to string when setting env variable (#21)
tahsinkose pushed a commit to tahsinkose/ros_comm that referenced this issue Apr 15, 2019
…s#1593)

* normalize the string to utf-8 before passing to environment block.

* convert from unicode to string when setting env variable (ros#21)
dirk-thomas pushed a commit that referenced this issue Aug 3, 2020
)

* normalize the string to utf-8 before passing to environment block.

* convert from unicode to string when setting env variable (#21)