
On erasing NMOS resources, HTTP DELETE messages are not getting generated #372

Open
gsbnmos opened this issue Feb 16, 2024 · 5 comments

@gsbnmos

gsbnmos commented Feb 16, 2024

Hello,

While shutting down NMOS, I'm calling the following functions to delete all the NMOS resources before exiting:

```cpp
nmos::erase_resource(model.node_resources, res_id, true); // res_id is the node's id
model.notify();
```

While erasing the resources, I was expecting to see HTTP DELETE messages going out to the Registry and controller (e.g., Riedel) in Wireshark. However, I don't see any of these messages.

Could you please help? Am I missing something in the above code?

Thank you in advance.
Gururaj S.B

@gsbnmos
Author

gsbnmos commented Feb 17, 2024

Below are the syslogs captured while exiting:

```
2024-02-16 18:49:15 User.Warning 10.110.60.75 Feb 16 13:19:26 10.100.50.2 NMOS[1812]: Unable to lock to current reference source
2024-02-16 18:49:15 Daemon.Info 10.110.60.75 Feb 16 13:19:27 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:27.191: error: 2657059808: Eraced resources count = 62
2024-02-16 18:49:15 Daemon.Info 10.110.60.75 Feb 16 13:19:27 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:27.196: warning: 2673845216: System global configuration is not discoverable
2024-02-16 18:49:15 Daemon.Info 10.110.60.75 Feb 16 13:19:27 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:27.199: warning: 2715808736: Stopped registered operation
2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.557: info: 3069259792: Closing connections
2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.561: info: 2648667104: Error getting remote endpoint: system:9 (Bad file descriptor)
2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.561: warning: 2648667104: WebSocket Connection Unknown - "" - 0 websocketpp:26 Operation canceled
2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.561: info: 2648667104: asio async_shutdown error: system:9 (Bad file descriptor)
2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.561: info: 2648667104: handle_accept error: Operation canceled
2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.561: info: 2648667104: Stopping acceptance of new connections because the underlying transport is no longer listening.
2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 nmos-cpp-node[6039]: 2024-02-16 13:19:31.572: info: 3069259792: Stopping nmos-cpp node
2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 systemd[1]: Stopping Sony Nmos Daemon....
2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 systemd[1]: nmos.service: Succeeded.
2024-02-16 18:49:20 Daemon.Info 10.110.60.75 Feb 16 13:19:31 10.100.50.2 systemd[1]: Stopped Sony Nmos Daemon..
```

You can see my log line for erasing the 62 NMOS resources on the second line, at 2024-02-16 13:19:27.191.
About 4 s later, on the fifth line, at 2024-02-16 13:19:31.557, we see "Closing connections".

I was expecting the following log line in the syslog; it is emitted in the node behaviour thread by the deregistration logic, just after the regular registration logic:

```cpp
slog::log<slog::severities::info>(gate, SLOG_FLF) << "Requesting registration deletion for " << id_type;
```

@garethsb
Contributor

It depends when you erase the resources.

The HTTP DELETE requests are sent to the registry asynchronously with the call to nmos::erase_resource, by the node behaviour thread.

  • If you erase the resources from the model after a controlled shutdown is initiated, the node behaviour thread has probably already exited, so the requests won't be sent.
  • If you erase a resource any time before a controlled shutdown, the DELETE requests will be sent. (Obviously you also have to release the model lock for long enough for the node behaviour thread to get the lock.)

Hope that makes sense. If you want to implement the recommended (but not required) controlled unregistration behaviour, erase the resources before calling model.controlled_shutdown() (or destroying the server).
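For example, the ordering might look like this (a sketch assembled only from the calls quoted in this thread, not a drop-in implementation):

```cpp
// Sketch only: erase and notify first, release the lock, then shut down.
{
    auto lock = model.write_lock();
    nmos::erase_resource(model.node_resources, res_id, true); // res_id is the node's id
    model.notify();
} // lock released here so the node behaviour thread can send the DELETEs

// ...allow the node behaviour thread enough time to send the DELETE requests...

model.controlled_shutdown(); // or destroy the server
```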

@suddheerA

Dear Gareth (@garethsb),

Thank you for your response.

I have created a separate thread named "ShutDownThread" that runs in parallel with the "node_implementation_thread". This thread monitors for shutdown conditions within my application. Upon receiving a shutdown signal, it erases the node resource, which in turn erases all of its child resources. After the erasure is complete, it calls model.controlled_shutdown().

To ensure data integrity during this process, I take a write lock (auto write_lock = model.write_lock();) before initiating the erase operation, and release it after the erasure is complete (write_lock.unlock();).

However, despite these steps, HTTP DELETE messages are not being sent as expected.

I would appreciate your assistance in understanding why these messages are not being triggered and troubleshooting any potential issues. Any insights or suggestions you can provide would be greatly helpful.

Thank you for your time and support.

Sincerely,

@garethsb
Contributor

I understand you're doing something like this...

  1. Get lock
  2. Erase node resource
  3. Call notify and release lock
  4. Call controlled shutdown

I think this still doesn't leave a long enough gap, between steps 3 and 4, for the node behaviour thread to empty the queue of events to send to the Registry.

The simplest change to confirm this would be to put a std::this_thread::sleep_for there.
The other approach would be to use a conditional wait on the "grain" containing the queue of events to be empty. It's created by the Node behaviour thread itself but you could find it in the model.

```cpp
// a (fake) subscription to keep track of all resource events
```

@gsbnmos
Author

gsbnmos commented Apr 5, 2024

Hi @garethsb,
Apologies for the delayed reply.
Thanks for the suggestions. This works!
Best regards,
