Does not seem to properly handle death of LibreOffice? #135

Closed
Grunthos opened this issue Oct 5, 2024 · 9 comments

Comments

@Grunthos
Contributor

Grunthos commented Oct 5, 2024

LibreOffice very occasionally crashes, and some documents make it CPU-bound; in the latter case I would like to kill it.

Unfortunately, it seems that unoserver does not detect when LibreOffice dies and does not restart it.

I am using a Docker image with supervisord; can unoserver be modified to die when LibreOffice dies (or to restart LibreOffice)?

Steps to reproduce:

  1. Start unoserver
  2. Kill LibreOffice/soffice.bin (not the splash)

unoserver does not die

@Grunthos
Contributor Author

Grunthos commented Oct 6, 2024

I have submitted a PR to address this issue.

@Edwardveb

Edwardveb commented Oct 7, 2024

I'm pretty sure this is related to my bug.
Make a Word doc, add a "Text box" object to it, and put some text in the box.
Convert it using ghcr.io/unoconv/unoserver-docker. LibreOffice crashes (the error is included at the end) and does not recover, but the server keeps running and reports that it is healthy. Converting other, good documents then fails with the same error. Perhaps there could be a mechanism to restart the LibreOffice instance?

Traceback (most recent call last):
  File "/usr/bin/unoconvert", line 8, in <module>
    sys.exit(converter_main())
             ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/unoserver/client.py", line 248, in converter_main
    result = client.convert(
             ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/unoserver/client.py", line 87, in convert
    result = proxy.convert(
             ^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/xmlrpc/client.py", line 1122, in __call__
    return self.__send(self.__name, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/xmlrpc/client.py", line 1464, in __request
    response = self.__transport.request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/xmlrpc/client.py", line 1166, in request
    return self.single_request(host, handler, request_body, verbose)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/xmlrpc/client.py", line 1182, in single_request
    return self.parse_response(resp)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/xmlrpc/client.py", line 1354, in parse_response
    return u.close()
           ^^^^^^^^^
  File "/usr/lib/python3.11/xmlrpc/client.py", line 668, in close
    raise Fault(**self._stack[0])
xmlrpc.client.Fault: <Fault 1: "<class 'unoserver.converter.com.sun.star.connection.NoConnectException'>:Connector : couldn't connect to socket (Connection refused) at /home/buildozer/aports/community/libreoffice/src/libreoffice-7.6.3.1/io/source/connector/connector.cxx:118">

@lafrech

lafrech commented Oct 7, 2024

Probably related.

As a test, I launched unoserver with a small memory limit to see how OOM could be handled:

systemd-run --scope -p MemoryMax=5M unoserver

IIUC, this makes LO crash but not unoserver itself, which results in:

INFO:unoserver:Server PID: 209920
INFO:unoserver:Starting unoconverter.
Exception in thread Thread-1 (serve):
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.11/dist-packages/unoserver/server.py", line 116, in serve
    self.conv = converter.UnoConverter(
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/unoserver/converter.py", line 77, in __init__
    self.context = self.resolver.resolve(
                   ^^^^^^^^^^^^^^^^^^^^^^
unoserver.converter.com.sun.star.connection.NoConnectException: Connector : couldn't connect to socket (Connection refused) at ./io/source/connector/connector.cxx:118

My issue is that the client hangs forever trying to connect. I'd like it to return an error after a reasonable timeout. In this case, it is a connection timeout, not even a conversion timeout, which makes it easier (a conversion might be arbitrarily long, but a connection should be fast).

Relaunching LO is another thing. But having a timeout here would make for more graceful error management.

I thought this was the point of #119 but it seems not.

Basically, what I'm thinking of is unoserver using a timeout to catch the connection error above and return a nice error to the (possibly remote) client.
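A minimal sketch of that server-side idea, not unoserver's actual code: retry a connection attempt until a deadline, then raise one clear error instead of blocking. The names `call_with_deadline`, `connect` and `retry_on` are hypothetical; in unoserver the retried exception would presumably be the UNO `NoConnectException` shown above rather than Python's `ConnectionError`.

```python
import time


def call_with_deadline(connect, timeout=5.0, interval=0.2,
                       retry_on=(ConnectionError,)):
    """Call connect() until it succeeds or `timeout` seconds have passed.

    `connect` is any zero-argument callable that raises one of the
    `retry_on` exceptions while the backend is unreachable (assumption).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return connect()
        except retry_on as exc:
            # Give up with a single, explicit error once the deadline passes.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"backend not reachable after {timeout} s"
                ) from exc
            time.sleep(interval)
```

The RPC handler could then catch the `TimeoutError` and turn it into a readable fault for the remote client.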

We could also add a timeout client-side. The difference is it would also manage network issues between client and unoserver, and unoserver itself being down / shut off.

In fact, the client-side timeout might be more useful. But on the unoserver side, we could catch connection issues (can't connect to LO) right away rather than wait for a possibly long conversion. I hope I'm making sense.

There might be a way to achieve client-side timeout in user code but I don't think it is straightforward, so a simple timeout=5 kwarg there would be nice.
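For reference, one way to get a client-side timeout in user code with the standard library alone is a custom `xmlrpc.client.Transport` that sets a socket timeout on each connection. This is a sketch under assumptions: `TimeoutTransport` and `make_proxy` are hypothetical names, and the URL in the usage note assumes unoserver's XML-RPC interface listens on localhost port 2003.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """XML-RPC transport whose HTTP connections use a socket timeout."""

    def __init__(self, timeout, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._timeout = timeout

    def make_connection(self, host):
        # The parent creates (or reuses) an http.client.HTTPConnection;
        # its `timeout` attribute is applied when the socket is opened.
        conn = super().make_connection(host)
        conn.timeout = self._timeout
        return conn


def make_proxy(url, timeout=5):
    """Build a ServerProxy that fails instead of blocking forever."""
    return xmlrpc.client.ServerProxy(
        url, transport=TimeoutTransport(timeout), allow_none=True
    )
```

With `proxy = make_proxy("http://127.0.0.1:2003", timeout=5)`, a call that gets no answer within five seconds should raise a timeout error (an `OSError` subclass) rather than hang. Note that the same timeout applies to every socket read, so it also bounds how long the client waits for a slow conversion.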

@Grunthos
Contributor Author

Grunthos commented Oct 7, 2024

#119 is a related problem; a timeout/kill/restart works, but the current code still does not respond properly to the death of LibreOffice.

My solution to this problem has been to:

  • use a container with supervisord, unoserver and my web server;
  • add a monitor that checks for a LibreOffice process at 100% CPU;
  • have the monitor kill LibreOffice when it misbehaves, so that, with my PR, unoserver exits gracefully;
  • let supervisord restart unoserver, which in turn restarts LibreOffice.

This all seems to work. The critical part is having unoserver die when LibreOffice dies. Another approach would be to have unoserver restart LibreOffice if it dies and was not killed by unoserver, though that comes with unoserver having to make management decisions about how often to try to restart etc, which is why I like the supervisord solution.
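The "die when the child dies" pattern can be sketched generically as follows. This is an illustration only, not the code in the PR: `watch_child` and `on_exit` are hypothetical names, and the callback stands in for whatever shuts the server down.

```python
import subprocess
import threading


def watch_child(proc, on_exit):
    """Run on_exit(returncode) in a background thread once `proc` exits.

    `proc` is a subprocess.Popen for the supervised child (for example
    the LibreOffice process); `on_exit` would stop the RPC server so the
    parent can exit and be restarted by its supervisor.
    """
    def _watch():
        returncode = proc.wait()  # blocks until the child terminates
        on_exit(returncode)

    thread = threading.Thread(target=_watch, daemon=True)
    thread.start()
    return thread
```

Keeping the restart policy in supervisord means the parent only has to notice the death and exit with a non-zero status; how often to retry stays a supervisor setting.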

@lafrech

lafrech commented Oct 7, 2024

Yes. I've been thinking about it after posting and also figured it would be better to have unoserver die so it can be restarted.

I didn't notice your PR. I can't comment on the implementation but I agree with the intent.

I opened #137 to discuss the client timeout.

@Grunthos changed the title from "Does not seem to detect death of LibreOffice?" to "Does not seem to properly handle death of LibreOffice?" on Oct 7, 2024

@regebro
Member

regebro commented Oct 8, 2024

Yes, unoserver should exit if LibreOffice dies. There must be some case where this doesn't happen after the 2.0 refactoring. I'm hoping to get time to test the PR soon.

@Grunthos
Contributor Author

Grunthos commented Oct 8, 2024

If it helps at all, the problem occurs because the RPC thread does not terminate. Sadly, closing the port is not enough to make it terminate, at least on some stacks (and perhaps Python versions?), so one also has to send a dummy connect request. My testing was all done on Ubuntu x64.
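The dummy-connect trick can be illustrated with a plain socket accept loop. This is a generic sketch, not unoserver's RPC code; `AcceptLoop` and its attributes are hypothetical names.

```python
import socket
import threading


class AcceptLoop:
    """A thread blocked in accept() that can be stopped reliably."""

    def __init__(self, host="127.0.0.1", port=0):
        self.sock = socket.create_server((host, port))
        self.address = self.sock.getsockname()
        self.stop_requested = threading.Event()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.thread.start()

    def _run(self):
        while not self.stop_requested.is_set():
            conn, _ = self.sock.accept()  # blocks until someone connects
            conn.close()
        self.sock.close()

    def stop(self, timeout=5.0):
        """Set the stop flag, then wake accept() with a dummy connection."""
        self.stop_requested.set()
        try:
            with socket.create_connection(self.address, timeout=timeout):
                pass
        except OSError:
            pass  # the loop already exited and closed the listening socket
        self.thread.join(timeout)
        return not self.thread.is_alive()
```

The flag alone is never seen while the thread sits in `accept()`; the throwaway connection makes `accept()` return so the loop can re-check the flag and exit.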

@regebro
Member

regebro commented Oct 9, 2024

That had other weird issues, and after much debugging those went away if I waited to terminate LibreOffice until after the RPC process had terminated. Heck knows why, but it seems to work more reliably now anyway. A new release will hopefully come this week (maybe even today).

@regebro
Member

regebro commented Oct 9, 2024

2.3b1 is now released

@regebro closed this as completed on Oct 9, 2024