
zombie maxima process - if invoked from a script #33027

Closed
dimpase opened this issue Dec 15, 2021 · 58 comments

@dimpase
Member

dimpase commented Dec 15, 2021

Invoking maxima in a Sage script leads to a zombie process. For example,
run the following in a terminal:

echo "t=maxima('2+2')" > /tmp/foo.sage && ./sage /tmp/foo.sage

and observe a zombie maxima process after this terminates.
Don't forget to

killall maxima

now and then.
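Instead of reaching for killall, a small Linux-only helper can scan /proc for processes left behind. This is a sketch under stated assumptions. The name find_processes is hypothetical, and the helper relies on the Linux /proc layout. It also assumes the leftover process appears under the command name you pass. The kernel truncates that name to 15 characters, and depending on how Maxima is launched the process may carry a different name. A state of 'Z' marks a true zombie, meaning the process exited but was not reaped. Any other state means the process was left behind but is still alive.

```python
import os
from typing import List, Tuple


def find_processes(name: str) -> List[Tuple[int, str]]:
    """Return (pid, state) for every process whose command name equals `name`.

    Linux-only (hypothetical helper): reads /proc/<pid>/comm and
    /proc/<pid>/stat. The state is the one-letter code from stat, e.g.
    'R' (running), 'S' (sleeping) or 'Z' (zombie: exited, not yet reaped).
    """
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
            with open(f"/proc/{pid}/stat") as f:
                stat = f.read()
        except OSError:
            # The process disappeared while we were scanning; skip it.
            continue
        if comm != name:
            continue
        # The state field follows the parenthesised command name, which may
        # itself contain spaces or ')', so split on the last ')'.
        state = stat.rsplit(")", 1)[1].split()[0]
        found.append((pid, state))
    return found
```

For example, calling find_processes("maxima") after the script above exits returns an empty list if nothing was left running under that name.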

The slightly shorter

echo "t=maxima('2+2')" | ./sage

does not lead to a zombie; in fact, it prints on exit:

sage: sage: Exiting Sage (CPU time 0m0.08s, Wall time 0m0.67s).
Exiting Maxima with PID 2318 running <SAGEROOT>/local/bin/maxima -p <SAGEROOT>/local/var/lib/sage/venv-python3.9/lib/python3.9/site-packages/sage/interfaces/sage-maxima.lisp

This was observed while testing the sagetex spkg with SAGE_CHECK=yes make sagetex on #32887.

Using maxima_calculus() (the library interface) rather than maxima() (the pexpect interface) does not lead to zombies.


Apparently, this only happens with ecl installed by Sage rather than with a system-wide ecl.

CC: @nbruin @vbraun @mkoeppe @spaghettisalat

Component: interfaces

Author: Dima Pasechnik

Branch/Commit: abafddc

Reviewer: Michael Orlitzky

Issue created by migration from https://trac.sagemath.org/ticket/33027

@dimpase dimpase added this to the sage-9.5 milestone Dec 15, 2021
@kliem
Contributor

kliem commented Dec 15, 2021

comment:1

Moving this to critical, as it seems to efficiently kill the patchbots.

@mantepse
Collaborator

comment:3

This might be related to #32167.

@dimpase
Member Author

dimpase commented Jan 21, 2022

comment:4

I think it was observed without fricas installed, too.


@dimpase
Member Author

dimpase commented Jan 21, 2022

comment:6

I've noticed that one gets
.sage/maxima/binary/5_45_0/ecl/21_2_1/ created in DOTSAGE (with both the reproducer and the non-reproducer echo "t=maxima('2+2')" | ./sage)


@dimpase
Copy link
Member Author

dimpase commented Jan 21, 2022

comment:7

This effectively kills the patchbots, thus a blocker, IMHO.
https://groups.google.com/d/msgid/sage-devel/997e9f75-8e92-4a67-b43a-1777074cbe45n%40googlegroups.com

@mantepse
Copy link
Collaborator

comment:8

What I meant to say is that it might be a similar underlying problem.

@kliem
Contributor

kliem commented Jan 21, 2022

comment:9

Replying to @dimpase:

This effectively kills the patchbots, thus a blocker, IMHO.
https://groups.google.com/d/msgid/sage-devel/997e9f75-8e92-4a67-b43a-1777074cbe45n%40googlegroups.com

Apparently, a ticket merged in 9.5.beta8 (https://groups.google.com/g/sage-release/c/vo_m79EHAVc/m/mMlNPz5sBAAJ) introduced this problem. My patchbot worked fine until then, and I was contacted by IT on December 13th (it is a virtual machine, and I'm guessing this also affected other people).

@orlitzky
Contributor

comment:10
  1. Never execute code from a predictable filename under /tmp =)
  2. FWIW, I can't reproduce this on rc2.
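Here is a minimal sketch of the safer alternative to a fixed name like /tmp/foo.sage: let Python's tempfile module create the reproducer file. The helper name write_reproducer is hypothetical. tempfile.mkstemp creates the file exclusively, with a random name and permissions limited to the creating user. As a result, nobody else can guess the path and plant a file there first.

```python
import os
import tempfile


def write_reproducer(code: str = "t=maxima('2+2')\n") -> str:
    """Write `code` to a freshly created, unpredictably named .sage file.

    Hypothetical helper. mkstemp creates the file atomically and readable and
    writable only by the current user, unlike a fixed /tmp/foo.sage.
    """
    fd, path = tempfile.mkstemp(suffix=".sage", prefix="sage-repro-")
    with os.fdopen(fd, "w") as f:
        f.write(code)
    return path
```

Pass the returned path to ./sage, and delete the file with os.remove once the run is finished.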

@dimpase
Member Author

dimpase commented Jan 21, 2022

comment:11

Replying to @orlitzky:

  1. Never execute code from a predictable filename under /tmp =)
  2. FWIW, I can't reproduce this on rc2.

Perhaps it's due to ecl from the system? I see this on Debian 11 with ecl built by Sage.

@kliem
Contributor

kliem commented Jan 21, 2022

comment:12

Replying to @dimpase:

Replying to @orlitzky:

  1. Never execute code from a predictable filename under /tmp =)
  2. FWIW, I can't reproduce this on rc2.

Perhaps it's due to ecl from the system? I see this on Debian 11 with ecl built by Sage.

No, my config.log states that ecl is already installed as an SPKG.

@kliem
Contributor

kliem commented Jan 21, 2022

Attachment: config.log

A config.log from a Debian buster system with this problem

@dimpase
Member Author

dimpase commented Jan 21, 2022

comment:13

Replying to @kliem:

Replying to @dimpase:

Replying to @orlitzky:

  1. Never execute code from a predictable filename under /tmp =)
  2. FWIW, I can't reproduce this on rc2.

Perhaps it's due to ecl from the system? I see this on Debian 11 with ecl built by Sage.

No, my config.log states that ecl is already installed as an SPKG.

I meant that a system ecl is the reason that Michael cannot reproduce this.

@kliem
Contributor

kliem commented Jan 21, 2022

comment:14

Ah yes, this might be the reason. I can also observe this on Ubuntu focal with ecl installed as an SPKG.

@dimpase
Member Author

dimpase commented Jan 21, 2022

comment:15

Replying to @kliem:

Ah yes, this might be the reason. I can also observe this on Ubuntu focal with ecl installed as an SPKG.

Yes, I can reproduce this: on a Gentoo machine with a system-wide ecl there is no zombie, but installing ecl+maxima in Sage leads to zombies.


@orlitzky
Contributor

comment:18

Replying to @dimpase:

Replying to @kliem:

Ah yes, this might be the reason. I can also observe this on Ubuntu focal with ecl installed as an SPKG.

Yes, I can reproduce this: on a Gentoo machine with a system-wide ecl there is no zombie, but installing ecl+maxima in Sage leads to zombies.

Maybe delete that write_error.patch that the SPKG applies, and try again?

@orlitzky
Contributor

comment:19

The other obvious difference is that the SPKG is built with --disable-threads. But the patch is more suspicious to me.

@mantepse
Collaborator

comment:20

Could you check whether SPKG ecl vs. system ecl also makes the difference for #32167?

@dimpase
Member Author

dimpase commented Jan 21, 2022

comment:21

Replying to @orlitzky:

Replying to @dimpase:

Replying to @kliem:

Ah yes, this might be the reason. I can also observe this on Ubuntu focal with ecl installed as an SPKG.

Yes, I can reproduce this: on a Gentoo machine with a system-wide ecl there is no zombie, but installing ecl+maxima in Sage leads to zombies.

Maybe delete that write_error.patch that the SPKG applies, and try again?

Patch or no patch, the same picture.

@dimpase
Member Author

dimpase commented Jan 21, 2022

comment:22

According to comment:9, something that went into 9.5.beta8 caused this. Time for git bisect, I suppose...

@dimpase
Member Author

dimpase commented Jan 22, 2022

comment:39

Replying to @orlitzky:

And I don't think adding a quit() hook could hurt in any case.

It's there, no?

sage: maxima._quit_string()
'quit();'

@mantepse
Collaborator

comment:40

I think I am noticing zombie processes also with the gap3 (optional) interface.

@orlitzky
Contributor

comment:41

Replying to @dimpase:

Replying to @orlitzky:

And I don't think adding a quit() hook could hurt in any case.

It's there, no?

sage: maxima._quit_string()
'quit();'

But when is that string sent to the running maxima process? I meant something like comment:31.

I see now that there's a function called quit_sage() in src/sage/all.py,

def quit_sage(verbose=True):
    """
    If you use Sage in library mode, you should call this function
    when your application quits.

    It makes sure any child processes are also killed, etc.
    """
    ...
    from sage.interfaces.quit import expect_quitall
    expect_quitall(verbose=verbose)
    ...

which kills all running pexpect processes. Apparently, you are supposed to call this function yourself. So I guess we can blame sagetex for not calling quit_sage() at the end of the scripts it generates? =)

@dimpase
Member Author

dimpase commented Jan 22, 2022

comment:42

Shouldn't EOF in a .sage script trigger quit_sage()?

@orlitzky
Contributor

comment:43

Replying to @dimpase:

Shouldn't EOF in a .sage script trigger quit_sage()?

Yes, obviously (or something like that). I only use Sage as a library, and I just learned about that function 20 minutes ago. It's outrageous to expect the user to manually clean up after the library.

However, I think an atexit hook is a cleaner approach, at least for pexpect processes. They all support quit(), and each one could register its own atexit hook when it is initialized. That won't help with crashes or when Sage is killed by a signal, but then, neither does calling quit_sage() at EOF.
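Here is a minimal sketch of the per-interface atexit idea. It assumes a child started with subprocess.Popen rather than Sage's real pexpect wrapper, and the class name AutoQuitChild is hypothetical. Each instance registers its own quit() as soon as it starts, so a script that simply reaches EOF still shuts the child down. quit() is idempotent, so calling it explicitly before the exit hook runs does no harm.

```python
import atexit
import subprocess
import sys
from typing import List


class AutoQuitChild:
    """Hypothetical interface that shuts its child down at interpreter exit.

    atexit hooks run on normal interpreter shutdown (including reaching the
    end of a script), but not when Python is killed by an unhandled signal,
    hits a fatal internal error, or calls os._exit().
    """

    def __init__(self, argv: List[str], quit_string: str = "quit();") -> None:
        self.quit_string = quit_string
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE, text=True)
        # Register cleanup as soon as the child exists.
        atexit.register(self.quit)

    def quit(self) -> None:
        """Ask the child to exit, kill it if that fails, then reap it."""
        if self.proc.poll() is None:
            try:
                # Send the quit command, then close stdin so the child sees EOF too.
                self.proc.stdin.write(self.quit_string + "\n")
                self.proc.stdin.close()
                self.proc.wait(timeout=5)
            except (OSError, subprocess.TimeoutExpired):
                self.proc.kill()
        self.proc.wait()  # reap the child so it does not linger as a zombie
        try:
            self.proc.stdin.close()  # no effect if already closed
        except OSError:
            pass


if __name__ == "__main__":
    # The script just ends; the atexit hook still stops the child.
    AutoQuitChild([sys.executable, "-c", "import sys; sys.stdin.read()"])
```

Because the hook holds a reference to each instance, the instance stays alive until exit. In exchange, no caller has to remember a separate cleanup function.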

@orlitzky
Contributor

comment:44

I'm going to experiment with a cleanup hook for symmetrica on another ticket. Our interface to that library has start() and end() functions that are supposed to be called manually. The start() function gets called when you import sage.libs.symmetrica.all, but end() never does unless you call quit_sage(). So I'm going to have start() register a hook that calls end(), and remove the corresponding bits from quit_sage(). The body of quit_sage() isn't very long so we may be able to obsolete it rather quickly.

@orlitzky
Contributor

comment:45

I posted a branch on #8784 that eliminates quit_sage() and puts all of the cleanup chores (including the termination of pexpect processes) into atexit hooks.

It's a more invasive change, but the right thing to do if it doesn't cause any subtle new bugs.

@orlitzky
Contributor

Reviewer: Michael Orlitzky

@orlitzky
Contributor

comment:46

Let's take the easy route for the 9.5 release and worry about quit_sage() afterwards.

@slel
Member

slel commented Jan 30, 2022

comment:47

Setting milestone to 9.6 now that 9.5 is out.

@slel slel modified the milestones: sage-9.5, sage-9.6 Jan 30, 2022
@vbraun
Member

vbraun commented Jan 30, 2022

comment:48

With this ticket, I'm seeing a lot of flakiness in maxima-related tests, e.g.

sage -t --long --warn-long 50.7 --random-seed=36889336955867588730901666695941730921 src/sage/interfaces/maxima_abstract.py
**********************************************************************
File "src/sage/interfaces/maxima_abstract.py", line 314, in sage.interfaces.maxima_abstract.MaximaAbstract._commands
Failed example:
    sorted(maxima._commands(verbose=False))
Exception raised:
    Traceback (most recent call last):
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/doctest/forker.py", line 694, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/doctest/forker.py", line 1088, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.interfaces.maxima_abstract.MaximaAbstract._commands[0]>", line 1, in <module>
        sorted(maxima._commands(verbose=False))
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima_abstract.py", line 327, in _commands
        [self.completions(chr(65+n), verbose=verbose)+
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima_abstract.py", line 327, in <listcomp>
        [self.completions(chr(65+n), verbose=verbose)+
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima_abstract.py", line 296, in completions
        cmd_list = self._eval_line('apropos("%s")'%s, error_check=False)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima.py", line 814, in _eval_line
        self._expect_expr(self._display_prompt)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/maxima.py", line 731, in _expect_expr
        i = self._expect.expect(expr)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/spawnbase.py", line 343, in expect
        return self.expect_list(compiled_pattern_list,
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/spawnbase.py", line 372, in expect_list
        return exp.expect_loop(timeout)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/expect.py", line 179, in expect_loop
        return self.eof(e)
      File "/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/pexpect/expect.py", line 122, in eof
        raise exc
    pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
    Maxima with PID 1795009 running /home/release/Sage/local/bin/maxima -p /home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/sage-maxima.lisp
    command: /home/release/Sage/local/bin/maxima
    args: ['/home/release/Sage/local/bin/maxima', '-p', '/home/release/Sage/local/var/lib/sage/venv-python3.9.9/lib/python3.9/site-packages/sage/interfaces/sage-maxima.lisp']
    buffer (last 100 chars): b''
    before (last 100 chars): ''
    after: <class 'pexpect.exceptions.EOF'>
    match: None
    match_index: None
    exitstatus: None
    flag_eof: True
    pid: 1795009
    child_fd: 20
    closed: False
    timeout: None
    delimiter: <class 'pexpect.exceptions.EOF'>
    logfile: None
    logfile_read: None
    logfile_send: None
    maxread: 4194304
    ignorecase: False
    searchwindowsize: None
    delaybeforesend: None
    delayafterclose: 0.1
    delayafterterminate: 0.1
    searcher: searcher_re:
        0: re.compile(b'<sage-display>')
**********************************************************************

The failures always disappear when the files are tested individually, but running multiple maxima-using tests in parallel triggers them quite reliably, e.g.

./sage -t -p 8 --long src/sage/interfaces/maxima_abstract.py src/sage/dynamics/complex_dynamics/mandel_julia.py src/sage/interfaces/maxima.py src/sage/tests/books/computational-mathematics-with-sagemath/sol/mpoly_doctest.py src/sage/coding/kasami_codes.pyx

@vbraun
Member

vbraun commented Jan 30, 2022

comment:49

Actually, this seems to be due to #32986.

@mwageringel

comment:51

I have just been hit by this problem on the patchbot as well. For me, running ptestlong now leaves about 47 maxima processes running at around 130% CPU usage on average, as well as two ecl and several fricas processes.

With the current branch, the problem seems to go away.

@dimpase
Member Author

dimpase commented Feb 7, 2022

comment:52

Feel free to give it positive review again...

@orlitzky
Contributor

orlitzky commented Feb 8, 2022

comment:53

Longer term, we will probably want threads enabled anyway (the upstream default, and the default in most distros), even if #8784 goes in, so.

@mwageringel

comment:54

At first, I thought the branch would disable threads, but actually it is the other way around. If threads were disabled before, I am surprised that the zombie processes use more than 100% CPU.

@dimpase
Member Author

dimpase commented Feb 8, 2022

comment:55

The patch resulted from an observation that a system-wide ecl with threads enabled does not produce zombie processes. Probably sage-cleaner works better on them.

@vbraun
Member

vbraun commented Feb 13, 2022

Changed branch from u/dimpase/packages/ecl/nozombies to abafddc
