Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ECL test sometimes fails to create maxima directory #26968

Closed
timokau opened this issue Dec 29, 2018 · 26 comments · Fixed by #35195
Closed

ECL test sometimes fails to create maxima directory #26968

timokau opened this issue Dec 29, 2018 · 26 comments · Fixed by #35195

Comments

@timokau
Copy link
Contributor

timokau commented Dec 29, 2018

I just got this test failure (and I'm pretty sure I've seen it before):

File "/nix/store/br51f1j23cnxmbf19yn62rn5pp2rmqdh-sage-src-8.5/src/doc/ca/intro/index.rst", line 492, in doc.ca.intro.index
Failed example:
    solve([x + y == 6, x - y == 4], x, y)
Exception raised:
    Traceback (most recent call last):
      File "/nix/store/mfbhcnzdxiyj831pr3hg1wwh556a6lv3-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 671, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/nix/store/mfbhcnzdxiyj831pr3hg1wwh556a6lv3-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 1086, in compile_and_execute
        exec(compiled, globs)
      File "<doctest doc.ca.intro.index[1]>", line 1, in <module>
        solve([x + y == Integer(6), x - y == Integer(4)], x, y)
      File "/nix/store/mfbhcnzdxiyj831pr3hg1wwh556a6lv3-python-2.7.15-env/lib/python2.7/site-packages/sage/symbolic/relation.py", line 1107, in solve
        m = maxima(f)
      File "sage/misc/lazy_import.pyx", line 354, in sage.misc.lazy_import.LazyImport.__call__ (build/cythonized/sage/misc/lazy_import.c:3682)
        return self.get_object()(*args, **kwds)
      File "sage/misc/lazy_import.pyx", line 189, in sage.misc.lazy_import.LazyImport.get_object (build/cythonized/sage/misc/lazy_import.c:2347)
        return self._get_object()
      File "sage/misc/lazy_import.pyx", line 221, in sage.misc.lazy_import.LazyImport._get_object (build/cythonized/sage/misc/lazy_import.c:2584)
        self._object = getattr(__import__(self._module, {}, {}, [self._name]), self._name)
      File "/nix/store/mfbhcnzdxiyj831pr3hg1wwh556a6lv3-python-2.7.15-env/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py", line 113, in <module>
        ecl_eval("(set-pathnames)")
      File "sage/libs/ecl.pyx", line 1326, in sage.libs.ecl.ecl_eval (build/cythonized/sage/libs/ecl.c:10224)
        cpdef EclObject ecl_eval(str s):
      File "sage/libs/ecl.pyx", line 1341, in sage.libs.ecl.ecl_eval (build/cythonized/sage/libs/ecl.c:10163)
        o=ecl_safe_eval(o)
      File "sage/libs/ecl.pyx", line 350, in sage.libs.ecl.ecl_safe_eval (build/cythonized/sage/libs/ecl.c:5221)
        raise RuntimeError("ECL says: {}".format(
    RuntimeError: ECL says: Could not create directory "/build/sage-home/.sage/maxima"
    C library error: "File exists"
**********************************************************************

This cannot be reliably reproduced. Probably some parallelism issue (tests were run with 4 threads).

CC: @embray @saraedum @jdemeyer

Component: doctest framework

Keywords: random_fail ecl

Issue created by migration from https://trac.sagemath.org/ticket/26968

@timokau timokau added this to the sage-8.6 milestone Dec 29, 2018
@timokau
Copy link
Contributor Author

timokau commented Jan 4, 2019

comment:1

It would be nice to give every test process its own DOT_SAGE directory. Would that be possible? I tried doing that, but got confused as to how the doctest processes are actually started and how isolated they are from the "master" process. I tried setting DOT_SAGE in env.py based on DOCTEST_MODE, but that apparently always is False.

@embray
Copy link
Contributor

embray commented Jan 4, 2019

comment:2

I have also argued in the past that the tests should use a clean .sage. It's a problem for some other tests as well. I don't know if there's a good reason not to, other than "nobody's done it yet".

@timokau
Copy link
Contributor Author

timokau commented Jan 5, 2019

comment:3

The only reason I could think of is that some weird configurations wouldn't be tested. But I would think that is more of a feature: Having doctests depend on the user configuration makes everything much more difficult and relying on users to pick up issues when testing with their configuration doesn't sound like a particularly good practice either.

@embray
Copy link
Contributor

embray commented Jan 7, 2019

comment:4

Replying to @timokau:

The only reason I could think of is that some weird configurations wouldn't be tested. But I would think that is more of a feature: Having doctests depend on the user configuration makes everything much more difficult and relying on users to pick up issues when testing with their configuration doesn't sound like a particularly good practice either.

Yes, if there are any bugs resulting from user configuration those should be then reported by users and captured as a regression test: Tests should be isolated as possible and reproducible.

@timokau
Copy link
Contributor Author

timokau commented Jan 7, 2019

comment:5

Do you happen to know how practical it would be to implement then? As I mentioned earlier, it seems like DOCTEST_MODE is only set to true after env.py was evaluated.

@embray
Copy link
Contributor

embray commented Jan 8, 2019

comment:6

I think it would be quite practical; just nobody's been sufficiently motivated enough before now.

Probably somewhere in sage.doctest.forker we would just create a tempdir and set sage.env.DOT_SAGE to that tempdir before running any examples in a test file.

@timokau
Copy link
Contributor Author

timokau commented Jan 12, 2019

comment:7

But that would set sage.env.DOT_SAGE for all processes right?

@embray
Copy link
Contributor

embray commented Jan 14, 2019

comment:8

No, not if it were done after fork.

@embray
Copy link
Contributor

embray commented Jan 15, 2019

comment:9

Retarging tickets optimistically to the next milestone. If you are responsible for this ticket (either its reporter or owner) and don't believe you are likely to complete this ticket before the next release (8.7) please retarget this ticket's milestone to sage-pending or sage-wishlist.

@embray embray modified the milestones: sage-8.6, sage-8.7 Jan 15, 2019
@timokau
Copy link
Contributor Author

timokau commented Jan 17, 2019

comment:10

Replying to @embray:

I think it would be quite practical; just nobody's been sufficiently motivated enough before now.

Probably somewhere in sage.doctest.forker we would just create a tempdir and set sage.env.DOT_SAGE to that tempdir before running any examples in a test file.

I now remembered why that wouldn't work. There are other variables that depend on sage.env.DOT_SAGE which wouldn't be re-evaluated. We'd probably need to either change env.py to use @propertys or spawn completely independent sage instances.

@timokau
Copy link
Contributor Author

timokau commented Jan 17, 2019

comment:11

Although that wouldn't even fix this issue since MAXIMA_USERDIR is set in sage-env, not env.py.

@embray
Copy link
Contributor

embray commented Jan 18, 2019

comment:12

I still haven't looked at it that deeply but I suspect you're overthinking it. Though the concern about MAXIMA_USERDIR is definitely valid. That's the kind of problem I wanted to address in #22652, but every time it's come up it's been easier to just implement a narrow fix rather than a general solution.

@timokau
Copy link
Contributor Author

timokau commented Jan 18, 2019

comment:13

Repeating my comment I accidentally posted in #22652 here:

Replying to @embray:

I still haven't looked at it that deeply but I suspect you're overthinking it.

env.py contains these lines:

_add_variable_or_fallback('PYTHON_EGG_CACHE',   opj('$DOT_SAGE', '.python-eggs'))
_add_variable_or_fallback('SAGE_STARTUP_FILE',  opj('$DOT_SAGE', 'init.sage'))

Which is admittedly way less occurrences than I expected. Still, those would need to be re-evaluated or their usage changed.

Though the concern about MAXIMA_USERDIR is definitely valid. That's the kind of problem I wanted to address in #22652,

#22652 would be amazing. I don't see why it would necessary fix these issues though. Of course it would make sense to tackle both issues at the same time.

but every time it's come up it's been easier to just implement a narrow fix rather than a general solution.

Speaking of narrow fixes, for this specific use-case, for this particular issue that should do it (how do I get that nice patch highlighting on trac?):

diff --git a/src/sage/doctest/forker.py b/src/sage/doctest/forker.py
index 02e18e67e7..115f748727 100644
--- a/src/sage/doctest/forker.py
+++ b/src/sage/doctest/forker.py
@@ -668,7 +668,14 @@ class SageDocTestRunner(doctest.DocTestRunner, object):
 
             try:
                 # Don't blink!  This is where the user's code gets run.
+                # print("MAXIMA_USERDIR is: {}".format(os.environ['MAXIMA_USERDIR']))
+                os.environ['MAXIMA_USERDIR'] = "/tmp/maxima-userdir-{}".format(os.getpid())
                 self.compile_and_execute(example, compiler, test.globs)
             except SystemExit:
                 raise
             except BaseException:

I'll probably adopt that for nix. I don't want to fix the same issue on our build server every week. Should I submit that upstream or is that too "hacky"?

@timokau
Copy link
Contributor Author

timokau commented Jan 18, 2019

comment:14

I ended up with NixOS/nixpkgs#54285.

@jhpalmieri
Copy link
Member

comment:15

Replying to @timokau:

(how do I get that nice patch highlighting on trac?)

On the line right after the triple braces {{{, add the line #!diff. (See also https://trac.sagemath.org/wiki/WikiProcessors for more options like #!sh, #!html, etc.)

@jdemeyer
Copy link

comment:16

You seem too much focusing on the doctest framework here: this is not a bug in the doctest framework, so it shouldn't be fixed in the doctest framework.

Ideally, this should be reported to ECL upstream.

@timokau
Copy link
Contributor Author

timokau commented Jan 19, 2019

comment:17

They seem to assume only one ecl process is running at a time, which may or may not be a bug.

@embray
Copy link
Contributor

embray commented Mar 25, 2019

comment:18

Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)

@embray embray modified the milestones: sage-8.7, sage-8.8 Mar 25, 2019
@embray
Copy link
Contributor

embray commented Jun 14, 2019

comment:19

As the Sage-8.8 release milestone is pending, we should delete the sage-8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage-8.9).

@embray embray removed this from the sage-8.8 milestone Jun 14, 2019
@mkoeppe
Copy link
Member

mkoeppe commented Apr 18, 2020

comment:20

I'm getting a similar failure while building doc-html

@mkoeppe mkoeppe added this to the sage-9.1 milestone Apr 18, 2020
@mkoeppe mkoeppe modified the milestones: sage-9.3, sage-9.4 Mar 24, 2021
@mkoeppe mkoeppe modified the milestones: sage-9.4, sage-9.5 Jul 19, 2021
@mkoeppe mkoeppe modified the milestones: sage-9.5, sage-9.6 Dec 18, 2021
@mkoeppe mkoeppe modified the milestones: sage-9.6, sage-9.7 May 3, 2022
@mkoeppe mkoeppe modified the milestones: sage-9.7, sage-9.8 Sep 19, 2022
@mkoeppe mkoeppe removed this from the sage-9.8 milestone Jan 29, 2023
@tornaria
Copy link
Contributor

This bug is still biting, see e.g. https://github.com/void-linux/void-packages/actions/runs/4169062901/jobs/7216578712

Could we workaround this? E.g. create the directory before ECL runs ensure-directories-exist. If the directory exists, there's no race. I'm not sure there is a simple robust way to know the name of the directory in advance (e.g. for me it seems to be $DOT_SAGE/maxima/binary/5_45_1/ecl/21_2_1/ but clearly that changes with maxima and ecl versions).

Is this reported as a bug in ECL?

@tornaria
Copy link
Contributor

I mean something like the following (proof of concept, the computation of *maxima-objdir* is not 100% correct).

--- a/src/sage/interfaces/maxima_lib.py
+++ b/src/sage/interfaces/maxima_lib.py
@@ -116,6 +116,20 @@ ecl_eval("(in-package :maxima)")
 ecl_eval("(setq $nolabels t))")
 ecl_eval("(defvar *MAXIMA-LANG-SUBDIR* NIL)")
 ecl_eval("(set-locale-subdir)")
+
+# Compute `*maxima-objdir*` as in `(set-pathnames)`.
+# See: src/init-cl.lisp in maxima source code.
+import os
+maxima_objdir = os.path.join(
+        os.environ.get("MAXIMA_USERDIR"), "binary",
+        ecl_eval("(maxima-version1)").python()[1:-1],
+        ecl_eval("*maxima-lispname*").python()[1:-1],
+        ecl_eval("(lisp-implementation-version1)").python()[1:-1]
+        )
+# Create *maxima-objdir* before calling (set-pathnames) to avoid a race
+# in ecl's implementation of ensure-directories-exist.  See #26968.
+os.makedirs(maxima_objdir, exist_ok=True)
+
 ecl_eval("(set-pathnames)")
 ecl_eval("(defun add-lineinfo (x) x)")
 ecl_eval('(defun principal nil (cond ($noprincipal (diverg)) ((not pcprntd) (merror "Divergent Integral"))))')

An alternative is to first run ecl_eval("(set-pathnames)"); catch the RuntimeError and if so use the value of *maxima-objdir* as computed by (set-pathnames). I think it should use os.makedirs() then run (set-pathnames) again.

@tornaria
Copy link
Contributor

Here's a small script to reproduce the issue:

from sage.all import *
import os, sys, tempfile

def fork_and_try_import():
    pid = os.fork()
    if not pid:
        try:
            import sage.interfaces.maxima_lib
        except RuntimeError as e:
            print("RuntimeError:", e)
            os._exit(1)
        os._exit(0)
    return pid

def run_test(d, num=5):
    failed = 0
    for i in range(num):
        with tempfile.TemporaryDirectory(dir=d) as tmpdir:
            print(f"Run #{i} in {tmpdir}")
            os.environ['MAXIMA_USERDIR'] = os.path.abspath(tmpdir)
            pid1 = fork_and_try_import()
            pid2 = fork_and_try_import()
            failed += os.waitpid(pid1, 0)[1] != 0
            failed += os.waitpid(pid2, 0)[1] != 0

    print(f"Failed tests: {failed}/{num}")

if len(sys.argv) > 1:
    d = sys.argv[1]
else:
    d = None

with tempfile.TemporaryDirectory(dir=d) as tmpdir:
        print(f"Running test in {tmpdir}")
        run_test(tmpdir)

This is still a race so it won't reproduce every time, depending on the speed of your filesystem. However, I have better luck reproducing this on either on NFS or on SSHFS, since the network latency gives plenty of time for the race to trigger.

By default this uses /tmp and is tipically hard to trigger the race:

$ sage sage-race.py 
Running test in /tmp/tmpqb989z33
Run #0 in /tmp/tmpqb989z33/tmpsg61q0xr
Run #1 in /tmp/tmpqb989z33/tmpfvz6o5_9
Run #2 in /tmp/tmpqb989z33/tmpe3qb7qpu
Run #3 in /tmp/tmpqb989z33/tmp9a5n9pqj
Run #4 in /tmp/tmpqb989z33/tmpfu2qr3aj
Failed tests: 0/5

My home dir here is NFS:

$ sage sage-race.py ~
Running test in /home/tornaria/tmp3q0d_q4q
Run #0 in /home/tornaria/tmp3q0d_q4q/tmp60_u_uuc
RuntimeError: ECL says: Could not create directory "/home/tornaria/tmp3q0d_q4q/tmp60_u_uuc/binary"
C library error: "File exists"
Run #1 in /home/tornaria/tmp3q0d_q4q/tmp03bxyaf9
RuntimeError: ECL says: Could not create directory "/home/tornaria/tmp3q0d_q4q/tmp03bxyaf9/binary"
C library error: "File exists"
Run #2 in /home/tornaria/tmp3q0d_q4q/tmpz37vri7x
RuntimeError: ECL says: Could not create directory "/home/tornaria/tmp3q0d_q4q/tmpz37vri7x/binary"
C library error: "File exists"
Run #3 in /home/tornaria/tmp3q0d_q4q/tmp4m55dxh1
RuntimeError: ECL says: Could not create directory "/home/tornaria/tmp3q0d_q4q/tmp4m55dxh1/binary"
C library error: "File exists"
Run #4 in /home/tornaria/tmp3q0d_q4q/tmpp8bw75kr
RuntimeError: ECL says: Could not create directory "/home/tornaria/tmp3q0d_q4q/tmpp8bw75kr/binary"
C library error: "File exists"
Failed tests: 5/5

Here's how to do it using sshfs (change REMOTE-HOST by the name of a host you can ssh into):

$ mkdir /tmp/mnt
$ sshfs REMOTE-HOST: /tmp/mnt/
$ sage sage-race.py /tmp/mnt/
Running test in /tmp/mnt/tmpi8hw10cb
Run #0 in /tmp/mnt/tmpi8hw10cb/tmp4ehsdrr7
RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmpi8hw10cb/tmp4ehsdrr7/binary"
C library error: "File exists"
Run #1 in /tmp/mnt/tmpi8hw10cb/tmpa_icybzr
RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmpi8hw10cb/tmpa_icybzr/binary"
C library error: "File exists"
Run #2 in /tmp/mnt/tmpi8hw10cb/tmpgqlbi43n
RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmpi8hw10cb/tmpgqlbi43n/binary"
C library error: "File exists"
Run #3 in /tmp/mnt/tmpi8hw10cb/tmp2v6883z7
RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmpi8hw10cb/tmp2v6883z7/binary"
C library error: "File exists"
Run #4 in /tmp/mnt/tmpi8hw10cb/tmprdtc0yme
RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmpi8hw10cb/tmprdtc0yme/binary"
C library error: "File exists"
Failed tests: 5/5
$ fusermount -u /tmp/mnt
$ rmdir /tmp/mnt/

@tornaria
Copy link
Contributor

A direct way to reproduce the doctest failures is as follows. Assuming /tmp/mnt is a slow fs (say, sshfs as above)

$ DOT_SAGE=/tmp/mnt/temp-dot-sage ./sage -tp 2 src/sage/interfaces/maxima_lib.py{,}
...
    RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmp-dot-sage/maxima/binary/5_45_1"
    C library error: "File exists"
...

Note this will doctest the same file twice in parallel. Usually one of the two will fail the first time. After that the maxima directory will be created so no race anymore: you have to rm -rf /tmp/mnt/temp-dot-sage/maxima before each try for this to fail.

For the same reason, you may be able to reproduce this if your $HOME is on an NFS-mounted filesystem without messing with DOT_SAGE, but only if you rm -rf ~/.sage/maxima before each try.


Now that I can easily reproduce it, I'll PR a workaround.

tornaria added a commit to tornaria/sage that referenced this issue Feb 25, 2023
We use a temporary `MAXIMA_USERDIR` so its empty, and we try
to initialize maxima twice in parallel to entice the race.
This temporary dir is placed within `DOT_SAGE` so it is easy
to try different filesystems.

The bug triggers more frequently if `DOT_SAGE` is in a high
latency filesystem (e.g. sshfs on a non-local host).

The next commit introduces a workaround for the bug.
tornaria added a commit to tornaria/sage that referenced this issue Feb 25, 2023
When maxima is initialized a bug in ecl implementation of
`ensure-directories-exist` might result in a runtime error.

As a workaround, in case we get a runtime error we use python to create
the directory and continue with maxima initialization.

Note that for normal usage the directory will already exist within the
user's `DOT_SAGE` so this code will almost never run. However, when
running doctests on CI this occasionally triggers.
tornaria added a commit to tornaria/sage that referenced this issue Feb 25, 2023
We run a new instance of sage in a subprocess to ensure maxima is
not already initialized. We use a temporary MAXIMA_USERDIR so its empty,
and we try to initialize maxima twice in parallel to entice the race.

This temporary dir is placed within `DOT_SAGE` so it is easy
to try different filesystems.

The bug triggers more frequently if `DOT_SAGE` is in a high
latency filesystem (e.g. sshfs on a non-local host).

The next commit introduces a workaround for the bug.
tornaria added a commit to tornaria/sage that referenced this issue Feb 25, 2023
When maxima is initialized a bug in ecl implementation of
`ensure-directories-exist` might result in a runtime error.

As a workaround, in case we get a runtime error we use python to create
the directory and continue with maxima initialization.

Note that for normal usage the directory will already exist within the
user's `DOT_SAGE` so this code will almost never run. However, when
running doctests on CI this occasionally triggers.
@tobiasdiez tobiasdiez mentioned this issue Feb 25, 2023
3 tasks
tornaria added a commit to tornaria/sage that referenced this issue Feb 25, 2023
When maxima is initialized a bug in ecl implementation of
`ensure-directories-exist` might result in a runtime error.

As a workaround, in case we get a runtime error we use python to create
the directory and then continue with maxima initialization.

Note that for normal usage the directory will already exist within the
user's `DOT_SAGE` so this code will almost never run. However, when
running doctests on CI this occasionally triggers.
vbraun pushed a commit that referenced this issue Mar 26, 2023
    
### 📚 Description

When maxima is initialized a bug in ecl implementation of
`ensure-directories-exist` might result in a runtime error.

As a workaround, in case we get a runtime error we use python to create
the directory and continue with maxima initialization.

Note that for normal usage the directory will already exist within the
user's `DOT_SAGE` so this code will almost never run. However, when
running doctests on CI this occasionally triggers.

#### New doctest

The first commit introduces a doctest to try to catch this race.
We run a new instance of sage in a subprocess to ensure maxima is
not already initialized. We use a temporary `MAXIMA_USERDIR` so its
empty,
and we try to initialize maxima twice in parallel to entice the race.

This temporary dir is placed within `DOT_SAGE` so it is easy
to try different filesystems.

The bug triggers more frequently if `DOT_SAGE` is in a high
latency filesystem (e.g. sshfs on a non-local host).

Closes #26968.

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->

- [x] I have made sure that the title is self-explanatory and the
description concisely explains the PR.
- [x] I have linked an issue or discussion.
- [x] I have created tests covering the changes.
    
URL: #35195
Reported by: Gonzalo Tornaría
Reviewer(s): Gonzalo Tornaría, Matthias Köppe
tornaria added a commit to tornaria/sage that referenced this issue Jun 25, 2023
Calling initialize-runtime-globals will run set-pathnames and be subject
to the issue described in sagemath#26968.

Thus the workaround introduced in sagemath#35195 has to be done before anything
that may call set-pathnames (e.g. initialize-runtime-globals).
antonio-rojas added a commit to antonio-rojas/sage that referenced this issue Jun 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants