Trouble passing make check #1240

Closed
sampollard opened this issue Oct 17, 2017 · 33 comments

@sampollard

I am trying to get flux installed on a non-LC system (uname -a = Linux sansa 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux). Here is my process:

  1. Edit the flux package in spack to update it to v0.8.0:
      url      = "https://github.com/flux-framework/flux-core/releases/download/v0.8.0/flux-core-0.8.0.tar.gz"
      version('0.8.0', md5='9ee12974a8b2ab9a30533f69826f3bec')
      # (I got this via md5sum flux-core-0.8.0.tar.gz)
  2. Edit the flux package in spack to add these dependencies:
      depends_on("lua-luafilesystem")
      depends_on("lua-luaposix")
      depends_on("libuuid")
  3. Remove the gettext dependency in spack/var/spack/repos/builtin/packages/git
  4. spack install flux %gcc@5.4.0
  5. Load every spack-managed module listed by module avail
  6. ./autogen.sh && ./configure --disable-docs --prefix=$HOME/local && make
  7. make check

This produces the following log:
test-suite-sansa4.log

Does anyone have any insight into this? It seems like fluxometer isn't getting copied into spack's Lua location, but there seems to be more going on.

@grondo
Contributor

grondo commented Oct 17, 2017

Even some non-Lua tests are failing; for example, it looks like most of the kvs tests failed.

Once make has completed, can you try something like this in your builddir?

 $ cd t
 $ ./t0001-basic.t -d -v

If that succeeds, then maybe try running ./t1000-kvs.t -d -v, which definitely failed in your example.

I wonder if there is just some basic missing dependency.

@sampollard
Author

./t0001-basic.t -d -v passed every test. I have attached the output of ./t1000-kvs.t -d -v:
t1000-kvs.out.txt

@grondo
Contributor

grondo commented Oct 17, 2017

OK, that didn't fail in the obvious ways I would expect if there were some missing dependency.

We may have to start smaller. Would you be willing to run ./t0002-request.t -d -v, ./t0003-*.t -d -v, etc., until you hit the first failure, and share the result?
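That walk through the tests can be scripted. This is a minimal sketch that assumes the sharness convention that a test script exits non-zero when any of its subtests fail. The helper names `numbered_tests` and `first_failure` are hypothetical and are not part of flux-core. Run it from the t/ directory of the builddir.

```python
import glob
import os
import subprocess
from typing import Iterable, List, Optional


def numbered_tests(directory: str = ".") -> List[str]:
    """Return the tNNNN-*.t scripts in directory, in numeric order.

    The test names are zero-padded, so a plain lexical sort is numeric.
    """
    return sorted(glob.glob(os.path.join(directory, "t[0-9]*.t")))


def first_failure(scripts: Iterable[str], args=("-d", "-v")) -> Optional[str]:
    """Run each script in turn; return the first that exits non-zero, else None.

    Assumes (as with sharness) that a script exits non-zero if any subtest fails.
    Output is not captured, so each test's debug/verbose log goes to the terminal.
    """
    for script in scripts:
        result = subprocess.run([script, *args])
        if result.returncode != 0:
            return script
    return None
```

For example, `print(first_failure(numbered_tests()))` run from inside t/ reports the first failing script, or None. Gaps in the numbering are harmless because only existing files are globbed.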

@sampollard
Author

Of course. Thank you for helping me with this. Here's what I've got:

  • PASS t0002, 3, 4, 5, 7, 8, 9, 10, 11, 12, 14 (there is no t0006 or t0013)
  • FAIL ./t0015-cron.t -d -v
    t0015-cron.out.txt

@grondo
Copy link
Contributor

grondo commented Oct 17, 2017

Ok, unfortunately there are no useful errors in that test either. Sorry!

Try this (from the top level of your flux-core builddir):

$ src/cmd/flux start flux exec hostname

The flux-exec utility uses the Lua bindings, so if something is wrong there, we will hopefully get a useful error.
If that works, then try the flux-cron command that was failing above:

$ src/cmd/flux start flux cron list

You could also just launch a session and try various commands to see if any clues pop up:

$ src/cmd/flux start
$ flux kvs dir
$ flux kvs dir -R
$ flux ps
...

The confusing thing is that the kvs tests don't depend on the Lua bindings (I don't think), so there might be something else wrong as well.

@sampollard
Author

$ src/cmd/flux start flux exec hostname
sansa
$ src/cmd/flux start flux cron list
    ID CMD/NAME        STATE        #RUNS            LASTRUN  STATUS
$ flux kvs dir
$ flux kvs dir -R
$ flux ps
OWNER     RANK       PID  COMMAND
none         0     70332  /bin/bash
none         0        -1  /bin/bash

I experimented a bit while inside flux, but I am not very familiar with using flux-core on its own. I also built flux-sched and ran one of its unit tests. Here is the output, in case it helps (this is t/t001-basic.t from flux-sched):
t0001-basic.out.txt

@grondo
Contributor

grondo commented Oct 18, 2017

Hm, almost everything you tried above seemed to work, which indicates that flux-core built at least marginally OK.

The errors from that flux-sched test suggest that this build of flux-sched wasn't linked against the correct libflux-core libraries. Do you perhaps have a previous version of a flux-core spack package installed, from which flux-sched might be picking up libraries?

Tomorrow I'll try building up dependencies with spack as you have and see if I can reproduce any of these issues.

@grondo
Contributor

grondo commented Oct 18, 2017

@sampollard, I didn't get very far with the spack package on my desktop (see #1118 for details).

Still trying to reproduce the errors you're seeing here...

@sampollard
Author

Ahh, I've found my way to that issue before as well. Here's what I did to get around that:

I modified the flux and git packages; they can be found here.

Then I ran spack install --only dependencies flux.

I then ran module avail, piped its output, and loaded every single module I could that was managed by spack. Maybe this will get you on the right track?

Alternatively, it looks like Todd Gamblin suggested adding the following to ~/.spack/config.yaml (note: I didn't try this):

config:
    build_stage:
    - $spack/var/spack/stage

@grondo
Contributor

grondo commented Oct 19, 2017

Thanks. I didn't have environment-modules on my desktop, so I installed it with spack and then set it up with:

$ . $(spack location --install-dir environment-modules)/Modules/init/bash
$ . $SPACK_ROOT/share/spack/setup-env.sh

After this I was able to build a copy of flux-core using all dependencies from spack, and was indeed able to reproduce the problem you've reported.

I'm still trying to figure out what is going on, though. I've tried changing the versions of czmq and zeromq, as well as jansson, but so far that hasn't resolved the problem.

For the flux-cron errors, I narrowed the issue down to a failure in json_unpack() on the json_t *arg argument passed to cron_interval_create(). After adding some debug output, it appears that the json_t object is somehow corrupted: the subsequent json_unpack() fails with "Object item not found: interval", even though json_dumps() renders it as {"interval":1}.

In fact, the following test (added to the top of cron_interval_create()) fails with "arg and o are NOT equal". I would never expect this to fail:

char *s = json_dumps (arg, 0);
json_t *o = s ? json_loads (s, 0, NULL) : NULL;
fprintf (stderr, "arg and o are %sequal\n",
         json_equal (o, arg) ? "" : "NOT ");
free (s);
json_decref (o);

I'll see if I can reproduce this outside of flux code, but I don't see this same error when building with normal dependencies outside of spack.

@sampollard
Author

Huh. So this is only happening when spack handles the dependencies?

Would a temporary workaround be to install all the dependencies without spack?

@grondo
Contributor

grondo commented Oct 19, 2017

Yes, that seems to work for me on my desktop, though I'd really like to understand what is going on here with the jansson library.

@grondo
Contributor

grondo commented Oct 19, 2017

This is interesting. The spack package for jansson only installs a static library (libjansson.a). Do you know if that is intentional? That might be part of the problem here, since different objects within the same address space could each end up with their own copy of the jansson functions.

$ (cd  $(spack location --install-dir jansson) && find lib)
lib
lib/cmake
lib/cmake/jansson
lib/cmake/jansson/JanssonTargets.cmake
lib/cmake/jansson/JanssonTargets-relwithdebinfo.cmake
lib/cmake/jansson/JanssonConfig.cmake
lib/cmake/jansson/JanssonConfigVersion.cmake
lib/libjansson.a
lib/pkgconfig
lib/pkgconfig/jansson.pc
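Here is the same check in Python, for anyone who wants to script it. Given an install prefix (for example, the output of `spack location --install-dir jansson`), it reports whether the prefix provides a shared libjansson, a static one, or both. This is a small sketch: the helper name is hypothetical, and it only looks in lib/ and lib64/.

```python
import glob
import os
from typing import Set


def jansson_lib_kinds(prefix: str) -> Set[str]:
    """Report whether an install prefix provides a shared and/or static libjansson.

    Checks <prefix>/lib and <prefix>/lib64 only; other layouts are not handled.
    """
    kinds: Set[str] = set()
    for libdir in ("lib", "lib64"):
        base = os.path.join(prefix, libdir)
        # Matches libjansson.so as well as versioned names like libjansson.so.4
        if glob.glob(os.path.join(base, "libjansson.so*")):
            kinds.add("shared")
        if os.path.exists(os.path.join(base, "libjansson.a")):
            kinds.add("static")
    return kinds
```

For the install prefix listed above, which contains lib/libjansson.a but no .so file, this would return {"static"}.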

@grondo
Contributor

grondo commented Oct 19, 2017

One change that fixed the issue with the cron service was to call flux_msg_get_json() and json_loads() directly in the cron service and then call json_unpack() on the result, instead of relying on flux_msg_unpack().

i.e.

    if (flux_msg_get_json (msg, &json_str) < 0
       || !(o = json_loads (json_str, 0, NULL))) {
        flux_log_error (h, "cron.create: Failed to get name/command/args");
        goto done;
    }

    /* Get required fields "type", "name" and "command" */
    if (json_unpack (o, "{ s:s, s:s, s:s, s:O }",
            "type", &type,
            "name", &name,
            "command", &command,
            "args", &args) < 0) {
        flux_log_error (h, "cron.create: Failed to get name/command/args");
        goto done;
    }

instead of

    if (flux_msg_unpack (msg, "{ s:s, s:s, s:s, s:O }",
            "type", &type,
            "name", &name,
            "command", &command,
            "args", &args) < 0) {
        flux_log_error (h, "cron.create: Failed to get name/command/args");
        goto done;
    }

This seems to imply that the jansson implementation linked into libflux (message.c) may not be binary compatible with the one linked into modules/cron. (I don't really have proof that a different version is linked anywhere else, though; I did try to remove all but one version of jansson from my system before rebuilding.)

My general worry is that we're going to run into similar trouble whenever someone compiles out-of-tree modules to load into a flux session and uses flux_msg_unpack() to return a json_t * object, unless we can guarantee that the exact same jansson symbols are used in both contexts.

@grondo
Contributor

grondo commented Oct 20, 2017

@sampollard, I think the main problem here (besides the linking of fluxometer.lua to itself) is the static jansson library. I changed the spack jansson package to build shared libraries, and that resolved all the strange test failures (though I have no idea exactly why linking jansson statically was causing these errors).

Try this change, along with the config change suggested by Todd above, and let me know if it goes better.

I still have czmq and zeromq backed off to earlier versions, but I'll try updating them to the default in spack after lunch.

diff --git a/var/spack/repos/builtin/packages/jansson/package.py b/var/spack/repos/builtin/packages/jansson/package.py
index f29b17f..c8269b7 100644
--- a/var/spack/repos/builtin/packages/jansson/package.py
+++ b/var/spack/repos/builtin/packages/jansson/package.py
@@ -33,3 +33,8 @@ class Jansson(CMakePackage):
     url      = "https://github.com/akheron/jansson/archive/v2.9.tar.gz"
 
     version('2.9', 'd2db25c437b359fc5a065ed938962237')
+
+    def cmake_args(self):
+        return [
+            '-DJANSSON_BUILD_SHARED_LIBS=ON',
+        ]

@grondo
Contributor

grondo commented Oct 21, 2017

Submitted a patch to the spack jansson package as spack/spack#5857.

@grondo
Contributor

grondo commented Oct 21, 2017

I've also discovered why make check can't find the fluxometer Lua module at runtime when flux-core is built with spack dependencies. The spack lua package sets the runtime LUA_PATH and omits the default path (the special string ";;"). The fluxometer-based Lua tests in flux-core depend on that default path, since it contains the local directory.

As a temporary fix, append ;;; to the current LUA_PATH, e.g.:

$ export LUA_CPATH="$LUA_CPATH;;;"

As a more permanent workaround, flux's t/Makefile.am could explicitly append the path to fluxometer.lua to LUA_PATH (you would still have to append the default path to run the tests by hand).
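The temporary fix can also be applied in a script. This is a rough sketch that relies on Lua's rule that any ";;" in LUA_PATH is replaced by the built-in default path. In Lua 5.1 that default path begins with ./?.lua, which is why it covers the local directory. `with_lua_default_path` and `make_check` are hypothetical helpers, and the same logic would apply to LUA_CPATH.

```python
import os
import subprocess


def with_lua_default_path(value: str) -> str:
    """Ensure a LUA_PATH value contains ';;', which Lua expands to its default path."""
    if ";;" in value:
        return value                      # default path already requested
    return value.rstrip(";") + ";;"       # an empty value becomes ";;" (default only)


def make_check(builddir: str) -> int:
    """Run `make check` in builddir with the Lua default path appended to LUA_PATH."""
    env = dict(os.environ)
    env["LUA_PATH"] = with_lua_default_path(env.get("LUA_PATH", ""))
    return subprocess.run(["make", "check"], cwd=builddir, env=env).returncode
```

Checking for an existing ";;" before appending keeps the helper idempotent, so calling it on an already-fixed LUA_PATH is harmless.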

@sampollard
Author

sampollard commented Oct 24, 2017

I hope I haven't done something stupid and regressed, but here's where I am. These are some of the errors I'm getting with make check:

ERROR: t0000-sharness
=====================

lua: ...at2g6fhs6s74m553tokapjc/share/lua/5.1/posix/init.lua:32: posix namespace clash: unistd.getopt
stack traceback:
        [C]: in function 'assert'
        ...at2g6fhs6s74m553tokapjc/share/lua/5.1/posix/init.lua:32: in main chunk
        [C]: in function 'require'
        (command line):1: in main chunk
        [C]: ?
error: failed to find lua posix module in path
ERROR: t0000-sharness.t - missing test plan
ERROR: t0000-sharness.t - exited with status 1
...
...
...
ERROR: lua/t0004-getattr
========================

Required Lua 'posix' module not found. Please check LUA_PATH or install package
ERROR: lua/t0004-getattr.t - missing test plan
ERROR: lua/t0004-getattr.t - exited with status 1

But I can't get the lua-posix library to link. Any idea why this might happen? I have module loaded the spack lua-luaposix and lua-luafilesystem. Do I also need a systemwide installation of luarocks or something?

Also, did you mean LUA_PATH instead of LUA_CPATH in that last comment? When I appended ;; to LUA_PATH, the fluxometer issues went away.

@grondo
Contributor

grondo commented Oct 24, 2017

Yeah, you are right, it should have been LUA_PATH. Sorry!

I haven't seen that namespace clash before from lua-posix. You might try "uninstalling" the lua-luaposix module and trying again. What is the LUA_CPATH as set by spack?

@grondo
Contributor

grondo commented Oct 24, 2017

But I can't get the lua-posix library to link. Any idea why this might happen? I have module loaded the spack lua-luaposix and lua-luafilesystem. Do I also need a systemwide installation of luarocks or something?

You shouldn't need the lua-luafilesystem package. I wasn't having trouble with luaposix, but then again I have a working version installed via system packages. I'll try reproducing.

@sampollard
Author

sampollard commented Oct 25, 2017

Thanks. I had been dragging lua-luafilesystem along in the dependencies for a while now; I think I saw it in some build instructions a while ago. I did reinstall lua-luaposix.

Anyway, here are my steps:

  1. Get my version of spack with the flux package updated to version 0.8.0 and your jansson change
  2. Add this to ~/.spack/config.yaml:
      config:
          build_stage:
          - $spack/var/spack/stage
  3. spack install flux %gcc@5.4.0 - this will fail. I specify 5.4.0 rather than my default (4.8.5) because it seems to load the 5.4.0 regardless of if I leave that off.
  4. export CC=gcc-5 (actually, I module load gcc-5, but you get the idea)
  5. Load every spack module from module avail
  6. export LUA_PATH="$LUA_PATH;;;"
  7. ./autogen.sh && ./configure --prefix=$HOME/local && make -j8
    (For some reason I had to spack uninstall czmq and then reinstall it: a uuid dependency wasn't satisfied the first time. This has happened to me before.)
  8. make check - this still fails to find luaposix, as mentioned before, though without the namespace clash

LUA_CPATH is

/disks/large/home/users/spollard/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/lua-luaposix-33.4.0-5bfqmmdevat2g6fhs6s74m553tokapjc/lib/lua/5.1/?.so;/disks/large/home/users/spollard/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/lua-luaposix-33.4.0-5bfqmmdevat2g6fhs6s74m553tokapjc/share/lua/5.1/?.so;/disks/large/home/users/spollard/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/lua-5.1.5-otsc6gzx3tlabm6y5yflmzewtfzzf6lk/lib/lua/5.1/?.so;/disks/large/home/users/spollard/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/lua-5.1.5-otsc6gzx3tlabm6y5yflmzewtfzzf6lk/share/lua/5.1/?.so;/disks/large/home/users/spollard/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/lua-luaposix-33.4.0-5bfqmmdevat2g6fhs6s74m553tokapjc/lib/lua/5.1/?.so;/disks/large/home/users/spollard/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/lua-luaposix-33.4.0-5bfqmmdevat2g6fhs6s74m553tokapjc/share/lua/5.1/?.so;/disks/large/home/users/spollard/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/lua-5.1.5-otsc6gzx3tlabm6y5yflmzewtfzzf6lk/lib/lua/5.1/?.so;/disks/large/home/users/spollard/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/lua-5.1.5-otsc6gzx3tlabm6y5yflmzewtfzzf6lk/share/lua/5.1/?.so;/disks/large/home/users/spollard/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/lua-5.1.5-otsc6gzx3tlabm6y5yflmzewtfzzf6lk/lib/lua/5.1/?.so
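The LUA_CPATH above lists the same few directories more than once. To check which files `require` would actually try, here is a simplified sketch of how Lua 5.1 expands LUA_PATH/LUA_CPATH templates. It splits the path on ";", turns dots in the module name into directory separators, substitutes that name for every "?", and tries the results in order. It skips empty entries, drops duplicates, and ignores both the ";;" default-path marker and Lua's all-in-one C loader. The function names are hypothetical.

```python
import os
from typing import List, Optional


def lua_search_candidates(module: str, search_path: str) -> List[str]:
    """Expand LUA_PATH/LUA_CPATH-style templates into candidate file paths.

    Simplified: skips empty entries, drops duplicates, and does not expand
    the ';;' default-path marker.
    """
    name = module.replace(".", "/")   # Lua 5.1 maps dots to the directory separator
    candidates: List[str] = []
    for template in search_path.split(";"):
        if not template:
            continue
        path = template.replace("?", name)
        if path not in candidates:
            candidates.append(path)
    return candidates


def first_existing(module: str, search_path: str) -> Optional[str]:
    """Return the first candidate that is a regular file, following search order.

    Lua itself checks that the file can be opened for reading; isfile() is
    used here as an approximation.
    """
    for path in lua_search_candidates(module, search_path):
        if os.path.isfile(path):
            return path
    return None
```

For example, `for p in lua_search_candidates("posix", os.environ.get("LUA_CPATH", "")): print(p, os.path.isfile(p))` lists each probed path and whether it exists.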

To be clear, I was able to get a working version by just manually installing everything (with apt).

@grondo
Contributor

grondo commented Oct 25, 2017

Load every spack module from module avail

I'm not that familiar with spack: are you using spack load or module load to load modules? I don't know the difference.

@grondo
Contributor

grondo commented Oct 25, 2017

spack install flux %gcc@5.4.0 - this will fail. I specify 5.4.0 rather than my default (4.8.5) because it seems to load the 5.4.0 regardless of if I leave that off.

I can't get this to work, even if I first spack install gcc@5.4.0 (which took forever)

@sampollard
Author

I am using module load, and the system already has gcc 5.4.0.

I also forgot to clarify: spack install flux fails as well. I only run it to install the dependencies, and afterward I go into the flux source directory.

@grondo
Contributor

grondo commented Oct 25, 2017

I also forgot to clarify: spack install flux fails as well. I only run it to install the dependencies, and afterward I go into the flux source directory.

Ah, OK, I understand now. I'm using the default /usr/bin/gcc, which is 6.2.0 on my system, but I don't see how that would be part of your luaposix problem.

I did notice that spack has spack install --only dependencies, which is what I've been using.

@grondo
Contributor

grondo commented Oct 25, 2017

Though I should note that spack install flux did work for me on my flux-core-update branch (which is pending merge into the upstream spack repo).

Does lua -l posix work for you in your current environment?

$ lua -l posix -e 'print "Success"'
Success

@grondo
Contributor

grondo commented Oct 30, 2017

@sampollard, on my latest flux-core-update branch of spack (submitted upstream as spack/spack#5914) I'm able to get most of make check to pass using the following process:

  1. Check out a clean copy of the spack repo on the flux-core-update branch:
    $ git clone https://github.com/grondo/spack.git spack.git
    Cloning into 'spack.git'...
    $ cd spack.git
    $ git checkout flux-core-update
  2. Bootstrap spack:
    $ export PATH=$(pwd)/bin:$PATH
    $ spack bootstrap
    $ . $(spack location --install-dir environment-modules)/Modules/init/bash
    $ . share/spack/setup-env.sh
  3. Build the flux dependencies with spack (this takes a while):
    $ spack install --only dependencies flux@master
  4. Check out flux-core and build it under spack env flux bash:
    $ git clone https://github.com/flux-framework/flux-core
    $ cd flux-core
    $ spack env flux bash
    $ ./autogen.sh && ./configure
    ...
    $ make -j 16
    $ make -j 16 check

In this scenario, on my desktop anyway, all tests pass except the valgrind test, which fails with a leak detected under getaddrinfo that doesn't quite match the existing suppression in t/valgrind/valgrind.supp.

@sampollard
Author

I apologize for the delay. Let me try to address everything:

Lua posix appears to work; I get the same output as you:

$ lua -l posix -e 'print "Success"'
Success

For some reason, make check fails as follows:
make_check_output.txt

This seems to be an unrelated error with Python. When your fix gets merged, I'll try it that way and let you know.

@grondo
Contributor

grondo commented Nov 1, 2017

OK, the package spec for flux is now updated in the spack LLNL/develop branch. make check should pass on current flux-core master, since the failing valgrind test was updated (at least on my Ubuntu system).

Your failure above seems to be in the pylint checks, which we've now disabled by default on master. Pylint seems to be very version-sensitive and brittle, so we've made that check opt-in upstream. For 0.8.0 you might want to try a different version of pylint/astroid (our travis builds use 1.5.6, for instance).

@sampollard
Author

It seems that fixing one bug reveals another.

I think part of the problem may be the system I'm using. It has been misconfigured in the past, so this may not even be spack's or flux's fault. Anyway, here's what I'm getting.

If I use spack env flux bash after checking out spack commit 165e6bfe5fb3ce327315c698a3a275deda9e6d35
and flux-core commit 4456339, and then run

spack env flux bash
./configure --prefix=$HOME/local --disable-docs

Then I get the following compilation error:

/disks/large/home/users/spollard/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/czmq-4.0.2-y7iykqngvz2x6jwayts4ezlifycn4tap/lib/libczmq.so: undefined reference to `uuid_generate@UUID_1.0'

even though I have loaded libuuid-1.0.3-gcc-5.4.0-yapjrlz.

@grondo
Contributor

grondo commented Nov 7, 2017

Does that error go away if you rebuild libczmq?
I do notice that the czmq spack package doesn't have a depends_on for libuuid, so I wonder if it was building against the system libuuid before.

@sampollard
Author

I reinstalled libczmq and also ended up reinstalling munge. When using spack env flux bash, I also had to run export LUA_PATH="$LUA_PATH;;;" to get a working LUA_PATH.

It seems that every time I reinstall one package, I have to reinstall another. It seems to go in this order: lua -> czmq -> munge -> luaposix -> lua -> czmq -> munge -> ... ad nauseam.

I wish I could help figure this out, but I can't really justify spending any more time on it, since installing the packages with apt worked for me on a different system. It could be that my current system is misconfigured, but I'll leave the make check log here for now.
test-suite.log

@grondo
Contributor

grondo commented Nov 8, 2017

@sampollard, thanks for trying! Honestly, I think this is a tricky problem with spack and probably not a flux problem, so I wouldn't encourage you to spend much more time on it either. Glad you're able to get flux-core and flux-sched working. If you have problems in the future, feel free to open another issue!

@grondo grondo closed this as completed Nov 8, 2017