Compatibility with slurm 19.05.01 #28

Closed
dpryan79 opened this issue Jul 15, 2019 · 9 comments

Comments

@dpryan79

At least when I built it just now, slurm 19.05.01 doesn't ship a libslurmdb.so, so the linking test run by configure fails. Removing the link against it here solves the issue (https://github.com/natefoo/slurm-drmaa/blob/master/m4/ax_slurm.m4#L75), but it'd be good if someone else confirmed before this is implemented.

@EricR86
Contributor

EricR86 commented Sep 4, 2019

I can confirm that libslurmdb has been merged into the libslurm library. There is a note from SchedMD:

NOTE: libslurmdb has been merged into libslurm.  If functionality is needed
      from libslurmdb please just link to libslurm.

It is under the "28 May 2019" section of the RELEASE_NOTES, which can be found on the website.

@ocfmatt

ocfmatt commented Sep 13, 2019

I have a compilation issue whereby I encounter the following error:

checking for usable SLURM libraries/headers... *** The SLURM test program failed to link or run. See the file config.log
*** for the exact error that occured.
no
configure: error:
Slurm libraries/headers not found;
add --with-slurm-inc and --with-slurm-lib with appropriate locations.

My configure command is:
./configure --prefix=/opt/software/galaxy/plugins/drmaa --with-slurm-inc=/opt/software/slurm/19.05.1-2/include --with-slurm-lib=/opt/software/slurm/19.05.1-2/lib

Libraries and includes are present as stated in the help text:

[root@maxlogin1 slurm-drmaa]# ls -la /opt/software/slurm/19.05.1-2/include/slurm/slurm.h /opt/software/slurm/19.05.1-2/lib/libslurm.a
-rw-r--r--. 1 root root   217039 Aug  5 11:24 /opt/software/slurm/19.05.1-2/include/slurm/slurm.h
-rw-r--r--. 1 root root 55722378 Aug  5 11:22 /opt/software/slurm/19.05.1-2/lib/libslurm.a

My Slurm installation is compiled from source using Intel compilers. I get the same error when running against Slurm 18.08.8, which makes me think I am missing a compilation flag in either Slurm or drmaa.

@dpryan79 did you install Slurm 19.05.1 from packages or compile from source?

@EricR86
Contributor

EricR86 commented Sep 13, 2019

@ocfmatt if you look in your config.log I'm pretty certain the error will show up as
/usr/bin/ld: cannot find -lslurmdb
after configure tries to compile a test program. It's complaining specifically that it can't find libslurmdb.so because that file no longer exists in Slurm 19.05.
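
If it helps, here is a minimal sketch for pulling those linker complaints out of config.log. The function name `missing_linker_libs` is made up for illustration, and it only assumes that ld reports missing libraries in the `cannot find -l<name>` form quoted above.

```python
import re

def missing_linker_libs(log_text):
    """Return library names that ld reported as missing, in order of first appearance."""
    # Matches e.g. "/usr/bin/ld: cannot find -lslurmdb" and captures "slurmdb".
    pattern = re.compile(r"cannot find -l([A-Za-z0-9_.+-]+)")
    seen = []
    for line in log_text.splitlines():
        match = pattern.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

# The sample line is the one quoted in this thread, not a full config.log.
sample = "/usr/bin/ld: cannot find -lslurmdb\n"
print(missing_linker_libs(sample))
```

In practice you would pass it the contents of config.log, e.g. `missing_linker_libs(open("config.log").read())`.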

You can safely remove -lslurmdb either in m4/ax_slurm.m4, as in the fix linked above, or from the SLURM_LIBS variable inside configure itself (it was around line 14034 for me).
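
A rough sketch of that edit, for anyone who wants to script it. It assumes the generated configure contains an assignment of the form `SLURM_LIBS="-lslurm -lslurmdb"` (I have not checked every release), and `drop_slurmdb` is a hypothetical helper name. The point is to drop only the `-lslurmdb` token and keep `-lslurm`, rather than deleting the whole line.

```python
import re

def drop_slurmdb(configure_text):
    """Remove only the -lslurmdb token from SLURM_LIBS assignments, keeping -lslurm."""
    def fix(match):
        flags = [flag for flag in match.group(2).split() if flag != "-lslurmdb"]
        return match.group(1) + " ".join(flags) + '"'
    # Assumed form of the assignment: SLURM_LIBS="<space-separated flags>"
    return re.sub(r'(SLURM_LIBS=")([^"]*)"', fix, configure_text)

before = 'SLURM_LIBS="-lslurm -lslurmdb"\n'
print(drop_slurmdb(before), end="")
```

Lines that do not assign SLURM_LIBS pass through unchanged, so it can be run over the whole text of configure before writing it back.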

@ocfmatt

ocfmatt commented Sep 13, 2019

@EricR86 Thanks for your quick reply.

I removed "-lslurmdb" and was able to configure and make after I downloaded the release source code. I had an uninitialised drmaa_utils path causing other errors.

@research-computing-facility

research-computing-facility commented Nov 21, 2019

Hi all,
Thanks to these tips I have been able to build slurm-drmaa against slurm 19.05.4, but the module will not load:

galaxy.jobs.runners.drmaa INFO 2019-11-21 17:13:08,365 Overriding DRMAA_LIBRARY_PATH due to runner plugin parameter: /galaxy/slurm-drmaa/compiled/lib/libdrmaa.so
Traceback (most recent call last):
  File "/galaxy/production/lib/galaxy/webapps/galaxy/buildapp.py", line 58, in paste_app_factory
    app = galaxy.app.UniverseApplication(global_conf=global_conf, **kwargs)
  File "/galaxy/production/lib/galaxy/app.py", line 189, in __init__
    self.job_manager = manager.JobManager(self)
  File "/galaxy/production/lib/galaxy/jobs/manager.py", line 24, in __init__
    self.job_handler = handler.JobHandler(app)
  File "/galaxy/production/lib/galaxy/jobs/handler.py", line 34, in __init__
    self.dispatcher = DefaultJobDispatcher(app)
  File "/galaxy/production/lib/galaxy/jobs/handler.py", line 779, in __init__
    self.job_runners = self.app.job_config.get_job_runner_plugins(self.app.config.server_name)
  File "/galaxy/production/lib/galaxy/jobs/__init__.py", line 649, in get_job_runner_plugins
    rval[id] = runner_class(self.app, runner['workers'], **runner.get('kwds', {}))
  File "/galaxy/production/lib/galaxy/jobs/runners/drmaa.py", line 63, in __init__
    drmaa = __import__("drmaa")
  File "/galaxy/production/.venv/lib/python2.7/site-packages/drmaa/__init__.py", line 65, in <module>
    from .session import JobInfo, JobTemplate, Session
  File "/galaxy/production/.venv/lib/python2.7/site-packages/drmaa/session.py", line 39, in <module>
    from drmaa.helpers import (adapt_rusage, Attribute, attribute_names_iterator,
  File "/galaxy/production/.venv/lib/python2.7/site-packages/drmaa/helpers.py", line 36, in <module>
    from drmaa.wrappers import (drmaa_attr_names_t, drmaa_attr_values_t,
  File "/galaxy/production/.venv/lib/python2.7/site-packages/drmaa/wrappers.py", line 56, in <module>
    _lib = CDLL(libpath, mode=RTLD_GLOBAL)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /galaxy/slurm-drmaa/compiled/lib/libdrmaa.so: undefined symbol: slurm_kill_job2

What is odd is that it seems to be declared in the headers:

==> grep -r slurm_kill_job2 /usr/include/slurm
/usr/include/slurm/slurm.h: * slurm_kill_job2()
/usr/include/slurm/slurm.h:extern int slurm_kill_job2(const char *job_id, uint16_t signal, uint16_t flags);
==> ldd /galaxy/slurm-drmaa/compiled/lib/libdrmaa.so
	linux-vdso.so.1 =>  (0x00007ffef55c8000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb11604c000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fb115c7e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fb11648c000)

Command used to configure:
./configure --prefix /galaxy/slurm-drmaa/compiled
Would anyone have any suggestions?
Many thanks in advance

@dpryan79
Author

libdrmaa should be linking against libslurm.so, but your ldd output shows no libslurm entry, which I think is the cause of the problem.
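
A quick way to check this before starting Galaxy is to look for a libslurm entry in the ldd output. The sketch below just parses that text. The helper names are hypothetical and the sample is the ldd output posted above.

```python
def linked_libraries(ldd_output):
    """Return the library names from the first column of `ldd` output."""
    names = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if fields:
            # First field is a soname or a path, e.g. "libc.so.6" or "/lib64/ld-linux-x86-64.so.2".
            names.append(fields[0].rsplit("/", 1)[-1])
    return names

def links_libslurm(ldd_output):
    """True if any listed library is libslurm.so (with or without a version suffix)."""
    return any(name.startswith("libslurm.so") for name in linked_libraries(ldd_output))

ldd_output = """\
    linux-vdso.so.1 =>  (0x00007ffef55c8000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb11604c000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fb115c7e000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fb11648c000)
"""
print(links_libslurm(ldd_output))  # False for the output posted above
```

If this prints False, symbols such as slurm_kill_job2 have nothing to resolve against when the library is loaded, which matches the OSError in the traceback.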

@research-computing-facility

Hi @dpryan79, you were right. I made a silly mistake: I removed the whole line in configure rather than leaving
SLURM_LIBS="-lslurm "
Thanks for your insight!

@natefoo
Owner

natefoo commented Apr 1, 2020

Fixed by @EricR86 in #34 and released in version 1.1.1. Thanks!

@natefoo natefoo closed this as completed Apr 1, 2020
@X-WJ

X-WJ commented Oct 26, 2020

Hi, I also ran into this problem as natefoo described, and I revised this line in configure like this:

#line 14022 SLURM_LIBS="-lslurm"
