Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build a conda-forge package for flux-core #5079

Closed
jan-janssen opened this issue Apr 8, 2023 · 52 comments
Closed

Build a conda-forge package for flux-core #5079

jan-janssen opened this issue Apr 8, 2023 · 52 comments

Comments

@jan-janssen
Copy link

I try to build flux from source but I get the following error message:

make[3]: Entering directory '$SRC_DIR/src/common/liblsd'
  CC       cbuf.lo
cbuf.c: In function 'cbuf_peek_line':
cbuf.c:665:15: error: variable 'l' set but not used [-Werror=unused-but-set-variable]
  665 |     int n, m, l;
      |               ^
cbuf.c: In function 'cbuf_read_line':
cbuf.c:701:15: error: variable 'l' set but not used [-Werror=unused-but-set-variable]
  701 |     int n, m, l;
      |               ^
cbuf.c: In function 'cbuf_replay_line':
cbuf.c:738:15: error: variable 'l' set but not used [-Werror=unused-but-set-variable]
  738 |     int n, m, l;
      |               ^
cbuf.c: In function 'cbuf_write_line':
cbuf.c:812:23: error: variable 'n' set but not used [-Werror=unused-but-set-variable]
  812 |     int nfree, ncopy, n;
      |                       ^
cc1: all warnings being treated as errors
@garlick
Copy link
Member

garlick commented Apr 8, 2023

Thanks! Which compiler and version?
We'll just need to add that error to the list of errors we ignore for that library since was pulled in from elsewhere.

@chu11
Copy link
Member

chu11 commented Apr 8, 2023

Hmmm, it seems the l (in the first one) is only used within an assert, so I guess assertions are disabled on your build. Possible you are disabling NDEBUG. We normally build with asserts turned on.

Edit: oh @garlick responded like 1 minute earlier, that's another approach :P

Edit: Sorry, I got it backwards, you define NDEBUG to disable assert ... but you seemed to realize my mistake.

@jan-janssen
Copy link
Author

I try to build a conda-forge package in conda-forge/staged-recipes#22502 - the logs are available via the azure CI.

@jan-janssen
Copy link
Author

In the logs I found the following definitions:

CFLAGS=-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/flux-core-0.49.0 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix
CPPFLAGS=-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem $PREFIX/include

But I thought as it is a c code the CPPFLAGS should have no effect. Or is this external library build with C++?

@grondo
Copy link
Contributor

grondo commented Apr 8, 2023

CPPFLAGS is C preprocessor flags, CXXFLAGS would be C++ flags. If you are able, try removing -DNDEBUG from CPPFLAGS in your build until we get the issue resolved in our build system.

@chu11
Copy link
Member

chu11 commented Apr 8, 2023

CPPFLAGS stands for C pre-processor, so it does seem that you're defining it in some way within your build, thus asserts not built, thus the warning.

I think @garlick's technique (temporarily for you) would be to disable that specific warning in the liblsd/Makefile.am.

AM_CFLAGS = \
	$(WARNING_CFLAGS) \
	$(CODE_COVERAGE_CFLAGS) \
	-Wno-parentheses -Wno-error=parentheses -Wno-error=unused-but-set-variable

AM_LDFLAGS = \
	$(CODE_COVERAGE_LIBS)

AM_CPPFLAGS = \
	-DWITH_PTHREADS

noinst_LTLIBRARIES = liblsd.la

liblsd_la_SOURCES = \
	cbuf.c \
	cbuf.h

Edit: oops, I didn't initially add it in the above, hopefully didn't typo that, didn't try it

@jan-janssen
Copy link
Author

I slowly make progress, the next error I get is:

fileref.c: In function 'file_has_no_data':
fileref.c:60:23: error: 'SEEK_DATA' undeclared (first use in this function); did you mean 'SEEK_SET'?
   60 |     if (lseek (fd, 0, SEEK_DATA) == (off_t)-1 && errno == ENXIO)
      |                       ^~~~~~~~~
      |                       SEEK_SET
fileref.c:60:23: note: each undeclared identifier is reported only once for each function it appears in
fileref.c: In function 'blobvec_create':
fileref.c:87:42: error: 'SEEK_DATA' undeclared (first use in this function); did you mean 'SEEK_SET'?
   87 |         if ((offset = lseek (fd, offset, SEEK_DATA)) == (off_t)-1) {
      |                                          ^~~~~~~~~
      |                                          SEEK_SET
fileref.c:97:47: error: 'SEEK_HOLE' undeclared (first use in this function)
   97 |             if ((notdata = lseek (fd, offset, SEEK_HOLE)) == (off_t)-1)
      |                                               ^~~~~~~~~
make[3]: *** [Makefile:1374: fileref.lo] Error 1
make[3]: Leaving directory '$SRC_DIR/src/common/libutil'

@chu11
Copy link
Member

chu11 commented Apr 8, 2023

uh oh, perhaps the system you're building on doesn't support SEEK_DATA and SEEK_HOLE. On my ubuntu box in the lseek manpage ...

   Seeking file data and holes
       Since version 3.1, Linux supports the following additional values for whence:

       SEEK_DATA
<snip>

Perhaps fileref needs to support non-sparse files on systems that don't support SEEK_DATA and SEEK_HOLE. Not sure if a quick workaround is available for that.

Edit: I suppose a quick workaround is to remove most of the code in fileref.c for just trying to get things to compile for now. flux filemap and some other stuff wouldn't work, but I think it's survivable. Hopefully ...

@grondo
Copy link
Contributor

grondo commented Apr 8, 2023

It would help to get the compiler version and OS distro. My guess is that these errors are not from a Linux system?

You will probably encounter more errors in the build if this is the case. Unfortunately Flux supports Linux only for now...

@jan-janssen
Copy link
Author

It would help to get the compiler version and OS distro. My guess is that these errors are not from a Linux system?

As explained above, I try to build a conda package for flux. conda-forge/staged-recipes#22502 for this purpose, I build it in a restricted environment on the Azure CI. The environment is restricted in the way that it is only linking to libraries which are provided by other conda packages. The raw log is available as:
https://dev.azure.com/conda-forge/84710dde-1620-425b-80d0-4cf5baca359d/_apis/build/builds/686906/logs/35

@grondo
Copy link
Contributor

grondo commented Apr 9, 2023

Ah, I apologize that I missed your earlier comment! 🤦‍♂️ I will check out the log later, because it looks like the build is running on Ubuntu 22.04 which should be working...

@grondo
Copy link
Contributor

grondo commented Apr 9, 2023

Ah, I wonder if somehow _GNU_SOURCE is not being defined in your build? I forget how we do this, and I'm not currently at a computer, but you could try adding -D_GNU_SOURCE as a test.

@jan-janssen
Copy link
Author

It seems the _GNU_SOURCE topic is a more general issue with conda conda/conda-build#3165

@jan-janssen
Copy link
Author

Ah, I wonder if somehow _GNU_SOURCE is not being defined in your build? I forget how we do this, and I'm not currently at a computer, but you could try adding -D_GNU_SOURCE as a test.

I tried adding -D_GNU_SOURCE to the CPP_FLAGS, but then I get the following error:

fdwalk.c:71: error: "_GNU_SOURCE" redefined [-Werror]
   71 | #define _GNU_SOURCE
      | 
<command-line>: note: this is the location of the previous definition
cc1: all warnings being treated as errors

@grondo
Copy link
Contributor

grondo commented Apr 9, 2023

Thank you for trying that, now that I can take a look, I see that flux-core defines _GNU_SOURCE in config/config.h:

/* Define _GNU_SOURCE so that we get all necessary prototypes */
#define _GNU_SOURCE 1

and fileref.c does include it:

/* Define _GNU_SOURCE so that we get all necessary prototypes */
#define _GNU_SOURCE 1

This should be enough to get the definition of SEEK_DATA and SEEK_HOLE from unistd.h as documented in lseek(2):

The _GNU_SOURCE feature test macro must be defined in order to
obtain the definitions of SEEK_DATA and SEEK_HOLE from
<unistd.h>.

So I'm unfortunately baffled as to what the problem is here.

From the conda issue you reference above, it seems like conda may be replacing or patching glibc, perhaps that has something to do with it?

Perhaps I will try a build with conda to see what is going on, but I have zero prior experience with this tool.

Can you apply a patch during the conda build? We can try to provide a patch for fileref.c that can work around the issue, but I fear we'll just hit another issue since much of the code assumes a fully working glibc at this time.

@jan-janssen
Copy link
Author

Can you apply a patch during the conda build? We can try to provide a patch for fileref.c that can work around the issue, but I fear we'll just hit another issue since much of the code assumes a fully working glibc at this time.

Yes, we can patch during build time.

@jan-janssen jan-janssen changed the title Error when compiling liblsd Build a conda-forge package for flux-core Apr 9, 2023
@grondo
Copy link
Contributor

grondo commented Apr 9, 2023

Well from your conda logs it looks like the build is occurring on a centos 7 host with glibc 2.17.:

user-agent : conda/23.3.1 requests/2.28.2 CPython/3.10.10 Linux/5.15.0-1035-azure centos/7.9.2009 glibc/2.17

we test centos 7 in CI, so I'm still confused why this is failing in conda!

Can you try this patch to workaround missing SEEK_DATA and SEEK_HOLE?

diff --git a/src/common/libutil/fileref.c b/src/common/libutil/fileref.c
index 69213b0f0..22a4d3359 100644
--- a/src/common/libutil/fileref.c
+++ b/src/common/libutil/fileref.c
@@ -57,8 +57,10 @@ static int blobvec_append (json_t *blobvec,
 
 static bool file_has_no_data (int fd)
 {
+#if defined(SEEK_DATA)
     if (lseek (fd, 0, SEEK_DATA) == (off_t)-1 && errno == ENXIO)
         return true;
+#endif
     return false;
 }
 
@@ -83,19 +85,25 @@ static json_t *blobvec_create (int fd,
         goto error;
     }
     while (offset < size) {
+#if defined(SEEK_DATA)
         // N.B. fails with ENXIO if there is no more data
         if ((offset = lseek (fd, offset, SEEK_DATA)) == (off_t)-1) {
             if (errno == ENXIO)
                 break;
             goto error;
         }
+#endif
         if (offset < size) {
             off_t notdata;
             int blobsize;
 
+#if defined(SEEK_HOLE)
             // N.B. returns size if there are no more holes
             if ((notdata = lseek (fd, offset, SEEK_HOLE)) == (off_t)-1)
                 goto error;
+#else
+	    notdata = size;
+#endif
             blobsize = notdata - offset;
             if (blobsize > chunksize)
                 blobsize = chunksize;
diff --git a/src/common/libutil/test/fileref.c b/src/common/libutil/test/fileref.c
index 61cb0bdfe..6fbd84d9b 100644
--- a/src/common/libutil/test/fileref.c
+++ b/src/common/libutil/test/fileref.c
@@ -87,9 +87,11 @@ void rmfile (const char *name)
  */
 bool test_sparse (void)
 {
+    bool result = false;
+
+#if defined(SEEK_DATA)
     int fd;
     struct stat sb;
-    bool result = false;
 
     fd = open (mkpath ("testhole"), O_WRONLY | O_CREAT | O_TRUNC, 0600);
     if (fd < 0)
@@ -104,6 +106,7 @@ bool test_sparse (void)
         result = true;
     close (fd);
     rmfile ("testhole");
+#endif
     return result;
 }
 
@@ -709,7 +712,8 @@ int main (int argc, char *argv[])
     test_dir ();
     test_link ();
     test_small ();
-    test_empty ();
+    if (have_sparse)
+        test_empty ();
     test_expfail ();
     test_pretty_print ();
 

@grondo
Copy link
Contributor

grondo commented Apr 9, 2023

Reading about conda forge I see that it uses a sysroot that is currently based on CentOS 6 (!?). I think that might be the issue, though I could be misunderstanding since it seems surprising that conda would be based on CentOS 6 since I think that distro was EOL in 2020. However, can you try to force the sysroot to centos 7 with these instructions?

https://conda-forge.org/docs/maintainer/knowledge_base.html#using-centos-7

@grondo
Copy link
Contributor

grondo commented Apr 11, 2023

FYI - I was able to build flux-core on CentOS 6 (with a newer compiler, zeromq, czmq, libhwloc, libsodium, etc) with the following changes: (in case you are unable to change to a CentOS 7 sysroot)

diff --git a/src/broker/state_machine.c b/src/broker/state_machine.c
index 57e727fea..b02a3b144 100644
--- a/src/broker/state_machine.c
+++ b/src/broker/state_machine.c
@@ -11,6 +11,8 @@
 #if HAVE_CONFIG_H
 #include "config.h"
 #endif
+#include <signal.h>
+
 #include <flux/core.h>
 
 #include "src/common/libczmqcontainers/czmq_containers.h"
diff --git a/src/cmd/builtin/proxy.c b/src/cmd/builtin/proxy.c
index feff795b9..655df262a 100644
--- a/src/cmd/builtin/proxy.c
+++ b/src/cmd/builtin/proxy.c
@@ -18,6 +18,7 @@
 #include <sys/socket.h>
 #include <sys/un.h>
 #include <stdio.h>
+#include <signal.h>
 #include <assert.h>
 #include <stdlib.h>
 #include <string.h>
diff --git a/src/common/libsubprocess/server.c b/src/common/libsubprocess/server.c
index 2c164646c..275aa3443 100644
--- a/src/common/libsubprocess/server.c
+++ b/src/common/libsubprocess/server.c
@@ -13,6 +13,7 @@
 #endif
 
 #include <unistd.h> // defines environ
+#include <signal.h>
 #include <errno.h>
 #include <flux/core.h>
 
diff --git a/src/common/libterminus/pty.c b/src/common/libterminus/pty.c
index a1dbf6788..a3c6f4c7b 100644
--- a/src/common/libterminus/pty.c
+++ b/src/common/libterminus/pty.c
@@ -208,8 +208,10 @@ int flux_pty_kill (struct flux_pty *pty, int sig)
         pty->wait_for_client = false;
         pty->wait_on_close = false;
     }
+#if defined(TIOCSIG)
     if (ioctl (pty->leader, TIOCSIG, sig) >= 0)
         return 0;
+#endif
     llog_debug (pty, "ioctl (TIOCSIG): %s", strerror (errno));
     if (ioctl (pty->leader, TIOCGPGRP, &pgrp) >= 0
         && pgrp > 0
diff --git a/src/common/libutil/fileref.c b/src/common/libutil/fileref.c
index 69213b0f0..22a4d3359 100644
--- a/src/common/libutil/fileref.c
+++ b/src/common/libutil/fileref.c
@@ -57,8 +57,10 @@ static int blobvec_append (json_t *blobvec,
 
 static bool file_has_no_data (int fd)
 {
+#if defined(SEEK_DATA)
     if (lseek (fd, 0, SEEK_DATA) == (off_t)-1 && errno == ENXIO)
         return true;
+#endif
     return false;
 }
 
@@ -83,19 +85,25 @@ static json_t *blobvec_create (int fd,
         goto error;
     }
     while (offset < size) {
+#if defined(SEEK_DATA)
         // N.B. fails with ENXIO if there is no more data
         if ((offset = lseek (fd, offset, SEEK_DATA)) == (off_t)-1) {
             if (errno == ENXIO)
                 break;
             goto error;
         }
+#endif
         if (offset < size) {
             off_t notdata;
             int blobsize;
 
+#if defined(SEEK_HOLE)
             // N.B. returns size if there are no more holes
             if ((notdata = lseek (fd, offset, SEEK_HOLE)) == (off_t)-1)
                 goto error;
+#else
+	    notdata = size;
+#endif
             blobsize = notdata - offset;
             if (blobsize > chunksize)
                 blobsize = chunksize;
diff --git a/src/common/libutil/test/fileref.c b/src/common/libutil/test/fileref.c
index 61cb0bdfe..6fbd84d9b 100644
--- a/src/common/libutil/test/fileref.c
+++ b/src/common/libutil/test/fileref.c
@@ -87,9 +87,11 @@ void rmfile (const char *name)
  */
 bool test_sparse (void)
 {
+    bool result = false;
+
+#if defined(SEEK_DATA)
     int fd;
     struct stat sb;
-    bool result = false;
 
     fd = open (mkpath ("testhole"), O_WRONLY | O_CREAT | O_TRUNC, 0600);
     if (fd < 0)
@@ -104,6 +106,7 @@ bool test_sparse (void)
         result = true;
     close (fd);
     rmfile ("testhole");
+#endif
     return result;
 }
 
@@ -709,7 +712,8 @@ int main (int argc, char *argv[])
     test_dir ();
     test_link ();
     test_small ();
-    test_empty ();
+    if (have_sparse)
+        test_empty ();
     test_expfail ();
     test_pretty_print ();
 
diff --git a/src/modules/job-exec/job-exec.c b/src/modules/job-exec/job-exec.c
index d29aba68a..c6b76fb92 100644
--- a/src/modules/job-exec/job-exec.c
+++ b/src/modules/job-exec/job-exec.c
@@ -89,6 +89,7 @@
 #endif
 #include <assert.h>
 #include <unistd.h>
+#include <signal.h>
 #include <flux/core.h>
 
 #include "src/common/libczmqcontainers/czmq_containers.h"
diff --git a/src/shell/rlimit.c b/src/shell/rlimit.c
index 25a19d936..f83904525 100644
--- a/src/shell/rlimit.c
+++ b/src/shell/rlimit.c
@@ -60,8 +60,10 @@ static int rlimit_name_to_string (const char *name)
         return RLIMIT_NICE;
     if (streq (name, "rtprio"))
         return RLIMIT_RTPRIO;
+#ifdef RLIMIT_RTTTIME
     if (streq (name, "rttime"))
         return RLIMIT_RTTIME;
+#endif
     if (streq (name, "sigpending"))
         return RLIMIT_SIGPENDING;
     return -1;

@jan-janssen
Copy link
Author

@grondo thank you so much, the patch definitely helped. I am now able to progress further. Unfortunately, I am still not able to complete the compilation as I get the following error:

Making all in libsubprocess
make[3]: Entering directory '$SRC_DIR/src/common/libsubprocess'
  CC       command.lo
  CC       local.lo
  CC       fork.lo
  CC       posix_spawn.lo
  CC       remote.lo
  CC       server.lo
  CC       util.lo
  CC       subprocess.lo
subprocess.c: In function 'flux_standard_output':
subprocess.c:267:9: error: ignoring return value of 'fwrite' declared with attribute 'warn_unused_result' [-Werror=unused-result]
  267 |         fwrite (ptr, lenp, 1, fstream);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[3]: *** [Makefile:997: subprocess.lo] Error 1

@grondo
Copy link
Contributor

grondo commented Apr 11, 2023

Ah, it looks like conda forge is using GCC 11.3. The latest we test with is 11.2. For that particular issue you could try the following

diff --git a/src/common/libsubprocess/subprocess.c b/src/common/libsubprocess/su
bprocess.c
index da068b3d3..e2f41922f 100644
--- a/src/common/libsubprocess/subprocess.c
+++ b/src/common/libsubprocess/subprocess.c
@@ -263,7 +263,7 @@ void flux_standard_output (flux_subprocess_t *p, const char 
*stream)
     }
 
     if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+        (void) fwrite (ptr, lenp, 1, fstream);
 }
 
 /*

Let me see if I can test with GCC 11.3 and uncover any other issues with that compiler.

@grondo
Copy link
Contributor

grondo commented Apr 11, 2023

FYI - I was able to build with GCC 11.3 with no changes from current master, so I wonder if conda forge is adding some other warnings. I'll try to see if those are captured in the logs.

@jan-janssen
Copy link
Author

I applied your patch but the error remains:

subprocess.c: In function 'flux_standard_output':
subprocess.c:267:16: error: ignoring return value of 'fwrite' declared with attribute 'warn_unused_result' [-Werror=unused-result]
  267 |         (void) fwrite (ptr, lenp, 1, fstream);
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[3]: *** [Makefile:997: subprocess.lo] Error 1

@grondo
Copy link
Contributor

grondo commented Apr 11, 2023

Interesting. I think this is another one that is due to older glibc provided by centos 6. warn_unused_result has since been removed from fwrite() since it doesn't make that much sense: https://sourceware.org/bugzilla/show_bug.cgi?id=11959
(Resolved in 2014!)

try this instead as workaround

diff --git a/src/common/libsubprocess/subprocess.c b/src/common/libsubprocess/subprocess.c
index da068b3d3..5134b3a47 100644
--- a/src/common/libsubprocess/subprocess.c
+++ b/src/common/libsubprocess/subprocess.c
@@ -262,8 +262,10 @@ void flux_standard_output (flux_subprocess_t *p, const char *stream)
         }
     }
 
-    if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+    if (lenp) {
+        if (fwrite (ptr, lenp, 1, fstream) < 0)
+            log_err ("fwrite");
+    }
 }
 
 /*

@grondo
Copy link
Contributor

grondo commented Apr 11, 2023

BTW, the reason (void) didn't work to resolve the unused result is detailed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25509 (an entertaining read)

@jan-janssen
Copy link
Author

Unfortunately also the if statement fails on conda:

subprocess.c: In function 'flux_standard_output':
subprocess.c:267:13: error: ignoring return value of 'fwrite' declared with attribute 'warn_unused_result' [-Werror=unused-result]
  267 |         if (fwrite (ptr, lenp, 1, fstream) < 0)
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

@grondo
Copy link
Contributor

grondo commented Apr 11, 2023

Ok, that doesn't make sense to me because the result is used, but I modified my system to add the warn_unsed_result attribute to fwrite and was able to reproduce. This silly change fixes it:

diff --git a/src/common/libsubprocess/subprocess.c b/src/common/libsubprocess/subprocess.c
index da068b3d3..72cb6cca6 100644
--- a/src/common/libsubprocess/subprocess.c
+++ b/src/common/libsubprocess/subprocess.c
@@ -262,8 +262,11 @@ void flux_standard_output (flux_subprocess_t *p, const char *stream)
         }
     }
 
-    if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+    if (lenp) {
+        int rc = fwrite (ptr, lenp, 1, fstream);
+        if (rc < 0)
+            log_err ("fwrite");
+    }
 }
 
 /*

@jan-janssen
Copy link
Author

This trick seems to work, it just seems to be missing at other parts as well:

make[2]: Entering directory '$SRC_DIR/src/shell'
  CC       plugstack.lo
  CC       jobspec.lo
  CC       rcalc.lo
  CCLD     libshell.la
  CC       mpir/rangelist.lo
  CC       mpir/nodelist.lo
  CC       mpir/proctable.lo
  CCLD     libmpir.la
  CC       shell.o
  CC       rc.o
  CC       builtins.o
  CC       info.o
  CC       task.o
  CC       log.o
  CC       events.o
  CC       pmi/pmi.o
  CC       pmi/pmi_exchange.o
  CC       input.o
  CC       output.o
output.c: In function 'shell_output_term':
output.c:177:17: error: ignoring return value of 'fwrite' declared with attribute 'warn_unused_result' [-Werror=unused-result]
  177 |                 fwrite (data, len, 1, f);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

@grondo
Copy link
Contributor

grondo commented Apr 11, 2023

Ah, sorry I guess I didn't test the whole build, try this:

diff --git a/src/cmd/flux-exec.c b/src/cmd/flux-exec.c
index eeeb2c581..404f86b87 100644
--- a/src/cmd/flux-exec.c
+++ b/src/cmd/flux-exec.c
@@ -192,7 +192,8 @@ void output_cb (flux_subprocess_t *p, const char *stream)
     if (lenp) {
         if (optparse_getopt (opts, "label-io", NULL) > 0)
             fprintf (fstream, "%d: ", flux_subprocess_rank (p));
-        fwrite (ptr, lenp, 1, fstream);
+        if (fwrite (ptr, lenp, 1, fstream) != lenp)
+            log_err ("fwrite");
     }
 }
 
diff --git a/src/cmd/flux-job.c b/src/cmd/flux-job.c
index 979c7fbf7..ea8ed86da 100644
--- a/src/cmd/flux-job.c
+++ b/src/cmd/flux-job.c
@@ -1674,7 +1674,8 @@ static void handle_output_data (struct attach_ctx *ctx, json_t *context)
     if (len > 0) {
         if (optparse_hasopt (ctx->p, "label-io"))
             fprintf (fp, "%s: ", rank);
-        fwrite (data, len, 1, fp);
+        if (fwrite (data, len, 1, fp) != len)
+            log_err ("fwrite");
         fflush (fp);
     }
     free (data);
@@ -2205,7 +2206,8 @@ void handle_exec_log_msg (struct attach_ctx *ctx, double ts, json_t *context)
                  rank,
                  stream);
     }
-    fwrite (data, len, 1, stderr);
+    if (fwrite (data, len, 1, stderr) != len)
+        log_err ("fwrite");
 }
 
 static struct idset *all_taskids (const struct taskmap *map)
diff --git a/src/common/libsubprocess/subprocess.c b/src/common/libsubprocess/subprocess.c
index da068b3d3..a63765b8e 100644
--- a/src/common/libsubprocess/subprocess.c
+++ b/src/common/libsubprocess/subprocess.c
@@ -262,8 +262,10 @@ void flux_standard_output (flux_subprocess_t *p, const char *stream)
         }
     }
 
-    if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+    if (lenp) {
+        if (fwrite (ptr, lenp, 1, fstream) != lenp)
+            log_err ("fwrite");
+    }
 }
 
 /*
diff --git a/src/shell/output.c b/src/shell/output.c
index 8ebf07365..e9eb3b964 100644
--- a/src/shell/output.c
+++ b/src/shell/output.c
@@ -174,7 +174,8 @@ static int shell_output_term (struct shell_output *out)
             }
             if ((output_type == FLUX_OUTPUT_TYPE_TERM) && len > 0) {
                 fprintf (f, "%s: ", rank);
-                fwrite (data, len, 1, f);
+                if (fwrite (data, len, 1, f) != len)
+                    return -1;
             }
             free (data);
         }

This got me all the way through make with GCC 11.3 and the warn_unused_result on fwrite.

Note that we probably don't want to apply these patches to flux-core, so I'd still suggest looking into updating the sysroot to centos 7 as a long(er) term fix.

@jan-janssen
Copy link
Author

Thanks a lot, with the latest patch I am able to compile all the modules and it only fails at the linking step. There I get the following error:

/home/conda/staged-recipes/build_artifacts/flux-core_1681245865557/_build_env/bin/../lib/gcc/x86_64-conda-linux-gnu/11.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: .libs/flux-shell: local symbol `idset_encode' in ../../src/common/.libs/libflux-internal.a(libidset_la-idset_encode.o) is referenced by DSO
/home/conda/staged-recipes/build_artifacts/flux-core_1681245865557/_build_env/bin/../lib/gcc/x86_64-conda-linux-gnu/11.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

@grondo
Copy link
Contributor

grondo commented Apr 11, 2023

I might have to try building in conda myself in order to debug that one.

Also I realized my patch above is incorrect. fwrite(3) returns size_t so it can't return a negative number (that's why the first try failed) 🤦

Edit: FYI - I've corrected the fwrite patch above.

@grondo
Copy link
Contributor

grondo commented Apr 12, 2023

I've made some progress here. We can get through make and make check-prep with the following extra patches:

diff -Nur flux-core-0.49.0/src/common/libeventlog/Makefile.in flux-core-0.49.0.new/src/common/libeventlog/Makefile.in
--- flux-core-0.49.0/src/common/libeventlog/Makefile.in	2023-04-05 16:24:09.982692163 +0000
+++ flux-core-0.49.0.new/src/common/libeventlog/Makefile.in	2023-04-12 03:49:02.262611471 +0000
@@ -691,7 +691,7 @@
 	$(top_builddir)/src/common/libtap/libtap.la \
 	$(top_builddir)/src/common/libeventlog/libeventlog.la \
 	$(top_builddir)/src/common/libutil/libutil.la \
-	$(JANSSON_LIBS)
+	$(JANSSON_LIBS) -lrt
 
 all: all-am
diff --git a/t/rexec/rexec_count_stdout.c b/t/rexec/rexec_count_stdout.c
index 8cb9ff932..f5e1985f8 100644
--- a/t/rexec/rexec_count_stdout.c
+++ b/t/rexec/rexec_count_stdout.c
@@ -69,8 +69,10 @@ void output_cb (flux_subprocess_t *p, const char *stream)
         }
     }
 
-    if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+    if (lenp) {
+        if (fwrite (ptr, lenp, 1, fstream) != lenp)
+            log_err ("fwrite");
+    }
 
     if (!strcasecmp (stream, "stdout"))
         stdout_count++;
diff --git a/t/rexec/rexec_getline.c b/t/rexec/rexec_getline.c
index b4a17c3ea..55fd57ffd 100644
--- a/t/rexec/rexec_getline.c
+++ b/t/rexec/rexec_getline.c
@@ -76,8 +76,10 @@ void output_cb (flux_subprocess_t *p, const char *stream)
 
     if (!(ptr = flux_subprocess_getline (p, stream, &lenp)))
         log_err_exit ("flux_subprocess_getline");
-    if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+    if (lenp) {
+        if (fwrite (ptr, lenp, 1, fstream) != lenp)
+            log_err ("fwrite");
+    }
     else
         fprintf (fstream, "EOF\n");
 }
diff -Nur flux-core-0.49.0/src/shell/Makefile.in flux-core-0.49.0.new/src/shell/Makefile.in
--- flux-core-0.49.0/src/shell/Makefile.in	2023-04-05 16:24:12.798689648 +0000
+++ flux-core-0.49.0.new/src/shell/Makefile.in	2023-04-12 02:50:11.579942667 +0000
@@ -872,13 +872,14 @@
 	$(top_builddir)/src/bindings/lua/libfluxlua.la \
 	$(top_builddir)/src/common/libflux-core.la \
 	$(top_builddir)/src/common/libflux-taskmap.la \
+	$(top_builddir)/src/common/libflux-idset.la \
 	$(top_builddir)/src/common/libpmi/libpmi_server.la \
 	$(top_builddir)/src/common/libpmi/libpmi_common.la \
 	$(top_builddir)/src/common/libczmqcontainers/libczmqcontainers.la \
-	$(top_builddir)/src/common/libflux-internal.la \
 	$(top_builddir)/src/common/libflux-optparse.la \
 	$(top_builddir)/src/common/libterminus/libterminus.la \
 	$(top_builddir)/src/common/libutil/libutil.la \
+	$(top_builddir)/src/common/libflux-internal.la \
 	$(LUA_LIB) \
 	$(HWLOC_LIBS) \
 	$(JANSSON_LIBS) \

Also, some build and runtime requirements were missing: jq (for make check), sphinx (for docs), and ply and lua-luaposix.

However, even with all that, make check is failing with errors:

Traceback (most recent call last):
  File "/home/conda/staged-recipes/build_artifacts/flux-core_1681310324812/work/
t/scripts/runpty.py", line 19, in <module>
    from flux import util
  File "/home/conda/staged-recipes/build_artifacts/flux-core_1681310324812/work/
src/bindings/python/flux/__init__.py", line 14, in <module>
    import flux.core.handle
  File "/home/conda/staged-recipes/build_artifacts/flux-core_1681310324812/work/
src/bindings/python/flux/core/handle.py", line 15, in <module>
    from _flux._core import ffi, lib
ModuleNotFoundError: No module named '_cffi_backend'

But cffi is installed, so I'm stumped at the moment.

@jan-janssen
Copy link
Author

jan-janssen commented Apr 12, 2023

I added the patches and make check runs for quite some time - currently it is stuck at PASS: t0019-jobspec-schema.t 78 - error: version_not_scalar.yaml Can this be related to one of the patches?

@grondo
Copy link
Contributor

grondo commented Apr 12, 2023

Yeah one of the tests is hanging. Note that many tests also fail. I'm unsure if it is related to any of the patches, though I don't find that likely, or is because of the errors in the conda environment (see above)

@grondo
Copy link
Contributor

grondo commented Apr 13, 2023

@jan-janssen it took a day longer than I thought, but I think I have a conda build working. I'll add the updated centos6.patch to the end of this comment.

I also found that some packages were missing - please add

  • to the build section:
    • jq (required for make check)
  • to the host section:
    • libuuid
    • ply
    • sphinx
    • ncurses
    • lua-luaposix
  • to the run section
    • czmq (I don't know why this one is required here, but I got failures without it)

Also, flux does not require MPI to build (It can run MPI jobs, so it needs them only for optional testing), so I removed {{ mpi }} form the requirements to avoid that combinatorial build. You can feel free to add that back, but in order to run MPI tests, you'll have to export FLUX_TEST_MPI=t in the test environment.

I also changed the build.sh script to run with some parallelism, since especially make check is very slow without it (Optional)

Here's the combined diff for all my current changes:

diff --git a/src/broker/state_machine.c b/src/broker/state_machine.c
index 57e727fea..b02a3b144 100644
--- a/src/broker/state_machine.c
+++ b/src/broker/state_machine.c
@@ -11,6 +11,8 @@
 #if HAVE_CONFIG_H
 #include "config.h"
 #endif
+#include <signal.h>
+
 #include <flux/core.h>
 
 #include "src/common/libczmqcontainers/czmq_containers.h"
diff --git a/src/cmd/builtin/proxy.c b/src/cmd/builtin/proxy.c
index feff795b9..655df262a 100644
--- a/src/cmd/builtin/proxy.c
+++ b/src/cmd/builtin/proxy.c
@@ -18,6 +18,7 @@
 #include <sys/socket.h>
 #include <sys/un.h>
 #include <stdio.h>
+#include <signal.h>
 #include <assert.h>
 #include <stdlib.h>
 #include <string.h>
diff --git a/src/common/libsubprocess/server.c b/src/common/libsubprocess/server.c
index 2c164646c..275aa3443 100644
--- a/src/common/libsubprocess/server.c
+++ b/src/common/libsubprocess/server.c
@@ -13,6 +13,7 @@
 #endif
 
 #include <unistd.h> // defines environ
+#include <signal.h>
 #include <errno.h>
 #include <flux/core.h>
 
diff --git a/src/common/libterminus/pty.c b/src/common/libterminus/pty.c
index a1dbf6788..a3c6f4c7b 100644
--- a/src/common/libterminus/pty.c
+++ b/src/common/libterminus/pty.c
@@ -208,8 +208,10 @@ int flux_pty_kill (struct flux_pty *pty, int sig)
         pty->wait_for_client = false;
         pty->wait_on_close = false;
     }
+#if defined(TIOCSIG)
     if (ioctl (pty->leader, TIOCSIG, sig) >= 0)
         return 0;
+#endif
     llog_debug (pty, "ioctl (TIOCSIG): %s", strerror (errno));
     if (ioctl (pty->leader, TIOCGPGRP, &pgrp) >= 0
         && pgrp > 0
diff --git a/src/common/libutil/fileref.c b/src/common/libutil/fileref.c
index 69213b0f0..22a4d3359 100644
--- a/src/common/libutil/fileref.c
+++ b/src/common/libutil/fileref.c
@@ -57,8 +57,10 @@ static int blobvec_append (json_t *blobvec,
 
 static bool file_has_no_data (int fd)
 {
+#if defined(SEEK_DATA)
     if (lseek (fd, 0, SEEK_DATA) == (off_t)-1 && errno == ENXIO)
         return true;
+#endif
     return false;
 }
 
@@ -83,19 +85,25 @@ static json_t *blobvec_create (int fd,
         goto error;
     }
     while (offset < size) {
+#if defined(SEEK_DATA)
         // N.B. fails with ENXIO if there is no more data
         if ((offset = lseek (fd, offset, SEEK_DATA)) == (off_t)-1) {
             if (errno == ENXIO)
                 break;
             goto error;
         }
+#endif
         if (offset < size) {
             off_t notdata;
             int blobsize;
 
+#if defined(SEEK_HOLE)
             // N.B. returns size if there are no more holes
             if ((notdata = lseek (fd, offset, SEEK_HOLE)) == (off_t)-1)
                 goto error;
+#else
+	    notdata = size;
+#endif
             blobsize = notdata - offset;
             if (blobsize > chunksize)
                 blobsize = chunksize;
diff --git a/src/common/libutil/test/fileref.c b/src/common/libutil/test/fileref.c
index 61cb0bdfe..6fbd84d9b 100644
--- a/src/common/libutil/test/fileref.c
+++ b/src/common/libutil/test/fileref.c
@@ -87,9 +87,11 @@ void rmfile (const char *name)
  */
 bool test_sparse (void)
 {
+    bool result = false;
+
+#if defined(SEEK_DATA)
     int fd;
     struct stat sb;
-    bool result = false;
 
     fd = open (mkpath ("testhole"), O_WRONLY | O_CREAT | O_TRUNC, 0600);
     if (fd < 0)
@@ -104,6 +106,7 @@ bool test_sparse (void)
         result = true;
     close (fd);
     rmfile ("testhole");
+#endif
     return result;
 }
 
@@ -709,7 +712,8 @@ int main (int argc, char *argv[])
     test_dir ();
     test_link ();
     test_small ();
-    test_empty ();
+    if (have_sparse)
+        test_empty ();
     test_expfail ();
     test_pretty_print ();
 
diff --git a/src/modules/job-exec/job-exec.c b/src/modules/job-exec/job-exec.c
index d29aba68a..c6b76fb92 100644
--- a/src/modules/job-exec/job-exec.c
+++ b/src/modules/job-exec/job-exec.c
@@ -89,6 +89,7 @@
 #endif
 #include <assert.h>
 #include <unistd.h>
+#include <signal.h>
 #include <flux/core.h>
 
 #include "src/common/libczmqcontainers/czmq_containers.h"
diff --git a/src/shell/rlimit.c b/src/shell/rlimit.c
index 25a19d936..f83904525 100644
--- a/src/shell/rlimit.c
+++ b/src/shell/rlimit.c
@@ -60,8 +60,10 @@ static int rlimit_name_to_string (const char *name)
         return RLIMIT_NICE;
     if (streq (name, "rtprio"))
         return RLIMIT_RTPRIO;
+#ifdef RLIMIT_RTTTIME
     if (streq (name, "rttime"))
         return RLIMIT_RTTIME;
+#endif
     if (streq (name, "sigpending"))
         return RLIMIT_SIGPENDING;
     return -1;
diff --git a/src/cmd/flux-exec.c b/src/cmd/flux-exec.c
index eeeb2c581..79747091d 100644
--- a/src/cmd/flux-exec.c
+++ b/src/cmd/flux-exec.c
@@ -190,9 +190,12 @@ void output_cb (flux_subprocess_t *p, const char *stream)
         log_err_exit ("flux_subprocess_getline");
 
     if (lenp) {
+        int rc;
         if (optparse_getopt (opts, "label-io", NULL) > 0)
             fprintf (fstream, "%d: ", flux_subprocess_rank (p));
-        fwrite (ptr, lenp, 1, fstream);
+        rc = fwrite (ptr, lenp, 1, fstream);
+        if (rc != lenp)
+            log_err ("fwrite");
     }
 }
 
diff --git a/src/cmd/flux-job.c b/src/cmd/flux-job.c
index 979c7fbf7..2eb7c7a8b 100644
--- a/src/cmd/flux-job.c
+++ b/src/cmd/flux-job.c
@@ -1672,9 +1672,12 @@ static void handle_output_data (struct attach_ctx *ctx, json_t *context)
     else
         fp = stderr;
     if (len > 0) {
+        int rc;
         if (optparse_hasopt (ctx->p, "label-io"))
             fprintf (fp, "%s: ", rank);
-        fwrite (data, len, 1, fp);
+        rc = fwrite (data, len, 1, fp);
+        if (rc != len)
+            log_err ("fwrite");
         fflush (fp);
     }
     free (data);
@@ -2205,7 +2208,9 @@ void handle_exec_log_msg (struct attach_ctx *ctx, double ts, json_t *context)
                  rank,
                  stream);
     }
-    fwrite (data, len, 1, stderr);
+    int rc = fwrite (data, len, 1, stderr);
+    if (rc != len)
+        log_err ("fwrite");
 }
 
 static struct idset *all_taskids (const struct taskmap *map)
diff --git a/src/common/libsubprocess/subprocess.c b/src/common/libsubprocess/subprocess.c
index da068b3d3..72cb6cca6 100644
--- a/src/common/libsubprocess/subprocess.c
+++ b/src/common/libsubprocess/subprocess.c
@@ -262,8 +262,11 @@ void flux_standard_output (flux_subprocess_t *p, const char *stream)
         }
     }
 
-    if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+    if (lenp) {
+        int rc = fwrite (ptr, lenp, 1, fstream);
+        if (rc != lenp)
+            log_err ("fwrite");
+    }
 }
 
 /*
diff --git a/src/shell/output.c b/src/shell/output.c
index 8ebf07365..6e3422605 100644
--- a/src/shell/output.c
+++ b/src/shell/output.c
@@ -173,8 +173,11 @@ static int shell_output_term (struct shell_output *out)
                 f = stderr;
             }
             if ((output_type == FLUX_OUTPUT_TYPE_TERM) && len > 0) {
+                int rc;
                 fprintf (f, "%s: ", rank);
-                fwrite (data, len, 1, f);
+                rc = fwrite (data, len, 1, f);
+                if (rc != len)
+                    return -1;
             }
             free (data);
         }
diff --git a/t/rexec/rexec_count_stdout.c b/t/rexec/rexec_count_stdout.c
index 8cb9ff932..f5e1985f8 100644
--- a/t/rexec/rexec_count_stdout.c
+++ b/t/rexec/rexec_count_stdout.c
@@ -69,8 +69,10 @@ void output_cb (flux_subprocess_t *p, const char *stream)
         }
     }
 
-    if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+    if (lenp) {
+        if (fwrite (ptr, lenp, 1, fstream) != lenp)
+            log_err ("fwrite");
+    }
 
     if (!strcasecmp (stream, "stdout"))
         stdout_count++;
diff --git a/t/rexec/rexec_getline.c b/t/rexec/rexec_getline.c
index b4a17c3ea..55fd57ffd 100644
--- a/t/rexec/rexec_getline.c
+++ b/t/rexec/rexec_getline.c
@@ -76,8 +76,10 @@ void output_cb (flux_subprocess_t *p, const char *stream)
 
     if (!(ptr = flux_subprocess_getline (p, stream, &lenp)))
         log_err_exit ("flux_subprocess_getline");
-    if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+    if (lenp) {
+        int rc = fwrite (ptr, lenp, 1, fstream);
+        (void) (rc);
+    }
     else
         fprintf (fstream, "EOF\n");
 }

diff -Nur flux-core-0.49.0/src/common/libeventlog/Makefile.in flux-core-0.49.0.new/src/common/libeventlog/Makefile.in
--- flux-core-0.49.0/src/common/libeventlog/Makefile.in	2023-04-05 16:24:09.982692163 +0000
+++ flux-core-0.49.0.new/src/common/libeventlog/Makefile.in	2023-04-12 03:49:02.262611471 +0000
@@ -691,7 +691,7 @@
 	$(top_builddir)/src/common/libtap/libtap.la \
 	$(top_builddir)/src/common/libeventlog/libeventlog.la \
 	$(top_builddir)/src/common/libutil/libutil.la \
-	$(JANSSON_LIBS)
+	$(JANSSON_LIBS) -lrt
 
 all: all-am
 
diff -Nur flux-core-0.49.0/src/shell/Makefile.in flux-core-0.49.0.new/src/shell/Makefile.in
--- flux-core-0.49.0/src/shell/Makefile.in	2023-04-05 16:24:12.798689648 +0000
+++ flux-core-0.49.0.new/src/shell/Makefile.in	2023-04-12 02:50:11.579942667 +0000
@@ -872,13 +872,14 @@
 	$(top_builddir)/src/bindings/lua/libfluxlua.la \
 	$(top_builddir)/src/common/libflux-core.la \
 	$(top_builddir)/src/common/libflux-taskmap.la \
+	$(top_builddir)/src/common/libflux-idset.la \
 	$(top_builddir)/src/common/libpmi/libpmi_server.la \
 	$(top_builddir)/src/common/libpmi/libpmi_common.la \
 	$(top_builddir)/src/common/libczmqcontainers/libczmqcontainers.la \
-	$(top_builddir)/src/common/libflux-internal.la \
 	$(top_builddir)/src/common/libflux-optparse.la \
 	$(top_builddir)/src/common/libterminus/libterminus.la \
 	$(top_builddir)/src/common/libutil/libutil.la \
+	$(top_builddir)/src/common/libflux-internal.la \
 	$(LUA_LIB) \
 	$(HWLOC_LIBS) \
 	$(JANSSON_LIBS) \
--- a/t/t0001-basic.t	2023-04-05 16:22:48.910745814 +0000
+++ b/t/t0001-basic.t	2023-04-13 13:45:26.201487673 +0000
@@ -613,7 +613,8 @@
 	grep "^test two commands" help2.out &&
 	grep "a test two" help2.out
 '
-test_expect_success 'flux-help command can display manpages for subcommands' '
+command -v man >/dev/null && test_set_prereq HAVE_MAN
+test_expect_success HAVE_MAN 'flux-help command can display manpages for subcommands' '
 	PWD=$(pwd) &&
 	mkdir -p man/man1 &&
 	cat <<-EOF > man/man1/flux-foo.1 &&
@@ -623,7 +624,7 @@
 	EOF
 	MANPATH=${PWD}/man FLUX_IGNORE_NO_DOCS=y flux help foo | grep "^FOO(1)"
 '
-test_expect_success 'flux-help command can display manpages for api calls' '
+test_expect_success HAVE_MAN 'flux-help command can display manpages for api calls' '
 	PWD=$(pwd) &&
 	mkdir -p man/man3 &&
 	cat <<-EOF > man/man3/flux_foo.3 &&
@@ -638,7 +639,7 @@
 	man notacommand >/dev/null 2>&1
 	echo $?
 }
-test_expect_success 'flux-help returns nonzero exit code from man(1)' '
+test_expect_success HAVE_MAN 'flux-help returns nonzero exit code from man(1)' '
 	test_expect_code $(missing_man_code) \
 		eval FLUX_IGNORE_NO_DOCS=y flux help notacommand
 '
diff --git a/t/sharness.d/01-setup.sh b/t/sharness.d/01-setup.sh
index 84b02f97f..fd9387ed5 100644
--- a/t/sharness.d/01-setup.sh
+++ b/t/sharness.d/01-setup.sh
@@ -57,6 +57,7 @@ else # normal case, use ${top_builddir}/src/cmd/flux
     #
     export LD_LIBRARY_PATH="${FLUX_BUILD_DIR}/src/common/.libs:$LD_LIBRARY_PATH"
 fi
+PATH=$(flux python -c 'import os,sys; print(os.path.dirname(sys.executable))'):$PATH
 export PATH
 
 if ! test -x ${fluxbin}; then

@grondo
Copy link
Contributor

grondo commented Apr 13, 2023

FYI: This is what I ended up with at the end of the build:

INFO :: The inputs making up the hashes for the built packages are as follows:
{
  "flux-core-0.49.0-py310lua54he957ee6": {
    "recipe": {
      "c_compiler": "gcc",
      "c_compiler_version": "11",
      "channel_targets": "conda-forge main",
      "libarchive": "3.6",
      "libuuid": "2",
      "lz4_c": "1.9.3",
      "ncurses": "6",
      "python": "3.10.* *_cpython",
      "sqlite": "3",
      "target_platform": "linux-64",
      "zeromq": "4.3.4"
    }
  },
  "flux-core-0.49.0-py38lua54h52dc48a": {
    "recipe": {
      "c_compiler": "gcc",
      "c_compiler_version": "11",
      "channel_targets": "conda-forge main",
      "libarchive": "3.6",
      "libuuid": "2",
      "lz4_c": "1.9.3",
      "ncurses": "6",
      "python": "3.8.* *_cpython",
      "sqlite": "3",
      "target_platform": "linux-64",
      "zeromq": "4.3.4"
    }
  },
  "flux-core-0.49.0-py39lua54hab9c2d2": {
    "recipe": {
      "c_compiler": "gcc",
      "c_compiler_version": "11",
      "channel_targets": "conda-forge main",
      "libarchive": "3.6",
      "libuuid": "2",
      "lz4_c": "1.9.3",
      "ncurses": "6",
      "python": "3.9.* *_cpython",
      "sqlite": "3",
      "target_platform": "linux-64",
      "zeromq": "4.3.4"
    }
  }
}

@grondo
Copy link
Contributor

grondo commented Apr 13, 2023

I've included some of these fixes in a PR #5093, so the patch for conda-forge can be reduced at the next release.

@grondo
Copy link
Contributor

grondo commented Apr 13, 2023

Hold on, I think when combining patches I may have made a mistake in the patch above. I'll paste a corrected version here once I've verified it is fully working.

@grondo
Copy link
Contributor

grondo commented Apr 13, 2023

Ok, here's a corrected patch (verified working this time). I was finding that tests were failing sometimes due to some fwrite: Success errors on stderr, so I changed all the fwrite return code usage into noops.

Also, I forgot to mention that I couldn't find a package containing man(1) for conda. That is a requirement for flux-core as well if you want to view the manpages and have a working flux help COMMAND.

diff --git a/src/broker/state_machine.c b/src/broker/state_machine.c
index 57e727fea..b02a3b144 100644
--- a/src/broker/state_machine.c
+++ b/src/broker/state_machine.c
@@ -11,6 +11,8 @@
 #if HAVE_CONFIG_H
 #include "config.h"
 #endif
+#include <signal.h>
+
 #include <flux/core.h>
 
 #include "src/common/libczmqcontainers/czmq_containers.h"
diff --git a/src/cmd/builtin/proxy.c b/src/cmd/builtin/proxy.c
index feff795b9..655df262a 100644
--- a/src/cmd/builtin/proxy.c
+++ b/src/cmd/builtin/proxy.c
@@ -18,6 +18,7 @@
 #include <sys/socket.h>
 #include <sys/un.h>
 #include <stdio.h>
+#include <signal.h>
 #include <assert.h>
 #include <stdlib.h>
 #include <string.h>
diff --git a/src/common/libsubprocess/server.c b/src/common/libsubprocess/server.c
index 2c164646c..275aa3443 100644
--- a/src/common/libsubprocess/server.c
+++ b/src/common/libsubprocess/server.c
@@ -13,6 +13,7 @@
 #endif
 
 #include <unistd.h> // defines environ
+#include <signal.h>
 #include <errno.h>
 #include <flux/core.h>
 
diff --git a/src/common/libterminus/pty.c b/src/common/libterminus/pty.c
index a1dbf6788..a3c6f4c7b 100644
--- a/src/common/libterminus/pty.c
+++ b/src/common/libterminus/pty.c
@@ -208,8 +208,10 @@ int flux_pty_kill (struct flux_pty *pty, int sig)
         pty->wait_for_client = false;
         pty->wait_on_close = false;
     }
+#if defined(TIOCSIG)
     if (ioctl (pty->leader, TIOCSIG, sig) >= 0)
         return 0;
+#endif
     llog_debug (pty, "ioctl (TIOCSIG): %s", strerror (errno));
     if (ioctl (pty->leader, TIOCGPGRP, &pgrp) >= 0
         && pgrp > 0
diff --git a/src/common/libutil/fileref.c b/src/common/libutil/fileref.c
index 69213b0f0..22a4d3359 100644
--- a/src/common/libutil/fileref.c
+++ b/src/common/libutil/fileref.c
@@ -57,8 +57,10 @@ static int blobvec_append (json_t *blobvec,
 
 static bool file_has_no_data (int fd)
 {
+#if defined(SEEK_DATA)
     if (lseek (fd, 0, SEEK_DATA) == (off_t)-1 && errno == ENXIO)
         return true;
+#endif
     return false;
 }
 
@@ -83,19 +85,25 @@ static json_t *blobvec_create (int fd,
         goto error;
     }
     while (offset < size) {
+#if defined(SEEK_DATA)
         // N.B. fails with ENXIO if there is no more data
         if ((offset = lseek (fd, offset, SEEK_DATA)) == (off_t)-1) {
             if (errno == ENXIO)
                 break;
             goto error;
         }
+#endif
         if (offset < size) {
             off_t notdata;
             int blobsize;
 
+#if defined(SEEK_HOLE)
             // N.B. returns size if there are no more holes
             if ((notdata = lseek (fd, offset, SEEK_HOLE)) == (off_t)-1)
                 goto error;
+#else
+	    notdata = size;
+#endif
             blobsize = notdata - offset;
             if (blobsize > chunksize)
                 blobsize = chunksize;
diff --git a/src/common/libutil/test/fileref.c b/src/common/libutil/test/fileref.c
index 61cb0bdfe..6fbd84d9b 100644
--- a/src/common/libutil/test/fileref.c
+++ b/src/common/libutil/test/fileref.c
@@ -87,9 +87,11 @@ void rmfile (const char *name)
  */
 bool test_sparse (void)
 {
+    bool result = false;
+
+#if defined(SEEK_DATA)
     int fd;
     struct stat sb;
-    bool result = false;
 
     fd = open (mkpath ("testhole"), O_WRONLY | O_CREAT | O_TRUNC, 0600);
     if (fd < 0)
@@ -104,6 +106,7 @@ bool test_sparse (void)
         result = true;
     close (fd);
     rmfile ("testhole");
+#endif
     return result;
 }
 
@@ -709,7 +712,8 @@ int main (int argc, char *argv[])
     test_dir ();
     test_link ();
     test_small ();
-    test_empty ();
+    if (have_sparse)
+        test_empty ();
     test_expfail ();
     test_pretty_print ();
 
diff --git a/src/modules/job-exec/job-exec.c b/src/modules/job-exec/job-exec.c
index d29aba68a..c6b76fb92 100644
--- a/src/modules/job-exec/job-exec.c
+++ b/src/modules/job-exec/job-exec.c
@@ -89,6 +89,7 @@
 #endif
 #include <assert.h>
 #include <unistd.h>
+#include <signal.h>
 #include <flux/core.h>
 
 #include "src/common/libczmqcontainers/czmq_containers.h"
diff --git a/src/shell/rlimit.c b/src/shell/rlimit.c
index 25a19d936..f83904525 100644
--- a/src/shell/rlimit.c
+++ b/src/shell/rlimit.c
@@ -60,8 +60,10 @@ static int rlimit_name_to_string (const char *name)
         return RLIMIT_NICE;
     if (streq (name, "rtprio"))
         return RLIMIT_RTPRIO;
+#ifdef RLIMIT_RTTTIME
     if (streq (name, "rttime"))
         return RLIMIT_RTTIME;
+#endif
     if (streq (name, "sigpending"))
         return RLIMIT_SIGPENDING;
     return -1;
diff --git a/src/cmd/flux-exec.c b/src/cmd/flux-exec.c
index eeeb2c581..79747091d 100644
--- a/src/cmd/flux-exec.c
+++ b/src/cmd/flux-exec.c
@@ -190,9 +190,12 @@ void output_cb (flux_subprocess_t *p, const char *stream)
         log_err_exit ("flux_subprocess_getline");
 
     if (lenp) {
+        int rc;
         if (optparse_getopt (opts, "label-io", NULL) > 0)
             fprintf (fstream, "%d: ", flux_subprocess_rank (p));
-        fwrite (ptr, lenp, 1, fstream);
+        rc = fwrite (ptr, lenp, 1, fstream);
+        (void) rc;
+
     }
 }
 
diff --git a/src/cmd/flux-job.c b/src/cmd/flux-job.c
index 979c7fbf7..2eb7c7a8b 100644
--- a/src/cmd/flux-job.c
+++ b/src/cmd/flux-job.c
@@ -1672,9 +1672,12 @@ static void handle_output_data (struct attach_ctx *ctx, json_t *context)
     else
         fp = stderr;
     if (len > 0) {
+        int rc;
         if (optparse_hasopt (ctx->p, "label-io"))
             fprintf (fp, "%s: ", rank);
-        fwrite (data, len, 1, fp);
+        rc = fwrite (data, len, 1, fp);
+        (void) rc;
+
         fflush (fp);
     }
     free (data);
@@ -2205,7 +2208,9 @@ void handle_exec_log_msg (struct attach_ctx *ctx, double ts, json_t *context)
                  rank,
                  stream);
     }
-    fwrite (data, len, 1, stderr);
+    int rc = fwrite (data, len, 1, stderr);
+    (void) rc;
+    
 }
 
 static struct idset *all_taskids (const struct taskmap *map)
diff --git a/src/common/libsubprocess/subprocess.c b/src/common/libsubprocess/subprocess.c
index da068b3d3..72cb6cca6 100644
--- a/src/common/libsubprocess/subprocess.c
+++ b/src/common/libsubprocess/subprocess.c
@@ -262,8 +262,11 @@ void flux_standard_output (flux_subprocess_t *p, const char *stream)
         }
     }
 
-    if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+    if (lenp) {
+        int rc = fwrite (ptr, lenp, 1, fstream);
+        (void) rc;
+
+    }
 }
 
 /*
diff --git a/src/shell/output.c b/src/shell/output.c
index 8ebf07365..6e3422605 100644
--- a/src/shell/output.c
+++ b/src/shell/output.c
@@ -173,8 +173,11 @@ static int shell_output_term (struct shell_output *out)
                 f = stderr;
             }
             if ((output_type == FLUX_OUTPUT_TYPE_TERM) && len > 0) {
+                int rc;
                 fprintf (f, "%s: ", rank);
-                fwrite (data, len, 1, f);
+                rc = fwrite (data, len, 1, f);
+                (void) rc;
+
             }
             free (data);
         }
diff --git a/t/rexec/rexec_count_stdout.c b/t/rexec/rexec_count_stdout.c
index 8cb9ff932..f5e1985f8 100644
--- a/t/rexec/rexec_count_stdout.c
+++ b/t/rexec/rexec_count_stdout.c
@@ -69,8 +69,10 @@ void output_cb (flux_subprocess_t *p, const char *stream)
         }
     }
 
-    if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+    if (lenp) {
+        int rc = fwrite (ptr, lenp, 1, fstream);
+        (void) rc;
+    }
 
     if (!strcasecmp (stream, "stdout"))
         stdout_count++;
diff --git a/t/rexec/rexec_getline.c b/t/rexec/rexec_getline.c
index b4a17c3ea..55fd57ffd 100644
--- a/t/rexec/rexec_getline.c
+++ b/t/rexec/rexec_getline.c
@@ -76,8 +76,10 @@ void output_cb (flux_subprocess_t *p, const char *stream)
 
     if (!(ptr = flux_subprocess_getline (p, stream, &lenp)))
         log_err_exit ("flux_subprocess_getline");
-    if (lenp)
-        fwrite (ptr, lenp, 1, fstream);
+    if (lenp) {
+        int rc = fwrite (ptr, lenp, 1, fstream);
+        (void) (rc);
+    }
     else
         fprintf (fstream, "EOF\n");
 }

diff -Nur flux-core-0.49.0/src/common/libeventlog/Makefile.in flux-core-0.49.0.new/src/common/libeventlog/Makefile.in
--- flux-core-0.49.0/src/common/libeventlog/Makefile.in	2023-04-05 16:24:09.982692163 +0000
+++ flux-core-0.49.0.new/src/common/libeventlog/Makefile.in	2023-04-12 03:49:02.262611471 +0000
@@ -691,7 +691,7 @@
 	$(top_builddir)/src/common/libtap/libtap.la \
 	$(top_builddir)/src/common/libeventlog/libeventlog.la \
 	$(top_builddir)/src/common/libutil/libutil.la \
-	$(JANSSON_LIBS)
+	$(JANSSON_LIBS) -lrt
 
 all: all-am
 
diff -Nur flux-core-0.49.0/src/shell/Makefile.in flux-core-0.49.0.new/src/shell/Makefile.in
--- flux-core-0.49.0/src/shell/Makefile.in	2023-04-05 16:24:12.798689648 +0000
+++ flux-core-0.49.0.new/src/shell/Makefile.in	2023-04-12 02:50:11.579942667 +0000
@@ -872,13 +872,14 @@
 	$(top_builddir)/src/bindings/lua/libfluxlua.la \
 	$(top_builddir)/src/common/libflux-core.la \
 	$(top_builddir)/src/common/libflux-taskmap.la \
+	$(top_builddir)/src/common/libflux-idset.la \
 	$(top_builddir)/src/common/libpmi/libpmi_server.la \
 	$(top_builddir)/src/common/libpmi/libpmi_common.la \
 	$(top_builddir)/src/common/libczmqcontainers/libczmqcontainers.la \
-	$(top_builddir)/src/common/libflux-internal.la \
 	$(top_builddir)/src/common/libflux-optparse.la \
 	$(top_builddir)/src/common/libterminus/libterminus.la \
 	$(top_builddir)/src/common/libutil/libutil.la \
+	$(top_builddir)/src/common/libflux-internal.la \
 	$(LUA_LIB) \
 	$(HWLOC_LIBS) \
 	$(JANSSON_LIBS) \
--- a/t/t0001-basic.t	2023-04-05 16:22:48.910745814 +0000
+++ b/t/t0001-basic.t	2023-04-13 13:45:26.201487673 +0000
@@ -613,7 +613,8 @@
 	grep "^test two commands" help2.out &&
 	grep "a test two" help2.out
 '
-test_expect_success 'flux-help command can display manpages for subcommands' '
+command -v man >/dev/null && test_set_prereq HAVE_MAN
+test_expect_success HAVE_MAN 'flux-help command can display manpages for subcommands' '
 	PWD=$(pwd) &&
 	mkdir -p man/man1 &&
 	cat <<-EOF > man/man1/flux-foo.1 &&
@@ -623,7 +624,7 @@
 	EOF
 	MANPATH=${PWD}/man FLUX_IGNORE_NO_DOCS=y flux help foo | grep "^FOO(1)"
 '
-test_expect_success 'flux-help command can display manpages for api calls' '
+test_expect_success HAVE_MAN 'flux-help command can display manpages for api calls' '
 	PWD=$(pwd) &&
 	mkdir -p man/man3 &&
 	cat <<-EOF > man/man3/flux_foo.3 &&
@@ -638,7 +639,7 @@
 	man notacommand >/dev/null 2>&1
 	echo $?
 }
-test_expect_success 'flux-help returns nonzero exit code from man(1)' '
+test_expect_success HAVE_MAN 'flux-help returns nonzero exit code from man(1)' '
 	test_expect_code $(missing_man_code) \
 		eval FLUX_IGNORE_NO_DOCS=y flux help notacommand
 '
diff --git a/t/sharness.d/01-setup.sh b/t/sharness.d/01-setup.sh
index 84b02f97f..fd9387ed5 100644
--- a/t/sharness.d/01-setup.sh
+++ b/t/sharness.d/01-setup.sh
@@ -57,6 +57,7 @@ else # normal case, use ${top_builddir}/src/cmd/flux
     #
     export LD_LIBRARY_PATH="${FLUX_BUILD_DIR}/src/common/.libs:$LD_LIBRARY_PATH"
 fi
+PATH=$(flux python -c 'import os,sys; print(os.path.dirname(sys.executable))'):$PATH
 export PATH
 
 if ! test -x ${fluxbin}; then

@grondo
Copy link
Contributor

grondo commented Apr 13, 2023

Oh, also, I had to add a LICENSE file, to get the build to finish successfully, but that one is more obvious.

@grondo
Copy link
Contributor

grondo commented Apr 15, 2023

@jan-janssen - I tried this build locally with MPI and for some reason in the conda-build environment running the MPI jobs as part of testing is not working. The test jobs are hanging (likely in MPI_Init). It looks like the MPI version being tested at the time is

Open MPI v4.1.5, package: Open MPI conda@d39edd4cafc2 Distribution, ident: 4.1.5, repo rev: v4.1.5, Feb 23, 2023

which I believe should work without the flux-pmix shell plugin.

However, to diagnose we need some way to test interactively in the conda-build environment. Do you know how to do that?
Another approach would be to build packages without MPI. Then install flux-core and MPI with conda and debug from there. Once we figure out the MPI launch issues, you could then add the mpi builds back to flux-core's recipe if desired (for testing I suppose)

@grondo
Copy link
Contributor

grondo commented Apr 17, 2023

@jan-janssen I finally figured out how to install a package built locally from the staged-recipes build-locally.py script into a conda environment for testing. The steps I used for my own future reference:

  • Install miniconda Linux installer from https://docs.conda.io/en/latest/miniconda.html#linux-installers
  • Install all dependencies as listed in your conda-forge recipe using the conda-forge channel:
    $ conda install -c conda-forge zeromq lz4-c libarchive lua libhwloc sqlite cffi jsonschema pyyaml czmq jansson ply lua-luaposix libuuid openmpi
    
  • Install locally built flux-core package with conda install --offline /path/to/staged-recipes/build_artifacts/linux-64/flux-core*.conda
  • export LDFLAGS=-L$HOME/miniconda3//lib CPPFLAGS=-I$HOME/miniconda3/include LD_LIBRARY_PATH=$HOME/miniconda3/lib

The last step allows flux-core to be configured and built under the conda environment.

After those steps, I was able to reproduce the failure with the openmpi tests. The problem appears to be solved by exporting the following environment variable:

OMPI_MCA_btl=self,tcp

I'm currently testing this with the conda-forge build-locally.py recipe.

@jan-janssen
Copy link
Author

@grondo I am sorry for the slow response, I updated the pull request now I only get one error and one failure:

2023-04-18T04:33:12.5087809Z FAIL: t2004-hydra.t 2 - Hydra can launch Flux
2023-04-18T04:33:12.5088193Z ERROR: t2004-hydra.t - exited with status 1

This might be related to running the tests in a restricted docker container. Maybe we can just skip those tests.

@grondo
Copy link
Contributor

grondo commented Apr 18, 2023

Yeah, the MPI tests seem to be brittle in the conda-forge build environment. Since Flux does not require MPI to build (it only requires it to run some MPI launch testing), my inclination would be to build the conda-forge flux-core package without all the MPI permutations, and then separately test that Flux can launch the various MPIs packaged in conda.

That is, a single build of flux-core should be able to launch multiple versions of MPI, so there is no reason to require a "openmpi" or "mpich" variant of flux-core in conda.

If you still want to build with {{ mpi }}, then it might be wise to disable the MPI tests for now by avoiding FLUX_TEST_MPI in the conda-forge build.sh.

@jan-janssen
Copy link
Author

@grondo Thanks a lot - the pull request is finally merged and I am going to start working on flux-sched next. Would you be interested in joining me in maintaining the conda-forge package? https://github.com/conda-forge/flux-core-feedstock

@jan-janssen
Copy link
Author

Somehow the build infrastructure between staged-recipes and the build infrastructure for the final build seems to be slightly different. Now I get the following error:

exclude.c: In function 'exclude_update':
exclude.c:90:13: error: pointer 'add_s' may be used after 'free' [-Werror=use-after-free]
   90 |             flux_log_error (h,
      |             ^~~~~~~~~~~~~~~~~~
   91 |                             "exclude: failed to undrain ranks in %s",
      |                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   92 |                             add_s);
      |                             ~~~~~~
exclude.c:86:9: note: call to 'free' here
   86 |         free (add_s);
      |         ^~~~~~~~~~~~
cc1: all warnings being treated as errors

@jan-janssen
Copy link
Author

@grondo
Copy link
Contributor

grondo commented Apr 19, 2023

Ah, that's an actual bug. Must be a different (newer?) compiler?

I'll get that fixed here, in the meantime here's a patch and sorry about that!

diff --git a/src/modules/resource/exclude.c b/src/modules/resource/exclude.c
index 62326abf6..ea0f46581 100644
--- a/src/modules/resource/exclude.c
+++ b/src/modules/resource/exclude.c
@@ -83,13 +83,13 @@ int exclude_update (struct exclude *exclude,
                                         add_s) < 0) {
             flux_log_error (h, "error posting exclude event");
         }
-        free (add_s);
         /*  Added exclude ranks can no longer be drained:
          */
         if (undrain_ranks (exclude->ctx->drain, add) < 0)
             flux_log_error (h,
                             "exclude: failed to undrain ranks in %s",
                             add_s);
+        free (add_s);
         idset_destroy (add);
     }
     if (del) {

@jan-janssen
Copy link
Author

Thanks a lot - it is working and the package should be pushed to the conda repositories soon.

@grondo
Copy link
Contributor

grondo commented Apr 20, 2023

Would you be interested in joining me in maintaining the conda-forge package? https://github.com/conda-forge/flux-core-feedstock

Beyond what I learned this week I have no experience with conda, so I'm not sure I'd be a good choice as a maintainer. Please feel free to continue to open issues here as they come up!

@jan-janssen
Copy link
Author

Beyond what I learned this week I have no experience with conda, so I'm not sure I'd be a good choice as a maintainer. Please feel free to continue to open issues here as they come up!

Either you can watch and learn which would be the preferred solution from my side, as it gives you the option to adjust the patch yourself without waiting for me. Still it is also fine for me to open the issues here and discuss. It depends on what you prefer.

@grondo
Copy link
Contributor

grondo commented Apr 20, 2023

Ah, that would be fine then in case it saves you some time shuffling patches back and forth from issues in this repo!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants