Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSX-only multithreaded Z3 crash #1080

Closed
arifogel opened this issue Jun 16, 2017 · 19 comments
Closed

OSX-only multithreaded Z3 crash #1080

arifogel opened this issue Jun 16, 2017 · 19 comments

Comments

@arifogel
Copy link

Likely related to #698

I meant to report this a long time ago. Basically, the batfish project uses multiple contexts in parallel during some of its queries. This causes crashes ONLY in OSX. It works on Linux and Windows with no issue.
The master branch of batfish disables this parallelism only in OSX. I have created a topic branch that reenables this parallelism to reproduce the bug:
https://github.com/batfish/batfish/tree/ari/z3-osx-double-free
You can reproduce by:

  1. (Install brew coreutils and JDK 8 on OSX and perhaps other dependencies)
  2. git clone https://github.com/batfish/batfish
  3. cd batfish
  4. git checkout origin/ari/z3-osx-double-free
  5. . tools/batfish_functions.sh
  6. batfish_build_all
  7. cd tools && ./install_z3_osx.sh (may require root -- replaces any z3 libs on system with 4.5.0. If you want to test with latest master or development branch, skip this step)
  8. (from root of batfish repository) allinone -runmode interactive
  9. (inside allinone CLI): init-testrig test_rigs/example
  10. (inside allinone CLI): get reachability
arifogel@XXXXXX:~/git/batfish$allinone -runmode interactive
batfish> init-testrig test_rigs/example
Init'ed and set active container
Uploaded testrig. Parsing now.
Status: SUCCESS
PARSING SUMMARY
STATISTICS
  Parsing results:
    Parsed successfully: 17

batfish> get reachability
java(46891,0x70000ee52000) malloc: *** error for object 0x7fcfcb411f00: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
java(46891,0x70000f058000) malloc: *** error for object 0x7fcfcb411f00: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
java(46891,0x70000ef55000) malloc: *** error for object 0x7fcfcb411f00: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
/Users/arifogel/git/batfish/projects/allinone/allinone: line 16: 46891 Abort trap: 6           java $ALLINONE_JAVA_ARGS -DbatfishQuestionPluginDir="${BATFISH_JAVA_QUESTION_PLUGIN_DIR}" -jar "$ALLINONE_JAR" "$@"
@arifogel arifogel changed the title OSX-only multithreaded Z3 crsah OSX-only multithreaded Z3 crash Jun 16, 2017
@arifogel
Copy link
Author

arifogel commented Jun 16, 2017

Some relevant Java code:
Batifsh.java: computeNodeOutput (executes jobs in parallel):

   public Set<Flow> computeNodOutput(List<NodJob> jobs) {
      _logger.info("\n*** EXECUTING NOD JOBS ***\n");
      resetTimer();
      Set<Flow> flows = new TreeSet<>();
      BatfishJobExecutor<NodJob, NodAnswerElement, NodJobResult, Set<Flow>> executor = new BatfishJobExecutor<>(
            _settings, _logger, true, "NOD");
      // todo: do something with nod answer element
      executor.executeJobs(jobs, flows, new NodAnswerElement());
      printElapsedTime();
      return flows;
   }

NodJob.java:call (work done by each job):

   @Override
   public NodJobResult call() throws Exception {
      long startTime = System.currentTimeMillis();
      long elapsedTime;
      try (Context ctx = new Context()) {
         NodProgram baseProgram = _dataPlaneSynthesizer
               .synthesizeNodDataPlaneProgram(ctx);
         NodProgram queryProgram = _querySynthesizer.getNodProgram(baseProgram);
         NodProgram program = baseProgram.append(queryProgram);
         Params p = ctx.mkParams();
         p.add("fixedpoint.engine", "datalog");
         p.add("fixedpoint.datalog.default_relation", "doc");
         p.add("fixedpoint.print_answer", true);
         Fixedpoint fix = ctx.mkFixedpoint();
         fix.setParameters(p);
         for (FuncDecl relationDeclaration : program.getRelationDeclarations()
               .values()) {
            fix.registerRelation(relationDeclaration);
         }
         for (BoolExpr rule : program.getRules()) {
            fix.addRule(rule, null);
         }
         for (BoolExpr query : program.getQueries()) {
            Status status = fix.query(query);
            switch (status) {
            case SATISFIABLE:
               break;
            case UNKNOWN:
               throw new BatfishException("Query satisfiability unknown");
            case UNSATISFIABLE:
               break;
            default:
               throw new BatfishException("invalid status");
            }
         }
         Expr answer = fix.getAnswer();
         BoolExpr solverInput;
         if (answer.getArgs().length > 0) {
            List<Expr> reversedVarList = new ArrayList<>();
            reversedVarList.addAll(program.getVariablesAsConsts().values());
            Collections.reverse(reversedVarList);
            Expr[] reversedVars = reversedVarList.toArray(new Expr[] {});
            Expr substitutedAnswer = answer.substituteVars(reversedVars);
            solverInput = (BoolExpr) substitutedAnswer;
         }
         else {
            solverInput = (BoolExpr) answer;
         }
         if (_querySynthesizer.getNegate()) {
            solverInput = ctx.mkNot(solverInput);
         }
         Solver solver = ctx.mkSolver();
         solver.add(solverInput);
         Status solverStatus = solver.check();
         switch (solverStatus) {
         case SATISFIABLE:
            break;

         case UNKNOWN:
            throw new BatfishException("Stage 2 query satisfiability unknown");

         case UNSATISFIABLE:
            elapsedTime = System.currentTimeMillis() - startTime;
            return new NodJobResult(elapsedTime, _logger.getHistory());

         default:
            throw new BatfishException("invalid status");
         }
         Model model = solver.getModel();
         Map<String, Long> constraints = new LinkedHashMap<>();
         for (FuncDecl constDecl : model.getConstDecls()) {
            String name = constDecl.getName().toString();
            BitVecExpr varConstExpr = program.getVariablesAsConsts().get(name);
            long val = ((BitVecNum) model.getConstInterp(varConstExpr))
                  .getLong();
            constraints.put(name, val);
         }
         Set<Flow> flows = new HashSet<>();
         for (Pair<String, String> nodeVrf : _nodeVrfSet) {
            String node = nodeVrf.getFirst();
            String vrf = nodeVrf.getSecond();
            Flow flow = createFlow(node, vrf, constraints);
            flows.add(flow);
         }
         elapsedTime = System.currentTimeMillis() - startTime;
         return new NodJobResult(elapsedTime, _logger.getHistory(), flows);
      }
      catch (Z3Exception e) {
         elapsedTime = System.currentTimeMillis() - startTime;
         return new NodJobResult(elapsedTime, _logger.getHistory(),
               new BatfishException(
                     "Error running NoD on concatenated data plane", e));
      }
   }

@NikolajBjorner
Copy link
Contributor

Will you also gift us a shiny new Mac to test it (requires a 5K monitor, of course) :-).
OK, more earnestly, thanks for the detailed repro setup.

@wintersteiger
Copy link
Contributor

@arifogel I got to step 6. batfish_build_all, but I can't find any script of that name. Did that change name in the meanwhile?

@delcypher
Copy link
Contributor

@arifogel This may be a long shot but have you tried building Z3 with the ThreadSanitizer? Because you are using Z3 from Java you may have to pull some tricks to get the sanitizer runtime library to be loaded.

@arifogel
Copy link
Author

arifogel commented Jun 22, 2017 via email

@wintersteiger
Copy link
Contributor

Ah! Alright, that's probably it, I'm definitely not using the latest version of bash.

@mantognini
Copy link

I was wondering if more thoughts & efforts were put into this issue. We might have the same one with epfl-lara/stainless, or at least the symptoms look really similar. Just checking before investing more time into this. :)

@delcypher
Copy link
Contributor

@mantognini A random thought of mine...

Z3 uses OpenMP primitives for locking (now that we use C+11 it might be better to use the C++ standard library locks). IIRC Clang shiped with Xcode on macOS does not support OpenMP. Therefore if you use that as your compiler you have no locks and using Z3 in a multi-threaded manner is likely unsafe.

If you use a Clang (from homebrew that supports OpenMP) or gcc with OpenMP support to build Z3 do you run into the same problem?

@mantognini
Copy link

IIRC Clang shiped with Xcode on macOS does not support OpenMP.

I don't know if this has changed, but I think you get an error from Apple's clang. Anyway, I'll give a shot at this idea next week and report here.

@delcypher
Copy link
Contributor

delcypher commented Oct 6, 2017

@mantognini Both build systems (CMake and python/Makefile) try to detect on the first configure if the compiler supports OpenMP. If the compiler does not support it on this first configure, OpenMP support is disabled. You will not see any warnings from Clang.

If you ask the CMake build system to explicitly enable OpenMP (pass -DUSE_OPENMP=ON to cmake) then an error will be raised if the compiler does not support OpenMP. I can't remember what the other build system does.

@mantognini
Copy link

Oh, okay, didn't know that. Thanks for the info.

@dhalperi
Copy link

dhalperi commented Jan 27, 2018

Here's a concise Java unit test that reproduces "usually" on my Macbook (mid-2017, 4 cores):

package com.microsoft.z3.parallelism;

import static org.hamcrest.Matchers.equalTo;
import static org.junit.Assert.assertThat;

import com.microsoft.z3.Context;
import com.microsoft.z3.Solver;
import com.microsoft.z3.Status;
import java.util.stream.IntStream;
import org.junit.Test;

public class Z3ParallelismTest {

  private static void trivialZ3Program(int i) {
    try (Context ctx = new Context()) {
      Solver solver = ctx.mkSolver();
      solver.add(ctx.mkEq(ctx.mkInt(3), ctx.mkInt(4)));
      assertThat(solver.check(), equalTo(Status.UNSATISFIABLE));
    }
  }

  @Test
  public void testParallelism() {
    IntStream.rangeClosed(1, Runtime.getRuntime().availableProcessors() * 2)
        .parallel()
        .forEach(Z3ParallelismTest::trivialZ3Program);
  }
}

The crash messages vary:

objc[34866]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/bin/java (0x1025204c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x1025a44e0). One of the two will be used. Which one is undefined.
java(34866,0x70000b4d7000) malloc: *** error for object 0x7f94941b1200: double free
*** set a breakpoint in malloc_error_break to debug
objc[34856]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/bin/java (0x1008424c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x10186d4e0). One of the two will be used. Which one is undefined.
WARNING: parser error
WARNING: invalid new_gen function 'cost', switching to default one
WARNING: parser error
Failed to verify: m_parser.parse_string("cost", m_new_gen_function)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000011fce6315, pid=34856, tid=0x0000000000003c03
#
# JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libz3.dylib+0x625315]  smt::qi_queue::set_values(quantifier*, app*, unsigned int, unsigned int, unsigned int, float)+0x25
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
#

Hope this is useful.

@NikolajBjorner
Copy link
Contributor

Is openmp available on your platform?

@dhalperi
Copy link

@NikolajBjorner yes, but I'm using the OS X released z3 binary.

@dhalperi
Copy link

(It seems like undesirable behavior for that binary to just be broken.)

@NikolajBjorner
Copy link
Contributor

Ok, so I should take our build machine is basically not well configured or it is producing binaries that are useless for multithreading.

@wintersteiger
Copy link
Contributor

@delcypher Does the default clang on OSX now support OpenMP or would it take more effort than that to update our builds?

@arifogel
Copy link
Author

arifogel commented Feb 5, 2018

No. We had to use brew to build with OpenMP.
Specifically, brew install llvm. Our latest master branch at https://github.com/batfish/z3 has a script that produces such a build. You can look at the dependencies installed in .travis.yml to see what needs to be installed.
(Sorry for all the edits)
Note that using our build script the only extra run requirement is that libomp.dylib be present in a standard location after installing. We include this in the binaries we produce.

@NikolajBjorner
Copy link
Contributor

OpenMP is now no longer with us. Z3 uses C++11 threads for everything

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants