
Include last working commit in README.md #1

Open
jcpetkovich opened this issue Jan 14, 2019 · 13 comments

@jcpetkovich

Is it possible to include the last working commit in the README.md for this?

I'm very interested in trying this out, but I can't get tfcompile to build. I want to sanity check against the commit that I'm trying to build so I can isolate it to human error.

Thanks for all your hard work!

@rongjiecomputer
Owner

I don't remember when I last built tfcompile from source (I don't do this very often because my laptop is completely unusable during the six-hour compilation).

I think the LKGR is around Sep 19, 2018.

You can post your build error console log here if you want.

@rongjiecomputer
Owner

You might hit some template-parsing compile errors if you are using pure MSVC, because MSVC uses delayed template parsing instead of the standard two-phase template parsing.

I use clang-cl with MSVC and the -fno-delayed-template-parsing flag to compile TensorFlow. If you use the /permissive- flag with pure MSVC from the latest update of VS2017, it will probably work too.

@jcpetkovich
Author

Thanks a lot! I was puzzling over those template errors! Using clang-cl with the specified flag, I get the following log on commit: ab795aebe033f1faa69808599706f88c27b2f54f

This commit was the last commit that I successfully compiled tfcompile on Linux.

Interestingly, there is an obvious missing #include to fix, but after that I'm afraid I get a little confused by the error I get.
bazel_ab795aeb_build.log

On the master branch, I get:
bazel_master_build_log.log

Thanks a lot for your help so far!

@rongjiecomputer
Owner

Change char final_packet[State::kPacketSize] HH_ALIGNAS(32) = {0}; to HH_ALIGNAS(32) char final_packet[State::kPacketSize] = {0};. (Note the position of HH_ALIGNAS(32): only Clang rejects the first form; MSVC and GCC happily accept both.)

You can find the file at bazel-<something>/external/highwayhash/highwayhash/.

I should send a PR to Highwayhash later to fix this permanently.

@jcpetkovich
Author

I had some time to make more progress, and added a few things to .bazelrc.user along the way.

Now I'm facing another issue: it seems I'm missing the dlfcn.h header, but I'm not sure where to find it. Did you run into this problem as well?

bazelrc_user.txt
bazel_master.log

Thanks again for your help!

@jcpetkovich
Author

Getting a little further in, I run into a much louder set of errors!

bazel_master2.log

@rongjiecomputer
Owner

Try adding the -DWIN32_LEAN_AND_MEAN compile flag. The log shows a conflict between winsock.h (included by windows.h by default) and winsock2.h.

@jcpetkovich
Author

Thanks! I tried building with this, and I think it resolved the conflict between winsock.h and winsock2.h! But I still have the dlfcn.h error to contend with. Searching around, it seems you had a pull request that removed this header from another file; is this because dlfcn.h is not available on Windows?

@jcpetkovich
Author

I tried dropping in a dlfcn.h header from here: https://github.com/shishaochen/TensorFlow-0.8-Win/tree/63221dfc4f1a1d064308e632ba12e6a54afe1fd8/posix, but this eventually led back to more missing headers in the same stacktrace.h file. It seems to me that PLATFORM_POSIX is defined when maybe it shouldn't be. What do you think?

@rongjiecomputer
Owner

In https://github.com/tensorflow/tensorflow/blob/9590c4c32dd4346ea5c35673336f5912c6072bf2/tensorflow/core/platform/default/stacktrace.h#L20

Change the ifdef to this:

#if !defined(IS_MOBILE_PLATFORM) && !defined(PLATFORM_WINDOWS) && \
    defined(PLATFORM_POSIX) && \
    (defined(__clang__) || defined(__GNUC__))
#define TF_GENERATE_BACKTRACE
#endif

@jcpetkovich
Author

Aha! I don't know why I didn't think of that. Thanks! That got me much further!

Now I'm running into this issue:
bazel_master_fpclassify.log

Looking at the file and its history, I think I might need to backtrack to before this commit:
tensorflow/tensorflow@dd3adf9

I checked out the parent of that commit: 0b6177c2fa

And tried again, and got this:
bazel_0b6177c2fa.log

That's where I'm stuck now. I'll give it another shot tonight.

@rongjiecomputer
Owner

For the first error: it's caused by MSVC's standard library, which does not special-case integral types for std::isnan, unlike libstdc++ and libc++.

Change this line: https://github.com/tensorflow/tensorflow/blob/dd3adf935a800e25132b187d65aa62e97504f50c/tensorflow/compiler/xla/service/hlo_evaluator_typed_visitor.h#L914

  template <
      typename NativeT,
      typename std::enable_if<!is_complex_t<NativeT>::value && std::is_integral<NativeT>::value>::type* = nullptr>
  Status HandleClamp(HloInstruction* clamp) {
    std::function<ElementwiseT(ElementwiseT, ElementwiseT, ElementwiseT)>
        clamp_op = [](ElementwiseT low, ElementwiseT value, ElementwiseT high) {
          return static_cast<ElementwiseT>(
              std::min(high, std::max(value, low)));
        };
    TF_ASSIGN_OR_RETURN(
        parent_->evaluated_[clamp],
        ElementwiseTernaryOp(clamp,
                             std::move(ConvertTernaryFunction(clamp_op))));
    return Status::OK();
  }

  template <
      typename NativeT,
      typename std::enable_if<!is_complex_t<NativeT>::value && !std::is_integral<NativeT>::value>::type* = nullptr>
  Status HandleClamp(HloInstruction* clamp) {
    std::function<ElementwiseT(ElementwiseT, ElementwiseT, ElementwiseT)>
        clamp_op = [](ElementwiseT low, ElementwiseT value, ElementwiseT high) {
          if (std::isnan(low) || std::isnan(high)) {
            return static_cast<ElementwiseT>(NAN);
          }
          return static_cast<ElementwiseT>(
              std::fmin(high, std::fmax(value, low)));
        };
    TF_ASSIGN_OR_RETURN(
        parent_->evaluated_[clamp],
        ElementwiseTernaryOp(clamp,
                             std::move(ConvertTernaryFunction(clamp_op))));
    return Status::OK();
  }

For the second error: the compiler-rt that ships with Clang 7 on Windows still lacks the runtime functions for [u]int128 operations. Add --copt=/U__SIZEOF_INT128__ --host_copt=/U__SIZEOF_INT128__ to Bazel so that Eigen does not use [u]int128 on Windows. (This is fixed in Clang 8, which is not yet released.)

For the undefined __cpu_model symbol: it comes from compiler-rt (clang_rt.builtins-x86_64.lib). In your LLVM installation, try to find this lib file under lib/clang/<version>/lib/windows, then add --linkopt=/defaultlib:path to Bazel.

@jcpetkovich
Author

Success! I managed to build tfcompile.exe!

It was on this commit: 0b6177c2fa
clang: 7.0.1
python: 3.6.8
java: OpenJDK 64-Bit Server VM (build 9.0.7.1+1, mixed mode)

Bazel Info:

bazel-bin: C:/users/jcp/_bazel_jcp/5ui3ssh4/execroot/org_tensorflow/bazel-out/x64_windows-opt/bin
bazel-genfiles: C:/users/jcp/_bazel_jcp/5ui3ssh4/execroot/org_tensorflow/bazel-out/x64_windows-opt/genfiles
bazel-testlogs: C:/users/jcp/_bazel_jcp/5ui3ssh4/execroot/org_tensorflow/bazel-out/x64_windows-opt/testlogs
character-encoding: file.encoding = ISO-8859-1, defaultCharset = ISO-8859-1
command_log: C:/users/jcp/_bazel_jcp/5ui3ssh4/command.log
committed-heap-size: 652MB
execution_root: C:/users/jcp/_bazel_jcp/5ui3ssh4/execroot/org_tensorflow
gc-count: 16
gc-time: 1084ms
install_base: C:/users/jcp/_bazel_jcp/install/1aab8d4c0a7c3e1b65a21a1c51f808f0
java-home: C:/Users/jcp/_bazel_jcp/install/1aab8d4c0a7c3e1b65a21a1c51f808f0/_embedded_binaries/embedded_tools/jdk
java-runtime: OpenJDK Runtime Environment (build 9.0.7.1+1) by Azul Systems, Inc.
java-vm: OpenJDK 64-Bit Server VM (build 9.0.7.1+1, mixed mode) by Azul Systems, Inc.
max-heap-size: 1881MB
output_base: C:/users/jcp/_bazel_jcp/5ui3ssh4
output_path: C:/users/jcp/_bazel_jcp/5ui3ssh4/execroot/org_tensorflow/bazel-out
package_path: %workspace%
release: release 0.21.0
repository_cache: C:/users/jcp/_bazel_jcp/cache/repos/v1
server_log: c:\users\jcp\_bazel_jcp\5ui3ssh4\java.log.desktop-kn561q3.jcp.log.java.20190124-110705.20868
server_pid: 20868
used-heap-size: 104MB
workspace: C:/users/jcp/tensorflow

My .bazelrc.user:

build --copt=-DNOGDI
build --host_copt=-DNOGDI
build --copt=-D_USE_MATH_DEFINES
build --host_copt=-D_USE_MATH_DEFINES
build --copt=-fno-delayed-template-parsing
build --host_copt=-fno-delayed-template-parsing
build --copt=-DWIN32
build --host_copt=-DWIN32
build --copt=-DCINTERFACE
build --host_copt=-DCINTERFACE
build --copt=-DWIN32_LEAN_AND_MEAN
build --host_copt=-DWIN32_LEAN_AND_MEAN
build --copt=/U__SIZEOF_INT128__
build --host_copt=/U__SIZEOF_INT128__
build --linkopt=/defaultlib:"clang_rt.builtins-x86_64.lib"
build --linkopt=/libpath:"C:\\Program Files\\LLVM\\lib\\clang\\7.0.1\\lib\\windows"

Patch: changes.txt

And as you said: "Change char final_packet[State::kPacketSize] HH_ALIGNAS(32) = {0}; to HH_ALIGNAS(32) char final_packet[State::kPacketSize] = {0}; (Note the position of HH_ALIGNAS(32). Only clang does not allow the first example, MSVC and GCC happily accept both)."

I think that's the complete recipe!

Should any of this be submitted as a PR to tensorflow? What do you think?

Thanks again for all your help!
