Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

combine .o and .ji by default, add output options to control it #11640

Merged
merged 1 commit into from
Jun 22, 2015

Conversation

JeffBezanson
Copy link
Member

This switches us to generating a .o with system image data included by default. It adds the following options:

  • --output-o Write a .o.
  • --output-ji Write a .ji file
  • --output-bc Write LLVM bitcode.

--build and --dump-bitcode are replaced by these.

@StefanKarpinski
Copy link
Member

Would it make sense to have this be controlled as --build=[obj|ji|bitcode] instead, with --build being short for --build=obj?

@StefanKarpinski
Copy link
Member

Oh, I see that the filename is the argument to --build already. Still, this feels like something that wants an enumeration instead of a bunch of boolean options.

@vtjnash
Copy link
Member

vtjnash commented Jun 9, 2015

before merging, i would like to discuss how this fit in with #8745. i think this is a move in the right direction, but want to avoid renaming these too often.

@tkelman
Copy link
Contributor

tkelman commented Jun 9, 2015

What exactly is the advantage here? Binary installs don't need the .o file, why do we want to insist they carry that along now? Distro packagers and lint tools will likely not approve of packaging object files. And we need to preserve the ability to run without having the shared-library system image present. Changing JL_SYSTEM_IMAGE_PATH to include the shared library extensions won't help in that regard.

@JeffBezanson
Copy link
Member Author

You don't need to install the .o file, the same data ends up in the .so. This also fully preserves the ability to start without a .so. All you need to do is use --output-ji, then pass the file to -J.

@tkelman
Copy link
Contributor

tkelman commented Jun 9, 2015

Windows won't have a .dll until we upgrade LLVM, so it needs to be preserved without additional command-line arguments. If we don't have a shared library then the right place for this data is as a separate file.

@JeffBezanson
Copy link
Member Author

To further clarify, julia itself only generates .o files, and we use an external linker to make a .so. I agree the stuff in Make.inc needs to be more flexible, this is just a rough cut.

@JeffBezanson
Copy link
Member Author

needs to be preserved without additional command-line arguments

Why?

My design principle here is to set things up to do what we'd ideally want, but with enough options to make every platform work. Otherwise you end up with weird defaults and weird options, people wonder why it's done that way, and the answer is that a long time ago some platform was a bit broken. Not very satisfying.

@StefanKarpinski I agree using 3 boolean options seems a bit off, but in fact you often mix and match. --output-bitcode is typically used in addition to generating a .o, and you might want to generate a .ji, or .o, or both.

@tkelman
Copy link
Contributor

tkelman commented Jun 10, 2015

Why?

Double-clicking julia.exe from the binary installer of 0.4.0 should probably work correctly?

@JeffBezanson
Copy link
Member Author

So when the binary installer is built, we pass --output-ji, and set JL_SYSTEM_IMAGE_PATH appropriately? Why can't that be done?

@tkelman
Copy link
Contributor

tkelman commented Jun 10, 2015

That'll work. Why isn't that the default behavior though? I still don't see any advantage to putting these into the same file, why is that "what we'd ideally want"? Is there a real benefit?

@JeffBezanson
Copy link
Member Author

Generating a single object file with everything you need is what every other language does, and is just the right thing to do. Being able to generate a simple single object file goes a long way towards making julia a first-class language, and people who want to ship libraries or executables want this. Needing to locate and load an extra file on startup can actually be a significant annoyance.

@tkelman
Copy link
Contributor

tkelman commented Jun 10, 2015

Every other statically compiled language, and everything you need except the runtime library anyway.

Is this the only way of getting the .ji data into the .so file, or is it already there? I can definitely see a use case of wanting the .ji data both as a separate file and getting included into the .so. Would that require both a combined .o+.ji to link the .so from, then separately building the standalone .ji?

I see how this could make the static compilation use case a bit more streamlined. But since that's generally going to be invoking julia more like a conventional command-line compiler, it seems like putting that behind the non-default set of flags wouldn't be so bad?

It would be best to preserve the present configuration of default build outputs for the way julia and base get built, which should be doable here via makefile changes.

@JeffBezanson
Copy link
Member Author

everything you need except the runtime library anyway

Yes, but the key is to have the linker handle it, instead of needing to load files on your own.

I believe another option is to have the platform linker add the .ji data into the .so as a plain binary segment of some kind.

What is the use case for having a separate .ji file, other than our current windows problems?

I believe a "system image" should just be a .so or .dll, and we could easily forget that .ji ever existed. Especially if we start compiling modules or packages, it will be much more convenient to have one file per unit instead of two.

In fact, another option is to always generate the "combined" .so, but pass an option telling julia to ignore the pre-generated native code if necessary. This is much simpler. A system image is just a shared library file, and the only variables are how it's built and how it's used.

@tkelman
Copy link
Contributor

tkelman commented Jun 10, 2015

One issue is we don't always have reliable access to a linker on Windows (or no-Xcode Macs, I believe?). AFAIK there's no standalone way to get the MSVC linker without needing several gigabytes of visual studio at present, and using mingw gcc as a linker sort of works but has bootstrapping issues and can't really give full native debug info. Some day lld will be usable on all platforms (fingers crossed), but who knows how far off that is. Haskell and Rust have perpetually struggled with this issue on Windows, as has the scientific python stack.

The advantage of the separate .ji is being able to work without a linker. Would there be a way of generating a stub dll with no real content except our .ji data?

edit: lld might actually be making more progress than I thought it was, ref http://article.gmane.org/gmane.comp.compilers.llvm.devel/86607

@JeffBezanson
Copy link
Member Author

Yes, as long as we can look up the address of a symbol in a DLL we can load just the ji data and nothing else from it. But since julia by itself can't generate a DLL, it could make sense to keep the ability to generate a .ji file.

@tkelman
Copy link
Contributor

tkelman commented Jun 10, 2015

What I really want to avoid here, if at all possible, is needing to have a toolchain installed to do even the most basic level of package precompilation. Otherwise we essentially have to recreate (or use) what conda has done, and provide binaries even for pure-Julia packages for load times to ever be bearable.

But this might be a somewhat orthogonal concern, more for #8745 than for here. I'm not sure.

@vtjnash
Copy link
Member

vtjnash commented Jun 10, 2015

we'll have to figure out what to do about the undefined symbols on windows in that case, since "just loading the dll" can be problematic if it also pulls in a different libjulia.dll

@pao
Copy link
Member

pao commented Jun 10, 2015

AFAIK there's no standalone way to get the MSVC linker without needing several gigabytes of visual studio at present

The Windows SDK 7.1 standalone installer ISO is only half a gigabyte by comparison, but that doesn't really undermine your point.

@tkelman
Copy link
Contributor

tkelman commented Jun 10, 2015

And haven't they discontinued the separate SDK package since then?

edit: https://msdn.microsoft.com/en-gb/windows/desktop/bg162891 "The Windows SDK no longer ships with a complete command-line build environment. You must install a compiler and build environment separately."

@pao
Copy link
Member

pao commented Jun 10, 2015

That's for SDK 8+; SDK 7.1 is still available. But I don't know that it matters, I still don't think a half gig, separate installer is an acceptable option for us. Unlike MATLAB (where SDK 7.1 is the latest supported freely available Win64 toolchain) where most people would only need the toolchain for MEX compilation, I agree with you that Julia's precompilation needs to be widely available to users.

@JeffBezanson
Copy link
Member Author

In consultation with @vtjnash I now have a better and simpler approach. We have --output-o, --output-ji, and --output-bc, each of which takes a filename. These replace --build. When we write a .o file it will always contain system image data. This should just be considered static data that goes with the object code. The -J option of course still supports .ji and .so files.

@JeffBezanson
Copy link
Member Author

That would be ok; doesn't matter much to me. I was going for an analogy to gcc's -o.

@StefanKarpinski
Copy link
Member

It's clear what "output" means in the context of gcc: it's purpose is to build binaries from source code. The purpose of julia is not so restricted – there are lots of things that "output" could mean for julia; if you ask it to build something, on the other hand, that's a lot more specific.

@vtjnash vtjnash changed the title RFC: combine .o and .ji by default, add output options to control it combine .o and .ji by default, add output options to control it Jun 11, 2015
@JeffBezanson
Copy link
Member Author

Yes, "output" is a bit generic, but it should be pretty clear what --output-o means, since it says what it's outputting. I also feel like "build" refers to an overall build process, like something make does, while this is just a single step that generates a file or two.

@StefanKarpinski
Copy link
Member

"generate"? "compile"?


# if not absolute, then relative to the directory of the julia executable
JCPPFLAGS += "-DJL_SYSTEM_IMAGE_PATH=\"$(build_private_libdir_rel)/sys.ji\""
JCPPFLAGS += "-DJL_SYSTEM_IMAGE_PATH=\"$(build_private_libdir_rel)/sys.$(SHLIB_EXT)\""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will need to stay .ji on windows for now, or the binaries will stop working, yeah?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks. Actually I wonder if there should be some flexibility in the default system image name, trying both .ji and .dll so it can load whichever exists.

@JeffBezanson
Copy link
Member Author

Yes, that sounds quite possible. We could update the instructions to tell people to build with --output-ji, and/or add a command line option to ignore precompiled native code and use only the system image data inside sys.dylib. I think @vtjnash might have some concerns about this, but it seems reasonable to me.

@staticfloat
Copy link
Member

@JeffBezanson how are we supposed to update make install for this then? Should we copy over sys.ji if it exists, but if not, don't freak out?

@tkelman
Copy link
Contributor

tkelman commented Jun 24, 2015

The embedding docs also need to be updated for this, right?

https://groups.google.com/forum/#!topic/julia-users/EfXIEuoojtw

@waTeim
Copy link
Contributor

waTeim commented Jun 25, 2015

oh oh. I can check the embedding part anyway.

@waTeim
Copy link
Contributor

waTeim commented Jun 25, 2015

yep, it's a problem:

bizarro% julia julia-config.jl  --cflags
ERROR: LoadError: type Void has no field captures
 in cflags at /Users/jeffw/src/julia-embed/julia-config/julia-config.jl:39
 in main at /Users/jeffw/src/julia-embed/julia-config/julia-config.jl:60

24 function initDir()
 25     @unix_only return match(r"(.*)(/julia/sys.ji)",imagePath()).captures[1];
 26     @windows_only return match(r"(.*)(\\julia\\sys.ji)",imagePath()).captures[1];
 27 end
 28 
 29 function ldflags()
 30     replace("""-L$(libDir())""","\\","\\\\");
 31 end
 32 
 33 function ldlibs()
 34     @unix_only return replace("""-Wl,-rpath,$(libDir()) -ljulia""","\\","\\\\");
 35     @windows_only return replace("""-ljulia""","\\","\\\\");
 36 end
 37 
 38 function cflags()
 39     arg1 = replace(initDir(),"\\","\\\\\\\\");

@tkelman
Copy link
Contributor

tkelman commented Jun 25, 2015

@waTeim I suspect that's an unrelated Regex-related problem, likely due to the PCRE2 change, can you file a separate issue for that?

@waTeim
Copy link
Contributor

waTeim commented Jun 25, 2015

I can do that, but even better, I believe I can fix this.

@waTeim
Copy link
Contributor

waTeim commented Jun 25, 2015

Do I have this right? Programs that do embedding that call jl_init_with_image need to determine the right value, and it could be either a .ji or a .so (linux) or a .dll (windows) or a .dylib (osx) depending, and that program need to determine which one beforehand somehow?

@JeffBezanson
Copy link
Member Author

You can pass NULL and we will use a default location. You only need to pass a path if you want to tell julia to use a specific image.

@ScottPJones
Copy link
Contributor

Should the API take the name without the .ji, or always expect a .ji and strip it if not found with that extension, and then add the correct (for the platform) extension?

@waTeim
Copy link
Contributor

waTeim commented Jun 25, 2015

Unfortunately passing NULL doesn't always work especially if there are any symbolic links involved in how it's deployed. Something that simplifies things would be nice, maybe we can talk about it @ 7:00?

@ScottPJones
Copy link
Contributor

and/or add a command line option to ignore precompiled native code and use only the system image data inside sys.dylib

I'm not sure what @vtjnash concerns might have been that you (@JeffBezanson) were worried about, but this would be much nicer for trying to run coverage tests, not having to do a special make with the --output-ji .

@ScottPJones
Copy link
Contributor

I did build with the new --output-ji flag, got a sys.ji file, renamed the sys.dylib file, and got the following:

Scotts-Retina-MBP:j33 scott$ usr/bin/julia --inline-no --code-coverage=all
System image file "/j/j33/usr/bin/../lib/julia/sys.dylib" not found
Scotts-Retina-MBP:j33 scott$ ls -l /j/j33/usr/bin/../lib/julia/sys.*
-rw-r--r--  1 scott  wheel  18368477 Jun 27 17:29 /j/j33/usr/bin/../lib/julia/sys.ji
-rw-r--r--  1 scott  wheel  25857520 Jun 27 17:29 /j/j33/usr/bin/../lib/julia/sys.o

@staticfloat
Copy link
Member

@ScottPJones I think you have to directly specify the sys.ji file with the -J flag on startup, if you want to not load in sys.dylib.

@staticfloat
Copy link
Member

Hmmm..... This has also broken the coverage build process, since we can't just delete sys.so and fallback to running off of the .ji file, because it's not shipped by default anymore. I'd prefer to add that command-line option that ignores precompiled code rather than building a sys.ji and then deleting the sys.so file.

@ScottPJones
Copy link
Contributor

@staticfloat Unfortunately, even with your nice tip, about the -J flag, it didn't work.

Scotts-Retina-MBP:j33 scott$ usr/bin/julia --inline-no --code-coverage=all -J usr/lib/julia/sys.ji
ERROR: LoadError: syntax: invalid character ""
 in include at boot.jl:254
 in include_from_node1 at loading.jl:133
 in process_options at client.jl:305
 in _start at client.jl:405
while loading /j/j33/usr/lib/julia/sys.ji, in expression starting on line 1

@waTeim
Copy link
Contributor

waTeim commented Jun 28, 2015

So this is what I'm going to do. I'll work around my problem on the embedding side by first adding to my middleware some detection code that will attempt {.so,dylib,.dll} first and if that doesn't work .ji. If that works out, then later I can submit a pull request to the embeddingn api to make jl_init(0) work a little better by using readlink as necessary to avoid it's current pitfall.

@tkelman
Copy link
Contributor

tkelman commented Jun 29, 2015

I'd prefer to add that command-line option that ignores precompiled code

I think this is the way we should go. We might want to switch the default on this option on Windows if it can give good backtraces, instead of doing a --output-ji flag dance.

@ScottPJones
Copy link
Contributor

This change and the lack of a command-line option to make it act like it just had the .ji information, really makes it very difficult to work on improving coverage (even on the platforms where it is possible).
Given that a number of us are actively trying to improve the coverage, I would hope that would be a priority.

@tkelman
Copy link
Contributor

tkelman commented Jun 29, 2015

Tweaking this would also likely help mitigate #11818, since this change was the underlying cause of that timeout becoming much more frequent.

@waTeim
Copy link
Contributor

waTeim commented Jul 1, 2015

Following on from my previous comments, an update form the embedding perspective....

I've found so far that the following strategy works on Linux and OS/X

  1. determine the location of libjulia
  2. determine the filename of the system image by extracting it from jl_options.image_file -- this is nice, it's exported in julia.h; convenient.

However, this does not work out so well on Windows for even though the same strategy can be used and does result in a usable value. It appears that it results in a .ji rather than a .dll and use of .ji tends to lead to crashes here's an example.

@tkelman
Copy link
Contributor

tkelman commented Jul 1, 2015

@waTeim how recently did you try? I put sys.dll back in f98decc, but by default it isn't being loaded so backtraces are still okay.

@waTeim
Copy link
Contributor

waTeim commented Jul 1, 2015

@tkelman Yea good point, this is relative to an archive from June 29.

@waTeim
Copy link
Contributor

waTeim commented Jul 2, 2015

Yea, the updated version attempts to use the .dll, however it's still crashy. This could be a red herring because I'm using a fairly old version of MSYS2. However more likely is that since I'm testing on an old machine, it's bringing out timing errors that normally would not be seen. The start time does seem to be longer than previous versions. This may also be the effect of node, but I don't recall it being this unstable previously. I've updated the gist for reference. It might prove useful.

Updated:

Waitaminute. This is consistently crashing while trying to obtain an IP address. Why is that even happening at all?

@tkelman
Copy link
Contributor

tkelman commented Jul 2, 2015

Yes, the start time is slower than previous source builds. How things worked before this PR is source builds on Windows did have a sys.dll so started fast but had backtrace issues, and we deleted sys.dll from the binary installers so they started slowly but had better backtraces. This PR removed the sys.ji backup, and my change from a day or two ago made it so we include sys.dll but disable loading precompiled code from it by default on Windows (until we upgrade LLVM). You can change the --precompiled flag or the corresponding item in the jl_options struct.

nolta added a commit that referenced this pull request Sep 20, 2015
The '-b/--build' option was removed in #11640
a703799
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants