devdocs notes on memory layout and runtime initialisation #9468

Merged
merged 8 commits into from
Feb 21, 2015
250 changes: 250 additions & 0 deletions doc/devdocs/init.rst
@@ -0,0 +1,250 @@
***********************************
Initialisation of the Julia runtime
Member

this file really should be comments on the code itself. there are too many specific implementation details mentioned which may change, and this file will then get out of sync. moving these comments to the function themselves would help avoid that. and the url links themselves will become out-of-sync almost immediately.

Contributor Author

The problem with putting this stuff in source comments is that the reader would have to spend a lot of time fishing around in different files and following calls to find it. This is intended to be a code walkthrough of the execution path for a specific example. It took me the better part of a couple of days reading the source and playing in lldb to figure out what code path was followed in my example. My intention in writing it up this way is that someone else who wants to get a feel for what kind of stuff has to happen to print "Hello World!" can spend 15 mins reading a few pages.

As for there being "too many specific implementation details mentioned which may change" I think that is the nature of the beast. How can you have a runtime code walkthrough that contains anything other than specific implementation details? The point of documenting the current state is to make it easier for new devs to understand what is going on now, so they can effectively design and implement changes.

There is a tradeoff for all complex code bases: If you don't document the internals, then only a few people will understand them and be able to make useful contributions. If you do document the internals, then the documentation becomes part of the codebase and has to be updated when things change. Comments in the code are only one dimension. The other dimensions are just as important: i.e. data layout, architecture/layer diagrams, temporal views (walkthroughs) etc.

I admit I was worried about the line-number URL links too as I pasted them into the doc. However, until github has a syntax for linking to a function definition I think it's better than nothing. My guess is that the line numbers will drift over time, but that at least the links will take the reader close enough to the right place that it's still useful. I considered the alternative of using a link to a fixed version of the file, but I decided that it is better that the reader sees the latest code if they are reading about internals. If they find an annoyingly bad link, they can fix it.

Member

Nicely put, @samoconnor.

Member

i don't mind if you want to make this relatively concise, for example, putting most of this as a comment in _julia_init. however, while this documentation is currently accurate, it has only been correct for about 2 weeks on master (since #9266) and parts of it will change again when #9450 is merged. that makes me very hesitant to point to specific line numbers or implementation details. i think adding this as a comment in the source code would provide a more concise overview of the julia_init function (for example), than trying to look at each of the functions that it calls. for a user looking to understand the code flow, I think a basic understanding of ui/repl.c (or examples/embedding.c) might be a sufficient starting point in a few words:

1) parse arguments
2) initialize libjulia (for options that affect code generation or early initialization)
3) calls Base._start()
4) call atexit hooks

***********************************

How does the Julia runtime execute :code:`julia -e 'println("Hello World!")'` ?

main()
------

Execution starts at `main() in julia/ui/repl.c
<https://github.com/JuliaLang/julia/blob/master/ui/repl.c#L333>`_.

main() calls `libsupport_init()
<https://github.com/JuliaLang/julia/blob/master/src/support/libsupportinit.c#L10>`_
to set the C library locale and to initialise the "ios" library
(see `ios_init_stdstreams()
<https://github.com/JuliaLang/julia/blob/master/src/support/ios.c#L917>`_
and :ref:`dev-ios`).

Next `parse_opts()
<https://github.com/JuliaLang/julia/blob/master/ui/repl.c#L80>`_
is called to process command line options. Note that parse_opts()
only deals with options that affect code generation or early initialisation. Other
options are handled later by `process_options() in base/client.jl
<https://github.com/JuliaLang/julia/blob/master/base/client.jl#L214>`_.

parse_opts() stores command line options in the `global jl_compileropts
struct
<https://github.com/JuliaLang/julia/blob/master/src/julia.h#L1320>`_.


julia_init()
------------


`julia_init() in task.c
<https://github.com/JuliaLang/julia/blob/master/src/task.c#L270>`_ is
called by main() and calls `_julia_init() in init.c
<https://github.com/JuliaLang/julia/blob/master/src/init.c#L875>`_.

_julia_init() begins by calling libsupport_init() again (it does
nothing the second time).
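
A minimal sketch of how an initialiser can be made safe to call twice, as described above. This is not the code in libsupportinit.c; the names :code:`sketch_support_init` and :code:`init_work_done` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>

/* Hypothetical counter, only here so the effect can be observed. */
static int init_work_done = 0;

/* Run-once initialiser: a static flag guards the real work. */
void sketch_support_init(void)
{
    static bool initialised = false;
    if (initialised)
        return;                     /* second and later calls are no-ops */
    setlocale(LC_ALL, "");          /* adopt the locale from the environment */
    init_work_done++;
    initialised = true;
    assert(init_work_done == 1);    /* invariant: the work ran exactly once */
}
```

Calling :code:`sketch_support_init()` from both main() and a later initialisation routine is then harmless, which is the property the walkthrough relies on.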

`restore_signals()
<https://github.com/JuliaLang/julia/blob/master/src/init.c#L402>`_ is
called to zero the signal handler mask.

`jl_resolve_sysimg_location()
<https://github.com/JuliaLang/julia/blob/master/src/init.c#L823>`_ searches
configured paths for the base system image. See :ref:`dev-sysimg`.

`jl_gc_init()
<https://github.com/JuliaLang/julia/blob/master/src/gc.c#L1096>`_
sets up allocation pools and lists for: weak refs, preserved values
and finalization.

`jl_init_frontend()
<https://github.com/JuliaLang/julia/blob/master/src/ast.c#L119>`_
loads and initialises a pre-compiled femtolisp image containing
the scanner/parser.

`jl_init_types()
<https://github.com/JuliaLang/julia/blob/master/src/jltypes.c#L2887>`_
creates jl_datatype_t type description objects for the `built-in
types defined in julia.h
<https://github.com/JuliaLang/julia/blob/master/src/julia.h#L295>`_. e.g.::

    jl_any_type = jl_new_abstracttype(jl_symbol("Any"), NULL, jl_null);
    jl_any_type->super = jl_any_type;

    jl_type_type = jl_new_abstracttype(jl_symbol("Type"), jl_any_type, jl_null);

    jl_int32_type = jl_new_bitstype(jl_symbol("Int32"),
                                    jl_any_type, jl_null, 32);
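
The self-referential :code:`super` pointer of "Any" can be illustrated with a small stand-alone sketch. The struct and functions below (:code:`sketch_type_t`, :code:`sketch_new_abstracttype`, :code:`sketch_is_subtype`) are hypothetical simplifications, not the runtime's jl_datatype_t or its subtyping code; they only show why a root type that is its own supertype gives a chain walk a natural stopping point.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much reduced type description object. */
typedef struct sketch_type {
    const char         *name;
    struct sketch_type *super;   /* the root type points at itself */
} sketch_type_t;

sketch_type_t *sketch_new_abstracttype(const char *name, sketch_type_t *super)
{
    sketch_type_t *t = malloc(sizeof *t);
    assert(t != NULL);
    t->name = name;
    t->super = super;
    return t;
}

/* Walk the supertype chain from a; stop at the self-referential root. */
int sketch_is_subtype(const sketch_type_t *a, const sketch_type_t *b)
{
    for (;;) {
        if (a == b)
            return 1;
        if (a->super == a)
            return 0;            /* reached the root without meeting b */
        a = a->super;
    }
}
```

As in the excerpt above, the root is created with a NULL supertype and then patched to point at itself before any other type is derived from it.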

`jl_init_tasks()
<https://github.com/JuliaLang/julia/blob/master/src/task.c#L870>`_ creates
the jl_datatype_t* jl_task_type object; initialises the global
`jl_root_task struct
<https://github.com/JuliaLang/julia/blob/master/src/julia.h#L1159>`_; and
sets jl_current_task to the root task.

`jl_init_codegen()
<https://github.com/JuliaLang/julia/blob/master/src/codegen.cpp#L4830>`_
initialises the `LLVM library <http://llvm.org>`_.

`jl_init_serializer()
<https://github.com/JuliaLang/julia/blob/master/src/dump.c#L1732>`_
initialises 8-bit serialisation tags for 256 frequently used
jl_value_t values. The serialisation mechanism uses these tags as
shorthand (in lieu of storing whole objects) to save storage space.
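
The tag idea can be sketched as a small table that maps well-known values to one-byte tags and back. This is an assumption-laden illustration: the real code in dump.c may use a different data structure, and every name below (:code:`sketch_register_tag`, :code:`sketch_find_tag`, :code:`sketch_value_for_tag`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NTAGS 256            /* one 8-bit tag per known value */

/* Hypothetical table of frequently used values, indexed by tag. */
static const void *sketch_tag_table[SKETCH_NTAGS];
static int sketch_ntags = 0;

/* Register a value and return the tag assigned to it. */
int sketch_register_tag(const void *v)
{
    assert(sketch_ntags < SKETCH_NTAGS);
    sketch_tag_table[sketch_ntags] = v;
    return sketch_ntags++;
}

/* Serialising: return the tag for v, or -1 if v must be written in full. */
int sketch_find_tag(const void *v)
{
    for (int i = 0; i < sketch_ntags; i++)
        if (sketch_tag_table[i] == v)
            return i;
    return -1;
}

/* De-serialising: map a tag back to the value it stands for. */
const void *sketch_value_for_tag(int tag)
{
    assert(tag >= 0 && tag < sketch_ntags);
    return sketch_tag_table[tag];
}
```

A writer would emit the single tag byte whenever :code:`sketch_find_tag` succeeds and fall back to full serialisation otherwise.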

.. sidebar:: sysimg

   If there is a sysimg file, it contains a pre-cooked image of the "Core" and "Main" modules (and whatever else is created by "boot.jl"). See :ref:`dev-sysimg`.

   `jl_restore_system_image() <https://github.com/JuliaLang/julia/blob/master/src/dump.c#L1379>`_ de-serialises the saved sysimg into the current Julia runtime environment and initialisation continues after jl_init_box_caches() below...

   Note: `jl_restore_system_image() (and dump.c in general) <https://github.com/JuliaLang/julia/blob/master/src/dump.c#L1379>`_ uses the :ref:`dev-ios`.


If there is no sysimg file (:code:`!jl_compileropts.image_file`) then
the "Core" and "Main" modules are created and "boot.jl" is evaluated:

:code:`jl_core_module = jl_new_module(jl_symbol("Core"))` creates
the Julia "Core" module.

`jl_init_intrinsic_functions()
<https://github.com/JuliaLang/julia/blob/master/src/intrinsics.cpp#L1254>`_
creates a new Julia module "Intrinsics" containing constant
jl_intrinsic_type symbols. These define an integer code for
each `intrinsic function
<https://github.com/JuliaLang/julia/blob/master/src/intrinsics.cpp#L2>`_.
`emit_intrinsic()
<https://github.com/JuliaLang/julia/blob/master/src/intrinsics.cpp#L757>`_
translates these symbols into LLVM instructions during code generation.

`jl_init_primitives()
<https://github.com/JuliaLang/julia/blob/master/src/builtins.c#L989>`_
hooks C functions up to Julia function symbols. e.g. the symbol
:func:`Base.eval` is bound to C function pointer :code:`jl_f_top_eval`
by calling :code:`add_builtin_func("eval", jl_f_top_eval)`, which does::

    jl_set_const(jl_core_module,
                 jl_symbol("eval"),
                 jl_new_closure(jl_f_top_eval, jl_symbol("eval"), NULL));
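
The general pattern of binding a name to a C function pointer in a module table can be sketched as follows. This is a toy model under stated assumptions: the binding table, the :code:`sketch_fptr_t` signature and all :code:`sketch_*` names are hypothetical and far simpler than the runtime's modules, symbols and closures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the runtime's C function pointer type (hypothetical). */
typedef int (*sketch_fptr_t)(int);

typedef struct {
    const char   *name;
    sketch_fptr_t fptr;
} sketch_binding_t;

#define SKETCH_MAX_BINDINGS 16
static sketch_binding_t sketch_core[SKETCH_MAX_BINDINGS];  /* toy "Core" module */
static size_t sketch_core_len = 0;

/* Bind name to f in the toy module. */
void sketch_add_builtin_func(const char *name, sketch_fptr_t f)
{
    assert(sketch_core_len < SKETCH_MAX_BINDINGS);
    sketch_core[sketch_core_len].name = name;
    sketch_core[sketch_core_len].fptr = f;
    sketch_core_len++;
}

/* Look a name up; returns NULL when nothing is bound to it. */
sketch_fptr_t sketch_lookup(const char *name)
{
    for (size_t i = 0; i < sketch_core_len; i++)
        if (strcmp(sketch_core[i].name, name) == 0)
            return sketch_core[i].fptr;
    return NULL;
}

/* A toy builtin used to exercise the table. */
int sketch_f_negate(int x) { return -x; }
```

After :code:`sketch_add_builtin_func("negate", sketch_f_negate)`, a later lookup of "negate" yields the C function, which is the same shape of relationship the walkthrough later relies on when Base.eval resolves to jl_f_top_eval.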


`jl_new_main_module()
<https://github.com/JuliaLang/julia/blob/master/src/toplevel.c>`_
creates the global "Main" module and sets
:code:`jl_current_task->current_module = jl_main_module`.

Note: _julia_init() `then sets <https://github.com/JuliaLang/julia/blob/master/src/init.c#L975>`_ :code:`jl_root_task->current_module = jl_core_module`. :code:`jl_root_task` is an alias of :code:`jl_current_task` at this point, so the current_module set by jl_new_main_module() above is overwritten.

`jl_load("boot.jl") <https://github.com/JuliaLang/julia/blob/master/src/toplevel.c#L568>`_ calls `jl_parse_eval_all("boot.jl") <https://github.com/JuliaLang/julia/blob/master/src/toplevel.c#L525>`_ which repeatedly calls `jl_parse_next() <https://github.com/JuliaLang/julia/blob/master/src/ast.c#L523>`_ and `jl_toplevel_eval_flex() <https://github.com/JuliaLang/julia/blob/master/src/toplevel.c#L376>`_ to parse and execute `boot.jl <https://github.com/JuliaLang/julia/blob/master/base/boot.jl#L116>`_. TODO -- drill down into eval?

`jl_get_builtin_hooks() <https://github.com/JuliaLang/julia/blob/master/src/init.c#L1209>`_ initialises global C pointers to Julia globals defined in boot.jl.


`jl_init_box_caches() <https://github.com/JuliaLang/julia/blob/master/src/alloc.c#L850>`_ pre-allocates global boxed integer value objects for values up to 1024. This speeds up allocation of boxed ints later on. e.g.::

    jl_value_t *jl_box_uint8(uint32_t x)
    {
        return boxed_uint8_cache[(uint8_t)x];
    }
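
A self-contained sketch of the same caching idea, assuming a trivial box type. :code:`sketch_box_t` and the functions below are hypothetical; the runtime's boxes carry a type pointer and are allocated by the gc rather than by malloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical box: the real one also has a type header. */
typedef struct { uint8_t value; } sketch_box_t;

static sketch_box_t *sketch_uint8_cache[256];

/* Allocate one box per possible uint8 value, once, at start-up. */
void sketch_init_box_cache(void)
{
    for (int i = 0; i < 256; i++) {
        sketch_box_t *b = malloc(sizeof *b);
        assert(b != NULL);
        b->value = (uint8_t)i;
        sketch_uint8_cache[i] = b;
    }
}

/* Boxing is now a table lookup; no allocation happens here. */
sketch_box_t *sketch_box_uint8(uint8_t x)
{
    return sketch_uint8_cache[x];
}
```

Because every call with the same argument returns the same pointer, boxing a small integer costs one array index instead of an allocation.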

`_julia_init() iterates <https://github.com/JuliaLang/julia/blob/master/src/init.c#L997>`_ over the :code:`jl_core_module->bindings.table` looking for :code:`jl_datatype_t` values and sets the type name's module prefix to :code:`jl_core_module`.

`jl_add_standard_imports(jl_main_module) <https://github.com/JuliaLang/julia/blob/master/src/toplevel.c#L34>`_ does "using Base" in the "Main" module.

Note: _julia_init() `now reverts <https://github.com/JuliaLang/julia/blob/master/src/init.c#L1017>`_ to :code:`jl_root_task->current_module = jl_main_module` as it was before being `set to jl_core_module <https://github.com/JuliaLang/julia/blob/master/src/init.c#L975>`_ above.

Platform specific signal handlers are initialised for SIGSEGV (OSX, Linux), and SIGFPE (Windows).

Other signals (SIGINFO, SIGBUS, SIGILL, SIGTERM, SIGABRT, SIGQUIT, SIGSYS and SIGPIPE) are hooked up to `sigdie_handler() <https://github.com/JuliaLang/julia/blob/master/src/init.c#L174>`_ which prints a backtrace.

`jl_init_restored_modules() <https://github.com/JuliaLang/julia/blob/master/src/dump.c#L1458>`_ calls `jl_module_run_initializer() <https://github.com/JuliaLang/julia/blob/master/src/module.c#L429>`_ for each de-serialised module to run the "__init__" function.

Finally `sigint_handler() <https://github.com/JuliaLang/julia/blob/master/src/init.c#L409>`_ is hooked up to SIGINT and calls :code:`jl_throw(jl_interrupt_exception)`.
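
For readers unfamiliar with C signal handling, here is a minimal sketch of hooking a handler up to SIGINT using only the standard :code:`signal()` call. It is not the code in init.c: instead of throwing a Julia exception, this hypothetical handler just records that the interrupt arrived.

```c
#include <assert.h>
#include <signal.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t sketch_interrupted = 0;

static void sketch_sigint_handler(int sig)
{
    (void)sig;
    sketch_interrupted = 1;
}

/* Install the handler; abort if the C library refuses. */
void sketch_install_sigint(void)
{
    void (*old)(int) = signal(SIGINT, sketch_sigint_handler);
    assert(old != SIG_ERR);
}
```

Once installed, pressing Ctrl-C (or calling :code:`raise(SIGINT)`) runs the handler instead of terminating the process.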

_julia_init() then returns `back to main() in julia/ui/repl.c
<https://github.com/JuliaLang/julia/blob/master/ui/repl.c#L355>`_ and main() calls :code:`true_main(argc, (char**)argv)`.

true_main()
-----------

`true_main() <https://github.com/JuliaLang/julia/blob/master/ui/repl.c#L275>`_ loads the contents of :code:`argv[]` into :data:`Base.ARGS`.

If a .jl "program" file was supplied on the command line, then `exec_program() <https://github.com/JuliaLang/julia/blob/master/ui/repl.c#L219>`_ calls `jl_load(program) <https://github.com/JuliaLang/julia/blob/master/src/toplevel.c#L568>`_ which calls `jl_parse_eval_all() <https://github.com/JuliaLang/julia/blob/master/src/toplevel.c#L525>`_ which repeatedly calls `jl_parse_next() <https://github.com/JuliaLang/julia/blob/master/src/ast.c#L523>`_ and `jl_toplevel_eval_flex() <https://github.com/JuliaLang/julia/blob/master/src/toplevel.c#L376>`_ to parse and execute the program.

However, in our example (:code:`julia -e 'println("Hello World!")'`), `jl_get_global(jl_base_module, jl_symbol("_start")) <https://github.com/JuliaLang/julia/blob/master/src/module.c#L320>`_ looks up `Base._start <https://github.com/JuliaLang/julia/blob/master/base/client.jl#L388>`_ and `jl_apply() <https://github.com/JuliaLang/julia/blob/master/src/julia.h#L987>`_ executes it.


Base._start
-----------

`Base._start <https://github.com/JuliaLang/julia/blob/master/base/client.jl#L388>`_ calls `Base.process_options <https://github.com/JuliaLang/julia/blob/master/base/client.jl#L214>`_ which calls `jl_parse_input_line("println(\"Hello World!\")") <https://github.com/JuliaLang/julia/blob/master/src/ast.c#L468>`_ to create an expression object and :func:`Base.eval` to execute it.


Base.eval
---------

Base.eval was `mapped to jl_f_top_eval <https://github.com/JuliaLang/julia/blob/master/src/builtins.c#L1005>`_ by jl_init_primitives().

`jl_f_top_eval() <https://github.com/JuliaLang/julia/blob/master/src/builtins.c#L444>`_ calls `jl_toplevel_eval_in(jl_main_module, ex) <https://github.com/JuliaLang/julia/blob/master/src/builtins.c#L444>`_, where "ex" is the parsed expression :code:`println("Hello World!")`.

`jl_toplevel_eval_in() <https://github.com/JuliaLang/julia/blob/master/src/builtins.c#L417>`_ calls `jl_toplevel_eval_flex() <https://github.com/JuliaLang/julia/blob/master/src/toplevel.c#L376>`_ which calls `eval() in interpreter.c <https://github.com/JuliaLang/julia/blob/master/src/interpreter.c#L112>`_.

The stack dump below shows how the interpreter works its way through various methods of :func:`Base.println` and :func:`Base.print` before arriving at `write{T}(s::AsyncStream, a::Array{T}) <https://github.com/JuliaLang/julia/blob/master/base/stream.jl#L782>`_ which does :code:`ccall(jl_write_no_copy())`.

`jl_write_no_copy() <https://github.com/JuliaLang/julia/blob/master/src/jl_uv.c#L580>`_
calls uv_write() to write "Hello World!" to JL_STDOUT (see :ref:`dev-libuv`)::

    Hello World!


============================ ================= ======================================================
Stack frame                  Source code       Notes
============================ ================= ======================================================
jl_write_no_copy()           jl_uv.c:552       called through :func:`Base.ccall`
julia_write_282942           stream.jl:734     function write{T}(s::AsyncStream, a::Array{T})
julia_print_284639           ascii.jl:93       print(io::IO, s::ASCIIString) = (write(io, s);nothing)
jlcall_print_284639
jl_apply()                   julia.h:989
jl_trampoline()              builtins.c:835
jl_apply()                   julia.h:989
jl_apply_generic()           gf.c:1624         Base.print(Base.TTY, ASCIIString)
jl_apply()                   julia.h:989
jl_trampoline()              builtins.c:835
jl_apply()                   julia.h:989
jl_apply_generic()           gf.c:1643         Base.print(Base.TTY, ASCIIString, Char, Char...)
jl_apply()                   julia.h:989
jl_f_apply()                 builtins.c:374
jl_apply()                   julia.h:989
jl_trampoline()              builtins.c:835
jl_apply()                   julia.h:989
jl_apply_generic()           gf.c:1643         Base.println(Base.TTY, ASCIIString, ASCIIString...)
jl_apply()                   julia.h:989
jl_trampoline()              builtins.c:835
jl_apply()                   julia.h:989
jl_apply_generic()           gf.c:1643         Base.println(ASCIIString,)
jl_apply()                   julia.h:989
do_call()                    interpreter.c:70
eval()                       interpreter.c:210
jl_interpret_toplevel_expr() interpreter.c:25
jl_toplevel_eval_flex()      toplevel.c:498
jl_toplevel_eval()           toplevel.c:521
jl_toplevel_eval_in()        builtins.c:440
jl_f_top_eval()              builtins.c:469
============================ ================= ======================================================

Since our example has just one function call, which has done its
job of printing "Hello World!", the stack now rapidly unwinds back to main().


jl_atexit_hook()
----------------

main() calls `jl_atexit_hook()
<https://github.com/JuliaLang/julia/blob/master/src/init.c#L448>`_. This
calls _atexit for each module, then calls `jl_gc_run_all_finalizers()
<https://github.com/JuliaLang/julia/blob/master/src/gc.c#L325>`_
and cleans up libuv handles.
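
The "run every registered hook at exit" step can be sketched with a small hook list. This is only an illustration: the registration API, the registration-order execution and the :code:`sketch_*` names are assumptions, and the finalizer and libuv clean-up phases are not modelled.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*sketch_hook_t)(void);

#define SKETCH_MAX_HOOKS 8
static sketch_hook_t sketch_hooks[SKETCH_MAX_HOOKS];
static size_t sketch_nhooks = 0;

/* Remember a hook to be run at exit. */
void sketch_register_exit_hook(sketch_hook_t h)
{
    assert(sketch_nhooks < SKETCH_MAX_HOOKS);
    sketch_hooks[sketch_nhooks++] = h;
}

/* Run every hook once, then clear the list so a second call does nothing. */
void sketch_run_exit_hooks(void)
{
    for (size_t i = 0; i < sketch_nhooks; i++)
        sketch_hooks[i]();
    sketch_nhooks = 0;
}

/* Toy hook that counts how often it was called. */
static int sketch_hook_calls = 0;
void sketch_count_hook(void) { sketch_hook_calls++; }
```

Clearing the list after the first run keeps the exit path safe if it is entered more than once.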


julia_save()
------------

Finally main() calls `julia_save() <https://github.com/JuliaLang/julia/blob/master/src/init.c#L1155>`_, which, if requested on the command line, saves the runtime state to a new system image. See `jl_compile_all() <https://github.com/JuliaLang/julia/blob/master/src/gf.c#L1525>`_ and `jl_save_system_image() <https://github.com/JuliaLang/julia/blob/master/src/dump.c#L1300>`_.
2 changes: 2 additions & 0 deletions doc/devdocs/julia.rst
@@ -9,6 +9,8 @@
.. toctree::
   :maxdepth: 1

   init
   object
   cartesian
   meta
   subarrays
77 changes: 77 additions & 0 deletions doc/devdocs/object.rst
@@ -0,0 +1,77 @@
******************************
Member

@vtjnash reminder to self: correct this section in #7906

Memory layout of Julia Objects
******************************

Object layout (jl_value_t)
--------------------------

.. sidebar:: `special case. <https://github.com/JuliaLang/julia/blob/master/src/jltypes.c#L2897>`_

   :code:`jl_tuple_type->type = jl_tuple_type`

The :code:`jl_value_t` struct defines the minimal header for a Julia
object in memory.
The :code:`type` field points to a
`jl_datatype_t <http://github.com/JuliaLang/julia/blob/master/src/julia.h#L204>`_ object::
Member

The type field for tuples is a tuple.


    typedef struct _jl_value_t {
        struct _jl_value_t *type;
    } jl_value_t;
Member

this should not be depended upon (#2818). just been waiting for the gc-improvement PR to merge to change it.

jl_typeof(x) is the approved method of accessing and assigning to the type field.

Contributor Author

noted in next commit




The layout of the rest of the object is dependent on its type.

e.g. a :func:`Base.tuple` object has an array of pointers to the
objects contained by the tuple::

    typedef struct {
        struct _jl_value_t *type;
        size_t length;
        jl_value_t *data[];
    } jl_tuple_t;
Member

this implementation detail isn't always true (isbits tuples in memory are typically unboxed), and is likely to change soon (after Jeff finishes #8839 and has time to complete his planned tuple-type refactoring)

Contributor Author

noted and updated

Member

an unboxed tuple also doesn't have a length or type field. implementing other tuples as arrays of boxed values is inefficient, so Jeff is intending to change that also.

(edit: to be fair, an isbits tuple may not have the array of raw values either. it exists only as a compiler construct – although if you force it into memory, the compiler will represent it as an anonymous type / struct, so it is safe to assume this alignment and layout)

Contributor Author

If I understand your edit correctly, and these "isbits tuples" are just implementation detail of a compiler optimisation, then I think I should remove the "or an array of raw values for un-boxed bits type tuples" text again. I think that this kind of memory layout documentation can only usefully describe the way objects are when they are understandable by the runtime/interpreter. Within a compiled function, or within a closed graph of (maybe inlined) compiled functions, there may be all kinds of strange representations of data that result from various optimisation phases (including llvm internal optimisations that we're not aware of). There is no point trying to document these here.

Member

are just implementation detail of a compiler optimisation

much of what you are attempting to document here is an implementation detail of a compiler optimization (or current lack thereof), except that they are compatible with C as documented in http://docs.julialang.org/en/latest/manual/calling-c-and-fortran-code/#type-correspondences

Contributor Author

I'm afraid I can't agree with that. The content of object.rst is based on the public runtime interface in julia.h and on the runtime implementation in gc.c (mark bit and allocation). The compiler does not come into it.

Member

julia.h has not been scrubbed to ensure that it contains only the public runtime interface (#8690). additionally, there are still implementation details present in this file which should not be depended upon. the layout of a struct jl_tuple_t is one such item.

the compiler decides whether the gc needs to be a runtime operation, or whether it can lift some of the allocation/free workload. i agree that it is partly separated from the compiler, but that does not mean it is part of the public runtime interface. also, other than JL_PUSH/JL_POP&friends (which are part of the public interface), the gc is not part of julia.h. nor is newobj (which is described below). Structs for the built-in types should probably be in julia_internal.h, but are currently still in julia.h for legacy reasons.

jl_typeof is the only (macro) function mentioned here which can reliably be considered a public interface.


e.g. a "boxed" uint16_t (created by :func:`jl_box_uint16`) is stored as
follows::
Member

on a x86_64 processor

Contributor Author

noted


    struct {
        struct _jl_value_t *type; -- 8 bytes
        uint16_t data;            -- 2 bytes
                                  -- 6 bytes padding
Member

this padding is not present in the struct layout, since the sizeof this struct is exactly 10 bytes. the alignof this struct is 8 bytes (although this can be OS dependent), however, so when embedded in an array or another struct, that array or other struct may insert padding here.

Contributor Author

My reading of the code is that:

  • there is not actually a jl_int16_t struct defined anywhere, so sizeof() this "struct" is not relevant (I used struct notation just as a convenient way to describe the layout of what ends up in memory as best as I can figure out).
  • BOXN_FUNC(16, 2) in alloc.c means that 16-bit ints are allocated by alloc_2w (allocate two words).
  • Note also: BOXN_FUNC(8, 2), BOXN_FUNC(16, 2), BOXN_FUNC(32, 2), BOXN_FUNC(64, 2)

As far as I can tell that means that in Julia, all boxed ints (8, 16, 32 and 64) consume 16 bytes. Am I missing a subtlety here?

(Note that I'm assuming a 64 bit platform)

Member

even though this struct does not exist in a C header file, it is still well-defined. every julia type lives a double-life as a boxed and unboxed version, although #2818 should simplify this greatly (by removing the struct jl_value_t *type field from the struct definition). regardless, the padding is not part of the type, but part of the allocator. in #8134, we could allocate a tighter box for this type.

(the alignof and sizeof this struct is operating system, not word size, dependent)

    };

Structs for the built-in types are `defined in julia.h <http://github.com/JuliaLang/julia/blob/master/src/julia.h#L69>`_. The corresponding global jl_datatype_t objects are created by `jl_init_types() <http://github.com/JuliaLang/julia/blob/master/src/jltypes.c#L2887>`_.


Garbage collector mark bit
--------------------------

The garbage collector uses the low bit of the :code:`jl_value_t.type`
pointer as a flag to mark reachable objects (see :code:`gcval_t`).
During each mark/sweep cycle, the gc sets the mark bit of each
reachable object, deallocates objects that are not marked, then
clears the mark bits. While the mark/sweep is in progress the
:code:`jl_value_t.type` pointer is altered by the mark bit. The gc
uses the :func:`gc_typeof` macro to retrieve the original type
pointer::

    #define gc_typeof(v) ((jl_value_t*)(((uptrint_t)jl_typeof(v))&~1UL))
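
The low-bit tagging trick can be tried out in isolation with the sketch below. It assumes, as the text above does, that type pointers are aligned so that bit 0 is always zero; :code:`sketch_value_t` and the :code:`sketch_*` functions are hypothetical stand-ins, not the gc's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical object header: one pointer whose low bit doubles as the mark. */
typedef struct sketch_value {
    struct sketch_value *type;
} sketch_value_t;

void sketch_set_mark(sketch_value_t *v)
{
    /* The trick relies on alignment leaving bit 0 of the pointer free. */
    assert(_Alignof(sketch_value_t) >= 2);
    v->type = (sketch_value_t *)((uintptr_t)v->type | (uintptr_t)1);
}

void sketch_clear_mark(sketch_value_t *v)
{
    v->type = (sketch_value_t *)((uintptr_t)v->type & ~(uintptr_t)1);
}

int sketch_is_marked(const sketch_value_t *v)
{
    return (int)((uintptr_t)v->type & (uintptr_t)1);
}

/* Like gc_typeof: recover the real type pointer whether or not v is marked. */
sketch_value_t *sketch_gc_typeof(const sketch_value_t *v)
{
    return (sketch_value_t *)((uintptr_t)v->type & ~(uintptr_t)1);
}
```

While an object is marked, reading its :code:`type` field directly would give a pointer that is off by one, which is why a masking accessor is needed during the mark/sweep cycle.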


Object allocation
-----------------

Storage for new objects is allocated by :func:`newobj` in julia_internal.h::

    STATIC_INLINE jl_value_t *newobj(jl_value_t *type, size_t nfields)
    {
        jl_value_t *jv = (jl_value_t*)allocobj((1+nfields) * sizeof(void*));
        jv->type = type;
        return jv;
    }
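
The size computation in :code:`newobj` can be checked with a stand-alone sketch. It substitutes malloc() for allocobj() and uses a hypothetical :code:`sketch_value_t` header, so it says nothing about pooling or rounding done by the real allocator; it only shows the "one header word plus one word per field" arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object header, mirroring the one-pointer jl_value_t above. */
typedef struct sketch_value {
    struct sketch_value *type;
} sketch_value_t;

/* Bytes requested for an object with nfields pointer-sized fields. */
size_t sketch_object_size(size_t nfields)
{
    return (1 + nfields) * sizeof(void *);
}

/* Allocate the storage and fill in the type header; fields stay uninitialised. */
sketch_value_t *sketch_newobj(sketch_value_t *type, size_t nfields)
{
    sketch_value_t *v = malloc(sketch_object_size(nfields));
    assert(v != NULL);
    v->type = type;
    return v;
}
```

With :code:`nfields == 0` the request is a single pointer-sized word, i.e. just the type header.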

Note that all objects are allocated in multiples of 8 bytes, so the
smallest object size is 16 bytes (8 byte type pointer + 8 bytes
Member

the smallest object should be 8 bytes (8 byte type pointer + 0 bytes data)

Contributor Author

:)
I guess that's strictly speaking true, but to my mind, an "object" that contains only type information and no data is not really an object, it's just a "meta-object".

Member

it's been given the name singleton and is optimized and used fairly heavily (nothing is a builtin type that uses this). thus it's a pretty important type for understanding codegen.cpp and cgutils.cpp.

Contributor Author

Good to know. Thx.
noted in latest commit.

data). :func:`allocobj` in gc.c allocates memory for new objects.
Memory is allocated from a pool for objects up to 2048 bytes, or
by malloc() otherwise.::
2 changes: 2 additions & 0 deletions doc/devdocs/sysimg.rst
@@ -2,6 +2,8 @@
System Image Building
*********************

.. _dev-sysimg:

Building the Julia system image
-------------------------------
