Sequester mutable state outside of package directories #796
Comments
It would be nice if
Would it work with Julia packages which require external packages at precompile time? Those packages may need to do some precompile-time metaprogramming to, e.g., define structs depending on the C ABI of the external package (PyCall does this). |
That's a very good thought; I'm going to add it on to the top issue. Probably the whole dict of options gets hashed and mixed in with the other elements determining the storage location.
Dependent packages should be loaded by the time you |
My thoughts exactly!
I guess I misunderstood that external libraries are somehow loaded during |
Ah, I see what you mean. You want to be sure that e.g. |
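I found https://github.com/JuliaPackaging/BinaryBuilder.jl/wiki/Roadmap after writing the post below. It looks like PyCall.jl can just do something like `using LibPython.jll: libpython` to get the handle. So I guess the following pattern is supported.

Can I access information about the python.jll package at precompile time of PyCall.jl? PyCall.jl needs to define struct layouts depending on the Python version. For example:

    struct PyDateTime_CAPI
        # type objects:
        DateType::PyPtr
        DateTimeType::PyPtr
        TimeType::PyPtr
        DeltaType::PyPtr
        TZInfoType::PyPtr
        @static if pyversion >= v"3.7"
            TimeZone_UTC::PyPtr
        end
        ... and so on ...
    end

where `pyversion` is obtained by calling the Python C API at precompile time:

    const pyversion = vparse(split(Py_GetVersion(libpy_handle))[1])

PyCall.jl also inspects libpython with `hassym` at precompile time. |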
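The version-dependent layout described in this comment can be tried in isolation. The following is a minimal sketch, not PyCall.jl's actual code: `demo_pyversion` is a hard-coded, hypothetical stand-in for the value PyCall.jl reads from libpython, `DemoCAPI` is a hypothetical struct name, and `Ptr{Cvoid}` is used in place of `PyPtr`.

```julia
# Hypothetical stand-in for the version that PyCall.jl queries from libpython.
const demo_pyversion = v"3.8.0"

struct DemoCAPI
    DateType::Ptr{Cvoid}
    DateTimeType::Ptr{Cvoid}
    # `@static` evaluates the condition when the struct definition is expanded,
    # so the set of fields is fixed at that point and cannot change later.
    @static if demo_pyversion >= v"3.7"
        TimeZone_UTC::Ptr{Cvoid}
    end
end

println(fieldnames(DemoCAPI))
```

Because the condition is evaluated while the definition is being expanded, the resulting layout ends up in the precompiled image. That is presumably why the question of locating the external library at precompile time matters here.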
Yes, that’s right.
…On Thu, Oct 11, 2018 at 09:26 Takafumi Arakaki ***@***.***> wrote:
I found https://github.com/JuliaPackaging/BinaryBuilder.jl/wiki/Roadmap
after writing the post below. It looks like PyCall.jl can just do something
like using LibPython.jll: libpython to get the handle. So I guess the
following pattern is supported.
------------------------------
Can I access information about python.jll package at precompile time of
PyCall.jl? PyCall.jl needs to define struct layout depending on Python
version. For example:
    struct PyDateTime_CAPI
        # type objects:
        DateType::PyPtr
        DateTimeType::PyPtr
        TimeType::PyPtr
        DeltaType::PyPtr
        TZInfoType::PyPtr
        @static if pyversion >= v"3.7"
            TimeZone_UTC::PyPtr
        end
        ... and so on ...
    end
---
https://github.com/JuliaPy/PyCall.jl/blob/fb88f4d0df66fd2ce1bc4dc862611c355be0e50d/src/pydates.jl#L12-L35
where pyversion is obtained by calling Python C API at precompile time:
const pyversion = vparse(split(Py_GetVersion(libpy_handle))[1])
---
https://github.com/JuliaPy/PyCall.jl/blob/fb88f4d0df66fd2ce1bc4dc862611c355be0e50d/src/startup.jl#L85
PyCall.jl also inspects libpython with hassym at precompile time.
|
Note that BinaryBuilder will probably never be a reasonable option for PyCall. You use Python for the ecosystem, not just libpython, and so we need to have access to a full-featured Python distro like Anaconda. But we still need persistent per-package options. e.g. PyCall should be able to remember what |
I'm going to respond with #777 in mind here: On the one hand, this issue explicitly does not want to share state between different versions of packages, because the intended use case is for different Oh if only we had some way of uniquely identifying the content we want to download/store, and we could use that unique identifier to key us into a directory! Oh wait, that's Stefan's content-addressable filesystem idea. So, new API idea that might satisfy everyone here: just pass a hash to
I think it makes a lot of sense to "deduplicate" based on a hash that the user passes to |
@stevengj If you want only Python packages, I think BinaryBuilder could be a reasonable option to install However, this is only for Python packages available from PyPI. For example, you can't install Node.js from PyPI (which is required for installing JupyterLab extension). But this probably can be covered by BinaryBuilder directly?
@staticfloat Yeah, that's what I was thinking when connecting this to #777. Maybe it could be |
Yes, that is clearly superior. :) |
Ah, I forgot another benefit of this; right now we get shared package state when you So it's important to not only allow the user to specify what keys the

Updated API proposal:

    Pkg.package_state_dir(things_to_be_hashed...; include_version::Bool = true, include_ABI::Bool = true)

Where

    function package_state_dir(things_to_be_hashed...; include_version::Bool = true, include_ABI::Bool = true)
        h = UInt64(0)
        for t in things_to_be_hashed
            h = hash(t, h)
        end
        if include_version
            h = hash(Base.VERSION, h)
        end
        if include_ABI
            # We would perhaps want to integrate this logic into Pkg
            h = hash(BinaryProvider.triplet(BinaryProvider.platform_key_abi()), h)
        end
        return joinpath(Pkg.data_dir(), string(h, base=16))
    end
|
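For anyone who wants to experiment with the hashing scheme in this proposal without Pkg or BinaryProvider, here is a self-contained sketch using only Base. It rests on several assumptions: `sketch_state_dir` and its `base` argument are hypothetical names, the base directory is passed in explicitly rather than coming from a `Pkg.data_dir()` function, and a simple `Sys.ARCH`/`Sys.KERNEL` string stands in for the BinaryProvider triplet.

```julia
# Hypothetical sketch of the proposal; not an existing Pkg API.
function sketch_state_dir(base::AbstractString, parts...;
                          include_version::Bool = true,
                          include_platform::Bool = true)
    h = zero(UInt)                      # `hash` expects a `UInt` seed
    for p in parts
        h = hash(p, h)                  # mix each user-supplied key into the hash
    end
    if include_version
        h = hash(VERSION, h)            # separate state per Julia version
    end
    if include_platform
        # Assumption: a crude platform tag instead of a full ABI triplet.
        h = hash(string(Sys.ARCH, "-", Sys.KERNEL), h)
    end
    return joinpath(base, string(h, base = 16))
end

root = mktempdir()
println(sketch_state_dir(root, "PyCall", "python=3.7"))
println(sketch_state_dir(root, "PyCall", "python=3.8"))
```

One caveat with this sketch: `Base.hash` is not documented to be stable across Julia versions, so a real implementation would probably want a content hash with a fixed definition instead.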
@staticfloat Actually, using
because re-installing conda takes more time than downloading some binaries. Also, the current Conda.jl has no mechanism for re-creating the same environment (at the moment). It's probably better to have two kinds of state directories like I don't know if discussing " On the other hand, the above specification may sound like an over-complication (especially considering XDG compliance was rejected before). That's why I suggested #777; Pkg.jl could just be agnostic about what each package does and just provide a scratch space for it. Each package can then just implement its own state/data strategy like But since BinaryProvider should be working closely with Pkg, it may not be optimal here. So maybe just forget about making this a public API and expose it to JuliaPackaging as a semi-public API? |
I don't think this is a good reason to make "clearing state" not clear the Conda installation data. It seems to me that Conda.jl installed packages should be treated exactly the same way as BinaryProvider-downloaded packages; I don't see a clear difference between them. |
Right, that was not appropriate reasoning. What I was trying to point out was that there is information/data more important than external libraries. In the case of Conda.jl, that would be the version numbers and package origins. Although there is no direct easy way, a conda environment can have something like |
Stefan's notes from triage:
|
This implements basic functionality and tests for a new `Caches` subsystem in `Pkg`; analogous to the `Artifacts` added in 1.3, this provides an abstraction for a mutable datastore that can be explicitly lifecycled to an owning package. Closes #796
This implements functionality and tests for a new `Spaces` subsystem in `Pkg`; analogous to the `Artifacts` added in 1.3, this provides an abstraction for a mutable datastore that can be explicitly lifecycled to an owning package, or shared among multiple packages. Closes #796
This implements functionality and tests for a new `Scratch` subsystem in `Pkg`; analogous to the `Artifacts` added in 1.3, this provides an abstraction for a mutable datastore that can be explicitly lifecycled to an owning package, or shared among multiple packages. Closes #796
And with the official announcement of |
I think it would be desirable to have packages use something similar to `Pkg.package_state_dir(@__MODULE__)` or something as the default location where e.g. binaries, datasets, etc. should be stored. This would be constructed as an overall per-environment state directory (overridable by an environment variable or environment config key perhaps), that then has hashed subdirectories similar to `~/.julia/packages` but explicitly including information to disambiguate julia OS, arch, calling ABI, GCC ABI and package options. This has multiple benefits:

- Packages become more "immutable". It would be lovely to be certain that the entire tree hash of a package directory inside of `~/.julia/packages` never changes.
- Packages get automatically pushed toward greater relocatability. As recent experiments with PackageCompiler have shown, broad-spectrum usage of things like `@__FILE__` and `@__DIR__` should be discouraged anyway. A common use case is for creating a scratch space for binaries (e.g. `<pkg dir>/deps/usr`), but others exist (downloading datasets, generating julia code, etc.). Forcing a runtime lookup based on `@__MODULE__` is already what we need to do, so this would dovetail nicely.
- Pkg3 package resolution could technically be arch/OS agnostic. I'm imagining the nightmare scenario where Crazy Charlie has installed three copies of Julia 1.0, one with GCC 8 targeting x86_64, one with GCC 7 targeting x86_64 and one with GCC 6 targeting i686. Technically, we could share the actual Julia package directories, but when run from a Julia with a particular `<arch>-<os>-<calling_abi>-<gcc_abi>` (increasingly inaccurately named) triplet, the result of `Pkg.package_state_dir(@__MODULE__)` could mutate accordingly.
- The storage directory for mutable state could be decoupled from the storage for julia code. Imagine a heterogeneous cluster with various CPUs and a shared package depot that is provided to all users; not only would the `.ji` files generated differ on each machine, but the downloaded binaries could differ as well (taking advantage of different ISAs). This separation between Julia code and build/run-time content would solve both the "must provide binaries that work on every platform globally" problem and the "I don't have permissions to modify packages placed in this depot" problem.
- This could make the "nuclear" state reset option a little easier for users; instead of nuking `~/.julia` entirely (as some still do to try and fix stale state problems) they could instead nuke `~/.julia/package_state` or whatever we default the location to. This would essentially cause a "rebuild" of every package, as if it were freshly installed, without doing more drastic things like losing the set of installed packages.
- This would allow for (in my mind) a cleaner workflow for managing package state than the current `Pkg.build()` system; I would prefer that there is no `Pkg.build()` and instead each package is responsible for checking the existence of files within `__init__()`; this can be done extremely quickly (e.g. `isdir(joinpath(Pkg.package_state_dir(@__MODULE__), "usr"))`) and should remove one more minor pain point in Pkg, the "This package was not properly installed, please run `Pkg.build(<pkg name>)`" error message.
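The `__init__()`-based workflow and the "nuclear" reset from the list above can be sketched together. This is only an illustration under stated assumptions: `DemoPkg`, `STATE_DIR` and `ensure_installed` are hypothetical names, a temporary directory stands in for the proposed `Pkg.package_state_dir(@__MODULE__)`, and writing a marker file stands in for downloading real binaries.

```julia
module DemoPkg

# Stand-in for the proposed Pkg.package_state_dir(@__MODULE__); set at load time.
const STATE_DIR = Ref{String}()

# Returns true when the (pretend) install step had to run, false otherwise.
function ensure_installed()
    usr = joinpath(STATE_DIR[], "usr")
    isdir(usr) && return false                    # fast path: state already present
    mkpath(usr)                                   # recreate the state tree
    write(joinpath(usr, "installed.txt"), "ok")   # placeholder for a real download
    return true
end

function __init__()
    STATE_DIR[] = mktempdir()   # assumption: a real package would ask Pkg for this
    ensure_installed()
end

end # module

# "Nuclear" reset: delete the state directory; the next check rebuilds it.
rm(DemoPkg.STATE_DIR[]; recursive = true)
println(DemoPkg.ensure_installed())
```

The check in `ensure_installed` is a single `isdir` call in the common case, which is the cheap lookup the last bullet has in mind. Nothing inside the package directory itself is ever written to.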