Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Preferences subsystem #1835

Closed
wants to merge 3 commits into from
Closed

Add Preferences subsystem #1835

wants to merge 3 commits into from

Conversation

staticfloat
Copy link
Member

There are a number of packages that are ad-hoc storing content within ~/.julia/prefs. Storing some kind of "preferences" or "options" is a generally useful thing, and as we march closer and closer to a reality where we can set package directories themselves to be completely read-only, we will need a place where users can store user configuration that is specific to their own package.

To that end, this introduces Pkg.Preferences, a simple API to serialize Dict objects to TOML and bring them back. This is a minimal interface that is meant to be used in a very straightforward manner, to be used by projects in doing things such as allowing the user to set which execution backend a project should use, etc...

A logical extension of this would be to allow overlaying of the global depot preferences (which this PR implements) with per-project values. A Project.toml could contain a section that defines a Dict for any UUIDs within its manifest, then we recursively merge that dict into the base depot preferences dict. This would allow for powerful per-project customization of packages within the project.

This is the last piece that I foresee us needing before we can contemplate turning package directories to be completely read-only. Between Artifacts, Scratch spaces and Preferences, I think we can build a very compelling story for controlled state management.

These functions provide a nice interface for determining the
currently-running package's UUID and VersionNumber, identified by loaded
`Module`.  We explicitly do not include a lookup by name, in expectation
of potential future naming clashes due to having multiple packages with
the same name but different UUIDs.
@staticfloat staticfloat changed the title Implement Pkg.Preferences Add Preferences subsystem May 22, 2020
@aminya
Copy link

aminya commented May 22, 2020

I have some questions about this.

  • What is the interface for a package's user to set a preference for a package?
  • How can a package developer use the user input?
  • Does this allow a package to use a different backend based on the user's input? For example, to allow AcuteML to be used by EzXML or WebIO depending on the user's input.

@staticfloat
Copy link
Member Author

This PR doesn't set a user API, it's lower-level than that. These are the primitives that a package would use as a backing store, not a user-facing interactive toolkit.

As an example, a package could include a configure() method, then if there are no preferences set upon first run, they call that function which uses TerminalMenus to walk the user through configuration. Alternatively, a package could always choose defaults and rely upon the user to call a set_xyz(choice) function to change something about its behavior.

Think of this as being just a simple datastore; a way for packages to store small, persistent Dicts. What packages do with that Dict, (And how they populate it with user input) is completely up to them.

This implements functionality and tests for a new `Scratch`
subsystem in `Pkg`; analogous to the `Artifacts` added in 1.3, this
provides an abstraction for a mutable datastore that can be explicitly
lifecycled to an owning package, or shared among multiple packages.

Closes #796
@simonbyrne
Copy link
Contributor

Very nice! Looks ideal for my MPI problems.

Preferences provides a simple package configuration store; packages can
store arbitrary configurations into `Dict` objects that get serialized
into their active `Project.toml`.  Depot-wide preferences can also be
stored within the `prefs` folder of a Julia depot, allowing for default
values to be passed down to new environments from the system admin.
@StefanKarpinski
Copy link
Member

I haven't reviewed this in detail but the biggest design consideration for preferences is layering. You want sysadmins and packages to be able to provide good defaults on shared systems, which calls for some kind of layering mechanism. But that leaves a number of questions:

  • What happens when some system defaults change?
  • What about when a project is moved to another system?
  • Should layering occur across the depot path or along the load path?

I've thought about this for a while and I think that each layer should store a full set of choices along with whether that choice was inherited from a lower layer or set at this layer. That way if a choice at a lower layer changes, the user can be prompted for whether to keep the old choice or use the new default.

This also helps address what should happen when a project moves to another system. You have a full snapshot of preferences which can be compared with the choices on the new system: if a value has a non-default value, you use that because it was already an explicit override; if a value has a default value that matches the inherited value on the new system then you can also safely continue to use that default; only when a value that is marked as default has a different inherited value do you need to decide what to do—this is much the same as if a default changes on the same system. In both cases, it makes sense to prompt the user or allow them to batch choose the project or the system.

However, I'm not sure if this choice/inherited combination makes sense for all layers or only the last layer: i.e. the individual project. That's the level where choices get made and where reproducibility matters.

Layering across depots versus layering across the load path feels like different kinds of preference inheritance. Depot patch inheritance is about inheriting choices that a sysadmin made for the system but allowing individual users to decide they want something else. Load path inheritance is about the same user deciding whether they want to use different preferences for different projects.

@staticfloat
Copy link
Member Author

I struggled back and forth with myself in responding to this, rewriting my comment 3-4 times before I really grasped the central difficulty I was having. We really shouldn't be packing all of this into the Project.toml. I now think that the preferences that are set for this Project specifically should be placed into Project.toml, whereas the merged preferences from the Project and the environment should be placed into Manifest.toml. This way, we have both what the package author is enforcing (Project.toml), as well as the combined view of package author's guidelines together with environmental defaults (Manifest.toml).

So far, I've avoided any kind of requirement for interactivity here. While packages can certainly define some kind of configure() method that uses TerminalMenus and all that fanciness to configure themselves, the underlying mechanism is very much a datastore; you give it a Dict, you get a Dict back out. While we could check to see if what is recorded in the Manifest.toml matches what we get from doing a recursive merge with the layered depots and the Project.toml, I would rather only do this check at certain well-defined points, such as when we resolve or instantiate. I definitely don't want to break out into an interactive context during operations that would otherwise be expected to be batch scriptable.

@tkf
Copy link
Member

tkf commented May 27, 2020

Is it usable at the top-level scope of packages? That is to say, can it be used as "build-time" (precompilation-time) option? If so, how does it handle precompilation cache invalidation?

I actually have implemented a similar system in #1378. The biggest design decision was to make it nicely work with precompilation. Since cache invalidation works only per-file basis (via include_dependency), I was forced to use indivisual file for each package to store configuration. It's not super nice to use a single file (like Project.toml or Manifest.toml) shared across packages because it'd cause re-precompilation of the whole project if you update the option for only one package. However, the biggest obstacle with this approach was the reproducibility of the configuration because now you have many small files.

I think the best way to go might be to support more finer mechanism for calculating precompilation checksum than per-file. As TOML parser might be going into Base JuliaLang/julia#36018, it might be possible to use shared file for the preference system and use particular sub-dictionary for deciding if the precompilation cache is stale.

@staticfloat
Copy link
Member Author

I decided to keep all compile-time stuff outside of this PR and instead put it here, as a separate package: https://github.com/staticfloat/CompileTimePreferences.jl

@tkf
Copy link
Member

tkf commented May 27, 2020

It would be nice if you can explain the reasoning behind the decision. Or, if merging this into Pkg.jl at some point is in your pipeline, I'd appreciate it if you share your plan.

I also note that the compile-time option is the main reason the preference system was requested:
JuliaLang/Juleps#38
#458

@StefanKarpinski
Copy link
Member

I now think that the preferences that are set for this Project specifically should be placed into Project.toml, whereas the merged preferences from the Project and the environment should be placed into Manifest.toml.

I like that separation. It's true to the nature of these two files. I'm a little unclear if that addresses changing inherited values (of so, how?) or if we still need to record whether something was an inherited value or not.

I don't want to force interactivity, but I do think that when inherited defaults change, some choice generally needs to be made. We can have a default way to resolve that in non-interactive contexts, but users will generally want to choose these things. Otherwise you're permanently privileging either always following changes to inherited values or always ignoring them, neither of which seems like the right choice.

@staticfloat
Copy link
Member Author

It would be nice if you can explain the reasoning behind the decision.

Yes, I discussed it a bit on the slack, but I should record it here for permanence. The decision is based more in technical difficulties and ergonomics than in a philosophical dislike of compile-time preferences being a part of this. I'm totally open to merging the two if we can get a nice API meshing, but as it stands, it's difficult. Here's my thought process:

My ideal API is one in which a package can mark some subtree of their preferences as used at compile-time. For instance, I could say something like build_prefs = @load_compile_time_preferences("build"), at the top-level of my package, and this would do three things:

  • It would load the preferences under the "build" key and return them to build_prefs.
  • It would use Base.include_dependency() on some sentinel file that contains a hash of the build preferences, to enforce recompilation when that file changed.
  • It would inform Pkg.Preferences that when preferences are saved, it should hash anything under a subtree that has been marked as build-sensitive and write it out to a sentinel file.

This design requires the system to maintain two separate pieces of information on disk; the Project.toml and the build preferences "sentinel" that forces recompilation. Unfortunately, keeping these two in-sync is difficult: there are many ways that preferences can change:

  • Through inherited preferences by a depot preference changing
  • If project A uses package B and a user sets preferences in project A for package B (which is how it works; the preferences are set within the Base.active_project()), then when we save preferences in project A, we need a way to know all the declared compile-time-preferences within all reachable packages in the entire environment.
  • A user could manually modify their Project.toml!

As I saw it, there were three options, none of them perfect:

  • Keep all options stored only within Project.toml, generate a hash of the subtree of preferences and write it out to a sentinel location to be used with include_dependency(). This has the cache invalidation problems listed above.
  • Keep normal options stored within Project.toml, and store compile-time preferences elsewhere, use a separate API to modify them and use include_dependency() directly upon the file. This is essentially what https://github.com/staticfloat/CompileTimePreferences.jl does.
  • Keep all options stored only within Project.toml and use it in its entirety with include_dependency(). This causes all options to become compile-time options, and will also cause any modification of Project.toml to trigger a recompile.

I think the ideas of injecting dependencies into Project.toml make a lot of sense; but we have to be careful to design this in such a way that it makes sense with the realities of include_dependency() as well.

@staticfloat
Copy link
Member Author

staticfloat commented May 27, 2020

I like that separation. It's true to the nature of these two files. I'm a little unclear if that addresses changing inherited values (of so, how?) or if we still need to record whether something was an inherited value or not.

I think that if something is different between Manifest.toml and Project.toml, that implies it was inherited at last resolve. So all inherited preferences that are not overridden by a Project.toml's explicit settings will be determinable by simply looking at what is listed in Manifest.toml and not in `Project.toml.

We can even tell if something has changed in the global environment by comparing preferences that we loaded straight from the Manifest.toml with preferences that we load by recursively merging depot preferences into eachother, then merging that into Project.toml preferences (e.g. rebuilding it "from scratch"). If there's a difference between those two, then we know something has changed, such as a depot preference changing, or us being relocated. It's kind of like running a resolve, and if nothing has changed, there's no need to recompile anything.

I don't want to force interactivity, but I do think that when inherited defaults change, some choice generally needs to be made.

In this case, I think the mental model of "just a key/value datastore" breaks down a bit. This can quickly spiral out of control, but let's briefly touch on some options to see if we like other interfaces better:

  • Packages are required to define a configure!(new_prefs, old_prefs) method that may or may not go interactive, that will be invoked if we detect that the environment has changed from underneath us. As mentioned previously, we can detect that we have an issue by comparing depot+Project/Manifest differences.

  • Preferences are not dynamic/implicit like they are now, but instead they are all declaratively defined within Project.toml, and Pkg can do things iterate over them, etc. Package authors would create preferences with explicit merge policies and other metadata attached to them, e.g.

# within Foo's Project.toml

# This defines preferences that belong to this package
[Preferences.libfoo_vendor]
description = "Choice of libfoo vendor"
valid_options = ["libfoo_jll", "custom_libfoo_jll"]
merge_policy = "prefer_depot"
type = "string"
default = "libfoo_jll"

[Preferences.frobulate]
description = "Enables frobulation.  Obviously."
merge_policy = "prefer_project"
type = "boolean"
default = true

[Preferences.nuke_launch_threshold]
description = "Consult the documentation for a full description of this option"
merge_policy = "ask_user"
type = "float"
default = 13.7
# Within OverallProject.toml
[deps]
Foo = "$(foo_uuid)"

[Preferences.$(foo_uuid).libfoo_vendor]
value = "custom_libfoo_jll"

[Preferences.nuke_launch_threshold]
value = 1000.0

With such a system, a configure() method could be written generically, then just walk over all preferences, taking the appropriate action.

@tkf
Copy link
Member

tkf commented May 27, 2020

As I saw it, there were three options, none of them perfect:

I've considered those options, too. I agree they are not perfect. That's why I suggested a different option in my first comment.

Keep all options stored only within Project.toml, generate a hash of the subtree of preferences and write it out to a sentinel location to be used with include_dependency(). This has the cache invalidation problems listed above.

Why not integrate a subset of the preference system to the module loader in Base? It can parse Project.toml and then invalidate the sentinel file. This is a slight variation of what I suggested.

@staticfloat
Copy link
Member Author

I can't comment on the feasibility of what we can do to change up what Base does when loading modules. It's true that with more fine-grained control it could be significantly more powerful, but I don't fully understand the tradeoffs there. I don't see where exactly in dump.c these files are actually verified; I see inside write_dependency_list() that things like the path and mtime are written out, but I haven't been able to find where it's used again. In any case, I am worried that needing to parse files at load time could significantly loading down. I'd need to hear from Jameson or Jeff or someone familiar with this code to know if modifying those routines is okay.

@tkf
Copy link
Member

tkf commented May 28, 2020

My understanding is that the stale check is mostly done in the Julia function Base.stale_cachefile. So, I think it's doable if we can use a TOML parser in Base.

@staticfloat
Copy link
Member Author

staticfloat commented Jun 2, 2020

We talked about this on the Pkg call, and we are going to be talking to the compiler team next week about adding a way for precompilation to take sub-trees of the preferences dict and use them in the invalidation process.

@staticfloat
Copy link
Member Author

We spoke with the compiler team, and we got the go-ahead to implement parsing of TOML files (Project.toml, Manifest.toml, etc...) during code loading, baking a hash of affected keys into precompile files. All of this is pending JuliaLang/julia#36018 getting merged.

@simonbyrne
Copy link
Contributor

baking a hash of affected keys into precompile files.

Could that be made more generic (perhaps a module-level __precompilehash__() function)?

The particular use-case is for MPI we typically want whatever MPI has been loaded via module load ... (which modifies the LD_LIBRARY_PATH), and so would want to retrigger precompilation when that changes.

@staticfloat
Copy link
Member Author

Could that be made more generic (perhaps a module-level precompilehash() function)?

Hmmm. I don't think we can trigger recompilation after the module has already been loaded; this would be fixed-functionality, it's not running arbitrary code.

What is module load ...? Is that Julia code?

@simonbyrne
Copy link
Contributor

What is module load ...? Is that Julia code?

No, it's a system like http://modules.sourceforge.net/, which modifies local environment variables (typically PATH, MANPATH, LD_LIBRARY_PATH, etc) to make different software builds available. e.g. on our system there are a bunch of different MPI builds, so before using you would do one of

module load openmpi/4.0.3
module load openmpi/4.0.3_cuda10.0
module load mpich/3.3.2

depending which MPI build you wanted.

If you configure MPI.jl with JULIA_MPI_BINARY=system, it will pick up which ever one is loaded, but the challenge is that we would want to retrigger precompilation whenever that changes.

@staticfloat
Copy link
Member Author

Is it too un-ergonomic to ask that a single top-level Project.toml be configured to expect a certain MPI?

@simonbyrne
Copy link
Contributor

I would have to think about it more, but from a usability perspective, i would say yes. Basically, any preference that depends on some state that is mutable from outside the package manager is likely to have this problem, e.g. if I set my libfoo preference to /usr/lib/libfoo.so, and then my system package manager upgrades libfoo, then I would want some way to invalidate the cache.

The second best option would be a way to have a function that lets you invalidate the cache from within __init__(), and would require a session restart, e.g.

const VERSION = libversion()
function __init__()
    if libversion() != VERSION
        Base.invalidatecache(@__MODULE__)
        error("libfoo version has changed, please restart Julia")
    end
end

@tkf
Copy link
Member

tkf commented Jun 16, 2020

Wouldn't it be better if users just write down the modules and the project they use in their script?

module load openmpi/4.0.3
export JULIA_PROJECT=PATH/TO/Project.toml

It sounds like a good practice for reproducibility.

But checking the library compatibility in __init__ sounds like a good UI (without invalidatecache) in any case.

@simonbyrne
Copy link
Contributor

It sounds like a good practice for reproducibility.

Only if they have access to the same cluster

@tkf
Copy link
Member

tkf commented Jun 16, 2020

Isn't it a good practice to record what you use in each cluster?

@tkf
Copy link
Member

tkf commented Sep 28, 2020

@staticfloat Given that JuliaLang/julia#37595 is merged and we (will) have https://github.com/JuliaPackaging/Preferences.jl, I think we can close it now?

@tkf tkf closed this Sep 28, 2020
@DilumAluthge DilumAluthge deleted the sf/preferences branch January 4, 2021 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants