Dependency management #208

Closed
candid82 opened this issue Apr 17, 2019 · 5 comments · Fixed by #210

Comments

@candid82
Owner

See #205 for some context.

Joker currently supports an undocumented --classpath command-line parameter (and a corresponding JOKER_CLASSPATH environment variable). It works the same way as Java's classpath and provides a way to use shared "libraries". There are a number of issues with the classpath approach, though:

  • It doesn't provide a delivery mechanism. Joker will try to load libraries from the specified location(s), but how those libraries get there is up to the user.
  • It doesn't provide explicit version management. If many programs use the same shared library, it becomes tricky to update it without breaking anything.
  • It's external to the programs' source code. One cannot tell which external dependencies will be loaded at runtime just by looking at the code.

Ideally, Joker's dependency management system should be

  • simple, both conceptually and implementation-wise
  • explicit: it should be easy to tell which dependencies come from where just by looking at the source code
  • built-in: there should not be a separate tool or command to run to pull dependencies

I tentatively proposed a :from <url> option for (require), but @charlesg3 brought up a valid concern about this approach:

> For production systems, you need releases instead of snapshots, as such it's nice to be able to specify a tag or sha (as in deps.edn with clojure). Unfortunately this will start to look a little messy on every include, especially if it needs to be specified in every file

To address the above concern, I think Joker should allow declaring all dependencies in one place, similarly to deps.edn or lein's project.clj, except that the declaration should be part of the source code.

Here is how it could work.

There is a built-in var, *ns-sources*. It's a map where keys are namespace names (possibly with wildcards) and values are "sources", which are instructions on how and where to load those namespaces from. For example:

{"com.test1.common.*" {:url "/var/lib/joker/test1/common"} ; local file system
 "com.test2.common.*" {:url "test2/common#v2.1.0"} ; github
 "com.test3.common.*" {:url "git+ssh://git@github.com:test3/common.git#v1.0.0"} ; git
}

The URL types are borrowed from Node.js's npm.
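
To make the three URL shapes concrete, here is a minimal sketch of how a :url value might be classified. It is written in Python rather than Joker, and the `Source` type, the `parse_source_url` name, and the "local" / "github" / "git" labels are all invented for illustration; nothing here is an existing Joker API.

```python
from typing import NamedTuple, Optional


class Source(NamedTuple):
    kind: str            # "local", "github" or "git" (labels invented for this sketch)
    location: str        # the URL without any "#ref" suffix
    ref: Optional[str]   # tag, branch or sha after "#", if present


def parse_source_url(url: str) -> Source:
    """Classify a :url value, assuming only the three shapes shown above."""
    # Assumption: anything that looks like a file system path is local
    # and is used as-is, without looking for a "#ref" suffix.
    if url.startswith(("/", "./", "../")):
        return Source("local", url, None)
    # Everything else may carry a "#tag" or "#sha" suffix, as in npm.
    location, _, ref = url.partition("#")
    kind = "git" if location.startswith("git+") else "github"
    return Source(kind, location, ref or None)
```

With this sketch, "test2/common#v2.1.0" is treated as a GitHub shorthand with ref "v2.1.0", and the git+ssh URL keeps its full location with ref "v1.0.0".
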
When Joker needs to load a namespace (i.e. while executing (require ...)), it first consults the *ns-sources* map. If the namespace's name matches one of the keys, Joker will try to load the namespace from the corresponding source. Except for the simplest case of local file system URLs, the loaded namespaces will have to be cached somewhere.

The list of supported URLs could be extended in the future, or perhaps the loading mechanism could be made pluggable. For example, users could provide their own loading function in addition to the URL.

In addition to the *ns-sources* var, there should be an (ns-sources <map>) function to conveniently set the var. That would normally be the first statement in a Joker script. For example:

(ns-sources
  {"com.test.common.*" {:url "test/common#v2.0.0"}})

(ns com.test.do-something
  (:require [com.test.common.foo :as foo]))

(println (foo/bar))
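
The lookup step described above (consult the map, match the namespace name against the keys) could be sketched as follows. This is Python, not Joker, and it makes two assumptions the proposal does not spell out: wildcards follow shell-style glob rules, and the first matching key in insertion order wins. The `find_source` name is hypothetical.

```python
from fnmatch import fnmatchcase


def find_source(ns_sources, ns_name):
    """Return the source for the first key matching ns_name, or None.

    Assumptions of this sketch: keys are glob patterns, and the map is
    scanned in insertion order (Python dicts preserve it).
    """
    for pattern, source in ns_sources.items():
        if fnmatchcase(ns_name, pattern):
            return source
    return None  # no match: fall back to the normal loading rules


sources = {"com.test.common.*": {"url": "test/common#v2.0.0"}}
print(find_source(sources, "com.test.common.foo"))
print(find_source(sources, "com.test.do-something"))
```

In the script above, com.test.common.foo would be resolved through the declared source, while com.test.do-something matches no key and would be loaded the usual way.
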
@charlesg3
Contributor

@candid82 -- Thanks for moving this to a ticket and the initial proposal.

I think this proposal can be made to work. I like the idea of keeping it simple, explicit and possibly pluggable (which would mean that extending it possibly doesn't require extending the base language).

One quick question -- the ns-sources function is called outside of any given namespace and thus, as a built-in, it seems to be a global var. Interestingly, this could open up some different usage patterns by design. The one I'm thinking of in particular is for more complicated "groups of scripts" (we generally have many for any given project).

It would be cool if this use case was also supported:

In a given script:

my-script.joke:

(ns com.test.my-script
  (:require [com.test.deps]
            [com.test.common.foo :as foo]))

(println (foo/bar))

deps.joke:

(ns com.test.deps)
(ns-sources 
  {"com.test1.common.*" {:url "/var/lib/joker/test1/common"}})

With the above pattern, the deps / sources for a given project can be managed in a single location. It seems like the design you propose supports this.

@candid82
Owner Author

@charlesg3, yes, deps.joke could be a special file that gets executed before loading any namespace in a given "project" or "library". In fact, I came up with exactly the same idea of deps.joke but in a slightly different context. I think Joker will have to support something like deps.joke for this approach to work with multi-file/multi-namespace libraries. If a library consists of multiple files / namespaces (which is a common case), there is no specific place to put (ns-sources ...) in. That is, unless deps.joke is supported. For scripts / programs this is not an issue as there is, by definition, a single entry point (a file) where (ns-sources ...) can be put, although deps.joke could be used for a collection of related scripts, as you described.

I also thought about transitive dependencies a bit more and that led to some clarifications on the (ns-sources ...) semantics. I think the map passed to ns-sources should be merged into *ns-sources* without overwriting existing keys. The rationale is this: say script A depends on v1.0.0 of library B and v2.0.0 of library C. Library B depends on v1.0.0 of library C. At runtime we want to ensure that v2.0.0 of library C gets loaded, not v1.0.0. That is, the script's (ns-sources ...) takes precedence over its dependencies' (ns-sources ...). All the usual problems with dependency conflicts still exist (there is no way around it, as only one version of a namespace can be loaded at runtime), but at least one can always "pin" a specific version of any library in the script's top-level (ns-sources ...).
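
The "merge without overwriting existing keys" rule can be sketched like this. It is a Python illustration, not Joker code; `merge_ns_sources` and the "lib.b.*" / "lib.c.*" keys and "example/..." URLs are made up to mirror the A, B, C scenario. Note that it only protects a pin when the dependency uses exactly the same key.

```python
def merge_ns_sources(current, declared):
    """Merge `declared` into `current`, keeping entries that already exist."""
    merged = dict(current)
    for pattern, source in declared.items():
        # setdefault only inserts when the key is absent, so whoever
        # declared a namespace pattern first (the script) wins.
        merged.setdefault(pattern, source)
    return merged


# Script A runs first and pins B v1.0.0 and C v2.0.0.
ns_sources = merge_ns_sources({}, {
    "lib.b.*": {"url": "example/b#v1.0.0"},
    "lib.c.*": {"url": "example/c#v2.0.0"},
})
# Later, library B's own declaration asks for C v1.0.0; it is ignored.
ns_sources = merge_ns_sources(ns_sources, {
    "lib.c.*": {"url": "example/c#v1.0.0"},
})
print(ns_sources["lib.c.*"]["url"])
```
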

@candid82
Owner Author

In addition, I think *ns-sources* would have to be an ordered map (ArrayMap rather than HashMap) to make sure keys are iterated over in the order they were added.

@charlesg3
Contributor

@candid82,

Good point on the transitive dependencies. Seems like a good candidate for a test case.

In terms of caching, I am thinking that files which need to be retrieved will be stored in $HOME/.joker/deps/[URL_PATH]: if the file is already downloaded, use it; otherwise fetch it. Local files, I guess, can be loaded as-is.
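
A minimal sketch of that "use it if cached, otherwise fetch it" behavior, in Python rather than Joker. The mapping from URL to [URL_PATH] (host followed by path) is an assumption, as are the `cache_path` and `load_cached` names; the actual download is passed in as a `fetch` function so the sketch stays independent of any particular transport.

```python
from pathlib import Path
from urllib.parse import urlsplit


def cache_path(url, cache_root=None):
    """Map a URL to a file under $HOME/.joker/deps (layout is an assumption)."""
    root = Path(cache_root) if cache_root else Path.home() / ".joker" / "deps"
    parts = urlsplit(url)
    return root / parts.netloc / parts.path.lstrip("/")


def load_cached(url, fetch, cache_root=None):
    """Return the file contents, calling fetch(url) only on a cache miss."""
    path = cache_path(url, cache_root)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(url))  # fetch must return bytes
    return path.read_bytes()
```

A real implementation would also need to decide when a cached file is stale, which matters for URLs that point at a branch rather than a tag or sha.
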

Once a simple http endpoint is supported it's easy enough to get files from public github repos through the same interface (and likewise from S3 where you can get a public file).

On the other hand, where I'm really wanting to take this is hosting the library on a private github repo. This isn't too much extra work, but there are a few gotchas that I'd like to throw out there so that we both like the final result. Perhaps this is supported by the pluggable portion.

I'll note that if I have a repo (e.g. "screen-blend") under my org charlesg3 and under src/screenblend/core.clj, there are two ways to obtain it.

  • One is to use the HTTP API (see here: https://gist.github.com/madrobby/9476733), where you need to specify the authorization token and can then get the file with the equivalent of the command below. But the GITHUB_TOKEN needs to be stored somewhere, and if you want to use a tag it gets put on the end of the line. We could detect that the URL is from GitHub based on the path (and then set the auth tokens accordingly), or you could perhaps specify {:org "charlesg3" :repo "screenblend" :path "src"} and then the plugin can construct the HTTP request:
curl -H "Authorization: token $GITHUB_TOKEN" -H "Accept: application/vnd.github.v4.raw" -L "https://api.github.com/repos/charlesg3/screenblend/contents/src/screenblend/core.clj"
  • the other option is to specify the url as git@github.com:charlesg3/screenblend.git and perhaps provide a :path "src" and a :tag. And then just download the whole repo (using git) and use the files from there. This has the nice advantage that the GITHUB_TOKEN isn't needed.
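
For the first option, here is a rough Python sketch of how a plugin might build the HTTP request from an {:org ... :repo ... :path ...} style map. The `github_file_request` function and its parameters are hypothetical. The Accept header is copied verbatim from the curl command above, and passing the tag through the contents API's `ref` query parameter is how this sketch interprets "put on the end of the line". The request is only constructed, not sent.

```python
import os
from urllib.parse import quote, urlencode
from urllib.request import Request


def github_file_request(org, repo, path, file, tag=None, token=None):
    """Build (but do not send) a GitHub contents API request for one file."""
    url = "https://api.github.com/repos/{}/{}/contents/{}".format(
        quote(org), quote(repo), quote(f"{path}/{file}"))
    if tag:
        # The contents API takes the tag, branch or sha as ?ref=...
        url += "?" + urlencode({"ref": tag})
    # Header value taken as-is from the curl command in the thread.
    headers = {"Accept": "application/vnd.github.v4.raw"}
    # Assumption: the token comes from the environment unless given.
    token = token or os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"token {token}"
    return Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen` would perform the download. The second option (cloning with git) avoids the token handling entirely, at the cost of fetching the whole repository.
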

Some feedback on how much should be contained in the map of parameters vs. embedded in a longer string would be welcome.

In any case, I like where this is headed. I think I'll start a branch and see where things take me.

@didibus

didibus commented Dec 2, 2019

With the current way to manage dependencies, it means that the order matters for what versions you're going to get in case of conflicts. That's fine when you pin a version, since you clearly want the version that you pinned. But when two transitive dependencies conflict, without you knowing, the order in which you depend on them could affect the version you get.

Maven has this behavior currently, where it grabs the first one and the order matters.

Some other tools I've seen just error out, and you are forced to pin the conflicting dependency. That makes it very explicit which versions are being used. But you can eventually forget to unpin, which means that as you upgrade dependencies, you force them to use an old version of their dependency. Similarly, it can give the false impression that your code depends on that library, when it is only a transitive dependency. That said, some tools fix that by having a separate place where you simply define versions to pin on conflict, separate from the versions you directly depend on.

What tools.deps does to improve on both of the above strategies is that it always chooses the latest version in the case of conflicts, though the pinned version (the version you directly depend on) always wins if there is one. So it's either the version you directly depend on, or the latest version among all conflicting dependencies.

Personally, I think what tools.deps does is best. It means that order doesn't matter, so you're not confused about why one order works and another doesn't, or how adding an unrelated dependency somehow broke another. And if the latest version is backward incompatible with the library using the older one, you can fix it by pinning it and depending on the older version yourself.

The only challenge is whether all URL types have a way to know which is "latest". I think tools.deps has version number logic for Maven, and I'm not sure what it does for shas on GitHub; it might look at which commit comes after. For local files, or files in S3, Joker would need to come up with a strategy: should a Joker script declare a version? Could file-creation time be used? Or should the person declaring the dependency provide a version, like:

{"com.test1.common.*" {:url "/var/lib/joker/test1/common" :version "1.0.1"}}
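
If every source carried a :version like that, the "pinned wins, otherwise latest" rule could be sketched roughly as below. This is Python, not Joker, and it assumes purely numeric dotted versions; shas, tags with prefixes, and pre-release suffixes are not handled. The `resolve` and `parse_version` names and the "com.test2.util.*" key are hypothetical, and this is not a description of how tools.deps is actually implemented.

```python
def parse_version(version):
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(pinned, transitive):
    """Pick one version per key.

    pinned:     {key: version} declared directly by the script
    transitive: iterable of (key, version) pairs found in dependencies
    """
    newest = {}
    for key, version in transitive:
        if key in pinned:
            continue  # a direct (pinned) dependency always wins
        if key not in newest or parse_version(version) > parse_version(newest[key]):
            newest[key] = version
    return {**pinned, **newest}


pinned = {"com.test1.common.*": "1.0.1"}
transitive = [
    ("com.test1.common.*", "2.0.0"),  # ignored: the script pinned 1.0.1
    ("com.test2.util.*", "1.2.0"),
    ("com.test2.util.*", "1.10.0"),   # newest of the two conflicting versions
]
print(resolve(pinned, transitive))
```

Because the result depends only on the set of declared versions, reordering the transitive list does not change the outcome, which is the property argued for above.
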

Reading the resolve-deps doc for tools.deps might be a good source of inspiration as well: https://clojure.org/reference/deps_and_cli#_resolve_deps
