Enhancement request: integrated package manager #491

Open
pkoppstein opened this issue Jul 18, 2014 · 106 comments

@pkoppstein
Contributor

Modern programming languages (e.g. Julia) do it; modern editors (e.g. Sublime) do it; and jq deserves to have an integrated package manager too.

Given the fantastic progress that has been made recently, it seems to me that the time is about right to hammer out some details of an integrated package manager, to help ensure the harmonious evolution of jq.

Julia's package manager seems to me to be a nearly perfect model for jq, and the draft outline specification given here closely follows http://julia.readthedocs.org/en/latest/manual/packages/

However, the following is intentionally minimalist: it is intended to ensure that the most important functionality will be available as soon as possible, while providing the foundations for backwards-compatible elaboration in the future.

For simplicity of exposition, I'll assume in the following that:

i) the package manager's functionality is available via a jq module called Pkg;

ii) registered packages are available as git repositories; and

iii) anyone wishing to use the jq package management system will have access to the Internet and will have git installed.

These assumptions are quite plausible but are not essential to the proposal. (In fact, I'm a fan of mercurial :-)

Key Concepts

  • Integration

Once jq has been installed, the functionality of Pkg will be available either because Pkg is part of the standard distribution, or because the following jq program will, by design, install the Pkg module:

import Pkg; Pkg::add("Pkg")

To install another registered package, say P, as well as its dependencies (if any), it will thereafter be sufficient to run the jq program:

import Pkg; Pkg::add("P")

  • Declarative Simplicity

Package management is declarative -- the package manager figures out
which versions of which packages to install or remove based on a
per-user set of requirements, which are stored in a single file in a
fixed location under the user's home directory. Package dependencies,
versioning information, and other details needed to make the package
management system work are defined by the packages themselves. The
format of the requirements file is described in further detail below.

  • Terminology
    • "registered packages" (these are git or mercurial repositories)
    • "unregistered packages"
    • "installed packages" (a per-user collection of registered and/or unregistered packages).

Packages become registered when certain metadata has been added to the jq package metadata repository.

Details

All package manager commands are found in the Pkg module:

  • Pkg::status/0 # reveals installed packages
  • Pkg::add/1 # add a registered package
  • Pkg::clone/1 # install an unregistered package
  • Pkg::rm/1 # remove a package
  • Pkg::update/0 # update the installed packages (consistent with the requirements)
  • Pkg::resolve/0 # take the necessary actions to satisfy the user requirements if possible

The Requirements file

Details about the Julia requirements file can be found at http://julia.readthedocs.org/en/latest/manual/packages/#Requirementsquirement (sic)

The following scaled-back specification is intended to be suitable for a first-cut implementation:

  • The file is line-oriented and may contain #-style comments.

  • Apart from comments, each line has the format:

    PACKAGENAME [ATLEAST_THIS_VERSION]

where:

  • square brackets indicate the item is optional;
  • version numbers follow the "semantic versioning" scheme.
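
To make the scaled-back format concrete, here is a minimal Python sketch of a parser for it (Python is used only for illustration, and the names parse_requirements and meets_minimum are hypothetical). It assumes versions are plain MAJOR.MINOR.PATCH with no pre-release or build suffixes, and treats ATLEAST_THIS_VERSION as an inclusive lower bound.

```python
import re

# MAJOR.MINOR.PATCH only; pre-release and build suffixes are out of scope here.
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_requirements(text):
    """Map each PACKAGENAME to its minimum version tuple, or None if unversioned."""
    reqs = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # strip #-style comments
        if not line:
            continue  # blank or comment-only line
        fields = line.split()
        if len(fields) > 2:
            raise ValueError(f"line {lineno}: expected PACKAGENAME [ATLEAST_THIS_VERSION]")
        minimum = None
        if len(fields) == 2:
            match = SEMVER.match(fields[1])
            if match is None:
                raise ValueError(f"line {lineno}: not a semantic version: {fields[1]!r}")
            minimum = tuple(int(part) for part in match.groups())
        reqs[fields[0]] = minimum
    return reqs


def meets_minimum(installed, minimum):
    """True if an installed (major, minor, patch) tuple satisfies an 'at least' bound."""
    return minimum is None or installed >= minimum


example = """
# per-user requirements
Shazam 1.2.0
Xyzzy          # any version will do
"""
print(parse_requirements(example))  # {'Shazam': (1, 2, 0), 'Xyzzy': None}
```

Comparing tuples gives the usual MAJOR, then MINOR, then PATCH ordering, which is all an "at least" bound needs.
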
@nicowilliams
Contributor

Right now this seems far off, though I agree that the most important bits are close.

I'm very tempted by the NixOS model, and I tend to think that the only reason every HLL nowadays has its own pkg mgr is that the OS pkg mgrs are all awful in so many ways. A situation that makes me want to tackle packaging in general, but it's been done so much, so often that it seems like a recipe for failure. A very frustrating situation, this. Adding one more HLL pkg mgr to the mess is not appealing either, though it is the tradition now, and it may be unavoidable.

@pkoppstein
Contributor Author

@wtlangford - Elsewhere you wrote:

@nicowilliams definitely hits on one of the main benefits to having the module system ...
n.b. that doing this would require modifications to the install process (creating a directory for them) as well as an addition to the module system adding that path to the end of the search chain.

I've begun working on a prototype standalone package manager for jq. It's modeled closely on Julia's package system, with some influences from homebrew. In a nutshell, the main idea is to use git/github (or equivalent) so that Pkg::add("Stats") would clone the Stats repository, making the Stats module readily available via a simple "import Stats;" command without the user having to know where the remote repository or the local clone are located.

Making this work nicely requires that the local clones of jq repositories have a per-user home. Julia uses $HOME/.julia/ for this purpose. The current version of Julia's package specification is v0.3, so the package Xyzzy would actually be under the directory $HOME/.julia/v0.3/, and we would expect to see a subdirectory such as $HOME/.julia/v0.3/Xyzzy/.git

My prototype has been using a similar scheme, but there are several issues which I hope you and @nicowilliams will have the time to help resolve.

The first issue stems from the fact that $HOME/.jq has already been appropriated. Even though it may eventually become available for other purposes, it would be better to find an alternative name. To avoid prejudicing you, I'll refer to this directory as the "distinguished directory".

The second issue stems from the goal that the user be shielded from having to know where the "distinguished directory" is located. Since this directory also houses the metadata needed to locate requested packages (like homebrew), the key to the whole thing is jq having knowledge of its location.

As I understand the current architecture of jq, this would require that when presented with a simple "import" directive, jq would (in the absence of contrary directives) first look for it in the "distinguished directory".

Your thoughts?
Thanks.

@wtlangford
Contributor

As I understand the current architecture of jq, this would require that when presented with a simple
"import" directive, jq would (in the absence of contrary directives) first look for it in the "distinguished
directory".

Well, yes and no. Your package manager could have a script that should be run in the user's bashrc/zshrc/etc, that sets JQ_LIBRARY_PATH (It still exists!). And in that way could hide things from the user. Alternatively, we could just add a path to the search chain in the code. It wouldn't be difficult.

As far as ~/.jq - I'm of mixed opinions on this. I like having the file there, but I feel that using it as a directory would just be much more useful (per-user common modules.) I think to allow the global level imports as they were before (things in .jq imported as if they were builtins), we'd have to do, say, ~/.jq/.jq as the builtin-style ones, and then all others in that folder are treated as modules to be imported. Thoughts on that, @pkoppstein @nicowilliams?

@nicowilliams
Contributor

Some thoughts:

  • module names (but not the as aliases) should permit hierarchy; maybe
    they should be strings, not symbols
  • we should either store metadata in a .json file or add constant folding
    for arrays and/or objects; a pkg mgr will need to be able to read such
    metadata without executing a module's code
  • we should remove support for ~/.jq, add support for ~/.jq/ in the search
    path
  • we need subdirs of ~/.jq/ for downloaded/standard pkgs and user-written
    ones. probably also a site-specific dir
  • we don't need all of the I/O infra to do pkg mgmt, just a builtin
    "script" around curl

@pkoppstein
Contributor Author

@nicowilliams and @wtlangford - thanks for your responses. I'll first work through Nico's points:

  • module names (but not the as aliases) should permit hierarchy; maybe they should be strings, not symbols

I'll certainly keep that in mind. (See e.g. (**) below.)

  • we should either store metadata in a .json file or add constant folding for arrays and/or objects; a pkg mgr will need to be able to read such metadata without executing a module's code

My initial focus with jqpm is on "registered packages". To be registered, a tiny amount of metadata has to be stored in a (git) "metadata repository", which Julia organizes by directory as follows:

  • PACKAGENAME/url -- a file with the URL of the git or hg repository of PACKAGENAME (e.g. git://github.com/Aerlinger/AnsiColor.jq.git)
  • PACKAGENAME/versions/VERSION/sha1 - a one-line file containing the SHA-1 of the git commit for VERSION
  • PACKAGENAME/versions/VERSION/requires - a text file, with lines of the form: ANOTHERPKGNAME semver[+-]?

That's it. The main advantage of this scheme is that updating details for one version can't break information about any other. I'd like to keep the scheme. Is JSON overkill?
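
As an illustration of how little a tool needs in order to read this layout, the following Python sketch loads such a metadata directory into nested dicts. The function name read_registry is hypothetical, each requires line is kept as an uninterpreted string, and a missing requires file is treated as "no dependencies" (an assumption). Because every version lives in its own directory, the reader never has to reconcile one version's files with another's.

```python
from pathlib import Path


def read_registry(root):
    """Load a metadata directory laid out as
    PACKAGENAME/url and PACKAGENAME/versions/VERSION/{sha1,requires}.

    Returns {name: {"url": str, "versions": {version: {"sha1": str, "requires": [str]}}}}.
    """
    registry = {}
    for pkg_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        url_file = pkg_dir / "url"
        if not url_file.is_file():
            continue  # not a package entry (e.g. the repository's own .git)
        versions = {}
        versions_dir = pkg_dir / "versions"
        if versions_dir.is_dir():
            for vdir in sorted(v for v in versions_dir.iterdir() if v.is_dir()):
                requires_file = vdir / "requires"
                requires = []
                if requires_file.is_file():  # treated as optional in this sketch
                    lines = requires_file.read_text().splitlines()
                    requires = [ln.strip() for ln in lines if ln.strip()]
                versions[vdir.name] = {
                    "sha1": (vdir / "sha1").read_text().strip(),
                    "requires": requires,
                }
        registry[pkg_dir.name] = {"url": url_file.read_text().strip(), "versions": versions}
    return registry
```
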

  • we should remove support for ~/.jq, add support for ~/.jq/ in the search path - we need subdirs of ~/.jq/ for downloaded/standard pkgs and user-written ones.

Excellent, but please note: If Xyzzy is a registered package that works with jq 1.5 then when downloaded it will go in ~/.jq/v1.5/Xyzzy and so the toplevel .jq file would be (following Julia) ~/.jq/v1.5/Xyzzy/src/Xyzzy.jq.

Even if the src/ segment is dropped, jq would have to know to look for Xyzzy.jq under Xyzzy.

(**) Would import Xyzzy/Foo load ~/.jq/v1.5/Xyzzy/Foo/src/Foo.jq ???

  • we don't need all of the I/O infra to do pkg mgmt, just a builtin "script" around curl

So far, there's been no sighting of curl. Just git (and hg :-)

@nicowilliams
Contributor

Using git has its appeal, yes.

See the other discussions as to versioning. I'm a fan of putting the major version number in the module name, minor.micro in the metadata.

Ideally the metadata would be in the .jq. Yes, we'd have to parse (but not execute) it to query it, but that seems reasonable. Any syntax errors would cause failure and, hopefully, cause the module to be re-downloaded. Alternatively we could put the metadata in a .json file. Either way is OK, but I kinda like the first, not least because it'd commit us to doing constant folding for constant [...]/{...} jq expressions, which would be nice to have. Having everything in one file has one advantage: no need to worry about keeping things in sync. But then, git/whatever can do it for us anyways. Also, I want to be able to store large amounts of data in/with modules (think of a jq-coded Unicode module, which would need large Unicode tables to do case conversions, normalization, ...), and storing bulk module data in separate files has its advantages too.

As for using git, would that be: a git repo per-module? A set of repos containing complete sets of modules (e.g., a standard jq library that's not builtin to jq)?

A repo per-module seems like overkill, but a single repo of all jq modules would be unwieldy. So I'm leaning to a set of repos. We'll really need hierarchy. Do we want to use Java-style reverse domain name hierarchies?

Lots to think about.

@pkoppstein
Contributor Author

@nicowilliams wrote:

Lots to think about.

Fortunately, most of the thinking has been done for us by the brilliant creators of Julia! Their package management system is mature and as I've mentioned before, I believe it is (at many levels) a perfect fit for jq. (Julia also distinguishes between modules and packages.)

Regarding package/repositories/modules, the fundamental principles are:

a) every package is a repository (normally a git or mercurial repository);
b) the official jq package registry is a repository (presumably hosted on github);
c) a package can have a hierarchy of modules (*);
d) these repositories must conform to some basic organizational rules designed to make it easy for jq and jqpm to find the things they need and do the things they need to do, but these rules are very minimal.

(*) So the big question is whether it would be feasible to allow a package to have more than one root module. Maybe there are alternatives that I've not considered, but it seems to me that going down that path would introduce unnecessary complexity. As I mentioned, the simplicity I'm aiming for is that:

$ jqpm add Foobar

will do whatever is needed to make it possible for the user to "load" Foobar.jq by simply writing:

import "Foobar";

Regarding JSON -- of course we can be more JSON-oriented if we want, but since jq processes streams of JSON entities so nicely, we don't have to restrict ourselves to a one-JSON-compound-entity per file mentality.

@nicowilliams
Contributor

Of course, it'd have to be a repo per-module, but I'd want a repo of blessed modules -- think git sub-modules.

I'm thinking of something like Ubuntu repos: a set of blessed modules, ..., a set of contributed modules, with a single namespace for all, but a way to refer to specific ones in case of conflict.

In a jq libdir we'd have:

.../lib/jq/<repo-name>/.git
.../lib/jq/<repo-name>/<path-to-module>/.git
.../lib/jq/<repo-name>/<path-to-module>/stuff.jq
...

One could then import "<module-name>" as foo; or import "<repo-name>/<module-name>" as foo;.

But this seems a bit like overkill unless we expect jq to gather much momentum. I guess I should now go look at Julia's pkg manager, eh?

@nicowilliams
Contributor

Incidentally, the jq pkg mgr could just be builtin to jq, needing only git to be in PATH.

@nicowilliams
Contributor

What you call jqpm could then be just jq -r 'import jqpm; jqpm::install' (reads pkg names/URLs on stdin, installs them). :)

@nicowilliams
Contributor

OK, so, that's the concept I'd go with: a set of pkg repos which are git repos listing module<->{URL(s), HEAD hashes}, in a directory hierarchy (or in a single JSON file that reflects that hierarchy).

Or, if we can, we should avoid hierarchy?

Locally a jq libdir would have a per-pkg-repo directory and below it the hierarchy of all installed pkgs.

If there's a conflict between a "contrib" and "blessed" pkg, the latter wins, but one could import "contrib/foo" ...; to override this.
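
A rough Python sketch of the lookup rule just described: an unqualified import is resolved against the pkg repos in priority order (blessed before contrib), while an explicit "<repo-name>/<module-name>" bypasses that order. The function, its arguments, and the in-memory installed mapping (standing in for the libdir hierarchy) are all hypothetical.

```python
def resolve_import(spec, installed, precedence=("blessed", "contrib")):
    """Resolve an import spec such as "foo" or "contrib/foo" to (repo, module).

    installed:  repo name -> set of module names installed from that repo
    precedence: repos consulted, in order, for unqualified imports
    """
    repo, sep, module = spec.partition("/")
    if sep and repo in installed:
        # Explicit "<repo-name>/<module-name>": precedence is bypassed.
        if module in installed[repo]:
            return repo, module
        raise LookupError(f"{module!r} is not installed from repo {repo!r}")
    for candidate in precedence:
        if spec in installed.get(candidate, ()):
            return candidate, spec  # first (highest-priority) repo wins
    raise LookupError(f"{spec!r} not found in any of: {', '.join(precedence)}")


installed = {"blessed": {"foo", "bar"}, "contrib": {"foo", "baz"}}
print(resolve_import("foo", installed))          # ('blessed', 'foo')
print(resolve_import("contrib/foo", installed))  # ('contrib', 'foo')
print(resolve_import("baz", installed))          # ('contrib', 'baz')
```

A spec whose first segment is not a known repo name falls through to the unqualified search, so hierarchical module names remain possible.
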

Dependency mgmt should be relatively simple to write in jq. It's easy to write depth-first searches, though we should prefer breadth-first searches. Still, that's nothing.
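
For reference, a breadth-first walk over a dependency graph is short. This Python sketch (hypothetical names; version constraints deliberately ignored) discovers every transitive dependency, nearest first, and uses a visited set so that shared dependencies and cycles are handled only once. Discovery order is not necessarily a valid install order; that would need a separate topological sort.

```python
from collections import deque


def transitive_deps(root, requires):
    """Breadth-first discovery of root's dependencies, nearest first.

    requires: package name -> list of the package names it depends on
    """
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in requires.get(pkg, []):
            if dep not in seen:  # shared deps and cycles are visited only once
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


requires = {
    "Shazam": ["Xyzzy", "AnsiColor"],
    "Xyzzy": ["AnsiColor", "Stats"],
    "Stats": ["Shazam"],  # a cycle back to the root
}
print(transitive_deps("Shazam", requires))  # ['Xyzzy', 'AnsiColor', 'Stats']
```
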

The most important infrastructure to finish for this is the I/O stuff. Which... I should get to, no?

(Re: I/O, it's in reasonable shape, but I don't like the --allow-{read,write,exec} options I'm adding. Instead I want to generalize them to an object of attributes, maybe eventually allowing a tight specification of a sandbox.)

@joelpurra
Contributor

Thought I'd just chip in with some info on other package managers.

NPM (4k+ stars on github) is used in a Node.js environment.

Total Packages: 87 335

17 396 368 downloads in the last day
95 294 274 downloads in the last week
391 165 436 downloads in the last month

NPM inspired other package managers, like Bower (10k+ stars on github). Has thousands of packages.

Both NPM and Bower (pretty much) require that packages are registered with a central service, which is a drawback.

Component (3k+ stars on github) uses github for namespacing (by default) in their dependencies.

Oh, and all three use json for configuration (package.json, bower.json, component.json), and semver for package versioning and dependency version selection.

I like and use both NPM and Bower, both because the tools are good and the eco-system healthy. Don't have as much experience with Component, but the author @visionmedia does good things. Things move really, really fast in the javascript world - I'm sure there are other alternatives today.

@joelpurra
Contributor

Should also add that NPM uses a deep dependency tree and keeps dependencies' dependencies (and their dependencies) as private packages in their respective node_modules/ subfolders - no version collisions.

Bower needs all dependencies' dependencies to play nice with each other in a flat dependency tree. This is in part because Bower is made for web browsers, and that those packages historically have mutated the global window scope.

@pkoppstein
Contributor Author

@joelpurra remarked:

Both NPM and Bower (pretty much) require that packages are registered with a central service, which is a drawback.

jqpm (a script being developed as a prototype for what I hope will one day be a jq package named Pkg, containing a module of the same name) will support both "registered" and "unregistered" packages. The packages themselves can live just about anywhere, but I envision there will be a single official registry to distinguish "registered" from "unregistered" packages. This registry itself will be a git repository (think homebrew) but will only have a tiny amount of metadata.

(It wouldn't be hard to support multiple registries, but I don't really see the need at this point, assuming there is support for unregistered packages.)

jqpm will by design dovetail with jq, but I'm expecting that we can do better than that. My hope is that jq will (unless otherwise instructed) be able to locate any kind of package that (a) has been installed by jqpm; and (b) has been flagged as applicable to that particular version of jq.

In addition (and this has nothing directly to do with jqpm), I believe that it would be highly desirable for jq (unless otherwise instructed) to be able to locate locally installed files or modules if they have been installed in a jq-defined location, whether with or without jqpm's involvement. @wtlangford and @nicowilliams might like to comment on whether this is likely to happen. A likely directory for such locally installed files is ~/.jq/local/.

As for jqpm-supported "packages": to avoid name conflicts between registered and unregistered packages, I would like to adopt the following scheme, or one very similar to it:

  • A registered package named Shazam intended to work with v1.4 of jq (and corresponding to a git or mercurial repository of the same name) will be installed in ~/.jq/v1.4/Shazam/
  • An unregistered package named Xyzzy will go in ~/.jq/unregistered/Xyzzy

The reasoning behind the "v1.4" for registered (but not unregistered) packages is that the registration process should entail some kind of check or at least claim (e.g. by the package author) that a particular version of the package is compatible with a particular M.N version of jq.

One question I am still asking myself is:

  • Should jqpm try to support unregistered packages that are not git or mercurial repos?

Comments?

@joelpurra
Contributor

Sorry in advance for not knowing more about Julia; at a glance, I don't agree with much of what I understand Julia's package manager does (like ~/.julia/v0.3/REQUIRE). I have tons of opinions on a package manager and most of them lean towards npm (and similar tools), which has shown incredible success.

Again, look at the package.json specification and a few examples. This is a per-project file specifying dependencies; it is not per-user nor per-system nor per-jq-engine-version. jq.json could contain both package version, dependencies with versions, package home page, author name and package license to start with. I much prefer a json file over any other format. I mean, this is jq.

One key component is semver ranges (source), which basically removes any ambiguity in selecting a compatible dependency, a compatible engine and publishing your own package version. I consider following semver to be best practice anyways, as it's well defined.

Packages can be installed in the current folder's ./jq_packages/... (alternatively ./.jq/packages/...) using jqpm install reading ./jq.json. Packages can be installed globally to ~/.jq/packages/... using jqpm install --global shazam.
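
One way jq (or jqpm) might then find a package is to check the project-local directory before the user-global one. The Python sketch below assumes that precedence, which is not spelled out above, uses hypothetical function names, and only checks for a directory named after the package.

```python
from pathlib import Path


def package_roots(project_dir, home=None):
    """Candidate roots, most specific first (this ordering is an assumption)."""
    home = Path(home) if home is not None else Path.home()
    return [Path(project_dir) / "jq_packages", home / ".jq" / "packages"]


def find_package(name, project_dir, home=None):
    """Return the first <root>/<name> directory that exists, or None."""
    for root in package_roots(project_dir, home):
        candidate = root / name
        if candidate.is_dir():
            return candidate
    return None
```

With this ordering, a project can pin its own copy of a package without affecting other projects that rely on the global one.
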

I prefer packages to be really small, single-purpose and infinitely combinable - like legos.

I'd also prefer a lower-case only package naming rule as Mac OS X's HFS+ is case-preserving but case-insensitive by default.

~/tmp $ mkdir A
~/tmp $ mkdir a
mkdir: a: File exists
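
A sketch of how such a rule might be checked, in Python for illustration: check_name enforces a lowercase-only naming pattern, and case_collisions flags names that would land in the same directory on a case-insensitive filesystem. The exact allowed character set and both function names are assumptions.

```python
import re

# Hypothetical rule: lowercase ASCII letters, digits, '.', '_' and '-',
# starting with a letter or digit.
VALID_NAME = re.compile(r"[a-z0-9][a-z0-9._-]*")


def check_name(name):
    """True if `name` follows the lowercase-only naming rule."""
    return VALID_NAME.fullmatch(name) is not None


def case_collisions(names):
    """Group names that would share one directory on a case-insensitive filesystem."""
    groups = {}
    for name in names:
        groups.setdefault(name.casefold(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


print(check_name("shazam"), check_name("Shazam"))  # True False
print(case_collisions(["A", "a", "b"]))            # [['A', 'a']]
```
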

@joelpurra
Contributor

A registered package named Shazam intended to work with v1.4 of jq (and corresponding to a git or mercurial repository of the same name) will be installed in ~/.jq/v1.4/Shazam/

I'd prefer ~/.jq/packages/shazam/x.y.z/ so that version x.z.a would go in a parallel subfolder.

  • Each version would specify its jq engine version compatibility in the package config.
  • jq engine ranges can be as precise as the package wants to specify, but I imagine it often being something like >=1.4.0, or perhaps >=1.4.0 <1.5.0 (which is close enough to ~1.4.0).
  • Allows multiple package versions per jq engine version, instead of just one.
  • Allows multiple "system-global" or "user-global" package versions to be installed in parallel, without conflicts, yet selectable as dependencies through semver ranges.
  • Allows a later package version x.z.b to support an older jq engine version than before, perhaps through simplifying/optimizing the code.

See npm's 'package.json' 'engine', which also allows for semver ranges for both the node engine and other related tools, like npm itself. (Compare to jq and jqpm.)

engines
You can specify the version of node that your stuff works on:

{ "engines" : { "node" : ">=0.10.3 <0.12" } }
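
To illustrate how such ranges could select among parallel version subfolders, here is a Python sketch that evaluates a small, simplified subset of npm-style range syntax: space-separated comparators (>=, <=, >, <, =) that must all hold, plus ~. It is not npm's own semver implementation: pre-release tags are ignored and missing version components are padded with zeros. best_installed (a hypothetical name) picks the highest installed version matching a range.

```python
import re

# A simplified subset of npm-style ranges: comparators joined by spaces (logical AND).
COMPARATOR = re.compile(r"(>=|<=|>|<|=|~)?(\d+(?:\.\d+){0,2})")


def parse_version(text):
    """'0.12' -> (0, 12, 0): missing components are padded with zeros."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version, range_expr):
    """True if `version` meets every comparator in `range_expr`."""
    v = parse_version(version)
    for token in range_expr.split():
        match = COMPARATOR.fullmatch(token)
        if match is None:
            raise ValueError(f"unsupported comparator: {token!r}")
        op, bound = match.group(1) or "=", parse_version(match.group(2))
        if op == "~":
            # ~X means >=X.0.0 <(X+1).0.0; ~X.Y and ~X.Y.Z mean >=X.Y.Z <X.(Y+1).0
            if "." in match.group(2):
                upper = (bound[0], bound[1] + 1, 0)
            else:
                upper = (bound[0] + 1, 0, 0)
            ok = bound <= v < upper
        elif op == ">=":
            ok = v >= bound
        elif op == "<=":
            ok = v <= bound
        elif op == ">":
            ok = v > bound
        elif op == "<":
            ok = v < bound
        else:
            ok = v == bound
        if not ok:
            return False
    return True


def best_installed(installed_versions, range_expr):
    """Highest installed version (e.g. a subfolder of ~/.jq/packages/shazam/)
    that satisfies `range_expr`, or None if none does."""
    matching = [s for s in installed_versions if satisfies(s, range_expr)]
    return max(matching, key=parse_version, default=None)


print(satisfies("0.11.2", ">=0.10.3 <0.12"))  # True
print(satisfies("0.12.0", ">=0.10.3 <0.12"))  # False
print(best_installed(["1.0.0", "1.4.2", "1.4.10", "2.0.0"], "~1.4.0"))  # 1.4.10
```

Note that versions are compared numerically, so 1.4.10 correctly outranks 1.4.2.
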


An unregistered package named Xyzzy will go in ~/.jq/unregistered/Xyzzy

I'd prefer that unregistered/unnamespaced packages aren't "system-global" or "user-global" reference-able as package dependencies, but require a more specific path (relative/absolute folder, git/hg).

@pkoppstein
Contributor Author

Thanks for your interest and comments. The good news is that
I agree with many of your points. So perhaps you'll like Julia's
approach once you become more familiar with it :-)

As you may know, I spent some time trying to find a Package Manager
that we could use "out of the box" and certainly considered npm and
bower. I know this is not what you're recommending, but it did lead
me to look closely at many package managers, especially ones as
successful as npm. Of course I'm not saying that I'm anything like an
expert in any of them, and if I've overlooked or misunderstood a key
point, then I'm all ears.

The main reason that I gravitated away from "out of the box" solutions
was simply that I wanted to maximize the likelihood that the package
manager itself would ultimately be written as a jq module (in a future
version of jq). That's not to say that we couldn't or shouldn't use
an out-of-the-box package manager for prototyping, but npm seems like
a pretty heavy dependency even for that purpose. For example, the npm
page says "npm comes with node now."

So that leaves us with competing specifications. In principle we
could of course roll our own from scratch, but in practice that's not
so easily done when there are so many opinions and so few resources.

Anyway, the more I learned about Julia's package management
system, the more I liked it, and the more I came to see what a good
fit it would be for jq. I certainly hope that you will take the time
to become more familiar with it, not least because I'd appreciate it
if someone would tell me whether deviating from Julia on a particular
point does or does not make sense.

Let me give a small illustration of why I think Julia's choice on a
specific issue was the right one. You wrote:

 "I'd prefer ~/.jq/packages/shazam/x.y.z/ so that version x.z.a
 would go in a parallel subfolder."

This is in contrast to what I wrote:

 "a registered package named Shazam intended to work with v1.4 of
 jq (and corresponding to a git or mercurial repository of the
 same name) will be installed in ~/.jq/v1.4/Shazam/"

From what you wrote, it seems that you have rejected the idea of using
the "each package is a DVCS repository" approach. It is true that
that approach introduces a dependency on a DVCS system (in practice it
would be, or at least include, git), and that therefore there is an issue as
to whether ALL registered jq modules MUST be available as git
repositories. These are questions which are definitely worth
debating! If there is a consensus that neither jqpm nor the putative
Pkg module should be dependent on git, then that would certainly be
consequential!

Or perhaps you misunderstood a key point here - that the "v1.4" is the
two-digit jq version number, not the Shazam version number.
There are several subtleties here, but one of them is that if someone
wants to work with multiple versions of jq simultaneously, they can.

@joelpurra
Contributor

@pkoppstein:

From what you wrote, it seems that you have rejected the idea of using
the "each package is a DVCS repository" approach.

Nope - I like git/hg repositories, and like tagging them with semver-formatted "v1.2.3" tags, and like that dependencies' version ranges point to such tags. But I see a problem in using a single directory per jq-version as opposed to package-version. Git can only handle one checked out commit (be it a tag or master) at a time, so this effectively would force all jq projects running on a single machine to use the same checked out version of a shared package. That is not an "acceptable" limitation to me, at this stage of developing a package manager. (Hope I didn't misunderstand Julia here.)

Or perhaps you misunderstood a key point here - that the "v1.4" is the
two-digit jq version number, not the Shazam version number.

No misunderstanding. A user-level ~/.julia/v0.3/REQUIRE style file puts severe limitations on the ability to have multiple projects running concurrently. Yes, I know that there would be different folders per jq version, but if jq isn't released as often as I work on my projects and packages (daily), I will end up with collisions between my own projects unless I keep them all in step.

Assuming jq doesn't have a new release during the night:

  1. Today I write a jq project B with dependency X at version 1.0.0.
  2. Tomorrow I write a jq project C and discover that X has released 2.0.0. Yay! Or?
    • I don't want to use X 1.0.0 in C because the new (breaking) features are much, much better and would save me time and money.
    • I can't use X 2.0.0 in C unless I also upgrade all of B to handle the breaking changes.
  3. The day after I run my old jq project A and discover that it is broken due to project C (and possibly B) upgrading to X 2.0.0.
  4. After taking the time to upgrade the code in C, I discover that the dependency Y is also dependent on X 1.0.0. Damn.

I know that having just the one version of a library available system-wide is common in some languages. This is why I'm pointing towards npm, where any versions can exist in parallel, and even execute concurrently. Again, since a package's dependencies are private to the package in terms of scoping (as opposed to a single package instance in memory shared across all running code in a project), problems are greatly reduced.

There are several subtleties here, but one of them is that if someone
wants to work with multiple versions of jq simultaneously, they can.

Yes. This is why I pointed out the engine property of package.json.

@pkoppstein
Contributor Author

@joelpurra wrote:

so this effectively would force all jq projects running on a single machine to use the same checked out version of a shared package.

jqpm installs packages on a per-user basis, not a per-machine basis.

If a single user wants to have two processes running concurrently using the same version of jq but different combinations of module-version numbers, then he or she should probably have two different accounts, just to retain his or her sanity. Better yet would be to avoid those troublesome modules until they're fixed.

I'd prefer that unregistered/unnamespaced packages aren't "system-global" or "user-global" reference-able as package dependencies, but require a more specific path (relative/absolute folder, git/hg).

jqpm's goal is to be able to work seamlessly with jq (and in particular, the "import" statement), without requiring any configuration.(*) Currently jq's import statement is very restrictive. I believe some of the complexity you have in mind would require significant changes on the jq side. Anyway, it might be helpful if you could discuss the assumptions you're making about jq and especially the "import" statement.

(*) E.g.:

1. jqpm install Shazam
2. jq 'import Shazam; ....'

Of course, so long as jq cannot itself run git/hg, directly or indirectly, there will be limitations on what a jq program can do by itself.

@pkoppstein
Contributor Author

@joelpurra - Would you have time to write a simple (bash?) script to prototype a jq-oriented package manager that actually uses npm (or bower)? Let's call such a script jqnpm (or jqbower). The main idea would be to use the fact that jq now heeds the JQ_LIBRARY_PATH environment variable. As @wtlangford pointed out, this allows us to simulate a certain degree of integration between jq and the package manager.

If you were able to do that, it would be tremendously helpful in several respects. For example, I think it would give greater insights into the pros and cons of actually using npm (or bower) in this context, and it should help pinpoint the changes (if any) that would be required on the jq side under various scenarios.

Thanks!

@pkoppstein
Contributor Author

@wtlangford wrote:

Well, yes and no

(I'd also appreciate @nicowilliams' comments about the following.)

The work on the jqpm prototype is coming along quite nicely (from a certain perspective), but I'd like to make sure that we're all on the same page regarding the goal that, when given "import Shazam", jq will be able to locate "Shazam.jq" very quickly and without any user-initiated configuration.

There are actually two related issues here but let's review the main scenario first.

  1. Let's assume the official (blessed) metadata repository is at github.com/jq/jq.metadata (the name mostly doesn't matter), and that the Shazam package has been registered there. (The Shazam package itself might be on bitbucket.)
  2. Let's also assume that a user has successfully installed the Shazam package locally. Under the covers, it's in ~/.jq/v1.4/ because the user is using jq v1.4. Thus there is now a directory ~/.jq/v1.4/Shazam
  3. The Shazam.jq file is somewhere UNDER ~/.jq/v1.4/Shazam/.

Of course we could use JQ_LIBRARY_PATH to tell jq where to look for Shazam in particular, but for numerous reasons, we don't want to go down that path.

One possibility to consider would be to allow * in JQ_LIBRARY_PATH. E.g.

JQ_LIBRARY_PATH=~/.jq/v1.4/*/src

There are two problems with that:

  1. It would be slow (if there are many installed packages)
  2. It would defeat the purpose of the "v1.4" component of the pathname: we do not want jq 1.5 to look under v1.4

Luckily, the solution is trivial:

[PROPOSAL:] Unless otherwise instructed, when given "import Shazam", 
version M.N of jq has only to check whether ~/.jq/vM.N/Shazam/src/Shazam.jq exists.  

That is, no search should be required to find an installed registered package.
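
A minimal Python sketch of the proposed lookup, assuming jq's version is available as an "M.N" string: it builds the single candidate path and checks that it is a file, so no directory search is involved. The function name is hypothetical, and so is the blessed parameter, which stands in for the "--blessed NAME" option suggested later in this comment by replacing the vM.N directory name.

```python
from pathlib import Path


def locate_registered(module, jq_version, home=None, blessed=None):
    """Return the path of an installed registered package's entry file, or None.

    jq M.N only checks ~/.jq/vM.N/<module>/src/<module>.jq; `blessed`
    (hypothetical) substitutes another directory name under ~/.jq.
    """
    major, minor = jq_version.split(".")[:2]
    base = Path(home) if home is not None else Path.home()
    channel = blessed if blessed is not None else f"v{major}.{minor}"
    candidate = base / ".jq" / channel / module / "src" / f"{module}.jq"
    return candidate if candidate.is_file() else None  # one existence check, no search
```

Because the jq version is part of the path, jq 1.5 never looks under v1.4 unless explicitly told to.
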

The other part of the question here concerns "unblessed" metadata repositories. Suppose, for example, that github.com/shady/jq.metadata is such a repository, and that there is a Shazam package (maybe the same one as before, maybe not) registered there too. Then jqpm will happily install the Shazam package registered at shady in ~/.jq/shady/Shazam/.

Let's call the "shady" segment of the path the "channel".

Personally I don't see the need for jq to provide any special support for unblessed channels, EXCEPT that it would be nice if there were an option to specify which directory under ~/.jq it should regard as the "blessed" directory. This would be useful for testing purposes. Once one has a "--blessed NAME" option, though, the question arises as to whether one should be able to use it to specify more than one channel.

Anyway, to summarize the main points:

  1. Assuming Shazam is a registered package in the "blessed" metadata repository for jq version M.N, and that it has been properly installed, jq version M.N should be able to "import Shazam" efficiently without requiring any search and without requiring explicit mention of Shazam anywhere other than the "import" statement itself.
  2. The current version of jqpm ensures that when the Shazam package for jq version M.N is installed, there is a file: ~/.jq/vM.N/Shazam/src/Shazam.jq
  3. A "--blessed NAME" option (by that or some other name) would be helpful.

@joelpurra
Contributor

@pkoppstein:

@joelpurra - Would you have time to write a simple (bash?) script to prototype a jq-oriented package manager that actually uses npm (or bower)?

I'll see if I can squeeze it in, but can't promise.

The work on the jqpm prototype is coming along quite nicely (from a certain perspective), but I'd like to make sure that we're all on the same page regarding the goal that, when given "import Shazam", jq will be able to locate "Shazam.jq" very quickly and without any user-initiated configuration.

I'd rather see that the dependency has a config file (say jq.json) that defines the module entry point: "main": "./src/completely-separate-from-module-name.jq".
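One way to read this suggestion, sketched in Python: the package's jq.json (a hypothetical npm-style manifest; jq itself does not read such a file) names the entry module via "main", and the module name is used only as a fallback. The function names are illustrative.

```python
import json
from pathlib import Path
from typing import Optional


def entry_point_from_manifest(package_dir: Path, manifest: Optional[dict], name: str) -> Path:
    """Use the manifest's "main" if present, else fall back to <package_dir>/<name>.jq."""
    main = (manifest or {}).get("main")
    if main:
        return package_dir / main  # pathlib collapses the leading "./"
    return package_dir / f"{name}.jq"


def entry_point(package_dir: Path, name: str) -> Path:
    """Read <package_dir>/jq.json if it exists (hypothetical manifest file)."""
    manifest_file = package_dir / "jq.json"
    manifest = json.loads(manifest_file.read_text()) if manifest_file.is_file() else None
    return entry_point_from_manifest(package_dir, manifest, name)
```

With {"main": "./src/completely-separate-from-module-name.jq"}, the file name no longer has to match the module name at all.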

Oh, and I forgot to mention that when you're talking about ~/.jq/v1.4/Shazam/, I see that as a globally installed jq "program" (#!/usr/bin/env jq) or a dependency of such a program. I imagine most jq projects only needing folder-locally installed dependencies in ./.jq/Shazam or similar.

In the words of the npm creator, @isaacs: http://blog.nodejs.org/2011/03/23/npm-1-0-global-vs-local-installation

In general, the rule of thumb is:

  1. If you’re installing something that you want to use in your program, using require('whatever'), then install it locally, at the root of your project.
  2. If you’re installing something that you want to use in your shell, on the command line or something, install it globally, so that its binaries end up in your PATH environment variable.

@pkoppstein
Contributor Author

@joelpurra wrote:

I'd rather see that the dependency has a config file (say jq.json) that defines the module entry point: "main": "./src/completely-separate-from-module-name.jq".

In a sense, "Shazam.jq" is the config file. It can do whatever it likes. Of course, that is at the moment quite limited, but the work on enhanced I/O is well underway, and it seems likely that the "module" directive will also allow a JSON component.

(On the last point I'd like to ask @nicowilliams and @wtlangford: has there been any discussion about how a jq program will be able to access a module's JSON component?)

In any case, if there were a separate per-package jq.json file, jq would have to (a) know how to find it; and (b) know what to do with it. Both would require changes to jq beyond those that have been under discussion.

Also, please note that for the time being, I would like to avoid deviations from the Julia model unless either (a) there is general agreement that it is necessary for technical reasons or for compatibility with jq 1.5; or (b) the folks at Julia plan a deviation themselves :-)

Thanks!

@nicowilliams
Contributor

@joelpurra Modules would be libraries -- only defs, no top-level. Though they could define a main def that one could use as the whole of a jq program -- that'd be a fine convention.

Modules that are meant to be programs should just go in a bindir somewhere. It might be a good idea to have one pkg manager for library and program modules.

@pkoppstein

Hmmm, well, there's a lot to think about here.

For now I'm working on this:

# Modules start with:
module <name> <metadata>;
...

where the name is an identifier (though, is it necessary? we can get it from the file path/name), and the metadata is a constant-valued jq expression. Then:

import <name> <metadata> ...;

where <metadata> is of similar form to the module metadata and must "match" for some definition of "match".

The metadata would be an object, obviously, with keys like "version" with an object value with keys like "major", "minor", and "micro" with numeric values. Though I'd also allow "version" to have a numeric value instead, in which case a) the major version is part of the module name (as I think it should be), b) the minor version is the floor of the number, and c) the micro version is the decimal portion of the number. Matching would be: name and major version (if given) must match, minor must be at least the given one, and micro must be at least the given one if minor matches exactly.
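The matching rule above can be sketched in Python as follows. This is only an illustration of the proposal. It assumes that versions arrive either as a number such as 1.3 (read as minor 1, micro 3, with the major version carried by the module name) or as an object with optional "major", "minor" and "micro" keys, and that missing components count as 0.

```python
def normalize(version):
    """Turn a numeric version like 1.3 into {"minor": 1, "micro": 3}; pass objects through.

    Assumption: "decimal portion" means the digits after the point. Floats such as
    1.10 lose the trailing zero via str(), so pass "1.10" as a string instead.
    """
    if isinstance(version, dict):
        return version
    whole, _, frac = str(version).partition(".")
    return {"minor": int(whole), "micro": int(frac or 0)}


def version_matches(have, want):
    """True if an installed module's version `have` satisfies the import's `want`."""
    have, want = normalize(have), normalize(want)
    # Major version (if given) must match exactly.
    if "major" in want and have.get("major") != want["major"]:
        return False
    h_minor, w_minor = have.get("minor", 0), want.get("minor", 0)
    if h_minor != w_minor:
        return h_minor > w_minor  # a newer minor is accepted, an older one is not
    # Same minor: micro must be at least the requested one.
    return have.get("micro", 0) >= want.get("micro", 0)
```

With the foo/bar example below, a bar at version 3.2 would satisfy import bar {version: {minor:3,micro:1}}, while 3.0 would not.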

There'd be a builtin which takes module names (strings) as inputs and outputs their metadata. I.e., the metadata is inside the module and jq will let you read it. Another / the same builtin would allow you to read dependency metadata. The pkg manager could (would) then use this to query modules' metadata: their version and dependency information.

I'm also thinking of a meta-module named jq by which to express dependencies on jq versions.

I don't understand the "no search" comment.

I don't want to have to have a jq version in the search path. I don't think jq should change so drastically that this should be necessary. It will suffice to have semantic versioning for modules and a way to express dependencies on jq versions.

In the jq libdir search list we'd have a directory per jq module, with the module code living in a .jq file.

Putting it all together we'd have:

$ cat $HOME/.jq/foo/foo.jq
module foo {version: 1.3};
import jq {version: 1.4};
import bar {version: {minor:3,micro:1}};
...
$ cat /usr/lib/jq/bar/bar.jq
...

$ jq -nc ' "foo" | moduleinfo '
{"name":"foo","version":1.3}
$ 

Arbitrary metadata would be allowed. Missing version info would be allowed too, why not.

What if we need multiple versions of a module installed? What do we do about DLL Hell?

A version museum might be necessary. In which case we probably do need a search by version, and we'd have a hierarchy a bit more like:

.../foo/v1.3/foo.jq   <--- specific version
.../foo/foo.jq        <--- default version

It's important to have the module name match a directory in the filesystem: so that it can be a git/whatever repo/workspace if that's desired. But I'd also allow .../foo.jq for locally-authored modules -- low ceremony == good.
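A rough Python sketch of candidate paths under such a "version museum" layout. The "v<version>" directory naming comes from the example above. The ordering, and the choice not to fall back to the default when a specific version is requested, are assumptions.

```python
from pathlib import PurePosixPath


def museum_candidates(libdir, name, version=None):
    """Candidate files for module `name` under libdir.

    With a specific version: only .../<name>/v<version>/<name>.jq.
    Otherwise: the default .../<name>/<name>.jq, then the low-ceremony .../<name>.jq.
    """
    root = PurePosixPath(libdir)
    if version is not None:
        return [root / name / f"v{version}" / f"{name}.jq"]
    return [root / name / f"{name}.jq", root / f"{name}.jq"]
```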

Deep hierarchies would be allowed. It's a small change to the lexer to allow arbitrary components separated by ::.

Thoughts?

@nicowilliams
Contributor

FYI, I have module and modulemeta working, though the latter does not yet return dependency information, and I've not yet added dependency metadata to the import directive (nor, therefore, semantic versioning when searching for dependencies).

@nicowilliams
Contributor

I think it will be a short hop and a skip to do the remaining bits of core work.

@nicowilliams
Contributor

I have all of that working, save for new tests and docs. The import directive's syntax has changed. There's no more search option, instead there's a metadata constant expression option, which is expected to be a constant object specifying the search path and/or version information. The modulemeta builtin now returns the module's metadata and its import metadata.

@nicowilliams
Contributor

On Mon, Aug 18, 2014 at 11:38 AM, Nico Williams nico@cryptonector.com wrote:

Also, I am still puzzled by your decision to break with tradition and
@wtlangford's work regarding being able to load PATHNAME/Foo.jq via "-L
PATHNAME". If we are to go down the "-L PATH" path, then a reasonable
search order for "-L PATH" would be:

Huh? -L... never did that, and I didn't change that behavior anyways,
certainly not on purpose. @wtlangford?

Oh, I misunderstood you. Anyways, as I explained, I want a
non-versioned location for non-versioned local modules, but then I
have to worry about ambiguities, such as "any::foo" vs. "foo" in the
unversioned location.

I can make the ambiguity go away, by getting rid of the "any" location
and making the versioned location always be a non-identifier name. I
thought of that but I found it weird, and anyways, the import code
hadn't shipped, so I felt free to make this change.

@wtlangford, thoughts?

@pkoppstein
Contributor Author

@nicowilliams wrote:

Oh, I misunderstood you.

Just to be clear, the main issue with -L at the moment is illustrated by this test case:

$ jq --version
jq-1.4-132-g287e2c9
$ file /tmp/Foo.jq
/tmp/Foo.jq: ASCII text
$ jq -n -L /tmp -f Foo.jq
jq: Could not open Foo.jq: No such file or directory

I want a non-versioned location for non-versioned local modules

So in the following, I'll assume that you want to have a single location for both "non-versioned" packages and modules. Is that assumption correct?

Regarding "any" and the potential for ambiguity:

First, I like your choice of the name "any". Assuming you still want a single "-L"-based mechanism for finding files and modules, then I think the following search order for finding Foo.jq given "-L PATH" would be OK:

  1. PATH/Foo.jq
  2. PATH/any/Foo/Foo.jq
  3. PATH/VERSION/Foo/Foo.jq # where VERSION is e.g. 1.4

That way, "import any::bar ..." would cause jq to go hunting in:

1a. PATH/any/bar.jq
2a. PATH/any/any/bar.jq # see (*) below
3a. PATH/VERSION/any/bar.jq # see (**) below
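The proposed order can be sketched in Python. The rule for turning a module name into a relative path is inferred from the two examples above and is an assumption: a single-component name Foo maps to Foo.jq in step 1 and to Foo/Foo.jq in steps 2 and 3, while a multi-component name such as any::bar maps to any/bar.jq throughout.

```python
from pathlib import PurePosixPath


def proposed_search_order(path, module, jq_version):
    """Candidate files for `import <module>` with `-L <path>`, per the proposed order."""
    parts = module.split("::")
    flat = PurePosixPath(*parts[:-1], f"{parts[-1]}.jq")  # Foo.jq, any/bar.jq
    # Inferred rule: single names get their own package directory, e.g. Foo/Foo.jq.
    packaged = flat if len(parts) > 1 else PurePosixPath(parts[0], f"{parts[0]}.jq")
    root = PurePosixPath(path)
    return [
        root / flat,                   # 1. PATH/Foo.jq
        root / "any" / packaged,       # 2. PATH/any/Foo/Foo.jq
        root / jq_version / packaged,  # 3. PATH/VERSION/Foo/Foo.jq
    ]
```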

That actually seems fine to me, but to alleviate any concerns about this, it is worth pointing out that:

(*) The use of the name "any" for a package name could be disallowed, in the sense that jq could simply skip both 2a and 3a in the special case of "import any:_".

(**) The maintainers of the official metadata repository could simply ensure that (3a) would not normally exist.

Pragmatically, I think that (**) should be more than enough.

@nicowilliams
Contributor

On Mon, Aug 18, 2014 at 11:22:25AM -0700, pkoppstein wrote:

@nicowilliams wrote:

Oh, I misunderstood you.

Just to be clear, the main issue with -L at the moment is illustrated
by this test case:

$ jq --version
jq-1.4-132-g287e2c9
$ file /tmp/Foo.jq
/tmp/Foo.jq: ASCII text
$ jq -n -L /tmp -f Foo.jq
jq: Could not open Foo.jq: No such file or directory

The -f argument does NOT search the library path. After all, the -f
argument is about a program, not a library file.

This has nothing to do with the "any" thing.

I want a non-versioned location for non-versioned local modules

So in the following, I'll assume that you want to have a single
location for both "non-versioned" packages and modules. Is that
assumption correct?

Not a "single" one, no. I want locations that aren't versioned so that
casual users needn't version their modules.

Regarding "any" and the potential for ambiguity:

The ambiguity was the result of choosing "next" for the
built-from-unreleased-code case. By using
${last_released_version}-master the ambiguity goes away (because
there's a "." in the version, which makes that not a valid part of a
valid module name).
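A small Python check shows why the "." removes the ambiguity. It assumes that a module-name component must look like a conventional identifier ([A-Za-z_][A-Za-z0-9_]*). That regex is an approximation for illustration, not a quote of jq's lexer.

```python
import re

# Approximation of an identifier-like module-name component (assumption).
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def can_be_module_component(dirname):
    """True if a directory name could be confused with a module name component."""
    return IDENT.fullmatch(dirname) is not None
```

"next" and "any" both pass, so a directory with either name could be mistaken for part of a module name. "1.4-master" and ".priv" cannot.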

I'll just remove "any" then.

@pkoppstein
Contributor Author

@nicowilliams

The -f argument ...

Whoops, I gave the wrong example. Sorry about that. See (**) below. At the moment I'm more concerned about non-versioned local modules. On the one hand, you wrote:

I want a non-versioned location for non-versioned local modules

But then you wrote:

I'll just remove "any" then.

To support non-versioned packages properly, jq has to be able to find them, and the idea is that jq should be able to do so without requiring any special configuration. That's why I like your idea of using the name "any", but the particular name is not as important as the principle.

(**)

$ cat /tmp/Foo.jq
module Foo {};

def hi: "Hello from /tmp/foo.jq";

$ jq --version
jq-1.4-132-g287e2c9

$ jq -L /tmp 'import Foo; hi'
jq: error: module not found: Foo

jq: 1 compile error


$ mv /tmp/Foo.jq /tmp/any

$ jq -n -L /tmp 'import Foo; hi'
"Hello from /tmp/foo.jq"

@wtlangford
Contributor

So, I'm back from a long couple of weeks. There's a lot happening here and that's quite exciting!

Looks to me like we've got an issue where we aren't sure how to handle modules that don't care about the jq version they were installed under/for?
I'm still a little unsure why the jq version is really necessary, though. If we intend to be using versions to identify which version of a module to load, then we're only interested in the module's version.

Have I missed something?

@pkoppstein
Contributor Author

@wtlangford asked:

Have I missed something?

Welcome back! The short answer to your question is "yes". It would be difficult to recapitulate everything, but I would ask that you look at the Julia package management system (http://julia.readthedocs.org/en/latest/manual/packages/) if you haven't already done so. The jqpm prototype that I've been working on is still closely based on that. And for good reason.

@nicowilliams doesn't like my use of the word "channel" in this context but an important distinction to be made is between different "channels" of package metadata. (This is not to be confused with the different "providers" of the packages themselves.) The idea is that there could be at least one "official channel" for Package metadata, as well as many "unofficial ones".

It is still my hope that jq and whichever package manager is ultimately used (whether implemented in jq or not) will work together seamlessly, and with NO configuration required to use "officially sanctioned" packages (i.e. packages whose metadata is included in the "official channel").

(@nicowilliams has used the term "blessed" in this context.)

p.s. See also https://sublime.wbond.net/docs/channels_and_repositories

@nicowilliams
Contributor

I did? Anyways, take a look at the commits I pushed tonight.

@nicowilliams
Contributor

@pkoppstein I should say I didn't remove "any/", I just made it the empty string. So now jq searches in these locations for each directory in the search path:

$dir/mod.jq
$dir/mod/mod.jq
$dir/1.4-master/mod.jq
$dir/1.4-master/mod/mod.jq

When 1.5 ships it will be:

$dir/mod.jq
$dir/mod/mod.jq
$dir/1.5/mod.jq
$dir/1.5/mod/mod.jq

And master after that will use:

$dir/mod.jq
$dir/mod/mod.jq
$dir/1.5-master/mod.jq
$dir/1.5-master/mod/mod.jq
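The same four-step order as a Python sketch. The version directory is derived from the last released version, with "-master" appended for builds from unreleased code. The function and parameter names are illustrative; only the resulting paths come from the lists above.

```python
from pathlib import PurePosixPath


def version_dir(last_release, is_release_build):
    """"1.5" for a released build, "1.5-master" for a build from unreleased code."""
    return last_release if is_release_build else f"{last_release}-master"


def search_candidates(search_dirs, mod, last_release, is_release_build):
    """All candidate files for `import <mod>`, in search-path order."""
    vdir = version_dir(last_release, is_release_build)
    candidates = []
    for d in map(PurePosixPath, search_dirs):
        candidates += [
            d / f"{mod}.jq",               # $dir/mod.jq
            d / mod / f"{mod}.jq",         # $dir/mod/mod.jq
            d / vdir / f"{mod}.jq",        # $dir/<version>/mod.jq
            d / vdir / mod / f"{mod}.jq",  # $dir/<version>/mod/mod.jq
        ]
    return candidates
```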

@pkoppstein
Contributor Author

@nicowilliams wrote:

Anyways, take a look at the commits I pushed tonight.

Thanks for restoring the normal -L behavior. I like the "*-master" convention too. However I am troubled by (2) in the search order:

(1) $dir/mod.jq
(2) $dir/mod/mod.jq
(3) $dir/1.4-master/mod.jq
(4) $dir/1.4-master/mod/mod.jq

First, that is a recipe for cluttering .jq.
Secondly, it makes it very difficult for any package management system to deal with multiple metadata repositories.

In any case, I would appreciate it if you could let me know the answer to the following question. If "snoopy" is a metadata repository that provides metadata for a package named Peanuts, and if "linus" is a metadata repo that provides metadata for a package named Blankie, then where would you recommend the two installed packages go (e.g. as a matter of best practice)? If jq were to include a package management system, what would you want it to do by default in this case?

Incidentally I think that (3) above is also ill-advised. The following would be both (a) shorter and (b) more attuned to packages:

(1) $dir/mod.jq # module-oriented
(2) $dir/any/mod/mod.jq # package-oriented
(3) $dir/1.4-master/mod/mod.jq # package-oriented

@nicowilliams
Contributor

@pkoppstein The non-versioned locations are not meant for package managers; they are meant for casual users, so I don't see why (2) should trouble you on that account.

I will consider removing (3), though it does mean more code. Package managers have to deal with conflicts anyways. But at least that'd be one fewer stat.

@pkoppstein
Contributor Author

@nicowilliams wrote:

at least that'd be one fewer stat.

Exactly. The same logic applies to (2) too. Really. Has anyone asked for it?

Also, if you drop (2) and (3) now, then after the dust settles, you'll be able to add a third location without having to worry about all the stats :-)

p.s. You didn't answer my main question, which was not intended to be about conflicts, by the way.

@nicowilliams
Contributor

Yes, possibly.

As to your question, I'm not sure. Consider having a different on-filesystem install location for each repo:

/opt/jq/lib/snoopy
/opt/jq/lib/linus

But then you can't mix and match easily.

Consider instead having a single install location and a pkg system that knows how to mix and match, and manage conflicts:

/opt/jq/lib

But now the pkg manager has to be smarter, and your choice of module mix from each repo has to be global to that location.

Consider instead having the repo name in the import directive. Well, we do have that, with the search key for the import metadata, but we'd really need a repo key as well, and we'd have to have this convention for search locations:

$dir/${repo:-some_default}/...

This last gives you the most power, but at the price of requiring imports (at least from programs, since modules can use search:"$ORIGIN/") to specify a repo.

And now, for the non-versioned location we'd have an ambiguity problem once again, unless we required repo names to not be identifier-like.

There's one more option, which is to do what Linux distros do: you can have any number of repos, but you can't really mix and match as there's a single namespace, and more-blessed repos take precedence over less-blessed repos. (Oh, yes, I probably did say something about blessed-ness earlier.)
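A minimal Python sketch of that single-namespace, pecking-order resolution. The repository names and the in-memory representation (an ordered list of (repo, modules) pairs, most blessed first) are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple


def resolve_by_precedence(repos: Iterable[Tuple[str, Set[str]]], module: str) -> Optional[str]:
    """Return the first (most blessed) repo that provides `module`, or None.

    Because there is a single namespace, a module provided by a more-blessed repo
    shadows any module of the same name from a less-blessed one.
    """
    for repo_name, modules in repos:
        if module in modules:
            return repo_name
    return None
```

With [("official", {"Shazam"}), ("shady", {"Shazam", "Peanuts"})], Shazam resolves to official, Peanuts to shady, and an unknown module to None. There is no way to get shady's Shazam without changing the order, which is the "can't really mix and match" limitation.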

My brain hurts thinking about this. Partly because we lack requirements, and partly because I can't say that I've yet used a language that got this just right.

I'm inclined to think that the only approach that works well is the single-namespace, many-repos-with-pecking-order approach. But it's true that we have a chance to do something different, and the repo import metadata approach is the only one that I think I would consider instead.

You're right that we need to get this nailed down soon, before 1.5 ships, as otherwise it will be harder to fix later.

Your preference?

@nicowilliams
Contributor

I should add that the one nice thing about the single-namespace approach is that it's easier to find modules and docs that way.

We could also play with the lexer's definition of IDENT, since the foo::bar::baz bits haven't shipped yet. We could make it so the first component of a module name can't be all-caps, or that none can be (except the last one, since that might be a def name, which are allowed to be all-caps now). Or some variant (e.g., first component can't start with an underscore). Or heck, repo names on-filesystem could have to start with a hyphen, or a percent sign, or some such character.

@joelpurra
Contributor

@wtlangford:

I'm still a little unsure why the jq version is really necessary, though. If we intend to be using versions to identify which version of a module to load, then we're only interested in the module's version.

Yes, me too. And I think a package should be able to define its own jq compatibility level.

Have I missed something?

Lots. I've entered the discussion preaching a node/npm style package manager. To me it seems the Julia option is far from great, and that the rest of the discussion shows that people are too used to ancient systems with only a single version of a package available at any one time and having to constantly adjust themselves (and each jq script they might run) to that. "DLL Hell", was it? I think the path will lead to people putting version information in the package names ("shazam-2") instead of in a metadata file ("shazam/jq.json") or git tags. But that applies when we're talking about a single shared package store, instead of keeping dependencies in a subfolder of the current jq script folder, which is the preferred way. We're talking a couple of kilobytes here, which would save a lot of people from headaches.

So, since the discussion is moving faster than I have time to code, I'm out. I still see the option of abusing the -L (or even -f) flags to build my own package manager, perhaps by the time people get version collisions across their own packages and projects.

@nicowilliams
Contributor

@joelpurra

I'm concerned about your comment about the discussion moving faster than you have time to code. Please come back into it when you have time. A 1.5 release is still a while away.

IMO major version numbers belong in the module name. The reason is simple: a) backwards incompatible changes make the module different anyways, b) providing multiple major versions with one module is out (because the namespace within is too flat), but c) it's nice to be able to have multiple major versions installed. Minor and micro versions are supposed to be compatible, so just one version installed is a reasonable thing to expect -- except that developers aren't good enough at preventing backwards incompatible changes from sneaking in! But putting every module's version number in the filesystem path will yield something fairly unwieldy :(

The jq version number in the path thing was @pkoppstein's idea and I forget how I came to be convinced it's not a bad idea, but I'm still unsure about it. I think partly the idea is that different versions of jq could coexist with each other and a pkg mgr. But yeah, I'm thinking it's more annoying than helpful.

@pkoppstein
Contributor Author

@joelpurra wrote:

I think a package should be able to define its own jq compatibility level.

That is supported by both julia and jqpm.

@wtlangford: Please note that it is important to distinguish between modules and packages. A package need not contain any modules. A module need not be in a package. A "package" can be thought of as the contents of a single directory, or perhaps as a single git or mercurial repository.

Since the package management system will no doubt itself evolve (in tandem with jq), a case could be made for having EVERYTHING related to packages being under ~/.jq/M.N. I'd be fine with that, but these and many other options seem to have been precluded by various constraints that @nicowilliams imposed.

In the following, I'll use the vocabulary of "channels" to avoid confusion with "providers". A channel is a repository of package metadata; a "provider" is the source of a package. You can think of distribution channels, but the package manager should be able to handle both public and private metadata repositories.

One of my main goals is to ensure that jq and the package management system can handle multiple channels seamlessly, and that no jq configuration is required to handle "blessed" channels. That is so that jq can have a "standard library" that need not be distributed with jq itself.

@joelpurra
Contributor

@nicowilliams:

I'm concerned about your comment about the discussion moving faster than you have time to code. Please come back into it when you have time. A 1.5 release is still a while away.

A POC npm style package manager script (plus jq wrapper script) doesn't sound too hard to create - it's just a matter of having/taking the time. I don't have a lot of time to spare.

IMO major version numbers belong in the module name.

Not at all. This is why semantic versioning exists, and why there should be a folder for each and every package version that has been installed user-/system-globally, which is dynamically referenced by a jq project's ./jq.json. But again, this is only for globally installed programs. (This is even if the package was loaded from a git/hg repository - that is not an issue, local parallel execution version selection is.)
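As a rough Python illustration of that layout: a global store holds one folder per installed <name>/<version>, and a project's ./jq.json pins which version to use. The "dependencies" key, the store path and the function name are assumptions modelled on npm's package.json. Only exact versions are handled here, not semver ranges.

```python
from pathlib import PurePosixPath


def resolve_pinned(store, project_manifest, name):
    """Folder of the version of `name` that the project's jq.json pins.

    store: root of a hypothetical global store laid out as <store>/<name>/<version>/
    project_manifest: parsed ./jq.json, e.g. {"dependencies": {"shazam": "2.1.0"}}
    """
    version = project_manifest["dependencies"][name]
    return PurePosixPath(store) / name / version
```

Two projects pinning shazam at 1.0.0 and 2.1.0 would then resolve to different folders of the same store, so both versions can be used side by side.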

I don't think you guys are considering project-local dependencies subfolders ./.jq/... and recursively installing dependencies enough. These folders need no version information in their names. I keep that as a separate thing. We're still talking a few kilobytes of storage space (in case you were worried about running out), maintained by a package manager.

I don't think that the import foo::bar::baz convention is good at all. Use strings for flexibility; they can point to package names, to relative paths, to namespaced packages etcetera.

What a "jq program" and what a "jq project" could be:
http://blog.nodejs.org/2011/03/23/npm-1-0-global-vs-local-installation

Where things end up:
https://www.npmjs.org/doc/files/npm-folders.html

How to specify a dependency's version range with a nice ruleset:
https://github.com/npm/node-semver

So, this has taken over an hour of my day already - time which I take from finishing my master's thesis. I still have strong feelings regarding proper package manager design, and think it's a shame to develop a brand new system based on the current direction. Since there seems to be no other way to convince you other than programming an alternative, we'll just have to wait and see if/when that happens.

@pkoppstein
Contributor Author

@joelpurra wrote:

Since there seems to be no other way to convince you other than programming an alternative ...

I think it would be very helpful if the implications for jq itself could be clarified. That is, if the package manager is implemented separately from jq, then what requirements are there on jq in order for (a) the two to interoperate seamlessly; and (b) for "import Shazam;" on the jq side to be able to find and load a "blessed" package named Shazam that has been installed by a user (without sudo) using the package manager, without ANY jq command-line directives (i.p., no -L options), and without any configuration of jq (i.p., no setting of JQ_LIBRARY_PATH).

@pkoppstein
Contributor Author

p.s. - @joelpurra has emphasized the need for the package management system to be project-aware. This can be trivially accomplished within the current jqpm (Julia-based) framework simply by adding a flag, e.g. "--project FOLDER". That would, in effect, direct jq and jqpm to use FOLDER/.jq instead of ~/.jq.
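The effect of such a flag can be sketched in a few lines of Python. --project is the proposed (not yet existing) option, and the function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def jq_root(project: Optional[str] = None, home: Optional[str] = None) -> Path:
    """Directory jq and jqpm would use: <FOLDER>/.jq with --project FOLDER, else ~/.jq."""
    if project is not None:
        return Path(project) / ".jq"
    return (Path(home) if home is not None else Path.home()) / ".jq"
```

Every other path (installed packages, metadata checkouts) would then be computed relative to jq_root(...), so two projects never share an installation unless they point at the same folder.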

Thus I think the current jqpm (Julia-based) approach meets what I understand to be Joel's core requirements (semantic versioning; support for multiple concurrent projects), as well as some of the preferences, e.g. use of DVCS repositories and avoiding the development of a brand new system.

@joelpurra
Contributor

@pkoppstein:

I think it would be very helpful if the implications for jq itself could be clarified.

Yes, I did that before. require(...) kind of matches import .... One difference is that require returns a reference to the imported module's exported api, which is determined by executing the module at runtime. Please look at these links again.
http://nodejs.org/api/modules.html
http://nodejs.org/api/modules.html#modules_all_together

What I think would be necessary in jq, is that import directives are scoped. This would mean that if shazam imports foobar, and foobar imports foobaz, then shazam would not be able to reference foobaz - only the public interface of foobar.

This also means that when jq is processing foobar during importing, it is starting off with a clean environment. foobar shouldn't be able to affect the global state and in that way affect shazam or any other parent or sibling packages. (At least not in the first iteration of development, and later only explicitly if deemed necessary by analyzing the flora of packages that have been released for possible improvements.)
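A toy Python model of the scoping being asked for: each module sees its own definitions plus the public definitions of the modules it imports directly, and nothing transitively. The Module class is purely illustrative and says nothing about how jq actually implements imports.

```python
class Module:
    """Toy module: its own public defs plus a list of directly imported modules."""

    def __init__(self, name, defs, imports=()):
        self.name = name
        self.defs = set(defs)
        self.imports = list(imports)

    def visible_names(self):
        """Own defs plus direct imports' public defs; imports of imports stay hidden."""
        names = set(self.defs)
        for dep in self.imports:
            names |= dep.defs  # only dep's own defs, not what dep itself imported
        return names


foobaz = Module("foobaz", {"baz"})
foobar = Module("foobar", {"bar"}, [foobaz])
shazam = Module("shazam", {"main"}, [foobar])
```

Here shazam.visible_names() is {"main", "bar"}: shazam can call foobar's bar but not foobaz's baz, even though foobar itself uses it.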

@joelpurra
Contributor

@pkoppstein:

This can be trivially accomplished within the current jqpm (Julia-based) framework simply by adding a flag, e.g. "--project FOLDER". That would, in effect, direct jq and jqpm to use FOLDER/.jq instead of ~/jq.

I strongly prefer inverting the default, and require a --global flag for non-local installations.
https://www.npmjs.org/doc/cli/npm-install.html

  • npm install installs all dependencies from ./package.json locally.
  • npm install shazam finds and downloads the latest shazam and installs it locally.
  • npm install --global my-node-program would install a user-global package, and if determined by looking at ~/.npm/.../my-node-program/1.2.3/package.json, also link any programs/executables to {prefix}/bin.

Thus I think the current jqpm (Julia-based) approach meets what I understand to be Joel's core requirements (semantic versioning; support for multiple concurrent projects), as well as some of the preferences, e.g. use of DVCS repositories and avoiding the development of a brand new system.

Not all of my core requirements. It is also leading down a dark path where there's only a single version of any package available to a user executing a jq script as everything is installed user-globally and without version-named subfolders. (Not talking about jq version at all.) This is bad.

There was something else too, but lost the thought.

@pkoppstein
Contributor Author

@joelpurra wrote:

there's only a single version of any package available to a user executing a jq script as everything is installed user-globally and without version-named subfolders.

No!!!! You evidently missed the point about the "--project PATH" flag. There will typically be a set of installed packages under ~/.jq, but each project can have its own set of installed packages that need have no relationship at all with each other or with the standard set under ~/.jq.

Suppose I have projects P and Q in ~/projects/P and ~/projects/Q. Then it would make sense to use "--project ~/projects/P" and "--project ~/projects/Q". This would mean that one set of jq-related packages and repositories would live under ~/projects/P/.jq; and another under ~/projects/Q/.jq. These would be in addition to and independent of whatever there may be under ~/.jq. Each project can have its own packages, each "checked out" as appropriate for the project.

For "system-wide" projects, one could specify "--project /usr/local/projects/P" or some such.

This scheme would also allow you to have more than one environment per project (e.g. one jq environment per branch).

@nicowilliams
Contributor

On Thu, Aug 21, 2014 at 12:37:36AM -0700, Joel Purra wrote:

A POC npm style package manager script (plus jq wrapper script)
doesn't sound too hard to create - it's just a matter of having/taking
the time. I don't have a lot of time to spare.

That's fine. I'll see if I can make one.

folder per each and every package version that has been installed
user-/system-globally, which is dynamically referenced by a jq
project's ./jq.json. But again, this is only for globally installed
programs. (This is even if the package was loaded from a git/hg
repository - that is not an issue, local parallel execution version
selection is.)

Thanks for your very clear explanation. I think jq is almost there, if
you'll bear with me.

I don't think you guys are considering project-local dependencies
subfolders ./.jq/... and recursively installing dependencies enough.

Although... with $ORIGIN/ you get to do just that:

$dir/                   # Some global libdir for jq modules, say

$dir/foo/foo.jq         # Imports bar from $ORIGIN/

$dir/foo/bar/bar.jq     # This is 'bar' when imported by 'foo' using
                        # $ORIGIN/, not 'foo::bar'!
                        #
                        # Though, oddly enough, it would also be
                        # reachable as foo::bar by anything that
                        # searches $dir.

$dir/bar/bar.jq         # this can be imported as 'bar'

How cool is that?

I.e., foo.jq might say:

import bar {search:"$ORIGIN/"};
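A loader honoring `$ORIGIN/` might resolve the import above roughly as follows. This Python sketch is an assumption for illustration, not jq's actual C implementation; it assumes the `NAME/NAME.jq` layout from the directory listing and that `$ORIGIN` expands to the directory containing the importing module. The helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

ORIGIN = "$ORIGIN/"


def expand_entry(entry: str, importer_dir: Path) -> Path:
    # "$ORIGIN/" and "$ORIGIN/.priv" are relative to the importing module's directory.
    if entry.startswith(ORIGIN):
        return importer_dir / entry[len(ORIGIN):]
    return Path(entry)


def find_module(name: str, importer: Path, search: List[str]) -> Optional[Path]:
    """Return the first NAME/NAME.jq found along the search path, or None."""
    for entry in search:
        candidate = expand_entry(entry, importer.parent) / name / f"{name}.jq"
        if candidate.is_file():
            return candidate
    return None
```

With `$dir/foo/foo.jq` as the importer, `find_module("bar", ..., ["$ORIGIN/"])` looks for `$dir/foo/bar/bar.jq`, which is exactly the private copy shown in the listing.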

We can even make it so bar cannot be accessed as foo::bar elsewhere
like so:

$dir/foo/foo.jq
$dir/foo/.priv/bar/bar.jq # Because the directory containing 'bar' has
                          # a '.' in it, it's not a valid IDENT.

And foo.jq would now have:

import bar {search:"$ORIGIN/.priv"};

And, the top-level bar can even depend on a private foo as well,
which might or might not be the same as the top-level foo.
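Why the `.priv` directory is unreachable by name can be shown with a small check. The sketch assumes, as the thread implies, that a qualified name like `foo::bar` maps to `foo/bar/bar.jq`, and that each component must match jq's identifier pattern (a letter or underscore followed by letters, digits, or underscores). The function name is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Identifier pattern assumed from jq's lexer: [a-zA-Z_][a-zA-Z_0-9]*
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def name_to_relpath(qualified: str) -> PurePosixPath:
    """Map 'foo::bar' to 'foo/bar/bar.jq', rejecting non-IDENT components."""
    parts = qualified.split("::")
    for part in parts:
        if not IDENT.match(part):
            raise ValueError(f"not a valid module name: {qualified!r}")
    return PurePosixPath(*parts) / f"{parts[-1]}.jq"
```

Since every path component produced this way is an IDENT, no qualified name can resolve to `foo/.priv/bar/bar.jq`; only a `$ORIGIN/.priv` search entry reaches it.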

Well, except for one thing: we treat the module names just like the C
run-time linker/loader treats ELF SONAMEs: if bar is already loaded
when foo gets loaded, then foo will get the loaded bar. We could
fix this by putting bar's version number in the SONAME (which I will
do).
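The SONAME-style problem, and the proposed fix of putting the version into the key, can be illustrated with a toy cache. Everything here (class, method names, versions, paths) is hypothetical; it only models the rule "the first module loaded under a given key wins".

```python
from typing import Dict, Tuple


class ModuleCache:
    """Toy cache of loaded modules: the first module loaded under a key wins."""

    def __init__(self, key_includes_version: bool) -> None:
        self.key_includes_version = key_includes_version
        self.loaded: Dict[Tuple[str, ...], str] = {}

    def _key(self, name: str, version: str) -> Tuple[str, ...]:
        return (name, version) if self.key_includes_version else (name,)

    def load(self, name: str, version: str, path: str) -> str:
        # Return the path of the module actually bound to this import.
        return self.loaded.setdefault(self._key(name, version), path)
```

Keyed by name alone, foo's private `bar` 2.0 silently resolves to the already-loaded top-level `bar` 1.0; keyed by (name, version), each gets its own. The versioned key only helps when the two copies actually carry different versions.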

$ORIGIN/ is a really, really good thing; we copied it for a reason :)

> These folders need no version information in their names. I keep that

Right. Good. I would not want to have to write portable directory
scanning code to deal with matching version numbers in file/directory
names!

> as a separate thing. We're still talking a few kilobytes of storage
> space (in case you were worried about running out), maintained by a
> package manager.

Indeed.

> I don't think that the import foo::bar::baz convention is good at
> all. Use strings for flexibility; they can point to package names, to
> relative paths, to namespaced packages etcetera.

Are you saying the use of an identifier is bad, the use of '::' is bad,
or both? Or perhaps you're concerned about how to make module-local
modules work? If the last, see above.

> What a "jq program" and what a "jq project" could be:
> http://blog.nodejs.org/2011/03/23/npm-1-0-global-vs-local-installation
>
> Where things end up:
> https://www.npmjs.org/doc/files/npm-folders.html
>
> How to specify a dependency's version range with a nice ruleset:
> https://github.com/npm/node-semver

Thanks, these are useful.
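As a taste of the node-semver ruleset linked above, here is a minimal Python sketch of its caret ranges: `^1.2.3` allows `>=1.2.3 <2.0.0`, `^0.2.3` allows `>=0.2.3 <0.3.0`, and `^0.0.3` allows `>=0.0.3 <0.0.4`. It handles only plain `MAJOR.MINOR.PATCH` versions (no prerelease tags, no other range operators) and is not node-semver itself.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(v: str) -> Version:
    major, minor, patch = (int(x) for x in v.split("."))
    return (major, minor, patch)


def caret_upper_bound(base: Version) -> Version:
    # The left-most non-zero component is the one that may not change.
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def satisfies_caret(version: str, base: str) -> bool:
    """True if `version` is in the caret range ^base (exclusive upper bound)."""
    v, b = parse(version), parse(base)
    return b <= v < caret_upper_bound(b)
```

Tuples compare lexicographically in Python, which is exactly the ordering needed for plain numeric versions.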

> So, this has taken over an hour of my day already - time which I take
> from finishing my master's thesis. I still have strong feelings
> regarding proper package manager design, and think it's a shame to
> develop a brand new system based on the current direction. Since there
> seems to be no other way to convince you other than programming an
> alternative, we'll just have to wait and see if/when that happens.

We're hardly done, and we're very much open to suggestions.

@nicowilliams
Contributor

On Fri, Aug 22, 2014 at 02:04:26AM -0700, Joel Purra wrote:

> What I think would be necessary in jq, is that import directives are
> scoped. This would mean that if shazam imports foobar, and
> foobar imports foobaz, then shazam would not be able to
> reference foobaz - only the public interface of foobar.
>
> This also means that when jq is processing foobar during importing,
> it is starting off with a clean environment. foobar shouldn't be
> able to affect the global state and in that way affect shazam or any
> other parent or sibling packages. (At least not in the first iteration
> of development, and later only explicitly if deemed necessary by
> analyzing the flora of packages that have been released for possible
> improvements.)

jq programs have no global state anywhere. Not programs, not modules,
no jq code keeps global state, full stop.

Variables are read-only. You can only create new bindings that can
shadow previous ones. And the only ways to keep modifiable local state
are:

  • reduce
  • foreach (new in master)
  • constructs using reduce/foreach
  • via recursion (pass your state as input to the same function)
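For readers more used to imperative languages, Python has close analogues of reduce, foreach, and recursion. The jq expressions in the comments are the intended equivalents; the Python functions are only an illustration of threading state through a computation without mutating anything shared.

```python
from functools import reduce
from itertools import accumulate
from typing import Iterable, List


def jq_reduce_sum(xs: Iterable[int]) -> int:
    # jq: reduce .[] as $x (0; . + $x)   -- only the final state is output
    return reduce(lambda acc, x: acc + x, xs, 0)


def jq_foreach_sum(xs: Iterable[int]) -> List[int]:
    # jq: [foreach .[] as $x (0; . + $x)]   -- every intermediate state is output
    # accumulate(..., initial=0) also yields the initial 0, which jq does not emit.
    return list(accumulate(xs, lambda acc, x: acc + x, initial=0))[1:]


def sum_by_recursion(xs: List[int], acc: int = 0) -> int:
    # State is passed along as an argument instead of being stored anywhere.
    return acc if not xs else sum_by_recursion(xs[1:], acc + xs[0])
```

In every case the "state" is just a value handed from one step to the next, so nothing outside the expression can observe or change it.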

Values are also read-only. When you use the assignment operators those
output a new value that (notionally) is a modified copy of their input.

(I say notionally because in the common case no copy is needed, so that
the operation is cheap. The copy is notionally a deep copy, but in
practice even when a copy is needed only the nodes in the path to the
node that is modified are copied.)
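The "only the nodes in the path are copied" behavior can be sketched with Python dicts and lists standing in for jq values. This illustrates path copying under stated assumptions and is not jq's implementation: jq additionally skips the copy when it holds the only reference, whereas this hypothetical `set_path` always copies the containers along the path.

```python
from typing import Any, Sequence


def set_path(value: Any, path: Sequence[Any], new: Any) -> Any:
    """Return a modified copy of `value`; only containers along `path` are copied."""
    if not path:
        return new
    head, rest = path[0], path[1:]
    if isinstance(value, dict):
        copy = dict(value)  # shallow copy: untouched siblings are shared, not duplicated
        copy[head] = set_path(value.get(head), rest, new)
        return copy
    if isinstance(value, list):
        copy = list(value)
        copy[head] = set_path(value[head], rest, new)
        return copy
    raise TypeError(f"cannot index {type(value).__name__} with {head!r}")
```

After `out = set_path(doc, ["a", "b"], 42)`, `doc` is unchanged, `out["a"]` is a fresh object, and `out["c"]` is the very same object as `doc["c"]`.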

:)

@nicowilliams
Contributor

On Fri, Aug 22, 2014 at 02:15:27AM -0700, Joel Purra wrote:

> I strongly prefer inverting the default, and require a --global flag
> for non-local installations.

Because of $ORIGIN/ (and once I fix the "SONAME" construction, or
perhaps simply drop the attempt to cache compiled modules) you basically
get control over this today (in master). All you need to do is say:

import foobar {search:"$ORIGIN/"};

If that seems too obscure/verbose I might add a short-hand like so:

import foobar {location:"private"};
import baz    {location:"global"};

or

import foobar {private:true};
import baz    {};
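The proposed short-hands could desugar to ordinary search paths. The following Python sketch assumes that `private:true` and `location:"private"` both mean `["$ORIGIN/"]` and that everything else falls back to the library path; the option names come from the proposal above, but the function and the rule that an explicit `search` key takes precedence are hypothetical.

```python
from typing import Dict, List


def desugar_import_opts(opts: Dict[str, object], library_path: List[str]) -> List[str]:
    """Turn import metadata into a concrete search path (hypothetical rules)."""
    if "search" in opts:
        # An explicit search path wins over any short-hand.
        search = opts["search"]
        return [search] if isinstance(search, str) else list(search)
    if opts.get("private") is True or opts.get("location") == "private":
        # Private modules live next to the importing module.
        return ["$ORIGIN/"]
    # {} and location:"global" use the ordinary library path.
    return list(library_path)
```

Keeping the short-hands as pure sugar over `search` means the loader itself only ever has to understand one mechanism.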

@nicowilliams
Contributor

  • I'm thinking that I should make the default search path for all imports be ["$ORIGIN/"] + $JQ_LIBRARY_PATH. Then import foo; will do the right thing if foo is "local" to the module doing the import.
  • I'll drop the jq version number business.
  • I'll come up with a better compiled-module cache key. On POSIX systems we could use {st_dev, st_ino}; elsewhere we could use the last component of the module's name and the version metadata from its module directive -- or just memcmp() the modules' names and then their source code. Or even just drop the cache, as it's only an optimization.
  • I'll add semantic version matching. I'll include support for decomposed version numbers, but I'll probably also have support for real numbers as major-less version numbers.
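The first and third items can be sketched in Python. Assumptions: `JQ_LIBRARY_PATH` is taken to be a list of directories separated by the platform's path separator, and the non-POSIX fallback keys on the module file's stem plus its source bytes (one of the options listed). Both helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Mapping, Tuple, Union

CacheKey = Union[Tuple[int, int], Tuple[str, bytes]]


def default_search_path(env: Mapping[str, str]) -> List[str]:
    # Assumption: JQ_LIBRARY_PATH separates directories with os.pathsep (":" on POSIX).
    library = [p for p in env.get("JQ_LIBRARY_PATH", "").split(os.pathsep) if p]
    # Modules next to the importer are found first, then the library path.
    return ["$ORIGIN/"] + library


def module_cache_key(path: Path) -> CacheKey:
    if os.name == "posix":
        # (st_dev, st_ino) identifies the file no matter which path reached it.
        st = os.stat(path)
        return (st.st_dev, st.st_ino)
    # Fallback: the module's last name component plus its exact source bytes.
    return (path.stem, path.read_bytes())
```

On POSIX, `lib/bar/bar.jq` and `lib/bar/../bar/bar.jq` yield the same key, while two distinct files named `bar.jq` in different directories do not, which is what the loaded-module cache needs.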

@nicowilliams
Contributor

You can see some of this work in progress in the modsys3 branch of my fork of jq.
