Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically figure out dependencies for the project #165

Closed
f-f opened this issue Mar 27, 2019 · 14 comments
Closed

Automatically figure out dependencies for the project #165

f-f opened this issue Mar 27, 2019 · 14 comments

Comments

@f-f
Copy link
Member

f-f commented Mar 27, 2019

Currently when one adds a package to the packages.dhall the expectation is that spago will also pick it up as a dependency...but no, build dependencies need to be specified in the spago.dhall, as the packages.dhall only contains the pool of available packages.

I think this is not good UX and the user should not need to specify the dependencies in spago.dhall, so the idea here would be to figure out the package names of the dependencies we're using from the import statements in the source files.
We can do this because by definition of package set a module name can refer to at most one package.

How to do this: there was an experiment for this in psc-package from @kritzcreek here, calling the compiler to parse the files, get out the import statements, and figuring out the packages from there.

If we're able to do this reliably then we could:

  • remove the dependencies key in the spago.dhall
  • remove the separation between "dependencies" of your project and "packages in the package set" - every package in the package set would be importable right away
  • unify the two configuration files since there wouldn't be much confusion between the two sets of "dependencies"
@kritzcreek
Copy link
Member

In order for this to work without having to download all the packages in the package set your project is using, you'd also need to add metadata to a package-set file which contains the mapping from PackageName -> [ModuleName].

That metadata would need to be rebuilt whenever you merge a PR into the packageset.

@f-f
Copy link
Member Author

f-f commented Mar 28, 2019

@kritzcreek good points. Some ideas and questions:

  • downloading all the packages is not that crazy of an idea once we have global caching (tracked in Local Global Cache of Packages #133). I'm more worried about parsing time for all source files (at least with the current parser)
    Moreover, building the metadata on package-sets CI is fine, but since you can override/add packages in your project we'd still need to do it locally too
  • if the compiler is package-aware, do we still need this metadata? Because we would know which package to refer to from the parse, and we wouldn't need to get this information from elsewhere

@kritzcreek
Copy link
Member

Moreover, building the metadata on package-sets CI is fine, but since you can override/add packages in your project we'd still need to do it locally too

Your override/addition would need to change the package -> module mapping accordingly, or spago would need to make sure all additions and overrides are present and regenerate the mapping information for those locally. I think ideally the metadata generator lives in a different binary than spago though, so we can avoid the large dependency footprint.

in the compiler is package-aware, do we still need this metadata? Because we would know which package to refer to from the parse, and we wouldn't need to get this information from elsewhere

That would only be true if we required that every import statement mentions the package it's importing from. I think it's very unlikely that we can require that as it would break every PS file in existence. That's going to need a fair amount of thought still (it might be possible to require it if we can provide an automatic migration tool)

@f-f
Copy link
Member Author

f-f commented May 1, 2019

I think ideally the metadata generator lives in a different binary than spago though, so we can avoid the large dependency footprint

@kritzcreek I just realized that purs itself could implement this - spago depends on it already so this would avoid the operational complications of shipping another executable.
Would this be a welcome addition to the compiler?

@f-f f-f added the blocked label May 5, 2019
@f-f f-f added this to the 1.0 milestone May 6, 2019
@f-f
Copy link
Member Author

f-f commented May 12, 2019

Update: purescript-nix generates the aforementioned metadata file by implementing a small parser for imports. Since the imports syntax is not going to change dramatically anytime soon, maybe this is a viable approach..

@f-f
Copy link
Member Author

f-f commented May 12, 2019

There were questions on Slack from @reactormonk:

How would auto-completion work with this though?

With the current idea purs ide would suggest you stuff from the packages you already depend on.
To add another package you’d have to add one import from that package and then restart purs ide (this is basically the same as it is right now, where you have to instead add the package to spago.dhall and then restart)

Does the LSP support different relevantness of autocomplete? Otherwise you could complete all of spago 😄

The problem would be that you’d have to include the whole of the package set in the purs ide call, which would mean downloading the whole package set (which as I said above is not a crazy idea, but it starts to be borderline)

Could be preprocessed

If spago (or the compiler) would generate a metadata file from the package set with the mapping “module → package” (and/or “definition → modules”) then we’d know which dependencies we have from the files.
However, in purs ide we’d need to figure this out “dynamically” (because you might want names from a new dependency), and I think this would require passing this metadata file to purs ide? (so it could figure out things about packages that are not passed as globs, so packages that it didn't parse)
An alternative would be to have purs ide preprocess all of the package set, which would basically be equivalent to passing the whole package set as globs to purs ide though? (cc @kritzcreek)

@hdgarrood
Copy link
Contributor

@jmackie has proposed a new purs subcommand which would parse and analyse the modules you feed it, and tell you about how they depend on each other: purescript/purescript#3635. If I've understood correctly, adding this subcommand would mean that spago wouldn't have any need to depend on the compiler as a library to be able to do this properly.

@f-f
Copy link
Member Author

f-f commented May 27, 2019

@hdgarrood I would say so, yes!

E.g. this is what you get for the Main module of the sample project generated by spago init:

  "Main": {
    "path": "src/Main.purs",
    "depends": [
      "Prelude",
      "Control.Applicative",
      "Control.Apply",
      "Data.Functor",
      "Data.Function",
      "Control.Category",
      "Control.Semigroupoid",
      "Data.Boolean",
      "Data.Ord",
      "Data.Eq",
      "Data.HeytingAlgebra",
      "Data.Symbol",
      "Data.Unit",
      "Data.Show",
      "Record.Unsafe",
      "Type.Data.RowList",
      "Type.Data.Row",
      "Data.Void",
      "Data.Ord.Unsafe",
      "Data.Ordering",
      "Data.Semigroup",
      "Data.Ring",
      "Data.Semiring",
      "Control.Bind",
      "Control.Monad",
      "Data.BooleanAlgebra",
      "Data.Bounded",
      "Data.CommutativeRing",
      "Data.DivisionRing",
      "Data.EuclideanRing",
      "Data.Field",
      "Data.Monoid",
      "Data.NaturalTransformation",
      "Effect",
      "Effect.Console"
    ]
  }

Then to get the package names one has to lookup for all the other names in the purs graph result, and do the match "path → glob → package name" (which should be pretty straightforward)

However I'm still about concerned about the UX implications of all of this (see my previous comment), but maybe it will be fine (e.g. it would work so that to add a dependency one would have to import a module from that package and restart purs ide)

@joneshf
Copy link
Member

joneshf commented Jun 22, 2019

An alternate to removing dependencies and parsing .purs files could be to add more information to the manifests, keep the dependencies, but use them strictly. I.e. if you only list console as a dependency all you have access to are Effect.Console and Effect.Class.Console. The extra information would be listing the modules each package exposes and where they are located on disk. Given those two changes, the information about which modules/files are necessary for spago build and friends could be computed extraordinarily quickly. Generating the list of modules should be something that could happen as a migration once and going forward make a requirement for future packages.

Some benefits to this approach:

  • There's no need to wait on (or depend on) upstream releasing an import parser. It's not clear when that's going to happen.
  • It's statically known where every dependency module lives. There's no need to glob anything. Generating the list of sources boils down to concatenating lists of file paths.
  • It makes it possible to actually have "internal" modules (don't list them in the module list). Right now, every .purs file that exists in the src directory is always available to be imported and used, even if your package has invariants that the "internal" module not be used in any way.
  • Auditing dependencies becomes loads easier. The current state of affairs makes it much harder than necessary to know which packages are actually being used and which are along for the ride. By explicitly stating dependencies, and having the mapping statically known, auditing dependencies can happen with tooling that only needs to understand the manifests.

I'm not pushing for this approach, but wanted to throw it out there in case it was appealing.

@f-f
Copy link
Member Author

f-f commented Jun 23, 2019

@joneshf thanks for bringing this up! I did consider this kind of approach before (as this is what you do with Cabal too), and I think that:

  • explicitly listing the modules in the package manifests is kind of orthogonal to this issue (which is about removing the need for the user to declare the packages they depend on basically twice: once in the sources and once in the spago.dhall)
  • explicitly listing the modules is not very good UX anyways IMO, as that information is already in the source tree (this implies that I also disagree on the "internal modules" point, I think they should always be exposed and people free to use them)
    I recall your experiment of having Dhall download PureScript sources as Text, and I think the idea is absolutely brilliant (because you can just use Dhall to get checksums, etc) but I wouldn't really want to manually list my source files in the manifest, so the UX there is tricky 😄
  • re waiting for upstream: I'd be fine just with the "graph command" from Add graph command for graphing module dependencies purescript#3635 (which I think is better than depending on a parser), and I'm also fine waiting - after all the current situation is not optimal but it's also not super bad so we can wait for a bit
  • re auditing dependencies: if we manage to get rid of the dependencies list it will also get easier because the dependencies will dynamically come from source so it cannot get out of sync with reality (i.e. you don't risk anymore leaving in a dependency you don't use)

@joneshf
Copy link
Member

joneshf commented Jun 23, 2019

No worries. As mentioned, I'm not trying to push these changes on you.

  • removing the need for the user to declare the packages they depend on basically twice: once in the sources and once in the spago.dhall

Oh, that's the two places? I thought we were talking about packages.dhall and spago.dhall. Sorry for misunderstanding.

  • re auditing dependencies: if we manage to get rid of the dependencies list it will also get easier because the dependencies will dynamically come from source so it cannot get out of sync with reality (i.e. you don't risk anymore leaving in a dependency you don't use)

Doesn't this get harder? You have to parse every module in the source tree, then download and parse every package in the package set to find which package contains the modules, then you can audit the dependencies.

@f-f
Copy link
Member Author

f-f commented Jun 24, 2019

Doesn't this get harder? You have to parse every module in the source tree, then download and parse every package in the package set to find which package contains the modules, then you can audit the dependencies.

Oh that's right, that would add an additional step because you still need to compute somewhere the map module name → package name for all the packages in the set. We could ship a cache with the package set (we already ship some of this stuff to make the global cache work) to skip most of the work, but the implementation should still be there because people can have overriden and/or local packages.

@Benjmhart
Copy link
Contributor

after looking at this issue, it appears to be blocked waiting for the graph command, subsequently #415 is blocked as well

@f-f
Copy link
Member Author

f-f commented Apr 7, 2021

Now that we shipped a build-time check for "packages you should have in your project" (in #730), and given that we'll soon change the config format to be able to support version ranges, I don't think this is viable anymore - I'll close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants