-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disintegration of MLJModels #244
Comments
See also @tlienart 's earlier post at JuliaAI/MLJ.jl#276 |
Great, and just to stress it, we don't want this to be the rule, we want this to be a temporary solution and hope that the packages for which we end up writing glue code will integrate and own said glue code (we understand that some of these are slow moving packages with a lot of legacy and may be slow integrating stuff...) We definitely do not want other package devs with brand new cool packages to suggest a glue package. They should just own the interface like ParallelKMeans, EvoTrees or MLJLinearModels |
FWIW, we now have support for storing multiple registered Julia packages inside subdirectories a single Git repository. So e.g. if I already have a Git repository for my package |
This works exactly the same as if |
that's great! though here I think that having separate repos has the advantage that when eventually the original package (e.g. MultivariateStats) owns the glu code, we can just archive the repo whereas here you'd have to remove a subrepos which might be messier to maintain overall (?) |
So it's not a subrepo or anything fancy. It's just a regular folder. So if you want to move the glue code to the main package, just rename and move the files. |
Does that explanation make sense? Here's an example repo with two packages: https://github.com/DilumAluthge/FooBar There's nothing fancy, just regular folders. If you want to move the |
@DilumAluthge How do release notes work in a Repo hosting multiple packages? |
The idea is that you would make tags that look like this:
In the appropriate tag, you have release notes for the relevant package only. |
You can have any system you want of course. That is just a suggestion. |
I think this is cool but I also think that in this particular case having separate repos is simple & would help see clearly where the code ought to be & the git history, I have a feeling this will cause more headaches especially for new maintainers. Our end goal here is that this code ends up outside MLJModels and is integrated in other packages so I think having subfolders (I understand it's not subrepos) just does not seem to be the right approach here even if it's fully supported by Pkg. |
Yeah, given the long term goal here, I think it makes sense to keep these "glue" packages in separate Git repositories. |
Packages with implementation code to migrate out to separate package:
Those marked with * are candidates for algorithm-providing package to host model implementation code. Others get migrated to new package called "MLJPackageInterface". Important The name = "Birch",
package_name = "ScikitLearn",
load_path = "MLJScikitLearnInterface.Birch" cc @tlienart |
The MLJ.jl README has a very nice visualization of the dependency graph, stored in the repo at https://github.com/alan-turing-institute/MLJ.jl/blob/master/material/MLJ_stack.svg As we break up MLJModels and MLJBase, we should definitely remember to keep this image up-to-date. |
All done 🎉 Thanks to all who helped in this large project, especially @OkonSamuel @tlienart and @ExpandingMan |
Maintenance of MLJModels has become increasingly burdensome for several reasons. Perhaps the biggest problem is that it's centralised approach to providing model API implementations ("glue code") means:
Testing takes a long time. Tolerable, just.
[extras] must include a very large number of packages which invariably cause fatal version conflicts with the packges in [deps] during CI. (As I understand it, during a test, the [deps] are essentially pinned when [extras] are loaded.) Less tolerable.
With the existing package manager, we have no way to specify bounds on the algorithm-providing packages (the ones in [extras]). The latest release compatible with [deps] always get's loaded. If just one (of these many) packages makes a breaking change to the "glue code", then MLJModels CI fails.
While JuliaLang/Pkg.jl#1285 may help with the second and third issue, I don't think that is close to being resolved. We have also observed elsewhere that code loading using Requires.jl can be slower than otherwise. (And there is #243)
While the plan has always been for all algorithm-providing packages to implement their MLJ interfaces natively, this is not going to happen quickly. In the meantime, it would be good to address the issues above.
In discussions of the core team, @tlienart has suggested the following remedy: Move the glue code for each package X into its own repository Xglue, with its own testing, and make X an ordinary dependency of Xglue. (The package Xglue would be purely a "utility" package and essentially invisible to general users.)
Such migrations could be performed incrementally and I believe each migration would trigger only a patch release, basically because the model registry (which tracks where to find glue code) is part of MLJModels (and so is opaque to MLJ).
I think this a good idea and propose beginning this disintegation; See TODO list.
cc @DilumAluthge
The text was updated successfully, but these errors were encountered: