Trouble with categorical(v) when v is a SentinelArrays.ChainedVector #361
@quinnj and @nalimilan - this one is a tough issue. We need to decide which package should depend on which. |
Hmmm, bummer. Yeah, I'd rather not have to have either depend on each other; I wonder if we can play with the method signatures to avoid the ambiguities here. |
The problem is that on a conceptual level:
So there is no way to avoid ambiguity I feel. If we have to I would rather make CategoricalArrays.jl take SentinelArrays.jl as a dependency. Let us wait to hear what @nalimilan thinks. |
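For illustration only, here is a minimal sketch of how this kind of ambiguity can arise, assuming one package specializes a function on the destination type and another specializes it on the source type. All names (`CatVec`, `ChainVec`, `fill_from!`) are hypothetical stand-ins, not the actual CategoricalArrays.jl or SentinelArrays.jl definitions:

```julia
# Hypothetical stand-ins for the two array types; neither "package" knows about the other.
struct CatVec end      # plays the role of the destination type
struct ChainVec end    # plays the role of the source type

# "Package A" specializes on the destination only.
fill_from!(dest::CatVec, src) = :dest_method
# "Package B" specializes on the source only.
fill_from!(dest, src::ChainVec) = :src_method

# Neither method is more specific than the other for this call,
# so Julia raises an ambiguity MethodError.
ambiguous = try
    fill_from!(CatVec(), ChainVec())
    false
catch err
    err isa MethodError
end

# A method that is specific in both arguments removes the ambiguity,
# but whoever defines it must be able to see both types.
fill_from!(dest::CatVec, src::ChainVec) = :resolved
resolved = fill_from!(CatVec(), ChainVec())
```

The last definition is the one that needs both types in scope, which is why the question of which package depends on which (or of loading the method conditionally) comes up.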
Yeah I am having issues with: transform!(df, [:col1, :col2, :col3, :col4] .=> categorical; renamecols=false) If I import a |
I'd rather not add yet another dependency to CategoricalArrays that isn't strictly needed to use it (we already have JSON, JSON3, StructTypes and RecipesBase). Also, the same issue will affect PooledArrays once we implement the same optimized It looks like this is a problem that should be fixed in Julia. In the past, some functions have been implemented in a way that avoids ambiguity issues. There might be a way to define internal methods so that priority is given to the destination over the source type (or the reverse). Ideally IIRC it's been discussed to allow each package to specify a number giving the priority level of the method, or to say that calling any method defined for |
Yes - hopefully it will be fixed in Julia in the long term. However, we need to decide on something before that happens (and in practice it will not happen fast, I am afraid). I will ask on Slack in #internal for opinions on what to do. Maybe there will be some reasonable suggestion. |
Given the discussion on Slack I do not think this is going to be resolved soon. @nalimilan - would you agree to take SentinelArrays.jl as a dependency then? (I do not like this solution, and I hope it is temporary, but I do not see any other work-around for the time being) |
I guess one solution would be to use Requires.jl for this, and also make JSON, JSON3, StructTypes and RecipesBase optional dependencies too. |
It seems like the best solution for now. One question regarding Requires.jl - does package loading sequence matter here or not? |
Our experience in MLJModels using Requires.jl for optional model-code loading was painful, and we ultimately abandoned it. We had latency issues (packages pre-compiling twice) and we suspected it was responsible for problems we were never able to resolve before abandoning its use (eg, JuliaAI/MLJ.jl#321). I think Requires is a very clever piece of software, but it is essentially a hack; and I have heard both its author and Jeff describe it so. While it's likely the authors of CAs are more familiar with the correct use of Requires.jl than I am, my suggestion is that they think two or three times before choosing that route. Personally, I'd rather live with the hard-wired dependencies until Julia comes up with a better solution to this kind of thing. |
I can't speak for pre-compiling twice (that sounds bad...), but Requires got a bit less hacky when all "registration" moved into the module using MLJModels.XGBoost_ is XGBoost_ loaded conditionally? It may be an order-of-events issue. There might be something fixable about this, but it would probably take some investigation. Requires hasn't changed its design much since Julia 1.0 came out, but it has gotten a lot more lightweight than it was back in 2019. I think it contributed ~0.5s to loading time back then, and we've gotten it down to about 1/10th of that. The latest was JuliaPackaging/Requires.jl#101, but JuliaLang/julia#37574 had already gotten it down considerably by that point. |
@timholy - thank you for commenting on this. Also, you are probably the best person to answer my earlier question:
Thank you! @nalimilan - shall we go for Requires.jl then? (and I assume it should go into CategoricalArrays.jl) |
Yes, perhaps it's worth trying out Requires.jl again; which we could probably use for all those "extra" CategoricalArray dependencies (StructTypes.jl, JSON.jl, etc.). |
It seems like it shouldn't, but... The reason for my thinking is this:
If there are other callbacks, though, perhaps that could mess some stuff up? |
Upon further reflection, it seems possible that the following may not work (not sure, but be on the lookout for such issues):
    module Inner
    # stuff
    function __init__()
        @require ...
    end
    end
    module Outer
    using Inner
    # some top-level expression that expects the `@require` block of `Inner` to have already run
    end
followed by
    julia> using Outer
All would be fine if you typed |
Thank you for thinking about it. Fortunately with CategoricalArrays.jl and SentinelArrays.jl we shall not have this problem. |
OK, so let's try Requires.jl then. If it creates problems it's easy to move to standard dependencies. |
In JuliaData/DataFrames.jl#2883 the problem was reported again. |