-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache run exports are propagating to outputs #977
Comments
I should note that I'm testing with the latest released version 0.18.1, not the trunk. I checked the current diff with main and didn't see anything relevant, but wanted to mention it in case I was missing something. |
Hi @vyasr - thanks for filing this bug. The intended behavior is the one mentioned in the docs, so it does seem like you found a bug! I would also be curious if you have thoughts in general on this? The previous multi-output / cache thing from conda-build didn't really specify any of this as far as I can tell. I'll try to get to this bug this week! |
The documented behavior is definitely what I would expect. You are right that the conda-build implementation of multiple outputs didn't specify any of this cleanly, and also has a lot of sharp edges that we've hit in different scenarios, so I'm happy to see this being redesigned from the ground up in rattler-build and in the new spec (I see that at some point it was being discussed but was removed from CEP 14 for now). In general for a multi-output recipe there's no way for rattler-build to know what cache dependencies to propagate to which outputs, so it needs extra information from somewhere. The ability to specify ignoring run exports at both the cache and output level seems like a pretty good way to handle this. Another option would be to introduce a new syntax for just the cache to indicate which exports should propagate to which outputs, but I don't really like that since it introduces new and more complex syntax with little benefit. The more extreme option that I could see being viable is for run exports to not propagate out of the cache at all. This would require more work on the part of recipe maintainers to specify everything on a per-output basis. I expect that for the majority of recipes that have a small number of outputs this would be much more painful, but for the small set of recipes that have many outputs it could be better. It would typically result in much more minimal dependency trees though since it reduces the risk of accidental dependency propagation. That would be one reason to consider it. |
One idea I had was to trace if an output depends (exactly?) on another output that already had the run exports applied, and if that's the case not re-depend on the same run exports. But it's just a small optimization that mainly decreases the size of the repodata and probably shouldn't really affect the solver speed etc. Curious about your thoughts :) |
tl;dr that seems like a good idea and I could get on board with that. On one hand, that seems like a bit of code smell to me. It has the same flavor of failing to "include what you use" in compiled languages, or in Python relying on a transitive package dependency to express a direct dependency. If package foo depends on both bar and baz, and bar depends on baz, then foo doesn't have to specify baz in the package metadata to have it be present in any environment that it installs. The risk is that bar stops needing baz, and all of a sudden foo breaks because an expected dependency is missing. On the other hand, in this case the dependencies are much more tightly coupled. Since it is between outputs in a given recipe, if the dependency in one output was removed then the dependents of that output would immediately see that change and add them, so there's no risk of getting out of sync. Therefore I think it's probably fine to do this. The case where it could improve the solver speed is if there are outputs that are not directly dependent on each other, but that's probably pretty rare. |
You could try again 0.19.0 which just landed on |
Was on holiday the past couple of weeks. Just verified that this works now, thanks! |
Awesome! |
I'm not sure if this is a bug or just me misreading the documentation. The documentation on the experimental cache feature states that
Since the cache does not itself produce any outputs, I assumed that the above statement meant that ignoring run exports in the cache would propagate to any outputs. However, that does not appear to be the case. Given the recipe below:
I observe the following:
If I include the commented out ignore run export in the package output section, then the numpy dependency is removed as I intended. Is that supposed to be necessary? If so, the documentation on the cache should be updated. However, I'm not sure what purpose ignoring run exports in the cache would serve if those run exports are propagated to the outputs.
The text was updated successfully, but these errors were encountered: