Suggestion: Package scripts in a Conda repository #2
Hi @bebatut, I have had this request before, but it's a slightly odd packaging situation. The main contributions of this repo are (1) scripts, which can be conda-packaged easily, and (2) a set of metadata files. It's a bit awkward to package metadata files in a conda package, because you'll have to access them e.g. To me, it's simpler to have it all in a repo, where it's obvious where your metadata files are located. Let me know what you think.
I agree: the metadata files are large and should not be shipped with the package. Other packages with a similar problem take a few different approaches:
Do you think any of those would work?
These sound reasonable, but I'm trying to understand the motivation for this. Why isn't cloning the repo sufficient, or something like
@bluenote-1577 sorry to chime in, but it's a good question, and a colleague of mine had already started writing a conda recipe before seeing this issue, as we are also looking to make a wrapper (but for Nextflow). @bgruening is by far the expert here when it comes to conda, but in my experience, forcing data to ship with the software in a conda recipe can be really annoying and a blocker for some people (even if the data is small!).
When it comes to (large) data, people often want to place it in a cache location so it can be reused across different contexts, and so they can delete a conda environment and start again if something breaks. Including the data in the conda package also makes the package much larger, bloats conda's own cache on a person's machine, and can make installation much slower for people with slow internet connections. With the data kept separate, one person can download it to a shared cache, and everyone else can quickly install the software in their own environments as required, without downloading the data every single time.
Another benefit of keeping data outside the conda environment is for developers: during CI testing they can download the data once and then, for each test, create an environment containing just the software (which is fast when it doesn't come with data), speeding up the tests.
In that regard, if I had a say in the matter, I would vote for option two or three of @bgruening's suggestions :)
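The "download once to a shared cache, reuse from many environments" idea can be sketched in Python as below. This is an illustrative sketch, not how sylph-tax or any conda tooling actually does it: `ensure_cached` and the `fetch` callback are hypothetical names, and `fetch` stands in for whatever downloader a workflow uses.

```python
import os
from pathlib import Path
from typing import Callable


def ensure_cached(cache_dir: Path, name: str, fetch: Callable[[Path], None]) -> Path:
    """Return the cached copy of `name`, calling `fetch` only if it is missing.

    `fetch` is a hypothetical downloader that writes the file to the path it
    is given. It runs at most once per cache directory, however many conda
    environments later read the same file.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Download under a temporary name first so an interrupted download
        # never leaves a half-written file at the final path.
        partial = target.with_name(target.name + ".partial")
        fetch(partial)
        os.replace(partial, target)
    return target
```

Because the cache lives outside any environment, deleting and recreating an environment (or spinning up a fresh one per CI test) does not trigger a new download.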
In this case, as your metadata files are small, the ENV variable approach would be easiest: you could include the metadata files during the clone, and perhaps pre-set the variable within the conda environment to point to wherever the files are located in the conda environment.
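A minimal Python sketch of the ENV-variable lookup, assuming a hypothetical variable name `SYLPH_TAX_METADATA_DIR` and a hypothetical default layout under `share/sylph-tax` inside the environment prefix (neither comes from the thread):

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical variable name, chosen for illustration only.
METADATA_ENV_VAR = "SYLPH_TAX_METADATA_DIR"


def resolve_metadata_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer the environment variable; otherwise fall back to a directory
    inside the running environment's prefix (hypothetical layout)."""
    env = os.environ if env is None else env
    override = env.get(METADATA_ENV_VAR)
    if override:
        return Path(override)
    # sys.prefix is the root of the running interpreter's environment; for
    # the Python inside a conda environment this is typically $CONDA_PREFIX.
    return Path(sys.prefix) / "share" / "sylph-tax"
```

Conda can pre-set such a variable per environment, for example with `conda env config vars set NAME=value`, or a recipe can install a script under `etc/conda/activate.d/` that exports it on activation.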
@jfy133 Thanks for helping out. I'm not an expert on conda, so from trying to understand your comments, a good approach seems to be to
I know that @bgruening suggested not packing the metadata files into the conda package, but they are relatively small (~50MB in total, I think)... let me know what you think.
I don't think 50MB is small. Those files would end up in every version of the package, for every architecture, in every container, etc.
@bgruening Fair enough. Here is my plan, which I will get around to in the next two months:
e.g.
I will probably create a new repository called
@jfy133 @bgruening just an update on this: I had some time and managed to get a working version. My new repository is https://github.com/bluenote-1577/sylph-tax. I used ~/.config/sylph-tax/config.json to point to the location of the downloaded data. The same metadata files can still be used manually. If you have any issues, let me know. My conda pull request is bioconda/bioconda-recipes#52683. I'll update the documentation after it gets approved.
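The config-file approach can be sketched in Python as follows. Only the `~/.config/sylph-tax/config.json` location comes from the comment above; the key name `db_location` and both helper functions are hypothetical and do not describe sylph-tax's actual file format.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical key name; the real config format is not shown in this thread.
DB_KEY = "db_location"


def default_config_path() -> Path:
    # Location mentioned in the thread; resolved lazily so that importing
    # this code does not fail on machines where no home directory exists.
    return Path.home() / ".config" / "sylph-tax" / "config.json"


def read_db_location(config_path: Optional[Path] = None) -> Optional[Path]:
    """Return the database directory recorded in the config, or None."""
    config_path = config_path or default_config_path()
    if not config_path.is_file():
        return None
    with config_path.open() as fh:
        config = json.load(fh)
    value = config.get(DB_KEY)
    return Path(value) if value else None


def write_db_location(db_dir: Path, config_path: Optional[Path] = None) -> None:
    """Record where the metadata was downloaded so later runs can find it."""
    config_path = config_path or default_config_path()
    config_path.parent.mkdir(parents=True, exist_ok=True)
    with config_path.open("w") as fh:
        json.dump({DB_KEY: str(db_dir)}, fh, indent=2)
```

In this sketch a download step would call `write_db_location` once, and every later run would call `read_db_location` instead of asking the user for a path.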
Super cool! I will take a more detailed look tomorrow. Do you see any way to support the database path via a CLI argument? Otherwise, how are users supposed to create the config.json? Please note that it should be possible for workflows to influence this without manual intervention. And to make things more complicated, there are HPC environments without a HOME directory... (https://academic.oup.com/gigascience/article/8/5/giz054/5497810) Thanks a lot!
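One way a workflow could control the database path without touching any config file is an explicit argument that takes precedence. The Python sketch below uses a hypothetical `--db-dir` flag and program name; it is not an existing sylph-tax option.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sylph-tax-example")
    # Hypothetical flag: lets a workflow pass the database path explicitly
    # instead of relying on a config file under $HOME.
    parser.add_argument("--db-dir", type=Path, default=None,
                        help="directory containing the taxonomy metadata")
    return parser


def choose_db_dir(argv: Sequence[str], configured: Optional[Path]) -> Path:
    """An explicit CLI value wins; otherwise use the configured location."""
    args = build_parser().parse_args(argv)
    if args.db_dir is not None:
        return args.db_dir
    if configured is not None:
        return configured
    raise SystemExit("no database location: pass --db-dir or configure one")
```

Giving the explicit argument priority means a pipeline never depends on per-user state, while interactive users can still rely on a stored default.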
Hmmm, I wasn't aware that some HPC environments don't have a HOME... Currently:
For the main command,
I think this might be another case for having an environment variable, as argued above, which is what But the second example of manually specifying the TSV would work equally well for me.
Yes, this is a good idea. I have now implemented the ability to point to the config via an ENV var called To summarize:
LMK what you think; specifically, whether you still want the ability to do
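A lookup order that covers both the environment-variable idea and HOME-less HPC nodes might look like the Python sketch below. The variable name `SYLPH_TAX_CONFIG_EXAMPLE` is hypothetical (the real name is not preserved in this thread), and checking `XDG_CONFIG_HOME` is an addition borrowed from the XDG Base Directory convention, not something described above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical name; the thread does not preserve the real variable name.
CONFIG_ENV_VAR = "SYLPH_TAX_CONFIG_EXAMPLE"


def locate_config(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Resolve the config path without assuming a HOME directory exists.

    Order: explicit environment variable, then $XDG_CONFIG_HOME, then
    $HOME/.config. Returns None when none of them is set, so callers can
    fall back to an explicit path instead of crashing.
    """
    env = os.environ if env is None else env
    explicit = env.get(CONFIG_ENV_VAR)
    if explicit:
        return Path(explicit)
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        return Path(xdg) / "sylph-tax" / "config.json"
    home = env.get("HOME")
    if home:
        return Path(home) / ".config" / "sylph-tax" / "config.json"
    return None
```

A workflow only has to export the variable once per job, which matches the "set it once and you're done" behaviour discussed next.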
I think that works for me! With the variable you just need to set it once and you're done. I think it's unlikely you would need to move the config file around, so there's no need to specify it via an explicit argument (given you can just supply the tar directly if you really need to).
@jfy133 bioconda/bioconda-recipes#52746 is done, with the environment variable capabilities. I tested it and it seems to be working on conda now.
Wonderful, thank you very much @bluenote-1577! @sofstam, you can try it out now for the nf-core module :D and if it works, please report back :)
Thank you very much @bluenote-1577!
Hi @bluenote-1577, I have started creating the nf-core module for
@sofstam BTW, would it be possible to redirect this issue to https://github.com/bluenote-1577/sylph-tax? That is the new repo. I just added a version flag to sylph-tax, so you can run I updated to v1.1.1 for
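nf-core modules conventionally record tool versions in a `versions.yml` file keyed by the process name. A small Python sketch of turning a version flag's output into such an entry is below; the assumed output format (`sylph-tax 1.1.1`) and the process name are hypothetical.

```python
import re
from typing import Optional

# Matches the first X.Y.Z version number in a string.
_VERSION_RE = re.compile(r"(\d+\.\d+\.\d+)")


def parse_version(output: str) -> Optional[str]:
    """Pull the first X.Y.Z version number out of a tool's version output."""
    match = _VERSION_RE.search(output)
    return match.group(1) if match else None


def versions_yml_entry(process: str, tool: str, output: str) -> str:
    """Format a versions.yml entry in the usual nf-core layout."""
    version = parse_version(output)
    if version is None:
        raise ValueError(f"no version found in {output!r}")
    return f'"{process}":\n    {tool}: {version}\n'
```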
Hi there!
I’ve been using the scripts in this repository, and they’ve been very helpful. I wanted to suggest packaging the scripts as a Bioconda package to make them easier to install and manage, especially for users who want a consistent and reproducible environment.
A Conda package would make it much easier for us to install the scripts and keep them up to date. I'd be happy to help if needed!
Thanks for considering this suggestion!
Bérénice