Migrating to modular software organization #425
Comments
@ChrisRackauckas Mar 18 17:49
@Ward9250 Mar 18 17:51
@kescobo Mar 18 17:53
@ChrisRackauckas Mar 18 17:53
@Ward9250 Mar 18 17:53
@ChrisRackauckas Mar 18 17:53
@kescobo Mar 18 18:03
@ChrisRackauckas Mar 18 18:04
@Ward9250 Mar 18 19:32
@mkborregaard Mar 18 20:07
@Ward9250 Mar 19 02:08
@jgreener64 Mar 19 16:39
@bicycle1885 11:08
Thank you, @Ward9250. I think "Bio" is informative enough as a prefix for package names. I do not want to type such long names in many places, so in my opinion BiologicalSymbols.jl should be BioSymbols.jl.
Proposal for BioJulia Ecosystem Organisation Guidelines
Definitions
Organisation of existing Bio.jl code.
Organisation of new contributions.
Once agreement on the guidelines is finalised, I will upload them to our contribution policy site.
We also need to keep Pkg3 in mind, which is planned to supersede the current packaging system. However, I have no idea what effects Pkg3 will have.
It looks like Pkg3 will help rather than harm: registries, environments and so on all look like they will help us when it comes to dependencies and versions.
I think the proposal is on the right track.
Looks good, thanks @Ward9250! Since it's the thing I've been working on, I'd like a clarification on your definition. The In principle a minhash sketch is extensible, but I'm only interested in developing the package for biological uses. What's the policy there?
@jgreener64 Thanks! I suggest keeping this proposal open for a week for discussion, after which the agreed-upon amendments will be made. The amended proposal will then be on display for another day before being added to our policy and made official. I will mark suggestions and amendments made on this thread with a 🎉 when they have been added to the proposal above.
@kescobo I'm planning on writing a bunch of high-level k-mer counting stuff in Julia soon. Are you interested in pulling out the minhash-based "counter" into a package for k-mer analysis? Thoughts, all? Would that be too modular?
@kdmurray91 Not sure. Frankly, I'm not 100% clear on where its dependencies will lie once the split occurs. Based on this question I asked and this response from @Ward9250, it seems like my current implementation would be part of Happy to discuss further and help, assuming the rest of the org agrees. Might it make sense to break this out into a separate issue?
@kescobo, @kdmurray91 Kmers currently have an implementation in BioSequence, so to really know about organisation we need a more solid idea of what is currently in BioSequence, what is missing, what will be added, and whether or not this requires a separate package.
@kescobo I'm not 100% sure either. I was intending for such a package to contain a bunch of high-level kmer counting routines, using CountMin sketches, MinHash sketches, Bloom filters, dicts and dense arrays (like My motivation is that I want to make a package that contains the above, as I feel that all of the above would be a lot to include in
I agree with @kdmurray91 about separating high-level kmer functions into a separate package if they don't depend on specific data structures of sequences. So keeping the kmer generator in BioSequences.jl would be sensible.
@kdmurray91 Kmers in BioSequence currently only inherit from the abstract type In fact, it might be good to make that the pilot for trying the less centralised maintainership of packages discussed previously (reminder quoted below), which I am currently writing up for the contribution and community policy site. @kdmurray91 If you are up for that, you would be listed as the designated maintainer of that package and added to the Maintainers Team.
@kdmurray91 That makes sense to me - let's do it!
@Ward9250 I'm up for that. I'm not 100% sure that the low-level kmer code needs to be in the new package, but let's go with what you said above. Regarding maintainership, I'm more than happy to take it on, with a couple of caveats. I'm not that experienced with optimising Julia code for performance, and would love to have the code I write reviewed by someone more experienced. I'd also like to advocate the "lowNMU" concept from Debian, which in essence gives other maintainers of BioJulia packages the freedom to step in as maintainer if they deem it necessary, e.g. if I have a PhD-induced period of absence. This is a little contrary to the statement that:
@kescobo I think it would make sense to wait till after
@kdmurray91 My instinct is to push against excessive splitting. It seems like we can try to build the data structures as generically as possible, and then split them off at a later date if there's a compelling reason to. That said, I don't feel strongly about it. I'm happy to start working on that in the near-ish future, though I can't devote more than a few hours a week until the summer, when I'll (likely) be starting another postdoc. At this stage in our community, I think a low threshold for stepping in makes a lot of sense, though it's probably not worth formalizing. Each package could in principle have its own level of maintainer control, though we should think hard about setting default values. I think @Ward9250's suggestion makes some sense as a baseline, though probably only if there are at least two maintainers for any given package, in case one goes incommunicado.
I think @kdmurray91's suggestion is a good one. So I propose changing
To the following (something like this will go up on the Contributing docs): Packages have at least one "Dedicated Maintainer" who has admin access to the package. All maintainers in the Maintainers team have push access to all code packages in the BioJulia ecosystem.
@Ward9250 Perfect! @kescobo I think that sounds like a good start; as you say, we can split it off later if it seems useful to others (which I assume it would be). I have CountMin.jl, which will be rolled into this package, and your minhash sketch will go in too. There are some issues with BloomFilters.jl, so I was thinking that we could fork that code. I'll continue this train of thought in an issue on kmers.jl.
@Ward9250 I think it was too early to tag and release BioCore.jl v1.0. We no longer use Ragel, so the BioCore.Ragel module should have been renamed and moved to BioCore.ReaderHelper or somewhere similar. Also, BioCore.StringFields may no longer be needed under the current design of text parsers. We really need to be careful when releasing a version 1.0.
@bicycle1885 We can do that in a fix-up or minor release. I don't think it's that radical for version 1 to match the contents of Bio.jl. I'm doing a release of BioCore and a release (even if it's not 1.0) of BioSequences this week, as some drastic circumstances have come up at work and there are things that are going to require them. Long story short, I'm being offered a position, which means the timelines for my current job have become a nightmare. EDIT:
This doesn't mean I'm going anywhere or that my work with BioJulia is stopping, but my PI and academic focus will be different.
Just to update people on progress: BioSequences.jl has been released, providing the features of Bio.Seq as part of our decentralised package ecosystem. @bicycle1885 is currently working on GenomicIntervals.jl, and I'm going to start work on GeneticVariation.jl this week.
Let me clarify my feelings:
For me, the gradual release was necessary because of my non-BioJulia work commitments; it seemed the most manageable approach. Most contributors we've had are volunteers, and a gradual decomposition is easier for them to find time for. I think all of those points are right and we should go with them.
Is there an ETA on 0.6, by the way? Everywhere I look it says "soon", but I thought it would have been released by now. Not that I'm complaining; the developers do a fantastic job.
No ETA as far as I know. I've been saying "soon" for three months, but it hasn't happened yet. The latest release is Julia 0.6-rc3, and they sometimes release three or four release candidates before the final version. So I believe this "soon" is really soon.
How close to completion is the decomposition? It seems most of the stuff is out in separate packages. Now that v0.6 is out, it would be good to get it finalised. Let me know if there is anything I can do.
Hey @jgreener64, we are very close now. There's been a lull recently, as I've had to go through a job interview for the assembly and algorithms development team here in Norwich. I'm glad to say I got the job, and I will still be allowed to do BioJulia work, so anything cool we make will end up in BioJulia in one form or another. Now that the interview prep has gone away and I am coding solidly again, I can get back to finishing this process.
Great, thanks for the update @Ward9250, and well done.
It has been requested by the community that the organisation of software in BioJulia be made more modular than it currently is, for the benefit of both users and developers. We are reaching a point in Bio.jl where Travis builds do not complete, and anyone wanting one module of Bio.jl has to install and compile/precompile the rest of Bio.jl.
Below, I migrate all the conversations over from Gitter and outline a draft set of guidelines for maintainers and contributors about the organisation of packages.