Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

back-porting fast cmdline re-organization from drop-dropseq into master #64

Open
wants to merge 62 commits into
base: master
Choose a base branch
from

Conversation

marvin-jens
Copy link
Member

The 'drop-dropseq' development branch and 'master' have moved very far apart. This makes it difficult and error-prone to merge other developments against master (e.g. Daniels spatial work) into both branches and keep them in sync.

Predominantly, issues are caused by a set of commits that I had made to speed up the commandline interface. On current master, the commandline interface imports all sorts of things right away (spacemake.preprocess, spacemake.spatial) and each of these modules has heavy import dependencies (scipy, scanpy, novosparc, cv2, ...).

My changes consist of

  1. consolidation of (almost) all cmdline parser construction a la get_...._parser() into a single, new spacemake/cmdline.py
  2. move the global imports of big packages (scipy, scanpy,...) into the scopes where they are really needed
  3. remove the unnecessary imports of sub-modules from spatial/__init__.py and preprocess/__init__.py
  4. the original smk.py is reduced to only the Spacemake object and related stuff

To get this into master, I also had to pull along changes to config.py and project_df.py. Mostly, this is moving the parser-code into cmdline.py (see point (1) above), but they have also evolved to support more functionality that is in drop-dropseq (adapter-flavors having the biggest footprint).

The current pull-request therefore provides two desireable advantages and some neutral changes. Desirable: massive speed-up of spacemake commandline interface, bringing master and drop-dropseq much closer together in the code. Neutral: support for adapter-flavors and a few more features is in the cmdline interface and config.py/project_df.py but the functionality is not there (requires more merges from drop-dropseq).

The pull request does not change the directory structure and still uses dropseq tools. The code that does the fastq-to-uBAM conversion is unchanged. The spatial stuff should be unchanged, except for moving of (most) parser code into spacemake/cmdline.py and changing some import statements as required by (2).

'spacemake/unittest.py` runs through without errors. But there are no real tests there yet for meshing, H&E integration, tile assignments etc.

It would be lovely if you could test this branch with your current data (@nukappa @danilexn) and let me know where it breaks.

I would also be happy to cut back on the neutral stuff (at least hide it from cmdline) that is non-functional anyway.

Best regards,
-Marvin

@marvin-jens
Copy link
Member Author

oh yeah, I also added some new py.test code. Ultimately I want this to replace the somewhat clunky unittest.py. Coverage is much better with more test cases in drop-dropseq, of course. Try it out with pytest --cov=spacemake tests/
In connection with that, setup.cfg now supplies some configuration for py.test and pulls version information from spacemake.contrib as @nukappa and I had discussed.

@danilexn danilexn mentioned this pull request Jan 15, 2024
marvin-jens and others added 17 commits January 15, 2024 11:39
…dline and it completes the trusty old unittest.py. However, more testing of spatial and puck code is required
…ad route now to spacemake/contrib.py __version__ as also used in setup.cfg
…d but no samples, pre-configured with both config.yaml and sample project_df.csv)
@marvin-jens
Copy link
Member Author

Pulled all recent changes to master for fixing a number of open issues and greatly expanded the tests. This branch should now feature-wise be a direct drop-in replacement for master. But that needs extensive testing on real data. It would be great if you, @nukappa and @danilexn, could give it a spin and let me know about the breakage you encounter. Thanks! :)

@danilexn
Copy link
Member

Sure, I run the tests and will let you know.

Also, I write additional unit tests (e.g., so we don't have issues like #89 anymore). Anyway, I will not implement anything new feature-wise.

@danilexn
Copy link
Member

@marvin-jens can you please merge the current master? I will add some tests for merge and others, but there are some conflicts - thanks so much 😄

@danilexn
Copy link
Member

@nukappa conflicts have been solved here, I will run tests on Open-ST and tiny test data, can you please run some independently? Thanks :)

@nukappa
Copy link
Member

nukappa commented Apr 22, 2024

Yes will do! Thank you!

@danilexn
Copy link
Member

There's something wrong with the create_spatial_barcode_file rule -- wrongly included in the DGE of the get_stats_prealigned_barcodes target rule. Please let's not implement anything else here until this is fixed!

@danilexn
Copy link
Member

Most recent commit cf5cb81 seems to fix this.

There's another issue, though: for very large samples, n_intersect_sequences uses too much memory - while we wait for the BCtree strategy to be implemented, we can load and match against the query by chunks (e.g., 100M reads).

@danilexn
Copy link
Member

danilexn commented Apr 23, 2024

When we merge #103 here (as soon as tested), we should be able to merge this into master.

There are a couple of optimizations and bugfixes that I implemented there (I can port them here, but since the BamTagHistogram functionality seems to work great, I would do both directly)

@danilexn danilexn added the enhancement New feature or request label Apr 24, 2024
Copy link
Member Author

@marvin-jens marvin-jens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great Daniel! If we pass all tests I see no reason not to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants