Feat: Feature gene annotation #132

grantn5 · 2024-06-12T16:03:07Z

I have written some code that produces a per-gene CNV matrix giving the mean log2 CNV per gene (since one gene can appear in many windows).
I have also changed the `np.convolve` mode to `"valid"` so that it matches the function's documentation. Full methods here
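For readers unfamiliar with the convolution modes, here is a minimal NumPy sketch of a running mean over consecutive genes with `mode="valid"`. The helper name `running_mean_valid` is illustrative and not the function from the PR. With `"valid"`, only positions where the window fully overlaps the genes are returned, so n genes and a window of w give n - w + 1 values with no zero-padded edges. `"same"` instead returns n values, and the edge values are averaged against implicit zeros.

```python
import numpy as np


def running_mean_valid(x: np.ndarray, window: int) -> np.ndarray:
    """Mean of every run of `window` consecutive values in x.

    mode="valid" keeps only full overlaps, so the result has
    len(x) - window + 1 entries.
    """
    # Sum each window with a kernel of ones, then divide by the window size.
    return np.convolve(x, np.ones(window), mode="valid") / window


x = np.arange(6.0)  # e.g. log2 values for six genes in genomic order
print(running_mean_valid(x, 3))  # [1. 2. 3. 4.]
# For comparison, "same" pads with zeros and returns one value per gene.
print(np.convolve(x, np.ones(3), mode="same") / 3)
```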

I have also changed the case where the window is larger than the number of genes on the chromosome: it now runs a single convolution whose length equals the number of genes on that chromosome, see https://github.com/grantn5/infercnvpy/blob/feature_geneAnnotation/src/infercnvpy/tl/_infercnv.py#L198
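The small-chromosome edge case can be illustrated with a NumPy-only sketch. `chromosome_running_mean` is a hypothetical name, and the sketch assumes the window is simply capped at the number of genes, which is how the description above reads. Without the cap, `np.convolve` in `"valid"` mode swaps its arguments when the kernel is longer than the signal and returns max(M, N) - min(M, N) + 1 values, each equal to the sum over all genes divided by the full window size, which underestimates the mean. With the cap, exactly one value comes out: the mean over all genes on that chromosome.

```python
import numpy as np


def chromosome_running_mean(x: np.ndarray, window_size: int) -> np.ndarray:
    """Valid-mode running mean for one chromosome, window capped at len(x)."""
    # Shrink the window if the chromosome has fewer genes than window_size,
    # so exactly one convolution spanning all of its genes is computed.
    window = min(window_size, len(x))
    return np.convolve(x, np.ones(window), mode="valid") / window


small_chr = np.array([1.0, 2.0, 3.0, 4.0])  # only four genes
print(chromosome_running_mean(small_chr, window_size=100))  # [2.5]

# Uncapped, NumPy returns 97 values of 10 / 100 instead of one mean.
uncapped = np.convolve(small_chr, np.ones(100), mode="valid") / 100
print(uncapped.shape)  # (97,)
```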

I have also updated the genomic_position_from_gtf function so that it always reads the GTF in as a pandas DataFrame (this avoids a bug where it was read in as a polars DataFrame):

```python
import gtfparse

gtf = gtfparse.read_gtf(
    gtf_file, usecols=["seqname", "feature", "start", "end", "gene_id", "gene_name"], result_type="pandas"
)
```

Closes #128


grantn5 · 2024-06-12T16:50:53Z

Hi @grst, I am not familiar with building the docs. The build is failing on a warning because Sphinx can't seem to find scanpy in the build environment. Any suggestions?

grst · 2024-06-13T14:17:33Z

It seems the pca function got moved within scanpy. I fixed it in another PR.
Once you have resolved the merge conflicts, readthedocs should build again.

I'll then do a proper review.



grantn5 · 2024-06-14T10:14:17Z

Hi @grst, I have pulled your changes and reverted genomic_position_from_gtf. This is now ready for review!

grantn5 · 2024-06-18T14:49:52Z

Hi @grst, it looks like all tests that use the oligodendroglioma adata are failing.
Has this object been moved in scanpy?

grst · 2024-06-18T18:46:07Z

I think this is rather an incompatibility with numpy 2.0 which was very recently released.
I'll look into it, but the issue might be in one of the dependencies.

grst

Just some minor comments. Thanks for adding test cases :)

Have you compared the runtime before and after adding your code? It looks like it could make the whole workflow significantly slower and more memory-intensive. In that case, it would be great if it could be disabled via a function parameter in tl.infercnv.


grantn5 · 2024-06-19T12:39:02Z

> Just some minor comments. Thanks for adding test cases :)
>
> Have you compared the runtime before and after adding your code? It looks like it could make the whole workflow significantly slower and more memory-intensive. In that case, it would be great if it could be disabled via a function parameter in tl.infercnv.

Hi, thanks for the review!

I have tested the runtime on our in-house data (~50,000 cells, ~20,000 genes): it runs in about 10 minutes parallelized over 32 CPUs (before, it took ~2 minutes). However, because of the nested for loop, you have to reduce the chunksize massively (to ~100), since for each sample the code loops through the flattened index of genes for each window (https://github.com/grantn5/infercnvpy/blob/feature_geneAnnotation/src/infercnvpy/tl/_infercnv.py#L230), which is very long.
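As a point of comparison for the nested loop, the per-gene mean can also be written as a single matrix product. This sketch is not the PR's implementation. It covers one chromosome and assumes window j spans genes starts[j] to starts[j] + window - 1, with starts = 0, step, 2·step and so on. It builds a 0/1 window-by-gene indicator (`window_gene_indicator` and `per_gene_mean` are hypothetical names) and divides each gene's summed window values by the number of windows covering it. On real data the indicator is almost entirely zeros, so it would normally be stored as a `scipy.sparse` matrix rather than the dense array used here.

```python
import numpy as np


def window_gene_indicator(n_genes: int, window: int, step: int) -> np.ndarray:
    """Hypothetical helper: 0/1 matrix of shape (n_windows, n_genes).

    Assumes window j spans genes starts[j] .. starts[j] + window - 1, with
    starts = 0, step, 2 * step, ... (a valid-mode running mean kept every
    `step` genes).
    """
    starts = np.arange(0, n_genes - window + 1, step)
    genes = np.arange(n_genes)
    # Broadcast (n_windows, 1) against (n_genes,) to mark covered genes.
    covers = (genes >= starts[:, None]) & (genes < starts[:, None] + window)
    return covers.astype(float)


def per_gene_mean(window_values: np.ndarray, indicator: np.ndarray) -> np.ndarray:
    """Average each gene over all windows that contain it.

    window_values: cells x windows. Returns cells x genes; genes that no
    window covers get NaN.
    """
    counts = indicator.sum(axis=0)  # number of windows covering each gene
    sums = window_values @ indicator  # per-gene sum of window values
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


ind = window_gene_indicator(n_genes=5, window=3, step=1)
cells_by_windows = np.array([[1.0, 2.0, 3.0]])  # one cell, three windows
print(per_gene_mean(cells_by_windows, ind))  # gene means 1.0, 1.5, 2.0, 2.5, 3.0
```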

I also found a bug in n_jobs: it isn't automatically parallelized over all available CPUs, so I have always been setting it manually.

This is definitely a CPU-bound rather than a memory-bound task, so on a machine with many CPUs it will speed up immensely.

If this is required for the PR to be accepted, I can refactor further to add a parameter that turns off per-gene CNV calling by default. However, this may take some time because of the way the function handles outputs from intermediate functions.

grst · 2024-06-20T18:49:04Z

Thanks for testing this. Given that it's a 5x runtime increase and scalability is one of the main selling points of this library, I would like to be able to switch this off. For 50k cells it might not be much of an issue, but when working with atlases of >1M cells, it makes quite a difference.


grantn5 · 2024-06-24T13:09:47Z

Hi @grst, I have made the per-gene CNV calculation optional (default False) and added some documentation to the function.

Additionally, I have added some benchmarking to the pytest suite to show the performance cost of the added computation.
Hopefully this is now good to merge!

grst

LGTM, thanks!

knadia07 · 2024-08-01T06:58:56Z

Hi there, I'm confused: how do we actually get the output of the feature gene annotation mapped back to chromosome positions, e.g. for cells with a very high CNV score?

redst4r and others added 18 commits October 15, 2022 15:27

annotate (in .uns['cnv']['var']) which window contains which genes

1138286

[pre-commit.ci] auto fixes from pre-commit.com hooks

e6f0020

for more information, see https://pre-commit.ci

Merge branch 'main' into feature_geneAnnotation

dc15059

remove black magic comma

0cacaa5

[pre-commit.ci] auto fixes from pre-commit.com hooks

28990a7

for more information, see https://pre-commit.ci

Merge branch 'main' into feature_geneAnnotation

54f0657

first pass at gene convolution annotation

7fecb8b

Merge branch 'main' into feature_geneAnnotation

b6b660f

sorting some conflicts

d8e7933

feature annotation working

c071fa3

adding test to calcualte gene average

e4bed8a

Converting gene matrix into sparse matrix

175bba7

updating gene position to using pandas

c71cf28

linting

4216b11

adding more tests

b8aa050

Delete .vscode directory

611e9e6

adding scanpy to conf.py

45d41be

changing scanpy.tl.pca to sc.tl.pca to get build docs to work

8dd606a

grst reviewed Jun 13, 2024


src/infercnvpy/io/_genepos.py (outdated, resolved)

src/infercnvpy/tl/__init__.py (outdated, resolved)

grantn5 and others added 6 commits June 14, 2024 10:16

Saving CNV matrix as sparse matrix

b034b07

Merge branch 'main' into feature_geneAnnotation

7b0709d

adding back infercnv.py

d5737b9

[pre-commit.ci] auto fixes from pre-commit.com hooks

79017bf

for more information, see https://pre-commit.ci

fixing linting error

91abbb7

editing code to appease ruff

a1abd33

grantn5 mentioned this pull request Jun 14, 2024

Per gene copy number signal. #128

Closed

grst reviewed Jun 19, 2024


src/infercnvpy/tl/_infercnv.py (outdated, resolved)

src/infercnvpy/tl/_infercnv.py (outdated, resolved)

src/infercnvpy/tl/_infercnv.py (outdated, resolved)

src/infercnvpy/tl/_infercnv.py (outdated, resolved)

Merge branch 'main' into feature_geneAnnotation

38e6504

grst mentioned this pull request Jun 20, 2024

Adding annotation to the convolution windows #58

Closed

grantn5 and others added 5 commits June 24, 2024 09:17

Merge branch 'main' into feature_geneAnnotation

155f520

[pre-commit.ci] auto fixes from pre-commit.com hooks

f9984d0

for more information, see https://pre-commit.ci

Converting gene conv to numpy array

c548087

Making the calculation of per gene values optional

b2c481e

[pre-commit.ci] auto fixes from pre-commit.com hooks

8e821fb

for more information, see https://pre-commit.ci

grst approved these changes Jun 26, 2024


grst merged commit 0073690 into icbi-lab:main Jun 26, 2024
6 checks passed

