Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing the filter sizes #51

Merged
merged 5 commits into from
Oct 8, 2022
Merged

Fixing the filter sizes #51

merged 5 commits into from
Oct 8, 2022

Conversation

redst4r
Copy link
Contributor

@redst4r redst4r commented Oct 2, 2022

Hi,
just came across two bugs in the _running_mean() function:

  1. the size of the filter (pyramid) is off by one (len(pyramid) == window_size-1. Not a big deal, but slightly inconsistent.

  2. as mentioned by Shape of the output of cnv.tl.infercnv #37, due to the way np.convolve() works, if the filtersize is bigger than the number of genes, it flips filter and signal, and the result doesn't mean much

  3. is an easy fix

  4. Two options: either skip the chromosome entirely if there's not enough genes. Or, make the filter smaller, essentially the chromosome yields a single convolution result (all genes on the chromosome go into the convolution). I opted for 2), especially for larger filter sizes, you'd loose alot of data when doing 1)

Thanks for putting together this nice package!

Closes #37

@grst
Copy link
Member

grst commented Oct 3, 2022

Thanks for fixing this!

Do you urgently need this in a released version? Ideally, I would wait a couple of weeks until I fully ported infercnvpy to the cookiecutter-scverse template, which should also fix the CI.

@redst4r
Copy link
Contributor Author

redst4r commented Oct 3, 2022

Hi,
no need to put this into a release soon, I'll just merge it on my local repository for the time being!

Copy link
Member

@grst grst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the new template is merged in and the CI works again.

It seems that a variable is not defined. Can you please fix that?

r = np.arange(1, n + 1)

pyramid = np.minimum(r, r[::-1])
smoothed_x = np.apply_along_axis(lambda row: np.convolve(row, pyramid, mode=conv_mode), axis=1, arr=x) / np.sum(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems the conv_mode is not defined here. Can you add that as a parameter to the infercnvpy function and pass it on to the _running_mean? Or just leave it hardcoded to same - either is fine for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, my mistake, I was playing around with the convolution mode in another branch and it leaked into here...

PS: is there a specific reason to have the convolution with mode="same"?
Seems to create weird edge effects on the end of chromosomes. The filter keeps sliding past the end of the chromosome, so the last datapoints at the start/end of the chromosome are convoluted with a "ramp" (half of the pyramid filter). Also, that makes the effective filter size vary along the chromosome (if window_size=100 genes) you'll be smoothing over 100 genes most of the time, but at start and end of the chromosome, you'll just be smoothing 50 genes instead.
I've been using `mode="valid", that at least treats each position the same way.

I'm not sure what's more appropriate, maybe mode=valid makes detecting CNVs at the edges harder...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't remember if there was a specific reason. Maybe I tried being consistent with the R version, but not sure if that's what they are doing.

Personally, I don't think it matters that much, as it's just a very small region, and single-cell based CNV calling is anyway qualitative at best.

@grst grst enabled auto-merge (squash) October 8, 2022 16:57
@codecov-commenter
Copy link

codecov-commenter commented Oct 8, 2022

Codecov Report

Merging #51 (5b4a734) into main (978fdfd) will decrease coverage by 0.95%.
The diff coverage is 0.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #51      +/-   ##
==========================================
- Coverage   66.01%   65.06%   -0.96%     
==========================================
  Files          13       13              
  Lines         409      415       +6     
==========================================
  Hits          270      270              
- Misses        139      145       +6     
Flag Coverage Δ
unittests 65.06% <0.00%> (-0.96%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/infercnvpy/tl/_infercnv.py 51.02% <0.00%> (-3.33%) ⬇️

@grst grst merged commit f6870ec into icbi-lab:main Oct 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shape of the output of cnv.tl.infercnv
3 participants