QC: Cell size #82

Open
tischi opened this issue May 15, 2020 · 26 comments

@tischi
Collaborator

tischi commented May 15, 2020

Median infected cell sizes per well distributions across plates

image

Cell size distributions for some plates

image

image

image

@tischi
Collaborator Author

tischi commented May 15, 2020

@metavibor @constantinpape
OK, this looks very intriguing!
So intriguing that I almost wonder whether there is something fishy going on somewhere...
Why would we have so reproducibly almost the exact same bimodal distribution?
But maybe this is just the outlier fraction due to dirt in the serum channel?
In the end those are not many cells (beware the log scale of the y-axis).

@metavibor
Collaborator

Do we have cell sizes of 100000 pixels? Could you look at those cells? I think the biggest I measured was an order of magnitude smaller. The same goes for the other end of the distribution.

@tischi
Collaborator Author

tischi commented May 15, 2020

Could you look at those cells?

Would it help you if I try to find out the image name? Then you could look with the plate viewer.

Here are random locations on one plate that should contain cells that are

larger than 100000

 [1] "plate2rep3_20200507_094942_519 C05-0002"
 [2] "plate2rep3_20200507_094942_519 H10-0004"
 [3] "plate2rep3_20200507_094942_519 D06-0005"
 [4] "plate2rep3_20200507_094942_519 H01-0006"
 [5] "plate2rep3_20200507_094942_519 H09-0004"
 [6] "plate2rep3_20200507_094942_519 B01-0000"
 [7] "plate2rep3_20200507_094942_519 H09-0008"
 [8] "plate2rep3_20200507_094942_519 B12-0000"
 [9] "plate2rep3_20200507_094942_519 H01-0004"
[10] "plate2rep3_20200507_094942_519 E09-0000"

smaller than 100

 [1] "plate2rep3_20200507_094942_519 G04-0000"
 [2] "plate2rep3_20200507_094942_519 G04-0003"
 [3] "plate2rep3_20200507_094942_519 C10-0000"
 [4] "plate2rep3_20200507_094942_519 F09-0005"
 [5] "plate2rep3_20200507_094942_519 H05-0007"
 [6] "plate2rep3_20200507_094942_519 B09-0006"
 [7] "plate2rep3_20200507_094942_519 G03-0005"
 [8] "plate2rep3_20200507_094942_519 G09-0000"
 [9] "plate2rep3_20200507_094942_519 D04-0005"
[10] "plate2rep3_20200507_094942_519 G09-0000"
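
A lookup like the two lists above can be sketched roughly as follows. This is only a Python illustration, not the R code that produced the lists; the table layout (image name, label id, size in pixels), the function name `images_with_cells` and the example rows are all assumptions.

```python
import random

def images_with_cells(rows, min_size=None, max_size=None, n=10, seed=0):
    """Return up to n randomly chosen image names that contain at least one
    cell smaller than min_size or larger than max_size (sizes in pixels)."""
    hits = set()
    for image_name, label_id, size in rows:
        too_small = min_size is not None and size < min_size
        too_large = max_size is not None and size > max_size
        if too_small or too_large:
            hits.add(image_name)
    hits = sorted(hits)  # sort first so the random sample is reproducible
    rng = random.Random(seed)
    return rng.sample(hits, min(n, len(hits)))

# Hypothetical per-cell table: (image name, label id, size in pixels).
rows = [
    ("X01-0000", 1, 120000),
    ("X01-0000", 2, 3500),
    ("X02-0003", 7, 42),
    ("X03-0001", 3, 8000),
]
large = images_with_cells(rows, max_size=100000)
small = images_with_cells(rows, min_size=100)
```

Collecting image names in a set means each image is listed once, even if it contains several out-of-range cells.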

@constantinpape
Contributor

@tischi thanks for checking this.
As we have discussed, I will now export 'too small' and 'too large' masks for these plates so we can inspect them visually. No need for image names; once we have the masks in the PlateViewer, this should be fast to see.

@constantinpape
Contributor

@metavibor @constantinpape
OK, this looks very intriguing!
So intriguing that I almost wonder whether there is something fishy going on somewhere...
Why would we have so reproducibly almost the exact same bimodal distribution?
But maybe this is just the outlier fraction due to dirt in the serum channel?
In the end those are not many cells (beware the log scale of the y-axis).

Ok, very interesting, we need to check up on this ....
I will let you know as soon as I have exported the masks.

One thing to keep in mind is that @imagirom's code to compute these sizes is a bit non-standard.
I have checked for a few examples that it works, but maybe there are corner cases where it fails.

@tischi
Collaborator Author

tischi commented May 15, 2020

yes, but still then @metavibor would know in which wells to find some :-)

Regarding the initial thresholds for this, based on looking at the distributions I would say

100 and 25000

...would be sensible?

I will later try to fit something to the distributions (maybe 4 gaussians) to see what that gives...

@constantinpape
Contributor

100 and 25000

...would be sensible?

25000 is huge....
We should really check what these cells look like.

@tischi
Collaborator Author

tischi commented May 15, 2020

Ok, then let's wait until @metavibor has looked at some? (before we re-run...)

@constantinpape
Contributor

I have now written cell_size_mask to all plates in /g/kreshuk/data/covid/data-processed.
This image has 2 colors: one for small (< 100 pix) and one for large (> 25000 pix) segments.
We can check it out after the meeting.
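
For reference, a size mask of this kind could be computed along these lines. This is a minimal sketch and not the code that wrote cell_size_mask: the function name, the mask values 1 and 2, the nested-list image representation and the assumption that label 0 is background are all illustrative choices (a real pipeline would more likely use an array library).

```python
from collections import Counter

SMALL, LARGE = 1, 2  # hypothetical mask values (the two "colors")

def cell_size_mask(seg, min_size=100, max_size=25000, background=0):
    """seg: 2D list of integer labels. Returns a 2D list of the same shape
    with SMALL where a segment has fewer than min_size pixels, LARGE where
    it has more than max_size pixels, and 0 everywhere else."""
    sizes = Counter(label for row in seg for label in row)

    def value(label):
        if label == background:  # assumed background label, never flagged
            return 0
        if sizes[label] < min_size:
            return SMALL
        if sizes[label] > max_size:
            return LARGE
        return 0

    return [[value(label) for label in row] for row in seg]

# Tiny toy segmentation with toy thresholds, just to show the behaviour.
seg = [
    [0, 1, 1, 2],
    [0, 1, 1, 3],
    [0, 1, 3, 3],
]
mask = cell_size_mask(seg, min_size=2, max_size=4)
```

In the toy example, segment 1 (5 pixels) is flagged as large, segment 2 (1 pixel) as small, and segment 3 (3 pixels) is left unmarked.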

@constantinpape
Contributor

Fyi, I double checked the size calculation, and it's correct.

@constantinpape
Contributor

I have computed the size masks now.
I will double check that it worked now.

@constantinpape
Contributor

I checked this now and this makes total sense:
all the large cells are segmentation errors caused by some image artifact.
This is usually a very bright local spot in one of the channels; unfortunately this currently ruins the segmentation for the whole image, because the network's image normalization is not robust to this. Eventually, we can fix this by using a more robust normalization procedure.
For now, let's take maybe 15000 as size threshold and just kick these out.

Here are some examples:
Screenshot from 2020-05-15 15-52-04
Screenshot from 2020-05-15 15-53-29
Screenshot from 2020-05-15 15-54-25
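
To illustrate why a single bright spot can ruin the normalization of a whole image, here is a small sketch comparing plain min-max scaling with a percentile-based variant. It is not the normalization the network actually uses; the percentile choice (1 and 99), the function names and the synthetic intensities are assumptions.

```python
def minmax_normalize(values):
    """Scale to [0, 1] using the global minimum and maximum."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) or 1.0  # avoid division by zero for constant input
    return [(v - lo) / scale for v in values]

def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    i = int(pos)
    j = min(i + 1, len(sorted_values) - 1)
    return sorted_values[i] + (sorted_values[j] - sorted_values[i]) * (pos - i)

def robust_normalize(values, q_low=1.0, q_high=99.0):
    """Scale to [0, 1] using percentiles and clip everything outside them."""
    s = sorted(values)
    lo, hi = percentile(s, q_low), percentile(s, q_high)
    scale = (hi - lo) or 1.0
    return [min(max((v - lo) / scale, 0.0), 1.0) for v in values]

# Synthetic flattened image: intensities between 100 and 149 ...
pixels = [100 + (i % 50) for i in range(1000)]
pixels[0] = 60000  # ... plus one very bright artifact pixel

plain = minmax_normalize(pixels)
robust = robust_normalize(pixels)
```

With min-max scaling the artifact sets the maximum, so all real signal is squeezed into a tiny range near 0; with percentile scaling the artifact is simply clipped and the rest of the image keeps its full dynamic range.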

@metavibor
Collaborator

I'm looking at one of the sites (H10-004) proposed by @tischi where there is supposed to be a large cell but I don't see anything abnormal
large cell

@constantinpape
Contributor

I'm looking at one of the sites (H10-004) proposed by @tischi where there is supposed to be a large cell but I don't see anything abnormal

Yes that looks normal. Maybe the site is wrong.
Anyway, I am sure that this is caused by the segmentation errors due to imaging artifacts.

@metavibor
Collaborator

@constantinpape you said you included another "layer" that can be looked at via PlateViewer, what is it? is it "cell_size_mask"? what is that supposed to show? nothing happens when I enable that in this image

@constantinpape
Contributor

constantinpape commented May 15, 2020

is it "cell_size_mask"?

yes exactly that's it.

what is that supposed to show? nothing happens when I enable that in this image

It shows a mask for the cells that are larger than 25000 pixels or smaller than 100 pixels. If you don't see anything, then there is no cell outside this range in the image. A good way to use this is to zoom out a lot and just look for wells where you can see the mask:

E.g. on plate 311:

Zoomed out:
Screenshot from 2020-05-15 16-39-03

Zoomed in on well D08, which has artifacts in the serum channel that screw up the segmentation:
Screenshot from 2020-05-15 16-40-11

@metavibor
Collaborator

this is exactly what I did and found nothing on the plate 519 reported by @tischi ... I looked at 311 in D06 and realized these are all images that are flagged in quality control by Severina. The question is why are they showing up in the cell statistics, why any computation is done on these?

@constantinpape
Contributor

this is exactly what I did and found nothing on the plate 519 reported by @tischi ...

I see. Maybe there is indeed an issue with Tischi's histograms.

The question is why are they showing up in the cell statistics, why any computation is done on these?

We still compute all the statistics even for the images that were marked as outliers.
Then, we don't take the outliers into account when computing the scores later.

(The reason for this is that we need to combine the manual and automatically detected outliers at some point; and we need the stats for the automatic checks, so it's easier to calculate all statistics first and then filter for outliers later.)
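
The order of operations described here (statistics for every image first, outlier filtering afterwards) might look roughly like this. It is a sketch only: the automatic check (too few cells), the well-level "score" (median of per-image median sizes) and all names and numbers are made up for illustration and are not the pipeline's actual checks or scores.

```python
from statistics import median

def find_auto_outliers(stats, min_cells=10):
    """Hypothetical automatic check: flag images with too few cells.
    It needs the statistics of every image, including bad ones."""
    return {name for name, s in stats.items() if s["n_cells"] < min_cells}

def well_scores(stats, manual_outliers, min_cells=10):
    # Combine manually flagged and automatically detected outliers.
    outliers = set(manual_outliers) | find_auto_outliers(stats, min_cells)
    per_well = {}
    for name, s in stats.items():
        if name in outliers:
            continue  # statistics exist, but are ignored for scoring
        well = name.split("-")[0]
        per_well.setdefault(well, []).append(s["median_size"])
    return {well: median(values) for well, values in per_well.items()}

# Hypothetical per-image statistics, computed for all images.
stats = {
    "D08-0000": {"n_cells": 120, "median_size": 3000},
    "D08-0001": {"n_cells": 3, "median_size": 90000},   # caught automatically
    "D08-0002": {"n_cells": 110, "median_size": 3400},
    "E01-0000": {"n_cells": 95, "median_size": 2800},   # flagged manually
}
scores = well_scores(stats, manual_outliers={"E01-0000"})
```

This also shows why a histogram built directly from the full statistics table still contains cells from flagged images unless the outlier filter is applied first.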

@tischi
Collaborator Author

tischi commented May 15, 2020

I'm looking at one of the sites (H10-004) proposed by @tischi where there is supposed to be a large cell but I don't see anything abnormal

This is strange indeed. Not sure. I can check the R code again...

@tischi
Collaborator Author

tischi commented May 15, 2020

@metavibor

Could you look in those wells? [EDIT: don't do it, see below]

 [1] "plate2rep3_20200507_094942_519 E01-0000" "plate2rep3_20200507_094942_519 F12-0000"
 [3] "plate2rep3_20200507_094942_519 A06-0001" "plate2rep3_20200507_094942_519 D07-0006"
 [5] "plate2rep3_20200507_094942_519 C01-0008" "plate2rep3_20200507_094942_519 H05-0000"
 [7] "plate2rep3_20200507_094942_519 D05-0006" "plate2rep3_20200507_094942_519 A06-0000"
 [9] "plate2rep3_20200507_094942_519 A06-0001" "plate2rep3_20200507_094942_519 B01-0006"

@tischi
Collaborator Author

tischi commented May 15, 2020

@constantinpape @imagirom @metavibor
....you guys are storing the background as a cell with label_id = 0, right?! 🍭

@tischi
Collaborator Author

tischi commented May 15, 2020

That was it:
image

@constantinpape
Contributor

@constantinpape @imagirom @metavibor
....you guys are storing the background as a cell with label_id = 0, right?! 🍭

yes indeed

@tischi
Collaborator Author

tischi commented May 15, 2020

yes indeed

Those were our mysterious large cells.
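
For anyone hitting the same thing: if the background is stored under label_id = 0, it has to be dropped before building size histograms, otherwise every image contributes one enormous "cell". A minimal sketch, assuming a nested-list label image and a hypothetical helper name `cell_sizes`:

```python
from collections import Counter

def cell_sizes(seg, background=0):
    """Pixel count per label in a 2D label image, excluding the background."""
    counts = Counter(label for row in seg for label in row)
    counts.pop(background, None)  # drop the background "cell" if present
    return dict(counts)

seg = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 2, 0, 0],
]
naive = dict(Counter(label for row in seg for label in row))  # includes label 0
sizes = cell_sizes(seg)
```

Here `naive` reports a 9-pixel object for label 0, which is by far the largest entry, while `sizes` only contains the two real segments.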

@metavibor
Collaborator

ok cool :) shall we say size limit 100-15000

@constantinpape
Contributor

ok cool :) shall we say size limit 100-15000

Will do.
I am also estimating the values for the nuclei from the data now, will post it later here as well.
