This issue was moved to a discussion.

Make a module for multi-qc #78

Closed
sergpolly opened this issue Jan 29, 2020 · 15 comments

@sergpolly
Member

MultiQC is great!
It is well supported, maintained and documented:
https://multiqc.info/docs/#introduction-2

We just need to write a MultiQC plugin for pairtools (it would be part of pairtools) so that MultiQC understands our .stats files and turns them into all kinds of beautiful, interactive plots and tables, browsable alongside e.g. the FastQC report.
It would end up in distiller afterwards, of course...

@golobor @nvictus - has anyone done anything like that yet?

@sergpolly
Member Author

preliminary stuff:
Screenshot from 2020-02-01 14-36-46

@golobor
Member

golobor commented Feb 1, 2020 via email

@sergpolly
Member Author

First draft is ready https://github.com/dekkerlab/MultiQC/tree/pairtools-module ...

Check out:

multiqc --outdir ~/blah --module pairtools /path/to/distiller-results

@sergpolly
Member Author

here is a clickable example:
http://ummsres37.ad.umassmed.edu:8080/mqc/multiqc_report.html

@sergpolly
Member Author

A question regarding the types of pairs that pairtools parse can generate:
is there a place where those are exhaustively enumerated? It is not obvious from the code ...
Looking at some "old"-ish distiller stats, we only have NU, NM and MU there. Are we always guaranteed to have those and not UN, MN, UM?

I guess that if one uses the --no-flip option, then it is not guaranteed, and otherwise it is - am I right?
https://github.com/mirnylab/pairtools/blob/d1ddf9c39a336662f7fc725fa5a70ec68df9ba95/pairtools/pairtools_parse.py#L802

Should I account for the X type of alignments? XX pairs?

We've also seen a strange pair type, MR. Is it now XR? I don't fully understand its meaning, but that's a separate question, I guess.

Here is how the barchart "pairs by alignment status" looks at the moment:

Screenshot from 2020-03-02 15-43-39

@mimakaev

mimakaev commented Mar 2, 2020

There may be more types soon as Sasha finishes the walk rescuer...

@sergpolly
Member Author

ok - I'll try to make it more flexible then

@sergpolly
Member Author

Here are the keys that I've included and assigned "nice" colors to ...
known_keys = ['UU', 'RU', 'UR', 'WW', 'DD', 'MR', 'MU', 'MM', 'NM', 'NU', 'NN', 'XX']

The barchart shows them in this order as well.

Any extra keys that are not from this list will be displayed after these, in a "random" order and with auto-coloring by MultiQC itself, i.e. it might end up looking ugly.

But before we submit everything I'd like to hear your input @golobor @mimakaev @agalitsyna - whoever it might concern - on the grouping of pair types, potentially missing categories, collapsing existing categories into one (UR+RU -> RU), XR vs MR (still unclear to me), MN vs NM, etc.
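
To make the ordering and collapsing ideas concrete, here is a minimal sketch in plain Python. It is not the module's actual code: `order_pair_types`, `collapse_pair_types` and the `aliases` mapping are hypothetical names introduced only for illustration, and sorting the unknown keys alphabetically is an assumption made here to avoid the "random" order.

```python
# Preferred display order, copied from the known_keys list in this comment.
KNOWN_KEYS = ['UU', 'RU', 'UR', 'WW', 'DD', 'MR', 'MU', 'MM', 'NM', 'NU', 'NN', 'XX']


def order_pair_types(counts, known_keys=KNOWN_KEYS):
    """Return (pair_type, count) tuples: known keys first, in the preferred
    order, then any unknown keys sorted alphabetically (an assumption)."""
    known = [(k, counts[k]) for k in known_keys if k in counts]
    extras = [(k, counts[k]) for k in sorted(counts) if k not in known_keys]
    return known + extras


def collapse_pair_types(counts, aliases):
    """Sum the counts of aliased pair types under one key.

    `aliases` is a hypothetical mapping such as {'UR': 'RU'}; which types
    should really be merged is exactly the open question in this thread.
    """
    merged = {}
    for pair_type, n in counts.items():
        target = aliases.get(pair_type, pair_type)
        merged[target] = merged.get(target, 0) + n
    return merged


# Made-up counts, including two pair types that are not in KNOWN_KEYS.
example = {'XR': 3, 'UU': 10, 'UR': 2, 'RU': 4, 'AB': 1}
ordered = order_pair_types(example)
collapsed = collapse_pair_types(example, {'UR': 'RU'})
```

With this approach a new pair type produced by a future pairtools version still shows up in the plot, just after the known ones and in a reproducible position.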

@golobor
Member

golobor commented Mar 4, 2020 via email

@sergpolly
Member Author

Fix the scalings-report section to use actual chromsizes instead of a fake 2_000_000, which is what happens now ...

@Phlya
Member

Phlya commented Mar 6, 2020

I didn't manage to say that during the talk, but do we actually need to know chromsizes? Knowing bins fixes the ratios of areas between them (doesn't it?), and in the end we can ensure area under curve equals 1. So we can fake areas to ensure the right shape of the curve, and then rescale it to get correct Y axis.

(assuming we ignore the last bin issue)

@mimakaev

mimakaev commented Mar 6, 2020

We need to know chromsizes to know the denominator. Only a few chromosomes contribute to the last several bins, and the distribution of chrom lengths would exactly determine that contribution.
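
The role of chromsizes in the denominator can be sketched as follows. This is an illustration under stated assumptions, not the code used by pairtools or the MultiQC module: `possible_pairs` and `contact_frequency` are hypothetical helpers, distance bins are taken as half-open intervals [lo, hi) in base pairs, and only cis pairs are considered.

```python
def possible_pairs(chromsizes, lo, hi):
    """Count cis position pairs (i < j) whose separation s = j - i falls
    in [lo, hi), summed over all chromosomes.

    A chromosome of length L has (L - s) position pairs at separation s,
    so chromosomes shorter than the bin contribute little or nothing.
    """
    total = 0
    for length in chromsizes:
        a = max(lo, 1)        # smallest separation counted
        b = min(hi, length)   # separations above length - 1 are impossible
        if b <= a:
            continue
        n = b - a             # number of separations in [a, b)
        # sum_{s=a}^{b-1} (length - s) = n*length - n*(a + b - 1)/2
        total += n * length - (a + b - 1) * n // 2
    return total


def contact_frequency(counts, bin_edges, chromsizes):
    """Divide observed counts per distance bin by the number of possible pairs."""
    freqs = []
    for n, lo, hi in zip(counts, bin_edges[:-1], bin_edges[1:]):
        denom = possible_pairs(chromsizes, lo, hi)
        freqs.append(n / denom if denom else float('nan'))
    return freqs
```

For a toy genome with chromosome lengths 10 and 5, the bin [4, 8) gets 18 possible pairs from the longer chromosome and only 1 from the shorter one, which is the effect described above: the last bins are dominated by the few chromosomes long enough to reach them.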

@sergpolly
Member Author

@mimakaev suggestions from slack:

  1. It would be nice to estimate the % of self-circles and dangling ends, where dangling ends are FR < 1 kb and self-circles are RF < 1 kb, and add this to "general stats".
  2. % cis is confusing:
    it is % cis out of (cis + trans), not % cis out of total.
    However, % cis out of total (especially after dangling ends and self-circles are accounted for) is the most important metric.
    Right now I have a dataset that has 40% duplicates and 50% self-circles, and it tells me that % cis is 80%. If I hadn't come with a prior that the dataset clearly has issues, after seeing it in HiGlass, I would have completely missed that, and potentially missed the self-circles too.
  3. Also, I wouldn't use red for the RF type.
    I wouldn't use a red-based colormap at the bottom for chromosome frequencies either,
    and I would maybe move DD towards a red-ish color (some kind of purple).
    Red should mean "bad".
  4. Sometimes this happens:
    image
    There are way too many columns here,
    and chr4 is the last label, even though it is actually chr1.
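
Points 1 and 2 can be illustrated with a small sketch. The function names, the strand encoding ('+' / '-') and the example counts are assumptions made for illustration only; they are not taken from the .stats format or from the dataset mentioned above.

```python
def classify_cis_pair(strand1, strand2, distance, cutoff=1000):
    """Label a cis pair following the definitions in point 1:
    FR closer than `cutoff` is a dangling end, RF closer than `cutoff`
    is a self-circle. Assumes the pair is flipped so that side 1 is upstream.
    """
    if distance < cutoff:
        if (strand1, strand2) == ('+', '-'):   # FR orientation
            return 'dangling_end'
        if (strand1, strand2) == ('-', '+'):   # RF orientation
            return 'self_circle'
    return 'other'


def cis_percentages(total, cis, trans):
    """Report % cis with both denominators discussed in point 2."""
    return {
        'pct_cis_of_cis_trans': 100.0 * cis / (cis + trans),
        'pct_cis_of_total': 100.0 * cis / total,
    }


# Made-up counts: 1000 pairs in total, of which 600 are cis or trans.
summary = cis_percentages(total=1000, cis=480, trans=120)
```

With these made-up counts the same library reads as 80% cis out of (cis + trans) but only 48% cis out of total, so showing both numbers side by side makes a problematic dataset much harder to overlook.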

@sergpolly sergpolly reopened this Mar 27, 2020
@agalitsyna
Member

That was a great discussion! It seems there is now a whole Open2C package for that purpose, and the issues and proposals can be addressed there: https://github.com/open2c/MultiQC
I'll move this to discussions as a historical note.

@open2c open2c locked and limited conversation to collaborators Apr 6, 2022
@agalitsyna agalitsyna converted this issue into discussion #115 Apr 6, 2022
