This issue was moved to a discussion.

should we store stats as YAML (or json) #79

Closed
sergpolly opened this issue Jan 31, 2020 · 4 comments

Comments

@sergpolly
Member

that's how we store stats now:

total_mapped    2189618376
total_nodups    1753432070
cis     1533122797
...
pair_types/WW   88884
pair_types/MU   404456330
...
cis_1kb+        998076606
cis_2kb+        836035718
...
chrom_freq/chr1/chr1    137125332
chrom_freq/chr1/chr10   1791283
...

That is hard to parse, and I believe YAML would serve us just fine. Should we switch?
It would be useful for #78.

@Phlya
Member

Phlya commented Jan 31, 2020 via email

@sergpolly
Member Author

things like pair_types:

...
pair_types/WW   88884
pair_types/MU   404456330
...

imply nested structure - i.e. I would want to parse it as

stats = {..., "pair_types": {"WW": 88884, "MU": 404456330}, ...}

I'm not sure pandas would help with that.

Also, MultiQC doesn't want to rely on pandas for whatever reason; pandas isn't the smallest dependency, I guess.
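
For illustration, the nested parsing described above can be done without pandas. The following is a minimal sketch, not the pairtools implementation: the function name `parse_flat_stats` is hypothetical, and it assumes every line holds exactly one key and one integer value, and that no key is used both as a leaf and as a parent.

```python
def parse_flat_stats(lines):
    """Turn 'a/b/c<whitespace>value' lines into nested dicts (hypothetical helper)."""
    stats = {}
    for line in lines:
        line = line.strip()
        if not line or line == "...":
            continue  # skip blank lines and the elision markers used in the example
        key, value = line.split()  # assumption: exactly two whitespace-separated fields
        *parents, leaf = key.split("/")
        node = stats
        for part in parents:
            # descend into (or create) one nesting level per slash-separated part
            node = node.setdefault(part, {})
        node[leaf] = int(value)  # assumption: all values are integers
    return stats


example = [
    "total_mapped\t2189618376",
    "pair_types/WW\t88884",
    "pair_types/MU\t404456330",
    "chrom_freq/chr1/chr1\t137125332",
    "chrom_freq/chr1/chr10\t1791283",
]
stats = parse_flat_stats(example)
```

Here `stats["pair_types"]` is a plain dict keyed by pair type, which is the shape asked for above.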

@sergpolly
Member Author

that's how we parse a typical stats file in the pairtools now: https://github.com/mirnylab/pairtools/blob/d1ddf9c39a336662f7fc725fa5a70ec68df9ba95/pairtools/pairtools_stats.py#L263

With standard YAML, which is great for storing nested dicts and various small lists, it would simply look like:

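import yaml

# yaml.load() expects a stream or string, not a filename, so open the file first
with open("sample.nodups.stats.yml") as f:
    stats_dict = yaml.safe_load(f)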

and here is the ultimate goal:
https://multiqc.info/
https://multiqc.info/examples/hi-c/multiqc_report.html
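
For comparison, the JSON option from the issue title needs only the standard library. This is a rough sketch with a hand-written `stats` dict whose values are taken from the example at the top of the thread; it is not actual pairtools output.

```python
import json

# Hand-written example dict; the values come from the stats snippet in this thread.
stats = {
    "total_mapped": 2189618376,
    "pair_types": {"WW": 88884, "MU": 404456330},
}

text = json.dumps(stats, indent=2)  # serialize the nested dict to a JSON string
restored = json.loads(text)         # parse it back into nested dicts
```

The round trip preserves the nesting, so a downstream tool could read `restored["pair_types"]` directly.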

@agalitsyna
Member

I switched the pairtools stats output to YAML in version 1.0.0: https://github.com/open2c/pairtools/pull/117/files#diff-e4b8770efd538564222d48d69b00ed2c5012a76b35c926f1aba227fe45db2309

I had to guess at the best way to convert some fields, e.g. reporting chromosome pairs as slash-separated keys instead of a separate dict for each chromosome:

chrom_freq:
  chr1/chr1: 3
  chr1/chr2: 1
  chr2/chr3: 1

But this is minor and you may change it in the future.
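
If a reader prefers one dict per chromosome, the slash-separated keys can be split after loading. This is a small sketch: the helper name `split_chrom_pairs` is hypothetical, and it assumes chromosome names never contain a slash.

```python
def split_chrom_pairs(chrom_freq):
    """Convert {'chr1/chr2': n} into {'chr1': {'chr2': n}} (hypothetical helper)."""
    nested = {}
    for key, count in chrom_freq.items():
        chrom1, chrom2 = key.split("/")  # assumption: exactly one slash per key
        nested.setdefault(chrom1, {})[chrom2] = count
    return nested


# Values copied from the chrom_freq example above.
chrom_freq = {"chr1/chr1": 3, "chr1/chr2": 1, "chr2/chr3": 1}
nested = split_chrom_pairs(chrom_freq)
```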

@open2c open2c locked and limited conversation to collaborators Apr 20, 2022
@agalitsyna agalitsyna converted this issue into discussion #129 Apr 20, 2022
