This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
should we store stats as YAML (or json) #79
Comments
It's quite easy to parse, I think... Just read as a table with pandas?
On Fri, Jan 31, 2020, 21:16 Sergey Venev wrote:
that's how we store stats now:
total_mapped 2189618376
total_nodups 1753432070
cis 1533122797
...
pair_types/WW 88884
pair_types/MU 404456330
...
cis_1kb+ 998076606
cis_2kb+ 836035718
...
chrom_freq/chr1/chr1 137125332
chrom_freq/chr1/chr10 1791283
...
It is hard to parse that, and I believe YAML would serve us just fine. Should we switch?
Would be useful for #78.
Things like pair_types imply nested structure, i.e. I would want to parse it as stats = {..., "pair_types": {"WW": 8884, "MU": 40404000}, ...}. I'm not sure. Also, for MultiQC: they don't want to rely on pandas for whatever reason. pandas isn't the smallest dependency, I guess.
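For what it's worth, the nested parsing described above does not need pandas. Here is a minimal sketch using only the standard library. The function name `parse_flat_stats` is hypothetical (it is not the pairtools parser), and it assumes every line is a whitespace-separated key and integer value, with `/` marking the nesting levels:

```python
def parse_flat_stats(text, sep="/"):
    """Parse 'key value' lines into a nested dict, splitting keys on `sep`.

    Hypothetical helper for illustration; assumes integer values only.
    """
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split()
        *parents, leaf = key.split(sep)
        node = stats
        for part in parents:
            # descend, creating intermediate dicts as needed
            node = node.setdefault(part, {})
        node[leaf] = int(value)
    return stats


example = """\
total_mapped 2189618376
pair_types/WW 88884
pair_types/MU 404456330
chrom_freq/chr1/chr1 137125332
chrom_freq/chr1/chr10 1791283
"""

stats = parse_flat_stats(example)
print(stats["pair_types"])  # {'WW': 88884, 'MU': 404456330}
```

This only works while no key is both a leaf and a prefix of another key, which seems to hold for the fields listed in this thread.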
That's how we parse a typical stats file in pairtools now: https://github.com/mirnylab/pairtools/blob/d1ddf9c39a336662f7fc725fa5a70ec68df9ba95/pairtools/pairtools_stats.py#L263
With standard YAML, which is great for storing nested dicts and various small lists, it would simply look like:
and here is the ultimate goal:
I updated I guessed the best way to convert some fields, e.g. reporting chromosome pairs separated by a slash instead of a separate dict for each chromosome:
But this is minor and you may change it in the future.
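To illustrate the chromosome-pair convention mentioned above, here is a rough sketch that collapses a per-chromosome nested dict into slash-joined keys. The helper name `join_chrom_pairs` and the output layout are assumptions for illustration, not the converter actually used, and JSON from the standard library stands in for YAML here to avoid an extra dependency:

```python
import json


def join_chrom_pairs(chrom_freq, sep="/"):
    """Turn {chrom1: {chrom2: count}} into {"chrom1/chrom2": count}.

    Hypothetical helper for illustration only.
    """
    return {
        f"{chrom1}{sep}{chrom2}": count
        for chrom1, inner in chrom_freq.items()
        for chrom2, count in inner.items()
    }


nested = {"chr1": {"chr1": 137125332, "chr10": 1791283}}
flat = join_chrom_pairs(nested)

# Serialize with the slash-joined keys under a single "chrom_freq" mapping.
print(json.dumps({"chrom_freq": flat}, indent=2))
```

Keeping one flat mapping per field makes the file shorter, at the cost of splitting the key again whenever the two chromosomes are needed separately.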