this seems like generically useful code that ensures that the various dataframes are all copacetic and synchronized...
from collections import namedtuple

import pandas as pd

SampleDFs = namedtuple('SampleDFs', 'gather_df, all_df, left_df, names_df')

def load_dfs(outdir, sample_id):
    # load mapping CSVs
    all_df = pd.read_csv(f'{outdir}/minimap/depth/{sample_id}.summary.csv')
    left_df = pd.read_csv(f'{outdir}/leftover/depth/{sample_id}.summary.csv')

    # load gather CSV
    gather_df = pd.read_csv(f'{outdir}/genbank/{sample_id}.x.genbank.gather.csv')

    # names!
    names_df = pd.read_csv(f'{outdir}/genbank/{sample_id}.genomes.info.csv')

    # connect gather_df to all_df and left_df using 'genome_id'
    def fix_name(x):
        return "_".join(x.split('_')[:2]).split('.')[0]

    gather_df['genome_id'] = gather_df['name'].apply(fix_name)
    names_df['genome_id'] = names_df['acc'].apply(fix_name)

    # check that all dataframes are copacetic
    in_gather = set(gather_df.genome_id)
    in_left = set(left_df.genome_id)
    assert in_gather == in_left
    assert in_gather == set(names_df.genome_id)
    assert in_gather == set(all_df.genome_id)

    # re-sort left_df, all_df and names_df to match gather_df order, using the
    # matching genome_id column. set_index/reindex/reset_index each return a new
    # dataframe, so the result has to be assigned back.
    order = gather_df["genome_id"]
    all_df = all_df.set_index("genome_id").reindex(index=order).reset_index()
    left_df = left_df.set_index("genome_id").reindex(index=order).reset_index()

    #left_df["mapped_bp"] = (1 - left_df["percent missed"]/100) * left_df["genome bp"]
    #left_df["unique_mapped_coverage"] = left_df.coverage / (1 - left_df["percent missed"] / 100.0)

    names_df = names_df.set_index("genome_id").reindex(index=order).reset_index()

    return SampleDFs(gather_df, all_df, left_df, names_df)
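One way the check-and-sync step could be made generic is sketched below. It is only an illustration, not code from the project: the helper name `align_on_key`, its parameters, and the toy data are all hypothetical. It assumes each dataframe carries the shared key column exactly once per row value, and it raises `ValueError` instead of using bare `assert` so the failing dataframe is named.

```python
import pandas as pd


def align_on_key(reference, others, key="genome_id"):
    """Hypothetical helper: reorder each frame in `others` to match `reference[key]`.

    `others` is a dict mapping a label to a dataframe. Every dataframe must
    contain exactly the same set of key values as `reference`, with no duplicates.
    """
    ref_keys = reference[key]
    if ref_keys.duplicated().any():
        raise ValueError(f"reference has duplicate {key!r} values")

    aligned = {}
    for name, df in others.items():
        if df[key].duplicated().any():
            raise ValueError(f"{name} has duplicate {key!r} values")
        if set(df[key]) != set(ref_keys):
            raise ValueError(f"{name}: {key!r} values do not match the reference")
        # Each of these calls returns a new dataframe, so chain and keep the result.
        aligned[name] = df.set_index(key).reindex(index=ref_keys).reset_index()
    return aligned


# Toy data (made up for illustration): left_df is in a different row order.
gather_df = pd.DataFrame({"genome_id": ["GCA_2", "GCA_1"], "f_match": [0.9, 0.4]})
left_df = pd.DataFrame({"genome_id": ["GCA_1", "GCA_2"], "coverage": [1.5, 7.0]})

aligned = align_on_key(gather_df, {"left_df": left_df})
print(aligned["left_df"])
```

After the call, `aligned["left_df"]` has its rows in the same `genome_id` order as `gather_df`, so the frames can be compared or combined row by row.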