This repository contains the SNP and gene datasets of M. tuberculosis for drug resistance prediction. Here is a brief description of each file:
AllLabels.csv
contains the susceptibility/resistance status (susceptibility: 0, resistance: 1) of each sample isolate for 12 different drugs.
SNPList.csv
contains the list of all loci on the MTB genome where a mutation was detected using the variant calling tools, based on the reference genome provided here.
SNP_data_part*.zip
contain CSV files with the binary SNPs. The CSV files are concatenated using the loading_data package (refer to this repo).
gene_data.csv.zip
contains a CSV file that summarizes the SNPs by the gene they fall into, forming a matrix with a single feature per gene for each sample isolate.
iso_list.csv
contains a list of all isolate IDs used in the training data.
sparsetableFeb27.npz
contains the binary SNP data in npz format for ease of use.
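To illustrate what a per-gene summary of binary SNPs can look like, here is a minimal sketch. It is not the procedure used to build gene_data.csv.zip: the aggregation rule (counting the SNPs present in each gene), the function name `summarize_snps_by_gene`, and the toy SNP-to-gene mapping are all assumptions made for this example.

```python
def summarize_snps_by_gene(snp_rows, snp_to_gene):
    """Collapse a binary SNP matrix into one count feature per gene.

    snp_rows: list of rows, one per isolate, each a list of 0/1 SNP flags.
    snp_to_gene: gene name for each SNP column (None = not inside a gene).
    Assumption: the per-gene feature is the number of SNPs present.
    """
    genes = sorted({g for g in snp_to_gene if g is not None})
    gene_index = {g: i for i, g in enumerate(genes)}
    gene_rows = []
    for row in snp_rows:
        counts = [0] * len(genes)
        for flag, gene in zip(row, snp_to_gene):
            # Only SNPs that are present and fall inside a gene contribute.
            if flag and gene is not None:
                counts[gene_index[gene]] += 1
        gene_rows.append(counts)
    return genes, gene_rows


# Toy input: 2 isolates, 4 SNP loci. The mapping is illustrative only.
snp_rows = [[1, 0, 1, 1], [0, 0, 0, 1]]
snp_to_gene = ["rpoB", "rpoB", "katG", None]
genes, gene_rows = summarize_snps_by_gene(snp_rows, snp_to_gene)
```

In this toy case the result has one column per gene and one row per isolate, which is the shape described for the gene-level matrix.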
To understand how to load and use this data, please visit the LRCN-drug-resistance repository, especially the loading_data section.
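For a quick look at the labels without the loading_data package, the sketch below reads a label table using only the Python standard library and counts resistant isolates per drug. The column layout is an assumption (a header row, the isolate ID in the first column, one 0/1 column per drug, and any other value treated as a missing label), and `read_labels` and `resistance_counts` are hypothetical helpers, not part of the repository.

```python
import csv
import io


def read_labels(handle):
    """Read a label table into {isolate_id: {drug: 0, 1, or None}}."""
    reader = csv.reader(handle)
    header = next(reader)
    drugs = header[1:]  # assumption: first column holds the isolate ID
    labels = {}
    for record in reader:
        if not record:
            continue  # skip blank lines
        isolate_id, values = record[0], record[1:]
        labels[isolate_id] = {
            # Anything other than "0" or "1" is treated as a missing label.
            drug: int(v) if v.strip() in ("0", "1") else None
            for drug, v in zip(drugs, values)
        }
    return drugs, labels


def resistance_counts(drugs, labels):
    """Count isolates labeled resistant (1) for each drug."""
    counts = {drug: 0 for drug in drugs}
    for per_drug in labels.values():
        for drug, status in per_drug.items():
            if status == 1:
                counts[drug] += 1
    return counts


# Tiny in-memory stand-in for the real file; drug names are placeholders.
sample = io.StringIO("isolate,drugA,drugB\niso1,1,0\niso2,1,\niso3,0,1\n")
drugs, labels = read_labels(sample)
counts = resistance_counts(drugs, labels)
```

To try this on the actual file, pass an open handle instead of the in-memory sample, for example `open("AllLabels.csv", newline="")`, and check the header first to confirm the assumed layout.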
If you found the content of this repository useful, please cite us:
https://dl.acm.org/doi/abs/10.1145/3459930.3469534