The SNP and gene datasets of M. tuberculosis for drug resistance prediction.

AmirHoseinSafari/M.tuberculosis-dataset-for-drug-resistant

M. tuberculosis dataset for drug resistance

This repository provides the SNP and gene datasets of M. tuberculosis for drug resistance prediction. Here is a brief description of each file:

  • AllLabels.csv contains the susceptibility/resistance status (susceptible: 0, resistant: 1) of each sample isolate for 12 different drugs.
  • SNPList.csv contains the list of all loci on the MTB genome where a mutation was detected using the variant calling tools, based on the reference genome provided here.
  • SNP_data_part*.zip contain CSV files with the binary SNPs. The CSV files are concatenated using the loading_data package (refer to this repo).
  • gene_data.csv.zip contains a CSV file that summarizes the SNPs by the gene they fall into, forming a matrix with a single feature per gene for each sample isolate.
  • iso_list.csv contains a list of all isolate IDs used in the training data.
  • sparsetableFeb27.npz contains the binary SNP data in npz format for ease of use.
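
The gene-level summary described for gene_data.csv.zip can be illustrated with a toy example. This is a minimal sketch, not the pipeline used to build the dataset: the `summarize_by_gene` helper, the gene names and coordinates, and the choice of counting mutated loci per gene are all assumptions made for illustration.

```python
from bisect import bisect_right


def summarize_by_gene(snp_positions, snp_matrix, genes):
    """Collapse a binary SNP matrix into one count feature per gene.

    snp_positions: genome position of each SNP column (toy values here).
    snp_matrix: list of rows, one per isolate, with 0/1 entries.
    genes: list of (name, start, end) tuples with inclusive,
           non-overlapping ranges (hypothetical annotation).
    """
    genes = sorted(genes, key=lambda g: g[1])
    starts = [g[1] for g in genes]

    # Map each SNP column to the index of the gene containing it, or None
    # when the position falls outside every annotated gene.
    col_to_gene = []
    for pos in snp_positions:
        i = bisect_right(starts, pos) - 1
        col_to_gene.append(i if i >= 0 and pos <= genes[i][2] else None)

    names = [g[0] for g in genes]
    rows = []
    for row in snp_matrix:
        counts = [0] * len(genes)
        for value, gene_index in zip(row, col_to_gene):
            if gene_index is not None:
                counts[gene_index] += value
        rows.append(counts)
    return names, rows


# Toy data: two isolates, four SNP loci, two hypothetical genes.
names, gene_rows = summarize_by_gene(
    snp_positions=[105, 150, 320, 900],
    snp_matrix=[[1, 1, 0, 1], [0, 0, 1, 0]],
    genes=[("geneA", 100, 200), ("geneB", 300, 400)],
)
```

Counting is only one possible aggregation; a presence/absence flag per gene would work the same way with `max` in place of the sum.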

To learn how to load and use this data, please visit the LRCN-drug-resistance repository, especially the loading_data section.
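
For a quick look at the labels without the loading_data package, something like the following may be enough. It is a sketch under unverified assumptions: that AllLabels.csv has a header row naming the drugs, that each later row is one isolate, and that any cell other than 0 or 1 means the isolate was not tested for that drug. The `read_labels` and `resistance_counts` helpers are hypothetical and are not part of either repository.

```python
import csv

# Accepted spellings of the two label values (an assumption about the file).
_LABELS = {"0": 0, "1": 1, "0.0": 0, "1.0": 1}


def read_labels(lines):
    """Parse label rows from an iterable of CSV lines (e.g. an open file).

    Returns (drugs, rows): the header names and, per isolate, a list of
    0, 1, or None (None marks a cell that is neither 0 nor 1).
    """
    reader = csv.reader(lines)
    drugs = next(reader)
    rows = [[_LABELS.get(cell.strip()) for cell in record] for record in reader]
    return drugs, rows


def resistance_counts(drugs, rows):
    """Count, for each drug, the isolates labelled resistant (1)."""
    counts = {drug: 0 for drug in drugs}
    for row in rows:
        for drug, value in zip(drugs, row):
            if value == 1:
                counts[drug] += 1
    return counts


# Usage with the real file (path assumed to be the working directory):
#     with open("AllLabels.csv", newline="") as handle:
#         drugs, rows = read_labels(handle)
#     print(resistance_counts(drugs, rows))
```

If the file turns out to carry an isolate ID column or no header, the parsing step would need to be adjusted accordingly.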


Citation

If you find the content of this repository useful, please cite us:

https://dl.acm.org/doi/abs/10.1145/3459930.3469534

