This repository contains SARS-CoV-2 variant metadata, stored in the compressed Parquet format in the data directory:
- data/variants.parquet
The download_variants.ipynb notebook downloads SARS-CoV-2 variant metadata from the China National Center for Bioinformation, standardizes the column names and data, and saves the results as a .parquet file.
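
The standardization step might look roughly like the sketch below. It is an illustration only, not the notebook's code: the raw headers, the values, and the snake_case rule in the hypothetical helper `standardize_column_name` are all assumptions, and the download itself is omitted.

```python
import re

import pandas as pd


def standardize_column_name(name: str) -> str:
    """Convert a raw header to lowercase snake_case (assumed convention)."""
    name = name.strip().lower()
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name)
    return name.strip("_")


# Hypothetical raw table standing in for the downloaded metadata.
raw = pd.DataFrame(
    {
        "Virus Strain Name": ["strain-1"],
        "Accession ID": ["ACC0001"],
        "Sample Collection Date": ["2020-03-01"],
    }
)

# Rename all columns with the helper, then parse the date strings.
clean = raw.rename(columns=standardize_column_name)
clean["sample_collection_date"] = pd.to_datetime(clean["sample_collection_date"])

print(list(clean.columns))
```

A table cleaned this way can then be written with `clean.to_parquet(...)`, which needs a Parquet engine such as pyarrow installed.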
The read_variants.ipynb notebook shows an example of how to read selected columns from the .parquet file into a pandas DataFrame.