This repository provides detailed reports on Semifield image data, including data contents, species distribution, temporal and spatial distribution, missing data analysis, and status of unprocessed or backlog data.
This project uses Conda to manage its packages and environments. If you haven't installed Conda already, follow these steps:
- Download the appropriate version of Miniconda for your operating system from the official Miniconda website.
- Follow the installation instructions provided on the website for your OS. This typically involves running the installer from the command line and following the on-screen prompts.
- Once installed, open a new terminal window and type
conda list
to ensure Conda was installed correctly. You should see a list of installed packages.
After installing Conda, you can set up an environment for this project using an environment file, which specifies all necessary dependencies. Here's how:
- Clone this repository to your local machine.
- Navigate to the repository directory in your terminal.
- Locate the environment.yaml file in the repository. This file contains the list of packages needed for the project.
- Create a new Conda environment by running the following command:
conda env create -f environment.yaml
This command reads the environment.yaml file and creates an environment with the name and dependencies specified within it.
- Once the environment is created, activate it with:
conda activate <env_name>
Replace <env_name> with the name of the environment specified in the environment.yaml file.
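If you are unsure which name to substitute for <env_name>, the short sketch below reads it from the environment file. It is only an illustration: the helper names `parse_env_name` and `read_env_name` are hypothetical, and the sketch assumes the file has a top-level `name:` key, as Conda environment files usually do. It uses only the standard library so it can run before the environment exists.

```python
import re
from pathlib import Path
from typing import Optional


def parse_env_name(text: str) -> Optional[str]:
    """Return the value of the top-level `name:` key, or None if it is absent."""
    for line in text.splitlines():
        # Only match unindented lines so nested keys are ignored.
        match = re.match(r"^name:\s*(\S+)", line)
        if match:
            return match.group(1).strip("'\"")
    return None


def read_env_name(env_file: str = "environment.yaml") -> Optional[str]:
    """Read the environment file from disk and extract its name (hypothetical helper)."""
    return parse_env_name(Path(env_file).read_text(encoding="utf-8"))
```

Calling `read_env_name()` from the repository directory returns the string to pass to `conda activate`.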
With the environment set up and activated, you can run the scripts provided in the repository to begin data exploration and analysis:
- Ensure your Conda environment is activated:
conda activate semifield-reports
- [NOTE] Set up the pipeline in the main config first. To run a script, use the following command syntax:
python main.py task=<task_name>
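The task=<task_name> argument can be read as a key=value override. The sketch below shows one way such an argument could be mapped to a task, assuming a plain dictionary of task functions. It is not taken from the repository's main.py, which may work differently: the `TASKS` registry, `parse_overrides`, `run`, and the placeholder task bodies are all hypothetical.

```python
from typing import Callable, Dict, List


def export_blob_metrics() -> str:
    # Placeholder body; the real task is implemented in the repository.
    return "export_blob_metrics: placeholder"


def report_blob_metrics() -> str:
    # Placeholder body; the real task is implemented in the repository.
    return "report_blob_metrics: placeholder"


# Hypothetical registry mapping task names to callables.
TASKS: Dict[str, Callable[[], str]] = {
    "export_blob_metrics": export_blob_metrics,
    "report_blob_metrics": report_blob_metrics,
}


def parse_overrides(argv: List[str]) -> Dict[str, str]:
    """Split each `key=value` argument into a dictionary entry."""
    overrides: Dict[str, str] = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        overrides[key] = value
    return overrides


def run(argv: List[str]) -> str:
    """Look up the requested task and call it."""
    task_name = parse_overrides(argv).get("task")
    if task_name not in TASKS:
        raise ValueError(f"unknown task {task_name!r}; choose from {sorted(TASKS)}")
    return TASKS[task_name]()
```

With this sketch, `run(["task=export_blob_metrics"])` calls the export placeholder, and an unknown or missing task name fails with a message listing the valid choices.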
- ExporterBlobMetrics: Exports blob metrics by running AzCopy commands and saving the output to text files.
- CalculatorBlobMetrics: Calculates and analyzes blob metrics from the exported text files, including extracting batch details, filtering data, and computing image counts.
- Run the script with the task selected on the command line:
python main.py task=export_blob_metrics
- Text Files: The ExporterBlobMetrics class saves blob lists as text files. The text files are saved in the directory specified by cfg.paths.data_dir in the configuration file, with the naming format <blob_container_name>.txt.
- CSV Report: The CalculatorBlobMetrics class generates a CSV file containing mismatch statistics, detailing any discrepancies found during analysis. The CSV file is saved in the directory specified by cfg.paths.data_dir in the configuration file, under the name mismatch_statistics_record.csv.
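To make the naming convention and the image-count step concrete, here is a minimal sketch. It is not the CalculatorBlobMetrics implementation. It assumes each line of an exported list has already been reduced to a blob path whose first segment is the batch name, whereas real AzCopy output may carry extra fields per line. The function names, the image suffix set, and the sample batch names are hypothetical.

```python
from collections import Counter
from pathlib import Path, PurePosixPath
from typing import Iterable

# Assumed set of image extensions; adjust to match the actual data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def blob_list_path(data_dir: str, container_name: str) -> Path:
    """Build the text-file path following the <blob_container_name>.txt format."""
    return Path(data_dir) / f"{container_name}.txt"


def count_images_per_batch(lines: Iterable[str]) -> Counter:
    """Count image blobs per batch, taking the first path segment as the batch."""
    counts: Counter = Counter()
    for raw in lines:
        path = PurePosixPath(raw.strip())
        # Skip blank lines, non-image blobs, and blobs outside any batch folder.
        if path.suffix.lower() in IMAGE_SUFFIXES and len(path.parts) > 1:
            counts[path.parts[0]] += 1
    return counts


if __name__ == "__main__":
    sample = ["batch_a/images/1.jpg", "batch_a/meta.json", "batch_b/x.png"]
    print(blob_list_path("data", "semifield-developed-images"))
    print(dict(count_images_per_batch(sample)))
```

Here "data" stands in for the value of cfg.paths.data_dir and the container name is only an example.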
- ReporterBlobMetrics: Generates PDF reports from blob metrics stored in CSV files and saves the reports.
- Run the script with the task selected on the command line:
python main.py task=report_blob_metrics
- PDF Report: The ReporterBlobMetrics class generates a PDF report containing the blob metrics. The PDF file is saved in the directory specified by cfg.paths.report in the configuration file, with the naming format semifield-developed-images_image_counts_and_averages_report.pdf.