- Jupyter notebooks for analyzing and visualizing data: all `.ipynb` files
- Jobs for running analyses on the compute cluster: `c3a_job_spawner.py`, `analyze_model_job.py`, `postprocessing.py`, `processing_job.sh`
- Data from analyses:
  - `final_data_zarr`: results from standard C3A and CCA
  - `linear_c3a_results_zarr`: results from linear weighted C3A
  - `log_c3a_results_zarr`: results from log weighted C3A
  - `weighted_c3a_results_zarr`: results from other weighted C3A trials
- Update the desired parameter values in `c3a_job_spawner.py` and the desired algorithms to run in `analyze_model_job.py`, then run `python c3a_job_spawner.py [desired results directory]` (a rough sketch of the spawner loop appears after this list).
- You may be rate-limited in the number of jobs you can spawn (I believe ~200/hour). Each spawned instance of `analyze_model_job.py` is one job, so make sure the "outer parameters" in `c3a_job_spawner.py` do not result in too many jobs. Additionally, it is advised to use the day queue so your jobs start running immediately, but the maximum allowed runtime there is 1 day, so do not make the inner parameter set too large.
- Data will be stored in `[results_directory]/ds_*.nc`. To combine results into one `xarray.Dataset`, update the path to the results folder in `processing_job.sh`, then run `sbatch processing_job.sh` (see the post-processing sketch after this list).
- Data will be written to a `zarr` store instead of `NetCDF` to drastically reduce the amount of space needed to store it. This may lead to some issues with encodings if you try to open these files and then rewrite them.
- If you want to combine overall datasets, each with a different algorithm tested, use `combine_datasets.py` (a sketch of this kind of merge appears below).
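As a rough illustration of the spawning step, here is a minimal sketch of what the outer-parameter loop in a spawner like `c3a_job_spawner.py` could look like. The parameter names, command-line flags, and `sbatch` options below are placeholders, not the script's actual interface:

```python
# Hypothetical sketch only; c3a_job_spawner.py defines its own parameters,
# flags, and job command.
import itertools
import subprocess
import sys

results_dir = sys.argv[1]  # invoked as: python c3a_job_spawner.py <results directory>

# "Outer" parameters: one cluster job is spawned per combination, so the size
# of this grid is what the ~200 jobs/hour submission limit constrains.
outer_grid = {
    "param_a": [1, 2, 4],   # placeholder parameter
    "param_b": [0.1, 0.5],  # placeholder parameter
}

combos = list(itertools.product(*outer_grid.values()))
print(f"Spawning {len(combos)} jobs writing to {results_dir}")

for values in combos:
    flags = " ".join(f"--{name} {value}" for name, value in zip(outer_grid, values))
    # One analyze_model_job.py run per combination, submitted to the day queue
    # (partition and time-limit values depend on the cluster configuration).
    subprocess.run(
        [
            "sbatch",
            "--partition=day",
            "--time=1-00:00:00",
            f"--wrap=python analyze_model_job.py {results_dir} {flags}",
        ],
        check=True,
    )
```

Printing the combination count before submitting makes it easy to confirm the outer grid stays under the submission limit.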
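The post-processing step can be pictured along these lines. This is a sketch that assumes the per-job files share dimensions and are stacked along a new `job` dimension; `postprocessing.py` may organize the combined dataset differently, and the output store name here is made up:

```python
# Sketch of combining per-job NetCDF results into one dataset and writing it
# out as zarr; the "job" dimension and output path are assumptions.
import glob
import xarray as xr

results_dir = "path/to/results"  # the path configured in processing_job.sh

files = sorted(glob.glob(f"{results_dir}/ds_*.nc"))
combined = xr.concat([xr.open_dataset(f) for f in files], dim="job")

# Clearing per-variable encodings inherited from the NetCDF files is one way
# to avoid encoding conflicts if the dataset is later reopened and rewritten.
for var in combined.variables.values():
    var.encoding = {}

# zarr output keeps the on-disk size much smaller than the NetCDF inputs.
combined.to_zarr(f"{results_dir}/combined_results_zarr", mode="w")
```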
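Finally, combining result stores that each tested a different algorithm might look like the following, in the spirit of `combine_datasets.py` (whose actual logic may differ). The `algorithm` coordinate name and the store labels are assumptions:

```python
# Sketch: stack per-algorithm zarr stores along a new "algorithm" coordinate.
import xarray as xr

stores = {
    "standard": "final_data_zarr",
    "linear_weighted": "linear_c3a_results_zarr",
    "log_weighted": "log_c3a_results_zarr",
}

datasets = [
    xr.open_zarr(path).expand_dims(algorithm=[name])
    for name, path in stores.items()
]
combined = xr.concat(datasets, dim="algorithm")
```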
I found it difficult to do development or data analysis directly on the cluster due to connectivity issues with VS Code, so I used GitHub to keep the code repo on the cluster up to date and transferred data files to my personal machine to run analyses.