PHYS_449_project

Repository for our PHYS 449 group project.

The purpose of this project was to replicate the methods and results from this paper: https://doi.org/10.1111/j.1365-2966.2004.07442.x

The paper explores how astronomers can use machine learning to aid in galaxy classification, a time-consuming task whose automation would greatly benefit the astronomy community.

Dataset format

Our dataset consists of a folder with all of the raw images (raw_images) and two CSV files:

ids_and_labels.csv contains the object IDs, the binary classifications, and the full classifications
specific_ids_and_labels.csv contains the object IDs, the multiclass classifications, and the full classifications
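
A minimal sketch of reading one of these label files with the standard library. The column names are not listed here, so the header in the sample below is hypothetical, and the helper name read_label_rows is ours for illustration only; the code reads whatever header the file actually has.

```python
import csv

def read_label_rows(lines):
    """Return (header, rows); each row is a dict keyed by the CSV header."""
    reader = csv.DictReader(lines)
    rows = list(reader)
    return list(reader.fieldnames or []), rows

# Hypothetical sample; the real header names and labels may differ.
sample = [
    "object_id,binary_label,full_label",
    "1001,0,Sb",
    "1002,1,E",
]
header, rows = read_label_rows(sample)
print(header)
print(rows[0])

# With the real file, pass the open file object instead:
# with open("ids_and_labels.csv", newline="") as f:
#     header, rows = read_label_rows(f)
```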

Project Dependencies

Python

We are using Python 3.10.5

Create a new venv (virtual environment)

python -m venv group_proj_env

Note: you could also use conda or another package manager to create and manage virtual environments.

Activate a venv

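Using VSCode's file explorer, go to group_proj_env/Scripts/ (on macOS or Linux the folder is group_proj_env/bin/)
Right-click the file named activate (it should be the first one)
Click on "Copy Relative Path"
Paste this path in the command line and hit enter (on macOS or Linux, put source in front of the path)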
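
To confirm that activation worked, one option is to ask the interpreter itself. This is a small sketch, not part of the project code; the function name in_virtual_env is hypothetical. It relies on sys.prefix differing from sys.base_prefix inside a venv, so it will report False for a conda environment.

```python
import sys

def in_virtual_env():
    """True when the interpreter is running inside a venv-style environment."""
    return sys.prefix != sys.base_prefix

print("virtual environment active:", in_virtual_env())
print("interpreter prefix:", sys.prefix)
```

If the first line prints False, the venv was not activated in the terminal that launched Python.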

Packages Required

numpy
opencv
matplotlib
pandas
requests
scikit-learn
pytorch
nptyping

Use this command to install the required packages

pip install -r requirements.txt
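
After installing, a quick check that every package can be found may save some debugging. The sketch below is an illustration rather than part of the project: the helper missing_packages is hypothetical, and the mapping reflects the fact that several packages are imported under a different name than the one listed above (cv2 for opencv, sklearn for scikit-learn, torch for pytorch).

```python
from importlib.util import find_spec

# Listed package name -> name used in an import statement.
IMPORT_NAMES = {
    "numpy": "numpy",
    "opencv": "cv2",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "requests": "requests",
    "scikit-learn": "sklearn",
    "pytorch": "torch",
    "nptyping": "nptyping",
}

def missing_packages(import_names=IMPORT_NAMES):
    """Return the listed names whose import module cannot be located."""
    return [pkg for pkg, mod in import_names.items() if find_spec(mod) is None]

missing = missing_packages()
print("missing:", missing if missing else "none")
```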

sdss package credits

The "sdss" folder in our "src" folder comes from the "sdss" package created by Behrouz Safari (https://github.com/behrouzz/sdss). It is used under the MIT open source license.

Running `main.py`

To run main.py, simply paste this in the command line and hit enter:

python main.py

json, hyperparameters

The hyperparameters can be found in the "param/param.json" file. The file is split into 3 sections (subdictionaries):

optim

epochs: number of times the training dataset is iterated over
learn_rate: the learning rate used by the NN's optimizer
binary_threshold: the "cutoff" value used when processing our images into binary images. Pixel values below this value are set to 0, pixel values above it are set to 1
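
As an illustration of what binary_threshold does, here is a minimal NumPy sketch. It is not the project's own preprocessing code, and the function name binarize is hypothetical. The description above does not say what happens to a pixel exactly equal to the threshold; this sketch assumes it becomes 0.

```python
import numpy as np

def binarize(image, binary_threshold):
    """Map pixels above the threshold to 1 and all others to 0."""
    image = np.asarray(image)
    # Assumption: a pixel equal to the threshold is treated as "below" (0).
    return (image > binary_threshold).astype(np.uint8)

gray = np.array([[10, 200], [128, 90]])
print(binarize(gray, 128))
# [[0 1]
#  [0 0]]
```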

model

hidden_nodes: the size of the hidden layer inside our NN
feature_size: the dimension of the feature vector fed into the NN, i.e. the number of principal components kept from the covariance matrix of the whole processed dataset
batch: number of feature vectors trained on before updating model weights during training
train_end_index: the size of our training dataset
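
To make feature_size concrete, the sketch below shows one common way to keep the top principal components of a dataset's covariance matrix with NumPy. It is an assumption about the general technique, not a copy of the code in src; the function name pca_features and the random stand-in data are hypothetical.

```python
import numpy as np

def pca_features(data, feature_size):
    """Project each row of `data` onto the top `feature_size` principal components."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)          # remove the per-pixel mean
    cov = np.cov(centered, rowvar=False)         # (n_pixels, n_pixels) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:feature_size]
    components = eigvecs[:, order]               # largest-variance directions first
    return centered @ components

# Stand-in for 20 flattened 8x8 processed images.
rng = np.random.default_rng(0)
flattened_images = rng.random((20, 64))
features = pca_features(flattened_images, feature_size=8)
print(features.shape)  # (20, 8)
```

Each row of features is then one input vector for the NN, so feature_size must match the network's input dimension.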

class_label_mapping

Assigns integer values for galaxy classes/subclasses being considered
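
Putting the three sections together, a file with this layout could be parsed as sketched below. Every value and both class names in the example are hypothetical placeholders (the real ones live in param/param.json), and the helper parse_params is ours for illustration.

```python
import json

# Hypothetical values, for illustration only.
EXAMPLE_PARAMS = """
{
  "optim": {"epochs": 10, "learn_rate": 0.001, "binary_threshold": 128},
  "model": {"hidden_nodes": 32, "feature_size": 8, "batch": 16, "train_end_index": 100},
  "class_label_mapping": {"class_a": 0, "class_b": 1}
}
"""

REQUIRED_SECTIONS = ("optim", "model", "class_label_mapping")

def parse_params(text):
    """Parse the JSON text and check that all three sections are present."""
    params = json.loads(text)
    missing = [s for s in REQUIRED_SECTIONS if s not in params]
    if missing:
        raise KeyError(f"missing sections: {missing}")
    return params

params = parse_params(EXAMPLE_PARAMS)
print(params["optim"]["learn_rate"])
print(params["model"]["feature_size"])
```

To read the real file, pass the text of param/param.json to parse_params instead of EXAMPLE_PARAMS.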
