This package performs the compute parts of the Titanic Kaggle contest, namely:
- Mounting of datasets on the persistent folder (/data)
- Printing the shape of the dataset
- Preprocessing (aggregations + feature engineering)
- Modelling (+ training)
- Accuracy
Follow this link to download all the dependencies and run the brane instance (both the CLI and the instance must be installed, and the instance must be running), then continue from #6 below.
Or follow these instructions (assuming you have Docker, the Buildx plugin for Docker, and Docker Compose installed):
1. Download the brane repository: `git clone https://github.com/epi-project/brane.git && cd brane`
2. Make the build script executable: `chmod +x ./make.sh`
3. Download the CLI for Linux: `sudo wget -O /usr/local/bin/brane https://github.com/epi-project/brane/releases/latest/download/brane-linux`
4. (Alternative to #3) For macOS: `sudo wget -O /usr/local/bin/brane https://github.com/epi-project/brane/releases/latest/download/brane-darwin`
5. Start the instance: `./make.sh start-instance --precompiled`
6. Build the package: `brane build ./container.yml`
7. Push the package so it can run remotely in your instance (run `brane login http://127.0.0.1 --user <username>` first): `brane push brane_compute`
8. Use the package from BraneScript (import it by adding `import brane_compute;` at the top), or run `brane repl --remote http://127.0.0.1:50053`. (If you are running on a K8s cluster, use the cluster address instead.)
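The build, login and push commands from the steps above can also be scripted. The following is a minimal sketch, not part of this package: the function names `brane_commands` and `run_all` and the `dry_run` flag are hypothetical, and only the three `brane` commands themselves are taken from the instructions.

```python
import subprocess
from typing import List


def brane_commands(username: str, host: str = "http://127.0.0.1") -> List[List[str]]:
    """Return the build, login and push commands from the steps above, in order."""
    return [
        ["brane", "build", "./container.yml"],
        ["brane", "login", host, "--user", username],
        ["brane", "push", "brane_compute"],
    ]


def run_all(username: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in brane_commands(username):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing command.
            subprocess.run(cmd, check=True)
```

Calling `run_all("<username>")` only prints the commands; pass `dry_run=False` to execute them against a running instance.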
Alternatively, read the next section to import the package directly from this GitHub repository.
Assuming that the brane CLI and a brane instance are installed and deployed on your local machine or K8s cluster, run `brane import web-services-and-cloud-based-systems/brane-compute-package`.
This imports the package into the container registry hosted on your local machine or K8s cluster. You can then use the package in your BraneScript files or in the REPL.
Brane function | Description | Input | Output | Result |
---|---|---|---|---|
mount | Mounts the train.csv and test.csv files in this brane package in the /data folder | - | String: "done"/"error" | train.csv and test.csv are available in /data |
data_shape | Returns the shape of the dataframe after reading the data from a file | path of the file | shape | shape of the data |
preprocessing | Preprocesses the dataset (train, test) | path, isTrain | Integer: 0 (success)/error code | Preprocesses dataset for models |
modelling | Trains the model selected by the given mode, where mode is the identifier of the machine learning model to use | path_train, path_test, mode | 0/error code | output vector |
get_model_accuracy | Checks the model accuracy | name | output | accuracy of the model |
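To make the `data_shape` and accuracy entries concrete, here is a minimal sketch of what such computations could look like. It is an illustration under assumptions, not this package's code: the table says the package reads a dataframe, whereas this sketch swaps in the standard-library `csv` module, and the `accuracy` helper and its signature are hypothetical.

```python
import csv
from typing import List, Tuple


def data_shape(path: str) -> Tuple[int, int]:
    """Return (rows, columns) of a CSV file, not counting the header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])      # first line holds the column names
        rows = sum(1 for _ in reader)  # count the remaining lines
    return rows, len(header)


def accuracy(predicted: List[int], actual: List[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```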
There are a few helper functions that are called internally, from within the brane functions:
- get_df
- name_proc
- imputting_na_values
- cat_to_num
- missingAge
- family
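The helper implementations are not shown here. As an illustration of the kind of feature engineering that helpers such as `family` and `cat_to_num` suggest, the sketch below derives a family-size feature and encodes a categorical column as integers. Everything in it is an assumption: the function names `add_family_size` and `encode_category` are hypothetical, and the only facts relied on are that the Kaggle Titanic data has `SibSp`, `Parch` and `Sex` columns.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def add_family_size(rows: List[Row]) -> List[Row]:
    """Add FamilySize = SibSp + Parch + 1 (the passenger) to each row."""
    return [
        {**row, "FamilySize": int(row["SibSp"]) + int(row["Parch"]) + 1}
        for row in rows
    ]


def encode_category(rows: List[Row], column: str) -> List[Row]:
    """Replace labels in a column with integer codes assigned in sorted label order."""
    labels = sorted({str(row[column]) for row in rows})
    codes = {label: index for index, label in enumerate(labels)}
    return [{**row, column: codes[str(row[column])]} for row in rows]
```

Both functions return new rows rather than modifying their input, which keeps the sketch easy to test in isolation.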
We have set up an automated build for this package using GitHub Actions, which also runs the tests (pytest). The triggers are push and pull_request, but the workflow can also be run manually. Please refer to the image below to see how (click on Run workflow for the CI workflow).
Assuming that you are in the root of the repository and have installed the dependencies (listed in the requirements.txt file), run `pytest -v`.
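For readers new to pytest, the sketch below shows the general shape of a test that `pytest -v` would collect. It does not reproduce this repository's tests: `preprocessing_stub` is a hypothetical stand-in that only mimics the "0 on success, error code otherwise" convention from the function table, and `tmp_path` is pytest's built-in temporary-directory fixture.

```python
import os


def preprocessing_stub(path: str, is_train: bool) -> int:
    """Hypothetical stand-in: return 0 if the input file exists, 1 otherwise."""
    return 0 if os.path.isfile(path) else 1


def test_returns_zero_for_existing_file(tmp_path):
    # tmp_path is a pathlib.Path to a fresh temporary directory.
    csv_file = tmp_path / "train.csv"
    csv_file.write_text("PassengerId,Survived\n1,0\n")
    assert preprocessing_stub(str(csv_file), True) == 0


def test_returns_error_code_for_missing_file(tmp_path):
    assert preprocessing_stub(str(tmp_path / "missing.csv"), True) != 0
```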