Feature/optimize package #5

Open
wants to merge 70 commits into
base: main
a1594e5
better package loading of modules and methods
nshahpazov Sep 5, 2022
0391842
remvoed data
nshahpazov Sep 5, 2022
4a012b7
moved the package setup in its own theeeng and added a make file
nshahpazov Sep 5, 2022
481ee1c
improved package consistency
nshahpazov Sep 6, 2022
7732f3c
cleaned data
nshahpazov Sep 6, 2022
2d8101a
removed module calls from circle ci
nshahpazov Sep 6, 2022
59ea35a
removed unnecessary step
nshahpazov Sep 6, 2022
904012e
added missing dep
nshahpazov Sep 6, 2022
e25bb82
incremented version and added saving of the model in testing
nshahpazov Sep 6, 2022
4385b89
added activate conda
nshahpazov Sep 6, 2022
6591519
added activate conda
nshahpazov Sep 6, 2022
f1190e7
added activate conda
nshahpazov Sep 6, 2022
0f71b35
reqs and final try with make
nshahpazov Sep 6, 2022
3fe2363
fixed typo
nshahpazov Sep 6, 2022
a73a077
fsay hello
nshahpazov Sep 6, 2022
bb6ad00
update
nshahpazov Sep 6, 2022
16e5a1e
add install req
nshahpazov Sep 6, 2022
6ead57c
add test
nshahpazov Sep 6, 2022
c509f7a
add test
nshahpazov Sep 6, 2022
ca5e5ef
add test
nshahpazov Sep 6, 2022
9110532
add test
nshahpazov Sep 6, 2022
d24d10e
add test
nshahpazov Sep 6, 2022
5f7f92a
add test
nshahpazov Sep 6, 2022
a781fed
add test
nshahpazov Sep 6, 2022
d9ba050
add test
nshahpazov Sep 6, 2022
fd2a821
add test
nshahpazov Sep 6, 2022
c6a2658
add test
nshahpazov Sep 6, 2022
b93e062
add test
nshahpazov Sep 6, 2022
9914e06
add test
nshahpazov Sep 6, 2022
08f483f
add test
nshahpazov Sep 6, 2022
b111511
add test
nshahpazov Sep 6, 2022
98a49e6
add test
nshahpazov Sep 6, 2022
5b348ad
add test
nshahpazov Sep 6, 2022
4cc5e08
add test
nshahpazov Sep 6, 2022
2833022
add test
nshahpazov Sep 6, 2022
83715c7
add test
nshahpazov Sep 6, 2022
533af1f
add test
nshahpazov Sep 6, 2022
df70ff0
add test
nshahpazov Sep 7, 2022
2980683
add test
nshahpazov Sep 7, 2022
97fafe0
add test
nshahpazov Sep 7, 2022
651f7f7
add test
nshahpazov Sep 7, 2022
aaf999f
add test
nshahpazov Sep 7, 2022
7967b44
add test
nshahpazov Sep 7, 2022
afd8f26
add test
nshahpazov Sep 7, 2022
e98cb2f
removed and adding a fetch step
nshahpazov Sep 7, 2022
a177f87
removed and adding a fetch step
nshahpazov Sep 7, 2022
5878deb
restructure
nshahpazov Sep 7, 2022
bc23935
restructure
nshahpazov Sep 7, 2022
799fb86
restructure
nshahpazov Sep 7, 2022
479412f
restructure
nshahpazov Sep 7, 2022
910e81d
restructure
nshahpazov Sep 7, 2022
74d8401
restructure
nshahpazov Sep 7, 2022
0f4ac4f
restructure
nshahpazov Sep 7, 2022
95aba8e
restructure
nshahpazov Sep 7, 2022
922233b
restructure
nshahpazov Sep 7, 2022
3ac3571
restructure
nshahpazov Sep 7, 2022
97043c0
restructure
nshahpazov Sep 7, 2022
9381056
restructure
nshahpazov Sep 7, 2022
e295698
restructure
nshahpazov Sep 7, 2022
b859c98
restructure
nshahpazov Sep 7, 2022
f2f3572
restructure
nshahpazov Sep 7, 2022
bb23797
restructure
nshahpazov Sep 7, 2022
1148844
remove requirements for now
nshahpazov Sep 7, 2022
4d9b0df
remove requirements for now
nshahpazov Sep 7, 2022
eb0a589
remove requirements for now
nshahpazov Sep 7, 2022
ec3aa64
remove requirements for now
nshahpazov Sep 7, 2022
4843fbe
remove requirements for now
nshahpazov Sep 7, 2022
ae6d0fd
remove requirements for now
nshahpazov Sep 7, 2022
a3905a8
remove requirements for now
nshahpazov Sep 7, 2022
5820f11
remove requirements for now
nshahpazov Sep 7, 2022
147 changes: 101 additions & 46 deletions .circleci/config.yml
@@ -1,75 +1,130 @@
version: 2.1
executors:
docker-conda-executor:
docker-python-executor:
docker:
- image: continuumio/miniconda3:latest
- image: circleci/python:3.9.5

commands:
activate_conda:
update:
description: "Installing ubuntu prerequisites"
steps:
- run: |
conda init bash
source ~/.bashrc
conda env create -f environment.yml
conda activate houses
run_tests:
sudo apt-get update -y
sudo apt-get install make
sudo apt-get install unzip

prepare_pipeline_virtual_environment:
description: "Preparing the pipeline virtual environment"
steps:
- run: |
conda init bash
source ~/.bashrc
conda activate houses
pytest houses_pipeline
pytest houses_api
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

train_lasso_model:
install_pipeline_requirements:
description: "Installing pipeline requirements"
steps:
- checkout
- run: |
conda init bash
source ~/.bashrc
conda activate houses
python -m houses_pipeline.modelling.train_lasso
source venv/bin/activate
. venv/bin/activate
pip install -r requirements.txt

fetch_dataset:
run_pipeline_lint:
description: "Run pipeline linting"
steps:
- checkout
- run: |
conda init bash
source ~/.bashrc
conda activate houses
chmod +x ./houses_pipeline/fetch/fetch_dataset.sh
./houses_pipeline/fetch/fetch_dataset.sh data/raw
source venv/bin/activate
. venv/bin/activate
pylint houses_pipeline

run_pipeline_tests:
description: "Run pipeline tests"
steps:
- run: |
source venv/bin/activate
. venv/bin/activate
make test

preprocess_data:
train_and_publish_houses_model:
description: "Train the houses model"
steps:
- run: |
source venv/bin/activate
. venv/bin/activate
make publish

prepare_api_virtual_environment:
description: "Preparing the api virtual environment"
steps:
- run: |
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

install_api_requirements:
description: "Installing API requirements"
steps:
- run: |
. venv/bin/activate
pip install -r requirements.txt

run_api_tests:
description: "Run api tests"
steps:
- checkout
- run: |
conda init bash
source ~/.bashrc
conda activate houses
python -m houses_pipeline.preprocess
. venv/bin/activate
pytest .

prepare_tox:
steps:
- run: |
sudo pip install --upgrade pip
pip install --user tox

jobs:
setup_and_run_tests:
executor: docker-conda-executor
working_directory: ~/project
docker:
- image: continuumio/miniconda3:latest
setup_and_test_houses_pipeline:
executor: docker-python-executor
working_directory: ~/project/pipeline
steps:
- checkout:
path: ~/project
- update
- prepare_pipeline_virtual_environment
- install_pipeline_requirements
- run_pipeline_lint
- run_pipeline_tests
- train_and_publish_houses_model

setup_and_test_houses_api:
executor: docker-python-executor
working_directory: ~/project/api
steps:
- checkout:
path: ~/project
- update
- prepare_api_virtual_environment
- install_api_requirements
- run_api_tests

train_and_upload_houses_model:
executor: docker-python-executor
working_directory: ~/project/pipeline
steps:
- checkout
- activate_conda
# - TODO: ADD this fetch_dataset
- fetch_dataset
- preprocess_data
- train_lasso_model
- run_tests
- update
- prepare_houses_virtual_environment
- install_houses_requirements
- train_houses_model

# TODO: add build workflow
workflows:
version: 2
houses_workflow:
jobs:
- setup_and_run_tests:
- setup_and_test_houses_pipeline:
context:
- kaggling

- gemfury
- setup_and_test_houses_api:
context:
- kaggling
- gemfury
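
Review note: the new commands repeat `. venv/bin/activate` at the top of every `run` block. That matches CircleCI's behaviour of starting a fresh shell for each `run` step, so an activation done in one step does not carry over to the next. Below is a minimal sketch of that effect, not the CI's own mechanism; `DEMO_VENV` is a made-up variable standing in for the state that activation sets, and a POSIX `sh` is assumed to be available.

```python
import os
import subprocess


def run_step(script: str) -> str:
    """Run one script in a fresh `sh` process, like a separate CI `run` step."""
    # Drop the demo variable from the inherited environment so the result
    # does not depend on the caller's shell.
    env = {k: v for k, v in os.environ.items() if k != "DEMO_VENV"}
    result = subprocess.run(
        ["sh", "-c", script],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# Step 1 exports a variable (hypothetical stand-in for `. venv/bin/activate`).
step_one = run_step('export DEMO_VENV=active; echo "$DEMO_VENV"')

# Step 2 runs in a new shell, so the export from step 1 is no longer visible.
step_two = run_step('echo "${DEMO_VENV:-unset}"')

print(step_one)
print(step_two)
```

The first step sees the variable and the second does not, which is why each command in the config has to activate the virtual environment again before calling `pip`, `pylint`, `make` or `pytest`.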
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,4 +1,3 @@
houses.egg-info
mlruns
build
dist
@@ -21,3 +20,7 @@ models/*.pkl

logs/*.log*
!logs/.gitkeep
# virtual environments
venv
pipeline_venv
api_venv
53 changes: 53 additions & 0 deletions Makefile
@@ -0,0 +1,53 @@
all: lint test train publish

test:
@python -m pytest

lint:
@echo "Linting the entire project"
@pylint houses_pipeline
@pylint houses_api

install_pipeline_develop:
pip install -e ./houses_pipeline

clean:
@echo "Cleaning up $(project_name)"
@rm -rf *.egg-info
@rm -rf $(raws_to_remove)
@rm -rf data/**/*.csv
@rm -rf logs/*.log*

# functionality
train: clean preprocess
@echo "Training a model for the $(project_name) project"
@python -m $(pipeline_dir).modelling.lasso

preprocess: fetch
@echo "Preprocessing the data for the $(project_name) project"
@mkdir -p $(pipeline_dir)/data/raw data/interim data/proccessed
@python -m $(pipeline_dir).preprocess data/raw/train.csv data/interim/train.csv
@python -m $(pipeline_dir).preprocess data/raw/test.csv data/interim/test.csv

fetch:
@mkdir -p data/raw data/interim data/proccessed
@chmod 701 ./$(pipeline_dir)/fetch/fetch_dataset.sh
@./$(pipeline_dir)/fetch/fetch_dataset.sh data/raw

clean_pipeline:
@echo "Cleaning up generated data, logs and models"
@rm -rf data/**/*.csv
@rm -rf models/**/*.pkl
# @rm -rf logs/*.log*

publish: train
.$(houses_pipeline)/scripts/publish_model.sh .


# ignore the following
# .PHONY: preprocess, test, fetch

# variables
project_name = houses_pipeline
pipeline_dir = houses_pipeline
raws_to_remove = data/raw/*.csv data/raw/*.zip data/raw/*.txt
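
Review note: `publish` depends on `train`, which depends on `clean` and `preprocess`, which in turn depends on `fetch`. The sketch below walks that chain depth-first, left to right, to show the order in which a serial `make publish` (no `-j`) would be expected to run the recipes. The prerequisite table is copied from the Makefile above; the `build_order` helper is hypothetical and only models ordering, not timestamps or up-to-date checks.

```python
from typing import Dict, List

# Prerequisites copied from the Makefile targets above.
PREREQS: Dict[str, List[str]] = {
    "publish": ["train"],
    "train": ["clean", "preprocess"],
    "preprocess": ["fetch"],
    "fetch": [],
    "clean": [],
}


def build_order(target: str, prereqs: Dict[str, List[str]]) -> List[str]:
    """Return targets in the order their recipes would run for `target`.

    Prerequisites are visited depth-first in the order they are listed,
    and each target runs at most once. Cycles are not handled because the
    graph above has none.
    """
    order: List[str] = []

    def visit(name: str) -> None:
        if name in order:
            return
        for dep in prereqs.get(name, []):
            visit(dep)
        order.append(name)

    visit(target)
    return order


publish_order = build_order("publish", PREREQS)
print(publish_order)
```

Under these assumptions `clean` runs before `fetch`, so every `make publish` starts from freshly fetched data rather than reusing files from an earlier run.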
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
14 changes: 14 additions & 0 deletions api/requirements.txt
@@ -0,0 +1,14 @@
--index-url https://${FURY_AUTH}:@pypi.fury.io/nshahpazov/
--extra-index-url https://pypi.org/simple houses_pipeline

# api
flask==2.2.2
# schema validation
marshmallow==3.17.1

# # tests
pytest==7.1.2

# gemfury stored module
houses_pipeline==0.0.31
joblib==1.1.0
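
Review note: the `--index-url` line relies on pip replacing `${FURY_AUTH}` with the value of that environment variable when it reads the requirements file. The sketch below imitates that substitution so the resolved URL is easy to picture. It is an illustration and not pip's own code; `expand_env_vars` is a hypothetical helper, `example-token` is a placeholder value, and it is an assumption that the `gemfury` context in the CircleCI config is what supplies `FURY_AUTH`.

```python
import re
from typing import Mapping

# Matches ${NAME} where NAME is made of uppercase letters, digits and underscores.
ENV_VAR = re.compile(r"\$\{([A-Z0-9_]+)\}")


def expand_env_vars(line: str, env: Mapping[str, str]) -> str:
    """Replace each ${NAME} with env[NAME]; leave unknown names untouched."""
    return ENV_VAR.sub(lambda m: env.get(m.group(1), m.group(0)), line)


line = "--index-url https://${FURY_AUTH}:@pypi.fury.io/nshahpazov/"

# With the variable set (placeholder token, not a real credential).
resolved = expand_env_vars(line, {"FURY_AUTH": "example-token"})

# With the variable missing, the placeholder stays in the URL.
unresolved = expand_env_vars(line, {})

print(resolved)
print(unresolved)
```

If the variable is missing, the literal `${FURY_AUTH}` stays in the URL and the private index lookup for `houses_pipeline` would be expected to fail, so the job needs the token in its environment before `pip install -r requirements.txt` runs.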
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -1,16 +1,20 @@
"""Controller setup for the testing context"""
import os
import json
import math
import pandas as pd
from houses_pipeline import __version__ as _model_version
from houses_pipeline.fetch import fetch_houses_dataset
from houses_pipeline.preprocess.core import preprocess
from houses_pipeline.modelling import lasso

# from houses_pipeline.fetch import fetch_houses_dataset
# from houses_pipeline.preprocess import preproces
from .. import __version__ as _api_version

from ..api.config import get_logger
from ..api.config import TEST_DATASET_PATH



_logger = get_logger(logger_name=__name__)

def test_health_endpoint_returns_ok_status(test_client):
@@ -48,28 +52,25 @@ def test_prediction_endpoint_returns_prediction(test_client):
across packages.
"""

# fetch data
os.system("./houses_pipeline/fetch/fetch_dataset.sh data/raw")
fetch_houses_dataset()

# preprocess
os.system("python -m houses_pipeline.preprocess")
os.system(
"python -m houses_pipeline.preprocess data/raw/test.csv data/interim/test.csv"
)
preprocess("data/raw/train.csv", "data/interim/train.csv")
preprocess("data/raw/test.csv", "data/interim/test.csv")

# train a model
os.system("python -m houses_pipeline.modelling.train_lasso")
# train a model and save it
lasso.train()

test_data = pd.read_csv(TEST_DATASET_PATH)
post_json = test_data[0:1].to_json(orient='records')
# when

response = test_client.post('/predict/lasso', json=json.loads(post_json))

# then
assert response.status_code == 200
response_json = json.loads(response.data)
prediction = response_json['predictions']
print(response_json)
response_version = response_json['version']
assert math.ceil(prediction[0]) == 117205
assert response_version == _model_version
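
Review note: this hunk replaces the `os.system(...)` calls with direct calls to `fetch_houses_dataset()`, `preprocess(...)` and `lasso.train()`. One practical difference is that the exit status of a shell command is easy to ignore, while an exception from a Python call stops the test at the failing step. The sketch below shows that contrast with `subprocess` and a deliberately failing child process; the helper names are hypothetical and nothing here comes from `houses_pipeline`.

```python
import subprocess
import sys

# A child process that always fails with exit code 3.
FAILING_COMMAND = [sys.executable, "-c", "import sys; sys.exit(3)"]


def run_ignoring_failure(args) -> int:
    """Return the exit code without raising, as when a status is not checked."""
    return subprocess.run(args).returncode


def run_or_raise(args) -> None:
    """Raise subprocess.CalledProcessError if the command exits non-zero."""
    subprocess.run(args, check=True)


# The failure is only visible if the caller inspects the returned code.
ignored_code = run_ignoring_failure(FAILING_COMMAND)

# With check=True the same failure interrupts the caller.
raised_code = None
try:
    run_or_raise(FAILING_COMMAND)
except subprocess.CalledProcessError as err:
    raised_code = err.returncode

print(ignored_code, raised_code)
```

Calling the pipeline functions directly gives the second behaviour without spawning processes, provided those functions raise on failure.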
File renamed without changes.