Upgrade Python version and dependencies #520

Merged 16 commits on Apr 25, 2023
Changes from all commits
28 changes: 15 additions & 13 deletions .github/workflows/run_all_frameworks.yml
@@ -15,7 +15,7 @@ jobs:
skip_baseline: ${{ steps.find-required-tests.outputs.skip_baseline }}
skip_evaluation: ${{ steps.find-required-tests.outputs.skip_evaluation }}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: pull base branch
run: |
git fetch --unshallow origin $GITHUB_BASE_REF
@@ -90,17 +90,17 @@ jobs:
task: [iris, kc2, cholesterol]
fail-fast: false
steps:
- uses: actions/checkout@v2
- name: Setup Python 3.8
uses: actions/setup-python@v2
- uses: actions/checkout@v3
- name: Setup Python 3.9
uses: actions/setup-python@v4
with:
python-version: '3.8'
python-version: '3.9'
- name: Create venv
run: python -m venv venv
- uses: actions/cache@v2
- uses: actions/cache@v3
id: cache
with:
path: /home/runner/work/automlbenchmark/automlbenchmark/venv/lib/python3.8/site-packages
path: /home/runner/work/automlbenchmark/automlbenchmark/venv/lib/python3.9/site-packages
key: pip-v3-${{ hashFiles('**/requirements.txt') }}
restore-keys: |
pip-v3-
@@ -126,24 +126,24 @@ jobs:

strategy:
matrix:
python-version: [3.8]
python-version: [3.9]
framework: ${{ fromJson(needs.detect_changes.outputs.frameworks) }}
task: ${{ fromJson(needs.detect_changes.outputs.tasks) }}
benchmark: ${{ fromJson(needs.detect_changes.outputs.benchmark) }}
fail-fast: true # not sure about this one, but considering the big workload it might be nicer

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Create venv
run: python -m venv venv
- uses: actions/cache@v2
- uses: actions/cache@v3
id: cache
with:
path: /home/runner/work/automlbenchmark/automlbenchmark/venv/lib/python3.8/site-packages
path: /home/runner/work/automlbenchmark/automlbenchmark/venv/lib/python3.9/site-packages
key: pip-v3-${{ hashFiles('**/requirements.txt') }}
restore-keys: |
pip-v3-
@@ -152,8 +152,10 @@ jobs:
run: |
source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
python -m pip install -r requirements.txt
- name: Run ${{ matrix.framework }} on ${{ matrix.task }}
run: |
source venv/bin/activate
python runbenchmark.py ${{ matrix.framework }} ${{ matrix.benchmark }} test -f 0 -t ${{ matrix.task }} -e
env:
GITHUB_PAT: ${{ secrets.PUBLIC_ACCESS_GITHUB_PAT }}
2 changes: 2 additions & 0 deletions .github/workflows/runbenchmark.yml
@@ -32,5 +32,7 @@ jobs:
python -m pip install -r requirements.txt
- name: Install ${{ inputs.framework }}
run: python runbenchmark.py ${{ inputs.framework }} -s only
env:
GITHUB_PAT: ${{ secrets.PUBLIC_ACCESS_GITHUB_PAT }}
- name: Benchmark ${{ inputs.framework }}
run: python runbenchmark.py ${{ inputs.framework }} ${{ inputs.options }}
16 changes: 8 additions & 8 deletions amlb/results.py
@@ -155,24 +155,24 @@ def as_data_frame(self):

@memoize
def as_printable_data_frame(self, verbosity=3):
str_print = lambda val: '' if val in [None, '', 'None'] or (isinstance(val, float) and np.isnan(val)) else val
int_print = lambda val: int(val) if isinstance(val, float) and not np.isnan(val) else str_print(val)
num_print = lambda fn, val: None if isinstance(val, str) else fn(val)
str_print = lambda val: '' if val in [None, '', 'None'] or (isinstance(val, float) and np.isnan(val)) else str(val)
int_print = lambda val: int(val) if isinstance(val, (float, int)) and not np.isnan(val) else str_print(val)

df = self.as_data_frame()
force_str_cols = ['id']
nanable_int_cols = ['fold', 'models_count', 'seed']
low_precision_float_cols = ['duration', 'training_duration', 'predict_duration']
high_precision_float_cols = [col for col in df.select_dtypes(include=[np.float]).columns if col not in ([] + nanable_int_cols + low_precision_float_cols)]
high_precision_float_cols = [col for col in df.select_dtypes(include=[float]).columns if col not in ([] + nanable_int_cols + low_precision_float_cols)]
for col in force_str_cols:
df[col] = df[col].astype(np.object).map(str_print).astype(np.str)
df[col] = df[col].map(str_print)
for col in nanable_int_cols:
df[col] = df[col].astype(np.object).map(int_print).astype(np.str)
df[col] = df[col].map(int_print)
for col in low_precision_float_cols:
float_format = lambda f: ("{:.1g}" if f < 1 else "{:.1f}").format(f)
df[col] = df[col].astype(np.float).map(partial(num_print, float_format)).astype(np.float)
# The .astype(float) is required to maintain NaN as 'NaN' instead of 'nan'
df[col] = df[col].map(float_format).astype(float)
for col in high_precision_float_cols:
df[col] = df[col].map(partial(num_print, "{:.6g}".format)).astype(np.float)
df[col] = df[col].map("{:.6g}".format).astype(float)

cols = ([] if verbosity == 0
else ['task', 'fold', 'framework', 'constraint', 'result', 'metric', 'info'] if verbosity == 1
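The results.py hunk above replaces the np.float, np.object and np.str aliases with the plain builtins: those aliases were deprecated in NumPy 1.20 and removed in NumPy 1.24, the version this PR now pins. A minimal sketch of the new formatting approach, using made-up data rather than real benchmark results:

import numpy as np
import pandas as pd

df = pd.DataFrame({"duration": [0.1234, np.nan, 12.5]})

# NumPy >= 1.24 removed np.float / np.object / np.str; plain float / object / str
# (or explicit np.float64) take their place.
float_format = lambda f: ("{:.1g}" if f < 1 else "{:.1f}").format(f)
df["duration"] = df["duration"].map(float_format).astype(float)
# the trailing astype(float) keeps missing values as NaN rather than the string 'nan'

print(df)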
2 changes: 1 addition & 1 deletion amlb/runners/singularity.py
@@ -155,7 +155,7 @@ def _upload_image(self, image):

def _generate_script(self, custom_commands):
singularity_content = """Bootstrap: docker
From: ubuntu:18.04
From: ubuntu:22.04
%files
. /bench/
%post
3 changes: 1 addition & 2 deletions frameworks/autosklearn/requirements.txt
@@ -1,3 +1,2 @@
packaging
packaging<22.0
openml
scipy>=0.14.1,<1.7.0
45 changes: 43 additions & 2 deletions frameworks/oboe/exec.py
@@ -2,18 +2,58 @@
import os
import sys

from sklearn.model_selection import StratifiedKFold
import numpy as np

sys.path.append("{}/lib/oboe/automl".format(os.path.realpath(os.path.dirname(__file__))))
from auto_learner import AutoLearner
from oboe import AutoLearner

from frameworks.shared.callee import call_run, result
from frameworks.shared.utils import Timer

log = logging.getLogger(__name__)


def kfold_fit_validate(self, x_train, y_train, n_folds, random_state=None):
"""Performs k-fold cross validation on a training dataset. Note that this is the function used to fill entries
of the error matrix.
Args:
x_train (np.ndarray): Features of the training dataset.
y_train (np.ndarray): Labels of the training dataset.
n_folds (int): Number of folds to use for cross validation.
Returns:
float: Mean of k-fold cross validation error.
np.ndarray: Predictions on the training dataset from cross validation.
"""
    y_predicted = np.empty(y_train.shape)
    cv_errors = np.empty(n_folds)
    kf = StratifiedKFold(n_folds, shuffle=True, random_state=random_state)
    for i, (train_idx, test_idx) in enumerate(kf.split(x_train, y_train)):
        x_tr = x_train[train_idx, :]
        y_tr = y_train[train_idx]
        x_te = x_train[test_idx, :]
        y_te = y_train[test_idx]
        model = self.instantiate()
        if len(np.unique(y_tr)) > 1:
            model.fit(x_tr, y_tr)
            y_predicted[test_idx] = np.expand_dims(model.predict(x_te), axis=1)
        else:
            y_predicted[test_idx] = y_tr[0]
        cv_errors[i] = self.error(y_te, y_predicted[test_idx])
    self.cv_error = cv_errors.mean()
    self.cv_predictions = y_predicted
    self.sampled = True
    if self.verbose:
        print("{} {} complete.".format(self.algorithm, self.hyperparameters))
    return cv_errors, y_predicted


def run(dataset, config):
log.info(f"\n**** Oboe [{config.framework_version}] ****\n")
log.info(f"\n**** Applying monkey patch ****\n")
from oboe.model import Model
Model.kfold_fit_validate = kfold_fit_validate

log.info(f"\n**** Oboe [{config.framework_version}] ****\n")
is_classification = config.type == 'classification'
if not is_classification:
# regression currently fails (as of 26.02.2019: still under development state by oboe team)
@@ -32,6 +72,7 @@ def run(dataset, config):
n_cores=n_cores,
runtime_limit=config.max_runtime_seconds,
**training_params)
aml.error_matrix = aml.error_matrix.to_numpy()

aml_models = lambda: [aml.ensemble, *aml.ensemble.base_learners] if len(aml.ensemble.base_learners) > 0 else []

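The exec.py change above patches oboe at runtime instead of forking it: a replacement kfold_fit_validate is defined at module level and then assigned onto oboe's Model class inside run(). A generic, self-contained sketch of that monkey-patching pattern (hypothetical class and function names, not the oboe API):

# Assigning a plain function to a class attribute replaces the method for
# every existing and future instance of the class.

class Estimator:                      # hypothetical stand-in for oboe's Model
    def kfold_fit_validate(self, n_folds):
        return "original implementation"


def patched_kfold_fit_validate(self, n_folds):
    # 'self' is bound automatically once the function is attached to the class
    return "patched implementation using {} folds".format(n_folds)


Estimator.kfold_fit_validate = patched_kfold_fit_validate

est = Estimator()
print(est.kfold_fit_validate(5))      # -> patched implementation using 5 folds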
9 changes: 0 additions & 9 deletions frameworks/oboe/requirements.txt

This file was deleted.

2 changes: 1 addition & 1 deletion frameworks/oboe/setup.sh
@@ -21,5 +21,5 @@ else
PIP install -U -e ${TARGET_DIR}
fi

cat ${HERE}/requirements.txt | sed '/^$/d' | while read -r i; do PIP install --no-cache-dir -U "$i"; done
#cat ${HERE}/requirements.txt | sed '/^$/d' | while read -r i; do PIP install --no-cache-dir -U "$i"; done
#PIP install --no-cache-dir -U -e git+https://github.com/udellgroup/oboe.git@${VERSION}#egg=oboe
3 changes: 2 additions & 1 deletion frameworks/ranger/exec.py
@@ -16,9 +16,10 @@ def run(dataset: Dataset, config: TaskConfig):
here = dir_of(__file__)
meta_results_file = os.path.join(config.output_dir, "meta_results.csv")
run_cmd(("Rscript --vanilla -e \""
"source('{script}'); "
".libPaths('{package_directory}'); source('{script}'); "
"run('{train}', '{test}', '{output}', cores={cores}, meta_results_file='{meta_results}', task_type='{task_type}')"
"\"").format(
package_directory=os.path.join(here, "lib"),
script=os.path.join(here, 'exec.R'),
train=dataset.train.path,
test=dataset.test.path,
5 changes: 3 additions & 2 deletions frameworks/ranger/setup.sh
@@ -15,7 +15,8 @@ if [[ -x "$(command -v apt-get)" ]]; then
SUDO apt-get install -y r-base r-base-dev
fi
#PIP install --no-cache-dir -r $HERE/requirements.txt
LIB="${HERE}/lib/"
mkdir "${LIB}"

Rscript -e 'options(install.packages.check.source="no"); install.packages(c("ranger", "mlr3", "mlr3learners", "mlr3pipelines", "farff"), repos="https://cloud.r-project.org/")'

Rscript -e 'options(install.packages.check.source="no"); install.packages(c("ranger", "mlr3", "mlr3learners", "mlr3pipelines", "farff"), repos="https://cloud.r-project.org/", lib="'"${LIB}"'")'
Rscript -e 'packageVersion("ranger")' | awk '{print $2}' | sed "s/[‘’]//g" >> "${HERE}/.setup/installed"
2 changes: 1 addition & 1 deletion frameworks/shared/requirements.in
@@ -2,4 +2,4 @@ psutil>=5.4
ruamel.yaml>=0.15

pyarrow>=4.0
#tables>=3.6 # if using hdf serializer (let the client choose)
# tables>=3.6 # if using hdf serializer (let the client choose)
16 changes: 8 additions & 8 deletions frameworks/shared/requirements.txt
@@ -1,16 +1,16 @@
#
# This file is autogenerated by pip-compile
# To update, run:
# This file is autogenerated by pip-compile with Python 3.9
# by the following command:
#
# pip-compile frameworks/shared/requirements.in
# pip-compile --output-file=frameworks/shared/requirements.txt frameworks/shared/requirements.in
#
numpy==1.21.0
numpy==1.24.2
# via pyarrow
psutil==5.8.0
# via -r frameworks/shared/requirements.in
pyarrow==4.0.1
pyarrow==11.0.0
# via -r frameworks/shared/requirements.in
ruamel.yaml.clib==0.2.2
# via ruamel.yaml
ruamel.yaml==0.17.4
ruamel-yaml==0.17.21
# via -r frameworks/shared/requirements.in
ruamel-yaml-clib==0.2.7
# via ruamel-yaml
14 changes: 8 additions & 6 deletions requirements.in
@@ -1,10 +1,12 @@
boto3>=1.9,<2.0
liac-arff>=2.5,<3.0
numpy>=1.20,<2.0
pandas>=1.2.4,<2.0
psutil>=5.4,<6.0
ruamel.yaml>=0.15,<1.0
openml==0.12.2
scikit-learn>=0.24
pyarrow>=4.0
tables>=3.6

numpy>=1.24,<2.0
pandas>=1.5,<2.0
openml==0.13.1
scikit-learn>=1.0,<2.0

pyarrow>=11.0
# tables>=3.6