Reactivate the integration tests for remote file access (GCS, S3) #210

Open
wants to merge 4 commits into base: dev
Conversation

Collaborator

@tramora tramora commented Jul 11, 2024

Docker is installed in the khiopspydev docker image so that it can start the local fake servers.
The container must be started with the --privileged option, otherwise Docker inside the container will not start.
The fake servers are pre-provisioned with sample files so that they are ready before the tests run.

closes #175, #53
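The --privileged requirement above can be sketched as follows; this is an illustrative invocation (the tag and trailing command are assumptions, not taken from the CI configuration):

```shell
# Sketch: run the CI image with --privileged so that the Docker daemon
# inside the container (which hosts the fake S3/GCS servers) can start.
# Without --privileged, the inner dockerd fails to start.
docker run --privileged --rm -it \
  ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:latest \
  bash
```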

@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 5 times, most recently from da00cae to 1dc349c on July 13, 2024 21:21
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch from 1dc349c to 6593606 on August 14, 2024 16:18
@tramora
Collaborator Author

tramora commented Aug 14, 2024

@popescu-v & @folmos-at-orange : a detail we forgot: the S3 & GCS drivers are not publicly released and thus not available in the dev image. When trying to access the remote files during the tests, Khiops complains:

error : there is no driver available for the URI scheme 'gs://'
error : there is no driver available for the URI scheme 's3://'

@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch from cb7c126 to 35c3a6f on August 19, 2024 12:19
@popescu-v
Collaborator

@popescu-v & @folmos-at-orange : a detail we forgot: the S3 & GCS drivers are not publicly released and thus not available in the dev image. When trying to access the remote files during the tests, Khiops complains:

error : there is no driver available for the URI scheme 'gs://'
error : there is no driver available for the URI scheme 's3://'

Yes, AFAIU we need to wait for the drivers to be publicly released before the tests can work.

@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 5 times, most recently from 18cf7e9 to a750996 on September 4, 2024 15:33
@tramora tramora self-assigned this Sep 24, 2024
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 2 times, most recently from ed8cb75 to 571f402 on September 25, 2024 12:07
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 5 times, most recently from beab51e to 04fbd2b on October 9, 2024 15:04
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 2 times, most recently from e9011a8 to 3351a38 on October 21, 2024 09:56
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 6 times, most recently from 3e90f62 to 53fdbc3 on December 12, 2024 00:22
@tramora
Collaborator Author

tramora commented Dec 16, 2024

There is a remaining issue that breaks the tests and that I cannot fix right now.

It seems that Khiops under the conda env cannot use the remote file drivers installed system-wide. They cannot be installed via conda as there is no conda release yet.

This execution

/root/miniforge3/bin/conda run -n py3.12_conda  coverage run -m xmlrunner -o "reports/py3.12_conda" tests.test_remote_access.KhiopsS3RemoteFileTests.test_khiops_classifier_with_remote_access

fails with

khiops.core.exceptions.KhiopsRuntimeError: khiops execution had errors (return code 1):
Contents of stdout:
error : there is no driver available for the URI scheme 's3://'
error : Input command file s3://s3-bucket/khiops-cicd/tmp/KhiopsClassifier_fit_224bc1d9-4374-4f5e-a665-4e577ad107e6/build_dictionary_from_data_table_7ff01b03-06af-46d1-90ad-1564340c2d83._kh : Unable to open file
error : there is no driver available for the URI scheme 's3://'

@popescu-v
Collaborator

There is a remaining issue that breaks the tests and that I cannot fix right now.

It seems that Khiops under the conda env cannot use the remote file drivers installed system-wide. They cannot be installed via conda as there is no conda release yet.

This is true. Hence, IMHO the way to go for the current PR is to test the remote access only in the native case (Conda environments for the tests, but with the native khiops-core and driver packages installed).

Then, as soon as the Conda packages for the drivers are available, a new PR should activate these integration tests in full Conda environments.
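The skip strategy described above can be sketched with plain unittest machinery; the class and flag names here are hypothetical, standing in for the real conda-environment detection:

```python
import unittest

class RemoteAccessTest(unittest.TestCase):
    # Hypothetical flag: True while khiops-core is conda-installed and the
    # S3/GCS driver conda packages are not yet released
    KHIOPS_CORE_IN_CONDA = True

    def setUp(self):
        # Skip the remote-access tests until the drivers ship as conda packages
        if self.KHIOPS_CORE_IN_CONDA:
            self.skipTest("remote-file drivers not yet released as conda packages")

    def test_khiops_classifier_with_remote_access(self):
        self.assertTrue(True)  # placeholder body, not reached while skipped

result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(RemoteAccessTest).run(result)
print(len(result.skipped))  # → 1
```

Skipping in setUp (rather than deleting the tests) keeps the skips visible in the CI reports, which makes it easy to re-enable them once the conda driver packages exist.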

@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 3 times, most recently from 2e88e35 to 4602e44 on December 17, 2024 17:14
khiops-gcs-driver-revision:
type: string
default: 0.0.10
description: Version of the driver used for remote files on Google Cloud Storage (GCS)
Member

-> Driver version for Google Cloud Storage remote files

Collaborator Author

Rewording done

khiops-s3-driver-revision:
type: string
default: 0.0.12
description: Version of the driver used for remote files AWS compatible storages (S3)
Member

Driver version for AWS-S3 remote file

Collaborator Author

Rewording done

@@ -43,7 +43,7 @@ jobs:
# because the `env` context is only accessible at the step level;
# hence, it is hard-coded
image: |-
ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:${{ inputs.image-tag || 'latest' }}
ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:${{ inputs.image-tag || '10.2.3.0.s3-gcs-remote-files' }}
Member

Why isn't it latest here?

Collaborator Author

I may have misunderstood the process agreed with @popescu-v.
As long as the PR is in review, the specific image built from this branch is not yet tagged as latest.
Just before the merge I can change it to latest.

shell: bash
run: |
# for S3 only
echo "HOME=${HOME}"
Member

Is this necessary? If I understand correctly, HOME will have the same value after this line.

Collaborator Author

The command was added for diagnostic purposes: the HOME env variable was not expanded everywhere and I needed to see its value to copy it. Command removed.

AWS_ENDPOINT_URL: http://localhost:4569
shell: bash
run: |
# for S3 only
Member

I'd change this comment to # Prepare AWS-S3 credentials and configuration

Collaborator Author

Rewording done

)
# symmetric call to ensure the upload was OK
fs.copy_to_local(
f"{proto}://{bucket_name}/khiops-cicd/samples/" + file, "/tmp/dummy"
Member

I'd just remove the + and put everything in the interpolated string.
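The suggested change folds the concatenation into the interpolated string; a minimal illustration with hypothetical sample values:

```python
proto, bucket_name, file = "s3", "s3-bucket", "Adult.txt"  # hypothetical values

# Before: f-string plus "+" concatenation
before = f"{proto}://{bucket_name}/khiops-cicd/samples/" + file
# After: everything in a single interpolated string
after = f"{proto}://{bucket_name}/khiops-cicd/samples/{file}"

print(after)           # → s3://s3-bucket/khiops-cicd/samples/Adult.txt
print(before == after)  # → True
```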

Collaborator Author

Fixed

Comment on lines 106 to 107
Temporary skip the S3 and GCS tests under a conda environment
until the drivers for remote files access
are released as conda packages
Member

-> Temporarily skip the S3 and GCS tests in a conda environment ...

Collaborator Author

Typo fixed

Comment on lines 124 to 128
# Debug :
# print(
# f"{self.remote_access_test_case()} : "
# f"Variables available when testing if in a CONDA env : {os.environ}"
# )
Member

Is this comment really needed?

Collaborator Author

The code was left commented out in case further debugging is needed. Comment removed.

Comment on lines 445 to 441
"""
Tests using a docker runner should never be skipped even in a conda env
"""
Member

Pattern: the title of a docstring always goes on the first line.

Collaborator Author

Fixed

Comment on lines 134 to +150
def setUp(self):
self.skip_if_no_config()
if self.is_in_a_conda_env() and self.should_skip_in_a_conda_env():
# TODO : Temporary skip the test under a conda environment
# until the drivers for remote files access
# are released as conda packages
self.skipTest(
f"Remote test case {self.remote_access_test_case()} "
"in a conda environment is currently skipped"
)
self.print_test_title()

def tearDown(self):
# Cleanup the output dir (the files within and the folder)
if hasattr(self, "folder_name_to_clean_in_teardown"):
for filename in fs.list_dir(self.folder_name_to_clean_in_teardown):
fs.remove(
fs.get_child_path(
self.folder_name_to_clean_in_teardown, filename
)
)
fs.remove(self.folder_name_to_clean_in_teardown)

Member

To avoid using hasattr and setting up your own temporary URLs, I'd use a setup/tear-down for the whole class instead of a per-test one.

  • In the setUpClass method you simply set the env var KHIOPS_TMP_DIR and create a fresh runner.
  • In the tearDownClass method you clean up everything and restore the original temporary directory and runner.

Something like this example

import unittest
import os
from khiops import core as kh
from khiops.core.internals.runner import KhiopsLocalRunner


class TestWithAltRunner(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Save the original runner and temp dir
        cls.original_runner = kh.get_runner()
        cls.original_khiops_temp_dir = os.environ.get("KHIOPS_TMP_DIR")

        # Set the test suite temp dir and install a fresh runner
        os.environ["KHIOPS_TMP_DIR"] = "/tmp/mytmpdir/"  # replace here with the URL
        cls.runner = KhiopsLocalRunner()
        kh.set_runner(cls.runner)

    @classmethod
    def tearDownClass(cls):
        # Restore the original runner and temp dir
        kh.set_runner(cls.original_runner)
        if cls.original_khiops_temp_dir is None:
            del os.environ["KHIOPS_TMP_DIR"]
        else:
            os.environ["KHIOPS_TMP_DIR"] = cls.original_khiops_temp_dir

        # Clean the whole alternative temporary directory
        # ...

    def test(self):
        print("\n======= Current test runner =========")
        kh.get_runner().print_status()

Thierry RAMORASOAVINA added 4 commits December 20, 2024 19:05
… non-conda environments

- 2 fake servers (fakes3 and fake-gcs-server) are used locally instead of real S3 and GCS servers.
- The fake servers are provisioned with sample files before the tests run
- During the construction of the khiopsdev docker image we need to install the s3/gcs drivers required by khiops
- As these drivers are not released yet as conda packages we must disable the tests in conda environments (except when using a docker runner)
…s, upload_file and download_file do not return a response but raise an exception in case of error
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch from 4602e44 to a5604b6 on December 20, 2024 18:05
)
# normalize the raised exception
except Exception as e:
Collaborator

@popescu-v popescu-v Dec 20, 2024

I would restrict the "catch" here to S3UploadFailedError:

except S3UploadFailedError as exc:
    raise RuntimeError(...) from exc

Thus, whenever there is a different (unexpected) exception, it would not be handled and hence would be propagated to the caller code so that we know about it (e.g. from inspecting the CI logs).
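The suggested narrow-catch-and-chain pattern looks like this in isolation; the exception class below is a stand-in defined locally for illustration (the real S3UploadFailedError comes from boto3), and the function name is hypothetical:

```python
class S3UploadFailedError(Exception):
    """Stand-in for boto3's upload error, defined here only for illustration."""

def provision_sample(path, fail=True):
    # Catch only the expected error and chain it with "raise ... from";
    # any other exception propagates unchanged, so it shows up in CI logs.
    try:
        if fail:
            raise S3UploadFailedError(f"could not upload {path}")
    except S3UploadFailedError as exc:
        raise RuntimeError(f"provisioning of {path} failed") from exc

try:
    provision_sample("samples/Adult.txt")
except RuntimeError as err:
    print(type(err.__cause__).__name__)  # → S3UploadFailedError
```

The chained `__cause__` keeps the original traceback attached, so the CI logs still show where the upload actually failed.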

# Avoid resolving a fake s3-bucket.localhost hostname
# Alternate builders (buildx via moby buildkit) mount /etc/hosts read-only, the following command will fail
# echo "127.0.0.1 s3-bucket.localhost" >> /etc/hosts
# You will have to add the `add-hosts` input instead (https://github.com/docker/build-push-action/#inputs)
Collaborator

Why can't we do this directly here? This Dockerfile is only useful in the CI context anyway.

@@ -79,17 +118,47 @@ def skip_if_no_config(self):
"has no configuration available"
)

@staticmethod
def is_in_a_conda_env():
# CONDA_DEFAULT_ENV and CONDA_PROMPT_MODIFIER
Collaborator

@popescu-v popescu-v Dec 20, 2024

Actually, what we want here is to distinguish whether the khiops-core Khiops binaries package is installed in a Conda environment or in a native way (in the OS):

  • if installed natively, then these tests can be run;
  • otherwise, they cannot be, until the Conda driver packages are shipped.

In order to detect whether we are running a Conda-installed khiops-core, we can rely on the fact that a Conda-installed khiops-core has the Khiops executables in the Conda environment's bin directory. Hence, we can do something like this:

@staticmethod
def is_in_a_conda_env():
    """Detects whether this is run from a Conda environment with khiops-core installed in the same env"""
    # Get the path to the Khiops executable
    khiops_path = kh.get_runner()._khiops_path

    # If khiops_path's directory is identical to $CONDA_PREFIX/bin, then return True
    conda_prefix = os.environ.get("CONDA_PREFIX")
    return conda_prefix is not None and os.path.join(conda_prefix, "bin") == os.path.dirname(khiops_path)
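The detection logic in this proposal can be exercised in isolation by passing the executable path and environment explicitly; a sketch with a hypothetical function name (MODL is used as an example executable name):

```python
import os

def khiops_core_is_conda_installed(khiops_path, environ):
    # True when the Khiops executable sits in $CONDA_PREFIX/bin, i.e.
    # khiops-core was installed inside the active conda environment
    conda_prefix = environ.get("CONDA_PREFIX")
    return (conda_prefix is not None
            and os.path.join(conda_prefix, "bin") == os.path.dirname(khiops_path))

# Conda-installed: executable lives under the environment's bin directory
print(khiops_core_is_conda_installed("/opt/conda/bin/MODL",
                                     {"CONDA_PREFIX": "/opt/conda"}))  # → True
# Native install: executable is outside the conda environment
print(khiops_core_is_conda_installed("/usr/bin/MODL",
                                     {"CONDA_PREFIX": "/opt/conda"}))  # → False
```

Taking `environ` as a parameter keeps the check unit-testable without touching the real process environment.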

# CONDA_DEFAULT_ENV and CONDA_PROMPT_MODIFIER
# are the only usable env variables that let us know we are in a conda env
conda_default_env = os.environ.get("CONDA_DEFAULT_ENV")
return conda_default_env and conda_default_env.endswith("_conda")
Collaborator

@popescu-v popescu-v Dec 20, 2024

The .endswith leaks CI details into the tests (as the _conda env names for Conda-installed khiops-core are a CI detail).

The proposal from the previous comment avoids this.

def setUp(self):
self.skip_if_no_config()
if self.is_in_a_conda_env() and self.should_skip_in_a_conda_env():
# TODO : Temporary skip the test under a conda environment
Collaborator

Rephrase TODO comment to something like
# TODO: Temporary skip the test for Conda-installed khiops-core

@@ -177,17 +251,21 @@ def test_train_predictor_fail_and_log_with_remote_access(self):
log_file_path = fs.get_child_path(
self._khiops_temp_dir, f"khiops_log_{uuid.uuid4()}.log"
)

# no cleaning required as an error would raise without any result produced
Collaborator

Change "an error would raise" to "an exception would be raised".


runner.samples_dir = f"s3://{bucket_name}/khiops-cicd/samples"
resources_directory = KhiopsTestHelper.get_resources_dir()
# WARNING : khiops temp files cannot be remote
Collaborator

Add empty line before the comment (to have paragraphs).

resources_directory = KhiopsTestHelper.get_resources_dir()
# WARNING : khiops temp files cannot be remote
cls._khiops_temp_dir = f"{resources_directory}/tmp/khiops-cicd"
# whereas root_temp_dir
Collaborator

Delete "whereas".

Add empty line before the beginning of the comment (to have the commented block as a paragraph).

runner.samples_dir = f"gs://{bucket_name}/khiops-cicd/samples"
cls._khiops_temp_dir = f"gs://{bucket_name}/khiops-cicd/tmp"
resources_directory = KhiopsTestHelper.get_resources_dir()
# WARNING : khiops temp files cannot be remote
Collaborator

Add empty line before comment.

resources_directory = KhiopsTestHelper.get_resources_dir()
# WARNING : khiops temp files cannot be remote
cls._khiops_temp_dir = f"{resources_directory}/tmp/khiops-cicd"
# whereas root_temp_dir
Collaborator

Delete "whereas". Add empty line before the first line of the comment.

Collaborator

@popescu-v popescu-v left a comment

See the comments.

Development

Successfully merging this pull request may close these issues.

Reactivate the S3 and GCS filesystem integration tests
GCS tests do not completely clean themselves