CI tests for develop/v2 sometimes fail #412

Open
KanaiYuma-aist opened this issue Dec 19, 2024 · 0 comments
Collaborator

KanaiYuma-aist commented Dec 19, 2024

Describe the bug
The CI tests for develop/v2 occasionally fail.
The cause appears to be in tests/hpo/apps/main/test_resume.py.
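
Because the failure is intermittent, it may help to rerun the single test in a loop until it fails. The following is a minimal sketch, not part of the repository: `first_failing_run` is a hypothetical helper name, and the default of 50 attempts is an arbitrary assumption.

```python
import subprocess
import sys
from typing import Optional, Sequence


def first_failing_run(cmd: Sequence[str], max_runs: int = 50) -> Optional[int]:
    """Run cmd up to max_runs times.

    Returns the 1-based index of the first run that exits non-zero,
    or None if every run succeeded. (Hypothetical helper for illustration.)
    """
    for i in range(1, max_runs + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Show the output of the failing run so the log can be inspected.
            print(result.stdout)
            return i
    return None
```

For this issue it could be called from the repository root as `first_failing_run([sys.executable, "-m", "pytest", "tests/hpo/apps/main/test_resume.py::test_optimization_consistency", "-q"])`.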

Additional context
The log from the failing pytest run is shown below.

=========================================== test session starts ===========================================
platform linux -- Python 3.12.3, pytest-8.3.4, pluggy-1.5.0 -- /home/member/hpopt/aiaccelv2/document_warning/aiaccel_env/bin/python3
cachedir: .pytest_cache
rootdir: /home/member/hpopt/aiaccelv2/document_warning/aiaccel
configfile: pyproject.toml
plugins: mock-3.14.0, hydra-core-1.3.2, github-actions-annotate-failures-0.2.0, subprocess-1.5.2, cov-6.0.0
collected 4 items                                                                                         

tests/hpo/apps/main/test_resume.py::test_optimization_consistency FAILED                            [ 25%]
tests/hpo/apps/main/test_resume.py::test_normal_execution PASSED                                    [ 50%]
tests/hpo/apps/main/test_resume.py::test_resumable_execution PASSED                                 [ 75%]
tests/hpo/apps/main/test_resume.py::test_resume_execution PASSED                                    [100%]

================================================ FAILURES =================================================
______________________________________ test_optimization_consistency ______________________________________

temp_dir = PosixPath('/tmp/tmp09_6mxtj')

    def test_optimization_consistency(temp_dir: Path) -> None:
        """Test that split execution (resumable + resume) gives same results as normal execution.
    
        Test steps:
        1. Run 30 trials in normal mode:    python optimize.py objective.sh --config config.yaml
        2. Run 15 trials in resumable mode: python optimize.py objective.sh --config config.yaml --resumable                                                                                                          
        3. Run 15 trials in resume mode:    python optimize.py objective.sh --config config.yaml --resume
    
        Assertions:
        - Both executions should complete 30 trials
        - The best values from both executions should be nearly identical (within 1e-6)
        """
        from aiaccel.hpo.apps.optimize import main
    
        # Use different database files for normal and split execution
        normal_db = "normal_storage.db"
        split_db = "split_storage.db"
    
        # Normal execution with 30 trials
        study_name_normal = f"test_study_{uuid.uuid4().hex[:8]}"
        normal_config = modify_config(temp_dir / "config.yaml", study_name_normal, 30, normal_db)
    
        with patch("sys.argv", ["optimize.py", "objective.sh", "--config", str(normal_config)]):
            main()
    
        normal_results = get_trial_values(temp_dir / normal_db, study_name_normal)
        assert len(normal_results) == 30, "Normal execution should have 30 trials"
        normal_best = min(normal_results)
    
        trial_count = get_trial_count(temp_dir / normal_db, study_name_normal)
>       assert trial_count == 30
E       assert 31 == 30

/home/member/hpopt/aiaccelv2/document_warning/aiaccel/tests/hpo/apps/main/test_resume.py:146: AssertionError
------------------------------------------ Captured stdout setup ------------------------------------------

=== Content of config.yaml ===
storage:
  _target_: optuna.storages.RDBStorage
  url: sqlite:///aiaccel_storage.db
  engine_kwargs:
    connect_args:
      timeout: 30

study:
  _target_: optuna.create_study
  direction: minimize
  storage: ${storage}
  study_name: my_study
  load_if_exists: false
  sampler:
    _target_: optuna.samplers.TPESampler
    seed: 0

params:
  _convert_: partial
  _target_: aiaccel.hpo.apps.optimize.HparamsManager
  x1: [0, 1]
  x2:
    _target_: aiaccel.hpo.optuna.suggest_wrapper.SuggestFloat
    name: x2
    low: 0.0
    high: 1.0
    log: false

n_trials: 30
n_max_jobs: 1

group: gaa50000

========================================

=== Content of objective.sh ===
#!/bin/bash

#$-l rt_C.small=1
#$-cwd

source /etc/profile.d/modules.sh
module load gcc/13.2.0
module load python/3.10/3.10.14

python objective_for_test.py $@

========================================

=== Content of objective_for_test.py ===
from argparse import ArgumentParser
from pathlib import Path
import pickle as pkl


def main() -> None:
    parser = ArgumentParser()
    parser.add_argument("dst_filename", type=Path)
    parser.add_argument("--x1", type=float)
    parser.add_argument("--x2", type=float)
    args = parser.parse_args()

    x1, x2 = args.x1, args.x2

    y = (x1**2) - (4.0 * x1) + (x2**2) - x2 - (x1 * x2)

    with open(args.dst_filename, "wb") as f:
        pkl.dump(y, f)


if __name__ == "__main__":
    main()

========================================
------------------------------------------ Captured stderr call -------------------------------------------
[I 2024-12-19 14:15:48,329] A new study created in RDB with name: test_study_12d9cf0b
========================================= short test summary info =========================================
FAILED tests/hpo/apps/main/test_resume.py::test_optimization_consistency - assert 31 == 30
====================================== 1 failed, 3 passed in 10.95s =======================================