Implement restart feature #268

Open
wants to merge 6 commits into
base: main

Contributor

@Nunkyl Nunkyl commented Mar 21, 2024

Implements the optimisation restart feature. During optimisation, the state of the optimiser object is saved to disk periodically. If optimisation fails at some point, it can be resumed from the last saved state instead of starting over. The feature is currently implemented for the PopulationalOptimizer and EvoGraphOptimizer classes.

Implementation details

The optimiser object's state is always saved automatically to a .pkl file.
The path is /default_data_dir/saved_state_path/run_id/timestamp.pkl, where:
  • default_data_dir - the result of default_data_dir() from golem.core.paths
  • saved_state_path - the folder path inside default_data_dir (the user can change its default value)
  • run_id - a unique id generated for every run
  • timestamp - the time at which the file is written to disk
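
As an illustration of the path layout above, here is a minimal sketch of how such a path could be composed. The helper name build_state_file_path, the data_dir argument (a stand-in for the value returned by default_data_dir()) and the timestamp format are assumptions made for the example and are not taken from the PR code.

```python
import uuid
from datetime import datetime
from pathlib import Path


def build_state_file_path(data_dir: Path, saved_state_path: str,
                          run_id: str, now: datetime) -> Path:
    """Compose <data_dir>/<saved_state_path>/<run_id>/<timestamp>.pkl."""
    # The timestamp format is an assumption; the PR does not specify it.
    timestamp = now.strftime('%Y%m%d-%H%M%S')
    return data_dir / saved_state_path / run_id / f'{timestamp}.pkl'


# data_dir is a hypothetical stand-in for default_data_dir() from golem.core.paths.
example_path = build_state_file_path(
    data_dir=Path('/tmp/golem_data'),
    saved_state_path='saved_optimisation_state/main/evo_graph_optimiser',
    run_id=uuid.uuid4().hex,
    now=datetime.now(),
)
```

Because run_id is unique per run, every run gets its own folder, and the files inside it sort chronologically by name when the timestamp format is zero-padded as in this sketch.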

By default the state is saved every 60 seconds. This can be changed via the save_state_delta parameter of optimise().
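
The time-based saving described here can be sketched roughly as follows. This is not the code from the PR: the class PeriodicStateSaver, its method maybe_save and the injectable clock are hypothetical names used only to show the idea of "save when at least delta seconds have passed since the last save".

```python
import pickle
import time
from pathlib import Path
from typing import Any, Callable


class PeriodicStateSaver:
    """Pickles a state object when at least `delta_seconds` have passed since the last save."""

    def __init__(self, target_dir: Path, delta_seconds: float = 60,
                 clock: Callable[[], float] = time.monotonic):
        self.target_dir = target_dir
        self.delta_seconds = delta_seconds
        self._clock = clock
        self._last_save = clock()
        self._counter = 0

    def maybe_save(self, state: Any) -> bool:
        """Save `state` if the interval has elapsed; return True when a file was written."""
        now = self._clock()
        if now - self._last_save < self.delta_seconds:
            return False
        self.target_dir.mkdir(parents=True, exist_ok=True)
        self._counter += 1
        # A counter replaces the timestamp in the file name to keep the sketch deterministic.
        with open(self.target_dir / f'{self._counter:06d}.pkl', 'wb') as file:
            pickle.dump(state, file)
        self._last_save = now
        return True
```

In an optimisation loop, maybe_save would be called once per generation, so the actual gap between two files is at least delta_seconds and at most delta_seconds plus the duration of one generation.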

To restore a saved state, set the use_saved_state parameter to True when creating an optimiser object (e.g. EvoGraphOptimizer) and, if necessary, specify the path to the file (otherwise the latest available file is used).

New params in the optimiser class:

  • use_saved_state [optional] True or False (bool)
  • saved_state_path [optional] path to location of files (string)
  • saved_state_file [optional] path to a specific file that will be used for restoration (string)
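
To make the interaction between saved_state_path and saved_state_file concrete, here is a rough sketch of the file lookup. The function find_latest_state_file is hypothetical, and interpreting "the last available file" as "the most recently modified .pkl file" is an assumption; the PR code may select the file differently.

```python
from pathlib import Path
from typing import Optional


def find_latest_state_file(saved_state_dir: Path,
                           saved_state_file: Optional[str] = None) -> Optional[Path]:
    """Return the explicit file if it exists, otherwise the newest .pkl under saved_state_dir."""
    if saved_state_file is not None:
        explicit = Path(saved_state_file)
        return explicit if explicit.is_file() else None
    # Search all run_id subfolders for saved state files.
    candidates = list(saved_state_dir.rglob('*.pkl'))
    if not candidates:
        return None
    # Assumption: "last available" means the most recently modified file.
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

Returning None rather than raising leaves it to the caller to decide whether a missing file is fatal or whether optimisation should simply start from scratch.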

New parameter in the optimise() function:

  • save_state_delta [optional] the number of seconds to wait between state saves (int), default = 60

Examples:

# Save state using the default params (no changes needed)
optimiser = EvoGraphOptimizer(objective, initial_population, requirements, gen_params, algo_params)  
optimiser.optimise(objective)

# Save state in specific location
optimiser = EvoGraphOptimizer(objective, initial_population, requirements, gen_params, algo_params, saved_state_path=saved_state_path)  
optimiser.optimise(objective)

# Save state every 5 min 
optimiser = EvoGraphOptimizer(objective, initial_population, requirements, gen_params, algo_params)  
optimiser.optimise(objective, save_state_delta=300)

# Restore state from the default location
optimiser = EvoGraphOptimizer(objective, initial_population, requirements, gen_params, algo_params, use_saved_state=True)  
optimiser.optimise(objective)

# Restore state from a specific folder
optimiser = EvoGraphOptimizer(objective, initial_population, requirements, gen_params, algo_params, 
                              use_saved_state=True, saved_state_path=saved_state_path)  
optimiser.optimise(objective)

# Restore state from a specific file
optimiser = EvoGraphOptimizer(objective, initial_population, requirements, gen_params, algo_params, 
                              use_saved_state=True, saved_state_path=saved_state_path, saved_state_file=full_file_path)  
optimiser.optimise(objective)

Important:

  • When a saved state is restored, all optimisation settings are taken from the saved state file except for two:

    • GraphRequirements.timeout
    • GraphRequirements.num_of_generations

    Their values can be changed for the second run of the optimiser.

  • A new run that uses a saved state writes its data to the same folder that contains the saved state file.
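
The rule about the two overridable settings could look roughly like this. The Requirements dataclass is a simplified, hypothetical stand-in for GraphRequirements (only timeout and num_of_generations are named in this PR; other_setting is invented for the example), and merge_requirements is not a function from the code base.

```python
from dataclasses import dataclass, replace
from datetime import timedelta


@dataclass
class Requirements:
    """Hypothetical stand-in for GraphRequirements."""
    timeout: timedelta
    num_of_generations: int
    other_setting: int = 3  # invented field representing "all other settings"


def merge_requirements(saved: Requirements, current: Requirements) -> Requirements:
    """Take every setting from the saved state except timeout and num_of_generations."""
    return replace(saved,
                   timeout=current.timeout,
                   num_of_generations=current.num_of_generations)


saved = Requirements(timeout=timedelta(minutes=5), num_of_generations=10, other_setting=7)
current = Requirements(timeout=timedelta(minutes=1), num_of_generations=50, other_setting=2)
merged = merge_requirements(saved, current)
```

Here merged keeps other_setting from the saved run but uses the new timeout and generation limit, which is what allows a restored run to continue for longer than the first one.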

@pep8speaks

pep8speaks commented Mar 21, 2024

Hello @Nunkyl! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 65:121: E501 line too long (124 > 120 characters)

Comment last updated at 2024-03-22 11:33:01 UTC

@Nunkyl Nunkyl requested review from maypink and YamLyubov March 21, 2024 09:42
@codecov-commenter

codecov-commenter commented Mar 22, 2024

Codecov Report

Attention: Patch coverage is 68.42105%, with 30 lines in your changes missing coverage. Please review.

Project coverage is 72.76%. Comparing base (68706be) to head (bc97b62).

Files                                            Patch %   Lines
golem/core/optimisers/populational_optimizer.py  55.35%    25 Missing ⚠️
golem/core/optimisers/optimizer.py               78.26%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #268      +/-   ##
==========================================
- Coverage   72.88%   72.76%   -0.12%     
==========================================
  Files         140      140              
  Lines        8338     8409      +71     
==========================================
+ Hits         6077     6119      +42     
- Misses       2261     2290      +29     


@maypink
Collaborator

maypink commented Mar 31, 2024

Regarding saving every 60 seconds: time does not seem to be the best criterion for the save interval; it would be better to define it in generations, for example every generation or every fifth one. The choice could be based on the time spent on the first generation: if it is long, save every subsequent generation; if not, save every Nth.
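
A minimal sketch of this suggestion, assuming a fixed threshold decides whether the first generation counts as "long". The function names, the 60-second threshold and the default of 5 generations are illustrative values, not something agreed in this thread.

```python
def choose_save_every_n_generations(first_generation_seconds: float,
                                    slow_generation_threshold: float = 60.0,
                                    default_n: int = 5) -> int:
    """Save every generation if generations are slow, otherwise every `default_n`-th."""
    if first_generation_seconds >= slow_generation_threshold:
        return 1
    return default_n


def should_save(generation_num: int, every_n: int) -> bool:
    """True for generations that are a multiple of the chosen interval."""
    return generation_num % every_n == 0
```

With a fast first generation the state would be written at generations 5, 10, 15 and so on; with a slow one it would be written after every generation.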

@@ -34,35 +34,42 @@ def __init__(self,
requirements: GraphRequirements,
graph_generation_params: GraphGenerationParams,
graph_optimizer_params: GPAlgorithmParameters,
use_saved_state: bool = False,
saved_state_path: str = 'saved_optimisation_state/main/evo_graph_optimiser',
Collaborator

'saved_optimisation_state/main' is repeated; it could be moved into a separate constant. The same goes for all the other string literals.

if os.path.isfile(saved_state_file):
    current_saved_state_path = saved_state_file
else:
    raise SystemExit('ERROR: Could not restore saved optimisation state: '
Collaborator

We could probably just write a log message saying that starting from the saved state failed and that optimisation is starting from scratch.
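
One possible shape of this fallback, as a sketch only: the function try_load_saved_state and the logger name are hypothetical, and the project's own logging utilities may be preferable to the standard logging module used here.

```python
import logging
import os
import pickle
from typing import Any, Optional

logger = logging.getLogger('restart_sketch')  # hypothetical logger name


def try_load_saved_state(saved_state_file: str) -> Optional[Any]:
    """Return the unpickled state, or None (with a warning) if the file is missing."""
    if not os.path.isfile(saved_state_file):
        logger.warning('Could not restore saved optimisation state from %s; '
                       'optimisation starts from scratch.', saved_state_file)
        return None
    with open(saved_state_file, 'rb') as file:
        return pickle.load(file)
```

The caller can then treat None as "no saved state" and continue with a fresh run instead of terminating the process.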

saved_state_path, **custom_optimizer_params)

# Restore state from previous run
if use_saved_state:
Collaborator

It would be better to move all the logic related to restoring the optimisation into a separate private method.

@@ -108,10 +165,26 @@ def optimise(self, objective: ObjectiveFunction) -> Sequence[Graph]:
break
# Adding of new population to history
self._update_population(new_population)
delta = datetime.now() - last_write_time
Collaborator

Move this into a separate method; the code will be more readable.

Comment on lines +173 to +177
if self.use_saved_state:
    bar = tqdm(total=self.requirements.num_of_generations, desc='Generations', unit='gen',
               initial=self.current_generation_num - 2)
else:
    bar = tqdm(total=self.requirements.num_of_generations, desc='Generations', unit='gen', initial=0)
Collaborator

This class has no self.current_generation_num or self.use_saved_state attributes. You could leave this function's code unchanged and instead set the required value in optimise() in PopulationalOptimizer. Something like:

pbar.n = self.current_generation_num 
pbar.refresh() 

Comment on lines +80 to +81
assert time1 > 2
assert time2 < 1
Collaborator

This check is not entirely clear: why exactly 1 and 2?

initial_population = [generate_labeled_graph('tree', 5, node_types) for _ in range(10)]

# Setup optimization parameters
requirements_run_1 = GraphRequirements(timeout=timedelta(minutes=timeout),
Collaborator

Is it really necessary to create two separate objects?

@donRumata03
Collaborator

"The choice could be based on the time spent on the first generation: if it is long, save every subsequent generation; if not, save every Nth."

Ideally, the fraction of total time spent on saving should probably be kept fixed and small.
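
One way to read this idea, sketched under assumptions: measure how long the last save took and wait long enough that saving stays below a chosen share of the total time. The function next_save_interval, the 1% default and the minimum interval are all hypothetical.

```python
def next_save_interval(last_save_duration: float,
                       max_overhead_fraction: float = 0.01,
                       min_interval: float = 1.0) -> float:
    """Seconds to wait before the next save so that saving stays within the overhead budget."""
    if not 0 < max_overhead_fraction < 1:
        raise ValueError('max_overhead_fraction must be between 0 and 1')
    # If a save takes s seconds and we then wait w seconds, the overhead is s / (s + w).
    # Requiring s / (s + w) <= f gives w >= s * (1 - f) / f.
    interval = last_save_duration * (1 - max_overhead_fraction) / max_overhead_fraction
    return max(interval, min_interval)
```

For example, if a save takes 0.5 seconds and the budget is 1%, the next save would happen about 49.5 seconds later; larger populations that take longer to pickle would automatically be saved less often.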
