
Consolidate periodic loops into one function updating Den and updating autostop. #873

Conversation

rohinb2
Contributor

@rohinb2 rohinb2 commented Jun 6, 2024

autostop.

Contributor Author

rohinb2 commented Jun 6, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.


@rohinb2 rohinb2 force-pushed the 06-04-Save_information_about_active_function_calls branch from 9cd3695 to d53931e Compare June 6, 2024 21:56
@rohinb2 rohinb2 force-pushed the 06-04-Consolidate_periodic_loops_into_one_function_updating_Den_and_updating_autostop branch 2 times, most recently from 3cfdc6a to 20d04a5 Compare June 6, 2024 23:19
@rohinb2 rohinb2 marked this pull request as ready for review June 6, 2024 23:20
@rohinb2 rohinb2 force-pushed the 06-04-Save_information_about_active_function_calls branch from d53931e to 61bcf59 Compare June 6, 2024 23:20
@rohinb2 rohinb2 force-pushed the 06-04-Consolidate_periodic_loops_into_one_function_updating_Den_and_updating_autostop branch from 20d04a5 to baeb9bd Compare June 6, 2024 23:20
Comment on lines 251 to 253
print("hello")
await asyncio.sleep(STATUS_CHECK_DELAY)
print("delayed")
Contributor

Ding

@@ -659,3 +671,25 @@ def contents(self, name_or_path, full_paths):
return folder(name=name_or_path, path=folder_url).resources(
full_paths=full_paths
)

def send_status(self, status: ResourceStatusData, cluster_rns_address: str):
Contributor

We need to catch all other errors so any failures here don't nuke the loop or cluster servlet.

Contributor

Should this be async?

Contributor Author

I guess async doesn't really matter because it's in its own thread, but I can make it async for consistency.
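
To illustrate the point about the call living in its own thread, here is a minimal sketch, not the PR's actual code. It assumes a hypothetical async `send_status` and a hypothetical helper `run_in_own_thread`; the real method would send the status to Den rather than append to a list. Because `asyncio.run` creates a fresh event loop inside the worker thread, the main event loop is not blocked whether the body is sync or async.

```python
import asyncio
import threading

results = []


async def send_status(status):
    # Hypothetical async variant; a real one would make a network request.
    await asyncio.sleep(0)
    results.append(status)


def run_in_own_thread(coro_fn, *args):
    # asyncio.run creates and closes an event loop private to this thread.
    thread = threading.Thread(
        target=lambda: asyncio.run(coro_fn(*args)), daemon=True
    )
    thread.start()
    return thread


thread = run_in_own_thread(send_status, {"status": "running"})
thread.join(timeout=5)
```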

Contributor Author

> We need to catch all other errors so any failures here don't nuke the loop or cluster servlet.

The cluster servlet wraps everything in a try/except, so it will catch it.
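
The fault-tolerance concern in this thread can be sketched as follows. This is an illustrative loop under stated assumptions, not the code in the PR: `periodic_status_check`, `check_once`, `max_iterations`, and the delay value are all hypothetical, and `max_iterations` exists only so the demonstration terminates.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

STATUS_CHECK_DELAY = 0.01  # seconds; hypothetical value, kept short for the demo


async def periodic_status_check(check_once, max_iterations=None):
    """Call check_once repeatedly; log failures instead of propagating them."""
    iterations = 0
    failures = 0
    while max_iterations is None or iterations < max_iterations:
        try:
            await check_once()
        except Exception:
            # Catch broadly so one bad iteration cannot end the loop.
            failures += 1
            logger.exception("Status check failed; retrying after delay")
        iterations += 1
        await asyncio.sleep(STATUS_CHECK_DELAY)
    return failures


calls = []


async def flaky_check():
    # Simulated check that fails on its second call only.
    calls.append(len(calls))
    if len(calls) == 2:
        raise RuntimeError("simulated Den outage")


failures = asyncio.run(periodic_status_check(flaky_check, max_iterations=3))
```

Catching `Exception` rather than using a bare `except:` leaves `asyncio.CancelledError` (a `BaseException` since Python 3.8) free to propagate, so the loop can still be cancelled on shutdown.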

@@ -260,63 +246,45 @@ async def aclear_all_references_to_env_servlet_name(self, env_servlet_name: str)
# Cluster status functions
##############################################

- async def asend_status_info_to_den(self):
+ async def aperiodic_status_check(self):
Contributor

Funny name

Comment on lines +29 to +32
system_cpu_usage: float
system_memory_usage: Dict[str, Any]
system_disk_usage: Dict[str, Any]
Contributor

Where does the GPU info go?

Comment on lines +7 to +9
def __init__(self):
    self._last_activity = time.time()
    self._last_register = None
Contributor

I don't think we need the last_registered last_activity stuff anymore. We can just set the activity directly in SkyPilot, there's no reason to set it lazily like we did before.

Contributor Author

Nah, currently we also call set_last_active_time_to_now in get_env_servlet_name_for_key. That happens quite often, so we don't need to re-register it every time; the loop is always running, so we can just do it there.
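
A rough sketch of the pattern described in this reply, assuming the helper works roughly this way. `_last_activity`, `_last_register`, and `set_last_active_time_to_now` appear in the PR; the class name `AutostopHelper`, the method `register_activity_if_needed`, and the `register_fn` and `clock` parameters are hypothetical and stand in for whatever actually reports activity to SkyPilot. An injected clock keeps the demonstration deterministic.

```python
import time


class AutostopHelper:
    """Hypothetical sketch: record activity cheaply, register it from the loop."""

    def __init__(self, register_fn, clock=time.time):
        self._clock = clock
        self._register_fn = register_fn  # stand-in for the expensive call
        self._last_activity = clock()
        self._last_register = None

    def set_last_active_time_to_now(self):
        # Called on hot paths; only stores a timestamp.
        self._last_activity = self._clock()

    def register_activity_if_needed(self):
        # Called once per periodic-loop iteration; registers only new activity.
        if self._last_register is None or self._last_register < self._last_activity:
            self._register_fn(self._last_activity)
            self._last_register = self._last_activity
            return True
        return False


ticks = iter([100.0, 101.0, 102.0])  # fake clock readings for the demo
registered = []
helper = AutostopHelper(registered.append, clock=lambda: next(ticks))

first = helper.register_activity_if_needed()   # initial activity is registered
second = helper.register_activity_if_needed()  # nothing new, so skipped
helper.set_last_active_time_to_now()           # two cheap updates...
helper.set_last_active_time_to_now()
third = helper.register_activity_if_needed()   # ...one registration
```

Two activity updates between loop iterations produce a single registration, which is the saving the reply is pointing at.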

- if self.cluster_config.get("den_auth", False) and configs.token:
-     logger.debug("Creating send_status_info_to_den thread.")
+ if self.cluster_config.get("den_auth", False):
+     logger.info("Creating aperiodic_status_check thread.")
Contributor

Ok this has to be on purpose

Contributor Author

haha

Contributor

@dongreenberg dongreenberg left a comment

Looks solid; it could be simplified a little and could use some extra fault tolerance.

@rohinb2 rohinb2 force-pushed the 06-04-Consolidate_periodic_loops_into_one_function_updating_Den_and_updating_autostop branch from baeb9bd to 544fab0 Compare June 7, 2024 22:24
Contributor Author

rohinb2 commented Jun 8, 2024

Merge activity

  • Jun 7, 8:13 PM EDT: @rohinb2 started a stack merge that includes this pull request via Graphite.
  • Jun 7, 8:14 PM EDT: Graphite rebased this pull request as part of a merge.
  • Jun 7, 8:15 PM EDT: @rohinb2 merged this pull request with Graphite.

Base automatically changed from 06-04-Save_information_about_active_function_calls to main June 8, 2024 00:13
@rohinb2 rohinb2 force-pushed the 06-04-Consolidate_periodic_loops_into_one_function_updating_Den_and_updating_autostop branch from 544fab0 to 65473ec Compare June 8, 2024 00:14
@rohinb2 rohinb2 merged commit deacb8e into main Jun 8, 2024
11 of 12 checks passed
@rohinb2 rohinb2 deleted the 06-04-Consolidate_periodic_loops_into_one_function_updating_Den_and_updating_autostop branch June 8, 2024 00:15