Use /checkpoints instead of events parsing #312

Open: wants to merge 7 commits into main

Conversation

timofeev1995 commented May 8, 2025

Have you read the Contributing Guidelines?

Describe your changes

  • Replace /events-based logic with /checkpoints-based logic for fine_tuning.list_checkpoints()
  • Fix inconsistencies with merge_lora=false and full-sft jobs.

@@ -551,7 +594,7 @@ def list_events(self, id: str) -> FinetuneListEvents:

         return FinetuneListEvents(**response.data)

-    def list_checkpoints(self, id: str) -> List[FinetuneCheckpoint]:
+    def list_checkpoints_from_events(self, id: str) -> List[FinetuneCheckpoint]:
Contributor

Let's delete the old function

Member

+1

mryab self-requested a review May 8, 2025 17:02

checkpoint_path = checkpoint["path"]
step = checkpoint["step"]

is_final = int(step) == 0
Member

By the way, this feels a bit brittle: what if for some reason we did save a checkpoint at step 0? I wonder if we can add a new field to FinetuneCheckpoint that indicates whether this is a final checkpoint.
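
One way to read this suggestion, as a minimal sketch: carry an explicit flag on the checkpoint record rather than inferring finality from `step == 0`. The `is_final` field name and the simplified `Checkpoint` dataclass below are hypothetical and are not the library's actual `FinetuneCheckpoint` model. The step-0 convention from the diff is kept only as a fallback for responses that lack the flag.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Checkpoint:
    """Simplified, hypothetical stand-in for FinetuneCheckpoint."""

    path: str
    step: int
    is_final: bool


def parse_checkpoint(raw: Dict[str, Any]) -> Checkpoint:
    """Build a Checkpoint, preferring an explicit flag over the step-0 heuristic."""
    step = int(raw["step"])
    # "is_final" is an assumed field name; fall back to the step == 0
    # convention only when the flag is absent from the response.
    is_final = bool(raw["is_final"]) if "is_final" in raw else step == 0
    return Checkpoint(path=raw["path"], step=step, is_final=is_final)
```

With an explicit flag, an intermediate checkpoint that happens to be saved at step 0 is no longer mislabeled as final, which is the brittleness raised in the comment.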

if checkpoint_path.endswith("_adapter"):
checkpoint_type = "Final Adapter"
else:
checkpoint_type = "Final Merged" if had_adapters else "Final"
Member

I think it's more correct to either detect this from the training type or move the checkpoint naming to the backend side (otherwise you'll have to make two requests).
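
A rough sketch of the first option, deriving the label from the job's training type instead of a `had_adapters` flag. The function name and the `training_type` values (`"lora"`, `"full"`) are assumptions for illustration, not the SDK's actual API. The label strings are the ones used in the snippet under review.

```python
def checkpoint_type(checkpoint_path: str, training_type: str) -> str:
    """Label a final checkpoint from its path and the job's training type."""
    if checkpoint_path.endswith("_adapter"):
        return "Final Adapter"
    # Assumed training_type values: "lora" for adapter training; anything
    # else (e.g. "full") is treated as full fine-tuning.
    return "Final Merged" if training_type == "lora" else "Final"
```

This only avoids the extra request if the training type is already available on the job object the caller holds. Moving the naming to the backend, the second option in the comment, would remove the client-side inference entirely.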

3 participants