
[usage] List workspaces for each workspace instance in usage period #10495

Merged — 1 commit merged into main from mp/usage-list-workspaces on Jun 9, 2022

Conversation

@easyCZ (Member) commented Jun 7, 2022

Description

We first compute all workspace instances in a given period. For each WorkspaceID, we then fetch the underlying Workspace from the DB to obtain the information needed to determine the owner (cost center).
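The flow above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the struct shapes are simplified stand-ins for db.WorkspaceInstance and db.Workspace, and the grouping helper is hypothetical.

```go
package main

import "fmt"

// Simplified stand-ins for db.WorkspaceInstance and db.Workspace.
type WorkspaceInstance struct {
	ID          string
	WorkspaceID string
}

type Workspace struct {
	ID      string
	OwnerID string
}

// groupInstancesByWorkspace mirrors the described flow: for every instance in
// the period, look up its workspace so the owner (cost center) can later be
// determined, and attach the instance to that workspace.
func groupInstancesByWorkspace(instances []WorkspaceInstance, workspaces map[string]Workspace) map[string][]WorkspaceInstance {
	grouped := map[string][]WorkspaceInstance{}
	for _, instance := range instances {
		if _, ok := workspaces[instance.WorkspaceID]; ok {
			grouped[instance.WorkspaceID] = append(grouped[instance.WorkspaceID], instance)
		}
	}
	return grouped
}

func main() {
	instances := []WorkspaceInstance{
		{ID: "i1", WorkspaceID: "ws1"},
		{ID: "i2", WorkspaceID: "ws1"},
		{ID: "i3", WorkspaceID: "ws2"},
	}
	workspaces := map[string]Workspace{
		"ws1": {ID: "ws1", OwnerID: "user-a"},
		"ws2": {ID: "ws2", OwnerID: "user-b"},
	}
	grouped := groupInstancesByWorkspace(instances, workspaces)
	fmt.Println(len(grouped["ws1"]), len(grouped["ws2"])) // 2 1
}
```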

Related Issue(s)

Fixes #

How to test

Unit tests

Release Notes

NONE

Documentation

NONE

@easyCZ easyCZ force-pushed the mp/usage-validate-instances branch from be1a7ce to 9fa1c65 Compare June 7, 2022 12:02
@easyCZ easyCZ force-pushed the mp/usage-list-workspaces branch from 08a44d6 to acdb029 Compare June 7, 2022 12:04
@easyCZ easyCZ force-pushed the mp/usage-validate-instances branch 3 times, most recently from 38183b3 to bb77bfb Compare June 8, 2022 07:30
@easyCZ easyCZ force-pushed the mp/usage-list-workspaces branch 2 times, most recently from ec61948 to 036774b Compare June 8, 2022 08:15
Base automatically changed from mp/usage-validate-instances to main June 8, 2022 09:24
@easyCZ easyCZ force-pushed the mp/usage-list-workspaces branch from 036774b to 971cd31 Compare June 8, 2022 09:25
@easyCZ easyCZ marked this pull request as ready for review June 8, 2022 09:25
@easyCZ easyCZ requested a review from a team June 8, 2022 09:25
@github-actions github-actions bot added the team: webapp Issue belongs to the WebApp team label Jun 8, 2022
@easyCZ easyCZ force-pushed the mp/usage-list-workspaces branch 4 times, most recently from 1040954 to 379bc08 Compare June 8, 2022 11:51
Comment on lines +88 to +91
type workspaceWithInstances struct {
Workspace db.Workspace
Instances []db.WorkspaceInstance
}
@jankeromnes (Contributor) commented Jun 8, 2022

Thought: I wonder if we actually want to load into memory the "fully-hydrated" shapes of both Workspace and WorkspaceInstance.

In the past, we've had issues with memory consumption, because we were keeping full shapes in memory, while only a few fields were actually relevant. This is especially true for workspace instances, because there are so many of them.

A usage record will probably look something like this:

export type WorkspaceUsageRecord = {
    instanceId: string;
    workspaceId: string;
    userId: string;
    projectId?: string;
    teamId?: string;
    fromTime: string;
    toTime: string;
};

So, in my mind, the most efficient flow to get these usage records is:

  1. Query only instanceId, workspaceId, creationTime, stoppedTime from d_b_workspace_instance (for the current period) -- all the other fields are unnecessary
  2. Trim the time bounds between beginningOfMonth and min(getCurrentTime(), endOfMonth) (also set the toTime to the maximum bound when there is no stoppedTime yet)
  3. For each unique workspaceId query only the owner userId and possible projectId from d_b_workspace
  4. For each userId, query only the possible teamId (randomly pick one team for now)

If we query and then load the full Workspace and WorkspaceInstance shapes into memory, and start passing them around a lot, I worry that we'll have a bad time (because many workspace instances are created every month, and that number is rapidly increasing over time).

@easyCZ (Member, Author) commented:

Sure, this makes sense. But I don't think it makes sense to do this now.

For me, this is an optimisation once we have something working. For now, I'd prefer to waste memory to get to a working version and then measure what is in fact the cost of these queries.

@jankeromnes (Contributor) commented:

Fair point, thanks. (I don't really see why it's better to ship a less optimized version first when we know how to easily write a more optimized one, but I also agree about shipping this sooner than later, and maybe my suggestion could already be considered premature optimization). So, fine to defer this to later. 👍

@easyCZ (Member, Author) commented Jun 9, 2022

Largely because I think the current handling will go away anyway once we do add attribution to the WSI. If we optimized here, it would then be wasted work because it would also be replaced by the attribution work from WSI.

I don't really see why it's better to ship a less optimized version first when we know how to easily write a more optimized one

We don't have any hard data to know it is in fact an optimisation. Suboptimal but with data is more valuable than optimal without baseline metrics.

@easyCZ easyCZ force-pushed the mp/usage-list-workspaces branch from 379bc08 to 63e98ca Compare June 8, 2022 20:20
@jankeromnes (Contributor) left a review:

Many thanks @easyCZ!

With the unit tests, this seems to be working as intended. 🎉 I couldn't resist adding a few more thoughts on (premature?) optimization, mostly because I generally find it easier to design an efficient algorithm from scratch (minimal data footprint, limited iterations) than to optimize existing complex code, but you're right -- the current code might already go away when we attribute instances on start. Someone clever once said, "one of the most expensive mistakes engineers make is to optimize something that shouldn't exist in the first place".

Anyway, so many words to say, thanks for the progress here! Looking forward to the next iterations. 🚀

workspaceIDs = append(workspaceIDs, instance.WorkspaceID)
}

workspaces, err := db.ListWorkspacesByID(ctx, u.conn, toSet(workspaceIDs))
@jankeromnes (Contributor) commented:

Thanks for making the IDs unique! I would have preferred to make the collection a unique "set" before all of the appends, in order to iterate just once over all instances (not once, then again in toSet to create the map, then another time over the unique IDs to convert them back to a list) -- but okay for quick iterations without worrying too much about complexity this early. 🛹
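The single-pass alternative suggested here, building the unique set during the one iteration over instances rather than appending everything and calling toSet afterwards, could look like this. A sketch with a hypothetical helper name and a simplified instance shape:

```go
package main

import "fmt"

// Simplified stand-in for db.WorkspaceInstance.
type WorkspaceInstance struct{ WorkspaceID string }

// uniqueWorkspaceIDs deduplicates while iterating once over the instances,
// preserving first-seen order, instead of append-then-toSet.
func uniqueWorkspaceIDs(instances []WorkspaceInstance) []string {
	seen := map[string]struct{}{}
	ids := make([]string, 0, len(instances))
	for _, instance := range instances {
		if _, ok := seen[instance.WorkspaceID]; ok {
			continue
		}
		seen[instance.WorkspaceID] = struct{}{}
		ids = append(ids, instance.WorkspaceID)
	}
	return ids
}

func main() {
	ids := uniqueWorkspaceIDs([]WorkspaceInstance{{"a"}, {"b"}, {"a"}})
	fmt.Println(ids) // [a b]
}
```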

if err != nil {
return nil, fmt.Errorf("failed to load workspaces for workspace instances in time range: %w", err)
}
status.Workspaces = len(workspaces)
@jankeromnes (Contributor) commented:

So, at this point we have:

  • a collection of all "fully hydrated" instances
  • a collection of all "fully hydrated" workspaces, each with an attached sub-collection of all its corresponding "fully hydrated" instances (although these will just be references to a single instance in memory, right, not a full copy of the object?)

I'm still slightly uneasy with loading so many large objects into memory, and iterating over them multiple times. So, I'm looking forward to seeing exactly how we'll use these collections, and see if we can later reduce the amount of used memory and the number of iterations. 👀
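Whether the attached sub-collections are references or full copies depends on whether the code collects struct values or pointers; in Go, appending struct values copies them, while appending pointers shares the single object. A general illustration (not the PR's actual code):

```go
package main

import "fmt"

// Simplified stand-in for db.WorkspaceInstance.
type WorkspaceInstance struct{ Phase string }

func main() {
	original := []WorkspaceInstance{{Phase: "running"}}

	// Collecting by value copies each struct: mutating the copy
	// leaves the original slice's element untouched.
	byValue := append([]WorkspaceInstance{}, original...)
	byValue[0].Phase = "stopped"
	fmt.Println(original[0].Phase) // running

	// Collecting pointers shares the single object in memory instead.
	byPointer := []*WorkspaceInstance{&original[0]}
	byPointer[0].Phase = "stopped"
	fmt.Println(original[0].Phase) // stopped
}
```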

@roboquat roboquat merged commit 9121279 into main Jun 9, 2022
@roboquat roboquat deleted the mp/usage-list-workspaces branch June 9, 2022 06:42
@roboquat roboquat added deployed: webapp Meta team change is running in production deployed Change is completely running in production labels Jun 9, 2022