Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cached Size Implementation #7387

Merged
merged 6 commits into from
Feb 1, 2021
Merged

Cached Size Implementation #7387

merged 6 commits into from
Feb 1, 2021

Conversation

vmg
Copy link
Collaborator

@vmg vmg commented Jan 26, 2021

Description

Here's a hard problem in a garbage collected language like Go: we need to know the exact size that an object occupies in memory, so we can perform accurate memory tracking for our caches. This issue in the official Go repo provides many examples of why this is exceedingly complicated, and as evidenced by the fact that the issue has been closed by the maintainers, it doesn't seem like the authors of Go are looking to implement a way to reliably do this in the language runtime.

It is then our time to get dirty. After many discussions with @systay, we agreed on an approach that I've implemented in this PR: a code generator, very similar to the one we use to automatically generate the walker code for our AST, but that instead generates a helper method for all the structures that can possibly fit in cache.

This PR is composed of two commits: the first one checks in the code generator and the second one checks in all the generated code.

My goals, in order of importance, are as follows:

  1. Getting feedback on the structure and correctness of the generated code
  2. Getting feedback on the design of the codegenerator
  3. Merging the (for now unused) codegen & generated code so I can resume my cache implementation on top of it

Notes on the codegen

As you can see, the generated code is rather comprehensive. The output that has been checked in has been generated from running the following command:

~/src/vitess
$ go run go/tools/sizegen/sizegen.go -in ./go/vt/... -gen vitess.io/vitess/go/vt/vtgate/engine.Plan -gen vitess.io/vitess/go/vt/vttablet/tabletserver.TabletPlan

The input to the sizegen tool are the base types that are going to be stored in the cache. In our case, this is engine.Plan and tabletserver.TabletPlan. From there, the generator builds a full AST of Vitess, typechecks all our packages and dependencies, and starts traversing the type information, starting from just the Plan and TabletPlan structs, to find every single field that can be reached from our root structs and generate a CachedSize helper for each one of them.

The generated code is written separately to every package in go/vt that contains cacheable data structures, in a file called cached_size.go. This is required so we can obtain fully accurate type information even for private fields in the structs.

The generator is smart and outputs optimal code: structs that are PODs do not get helper methods because their size is known a-priori. Pointers are always checked before being dereferenced. Slices take into account the capacity of their backing arrays, and maps use implementation internals to a get a down-to-the-byte count of how much is the map implementation using in backing data structures. For all the Interfaces that are stored in our cacheable structs, we use type inference to find all the structs in the codebase that could possibly implement such interface, and generate helper methods for all of them. Lastly, the generator takes special care to distinguish between fields that are embedded in structs and fields that are pointed at, to ensure that non-allocated data is not double counted.

The generator is also fully reproducible (particularly regarding to order), so that successive runs of the codegen always output the same files unless the underlying code has changed.

As far as I can tell, there are no outstanding issues or missing features on the codegen, but there may be correctness bugs, since there is a significant amount of complexity in the data structures we're storing on cache. I look forward to your feedback.

Related Issue(s)

#7304 is the WIP PR for the new cache implementation, where I explain in the detail the reason why accurate memory usage assessments for the cache are important.

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build
  • VTAdmin

cc @systay -- would you mind CCing whoever else you think would be interested on this?

@harshit-gangal
Copy link
Member

vmg and others added 5 commits January 29, 2021 12:08
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
These two features are now handled exceptionally:

- Generated files that don't attempt to perform interface casts no
  longer have a local interface definition for `cachedObject`
- Generated functions that use `unsafe.Pointer` for hashmap size
  calculations are now properly flagged with a pragma to prevent a false
positive in the compiler `ptrcheck` pass.
- Abstracted the out filesystem to make things easier to test

Signed-off-by: Vicent Marti <vmg@strn.cat>

Signed-off-by: Andres Taylor <andres@planetscale.com>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants