Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR-026 Home Database Cache #618

Open
wants to merge 7 commits into
base: 5.0
Choose a base branch
from

Conversation

StephenCathcart
Copy link
Contributor

Added support for Bolt 5.8, which introduces a couple of useful features. The server now echoes back the resolved home/default database when starting a transaction, along with a connection hint indicating whether server-side routing (SSR) is enabled. If SSR is on, the echoed database can be used with an optimistic home database cache to predict a client-side route (optimistic routing), falling back to SSR if the guess is wrong. This can eliminate a round-trip, but only if all the following conditions are met:

  • Server-side routing (SSR) is enabled
  • Server supports Bolt 5.8
  • A brand-new session (not yet used) starts its first transaction (auto-commit or explicit) without an explicitly set target database

@@ -507,3 +514,21 @@ func (p *Pool) deactivateWriter(serverName string, db string) {
p.log.Debugf(log.Pool, p.logId, "Deactivating writer %s for database %s", serverName, db)
p.router.InvalidateWriter(db, serverName)
}

func (p *Pool) updateCacheState() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally correct as far as I can tell. But this has optimization potential. The server could instead track the number of connections with and without SSR using two atomics. This is faster beacuse

  • not holding a lock on the whole server while iterating over all connections
  • not performing the update operation after every pool borrow/return, but only when a connection is being opened/closed

@@ -295,12 +302,12 @@ type sessionRouter interface {
// note: bookmarks are lazily supplied, only when a new routing table needs to be fetched
// this is needed because custom bookmark managers may provide bookmarks from external systems
// they should not be called when it is not needed (e.g. when a routing table is cached)
GetOrUpdateReaders(ctx context.Context, bookmarks func(context.Context) ([]string, error), database string, auth *idb.ReAuthToken, boltLogger log.BoltLogger) ([]string, error)
GetOrUpdateReaders(ctx context.Context, bookmarks func(context.Context) ([]string, error), database string, auth *idb.ReAuthToken, boltLogger log.BoltLogger, useHomeDbGuess bool) ([]string, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential for refactoring: the useHomeDbGuess flag is probably better called isHomeDbGuess or just guessedDb or such. Further, I think it would go great together with the database string into a single struct as this data is always passed around together until eventually it's broken apart and only the string (or "") is passed further down the stack.

const DefaultCacheMaxSize = 1000

// cachePruneFraction defines the fraction of entries to prune when the cache exceeds its size.
const cachePruneFraction = 0.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ADR states that (and how) cachePruneFraction should depend on the size of the cache. Currently, the cache size is fixed, but the cache is written in a way that it can easily be changed from the outside (NewCache takes a maxSize).

In order for this relation between the numbers not to get lost, I recommend to do one of the following:

  • compute the prune factor dynamically in NewCache
  • make NewCache not take a maxSize, but instead always use DefaultCacheMaxSize. In this case, you can pre-compute cachePruneFraction and use a constant like it's right now. When using this option, I suggest to leave a comment at the const definition that explains how to update the value should DefaultCacheMaxSize be changed (i.e., put the formula there).

Comment on lines +90 to +92
if !c.IsEnabled() {
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read the code right, when SSR is disabled/unknown, the cache is not updated even if we learn about about a user -> homeDD mapping. We decided to track the homeDBs regardless of SSR availability (s. ADR).

I thought there's a TestKit test for that 👀 If not, could you add one?

}

// InvalidateDatabase removes all entries that reference a specific database.
func (c *Cache) InvalidateDatabase(database string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: dead code (unused method)

@@ -725,7 +800,7 @@ func (s *sessionWithContext) getServerInfo(ctx context.Context) (ServerInfo, err
if err := s.resolveHomeDatabase(ctx); err != nil {
return nil, errorutil.WrapError(err)
}
_, err := s.getOrUpdateServers(ctx, idb.ReadMode)
_, err := s.getOrUpdateServers(ctx, idb.ReadMode, false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getServerInfo, is no exception from this new feature. It, too, can and should make use of the home db cache.

Worth writing a TestKit test for, I think 📝

N.B. technically, verifyAuthentication should also make use of the cache. But the way it's exposed to the user this function will only be called with the session configured with "system" as DB. But that again is making assumptions cross concern boundaries, which is a recipe for regressions (spooky action at a distance).

return
}

key, _ := computeCacheKey(ctx, s.auth, s.cache, s.config.ImpersonatedUser)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀 I spy with my little eye a swallowed error. Is this on purpose? Do we want to continue with a key that might be ""?

Comment on lines +1394 to +1396
if v, ok := authTokenMap["scheme"].(string); ok {
scheme = v
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring potential: 4 times basically the same 3 lines. Something along the lines of

func mapGetString(data map[string]any, key string) string {
	out, _ := data[key].(string)
	return out
}

// It returns a list of servers, a boolean indicating whether the home database guess was used, and an error if resolution fails.
func (s *sessionWithContext) getServerList(ctx context.Context, mode idb.AccessMode) ([]string, bool, error) {
if s.config.DatabaseName != "" {
if err := s.resolveHomeDatabase(ctx); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we calling resolveHomeDatabase when there is an explicit database configured on the session?

key, _ := computeCacheKey(ctx, s.auth, s.cache, s.config.ImpersonatedUser)
s.cache.Set(key, database)

s.log.Debugf(log.Session, s.logId, "Resolved home database, uses db '%s'", database)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know I like myself some good logging 😄 Optional suggestion: a bit more logging in the session. E.g., when we decide to explicitly resolve the home database, ... It's up to you and ofc. should't become too noisy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants