refactor(dp-server): simplify by removing dataplane_callbacks #12890

Draft
wants to merge 6 commits into
base: master

lahabana
Contributor

Motivation

Dataplane callbacks were an abstraction that caused more complexity than
anything else.

Implementation information

We now have a watchdog_callbacks which starts both the lifecycle and the
watchdog.
We carefully ensure (and test) that there is always a single watchdog running at once,
even when the DP reconnects.
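
As a rough illustration of the "single watchdog per DP" invariant described above, here is a minimal sketch. It is not the code from this PR: the `tracker` type, its fields, and the `onRequest`/`onClosed` names are hypothetical, and the watchdog body is a placeholder that only waits for cancellation. The sketch counts streams per dataplane key, starts the watchdog on the first stream, and stops it when the last stream closes.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// dpState is hypothetical per-dataplane state: a stream count plus
// the handles needed to stop the watchdog and wait for it to exit.
type dpState struct {
	refs   int
	cancel context.CancelFunc
	done   chan struct{}
}

// tracker is a simplified, hypothetical stand-in for the callbacks in this PR.
type tracker struct {
	mu        sync.Mutex
	streams   map[int64]string    // streamID -> dpKey
	watchdogs map[string]*dpState // dpKey -> running watchdog
	started   int                 // number of watchdog starts, for illustration
}

func newTracker() *tracker {
	return &tracker{streams: map[int64]string{}, watchdogs: map[string]*dpState{}}
}

// onRequest ties a stream to a dataplane and starts a watchdog only
// for the first stream of that dataplane.
func (t *tracker) onRequest(streamID int64, dpKey string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.streams[streamID]; ok {
		return // this stream is already tied to a dataplane
	}
	t.streams[streamID] = dpKey
	st, ok := t.watchdogs[dpKey]
	if !ok {
		ctx, cancel := context.WithCancel(context.Background())
		st = &dpState{cancel: cancel, done: make(chan struct{})}
		t.watchdogs[dpKey] = st
		t.started++
		go func() {
			defer close(st.done)
			<-ctx.Done() // placeholder for the real watchdog loop
		}()
	}
	st.refs++
}

// onClosed stops the watchdog once the last stream of a dataplane is gone.
// It waits for the goroutine while holding the lock, so a reconnect cannot
// start a second watchdog before the first has exited. The cost is that the
// global lock is held for the duration of the cleanup.
func (t *tracker) onClosed(streamID int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	dpKey, ok := t.streams[streamID]
	if !ok {
		return // unknown stream, nothing to clean up
	}
	delete(t.streams, streamID)
	st := t.watchdogs[dpKey]
	st.refs--
	if st.refs > 0 {
		return // not the last stream for this dataplane
	}
	delete(t.watchdogs, dpKey)
	st.cancel()
	<-st.done
}

func main() {
	tr := newTracker()
	tr.onRequest(1, "mesh/dp-1")
	tr.onRequest(2, "mesh/dp-1") // reconnect overlapping with the old stream
	tr.onClosed(1)
	tr.onClosed(2)
	tr.onRequest(3, "mesh/dp-1") // fresh connection after a full disconnect
	fmt.Println("watchdog starts:", tr.started) // prints "watchdog starts: 2"
	tr.onClosed(3)
}
```

Waiting for the watchdog under the lock keeps the invariant simple to reason about; moving the wait outside the lock would shorten the critical section but would need extra care so that a reconnect waits for the old watchdog to finish.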

Supporting documentation

part of #12881

Signed-off-by: Charly Molter <charly.molter@konghq.com>
@lahabana lahabana requested a review from a team as a code owner February 19, 2025 14:40
@lahabana lahabana changed the title Fix/12881 2 refactor(dp-server): simplify by removing dataplane_callbacks Feb 19, 2025
@lahabana lahabana requested a review from jijiechen February 19, 2025 14:41
Contributor

Reviewer Checklist

🔍 Each of these sections needs to be checked by the reviewer of the PR 🔍:
If something doesn't apply, please check the box and add a justification if the reason is non-obvious.

  • Is the PR title satisfactory? Is this part of a larger feature and should be grouped using > Changelog?
  • PR description is clear and complete. It links to relevant issues as well as docs and UI issues
  • This will not break child repos: it doesn't hardcode values (e.g. "kumahq" as an image registry)
  • IPv6 is taken into account (e.g. no string concatenation of host and port)
  • Tests (Unit test, E2E tests, manual test on universal and k8s)
    • Don't forget ci/ labels to run additional/fewer tests
  • Does this contain a change that needs to be notified to users? In this case, UPGRADE.md should be updated.
  • Does it need to be backported according to the backporting policy? (this GH action will add "backport" label based on these file globs, if you want to prevent it from adding the "backport" label use no-backport-autolabel label)

}
t.activeStreams[streamID].proxyInfo = pInfo
ctx := t.activeStreams[streamID].ctx
pInfo.Lock()
Contributor Author

Actually, because the long-running cleanup runs while holding this lock, two connections to the same DP may cause the global lock to be held for longer.

// Watchdog should be run only once for given DP regardless of the number of streams.
// For ADS there is only one stream for DP.
//
// We keep
Contributor

Something is missing in this comment.

defer t.Unlock()
dataplaneSyncLog.V(1).Info("stream is open", "streamID", streamID)
if _, found := t.activeStreams[streamID]; found {
return errors.Errorf("streamID %d is already tracked, we should never reopen a stream", streamID)
Contributor

Is it safe to return an error in this case? The docs say OnStreamClosed will still be called:

	// OnStreamOpen is called once an xDS stream is opened with a stream ID and the type URL (or "" for ADS).
	// Returning an error will end processing and close the stream. OnStreamClosed will still be called.
	OnStreamOpen(context.Context, int64, string) error

so we're probably risking closing this stream, no?
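
To make the concern concrete, here is a minimal sketch. It assumes the server behaves as the quoted doc comment says (the close callback runs even when the open callback fails) and that a duplicate stream ID could actually reach the callbacks, which is an assumption rather than something shown in this PR. The `streamTracker` type and the `serve` helper are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// streamTracker is a hypothetical, minimal stand-in for the callbacks.
type streamTracker struct {
	active map[int64]bool
}

func (s *streamTracker) onStreamOpen(id int64) error {
	if s.active[id] {
		return errors.New("streamID is already tracked")
	}
	s.active[id] = true
	return nil
}

func (s *streamTracker) onStreamClosed(id int64) {
	delete(s.active, id)
}

// serve mimics the documented contract: the close callback is invoked
// even when the open callback returns an error.
func serve(s *streamTracker, id int64) error {
	defer s.onStreamClosed(id)
	return s.onStreamOpen(id)
}

func main() {
	s := &streamTracker{active: map[int64]bool{}}
	_ = s.onStreamOpen(1) // a live stream that is still in use
	err := serve(s, 1)    // hypothetical duplicate open with the same ID
	// The duplicate is rejected, but the close callback also removed
	// the entry that belonged to the live stream.
	fmt.Println(err != nil, s.active[1]) // prints "true false"
}
```

If duplicate IDs cannot occur in practice, the error path is purely defensive and this hazard does not arise.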

pInfo.Unlock()
return // We are not the last stream, we don't care
}
if sInfo.proxyInfo.cancelFunc != nil {
Contributor

Since you released the lock with t.Unlock(), you can't safely use sInfo here, as it could be accessed concurrently, for example from OnStreamRequest:

t.activeStreams[streamID].proxyInfo = pInfo

Member

Here sInfo is a copy of the value in the map, right? So this statement should still work even if proxyInfo was re-assigned in the map by OnStreamRequest.
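
Whether the copy is isolated depends on the map's value type, which is not shown in this excerpt. The sketch below uses hypothetical `streamInfo` and `proxyInfo` types to contrast the two cases. Note that Go rejects `m[k].field = v` when the map holds struct values, so an assignment of the form `t.activeStreams[streamID].proxyInfo = pInfo` only compiles if the map holds pointers.

```go
package main

import "fmt"

// Hypothetical types, named after the identifiers in the discussion.
type proxyInfo struct{ name string }
type streamInfo struct{ proxyInfo *proxyInfo }

// byValue: the map stores structs, so reading an entry copies it and a
// later replacement of the entry is not visible through the copy.
func byValue() string {
	m := map[int64]streamInfo{1: {proxyInfo: &proxyInfo{name: "old"}}}
	sInfo := m[1]
	m[1] = streamInfo{proxyInfo: &proxyInfo{name: "new"}}
	return sInfo.proxyInfo.name
}

// byPointer: the map stores pointers, so reading an entry copies only the
// pointer and a field update through the map is visible through sInfo.
func byPointer() string {
	m := map[int64]*streamInfo{1: {proxyInfo: &proxyInfo{name: "old"}}}
	sInfo := m[1]
	m[1].proxyInfo = &proxyInfo{name: "new"}
	return sInfo.proxyInfo.name
}

func main() {
	fmt.Println(byValue())   // prints "old"
	fmt.Println(byPointer()) // prints "new"
}
```

In the pointer case, reading `sInfo.proxyInfo` after releasing the lock would also be a data race if another goroutine writes the field at the same time.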

l := dataplaneSyncLog.WithValues("dpKey", dpKey, "streamID", streamID)
t.Lock()
if t.activeStreams[streamID].proxyInfo != nil {
return nil // We fast return if we already know that this streamID is tracking a specific dataplane
Contributor

missing t.Unlock() above this line
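
One way to make this kind of early return safe is to move the lookup into a small helper that pairs the lock with a deferred unlock. This is a sketch under assumptions, not the PR's code: the `callbacks` type, the `tracked` field, and the `alreadyTracked` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

type streamInfo struct{ tracked bool }

// callbacks is a hypothetical stand-in with an embedded mutex.
type callbacks struct {
	sync.Mutex
	activeStreams map[int64]*streamInfo
}

// alreadyTracked holds the lock only for the lookup. The deferred
// Unlock runs on every return path, so a fast return cannot leak the lock.
func (c *callbacks) alreadyTracked(id int64) bool {
	c.Lock()
	defer c.Unlock()
	s, ok := c.activeStreams[id]
	return ok && s.tracked
}

// onStreamRequest returns early for known streams and otherwise records
// the stream. The check and the update take the lock separately, which is
// acceptable here only because the update is idempotent.
func (c *callbacks) onStreamRequest(id int64) {
	if c.alreadyTracked(id) {
		return
	}
	c.Lock()
	defer c.Unlock()
	c.activeStreams[id] = &streamInfo{tracked: true}
}

func main() {
	c := &callbacks{activeStreams: map[int64]*streamInfo{}}
	c.onStreamRequest(7)
	c.onStreamRequest(7) // takes the fast path
	// TryLock succeeds only if no earlier call left the mutex locked.
	fmt.Println(c.TryLock()) // prints "true"
}
```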

defer close(pInfo.done)
l.V(1).Info("watchdog started")
if t.dpWatchdogFactory != nil {
t.dpWatchdogFactory.New(dpKey, pInfo.meta.Load).Start(pInfo.ctx)
Contributor

I'm trying to understand the purpose of atomic.Pointer here. Isn't it the same as

go func(metaPtr *core_xds.DataplaneMetadata) {
	defer close(pInfo.done)
	l.V(1).Info("watchdog started")
	if t.dpWatchdogFactory != nil {
		t.dpWatchdogFactory.New(dpKey, metaPtr).Start(pInfo.ctx)
	}
}(metadata)

assuming, of course, that t.dpWatchdogFactory.New accepts *DataplaneMetadata instead of func() *DataplaneMetadata. In that case the pointer is copied by value, so you don't need to access the pInfo field anymore.
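
One possible difference between the two approaches, sketched below, only matters if the metadata can be replaced after the watchdog has started. That is an assumption; the excerpt does not say whether it happens. The `metadata` and `proxyInfo` types here are hypothetical stand-ins. A plain pointer argument captures a snapshot, while a `Load` method value reads whatever was stored most recently.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical stand-ins for the types in the discussion.
type metadata struct{ version string }

type proxyInfo struct {
	meta atomic.Pointer[metadata]
}

// observe returns what a captured pointer and a captured Load method
// value each see after the metadata has been replaced.
func observe() (string, string) {
	p := &proxyInfo{}
	p.meta.Store(&metadata{version: "v1"})

	snapshot := p.meta.Load() // what a *metadata parameter would receive
	load := p.meta.Load       // what a func() *metadata parameter would receive

	p.meta.Store(&metadata{version: "v2"}) // e.g. a later request with new metadata

	return snapshot.version, load().version
}

func main() {
	fmt.Println(observe()) // prints "v1 v2"
}
```

If the metadata never changes after the first request, the two forms behave the same and the plain pointer is the simpler choice.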

@lahabana lahabana added this to the 2.10.x milestone Feb 24, 2025
@lahabana lahabana self-assigned this Feb 24, 2025
@lahabana lahabana marked this pull request as draft February 24, 2025 15:33
@lahabana
Contributor Author

Switched to draft. Removing this would be good, but things have changed a little, so this PR needs to be reworked.

}

func (t *dataplaneSyncCallbacks) OnStreamClosed(streamID core_xds.StreamID) {
t.Lock()
Member

There are so many lock/unlock invocations in this method. I'm wondering how we can improve it to make it cleaner.

@jijiechen
Member

I like this change. It simplifies things so much.

@lahabana lahabana modified the milestones: 2.10.x, 2.11.x Feb 26, 2025
3 participants