Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extension/opamp]: Custom Message Support #32281

Merged

Conversation

BinaryFissionGames
Copy link
Contributor

@BinaryFissionGames BinaryFissionGames commented Apr 9, 2024

Description:

  • Adds support for utilizing custom capabilities with the opamp extension

Link to tracking Issue: #32021

Testing:

  • Unit tests

Documentation:
Added some docs for usage to the opamp extension README

@BinaryFissionGames BinaryFissionGames changed the title Feat/opamp extension custom messages [extension/opamp]: Custom messages Apr 9, 2024
@BinaryFissionGames BinaryFissionGames changed the title [extension/opamp]: Custom messages [extension/opamp]: Custom Message Support Apr 9, 2024
@github-actions github-actions bot added the cmd/otelcontribcol otelcontribcol command label Apr 10, 2024
@BinaryFissionGames BinaryFissionGames force-pushed the feat/opamp-extension-custom-messages branch from 523146c to 780e1c1 Compare April 10, 2024 16:12
@BinaryFissionGames BinaryFissionGames marked this pull request as ready for review April 10, 2024 20:18
@BinaryFissionGames BinaryFissionGames requested a review from a team April 10, 2024 20:18
@crobert-1
Copy link
Member

correctness-traces test failure is frequency of #27295, I've added a reference there.

@BinaryFissionGames BinaryFissionGames force-pushed the feat/opamp-extension-custom-messages branch from 6d98bd7 to adbb28d Compare April 15, 2024 14:52
@djaglowski
Copy link
Member

This looks good to me at a high level but I found the naming of CustomCapability to be a bit confusing because half of the implementation is actually in the CustomCapabilityCallback. It seems like we sort of have a sender and a listener pattern going on here, so I wonder if it would be clearer to wrap the callback into a CustomCapabilityListener and move the unregister mechanism there as well (since it basically says, "I'm no longer listening").
Roughly:

type CustomCapabilityRegistry interface {
	Register(capability string, listener CustomCapabilityListener) (CustomCapabilitySender, error)
}

type CustomCapabilityListener interface {
	ReceiveMessage(*protobufs.CustomMessage)
	Done() <-chan struct{}
}

type CustomCapabilitySender interface {
	SendMessage(messageType string, message []byte) (messageSendingChannel chan struct{}, err error)
}

@BinaryFissionGames
Copy link
Contributor Author

I do like how that partitions the responsibilities and makes it a little clearer, I'll take a stab at refactoring to fit that interface and see what it's like.

@BinaryFissionGames BinaryFissionGames force-pushed the feat/opamp-extension-custom-messages branch from f173210 to ff81b43 Compare April 17, 2024 02:32
@BinaryFissionGames
Copy link
Contributor Author

@djaglowski I've refactored to fit that interface, mind taking another look?

@djaglowski
Copy link
Member

Based on an offline conversation I think something like the previous approach was cleaner, though I think we need a better name for the struct which is returned by registering.

@BinaryFissionGames
Copy link
Contributor Author

Alright, I think this is good to look at now.

After discussing with @djaglowski we decided on having Register return an extra function to unregister, instead of lumping it in an interface with SendMessage; The naming (CustomCapability) was confusing and it didn't feel like those two things went together.

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @djaglowski for giving this a review. Just a few small things so far, I want to discuss the external API before going any further, now that I'm really looking closely at this.

extension/opampextension/README.md Outdated Show resolved Hide resolved
extension/opampextension/README.md Outdated Show resolved Hide resolved
extension/opampextension/custom_messages.go Show resolved Hide resolved
extension/opampextension/registry_test.go Outdated Show resolved Hide resolved
@evan-bradley
Copy link
Contributor

Regarding the custom message API, it seems to me we have two actors here:

  1. The OpAMP extension, which serves as a registry.
  2. The consuming component, which implements a custom capability.

Could we design API around this and encapsulate the functionality for each actor in separate interfaces? Here's how I would expect the API could look, and how a component may use it:

type CustomCapabilityRegistry interface {
	Register(capability string) (cm CustomMessageHandler err error)
}

// Open to suggestions for other names, but this is essentially a
// contact point between the component and the OpAMP connection.
type CustomMessageHandler interface {
	Message() <-chan *protobufs.CustomMessage
	SendMessage(messageType string, message []byte) (messageSendingChannel chan struct{}, err error)
	Unregister()
}

type myComponent struct{
	cmh CustomMessageHandler
	shutdown chan struct{}
}

func (m *myComponent) Start(host component.Host) {
	// Get registry, cast it, etc.
	registry := host.GetExtensions()["myOpAMPExtension"].(CustomCapabilityRegistry)

	m.cmh, _ = registry.Register("myComponent")
}

func (m *myComponent) handleAsyncOps() {
	switch {
	case msg := <-m.cmh.Message():
	case <-m.shutdown:
		break
	}
}

func (m *myComponent) doOperation() {
	m.cmh.SendMessage("operation", []byte{})
}

func (m *myComponent) Shutdown() {
	m.cmh.Unregister()
	close(m.shutdown)
}

I see a few advantages here:

  1. Allows extending the CustomCapabilityHandler interface in the future if desired.
  2. From the component's perspective, keeps all the CustomMessage functionality contained in a single struct.
  3. Allows the component to handle messages alongside any other asynchronous operation it handles. This could be a disadvantage for simpler components since it requires a little bit of extra setup.

@BinaryFissionGames
Copy link
Contributor Author

I see a few advantages here:

1. Allows extending the `CustomCapabilityHandler` interface in the future if desired.

2. From the component's perspective, keeps all the CustomMessage functionality contained in a single struct.

3. Allows the component to handle messages alongside any other asynchronous operation it handles. This could be a disadvantage for simpler components since it requires a little bit of extra setup.

Thanks for this feedback! I do like the extensibility here. I think this also solves this awkwardness where you want to receive a message, then respond to it. Before it would be something like:

var sender opampextension.CustomMessageSender
sender, _ = registry.Register("cap", func(cm *protobufs.CustomMessage){
  // Process message
  sender.SendMessage("type", []byte("some response message"))
})

It's honestly not perfectly clear to me whether the above avoids a data race or not.

With this interface it's a little more well ordered and less awkward, IMO:

handler, _ = registry.Register("cap")

for {
  cm <- handler.Message()
  // Process message
  handler.SendMessage("type", []byte("some response message")
}

The only downside I see is what happens when a component misbehaves and ends up not listening to the message channel?

  1. We could just send down the channel as normal, and block until the component receives the message. The issue here is that one misbehaving component can block all other components from properly working, and freeze up the OpAMP extension as well. I think we should protect against this.
  2. We could spin of a goroutine for each send, which essentially buffers the custom message in memory until it can be sent. Downside is that memory is technically unbounded now, since we spin up goroutines that may just sit still if a component is not reading from the channel. The other downside here is that messages may be received out-of-order (e.g. if you have 10 messages queued and blocked on the channel sent, the next one sent is random IIRC).
  3. Set some fixed buffer size for the channel, and drop any messages if the buffer is full. Downside here is that messages may be dropped, which is definitely something we don't want to do.
  4. Have some unbounded queue. Adding to this queue is asynchronous, and preserves order of the messages. There is a separate goroutine that pops the queue and moves the message into the channel, making the recieve non-blocking. Downside is memory is unbounded, and complexity.

It's probably good to note that this would only be in the case where the component is misbehaving. Also, the implementation as it is now can spawn goroutines that takes up memory forever if a component ends up blocking them, so maybe unbounded memory isn't that big of a deal in these cases.

Also, now that I'm thinking of it, in-order message processing isn't even guaranteed in the current design, because the spun-off go routines might not execute in the order they were spawned.

I'm thinking either 2 or 4 are going to be our best options on how to handle it, and I think initially implementing 2 would be fine, then upgrading to 4 when there is a need for the messages to be well-ordered.

I'll try and rework this PR to use this suggested interface, because I do think it has the potential to fix a few issues and leave us open to adding things if needed later without any breaking changes.

@evan-bradley
Copy link
Contributor

evan-bradley commented Apr 19, 2024

Thanks for the detailed analysis.

I'm thinking either 2 or 4 are going to be our best options on how to handle it, and I think initially implementing 2 would be fine, then upgrading to 4 when there is a need for the messages to be well-ordered.

I think that 2 and 4 are probably more powerful solutions if we could provide hard limits somewhere, but for now I would advocate for 3. We could probably offer this alongside some user-facing configuration to tune the size of the buffer. If we need to allocate limited resources somewhere, I feel that the Collector's pipelines are probably generally a more important place to put resources than custom messages, with configuration options to allow users to change this if they have different priorities.

@BinaryFissionGames
Copy link
Contributor Author

@evan-bradley I've refactored this to fit the interface you've suggested now.

I think all your feedback from the previous iteration should be resolved now, and this is ready for another pass.

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me.

extension/opampextension/README.md Show resolved Hide resolved
extension/opampextension/custom_messages.go Show resolved Hide resolved
extension/opampextension/custom_messages.go Outdated Show resolved Hide resolved
extension/opampextension/registry.go Show resolved Hide resolved
extension/opampextension/registry.go Outdated Show resolved Hide resolved
extension/opampextension/registry.go Outdated Show resolved Hide resolved
extension/opampextension/registry_test.go Outdated Show resolved Hide resolved
@BinaryFissionGames
Copy link
Contributor Author

I've incorporated your feedback @evan-bradley - One additional change I made is that I unexported the WithMaxQueuedMessages method, because there is a linter that only allows a single exported method in a component. This will be unexported when we move the custom_messages.go to it's own separate module.

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Nice work!

extension/opampextension/registry.go Outdated Show resolved Hide resolved
@BinaryFissionGames BinaryFissionGames force-pushed the feat/opamp-extension-custom-messages branch from e1674bd to 5166988 Compare May 3, 2024 14:02
@evan-bradley evan-bradley merged commit c160941 into open-telemetry:main May 3, 2024
162 checks passed
@github-actions github-actions bot added this to the next release milestone May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants