Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

init: PluginFactory support #1

Merged
merged 8 commits into from
Nov 4, 2022
Merged

init: PluginFactory support #1

merged 8 commits into from
Nov 4, 2022

Conversation

yaauie
Copy link
Contributor

@yaauie yaauie commented Apr 7, 2022

Release notes

Introduced PluginFactory support adapter, to allow plugins to safely instantiate inner plugins while maintaining execution context.

What does this PR do?

Some plugins operate by wrapping one or more separately-maintained plugins,
and do so by instantiating them directly. This can be problematic because
the pipeline's execution context is not propagated to the inner plugin, which
can cause features that rely on execution context such as pipeline settings
to not work as expected.

This support adapter introduces a PluginFactory, that can be used to
instantiate inner plugins that are pre-contextualized with the outer plugin's
execution context.

By including this support adapter into a plugin, we ensure that the plugin
has a LogStash::Plugin#plugin_factory, which can be used to load
contextualized proxies for plugin classes, which in turn can be instantiated
in the normal way, replacing the direct instantiation of the plugin class:

   def register
     grok_args = {}

-    @inner = ::LogStash::Filters::Grok.new(grok_args)
+    @inner = plugin_factory.filter('grok').new(grok_args)
  end

As with our other support adapters, this reaches into Logstash internals to
effectively back-port functionality from new Logstash core to plugins when
they are run on older Logstashes, and is expected to be API-stable with the
feature as-implemented in Logstash core.

  • When this module is included into a LogStash::Plugin, LogStash::Plugin#plugin_factory returns a PluginFactory, which supports
    • PluginFactory#input(input_name) -> PluginClassProxy#new(params={}) -> input instance
    • PluginFactory#output(output_name) -> PluginClassProxy#new(params={}) -> output instance
    • PluginFactory#codec(codec_name) -> PluginClassProxy#new(params={}) -> codec instance
    • PluginFactory#filter(filter_name) -> PluginClassProxy#new(params={}) -> filter instance

Why is it important/What is the impact to the user?

Inner plugins need access to the effective execution context in order to properly respect settings like pipeline.ecs_compatibility.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

How to test this PR locally

A separate PR will be made against the HTTP Input plugin, which embeds codecs, so that they can properly respect the pipeline-level settings with regard to ECS compatibility

Related issues

Related: elastic/logstash#13978

Closes

Closes #2

Some plugins operate by wrapping one or more separately-maintained plugins,
and do so by instantiating them directly. This can be problematic because
the pipeline's execution context is not propagated to the inner plugin, which
can cause features that rely on execution context such as pipeline settings
to not work as expected.

This support adapter introduces a PluginFactory, that can be used to
instantiate inner plugins that are pre-contextualized with the outer plugin's
execution context.

By including this support adapter into a plugin, we ensure that the plugin
has a `LogStash::Plugin#plugin_factory`, which can be used to load
contextualized proxies for plugin classes, which in turn can be instantiated
in the normal way, replacing the direct instantiation of the plugin class:

~~~ diff
   def register
     grok_args = {}

-    @inner = ::LogStash::Filters::Grok.new(grok_args)
+    @inner = plugin_factory.filter('grok').new(grok_args)
  end
~~~
When instantiating an inner plugin without an explicit id, ensure that
a meaningful distinct id is used that includes the parent plugin's id
and context about the inner plugin's nature.
Comment on lines 50 to 59
INIT_MUTEX = Mutex.new
private_constant :INIT_MUTEX

##
# @return [PluginFactory]
def plugin_factory
@_plugin_factory || INIT_MUTEX.synchronize do
@_plugin_factory ||= PluginFactory.new(self)
end
end
Copy link
Contributor Author

@yaauie yaauie Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review note:

I'm not overly-fond of introducing a global lock here, and hadn't planned on having our PluginFactory being permanently referenced from a plugin after being used, but I found both justified in order to provide meaningful default distinct id generation that has a unique sequence number scoped to the outer plugin.

  • the cost of an outer plugin retaining a PluginFactory is very small, since it effectively contains only a single reference to the outer plugin itself and a single AtomicLong.
  • the scope of locking is very small (single simple object instantiation) and is avoided after an outer plugin has successfully acquired a single PluginFactory.

If we do not want this lock, or want to free the PluginFactory as soon as it is used, we could use a single global sequence id generator to ensure the distinctness of ids, but the distinct number at the tail would become global and no longer related to the id of the outer plugin.

class PluginFactory
  sequence_id = java.util.concurrent.atomic.AtomicLong.new(0)
  SEQUENCE_GENERATOR = proc { sequence_id.increment_and_get }
  private_constant :SEQUENCE_GENERATOR
  # ...
  private
  def next_sequence_id
    SEQUENCE_GENERATOR.call
  end
end

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect plugins to define multiple inner plugins of the same type?

And if so, is it unreasonable to mandate that they provide a unique (hopefully meaningful) id to disambiguate between these inner plugins, rather than introducing/requiring a global lock to generate a sequence number?

Also, are we expecting the id to show up anywhere other than in the debug logs at creation? As far as I could tell, that was the only place it shows up, seeing as the inner plugin does not show up in the monitoring API..

Copy link
Contributor Author

@yaauie yaauie Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect plugins to define multiple inner plugins of the same type?

Codecs come to mind, but I don't think we have reasonable expectations either way.

is it unreasonable to mandate that they provide a unique (hopefully meaningful) id to disambiguate between these inner plugins, rather than introducing/requiring a global lock to generate a sequence number?

I would rather do so in one testable place than require each consumer to handle their own logic. In the above comment I provided a way to have a lock-free global sequence number, which would make it very difficult for a user to use the factory incorrectly (no implicit collisions). Another way would be to have "#{outer_id}/inner-#{inner_type}-#{inner-name}@#{SecureRandom.uuid}".

Also, are we expecting the id to show up anywhere other than in the debug logs at creation?

Maybe? I can't remember the specifics of logging context, but I believe that the plugin id is part of the logger context (or should be?). I would hope that a plugin instantiated with an execution context would get wired up to all of the metrics at some point in the future, even if it isn't now.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense.

I think I was a little surprised that there was nothing to contextualize the inner plugin with the outer plugin when supplying an id, compared to what we see with no id specified. Maybe some logging to state that an inner plugin is being created and to what plugin it belongs to might suffice

By avoiding mocking our outer instance's `Plugin#execution_context`,
we eliminate an rspec-internal race condition that occurs when tracking
calls to that mock from many threads simultaneously.
Copy link

@robbavey robbavey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great stuff! I have a couple of comments inline, plus a more general one - do we have any sort of plan to have inner plugins somehow visible in the monitoring API, and/or a mechanism to publish their stats?

.travis.yml Outdated
- logstash-plugins/.ci:travis/travis.yml@1.x
matrix:
include:
- env: ELASTIC_STACK_VERSION=7.9.3 # last release pre-introduction of core Contextualizer
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the end of life matrix, 7.9.3 is no longer supported - does it make sense to add the extra complexity for an unsupported version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify and require 7.17, which is the only 7.x that is not EOL.

Doing so will require users to be on 7.17 to consume fixes to any plugins that rely on this feature, but will also simplify the adapter and specs.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently 7.13+ is still supported through February. 🤷

I'm easy either way on this, but I tend to prefer simplification if possible

s.name = 'logstash-mixin-plugin_factory_support'
s.version = '1.0.0'
s.licenses = %w(Apache-2.0)
s.summary = "Support for the Plugin Factory introduced in Logstash 8.3, for plugins wishing to use this API on older Logstashes"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks outdated, unless I'm missing something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes. This was written when I was hoping to target a companion core-support in 8.3. Good catch. I'll reword.

Comment on lines 50 to 59
INIT_MUTEX = Mutex.new
private_constant :INIT_MUTEX

##
# @return [PluginFactory]
def plugin_factory
@_plugin_factory || INIT_MUTEX.synchronize do
@_plugin_factory ||= PluginFactory.new(self)
end
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect plugins to define multiple inner plugins of the same type?

And if so, is it unreasonable to mandate that they provide a unique (hopefully meaningful) id to disambiguate between these inner plugins, rather than introducing/requiring a global lock to generate a sequence number?

Also, are we expecting the id to show up anywhere other than in the debug logs at creation? As far as I could tell, that was the only place it shows up, seeing as the inner plugin does not show up in the monitoring API..

# @param params [Hash{String=>Object}]
# @return [LogStash::Plugin]
def new(params={})
params_with_id = params.include?('id') ? params : params.merge('id' => generate_inner_id)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to scope this provided id underneath the outer id? Or provide an option to do so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid changing the semantics of explicitly-given parameters, and right now this PluginProxy<type.downcase,name.downcase>#new behaves identically to its counterpart LogStash::#{type}s::#{name}::new method for all explicit parameters.

I consider providing an implicit useful default to be non-breaking of expectations.

We could require an explicit id parameter, but that would push the complexity to calling code where it is likely to be implemented in different ways by different callers, typically with less contextual help than we can offer by generating a default in a consistent way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now logs a bread-crumb linking the inner plugin id with its outer plugin id, at debug-level because this really is an implementation detail of the outer plugin and I would like to avoid flooding info-level since excessive info-level noise tends to distract people who are digging into unrelated problems.

Copy link

@robbavey robbavey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yaauie yaauie merged commit d980523 into logstash-plugins:main Nov 4, 2022
@yaauie yaauie deleted the init branch November 4, 2022 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ESF] Allow plugins to safely instantiate inner plugins while maintaining execution context
3 participants