Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context & result of filters except for text_filters aren't overwritten on call time #402

Closed
nekketsuuu opened this issue Apr 15, 2024 · 3 comments · Fixed by #403
Closed

Comments

@nekketsuuu
Copy link

The doc of HTMLPipeline#call says that its context parameter is passed to each filter:

# Apply all filters in the pipeline to the given HTML.
#
# html - A UTF-8 String comprised of HTML.
# context - The context hash passed to each filter. See the Filter docs
# for more info on possible values. This object MUST NOT be modified
# in place by filters. Use the Result for passing state back.
# result - The result Hash passed to each filter for modification. This
# is where Filters store extracted information from the content.
#
# Returns the result Hash after being filtered by this Pipeline. Contains an
# :output key with the String HTML markup based on the
# output of the last filter in the pipeline.
def call(text, context: {}, result: {})

But in fact, the context is only passed to each text filters, and not passed to a convert filter and all node filters:

filter.call(doc, context: context, result: result)

html = @convert_filter.call(text)

result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)

So we can pass context to a convert filter and node filters only when a pipeline is constructed by HTMLPipeline.new, and cannot pass it to them when the pipeline is called.

On html-pipeline v2, the context hash is actually passed into all filters:

@filters.inject(html) do |doc, filter|
perform_filter(filter, doc, context, result)
end

Is this an intended change? If so, how about:

  1. rewriting a document of HTMLPipeline#call to reflect the current behavior, or
  2. adding context and result parameters to call methods of convert filters and node filters?

For option 2, changing the interface is a bit difficult because for node filters we should modify @context of given node filters before passing the filters to Selma::Rewriter#new.

Environment: html-pipeline v3.1.1

@gjtorikian
Copy link
Owner

Hey, thanks for the thorough report. Just acking this and I'll look into it soon!

@gjtorikian
Copy link
Owner

Hi, I opened #402 to address this. In summary: no, this should not have been removed. Context should be able to be passed at call time to each filter, too. That way you set up one pipeline and it renders the same content in different ways, depending on the context hash at the time (or the global context you set up when first defining the pipeline).

In any case, take a look at that branch and let me know if that works for your use case.

@nekketsuuu
Copy link
Author

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants