Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compacting body filters #357

Merged
merged 4 commits into from
Oct 10, 2018

Conversation

nsmolenskii
Copy link
Contributor

@nsmolenskii nsmolenskii commented Oct 7, 2018

Description

Add body filters to compact most common content types

Motivation and Context

Custom HttpLogForrmatters (different from JsonHttpLogFormatter) have also need to minify message body. To reduce code duplication from formatter to formatter, it could be extracted as delegating body filters.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

}
}

private Document documentWithoutTextNodes(final String body) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems rather inefficient, a custom sax or stax parser should have much less overhead.

There are also some security implication when parsing the request payload, see https://en.wikipedia.org/wiki/XML_external_entity_attack and https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#JAXP_DocumentBuilderFactory.2C_SAXParserFactory_and_DOM4J for how to prevent this in java.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree with a fact, that sax/stax parsing will be more effective than dom in general.
Unfortunately, potential benefit from introducing custom parser with supporting rules will be miserable, due to fact of parsing already allocated in memory string .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had applied recommended changes, according to XXE

@nsmolenskii nsmolenskii force-pushed the add-compacting-body-filters branch from 541b54b to 9b10a7d Compare October 7, 2018 16:33
@whiskeysierra
Copy link
Collaborator

To reduce code duplication from formatter to formatter, it could be extracted as delegating body filters.

But from what I see there is nothing shared, is there?

@nsmolenskii nsmolenskii mentioned this pull request Oct 8, 2018
6 tasks
@nsmolenskii
Copy link
Contributor Author

nsmolenskii commented Oct 8, 2018

Well, it shared not as part of every single formatter, but as part of infrastructural body filters.
E.g.:

  • you have json http log formatter, but some of your endpoints are xml based, you will receive minified xml, as soon as xml compacting body filter will be configured.
  • you introduces another http log formatter (like add key value http logger #358) and you don't have to repeat logic to minify json or xml, or your custom format. It's only about json/xml compacting body filter have to be configured.

@whiskeysierra
Copy link
Collaborator

Well, it shared not as part of every single formatter, but as part of infrastructural body filters.

I failed to realize that you moved the compacting from the formatter to the filter.

We could think about removing it from the JSONHttpLogFormatter in favor of registering those two filters by default, but that would be a breaking change. Maybe something for 2.x

}

private boolean shouldCompact(@Nullable final String contentType, final String body) {
return contentTypes.test(contentType) && heuristic.isProbablyJson(body);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&& !jsonCompactor.isCompacted(body)?

}

/**
* @return configured {@link DocumentBuilderFactory} against XML External Entity (XXE).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the links provided by Jörn here as @see tags?


@Override
public String filter(@Nullable final String contentType, final String body) {
return contentTypes.test(contentType) ? compact(body) : body;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we cheaply detect whether the body may not even need compacting, e.g. when it doesn't contain a new line?

@nsmolenskii
Copy link
Contributor Author

I failed to realize that you moved the compacting from the formatter to the filter.

We could think about removing it from the JSONHttpLogFormatter in favor of registering those two filters by default, but that would be a breaking change. Maybe something for 2.x

Due to backward compatibility (JsonHttpLogFormatter marked as STABLE), I was not going into that direction myself. But I also find it would be a proper change.

@nsmolenskii nsmolenskii force-pushed the add-compacting-body-filters branch from 04c271b to 2beb3ed Compare October 8, 2018 09:36
}

private boolean shouldCompact(final String body) {
return Stream.of("<?xml", "\n", " ").anyMatch(body::contains);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xml declaration is redundant and should be covered by the content type check, shouldn't it? Spaces would be generally ok, since the whole log message would still fit into a single line.

I'd just change this in the same way as the json compactor:

return body.indexOf('\n') != -1;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's quite possible to have xml prolog inside of the message, but do not have \n inside.
So, to not check <?xml potentially means to keep it in compacted version (due to compact skip)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if there are no newlines, what exactly would you expect to happen when compacting? Removing other whitespaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I got your point. We have different points of view to minimisation. I consider extra whitespaces and xml prolog as too verbose in logs. But I could leave with, cause single line is much more important

@nsmolenskii
Copy link
Contributor Author

So do we have some open tasks, preventing that PR ?

@whiskeysierra whiskeysierra merged commit 987e390 into zalando:master Oct 10, 2018
@nsmolenskii nsmolenskii deleted the add-compacting-body-filters branch October 11, 2018 06:03
@whiskeysierra
Copy link
Collaborator

I released your changes as 1.10.0. 🎉

@whiskeysierra whiskeysierra mentioned this pull request Nov 18, 2018
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants