Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add form-url-encoded body filter #366

Merged

Conversation

nsmolenskii
Copy link
Contributor

Description

Add form url encoded body filter

Motivation and Context

Form url encoded request is quite often used during oauth authentication.
It will be nice to secure such request by default and also provide configurable filter for general purpose.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@@ -34,6 +35,14 @@ public static BodyFilter accessToken() {
return replaceJsonStringProperty(properties, "XXX");
}

@API(status = MAINTAINED)
public static BodyFilter oauthRequest() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that standardized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to client-credentials and password flows those parameters looks quite standardised. Also I think it makes sense to add filtering out refresh_token form accessToken() cause it in the end have much higher risks that access_token

Copy link
Contributor Author

@nsmolenskii nsmolenskii Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to w3.org: x-ww-form-urlencoded should follow RFC-1738, that describes URL formats.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think it makes sense to add filtering out refresh_token form accessToken() cause it in the end have much higher risks that access_token

Agreed!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC-1738, that describes URL formats.

That one doesn't describe query parameters, does it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find anything query parameter related in there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree that rfc is quite unreadable. Therefore there is a part describes encoding part as well as format of HTTP URL

Unsafe:

   Characters can be unsafe for a number of reasons.  The space
   character is unsafe because significant spaces may disappear and
   insignificant spaces may be introduced when URLs are transcribed or
   typeset or subjected to the treatment of word-processing programs.
   The characters "<" and ">" are unsafe because they are used as the
   delimiters around URLs in free text; the quote mark (""") is used to
   delimit URLs in some systems.  The character "#" is unsafe and should
   always be encoded because it is used in World Wide Web and in other
   systems to delimit a URL from a fragment/anchor identifier that might
   follow it.  The character "%" is unsafe because it is used for
   encodings of other characters.  Other characters are unsafe because
   gateways and other transport agents are known to sometimes modify
   such characters. These characters are "{", "}", "|", "\", "^", "~",
   "[", "]", and "`".

   All unsafe characters must always be encoded within a URL. For
   example, the character "#" must be encoded within URLs even in
   systems that do not normally deal with fragment or anchor
   identifiers, so that if the URL is copied into another system that
   does use them, it will not be necessary to change the URL encoding.
; HTTP

httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]
hpath          = hsegment *[ "/" hsegment ]
hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search         = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example RFC2396 mentioned in java.net.URL provides almost same idea

2. URI Characters and Escape Sequences

   URI consist of a restricted set of characters, primarily chosen to
   aid transcribability and usability both in computer systems and in
   non-computer communications. Characters used conventionally as
   delimiters around URI were excluded.  The restricted set of
   characters consists of digits, letters, and a few graphic symbols
   were chosen from those common to most of the character encodings and
   input facilities available to Internet users.

      uric          = reserved | unreserved | escaped

   Within a URI, characters are either used as delimiters, or to
   represent strings of data (octets) within the delimited portions.
   Octets are either represented directly by a character (using the US-
   ASCII character for that octet [ASCII]) or by an escape encoding.
   This representation is elaborated below.
3.4. Query Component

   The query component is a string of information to be interpreted by
   the resource.

      query         = *uric

   Within a query component, the characters ";", "/", "?", ":", "@",
   "&", "=", "+", ",", and "$" are reserved.

Copy link
Contributor Author

@nsmolenskii nsmolenskii Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, how I personally understand it. x-www-form-urlencoded bodies should follow same RFCs, those describe URL format. Both (URI and URL) RFCs describes same for query|search part.

Copy link
Contributor

@maksymgendin maksymgendin Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guys, what about refresh_token? Did you consciously decide not to obfuscate it? I just stumbled upon the fact that the refresh token appears in our logs....

Currently I would need to configure a custom BodyFilter bean:

@Bean
BodyFilter bodyFilter() {
    return Stream.concat(
                    StreamSupport.stream(ServiceLoader.load(BodyFilter.class).spliterator(), false),
                    Stream.of(BodyFilters.oauthRequest())
            )
            .reduce(
                    BodyFilters.replaceFormUrlEncodedProperty(Set.of("refresh_token"), "XXX"),
                    BodyFilter::merge
            );
}


return (contentType, body) -> json.test(contentType) ?
pattern.matcher(body).replaceAll("$1\"" + replacement + "\"") : body;
return (contentType, body) -> formUrlEncoded.test(contentType) ? delegate.filter(body) : body;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a private, shared method that takes a Predicate and a UnaryOperator and produces a BodyFilter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see some reasons to do that - ternary operator quite readable itself.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it's a quite common pattern, almost suggested by the BodyFilter API since it passes the content type for exactly that reason.

public static BodyFilter replaceFormUrlEncodedQuery(final Set<String> properties, final String replacement) {
final Predicate<String> formUrlEncoded = MediaTypeQuery.compile("application/x-www-form-urlencoded");

final QueryFilter delegate = properties.stream()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works since the query string and the form bodies are encoded in the same way. Should we extract this parsing/transformation from query filter into a separate, shared unit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x-www-form-urlencoded has same format as normal query. I think it is quite OK, to just delegate to query filters.

* @return BodyFilter generated
*/
@API(status = MAINTAINED)
public static BodyFilter replaceFormUrlEncodedQuery(final Set<String> properties, final String replacement) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd drop the Query from the name.

@whiskeysierra whiskeysierra merged commit 35d2ea2 into zalando:master Oct 15, 2018
@whiskeysierra
Copy link
Collaborator

Thanks for the contribution. I just released it as 1.11.0 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants