Default nginx pipelines should decode url.original field #19088

SlavikCA · 2020-06-10T04:33:42Z

Describe the enhancement:

Describe a specific use case for the enhancement or feature:

Here is how url.original field currently looks in Kibana:
/A%20Beka%20G1%20Howe/029_AND_30/15%20reading%20elephants.mp4

Here is how it should look:
/A Beka G1 Howe/029_AND_30/15 reading elephants.mp4

Here is the original record in the Nginx access log:

lessons.example.com 192.168.0.1 - - [09/Jun/2020:12:10:39 -0700] "GET /A%20Beka%20G1%20Howe/029_AND_30/15%20reading%20elephants.mp4 HTTP/1.1" 206 7648063 "http://lessons.example.com/A%20Beka%20G1%20Howe/029_AND_30/15%20reading%20elephants.mp4" "Mozilla/5.0 (Linux; Android 5.1.1; KFFOWI) AppleWebKit/537.36 (KHTML, like Gecko) Silk/81.2.16 like Chrome/81.0.4044.138 Safari/537.36"

So, decoding fixes the display of URL-encoded (percent-encoded) characters, such as spaces.
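For illustration only, here is a minimal Python sketch of the decoding being asked for. It uses the standard library's `urllib.parse.unquote` as a stand-in; it is not the Filebeat pipeline itself, and the variable names are made up for this example.

```python
from urllib.parse import unquote

# Path exactly as it appears in url.original in the report above.
raw_path = "/A%20Beka%20G1%20Howe/029_AND_30/15%20reading%20elephants.mp4"

# unquote() replaces each %XX escape with the character it encodes,
# so every %20 becomes a literal space.
decoded_path = unquote(raw_path)

print(decoded_path)
# prints: /A Beka G1 Howe/029_AND_30/15 reading elephants.mp4
```

Only the escapes change; the case and the rest of the path stay exactly as logged.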

Here is one more example, where decoding is needed for non-English characters:

Here is how url.original field currently looks in Kibana:
/%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B0%D1%8F%20%D1%88%D0%BA%D0%BE%D0%BB%D0%B0%20-%20InternetUrok%201%D0%BA%D0%BB%D0%B0%D1%81%D1%81/

Here is how it should look:
/Русская школа - InternetUrok 1класс/

Here is the original record in the Nginx access log:

lessons.example.com 192.168.0.1 - - [09/Jun/2020:21:31:51 -0700] "GET /%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B0%D1%8F%20%D1%88%D0%BA%D0%BE%D0%BB%D0%B0%20-%20InternetUrok%201%D0%BA%D0%BB%D0%B0%D1%81%D1%81/ HTTP/1.1" 200 894 "http://lessons.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
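The Cyrillic case differs from the space case because each letter is two percent-escaped bytes that have to be read together as UTF-8. A rough Python sketch of those two steps, assuming the client encoded the path as UTF-8 (the log itself does not state the encoding):

```python
from urllib.parse import unquote, unquote_to_bytes

raw_path = "/%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B0%D1%8F%20%D1%88%D0%BA%D0%BE%D0%BB%D0%B0%20-%20InternetUrok%201%D0%BA%D0%BB%D0%B0%D1%81%D1%81/"

# Step 1: percent-decoding yields raw bytes; %D0%A0 is two bytes, not two characters.
first_letter_bytes = unquote_to_bytes("%D0%A0")

# Step 2: interpreting those two bytes as UTF-8 yields one Cyrillic letter.
first_letter = first_letter_bytes.decode("utf-8")

# unquote() performs both steps in one call (UTF-8 is its default encoding).
decoded_path = unquote(raw_path)

print(first_letter_bytes, first_letter)
print(decoded_path)
```

If the bytes were not valid UTF-8, `unquote` would substitute replacement characters by default rather than raise, so a decoder in a pipeline would need a policy for that case.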

The issue is the same for the Nginx access and error pipelines (message field).


elasticmachine · 2020-06-10T09:03:42Z

Pinging @elastic/integrations-services (Team:Services)

legoguy1000 · 2021-03-23T23:51:12Z

@andrewkroh I'm working on a PR for this, but I'm trying to decide whether the URL components should be decoded. I guess it depends on whether we're trying to show what the user entered into the browser (or the link), or what the browser/web server converted it to in order to remove special characters. Thoughts?

andrewkroh · 2021-03-25T12:24:07Z

Based on my interpretation of ECS, I think url.original should remain unchanged from the original source. @elastic/ecs, wdyt?

The uri_parts processor should automatically URL decode the parts of the URI as described in https://docs.oracle.com/javase/7/docs/api/java/net/URI.html.

The getUserInfo, getPath, getQuery, getFragment, getAuthority, and getSchemeSpecificPart methods decode any escaped octets in their corresponding components. The strings returned by these methods may contain both other characters and illegal characters, and will not contain any escaped octets.

You'll see in its source that it doesn't use the getRaw* accessors.

So I would decode the same components as the uri_parts processor. I'd even see if you could use that processor, but I think it may require a more complete URI to work. So you may have to use urldecode and split it into parts by some alternative means. Or maybe reconstruct the URL (assuming the necessary info is present) for url.full, then decode it, and then run uri_parts.
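One thing to watch when choosing between these options is the order of splitting and decoding. The Python sketch below is only an illustration using `urllib.parse`, not the ingest processors, and the request target in it is hypothetical. It shows that decoding before splitting can move a delimiter:

```python
from urllib.parse import unquote, urlsplit

# Hypothetical request target: %3F is an escaped "?" inside a file name.
target = "/docs/what%3F.txt?q=a%20b"

# Option A: split first, then decode each component on its own.
split_first = urlsplit(target)
path_a = unquote(split_first.path)
query_a = unquote(split_first.query)

# Option B: decode first, then split. The now-literal "?" from %3F is
# mistaken for the query delimiter, so the path is cut short.
decode_first = urlsplit(unquote(target))
path_b = decode_first.path
query_b = decode_first.query

print(path_a, "|", query_a)  # /docs/what?.txt | q=a b
print(path_b, "|", query_b)  # /docs/what | .txt?q=a b
```

Under these assumptions, splitting the raw value and decoding each part afterwards keeps escaped delimiters inside the component they belong to.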

legoguy1000 · 2021-03-25T15:25:41Z

It looks like the raw Java code you posted does URL-decode it and works with incomplete URLs, so I feel the processor should work. I'm going to continue to play with it.

legoguy1000 · 2021-03-25T16:33:46Z

The uri_parts processor is working; however, it populates url.original with the raw, un-decoded value, so I've added a separate urldecode processor for url.original and http.request.referrer. I've updated Apache and Nginx and am looking at the other modules that may not be fully decoding/parsing URLs.

legoguy1000 · 2021-03-25T19:48:57Z

PR has been updated, please take a look to see if we think it meets the intent of the issue & ECS standard.

legoguy1000 · 2021-03-27T21:38:11Z

I updated the 15 modules I could find that have url.* data and sample data to validate the changes. I've marked the PR ready for review.

legoguy1000 · 2021-04-20T01:50:33Z

@SlavikCA Please see the conversation in #24699. The group and the ECS authors decided that, per the spec, the url.original field should remain exactly as it was recorded by the system generating the log/event. However, by adding the uri_parts processor, the URLs will be broken down into components, and those will be URL-decoded.
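As a rough picture of that outcome, the Python sketch below builds a `url.*`-style mapping in which `original` is kept verbatim and the derived parts are decoded. The helper `to_url_fields` is hypothetical, and this is only an approximation for illustration, not the uri_parts processor's implementation or its full field set.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def to_url_fields(original: str) -> dict:
    """Build a url.* mapping: 'original' stays verbatim, parts are decoded."""
    parts = urlsplit(original)
    url = {"original": original}  # never modified
    if parts.path:
        url["path"] = unquote(parts.path)
        # Derive the file extension from the decoded path, without the dot.
        suffix = PurePosixPath(url["path"]).suffix
        if suffix:
            url["extension"] = suffix.lstrip(".")
    if parts.query:
        url["query"] = unquote(parts.query)
    return url


raw = "/A%20Beka%20G1%20Howe/029_AND_30/15%20reading%20elephants.mp4"
fields = to_url_fields(raw)
print(fields["original"])   # still percent-encoded
print(fields["path"])       # human-readable
print(fields["extension"])  # mp4
```

With this layout, a Kibana view that needs the readable form would use the path field, while `url.original` stays available for exact matching against the source log.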

SlavikCA · 2021-04-20T03:15:13Z

@legoguy1000 That's a good solution. Do you think we can expect it in 7.13?

legoguy1000 · 2021-04-20T03:20:04Z

I don't know what the cutoff date for 7.13 is, but the CI pipeline should be done in a couple of hours, and based on my conversations with the Elastic devs, I think it should be good to merge once it passes.

botelastic bot added the needs_team label Jun 10, 2020

SlavikCA mentioned this issue Jun 10, 2020

fixes #19088 urldecode for nginx access url.original and nginx error … #19090

Closed

2 tasks

ChrsMark added the Team:Services (Deprecated) label Jun 10, 2020

botelastic bot removed the needs_team label Jun 10, 2020

andresrc added [zube]: Inbox [zube]: In Review and removed [zube]: Inbox labels Jun 10, 2020

sayden added the Filebeat label Jun 16, 2020

This was referenced Jul 10, 2020

fixes #19088: urldecode nginx.access url.original and nginx.error message #19816

Closed

fixes #19088 urldecode nginx.access url.original and nginx.error message #19917

Closed

legoguy1000 mentioned this issue Mar 23, 2021

[Filebeat] Add URI Parts Processor to multiple modules #24699

Merged

6 tasks

andrewstucki closed this as completed in #24699 Apr 27, 2021

zube bot added [zube]: Done and removed [zube]: In Review labels Apr 27, 2021

andrewstucki mentioned this issue Apr 27, 2021

Cherry-pick #24699 to 7.x: [Filebeat] Add URI Parts Processor to multiple modules #25353

Merged

6 tasks

zube bot removed the [zube]: Done label Jul 27, 2021
