Add new fields file.origin_referrer_url & file.origin_url #2348

AsuNa-jp · 2024-06-27T08:19:48Z

Checklist

Have you signed the contributor license agreement?
- YES
Have you followed the contributor guidelines?
- YES
For proposing substantial changes or additions to the schema, have you reviewed the RFC process?
- Not substantial changes
If submitting code/script changes, have you verified all tests pass locally using make test?
- Not code/script changes
If submitting schema/fields updates, have you generated new artifacts by running make and committed those changes?
- YES
Is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
- YES
Have you added an entry to the CHANGELOG.next.md?
- YES

PR summary

This PR adds file.origin_referrer_url and file.origin_url

To reviewers

Size of the fields
The two fields added in this PR are intended to store URLs, so the default field size of 1024 bytes is insufficient. We therefore want to set ignore_above to 8k (8192) bytes, if possible.

Default Field
This field will not necessarily be included in all file events.
Therefore, I set it as default_field: false, but please let me know if this is incorrect.

mjwolf

I think this looks generally pretty good, just some small things to fix up

CHANGELOG.next.md

mjwolf · 2024-08-13T20:17:17Z

schemas/file.yml

+    - name: origin_referrer_url
+      level: extended
+      type: keyword
+      ignore_above: 8192


This is a fairly large ignore_above value, do you expect that there will be data this size? Have you looked into the performance impact?

do you expect that there will be data this size?

Yes. When defining the URL size, I did some research and came across a discussion from a past ECS thread about URL size, which I used as a reference. There was a discussion that the URL field should be at least 4096 bytes, and ideally 8192 bytes based on the actual URL log data, so I followed that suggestion. In addition, after discussing it within the team, we concluded that since the typical maximum URL size accepted by servers is 8KB, 8KB seemed like the most reasonable choice.

Have you looked into the performance impact?

However, I haven't looked at the performance impact of setting ignore_above to 8192. If performance testing is required, I would be happy to carry it out. If there is any documentation on how to do it, could you please share it with me?
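For context on the size question, here is a minimal sketch of what the limit means for a long URL. It only models the documented ignore_above rule (a keyword value longer than the limit is not indexed, although it remains in the stored source document); it does not call Elasticsearch and says nothing about performance. The helper `is_indexed` and the 3000-character URL are hypothetical, and the sketch assumes the limit is compared against the character count.

```python
DEFAULT_IGNORE_ABOVE = 1024   # default keyword limit mentioned in the PR description
PROPOSED_IGNORE_ABOVE = 8192  # value proposed in this PR


def is_indexed(value: str, ignore_above: int) -> bool:
    """Return True if a keyword value would be indexed under the given limit.

    Assumption: the limit is compared against the number of characters.
    """
    return len(value) <= ignore_above


# Hypothetical 3000-character URL, e.g. one carrying a long token in its query string.
prefix = "https://example.com/download?token="
long_url = prefix + "a" * (3000 - len(prefix))

for limit in (DEFAULT_IGNORE_ABOVE, PROPOSED_IGNORE_ABOVE):
    status = "indexed" if is_indexed(long_url, limit) else "ignored"
    print(f"ignore_above={limit}: {status}")
```

With the default limit such a URL would not be searchable on this field, which is the motivation given for the larger value.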

mjwolf · 2024-08-13T20:20:02Z

schemas/file.yml

+      level: extended
+      type: keyword
+      ignore_above: 8192
+      description: The url of the webpage that linked to the file.


Capitalize URL here and in the below description

Thanks for the suggestion! I have changed url to URL!

mjwolf · 2024-08-13T20:24:38Z

schemas/file.yml

+      level: extended
+      type: keyword
+      ignore_above: 8192
+      description: The url where the file is hosted.


Have you considered redirects? For example, when using CDNs, the initial URL can be redirected to various download URLs. Is that meaningful for your use case? Should the description specify if this is the initial request or a redirected URL?

Yes. This field (origin_url) holds the actual URL from which the file was downloaded, taken from the Mark of the Web (MoTW) information in Windows. (MoTW is metadata that Windows adds to files downloaded from the internet.)

For example, if you download ejabberd_20.12-0_amd64.deb from https://www.process-one.net/downloads/downloads-action.php?file=/20.12/ejabberd_20.12-0_amd64.deb, the download is redirected, and the file is actually retrieved from https://static.process-one.net/ejabberd/downloads/20.12/ejabberd_20.12-0_amd64.deb.

In MoTW, as shown in the attached image, the redirect URL is saved as the file's host URL.

Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

AsuNa-jp · 2024-09-04T10:32:34Z

@mjwolf
Thank you for the detailed comments and feedback🙇🏻‍♀️, and I'm sorry for the delay in responding; I was out of the office from mid to late August. I've updated the PR based on your feedback, and I've also answered your questions above. I'd appreciate it if you could take a look.🙇🏻‍♀️

trisch-me · 2024-09-12T11:50:29Z

schemas/file.yml

+      level: extended
+      type: keyword
+      ignore_above: 8192
+      description: The URL of the webpage that linked to the file.


I don't quite understand what this means. Could you elaborate a bit, maybe with some examples? Is it the URL where the file was located before downloading?

Hi @trisch-me,
Thank you for the comments and feedback!🙇🏻‍♀️
The following is an actual example of file.origin_referrer_url and file.origin_url.

When you download an image file (for example, news30_img3.png) from the following web page,

http://girls.seccon.jp/news30.html

file.origin_url will be the URL of the downloaded file itself, and file.origin_referrer_url will be the URL of the web page that linked to the file (http://girls.seccon.jp/news30.html).

Thanks for the explanation. Can you add more information to the descriptions/examples of the actual fields?

trisch-me · 2024-09-12T13:29:25Z

Hey @AsuNa-jp, I think it would be great to follow the RFC process for this change, because it's a new addition and those usually go through the process.

Looking at the fields now, I'm not sure we can add them to the file namespace: semantically they are of type URL, and it would be great to have them as such.

In Otel we have this differentiation between the schema (e.g. https://github.com/open-telemetry/semantic-conventions/blob/main/model/registry/url.yaml) and the usage of the schema (what we call use cases), where fields from different namespaces appear together in one place to describe the use case (https://github.com/open-telemetry/semantic-conventions/blob/main/model/trace/database.yaml#L263).

Translating this into your PR and new fields, I would argue that they should not be part of the file namespace (though that's possible too) but rather part of that mix for a specific use case. Another possibility is to have them not as strings but as a reference to a file namespace. Why? Because then file.url will have the same restrictions as all other URLs (for example, redacting sensitive data in the URL).

In the current Otel version one can't rename referenced attributes, which means url.full is always url.full, but a change to allow this is coming soon.

AsuNa-jp · 2024-09-20T09:52:02Z

Hi @trisch-me!

I’ve finally come to understand the differences between ECS and OpenTelemetry, so I’d like to answer your previous question.

Whether or not it should be placed in the file namespace

ECS

Based on the definition of the file fields in ECS, I believe file.origin_referrer_url and file.origin_url should be defined in the file namespace.

A file is defined as a set of information that has been created on, or has existed on a filesystem.
File objects can be associated with host events, network events, and/or file events (e.g., those produced by File Integrity Monitoring [FIM] products or services). File fields provide details about the affected file associated with the event or metric.
https://www.elastic.co/guide/en/ecs/current/ecs-file.html

This is because the origin_referrer_url and origin_url information is added automatically to the file's Alternate Data Stream (ADS) in NTFS (the Windows file system) when a file is downloaded. It is not information retrieved from a web request (e.g., a GET request), but rather from the file system. According to the ECS definition, this type of information should be placed in the file namespace.

For example, when you download an image file (image17.webp) from this webpage using a web browser, the download source URL is automatically added to the file's Alternate Data Stream (ADS) as follows.

file's ADS

Inside image17.webp:Zone.Identifier:$DATA

(ReferrerUrl -> `origin_referrer_url`, HostUrl -> `origin_url`)
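To make the mapping concrete, here is a rough sketch of how the contents of a Zone.Identifier stream could be turned into the two proposed fields. It assumes the usual INI-like layout of the stream (a `[ZoneTransfer]` header followed by `Key=Value` lines). The function name `parse_zone_identifier` and the example.com URLs are hypothetical and not taken from this PR or from any endpoint implementation.

```python
def parse_zone_identifier(text: str) -> dict[str, str]:
    """Map MoTW keys from a Zone.Identifier stream to the proposed ECS fields."""
    # Mapping proposed in this PR.
    key_to_field = {
        "ReferrerUrl": "file.origin_referrer_url",
        "HostUrl": "file.origin_url",
    }
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the [ZoneTransfer] section header.
        if not line or line.startswith("["):
            continue
        # Split on the first "=" only, so "=" inside a URL query string is preserved.
        key, sep, value = line.partition("=")
        if sep and key in key_to_field:
            fields[key_to_field[key]] = value
    return fields


# Hypothetical stream content; the URLs are placeholders.
sample = """[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://example.com/downloads.html
HostUrl=https://cdn.example.com/files/report.pdf?v=2
"""

print(parse_zone_identifier(sample))
```

On NTFS the stream can typically be read by opening the file path with `:Zone.Identifier` appended; that step is left out here so the sketch runs on any platform.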

OpenTelemetry

As for OpenTelemetry (registry.yaml), it seems to limit the file namespace to file metadata (such as size), so I understand your doubts about whether it should be included in the file namespace. However, if we are to unify the naming with ECS, I believe that file.origin_referrer_url and file.origin_url should also be included in the file namespace on the OpenTelemetry side.

If there is no need to unify the naming between ECS and OpenTelemetry, then I think it's fine to handle it as a mixed use case (currently, spans.yaml), as you suggested. However, this field itself is not intended for any specific use case; it could be included in logs for any file-related operation. Therefore, it's difficult to define a single use case (or a few). In such a case, what would be the best approach?

AsuNa-jp · 2024-09-20T10:01:04Z

@magermark
If you have any thoughts or suggestions from an endpoint (Elastic Defend) perspective, I would be grateful if you could share them.

trisch-me · 2024-09-20T11:17:07Z

Hey @AsuNa-jp, great explanation. I agree with you that in that case we might add those fields to the file namespace.

Do you foresee any private information in the URL path? I don't think we are in control of it, but we might have it, and we might have to or want to remove it. Should we add this to the description?

We aim to have the same names for both otel and ecs fields, otherwise it will be a breaking change for us (small one but better to avoid). I'm not opposing this change in the ECS, but I would like to encourage you to create a similar PR in Otel (I can give you some hints and can show how to) to get initial response on the field names.

AsuNa-jp · 2024-09-20T12:27:37Z

@trisch-me Thank you for your reply.
I had also been thinking about how to address the points you raised.

Do you foresee any private information in the URL path?

Yes, if a password is included in the URL of a web page, it can be exposed as shown below.

(Above is the URL to access the e-ticket I purchased, but it has already expired, so it is no longer accessible.)

We aim to have the same names for both otel and ecs fields

If the field needs to be the same between ECS and Otel, how about first adding file.origin_referrer_url and file.origin_url to ECS, and then, once Otel supports renaming referenced attributes, updating spans.yaml by adding ref: url.full (renamed or aliased as file.origin_referrer_url)? How does that sound?

trisch-me · 2024-09-23T12:44:49Z

I think we can start by just adding the fields to Otel as we want to have them in ECS. I don't see any problem with starting this discussion already. Can you do it, or should I create a PR?

And we should add a note about possible sensitive data inside the URL.
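As an illustration of the sensitive-data concern, a producer could scrub URLs before populating these fields. The following is only a sketch under stated assumptions: `redact_url` and the `REDACTED` placeholder are hypothetical names, and the policy (drop userinfo, mask every query value) is one possible choice, not something ECS or Otel prescribes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

REDACTED = "REDACTED"  # hypothetical placeholder value


def redact_url(url: str) -> str:
    """Drop any "user:password@" userinfo and mask every query-string value."""
    parts = urlsplit(url)
    # Rebuild the network location from host and port only.
    host = parts.hostname or ""
    if ":" in host:
        # urlsplit strips the brackets from IPv6 literals; put them back.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Keep parameter names so the URL shape stays useful for analysis.
    masked = [
        (name, REDACTED)
        for name, _ in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit((parts.scheme, host, parts.path, urlencode(masked), parts.fragment))


# Hypothetical URL carrying credentials in both the userinfo and the query string.
print(redact_url("https://alice:s3cret@example.com:8443/tickets/view?id=42&password=hunter2"))
```

Secrets embedded in the path itself (for example a one-time token used as a path segment) are not caught by this approach, so a note in the field description would still be needed.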

add new fields

ae987fb

AsuNa-jp requested a review from a team as a code owner June 27, 2024 08:19

AsuNa-jp self-assigned this Jun 27, 2024

AsuNa-jp added 2 commits June 27, 2024 04:22

add changelog

7e044e8

add generated files

0e2d677

AsuNa-jp mentioned this pull request Jun 27, 2024

Add file.origin_referrer_url and file.origin_url to FileEvent elastic/endpoint-package#514

Merged

1 task

AsuNa-jp added 5 commits June 27, 2024 07:12

change ingore_above to 8192

b6232b7

change ingore_above to 8192

9044e9b

change ingore_above to 8192

41653a4

change ingore_above to 8192

6bd7c2b

Merge branch 'main' into new_fileevent_fields

4c931f8

AsuNa-jp added New Field Request review endpoint Relevant to elastic endpoint security Team: ECS labels Aug 6, 2024

AsuNa-jp requested review from jdu2600 and mjwolf August 13, 2024 14:09

mjwolf reviewed Aug 13, 2024


AsuNa-jp and others added 4 commits September 3, 2024 15:15

Update CHANGELOG.next.md

454627d

Co-authored-by: Michael Wolf <michael.wolf@elastic.co>

Update file.yml

87cb587

Merge branch 'main' into new_fileevent_fields

78cd17e

add regenerated files

1dbe5c0

AsuNa-jp added 2 commits September 6, 2024 09:31

Merge branch 'main' into new_fileevent_fields

d590196

Merge branch 'main' into new_fileevent_fields

eba4cc7

trisch-me reviewed Sep 12, 2024


Merge branch 'main' into new_fileevent_fields

4977a98

Merge branch 'main' into new_fileevent_fields

eedee10

AsuNa-jp marked this pull request as draft September 25, 2024 09:18

AsuNa-jp removed the request for review from jdu2600 September 27, 2024 05:19
