Migration of apache2 error fileset to ECS #8963

ruflin · 2018-11-07T12:43:36Z

apache.error.level -> log.level
apache.error.message -> message
apache.error.pid -> process.pid
apache.error.tid -> process.tid
apache.error.client -> source.ip

Other changes

Changelog updated
Migration file updated
Field alias added for the old fields
Test outputs updated
Fix source.ip to be IP only. Before had a HOSTORIP pattern.

Note: The ECS fields used were on purpose only put on the module level and are put into filebeat or libbeat when they are used in more then 1 place.

ruflin · 2018-11-07T12:45:35Z

filebeat/module/apache2/error/test/test.log-expected.json

+        "log.offset": 229, 
+        "message": "File does not exist: /usr/local/apache2/htdocs/favicon.ico", 
+        "process.pid": "35708", 
+        "process.tid": "4328636416", 


I wonder why this shows up as a string instead of a int here.

Just hasn't been cast, I assume.

However I think the transient state of the JSON payload is not that critical:

The ECS template will define them as long, and ES will index them as such.

Re-reading them with Kibana will correctly indicate they are numeric

The only problem is that the _source will return this as a string, not a numeric.

Because of 3, we may want to fix this eventually. But I think it's out of scope for the mapping to ECS.

@webmat Totally missed your comment here. I think you are actually spot on. I was kind of thinking if the type is defined as long it would also have an affect on _source but obviously it will not.

ruflin · 2018-11-07T12:45:53Z

filebeat/module/apache2/error/ingest/pipeline.json

@@ -5,27 +5,15 @@
      "grok": {
        "field": "message",
        "patterns": [
-          "\\[%{APACHE_TIME:apache2.error.timestamp}\\] \\[%{LOGLEVEL:apache2.error.level}\\]( \\[client %{IPORHOST:apache2.error.client}\\])? %{GREEDYDATA:apache2.error.message}",
-          "\\[%{APACHE_TIME:apache2.error.timestamp}\\] \\[%{DATA:apache2.error.module}:%{LOGLEVEL:apache2.error.level}\\] \\[pid %{NUMBER:apache2.error.pid}(:tid %{NUMBER:apache2.error.tid})?\\]( \\[client %{IPORHOST:apache2.error.client}\\])? %{GREEDYDATA:apache2.error.message1}"
+          "\\[%{APACHE_TIME:apache2.error.timestamp}\\] \\[%{LOGLEVEL:log.level}\\]( \\[client %{IPORHOST:source.ip}\\])? %{GREEDYDATA:message}",


is apache2.error.client always going to be an IP?

The Apache error log doesn't seem to support a "host" (meaning hostname or IP), according to their doc. The value here seems to be %a, which would be IP:port (kind of like an "authority", but in this case always an IP before the port).

This makes sense, because in an error logging scenario, you want to minimize the chances of fancy features breaking before an error is logged. Therefore the reverse DNS lookup needed to produce %h and try to resolve to a name would be skipped here.

Only thing I'm not sure of is %a seems to always include a port. Although I'm not sure the original parsing ever supported that (where are Beats grok patterns defined?), and the test logs don't include it either...

I did a bit of spelunking, and our test log may be incorrect. My default Apache httpd install includes the port, even for local access. The current grok is correct in making this whole section optional, as some error logs have nothing to do with clients (e.g. logging restarts). I wonder if this may have hidden problems with the client trying to parse an IP, when it was in fact IP:port.

In any case, I think we could be pragmatic and optionally parse the port here:

Recap:

yes it's always an IP

There should also be a port. Let's parse it if it's present.

See below for example logs that include the port

The whole client section being optional is a good thing. Keep this too.

For reference here's the logs I just generated on my local homebrew httpd, trying to access localhost:8080/meh:

[Wed Nov 07 13:51:42.196822 2018] [mpm_prefork:notice] [pid 60830] AH00163: Apache/2.4.37 (Unix) configured -- resuming normal operations [Wed Nov 07 13:51:42.197371 2018] [core:notice] [pid 60830] AH00094: Command line: '/usr/local/opt/httpd/bin/httpd' [Wed Nov 07 14:05:19.882541 2018] [mpm_prefork:notice] [pid 60830] AH00171: Graceful restart requested, doing restart [Wed Nov 07 14:05:19.898329 2018] [mpm_prefork:notice] [pid 60830] AH00163: Apache/2.4.37 (Unix) configured -- resuming normal operations [Wed Nov 07 14:05:19.898366 2018] [mpm_prefork:info] [pid 60830] AH00164: Server built: Oct 24 2018 19:50:11 [Wed Nov 07 14:05:19.898388 2018] [core:notice] [pid 60830] AH00094: Command line: '/usr/local/opt/httpd/bin/httpd' [Wed Nov 07 14:05:19.898451 2018] [core:debug] [pid 60830] log.c(1568): AH02639: Using SO_REUSEPORT: no (1) [Wed Nov 07 14:05:19.898527 2018] [mpm_prefork:debug] [pid 60830] prefork.c(918): AH00165: Accept mutex: none (default: sysvsem) [Wed Nov 07 14:05:32.324848 2018] [authz_core:debug] [pid 62310] mod_authz_core.c(817): [client ::1:50002] AH01626: authorization result of Require all granted: granted [Wed Nov 07 14:05:32.327138 2018] [authz_core:debug] [pid 62310] mod_authz_core.c(817): [client ::1:50002] AH01626: authorization result of <RequireAny>: granted [Wed Nov 07 14:05:32.327240 2018] [core:info] [pid 62310] [client ::1:50002] AH00128: File does not exist: /usr/local/var/www/meh [Wed Nov 07 14:05:52.563539 2018] [authz_core:debug] [pid 62312] mod_authz_core.c(817): [client ::1:50093] AH01626: authorization result of Require all granted: granted [Wed Nov 07 14:05:52.564189 2018] [authz_core:debug] [pid 62312] mod_authz_core.c(817): [client ::1:50093] AH01626: authorization result of <RequireAny>: granted [Wed Nov 07 14:05:52.564320 2018] [core:info] [pid 62312] [client ::1:50093] AH00128: File does not exist: /usr/local/var/www/meh

Sounds like in the first step we can feed it into our "magic" hostname field and then follow up also with the port separation and feed it properly into the fields. Should we open a separate issue for that?

I think for these migrations we should fix little things like this right in the migration PR. We have too many modules to convert. If each of them requires 2-3 PRs each to do in full, that's going to mean a whole lot of back & forth.

If we unearth deeper problems with a given module, there may be a case for more than one PR, though.

ruflin · 2018-11-21T07:24:30Z

@webmat Ready for review.

webmat · 2018-11-21T20:19:16Z

dev-tools/ecs-migration.yml

+  copy_to: false
+
+- from: apache2.error.tid
+  to: process.tid


Threads can also have a name, a priority (nice level), and perhaps other interesting properties. I would name this field process.thread.tid, to make space for other attributes we may want to eventually support about threads...

I remember we discussed it in the past that we should also have a thread object. Was the conclusion to have it under process or global? @andrewkroh do you remember?

Consider this:

process.thread.tid process.thread.name process.thread.priority

I don't yet have a strong opinion on nesting under process vs top level.

But I think it's really important we have a "thread" field set, so we can eventually track other things about threads.

As a thread always belongs to a thread (is that correct 🤔 ) I think it makes sense to nest it under process. Should we open an issue / PR in ECS?

Agreed. That's what I've been dying to do. Open this PR and about 15 others ;-)

webmat · 2018-11-29T15:39:51Z

@ruflin I assume I should take over this one as well?

ruflin · 2018-11-29T15:41:46Z

@webmat Will not say no ;-) But also do not want to drop the work on you.

ruflin · 2018-12-14T09:25:15Z

@webmat Pushed this forward a bit on my end.

filebeat/module/apache2/error/ingest/pipeline.json

webmat · 2018-12-14T16:43:44Z

libbeat/_meta/fields.ecs.yml

@@ -1740,7 +1740,17 @@
        packet/event. Source fields are usually populated in conjunction with destination fields.
      type: group
      fields:
-
+        - name: address


webmat · 2018-12-14T16:44:50Z

@ruflin Good stuff. I want to do my next big push on modules next week. If you don't have time to finish this one, I'll pick it up.

ruflin · 2018-12-17T08:33:44Z

filebeat/module/apache2/error/ingest/pipeline.json

-          "\\[%{APACHE_TIME:apache2.error.timestamp}\\] \\[%{LOGLEVEL:apache2.error.level}\\]( \\[client %{IPORHOST:apache2.error.client}\\])? %{GREEDYDATA:apache2.error.message}",
-          "\\[%{APACHE_TIME:apache2.error.timestamp}\\] \\[%{DATA:apache2.error.module}:%{LOGLEVEL:apache2.error.level}\\] \\[pid %{NUMBER:apache2.error.pid}(:tid %{NUMBER:apache2.error.tid})?\\]( \\[client %{IPORHOST:apache2.error.client}\\])? %{GREEDYDATA:apache2.error.message1}"
+          "\\[%{APACHE_TIME:apache2.error.timestamp}\\] \\[%{LOGLEVEL:log.level}\\]( \\[client %{IPORHOST:source.address}\\])? %{GREEDYDATA:message}",
+          "\\[%{APACHE_TIME:apache2.error.timestamp}\\] \\[%{DATA:apache2.error.module}:%{LOGLEVEL:log.level}\\] \\[pid %{NUMBER:process.pid:int}(:tid %{NUMBER:process.thread.id:long})?\\]( \\[client %{IPORHOST:source.address}\\])? %{GREEDYDATA:message}"


@webmat To my surprise I had to use :long here as it seems for :int the value was too big. Probably good to know if you hit issue with other values too and we should be careful as we now know there is a limit.

@jakelandis Is there any disadvantage here to always use :long instead of :int?

Wow I'm glad we caught the :int case with our test data! I've been using only "int" for all of the coercions I've fixed so far. Although all of those I did fit in the bounds, AFAICT (ports, PIDs, mostly).

I think always using long would actually be fine. It'll instantiate a long, but then shove that into a JSON object, where the distinction disappears anyway.

I'm now curious to know if the int used by grok is signed or not. Both possibilities can actually cause problems. If it's signed, it's currently not enough to properly capture the high ports or PIDs, over 32k. If it's unsigned, it wouldn't work to capture negative integers...

I created #9598 to tackle this.

@ruflin - no disadvantage other then you may also need to update the mapping.

@webmat - It is signed. Specifically we use https://docs.oracle.com/javase/8/docs/api/java/lang/Long.html#parseLong-java.lang.String- or https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html#parseInt-java.lang.String- to cast the value found from the grok expression.

Great, thanks for confirming, @jakelandis.

Most of our fields actually are long in ElasticSearch, so this won't be a problem. I'll fix our coercions soon.

Fields migrated: * apache.error.level -> log.level * apache.error.message -> message * apache.error.pid -> process.pid * apache.error.tid -> process.tid * apache.error.client -> source.ip Other changes * Changelog updated * Migration file updated * Field alias added for the old fields * Test outputs updated * Fix source.ip to be IP only. Before had a HOSTORIP pattern. Note: The ECS fields used were on purpose only put on the module level and are put into filebeat or libbeat when they are used in more then 1 place.

Co-Authored-By: ruflin <spam@ruflin.com>

webmat · 2018-12-18T16:03:09Z

As discussed, here's my post-merge review ;-)

source.address should optionally be copied to source.ip when it's an IP. This requires a small grok in the pipeline.

I'm not sure we'll be able to get started necessarily on the ideas we're accumulating in elastic/ecs#181 for 7.0. So I'd rather see this small pipeline processor added in the module for now.

webmat · 2018-12-18T16:03:52Z

I'll take care of it likely today, btw.

* Migration of apache2 error fileset to ECS Fields migrated: * apache.error.level -> log.level * apache.error.message -> message * apache.error.pid -> process.pid * apache.error.tid -> process.tid * apache.error.client -> source.ip Other changes * Changelog updated * Migration file updated * Field alias added for the old fields * Test outputs updated * Fix source.ip to be IP only. Before had a HOSTORIP pattern. Note: The ECS fields used were on purpose only put on the module level and are put into filebeat or libbeat when they are used in more then 1 place.

ruflin added in progress Pull request is currently in progress. module Filebeat Filebeat ecs labels Nov 7, 2018

ruflin commented Nov 7, 2018

View reviewed changes

ruflin mentioned this pull request Nov 14, 2018

Transition Beats to ECS #8655

Closed

ruflin force-pushed the ecs-apache-error-migration branch 2 times, most recently from a79b3f6 to 415bb35 Compare November 21, 2018 07:24

ruflin added review and removed in progress Pull request is currently in progress. labels Nov 21, 2018

ruflin changed the title ~~[WIP] Migration of apache2 error fileset to ECS~~ Migration of apache2 error fileset to ECS Nov 21, 2018

ruflin force-pushed the ecs-apache-error-migration branch from 415bb35 to 25d98c1 Compare November 21, 2018 11:44

webmat reviewed Nov 21, 2018

View reviewed changes

ruflin added the Team:Integrations Label for the Integrations team label Nov 27, 2018

alvarolobato assigned ruflin Dec 4, 2018

ruflin mentioned this pull request Dec 7, 2018

Rename filebeat apache2 module to apache #9402

Merged

ruflin force-pushed the ecs-apache-error-migration branch 2 times, most recently from edd00bb to 57bd9e0 Compare December 14, 2018 09:25

ruflin force-pushed the ecs-apache-error-migration branch from 57bd9e0 to cf99ef9 Compare December 14, 2018 14:27

ruflin requested a review from a team as a code owner December 14, 2018 14:27

webmat reviewed Dec 14, 2018

View reviewed changes

filebeat/module/apache2/error/ingest/pipeline.json Outdated Show resolved Hide resolved

webmat reviewed Dec 14, 2018

View reviewed changes

ruflin commented Dec 17, 2018

View reviewed changes

webmat mentioned this pull request Dec 17, 2018

Fix recently translated Filebeat modules where integer coercions were added #9598

Closed

14 tasks

ruflin force-pushed the ecs-apache-error-migration branch from a433dbb to 108717c Compare December 18, 2018 08:42

ruflin and others added 5 commits December 18, 2018 13:47

update fields

5e82d25

Update filebeat/module/apache2/error/ingest/pipeline.json

d4901f8

Co-Authored-By: ruflin <spam@ruflin.com>

update generated files

74d3de3

update pipeline and rebase

52ade7e

ruflin force-pushed the ecs-apache-error-migration branch from 108717c to 52ade7e Compare December 18, 2018 12:50

ruflin merged commit 8eff2a8 into elastic:master Dec 18, 2018

ruflin deleted the ecs-apache-error-migration branch December 18, 2018 13:56

jsoriano mentioned this pull request Jan 9, 2019

Fix apache fields config for ECS migration #9960

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migration of apache2 error fileset to ECS #8963

Migration of apache2 error fileset to ECS #8963

ruflin commented Nov 7, 2018 •

edited

Loading

ruflin Nov 7, 2018

webmat Nov 7, 2018

ruflin Nov 21, 2018

ruflin Nov 7, 2018

webmat Nov 7, 2018

ruflin Nov 12, 2018

webmat Nov 13, 2018

ruflin commented Nov 21, 2018

webmat Nov 21, 2018 •

edited

Loading

ruflin Nov 22, 2018

webmat Nov 23, 2018 •

edited

Loading

ruflin Nov 27, 2018

webmat Nov 27, 2018

webmat commented Nov 29, 2018

ruflin commented Nov 29, 2018

ruflin commented Dec 14, 2018

webmat Dec 14, 2018

webmat commented Dec 14, 2018

ruflin Dec 17, 2018

webmat Dec 17, 2018

jakelandis Dec 17, 2018

webmat Dec 17, 2018

webmat commented Dec 18, 2018

webmat commented Dec 18, 2018

Migration of apache2 error fileset to ECS #8963

Migration of apache2 error fileset to ECS #8963

Conversation

ruflin commented Nov 7, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ruflin commented Nov 21, 2018

webmat Nov 21, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

webmat Nov 23, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

webmat commented Nov 29, 2018

ruflin commented Nov 29, 2018

ruflin commented Dec 14, 2018

Choose a reason for hiding this comment

webmat commented Dec 14, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

webmat commented Dec 18, 2018

webmat commented Dec 18, 2018

ruflin commented Nov 7, 2018 •

edited

Loading

webmat Nov 21, 2018 •

edited

Loading

webmat Nov 23, 2018 •

edited

Loading