Fixes for new 7.10 rsa2elk datasets #21240

Merged
merged 2 commits into from
Sep 29, 2020

adriansr
Contributor

What does this PR do?

This updates the JavaScript pipelines in the new rsa2elk datasets for 7.10.

Why is it important?

There were two problems with the original pipelines:

  • juniper/netscreen:

    This pipeline used the &#x092 XML entity as a quote character. This entity translates to Unicode codepoint U+0092 (PRIVATE USE 2), a C1 control character that is not printable and can cause problems.

My understanding is that this is the result of either:

  • Device logs being encoded in the windows-1252 codepage, or
  • Log parsers having originally been written in the windows-1252 codepage
    and
  • Someone expecting XML's &#xNNN to encode a byte instead of a Unicode codepoint.

In this codepage, \x92 represents a quotation mark similar to the ASCII \x27 single quotation mark ('). The correct codepoint to use for this character would have been U+2019 (’, RIGHT SINGLE QUOTATION MARK, &#x2019;).
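
To make the mismatch concrete, here is a minimal sketch in JavaScript. The function names and the partial lookup table are hypothetical, written for illustration only; they are not taken from the rsa2elk pipelines. It contrasts reading 0x92 as a windows-1252 byte with reading &#x092; as an XML character reference.

```javascript
// Partial windows-1252 table covering only the quote-like bytes.
// This is an illustration, not a full codec.
const CP1252_QUOTES = new Map([
  [0x91, 0x2018], // LEFT SINGLE QUOTATION MARK
  [0x92, 0x2019], // RIGHT SINGLE QUOTATION MARK
  [0x93, 0x201c], // LEFT DOUBLE QUOTATION MARK
  [0x94, 0x201d], // RIGHT DOUBLE QUOTATION MARK
]);

// Interpret a single byte as windows-1252 text. Only ASCII and the
// bytes in the table above are handled in this sketch.
function cp1252ByteToCodePoint(byte) {
  if (byte < 0x80) return byte;
  if (CP1252_QUOTES.has(byte)) return CP1252_QUOTES.get(byte);
  throw new RangeError(`byte 0x${byte.toString(16)} is not covered by this sketch`);
}

// Resolve an XML hexadecimal character reference such as "&#x092;".
// XML defines the number as a Unicode codepoint, not as a byte in a codepage.
function xmlCharRefToCodePoint(ref) {
  const m = /^&#x([0-9a-fA-F]+);$/.exec(ref);
  if (!m) throw new SyntaxError(`not a hex character reference: ${ref}`);
  return parseInt(m[1], 16);
}

const fromEntity = xmlCharRefToCodePoint("&#x092;"); // U+0092, a control character
const fromCodepage = cp1252ByteToCodePoint(0x92);    // U+2019, the intended quote

console.log(fromEntity.toString(16), fromCodepage.toString(16)); // prints "92 2019"
```

The same numeric value, 0x92, lands on two different characters depending on whether it is treated as a codepoint or as a codepage byte, which is the confusion described above.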

As it is unclear whether the original logs contain this special quote or it is the result of writing the parsers in a Windows editor, it's better to replace its use with empty captures that skip over the quote.

  • All pipelines:

The original pipelines had been generated with some debugging comments that made them much larger than necessary.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

It's OK as long as it passes the tests.

Some parsers from netwitness wrongly use &#x092 XML entity as a quote
character. This entity translates to UNICODE codepoint U+0092 (PRIVATE
USE 2), which is not printable and can cause problems.

My understanding is that this is the result of either:
- Device logs are encoded in the windows-1252 codepage, or
- Log parsers originally written in windows-1252 codepage.

In this codepage, \x92 represents a quotation mark similar to the
ASCII \x27 single quotation mark (').

I believe someone misunderstood XML's &#xNNN entity as escaping a byte value,
instead of a UNICODE codepoint.

As it is unclear if the original logs contain this special quote, or it's the
result of writing the parsers in a Windows editor, it's better to replace
its usage with empty captures that skip over this quote.

The original pipelines had been generated with some debugging comments
in them, which made them much larger than necessary.
@adriansr added the bug, review, and needs_backport labels Sep 23, 2020
@botelastic bot added the needs_team label Sep 23, 2020
@elasticmachine
Collaborator

Pinging @elastic/siem (Team:SIEM)

@botelastic bot removed the needs_team label Sep 23, 2020
@elasticmachine
Collaborator

💚 Build Succeeded

Build stats

  • Build Cause: [Pull request #21240 opened]

  • Start Time: 2020-09-23T10:47:31.731+0000

  • Duration: 59 min 27 sec

Test stats 🧪

Test Results
Failed 0
Passed 2497
Skipped 388
Total 2885

@adriansr adriansr merged commit 24e972f into elastic:master Sep 29, 2020
v1v added a commit to v1v/beats that referenced this pull request Sep 29, 2020
* upstream/master:
  feat: prepare release pipelines (elastic#21238)
  Add IP validation to Security module (elastic#21325)
  Fixes for new 7.10 rsa2elk datasets (elastic#21240)
  o365input: Restart after fatal error (elastic#21258)
  Fix panic in cgroups monitoring (elastic#21355)
  Handle multiple upstreams in ingress-controller (elastic#21215)
  [CI] Fix runbld when workspace does not exist (elastic#21350)
  [Filebeat] Fix checkpoint (elastic#21344)
  [CI] Archive build reasons (elastic#21347)
  Add dashboard for pubsub metricset in googlecloud module (elastic#21326)
  [Elastic Agent] Allow embedding of certificate (elastic#21179)
  Adds a default for failure_cache.min_ttl (elastic#21085)
  [libbeat] Disk queue implementation (elastic#21176)
@adriansr added the v7.10.0 label and removed the needs_backport label Sep 29, 2020
adriansr added a commit to adriansr/beats that referenced this pull request Sep 29, 2020
* Fix bad unicode character used in juniper/netscreen

Some parsers from netwitness wrongly use &#x092 XML entity as a quote
character. This entity translates to UNICODE codepoint U+0092 (PRIVATE
USE 2), which is not printable and can cause problems.

My understanding is that this is the result of either:
- Device logs are encoded in the windows-1252 codepage, or
- Log parsers originally written in windows-1252 codepage.

In this codepage, \x92 represents a quotation mark similar to the
ASCII \x27 single quotation mark (').

I believe someone misunderstood XML's &#xNNN entity as escaping a byte value,
instead of a UNICODE codepoint.

As it is unclear if the original logs contain this special quote, or it's the
result of writing the parsers in a Windows editor, it's better to replace
its usage with empty captures that skip over this quote.

* Update pipelines for new 7.10 rsa2elk datasets

The original pipelines had been generated with some debugging comments
in them, which made them much larger than necessary.

(cherry picked from commit 24e972f)
adriansr added a commit that referenced this pull request Sep 29, 2020