Network community_id processor for ingest pipelines #66534

Merged: 4 commits, Jan 13, 2021

Conversation

danhermann
Contributor

@danhermann danhermann commented Dec 17, 2020

Adds a processor that computes the community_id for flow data according to the Community ID Specification.

For example:

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "community_id": {
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "source": {
          "ip": "128.232.110.120",
          "port": 34855
        },
        "destination": {
          "ip": "66.35.250.204",
          "port": 80
        },
        "network": {
          "transport": "TCP"
        }
      }
    }
  ]
}

populates the network.community_id field as below:

...
"_source" : {
  "destination" : {
    "port" : 80,
    "ip" : "66.35.250.204"
  },
  "source" : {
    "port" : 34855,
    "ip" : "128.232.110.120"
  },
  "network" : {
    "community_id" : "1:LQU9qZlK+B5F3KDmev6m5PMibrg=",
    "transport" : "TCP"
  }
}
...

Closes #55685
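
For reference, the hash behind the value above can be sketched as follows. This is a minimal illustration of Community ID version 1 for IPv4 flows with ports and the default seed of 0, not the code in this PR; the class name `CommunityIdSketch` and its helper methods are hypothetical. The two endpoints are ordered so that both directions of a flow produce the same ID, then the seed, addresses, protocol number, a zero pad byte, and ports are hashed with SHA-1 and Base64-encoded.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical sketch, not the processor added by this PR.
public class CommunityIdSketch {

    // Parse a dotted-quad IPv4 literal by hand so that no name resolution can occur.
    static byte[] parseIpv4(String ip) {
        String[] parts = ip.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 literal: " + ip);
        }
        byte[] out = new byte[4];
        for (int i = 0; i < 4; i++) {
            int octet = Integer.parseInt(parts[i]);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + ip);
            }
            out[i] = (byte) octet;
        }
        return out;
    }

    // Compare two equal-length addresses byte by byte, treating bytes as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    static String communityId(String srcIp, int srcPort, String dstIp, int dstPort, int protocol, int seed) {
        byte[] src = parseIpv4(srcIp);
        byte[] dst = parseIpv4(dstIp);

        // Put the "smaller" endpoint first so both flow directions hash identically.
        int cmp = compareUnsigned(src, dst);
        boolean ordered = cmp < 0 || (cmp == 0 && srcPort < dstPort);
        if (!ordered) {
            byte[] tmpIp = src; src = dst; dst = tmpIp;
            int tmpPort = srcPort; srcPort = dstPort; dstPort = tmpPort;
        }

        // ByteBuffer is big-endian by default, which matches network byte order.
        ByteBuffer buf = ByteBuffer.allocate(2 + 4 + 4 + 1 + 1 + 2 + 2);
        buf.putShort((short) seed);
        buf.put(src);
        buf.put(dst);
        buf.put((byte) protocol);
        buf.put((byte) 0); // pad byte
        buf.putShort((short) srcPort);
        buf.putShort((short) dstPort);

        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return "1:" + Base64.getEncoder().encodeToString(sha1.digest(buf.array()));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```

With the request above (TCP is IP protocol number 6), `CommunityIdSketch.communityId("128.232.110.120", 34855, "66.35.250.204", 80, 6, 0)` should reproduce the `network.community_id` value in the simulated output, and swapping source and destination yields the same ID.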

@danhermann added the >feature, :Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP), v8.0.0, and v7.12.0 labels on Dec 17, 2020
@elasticmachine added the Team:Data Management (Meta label for data/management team) label on Dec 17, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@danhermann
Contributor Author

@andrewkroh, I would appreciate a look at this to see if it meets the criteria for #55685. I did replicate all the unit tests for the Beats community_id processor to verify that this processor produces the same values, so I'm reasonably confident about it in that regard.

@danhermann
Contributor Author

cc: @elastic/es-ui in case Kibana auto-complete needs to be updated with this new processor.

@alisonelizabeth
Contributor

Thanks @danhermann. I've opened elastic/kibana#86321 to track changes needed in Kibana. Can you also share the available options for this processor?

@danhermann
Contributor Author

> Thanks @danhermann. I've opened elastic/kibana#86321 to track changes needed in Kibana. Can you also share the available options for this processor?

Yes, I'll open a separate PR with the docs for this processor that will include all its available options. I'll tag you on that one when I open it.

Member

@andrewkroh andrewkroh left a comment

This looks great. Just a minor comment from me.


Flow flow = new Flow();
try {
    flow.source = InetAddress.getByName(sourceIpAddrString);
Member

@andrewkroh andrewkroh Dec 17, 2020

I'm worried that this could perform network calls to look up IP addresses from names.

Luckily we have a utility for this that's safe. Can you use that package here?

public static InetAddress forString(String ipString) {
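
The concern can be illustrated with a small sketch: `InetAddress.getByName` accepts host names as well as IP literals, so a non-literal value may trigger a name lookup, whereas `InetAddress.getByAddress(byte[])` only wraps raw address bytes. The hypothetical `IpLiteralSketch.parseIpv4Literal` helper below handles only dotted-quad IPv4 and is not the Elasticsearch utility referenced above; it merely shows the idea of rejecting anything that is not a literal before an `InetAddress` is created.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch, not the utility class referenced in the review comment.
public class IpLiteralSketch {

    // Accepts only strings of the form d.d.d.d with each part in 0..255.
    static InetAddress parseIpv4Literal(String value) {
        String[] parts = value.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 literal: " + value);
        }
        byte[] bytes = new byte[4];
        for (int i = 0; i < 4; i++) {
            String part = parts[i];
            boolean digitsOnly = !part.isEmpty()
                && part.length() <= 3
                && part.chars().allMatch(c -> c >= '0' && c <= '9');
            if (!digitsOnly) {
                throw new IllegalArgumentException("not an IPv4 literal: " + value);
            }
            int octet = Integer.parseInt(part);
            if (octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + value);
            }
            bytes[i] = (byte) octet;
        }
        try {
            // getByAddress wraps the raw bytes; it does not consult a name service.
            return InetAddress.getByAddress(bytes);
        } catch (UnknownHostException e) {
            // Only thrown for an illegal byte-array length, which cannot happen here.
            throw new IllegalStateException(e);
        }
    }
}
```

A host name such as `example.com` is rejected with an `IllegalArgumentException` instead of being resolved.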

return transportNumber;
}

public static Transport fromNumber(int transportNumber) {
Member

Not sure if this is worth it, so feel free to ignore. To ensure that fromNumber stays up to date with all the enum transportNumbers, it might be useful to have a test that checks that all the Transport.values() work with fromNumber without throwing an exception.
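
A round-trip check of the kind suggested here might look like the sketch below. The `Transport` enum shown is a hypothetical stand-in with only four protocols (using their IANA protocol numbers), and `checkRoundTrip` is an assumed helper name; the enum in this PR may list different values and the real test would live in the project's test suite.

```java
// Hypothetical sketch, not the enum or test from this PR.
public class TransportRoundTripSketch {

    enum Transport {
        ICMP(1),
        TCP(6),
        UDP(17),
        SCTP(132);

        private final int transportNumber;

        Transport(int transportNumber) {
            this.transportNumber = transportNumber;
        }

        int getTransportNumber() {
            return transportNumber;
        }

        // A hand-written switch like this can silently fall out of sync
        // when a new enum constant is added without a matching case.
        static Transport fromNumber(int transportNumber) {
            switch (transportNumber) {
                case 1:
                    return ICMP;
                case 6:
                    return TCP;
                case 17:
                    return UDP;
                case 132:
                    return SCTP;
                default:
                    throw new IllegalArgumentException("unknown transport protocol number [" + transportNumber + "]");
            }
        }
    }

    // Fails if any enum constant cannot be recovered from its own number.
    static void checkRoundTrip() {
        for (Transport transport : Transport.values()) {
            Transport resolved = Transport.fromNumber(transport.getTransportNumber());
            if (resolved != transport) {
                throw new AssertionError("fromNumber mismatch for " + transport);
            }
        }
    }
}
```

Because the loop iterates over `Transport.values()`, a newly added constant that lacks a `case` in `fromNumber` makes the check fail with the `IllegalArgumentException` from the default branch.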

@danhermann
Contributor Author

@alisonelizabeth, the processor's options are documented here: #66592

@danhermann
Contributor Author

> This looks great. Just a minor comment from me.

Thanks, @andrewkroh! I've made both changes that you suggested.

Member

@andrewkroh andrewkroh left a comment

LGTM

@jakelandis jakelandis requested a review from martijnvg January 7, 2021 16:17
Member

@martijnvg martijnvg left a comment

For processors added in the ingest-common module, there is a yaml test (Ingest common installed) that ensures the processors are correctly wired up. Maybe we should have such a test for the xpack ingest module as well? Then we can verify that the uri_parts and community_id processors have been wired up correctly and can be used in pipelines.

Ingest integration LGTM.

import static org.elasticsearch.ingest.ConfigurationUtils.newConfigurationException;
import static org.elasticsearch.ingest.ConfigurationUtils.readBooleanProperty;

public class CommunityIdProcessor extends AbstractProcessor {
Member

Make the class final?

@danhermann
Contributor Author

@elasticmachine update branch

@danhermann
Contributor Author

Thanks, @martijnvg! I'll open another PR with the module yaml test that you suggested.

Labels
:Data Management/Ingest Node, >feature, Team:Data Management, v7.12.0, v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

Ingest Node processor for Network Community ID
6 participants