New component: IPFIX Lookup #28692

Open · 2 tasks
fizzers123 opened this issue Oct 30, 2023 · 14 comments
Labels: Sponsor Needed (New component seeking sponsor), Stale

Comments

@fizzers123

fizzers123 commented Oct 30, 2023

The purpose and use-cases of the new component

Allow traces to be enhanced with IPFIX information stored in an Elasticsearch cluster.

Very similar functionality was already suggested in February 2023 (#18270). We would be interested in contributing our code here.

Example configuration for the component

[Diagram: CorrelationUnitv3.drawio]

processors:
  groupbytrace:
    wait_duration: 100s
    num_traces: 1000
    num_workers: 2
  ipfix_lookup:
    elastic_search:
      connection: 
        addresses:
          - https://<elastic_ip>:30200/
        username: elastic
        password: <password_here>
        certificate_fingerprint: <cert_fingerprint_here>
    timing:
      lookup_window: 120
    # # OPTIONAL settings:
    # query_parameters:
    #   base_query:
    #     field_name: input.type
    #     field_value: netflow
    #   device_identifier: "fields.observer\\.ip.0"
    #   lookup_fields:
    #     source_ip: source.ip
    #     source_port: source.port
    #     destination_ip: destination.ip
    #     destination_port: destination.port
    # span_attribute_fields:
    #   - "@this"
    #   - "fields.event\\.duration.0"
    #   - "fields.observer\\.ip.0"
    #   - "fields.source\\.ip.0"
    #   - "fields.source\\.port.0"
    #   - "fields.destination\\.ip.0"
    #   - "fields.destination\\.port.0"
    #   - "fields.netflow\\.ip_next_hop_ipv4_address"
    # spans:
    #   span_fields:
    #     source_ips:
    #       - net.peer.ip
    #       - net.peer.name
    #       - src.ip
    #     source_ports:
    #       - net.peer.port
    #       - src.port
    #     destination_ip_and_port:
    #       - http.host
    #     destination_ips:
    #       - dst.ip
    #     destination_ports:
    #       - dst.port  
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [groupbytrace, ipfix_lookup]
      exporters: [otlp/jaeger, debug]
  telemetry:
    logs:
      level: debug

Telemetry data types supported

traces

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

No response

Sponsor (optional)

No response

Additional context

As part of our Bachelor's thesis at the Eastern Switzerland University of Applied Sciences, we have created a basic implementation of this functionality.

[Screenshot: final implementation]

(The network was intentionally slowed down for this screenshot)

ipfix_lookup processor

Inside the OpenTelemetry pipeline, a new processor called ipfix_lookup can be configured. Before the IPFIX lookup is performed, all the traces are grouped together and a delay is added by the groupbytrace processor. The groupbytrace processor groups all incoming spans by trace and waits for the wait_duration before forwarding them to the ipfix_lookup processor.

Inside the ipfix_lookup processor, each span of the trace is then checked to see whether the IP and port quartet can be extracted. When the values (source.ip, source.port, destination.ip, destination.port, observer.ip) are found, the corresponding flow is searched for in Elasticsearch. For the time frame of the search, two considerations must be made.

Firstly, there is an ingest delay in any large distributed search engine. Because of this, the spans need to be pre-processed by the groupbytrace processor. The delay can be defined in the processors.groupbytrace.wait_duration value. Afterwards, the search can be started. The time window to search can be configured via processors.ipfix_lookup.timing.lookup_window. To keep the processor simple, the lookup_window is added before the start timestamp and after the end timestamp. This maximizes the chance of finding the NetFlow/IPFIX records that led to, or were caused by, this span.
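
As a minimal sketch (assuming lookup_window is given in seconds, which the example configuration does not state explicitly), the two timing settings interact roughly like this:

processors:
  groupbytrace:
    # Hold spans back long enough to cover the Elasticsearch ingest delay,
    # so the matching flow records are already searchable when the lookup runs.
    wait_duration: 100s
  ipfix_lookup:
    timing:
      # Widen the search interval on both sides of the span:
      # flows are searched from (span start - lookup_window)
      # to (span end + lookup_window).
      lookup_window: 120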

summary span

A summary span was added to simplify the display of the spans in Jaeger; all NetFlow/IPFIX spans are placed under it. As depicted in the screenshot, the summary span is highlighted in yellow and contains the TCP/IP quartet in its name. Both request and response are grouped under the same summary span.

The summary span also simplifies the ipfix_lookup processor, as the work can be split into two separate actions. First, the trace is checked for the IP/port quartet and summary spans are created. In the second step, the processor iterates through the summary spans and performs the IPFIX lookups.

@djaglowski
Member

Would you mind making a case for this being a connector vs a processor or receiver?

The only reason I bring up receiver as an option is because this was proposed previously and no argument was made against it. (Perhaps there is an obvious one but I'm not familiar with the protocol.)

If not a receiver, why not a processor? If I'm understanding correctly, it would only support traces. Therefore it's not clear that it needs to be a connector.

@fizzers123
Author

Hi @djaglowski

The reason it would be impossible to implement this as a receiver is that the context propagation information cannot be extracted from the NetFlow logs. The NetFlow/IPFIX logs only provide information up to OSI Layer 4, while context propagation headers such as traceparent live at OSI Layer 7.

The reason a connector was chosen is that new spans are inserted into an existing trace. With a processor, such a modification would have required the workaround described in the Why use a Connector? guide (if I understood correctly).

Historically, some processors transmitted data by making use of a work-around that follows a bad practice where a processor directly exports data after processing.
https://opentelemetry.io/docs/collector/build-connector/#why-use-a-connector

@djaglowski
Member

The reason a connector was chosen is that new spans are inserted into an existing trace. With a processor, such a modification would have required the workaround described in the Why use a Connector? guide (if I understood correctly).

Historically, some processors transmitted data by making use of a work-around that follows a bad practice where a processor directly exports data after processing.

I think there are two possible concerns to parse through here.

The first, as you cited, I think is not the same problem which is described there. That pattern was problematic because it emitted data directly to exporters, which meant there was no further opportunity to process the data. In this case, it would be possible to inject the generated spans directly into the original data stream (or replace the original altogether) and then continue processing both from there e.g. receiver -> proc 1 -> ipfixlookup -> proc 2 -> proc 3 -> exporter.
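
As a configuration sketch of that ordering (proc1, proc2, and proc3 are placeholders for arbitrary processors):

service:
  pipelines:
    traces:
      receivers: [otlp]
      # the spans generated by ipfixlookup continue through proc2 and proc3
      # together with the original spans
      processors: [proc1, ipfixlookup, proc2, proc3]
      exporters: [otlp]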

That said, the second consideration here is whether or not it is actually appropriate to do either of the above (replace the original data, or mix the generated into the original). In most situations, I would lean towards keeping generated data stream separate from the original data stream. This gives the user full control over whether to keep the original stream, keep both separate, or mix the two.

However, in this case you mentioned that we'd be generating spans which are part of the same trace. This sounds a lot like the generated and original data meaningfully belong together, but again I'm not familiar enough with the protocol to determine this. I think it would be helpful if you could clarify the following:

  1. Is the original data intended to be replaced by the generated data? Or, is it at least sometimes useful to keep both?
  2. If the answer to 1 is no (keep both generated and original data), do you think users may want to process the generated and original streams differently? Or, do you think both streams will generally be processed the same way?
  3. If the answers to 1 and 2 are no (keep both, process the same), is there any specific reason why the generated and original streams are semantically different, such that users should keep them separate?

@fizzers123
Author

The generated data are new IPFIX spans, which are part of an existing trace of spans. No original data is modified. Only new spans are added.

  1. It only makes sense to keep both. The new IPFIX spans are of little value without the original trace.
  2. I can't think of a use case where splitting the streams makes sense. Therefore, I believe they will generally be processed the same way.
  3. What exactly do you mean by semantically different?
    Would you consider spans from a Java app versus spans from a Reverse Proxy semantically different?
    3.1 If yes, the IPFIX spans should be kept separate. The IPFIX spans are just another source of spans.
    3.2 If no, they can be handled together with all the other spans.

@djaglowski
Member

  1. It only makes sense to keep both. The new IPFIX spans are of little value without the original trace.
  2. I can't think of a use case where splitting the streams makes sense. Therefore, I believe they will generally be processed the same way.

Thanks, based on these, I think a processor is probably appropriate. The only case where it would not be in my opinion would be based on the third question.

  3. What exactly do you mean by semantically different?
    Would you consider spans from a Java app versus spans from a Reverse Proxy semantically different?
    3.1 If yes, the IPFIX spans should be kept separate. The IPFIX spans are just another source of spans.
    3.2 If no, they can be handled together with all the other spans.

I didn't explain this well, but basically I'm asking if there's some other reason not to add the generated data directly into the original data stream. It sounds like there isn't a problem, so I would still say that a processor is appropriate here.

@ubaumann

Maybe to explain the use case (as far as I understand ;))

IPFIX or NetFlow are telemetry data about network packet flows. So, in this case, the ELK Stack contains all the metadata from the packets sent through the network. This provides a lot of observability information. With the right queries, you can see the path a single network packet took.

This project now aims to correlate an application trace with the exact network information. If I am looking in Jaeger at an API call, I usually see all the telemetry data from the application SDK. With this approach, the application trace gets enriched with the network information for that particular packet. I would see how long the API function runs; the function would make a DB or backend API call, and I would not only see how long it takes to get to the DB/backend, I would also see the exact path my request took over the network. This could show, for example, that we have performance issues when the path goes through the second load balancer, or that some other network connection is causing issues.
The goal is to add only the application-generated traffic from the network telemetry pool to OpenTelemetry.

As a user, I am really interested in seeing this come true. There would finally be far fewer cases of the application folks blaming the network :D
What makes this approach unique is extracting the network metadata from the request sent (source and destination IP, source and destination port), looking up the exact path, and adding it to the application trace.

What would be the problem/impact of implementing this as a processor versus a connector?

@djaglowski
Member

What would be the problem/impact if something is implemented as a processor or connector?

As far as I can tell, there wouldn't really be a difference for solving the use case, which is why I suggest a processor instead of a connector. A processor is easier to implement and, more importantly, easier to configure, because you don't have to worry about hooking pipelines up to one another. Connectors are great for certain things, but unless I'm missing something, I think one is unnecessary here and would therefore be unnecessarily complicated.
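
To illustrate the difference (the pipeline names and connector wiring below are only a sketch, not the proposed configuration): as a processor the lookup sits inline in a single traces pipeline, while a connector has to act as the exporter of one pipeline and the receiver of another.

# processor variant: one pipeline, no extra wiring
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [groupbytrace, ipfix_lookup]
      exporters: [otlp/jaeger]

# connector variant: two pipelines joined by the connector
service:
  pipelines:
    traces/in:
      receivers: [otlp]
      processors: [groupbytrace]
      exporters: [ipfix_lookup]   # connector acting as exporter
    traces/out:
      receivers: [ipfix_lookup]   # connector acting as receiver
      exporters: [otlp/jaeger]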

@fizzers123 changed the title from New component: Ipfixlookupconnector to New component: IPFIX Lookup on Nov 20, 2023
@SuniAve

SuniAve commented Nov 20, 2023

Hi @djaglowski

I work together with @fizzers123 on this project.
Thanks for your input. We agree that a processor would be the right component for our goal. We have migrated our code from a connector to a processor and are now working on further improvements.

This is an updated version of the illustration:
[Diagram: CorrelationUnitv2.drawio]

@fizzers123
Author

We have updated the implementation quite a bit and published our code here: https://github.com/fizzers123/opentelemetry-collector-contrib/tree/ipfix-processor-implementation/processor/ipfixlookupprocessor.

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
