Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add throughput calculation logic #270

Merged
merged 1 commit into from
Dec 10, 2021

Conversation

heanlan
Copy link
Contributor

@heanlan heanlan commented Oct 26, 2021

  • Background
    Originally we don't provide throughput information in the IPFIX record, we compute throughput(byte/s) at the front end, e.g. aggregating octetDeltaCount of all the records of a connection in last 60s, and divide by 60s. In order to decrease the computation complexity at the front end, we decide to add throughput calculation into flow-aggregator, and report the latest throughput in the form of IPFIX record fields. We take this PR as the first step of supporting flow-aggregator's throughput calculation, by simply adding the fields and basic calculation logic. As a TODO, we will consider making changes on when to trigger the records exporting, based on whether we receive the equal number of records from both source/destination nodes, to make the throughput more accurate.

  • Methods
    For each connection, we calculate throughput, reverseThroughput, node throughput, reverse node throughput in the aggregation process.
    Node throughput of an aggregated flow record is defined as: diff(octetTotalCount) / diff(flowEndSeconds) between the newly-received record and the last received record from the same node.
    Common field throughput is defined as: if the newly-received record has the latest flowEndSeconds timestamp, common field throughput will be updated by the node-throughput calculated by the newly-received record.

  • Fields added to IPFIX record:

antreaFlowEndSecondsElementList = []string{
"flowEndSecondsFromSourceNode",
"flowEndSecondsFromDestinationNode",
}

antreaThroughputElementList = []string{
"throughput",
"reverseThroughput",
}

antreaSourceThroughputElementList = []string{
"throughputFromSourceNode",
"reverseThroughputFromSourceNode",
}

antreaDestinationThroughputElementList = []string{
"throughputFromDestinationNode",
"reverseThroughputFromDestinationNode",
}
  • Rationale of adding the fields:
    • antreaFlowEndSecondsElementList are added as I suppose, when we report throughputFromNode values, it makes more sense if they are bonding with timestampsFromNode. For example, in the case when we receive n records from src node, while only n-1 records from dst node, and we need to export the aggregated record now due to active timeout. If we don't provide flowEndSecondsFromSourceNode, flowEndSecondsFromDestinationNode, the front end will treat throughputFromSourceNode and throughputFromDestinationNode both w.r.t the common field flowEndSeconds, which is updated by the n-th record from src node. In fact, we haven't received the record from the destination node w.r.t. the time flowEndSeconds.

    • antreaThroughputElementList, antreaSourceThroughputElementList, antreaDestinationThroughputElementList are added as we have the requirements to display per-connection throughput, reverse throughput, node throughput, reverse node throughput information at the front end. They are grouped in three separate groups in order to make use of their index to make the code more compact and clear, reference to statsElementList, antreaSourceStatsElementList, antreaDestinationStatsElementList.

Fixes: #265

@heanlan heanlan marked this pull request as draft October 26, 2021 21:21
@heanlan heanlan force-pushed the throughput-calculation branch 4 times, most recently from af88ed6 to 5b3d1eb Compare November 2, 2021 20:32
@codecov
Copy link

codecov bot commented Nov 2, 2021

Codecov Report

Merging #270 (828b7d6) into main (6cb56c8) will decrease coverage by 0.77%.
The diff coverage is 65.16%.

❗ Current head 828b7d6 differs from pull request most recent head 9fe6033. Consider uploading reports for the commit 9fe6033 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #270      +/-   ##
==========================================
- Coverage   73.80%   73.03%   -0.78%     
==========================================
  Files          18       18              
  Lines        2634     2792     +158     
==========================================
+ Hits         1944     2039      +95     
- Misses        543      580      +37     
- Partials      147      173      +26     
Flag Coverage Δ
integration-tests 50.89% <51.76%> (-0.83%) ⬇️
unit-tests 71.74% <65.16%> (-0.63%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/collector/tcp.go 63.75% <50.00%> (-0.36%) ⬇️
pkg/intermediate/aggregate.go 70.78% <63.85%> (-4.58%) ⬇️
pkg/registry/registry_antrea.go 100.00% <100.00%> (ø)

@heanlan heanlan force-pushed the throughput-calculation branch 8 times, most recently from b7db918 to b2e51bc Compare November 8, 2021 20:37
@heanlan heanlan marked this pull request as ready for review November 8, 2021 20:44
@heanlan
Copy link
Contributor Author

heanlan commented Nov 8, 2021

@srikartati @zyiou could you please help take a look at the overall logic to see whether it's on the right track, mainly in aggregate.go? Thanks

I know there's a lot to be optimized on the coding style, e.g. avoiding duplicated code. I'll work on that at the same time.

pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate_test.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
pkg/intermediate/aggregate.go Show resolved Hide resolved
pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
@heanlan heanlan force-pushed the throughput-calculation branch 3 times, most recently from 02b20b2 to 7eef33c Compare November 15, 2021 22:40
@heanlan heanlan marked this pull request as draft November 16, 2021 20:25
@heanlan heanlan force-pushed the throughput-calculation branch 13 times, most recently from 539f3c9 to 6b020cf Compare November 23, 2021 20:12
@heanlan heanlan force-pushed the throughput-calculation branch 3 times, most recently from 828b7d6 to 0a2140c Compare November 29, 2021 18:16
Copy link
Contributor Author

@heanlan heanlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zyiou @dreamtalen , thanks for the reviews.
I made the corresponding changes on Antrea side in PR2692, along with some small changes in this PR since last review, which are pointed out below.
I apologized for mixing it up with the previous commit. Please take a look when you have time. Thanks!

pkg/intermediate/aggregate.go Outdated Show resolved Hide resolved
}

// Initialize flowEndSecondsFromSourceNode and flowEndSecondsFromDestinationNode.
for _, ieName := range antreaFlowEndSecondsElements {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change: Adding a group antreaFlowEndSecondsElements instead of "hard-coding" the fields name. Also to make the pattern more consistent with other extra fields at flowaggregator.go on Antrea side - code


// Initialize the throughput elements.
var incomingVal, reverseIncomingVal uint64
// For the edge case when the record has the same timeEnd and timeStart values,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change: Adding an edge case handling, since for some of the conntrack connections, when the start time is zero, both start and end time will be set reset to the current time - code

@heanlan heanlan marked this pull request as ready for review November 30, 2021 01:46
@heanlan heanlan requested a review from zyiou November 30, 2021 01:47
Signed-off-by: heanlan <hanlan@vmware.com>
Copy link
Contributor

@zyiou zyiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@heanlan heanlan merged commit a1a8e81 into vmware:main Dec 10, 2021
heanlan added a commit to heanlan/go-ipfix that referenced this pull request Dec 10, 2021
Add throughput calculation into flow-aggregator aggregation process.
Signed-off-by: heanlan <hanlan@vmware.com>
zyiou pushed a commit that referenced this pull request Dec 10, 2021
Add throughput calculation into flow-aggregator aggregation process.
Signed-off-by: heanlan <hanlan@vmware.com>
@heanlan heanlan deleted the throughput-calculation branch December 10, 2021 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add connection throughput calculation into aggregation process
4 participants