KAFKA-15022: [2/N] introduce graph to compute min cost #13996

Merged
mjsax merged 9 commits into apache:trunk from lihaosky:min-cost-graph
Jul 20, 2023

Contributor

@lihaosky lihaosky commented Jul 11, 2023

Description

Introduce a graph to calculate the min-cost flow when an existing flow is already given as input.

Test

Unit test.

@mjsax mjsax added the streams and kip (Requires or implements a KIP) labels Jul 12, 2023
Member

@mjsax mjsax left a comment

Couple of initial comments.

Task :streams:checkstyleTest

[ant:checkstyle] [ERROR] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-13996/streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/GraphTest.java:47:30: ',' is not followed by whitespace. [WhitespaceAfter]

[ant:checkstyle] [ERROR] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-13996/streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/GraphTest.java:49:30: ',' is not followed by whitespace. [WhitespaceAfter]

[ant:checkstyle] [ERROR] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-13996/streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/GraphTest.java:158:19: Variable 'exception' should be declared final. [FinalLocalVariable]

[ant:checkstyle] [ERROR] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-13996/streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/GraphTest.java:169:19: Variable 'exception' should be declared final. [FinalLocalVariable]

[ant:checkstyle] [ERROR] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-13996/streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/GraphTest.java:182:19: Variable 'exception' should be declared final. [FinalLocalVariable]

[ant:checkstyle] [ERROR] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-13996/streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/GraphTest.java:194:19: Variable 'exception' should be declared final. [FinalLocalVariable]

[ant:checkstyle] [ERROR] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-13996/streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/GraphTest.java:206:19: Variable 'exception' should be declared final. [FinalLocalVariable]

[ant:checkstyle] [ERROR] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-13996/streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/GraphTest.java:218:19: Variable 'exception' should be declared final. [FinalLocalVariable]

Member

compareTo is to establish an order, right? Why do we order by (destination, capacity, cost)? Does it matter, or could we use any order as long as it's deterministic?

Contributor Author

Any order should be fine. But as you pointed out, we can actually remove this.

This comment was marked as resolved.

Contributor Author

I'll remove compareTo

Member

In Kafka, it's common practice to omit the get prefix on getters, so it should just be nodes() (similarly for other methods).

Member

Should this rather go into the constructor of Edge class?

Member

Same

Member

Why do we use a nested SortedMap? It seems SortedMap<V, List<Edge>> would be sufficient?

Contributor Author

It's for getting the edge between two nodes efficiently. For example, to get the edge between node 0 and node 1, we can do adjList.get(0).get(1).
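
The lookup described above can be sketched as follows. This is a minimal illustration, not the PR's code: the class name AdjacencySketch, the reduced Edge fields, and the addEdge/edge helpers are assumptions made for the example, and nodes are plain integers rather than the generic V.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the PR's Graph class; all names are illustrative.
final class AdjacencySketch {
    // Reduced edge: only the fields needed for this illustration.
    static final class Edge {
        final int destination;
        final int capacity;
        final int cost;

        Edge(final int destination, final int capacity, final int cost) {
            this.destination = destination;
            this.capacity = capacity;
            this.cost = cost;
        }
    }

    // source node -> (destination node -> edge)
    private final SortedMap<Integer, SortedMap<Integer, Edge>> adjList = new TreeMap<>();

    void addEdge(final int source, final int destination, final int capacity, final int cost) {
        adjList.computeIfAbsent(source, k -> new TreeMap<>())
               .put(destination, new Edge(destination, capacity, cost));
    }

    // Two map lookups instead of scanning a per-node list of edges.
    // Returns null when there is no edge from source to destination.
    Edge edge(final int source, final int destination) {
        final SortedMap<Integer, Edge> edges = adjList.get(source);
        return edges == null ? null : edges.get(destination);
    }
}
```

With a SortedMap<V, List<Edge>>, the same lookup would need a linear scan over the source node's edge list.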

Member

Why do we implement Comparable? Could not spot the reason why it's required.

Contributor Author

Good catch! I originally used SortedMap<V, SortedSet<Edge>> as adjList, which needed Edge to be sortable. Later I changed it to SortedMap<V, SortedMap<V, Edge>>, so we can remove this.

Member

Can you explain this condition?

Contributor Author

This means there can't be any flow on this edge, which essentially means these two nodes are disconnected.

Member

@mjsax mjsax left a comment

Few more questions.

Member

Looking into https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm that you linked from the KIP, the algorithm says "go over all edges" -- we need two for-loops to do this, but that is not totally obvious; might be worth adding a comment?

Member

From https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm: it says to do it N-1 times, but we do it N times. Why?

Contributor Author

Good question. N-1 passes will find the shortest paths; we need to do it one more time to see if a path can be made even shorter, which means there's a negative cycle. That's why there's a check on line 335.
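
The N-pass idea, and the two nested for-loops needed to "go over all edges" of an adjacency map, can be sketched roughly as below. This is an illustration under assumptions, not the PR's implementation: the class and method names are hypothetical, the graph is reduced to a source -> (destination -> cost) map, and capacities and flows are ignored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeSet;

// Hypothetical sketch of Bellman-Ford negative-cycle detection on a nested map.
final class BellmanFordSketch {
    // costs: source node -> (destination node -> edge cost)
    static boolean hasNegativeCycle(final SortedMap<Integer, SortedMap<Integer, Integer>> costs,
                                    final int source) {
        // Collect every node, including nodes that only appear as destinations.
        final Set<Integer> nodes = new TreeSet<>(costs.keySet());
        for (final SortedMap<Integer, Integer> out : costs.values()) {
            nodes.addAll(out.keySet());
        }

        // A missing entry means "not reached yet" (infinite distance).
        final Map<Integer, Long> distance = new HashMap<>();
        distance.put(source, 0L);

        // N passes: N-1 passes settle all shortest paths if there is no negative
        // cycle; a relaxation in pass N can only be caused by a negative cycle.
        for (int pass = 1; pass <= nodes.size(); pass++) {
            boolean relaxed = false;
            // "Go over all edges" takes two loops here: nodes, then their out-edges.
            for (final Map.Entry<Integer, SortedMap<Integer, Integer>> nodeEdges : costs.entrySet()) {
                final Long fromDistance = distance.get(nodeEdges.getKey());
                if (fromDistance == null) {
                    continue; // source of these edges is not reachable yet
                }
                for (final Map.Entry<Integer, Integer> edge : nodeEdges.getValue().entrySet()) {
                    final long candidate = fromDistance + edge.getValue();
                    final Long current = distance.get(edge.getKey());
                    if (current == null || candidate < current) {
                        distance.put(edge.getKey(), candidate);
                        relaxed = true;
                    }
                }
            }
            if (!relaxed) {
                return false; // distances are stable, so no negative cycle is reachable
            }
            if (pass == nodes.size()) {
                return true; // still relaxing in pass N
            }
        }
        return false;
    }
}
```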

Member

Can this be final? (cf comment below)

Member

Why do we need this update? Seems it would not change? Can we remove this line?

Member

Why do we need this update? Seems it would not change? Can we remove this line?

(We cannot make parentEdge final though, it seems, as it's set to a different value below.)

Contributor Author

Actually parentEdge is changing from line 283

Member

If I read this correctly, we are finding the minimum edge inside the cycle?

Contributor Author

Yeah. If you want to flow through all the edges in the cycle, the maximum you can flow is the minimum capacity (residualFlow) among all the edges.
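
As a small illustration of that bottleneck rule, the sketch below takes the minimum residual flow over the edges of a cycle. The class, the reduced Edge type, and the method name are assumptions for the example; only the names residualFlow and possibleFlow come from this thread.

```java
import java.util.List;

// Hypothetical sketch: how much flow can be pushed around a cycle.
final class CycleBottleneckSketch {
    // Reduced edge: only the remaining capacity matters here.
    static final class Edge {
        final int residualFlow;

        Edge(final int residualFlow) {
            this.residualFlow = residualFlow;
        }
    }

    // The pushable flow is limited by the tightest edge in the cycle.
    // For an empty list this returns Integer.MAX_VALUE, so callers should
    // only pass a real (non-empty) cycle.
    static int possibleFlow(final List<Edge> cycle) {
        int possibleFlow = Integer.MAX_VALUE;
        for (final Edge edge : cycle) {
            possibleFlow = Math.min(possibleFlow, edge.residualFlow);
        }
        return possibleFlow;
    }
}
```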

Member

I was first confused about why we don't start with nodeInCycle and do everything in a single loop -- now I understand that we do it to construct an "exit" criterion -- seems worth adding a comment to explain it upfront.

Member

Not sure if I get this condition. Can you elaborate?

Contributor Author

If there's some flow going in one direction, there should be the same amount of flow going in the opposite direction.

In the case of a forward edge, counterEdge.flow is always larger than possibleFlow because the forward edge's flow is the backward edge's residual flow, and possibleFlow is always smaller than or equal to the residual flow.

In the case of a backward edge, counterEdge.flow can be smaller than possibleFlow because the forward edge's residual flow is bounded by capacity. Actually, in the case where counterEdge.flow < possibleFlow, we also need to reset it to 0. I'll fix that.

Contributor Author

OK. After a closer look, I think updating the backward edge doesn't make much sense. So I added a forwardEdge boolean to Edge to indicate whether it's a forward edge, and only update the flow on forward edges.

Member

Not sure what this part does.

Contributor Author

This updates the original graph's edges with the computed flow. Basically, it iterates over all edges in the original graph and sets each flow to the value in the corresponding edge of residualGraph.

Member

@mjsax mjsax left a comment

A couple of smaller comments/questions. They can be addressed in a follow-up PR (or, if you want, inside N/3).

int totalCost = 0;
for (final Map.Entry<V, SortedMap<V, Edge>> nodeEdges : adjList.entrySet()) {
    final SortedMap<V, Edge> edges = nodeEdges.getValue();
    for (final Entry<V, Edge> nodeEdge : edges.entrySet()) {
Member

nit: we could just iterate over the values (i.e., values()) instead of the entry set?
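
A sketch of what that nit could look like, using Map.values() on both levels of the nested map. The loop body is not visible in the excerpt above, so the accumulation cost * flow is an assumption, as are the class name and the reduced Edge type.

```java
import java.util.SortedMap;

// Hypothetical sketch: summing a per-edge cost without touching the map keys.
final class TotalCostSketch {
    static final class Edge {
        final int cost;
        final int flow;

        Edge(final int cost, final int flow) {
            this.cost = cost;
            this.flow = flow;
        }
    }

    static int totalCost(final SortedMap<Integer, SortedMap<Integer, Edge>> adjList) {
        int totalCost = 0;
        // The keys are never used, so iterate over values() on both levels.
        for (final SortedMap<Integer, Edge> edges : adjList.values()) {
            for (final Edge edge : edges.values()) {
                totalCost += edge.cost * edge.flow; // assumed accumulation
            }
        }
        return totalCost;
    }
}
```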

}

public Edge(final V destination, final int capacity, final int cost, final int residualFlow, final int flow,
final boolean forwardEdge) {
Member

nit: formatting (if it does not fit on one line, we should move each parameter onto its own line to simplify reading)

public Edge(
    final V destination,
    final int capacity,
    final int cost,
    final int residualFlow,
    final int flow,
    final boolean forwardEdge
) {


final Graph<?>.Edge otherEdge = (Graph<?>.Edge) other;

return destination.equals(otherEdge.destination) && capacity == otherEdge.capacity
Member

nit: formatting

return destination.equals(otherEdge.destination)
    && capacity == otherEdge.capacity
    && cost == otherEdge.cost
    && residualFlow == otherEdge.residualFlow
    && flow == otherEdge.flow
    && forwardEdge == otherEdge.forwardEdge;


@Override
public String toString() {
    return "{destination= " + destination + ", capacity=" + capacity + ", cost=" + cost
Member

return "Edge: {...}" ?

@Override
public String toString() {
    return "{destination= " + destination + ", capacity=" + capacity + ", cost=" + cost
        + ", residualFlow=" + residualFlow + ", flow=" + flow + ", forwardEdge=" + forwardEdge;
Member

missing }.

Should we also switch to one line per parameter we print to simplify reading?
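
Taken together, the toString() suggestions (an "Edge: " prefix, the missing closing brace, one printed field per line) might look like the sketch below. The class name is hypothetical and the destination is an int rather than the generic V; the field names and constructor order follow the excerpts quoted in this review.

```java
// Hypothetical stand-in for the PR's Edge class, reduced to what toString() needs.
final class EdgeToStringSketch {
    final int destination;
    final int capacity;
    final int cost;
    final int residualFlow;
    final int flow;
    final boolean forwardEdge;

    EdgeToStringSketch(
        final int destination,
        final int capacity,
        final int cost,
        final int residualFlow,
        final int flow,
        final boolean forwardEdge
    ) {
        this.destination = destination;
        this.capacity = capacity;
        this.cost = cost;
        this.residualFlow = residualFlow;
        this.flow = flow;
        this.forwardEdge = forwardEdge;
    }

    @Override
    public String toString() {
        return "Edge: {"
            + "destination=" + destination
            + ", capacity=" + capacity
            + ", cost=" + cost
            + ", residualFlow=" + residualFlow
            + ", flow=" + flow
            + ", forwardEdge=" + forwardEdge
            + "}";
    }
}
```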

graph.addEdge(4, 2, 1, 0, 1);
graph.addEdge(1, 5, 1, 0, 1);
graph.addEdge(3, 5, 1, 0, 1);
graph.setSourceNode(4);
Member

@mjsax mjsax Jul 20, 2023

Would it be simpler to use 0 as source? (At least for my mind it's easier to follow what's going on if numbers are "ordered".)

Or to avoid a lot of rewriting, name the source -1 (and the sink 99) so both are clearly different, and we don't need to update too much code.

Contributor Author

Sounds good. Will use -1 and 99 then

final V destination;
final int capacity;
final int cost;
int residualFlow;
Member

Do we actually need this field?

If I read the code right, for a forward edge it's always capacity - flow, and for a backward edge it's always 0. So it seems redundant (and potentially error-prone to store it explicitly)? -- Instead we could have a residualFlow() method that computes it on the fly (we could also simplify the update logic when modifying flow, as we only need to update the flow itself)? Or do I read the update logic inside cancelNegativeCycle incorrectly, and those properties are not invariants?
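
The proposal can be sketched as below, assuming the invariant exactly as stated in this comment (capacity - flow for a forward edge, 0 for a backward edge). The thread does not confirm that invariant, and the later remark about parameter order suggests the reading may have been off, so this illustrates the idea only and says nothing about how the merged code behaves. The class name is hypothetical.

```java
// Hypothetical sketch: derive the residual flow instead of storing it.
final class DerivedResidualSketch {
    final int capacity;
    final boolean forwardEdge;
    int flow; // the only mutable state

    DerivedResidualSketch(final int capacity, final boolean forwardEdge) {
        this.capacity = capacity;
        this.forwardEdge = forwardEdge;
    }

    // Computed on the fly under the ASSUMED invariant from the comment above,
    // so an update to flow can never leave a stale residual value behind.
    int residualFlow() {
        return forwardEdge ? capacity - flow : 0;
    }
}
```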

Contributor Author

Member

Ah. Parameter order... I thought it was flow, residualFlow (not sure why you picked the "reverse" order).

edge = edges.get(1);
assertEquals(1, edge.flow);
assertEquals(0, edge.residualFlow);
}
Member

Seems we do not check all conditions? Flow from 0->1 is shifted to 0->3, right?

Contributor Author

0->3 flow is checked on line 269?

graph1.addEdge(4, 6, 1, 0, 1);

graph1.setSourceNode(5);
graph1.setSinkNode(6);
Member

As above

edgeList.add(new TestEdge(4, 5, 1, 1, 0));

// Test no matter the order of adding edges, min cost flow flows from 0 to 2 and then from 2 to 5
for (int i = 0; i < 10; i++) {
Member

Why do we need to test this 10 times? Given that the test runs for each PR on nightly builds, it seems sufficient to just run it once?

Contributor Author

I want to test it more in case someone triggers this manually. I feel 10 times won't hurt much, but I'm fine switching to 1 if you feel strongly 😄

Member

Test should run fast. Don't feel strongly about it.

@mjsax mjsax merged commit 6bb88ae into apache:trunk Jul 20, 2023
jeqo pushed a commit to jeqo/kafka that referenced this pull request Jul 21, 2023
Part of KIP-925.

Reviewers: Matthias J. Sax <matthias@confluent.io>
Cerchie pushed a commit to Cerchie/kafka that referenced this pull request Jul 25, 2023
Part of KIP-925.

Reviewers: Matthias J. Sax <matthias@confluent.io>
@lihaosky lihaosky deleted the min-cost-graph branch August 8, 2023 17:09
jeqo pushed a commit to aiven/kafka that referenced this pull request Aug 15, 2023
Part of KIP-925.

Reviewers: Matthias J. Sax <matthias@confluent.io>