refactor: (codelash) ⚡️ Speed up function find_all_cycle_edges by 17% #5389

Open, wants to merge 2 commits into main

Conversation

misrasaurabh1
Contributor

📄 find_all_cycle_edges in src/backend/base/langflow/graph/graph/utils.py

✨ Performance Summary:

  • Speed Increase: 📈 17% (0.17x faster)
  • Runtime Reduction: ⏱️ From 2.08 milliseconds down to 1.78 milliseconds (best of 44 runs)
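
The two figures above agree with each other if the speedup is read as old runtime divided by new runtime, minus one. That definition is an assumption inferred from the numbers in this summary, and `speedup_percent` is a hypothetical helper for illustration, not part of Codeflash or Langflow.

```python
def speedup_percent(old_ms: float, new_ms: float) -> float:
    """Relative speedup in percent, assuming speedup = old / new - 1."""
    if new_ms <= 0:
        raise ValueError("new_ms must be positive")
    return (old_ms / new_ms - 1.0) * 100.0


# Runtimes quoted in the summary above (milliseconds).
reported = speedup_percent(2.08, 1.78)
print(f"{reported:.1f}% faster")
```

Under this reading the 17% is measured relative to the new runtime. Measured relative to the old runtime, the same change would be a smaller percentage.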

📝 Explanation and details

Here is the optimized version of the given program. The major optimization here is to avoid unnecessary list concatenations in the DFS recursion by using a more efficient approach for aggregating cycle edges.
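
As a rough illustration of the kind of change described above, the sketch below contrasts a DFS that builds a new list at every recursion level with one that appends to a single shared list. It is not the Langflow implementation: the function names `cycle_edges_concat` and `cycle_edges_shared`, the helper `build_adjacency`, and the back-edge definition of a "cycle edge" are all assumptions made for the example.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to the list of its successors."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return graph


def cycle_edges_concat(entry_point, edges):
    """Hypothetical baseline: every call returns a freshly concatenated list."""
    graph = build_adjacency(edges)
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        found = []
        for nxt in graph[node]:
            if nxt not in visited:
                found = found + dfs(nxt)  # allocates a new list each time
            elif nxt in on_stack:
                found.append((node, nxt))  # back edge closes a cycle
        on_stack.remove(node)
        return found

    return dfs(entry_point)


def cycle_edges_shared(entry_point, edges):
    """Hypothetical variant: all calls append to one shared accumulator."""
    graph = build_adjacency(edges)
    visited, on_stack = set(), set()
    found = []

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt not in visited:
                dfs(nxt)
            elif nxt in on_stack:
                found.append((node, nxt))  # no intermediate lists created
        on_stack.remove(node)

    dfs(entry_point)
    return found


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "D"), ("D", "B")]
print(cycle_edges_shared("A", edges))
```

Both variants visit nodes in the same order, so they should report the same edges in the same order. The shared accumulator only removes the repeated list copies on the way back up the recursion.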


Correctness verification

The new optimized code was tested for correctness. The results are listed below:

| Test | Status | Details |
|---|---|---|
| ⚙️ Existing Unit Tests | 51 Passed | See below |
| 🌀 Generated Regression Tests | 42 Passed | See below |
| ⏪ Replay Tests | 🔘 None Found | |
| 🔎 Concolic Coverage Tests | 🔘 None Found | |
| 📊 Coverage | 100.0% | |

⚙️ Existing Unit Tests Details

- graph/graph/test_utils.py

🌀 Generated Regression Tests Details

from collections import defaultdict

# imports
import pytest  # used for our unit tests
from langflow.graph.graph.utils import find_all_cycle_edges

# unit tests

# Basic Functionality
def test_single_node_no_edges():
    # Test with a single node and no edges
    codeflash_output = find_all_cycle_edges('A', [])

def test_single_edge_no_cycle():
    # Test with a single edge and no cycle
    codeflash_output = find_all_cycle_edges('A', [('A', 'B')])

def test_single_edge_with_cycle():
    # Test with a single edge that forms a cycle
    codeflash_output = find_all_cycle_edges('A', [('A', 'A')])

# Simple Cycles
def test_two_nodes_one_cycle():
    # Test with two nodes forming a cycle
    codeflash_output = find_all_cycle_edges('A', [('A', 'B'), ('B', 'A')])

def test_three_nodes_one_cycle():
    # Test with three nodes forming a cycle
    codeflash_output = find_all_cycle_edges('A', [('A', 'B'), ('B', 'C'), ('C', 'A')])

# Complex Cycles
def test_multiple_cycles():
    # Test with multiple cycles in the graph
    codeflash_output = find_all_cycle_edges('A', [('A', 'B'), ('B', 'C'), ('C', 'A'), ('B', 'D'), ('D', 'B')])

def test_interconnected_cycles():
    # Test with interconnected cycles in the graph
    codeflash_output = find_all_cycle_edges('A', [('A', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'D'), ('D', 'E'), ('E', 'C')])

# Disconnected Graph
def test_disconnected_components_no_cycles():
    # Test with disconnected components and no cycles
    codeflash_output = find_all_cycle_edges('A', [('A', 'B'), ('C', 'D')])

def test_disconnected_components_one_cycle():
    # Test with disconnected components and one cycle
    codeflash_output = find_all_cycle_edges('A', [('A', 'B'), ('B', 'A'), ('C', 'D')])

# Large Graphs
def test_large_acyclic_graph():
    # Test with a large acyclic graph
    edges = [(chr(i), chr(i+1)) for i in range(ord('A'), ord('Z'))]
    codeflash_output = find_all_cycle_edges('A', edges)

def test_large_cyclic_graph():
    # Test with a large cyclic graph
    edges = [(chr(i), chr(i+1)) for i in range(ord('A'), ord('Z'))] + [('Z', 'A')]
    codeflash_output = find_all_cycle_edges('A', edges)

# Edge Cases
def test_entry_point_not_in_graph():
    # Test when the entry point is not in the graph
    codeflash_output = find_all_cycle_edges('X', [('A', 'B'), ('B', 'C')])

def test_graph_with_only_entry_point():
    # Test with a graph that has only the entry point and a self-loop
    codeflash_output = find_all_cycle_edges('A', [('A', 'A')])

def test_graph_with_multiple_self_loops():
    # Test with a graph that has multiple self-loops
    codeflash_output = find_all_cycle_edges('A', [('A', 'A'), ('B', 'B'), ('C', 'C')])

# Performance and Scalability
def test_very_large_graph():
    # Test with a very large graph to assess performance
    edges = [(chr(i), chr(i+1)) for i in range(ord('A'), ord('Z'))] + [('Z', 'A')]
    codeflash_output = find_all_cycle_edges('A', edges)

# Directed Acyclic Graph (DAG)
def test_dag_no_cycles():
    # Test with a Directed Acyclic Graph (DAG)
    codeflash_output = find_all_cycle_edges('A', [('A', 'B'), ('B', 'C'), ('C', 'D')])

# Multiple Entry Points
def test_different_entry_points():
    # Test with different entry points in the same graph
    edges = [('A', 'B'), ('B', 'C'), ('C', 'A')]
    codeflash_output = find_all_cycle_edges('A', edges)
    codeflash_output = find_all_cycle_edges('B', edges)
    codeflash_output = find_all_cycle_edges('C', edges)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from collections import defaultdict

# imports
import pytest  # used for our unit tests
from langflow.graph.graph.utils import find_all_cycle_edges

# unit tests

def test_basic_single_cycle():
    # Graph: A -> B -> C -> A
    entry_point = 'A'
    edges = [('A', 'B'), ('B', 'C'), ('C', 'A')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_basic_multiple_cycles():
    # Graph: A -> B -> C -> A, B -> D -> B
    entry_point = 'A'
    edges = [('A', 'B'), ('B', 'C'), ('C', 'A'), ('B', 'D'), ('D', 'B')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_basic_no_cycles():
    # Graph: A -> B -> C
    entry_point = 'A'
    edges = [('A', 'B'), ('B', 'C')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_edge_empty_graph():
    # Empty graph
    entry_point = 'A'
    edges = []
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_edge_single_node_no_edges():
    # Single node with no edges
    entry_point = 'A'
    edges = []
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_edge_single_node_self_loop():
    # Single node with a self-loop
    entry_point = 'A'
    edges = [('A', 'A')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_edge_disconnected_graph():
    # Disconnected graph: A -> B, C -> D -> C
    entry_point = 'A'
    edges = [('A', 'B'), ('C', 'D'), ('D', 'C')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)
    entry_point = 'C'
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_complex_multiple_entry_points():
    # Graph: A -> B -> C -> A, D -> E -> F -> D
    entry_point = 'A'
    edges = [('A', 'B'), ('B', 'C'), ('C', 'A'), ('D', 'E'), ('E', 'F'), ('F', 'D')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)
    entry_point = 'D'
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_complex_interlinked_cycles():
    # Graph: A -> B -> C -> A, B -> D -> E -> B
    entry_point = 'A'
    edges = [('A', 'B'), ('B', 'C'), ('C', 'A'), ('B', 'D'), ('D', 'E'), ('E', 'B')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_large_acyclic_graph():
    # Large acyclic graph
    entry_point = 'A'
    edges = [(chr(65 + i), chr(66 + i)) for i in range(1000)]
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_large_cyclic_graph():
    # Large cyclic graph
    entry_point = 'A'
    edges = [(chr(65 + i), chr(66 + i)) for i in range(999)] + [('Z', 'A')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_multiple_components_isolated_cycles():
    # Graph: A -> B -> C -> A, D -> E -> F -> D
    entry_point = 'A'
    edges = [('A', 'B'), ('B', 'C'), ('C', 'A'), ('D', 'E'), ('E', 'F'), ('F', 'D')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)
    entry_point = 'D'
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_mixed_acyclic_and_cyclic_components():
    # Graph: A -> B -> C -> A, D -> E -> F
    entry_point = 'A'
    edges = [('A', 'B'), ('B', 'C'), ('C', 'A'), ('D', 'E'), ('E', 'F')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)
    entry_point = 'D'
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_complex_nested_cycles():
    # Graph: A -> B -> C -> D -> B, C -> E -> F -> C
    entry_point = 'A'
    edges = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'B'), ('C', 'E'), ('E', 'F'), ('F', 'C')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_complex_back_edges():
    # Graph: A -> B -> C -> D, D -> B
    entry_point = 'A'
    edges = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'B')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_redundant_edges():
    # Graph with parallel edges: A -> B, A -> B, B -> C -> A
    entry_point = 'A'
    edges = [('A', 'B'), ('A', 'B'), ('B', 'C'), ('C', 'A')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)

def test_self_loops_and_multiple_cycles():
    # Graph with self-loops and multiple cycles: A -> A, B -> C -> D -> B, E -> F -> G -> E
    entry_point = 'A'
    edges = [('A', 'A'), ('B', 'C'), ('C', 'D'), ('D', 'B'), ('E', 'F'), ('F', 'G'), ('G', 'E')]
    codeflash_output = find_all_cycle_edges(entry_point, edges)
    entry_point = 'B'
    codeflash_output = find_all_cycle_edges(entry_point, edges)
    entry_point = 'E'
    codeflash_output = find_all_cycle_edges(entry_point, edges)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

📣 **Feedback**

If you have any feedback or need assistance, feel free to join our Discord community:

Discord

@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Dec 20, 2024
@github-actions github-actions bot added the refactor Maintenance tasks and housekeeping label Dec 20, 2024
@github-actions github-actions bot added refactor Maintenance tasks and housekeeping and removed refactor Maintenance tasks and housekeeping labels Dec 20, 2024

codspeed-hq bot commented Dec 20, 2024

CodSpeed Performance Report

Merging #5389 will degrade performances by 57.93%

Comparing codeflash-ai:codeflash/optimize-find_all_cycle_edges-2024-12-11T14.37.21 (8a42cc0) with main (243055e)

Summary

⚡ 1 improvements
❌ 4 regressions
✅ 10 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | codeflash-ai:codeflash/optimize-find_all_cycle_edges-2024-12-11T14.37.21 | Change |
|---|---|---|---|
| test_get_and_cache_all_types_dict | 1 ms | 2.4 ms | -57.93% |
| test_setup_llm_caching | 2.1 ms | 1.2 ms | +77.28% |
| test_successful_run_with_input_type_any | 262.7 ms | 320 ms | -17.91% |
| test_successful_run_with_output_type_any | 203 ms | 248.4 ms | -18.29% |
| test_successful_run_with_output_type_debug | 225.1 ms | 258 ms | -12.76% |
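
The Change column above looks consistent with the difference between the base and branch timings, divided by the branch timing. That formula is inferred from the rows of this report rather than taken from CodSpeed documentation, and `relative_change` is a hypothetical helper. The displayed timings are rounded, so rows with small values (for example 1 ms vs 2.4 ms) reproduce only approximately.

```python
def relative_change(base_ms: float, branch_ms: float) -> float:
    """Percent change, assuming change = (base - branch) / branch.

    Negative values mean the branch is slower than the base.
    """
    if branch_ms <= 0:
        raise ValueError("branch_ms must be positive")
    return (base_ms - branch_ms) / branch_ms * 100.0


# (benchmark, main ms, branch ms) for the three slower end-to-end runs above.
rows = [
    ("test_successful_run_with_input_type_any", 262.7, 320.0),
    ("test_successful_run_with_output_type_any", 203.0, 248.4),
    ("test_successful_run_with_output_type_debug", 225.1, 258.0),
]
for name, base_ms, branch_ms in rows:
    print(f"{name}: {relative_change(base_ms, branch_ms):+.2f}%")
```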
