hlinsen (Contributor) commented Oct 14, 2025

This PR fixes the uninitialized depot pred/succ (predecessor/successor) links when order locations are set.
It also fixes the distance cost computation in the order-location case, where the exclusive scan ran over the wrong number of nodes. The scan now uses the correct number of nodes in both cases.
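
As a rough illustration of why the scan length matters, here is a minimal host-side C++ sketch (the PR's own code is CUDA). It assumes a route laid out as start depot, n_nodes orders, end depot, so n_nodes + 2 positions, which matches the n_nodes + 2 scan size reported in the review summaries further down this thread. The helper name cumulative_distances and the edge_cost layout are hypothetical and are not taken from the PR's sources.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the PR): cumulative distance at every position
// of a route laid out as [start depot, order_1, ..., order_n, end depot].
// edge_cost[i] is the distance from position i to position i + 1; the last
// slot (index n_nodes + 1) is padding, since the end depot has no successor.
std::vector<double> cumulative_distances(const std::vector<double>& edge_cost, int n_nodes)
{
  const std::size_t n_positions = static_cast<std::size_t>(n_nodes) + 2;
  assert(edge_cost.size() >= n_positions);

  std::vector<double> cumulative(n_positions, 0.0);
  // The exclusive scan must cover all n_nodes + 2 positions so that the end
  // depot's entry (the total route distance) is actually written.
  std::exclusive_scan(edge_cost.begin(),
                      edge_cost.begin() + static_cast<std::ptrdiff_t>(n_positions),
                      cumulative.begin(),
                      0.0);
  return cumulative;
}
```

With edge costs {2, 1, 4, 3} plus the padding slot and n_nodes = 3, the last entry of the result is the full route distance, 10. A scan over only n_nodes + 1 entries would stop one position short and never produce that final value.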

Summary by CodeRabbit

  • Refactor
    • Simplified internal route copying mechanism by removing explicit parameters.
    • Adjusted sliding TSP local search algorithm to consider a different set of orders during optimization.
    • Modified memory allocation strategy for routing computations.

hlinsen added this to the 25.12 milestone Oct 14, 2025
hlinsen requested a review from a team as a code owner October 14, 2025 23:56
hlinsen added the bug (Something isn't working) label Oct 14, 2025
hlinsen added the non-breaking (Introduces a non-breaking change) label Oct 14, 2025
hlinsen changed the base branch from branch-25.10 to branch-25.12 October 14, 2025 23:56
hlinsen requested review from a team as code owners October 14, 2025 23:56
hlinsen requested review from msarahan and rgsl888prabhu and removed request for a team October 14, 2025 23:56

greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR fixes critical bugs in the TSP (Traveling Salesman Problem) sliding moves algorithm when order locations are explicitly set. The changes address two main issues: uninitialized depot predecessor/successor relationships and incorrect buffer sizing for distance calculations.

The fix involves modifying copy_to_tsp_route() in route.cuh to remove the depot_included parameter and always properly initialize depot relationships using actual start/end node information rather than assuming depot is at index 0. In sliding_tsp.cu, the exclusive scan buffer size is corrected from n_nodes+1 to n_nodes+2 to prevent buffer overflows, and the algorithm now uses get_num_depot_excluded_orders() instead of get_num_orders() to get the correct node count for depot-excluded scenarios.

These changes ensure the TSP optimization works correctly regardless of whether depots are included in the route structure, fixing distance cost computation issues that could lead to suboptimal or incorrect routing solutions.
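
To make the depot pred/succ issue concrete, the following host-side C++ sketch shows the general pattern the summary describes: the endpoint links are written unconditionally from the route's actual first and last nodes instead of assuming the depot has index 0. The TspLinks struct, the copy_to_links function, and the kNoNode sentinel are hypothetical names chosen for illustration; this is not the body of copy_to_tsp_route().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel for "no neighbour on this side".
constexpr int kNoNode = -1;

// Hypothetical stand-in for a TSP route stored as predecessor/successor links,
// indexed by node id.
struct TspLinks {
  std::vector<int> pred;
  std::vector<int> succ;
};

// route holds node ids in visiting order: route.front() is the start node and
// route.back() is the end node. num_ids is the size of the node-id space.
TspLinks copy_to_links(const std::vector<int>& route, int num_ids)
{
  assert(route.size() >= 2);
  TspLinks links{std::vector<int>(num_ids, kNoNode), std::vector<int>(num_ids, kNoNode)};
  const std::size_t last = route.size() - 1;

  // Interior nodes: link each one to both of its neighbours.
  for (std::size_t i = 1; i < last; ++i) {
    links.pred[route[i]] = route[i - 1];
    links.succ[route[i]] = route[i + 1];
  }

  // Endpoints: always initialised, and looked up from the route itself rather
  // than hard-coded to node id 0.
  links.succ[route.front()] = route[1];
  links.pred[route.back()]  = route[last - 1];
  return links;
}
```

For a route such as {5, 2, 7, 3}, the start node is 5, so its successor link is set even though node id 0 never appears in the route. A version that only initialised the links of node 0 would leave the links of node 5 unset in this case.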

Important Files Changed

Changed Files
| Filename | Score | Overview |
|---|---|---|
| cpp/src/routing/local_search/sliding_tsp.cu | 4/5 | Fixed TSP sliding moves by correcting scan buffer size and using proper node count calculation |
| cpp/src/routing/route/route.cuh | 4/5 | Removed depot_included parameter and ensured depot pred/succ are always properly initialized |

Confidence score: 4/5

  • This PR addresses well-defined bugs with targeted fixes that improve algorithm correctness
  • Score reflects solid understanding of the problem with clear fixes, though testing coverage for edge cases is not visible
  • Pay close attention to both files as they contain critical changes to core TSP optimization logic

Sequence Diagram

sequenceDiagram
    participant User
    participant LocalSearch as "Local Search"
    participant TSPSolver as "TSP Solver"
    participant Route as "Route"
    participant MoveCandidate as "Move Candidates"
    participant GPU as "GPU Kernels"

    User->>LocalSearch: "Request TSP optimization"
    LocalSearch->>TSPSolver: "perform_sliding_tsp(solution, candidates)"
    TSPSolver->>Route: "check_routes_can_insert_and_get_sh_size()"
    Route-->>TSPSolver: "shared memory size"
    
    TSPSolver->>TSPSolver: "compute_max_active()"
    TSPSolver->>MoveCandidate: "resize_temp_storage()"
    MoveCandidate-->>TSPSolver: "temp storage ready"
    
    TSPSolver->>GPU: "fill_reverse_distances_kernel()"
    GPU->>Route: "compute reverse distances"
    Route-->>GPU: "distances computed"
    GPU-->>TSPSolver: "reverse distances ready"
    
    TSPSolver->>GPU: "compute_cumulative_distances(reverse=true)"
    GPU-->>TSPSolver: "cumulative distances computed"
    
    TSPSolver->>GPU: "find_sliding_moves_tsp()"
    GPU->>Route: "evaluate sliding window moves"
    Route->>Route: "eval_move() for each candidate"
    Route-->>GPU: "move evaluations"
    GPU-->>TSPSolver: "best moves found"
    
    TSPSolver->>GPU: "set_moved_regions_kernel()"
    GPU->>Route: "mark impacted regions"
    Route-->>GPU: "regions marked"
    
    TSPSolver->>GPU: "execute_sliding_moves_tsp()"
    GPU->>Route: "copy_to_tsp_route()"
    Route->>Route: "initialize depot pred/succ for order locations"
    Route->>Route: "apply sliding moves with reversal"
    Route->>Route: "update node sequences"
    Route-->>GPU: "moves executed"
    GPU-->>TSPSolver: "TSP moves applied"
    
    TSPSolver->>GPU: "fill_forward_distances_kernel()"
    GPU->>Route: "compute forward distances"
    Route-->>GPU: "distances computed"
    
    TSPSolver->>GPU: "compute_cumulative_distances(reverse=false)"
    GPU-->>TSPSolver: "forward cumulative distances"
    
    TSPSolver->>Route: "compute_cost()"
    Route-->>TSPSolver: "updated costs"
    TSPSolver-->>LocalSearch: "optimization complete"
    LocalSearch-->>User: "TSP solution improved"

2 files reviewed, no comments

hlinsen changed the base branch from branch-25.12 to branch-25.10 October 14, 2025 23:58
hlinsen changed the base branch from branch-25.10 to branch-25.12 October 15, 2025 15:05

coderabbitai bot commented Oct 16, 2025

Walkthrough

These changes modify the sliding TSP algorithm to exclude depot orders from consideration and simplify the route-to-TSP copying mechanism by removing a conditional parameter, altering how route endpoints are initialized during TSP route construction.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Route interface simplification (cpp/src/routing/route/route.cuh) | Removed bool depot_included parameter from copy_to_tsp_route() method; eliminated conditional depot-handling logic; now unconditionally initializes TSP predecessor/successor using endpoint indices |
| Sliding TSP algorithm update (cpp/src/routing/local_search/sliding_tsp.cu) | Updated copy_to_tsp_route() call to use parameterless signature; changed node count computation from get_num_orders() to get_num_depot_excluded_orders(); increased exclusive-sum buffer size from n_nodes + 1 to n_nodes + 2; propagated updated node count to downstream buffer sizing and distance calculations |

Sequence Diagram(s)

sequenceDiagram
    participant sliding_tsp as Sliding TSP
    participant route as Route
    participant cub as CUB Scan
    participant storage as Temp Storage

    Note over sliding_tsp: Before: n_nodes = all orders
    Note over sliding_tsp: After: n_nodes = depot-excluded orders
    
    sliding_tsp->>route: copy_to_tsp_route() [no parameter]
    Note over route: Always use endpoint-based init<br/>(no depot conditional)
    route-->>sliding_tsp: Route copied
    
    sliding_tsp->>cub: ExclusiveSum(..., n_nodes + 2)
    Note over cub: Buffer size increased from<br/>n_nodes + 1
    cub->>storage: Allocate & resize
    storage-->>cub: Ready
    cub-->>sliding_tsp: Prefix sums computed

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Rationale: Changes span two files with mixed complexity—straightforward parameter removal and unconditional endpoint initialization, but coupled with a semantically significant shift in order-counting logic (all orders → depot-excluded orders) and buffer size adjustments requiring verification against algorithm correctness.

Poem

🐰 Hops of joy for TSP's dance,
Depots excluded, give orders a chance,
No more conditions to slow down the race,
Endpoints aligned in their proper place!
Buffers expanded, paths now more true—
Simple and swift, the algorithm's renewed! 🚀

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title “Fix TSP when order locations are set” directly references the primary issue addressed by the changeset, namely correcting uninitialized depot handling and scan-size calculations in the TSP code when order locations are provided. It is concise, avoids extraneous detail, and clearly signals the purpose of the PR to a reviewer scanning the history. |

hlinsen (Contributor, Author) commented Oct 16, 2025

/merge

hlinsen (Contributor, Author) commented Oct 16, 2025

/merge

rgsl888prabhu (Collaborator) commented

@coderabbitai ignore pre-merge checks

coderabbitai bot commented Oct 16, 2025

✅ Actions performed

Pre-merge checks override command executed successfully.

coderabbitai bot commented Oct 16, 2025

❌ Error

An error occurred while trying to override pre-merge checks. Please try again later.

rgsl888prabhu (Collaborator) commented

/merge

rgsl888prabhu (Collaborator) commented

/merge

rgsl888prabhu merged commit b0b719a into NVIDIA:branch-25.12 on Oct 17, 2025
89 checks passed

rgsl888prabhu (Collaborator) commented

Merging this manually since it is a critical PR
