difflib.py Differ.compare is too slow [for degenerate cases] #119105
Comments
Changed label from "type-bug" to "performance" and "type-feature". In your proposal, please add some high-level comments about the meaning of "_gravity" and what range of values might be useful. The name on its own is just a mystery.
BTW, I agree this class of inputs is worth addressing 😄. Please open a Pull Request? Offhand, it seems to me that this would be a faster and much more obvious way to get a value that's 0 at the ends of the range and largest at the midpoint.
Sure, that will probably do as well; I was just avoiding function calls.
Micro-optimizations like that are increasingly out of style in core Python code. They generally don't help (or even hurt) when running under PyPy, and CPython is making real progress at speeding the common cases too. Relatedly, nobody abuses the default-argument mechanism for speed tricks anymore unless it's truly critical. Else clarity is valued more.
Let's stay away from hard-coded constants, too - especially ones with "mystery names" 😉. Constants frequently bite us later, and are rarely needed. In this case,

```python
lena = ahi - alo
lenb = bhi - blo
smallest_ratio_delta = 2 * 0.99 / (lena + lenb)
```

Then if you have two "midrange bonus" functions that max out at 0.5 for `i` and `j` being at the midpoints of their ranges,

```python
midrange_total_bonus = (midrange_a_bonus + midrange_b_bonus) * smallest_ratio_delta
```

will yield a score bonus less than the difference between any two distinct raw scores (the sum of the two bonuses can't be greater than 1.0).
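For concreteness, a "midrange bonus" with those properties could look roughly like this (a minimal sketch; the function name and signature are assumptions, not code from the thread):

```python
def midrange_a_bonus(i, alo, ahi):
    """0.0 at the ends of range(alo, ahi), at most 0.5, largest near the midpoint."""
    mid = (alo + ahi - 1) / 2
    if mid == alo:                      # degenerate one-element range
        return 0.5
    return 0.5 * (1 - abs(i - mid) / (mid - alo))
```

An analogous `midrange_b_bonus` for `j` over `range(blo, bhi)` would have the same shape.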
"Strength reduction" looks like a worthwhile optimization. Short course: all function calls, and even multiplies, can be removed from the inner loop, by keeping the bonus as a running total, and adjusting it near the top of the loop like so: bonus += a_inc if i <= ihalf else -a_inc where the other names are function-level invariants: ihalf = (alo + ahi - 1) / 2
a_inc = smallest_ratio_delta / lena. So the bonus keeps going up around the loop, until the midpoint is reached, after which it keeps going down. Similarly for the outer loop. Some details may be tricky. Better names could be picked 😉. If you don't want to bother, let me know here, and I'll open a pull request.
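A small runnable sketch of that bookkeeping, under assumed names and toy ranges (none of this is difflib's actual code), just to show the running totals rising toward the midpoints and falling after:

```python
alo, ahi, blo, bhi = 0, 5, 0, 7               # toy index ranges
lena, lenb = ahi - alo, bhi - blo
smallest_ratio_delta = 2 * 0.99 / (lena + lenb)

ihalf = (alo + ahi - 1) / 2
jhalf = (blo + bhi - 1) / 2
a_inc = smallest_ratio_delta / lena
b_inc = smallest_ratio_delta / lenb

best_bonus, best_pair = -1.0, None
b_bonus = 0.0
for j in range(blo, bhi):
    b_bonus += b_inc if j <= jhalf else -b_inc    # adjust near the top of the loop
    a_bonus = 0.0
    for i in range(alo, ahi):
        a_bonus += a_inc if i <= ihalf else -a_inc
        bonus = a_bonus + b_bonus                 # would be added to a raw ratio
        if bonus > best_bonus:
            best_bonus, best_pair = bonus, (i, j)

print(best_pair)   # (2, 3) - the pair closest to both midpoints gets the largest bonus
```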
Unfortunately,

```python
len_a = max(map(len, a))
len_b = max(map(len, b))
smallest_ratio_delta = 2 * 0.99 / (len_a + len_b) ** 2
```

I plan to play around and open a PR. My first one here!
Wow - I was hallucinating badly, wasn't I? It's all yours - I look forward to the PR 😄 Unless I'm still hallucinating, the square in the new expression isn't needed, is it? If the longest elements ever compared have lengths `len_a` and `len_b`, isn't the smallest possible non-zero difference between two ratios simply `2 / (len_a + len_b)`?
No problem: I often skip over large chunks and forget to come back and explain. Given that a ratio has the form `2 * M / T` (integer match count `M`, total length `T`), the difference between two arbitrary ratios is simply `2 * M1 / T1 - 2 * M2 / T2 == 2 * (M1 * T2 - M2 * T1) / (T1 * T2)`. Trying to minimize the absolute value of the above (and requiring it to be non-zero), we discard the details of the numerator except that it is a non-zero integer: thus we take it to be 1. The smallest possible non-zero difference is then `2 / (T1 * T2)`, and since each of `T1` and `T2` is at most `len_a + len_b`, that is at least `2 / (len_a + len_b) ** 2`.
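A quick numerical check of that bound, with numbers chosen here purely for illustration (not taken from the thread):

```python
from fractions import Fraction

# Two ratios 2*M/T whose denominators differ; longest elements have length 5 each.
len_a = len_b = 5
r1 = Fraction(2 * 4, 9)    # M1 = 4, T1 = 9
r2 = Fraction(2 * 4, 10)   # M2 = 4, T2 = 10

print(abs(r1 - r2))                       # 4/45
print(Fraction(2, 9 * 10))                # 1/45, the smallest non-zero delta for these T's
print(Fraction(2, (len_a + len_b) ** 2))  # 1/50 <= 1/45, so the bound holds
```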
Thanks! Got it now. The smallest delta between ratios sharing a common denominator was what I had in mind. Your more careful analysis looks right. Using the square to simplify reasoning is fine. It's not like we're pushing the limits of float precision here 😉.
I am slightly concerned by how small weights can become (but only slightly). We now try to make differences as small as `2 / (len_a + len_b) ** 2`.
If we end up computing differences so small that they end up getting totally lost to rounding when added to the ratio, I don't think it much matters - just a waste of time. The interesting cases are near the midpoints of the index ranges, where your factor is largest. There are alternatives, too.
Unless I'm hallucinating again, there's a simpler way, which doesn't add new floating-point stress, or even require computing "epsilon". Store the ratio instead as a 2-tuple, `(ratio, distsum)`. Tuple comparison does the rest. If the first components differ (the ratios we use today differ), that settles it. Else (ratios the same), the larger is whichever has the larger `distsum`. Quick coding of that worked fine. Strength reduction in the inner loop becomes just

```python
distsum += 1 if i <= ihalf else -1
```

It's fine to use a more long-winded way in the outer loop, though. In real life, the most important line of the inner loop is the
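A tiny illustration of why plain tuple comparison is enough here (the candidate numbers below are made up):

```python
# (ratio, distsum) candidates; distsum only matters when ratios tie exactly.
candidates = [
    (0.8, -3),   # good ratio, far from the midpoints
    (0.8,  0),   # same ratio, closest to the midpoints
    (0.7,  5),   # worse ratio, its distsum is irrelevant
]
print(max(candidates))   # (0.8, 0)
```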
I think I'll back off on "strength reduction". The full-blown
I updated the branch with rolling weights. I think I like it the most; the only subtle point is that the weight is now an integer number instead of a computed float.
I don't even know of a case where "normalization" is helpful 😉. Do you? Suppose we have two sequences of wildly different lengths (so normalization could make the most difference) but all element pairs have the same ratio.
Why not in the code? Checking for 0 is cheap. Note a subtlety with "rolling weights": in the innermost loop, the adjustment has to be done very near the top of the loop. Else there's a conditional (very rarely executed) `continue` that would skip the adjustment.
FYI, this is the cheapest way I've thought of to get a non-rolling weight:

```python
# Function-level invariants.
midi = (alo + ahi - 1) / 2
midj = (blo + bhi - 1) / 2

# Outer loop.
b_weight = -abs(j - midj)
...

# Inner loop.
weight = b_weight - abs(i - midi)
```

It's a float (but with fractional part 0 or 0.5). If there are an odd number of possible indices, it's exactly 0 at the middle one, and gets smaller (more negative) at the same rate for indices moving away from the midpoint. If there are an even number of indices, the two center ones get the largest weight, -0.5. I don't think signs, or possible fractions of 0.5, matter. As in your original post, "convex" is what matters: max out at the central indices, and get smaller the farther from the center.
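A quick look at those weights over made-up small ranges, just to visualize the shape:

```python
alo, ahi = 0, 5                      # odd number of indices
midi = (alo + ahi - 1) / 2           # 2.0
print([-abs(i - midi) for i in range(alo, ahi)])
# [-2.0, -1.0, 0.0, -1.0, -2.0]  -> largest (0.0) at the center, falling off evenly

alo, ahi = 0, 6                      # even number of indices
midi = (alo + ahi - 1) / 2           # 2.5
print([-abs(i - midi) for i in range(alo, ahi)])
# [-2.5, -1.5, -0.5, -0.5, -1.5, -2.5]  -> the two center indices share the max, -0.5
```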
I think reasoning rules out that normalization can do any good. Consider what happens without it, and assuming weights are "convex". The only interesting cases are where the weights make a difference, at identical raw ratios. Convexity is scale-invariant, so normalization doesn't hurt this: we can multiply the weights by any positive constant without changing which pair wins. So don't bother with multiplying at all 😄.
I agree: there is no normalization needed in this case. But suppose we have just two competitors with indices
This one also caught my attention! But this is not always the case, as I hopefully explained above. And this may not be the end of the story. After some thinking I came up with this statement: deep enough into the recursion tree we are guaranteed to have comparable (by size) sub-sequences (provided a suitable choice of the weight function).
And it is particularly difficult to find an example where the weight function
On a related note, there might be some room for further improvement in the above. I think in the case discussed here it is much faster to split into more than two chunks at the same recursion depth. Technically, we should find the longest strictly growing sequence of best-scoring `(i, j)` pairs. Even more, we could avoid recursive calls completely by computing all pairs above the cutoff. Yet, I believe this PR is good enough.
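For what it's worth, a "longest strictly growing sequence of pairs" can be found with a standard longest-increasing-subsequence pass; a small sketch on made-up pairs (not code from the thread or any PR):

```python
from bisect import bisect_left

# Candidate (i, j) pairs, already sorted by i (strictly increasing).
pairs = [(0, 3), (1, 1), (2, 2), (3, 4), (4, 0)]

tails = []          # tails[k] = smallest j that can end a chain of length k + 1
for _, j in pairs:
    k = bisect_left(tails, j)
    if k == len(tails):
        tails.append(j)
    else:
        tails[k] = j

# Length only; recovering the actual chain needs a bit more bookkeeping.
print(len(tails))   # 3 - e.g. (1, 1), (2, 2), (3, 4) grows strictly in both indices
```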
You're right again 😄. I over-generalized from your specific test case, assuming the full Cartesian product of the index ranges would be in play in bad cases. Well, maybe they are in worst current cases. The sparser the subset of the full product in play, the harder it looks to contrive bad cases. I'll settle for a massive improvement in the worst cases.
Absolutely so. The recursive calls are doing purely redundant work, finding the same max ratio at (a subset of) the same line pairs over & over & over again. I knew that at the time, but an obvious quick way to exploit it didn't come to mind. As is, short-circuiting so that only a brand new max could make it through the gauntlet of "if" tests paid off a lot. Relaxing that to also find pairs that equal the current max was unattractive - and "bad cases" never showed up in real life. But, at the time, the library was overwhelmingly used only to compare files of Python and C code, and human-written prose. If you'd like to pursue it, have at it!
No question that it's a huge improvement as is.
Something like this (wholly untested) might do the trick, preserving short-circuiting as much as possible without missing an "equals the best" case.

```python
from operator import gt, ge

max_pairs = []  # index pairs achieving best_ratio
maxi = -1       # `i` index of last pair in max_pairs
for j in range_b:
    searching = True
    for i in range_a:
        cmp = ge if searching and i > maxi else gt
        if cmp(real_quick_ratio, best_ratio) and ...:
            searching = False
            if ratio > best_ratio:
                max_pairs.clear()
                best_ratio = ratio
            max_pairs.append((i, j))
            maxi = i
# The indices in max_pairs all have the same, maximal ratio,
# and are strictly ascending in both positions.
```

It's still a "greedy" approach. In your original test case, I expect
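Here's a self-contained toy version of that bookkeeping with a stand-in scoring function, just to show the `ge`/`gt` switch keeping the kept pairs strictly ascending (everything below is illustrative, not difflib code):

```python
from operator import gt, ge

def score(x, y):                      # stand-in for SequenceMatcher.ratio()
    return 1.0 if x == y else 0.0

a = ["apple", "banana", "cherry", "banana"]
b = ["banana", "cherry"]

best_ratio = -1.0
max_pairs = []                        # pairs achieving best_ratio
maxi = -1                             # `i` of the last pair appended
for j, bj in enumerate(b):
    searching = True
    for i, ai in enumerate(a):
        cmp = ge if searching and i > maxi else gt
        s = score(ai, bj)
        if cmp(s, best_ratio):
            searching = False
            if s > best_ratio:
                max_pairs.clear()
                best_ratio = s
            max_pairs.append((i, j))
            maxi = i

print(best_ratio, max_pairs)   # 1.0 [(1, 0), (2, 1)] - ties kept, strictly ascending
```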
Code from https://github.com/pulkin, in PR #119131 Greatly speeds `Differ` when there are many identically scoring pairs, by splitting the recursion near the inputs' midpoints instead of degenerating (as now) into just peeling off the first two lines. Co-authored-by: Tim Peters <tim.peters@gmail.com>
I'm not sanguine about that. The "synch pair" establishes a very high wall, with no cross-talk between the "before" and "after" segments of the inputs. This is mostly driven by the linear nature of diff output, restricted to "matches", "insert", "delete", and "replace", from first to last lines of both inputs. For example, suppose we have two thousand-line inputs. The first line of
Any ratio less than the unique very best is irrelevant to that outcome. If the synch pair occurred more toward the middle of the inputs, the recursive calls could do a lot of redundant work, but again ratios between the "before" elements of one input and the "after" elements of the other are useless after the synch pair is picked. There's also that difflib intends never to create a data structure whose size is proportional to the product of the input lengths.

(*) Of course that's arguable. Could be that it's the "very close match" that should be discounted, in favor of producing markup for the 999 "pretty close" matches. But that would be a Big Change, and this isn't Hard Science™.
All true. I think it won't be irrelevant to study other diff libs. Just to mention, the code merged does change some diff outputs. For example, this one will find only one match, instead of two (before).
vs
If we want to restore it back (and to keep the recursion under control) we can take the approach above for tied ratios. I, too, have concerns about considering all pairs above the cutoff.
I'm not determined to keep all results the same, but your example bothers me enough that I'm going to re-open this issue.

The fundamental goal of this library was to create diffs that, to the extent possible, "make sense" to human eyes. Because, at the time, all Unix-y diff programs too often produced diffs for Python's C code that were more baffling than helpful (e.g., synching up on garbage like blank lines and lone curly braces far apart in the inputs). Locality seems to matter most to people, so the core approach builds on finding long stretches of consecutive matching lines.

Another thing that matters to people is "top to bottom". It just makes "no sense" to my eyes that, given two blocks of closely matching lines, the program would match the first line of one input's block with the last line of the other's. "Well, that's because they're both closest to the center of their index ranges" sounds like a geek making up technobabble excuses for absurd output 😉. I believe the pseudocode I gave would restore "first to first, second to second, ..." results, so I'm keen now to pursue it.

About other diff libraries, over the years I haven't found that studying their details was helpful. At a high level, if I had it to do over again I'd start with git's "histogram" diff. Another thing that catches eyes is rare lines, and "histogram" addresses that directly. I think difflib's "junk" heuristics were a mostly-failed attempt toward the same end, by ignoring "too common" elements instead of identifying the rare ones. People generally don't know to use it, and even I rarely made the effort required. But "histogram" isn't a full diff algorithm either, just an early pass for identifying high-value "synch points". Producing diffs between synch points has to be done by some other algorithm. And for some tasks, like comparing genome sequences, "histogram" is useless (it won't find any rare elements). Then again, difflib is mostly useless for that too (and completely useless unless "autojunk" is disabled - in which case its generally quadratic runtime is too slow for that task).
I opened a PR implementing the "track all optimal pairs" idea.
I did some automated testing on various orderings of unique and similar groups of strings and your code looks equivalent to the packaged one output-wise. I have a small request (rather related to the original implementation):

```python
best_ratio = 1
for j in ...:
    for i in ...:
        if ai == bj:
            if best_ratio != 1:
                continue  # we already have a close match; ignore the exact one
            ratio = 1
        else:
            set_seq(...)
            _cutoff = cutoff if best_ratio == 1 else best_ratio
            if crqr() < _cutoff or rqr() < _cutoff or (ratio := qr()) < _cutoff:
                continue  # ratio is smaller
        if ratio != best_ratio:
            ...  # start max_pairs over
        else:
            ...  # do i, j checks
        max_pairs.append((i, j))
```

There is an obvious subtlety with
Thanks! That was the intent, and I think it's pretty clear it "should" act the same.
I'm also keen to keep making a distinction between ">" and ">=". Because, using the lines from your example:

```python
>>> from difflib import SequenceMatcher as S
>>> m = S(None, "0123456789\n", "01234a56789\n")
>>> m.ratio()
0.9565217391304348
>>> m.quick_ratio()
0.9565217391304348
>>> m.real_quick_ratio()
0.9565217391304348
```

That is, the "quick upper bounds" are in fact good to the last bit.
I do. Those were basically pulled out of a hat. Interline markup can get more annoying than helpful unless the lines are "quite similar", and the cutoffs were adjusted by eyeball to quantify "quite". They're neither touchy nor critical. But, for backward compatibility, they can never be changed 😉. Since what's in the PR now has been tested and seems to work fine, I'll merge it later tonight. If you or I want to pursue other ideas beyond it, that's fine - just make another PR.
I should add that's a fuzzy "average case" claim. In your test case, a million pairs are compared, but
…s] (#119376) Track all pairs achieving the best ratio in Differ(). This repairs the "very deep recursion and cubic time" bad cases in a way that preserves previous output.
Here's a different case that's still cubic-time, and blows the recursion stack. The first line pair has a unique best ratio, and the same is true at each recursion level:

```python
import difflib

a = []
b = []
N = 1000
for i in range(N):
    s = "0" * (N - i)
    s2 = s + "x"
    a.append(s + "\n")
    b.append(s2 + "\n")

list(difflib.Differ().compare(a, b))
```

I don't have a slick idea to get around that while preserving current results (well, for this specific input, sure - I mean "in general"). Half-baked: for each

Wouldn't always eliminate non-trivial recursive calls. In this test case, and in the original, it would. How it may affect current results is clear as mud to me 😄.

EDIT: why not build the list with strictly increasing
Yes, I was aware of this one. It is special in the sense of arbitrarily long strings: if you limit the string length then asymptotically it is still
Over my head 😉. I don't know what "ONP" means (*), or "keeping string comparison intact". In general, I don't know of a worst-case sub-quadratic algorithm for "longest common subsequence" problems. Suffix trees can, in theory, solve "longest contiguous common subsequence" problems in linear time, but are exceedingly delicate with high constant factors.

That said, it's easy to believe difflib isn't suitable for your application. It was aimed at human-friendly output, pretty much regardless of cost. "Shortest edit script" sucks at the "human-friendly" part. BTW, you can avoid

(*) As in "An O(NP) Sequence Comparison Algorithm" by Wu, Manber, & Myers?
And it seems possible that's what you really wanted all along. If you're thinking of implementing a Myers diff algorithm, then you don't care about intraline markup. That can make a huge difference. Here's your original test, but with a million lines in each input:

```python
>>> import difflib
>>> a = ["0123456789\n"] * 1_000_000
>>> b = ["01234a56789\n"] * 1_000_000
>>> s = difflib.SequenceMatcher(None, a, b)  # appears instantaneous
>>> list(s.get_matching_blocks())  # also very fast
[Match(a=1000000, b=1000000, size=0)]  # see the docs - this always has a size-0 dummy at the end
>>> for x in s.get_opcodes():  # and here's your "edit script"
...     print(x)
('replace', 0, 1000000, 0, 1000000)
>>>
```

Any kind of "bad case" you make up for

Under the covers,

One final "speed hint": PyPy generally runs difflib functions much faster than CPython does. In your original test, using difflib's original code, PyPy finishes in about 5 seconds on my box, without overflowing the stack. I don't know all the reasons for that (in part, function inlining cuts recursion depth).
FYI, in the latest PR,
Yes: I was thinking about using that, but the straightforward implementation that you showed is not something very useful in my case: I do care about close but not exact matches. Roughly speaking, I would like to simplify it to this:

```python
import difflib
from dataclasses import dataclass
from typing import Any

@dataclass
class CloseIsEqualWrapper:
    token: Any

    def __eq__(self, other):
        assert isinstance(other, CloseIsEqualWrapper)
        return difflib.SequenceMatcher(a=self.token, b=other.token).ratio() >= .75

a = list(map(CloseIsEqualWrapper, a))
b = list(map(CloseIsEqualWrapper, b))
print(list(difflib.SequenceMatcher(None, a, b).get_matching_blocks()))
```

Then, if
Alas, I doubt that can be made to work. Dicts are critical to how matching works, and defining `__eq__` that way doesn't play well with hashing. There's also that "close to" doesn't define an equivalence relation, because it's not transitive. That A is close to B, and B is close to C, doesn't imply that A is close to C. Any two strings of sufficient length can be transformed to the other via such a chain. And there's also - as the docs note - that the elements must be hashable. I expect you'd do better by not trying to trick `SequenceMatcher`.

Of course if you're coding a Myers-style algorithm from scratch, none of the above matters. Then I'd recommend the classic Levenshtein distance (which is symmetric) as a measure of similarity (or a wrapper around
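To make the non-transitivity point concrete, a tiny demonstration (the strings and the 0.75 cutoff here are chosen just for illustration):

```python
import difflib

def close(x, y, cutoff=0.75):
    return difflib.SequenceMatcher(a=x, b=y).ratio() >= cutoff

a, b, c = "abcdefgh", "abcdefxy", "zzcdefxy"
print(close(a, b))   # True
print(close(b, c))   # True
print(close(a, c))   # False - "close to" is not transitive
```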
``_fancy_replace()`` is no longer recursive, and a single call does a worst-case linear number of ratio() computations instead of quadratic. This renders toothless a universe of pathological cases. Some inputs may produce different output, but that's rare, and I didn't find a case where the final diff appeared to be of materially worse quality. To the contrary, by refusing to even consider synching on lines "far apart", there was more easy-to-digest locality in the output.
Bug report
Bug description:
The case is pathological in the sense of many lines with the same exact diff / ratio.
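A reproduction consistent with the test case discussed in the comments above (1,000 identical lines per input; the exact original snippet may have differed slightly) looks like:

```python
import difflib

a = ["0123456789\n"] * 1_000
b = ["01234a56789\n"] * 1_000

# Before the fix: very deep recursion in _fancy_replace and roughly cubic work.
result = list(difflib.Differ().compare(a, b))
```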
The issue is that in the current implementation `_fancy_replace` will take the first pair of lines (with the same ratio) as a split point and will call itself recursively for all lines starting at 2, then 3, 4, etc. This repeats `1_000` times, resulting in a massive recursion depth and `O(N^3)` complexity scaling. For an average random case it should split anywhere in the range of `1_000`, with a complexity scaling of `O(N^2 log N)`.

I personally encountered this in diffing csv files where one of the files has a column added, which, apparently, results in all-same ratios for every line in the file.
Proposal
Fixing this is not so hard by adding some heuristics (WIP) pulkin@31e1ed0
The idea is very straightforward: while doing the `_fancy_replace` magic, if you see many diffs with the same exact ratio, pick the one closest to the middle of the chunks rather than the first one (which can be the worst possible choice). This is done by promoting the ratios of those pairs of lines that are closer to the middle of the chunks.

The `_drag_to_center` function turns a line number into the weight added to the ratio (twice: for a and for b). The weight is zero at both ends of a chunk and maximal in the middle (a quadratic poly was chosen for simplicity). The magnitude of the weight, "`_gravity`", is small enough to only affect ratios that are exactly the same: it relies on the assumption that we probably have fewer than 500k symbols in a line, such that the steps in the `ratio` are greater than 1e-6. If this assumption fails, some diffs may become different (not necessarily worse).

Performance impact for non-pathological cases is probably minimal.
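For illustration, a weight with the properties described (zero at the chunk ends, maximal in the middle, quadratic, scaled by a tiny `_gravity`) could look roughly like this; only the names `_drag_to_center` and `_gravity` come from the proposal, and the value and body below are guesses:

```python
_gravity = 1e-6

def _drag_to_center(index, lo, hi):
    """Tiny bonus: 0 at both ends of range(lo, hi), peaking at _gravity in the middle."""
    span = hi - lo - 1
    if span <= 0:
        return 0.0
    x = (index - lo) / span            # 0.0 at lo, 1.0 at hi - 1
    return 4 * x * (1 - x) * _gravity  # quadratic, so ties nearest the center win
```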
CPython versions tested on:
3.9, 3.12
Operating systems tested on:
Linux
Linked PRs