Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Odd behavior from pasting large text in the new REPL #119517

Closed
treyhunner opened this issue May 24, 2024 · 3 comments
Closed

Odd behavior from pasting large text in the new REPL #119517

treyhunner opened this issue May 24, 2024 · 3 comments
Labels
topic-repl Related to the interactive shell type-bug An unexpected behavior, bug, or error

Comments

@treyhunner
Copy link
Member

treyhunner commented May 24, 2024

Bug report

Bug description:

This is a description of a few bugs related to paste mode in the new REPL.

Blank lines

I tried copy-pasting the first 604 lines of this copy of Frankenstein (about 32,000 characters).

After pasting, when I scroll up in my terminal I see 1 additional line above the currently visible lines (the 61 last lines) and then I see 543 blank lines. It seems that the first 543/604 lines show up as blank and the last 61 lines do not.

>>> f = """


[539 more blank lines]


... appreciate the extraordinary merits of this wonderful man.  Sometimes I
... have endeavoured to discover what quality it is which he possesses that
[57 more lines of text]
... embraced the gallant vessel on its course and wrecked it--thus!
... """
>>>

I assume this behavior may have been for the purpose of enhancing performance.
Pasting this text only took about 1 second, which is much faster than a couple weeks ago) but still slower than the old REPL.
I have not yet attempted to reproduce the blank line issue before #119341 was merged to see if it was the cause.

Paste performance and Ctrl+C

When pasting the full 7,652 line (441,033 character) text of Frankenstein takes about 21 seconds.
That's again, considerably faster than it would have been a couple weeks ago, though it does still takes a bit of time.

The most concerning part about the slow pasting is that the terminal is entirely locked during pasting.
Hitting Ctrl+C does not stop pasting and the arrow keys don't work.

CPython versions tested on:

3.13

Operating systems tested on:

Linux

Linked PRs

@treyhunner treyhunner added the type-bug An unexpected behavior, bug, or error label May 24, 2024
@AlexWaygood AlexWaygood added the topic-repl Related to the interactive shell label May 24, 2024
@pablogsal
Copy link
Member

pablogsal commented Jun 4, 2024

@treyhunner can you check if the first problem reproduces with the current main?

Ah, yeah it reproduces. Unfortunately to fix this we would need some considerable changes to how characters are processed so I am unsure if this will get fixed before 3.13 is out.

The problem is that to speed up pasting we are only refreshing at the end, but that doesn't input the characters in the terminal so terminal scrolling doesn't show them becase they were never written :(

CC @ambv @lysnikolaou

@pablogsal
Copy link
Member

To fix this we need to get rid of the calc_complete_screen function and replace it for something more efficient

@godlygeek
Copy link
Contributor

When pasting the full 7,652 line (441,033 character) text of Frankenstein takes about 21 seconds.

How long does it take if you export PYTHON_BASIC_REPL=1?

pablogsal added a commit to pablogsal/cpython that referenced this issue Jun 7, 2024
* Restore signal handlers for SIGINT and SIGSTOP (Ctrl-C and Ctrl-Z)
* Ensure that signals are processed as soon as possible by making
  reads more efficient.
* Protect against invalid state in internal REPL functions when
  interrumpted.
* Do not show extraneous newlines above the scroll buffer when pasting
  text in the REPL

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
pablogsal added a commit that referenced this issue Jun 11, 2024
* Remove pyrepl's optimization for self-insert

This will be replaced by a less specialized optimization.

* Use line-buffering when pyrepl echoes pastes

Previously echoing was totally suppressed until the entire command had
been pasted and the terminal ended paste mode, but this gives the user
no feedback to indicate that an operation is in progress. Drawing
something to the screen once per line strikes a balance between
perceived responsiveness and performance.

* Remove dead code from pyrepl

`msg_at_bottom` is always true.

* Speed up pyrepl's screen rendering computation

The Reader in pyrepl doesn't hold a complete representation of the
screen area being drawn as persistent state. Instead, it recomputes it,
on each keypress. This is fast enough for a few hundred bytes, but
incredibly slow as the input buffer grows into the kilobytes (likely
because of pasting).

Rather than making some expensive and expansive changes to the repl's
internal representation of the screen, add some caching: remember some
data from one refresh to the next about what was drawn to the screen
and, if we don't find anything that has invalidated the results that
were computed last time around, reuse them. To keep this caching as
simple as possible, all we'll do is look for lines in the buffer that
were above the cursor the last time we were asked to update the screen,
and that are still above the cursor now. We assume that nothing can
affect a line that comes before both the old and new cursor location
without us being informed. Based on this assumption, we can reuse old
lines, which drastically speeds up the overwhelmingly common case where
the user is typing near the end of the buffer.

* Speed up pyrepl prompt drawing

Cache the `can_colorize()` call rather than repeatedly recomputing it.
This call looks up an environment variable, and is called once per
character typed at the REPL. The environment variable lookup shows up as
a hot spot when profiling, and we don't expect this to change while the
REPL is running.

* Speed up pasting multiple lines into the REPL

Previously, we were checking whether the command should be accepted each
time a line break was encountered, but that's not the expected behavior.
In bracketed paste mode, we expect everything pasted to be part of
a single block of code, and encountering a newline shouldn't behave like
a user pressing <Enter> to execute a command. The user should always
have a chance to review the pasted command before running it.

* Use a read buffer for input in pyrepl

Previously we were reading one byte at a time, which causes much slower
IO than necessary. Instead, read in chunks, processing previously read
data before asking for more.

* Optimize finding width of a single character

`wlen` finds the width of a multi-character string by adding up the
width of each character, and then subtracting the width of any escape
sequences. It's often called for single character strings, however,
which can't possibly contain escape sequences. Optimize for that case.

* Optimize disp_str for ASCII characters

Since every ASCII character is known to display as single width, we can
avoid not only the Unicode data lookup in `disp_str` but also the one
hidden in `str_width` for them.

* Speed up cursor movements in long pyrepl commands

When the current pyrepl command buffer contains many lines, scrolling up
becomes slow. We have optimizations in place to reuse lines above the
cursor position from one refresh to the next, but don't currently try to
reuse lines below the cursor position in the same way, so we wind up
with quadratic behavior where all lines of the buffer below the cursor
are recomputed each time the cursor moves up another line.

Optimize this by only computing one screen's worth of lines beyond the
cursor position. Any lines beyond that can't possibly be shown by the
console, and bounding this makes scrolling up have linear time
complexity instead.

---------

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 11, 2024
* Remove pyrepl's optimization for self-insert

This will be replaced by a less specialized optimization.

* Use line-buffering when pyrepl echoes pastes

Previously echoing was totally suppressed until the entire command had
been pasted and the terminal ended paste mode, but this gives the user
no feedback to indicate that an operation is in progress. Drawing
something to the screen once per line strikes a balance between
perceived responsiveness and performance.

* Remove dead code from pyrepl

`msg_at_bottom` is always true.

* Speed up pyrepl's screen rendering computation

The Reader in pyrepl doesn't hold a complete representation of the
screen area being drawn as persistent state. Instead, it recomputes it,
on each keypress. This is fast enough for a few hundred bytes, but
incredibly slow as the input buffer grows into the kilobytes (likely
because of pasting).

Rather than making some expensive and expansive changes to the repl's
internal representation of the screen, add some caching: remember some
data from one refresh to the next about what was drawn to the screen
and, if we don't find anything that has invalidated the results that
were computed last time around, reuse them. To keep this caching as
simple as possible, all we'll do is look for lines in the buffer that
were above the cursor the last time we were asked to update the screen,
and that are still above the cursor now. We assume that nothing can
affect a line that comes before both the old and new cursor location
without us being informed. Based on this assumption, we can reuse old
lines, which drastically speeds up the overwhelmingly common case where
the user is typing near the end of the buffer.

* Speed up pyrepl prompt drawing

Cache the `can_colorize()` call rather than repeatedly recomputing it.
This call looks up an environment variable, and is called once per
character typed at the REPL. The environment variable lookup shows up as
a hot spot when profiling, and we don't expect this to change while the
REPL is running.

* Speed up pasting multiple lines into the REPL

Previously, we were checking whether the command should be accepted each
time a line break was encountered, but that's not the expected behavior.
In bracketed paste mode, we expect everything pasted to be part of
a single block of code, and encountering a newline shouldn't behave like
a user pressing <Enter> to execute a command. The user should always
have a chance to review the pasted command before running it.

* Use a read buffer for input in pyrepl

Previously we were reading one byte at a time, which causes much slower
IO than necessary. Instead, read in chunks, processing previously read
data before asking for more.

* Optimize finding width of a single character

`wlen` finds the width of a multi-character string by adding up the
width of each character, and then subtracting the width of any escape
sequences. It's often called for single character strings, however,
which can't possibly contain escape sequences. Optimize for that case.

* Optimize disp_str for ASCII characters

Since every ASCII character is known to display as single width, we can
avoid not only the Unicode data lookup in `disp_str` but also the one
hidden in `str_width` for them.

* Speed up cursor movements in long pyrepl commands

When the current pyrepl command buffer contains many lines, scrolling up
becomes slow. We have optimizations in place to reuse lines above the
cursor position from one refresh to the next, but don't currently try to
reuse lines below the cursor position in the same way, so we wind up
with quadratic behavior where all lines of the buffer below the cursor
are recomputed each time the cursor moves up another line.

Optimize this by only computing one screen's worth of lines beyond the
cursor position. Any lines beyond that can't possibly be shown by the
console, and bounding this makes scrolling up have linear time
complexity instead.

---------

(cherry picked from commit 32a0fab)

Co-authored-by: Matt Wozniski <mwozniski@bloomberg.net>
Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
pablogsal added a commit that referenced this issue Jun 11, 2024
gh-119517: Fixes for pasting in pyrepl (GH-120253)

* Remove pyrepl's optimization for self-insert

This will be replaced by a less specialized optimization.

* Use line-buffering when pyrepl echoes pastes

Previously echoing was totally suppressed until the entire command had
been pasted and the terminal ended paste mode, but this gives the user
no feedback to indicate that an operation is in progress. Drawing
something to the screen once per line strikes a balance between
perceived responsiveness and performance.

* Remove dead code from pyrepl

`msg_at_bottom` is always true.

* Speed up pyrepl's screen rendering computation

The Reader in pyrepl doesn't hold a complete representation of the
screen area being drawn as persistent state. Instead, it recomputes it,
on each keypress. This is fast enough for a few hundred bytes, but
incredibly slow as the input buffer grows into the kilobytes (likely
because of pasting).

Rather than making some expensive and expansive changes to the repl's
internal representation of the screen, add some caching: remember some
data from one refresh to the next about what was drawn to the screen
and, if we don't find anything that has invalidated the results that
were computed last time around, reuse them. To keep this caching as
simple as possible, all we'll do is look for lines in the buffer that
were above the cursor the last time we were asked to update the screen,
and that are still above the cursor now. We assume that nothing can
affect a line that comes before both the old and new cursor location
without us being informed. Based on this assumption, we can reuse old
lines, which drastically speeds up the overwhelmingly common case where
the user is typing near the end of the buffer.

* Speed up pyrepl prompt drawing

Cache the `can_colorize()` call rather than repeatedly recomputing it.
This call looks up an environment variable, and is called once per
character typed at the REPL. The environment variable lookup shows up as
a hot spot when profiling, and we don't expect this to change while the
REPL is running.

* Speed up pasting multiple lines into the REPL

Previously, we were checking whether the command should be accepted each
time a line break was encountered, but that's not the expected behavior.
In bracketed paste mode, we expect everything pasted to be part of
a single block of code, and encountering a newline shouldn't behave like
a user pressing <Enter> to execute a command. The user should always
have a chance to review the pasted command before running it.

* Use a read buffer for input in pyrepl

Previously we were reading one byte at a time, which causes much slower
IO than necessary. Instead, read in chunks, processing previously read
data before asking for more.

* Optimize finding width of a single character

`wlen` finds the width of a multi-character string by adding up the
width of each character, and then subtracting the width of any escape
sequences. It's often called for single character strings, however,
which can't possibly contain escape sequences. Optimize for that case.

* Optimize disp_str for ASCII characters

Since every ASCII character is known to display as single width, we can
avoid not only the Unicode data lookup in `disp_str` but also the one
hidden in `str_width` for them.

* Speed up cursor movements in long pyrepl commands

When the current pyrepl command buffer contains many lines, scrolling up
becomes slow. We have optimizations in place to reuse lines above the
cursor position from one refresh to the next, but don't currently try to
reuse lines below the cursor position in the same way, so we wind up
with quadratic behavior where all lines of the buffer below the cursor
are recomputed each time the cursor moves up another line.

Optimize this by only computing one screen's worth of lines beyond the
cursor position. Any lines beyond that can't possibly be shown by the
console, and bounding this makes scrolling up have linear time
complexity instead.

---------

(cherry picked from commit 32a0fab)

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Co-authored-by: Matt Wozniski <mwozniski@bloomberg.net>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
mrahtz pushed a commit to mrahtz/cpython that referenced this issue Jun 30, 2024
* Remove pyrepl's optimization for self-insert

This will be replaced by a less specialized optimization.

* Use line-buffering when pyrepl echoes pastes

Previously echoing was totally suppressed until the entire command had
been pasted and the terminal ended paste mode, but this gives the user
no feedback to indicate that an operation is in progress. Drawing
something to the screen once per line strikes a balance between
perceived responsiveness and performance.

* Remove dead code from pyrepl

`msg_at_bottom` is always true.

* Speed up pyrepl's screen rendering computation

The Reader in pyrepl doesn't hold a complete representation of the
screen area being drawn as persistent state. Instead, it recomputes it,
on each keypress. This is fast enough for a few hundred bytes, but
incredibly slow as the input buffer grows into the kilobytes (likely
because of pasting).

Rather than making some expensive and expansive changes to the repl's
internal representation of the screen, add some caching: remember some
data from one refresh to the next about what was drawn to the screen
and, if we don't find anything that has invalidated the results that
were computed last time around, reuse them. To keep this caching as
simple as possible, all we'll do is look for lines in the buffer that
were above the cursor the last time we were asked to update the screen,
and that are still above the cursor now. We assume that nothing can
affect a line that comes before both the old and new cursor location
without us being informed. Based on this assumption, we can reuse old
lines, which drastically speeds up the overwhelmingly common case where
the user is typing near the end of the buffer.

* Speed up pyrepl prompt drawing

Cache the `can_colorize()` call rather than repeatedly recomputing it.
This call looks up an environment variable, and is called once per
character typed at the REPL. The environment variable lookup shows up as
a hot spot when profiling, and we don't expect this to change while the
REPL is running.

* Speed up pasting multiple lines into the REPL

Previously, we were checking whether the command should be accepted each
time a line break was encountered, but that's not the expected behavior.
In bracketed paste mode, we expect everything pasted to be part of
a single block of code, and encountering a newline shouldn't behave like
a user pressing <Enter> to execute a command. The user should always
have a chance to review the pasted command before running it.

* Use a read buffer for input in pyrepl

Previously we were reading one byte at a time, which causes much slower
IO than necessary. Instead, read in chunks, processing previously read
data before asking for more.

* Optimize finding width of a single character

`wlen` finds the width of a multi-character string by adding up the
width of each character, and then subtracting the width of any escape
sequences. It's often called for single character strings, however,
which can't possibly contain escape sequences. Optimize for that case.

* Optimize disp_str for ASCII characters

Since every ASCII character is known to display as single width, we can
avoid not only the Unicode data lookup in `disp_str` but also the one
hidden in `str_width` for them.

* Speed up cursor movements in long pyrepl commands

When the current pyrepl command buffer contains many lines, scrolling up
becomes slow. We have optimizations in place to reuse lines above the
cursor position from one refresh to the next, but don't currently try to
reuse lines below the cursor position in the same way, so we wind up
with quadratic behavior where all lines of the buffer below the cursor
are recomputed each time the cursor moves up another line.

Optimize this by only computing one screen's worth of lines beyond the
cursor position. Any lines beyond that can't possibly be shown by the
console, and bounding this makes scrolling up have linear time
complexity instead.

---------

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
* Remove pyrepl's optimization for self-insert

This will be replaced by a less specialized optimization.

* Use line-buffering when pyrepl echoes pastes

Previously echoing was totally suppressed until the entire command had
been pasted and the terminal ended paste mode, but this gives the user
no feedback to indicate that an operation is in progress. Drawing
something to the screen once per line strikes a balance between
perceived responsiveness and performance.

* Remove dead code from pyrepl

`msg_at_bottom` is always true.

* Speed up pyrepl's screen rendering computation

The Reader in pyrepl doesn't hold a complete representation of the
screen area being drawn as persistent state. Instead, it recomputes it,
on each keypress. This is fast enough for a few hundred bytes, but
incredibly slow as the input buffer grows into the kilobytes (likely
because of pasting).

Rather than making some expensive and expansive changes to the repl's
internal representation of the screen, add some caching: remember some
data from one refresh to the next about what was drawn to the screen
and, if we don't find anything that has invalidated the results that
were computed last time around, reuse them. To keep this caching as
simple as possible, all we'll do is look for lines in the buffer that
were above the cursor the last time we were asked to update the screen,
and that are still above the cursor now. We assume that nothing can
affect a line that comes before both the old and new cursor location
without us being informed. Based on this assumption, we can reuse old
lines, which drastically speeds up the overwhelmingly common case where
the user is typing near the end of the buffer.

* Speed up pyrepl prompt drawing

Cache the `can_colorize()` call rather than repeatedly recomputing it.
This call looks up an environment variable, and is called once per
character typed at the REPL. The environment variable lookup shows up as
a hot spot when profiling, and we don't expect this to change while the
REPL is running.

* Speed up pasting multiple lines into the REPL

Previously, we were checking whether the command should be accepted each
time a line break was encountered, but that's not the expected behavior.
In bracketed paste mode, we expect everything pasted to be part of
a single block of code, and encountering a newline shouldn't behave like
a user pressing <Enter> to execute a command. The user should always
have a chance to review the pasted command before running it.

* Use a read buffer for input in pyrepl

Previously we were reading one byte at a time, which causes much slower
IO than necessary. Instead, read in chunks, processing previously read
data before asking for more.

* Optimize finding width of a single character

`wlen` finds the width of a multi-character string by adding up the
width of each character, and then subtracting the width of any escape
sequences. It's often called for single character strings, however,
which can't possibly contain escape sequences. Optimize for that case.

* Optimize disp_str for ASCII characters

Since every ASCII character is known to display as single width, we can
avoid not only the Unicode data lookup in `disp_str` but also the one
hidden in `str_width` for them.

* Speed up cursor movements in long pyrepl commands

When the current pyrepl command buffer contains many lines, scrolling up
becomes slow. We have optimizations in place to reuse lines above the
cursor position from one refresh to the next, but don't currently try to
reuse lines below the cursor position in the same way, so we wind up
with quadratic behavior where all lines of the buffer below the cursor
are recomputed each time the cursor moves up another line.

Optimize this by only computing one screen's worth of lines beyond the
cursor position. Any lines beyond that can't possibly be shown by the
console, and bounding this makes scrolling up have linear time
complexity instead.

---------

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
* Remove pyrepl's optimization for self-insert

This will be replaced by a less specialized optimization.

* Use line-buffering when pyrepl echoes pastes

Previously echoing was totally suppressed until the entire command had
been pasted and the terminal ended paste mode, but this gives the user
no feedback to indicate that an operation is in progress. Drawing
something to the screen once per line strikes a balance between
perceived responsiveness and performance.

* Remove dead code from pyrepl

`msg_at_bottom` is always true.

* Speed up pyrepl's screen rendering computation

The Reader in pyrepl doesn't hold a complete representation of the
screen area being drawn as persistent state. Instead, it recomputes it,
on each keypress. This is fast enough for a few hundred bytes, but
incredibly slow as the input buffer grows into the kilobytes (likely
because of pasting).

Rather than making some expensive and expansive changes to the repl's
internal representation of the screen, add some caching: remember some
data from one refresh to the next about what was drawn to the screen
and, if we don't find anything that has invalidated the results that
were computed last time around, reuse them. To keep this caching as
simple as possible, all we'll do is look for lines in the buffer that
were above the cursor the last time we were asked to update the screen,
and that are still above the cursor now. We assume that nothing can
affect a line that comes before both the old and new cursor location
without us being informed. Based on this assumption, we can reuse old
lines, which drastically speeds up the overwhelmingly common case where
the user is typing near the end of the buffer.

* Speed up pyrepl prompt drawing

Cache the `can_colorize()` call rather than repeatedly recomputing it.
This call looks up an environment variable, and is called once per
character typed at the REPL. The environment variable lookup shows up as
a hot spot when profiling, and we don't expect this to change while the
REPL is running.

* Speed up pasting multiple lines into the REPL

Previously, we were checking whether the command should be accepted each
time a line break was encountered, but that's not the expected behavior.
In bracketed paste mode, we expect everything pasted to be part of
a single block of code, and encountering a newline shouldn't behave like
a user pressing <Enter> to execute a command. The user should always
have a chance to review the pasted command before running it.

* Use a read buffer for input in pyrepl

Previously we were reading one byte at a time, which causes much slower
IO than necessary. Instead, read in chunks, processing previously read
data before asking for more.

* Optimize finding width of a single character

`wlen` finds the width of a multi-character string by adding up the
width of each character, and then subtracting the width of any escape
sequences. It's often called for single character strings, however,
which can't possibly contain escape sequences. Optimize for that case.

* Optimize disp_str for ASCII characters

Since every ASCII character is known to display as single width, we can
avoid not only the Unicode data lookup in `disp_str` but also the one
hidden in `str_width` for them.

* Speed up cursor movements in long pyrepl commands

When the current pyrepl command buffer contains many lines, scrolling up
becomes slow. We have optimizations in place to reuse lines above the
cursor position from one refresh to the next, but don't currently try to
reuse lines below the cursor position in the same way, so we wind up
with quadratic behavior where all lines of the buffer below the cursor
are recomputed each time the cursor moves up another line.

Optimize this by only computing one screen's worth of lines beyond the
cursor position. Any lines beyond that can't possibly be shown by the
console, and bounding this makes scrolling up have linear time
complexity instead.

---------

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-repl Related to the interactive shell type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants