From 3376144771f31082ddfb4719e4f82604a49a05a4 Mon Sep 17 00:00:00 2001
From: Petr Viktorin
Date: Wed, 26 Feb 2025 18:01:56 +0100
Subject: [PATCH 1/3] gh-116666: Add glossary entry for `token`

---
 Doc/glossary.rst                   | 15 +++++++++++++++
 Doc/reference/lexical_analysis.rst |  5 +++--
 Doc/tutorial/errors.rst            |  2 +-
 Doc/tutorial/interactive.rst       |  8 ++++----
 4 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/Doc/glossary.rst b/Doc/glossary.rst
index a6b94b564db177..54112a1b4c3fe2 100644
--- a/Doc/glossary.rst
+++ b/Doc/glossary.rst
@@ -800,6 +800,10 @@ Glossary
       thread removes *key* from *mapping* after the test, but before the lookup.
       This issue can be solved with locks or by using the EAFP approach.
 
+   lexical analyzer
+
+      Formal name for the *tokenizer*; see :term:`token`.
+
    list
       A built-in Python :term:`sequence`. Despite its name it is more akin
       to an array in other languages than to a linked list since access to
@@ -1291,6 +1295,17 @@ Glossary
       See also :term:`binary file` for a file object able to read and write
       :term:`bytes-like objects <bytes-like object>`.
 
+   token
+
+      A small unit of source code, generated by the
+      :ref:`lexical analyzer <lexical>` (also called *tokenizer*).
+      Names, numbers, strings, operators,
+      newlines and similar are represented by tokens.
+
+      The :mod:`tokenize` module exposes Python's lexical analyzer.
+      The :mod:`token` module contains information on the various types
+      of tokens.
+
    triple-quoted string
       A string which is bound by three instances of either a quotation mark
       (") or an apostrophe ('). While they don't provide any functionality
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index f7167032ad7df9..ff801a7d4fc494 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -8,8 +8,9 @@ Lexical analysis
 .. index:: lexical analysis, parser, token
 
 A Python program is read by a *parser*. Input to the parser is a stream of
-*tokens*, generated by the *lexical analyzer*. This chapter describes how the
-lexical analyzer breaks a file into tokens.
+:term:`tokens <token>`, generated by the *lexical analyzer* (also known as
+the *tokenizer*).
+This chapter describes how the lexical analyzer breaks a file into tokens.
 
 Python reads program text as Unicode code points; the encoding of a source
 file can be given by an encoding declaration and defaults to UTF-8, see :pep:`3120`
diff --git a/Doc/tutorial/errors.rst b/Doc/tutorial/errors.rst
index c01cb8c14a0360..7ac34629723e53 100644
--- a/Doc/tutorial/errors.rst
+++ b/Doc/tutorial/errors.rst
@@ -24,7 +24,7 @@ complaint you get while you are still learning Python::
    SyntaxError: invalid syntax
 
 The parser repeats the offending line and displays little arrows pointing
-at the token in the line where the error was detected. The error may be
+at the :term:`token` in the line where the error was detected. The error may be
 caused by the absence of a token *before* the indicated token. In the
 example, the error is detected at the function :func:`print`, since a colon
 (``':'``) is missing before it. File name and line number are printed so you
diff --git a/Doc/tutorial/interactive.rst b/Doc/tutorial/interactive.rst
index 4e054c4e6c2c32..0e2fdffca837fd 100644
--- a/Doc/tutorial/interactive.rst
+++ b/Doc/tutorial/interactive.rst
@@ -37,10 +37,10 @@ Alternatives to the Interactive Interpreter
 
 This facility is an enormous step forward compared to earlier versions of the
 interpreter; however, some wishes are left: It would be nice if the proper
-indentation were suggested on continuation lines (the parser knows if an indent
-token is required next). The completion mechanism might use the interpreter's
-symbol table. A command to check (or even suggest) matching parentheses,
-quotes, etc., would also be useful.
+indentation were suggested on continuation lines (the parser knows if an
+:data:`~tokens.INDENT` token is required next). The completion mechanism might
+use the interpreter's symbol table. A command to check (or even suggest)
+matching parentheses, quotes, etc., would also be useful.
 
 One alternative enhanced interactive interpreter that has been around for quite
 some time is IPython_, which features tab completion, object exploration and

From 9ba976fef711ecb6d0829052e669e594e4cfec4f Mon Sep 17 00:00:00 2001
From: Petr Viktorin
Date: Wed, 5 Mar 2025 16:59:14 +0100
Subject: [PATCH 2/3] errors.rst: avoid talking about tokens in the SyntaxError
 intro

---
 Doc/tutorial/errors.rst | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/Doc/tutorial/errors.rst b/Doc/tutorial/errors.rst
index 7ac34629723e53..bfb281c1b7d66a 100644
--- a/Doc/tutorial/errors.rst
+++ b/Doc/tutorial/errors.rst
@@ -24,11 +24,12 @@ complaint you get while you are still learning Python::
    SyntaxError: invalid syntax
 
 The parser repeats the offending line and displays little arrows pointing
-at the :term:`token` in the line where the error was detected. The error may be
-caused by the absence of a token *before* the indicated token. In the
-example, the error is detected at the function :func:`print`, since a colon
-(``':'``) is missing before it. File name and line number are printed so you
-know where to look in case the input came from a script.
+at the place where the error was detected. Note that this is not always the
+place that needs to be fixed. In the example, the error is detected at the
+function :func:`print`, since a colon (``':'``) is missing just before it.
+
+The file name (``<stdin>`` in our example) and line number are printed so you
+know where to look in case the input came from a file.
 
 .. _tut-exceptions:
 

From 39dc8d76efb34afcbefea49f9a8c7a75d6a8302b Mon Sep 17 00:00:00 2001
From: Petr Viktorin
Date: Wed, 5 Mar 2025 17:36:58 +0100
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
---
 Doc/glossary.rst             | 2 +-
 Doc/tutorial/interactive.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Doc/glossary.rst b/Doc/glossary.rst
index 54112a1b4c3fe2..1734d4c7339c2b 100644
--- a/Doc/glossary.rst
+++ b/Doc/glossary.rst
@@ -1298,7 +1298,7 @@ Glossary
    token
 
       A small unit of source code, generated by the
-      :ref:`lexical analyzer <lexical>` (also called *tokenizer*).
+      :ref:`lexical analyzer <lexical>` (also called the *tokenizer*).
       Names, numbers, strings, operators,
       newlines and similar are represented by tokens.
 
diff --git a/Doc/tutorial/interactive.rst b/Doc/tutorial/interactive.rst
index 0e2fdffca837fd..00e705f999f4b2 100644
--- a/Doc/tutorial/interactive.rst
+++ b/Doc/tutorial/interactive.rst
@@ -38,7 +38,7 @@ Alternatives to the Interactive Interpreter
 This facility is an enormous step forward compared to earlier versions of the
 interpreter; however, some wishes are left: It would be nice if the proper
 indentation were suggested on continuation lines (the parser knows if an
-:data:`~tokens.INDENT` token is required next). The completion mechanism might
+:data:`~token.INDENT` token is required next). The completion mechanism might
 use the interpreter's symbol table. A command to check (or even suggest)
 matching parentheses, quotes, etc., would also be useful.
 