From 3e0f691b3c114d57c37fc7c50e123025c92f177f Mon Sep 17 00:00:00 2001
From: Saniya Maheshwari
Date: Tue, 24 May 2022 14:22:39 +0530
Subject: [PATCH 01/18] Changes to the User Guide's Data Files page

- All code snippets were given for `setup.py`. Have added corresponding
  snippets for `setup.cfg` and `pyproject.toml`.

- To avoid incentivizing multiple top-level packages, have modified all the
  package trees and code snippets to include only a single package `mypkg`.
  Have added a separate example to illustrate the functionality of using the
  empty string `""` / the asterisk `*` for capturing data files from multiple
  packages. Have also modified the `setup.py` code snippets and removed the
  `find_packages("src")` since there is only a single package in each case
  (except one); have opted to explicitly name the package instead.

- Have added a package tree example for the first `package_data` snippet.
  Have also added a package tree / code snippet example to show how
  `package_data` patterns should include subdirectories, separating it from
  the example showing the empty string `""` / asterisk `*` functionality.

- Tried to have consistent naming for all directories and data files used in
  the package trees and code snippets. All directories have been named
  `mypkg` and data files have been named `data1.txt`, `data2.rst` etc.

- Have reformatted package tree examples. Reformatting has been done by
  replacing the indentation-only directory structure diagram with a
  line-based tree layout; I think this looks neater.

- Have added `.. note::` blocks for paragraphs that would be more
  appropriately phrased as a Note. Other minor changes to text content have
  been made.
--- docs/userguide/datafiles.rst | 324 +++++++++++++++++++++++++++-------- 1 file changed, 251 insertions(+), 73 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index 9817e63913..28fda201b8 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -9,14 +9,36 @@ by including the data files **inside the package directory**. Setuptools offers three ways to specify this most common type of data files to be included in your package's [#datafiles]_. -First, you can simply use the ``include_package_data`` keyword, e.g.:: +First, you can simply use the ``include_package_data`` keyword, e.g.: - from setuptools import setup, find_packages +.. tab:: setup.cfg + + .. code-block:: ini + + [options] + # ... + include_package_data = True + +.. tab:: setup.py + + .. code-block:: python + + from setuptools import setup setup( - ... + # ..., include_package_data=True ) +.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_ + + .. code-block:: toml + + [tool.setuptools] + # ... + # By default, include-package-data is true in pyproject.toml, so you do + # NOT have to specify this line. + include-package-data = true + This tells setuptools to install any data files it finds in your packages. The data files must be specified via the |MANIFEST.in|_ file. (They can also be tracked by a revision control system, using an appropriate @@ -26,67 +48,187 @@ Control Systems` for information on how to write such plugins.) If you want finer-grained control over what files are included (for example, if you have documentation files in your package directories and want to exclude -them from installation), then you can also use the ``package_data`` keyword, -e.g.:: +them from installation), then you can also use the ``package_data`` keyword. +For example, if the package tree looks like this:: - from setuptools import setup, find_packages - setup( - ... 
- package_data={ - # If any package contains *.txt or *.rst files, include them: - "": ["*.txt", "*.rst"], - # And include any *.msg files found in the "hello" package, too: - "hello": ["*.msg"], - } - ) + project_root_directory + ├── setup.py # and/or setup.cfg, pyproject.toml + └── src + └── mypkg + ├── __init__.py + ├── data1.rst + ├── data2.rst + ├── data1.txt + └── data2.txt + +You can use the following configuration to capture the ``.txt`` and ``.rst`` files as +data files: + +.. tab:: setup.cfg + + .. code-block:: ini + + # ... + [options.package_data] + mypkg = + *.txt + *.rst + +.. tab:: setup.py + + .. code-block:: python + + from setuptools import setup + setup( + # ..., + package_data={"mypkg": ["*.txt", "*.rst"]} + ) + +.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_ + + .. code-block:: toml + + # ... + [tool.setuptools.package_data] + mypkg = ["*.txt", "*.rst"] The ``package_data`` argument is a dictionary that maps from package names to lists of glob patterns. The globs may include subdirectory names, if the data files are contained in a subdirectory of the package. For example, if the package tree looks like this:: - setup.py - src/ - mypkg/ - __init__.py - mypkg.txt - data/ - somefile.dat - otherdata.dat - -The setuptools setup file might look like this:: - - from setuptools import setup, find_packages - setup( - ... - packages=find_packages("src"), # include all packages under src - package_dir={"": "src"}, # tell distutils packages are under src - - package_data={ - # If any package contains *.txt files, include them: - "": ["*.txt"], - # And include any *.dat files found in the "data" subdirectory - # of the "mypkg" package, also: - "mypkg": ["data/*.dat"], - } - ) - -Notice that if you list patterns in ``package_data`` under the empty string, -these patterns are used to find files in every package, even ones that also -have their own patterns listed. 
Thus, in the above example, the ``mypkg.txt`` -file gets included even though it's not listed in the patterns for ``mypkg``. + project_root_directory + ├── setup.py # and/or setup.cfg, pyproject.toml + └── src + └── mypkg + ├── data + │   ├── data1.rst + │   └── data2.rst + ├── __init__.py + ├── data1.txt + └── data2.txt + +The configuration might look like this: + +.. tab:: setup.cfg + + .. code-block:: ini + + [options] + # ... + packages = + mypkg + package_dir = + mypkg = src + + [options.package_data] + mypkg = + *.txt + data/*.rst + +.. tab:: setup.py + + .. code-block:: python + + from setuptools import setup + setup( + # ..., + packages=["mypkg"], + package_dir={"mypkg": "src"}, + package_data={"mypkg": ["*.txt", "data/*.rst"]} + ) + +.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_ + + .. code-block:: toml + + [tool.setuptools] + # ... + packages = ["mypkg"] + package-dir = { mypkg = "src" } + + [tool.setuptools.package-data] + mypkg = ["*.txt", "data/*.rst"] + +In other words, if datafiles are contained in a subdirectory of a package that isn't a +package itself (no ``__init__.py``), then the subdirectory names (or ``*`` to include +all subdirectories) are required in the ``package_data`` argument (as shown above with +``"data/*.rst"``). + +If you have multiple top-level packages and a common pattern of data files for both packages, for example:: + + project_root_directory + ├── setup.py # and/or setup.cfg, pyproject.toml + └── src + ├── mypkg1 + │   ├── data1.rst + │   ├── data1.txt + │   └── __init__.py + └── mypkg2 + ├── data2.txt + └── __init__.py + +then you can supply a configuration like this to capture both ``mypkg1/data1.txt`` and +``mypkg2/data2.txt``, as well as ``mypkg1/data1.rst``. + +.. tab:: setup.cfg + + .. code-block:: ini + + [options] + packages = + mypkg1 + mypkg2 + package_dir = + mypkg1 = src + mypkg2 = src + + [options.package_data] + * = + *.txt + mypkg1 = + data1.rst + +.. tab:: setup.py + + .. 
code-block:: python + + from setuptools import setup + setup( + # ..., + packages=["mypkg1", "mypkg2"], + package_dir={"mypkg1": "src", "mypkg2": "src"}, + package_data={"": ["*.txt"], "mypkg1": ["data1.rst"]}, + ) + +.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_ + + .. code-block:: toml + + [tool.setuptools] + # ... + packages = ["mypkg1", "mypkg2"] + package-dir = { mypkg1 = "src", mypkg2 = "src" } + + [tool.setuptools.package-data] + "*" = ["*.txt"] + mypkg1 = ["data1.rst"] + +Notice that if you list patterns in ``package_data`` under the empty string ``""`` in +``setup.py``, and the asterisk ``*`` in ``setup.cfg`` and ``pyproject.toml``, these +patterns are used to find files in every package. For example, both files +``mypkg1/data1.txt`` and ``mypkg2/data2.txt`` are captured as data files. Also note +how other patterns specified for individual packages continue to work, i.e. +``mypkg1/data1.rst`` is captured as well. Also notice that if you use paths, you *must* use a forward slash (``/``) as the path separator, even if you are on Windows. Setuptools automatically converts slashes to appropriate platform-specific separators at build time. -If datafiles are contained in a subdirectory of a package that isn't a package -itself (no ``__init__.py``), then the subdirectory names (or ``*``) are required -in the ``package_data`` argument (as shown above with ``"data/*.dat"``). - -When building an ``sdist``, the datafiles are also drawn from the -``package_name.egg-info/SOURCES.txt`` file, so make sure that this is removed if -the ``setup.py`` ``package_data`` list is updated before calling ``setup.py``. +.. note:: + When building an ``sdist``, the datafiles are also drawn from the + ``package_name.egg-info/SOURCES.txt`` file, so make sure that this is removed if + the ``setup.py`` ``package_data`` list is updated before calling ``setup.py``. .. 
note:: If using the ``include_package_data`` argument, files specified by @@ -101,26 +243,56 @@ aren't sufficient to precisely define what files you want included. For example, you may want to include package README files in your revision control system and source distributions, but exclude them from being installed. So, setuptools offers an ``exclude_package_data`` option as well, that allows you -to do things like this:: +to do things like this: - from setuptools import setup, find_packages - setup( - ... - packages=find_packages("src"), # include all packages under src - package_dir={"": "src"}, # tell distutils packages are under src +.. tab:: setup.cfg - include_package_data=True, # include everything in source control + .. code-block:: ini - # ...but exclude README.txt from all packages - exclude_package_data={"": ["README.txt"]}, - ) + [options] + # ... + packages = + mypkg + package_dir = + mypkg = src + include_package_data = True + + [options.exclude_package_data] + mypkg = + README.txt + +.. tab:: setup.py + + .. code-block:: python + + from setuptools import setup + setup( + # ..., + packages=["mypkg"], + package_dir={"mypkg": "src"}, + include_package_data=True, + exclude_package_data={"mypkg": ["README.txt"]}, + ) + +.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_ + + .. code-block:: toml + + [tool.setuptools] + # ... + packages = ["mypkg"] + package-dir = { mypkg = "src" } + + [tool.setuptools.exclude-package-data] + mypkg = ["README.txt"] The ``exclude_package_data`` option is a dictionary mapping package names to lists of wildcard patterns, just like the ``package_data`` option. And, just -as with that option, a key of ``""`` will apply the given pattern(s) to all -packages. However, any files that match these patterns will be *excluded* -from installation, even if they were listed in ``package_data`` or were -included as a result of using ``include_package_data``. 
+as with that option, you can use the empty string key ``""`` in ``setup.py`` and the +asterisk ``*`` in ``setup.cfg`` and ``pyproject.toml`` to match all top-level packages. +However, any files that match these patterns will be *excluded* from installation, +even if they were listed in ``package_data`` or were included as a result of using +``include_package_data``. In summary, the three options allow you to: @@ -138,13 +310,14 @@ In summary, the three options allow you to: included when a package is installed, even if they would otherwise have been included due to the use of the preceding options. -NOTE: Due to the way the distutils build process works, a data file that you -include in your project and then stop including may be "orphaned" in your -project's build directories, requiring you to run ``setup.py clean --all`` to -fully remove them. This may also be important for your users and contributors -if they track intermediate revisions of your project using Subversion; be sure -to let them know when you make changes that remove files from inclusion so they -can run ``setup.py clean --all``. +.. note:: + Due to the way the distutils build process works, a data file that you + include in your project and then stop including may be "orphaned" in your + project's build directories, requiring you to run ``setup.py clean --all`` to + fully remove them. This may also be important for your users and contributors + if they track intermediate revisions of your project using Subversion; be sure + to let them know when you make changes that remove files from inclusion so they + can run ``setup.py clean --all``. .. _Accessing Data Files at Runtime: @@ -189,6 +362,11 @@ run time be included **inside the package**. ---- +.. [#experimental] + Support for specifying package metadata and build configuration options via + ``pyproject.toml`` is experimental and might change + in the future. See :doc:`/userguide/pyproject_config`. + .. 
[#datafiles] ``setuptools`` consider a *package data file* any non-Python file **inside the package directory** (i.e., that co-exists in the same location as the regular ``.py`` files being distributed). From 9edfe7b655cfdda88912ed005a3e7d658d3884f9 Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 24 May 2022 16:54:25 +0530 Subject: [PATCH 02/18] Added news fragment --- changelog.d/3335.doc.rst | 1 + 1 file changed, 1 insertion(+) create mode 100644 changelog.d/3335.doc.rst diff --git a/changelog.d/3335.doc.rst b/changelog.d/3335.doc.rst new file mode 100644 index 0000000000..94c81d6086 --- /dev/null +++ b/changelog.d/3335.doc.rst @@ -0,0 +1 @@ +Changes to code snippets and other examples in the Data Files page of the User Guide -- by :user:`codeandfire` From 6c469ee5ca4a68374b690d5a29586718dba15b27 Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Wed, 25 May 2022 11:39:49 +0530 Subject: [PATCH 03/18] Elaborated on first example involving `include_package_data` Have tried to make the working of the `include_package_data` option as clear as possible. - Added a package tree - Tried to clearly state that the data files must be either included in `MANIFEST.in`, or tracked by a VCS, in order for them to be included in the installation of the package, when `include_package_data=True`. - Added a `MANIFEST.in` snippet to make things more clear. --- docs/userguide/datafiles.rst | 34 +++++++++++++++++++++++++++------- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index 28fda201b8..4055fd24b9 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -9,7 +9,20 @@ by including the data files **inside the package directory**. Setuptools offers three ways to specify this most common type of data files to be included in your package's [#datafiles]_. 
-First, you can simply use the ``include_package_data`` keyword, e.g.: +First, you can simply use the ``include_package_data`` keyword. +For example, if the package tree looks like this:: + + project_root_directory + ├── setup.py # and/or setup.cfg, pyproject.toml + └── src + └── mypkg + ├── __init__.py + ├── data1.rst + ├── data2.rst + ├── data1.txt + └── data2.txt + +and you supply this configuration: .. tab:: setup.cfg @@ -39,12 +52,19 @@ First, you can simply use the ``include_package_data`` keyword, e.g.: # NOT have to specify this line. include-package-data = true -This tells setuptools to install any data files it finds in your packages. -The data files must be specified via the |MANIFEST.in|_ file. -(They can also be tracked by a revision control system, using an appropriate -plugin such as :pypi:`setuptools-scm` or :pypi:`setuptools-svn`. -See the section below on :ref:`Adding Support for Revision -Control Systems` for information on how to write such plugins.) +then all the ``.txt`` and ``.rst`` files will be automatically installed with +your package, provided: + +1. These files are included via the |MANIFEST.in|_ file, like so:: + + include src/mypkg/*.txt + include src/mypkg/*.rst + +2. OR, they are being tracked by a revision control system such as Git, Mercurial + or SVN, and you have configured an appropriate plugin such as + :pypi:`setuptools-scm` or :pypi:`setuptools-svn`. + (See the section below on :ref:`Adding Support for Revision + Control Systems` for information on how to write such plugins.) 
If you want finer-grained control over what files are included (for example, if you have documentation files in your package directories and want to exclude From 24f4745e9b28345fe627bd382e1bda95e9c90698 Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 12:51:10 +0530 Subject: [PATCH 04/18] Small change Removed the statement within the parentheses, since the example which follows does not illustrate this specific example (of having documentation files that you may not want to include in the installation). Besides the `exclude_package_data` option covers this exact use case in a later example. --- docs/userguide/datafiles.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index 4055fd24b9..afc95cb83d 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -66,9 +66,8 @@ your package, provided: (See the section below on :ref:`Adding Support for Revision Control Systems` for information on how to write such plugins.) -If you want finer-grained control over what files are included (for example, -if you have documentation files in your package directories and want to exclude -them from installation), then you can also use the ``package_data`` keyword. +If you want finer-grained control over what files are included, then you can also use +the ``package_data`` keyword. For example, if the package tree looks like this:: project_root_directory From 97e7993caeaf474ab75679d19bd0babed36e5546 Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 13:13:01 +0530 Subject: [PATCH 05/18] Treating data subdirectories as namespace packages Modified code snippets for `package_data` example with `data` subdirectory to treat the `data` subdirectory as a namespace package. Also modified a paragraph below these snippets. 
--- docs/userguide/datafiles.rst | 56 ++++++++++++++++++++++-------------- 1 file changed, 35 insertions(+), 21 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index afc95cb83d..c9f6fc373f 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -112,9 +112,10 @@ data files: mypkg = ["*.txt", "*.rst"] The ``package_data`` argument is a dictionary that maps from package names to -lists of glob patterns. The globs may include subdirectory names, if the data -files are contained in a subdirectory of the package. For example, if the -package tree looks like this:: +lists of glob patterns. + +Another common pattern is where some (or all) of the data files are placed under +a separate subdirectory. For example:: project_root_directory ├── setup.py # and/or setup.cfg, pyproject.toml @@ -127,7 +128,12 @@ package tree looks like this:: ├── data1.txt └── data2.txt -The configuration might look like this: +Here, the ``.rst`` files are placed under a ``data`` subdirectory inside ``mypkg``. +The ``.txt`` files are directly under ``mypkg`` as before. + +In this case, the recommended approach is to treat ``data`` as a namespace package +(refer `PEP 420 `_). The configuration +might look like this: .. tab:: setup.cfg @@ -135,44 +141,52 @@ The configuration might look like this: [options] # ... - packages = - mypkg + packages = find_namespace: package_dir = - mypkg = src + = src + + [options.packages.find] + where = src [options.package_data] mypkg = *.txt - data/*.rst + mypkg.data = + *.rst .. tab:: setup.py .. code-block:: python - from setuptools import setup + from setuptools import setup, find_namespace_packages setup( # ..., - packages=["mypkg"], - package_dir={"mypkg": "src"}, - package_data={"mypkg": ["*.txt", "data/*.rst"]} + packages=find_namespace_packages(where="src"), + package_dir={"": "src"}, + package_data={ + "mypkg": ["*.txt"], + "mypkg.data": ["*.rst"], + } ) .. 
tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_ .. code-block:: toml - [tool.setuptools] - # ... - packages = ["mypkg"] - package-dir = { mypkg = "src" } + [tool.setuptools.packages.find] + # scanning for namespace packages is true by default in pyproject.toml, so + # you need NOT include the following line. + namespaces = true + where = ["src"] [tool.setuptools.package-data] - mypkg = ["*.txt", "data/*.rst"] + mypkg = ["*.txt"] + "mypkg.data" = ["*.rst"] -In other words, if datafiles are contained in a subdirectory of a package that isn't a -package itself (no ``__init__.py``), then the subdirectory names (or ``*`` to include -all subdirectories) are required in the ``package_data`` argument (as shown above with -``"data/*.rst"``). +In other words, we allow Setuptools to scan for namespace packages in the ``src`` directory, +which enables the ``data`` directory to be identified, and then, we separately specify data +files for the root package ``mypkg``, and the namespace package ``data`` under the package +``mypkg``. If you have multiple top-level packages and a common pattern of data files for both packages, for example:: From f2c5bd3cf6f670b94c582a852b6e85043d3779eb Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 13:28:30 +0530 Subject: [PATCH 06/18] Modified code snippets for multiple top-level packages example Made them consistent with the snippets given on the Package Discovery page. - Instead of enumerating a list of all the packages in `packages`, using `find_packages` or `find:` instead. The `find_packages` call in `setup.py` contains a `where` argument. In `setup.cfg`, included the section `options.packages.find` with a `where` option. - Instead of supplying the same `package_dir` for each package, using an empty string to indicate a `package_dir` for all packages. - In `pyproject.toml`, using the `where` option instead of `package-dir`. - Textual changes. 
--- docs/userguide/datafiles.rst | 40 ++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index c9f6fc373f..5ff195a938 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -188,7 +188,8 @@ which enables the ``data`` directory to be identified, and then, we separately s files for the root package ``mypkg``, and the namespace package ``data`` under the package ``mypkg``. -If you have multiple top-level packages and a common pattern of data files for both packages, for example:: +If you have multiple top-level packages and a common pattern of data files for all these +packages, for example:: project_root_directory ├── setup.py # and/or setup.cfg, pyproject.toml @@ -201,20 +202,21 @@ If you have multiple top-level packages and a common pattern of data files for b ├── data2.txt └── __init__.py -then you can supply a configuration like this to capture both ``mypkg1/data1.txt`` and -``mypkg2/data2.txt``, as well as ``mypkg1/data1.rst``. +Here, both packages ``mypkg1`` and ``mypkg2`` share a common pattern of having ``.txt`` +data files. However, only ``mypkg1`` has ``.rst`` data files. In such a case, the following +configuration will work: .. tab:: setup.cfg .. code-block:: ini [options] - packages = - mypkg1 - mypkg2 + packages = find: package_dir = - mypkg1 = src - mypkg2 = src + = src + + [options.packages.find] + where = src [options.package_data] * = @@ -226,11 +228,11 @@ then you can supply a configuration like this to capture both ``mypkg1/data1.txt .. 
code-block:: python - from setuptools import setup + from setuptools import setup, find_packages setup( # ..., - packages=["mypkg1", "mypkg2"], - package_dir={"mypkg1": "src", "mypkg2": "src"}, + packages=find_packages(where="src"), + package_dir={"": "src"}, package_data={"": ["*.txt"], "mypkg1": ["data1.rst"]}, ) @@ -238,21 +240,19 @@ then you can supply a configuration like this to capture both ``mypkg1/data1.txt .. code-block:: toml - [tool.setuptools] - # ... - packages = ["mypkg1", "mypkg2"] - package-dir = { mypkg1 = "src", mypkg2 = "src" } - + [tool.setuptools.packages.find] + where = ["src"] + [tool.setuptools.package-data] "*" = ["*.txt"] mypkg1 = ["data1.rst"] Notice that if you list patterns in ``package_data`` under the empty string ``""`` in ``setup.py``, and the asterisk ``*`` in ``setup.cfg`` and ``pyproject.toml``, these -patterns are used to find files in every package. For example, both files -``mypkg1/data1.txt`` and ``mypkg2/data2.txt`` are captured as data files. Also note -how other patterns specified for individual packages continue to work, i.e. -``mypkg1/data1.rst`` is captured as well. +patterns are used to find files in every package. For example, we use ``""`` or ``*`` +to indicate that the ``.txt`` files from all packages should be captured as data files. +Also note how we can continue to specify patterns for individual packages, i.e. +we specify that ``data1.rst`` from ``mypkg1`` alone should be captured as well. Also notice that if you use paths, you *must* use a forward slash (``/``) as the path separator, even if you are on Windows. Setuptools automatically From 0f0836bed517321d503fdc33ae618d74561a2c7a Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 13:51:24 +0530 Subject: [PATCH 07/18] Elaborated on example for `exclude_package_data` Tried to make why this option is useful more clear. 
--- docs/userguide/datafiles.rst | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index 5ff195a938..56492b0b03 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -272,11 +272,12 @@ converts slashes to appropriate platform-specific separators at build time. .. https://docs.python.org/3/distutils/setupscript.html#installing-package-data Sometimes, the ``include_package_data`` or ``package_data`` options alone -aren't sufficient to precisely define what files you want included. For -example, you may want to include package README files in your revision control -system and source distributions, but exclude them from being installed. So, -setuptools offers an ``exclude_package_data`` option as well, that allows you -to do things like this: +aren't sufficient to precisely define what files you want included. For example, +consider a scenario where you have ``include_package_data=True``, and you are using +a revision control system with an appropriate plugin. Your README is probably being +tracked by the revision control system, and therefore by default it will be included +when your package is installed. Supposing you want to prevent this README from being +included in the installation, then you could use the ``exclude_package_data`` option: .. tab:: setup.cfg From 20f393b87e39ae79ea694abbfcb6a1fefc57155e Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 13:53:36 +0530 Subject: [PATCH 08/18] Modified code snippets for `exclude_package_data` example Made them consistent with the snippets given on the Package Discovery page. The changes made here are similar to the changes made to the previous example. 
--- docs/userguide/datafiles.rst | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index 56492b0b03..c03aaf21bb 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -285,10 +285,9 @@ included in the installation, then you could use the ``exclude_package_data`` op [options] # ... - packages = - mypkg + packages = find: package_dir = - mypkg = src + = src include_package_data = True [options.exclude_package_data] @@ -299,11 +298,11 @@ included in the installation, then you could use the ``exclude_package_data`` op .. code-block:: python - from setuptools import setup + from setuptools import setup, find_packages setup( # ..., - packages=["mypkg"], - package_dir={"mypkg": "src"}, + packages=find_packages(where="src"), + package_dir={"": "src"}, include_package_data=True, exclude_package_data={"mypkg": ["README.txt"]}, ) @@ -312,10 +311,8 @@ included in the installation, then you could use the ``exclude_package_data`` op .. code-block:: toml - [tool.setuptools] - # ... - packages = ["mypkg"] - package-dir = { mypkg = "src" } + [tool.setuptools.packages.find] + where = ["src"] [tool.setuptools.exclude-package-data] mypkg = ["README.txt"] From 57035458294f078e981201481d7709e22015a10a Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 14:05:21 +0530 Subject: [PATCH 09/18] Added note to `package_data` option In the end of the document, in the summary section, there is a line stating that the files matched by `package_data` do not require a corresponding `MANIFEST.in` or a revision control system plugin. Have included this note higher up in the document because I felt it may be of interest to users and they might miss this line so far down the document. 
--- docs/userguide/datafiles.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index c03aaf21bb..ffdea51e6c 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -112,7 +112,9 @@ data files: mypkg = ["*.txt", "*.rst"] The ``package_data`` argument is a dictionary that maps from package names to -lists of glob patterns. +lists of glob patterns. Note that the data files specified using the ``package_data`` +option neither require to be included within a |MANIFEST.in|_ file, nor +require to be added by a revision control system plugin. Another common pattern is where some (or all) of the data files are placed under a separate subdirectory. For example:: From b1694432f57a109a70f1abcf71d8c0c213656f43 Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 16:26:18 +0530 Subject: [PATCH 10/18] Elaborated on usage of `importlib.resources` - Added example package tree - Added snippet on how typically the `__file__` attribute would be used - Added snippet showing usage of `importlib.resources` with the `files()` API - Added notes on compatibility of this code with different Python versions along with references - Added snippet to show usage of `importlib_resources` backport --- docs/userguide/datafiles.rst | 56 ++++++++++++++++++++++++++++++++---- 1 file changed, 50 insertions(+), 6 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index ffdea51e6c..943b853526 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -359,12 +359,52 @@ Accessing Data Files at Runtime ------------------------------- Typically, existing programs manipulate a package's ``__file__`` attribute in -order to find the location of data files. However, this manipulation isn't -compatible with PEP 302-based import hooks, including importing from zip files -and Python Eggs. 
It is strongly recommended that, if you are using data files, -you should use :mod:`importlib.resources` to access them. -:mod:`importlib.resources` was added to Python 3.7 and the latest version of -the library is also available via the :pypi:`importlib-resources` backport. +order to find the location of data files. For example, if you have a structure +like this:: + + project_root_directory + ├── setup.py # and/or setup.cfg, pyproject.toml + └── src + └── mypkg + ├── data + │   └── data1.txt + ├── __init__.py + └── foo.py + +Then, in ``mypkg/foo.py``, you may try something like this in order to access +``mypkg/data/data1.txt``: + +.. code-block:: python + + import os + data_path = os.path.join(os.path.dirname(__file__), 'data', 'data1.txt') + with open(data_path, 'r') as data_file: + ... + +However, this manipulation isn't compatible with PEP 302-based import hooks, +including importing from zip files and Python Eggs. It is strongly recommended that, +if you are using data files, you should use :mod:`importlib.resources` to access them. +In this case, you would do something like this: + +.. code-block:: python + + from importlib.resources import files + data_text = files('mypkg.data').joinpath('data1.txt').read_text() + +:mod:`importlib.resources` was added to Python 3.7. However, the API illustrated in +this code (using ``files()``) was added only in Python 3.9, [#files_api]_ and support +for accessing data files via namespace packages was added only in Python 3.10 [#namespace_support]_ +(the ``data`` subdirectory is a namespace package under the root package ``mypkg``). +Therefore, you may find this code to work only in Python 3.10 (and above). For other +versions of Python, you are recommended to use the :pypi:`importlib-resources` backport +which provides the latest version of this library. In this case, the only change that +has to be made to the above code is to replace ``importlib.resources`` with ``importlib_resources``, i.e. + +.. 
code-block:: python + + from importlib_resources import files + ... + See :doc:`importlib-resources:using` for detailed instructions [#importlib]_. .. tip:: Files inside the package directory should be *read-only* to avoid a @@ -412,6 +452,10 @@ run time be included **inside the package**. :pypi:`importlib-metadata`. However this might vary depending on which version of Python is installed. +.. [#files_api] Reference: https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy + +.. [#namespace_support] Reference: https://github.com/python/importlib_resources/pull/196#issuecomment-734520374 + .. |MANIFEST.in| replace:: ``MANIFEST.in`` .. _MANIFEST.in: https://packaging.python.org/en/latest/guides/using-manifest-in/ From bb9e256b19dede3984ddb71416a4023832aec34d Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 16:28:50 +0530 Subject: [PATCH 11/18] Removed footnote I believe this footnote is outdated and no longer required in light of the added notes describing compatibility with different Python versions --- docs/userguide/datafiles.rst | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index 943b853526..81210dd51b 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -405,7 +405,7 @@ has to be made to the above code is to replace ``importlib.resources`` with ``im from importlib_resources import files ... -See :doc:`importlib-resources:using` for detailed instructions [#importlib]_. +See :doc:`importlib-resources:using` for detailed instructions. .. tip:: Files inside the package directory should be *read-only* to avoid a series of common problems (e.g. when multiple users share a common Python @@ -447,11 +447,6 @@ run time be included **inside the package**. .. [#system-dirs] These locations can be discovered with the help of third-party libraries such as :pypi:`platformdirs`. -.. 
[#importlib] Recent versions of :mod:`importlib.resources` available in - Pythons' standard library should be API compatible with - :pypi:`importlib-metadata`. However this might vary depending on which version - of Python is installed. - .. [#files_api] Reference: https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy .. [#namespace_support] Reference: https://github.com/python/importlib_resources/pull/196#issuecomment-734520374 From f0e4c8fd3d9b404d31c35a052ac08579f87c1714 Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 16:44:16 +0530 Subject: [PATCH 12/18] Added `packages`, `package_dir` and `where` options in all examples For consistency. --- docs/userguide/datafiles.rst | 32 +++++++++++++++++++++++++++++--- 1 file changed, 29 insertions(+), 3 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index 81210dd51b..462b860bdf 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -30,15 +30,23 @@ and you supply this configuration: [options] # ... + packages = find: + package_dir = + = src include_package_data = True + [options.packages.find] + where = src + .. tab:: setup.py .. code-block:: python - from setuptools import setup + from setuptools import setup, find_packages setup( # ..., + packages=find_packages(where="src"), + package_dir={"": "src"}, include_package_data=True ) @@ -52,6 +60,9 @@ and you supply this configuration: # NOT have to specify this line. include-package-data = true + [tool.setuptools.packages.find] + where = ["src"] + then all the ``.txt`` and ``.rst`` files will be automatically installed with your package, provided: @@ -87,7 +98,15 @@ data files: .. code-block:: ini + [options] # ... + packages = find: + package_dir = + = src + + [options.packages.find] + where = src + [options.package_data] mypkg = *.txt @@ -97,9 +116,11 @@ data files: .. 
code-block:: python - from setuptools import setup + from setuptools import setup, find_packages setup( # ..., + packages=find_packages(where="src"), + package_dir={"": "src"}, package_data={"mypkg": ["*.txt", "*.rst"]} ) @@ -107,7 +128,9 @@ data files: .. code-block:: toml - # ... + [tool.setuptools.packages.find] + where = ["src"] + [tool.setuptools.package_data] mypkg = ["*.txt", "*.rst"] @@ -292,6 +315,9 @@ included in the installation, then you could use the ``exclude_package_data`` op = src include_package_data = True + [options.packages.find] + where = src + [options.exclude_package_data] mypkg = README.txt From 9d47a8ff35ee3ee8db6578f48cd3b5834b5f7f3d Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 16:51:55 +0530 Subject: [PATCH 13/18] Removed footnote This footnote describes what Setuptools considers a data file. This note is important and may be missed by the reader if it is kept as a footnote, so I have copied its contents to an earlier point in the document, just after the `include_package_data` example. --- docs/userguide/datafiles.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index 462b860bdf..b56319caf8 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -8,7 +8,7 @@ for data files distributed with a package is for use *by* the package, usually by including the data files **inside the package directory**. Setuptools offers three ways to specify this most common type of data files to -be included in your package's [#datafiles]_. +be included in your packages. First, you can simply use the ``include_package_data`` keyword. For example, if the package tree looks like this:: @@ -77,6 +77,10 @@ your package, provided: (See the section below on :ref:`Adding Support for Revision Control Systems` for information on how to write such plugins.) 
+By default, ``include_package_data`` considers **all** non ``.py`` files found inside +the package directory (``src/mypkg`` in this case) as data files, and includes those that +satisfy (at least) one of the above two conditions into the source distribution, and +consequently in the installation of your package. If you want finer-grained control over what files are included, then you can also use the ``package_data`` keyword. For example, if the package tree looks like this:: @@ -466,10 +470,6 @@ run time be included **inside the package**. ``pyproject.toml`` is experimental and might change in the future. See :doc:`/userguide/pyproject_config`. -.. [#datafiles] ``setuptools`` consider a *package data file* any non-Python - file **inside the package directory** (i.e., that co-exists in the same - location as the regular ``.py`` files being distributed). - .. [#system-dirs] These locations can be discovered with the help of third-party libraries such as :pypi:`platformdirs`. From 6c3c88420c2c9c5a9081c591fbec15782b29d77c Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 17:21:32 +0530 Subject: [PATCH 14/18] Added sections - Added `include_package_data`, `package_data` and `exclude_package_data` sections to make clear the three options provided by Setuptools to manage data files. - Added a separate section illustrating the use of a `data` subdirectory, after these three sections. - Placed the summary of the three options under a Summary section. - Changed the levels of the last two sections to match the level of the five sections added. - Small changes. Changed the wording where appropriate to suit the new flow. Changed a paragraph on path separators in glob patterns to a Note. 
--- docs/userguide/datafiles.rst | 180 +++++++++++++++++++---------------- 1 file changed, 98 insertions(+), 82 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index b56319caf8..8f0b18bfbb 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -7,6 +7,9 @@ are placed in a platform-specific location. However, the most common use case for data files distributed with a package is for use *by* the package, usually by including the data files **inside the package directory**. +include_package_data +==================== + Setuptools offers three ways to specify this most common type of data files to be included in your packages. First, you can simply use the ``include_package_data`` keyword. @@ -77,6 +80,9 @@ your package, provided: (See the section below on :ref:`Adding Support for Revision Control Systems` for information on how to write such plugins.) +package_data +============ + By default, ``include_package_data`` considers **all** non ``.py`` files found inside the package directory (``src/mypkg`` in this case) as data files, and includes those that satisfy (at least) one of the above two conditions into the source distribution, and @@ -95,7 +101,7 @@ For example, if the package tree looks like this:: ├── data1.txt └── data2.txt -You can use the following configuration to capture the ``.txt`` and ``.rst`` files as +then you can use the following configuration to capture the ``.txt`` and ``.rst`` files as data files: .. tab:: setup.cfg @@ -143,79 +149,10 @@ lists of glob patterns. Note that the data files specified using the ``package_d option neither require to be included within a |MANIFEST.in|_ file, nor require to be added by a revision control system plugin. -Another common pattern is where some (or all) of the data files are placed under -a separate subdirectory. 
For example:: - - project_root_directory - ├── setup.py # and/or setup.cfg, pyproject.toml - └── src - └── mypkg - ├── data - │   ├── data1.rst - │   └── data2.rst - ├── __init__.py - ├── data1.txt - └── data2.txt - -Here, the ``.rst`` files are placed under a ``data`` subdirectory inside ``mypkg``. -The ``.txt`` files are directly under ``mypkg`` as before. - -In this case, the recommended approach is to treat ``data`` as a namespace package -(refer `PEP 420 `_). The configuration -might look like this: - -.. tab:: setup.cfg - - .. code-block:: ini - - [options] - # ... - packages = find_namespace: - package_dir = - = src - - [options.packages.find] - where = src - - [options.package_data] - mypkg = - *.txt - mypkg.data = - *.rst - -.. tab:: setup.py - - .. code-block:: python - - from setuptools import setup, find_namespace_packages - setup( - # ..., - packages=find_namespace_packages(where="src"), - package_dir={"": "src"}, - package_data={ - "mypkg": ["*.txt"], - "mypkg.data": ["*.rst"], - } - ) - -.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_ - - .. code-block:: toml - - [tool.setuptools.packages.find] - # scanning for namespace packages is true by default in pyproject.toml, so - # you need NOT include the following line. - namespaces = true - where = ["src"] - - [tool.setuptools.package-data] - mypkg = ["*.txt"] - "mypkg.data" = ["*.rst"] - -In other words, we allow Setuptools to scan for namespace packages in the ``src`` directory, -which enables the ``data`` directory to be identified, and then, we separately specify data -files for the root package ``mypkg``, and the namespace package ``data`` under the package -``mypkg``. +.. note:: + If your glob patterns use paths, you *must* use a forward slash (``/``) as + the path separator, even if you are on Windows. Setuptools automatically + converts slashes to appropriate platform-specific separators at build time. 
If you have multiple top-level packages and a common pattern of data files for all these packages, for example:: @@ -232,8 +169,8 @@ packages, for example:: └── __init__.py Here, both packages ``mypkg1`` and ``mypkg2`` share a common pattern of having ``.txt`` -data files. However, only ``mypkg1`` has ``.rst`` data files. In such a case, the following -configuration will work: +data files. However, only ``mypkg1`` has ``.rst`` data files. In such a case, if you want to +use the ``package_data`` option, the following configuration will work: .. tab:: setup.cfg @@ -283,10 +220,6 @@ to indicate that the ``.txt`` files from all packages should be captured as data Also note how we can continue to specify patterns for individual packages, i.e. we specify that ``data1.rst`` from ``mypkg1`` alone should be captured as well. -Also notice that if you use paths, you *must* use a forward slash (``/``) as -the path separator, even if you are on Windows. Setuptools automatically -converts slashes to appropriate platform-specific separators at build time. - .. note:: When building an ``sdist``, the datafiles are also drawn from the ``package_name.egg-info/SOURCES.txt`` file, so make sure that this is removed if @@ -300,6 +233,9 @@ converts slashes to appropriate platform-specific separators at build time. .. https://docs.python.org/3/distutils/setupscript.html#installing-package-data +exclude_package_data +==================== + Sometimes, the ``include_package_data`` or ``package_data`` options alone aren't sufficient to precisely define what files you want included. For example, consider a scenario where you have ``include_package_data=True``, and you are using @@ -357,6 +293,86 @@ However, any files that match these patterns will be *excluded* from installatio even if they were listed in ``package_data`` or were included as a result of using ``include_package_data``. 
+Subdirectory for Data Files +=========================== + +A common pattern is where some (or all) of the data files are placed under +a separate subdirectory. For example:: + + project_root_directory + ├── setup.py # and/or setup.cfg, pyproject.toml + └── src + └── mypkg + ├── data + │   ├── data1.rst + │   └── data2.rst + ├── __init__.py + ├── data1.txt + └── data2.txt + +Here, the ``.rst`` files are placed under a ``data`` subdirectory inside ``mypkg``, +while the ``.txt`` files are directly under ``mypkg``. + +In this case, the recommended approach is to treat ``data`` as a namespace package +(refer `PEP 420 `_). The configuration +might look like this: + +.. tab:: setup.cfg + + .. code-block:: ini + + [options] + # ... + packages = find_namespace: + package_dir = + = src + + [options.packages.find] + where = src + + [options.package_data] + mypkg = + *.txt + mypkg.data = + *.rst + +.. tab:: setup.py + + .. code-block:: python + + from setuptools import setup, find_namespace_packages + setup( + # ..., + packages=find_namespace_packages(where="src"), + package_dir={"": "src"}, + package_data={ + "mypkg": ["*.txt"], + "mypkg.data": ["*.rst"], + } + ) + +.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_ + + .. code-block:: toml + + [tool.setuptools.packages.find] + # scanning for namespace packages is true by default in pyproject.toml, so + # you need NOT include the following line. + namespaces = true + where = ["src"] + + [tool.setuptools.package-data] + mypkg = ["*.txt"] + "mypkg.data" = ["*.rst"] + +In other words, we allow Setuptools to scan for namespace packages in the ``src`` directory, +which enables the ``data`` directory to be identified, and then, we separately specify data +files for the root package ``mypkg``, and the namespace package ``data`` under the package +``mypkg``. + +Summary +======= + In summary, the three options allow you to: ``include_package_data`` @@ -386,7 +402,7 @@ In summary, the three options allow you to: .. 
_Accessing Data Files at Runtime: Accessing Data Files at Runtime -------------------------------- +=============================== Typically, existing programs manipulate a package's ``__file__`` attribute in order to find the location of data files. For example, if you have a structure @@ -451,7 +467,7 @@ See :doc:`importlib-resources:using` for detailed instructions. Non-Package Data Files ----------------------- +====================== Historically, ``setuptools`` by way of ``easy_install`` would encapsulate data files from the distribution into the egg (see `the old docs From 3854a8ddf196f01376d2ed5df7466c4717b3bf54 Mon Sep 17 00:00:00 2001 From: Saniya Maheshwari Date: Tue, 31 May 2022 17:37:13 +0530 Subject: [PATCH 15/18] Added an `include_package_data` snippet to the subdirectory example Just to make it clear that we can use either one of `package_data` or `include_package_data` and not just the former. --- docs/userguide/datafiles.rst | 48 ++++++++++++++++++++++++++++++++++-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index 8f0b18bfbb..e8b7505d82 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -314,8 +314,8 @@ Here, the ``.rst`` files are placed under a ``data`` subdirectory inside ``mypkg while the ``.txt`` files are directly under ``mypkg``. In this case, the recommended approach is to treat ``data`` as a namespace package -(refer `PEP 420 `_). The configuration -might look like this: +(refer `PEP 420 `_). With ``package_data``, +the configuration might look like this: .. tab:: setup.cfg @@ -370,6 +370,50 @@ which enables the ``data`` directory to be identified, and then, we separately s files for the root package ``mypkg``, and the namespace package ``data`` under the package ``mypkg``. 
+With ``include_package_data`` the configuration is simpler: you simply need to enable +scanning of namespace packages in the ``src`` directory and the rest is handled by Setuptools. + +.. tab:: setup.cfg + + .. code-block:: ini + + [options] + packages = find_namespace: + package_dir = + = src + include_package_data = True + + [options.packages.find] + where = src + +.. tab:: setup.py + + .. code-block:: python + + from setuptools import setup, find_namespace_packages + setup( + # ... , + packages=find_namespace_packages(where="src"), + package_dir={"": "src"}, + include_package_data=True, + ) + +.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_ + + .. code-block:: toml + + [tool.setuptools] + # ... + # By default, include-package-data is true in pyproject.toml, so you do + # NOT have to specify this line. + include-package-data = true + + [tool.setuptools.packages.find] + # scanning for namespace packages is true by default in pyproject.toml, so + # you need NOT include the following line. + namespaces = true + where = ["src"] + Summary ======= From dfdc6d5f3788fcf91ae669be7367a8ddf9992ea2 Mon Sep 17 00:00:00 2001 From: Anderson Bravalheri Date: Tue, 7 Jun 2022 17:11:45 +0100 Subject: [PATCH 16/18] Apply suggestions from code review --- docs/userguide/datafiles.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index e8b7505d82..4bc2ad9c80 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -141,7 +141,7 @@ data files: [tool.setuptools.packages.find] where = ["src"] - [tool.setuptools.package_data] + [tool.setuptools.package-data] mypkg = ["*.txt", "*.rst"] The ``package_data`` argument is a dictionary that maps from package names to @@ -314,7 +314,7 @@ Here, the ``.rst`` files are placed under a ``data`` subdirectory inside ``mypkg while the ``.txt`` files are directly under ``mypkg``. 
In this case, the recommended approach is to treat ``data`` as a namespace package -(refer `PEP 420 `_). With ``package_data``, +(refer :pep:`420`). With ``package_data``, the configuration might look like this: .. tab:: setup.cfg @@ -357,7 +357,7 @@ the configuration might look like this: [tool.setuptools.packages.find] # scanning for namespace packages is true by default in pyproject.toml, so - # you need NOT include the following line. + # you do NOT need to include the following line. namespaces = true where = ["src"] @@ -471,7 +471,7 @@ Then, in ``mypkg/foo.py``, you may try something like this in order to access with open(data_path, 'r') as data_file: ... -However, this manipulation isn't compatible with PEP 302-based import hooks, +However, this manipulation isn't compatible with :pep:`302`-based import hooks, including importing from zip files and Python Eggs. It is strongly recommended that, if you are using data files, you should use :mod:`importlib.resources` to access them. In this case, you would do something like this: From 10cbf95ba513c13cbffef54761a5a7e5f668dd96 Mon Sep 17 00:00:00 2001 From: Anderson Bravalheri Date: Tue, 7 Jun 2022 17:34:31 +0100 Subject: [PATCH 17/18] Add a more realistic example for exclude-package-data --- docs/userguide/datafiles.rst | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index 4bc2ad9c80..260cdbb130 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -239,10 +239,14 @@ exclude_package_data Sometimes, the ``include_package_data`` or ``package_data`` options alone aren't sufficient to precisely define what files you want included. For example, consider a scenario where you have ``include_package_data=True``, and you are using -a revision control system with an appropriate plugin. 
Your README is probably being -tracked by the revision control system, and therefore by default it will be included -when your package is installed. Supposing you want to prevent this README from being -included in the installation, then you could use the ``exclude_package_data`` option: +a revision control system with an appropriate plugin. +Sometimes developers add directory-specific marker files (such as `.gitignore`, +`.gitkeep`, `.gitattributes`, or `.hgignore`), these files are probably being +tracked by the revision control system, and therefore by default they will be +included when the package is installed. +Supposing you want to prevent these files from being included in the +installation (they are not relevant to Python or the package), then you could +use the ``exclude_package_data`` option: .. tab:: setup.cfg @@ -260,7 +264,7 @@ included in the installation, then you could use the ``exclude_package_data`` op [options.exclude_package_data] mypkg = - README.txt + .gitattributes .. tab:: setup.py @@ -272,7 +276,7 @@ included in the installation, then you could use the ``exclude_package_data`` op packages=find_packages(where="src"), package_dir={"": "src"}, include_package_data=True, - exclude_package_data={"mypkg": ["README.txt"]}, + exclude_package_data={"mypkg": [".gitattributes"]}, ) .. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_ @@ -283,13 +287,14 @@ included in the installation, then you could use the ``exclude_package_data`` op where = ["src"] [tool.setuptools.exclude-package-data] - mypkg = ["README.txt"] + mypkg = [".gitattributes"] The ``exclude_package_data`` option is a dictionary mapping package names to lists of wildcard patterns, just like the ``package_data`` option. And, just as with that option, you can use the empty string key ``""`` in ``setup.py`` and the asterisk ``*`` in ``setup.cfg`` and ``pyproject.toml`` to match all top-level packages. 
-However, any files that match these patterns will be *excluded* from installation, + +Any files that match these patterns will be *excluded* from installation, even if they were listed in ``package_data`` or were included as a result of using ``include_package_data``. From 463b3409cb413e881fdbc91f858e7a9d825fc6f4 Mon Sep 17 00:00:00 2001 From: Anderson Bravalheri Date: Tue, 7 Jun 2022 17:46:09 +0100 Subject: [PATCH 18/18] Small changes avoiding mentioning distutils directly --- docs/userguide/datafiles.rst | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst index 260cdbb130..8622b6c447 100644 --- a/docs/userguide/datafiles.rst +++ b/docs/userguide/datafiles.rst @@ -2,16 +2,19 @@ Data Files Support ==================== -The distutils have traditionally allowed installation of "data files", which +Old packaging installation methods in the Python ecosystem +have traditionally allowed installation of "data files", which are placed in a platform-specific location. However, the most common use case for data files distributed with a package is for use *by* the package, usually by including the data files **inside the package directory**. +Setuptools focuses on this most common type of data files and offers three ways +of specifying which files should be included in your packages, as described in +the following sections. + include_package_data ==================== -Setuptools offers three ways to specify this most common type of data files to -be included in your packages. First, you can simply use the ``include_package_data`` keyword. For example, if the package tree looks like this:: @@ -244,6 +247,7 @@ Sometimes developers add directory-specific marker files (such as `.gitignore`, `.gitkeep`, `.gitattributes`, or `.hgignore`), these files are probably being tracked by the revision control system, and therefore by default they will be included when the package is installed. 
+ Supposing you want to prevent these files from being included in the installation (they are not relevant to Python or the package), then you could use the ``exclude_package_data`` option: @@ -439,7 +443,7 @@ In summary, the three options allow you to: been included due to the use of the preceding options. .. note:: - Due to the way the distutils build process works, a data file that you + Due to the way the build process works, a data file that you include in your project and then stop including may be "orphaned" in your project's build directories, requiring you to run ``setup.py clean --all`` to fully remove them. This may also be important for your users and contributors