diff --git a/changelog.d/3335.doc.rst b/changelog.d/3335.doc.rst
new file mode 100644
index 0000000000..94c81d6086
--- /dev/null
+++ b/changelog.d/3335.doc.rst
@@ -0,0 +1 @@
+Changes to code snippets and other examples in the Data Files page of the User Guide -- by :user:`codeandfire`
diff --git a/docs/userguide/datafiles.rst b/docs/userguide/datafiles.rst
index 9817e63913..8622b6c447 100644
--- a/docs/userguide/datafiles.rst
+++ b/docs/userguide/datafiles.rst
@@ -2,91 +2,231 @@
 Data Files Support
 ====================

-The distutils have traditionally allowed installation of "data files", which
+Old packaging installation methods in the Python ecosystem
+have traditionally allowed installation of "data files", which
 are placed in a platform-specific location. However, the most common use case
 for data files distributed with a package is for use *by* the package, usually
 by including the data files **inside the package directory**.

-Setuptools offers three ways to specify this most common type of data files to
-be included in your package's [#datafiles]_.
-First, you can simply use the ``include_package_data`` keyword, e.g.::
+Setuptools focuses on this most common type of data files and offers three ways
+of specifying which files should be included in your packages, as described in
+the following sections.
+
+include_package_data
+====================
+
+First, you can simply use the ``include_package_data`` keyword.
+For example, if the package tree looks like this::
+
+    project_root_directory
+    ├── setup.py        # and/or setup.cfg, pyproject.toml
+    └── src
+        └── mypkg
+            ├── __init__.py
+            ├── data1.rst
+            ├── data2.rst
+            ├── data1.txt
+            └── data2.txt
+
+and you supply this configuration:
+
+.. tab:: setup.cfg
+
+   .. code-block:: ini
+
+        [options]
+        # ...
+        packages = find:
+        package_dir =
+            = src
+        include_package_data = True
+
+        [options.packages.find]
+        where = src
+
+.. tab:: setup.py
+
+   .. code-block:: python

     from setuptools import setup, find_packages
     setup(
-        ...
+        # ...,
+        packages=find_packages(where="src"),
+        package_dir={"": "src"},
         include_package_data=True
     )

-This tells setuptools to install any data files it finds in your packages.
-The data files must be specified via the |MANIFEST.in|_ file.
-(They can also be tracked by a revision control system, using an appropriate
-plugin such as :pypi:`setuptools-scm` or :pypi:`setuptools-svn`.
-See the section below on :ref:`Adding Support for Revision
-Control Systems` for information on how to write such plugins.)
+.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_

-If you want finer-grained control over what files are included (for example,
-if you have documentation files in your package directories and want to exclude
-them from installation), then you can also use the ``package_data`` keyword,
-e.g.::
+   .. code-block:: toml

-    from setuptools import setup, find_packages
-    setup(
-        ...
-        package_data={
-            # If any package contains *.txt or *.rst files, include them:
-            "": ["*.txt", "*.rst"],
-            # And include any *.msg files found in the "hello" package, too:
-            "hello": ["*.msg"],
-        }
-    )
+        [tool.setuptools]
+        # ...
+        # By default, include-package-data is true in pyproject.toml, so you do
+        # NOT have to specify this line.
+        include-package-data = true
+
+        [tool.setuptools.packages.find]
+        where = ["src"]
+
+then all the ``.txt`` and ``.rst`` files will be automatically installed with
+your package, provided:
+
+1. These files are included via the |MANIFEST.in|_ file, like so::
+
+        include src/mypkg/*.txt
+        include src/mypkg/*.rst
+
+2. OR, they are being tracked by a revision control system such as Git, Mercurial
+   or SVN, and you have configured an appropriate plugin such as
+   :pypi:`setuptools-scm` or :pypi:`setuptools-svn`.
+   (See the section below on :ref:`Adding Support for Revision
+   Control Systems` for information on how to write such plugins.)
+
+package_data
+============
+
+By default, ``include_package_data`` considers **all** non ``.py`` files found inside
+the package directory (``src/mypkg`` in this case) as data files, and includes those that
+satisfy (at least) one of the above two conditions into the source distribution, and
+consequently in the installation of your package.
+If you want finer-grained control over what files are included, then you can also use
+the ``package_data`` keyword.
+For example, if the package tree looks like this::
+
+    project_root_directory
+    ├── setup.py        # and/or setup.cfg, pyproject.toml
+    └── src
+        └── mypkg
+            ├── __init__.py
+            ├── data1.rst
+            ├── data2.rst
+            ├── data1.txt
+            └── data2.txt
+
+then you can use the following configuration to capture the ``.txt`` and ``.rst`` files as
+data files:
+
+.. tab:: setup.cfg
+
+   .. code-block:: ini
+
+        [options]
+        # ...
+        packages = find:
+        package_dir =
+            = src
+
+        [options.packages.find]
+        where = src
+
+        [options.package_data]
+        mypkg =
+            *.txt
+            *.rst
+
+.. tab:: setup.py
+
+   .. code-block:: python
+
+        from setuptools import setup, find_packages
+        setup(
+            # ...,
+            packages=find_packages(where="src"),
+            package_dir={"": "src"},
+            package_data={"mypkg": ["*.txt", "*.rst"]}
+        )
+
+.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_
+
+   .. code-block:: toml
+
+        [tool.setuptools.packages.find]
+        where = ["src"]
+
+        [tool.setuptools.package-data]
+        mypkg = ["*.txt", "*.rst"]

 The ``package_data`` argument is a dictionary that maps from package names to
-lists of glob patterns. The globs may include subdirectory names, if the data
-files are contained in a subdirectory of the package. For example, if the
-package tree looks like this::
+lists of glob patterns. Note that the data files specified using the ``package_data``
+option neither require to be included within a |MANIFEST.in|_ file, nor
+require to be added by a revision control system plugin.

-    setup.py
-    src/
-        mypkg/
-            __init__.py
-            mypkg.txt
-            data/
-                somefile.dat
-                otherdata.dat
+.. note::
+    If your glob patterns use paths, you *must* use a forward slash (``/``) as
+    the path separator, even if you are on Windows. Setuptools automatically
+    converts slashes to appropriate platform-specific separators at build time.

-The setuptools setup file might look like this::
+If you have multiple top-level packages and a common pattern of data files for all these
+packages, for example::

-    from setuptools import setup, find_packages
-    setup(
-        ...
-        packages=find_packages("src"),  # include all packages under src
-        package_dir={"": "src"},  # tell distutils packages are under src
-
-        package_data={
-            # If any package contains *.txt files, include them:
-            "": ["*.txt"],
-            # And include any *.dat files found in the "data" subdirectory
-            # of the "mypkg" package, also:
-            "mypkg": ["data/*.dat"],
-        }
-    )
+    project_root_directory
+    ├── setup.py        # and/or setup.cfg, pyproject.toml
+    └── src
+        ├── mypkg1
+        │   ├── data1.rst
+        │   ├── data1.txt
+        │   └── __init__.py
+        └── mypkg2
+            ├── data2.txt
+            └── __init__.py

-Notice that if you list patterns in ``package_data`` under the empty string,
-these patterns are used to find files in every package, even ones that also
-have their own patterns listed. Thus, in the above example, the ``mypkg.txt``
-file gets included even though it's not listed in the patterns for ``mypkg``.
+Here, both packages ``mypkg1`` and ``mypkg2`` share a common pattern of having ``.txt``
+data files. However, only ``mypkg1`` has ``.rst`` data files. In such a case, if you want to
+use the ``package_data`` option, the following configuration will work:

-Also notice that if you use paths, you *must* use a forward slash (``/``) as
-the path separator, even if you are on Windows. Setuptools automatically
-converts slashes to appropriate platform-specific separators at build time.
+.. tab:: setup.cfg

-If datafiles are contained in a subdirectory of a package that isn't a package
-itself (no ``__init__.py``), then the subdirectory names (or ``*``) are required
-in the ``package_data`` argument (as shown above with ``"data/*.dat"``).
+   .. code-block:: ini

-When building an ``sdist``, the datafiles are also drawn from the
-``package_name.egg-info/SOURCES.txt`` file, so make sure that this is removed if
-the ``setup.py`` ``package_data`` list is updated before calling ``setup.py``.
+        [options]
+        packages = find:
+        package_dir =
+            = src
+
+        [options.packages.find]
+        where = src
+
+        [options.package_data]
+        * =
+          *.txt
+        mypkg1 =
+          data1.rst
+
+.. tab:: setup.py
+
+   .. code-block:: python
+
+        from setuptools import setup, find_packages
+        setup(
+            # ...,
+            packages=find_packages(where="src"),
+            package_dir={"": "src"},
+            package_data={"": ["*.txt"], "mypkg1": ["data1.rst"]},
+        )
+
+.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_
+
+   .. code-block:: toml
+
+        [tool.setuptools.packages.find]
+        where = ["src"]
+
+        [tool.setuptools.package-data]
+        "*" = ["*.txt"]
+        mypkg1 = ["data1.rst"]
+
+Notice that if you list patterns in ``package_data`` under the empty string ``""`` in
+``setup.py``, and the asterisk ``*`` in ``setup.cfg`` and ``pyproject.toml``, these
+patterns are used to find files in every package. For example, we use ``""`` or ``*``
+to indicate that the ``.txt`` files from all packages should be captured as data files.
+Also note how we can continue to specify patterns for individual packages, i.e.
+we specify that ``data1.rst`` from ``mypkg1`` alone should be captured as well.
+
+.. note::
+    When building an ``sdist``, the datafiles are also drawn from the
+    ``package_name.egg-info/SOURCES.txt`` file, so make sure that this is removed if
+    the ``setup.py`` ``package_data`` list is updated before calling ``setup.py``.

 .. note::
    If using the ``include_package_data`` argument, files specified by
@@ -96,31 +236,195 @@ the ``setup.py`` ``package_data`` list is updated before calling ``setup.py``.

 .. https://docs.python.org/3/distutils/setupscript.html#installing-package-data

+exclude_package_data
+====================
+
 Sometimes, the ``include_package_data`` or ``package_data`` options alone
-aren't sufficient to precisely define what files you want included. For
-example, you may want to include package README files in your revision control
-system and source distributions, but exclude them from being installed. So,
-setuptools offers an ``exclude_package_data`` option as well, that allows you
-to do things like this::
+aren't sufficient to precisely define what files you want included. For example,
+consider a scenario where you have ``include_package_data=True``, and you are using
+a revision control system with an appropriate plugin.
+Sometimes developers add directory-specific marker files (such as `.gitignore`,
+`.gitkeep`, `.gitattributes`, or `.hgignore`), these files are probably being
+tracked by the revision control system, and therefore by default they will be
+included when the package is installed.

-    from setuptools import setup, find_packages
-    setup(
-        ...
-        packages=find_packages("src"),  # include all packages under src
-        package_dir={"": "src"},  # tell distutils packages are under src
+Supposing you want to prevent these files from being included in the
+installation (they are not relevant to Python or the package), then you could
+use the ``exclude_package_data`` option:

-        include_package_data=True,  # include everything in source control
+.. tab:: setup.cfg

-        # ...but exclude README.txt from all packages
-        exclude_package_data={"": ["README.txt"]},
-    )
+   .. code-block:: ini
+
+        [options]
+        # ...
+        packages = find:
+        package_dir =
+            = src
+        include_package_data = True
+
+        [options.packages.find]
+        where = src
+
+        [options.exclude_package_data]
+        mypkg =
+            .gitattributes
+
+.. tab:: setup.py
+
+   .. code-block:: python
+
+        from setuptools import setup, find_packages
+        setup(
+            # ...,
+            packages=find_packages(where="src"),
+            package_dir={"": "src"},
+            include_package_data=True,
+            exclude_package_data={"mypkg": [".gitattributes"]},
+        )
+
+.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_
+
+   .. code-block:: toml
+
+        [tool.setuptools.packages.find]
+        where = ["src"]
+
+        [tool.setuptools.exclude-package-data]
+        mypkg = [".gitattributes"]

 The ``exclude_package_data`` option is a dictionary mapping package names to
 lists of wildcard patterns, just like the ``package_data`` option. And, just
-as with that option, a key of ``""`` will apply the given pattern(s) to all
-packages. However, any files that match these patterns will be *excluded*
-from installation, even if they were listed in ``package_data`` or were
-included as a result of using ``include_package_data``.
+as with that option, you can use the empty string key ``""`` in ``setup.py`` and the
+asterisk ``*`` in ``setup.cfg`` and ``pyproject.toml`` to match all top-level packages.
+
+Any files that match these patterns will be *excluded* from installation,
+even if they were listed in ``package_data`` or were included as a result of using
+``include_package_data``.
+
+Subdirectory for Data Files
+===========================
+
+A common pattern is where some (or all) of the data files are placed under
+a separate subdirectory. For example::
+
+    project_root_directory
+    ├── setup.py        # and/or setup.cfg, pyproject.toml
+    └── src
+        └── mypkg
+            ├── data
+            │   ├── data1.rst
+            │   └── data2.rst
+            ├── __init__.py
+            ├── data1.txt
+            └── data2.txt
+
+Here, the ``.rst`` files are placed under a ``data`` subdirectory inside ``mypkg``,
+while the ``.txt`` files are directly under ``mypkg``.
+
+In this case, the recommended approach is to treat ``data`` as a namespace package
+(refer :pep:`420`). With ``package_data``,
+the configuration might look like this:
+
+.. tab:: setup.cfg
+
+   .. code-block:: ini
+
+        [options]
+        # ...
+        packages = find_namespace:
+        package_dir =
+            = src
+
+        [options.packages.find]
+        where = src
+
+        [options.package_data]
+        mypkg =
+            *.txt
+        mypkg.data =
+            *.rst
+
+.. tab:: setup.py
+
+   .. code-block:: python
+
+        from setuptools import setup, find_namespace_packages
+        setup(
+            # ...,
+            packages=find_namespace_packages(where="src"),
+            package_dir={"": "src"},
+            package_data={
+                "mypkg": ["*.txt"],
+                "mypkg.data": ["*.rst"],
+            }
+        )
+
+.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_
+
+   .. code-block:: toml
+
+        [tool.setuptools.packages.find]
+        # scanning for namespace packages is true by default in pyproject.toml, so
+        # you do NOT need to include the following line.
+        namespaces = true
+        where = ["src"]
+
+        [tool.setuptools.package-data]
+        mypkg = ["*.txt"]
+        "mypkg.data" = ["*.rst"]
+
+In other words, we allow Setuptools to scan for namespace packages in the ``src`` directory,
+which enables the ``data`` directory to be identified, and then, we separately specify data
+files for the root package ``mypkg``, and the namespace package ``data`` under the package
+``mypkg``.
+
+With ``include_package_data`` the configuration is simpler: you simply need to enable
+scanning of namespace packages in the ``src`` directory and the rest is handled by Setuptools.
+
+.. tab:: setup.cfg
+
+   .. code-block:: ini
+
+        [options]
+        packages = find_namespace:
+        package_dir =
+            = src
+        include_package_data = True
+
+        [options.packages.find]
+        where = src
+
+.. tab:: setup.py
+
+   .. code-block:: python
+
+        from setuptools import setup, find_namespace_packages
+        setup(
+            # ... ,
+            packages=find_namespace_packages(where="src"),
+            package_dir={"": "src"},
+            include_package_data=True,
+        )
+
+.. tab:: pyproject.toml (**EXPERIMENTAL**) [#experimental]_
+
+   .. code-block:: toml
+
+        [tool.setuptools]
+        # ...
+        # By default, include-package-data is true in pyproject.toml, so you do
+        # NOT have to specify this line.
+        include-package-data = true
+
+        [tool.setuptools.packages.find]
+        # scanning for namespace packages is true by default in pyproject.toml, so
+        # you need NOT include the following line.
+        namespaces = true
+        where = ["src"]
+
+Summary
+=======

 In summary, the three options allow you to:

@@ -138,28 +442,69 @@ In summary, the three options allow you to:
     included when a package is installed, even if they would otherwise have been
     included due to the use of the preceding options.

-NOTE: Due to the way the distutils build process works, a data file that you
-include in your project and then stop including may be "orphaned" in your
-project's build directories, requiring you to run ``setup.py clean --all`` to
-fully remove them. This may also be important for your users and contributors
-if they track intermediate revisions of your project using Subversion; be sure
-to let them know when you make changes that remove files from inclusion so they
-can run ``setup.py clean --all``.
+.. note::
+    Due to the way the build process works, a data file that you
+    include in your project and then stop including may be "orphaned" in your
+    project's build directories, requiring you to run ``setup.py clean --all`` to
+    fully remove them. This may also be important for your users and contributors
+    if they track intermediate revisions of your project using Subversion; be sure
+    to let them know when you make changes that remove files from inclusion so they
+    can run ``setup.py clean --all``.

 .. _Accessing Data Files at Runtime:

 Accessing Data Files at Runtime
--------------------------------
+===============================

 Typically, existing programs manipulate a package's ``__file__`` attribute in
-order to find the location of data files. However, this manipulation isn't
-compatible with PEP 302-based import hooks, including importing from zip files
-and Python Eggs. It is strongly recommended that, if you are using data files,
-you should use :mod:`importlib.resources` to access them.
-:mod:`importlib.resources` was added to Python 3.7 and the latest version of
-the library is also available via the :pypi:`importlib-resources` backport.
-See :doc:`importlib-resources:using` for detailed instructions [#importlib]_.
+order to find the location of data files. For example, if you have a structure
+like this::
+
+    project_root_directory
+    ├── setup.py        # and/or setup.cfg, pyproject.toml
+    └── src
+        └── mypkg
+            ├── data
+            │   └── data1.txt
+            ├── __init__.py
+            └── foo.py
+
+Then, in ``mypkg/foo.py``, you may try something like this in order to access
+``mypkg/data/data1.txt``:
+
+.. code-block:: python
+
+   import os
+   data_path = os.path.join(os.path.dirname(__file__), 'data', 'data1.txt')
+   with open(data_path, 'r') as data_file:
+        ...
+
+However, this manipulation isn't compatible with :pep:`302`-based import hooks,
+including importing from zip files and Python Eggs. It is strongly recommended that,
+if you are using data files, you should use :mod:`importlib.resources` to access them.
+In this case, you would do something like this:
+
+.. code-block:: python
+
+   from importlib.resources import files
+   data_text = files('mypkg.data').joinpath('data1.txt').read_text()
+
+:mod:`importlib.resources` was added to Python 3.7. However, the API illustrated in
+this code (using ``files()``) was added only in Python 3.9, [#files_api]_ and support
+for accessing data files via namespace packages was added only in Python 3.10 [#namespace_support]_
+(the ``data`` subdirectory is a namespace package under the root package ``mypkg``).
+Therefore, you may find this code to work only in Python 3.10 (and above). For other
+versions of Python, you are recommended to use the :pypi:`importlib-resources` backport
+which provides the latest version of this library. In this case, the only change that
+has to be made to the above code is to replace ``importlib.resources`` with ``importlib_resources``, i.e.
+
+.. code-block:: python
+
+   from importlib_resources import files
+   ...
+
+See :doc:`importlib-resources:using` for detailed instructions.

 .. tip:: Files inside the package directory should be *read-only* to avoid a
    series of common problems (e.g. when multiple users share a common Python
@@ -175,7 +520,7 @@ See :doc:`importlib-resources:using` for detailed instructions [#importlib]_.


 Non-Package Data Files
-----------------------
+======================

 Historically, ``setuptools`` by way of ``easy_install`` would encapsulate data
 files from the distribution into the egg (see `the old docs
@@ -189,17 +534,17 @@ run time be included **inside the package**.

 ----

-.. [#datafiles] ``setuptools`` consider a *package data file* any non-Python
-   file **inside the package directory** (i.e., that co-exists in the same
-   location as the regular ``.py`` files being distributed).
+.. [#experimental]
+   Support for specifying package metadata and build configuration options via
+   ``pyproject.toml`` is experimental and might change
+   in the future. See :doc:`/userguide/pyproject_config`.

 .. [#system-dirs] These locations can be discovered with the help of
    third-party libraries such as :pypi:`platformdirs`.

-.. [#importlib] Recent versions of :mod:`importlib.resources` available in
-   Pythons' standard library should be API compatible with
-   :pypi:`importlib-metadata`. However this might vary depending on which version
-   of Python is installed.
+.. [#files_api] Reference: https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy
+
+.. [#namespace_support] Reference: https://github.com/python/importlib_resources/pull/196#issuecomment-734520374

 .. |MANIFEST.in| replace:: ``MANIFEST.in``
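
The runtime-access guidance added by this patch (prefer ``files()`` and fall back to the :pypi:`importlib-resources` backport on older Pythons) might be sketched as below. This is only an illustration and is not part of the patch: the helper name ``read_data_text`` is hypothetical, the ``mypkg`` package is a throwaway copy of the layout in the docs example built in a temporary directory, and ``data`` is reached by traversing from the regular package ``mypkg`` so that the sketch does not depend on namespace-package support.

```python
import sys
import tempfile
from pathlib import Path

try:
    # files() is available in the standard library from Python 3.9.
    from importlib.resources import files
except ImportError:
    # Assumption: the importlib-resources backport is installed on older Pythons.
    from importlib_resources import files


def read_data_text(package, *parts):
    """Hypothetical helper: read a text data file shipped inside `package`."""
    resource = files(package)
    for part in parts:
        resource = resource / part  # traverse one path component at a time
    return resource.read_text(encoding="utf-8")


# Build a throwaway package mirroring the layout used in the docs example.
tmp_root = Path(tempfile.mkdtemp())
data_dir = tmp_root / "mypkg" / "data"
data_dir.mkdir(parents=True)
(tmp_root / "mypkg" / "__init__.py").write_text("", encoding="utf-8")
(data_dir / "data1.txt").write_text("hello from data1\n", encoding="utf-8")

sys.path.insert(0, str(tmp_root))  # make the demo package importable
text = read_data_text("mypkg", "data", "data1.txt")
print(text)
```

Starting from ``files("mypkg")`` rather than ``files("mypkg.data")`` is a deliberate choice for the sketch: per the patch text, addressing the ``data`` subdirectory as a namespace package is only expected to work on Python 3.10 and above.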