bpo-40495: compileall option to hardlink duplicate pyc files #19901

Merged 13 commits on May 14, 2020
21 changes: 16 additions & 5 deletions Doc/library/compileall.rst
@@ -113,6 +113,11 @@ compile Python sources.

Ignore symlinks pointing outside the given directory.

.. cmdoption:: --hardlink-dupes

If two ``.pyc`` files with different optimization levels have
the same content, use hard links to consolidate duplicate files.
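
For example, a hedged sketch (the package path is illustrative); compiling
for two optimization levels and consolidating byte-identical results::

   python -m compileall -o 1 -o 2 --hardlink-dupes mypackage/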

.. versionchanged:: 3.2
Added the ``-i``, ``-b`` and ``-h`` options.

@@ -125,7 +130,7 @@ compile Python sources.
Added the ``--invalidation-mode`` option.

.. versionchanged:: 3.9
Added the ``-s``, ``-p``, ``-e`` options.
Added the ``-s``, ``-p``, ``-e`` and ``--hardlink-dupes`` options.
Raised the default recursion limit from 10 to
:py:func:`sys.getrecursionlimit()`.
Added the possibility to specify the ``-o`` option multiple times.
@@ -143,7 +148,7 @@ runtime.
Public functions
----------------

.. function:: compile_dir(dir, maxlevels=sys.getrecursionlimit(), ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, workers=1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None)
.. function:: compile_dir(dir, maxlevels=sys.getrecursionlimit(), ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, workers=1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None, hardlink_dupes=False)

Recursively descend the directory tree named by *dir*, compiling all :file:`.py`
files along the way. Return a true value if all the files compiled successfully,
@@ -193,6 +198,9 @@ Public functions
the ``-s``, ``-p`` and ``-e`` options described above.
They may be specified as ``str``, ``bytes`` or :py:class:`os.PathLike`.

If *hardlink_dupes* is true and two ``.pyc`` files with different optimization
levels have the same content, use hard links to consolidate duplicate files.
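
A minimal sketch of the corresponding API call (the directory name is an
assumption for illustration)::

   import compileall

   # Compile for optimization levels 1 and 2; byte-identical .pyc files
   # are consolidated into hard links.
   compileall.compile_dir("mypkg", optimize=[1, 2], hardlink_dupes=True)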

.. versionchanged:: 3.2
Added the *legacy* and *optimize* parameters.

@@ -219,9 +227,9 @@ Public functions
Setting *workers* to 0 now chooses the optimal number of cores.

.. versionchanged:: 3.9
Added *stripdir*, *prependdir* and *limit_sl_dest* arguments.
Added *stripdir*, *prependdir*, *limit_sl_dest* and *hardlink_dupes* arguments.

.. function:: compile_file(fullname, ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None)
.. function:: compile_file(fullname, ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, invalidation_mode=None, \*, stripdir=None, prependdir=None, limit_sl_dest=None, hardlink_dupes=False)

Compile the file with path *fullname*. Return a true value if the file
compiled successfully, and a false value otherwise.
@@ -257,6 +265,9 @@ Public functions
the ``-s``, ``-p`` and ``-e`` options described above.
They may be specified as ``str``, ``bytes`` or :py:class:`os.PathLike`.

If *hardlink_dupes* is true and two ``.pyc`` files with different optimization
levels have the same content, use hard links to consolidate duplicate files.
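
A hedged single-file sketch (the path is illustrative; note that
*hardlink_dupes* requires more than one distinct optimization level)::

   import compileall

   # Requires at least two distinct optimization levels, otherwise
   # compile_file raises ValueError.
   compileall.compile_file("mypkg/mod.py", optimize=[0, 2], hardlink_dupes=True)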

.. versionadded:: 3.2

.. versionchanged:: 3.5
@@ -273,7 +284,7 @@ Public functions
The *invalidation_mode* parameter's default value is updated to None.

.. versionchanged:: 3.9
Added *stripdir*, *prependdir* and *limit_sl_dest* arguments.
Added *stripdir*, *prependdir*, *limit_sl_dest* and *hardlink_dupes* arguments.

.. function:: compile_path(skip_curdir=True, maxlevels=0, force=False, quiet=0, legacy=False, optimize=-1, invalidation_mode=None)

10 changes: 10 additions & 0 deletions Doc/whatsnew/3.9.rst
@@ -245,6 +245,16 @@ that schedules a shutdown for the default executor that waits on the
Added :class:`asyncio.PidfdChildWatcher`, a Linux-specific child watcher
implementation that polls process file descriptors. (:issue:`38692`)

compileall
----------

Added the possibility to use hard links for duplicated ``.pyc`` files: the *hardlink_dupes* parameter and the ``--hardlink-dupes`` command-line option.
(Contributed by Lumír 'Frenzy' Balhar in :issue:`40495`.)

Added new options for path manipulation in resulting ``.pyc`` files: the *stripdir*, *prependdir*, *limit_sl_dest* parameters and the ``-s``, ``-p``, ``-e`` command-line options.
Added the possibility to specify the option for an optimization level multiple times.
(Contributed by Lumír 'Frenzy' Balhar in :issue:`38112`.)
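
For the path-manipulation options, a minimal sketch (both paths are
hypothetical)::

   import compileall

   # Strip the build prefix from the source paths recorded in the .pyc
   # files and record the eventual install location instead.
   compileall.compile_dir("build/lib", stripdir="build/lib",
                          prependdir="/usr/lib/python3.9")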

concurrent.futures
------------------

42 changes: 35 additions & 7 deletions Lib/compileall.py
@@ -15,6 +15,7 @@
import importlib.util
import py_compile
import struct
import filecmp

from functools import partial
from pathlib import Path
@@ -47,7 +48,7 @@ def _walk_dir(dir, maxlevels, quiet=0):
def compile_dir(dir, maxlevels=None, ddir=None, force=False,
rx=None, quiet=0, legacy=False, optimize=-1, workers=1,
invalidation_mode=None, *, stripdir=None,
prependdir=None, limit_sl_dest=None):
prependdir=None, limit_sl_dest=None, hardlink_dupes=False):
"""Byte-compile all modules in the given directory tree.

Arguments (only dir is required):
@@ -70,6 +71,7 @@ def compile_dir(dir, maxlevels=None, ddir=None, force=False,
after stripdir
limit_sl_dest: ignore symlinks if they are pointing outside of
the defined path
hardlink_dupes: hardlink duplicated pyc files
"""
ProcessPoolExecutor = None
if ddir is not None and (stripdir is not None or prependdir is not None):
@@ -104,22 +106,24 @@ def compile_dir(dir, maxlevels=None, ddir=None, force=False,
invalidation_mode=invalidation_mode,
stripdir=stripdir,
prependdir=prependdir,
limit_sl_dest=limit_sl_dest),
limit_sl_dest=limit_sl_dest,
hardlink_dupes=hardlink_dupes),
files)
success = min(results, default=True)
else:
for file in files:
if not compile_file(file, ddir, force, rx, quiet,
legacy, optimize, invalidation_mode,
stripdir=stripdir, prependdir=prependdir,
limit_sl_dest=limit_sl_dest):
limit_sl_dest=limit_sl_dest,
hardlink_dupes=hardlink_dupes):
success = False
return success

def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
legacy=False, optimize=-1,
invalidation_mode=None, *, stripdir=None, prependdir=None,
limit_sl_dest=None):
limit_sl_dest=None, hardlink_dupes=False):
"""Byte-compile one file.

Arguments (only fullname is required):
@@ -140,6 +144,7 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
after stripdir
limit_sl_dest: ignore symlinks if they are pointing outside of
the defined path.
hardlink_dupes: hardlink duplicated pyc files
"""

if ddir is not None and (stripdir is not None or prependdir is not None):
@@ -176,6 +181,14 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
if isinstance(optimize, int):
optimize = [optimize]

# Use set() to remove duplicates.
# Use sorted() to create pyc files in a deterministic order.
optimize = sorted(set(optimize))

if hardlink_dupes and len(optimize) < 2:
raise ValueError("Hardlinking of duplicated bytecode makes sense "
"only for more than one optimization level")

if rx is not None:
mo = rx.search(fullname)
if mo:
@@ -220,10 +233,16 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
if not quiet:
print('Compiling {!r}...'.format(fullname))
try:
for opt_level, cfile in opt_cfiles.items():
for index, opt_level in enumerate(optimize):
cfile = opt_cfiles[opt_level]
ok = py_compile.compile(fullname, cfile, dfile, True,
optimize=opt_level,
invalidation_mode=invalidation_mode)
if index > 0 and hardlink_dupes:
previous_cfile = opt_cfiles[optimize[index - 1]]
if filecmp.cmp(cfile, previous_cfile, shallow=False):
os.unlink(cfile)
os.link(previous_cfile, cfile)
except py_compile.PyCompileError as err:
success = False
if quiet >= 2:
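
The consolidation logic above reduces to a compare-then-link pattern. A
standalone sketch of that pattern, assuming hypothetical file names:

   import filecmp
   import os

   def link_if_identical(previous_pyc, current_pyc):
       """Replace current_pyc with a hard link to previous_pyc if their
       contents are byte-for-byte identical."""
       if filecmp.cmp(previous_pyc, current_pyc, shallow=False):
           os.unlink(current_pyc)              # remove the duplicate file
           os.link(previous_pyc, current_pyc)  # relink the name to the shared inode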
@@ -352,6 +371,9 @@ def main():
'Python interpreter itself (specified by -O).'))
parser.add_argument('-e', metavar='DIR', dest='limit_sl_dest',
help='Ignore symlinks pointing outside of the DIR')
parser.add_argument('--hardlink-dupes', action='store_true',
dest='hardlink_dupes',
help='Hardlink duplicated pyc files')

args = parser.parse_args()
compile_dests = args.compile_dest
@@ -371,6 +393,10 @@ def main():
if args.opt_levels is None:
args.opt_levels = [-1]

if len(args.opt_levels) == 1 and args.hardlink_dupes:
parser.error(("Hardlinking of duplicated bytecode makes sense "
"only for more than one optimization level."))

if args.ddir is not None and (
args.stripdir is not None or args.prependdir is not None
):
@@ -404,7 +430,8 @@ def main():
stripdir=args.stripdir,
prependdir=args.prependdir,
optimize=args.opt_levels,
limit_sl_dest=args.limit_sl_dest):
limit_sl_dest=args.limit_sl_dest,
hardlink_dupes=args.hardlink_dupes):
success = False
else:
if not compile_dir(dest, maxlevels, args.ddir,
@@ -414,7 +441,8 @@ def main():
stripdir=args.stripdir,
prependdir=args.prependdir,
optimize=args.opt_levels,
limit_sl_dest=args.limit_sl_dest):
limit_sl_dest=args.limit_sl_dest,
hardlink_dupes=args.hardlink_dupes):
success = False
return success
else: