Skip to content

Commit

Permalink
GH-127010: Don't lazily track and untrack dicts (GH-127027)
Browse files Browse the repository at this point in the history
  • Loading branch information
markshannon authored Nov 20, 2024
1 parent 7191b76 commit aea0c58
Show file tree
Hide file tree
Showing 12 changed files with 44 additions and 283 deletions.
2 changes: 0 additions & 2 deletions Doc/library/gc.rst
Original file line number Diff line number Diff line change
Expand Up @@ -204,8 +204,6 @@ The :mod:`gc` module provides the following functions:
>>> gc.is_tracked({})
False
>>> gc.is_tracked({"a": 1})
False
>>> gc.is_tracked({"a": []})
True

.. versionadded:: 3.1
Expand Down
2 changes: 0 additions & 2 deletions Include/internal/pycore_dict.h
Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,6 @@ extern int _PyDict_Next(

extern int _PyDict_HasOnlyStringKeys(PyObject *mp);

extern void _PyDict_MaybeUntrack(PyObject *mp);

// Export for '_ctypes' shared extension
PyAPI_FUNC(Py_ssize_t) _PyDict_SizeOf(PyDictObject *);

Expand Down
41 changes: 16 additions & 25 deletions InternalDocs/garbage_collector.md
Original file line number Diff line number Diff line change
Expand Up @@ -532,8 +532,8 @@ of `PyGC_Head` discussed in the `Memory layout and object structure`_ section:
currently in. Instead, when that's needed, ad hoc tricks (like the
`NEXT_MASK_UNREACHABLE` flag) are employed.

Optimization: delay tracking containers
=======================================
Optimization: delayed untracking containers
===========================================

Certain types of containers cannot participate in a reference cycle, and so do
not need to be tracked by the garbage collector. Untracking these objects
Expand All @@ -546,26 +546,17 @@ a container:
2. When the container is examined by the garbage collector.

As a general rule, instances of atomic types aren't tracked and instances of
non-atomic types (containers, user-defined objects...) are. However, some
type-specific optimizations can be present in order to suppress the garbage
collector footprint of simple instances. Some examples of native types that
benefit from delayed tracking:

- Tuples containing only immutable objects (integers, strings etc,
and recursively, tuples of immutable objects) do not need to be tracked. The
interpreter creates a large number of tuples, many of which will not survive
until garbage collection. It is therefore not worthwhile to untrack eligible
tuples at creation time. Instead, all tuples except the empty tuple are tracked
when created. During garbage collection it is determined whether any surviving
tuples can be untracked. A tuple can be untracked if all of its contents are
already not tracked. Tuples are examined for untracking in all garbage collection
cycles. It may take more than one cycle to untrack a tuple.

- Dictionaries containing only immutable objects also do not need to be tracked.
Dictionaries are untracked when created. If a tracked item is inserted into a
dictionary (either as a key or value), the dictionary becomes tracked. During a
full garbage collection (all generations), the collector will untrack any dictionaries
whose contents are not tracked.
non-atomic types (containers, user-defined objects...) are.

Tuples containing only immutable objects (integers, strings etc,
and recursively, tuples of immutable objects) do not need to be tracked. The
interpreter creates a large number of tuples, many of which will not survive
until garbage collection. It is therefore not worthwhile to untrack eligible
tuples at creation time. Instead, all tuples except the empty tuple are tracked
when created. During garbage collection it is determined whether any surviving
tuples can be untracked. A tuple can be untracked if all of its contents are
already not tracked. Tuples are examined for untracking in all garbage collection
cycles.

The garbage collector module provides the Python function `is_tracked(obj)`, which returns
the current tracking status of the object. Subsequent garbage collections may change the
Expand All @@ -578,11 +569,11 @@ tracking status of the object.
False
>>> gc.is_tracked([])
True
>>> gc.is_tracked({})
>>> gc.is_tracked(())
False
>>> gc.is_tracked({})
True
>>> gc.is_tracked({"a": 1})
False
>>> gc.is_tracked({"a": []})
True
```

Expand Down
109 changes: 0 additions & 109 deletions Lib/test/test_dict.py
Original file line number Diff line number Diff line change
Expand Up @@ -880,115 +880,6 @@ class C(object):
gc.collect()
self.assertIs(ref(), None, "Cycle was not collected")

def _not_tracked(self, t):
# Nested containers can take several collections to untrack
gc.collect()
gc.collect()
self.assertFalse(gc.is_tracked(t), t)

def _tracked(self, t):
self.assertTrue(gc.is_tracked(t), t)
gc.collect()
gc.collect()
self.assertTrue(gc.is_tracked(t), t)

def test_string_keys_can_track_values(self):
# Test that this doesn't leak.
for i in range(10):
d = {}
for j in range(10):
d[str(j)] = j
d["foo"] = d

@support.cpython_only
def test_track_literals(self):
# Test GC-optimization of dict literals
x, y, z, w = 1.5, "a", (1, None), []

self._not_tracked({})
self._not_tracked({x:(), y:x, z:1})
self._not_tracked({1: "a", "b": 2})
self._not_tracked({1: 2, (None, True, False, ()): int})
self._not_tracked({1: object()})

# Dicts with mutable elements are always tracked, even if those
# elements are not tracked right now.
self._tracked({1: []})
self._tracked({1: ([],)})
self._tracked({1: {}})
self._tracked({1: set()})

@support.cpython_only
def test_track_dynamic(self):
# Test GC-optimization of dynamically-created dicts
class MyObject(object):
pass
x, y, z, w, o = 1.5, "a", (1, object()), [], MyObject()

d = dict()
self._not_tracked(d)
d[1] = "a"
self._not_tracked(d)
d[y] = 2
self._not_tracked(d)
d[z] = 3
self._not_tracked(d)
self._not_tracked(d.copy())
d[4] = w
self._tracked(d)
self._tracked(d.copy())
d[4] = None
self._not_tracked(d)
self._not_tracked(d.copy())

# dd isn't tracked right now, but it may mutate and therefore d
# which contains it must be tracked.
d = dict()
dd = dict()
d[1] = dd
self._not_tracked(dd)
self._tracked(d)
dd[1] = d
self._tracked(dd)

d = dict.fromkeys([x, y, z])
self._not_tracked(d)
dd = dict()
dd.update(d)
self._not_tracked(dd)
d = dict.fromkeys([x, y, z, o])
self._tracked(d)
dd = dict()
dd.update(d)
self._tracked(dd)

d = dict(x=x, y=y, z=z)
self._not_tracked(d)
d = dict(x=x, y=y, z=z, w=w)
self._tracked(d)
d = dict()
d.update(x=x, y=y, z=z)
self._not_tracked(d)
d.update(w=w)
self._tracked(d)

d = dict([(x, y), (z, 1)])
self._not_tracked(d)
d = dict([(x, y), (z, w)])
self._tracked(d)
d = dict()
d.update([(x, y), (z, 1)])
self._not_tracked(d)
d.update([(x, y), (z, w)])
self._tracked(d)

@support.cpython_only
def test_track_subtypes(self):
# Dict subtypes are always tracked
class MyDict(dict):
pass
self._tracked(MyDict())

def make_shared_key_dict(self, n):
class C:
pass
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
Simplify GC tracking of dictionaries. All dictionaries are tracked when
created, rather than being lazily tracked when a trackable object was added
to them. This simplifies the code considerably and results in a slight
speedup.
Loading

0 comments on commit aea0c58

Please sign in to comment.