Skip to content

Commit

Permalink
chore(internal): symbol inference (#6920)
Browse files Browse the repository at this point in the history
We introduce logic to infer the symbols exported by a module. This can
be used by any product that requires symbol information to provide
features such as expression auto-completion. The actual enablement of
the symbol inference and upload is controlled via a remote configuration
signal. For this feature to work as intended, RC is an essential
internal requirement.

For performance reasons, the symbol inference is performed starting from
module objects, after they have been imported. This way we skip a second
parsing step, and reuse the result of the original operation performed
by CPython itself when the module was loaded for the first time.

Because the main consumer of this feature is Dynamic Instrumentation,
the information that is retrieved from each module reflects the DI
capabilities of discovering "instrumentable" symbols at runtime. Some
finer details (such as nested functions/classes) might therefore be
missing at this stage.

There is also a special handling for scenarios where child processes are
spawned via fork to parallelise the workload. Because each child process
would perform the same symbol discovery, the backend would receive
multiple uploads of duplicate information. To avoid wasting resources
and bandwidth, we added logic to ensure that uploads happen from the
parent process, and _at most_ one forked child process. The risk of this
approach is that, if fork is used for anything else other than the
worker child model, the symbol picture reconstructed on the backend site
might be incomplete. However, we are betting on this scenario to be very
rare, compared to the more common forked child process model.

## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included
in the PR.
- [x] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [ ] Title is accurate.
- [ ] No unnecessary changes are introduced.
- [ ] Description motivates each change.
- [ ] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [ ] Testing strategy adequately addresses listed risk(s).
- [ ] Change is maintainable (easy to change, telemetry, documentation).
- [ ] Release note makes sense to a user of the library.
- [ ] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [ ] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [ ] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [ ] This PR doesn't touch any of that.
  • Loading branch information
P403n1x87 authored Feb 7, 2024
1 parent 896e603 commit 8bcf2fd
Show file tree
Hide file tree
Showing 15 changed files with 1,035 additions and 25 deletions.
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
Expand Up @@ -46,10 +46,12 @@ ddtrace/settings/dynamic_instrumentation.py @DataDog/debugger-python
ddtrace/internal/injection.py @DataDog/debugger-python @DataDog/apm-core-python
ddtrace/internal/wrapping.py @DataDog/debugger-python @DataDog/apm-core-python
ddtrace/internal/module.py @DataDog/debugger-python @DataDog/apm-core-python
ddtrace/internal/symbol_db/ @DataDog/debugger-python
tests/debugging/ @DataDog/debugger-python
tests/internal/test_injection.py @DataDog/debugger-python @DataDog/apm-core-python
tests/internal/test_wrapping.py @DataDog/debugger-python @DataDog/apm-core-python
tests/internal/test_module.py @DataDog/debugger-python @DataDog/apm-core-python
tests/internal/symbol_db/ @DataDog/debugger-python

# ASM
ddtrace/appsec/ @DataDog/asm-python
Expand Down
6 changes: 6 additions & 0 deletions ddtrace/bootstrap/preload.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
from ddtrace.internal.utils.formats import asbool # noqa:F401
from ddtrace.internal.utils.formats import parse_tags_str # noqa:F401
from ddtrace.settings.asm import config as asm_config # noqa:F401
from ddtrace.settings.symbol_db import config as symdb_config # noqa:F401
from ddtrace import tracer


Expand Down Expand Up @@ -48,6 +49,11 @@ def register_post_preload(func: t.Callable) -> None:
except Exception:
log.error("failed to enable profiling", exc_info=True)

if symdb_config.enabled:
from ddtrace.internal import symbol_db

symbol_db.bootstrap()

if di_config.enabled or ed_config.enabled:
from ddtrace.debugging import DynamicInstrumentation

Expand Down
17 changes: 17 additions & 0 deletions ddtrace/internal/compat.py
Original file line number Diff line number Diff line change
Expand Up @@ -456,3 +456,20 @@ def _method(*args, **kwargs):
@property
def __isabstractmethod__(self):
return getattr(self.func, "__isabstractmethod__", False)


if PYTHON_VERSION_INFO >= (3, 9):
from pathlib import Path
else:
from pathlib import Path

# Taken from Python 3.9. This is not implemented in older versions of Python
def is_relative_to(self, other):
"""Return True if the path is relative to another path or False."""
try:
self.relative_to(other)
return True
except ValueError:
return False

Path.is_relative_to = is_relative_to # type: ignore[assignment]
16 changes: 15 additions & 1 deletion ddtrace/internal/forksafe.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,20 @@
_soft = True


# Flag to determine, from the parent process, if fork has been called
_forked = False


def set_forked():
global _forked

_forked = True


def has_forked():
return _forked


def ddtrace_after_in_child():
# type: () -> None
global _registry
Expand Down Expand Up @@ -57,7 +71,7 @@ def unregister(after_in_child):


if hasattr(os, "register_at_fork"):
os.register_at_fork(after_in_child=ddtrace_after_in_child)
os.register_at_fork(after_in_child=ddtrace_after_in_child, after_in_parent=set_forked)
elif hasattr(os, "fork"):
# DEV: This "should" be the correct way of implementing this, but it doesn't
# work if hooks create new threads.
Expand Down
42 changes: 31 additions & 11 deletions ddtrace/internal/packages.py
Original file line number Diff line number Diff line change
@@ -1,18 +1,19 @@
import logging
import os
import sysconfig
from types import ModuleType
import typing as t

from ddtrace.internal.compat import Path
from ddtrace.internal.module import origin
from ddtrace.internal.utils.cache import cached
from ddtrace.internal.utils.cache import callonce


try:
import pathlib # noqa: F401
except ImportError:
import pathlib2 as pathlib # type: ignore[no-redef] # noqa: F401


LOG = logging.getLogger(__name__)

if t.TYPE_CHECKING:
import pathlib # noqa

try:
fspath = os.fspath
Expand Down Expand Up @@ -43,11 +44,10 @@ def fspath(path):
raise TypeError("expected str, bytes or os.PathLike object, not " + path_type.__name__)
if isinstance(path_repr, (str, bytes)):
return path_repr
else:
raise TypeError(
"expected {}.__fspath__() to return str or bytes, "
"not {}".format(path_type.__name__, type(path_repr).__name__)
)
raise TypeError(
"expected {}.__fspath__() to return str or bytes, "
"not {}".format(path_type.__name__, type(path_repr).__name__)
)


# We don't store every file of every package but filter commonly used extensions
Expand Down Expand Up @@ -132,3 +132,23 @@ def filename_to_package(filename):
filename = filename[:-1]

return mapping.get(filename)


def module_to_package(module: ModuleType) -> t.Optional[Distribution]:
"""Returns the package distribution for a module"""
return filename_to_package(str(origin(module)))


stdlib_path = Path(sysconfig.get_path("stdlib")).resolve()
platstdlib_path = Path(sysconfig.get_path("platstdlib")).resolve()
purelib_path = Path(sysconfig.get_path("purelib")).resolve()
platlib_path = Path(sysconfig.get_path("platlib")).resolve()


@cached()
def is_stdlib(path: Path) -> bool:
rpath = path.resolve()

return (rpath.is_relative_to(stdlib_path) or rpath.is_relative_to(platstdlib_path)) and not (
rpath.is_relative_to(purelib_path) or rpath.is_relative_to(platlib_path)
)
20 changes: 20 additions & 0 deletions ddtrace/internal/symbol_db/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
from ddtrace.internal import forksafe
from ddtrace.internal.remoteconfig.worker import remoteconfig_poller
from ddtrace.internal.symbol_db.remoteconfig import SymbolDatabaseAdapter
from ddtrace.settings.symbol_db import config as symdb_config


def bootstrap():
if symdb_config._force:
# Force the upload of symbols, ignoring RCM instructions.
from ddtrace.internal.symbol_db.symbols import SymbolDatabaseUploader

SymbolDatabaseUploader.install()
else:
# Start the RCM subscriber to determine if and when to upload symbols.
remoteconfig_poller.register("LIVE_DEBUGGING_SYMBOL_DB", SymbolDatabaseAdapter())

@forksafe.register
def _():
remoteconfig_poller.unregister("LIVE_DEBUGGING_SYMBOL_DB")
remoteconfig_poller.register("LIVE_DEBUGGING_SYMBOL_DB", SymbolDatabaseAdapter())
70 changes: 70 additions & 0 deletions ddtrace/internal/symbol_db/remoteconfig.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
import os

from ddtrace.internal.forksafe import has_forked
from ddtrace.internal.logger import get_logger
from ddtrace.internal.remoteconfig._connectors import PublisherSubscriberConnector
from ddtrace.internal.remoteconfig._publishers import RemoteConfigPublisher
from ddtrace.internal.remoteconfig._pubsub import PubSub
from ddtrace.internal.remoteconfig._subscribers import RemoteConfigSubscriber
from ddtrace.internal.remoteconfig.worker import remoteconfig_poller
from ddtrace.internal.runtime import get_ancestor_runtime_id
from ddtrace.internal.symbol_db.symbols import SymbolDatabaseUploader


log = get_logger(__name__)


def _rc_callback(data, test_tracer=None):
if get_ancestor_runtime_id() is not None and has_forked():
log.debug("[PID %d] SymDB: Disabling Symbol DB in forked process", os.getpid())
# We assume that forking is being used for spawning child worker
# processes. Therefore, we avoid uploading the same symbols from each
# child process. We restrict the enablement of Symbol DB to just the
# parent process and the first fork child.
remoteconfig_poller.unregister("LIVE_DEBUGGING_SYMBOL_DB")

if SymbolDatabaseUploader.is_installed():
SymbolDatabaseUploader.uninstall()

return

for metadata, config in zip(data["metadata"], data["config"]):
if metadata is None:
continue

if not isinstance(config, dict):
continue

upload_symbols = config.get("upload_symbols")
if upload_symbols is None:
continue

if upload_symbols:
log.debug("[PID %d] SymDB: Symbol DB RCM enablement signal received", os.getpid())
if not SymbolDatabaseUploader.is_installed():
try:
SymbolDatabaseUploader.install()
log.debug("[PID %d] SymDB: Symbol DB uploader installed", os.getpid())
except Exception:
log.error("[PID %d] SymDB: Failed to install Symbol DB uploader", os.getpid(), exc_info=True)
remoteconfig_poller.unregister("LIVE_DEBUGGING_SYMBOL_DB")
else:
log.debug("[PID %d] SymDB: Symbol DB RCM shutdown signal received", os.getpid())
if SymbolDatabaseUploader.is_installed():
try:
SymbolDatabaseUploader.uninstall()
log.debug("[PID %d] SymDB: Symbol DB uploader uninstalled", os.getpid())
except Exception:
log.error("[PID %d] SymDB: Failed to uninstall Symbol DB uploader", os.getpid(), exc_info=True)
remoteconfig_poller.unregister("LIVE_DEBUGGING_SYMBOL_DB")
break


class SymbolDatabaseAdapter(PubSub):
__publisher_class__ = RemoteConfigPublisher
__subscriber_class__ = RemoteConfigSubscriber
__shared_data__ = PublisherSubscriberConnector()

def __init__(self):
self._publisher = self.__publisher_class__(self.__shared_data__)
self._subscriber = self.__subscriber_class__(self.__shared_data__, _rc_callback, "LIVE_DEBUGGING_SYMBOL_DB")
Loading

0 comments on commit 8bcf2fd

Please sign in to comment.