How to make metadata accessible #4725

Closed
blumamir opened this issue May 21, 2024 · 4 comments

@blumamir
Member

I am exploring ways to enhance our handling of metadata for instrumentations, aiming to streamline processes and boost efficiency.

Instrumentation (or OpenTelemetry component) metadata comprises static information about OpenTelemetry JS instrumentation (or other components) that is valuable for distributions, control planes, APMs, and similar tools.

We currently record the name and version for each instrumentation, which also serve as the scope name and version for the signals we emit.

Although metadata is not recorded into signals, it can significantly enhance user experience and automate tasks when utilized by distributions, offering a smoother and more intuitive interface.

Metadata Examples

  • instrumentation description - this text is currently found only in package.json. It provides a concise, user-facing description that includes the instrumented packages and OpenTelemetry context. It was aligned across the codebase for consistent and meaningful content in "docs: enhanced description for instrumentations in package.json" (#4715 and opentelemetry-js-contrib#2202). Example text: "OpenTelemetry instrumentation for the amqplib messaging client for RabbitMQ"
  • instrumented packages and supported version range - this text is currently found only in the README.md of each instrumentation. "fix!: standardize supported versions and set upper bound limit" (opentelemetry-js-contrib#2196) is an attempt to align it across the codebase. The instrumented package is the user-facing package name, which can differ from the "patched packages" that init() returns. It is the most user-friendly name to show in documentation and UIs, so it is quite useful IMO.
  • github repository - of where the code can be found ("open-telemetry/opentelemetry-js-contrib", "open-telemetry/opentelemetry-js", or third party repos). It is currently found in the package.json for each instrumentation.
  • github path - the path inside the github repository where the code can be found. For example - plugins/node/instrumentation-amqplib. This info can potentially be extracted from the "homepage" attribute in package.json.
  • stability status
  • semantic conventions version implementation
  • emitted signals

and more info that we might need one day...

Essentially, any information in its raw format that might be useful for users to consume through various interfaces (documentation, README, UI, links, status).
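To make the list above concrete, here is a minimal TypeScript sketch of what such a metadata record could look like, plus a helper that derives the GitHub repository and path from a package.json "homepage" value. All type names, field names, and the `parseHomepage` helper are hypothetical and not an agreed schema. The helper also assumes the homepage follows the `https://github.com/<owner>/<repo>/tree/<branch>/<path>#readme` shape, which may not hold for every package.

```typescript
// Hypothetical shape for illustration only; not an agreed schema.
interface InstrumentedPackage {
  name: string; // user-facing package name, e.g. "amqplib"
  supportedVersions: string; // semver range kept as raw text
}

interface InstrumentationMetadata {
  description: string;
  instrumentedPackages: InstrumentedPackage[];
  githubRepository?: string; // e.g. "open-telemetry/opentelemetry-js-contrib"
  githubPath?: string; // e.g. "plugins/node/instrumentation-amqplib"
  stability?: string;
  semconvVersion?: string;
  emittedSignals?: string[];
}

// Derive repository and in-repo path from a "homepage" URL.
// Returns undefined when the URL does not match the assumed GitHub tree shape.
function parseHomepage(
  homepage: string
): { githubRepository: string; githubPath: string } | undefined {
  const match =
    /^https:\/\/github\.com\/([^/]+\/[^/]+)\/tree\/[^/]+\/([^#?]+)/.exec(homepage);
  const githubRepository = match?.[1];
  const githubPath = match?.[2];
  if (!githubRepository || !githubPath) {
    return undefined;
  }
  return { githubRepository, githubPath };
}

// Example record. The version range is a placeholder, not the real supported range.
const amqplibExample: InstrumentationMetadata = {
  description:
    "OpenTelemetry instrumentation for the amqplib messaging client for RabbitMQ",
  instrumentedPackages: [{ name: "amqplib", supportedVersions: "<placeholder range>" }],
  ...parseHomepage(
    "https://github.com/open-telemetry/opentelemetry-js-contrib/tree/main/plugins/node/instrumentation-amqplib#readme"
  ),
};
```

Keeping every field beyond the description and package list optional would let existing instrumentations adopt the record gradually.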

Usages

Here are a few practical applications of how this metadata can be effectively utilized:

  • distribution tools - to generate READMEs, docs, and other markdown files automatically from this data. See the auto-instrumentations-node README: its instrumentations list could be auto-generated and include more information for the user, such as the instrumentation description, instrumented package names and supported versions, and a link to the homepage. This would improve the experience for users of our contrib distribution and could also be leveraged by third-party distributions. Auto-generated text reduces mistakes and maintenance, promotes consistent content, and is less prone to getting out of sync.
  • OpenTelemetry control planes - If an OpenTelemetry control plane displays information about the components at runtime (via UI, files, or databases), details like the instrumented package can be useful for user-facing interfaces.
  • Enhancements for UIs - providing enriched information about instrumentation can significantly improve the user experience when interacting with these details
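As an illustration of the README use case, the following sketch renders a markdown list from a set of metadata entries. The `ReadmeEntry` type and the `renderInstrumentationList` function are hypothetical names introduced here. The output format is only one possible layout, not the format the auto-instrumentations-node README uses today.

```typescript
// Hypothetical input shape for a README generator; field names are assumptions.
interface ReadmeEntry {
  packageName: string; // npm name of the instrumentation package
  description: string;
  instrumentedPackages: string[];
  homepage?: string;
}

// Render one markdown bullet per instrumentation, sorted by package name
// so that regenerating the file gives a stable diff.
function renderInstrumentationList(entries: ReadmeEntry[]): string {
  return [...entries]
    .sort((a, b) =>
      a.packageName < b.packageName ? -1 : a.packageName > b.packageName ? 1 : 0
    )
    .map((entry) => {
      const label = entry.homepage
        ? `[${entry.packageName}](${entry.homepage})`
        : entry.packageName;
      const instruments =
        entry.instrumentedPackages.length > 0
          ? ` (instruments: ${entry.instrumentedPackages.join(", ")})`
          : "";
      return `- ${label}: ${entry.description}${instruments}`;
    })
    .join("\n");
}
```

A script along these lines could run in CI and fail the build when the committed README differs from the generated text.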

Suggestion

I want to suggest aggregating the metadata to achieve the goals above. I can work on the relevant PRs to implement something if there is an agreement. I will start with just the info we already have available, and then introduce a script for the auto-instrumentations-node README auto-generation and enhancement. Additionally, I plan to utilize this data for the odigos distribution of js agent to auto-generate a Node.js section in the Odigos documentation and potentially report back instrumentation statuses to the Odigos control plane based on this data.

Some objectives to consider:

Options

  1. The simplest and most straightforward way would be to add this data to the instrumentation interface and have each instrumentation set it:
  • as constructor argument, similar to instrumentation name and version which are already passed this way
  • as a function that instrumentation can override and return a metadata object, like the current init() function for patched packages info.
  • by defining an optional property from the base class which will expose this data on instrumentation instances.

If we decide to proceed this way, we must address TypeScript compatibility issues across versions to ensure that adding new properties does not introduce complexity.

Consider omitting it from web components for now so as not to increase bundle size.
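A rough sketch of the "function that an instrumentation can override" variant of this first option is shown below. The classes are simplified stand-ins written for this example and are not the actual @opentelemetry/instrumentation base class. `getMetadata`, `ComponentMetadata`, and the version string are hypothetical.

```typescript
// Hypothetical metadata shape, kept small for the example.
interface ComponentMetadata {
  description: string;
  instrumentedPackages: string[];
}

// Simplified stand-in for an instrumentation base class (NOT the real API).
abstract class SketchInstrumentationBase {
  readonly instrumentationName: string;
  readonly instrumentationVersion: string;

  constructor(instrumentationName: string, instrumentationVersion: string) {
    this.instrumentationName = instrumentationName;
    this.instrumentationVersion = instrumentationVersion;
  }

  // Optional hook: the default returns undefined, so existing
  // instrumentations keep compiling without any change.
  getMetadata(): ComponentMetadata | undefined {
    return undefined;
  }
}

class SketchAmqplibInstrumentation extends SketchInstrumentationBase {
  constructor() {
    super("@opentelemetry/instrumentation-amqplib", "0.0.0-sketch");
  }

  override getMetadata(): ComponentMetadata {
    return {
      description:
        "OpenTelemetry instrumentation for the amqplib messaging client for RabbitMQ",
      instrumentedPackages: ["amqplib"],
    };
  }
}

// An instrumentation that has not adopted metadata yet.
class SketchLegacyInstrumentation extends SketchInstrumentationBase {
  constructor() {
    super("sketch-legacy-instrumentation", "0.0.0-sketch");
  }
}

// A consumer (distribution, control plane) must handle the missing case.
function describeInstrumentations(list: SketchInstrumentationBase[]): string[] {
  return list.map(
    (i) => `${i.instrumentationName}: ${i.getMetadata()?.description ?? "(no metadata)"}`
  );
}
```

Because the hook has a default implementation, adding it would not force third-party instrumentations to change, although every consumer then has to tolerate `undefined`.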

  2. Save this data as a JSON file for each package and publish it to npm alongside the instrumentations. Tools could then scan the node_modules folder to extract this info from code, and remote users could git pull the tag or make an HTTP request to fetch the data when needed. See the Collector metadata.yaml for inspiration.

Considerations

  • many of these fields can be auto-generated and are not a burden to the implementations (github repo, github path, description)
  • some of the data is already available in the README and can be documented into a json file where it can be consumed easily.
  • It makes sense to me that if we already record such data, we should make sure it can, now or one day, be used for other components like detectors, propagators, processors, samplers, etc.

I think that once we come up with a good way to record this info, introducing it to existing components is a relatively simple technical task which I am up for doing.

I would appreciate your thoughts, concerns, suggestions or support, to help make this initiative a success!

@pichlermarc
Member

As discussed offline I'd be in favor of Option 2, or maybe even including the data in package.json as it's allowed to put extra fields in there - which can open additional avenues for other use-cases as well (auto-finding instrumentation packages via the registry). 🙂
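For illustration, reading such an extra field back out of a package.json could look like the sketch below. The top-level key name `"opentelemetry"` and the `instrumentedPackages` field are assumptions made for this example. No key name or schema was settled in this thread.

```typescript
// Hypothetical key under which metadata would live in package.json.
const METADATA_KEY = "opentelemetry";

interface ExtractedMetadata {
  name: string;
  description: string;
  instrumentedPackages: string[];
}

// Parse package.json text and return the metadata, or undefined when the
// package does not carry the (hypothetical) extra field in the expected shape.
function extractMetadata(packageJsonText: string): ExtractedMetadata | undefined {
  const pkg: unknown = JSON.parse(packageJsonText);
  if (typeof pkg !== "object" || pkg === null) {
    return undefined;
  }
  const record = pkg as Record<string, unknown>;
  const extra = record[METADATA_KEY];
  if (typeof extra !== "object" || extra === null) {
    return undefined;
  }
  const packages = (extra as Record<string, unknown>)["instrumentedPackages"];
  if (!Array.isArray(packages) || !packages.every((p) => typeof p === "string")) {
    return undefined;
  }
  const name = record["name"];
  const description = record["description"];
  return {
    name: typeof name === "string" ? name : "",
    description: typeof description === "string" ? description : "",
    instrumentedPackages: packages as string[],
  };
}

// "Auto-finding": keep only the manifests that carry the metadata field.
function findInstrumentationPackages(manifests: string[]): ExtractedMetadata[] {
  return manifests
    .map(extractMetadata)
    .filter((m): m is ExtractedMetadata => m !== undefined);
}
```

The same function works on a manifest read from node_modules or one fetched from a registry, since it only needs the JSON text.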

@blumamir
Member Author

> As discussed offline I'd be in favor of Option 2, or maybe even including the data in package.json as it's allowed to put extra fields in there - which can open additional avenues for other use-cases as well (auto-finding instrumentation packages via the registry). 🙂

Thank you, Marc. I really like the idea of adding it to package.json.

Opened the first PR to add metadata this way here

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Aug 26, 2024

github-actions bot commented Sep 9, 2024

This issue was closed because it has been stale for 14 days with no activity.

@github-actions github-actions bot closed this as not planned Sep 9, 2024