Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ai): Extract AI category #3449

Merged
merged 9 commits into from
Apr 23, 2024
Merged

feat(ai): Extract AI category #3449

merged 9 commits into from
Apr 23, 2024

Conversation

colin-sentry
Copy link
Member

We want to separate the 'ai' and 'ai.pipeline' categories to be able to query them independently in pages.

@colin-sentry colin-sentry requested a review from a team as a code owner April 17, 2024 19:12
@colin-sentry colin-sentry changed the title Extract AI category feat(ai): Extract AI category Apr 17, 2024
@colin-sentry colin-sentry force-pushed the ai_category branch 2 times, most recently from f740933 to 354a213 Compare April 17, 2024 21:08
@colin-sentry colin-sentry force-pushed the ai_category branch 6 times, most recently from 9d7bf0c to 43eb272 Compare April 18, 2024 18:08
Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not add is_ai to known_modules_condition, and add it to the necessary tag conditions explicitly instead. After that, I think this PR is good to go!

I will follow-up with a PR that removes known_modules_condition entirely -- It makes it hard to understand what modules actually need specific tags.

@colin-sentry
Copy link
Member Author

Let's not add is_ai to known_modules_condition, and add it to the necessary tag conditions explicitly instead. After that, I think this PR is good to go!

I will follow-up with a PR that removes known_modules_condition entirely -- It makes it hard to understand what modules actually need specific tags.

Done

@colin-sentry colin-sentry force-pushed the ai_category branch 2 times, most recently from 836f0eb to 373f9d7 Compare April 19, 2024 15:23
@colin-sentry colin-sentry requested a review from jjbayer April 19, 2024 15:35
relay-dynamic-config/src/defaults.rs Outdated Show resolved Hide resolved
Comment on lines +486 to +536
Tag::with_key("span.category")
.from_field("span.sentry_tags.category")
.always(), // already guarded by condition on metric
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The category and op are very similar (see snapshots). Do we need both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need it for AI at least - category = ai.pipeline is how we identify the top level spans

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does looking at the prefix of op work? When category=ai.pipeline, op=ai.pipeline.whatever. Cardinality should be the same, just wondering whether we need the extra tag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do prefix matches like that in snuba from what I've read

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO having this tag is fine, it does not increase cardinality because it is derived from op.

tests/integration/test_spans.py Outdated Show resolved Hide resolved
@@ -1053,6 +1053,24 @@ mod tests {
"ai_total_tokens_used": {
"value": 20
}
},
"data": {
"ai.pipeline.name": "Autofix Pipeline"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the name of the parent span which represents a specific AI pipeline.

This gets converted into the tag span.ai.pipeline.group which can then be queried in the frontend to find how many tokens were used in total for a single span

Span -> span group -> "foreign key join" on ai.pipeline.group -> sum (counter)

Copy link
Contributor

@iker-barriocanal iker-barriocanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a test case I mention below, other than that LGTM.

Comment on lines +486 to +536
Tag::with_key("span.category")
.from_field("span.sentry_tags.category")
.always(), // already guarded by condition on metric
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does looking at the prefix of op work? When category=ai.pipeline, op=ai.pipeline.whatever. Cardinality should be the same, just wondering whether we need the extra tag.

Comment on lines +475 to +476
let mut ai_pipeline_group = format!("{:?}", md5::compute(ai_pipeline_name));
ai_pipeline_group.truncate(16);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually worth hashing this value? The ai pipeline name looks pretty short, so why not set that as a tag directly, instead of hashing it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hashed the same way description is because it needs to be "joined" on the group ID

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I did not get that it's the span group of the parent before.

Is the AI pipeline always a segment span? In that case, we could alternatively use the segment_name field (see SpanData) to store the parent name.

@@ -11572,8 +11576,10 @@ expression: metrics
tags: {
"environment": "fake_environment",
"release": "1.2.3",
"span.description": "Autofix Pipeline",
"span.group": "86148ae2d6c09430",
"span.ai.pipeline.group": "86148ae2d6c09430",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iker-barriocanal there is a snapshot test for it here

@colin-sentry colin-sentry requested a review from jjbayer April 23, 2024 14:11
@colin-sentry colin-sentry enabled auto-merge (squash) April 23, 2024 14:11
@colin-sentry colin-sentry dismissed jjbayer’s stale review April 23, 2024 14:13

Added relevant field to span data

@colin-sentry colin-sentry merged commit 891b521 into master Apr 23, 2024
20 checks passed
@colin-sentry colin-sentry deleted the ai_category branch April 23, 2024 14:35
Comment on lines +486 to +536
Tag::with_key("span.category")
.from_field("span.sentry_tags.category")
.always(), // already guarded by condition on metric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO having this tag is fine, it does not increase cardinality because it is derived from op.

Comment on lines +475 to +476
let mut ai_pipeline_group = format!("{:?}", md5::compute(ai_pipeline_name));
ai_pipeline_group.truncate(16);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I did not get that it's the span group of the parent before.

Is the AI pipeline always a segment span? In that case, we could alternatively use the segment_name field (see SpanData) to store the parent name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants