feat(ai): Extract AI category #3449
Let's not add `is_ai` to `known_modules_condition`, and add it to the necessary tag conditions explicitly instead. After that, I think this PR is good to go!
I will follow up with a PR that removes `known_modules_condition` entirely -- it makes it hard to understand which modules actually need specific tags.
Done
Tag::with_key("span.category")
    .from_field("span.sentry_tags.category")
    .always(), // already guarded by condition on metric
The category and op are very similar (see snapshots). Do we need both?
I need it for AI at least - category = `ai.pipeline` is how we identify the top-level spans
Does looking at the prefix of `op` work? When `category=ai.pipeline`, `op=ai.pipeline.whatever`. Cardinality should be the same, just wondering whether we need the extra tag.
From what I've read, you can't do prefix matches like that in Snuba.
IMO having this tag is fine, it does not increase cardinality because it is derived from `op`.
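The cardinality argument in this thread can be illustrated with a small sketch: if the category is a pure function of `op`, each distinct `op` maps to exactly one category, so tagging both yields no more distinct combinations than tagging `op` alone. The function `category_from_op` and its prefix table are hypothetical and only cover the two categories this PR separates; they are not Relay's actual derivation.

```rust
/// Hypothetical derivation of a span category from its `op`.
/// The prefix table is an assumption made for illustration only.
fn category_from_op(op: &str) -> Option<&'static str> {
    // Longer prefixes come first so `ai.pipeline.*` is not swallowed by `ai.*`.
    const PREFIXES: [&str; 2] = ["ai.pipeline", "ai"];
    PREFIXES.iter().copied().find(|p| {
        // Match the prefix itself, or the prefix followed by a `.` separator,
        // so that an unrelated op such as `aioli.request` does not match `ai`.
        op == *p
            || op
                .strip_prefix(*p)
                .map_or(false, |rest| rest.starts_with('.'))
    })
}
```

With this mapping, a query can filter on an exact tag value such as `ai.pipeline` instead of needing a prefix match on `op`.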
@@ -1053,6 +1053,24 @@ mod tests {
            "ai_total_tokens_used": {
                "value": 20
            }
        },
        "data": {
            "ai.pipeline.name": "Autofix Pipeline"
This is the name of the parent span which represents a specific AI pipeline.
This gets converted into the tag `span.ai.pipeline.group`, which can then be queried in the frontend to find how many tokens were used in total for a single span.
Span -> span group -> "foreign key join" on ai.pipeline.group -> sum (counter)
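The lookup chain described here can be sketched in plain Rust. The structs `PipelineSpan` and `TokenMetric` and the function `tokens_per_pipeline` are hypothetical stand-ins for what the frontend would query; only the tag names `span.group` and `span.ai.pipeline.group` come from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical top-level pipeline span, identified by its span group.
struct PipelineSpan {
    description: &'static str,
    group: &'static str, // value of the `span.group` tag
}

/// Hypothetical token-counter sample emitted for a child AI span.
struct TokenMetric {
    pipeline_group: &'static str, // value of the `span.ai.pipeline.group` tag
    tokens_used: u64,
}

/// Sums token usage per pipeline by matching each metric's
/// `span.ai.pipeline.group` against the pipeline span's `span.group`.
fn tokens_per_pipeline(
    pipelines: &[PipelineSpan],
    metrics: &[TokenMetric],
) -> HashMap<&'static str, u64> {
    // Sum the counter per pipeline group first.
    let mut by_group: HashMap<&str, u64> = HashMap::new();
    for m in metrics {
        *by_group.entry(m.pipeline_group).or_insert(0) += m.tokens_used;
    }
    // Then resolve each pipeline span to its total (0 if no metric matched).
    pipelines
        .iter()
        .map(|p| (p.description, by_group.get(p.group).copied().unwrap_or(0)))
        .collect()
}
```

The match is an exact equality on the group value, which is why the pipeline name has to be turned into the same kind of ID as the parent's span group.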
Let's add a test case I mention below, other than that LGTM.
let mut ai_pipeline_group = format!("{:?}", md5::compute(ai_pipeline_name));
ai_pipeline_group.truncate(16);
Is it actually worth hashing this value? The ai pipeline name looks pretty short, so why not set that as a tag directly, instead of hashing it?
It's hashed the same way description is because it needs to be "joined" on the group ID
Makes sense, I did not get that it's the span group of the parent before.
Is the AI pipeline always a segment span? In that case, we could alternatively use the segment_name field (see `SpanData`) to store the parent name.
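Why hashing makes the "join" possible can be shown with a short sketch: the group is a deterministic, fixed-width function of the name, so a child span that hashes the pipeline name arrives at the same value the parent gets from hashing its description. The reviewed code uses `md5::compute`; to stay dependency-free, this sketch swaps in the standard library's `DefaultHasher`, which is not MD5 and will not reproduce the group values in the snapshots. `group_id` is a hypothetical helper.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: maps a name to a 16-character hex group ID.
/// `DefaultHasher` is a stand-in for MD5 here, so the output differs
/// from the real `span.group` / `span.ai.pipeline.group` values.
fn group_id(name: &str) -> String {
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    // A u64 has at most 16 hex digits; zero-pad to keep the width fixed.
    format!("{:016x}", hasher.finish())
}
```

Calling `group_id` with the same name on the parent and on the child yields the same string, which is the property the equality match on the group tag relies on.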
@@ -11572,8 +11576,10 @@ expression: metrics
        tags: {
            "environment": "fake_environment",
            "release": "1.2.3",
            "span.description": "Autofix Pipeline",
            "span.group": "86148ae2d6c09430",
            "span.ai.pipeline.group": "86148ae2d6c09430",
@iker-barriocanal there is a snapshot test for it here
Added relevant field to span data
We want to separate the 'ai' and 'ai.pipeline' categories to be able to query them independently in pages.