feat(ai): Extract AI category #3449

colin-sentry · 2024-04-17T19:12:56Z

We want to separate the 'ai' and 'ai.pipeline' categories to be able to query them independently in pages.

relay-dynamic-config/src/defaults.rs

jjbayer

Let's not add is_ai to known_modules_condition, and add it to the necessary tag conditions explicitly instead. After that, I think this PR is good to go!

I will follow up with a PR that removes known_modules_condition entirely, since it makes it hard to understand which modules actually need specific tags.

colin-sentry · 2024-04-19T15:06:08Z

Done
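For readers following along, here is a rough sketch of the difference being requested: each tag names the modules it applies to, rather than sharing one catch-all condition. The types below (`Condition`, `TagSpec`) and the choice of which tags each module receives are hypothetical stand-ins for illustration, not Relay's actual API or configuration.

```rust
// Hypothetical, simplified stand-ins for tag extraction rules; not Relay's real types.
enum Condition {
    /// Matches spans whose op starts with the given prefix.
    OpPrefix(&'static str),
    /// Matches if any of the inner conditions matches.
    Any(Vec<Condition>),
}

impl Condition {
    fn matches(&self, op: &str) -> bool {
        match self {
            Condition::OpPrefix(prefix) => op.starts_with(*prefix),
            Condition::Any(inner) => inner.iter().any(|c| c.matches(op)),
        }
    }
}

struct TagSpec {
    key: &'static str,
    condition: Condition,
}

// Assumed module predicates, for illustration only.
fn is_ai() -> Condition {
    Condition::OpPrefix("ai.")
}

fn is_db() -> Condition {
    Condition::OpPrefix("db")
}

/// Every tag states explicitly which modules need it, so there is no
/// shared "known modules" condition to reason about.
fn tag_specs() -> Vec<TagSpec> {
    vec![
        TagSpec { key: "span.category", condition: is_ai() },
        TagSpec { key: "span.description", condition: Condition::Any(vec![is_ai(), is_db()]) },
        TagSpec { key: "span.domain", condition: is_db() },
    ]
}

/// Returns the keys of all tags that would be extracted for a span op.
fn tags_for_op(op: &str) -> Vec<&'static str> {
    tag_specs()
        .into_iter()
        .filter(|spec| spec.condition.matches(op))
        .map(|spec| spec.key)
        .collect()
}
```

With this layout, reading a single `TagSpec` is enough to see which modules receive that tag.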

relay-dynamic-config/src/defaults.rs

iker-barriocanal · 2024-04-22T13:47:08Z

relay-dynamic-config/src/defaults.rs

+                Tag::with_key("span.category")
+                    .from_field("span.sentry_tags.category")
+                    .always(), // already guarded by condition on metric


The category and op are very similar (see snapshots). Do we need both?

I need it for AI at least: category = ai.pipeline is how we identify the top-level spans.

Does looking at the prefix of op work? When category=ai.pipeline, op=ai.pipeline.whatever. Cardinality should be the same, just wondering whether we need the extra tag.

From what I've read, you can't do prefix matches like that in Snuba.

IMO having this tag is fine; it does not increase cardinality because it is derived from op.
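To illustrate the cardinality argument in this thread: if the category is a pure function of the op, every op maps to exactly one category, so tagging both cannot produce more distinct combinations than tagging op alone. The sketch below assumes a dot-delimited prefix match against a hypothetical list of known categories; the list and the function name are made up for illustration and are not taken from Relay.

```rust
/// Hypothetical list of known categories, with more specific entries first
/// so that "ai.pipeline" wins over "ai".
const KNOWN_CATEGORIES: [&str; 4] = ["ai.pipeline", "ai", "db", "http"];

/// Derives a category from a span op by matching the op itself or a
/// dot-delimited prefix of it. Returns `None` for unknown ops.
fn category_from_op(op: &str) -> Option<&'static str> {
    KNOWN_CATEGORIES.iter().copied().find(|&category| {
        op == category
            || op
                .strip_prefix(category)
                .map_or(false, |rest| rest.starts_with('.'))
    })
}
```

Because the derivation happens at ingestion time, the query side can filter on an exact `span.category` value and does not need a prefix match on op.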

tests/integration/test_spans.py

colin-sentry · 2024-04-22T19:36:23Z

relay-server/src/metrics_extraction/event.rs

@@ -1053,6 +1053,24 @@ mod tests {
                        "ai_total_tokens_used": {
                            "value": 20
                        }
+                    },
+                    "data": {
+                        "ai.pipeline.name": "Autofix Pipeline"


This is the name of the parent span, which represents a specific AI pipeline.

This gets converted into the tag span.ai.pipeline.group, which can then be queried in the frontend to find how many tokens were used in total for a single span.

Span -> span group -> "foreign key join" on ai.pipeline.group -> sum (counter)
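The query flow above can be sketched as a small in-memory aggregation: sum the token counters per `span.ai.pipeline.group`, then look up the total using the pipeline span's own group. The `TokenMetric` struct and both function names are hypothetical; the real aggregation would happen in the metrics backend, not in Rust code like this.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one token-usage counter bucket.
struct TokenMetric {
    /// Value of the `span.ai.pipeline.group` tag on the metric.
    pipeline_group: String,
    /// Counter value, e.g. from `ai_total_tokens_used`.
    tokens: u64,
}

/// Sums the token counters per pipeline group.
fn tokens_per_pipeline(metrics: &[TokenMetric]) -> HashMap<&str, u64> {
    let mut totals = HashMap::new();
    for metric in metrics {
        *totals.entry(metric.pipeline_group.as_str()).or_insert(0) += metric.tokens;
    }
    totals
}

/// The "foreign key join": the pipeline span's `span.group` is used as the
/// lookup key into the per-group totals. Unknown groups yield zero.
fn tokens_for_span_group(metrics: &[TokenMetric], span_group: &str) -> u64 {
    tokens_per_pipeline(metrics)
        .get(span_group)
        .copied()
        .unwrap_or(0)
}
```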

iker-barriocanal

Let's add the test case I mention below; other than that, LGTM.

relay-event-normalization/src/normalize/span/tag_extraction.rs

iker-barriocanal · 2024-04-23T08:00:58Z

relay-dynamic-config/src/defaults.rs

+                Tag::with_key("span.category")
+                    .from_field("span.sentry_tags.category")
+                    .always(), // already guarded by condition on metric


Does looking at the prefix of op work? When category=ai.pipeline, op=ai.pipeline.whatever. Cardinality should be the same, just wondering whether we need the extra tag.

relay-event-normalization/src/normalize/span/tag_extraction.rs

jjbayer · 2024-04-23T09:22:32Z

relay-event-normalization/src/normalize/span/tag_extraction.rs

+                let mut ai_pipeline_group = format!("{:?}", md5::compute(ai_pipeline_name));
+                ai_pipeline_group.truncate(16);


Is it actually worth hashing this value? The ai pipeline name looks pretty short, so why not set that as a tag directly, instead of hashing it?

It's hashed the same way the description is, because it needs to be "joined" on the group ID.

Makes sense, I had not realized before that it's the span group of the parent.

Is the AI pipeline always a segment span? In that case, we could alternatively use the segment_name field (see SpanData) to store the parent name.
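A minimal sketch of why the pipeline name is hashed rather than stored verbatim: the child's pipeline group only joins with the parent's span group if both go through the same hash of the same string. To keep the example free of external crates it uses the standard library's `DefaultHasher` as a stand-in for the `md5::compute` call in the diff, so the resulting IDs will not match real ones. The `Span` struct and function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derives a 16-character hexadecimal group ID from a name.
/// Stand-in only: the PR truncates an MD5 digest to 16 characters instead.
fn group_id(name: &str) -> String {
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical, simplified span with only the fields needed here.
struct Span {
    description: String,
    /// Mirrors the `ai.pipeline.name` data field on child spans.
    ai_pipeline_name: Option<String>,
}

/// Group of the span itself, derived from its description.
fn span_group(span: &Span) -> String {
    group_id(&span.description)
}

/// Group of the parent pipeline, derived from the pipeline name with the
/// same hash so that it equals the parent's span group.
fn ai_pipeline_group(span: &Span) -> Option<String> {
    span.ai_pipeline_name.as_deref().map(group_id)
}
```

If the parent's description and the child's pipeline name are the same string, both functions return the same ID, which is what makes the join possible.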

colin-sentry · 2024-04-23T14:00:03Z

...traction/snapshots/relay_server__metrics_extraction__event__tests__extract_span_metrics.snap

@@ -11572,8 +11576,10 @@ expression: metrics
        tags: {
            "environment": "fake_environment",
            "release": "1.2.3",
-            "span.description": "Autofix Pipeline",
-            "span.group": "86148ae2d6c09430",
+            "span.ai.pipeline.group": "86148ae2d6c09430",


@iker-barriocanal there is a snapshot test for it here

Added relevant field to span data

jjbayer · 2024-04-23T14:35:38Z

relay-dynamic-config/src/defaults.rs

+                Tag::with_key("span.category")
+                    .from_field("span.sentry_tags.category")
+                    .always(), // already guarded by condition on metric


IMO having this tag is fine; it does not increase cardinality because it is derived from op.

jjbayer · 2024-04-23T14:41:30Z

relay-event-normalization/src/normalize/span/tag_extraction.rs

+                let mut ai_pipeline_group = format!("{:?}", md5::compute(ai_pipeline_name));
+                ai_pipeline_group.truncate(16);


Makes sense, I had not realized before that it's the span group of the parent.

Is the AI pipeline always a segment span? In that case, we could alternatively use the segment_name field (see SpanData) to store the parent name.

colin-sentry requested a review from a team as a code owner April 17, 2024 19:12

colin-sentry force-pushed the ai_category branch from 45a2ca9 to 8e6f968 Compare April 17, 2024 19:14

colin-sentry changed the title from "Extract AI category" to "feat(ai): Extract AI category" Apr 17, 2024

colin-sentry force-pushed the ai_category branch 2 times, most recently from f740933 to 354a213 Compare April 17, 2024 21:08

colin-sentry commented Apr 17, 2024

jjbayer reviewed Apr 18, 2024

colin-sentry force-pushed the ai_category branch 6 times, most recently from 9d7bf0c to 43eb272 Compare April 18, 2024 18:08

jjbayer previously requested changes Apr 19, 2024

colin-sentry force-pushed the ai_category branch 2 times, most recently from 836f0eb to 373f9d7 Compare April 19, 2024 15:23

colin-sentry requested a review from jjbayer April 19, 2024 15:35

iker-barriocanal reviewed Apr 22, 2024

colin-sentry added 6 commits April 22, 2024 13:17

Extract AI category (48bb5d0)
Add span category to AI spans (dabcb5b)
Make sure span group iff description (58c3468)
Fix acceptance tests (1abcc4f)
Remove AI from known modules condition (581d3c0)
Do not extract span.category on duration (55afaa8)

colin-sentry force-pushed the ai_category branch from 7b474d6 to 55afaa8 Compare April 22, 2024 17:20

colin-sentry added 2 commits April 22, 2024 14:33

Remove double duration tags (62940d2)
Add AI pipeline group tag to count tokens (129210c)

colin-sentry force-pushed the ai_category branch from 9409a3b to 129210c Compare April 22, 2024 19:33

colin-sentry commented Apr 22, 2024

iker-barriocanal approved these changes Apr 23, 2024

jjbayer reviewed Apr 23, 2024

colin-sentry commented Apr 23, 2024

Add tag to span data (59ae0aa)

colin-sentry force-pushed the ai_category branch from bdee805 to 59ae0aa Compare April 23, 2024 14:10

colin-sentry requested a review from jjbayer April 23, 2024 14:11

colin-sentry enabled auto-merge (squash) April 23, 2024 14:11

colin-sentry merged commit 891b521 into master Apr 23, 2024
20 checks passed

colin-sentry deleted the ai_category branch April 23, 2024 14:35

jjbayer approved these changes Apr 23, 2024
