Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include sqlcomment in db.statement span attribute from SQLAlchemy instrumentation #2938

Closed
tammy-baylis-swi opened this issue Oct 29, 2024 · 3 comments · Fixed by #2937
Closed
Assignees

Comments

@tammy-baylis-swi
Copy link
Contributor

What problem do you want to solve?

SQLAlchemy instrumentation's opt-in sqlcommenting feature (doc) works to insert OTel context into MySQL or PostgreSQL general logs, e.g. this when used with mysql-connector driver:

2024-10-29T22:09:35.469520Z	  240 Query	SELECT city.id, city.name, city.district, city.population, city.countrycode
FROM city
WHERE city.id = '1818' /*db_driver='mysqlconnector',db_framework='sqlalchemy%%3A2.0.36',traceparent='00-3b654158a48190e458f6c4e15e125814-b62c6e3e794e9f05-01'*/

For correlation of db queries to corresponding query spans, it would be convenient to also include the sqlcomment in the db.statement attribute of those spans, e.g.

InstrumentationScope opentelemetry.instrumentation.sqlalchemy 0.49b0.dev
Span #0
    Trace ID       : 3b654158a48190e458f6c4e15e125814
    Parent ID      : ed65cfad7167aecd
    ID             : 69d58000ef348179
    Name           : connect
    Kind           : Client
    Start time     : 2024-10-29 22:09:35.42945621 +0000 UTC
    End time       : 2024-10-29 22:09:35.466973419 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> net.peer.name: Str(mysql-world-db)
     -> net.peer.port: Int(3306)
     -> db.name: Str(world)
     -> db.user: Str(world)
     -> db.system: Str(mysql)
Span #1
    Trace ID       : 3b654158a48190e458f6c4e15e125814
    Parent ID      : ed65cfad7167aecd
    ID             : b62c6e3e794e9f05
    Name           : SELECT world
    Kind           : Client
    Start time     : 2024-10-29 22:09:35.469197169 +0000 UTC
    End time       : 2024-10-29 22:09:35.469996085 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> db.statement: Str(SELECT city.id, city.name, city.district, city.population, city.countrycode
FROM city
WHERE city.id = %(id_1)s /*db_driver='mysqlconnector',db_framework='sqlalchemy%%3A2.0.36',traceparent='00-3b654158a48190e458f6c4e15e125814-b62c6e3e794e9f05-01'*/)
     -> db.system: Str(mysql)
     -> net.peer.name: Str(mysql-world-db)
     -> net.peer.port: Int(3306)
     -> db.name: Str(world)
     -> db.user: Str(world)

Describe the solution you'd like

Include the sqlcomment in db.statement span attribute from SQLAlchemy instrumentation.

Describe alternatives you've considered

No response

Additional Context

Similar to #2936.

Would you like to implement a fix?

None

@alexmojaki
Copy link
Contributor

For correlation of db queries to corresponding query spans

I don't understand this argument. The comment (e.g. /*db_driver='mysqlconnector',db_framework='sqlalchemy%%3A2.0.36',traceparent='00-3b654158a48190e458f6c4e15e125814-b62c6e3e794e9f05-01'*/) is already in the query, the point being that you can find it in the database logs which are separate from opentelemetry. The values inside could be included as actual structured queryable attributes, e.g. db_driver='mysqlconnector' could be a single attribute. traceparent in particular is the thing that would be used for correlation but it's mostly just the trace and span ID which are already in the span, it's not adding information to include it in db.statement.

I think this behaviour should be optional and off by default. It makes it much less efficient to store and query the db.statement attribute. Previously making the same query 1000 times would only produce one unique value of db.statement, now it produces 1000.

@tammy-baylis-swi
Copy link
Contributor Author

Thanks for your input @alexmojaki !

It is indeed not needed for correlation. But it's convenient for some backends as a diagnostic containing the "true value" of a full query statement that was sent to a db server.

I get the concern for db.statement uniqueness! In the short term, I'm working on making full sqlcomment inclusion in db.statement attribute an opt-in feature: #3107. Please let me know if you have any thoughts on how it could work.

For longer term, we should generally update how the Python db client driver/ORM instrumentors are adhering to the db and mysql semconv. We currently have an open epic to migrate the most popular frameworks' instrumentors to the current semconv. Part of this is updating db.statement to db.query.text (completed in this issue).

Regarding uniqueness, db.query.text as currently defined does warn that it can still have high cardinality. As part of semconv upgrade (or after that), we should start populating db.query.summary as the low-cardinality representation. Else backends can do a normalization after telemetry is exported.

Regarding

actual structured queryable attributes, e.g. db_driver='mysqlconnector'

I think this would be nice in addition to defining db.query.text and db.query.summary, so maybe it's also an addition to the semconv?

@alexmojaki
Copy link
Contributor

Regarding uniqueness, db.query.text as currently defined does warn that it can still have high cardinality. As part of semconv upgrade (or after that), we should start populating db.query.summary as the low-cardinality representation. Else backends can do a normalization after telemetry is exported.

Yes it can be high cardinality, but the lower cardinality (without losing info like db.query.summary) the more useful. As well as improving performance, it's more useful for the user as they can do things like find the most common queries easily. So I don't think that making this opt-in should be considered just a short term solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants