- 
                Notifications
    
You must be signed in to change notification settings  - Fork 161
 
feat(csharp/src/Drivers/BigQuery): Enhanced tracing and large resultset improvements #3022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
      
          
      
      
            davidhcoe
  
      
      
      commented
        Jun 26, 2025 
      
    
  
- Adds tracing for the BigQuery driver
 - Modifies the behavior for large result sets:
- Uses a default dataset ID for large result sets if one is not specified.
 - The caller can specify their own dataset. If it does not exist, the driver will attempt to create it.
 
 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally ...
- You won't need the "Action" tag - it's automatically added as the activity name with the 
TraceActivitycall. - Consider tag keys carefully to try to keep "semantic convention" recommendations.
 
| catch (Exception ex) | ||
| { | ||
| activity?.AddBigQueryTag("retry_attempt", retryCount); | ||
| activity?.AddException(ex); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly right. A handled exception should still be traced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've left some feedback and some questions.
| 
               | 
          ||
| AdbcStatement statement = adbcConnection.CreateStatement(); | ||
| statement.SqlQuery = environment.Query; | ||
| statement.SqlQuery = "select * from mashuptest-154002.NWIND.AdbcAllTypes"; // environment.Query; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be reverted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted in recent PR
| // default value per https://pkg.go.dev/cloud.google.com/go/bigquery#section-readme | ||
| public const string DetectProjectId = "*detect-project-id*"; | ||
| 
               | 
          ||
| // What ODBC uses | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider amending the comment to explain why we would use the same value as ODBC (versus e.g. _bqadbc_temp_tables).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the comment, but I wonder if using _bqadbc_temp_tables might actually be better? On the one hand, assuming the dataset had already been created, it would just reuse that. On the other hand, having a specific one dedicated to ADBC could be useful in some scenarios if people want to understand the usage patterns.
| { | ||
| activity?.AddBigQueryTag("update_token.status", "Required"); | ||
| await tokenProtectedResource.UpdateToken(); | ||
| activity?.AddBigQueryTag("update_token.status", "Completed"); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the behavior when adding the same tag twice? It keeps the most recent one or it keeps all of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For AddTag, it will keep all of them. For SetTag, it will override and keep the latest.
| 
               | 
          ||
| internal static string GetAssemblyVersion(Type type) => FileVersionInfo.GetVersionInfo(type.Assembly.Location).ProductVersion ?? string.Empty; | ||
| 
               | 
          ||
| internal static bool TracingToFile() { return false; } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making a property instead of a method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a method because I thought I would have to put some logic in there to determine whether a file listener was active, but I could be misunderstanding how that part will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the name and added a TODO in there for now.
| const string infoVendorName = "BigQuery"; | ||
| const string infoDriverArrowVersion = "19.0.0"; | ||
| const string publicProjectId = "bigquery-public-data"; | ||
| readonly string? infoDriverArrowVersion; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this an instance field while the next two are static fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in PR
| { | ||
| UpdateClientToken(); | ||
| } | ||
| activity?.AddTag(key + ".set", value); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a bad idea to log an access token under any circumstances. Are there other sensitive option values we should also avoid logging (or tie to BigQueryUtils.TracingToFile, which looks like a proxy for "is safe to log sensitive information")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an opt-in list in the BigQueryParameters to indicate which BigQueryParameter is safe to log no matter what the listener is.
| readonly string? infoDriverArrowVersion; | ||
| 
               | 
          ||
| private static readonly string s_assemblyName = BigQueryUtils.GetAssemblyName(typeof(BigQueryStatement)); | ||
| private static readonly string s_assemblyVersion = BigQueryUtils.GetAssemblyVersion(typeof(BigQueryStatement)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of capturing these three times in BigQueryConnection, BigQueryStatement and BigQueryStatement.MultiArrowReader, consider storing them in just BigQueryUtils and referencing them from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in PR
| { | ||
| // Ideally we wouldn't need to indirect through a stream, but the necessary APIs in Arrow | ||
| // are internal. (TODO: consider changing Arrow). | ||
| activity?.AddBigQueryTag("read_stream", streamName); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the stream name system-generated or can it expose an arbitrary user identifier?
I'll stop asking this question, but might apply to other things like the destination table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's system generated in the form projects/{project_id}/locations/{location}/sessions/{session_id}/streams/{stream_id}. Do you not think that should be logged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the ids are GUIDs then that seems unlikely to be a problem. @birschick-bq, is there an OTel convention for marking possibly-sensitive data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems OTel
has described the need to be careful, but the only concrete action seems to be to have filters/scrubbers on the OTel Collector via a processor.
https://opentelemetry.io/docs/security/handling-sensitive-data/
I believe we'll need to add overloads to scrub/anonymize potential PII data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up making this a conditional output to only use if wring to a file.
| { | ||
| string destinationTable = keyValuePair.Value; | ||
| case BigQueryParameters.AllowLargeResults: | ||
| options.AllowLargeResults = true ? keyValuePair.Value.ToLower().Equals("true") : false; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two existing and one new instance of s.ToLower().Equals("true") in this file, which is strictly worse than s.Equals("true", StringComparer.OrdinalIgnoreCase).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks largely good! Let me know if you want to check-in as-is or adjust the ToLowers first.
| 
               | 
          ||
| public static bool IsSafeToLog(string name) | ||
| { | ||
| if (safeToLog.Contains(name.ToLower())) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's generally better to declare the HashSet as taking a case-insensitive comparer than to do ToLower as the latter may cause an allocation. (In this case it probably won't because the options are generally defined as lower-case; this is more about the pattern.)
Otherwise, this should probably be ToLowerInvariant() as I don't think we'd want a culture-specific conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I just noticed that there are at least three other ToLower() in this project that maybe should be ToLowerInvariant() -- though in the case of BigQueryConnection.DescToField I'm 1) not sure why we're changing the case of the column name and 2) we're calling ToLower twice on the same value instead of just once and storing it in a local.)