Skip to content
This repository has been archived by the owner on Dec 11, 2022. It is now read-only.

_PARTITIONTIME for partitioned datasets not working correctly #321

Closed
kyriazis opened this issue Apr 12, 2021 · 20 comments · Fixed by #385
Closed

_PARTITIONTIME for partitioned datasets not working correctly #321

kyriazis opened this issue Apr 12, 2021 · 20 comments · Fixed by #385
Milestone

Comments

@kyriazis
Copy link

Bug Report

the query sent to BigQuery contains both a timestamp range (that is being measured using TIMESTAMP_MILLIS), and a _PARTITIONTIME range that is used (presumbly) to limit the partitions the query should work on.

The _PARTITIONTIME range limits are the limits of the time range to be displayed. However, that may not be bracketing the labels for the partitions that we want to search.

For example, for a 3hr window centered today, I get a query that contains:

_PARTITIONTIME >= '2021-04-11 20:05:29' AND _PARTITIONTIME < '2021-04-11 23:05:29'

I have created my BigQuery table to do partitioning on a per-month basis. For the single partition that I have, it has an auto-created date of the beginning of the month, namely 2021-04-01 00:00:00.

Consequently, the query does not pick up this partition, and no results are returned

Expected Behavior

Choose a _PARTITIONTIME range that picks the correct partitions for the needed time range

Actual Behavior

the _PARTITIONTIME range uses the display range ignoring the characteristics of each paritition.

Steps to Reproduce the Problem

  1. Create a BigQuery partitioned table with monthly partitioning
  2. Add some time-series data with correct timestamps
  3. Do a query for a small time range (hours) centered around "now".

Specifications

  • Version: git commit 0db71ef (latest master as of 4/11/21)
  • Platform: x86
  • Grafana Version: hub.docker.com -> grafana/grafana:latest as of 4/11/21
@kyriazis
Copy link
Author

In addition, it looks like the date/time that is inserted to compare to _PARTITIONTIME is in local time, while the field is in UTC. Comparison has an timezone offset which gives wrong query results. This is not corrected by checking the "Convert time filter to UTC".

This behavior is the same for both tables partitioned with ingestion time (_PARTITIONTIME field) and also partitioned with a timedate column.

The only tables that work for me are unpartitioned tables.

@bacheson
Copy link

is this project dead? this is a showstopper

@gkyriazis
Copy link

I wonder the same thing. No response whatsoever from doitintl.

@ofir5300
Copy link
Collaborator

This project is live and kicking.
We are currently busy approving previous plugin signed version (v2.0.2) and then we'll be free for additional updates & improvements.

@gkyriazis
Copy link

Thanks Ofir. Does 2.0.2 happen to include fixes for the above? If you are going to look into those issues post-2.0.2 (ie. to be released in 2 .0.3) it would seem that the it will take a long time before they are available for use.

Thank you!

@ofir5300
Copy link
Collaborator

Hi @gkyriazis
Version 2.0.2 is in pending status for a while now, so it does not include fix for the above. (will be approved very soon)
But do not worry, the minute we'll have an update for the above you will be able to use the latest code with unsigned plugin version before even creating v2.0.3.

@odin568
Copy link

odin568 commented Sep 14, 2021

Unfortunately face exatly the same issue.
GCP Cost monitoring partitioned table

@bacheson
Copy link

don't mean to sound like a jerk...and this is free software (although I do pay for grafana cloud) so I understand and apologise for my unjust entitlement but this whole plugin has been broken for half a year now...should we all just move on?

@ofir5300
Copy link
Collaborator

Hi @kyriazis,
I have been able to reproduce that issue and i'm currently working to solve it.

If I may I would like to suggest a workaround to allow you to query partitioned table as you wish for now.

  1. Switch to run the query over SQL rather than query builder
  2. Entirely omit WHERE clause

This way - the query will result with data which its _TIMESTAMP correlates the range configured in the time range picker

@kyriazis
Copy link
Author

Thank you @ofir5300,

yes, I've tried that in the past and it works, however that increases the cost, since there is more data fetched from GCP.

@ofir5300
Copy link
Collaborator

@bacheson @kyriazis
I would like to consult you.

What would you suggest as a good solution, taking in consideration not only that issue but also best practicing time filtering partitioned tables (using Query Builder):

  1. Avoiding _PARTITIONTIME filter when one of the macros are used?
    e.g. Im using __timeFilter with last 6 hours time range , it will only range the time field:
    "time" BETWEEN TIMESTAMP_MILLIS (1637140593954) AND TIMESTAMP_MILLIS (1637162193954)
    Rather than ranging the same time period for _PARTITIONTIME as well

  2. Doing something similar but only when the time range picked does not correlates table's partitioning type?
    e.g. your example when ranging few hours period but the table is using monthly partition (bit more complicated)

  3. Other suggestion?

@kyriazis
Copy link
Author

@ofir5300 , thank you for looking into this. The more I think about it the more complicated it becomes. :-)

Forgive me, it's been a while since I've dealt with this, so I'm describing the logic of how I think it should work.

Let's pick up the various cases:

  1. Non-partitioned tables: Obviously, there should not be any _PARTITIONTIME filter

  2. Partitioned tables: Data may belong to zero, one or many partition tables.

  • 1 partition: This happens when the partition time period is longer than the displayed time period, for example you are selecting 6 hours on a partition that keeps data for a month. In this case, the partition time is less than the beginning of the displayed time, so some search has to happen to find out which partition table is the correct one, since the partition time is less than the left edge of the displayed time period.
  • 2 partitions: That happens when the partition period is smaller than the displayed time, and data from multiple partitions fits on the display. For example you are displaying a year's worth of data with a partition that keeps data for 1 month. Then you'll have to search to find out which partitions have data that belong to the display and run the query for those partitions.
  • 0 partitions: Still some search would have to be done to see if there is any partition that matches the display range. This may fall as a special case of the above, though.

Ideally, the query should fetch "just the right amount" of data to fill the graph. For a graph that is (say) 500 pixels wide, you wouldn't want to transfer a lot more data than 500 points, since data querie cost is proportional to the size of the returned data.

Also, I want to bring again the first comment that I made, which has to do with timezones. _PARTITIONTIME was in UTC (while still filled with local time) while timestamp was in local time.

Thank you!

ofir5300 added a commit that referenced this issue Nov 21, 2021
…ioned (#385)

* Handle time range filtering issue for both partitioned and non-partitioned (fixes #321, #325, #334, #362)

* Fix UTC conversion & usage
@ofir5300 ofir5300 added this to the 2.0.3 milestone Nov 21, 2021
@ofir5300
Copy link
Collaborator

Hopefully solved!

Please give new version a try and let me know what you think..

grafana-cli --pluginUrl https://github.com/doitintl/bigquery-grafana/releases/download/2.0.3/doitintl-bigquery-datasource-2.0.3.zip 
plugins install doitintl-bigquery-datasource

@gkyriazis
Copy link

Sorry for the delay, but finally got back to this..

I installed the new plugin, and then navigating to Configuration->Data sources->Google BigQuery, I am welcomed by this error message:

Invalid regular expression: invalid group specifier name Instantiating http://grafana.lan:3000/public/plugins/doitintl-bigquery-datasource/module.js Loading plugins/doitintl-bigquery-datasource/module

Some simple configuration issue I presume? Thank you!

@ofir5300
Copy link
Collaborator

@gkyriazis Are you using a browser different than Chrome?

@gkyriazis
Copy link

Yes, Safari on Mac (Safari 15.3 (17612.2.9.1.5) on MacOS Monterey 12.2)

I just tried Chrome and it works

@ofir5300
Copy link
Collaborator

ofir5300 commented Feb 22, 2022

FIx need to be made for Safari #393

@gkyriazis
Copy link

Hmm, fix seems to be "downgrade to 2.0.2", which is not going to fix this issue. :-) BTW, when I said "it works" I meant that I can run the module. I haven't verified with PARTITIONTIME yet.

Thanks!

@ofir5300
Copy link
Collaborator

I mean a fix should be published from our side in order to solve Safari issues... Will let you know once done

@gkyriazis
Copy link

trying to see if queries work. I have a simple time series of data points stored in a big query table at a rate of 1 per minute. If I use a time interval of anything less than 12 hours, it works fine. If I use 12 hours or more, then I only get the first (older) portion of the data displayed, as shown in the attached picture. By looking at the queries and re-running them in the console, it looks like I'm getting the right data from bigquery.

Funny thing is if I modify and refresh in the "Edit Panel" mode, then sometimes the query works correctly. Maybe an issue with sorting through the data during display?

Thanks!

Image

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants