Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(pinot) regression on area chart (DB engine error) #25749

Closed
martin-raymond opened this issue Oct 24, 2023 · 41 comments
Closed

refactor(pinot) regression on area chart (DB engine error) #25749

martin-raymond opened this issue Oct 24, 2023 · 41 comments
Labels
data:connect:pinot Related to Pinot

Comments

@martin-raymond
Copy link
Contributor

We are using a pinot table as a dataset, with a temporal column startDate in epoch_ms. Before the 3.0.0.rc4 version, we were using it to create the followin area chart :

areachart

Since the release of the 3.0.0.rc4 version, our chart is not loading :

DB engine Erreur : could not convert string to Timestamp

Before 3.0.0.rc4, the query generated by the simple area chart was :

SELECT DATETIMECONVERT(startDate, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:DAYS'), count(*) FROM "dataset" WHERE startDate >= 1694908800000 AND startDate < 1697500800000 GROUP BY DATETIMECONVERT(startDate, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:DAYS') ORDER BY count(*) DESC LIMIT 10000;

(1694908800000 and 1697500800000 are just examples of our time filter)

After the 3.0.0.rc4, the query that causes the error is :

SELECT CAST(DATE_TRUNC('day', CAST(DATETIMECONVERT((startDate/1000), '1:SECONDS:EPOCH', '1:SECONDS:EPOCH', '1:SECONDS') AS TIMESTAMP)) AS TIMESTAMP), count(*) FROM "dataset" WHERE startDate >= 1694908800000 AND startDate < 1697500800000 GROUP BY CAST(DATE_TRUNC('day', CAST(DATETIMECONVERT((startDate/1000), '1:SECONDS:EPOCH', '1:SECONDS:EPOCH', '1:SECONDS') AS TIMESTAMP)) AS TIMESTAMP) ORDER BY count(*) DESC LIMIT 10000;

The part :

DATETIMECONVERT((startDate/1000), '1:SECONDS:EPOCH', '1:SECONDS:EPOCH', '1:SECONDS')

seems odd because it literally said "convert seconds in seconds", and we think it might causes the crash.

We think the problem is caused by : #24942

Thx for your help

  • superset version: 3.0.0.rc4
@zhaoyongjie
Copy link
Member

@martin-raymond I'm wondering it might be a bug in Apache Calcite side because it can't correctly calculate expression DATETIMECONVERT((startDate/1000), '1:SECONDS:EPOCH', '1:SECONDS:EPOCH', '1:SECONDS').

There is a workaround might be useful, 1) try to create a derived column based on the startDate in Pinot as timestamp type instead of long type, 2) then remove epoch_ms tag in column config, 3), syncing column metadata in Dataset Model in Superset. 4) use the new derived column replace with previous one in Chart.

Hope it helps you.

@martin-raymond
Copy link
Contributor Author

martin-raymond commented Oct 25, 2023

we tried that @zhaoyongjie, but with pinot 1.0.0 and superset 3.0.1, the table does not regenerated properly in superset (error on the field timestamp in the schema)

@zhaoyongjie
Copy link
Member

@martin-raymond Could you please post error screenshot or log?

@martin-raymond
Copy link
Contributor Author

@zhaoyongjie yes, here it is :

2023-10-25 17:13:09,270:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'dbapi': <module 'pinotdb' from '/usr/local/lib/python3.9/site-packages/pinotdb/__init__.py'>, 'compiler_linting': 1, 'debug': False}: None and False 2023-10-25 17:13:09,270:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'host': 'pinot-broker.pinot.svc.cluster.local', 'port': 8099, 'path': 'query/sql', 'scheme': 'http', 'username': None, 'password': None, 'debug': False}: http://pinot-controller.pinot.svc.cluster.local:9000/ and False 2023-10-25 17:13:09,274:DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): pinot-controller.pinot.svc.cluster.local:9000 2023-10-25 17:13:09,280:DEBUG:urllib3.connectionpool:http://pinot-controller.pinot.svc.cluster.local:9000/ "GET /tables/knada-indicateur-visite-v2/schema HTTP/1.1" 200 1350 2023-10-25 17:13:09,281:INFO:pinotdb.sqlalchemy:Getting columns for knada-indicateur-visite-v2 from http://pinot-controller.pinot.svc.cluster.local:9000/: {'schemaName': 'knada-indicateur-visite-v2', 'dimensionFieldSpecs': [{'name': 'profil', 'dataType': 'STRING'}, {'name': 'codeSite', 'dataType': 'STRING'}, {'name': 'libelleSite', 'dataType': 'STRING'}, {'name': 'timezone', 'dataType': 'STRING'}, {'name': 'plateforme', 'dataType': 'STRING'}, {'name': 'service', 'dataType': 'STRING', 'singleValueField': False}, {'name': 'categorieService', 'dataType': 'STRING', 'singleValueField': False}, {'name': 'codeReseau', 'dataType': 'STRING', 'singleValueField': False}, {'name': 'reseau', 'dataType': 'STRING', 'singleValueField': False}, {'name': 'type', 'dataType': 'STRING', 'singleValueField': False}, {'name': 'nature', 'dataType': 'STRING', 'singleValueField': False}], 'dateTimeFieldSpecs': [{'name': 'dateHeureDebut', 'dataType': 'LONG', 'format': '1:MILLISECONDS:EPOCH', 'granularity': '1:MILLISECONDS'}, {'name': 'dateHeureFin', 'dataType': 'LONG', 'format': '1:MILLISECONDS:EPOCH', 'granularity': '1:MILLISECONDS'}, {'name': 'ts', 'dataType': 'TIMESTAMP', 'format': '1:MILLISECONDS:EPOCH', 'granularity': '1:MILLISECONDS'}]} 2023-10-25 17:13:09,282:ERROR:flask_appbuilder.api:'timestamp' Traceback (most recent call last): File "/usr/local/lib/python3.9/site-packages/flask_appbuilder/api/__init__.py", line 110, in wraps return f(self, *args, **kwargs) File "/app/superset/views/base_api.py", line 127, in wraps raise ex File "/app/superset/views/base_api.py", line 121, in wraps duration, response = time_function(f, self, *args, **kwargs) File "/app/superset/utils/core.py", line 1526, in time_function response = func(*args, **kwargs) File "/app/superset/utils/log.py", line 255, in wrapper value = f(*args, **kwargs) File "/app/superset/databases/api.py", line 728, in table_metadata table_info = get_table_metadata(database, table_name, schema_name) File "/app/superset/databases/utils.py", line 67, in get_table_metadata columns = database.get_columns(table_name, schema_name) File "/app/superset/models/core.py", line 847, in get_columns return self.db_engine_spec.get_columns(inspector, table_name, schema) File "/app/superset/db_engine_specs/base.py", line 1294, in get_columns cast(list[SQLAColumnType], inspector.get_columns(table_name, schema)) File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 497, in get_columns col_defs = self.dialect.get_columns( File "/usr/local/lib/python3.9/site-packages/pinotdb/sqlalchemy.py", line 415, in get_columns columns = [ File "/usr/local/lib/python3.9/site-packages/pinotdb/sqlalchemy.py", line 418, in <listcomp> "type": get_type(spec["dataType"], spec.get("fieldSize")), File "/usr/local/lib/python3.9/site-packages/pinotdb/sqlalchemy.py", line 486, in get_type return type_map[data_type.lower()] KeyError: 'timestamp' 10.42.9.0 - - [25/Oct/2023:17:13:09 +0000] "GET /api/v1/database/1/table/knada-indicateur-visite-v2/knada-indicateur-visite-v2/ HTTP/1.1" 500 26 "https://superset.knada.rancher.kosmos.fr/dataset/add/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"

@martin-raymond
Copy link
Contributor Author

@zhaoyongjie we discussed with the pinot team and the problem should be in superset and not in pinot or apache calcite

@zhaoyongjie
Copy link
Member

@martin-raymond Do you try my suggestion that removing "ms" tag and create a timestamp column instead of integer column as time column in Superset dataset?

@zhaoyongjie
Copy link
Member

@martin-raymond
Could you post the config of startDate column in Superset Dataset ? I guess this column might be a second EPOCH but set it as ms in config.

@martin-raymond
Copy link
Contributor Author

martin-raymond commented Oct 26, 2023

image
@zhaoyongjie stardDate is dateHeureDebut

you want me to try to add a calculated column in the dataset in superset directly ? the timestamp type is not present

@zhaoyongjie
Copy link
Member

@martin-raymond
I tested this case in my local Pinot environment, which is might be DATETIMECONVERT expression got an exception if the first arugument is float value.

image

@zhaoyongjie
Copy link
Member

zhaoyongjie commented Oct 26, 2023

image @zhaoyongjie stardDate is dateHeureDebut

you want me to try to add a calculated column in the dataset in superset directly ?

Yes, from your screenshot the dateHeureDebut is Long.

the timestamp type is not present

Not True, try the expressioncast(dateHeureDebut as timestamp) use it and create a calculated column in Superset.

@martin-raymond
Copy link
Contributor Author

ok @zhaoyongjie . that's not it. there is no error, but the result is not right. Here is the query executed in pinot :
image

The syntax is right, but the result is not the result expected

@martin-raymond
Copy link
Contributor Author

@zhaoyongjie we tried to use a calculated column as you said

superset seems to not be able to display the result from pinot. here are the logs when we try to create a chart :

10.42.9.0 - - [26/Oct/2023:19:47:43 +0000] "GET /api/v1/me/ HTTP/1.1" 200 150 "https://superset.knada.rancher.kosmos.fr/explore/?form_data_key=BjW4HfY8NeJzcurrBzb6L16PZtorS2K4FnVzTPuFXrk6Pfk0YUH1mQOFAuNUK4jc&datasource_type=table&datasource_id=54" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
2023-10-26 19:47:45,016:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'debug': False}: None and False
2023-10-26 19:47:45,038:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'dbapi': <module 'pinotdb' from '/usr/local/lib/python3.9/site-packages/pinotdb/__init__.py'>, 'compiler_linting': 1, 'debug': False}: None and False
2023-10-26 19:47:45,038:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'host': 'pinot-broker.pinot.svc.cluster.local', 'port': 8099, 'path': 'query/sql', 'scheme': 'http', 'username': None, 'password': None, 'debug': False}: http://pinot-controller.pinot.svc.cluster.local:9000 and False
2023-10-26 19:47:45,062:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'dbapi': <module 'pinotdb' from '/usr/local/lib/python3.9/site-packages/pinotdb/__init__.py'>, 'compiler_linting': 1, 'debug': False}: None and False
2023-10-26 19:47:45,062:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'host': 'pinot-broker.pinot.svc.cluster.local', 'port': 8099, 'path': 'query/sql', 'scheme': 'http', 'username': None, 'password': None, 'debug': False}: http://pinot-controller.pinot.svc.cluster.local:9000 and False
2023-10-26 19:47:45,066:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'dbapi': <module 'pinotdb' from '/usr/local/lib/python3.9/site-packages/pinotdb/__init__.py'>, 'compiler_linting': 1, 'debug': False}: None and False
2023-10-26 19:47:45,066:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'host': 'pinot-broker.pinot.svc.cluster.local', 'port': 8099, 'path': 'query/sql', 'scheme': 'http', 'username': None, 'password': None, 'debug': False}: http://pinot-controller.pinot.svc.cluster.local:9000 and False
2023-10-26 19:47:45,072:DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): pinot-broker.pinot.svc.cluster.local:8099
2023-10-26 19:47:45,114:DEBUG:urllib3.connectionpool:http://pinot-broker.pinot.svc.cluster.local:8099 "POST /query/sql HTTP/1.1" 200 3744
2023-10-26 19:47:45,115:DEBUG:pinotdb.db:Got the rows as a type <class 'list'> of size 84
2023-10-26 19:47:45,116:DEBUG:pinotdb.db:[['2023-06-08 00:00:00.0', 443],
 ['2023-03-22 00:00:00.0', 101],
 ['2023-10-06 00:00:00.0', 82],
 ['2023-06-27 00:00:00.0', 80],
 ['2023-03-09 00:00:00.0', 78],
 ['2023-06-28 00:00:00.0', 78],
 ['2023-03-24 00:00:00.0', 74],
 ['2023-05-02 00:00:00.0', 69],
 ['2023-04-19 00:00:00.0', 69],
 ['2023-03-23 00:00:00.0', 65],
 ['2023-03-28 00:00:00.0', 62],
 ['2023-03-21 00:00:00.0', 61],
 ['2023-04-05 00:00:00.0', 58],
 ['2023-06-26 00:00:00.0', 58],
 ['2023-06-15 00:00:00.0', 58],
 ['2023-02-28 00:00:00.0', 56],
 ['2023-02-27 00:00:00.0', 55],
 ['2023-06-06 00:00:00.0', 53],
 ['2023-03-10 00:00:00.0', 52],
 ['2023-05-09 00:00:00.0', 51],
 ['2023-06-16 00:00:00.0', 50],
 ['2023-04-21 00:00:00.0', 49],
 ['2023-06-14 00:00:00.0', 48],
 ['2023-06-30 00:00:00.0', 48],
 ['2023-04-11 00:00:00.0', 48],
 ['2023-05-10 00:00:00.0', 48],
 ['2023-08-30 00:00:00.0', 47],
 ['2023-03-27 00:00:00.0', 46],
 ['2023-05-03 00:00:00.0', 45],
 ['2023-03-13 00:00:00.0', 44],
 ['2023-03-30 00:00:00.0', 44],
 ['2023-05-04 00:00:00.0', 42],
 ['2023-06-02 00:00:00.0', 42],
 ['2023-09-18 00:00:00.0', 41],
 ['2023-03-16 00:00:00.0', 38],
 ['2023-05-05 00:00:00.0', 38],
 ['2023-06-13 00:00:00.0', 38],
 ['2023-05-12 00:00:00.0', 38],
 ['2023-03-15 00:00:00.0', 36],
 ['2023-04-03 00:00:00.0', 36],
 ['2023-03-01 00:00:00.0', 36],
 ['2023-04-12 00:00:00.0', 36],
 ['2023-03-20 00:00:00.0', 36],
 ['2023-04-18 00:00:00.0', 34],
 ['2023-03-14 00:00:00.0', 34],
 ['2023-04-04 00:00:00.0', 33],
 ['2023-03-07 00:00:00.0', 33],
 ['2023-06-07 00:00:00.0', 31],
 ['2023-03-02 00:00:00.0', 31],
 ['2023-06-21 00:00:00.0', 31],
 ['2023-03-31 00:00:00.0', 30],
 ['2023-04-07 00:00:00.0', 29],
 ['2023-08-29 00:00:00.0', 29],
 ['2023-06-19 00:00:00.0', 28],
 ['2023-05-31 00:00:00.0', 27],
 ['2023-05-11 00:00:00.0', 26],
 ['2023-06-20 00:00:00.0', 26],
 ['2023-04-06 00:00:00.0', 25],
 ['2023-06-05 00:00:00.0', 25],
 ['2023-06-01 00:00:00.0', 24],
 ['2023-09-20 00:00:00.0', 23],
 ['2023-03-17 00:00:00.0', 23],
 ['2023-09-19 00:00:00.0', 22],
 ['2023-07-03 00:00:00.0', 22],
 ['2023-06-22 00:00:00.0', 22],
 ['2023-03-29 00:00:00.0', 21],
 ['2023-04-25 00:00:00.0', 20],
 ['2023-04-28 00:00:00.0', 18],
 ['2023-04-27 00:00:00.0', 17],
 ['2023-04-26 00:00:00.0', 17],
 ['2023-03-08 00:00:00.0', 16],
 ['2023-05-08 00:00:00.0', 13],
 ['2023-04-20 00:00:00.0', 12],
 ['2023-05-01 00:00:00.0', 10],
 ['2023-02-24 00:00:00.0', 8],
 ['2023-04-24 00:00:00.0', 8],
 ['2023-09-12 00:00:00.0', 7],
 ['2023-04-17 00:00:00.0', 7],
 ['2023-06-29 00:00:00.0', 6],
 ['2023-08-31 00:00:00.0', 4],
 ['2023-10-05 00:00:00.0', 2],
 ['2023-09-11 00:00:00.0', 1],
 ['2023-05-30 00:00:00.0', 1],
 ['2023-08-28 00:00:00.0', 1]]
10.42.9.0 - - [26/Oct/2023:19:47:45 +0000] "POST /api/v1/chart/data?force HTTP/1.1" 200 472 "https://superset.knada.rancher.kosmos.fr/explore/?form_data_key=BjW4HfY8NeJzcurrBzb6L16PZtorS2K4FnVzTPuFXrk6Pfk0YUH1mQOFAuNUK4jc&datasource_type=table&datasource_id=54" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
10.42.9.0 - - [26/Oct/2023:19:47:46 +0000] "PUT /api/v1/explore/form_data/BjW4HfY8NeJzcurrBzb6L16PZtorS2K4FnVzTPuFXrk6Pfk0YUH1mQOFAuNUK4jc?tab_id=63 HTTP/1.1" 200 75 "https://superset.knada.rancher.kosmos.fr/explore/?form_data_key=BjW4HfY8NeJzcurrBzb6L16PZtorS2K4FnVzTPuFXrk6Pfk0YUH1mQOFAuNUK4jc&datasource_type=table&datasource_id=54" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
10.42.9.0 - - [26/Oct/2023:19:47:46 +0000] "POST /superset/log/?explode=events&dashboard_id=1 HTTP/1.1" 200 1 "https://superset.knada.rancher.kosmos.fr/explore/?form_data_key=BjW4HfY8NeJzcurrBzb6L16PZtorS2K4FnVzTPuFXrk6Pfk0YUH1mQOFAuNUK4jc&datasource_type=table&datasource_id=54" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"

The calculated column :
image
image

(we tried DATETIME and STRING, same result)

There is no error, just nothing is displayed in superset (no result)

@zhaoyongjie
Copy link
Member

@martin-raymond column type should be "timestamp" rather than "string", you don't need to fill out the format datetime input box.

@martin-raymond
Copy link
Contributor Author

@zhaoyongjie yes we tried TIMESTAMP, STRING, with a format datetime, without a format, same results in the end

@zhaoyongjie
Copy link
Member

@zhaoyongjie yes we tried TIMESTAMP, STRING, with a format datetime, without a format, same results in the end

Could you please post screenshot of explore?

@martin-raymond
Copy link
Contributor Author

@zhaoyongjie what do you mean by "explore" ?

@zhaoyongjie
Copy link
Member

@martin-raymond Data Explore page, the other words that open the chart in Superset.

@martin-raymond
Copy link
Contributor Author

martin-raymond commented Oct 27, 2023

image

The query :

SELECT CAST(DATE_TRUNC('day', CAST(cast(dateHeureDebut as timestamp) AS TIMESTAMP)) AS TIMESTAMP),
       COUNT(*)
FROM "knada-indicateur-visite-v2"."knada-indicateur-visite-v2"
GROUP BY CAST(DATE_TRUNC('day', CAST(cast(dateHeureDebut as timestamp) AS TIMESTAMP)) AS TIMESTAMP)
ORDER BY COUNT(*) DESC
LIMIT 10000;

which works fine in pinot

@zhaoyongjie that is what you wanted ?
and thank you very much for your help by the way !

@martin-raymond
Copy link
Contributor Author

image

The drill to detail shows that we are able to get events, but for some reason the display is bogged

@zhaoyongjie
Copy link
Member

image

The drill to detail shows that we are able to get events, but for some reason the display is bogged

Let's focus on the Pinot and Area chart.

@zhaoyongjie
Copy link
Member

image

The query :

SELECT CAST(DATE_TRUNC('day', CAST(cast(dateHeureDebut as timestamp) AS TIMESTAMP)) AS TIMESTAMP),
       COUNT(*)
FROM "knada-indicateur-visite-v2"."knada-indicateur-visite-v2"
GROUP BY CAST(DATE_TRUNC('day', CAST(cast(dateHeureDebut as timestamp) AS TIMESTAMP)) AS TIMESTAMP)
ORDER BY COUNT(*) DESC
LIMIT 10000;

which works fine in pinot

@zhaoyongjie that is what you wanted ? and thank you very much for your help by the way !

Please clear up the "Contribution Mode" and try it again.

@martin-raymond
Copy link
Contributor Author

Aucun is the french word for "no contribution mode" so it is already cleared up @zhaoyongjie

@zhaoyongjie
Copy link
Member

zhaoyongjie commented Oct 27, 2023

Aucun is the french word for "no contribution mode" so it is already cleared up @zhaoyongjie

It's weird. Could you click the three dots on the right corner of Explore page, and check out what Query is? and please check the logs of Superset, does superset acturally send a query?

@martin-raymond
Copy link
Contributor Author

martin-raymond commented Oct 27, 2023

the query :

SELECT CAST(DATE_TRUNC('day', CAST(cast(dateHeureDebut as timestamp) AS TIMESTAMP)) AS TIMESTAMP),
       count(*)
FROM "knada-indicateur-visite-v2"."knada-indicateur-visite-v2"
GROUP BY CAST(DATE_TRUNC('day', CAST(cast(dateHeureDebut as timestamp) AS TIMESTAMP)) AS TIMESTAMP)
ORDER BY count(*) DESC
LIMIT 10000;

in sql lab : we can see that it works fine
image

in explore, we see that the "Results" is loading infinitely

the logs :

172.16.106.82 - - [30/Oct/2023:20:35:13 +0000] "PUT /api/v1/explore/form_data/cS8y0nAsGYlEjAXn_avRVTnk0jNkX8pfWxRvp2KSbMueNESWZFozfLz6mvNhPz4l?tab_id=97 HTTP/1.1" 200 75 "https://superset.knada.rancher.kosmos.fr/explore/?form_data_key=cS8y0nAsGYlEjAXn_avRVTnk0jNkX8pfWxRvp2KSbMueNESWZFozfLz6mvNhPz4l&slice_id=9" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
2023-10-30 20:35:16,734:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'debug': False}: None and False
2023-10-30 20:35:16,750:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'debug': False}: None and False
2023-10-30 20:35:16,771:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'dbapi': <module 'pinotdb' from '/usr/local/lib/python3.9/site-packages/pinotdb/__init__.py'>, 'compiler_linting': 1, 'debug': False}: None and False
2023-10-30 20:35:16,776:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'host': 'pinot-broker.pinot.svc.cluster.local', 'port': 8099, 'path': 'query/sql', 'scheme': 'http', 'username': None, 'password': None, 'debug': False}: http://pinot-controller.pinot.svc.cluster.local:9000/ and False
2023-10-30 20:35:16,791:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'dbapi': <module 'pinotdb' from '/usr/local/lib/python3.9/site-packages/pinotdb/__init__.py'>, 'compiler_linting': 1, 'debug': False}: None and False
2023-10-30 20:35:16,791:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'host': 'pinot-broker.pinot.svc.cluster.local', 'port': 8099, 'path': 'query/sql', 'scheme': 'http', 'username': None, 'password': None, 'debug': False}: http://pinot-controller.pinot.svc.cluster.local:9000/ and False
2023-10-30 20:35:16,806:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'dbapi': <module 'pinotdb' from '/usr/local/lib/python3.9/site-packages/pinotdb/__init__.py'>, 'compiler_linting': 1, 'debug': False}: None and False
2023-10-30 20:35:16,810:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'host': 'pinot-broker.pinot.svc.cluster.local', 'port': 8099, 'path': 'query/sql', 'scheme': 'http', 'username': None, 'password': None, 'debug': False}: http://pinot-controller.pinot.svc.cluster.local:9000/ and False
2023-10-30 20:35:16,813:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'dbapi': <module 'pinotdb' from '/usr/local/lib/python3.9/site-packages/pinotdb/__init__.py'>, 'compiler_linting': 1, 'debug': False}: None and False
2023-10-30 20:35:16,815:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'dbapi': <module 'pinotdb' from '/usr/local/lib/python3.9/site-packages/pinotdb/__init__.py'>, 'compiler_linting': 1, 'debug': False}: None and False
2023-10-30 20:35:16,815:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'host': 'pinot-broker.pinot.svc.cluster.local', 'port': 8099, 'path': 'query/sql', 'scheme': 'http', 'username': None, 'password': None, 'debug': False}: http://pinot-controller.pinot.svc.cluster.local:9000/ and False
2023-10-30 20:35:16,815:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'host': 'pinot-broker.pinot.svc.cluster.local', 'port': 8099, 'path': 'query/sql', 'scheme': 'http', 'username': None, 'password': None, 'debug': False}: http://pinot-controller.pinot.svc.cluster.local:9000/ and False
2023-10-30 20:35:16,820:DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): pinot-broker.pinot.svc.cluster.local:8099
2023-10-30 20:35:16,823:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'dbapi': <module 'pinotdb' from '/usr/local/lib/python3.9/site-packages/pinotdb/__init__.py'>, 'compiler_linting': 1, 'debug': False}: None and False
2023-10-30 20:35:16,823:INFO:pinotdb.sqlalchemy:Updated pinot dialect args from {'host': 'pinot-broker.pinot.svc.cluster.local', 'port': 8099, 'path': 'query/sql', 'scheme': 'http', 'username': None, 'password': None, 'debug': False}: http://pinot-controller.pinot.svc.cluster.local:9000/ and False
2023-10-30 20:35:16,825:DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): pinot-broker.pinot.svc.cluster.local:8099
2023-10-30 20:35:16,848:DEBUG:urllib3.connectionpool:http://pinot-broker.pinot.svc.cluster.local:8099/ "POST /query/sql HTTP/1.1" 200 3744
2023-10-30 20:35:16,850:DEBUG:pinotdb.db:Got the rows as a type <class 'list'> of size 84
2023-10-30 20:35:16,852:DEBUG:pinotdb.db:[['2023-06-08 00:00:00.0', 443],
 ['2023-03-22 00:00:00.0', 101],
 ['2023-10-06 00:00:00.0', 82],
 ['2023-06-27 00:00:00.0', 80],
 ['2023-03-09 00:00:00.0', 78],
 ['2023-06-28 00:00:00.0', 78],
 ['2023-03-24 00:00:00.0', 74],
 ['2023-05-02 00:00:00.0', 69],
 ['2023-04-19 00:00:00.0', 69],
 ['2023-03-23 00:00:00.0', 65],
 ['2023-03-28 00:00:00.0', 62],
 ['2023-03-21 00:00:00.0', 61],
 ['2023-04-05 00:00:00.0', 58],
 ['2023-06-26 00:00:00.0', 58],
 ['2023-06-15 00:00:00.0', 58],
 ['2023-02-28 00:00:00.0', 56],
 ['2023-02-27 00:00:00.0', 55],
 ['2023-06-06 00:00:00.0', 53],
 ['2023-03-10 00:00:00.0', 52],
 ['2023-05-09 00:00:00.0', 51],
 ['2023-06-16 00:00:00.0', 50],
 ['2023-04-21 00:00:00.0', 49],
 ['2023-06-14 00:00:00.0', 48],
 ['2023-06-30 00:00:00.0', 48],
 ['2023-04-11 00:00:00.0', 48],
 ['2023-05-10 00:00:00.0', 48],
 ['2023-08-30 00:00:00.0', 47],
 ['2023-03-27 00:00:00.0', 46],
 ['2023-05-03 00:00:00.0', 45],
 ['2023-03-13 00:00:00.0', 44],
 ['2023-03-30 00:00:00.0', 44],
 ['2023-05-04 00:00:00.0', 42],
 ['2023-06-02 00:00:00.0', 42],
 ['2023-09-18 00:00:00.0', 41],
 ['2023-03-16 00:00:00.0', 38],
 ['2023-05-05 00:00:00.0', 38],
 ['2023-06-13 00:00:00.0', 38],
 ['2023-05-12 00:00:00.0', 38],
 ['2023-03-15 00:00:00.0', 36],
 ['2023-04-03 00:00:00.0', 36],
 ['2023-03-01 00:00:00.0', 36],
 ['2023-04-12 00:00:00.0', 36],
 ['2023-03-20 00:00:00.0', 36],
 ['2023-04-18 00:00:00.0', 34],
 ['2023-03-14 00:00:00.0', 34],
 ['2023-04-04 00:00:00.0', 33],
 ['2023-03-07 00:00:00.0', 33],
 ['2023-06-07 00:00:00.0', 31],
 ['2023-03-02 00:00:00.0', 31],
 ['2023-06-21 00:00:00.0', 31],
 ['2023-03-31 00:00:00.0', 30],
 ['2023-04-07 00:00:00.0', 29],
 ['2023-08-29 00:00:00.0', 29],
 ['2023-06-19 00:00:00.0', 28],
 ['2023-05-31 00:00:00.0', 27],
 ['2023-05-11 00:00:00.0', 26],
 ['2023-06-20 00:00:00.0', 26],
 ['2023-04-06 00:00:00.0', 25],
 ['2023-06-05 00:00:00.0', 25],
 ['2023-06-01 00:00:00.0', 24],
 ['2023-09-20 00:00:00.0', 23],
 ['2023-03-17 00:00:00.0', 23],
 ['2023-09-19 00:00:00.0', 22],
 ['2023-07-03 00:00:00.0', 22],
 ['2023-06-22 00:00:00.0', 22],
 ['2023-03-29 00:00:00.0', 21],
 ['2023-04-25 00:00:00.0', 20],
 ['2023-04-28 00:00:00.0', 18],
 ['2023-04-27 00:00:00.0', 17],
 ['2023-04-26 00:00:00.0', 17],
 ['2023-03-08 00:00:00.0', 16],
 ['2023-05-08 00:00:00.0', 13],
 ['2023-04-20 00:00:00.0', 12],
 ['2023-05-01 00:00:00.0', 10],
 ['2023-02-24 00:00:00.0', 8],
 ['2023-04-24 00:00:00.0', 8],
 ['2023-09-12 00:00:00.0', 7],
 ['2023-04-17 00:00:00.0', 7],
 ['2023-06-29 00:00:00.0', 6],
 ['2023-08-31 00:00:00.0', 4],
 ['2023-10-05 00:00:00.0', 2],
 ['2023-09-11 00:00:00.0', 1],
 ['2023-05-30 00:00:00.0', 1],
 ['2023-08-28 00:00:00.0', 1]]

@zhaoyongjie

@martin-raymond
Copy link
Contributor Author

Hi @zhaoyongjie and @ege-st do you have any news on the issue ? After discussion with the pinot team, the problem is definitly in superset, and from our perspective, it is a huge issue because it prevents us to be able to use the 3.0.0 for our clients

@zhaoyongjie
Copy link
Member

@martin-raymond I was wondering we should checkout all the list for this case

  1. please confirm remove "ms" or other "python datetime format" in column config
  2. please confirm the type of temporal column is DATETIME instead of LONG/INTERGER or TEXT
  3. please confirm the x-axis used termporal column

BTW, we have deployed this patch and run Pinot about 3 month, everything works fine.

@martin-raymond
Copy link
Contributor Author

@zhaoyongjie thx for your answer

yes i confirm all that, i just re-tried everything from scratch, same results

@zhaoyongjie
Copy link
Member

@zhaoyongjie thx for your answer

yes i confirm all that, i just re-tried everything from scratch, same results

Could you post screenshot as following

  1. Column config in Dataset Edit model, especially, the config of temporal column setting.
  2. Explore Page

@martin-raymond
Copy link
Contributor Author

image
image
image
image

@zhaoyongjie

@zhaoyongjie
Copy link
Member

zhaoyongjie commented Nov 8, 2023

The first column of results seems like TEXT instead of DATATIME from the screenshot of SQLLab. Please check out 1) Pinot side 2) SQLalchemy driver of Pinot 3) or what did you change in Superset codebase? @martin-raymond

@martin-raymond
Copy link
Contributor Author

pinot seems fine. what do you want me to check there ? it worked weel until the new request from 3.0.0
we changed nothing in superset codebase. @zhaoyongjie

@zhaoyongjie
Copy link
Member

What did version of SQLAlchemy Driver use? try following command in your superset cluser.

pip list | grep pinot

@martin-raymond
Copy link
Contributor Author

pinotdb : 0.3.8

@zhaoyongjie
Copy link
Member

zhaoyongjie commented Nov 8, 2023 via email

@martin-raymond
Copy link
Contributor Author

@zhaoyongjie well, it seems to have solved the issue (but we have to use a calculated column as you adviced).
we have a lots of charts to verify now, but it seems to work fine on the first

thank you very much @zhaoyongjie for your help. i will keep you informed here if it works on every charts, then i will close my issue.

@martin-raymond
Copy link
Contributor Author

i confirm all our charts are now ok. thx very much @zhaoyongjie

@martin-raymond
Copy link
Contributor Author

@zhaoyongjie the issue is still ongoing.

The workaround is theoretically correct, but in practice, on our production data, the volume is so significant that casting to a timestamp, applied to the temproel field through the time range filter, results in processing times multiplied by 40, making our service unusable.

This is ultimately due to a faulty query in Superset 3.0.1. Therefore, the problem persists.

@rusackas rusackas added the data:connect:pinot Related to Pinot label Apr 8, 2024
@rusackas
Copy link
Member

rusackas commented Apr 8, 2024

It's been quite a while since this thread was revisited, but it's one of the best support threads I've read on here in a long time! @martin-raymond are you still up against this, and @zhaoyongjie do you have any new theories? I'm wondering if this is still an issue in Superset 3.1 / 4.0. If there is a faulty query being generated, maybe you can share what it looks like now?

@zhaoyongjie
Copy link
Member

zhaoyongjie commented Apr 8, 2024 via email

@rusackas
Copy link
Member

rusackas commented Apr 8, 2024

Agreed. I'll be optimistic here and just close this one. If @martin-raymond is still facing this issue, we can revisit and/or reopen as necessary.

@rusackas rusackas closed this as completed Apr 8, 2024
@martin-raymond
Copy link
Contributor Author

we changed all of our temporal field in our pinot table, so we skipped the problem completely. the thread can stay closed, thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data:connect:pinot Related to Pinot
Projects
None yet
Development

No branches or pull requests

3 participants