SELECT "URL", COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate"::INT::DATE >= '2013-07-01' AND "EventDate"::INT::DATE <= '2013-07-31' AND "DontCountHits" = 0 AND "IsRefresh" = 0 AND "URL" <> '' GROUP BY "URL" ORDER BY PageViews DESC LIMIT 10;
SELECT "Title", COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate"::INT::DATE >= '2013-07-01' AND "EventDate"::INT::DATE <= '2013-07-31' AND "DontCountHits" = 0 AND "IsRefresh" = 0 AND "Title" <> '' GROUP BY "Title" ORDER BY PageViews DESC LIMIT 10;
Describe alternatives you've considered
No response
Additional context
No response
./datafusion-cli-44 -c "SELECT \"URL\", COUNT(*) AS PageViews FROM 'hits.parquet' WHERE \"CounterID\" = 62 AND \"EventDate\"::INT::DATE >= '2013-07-01' AND \"EventDate\"::INT::DATE <= '2013-07-31' AND \"DontCountHits\" = 0 AND \"IsRefresh\" = 0 AND \"URL\" <> '' GROUP BY \"URL\" ORDER BY PageViews DESC LIMIT 10; "
And made flamegraphs with:
sudo flamegraph -- ./datafusion-cli-43 -c "SELECT \"URL\", COUNT(*) AS PageViews FROM 'hits.parquet' WHERE \"CounterID\" = 62 AND \"EventDate\"::INT::DATE >= '2013-07-01' AND \"EventDate\"::INT::DATE <= '2013-07-31' AND \"DontCountHits\" = 0 AND \"IsRefresh\" = 0 AND \"URL\" <> '' GROUP BY \"URL\" ORDER BY PageViews DESC LIMIT 10;"
Here is the flamegraph for DataFusion 43:
Here is the flamegraph for DataFusion 44:
A large amount of the time is spent decoding ParquetMetadata.
Given how much time is spent decoding ParquetMetadata, maybe it would be good to add some sort of small built-in cache for Parquet metadata 🤔. I think @Ted-Jiang added hooks for this a long time ago, but we don't have anything enabled by default.
Is your feature request related to a problem or challenge?
@pmcgleenon ran ClickBench on DataFusion 44 ❤
#13983 (comment)
Here are the results of ClickBench across several DataFusion versions:
clickbench-latest.html.zip
Q36 and Q37 look like they got slower in DataFusion 44.
Describe the solution you'd like
Investigate (and hopefully restore) the performance of Q36 and Q37.
Here are the queries (note: the queries are numbered starting at 0, but the line numbers start at 1, so Q36 and Q37 are on lines 37 and 38):
datafusion/benchmarks/queries/clickbench/queries.sql
Lines 37 to 38 in 0d9f845
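Because query numbers start at 0 while line numbers start at 1, query N sits on line N+1 of queries.sql. A minimal sketch for printing a single query by its number, assuming one query per line (as in the embedded lines 37 to 38) and that it is run from the repository root; the `clickbench_query` function name is hypothetical and not part of the DataFusion repo:

```shell
#!/bin/sh
# Hypothetical helper: print ClickBench query number N (0-based).
# Assumes queries.sql holds exactly one query per line (1-based line numbers).
clickbench_query() {
    n=$1
    # Default path assumes the current directory is the repository root.
    file=${2:-benchmarks/queries/clickbench/queries.sql}
    # Query N lives on line N+1, so shift the index by one before printing.
    sed -n "$((n + 1))p" "$file"
}
```

For example, `clickbench_query 36` prints Q36 (line 37) and `clickbench_query 37` prints Q37 (line 38).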