Description
1. If a table is partitioned by date and only the current day's partition is being updated, the data for earlier days does not change. A query over 30 days of data can then read 29 of those days from the cache.
For example, suppose today is 20191226 and data is being imported in real time through StreamLoad.
SELECT event_date, COUNT(event_id) AS event_count FROM music_event WHERE event_date >= 20191127 AND event_date <= 20191226 GROUP BY event_date ORDER BY event_date;
The row batches of event_date and event_count for 20191127-20191225 can be served from the in-memory cache; only the data for 20191226 has to be read from the physical table.
If a query misses the cache, its results are cached so that later queries can reuse them.
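The partition-by-partition lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the planned implementation: the function name, the cache layout, and the `scan_partition` callback are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory cache: (query signature, partition date) -> cached row.
PartitionCache = Dict[Tuple[str, int], Tuple[int, int]]


def query_with_partition_cache(
    signature: str,
    dates: List[int],
    today: int,
    scan_partition: Callable[[int], int],
    cache: PartitionCache,
) -> List[Tuple[int, int]]:
    """Return (event_date, event_count) rows, scanning only partitions
    that are not cached yet or that are still being updated."""
    rows = []
    for date in dates:
        key = (signature, date)
        if date != today and key in cache:
            rows.append(cache[key])  # closed partition, served from memory
            continue
        row = (date, scan_partition(date))  # read from the physical table
        if date != today:
            cache[key] = row  # cache only partitions that no longer change
        rows.append(row)
    return rows
```

With this sketch, the first call scans every requested partition, and a repeated call for the same date range scans only the partition for `today`.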
2. In another case, the query does not filter by the partition key, but the data is updated on a daily basis. For example, UserProfile uses UserID as its partition key, and the following query counts the number of users in each country.
SELECT country, COUNT(UserID) FROM UserProfile GROUP BY country;
This query result can be cached.
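One way to picture this whole-result case is a cache keyed by the SQL text that stays valid as long as the table has not been updated. The sketch below assumes a hypothetical `table_version` number that increases on every update; the issue does not specify how updates would actually be detected, and the class and method names are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple[str, int]]


class ResultCache:
    """Hypothetical whole-result cache keyed by SQL text, valid for one table version."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, Rows]] = {}

    def get_or_run(self, sql: str, table_version: int, run: Callable[[str], Rows]) -> Rows:
        entry = self._entries.get(sql)
        if entry is not None and entry[0] == table_version:
            return entry[1]  # data unchanged since the result was cached
        rows = run(sql)  # miss or stale entry: execute and remember the result
        self._entries[sql] = (table_version, rows)
        return rows
```

Repeated calls with the same SQL and the same version run the query once; a call with a newer version runs it again and replaces the cached entry.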
Therefore, the results of queries over data that is not updated in real time can be cached as a whole, while queries over data that is partitioned by day can be cached partition by partition when only the most recent partition is being updated.
This feature will reduce query time and improve cluster QPS.
The specific design and use details will be supplemented according to the development progress.