Background on DIAMetrics
DIAMetrics is an end-to-end benchmarking and performance monitoring framework for query engines, developed by Google.
Components
Note that the paper covers more than is mentioned here; this is only an overview, and we can add details about other parts further down the line.
Workload Extractor
According to the paper, this component extracts a "representative workload" from a live production workload: "DIAMetrics employs a workload extractor and summarizer, which is a feature-based way to ‘mine’ the query logs of a customer and extract a subset of queries that adequately represent the workload of the customer."
For our current purposes, I think the best way to use a component like this is to pinpoint a set of heavy workloads, keep a list of them, and just run those for the time being. To this end, I am working on a PR that will hopefully bring in more XQuery files from this repository for us to run against. I will update this issue with the PR number so that we can keep track of everything.
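As a rough illustration of the "pinpoint a set of heavy workloads" idea, here is a minimal sketch in Python. It is not the paper's feature-based extractor; it simply ranks queries by mean runtime. The log format, the file names, and the `pick_heavy_queries` helper are all hypothetical and only there to show the shape of the approach.

```python
from collections import defaultdict
from statistics import mean


def pick_heavy_queries(log_entries, top_n=3):
    """Return the top_n query names ranked by mean runtime, heaviest first."""
    runtimes = defaultdict(list)
    for name, seconds in log_entries:
        runtimes[name].append(seconds)
    # Rank by average runtime so a single slow outlier run does not dominate.
    ranked = sorted(runtimes, key=lambda name: mean(runtimes[name]), reverse=True)
    return ranked[:top_n]


# Hypothetical log entries: (query file, runtime in seconds).
sample_log = [
    ("q01.xq", 1.2), ("q02.xq", 14.8), ("q03.xq", 0.4),
    ("q02.xq", 15.6), ("q04.xq", 9.1), ("q01.xq", 1.0),
]

heavy = pick_heavy_queries(sample_log, top_n=2)
print(heavy)
```

The resulting list is what we would keep around and rerun; a real extractor would presumably look at more features than runtime alone.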
Data and Query Scrambler
This component aims to protect sensitive data by creating variations of the representative sets, so that nothing sensitive leaks. The paper lists a few ways to achieve this, but we can put less emphasis on this part for the time being, since we will only use this internally for the moment.
Workload Runner
According to the paper, this component "allows users to specify various combinations of workloads and systems to be benchmarked. For instance, we may want to run TPC-H on various query engines over various storage formats to see which storage format is the best option for which engine." The runner can either schedule runs of specific engines or spin up and manage (including cleanup and shutdown) entire engine instances for the runs.
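To make the "combinations of workloads and systems" idea concrete, here is a minimal sketch, assuming a runner that simply iterates over the cross product of workloads, engines, and storage formats and times each run. The `run_matrix` function, the `execute` callback, and the engine and format names are hypothetical; the paper's runner also handles scheduling and instance lifecycle, which this does not attempt.

```python
import itertools
import time


def run_matrix(workloads, engines, storage_formats, execute):
    """Run every (workload, engine, format) combination and record wall time."""
    results = []
    for workload, engine, fmt in itertools.product(workloads, engines, storage_formats):
        start = time.perf_counter()
        execute(workload, engine, fmt)
        elapsed = time.perf_counter() - start
        results.append(
            {"workload": workload, "engine": engine, "format": fmt, "seconds": elapsed}
        )
    return results


def fake_execute(workload, engine, fmt):
    """Placeholder for whatever actually submits the workload to an engine."""
    pass


results = run_matrix(["TPC-H"], ["engine_a", "engine_b"], ["parquet", "csv"], fake_execute)
for row in results:
    print(row["workload"], row["engine"], row["format"])
```

Keeping the execution step behind a callback means the same loop could later drive a real engine without changing how combinations are enumerated.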
Monitoring
There are two parts to this:
Visualization Framework - which brings up dashboards
Alerting Framework - which compares workload performance to historical data and alerts when there are concerns
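One simple way the alerting comparison could work, purely as an assumption on my part since the exact method is not described here: flag a run when its runtime is more than a few standard deviations above the historical mean. The `is_regression` helper and the threshold of 3 are hypothetical.

```python
from statistics import mean, stdev


def is_regression(history, latest, threshold=3.0):
    """Return True if latest is more than `threshold` standard deviations above the mean."""
    if len(history) < 2:
        # Not enough data to estimate spread; do not alert.
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # All historical runs were identical; any slowdown counts.
        return latest > baseline
    return (latest - baseline) / spread > threshold


# Hypothetical runtimes (seconds) of one workload over previous runs.
history = [10.1, 9.8, 10.3, 10.0, 9.9]

print(is_regression(history, 12.5))
print(is_regression(history, 10.2))
```

With the sample history above, a 12.5 second run is flagged and a 10.2 second run is not. A real alerting setup would likely need something sturdier against noisy runs, but this is enough to show the comparison against historical data.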