
opa eval: add '--count=#' flag #3655

Merged

Conversation


@srenatus srenatus commented Jul 16, 2021

`opa eval --count=# --profile` will now evaluate the query # times,
and display aggregated results for metrics and the expression-time
profile.

Before, `opa eval --profile` would yield output like this (JSON):

"metrics": {
  "timer_rego_external_resolve_ns": 233,
  "timer_rego_load_files_ns": 138866,
  "timer_rego_module_compile_ns": 432654,
  "timer_rego_module_parse_ns": 81454,
  "timer_rego_query_compile_ns": 54892,
  "timer_rego_query_eval_ns": 135624,
  "timer_rego_query_parse_ns": 3886
},
"profile": [
  {
    "total_time_ns": 75705,
    "num_eval": 4,
    "num_redo": 2,
    "location": {
      "file": "t.rego",
      "row": 8,
      "col": 2
    }
  },

Now, both of these are replaced by "aggregated_" variants that
include some statistics:

"aggregated_metrics": {
  "timer_rego_external_resolve_ns": {
    "75%": 1618.5,
    "90%": 1933.8000000000002,
    "95%": 1954,
    "99%": 1954,
    "99.9%": 1954,
    "99.99%": 1954,
    "count": 10,
    "max": 1954,
    "mean": 1137.1,
    "median": 1140,
    "min": 311,
    "stddev": 514.9278493148337
  },

and

"aggregated_profile": [
  {
    "total_time_ns_stats": {
      "75%": 63369.75,
      "90%": 69523.5,
      "95%": 69691,
      "99%": 69691,
      "99.9%": 69691,
      "99.99%": 69691,
      "count": 10,
      "max": 69691,
      "mean": 39030.9,
      "median": 53758.5,
      "min": 3390,
      "stddev": 27635.790954666016
    },
    "num_eval": 1,
    "num_redo": 1,
    "location": {
      "file": "t.rego",
      "row": 9,
      "col": 2
    }
  }
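For orientation, here is a self-contained sketch (not OPA's actual implementation) of how per-run nanosecond samples could be reduced to the statistics shown above. The percentile helper uses a 1-based "exclusive" rank with linear interpolation, which is consistent with the sample output (a fractional 90% value, while 95% and up clamp to "max" for count 10); the exact method OPA uses may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile estimates the p-th percentile of an already-sorted slice
// using the "exclusive" rank p/100*(n+1) with linear interpolation,
// clamped to min/max at the edges.
func percentile(sorted []float64, p float64) float64 {
	rank := p / 100 * float64(len(sorted)+1) // 1-based rank
	if rank <= 1 {
		return sorted[0]
	}
	if rank >= float64(len(sorted)) {
		return sorted[len(sorted)-1]
	}
	lo := int(math.Floor(rank)) - 1
	frac := rank - math.Floor(rank)
	return sorted[lo] + frac*(sorted[lo+1]-sorted[lo])
}

// stats reduces one timer's per-run samples to the aggregate keys
// seen in the "aggregated_metrics" output above.
func stats(samples []float64) map[string]float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	var sum, sumSq float64
	for _, v := range s {
		sum += v
		sumSq += v * v
	}
	n := float64(len(s))
	mean := sum / n
	return map[string]float64{
		"count":  n,
		"min":    s[0],
		"max":    s[len(s)-1],
		"mean":   mean,
		"median": percentile(s, 50),
		// population stddev; a sample (n-1) variant is another common choice
		"stddev": math.Sqrt(sumSq/n - mean*mean),
		"75%":    percentile(s, 75),
		"90%":    percentile(s, 90),
		"95%":    percentile(s, 95),
		"99%":    percentile(s, 99),
	}
}

func main() {
	s := stats([]float64{311, 900, 1140, 1500, 1954})
	fmt.Printf("min=%v max=%v median=%v\n", s["min"], s["max"], s["median"])
	// prints: min=311 max=1954 median=1140
}
```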

The table format has been adjusted as well, but quickly becomes
unwieldy:

+------------------------------+---------+----------+---------------+----------------+---------------+
|            METRIC            |   MIN   |   MAX    |     MEAN      |      90%       |      99%      |
+------------------------------+---------+----------+---------------+----------------+---------------+
| timer_rego_load_files_ns     | 349969  | 2549399  | 1.4760619e+06 | 2.5312689e+06  | 2.549399e+06  |
| timer_rego_module_compile_ns | 1087507 | 24537496 | 1.120074e+07  | 2.41699473e+07 | 2.4537496e+07 |
| timer_rego_module_parse_ns   | 275531  | 1915263  | 1.126406e+06  | 1.9016968e+06  | 1.915263e+06  |
| timer_rego_query_compile_ns  | 61663   | 64395    | 63062.5       | 64374.1        | 64395         |
| timer_rego_query_eval_ns     | 161812  | 1198092  | 637754        | 1.1846622e+06  | 1.198092e+06  |
| timer_rego_query_parse_ns    | 6078    | 6078     | 6078          | 6078           | 6078          |
+------------------------------+---------+----------+---------------+----------------+---------------+
+----------+-------------+-------------+-------------+-------------+----------+----------+-----------------+
|   MIN    |     MAX     |    MEAN     |     90%     |     99%     | NUM EVAL | NUM REDO |    LOCATION     |
+----------+-------------+-------------+-------------+-------------+----------+----------+-----------------+
| 43.875µs | 26.135469ms | 11.494512ms | 25.746215ms | 26.135469ms | 1        | 1        | data.rbac.allow |
| 21.478µs | 211.461µs   | 98.102µs    | 205.72µs    | 211.461µs   | 1        | 1        | rbac.rego:13    |
| 19.652µs | 123.537µs   | 73.161µs    | 122.75µs    | 123.537µs   | 1        | 1        | rbac.rego:40    |
| 12.303µs | 117.277µs   | 61.59µs     | 116.733µs   | 117.277µs   | 2        | 1        | rbac.rego:50    |
| 12.224µs | 93.214µs    | 51.289µs    | 92.217µs    | 93.214µs    | 1        | 1        | rbac.rego:44    |
| 5.561µs  | 84.121µs    | 43.002µs    | 83.469µs    | 84.121µs    | 1        | 1        | rbac.rego:51    |
| 5.56µs   | 71.712µs    | 36.545µs    | 71.158µs    | 71.712µs    | 1        | 0        | rbac.rego:45    |
| 4.958µs  | 66.04µs     | 33.161µs    | 65.636µs    | 66.04µs     | 1        | 2        | rbac.rego:49    |
| 4.326µs  | 65.836µs    | 30.461µs    | 65.083µs    | 65.836µs    | 1        | 1        | rbac.rego:6     |
| 3.948µs  | 43.399µs    | 24.167µs    | 43.055µs    | 43.399µs    | 1        | 2        | rbac.rego:55    |
+----------+-------------+-------------+-------------+-------------+----------+----------+-----------------+

Fixes #3651.

@@ -458,6 +475,18 @@ func prettyMetrics(w io.Writer, m metrics.Metrics, limit int) error {
return nil
}

var statKeys = []string{"min", "max", "mean", "stddev", "median", "75%", "90%", "95%", "99%", "99.9%", "99.99%"}
Member:

Regarding the table... what if we do only min, mean, max, 90%, 99%, 99.9%, and maybe reduce the precision?

Contributor Author (srenatus):

That's a good idea, thanks. I suppose we could make it configurable, but if we just keep the JSON format with all the details, then that'll be what to reach for when the pretty abbreviated output isn't enough...
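The compromise discussed here — a trimmed column set in the pretty table, full detail in JSON — could look roughly like the following sketch. Only `statKeys` appears in the diff above; `tableKeys` and `tableRow` are hypothetical names for illustration.

```go
package main

import "fmt"

// statKeys mirrors the variable in the diff above: everything the JSON
// output reports. tableKeys is a hypothetical trimmed subset for the
// pretty table, matching the columns in the PR description's example.
var statKeys = []string{"min", "max", "mean", "stddev", "median", "75%", "90%", "95%", "99%", "99.9%", "99.99%"}
var tableKeys = []string{"min", "max", "mean", "90%", "99%"}

// tableRow picks only the abbreviated columns out of the full stats
// map; JSON output would serialize the map as-is, keeping every key.
func tableRow(metric string, stats map[string]float64) []string {
	row := []string{metric}
	for _, k := range tableKeys {
		row = append(row, fmt.Sprintf("%v", stats[k]))
	}
	return row
}

func main() {
	full := map[string]float64{
		"min": 61663, "max": 64395, "mean": 63062.5,
		"90%": 64374.1, "99%": 64395,
		"stddev": 1115.2, // reported in JSON, dropped from the table
	}
	fmt.Println(tableRow("timer_rego_query_compile_ns", full))
}
```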

@srenatus srenatus marked this pull request as ready for review July 19, 2021 12:32

tsandall commented Jul 19, 2021

I tried this out and the output format made sense to me. @srenatus I think we need to do something with the metrics passed into the evaluator though; with the current implementation, I think the timers are just accumulating:

$ opa eval --metrics --format pretty --count=100 true
true
+-----------------------------+--------+--------+--------+--------+--------+
|           METRIC            |  MIN   |  MAX   |  MEAN  |  90%   |  99%   |
+-----------------------------+--------+--------+--------+--------+--------+
| timer_rego_query_compile_ns | 153800 | 158900 | 156526 | 158500 | 158900 |
| timer_rego_query_eval_ns    | 6500   | 118600 | 63233  | 111110 | 118591 |
| timer_rego_query_parse_ns   | 8700   | 8700   | 8700   | 8700   | 8700   |
+-----------------------------+--------+--------+--------+--------+--------+
$ opa eval --metrics --format pretty --count=1000 true
true
+-----------------------------+--------+---------+----------+-------------+--------------+
|           METRIC            |  MIN   |   MAX   |   MEAN   |     90%     |     99%      |
+-----------------------------+--------+---------+----------+-------------+--------------+
| timer_rego_query_compile_ns | 167200 | 215900  | 192027.1 | 211290      | 215499       |
| timer_rego_query_eval_ns    | 5900   | 1376300 | 748224.4 | 1.28912e+06 | 1.368892e+06 |
| timer_rego_query_parse_ns   | 9000   | 9000    | 9000     | 9000        | 9000         |
+-----------------------------+--------+---------+----------+-------------+--------------+
$ opa eval --metrics --format pretty --count=10000 true
true
+-----------------------------+--------+----------+-----------------------+--------------+---------------+
|           METRIC            |  MIN   |   MAX    |         MEAN          |     90%      |      99%      |
+-----------------------------+--------+----------+-----------------------+--------------+---------------+
| timer_rego_query_compile_ns | 322100 | 996800   | 612382.0038910506     | 938070       | 991200        |
| timer_rego_query_eval_ns    | 335900 | 12166800 | 6.472141634241245e+06 | 1.089422e+07 | 1.2111666e+07 |
| timer_rego_query_parse_ns   | 14600  | 14600    | 14600                 | 14600        | 14600         |
+-----------------------------+--------+----------+-----------------------+--------------+---------------+

Notice the mean goes up by 10x as the --count is increased by 10x. I think we have to find some way to reset the timers on each pass (I'm not sure about other metrics...)

@srenatus (Contributor Author):

> Notice the mean goes up by 10x as the --count is increased by 10x. I think we have to find some way to reset the timers on each pass (I'm not sure about other metrics...)

Oops, my bad. I had looked at the code and believed the metrics had been cleared already, but apparently that isn't the case. Will fix.
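The accumulation bug tsandall observed can be reproduced with a toy model (the types here are hypothetical stand-ins, not OPA's metrics package): when one timer instance is reused across runs without being cleared, sample i records the sum of runs 0..i, so the mean scales with --count.

```go
package main

import "fmt"

// timer stands in for a single named timer shared across all runs.
type timer struct{ total int64 }

func (t *timer) add(ns int64) { t.total += ns }

// collect simulates evaluating a query `count` times against one
// shared timer, recording the timer's value after each run as that
// run's sample. Without a reset, sample i is the sum of runs 0..i.
func collect(count int, perRunNs int64, reset bool) []int64 {
	t := &timer{}
	samples := make([]int64, 0, count)
	for i := 0; i < count; i++ {
		if reset {
			t.total = 0 // the fix: clear the metrics before each pass
		}
		t.add(perRunNs) // stands in for one query evaluation
		samples = append(samples, t.total)
	}
	return samples
}

func mean(xs []int64) float64 {
	var sum int64
	for _, x := range xs {
		sum += x
	}
	return float64(sum) / float64(len(xs))
}

func main() {
	fmt.Println(mean(collect(10, 100, false)))  // accumulating: 550
	fmt.Println(mean(collect(100, 100, false))) // 10x the count, ~10x the mean: 5050
	fmt.Println(mean(collect(100, 100, true)))  // with reset: 100
}
```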

@srenatus srenatus force-pushed the sr/cmd/eval-profile-with-count branch 3 times, most recently from 1b06b79 to 616a2d7 Compare July 20, 2021 12:13
tsandall previously approved these changes Jul 21, 2021

@tsandall (Member) left a comment:


LGTM

The squashed commit message matches the PR description above, with one
addition regarding the table output:

`opa eval --count=# --profile` will now evaluate the query # times,
and display aggregated results for metrics and the expression-time
profile, replacing the JSON "metrics" and "profile" outputs with
"aggregated_" variants.

The table format has been adjusted as well, and only displays a subset
of the statistics gathered: the 99.9 and 99.99 percentiles hardly
matter for the sample sizes expected here, as they will almost
certainly match "max". The output is less unwieldy now.
Fixes open-policy-agent#3651.

Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
@srenatus srenatus force-pushed the sr/cmd/eval-profile-with-count branch from 9780fd7 to 3de4bbd Compare July 21, 2021 19:17
@srenatus srenatus merged commit ef716a3 into open-policy-agent:main Jul 21, 2021
@srenatus srenatus deleted the sr/cmd/eval-profile-with-count branch July 21, 2021 19:32
@gabbifish

@srenatus, this is amazing!!! Thank you so much for this!

dolevf pushed a commit to dolevf/opa that referenced this pull request Nov 4, 2021
(The commit carried the same message as the squashed commit above,
additionally signed off by Dolev Farhi <farhi.dolev@gmail.com>.)

Successfully merging this pull request may close these issues.

Add --sample flag to eval running in profiler mode
4 participants