
[feat] Introduce permanent result storage and support performance results comparisons between runs #3227

Merged 69 commits on Aug 28, 2024

Conversation


@vkarak vkarak commented Jul 1, 2024

This PR enables performance result comparison natively in ReFrame.

New functionality

The --performance-report option is extended to accept an optional specifier that controls how the performance results of the current run will be compared to past results.

A new --performance-compare option is provided that allows ReFrame to compare the performance results between two past time periods.

The syntax of both options is very similar:

--performance-report=<target_period>/<aggr_spec>/<extra_cols>
--performance-compare=<base_period>/<target_period>/<aggr_spec>/<extra_cols>

Each time period is specified in the following simplified syntax:

<time_period> ::= ("now" | <timestamp>) ":" ("now" | <timestamp>) (("+" | "-") [0-9]+ <unit>)?
<timestamp> ::= ([a-z] | [A-Z] | [0-9])+
<unit> ::= "h" | "m" | "s" | "d" | "w"

Timestamps must follow one of the formats below:

%Y%m%d
%Y%m%dT%H%M
%Y%m%dT%H%M%S
%Y%m%dT%H%M%S%z
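
For illustration, a timestamp in any of these formats can be resolved by trying them in order. The following is a minimal Python sketch of that idea, not ReFrame's actual parsing code:

```python
from datetime import datetime

# Accepted timestamp formats, tried in order
_FORMATS = ('%Y%m%d', '%Y%m%dT%H%M', '%Y%m%dT%H%M%S', '%Y%m%dT%H%M%S%z')

def parse_timestamp(s):
    for fmt in _FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue

    raise ValueError(f'invalid timestamp: {s!r}')

# parse_timestamp('20240701T1200') -> datetime(2024, 7, 1, 12, 0)
```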

A time period can also be specified implicitly through a session UUID.
The session UUID is essentially the start timestamp of the run session in the format %Y%m%dT%H%M%S%z and is always printed at the end of the run session.
It is a unique identifier assigned automatically by ReFrame.
To use a session UUID instead of a timestamp, specify it as ^<session_uuid>, in which case the test cases of that particular session will be used.

NOTE: Test cases are also assigned a UUID, which is essentially an extension of the session UUID: <session_uuid>:<run_index>:<test_index>, where run_index and test_index are the positions of the test case in the run/testcases matrix of the session.

An aggregation specifier follows the syntax:

<aggr_spec> ::= <aggr_op> ":" <extra_group_by>
<aggr_op> ::= "first" | "last" | "mean" | "median" | "min" | "max"
<extra_group_by> ::= ("+" <column_name>)*
<column_name> ::= ([a-z] | [A-Z] | [0-9])+

Finally, the extra columns can be specified with the following syntax:

<extra_cols> ::= ("+" <column_name>)*
<column_name> ::= ([a-z] | [A-Z] | [0-9])+

Note that the above EBNF forms are simplified in terms of the accepted column names and/or timestamps.
Note also that the accepted column names are the same as the log format specifiers, but without the check_ prefix.

Results are always aggregated by test name, performance variable name and performance variable unit; this cannot be changed. Users can only provide additional columns to group by.

Similarly, the output will contain all the group-by columns plus any additional columns specified by the user. For non-aggregated columns, the unique values across all samples are combined into a single string using the | separator.
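
To make the aggregation semantics concrete, here is a minimal sketch of the grouping logic described above. The record fields and helper names are assumptions for illustration, not ReFrame's internal API:

```python
import statistics
from collections import defaultdict

# Aggregation operations corresponding to <aggr_op> above
AGGR_OPS = {
    'first': lambda vals: vals[0],
    'last': lambda vals: vals[-1],
    'mean': statistics.mean,
    'median': statistics.median,
    'min': min,
    'max': max,
}

def aggregate(testcases, aggr_op, extra_group_by=(), extra_cols=()):
    '''Group test cases by (name, pvar, punit) plus any extra columns
    and aggregate their performance values.'''
    groups = defaultdict(list)
    for tc in testcases:
        key = tuple(tc[c] for c in ('name', 'pvar', 'punit', *extra_group_by))
        groups[key].append(tc)

    table = []
    for key, samples in groups.items():
        pval = AGGR_OPS[aggr_op]([tc['pval'] for tc in samples])
        # Non-aggregated extra columns: combine unique values with '|'
        extras = ['|'.join(sorted({str(tc[c]) for tc in samples}))
                  for c in extra_cols]
        table.append([*key, pval, *extras])

    return table
```

For example, aggregate(tcs, 'mean', extra_group_by=('system',), extra_cols=('result',)) would yield one row per (name, pvar, punit, system) group, with the mean pval and the |-joined unique results.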

Examples

  • Compare the current run with the mean performance of the last 7 days, grouping also by system, partition and environment, and also listing the test result:
--performance-report=now:now-7d/mean:+system+partition+environment/+result
  • Compare the median performance of two sessions, grouping results by nodelist, and also print the test result:
--performance-compare=^2df155f4-a6bc-443a-ab95-2b37b4b1b2e7/^d13d0498-ceee-4df2-971c-8a844af71b8a/median:+job_nodelist/+result

Key design concepts

After finishing the test runs and generating the JSON run report, ReFrame stores the run results in a database. The full JSON report blob is stored and the test cases of the run are indexed by their job_completion_time_unix.

Querying results is purposefully kept simple and is fulfilled by two API calls that fetch all the test cases that have run either in a certain period of time or in a specific session. Even querying by a session UUID goes through this path, by first obtaining the time span of the session.

The obtained test cases are then grouped by the specified attributes and their performance values are aggregated using the aggregation function.

Finally, a 2D array with the table data is returned to the caller.
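
A minimal sketch of how such a storage backend could look, assuming SQLite. The table and column names are illustrative, not the actual schema:

```python
import sqlite3

conn = sqlite3.connect('results.db')

# One row per session holding the full JSON report blob, plus an indexed
# table of test cases keyed by their completion time
conn.execute('CREATE TABLE IF NOT EXISTS sessions('
             'uuid TEXT PRIMARY KEY, json_blob TEXT)')
conn.execute('CREATE TABLE IF NOT EXISTS testcases('
             'session_uuid TEXT, job_completion_time_unix REAL)')
conn.execute('CREATE INDEX IF NOT EXISTS index_completion_time '
             'ON testcases(job_completion_time_unix)')

def fetch_testcases(ts_start, ts_end):
    '''Fetch all test cases that completed within [ts_start, ts_end).'''
    return conn.execute(
        'SELECT session_uuid FROM testcases '
        'WHERE job_completion_time_unix >= ? AND '
        'job_completion_time_unix < ?', (ts_start, ts_end)
    ).fetchall()
```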

New options for querying session and test case data:

  • --delete-stored-session <uuid>: delete a stored session by its UUID
  • --describe-stored-session <uuid>: return a detailed JSON record about a specific session
  • --describe-stored-testcases <session_uuid>|<testcase_uuid>|<time_period>: return a detailed JSON record about a specific test case or about the test cases of a specific session or time period
  • --list-stored-sessions: list brief information about all stored sessions
  • --list-stored-testcases <session_uuid>|<testcase_uuid>|<time_period>: list brief information about a specific test case or about the test cases of a specific session or time period

Implementation details

The database file is placed inside the report file folder and is named results.db; this cannot be changed.

The report format is also changed, so we bump its data version to 4.0. The check_vars and check_params are flattened and stored inside the testcase.
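
Schematically, the flattening could look as follows. This is a hypothetical illustration with made-up field values, not the actual report schema:

```python
# Report data version < 4.0 (schematic): variables and parameters nested
# under separate keys
testcase_v3 = {
    'name': 'Foo',
    'check_vars': {'num_tasks': 4},
    'check_params': {'repeat_no': 1},
}

# Report data version 4.0 (schematic): the same data flattened into the
# test case record itself
testcase_v4 = {'name': 'Foo', 'num_tasks': 4, 'repeat_no': 1}
```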

Code organization

The existing reporting code is considerably refactored and new code is added. Here is a summary of the structure:

  • reframe.frontend.statistics is removed and TestStats is deprived of any reporting functionality. Its definition is moved to reframe.frontend.executors and its only purpose is to hold statistics of the run as ReFrame runs the tests.
  • reframe.frontend.printer takes over all responsibility for formatting and presenting the various reports (failure report, retry report, performance report, etc.)
  • reframe.frontend.reporting holds everything related to generating a run report and restoring a session from a run report. It also contains all the analytics functionality described above.
  • reframe.frontend.reporting.storage provides an abstraction to the backend storage for storing run reports. It essentially provides the basic API for querying the test cases that have run in a certain period of time.

Finally, this PR fixes some coding style issues to make flake8 happy again.

Other changes

  • The default format of --timestamp-dirs now uses %Y instead of %y; the latter was a typo.
  • Reports are only generated if non-empty.
  • Some general coding style fixes.
  • Fixed buggy behaviour of --timestamp-dirs that did not respect the respective environment variable. This was a limitation of the current implementation of ArgumentParser, so we had to enhance our own version to support optional arguments with default values (i.e., `nargs='?', const='value'`); see the sketch after this list.
  • Support for always using unqualified hostnames in the local scheduler through a new configuration parameter. This option was added to enable grouping of performance results by nodelist even when the same test has run both through a scheduler (which usually uses unqualified names) and locally.
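
Regarding the ArgumentParser enhancement mentioned above, the underlying argparse scenario being supported is roughly the following. Option and variable names are illustrative, not ReFrame's actual code:

```python
import argparse
import os

parser = argparse.ArgumentParser()

# The option may be given with a value, without a value (picking up `const`),
# or omitted entirely, in which case an environment variable should supply
# the default.
parser.add_argument('--timestamp-dirs', nargs='?',
                    const='%Y%m%dT%H%M%S',
                    default=os.getenv('RFM_TIMESTAMP_DIRS'))

print(parser.parse_args(['--timestamp-dirs']))        # value from `const`
print(parser.parse_args(['--timestamp-dirs', '%Y']))  # explicit value
print(parser.parse_args([]))                          # env var or None
```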

Todos

  • Add unit tests
  • Better handling of user errors when invalid comparison spec is passed (currently, a stack trace is printed)
  • Add docs and adapt the tutorial
  • Make optional the creation of JSON report files (now everything is stored in the DB and the session JSON can be easily retrieved with --describe-stored-session).
  • Use separate configuration parameter for controlling the location of the results DB
  • Introduce a DB schema version similarly to the report data version to avoid surprises
  • Support "weeks" for period shortcuts as well
  • Do not generate a report if no test cases were run
  • Remove the ^ from session UUID specs in the command-line options
  • Allow --list-stored-sessions to accept an optional argument
  • Job completion time for compile-only tests is reported wrongly
  • Add unit tests for testcase filtering on test names
  • Expand docs about testcase filtering

Open issues to discuss

  • Add the option to not store the JSON reports at all, as these are now stored in the database? This would require us to add an option to list and show old reports from the command line.
    • This is now an option (it defaults to generating the files).
  • We likely need an option to quickly list the stored sessions by UUID, so as to be able to easily select two sessions.
    • See the added options above.

Closes #2733.


pep8speaks commented Jul 1, 2024

Hello @vkarak, Thank you for updating!

Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide.

Comment last updated at 2024-08-28 13:30:22 UTC


ekouts commented Jul 3, 2024

Still testing the PR and going through it. Just some preliminary comments regarding the open issues for discussion:

  • Add the option to not store the JSON reports at all, as these are now stored in the database? This would require us to add an option to list and show old reports from the command line.
  • We likely need an option to quickly list the stored sessions by UUID, so as to be able to easily select two sessions.

I think both are good ideas. The second one in particular is useful, so that you don't have to go through the report manually to find it.

I would like to add some more items to discuss:

  • As far as I can tell, testcases that are generated by --distribute are treated as different tests. I think it would be good to be able to aggregate the performance results of testcases that run on different nodes (but the same partition).

    Results are always aggregated by test name, performance variable name and performance variable unit and this cannot be changed.

    Since the variable has a "special" name, it shouldn't be too hard to ignore it, right?

  • Would it make sense to give users the option to remove certain UUIDs from the database? If you have some "invalid" results, the only option now is to remove the whole database or do it outside of ReFrame. If we don't want to support it through ReFrame, maybe we can have a small explanation of how to manually "change" the db?

@ekouts ekouts left a comment

Some general issues:

  • Not sure how we would properly address this, but let's say you are comparing two runs and using max as the aggregator. What would be the way to find the UUID of the session? Currently you have to look in the report manually, right?
  • The same issue as with --distribute comes up with --repeat.
  • In the performance report you now see only one entry for each test (in case of retries). This is different behaviour from before and I think some people may prefer to see the individual results.
  • We are no longer printing the title
    =======================
    PERFORMANCE REPORT
    ----------------------------
    
  • I tried running two reframe runs in parallel and the database can get silently corrupted. Maybe not an immediate issue, but good to keep in mind.
    From the first run I got:
    ┍━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━┯━━━━━━━━━┯━━━━━━━━━━━━━━━━━┯━━━━━━━━━━┑
    │ name                  │ pvar       │     pval │ punit   │ pdiff   │ job_nodelist    │ result   │
    ┝━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━━━━━━━━━┿━━━━━━━━━━┥
    │ Foorun %$repeat_no=1  │ extract_bw │ 0.175468 │ MB/s    │ -63.21% │ vpn-110.cscs.ch │ fail     │
    ├───────────────────────┼────────────┼──────────┼─────────┼─────────┼─────────────────┼──────────┤
    │ Foorun %$repeat_no=0  │ extract_bw │ 2.33529  │ MB/s    │ -60.05% │ vpn-110.cscs.ch │ fail     │
    ├───────────────────────┼────────────┼──────────┼─────────┼─────────┼─────────────────┼──────────┤
    │ Foorun2 %$repeat_no=1 │ copy_bw    │ 2.33805  │ MB/s    │ -74.89% │ vpn-110.cscs.ch │ fail     │
    ├───────────────────────┼────────────┼──────────┼─────────┼─────────┼─────────────────┼──────────┤
    │ Foorun2 %$repeat_no=0 │ copy_bw    │ 3.36739  │ MB/s    │ -23.22% │ vpn-110.cscs.ch │ fail     │
    ┕━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━┷━━━━━━━━━┷━━━━━━━━━━━━━━━━━┷━━━━━━━━━━┙
    Current session stored with UUID: 20240703T161944
    (printed the run_report as well: run-report-601.json)
    
    The second:
    ┍━━━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━━━━━━━━━┯━━━━━━━━━━┑
    │ name    │ pvar       │     pval │ punit   │ pdiff    │ job_nodelist    │ result   │
    ┝━━━━━━━━━┿━━━━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━━━━━━━━━┿━━━━━━━━━━┥
    │ Foorun2 │ copy_bw    │ 0.387238 │ MB/s    │ -95.21%  │ vpn-110.cscs.ch │ fail     │
    ├─────────┼────────────┼──────────┼─────────┼──────────┼─────────────────┼──────────┤
    │ Foorun  │ extract_bw │ 5.07053  │ MB/s    │ +521.14% │ vpn-110.cscs.ch │ pass     │
    ┕━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━━━━━━━━━┷━━━━━━━━━━┙
    Current session stored with UUID: 20240703T161944
    run report file: run-report-600.json
    
    And then when I run with --performance-compare="^20240703T161944/^20240703T161944/mean:+system+partition+environ/+result" I get:
    ┍━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━┯━━━━━━━━━┯━━━━━━━━━━┯━━━━━━━━━━━━━┯━━━━━━━━━━━┯━━━━━━━━━━┑
    │ name                 │ pvar       │     pval │ punit   │ pdiff   │ system   │ partition   │ environ   │ result   │
    ┝━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━┿━━━━━━━━━━┿━━━━━━━━━━━━━┿━━━━━━━━━━━┿━━━━━━━━━━┥
    │ Foorun2              │ copy_bw    │ 0.387238 │ MB/s    │ +0.00%  │ generic  │ default     │ builtin   │ fail     │
    ├──────────────────────┼────────────┼──────────┼─────────┼─────────┼──────────┼─────────────┼───────────┼──────────┤
    │ Foorun               │ extract_bw │ 5.07053  │ MB/s    │ +0.00%  │ generic  │ default     │ builtin   │ pass     │
    ├──────────────────────┼────────────┼──────────┼─────────┼─────────┼──────────┼─────────────┼───────────┼──────────┤
    │ Foorun %$repeat_no=1 │ extract_bw │ 0.175468 │ MB/s    │ +0.00%  │ generic  │ default     │ builtin   │ fail     │
    ┕━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━┷━━━━━━━━━┷━━━━━━━━━━┷━━━━━━━━━━━━━┷━━━━━━━━━━━┷━━━━━━━━━━┙
    


vkarak commented Jul 8, 2024

As far as I can tell, testcases that are generated by --distribute are treated as different tests. I think it would be good to be able to aggregate the performance results of testcases that run on different nodes (but the same partition).

I think the --distribute option has two problems:

  • The parameters added by the test generation mechanism alter the test name; the same problem is shared by all other test generation options.
  • For the --distribute option we also modify the base test name by adding the partition name to it, which also makes it different from the same test that has run on the same partition and node.

I think the solution to this naming problem should probably be independent of how we treat it in the results DB, but I agree that it needs fixing in this context.

Would it make sense to give users the option to remove certain UUIDs from the database? If you have some "invalid" results, the only option now is to remove the whole database or do it outside of ReFrame. If we don't want to support it through ReFrame, maybe we can have a small explanation of how to manually "change" the db?

I think it makes sense in general.

This is now possible with the --delete-stored-session option.

Not sure how we would properly address this, but let's say you are comparing two runs and using max as the aggregator. What would be the way to find the UUID of the session? Currently you have to look in the report manually, right?

If you add +session_uuid to the extra columns argument, it should show all the unique session UUIDs that the aggregation came from. The same happens for any column that you select. I have tried this with test attributes but not with the session UUID, so if it doesn't work, I should have a look at it.

In the performance report you now see only one entry for each test (in case of retries). This is different behaviour from before and I think some people may prefer to see the individual results.

You should still be getting different entries now, as each repeated test has a different name (which is something we want to change, though). But in any case, I don't believe we should offer this sort of customization for the performance report at the moment, given that the raw performance results are available in the JSON reports, in the database and in the perflogs.

Nonetheless, I believe you can still get the same result if you add +pval to the extra columns (I haven't tried it).

Maybe we could get this by grouping also by +run_index (again we should try that as I haven't :-) )

Grouping by +runid gives the expected result now.

We are no longer printing the title

I have fixed that but didn't push yet :-)

Fixed.

I tried running two reframe runs in parallel and the database can get silently corrupted. Maybe not an immediate issue, but good to keep in mind.

Interesting, will have a 👀. I thought that sqlite3 handled multiple processes... Will try to reproduce.

Fixed: we are now using file locks to guard against race conditions.
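
For reference, a minimal sketch of the file-locking approach on POSIX systems; this is an assumption about how the guard could be implemented, not necessarily the code in this PR:

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def file_lock(path):
    '''Serialize DB access across processes with an exclusive lock file.'''
    with open(path, 'w') as fp:
        fcntl.flock(fp, fcntl.LOCK_EX)   # blocks until the lock is acquired
        try:
            yield
        finally:
            fcntl.flock(fp, fcntl.LOCK_UN)

# Hypothetical usage: guard every write to the results DB
# with file_lock('results.db.lock'):
#     store_report(...)
```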


ekouts commented Jul 9, 2024

If you add +session_uuid to the extra columns argument, it should show all the unique session UUIDs that the aggregation came from. The same happens for any column that you select. I have tried this with test attributes but not with the session UUID, so if it doesn't work, I should have a look at it.

Maybe we could get this by grouping also by +run_index (again we should try that as I haven't :-) )

Hm, I am getting a key error for both +run_index and +session_uuid. If you try them, let me know.

@vkarak vkarak force-pushed the feat/compare-past-results branch from 7be3dd2 to e9f34ef on July 9, 2024 16:32

vkarak commented Jul 9, 2024

Hm, I am getting a key error for both +run_index and +session_uuid. If you try them, let me know.

Indeed, since these currently refer only to test case attributes.


vkarak commented Jul 10, 2024

Not sure how we would properly address this, but let's say you are comparing two runs and using max as the aggregator. What would be the way to find the UUID of the session? Currently you have to look in the report manually, right?

A follow-up comment regarding this. Currently, extra columns are aggregated only for the base period and not for the target one. I see it as analyzing the current period and comparing it to a target one, so the extra information should refer to the base one. Users can simply swap the period arguments of --performance-compare to get this extra information for the target period.

@vkarak vkarak force-pushed the feat/compare-past-results branch from 63285ba to fa078f4 on July 11, 2024 06:59

vkarak commented Jul 11, 2024

Hm, I am getting a key error for both +run_index and +session_uuid. If you try them, let me know.

Now this is supported with +run_index and +session_uuid.

@vkarak vkarak changed the title from "[feat] Support performance results comparisons" to "[feat] Introduce permanent result storage and support performance results comparisons between current and past runs" on Jul 12, 2024
@ekouts ekouts left a comment

I would like to test it a bit more before we merge, but here are a couple of small comments.
