Skip to content

Commit

Permalink
TrickOps: Update send_hs.py to work with latest Trick output (#1810)
Browse files Browse the repository at this point in the history
* Update send_hs.py to work with the new fields Trick's output produces
* Add unit tests and doctests for send_hs module
* Update TrickOps documentation to include info on send_hs usage
* Remove MonteCarloGenerationhelper.py's unecessary dependency on send_hs

Unrelated:

* Update pip when testing TrickOps per pip best practices

Closes #1807

Co-authored-by: Dan Jordan <daniel.d.jordan@nasa.gov>
  • Loading branch information
ddj116 and Dan Jordan authored Dec 12, 2024
1 parent 1330c42 commit 16828a9
Show file tree
Hide file tree
Showing 11 changed files with 574 additions and 58 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/trickops.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ jobs:
- name: create virtual environment
run: |
cd share/trick/trickops/
python3 -m venv .venv && source .venv/bin/activate && pip3 install -r requirements.txt
python3 -m venv .venv && source .venv/bin/activate && pip3 install --upgrade pip && pip3 install -r requirements.txt
- name: get and build koviz
run: |
cd /tmp/ && wget -q https://github.com/nasa/koviz/archive/refs/heads/master.zip && unzip master.zip
Expand Down
51 changes: 51 additions & 0 deletions docs/documentation/miscellaneous_trick_tools/TrickOps.md
Original file line number Diff line number Diff line change
Expand Up @@ -423,6 +423,57 @@ if ret == 0: # Successful generation
Note that the number of runs to-be-generated is configured somewhere in the `input.py` code and this module cannot robustly know that information for any particular use-case. This is why `monte_dir` is a required input to several functions - this directory is processed by the module to understand how many runs were generated.


## `send_hs` - TrickOps Helper Class for Parsing Simulation Diagnostics

Each Trick simulation run directory contains a number of Trick-generated metadata files, one of which is the `send_hs` file which represents all output during the simulation run sent through the Trick "(h)ealth and (s)tatus" messaging system. At the end of the `send_hs` message, internal diagnostic information is printed by Trick that looks a lot like this:

```
|L 0|2024/11/21,15:54:20|myworkstation| |T 0|68.955000|
REALTIME SHUTDOWN STATS:
ACTUAL INIT TIME: 42.606
ACTUAL ELAPSED TIME: 55.551
|L 0|2024/11/21,15:54:20|myworkstation| |T 0|68.955000|
SIMULATION TERMINATED IN
PROCESS: 0
ROUTINE: Executive_loop_single_thread.cpp:98
DIAGNOSTIC: Reached termination time
SIMULATION START TIME: 0.000
SIMULATION STOP TIME: 68.955
SIMULATION ELAPSED TIME: 68.955
USER CPU TIME USED: 55.690
SYSTEM CPU TIME USED: 0.935
SIMULATION / CPU TIME: 1.218
INITIALIZATION USER CPU TIME: 42.783
INITIALIZATION SYSTEM CPU TIME: 0.901
SIMULATION RAM USAGE: 1198.867MB
(External program RAM usage not included!)
VOLUNTARY CONTEXT SWITCHES (INIT): 792
INVOLUNTARY CONTEXT SWITCHES (INIT): 187
VOLUNTARY CONTEXT SWITCHES (RUN): 97
INVOLUNTARY CONTEXT SWITCHES (RUN): 14
```

The information provided here is a summary of how long the simulation ran, in both wall-clock time, and cpu time, both for initialization and run-time, and other useful metrics like how much peak RAM was used during run-time execution. Tracking this information can be useful for Trick-using groups so TrickOps provides a utility class for parsing this data. Here's an example of how you might use the `send_hs` module:

```python
import send_hs # Import the module
# Instantiate a send_hs instance, reading the given send_hs file
shs = send_hs.send_hs("path/to/SIM_A/RUN_01/send_hs")
start_time = shs.get('SIMULATION START TIME') # Get the value of sim start time
stop_time = shs.get('SIMULATION STOP TIME') # Get the value of sim stop time
realtime_ratio = shs.get('SIMULATION / CPU TIME') # Get the realtime ratio (how fast the sim ran)
# Instead of getting diagnostics individually, you can ask for the full dictionary
diagnostics = shs.get_diagnostics()
# Print the RAM usage from the dictionary
print(diagnostics['SIMULATION RAM USAGE'])
```

Plotting this data for regression scenarios can be quite useful - a lot of groups would want to know if the sim slowed down significantly and if so, when in the history this occurred. If you are already invested in Jenkins CI, you might be interested in using the `send_hs` module in conjunction with the [Jenkins plot plugin](https://plugins.jenkins.io/plot/). Here's an example of what tracking sim realtime ratio over time looks like in a workflow with about 15 regression scenarios all shown on a single plot:

![ExampleWorkflow In Action](images/sim_speed_example.png)

## More Information

A lot of time was spent adding `python` docstrings to the modules in the `trickops/` directory and tests under the `trickops/tests/`. This README does not cover all functionality, so please see the in-code documentation and unit tests for more detailed information on the framework capabilities.
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 0 additions & 1 deletion share/trick/trickops/MonteCarloGenerationHelper.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,6 @@

import sys, os

import send_hs
import argparse, glob

import subprocess, errno
Expand Down
297 changes: 242 additions & 55 deletions share/trick/trickops/send_hs.py
Original file line number Diff line number Diff line change
@@ -1,67 +1,254 @@
import re, os
import pdb

# This global is the result of hours of frustration and debugging. This is only used by doctest
# but appears to be the only solution to the problem of __file__ not being an absolute path in
# some cases for some python versions and how that interacts with this class's os.chdir() when its
# base class initializes. If you attempt to define this_trick locally in the doctest block,
# which was my original attempt, you will find that the value of this_trick is different inside
# the __init__ doctest evaluation compared to any other member function. I believe this is only
# the case when using python version < 3.9, according to the information found here:
# https://note.nkmk.me/en/python-script-file-path/
# I do not like adding globals to "production code" just to facilitate a testing mechanism, but
# I don't know of any cleaner way way to do this. -Jordan 12/2024
this_trick = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '../../..'))
class send_hs(object):
"""
Reads a file containing the send_hs output and returns a send_hs
object containing the values from that output
Utility class for parsing simulation diagnostic data at the end of a
Trick-generated send_hs output file.
"""
def __init__(self, hs_file):
def __init__(self, hs_file=None):
"""
Initialize this instance.
>>> sh = send_hs(hs_file=os.path.join(this_trick,"share/trick/trickops/tests/send_hs.nominal"))
Parameters
----------
hs_file : str
Path to the send_hs output to read and parse
"""
self.hs_file = hs_file
self.actual_init_time = None
self.actual_elapsed_time = None
self.start_time = None
self.stop_time = None
self.elapsed_time = None
self.actual_cpu_time_used = None
self.sim_cpu_time = None
self.init_cpu_time = None
self.parse()
def parse(self):
f = open(self.hs_file, 'r')
lines = f.readlines()
for line in lines:
self.actual_init_time = self.attempt_hs_match('ACTUAL INIT TIME',self.actual_init_time, line)
self.actual_elapsed_time = self.attempt_hs_match('ACTUAL ELAPSED TIME',self.actual_elapsed_time, line)
self.start_time = self.attempt_hs_match('SIMULATION START TIME',self.start_time, line)
self.stop_time = self.attempt_hs_match('SIMULATION STOP TIME',self.stop_time, line)
self.elapsed_time = self.attempt_hs_match('SIMULATION ELAPSED TIME',self.elapsed_time, line)
self.actual_cpu_time_used = self.attempt_hs_match('ACTUAL CPU TIME USED',self.actual_cpu_time_used, line)
self.sim_cpu_time = self.attempt_hs_match('SIMULATION / CPU TIME',self.sim_cpu_time, line)
self.init_cpu_time = self.attempt_hs_match('INITIALIZATION CPU TIME',self.init_cpu_time, line)
# TODO add capture of blade and DIAGNOSTIC: Reached termination time as success criteria

def attempt_hs_match(self, name, var, text):
self.num_lines = None
self._diagnostics = {} # Internal dict of diagnostic keys and values
self._missing_diagnostics = [] # List of diagnostics we failed to find
for k in self.keys():
self._diagnostics[k] = None

if hs_file:
self.parse()

def is_valid(self):
"""
Check for validity of the parsed send_hs file. If any expected internal members
were not able to be parsed, store them in missing_diagnostics and return False.
If everything was found, return True
>>> sh = send_hs(hs_file=os.path.join(this_trick,"share/trick/trickops/tests/send_hs.nominal"))
>>> sh.is_valid()
True
Returns
-------
True if all expected internal members were parsed. False if any member wasn't found.
"""
self._missing_diagnostics = [] # Reset the internal list
for d in self._diagnostics:
if self._diagnostics[d] is None:
self._missing_diagnostics.append(d)
if self._missing_diagnostics:
return False
else:
return True

def missing_diagnostics(self):
'''
Check for validity and return a list of any missing diagnostics that we were
unable to find in the send_hs output
>>> sh = send_hs(hs_file=os.path.join(this_trick,"share/trick/trickops/tests/send_hs.nominal"))
>>> sh.missing_diagnostics()
[]
Returns
-------
list
All diagnotics that were unable to be parsed
'''
self.is_valid()
return self._missing_diagnostics

def parse(self, hs_file=None):
'''
Parse the content of self.hs_file and assign internal variables for each field found
if hs_file is given, overwrite self.hs_file with it's value
>>> sh = send_hs()
>>> sh.parse(os.path.join(this_trick,"share/trick/trickops/tests/send_hs.nominal"))
>>> sh2 = send_hs()
>>> sh2.parse()
Traceback (most recent call last):
...
RuntimeError: send_hs file: 'None' cannot be read. You must provide a valid send_hs output file.
Parameters
----------
hs_file : str
Path to hs_file to parse. If None, self.hs_file is used instead.
Raises
------
RuntimeError
If no send_hs file was provided at construction or when calling this function
'''
if hs_file:
self.hs_file = hs_file

if not self.hs_file or not os.path.isfile(self.hs_file):
raise RuntimeError(f"send_hs file: '{self.hs_file}' cannot be read."
" You must provide a valid send_hs output file.")

self.num_lines = 0
with open(self.hs_file) as fp:
# Efficiency guard - we don't do an re pattern match until we've starting parsing the diagnostic msg
start_reading_diagnostics = False
for line in fp:
self.num_lines += 1
# NOTE this "attempt match" approach is less efficient but it should be
# robust to a future ordering change of the lines from Trick's output.
if start_reading_diagnostics:
for d in self._diagnostics:
if self._diagnostics[d] is None:
self._diagnostics[d] = self._attempt_hs_match(d, line)
# This text precedes the diagnostics output, use it as the trigger to start parsing
if 'REALTIME SHUTDOWN STATS:' in line:
start_reading_diagnostics = True
continue
# Set validity status immediately after parsing
self.is_valid()

def _attempt_hs_match(self, name, text):
"""
name: pattern to match (e.g. SIMULATION START TIME)
var: variable to assign value if match found
text: text to search for pattern
returns: var if not found, found value if found
Parameters
----------
hs_file : str
Path to hs_file to parse. If None, self.hs_file is used instead.
name : str
Pattern to match (e.g. SIMULATION START TIME)
text : str
Text to search for pattern within
Returns
----------
float or None
Value of name if found, else: None
"""
name = name.replace('(', '\(').replace(')', '\)')
m = re.match(name+': +([-]?[0-9]*\.?[0-9]+)', text.strip())
if m:
return(float(m.group(1)))
return(var)

def get(self,name):
"""
Get a value by the name that appears in the send_hs message
"""
if 'ACTUAL INIT TIME' in name:
return self.actual_init_time
if 'ACTUAL ELAPSED TIME' in name:
return self.actual_elapsed_time
if 'SIMULATION START TIME' in name:
return self.start_time
if 'SIMULATION STOP TIME' in name:
return self.stop_time
if 'SIMULATION ELAPSED TIME' in name:
return self.elapsed_time
if 'ACTUAL CPU TIME USED' in name:
return self.actual_cpu_time_used
if 'SIMULATION / CPU TIME' in name:
return self.sim_cpu_time
if 'INITIALIZATION CPU TIME' in name:
return self.init_cpu_time
else:
return None
else:
return(None)

def keys(self) -> list:
"""
Return a list of all possible keys/names that get() can accept This is
the master list of all diagnostics we search for when we parse the send_hs
file, and naming should match 1:1 with the output of a Trick simulation
>>> sh = send_hs(hs_file=os.path.join(this_trick,"share/trick/trickops/tests/send_hs.nominal"))
>>> sh.keys() #doctest: +ELLIPSIS
['ACTUAL INIT TIME', ... 'INVOLUNTARY CONTEXT SWITCHES (RUN)']
Returns
-------
list
All possible diagnostic names that get() can accept
"""
return (['ACTUAL INIT TIME',
'ACTUAL ELAPSED TIME',
'SIMULATION START TIME',
'SIMULATION STOP TIME',
'SIMULATION ELAPSED TIME',
'USER CPU TIME USED',
'SYSTEM CPU TIME USED',
'SIMULATION / CPU TIME',
'INITIALIZATION USER CPU TIME',
'INITIALIZATION SYSTEM CPU TIME',
'SIMULATION RAM USAGE',
'VOLUNTARY CONTEXT SWITCHES (INIT)',
'INVOLUNTARY CONTEXT SWITCHES (INIT)',
'VOLUNTARY CONTEXT SWITCHES (RUN)',
'INVOLUNTARY CONTEXT SWITCHES (RUN)'
])

def get_diagnostic (self, name: str) -> float:
"""
Get a diagnostic value by it's name or partial name
The first name matched in the self._diagnostics dict will be returned
>>> sh = send_hs(hs_file=os.path.join(this_trick,"share/trick/trickops/tests/send_hs.nominal"))
>>> sh.get_diagnostic('ACTUAL INIT TIME')
42.606
>>> sh.get_diagnostic('SIMULATION RAM USAGE')
1198.867
Parameters
----------
name : str
Name or partial name of diagnostic to retrieve
Returns
-------
float
Value of diagnostic name given
Raises
------
LookupError
If name cannot be found in self._diagnostics
"""

for d in self._diagnostics:
if name in d:
return(self._diagnostics[d])
raise LookupError(f"Unable to get diagnostic '{name}'. Is it spelled correctly?")

def get(self, name: str) -> float:
"""
Get a diagnostic value by it's name or partial name. Convienence function that
calls self.get() directly.
>>> sh = send_hs(hs_file=os.path.join(this_trick,"share/trick/trickops/tests/send_hs.nominal"))
>>> sh.get('ACTUAL INIT TIME')
42.606
>>> sh.get('SIMULATION RAM USAGE')
1198.867
Returns
-------
float
Value of diagnostic name given
"""
return self.get_diagnostic(name)

@property
def diagnostics (self) -> dict:
return dict(self._diagnostics)

def get_diagnostics (self) -> dict:
"""
Get all diagnostics stored in internal self._diagnostics dictionary
>>> sh = send_hs(hs_file=os.path.join(this_trick,"share/trick/trickops/tests/send_hs.nominal"))
>>> sh.get_diagnostics() #doctest: +ELLIPSIS
{'ACTUAL INIT TIME': 42.606, 'ACTUAL ELAPSED TIME': 55.551, ... 14.0}
Returns
-------
dict
A copy of the internal self._diagnostics dictionary
"""
return (self.diagnostics)
2 changes: 1 addition & 1 deletion share/trick/trickops/tests/run_tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ def run_tests(args):

# Run all doc tests by eating our own dogfood
doctest_files = ['TrickWorkflow.py', 'WorkflowCommon.py', 'TrickWorkflowYamlVerifier.py',
'MonteCarloGenerationHelper.py']
'MonteCarloGenerationHelper.py', 'send_hs.py']
wc = WorkflowCommon(this_dir, quiet=True)
jobs = []
log_prepend = '_doctest_log.txt'
Expand Down
5 changes: 5 additions & 0 deletions share/trick/trickops/tests/send_hs.missing_all
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
|L 3|2024/11/21,15:54:06|myworkstation| |T 0|0.000000| TrickParamCopy(UserCodeExample): Failed to resolve parameter 'fsw_out.fswOut_block.fswOut.fsw1HzOut.domain.subdomain.variable'
|L 2|2024/11/21,15:54:06|myworkstation| |T 0|0.000000| TrickParamCopy(UserCodeExample): Mismatched data types for 'fsw_out.fswOut_block.fswOut.fsw1HzOut.domain.subdomain.variable' -> 'another_structure.myvariable'
|L 2|2024/11/21,15:54:07|myworkstation| |T 0|0.000000| Could not find Data Record variable flex.flex_Obj.flexMode.
|L 2|2024/11/21,15:54:07|myworkstation| |T 0|0.000000| Could not find Data Record variable flex.flex_Obj.myarray[0].
|L 2|2024/11/21,15:54:07|myworkstation| |T 0|0.000000| Could not find Data Record variable foo.bar.yippy.skippy.axial_sep_distance.
Loading

0 comments on commit 16828a9

Please sign in to comment.