Releases: geopm/geopm
GEOPM 1.0.0
- Tue Apr 16 2019 Christopher M. Cantalupo christopher.m.cantalupo@intel.com v1.0.0
- Release overview:
- The official 1.0 release of the GEOPM software!
- The primary changes since release candidate 3 are bug fixes and documentation updates.
- Updates to integration tests:
- Fix test_runtime_regulator integration test, which had improper tolerances for the sleep() interface.
- Update some integration tests to print errors when platform read/write fails.
- Updates to unit tests:
- Add more unit tests for launcher affinity.
- Updates to documentation:
- Clean up geopm_pio_c(3) and geopm_topo_c(3) man pages.
- Remove references to Comm man pages that are not installed.
- Add include and linking instructions to geopm_pio.3.ronn.
- Installed header clean up:
- Update PlatformTopo singleton to return const reference.
- Clean up forward declaration in public header.
- Bug fixes:
- Fix tprof API calls to avoid a segmentation fault when the Controller is not present.
- Fix issue by removing call to EnergyEfficientRegion::update_freq_range().
- Fix issue where FrequencyGovernor was being used but not created by agents above the leaf.
- Fix missing hidden header dependencies.
- Fix OMP_NUM_THREADS calculation when --geopm-hyperthreads-disable option is provided to launcher.
- Fix IOGroup and Agent tutorials to use new Agent interfaces.
- Fix domain for frequency signal/control on some x86 platforms.
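The PlatformTopo change in this release means that callers share one object and can only read it. A minimal C++ sketch of that singleton pattern follows. The `Topology` class, its `instance()` method, and the fixed package count are hypothetical placeholders, not the installed PlatformTopo interface.

```cpp
#include <cassert>

// Hypothetical stand-in for a topology singleton; not GEOPM's PlatformTopo.
class Topology {
    public:
        // Function-local static: constructed once on first use, with
        // thread-safe initialization guaranteed since C++11. Returning a
        // const reference lets callers query the object but not modify it.
        static const Topology &instance(void)
        {
            static Topology s_instance(2);  // placeholder package count
            return s_instance;
        }
        int num_package(void) const
        {
            return m_num_package;
        }
        // Copying would defeat the singleton, so copies are disabled.
        Topology(const Topology &other) = delete;
        Topology &operator=(const Topology &other) = delete;
    private:
        explicit Topology(int num_package)
            : m_num_package(num_package)
        {
            assert(num_package > 0);
        }
        int m_num_package;
};
```

A caller writes `Topology::instance().num_package()`. An attempt to call a non-const method or to copy the object fails at compile time.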
GEOPM 1.0.0 Release Candidate 3
- Wed Apr 3 2019 Christopher M. Cantalupo christopher.m.cantalupo@intel.com v1.0.0+rc3
- Modified implementations and interfaces:
- Finalized interfaces for 1.0.0 release.
- Changed class naming scheme to drop "I" prefix from interface base classes and add "Imp" suffix to implementation classes.
- Replaced ascend() and descend() Agent methods with a more fine-grained interface.
- Modified MSRIOGroup to use JSON to store MSR data.
- Updated utility classes for Agent interface changes.
- Removed use of raw pointers from MSRIOGroup.
- Added Helper function to list files in a directory.
- Renamed split_string() to string_split().
- Removed sort call from table dump since it is no longer needed.
- Removed samples sent up the tree from MonitorAgent.
- Moved "PlatformTopo::m_domain_e" to a C enum "geopm_domain_e" in geopm_topo.h.
- Changed GEOPM_DOMAIN_INVALID to -1 and shifted all other domain values by one.
- Renamed all references to the PlatformTopo::m_domain_e enum to use geopm_domain_e.
- Removed PlatformIO::num_signal() and PlatformIO::num_control() from public interface.
- Renamed PlatformIO method is_domain_within() to is_nested_domain().
- Moved geopm_region_info_s to geopm.h.
- Renamed Agent::report_node() to report_host().
- Removed ProfileIOGroup from installed headers.
- Renamed CircularBufferImp to CircularBuffer.
- Moved MSRSignal and MSRControl into their own files.
- Moved Imp classes for installed classes to their own non-installed headers.
- Moved SharedMemory and SharedMemoryUser classes into separate headers.
- Introduced FrequencyGovernor that holds common code for setting frequency.
- Updated EnergyEfficientAgent and FrequencyMapAgent to use FrequencyGovernor.
- Updated all built-in agents to replace the ascend() and descend() methods with the new APIs.
- Removed num_signal_pushed() and num_control_pushed() from public PlatformIO APIs.
- Made tutorial shell scripts compatible with more shell variants.
- Updated features:
- Implemented and documented C wrappers for the PlatformIO class: geopm_pio_c(3).
- Implemented and documented C wrappers for the PlatformTopo class: geopm_topo_c(3).
- Changed implementation to stop sending messages about MPI regions nested inside of network hint regions.
- Added command line option to geopmread(1) and geopmwrite(1) to create topology cache file.
- Added make_unique and make_shared factory methods to all installed C++ header classes.
- Added check for RAPL lock bit when using power controls.
- Added UNCORE_RATIO_LIMIT MSR support for HSX, BDX, and SKX.
- Added per-region power to Report.
- Enabled MSRIOGroup to extend its set of MSRs at runtime through a JSON file located in GEOPM_PLUGIN_PATH.
- Added MSR methods for parsing function and units strings.
- Introduced FrequencyMapAgent which runs regions at specified frequencies.
- Added --enable-beta configure flag, which installs beta features with the make install target.
- Updated and extended integration tests:
- Ignore failures for missing python packages.
- Added feature to save/restore power limit and frequency between each integration test.
- Updated unit tests:
- Added more unit tests for Helper.
- Fixed AgentFactoryTest.
- Updates to documentation:
- Added documentation on MPI requirements for geopm_prof_c(3) APIs.
- Removed references to endpoint in documentation since this is still a beta feature.
- Added documentation about Agent report/trace extension name conventions.
- Add man pages for geopm_pio_c(3) and geopm_topo_c(3).
- Add man page for geopm_agent_frequency_map(7).
- Bug fixes:
- Fixed EnergyEfficientAgent so that it functions properly.
- Fixed issue with the launcher's use of a temporary script to execute lscpu.
- Fixed missing input parameter checks in PlatformTopo and PlatformIO.
- Fixed Fortran build and missing dependency that could break parallel builds.
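The renaming scheme in this release candidate pairs an interface that drops the "I" prefix with an implementation that gains an "Imp" suffix. The installed classes also gain make_unique and make_shared factory methods. The sketch below shows how those pieces could fit together. `Sampler`, `SamplerImp`, and their methods are hypothetical names chosen for illustration, and the factory signatures are not copied from GEOPM headers.

```cpp
#include <memory>
#include <string>

// Hypothetical interface following the 1.0 naming scheme: the abstract
// base class carries the plain name (no "I" prefix).
class Sampler {
    public:
        virtual ~Sampler() = default;
        virtual double sample(void) const = 0;
        virtual std::string name(void) const = 0;
        // Factory methods declared on the interface, so callers never need
        // to include the implementation header.
        static std::unique_ptr<Sampler> make_unique(double value);
        static std::shared_ptr<Sampler> make_shared(double value);
};

// The concrete class takes the "Imp" suffix. In a real project it would
// live in a non-installed header.
class SamplerImp : public Sampler {
    public:
        explicit SamplerImp(double value)
            : m_value(value)
        {
        }
        double sample(void) const override
        {
            return m_value;
        }
        std::string name(void) const override
        {
            return "SamplerImp";
        }
    private:
        double m_value;
};

std::unique_ptr<Sampler> Sampler::make_unique(double value)
{
    // std::make_unique is C++14, so construct the pointer directly for C++11.
    return std::unique_ptr<Sampler>(new SamplerImp(value));
}

std::shared_ptr<Sampler> Sampler::make_shared(double value)
{
    // shared_ptr<SamplerImp> converts implicitly to shared_ptr<Sampler>.
    return std::make_shared<SamplerImp>(value);
}
```

Keeping the factories on the interface lets the Imp class stay out of the installed headers.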
GEOPM 1.0.0 Release Candidate 2
- Fri Feb 22 2019 Christopher M. Cantalupo christopher.m.cantalupo@intel.com v1.0.0+rc2
- Modified implementations and interfaces:
- Rename GEOPM_PROFILE_TIMEOUT environment variable to GEOPM_TIMEOUT.
- Modify default behavior of geopmlaunch to use --geopm-ctl=process --geopm-report=geopm.report.
- Introduce --geopm-disable-ctl CLI option for geopmlaunch to preserve passthrough behavior.
- Remove geopm_prof_init() interface from installed header.
- Fix geopmhash example command line tool.
- Update plugin loading implementation to use C++.
- Refactor IOGroup lookup in PlatformIO.
- Modify analysis power sweep to consider multiple packages.
- Support lscpu versions that omit 0x from hex values.
- Do not install Comm.hpp or MPIComm.hpp.
- Modify time signal to be scoped to the CPU.
- Rename M_UNITS_HZ to M_UNITS_HERTZ.
- Add tables module to Python requirements.
- Change MSR names to match names in the Intel(R) Software Developer's Manual.
- Make end bit of MSR bitfield inclusive.
- Add descriptions for built-in signals and controls.
- Align launcher names and programmatically generate list of supported launchers.
- Modified Agent::validate_policy() interface.
- Add stricter domain checks in TimeIOGroup and CpuinfoIOGroup.
- Fix configuration and build issues with ompt.
- Disable python unit testing in RPM check target.
- Remove uninstalled files from spec file.
- Updated features:
- Update tracer to enable user specified column signals to also specify domain.
- Update reporter to enable user specified signals and domains.
- Add REGION_HASH and REGION_HINT signals.
- Remove all references to region_id from public interfaces.
- Add domain aggregation for read_signal and write_control.
- Add TEMPERATURE as a default trace column.
- Add split_string() helper function.
- Install geopm_hash.h and add man page.
- Add helper function to replace gethostname().
- Improve trace column header names for PowerBalancerAgent.
- Modify how epoch totals are calculated.
- Updated and extended integration tests:
- Fix fence-post problem in test_trace_runtimes.
- Skip EnergyEfficientAgent integration test on non-BDX platforms.
- Updated unit tests:
- Fix timing issue with PowerGovernorAgentTest.wait test.
- Fix geopmagent CLI test.
- Clean up PlatformIOTest.
- Update to googletest v1.8.1.
- Optimize Travis CI build.
- Updates to documentation:
- Update man pages to reflect environment extension of report and trace.
- Update man pages for Agg, CircularBuffer, IOGroup, Exception, Helper, RegionAggregator, SharedMemory, PluginFactory, MSR, MSRIO, and MSRIOGroup classes.
- Update geopm_region_id_c.3 man page.
- Update geopm_sched.3.ronn.
- Clean up geopmlaunch man page.
- Update man pages for IOGroups.
- Add tutorial about plugin loading order.
- Add missing links to geopm(7) man page.
- Update copyright date to 2019.
- Use BLURB in geopm.7 man page.
- Sync spec file for OpenHPC with the one published with OpenHPC.
- Change die.net links to man7.org.
- Bug fixes:
- Fix all timeouts for usages of SharedMemoryUser to reflect geopm_env_profile_timeout().
- Fix energy status units for DRAM on Haswell and Broadwell.
- Fix energy reporting on multi-socket systems.
- Fix issue when application calls MPI_Init_thread() to increase thread level to match GEOPM requirements.
- Fix broken build when configured with --enable-overhead.
- Fix issues detected with clang.
- Fix launcher args for IMPI.
- Fix throw in Tracer when reading region hash and hint values, which are allowed to be zero.
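Making the end bit of an MSR bitfield inclusive changes how a field's width is computed. A field described as bits 8 through 11 covers four bits, and a field whose begin and end bits are equal is one bit wide. The sketch below decodes a field under that convention. `extract_field()` is a hypothetical helper, not GEOPM's MSR encode/decode code.

```cpp
#include <cassert>
#include <cstdint>

// Extract bits [begin_bit, end_bit] of a 64-bit register value, treating
// BOTH ends as inclusive. Hypothetical helper for illustration.
std::uint64_t extract_field(std::uint64_t raw, int begin_bit, int end_bit)
{
    assert(0 <= begin_bit && begin_bit <= end_bit && end_bit < 64);
    int width = end_bit - begin_bit + 1;  // +1 because end_bit is inclusive
    // Avoid the undefined shift by 64 when the field spans the whole register.
    std::uint64_t mask = (width == 64) ? ~UINT64_C(0)
                                       : ((UINT64_C(1) << width) - 1);
    return (raw >> begin_bit) & mask;
}
```

For example, `extract_field(0xABCD, 4, 7)` selects the second-lowest hex digit.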
GEOPM 1.0.0 Release Candidate 1
- Fri Dec 21 2018 Christopher M. Cantalupo christopher.m.cantalupo@intel.com v1.0.0-rc1
- Release overview:
- This is the first candidate for the v1.0.0 release of the GEOPM package.
- Version 1.0 is significant because semantic versioning (https://semver.org/) is intended to be followed for all subsequent releases.
- The APIs defined by all installed header files and the documented behavior of those interfaces shall remain compatible with linking applications until version 2.0.
- The documented definition for all built in signals and controls supported by PlatformIO is not intended to change prior to version 2.0.
- Expected changes prior to v1.0.0 release:
- The documentation included in this release candidate will be improved upon prior to the actual v1.0.0 release.
- Man pages which currently link to doxygen will be filled in.
- The definition of the high order bits in the REGION_ID# signal supported by PlatformIO may be changed, as documented in the PlatformIO(3) man page, to split it into two signals (REGION_ID and REGION_HINT).
- It is possible that interface classes currently prefixed with "I" may be renamed to exclude the "I" (e.g. IPlatformIO -> PlatformIO).
- In this case the concrete implementation would have "Imp" appended to its name (e.g. PlatformIO -> PlatformIOImp).
- The appearance of the epoch signal in the REGION_ID column of the trace will be removed.
- The EPOCH_COUNT signal will be added to the default set of traced signals to enable tracking of epoch calls.
- High level summary of changes since v0.6.1:
- With this release we have removed all references to the Policy, Decider, Platform and PlatformImp objects.
- These have been replaced by the PlatformIO / IOGroup / Agent class interactions.
- The Kontroller object which was supporting the new code path has been renamed Controller.
- The legacy Controller implementation has been removed.
- GEOPM no longer depends on the hwloc library, and instead relies on running lscpu on the compute node.
- Modified implementations and interfaces:
- Rename launcher to geopmlaunch.
- Do not install geopmanalysis and geopmplotter command line utilities.
- The command line interfaces for these tools will be changing.
- Once the new interfaces are committed, we will begin installing them again.
- Remove unused error codes from geopm_error.h.
- Remove some deprecated interfaces and files.
- Remove legacy artifacts from Reporter and Tracer.
- Remove legacy structures from geopm_message.h.
- Remove deprecated API headers.
- Remove CtlConf Python object.
- Remove region ID memory from derivative for power signals, this is a feature for agent to implement.
- Remove unused arguments from geopmctl_main.
- Remove push_combined_signal() from PlatformIO interface.
- Remove NAN check for policy in Controller. Agents are responsible for handling NAN.
- Remove IPlatformTopo::define_cpu_group(). This method is not implemented and not used.
- Remove MPI bit from region ID in report.
- Remove install of geopm_message.h and geopm_plugin.h.
- Remove environment variables for min/max frequency used by EnergyEfficientAgent: this functionality is provided through the policy as documented.
- Fixes for online mode of EnergyEfficientAgent: ignore 0.0 when sampling runtime, fix min/max frequency range in analysis.py, fix final requested frequency printed in report.
- EnergyEfficientAgent no longer considers DRAM energy in its optimization.
- Change default frequency for hints from min to max in EnergyEfficientAgent.
- Implement EnergyEfficientAgent analysis using hints only.
- Change meaning of EPOCH_RUNTIME signal: MPI and ignore time are reported explicitly and separately.
- Install many C++ headers into /usr/include/geopm.
- Move geopmbench source files from tutorial directory into src.
- Don't copy any files from src into tutorials.
- Update tutorials to use Agent code path.
- Throw if multiple hints given to geopm_prof_region.
- Allow writing controls for containing domains: the same value will be written to every subdomain.
- Update EpochRuntimeRegulator accounting: PKG and DRAM energy dissociated from rank.
- Updated to report pre-epoch MPI and ignore runtime.
- Make TreeComm fan out configurable with environment variable.
- Per thread progress is supported by the 'REGION_THREAD_PROGRESS' signal.
- Align command line options to the launcher and the environment variables used by the controller.
- Merge tutorial Makefiles into one and remove duplicate scripts.
- Rename runtime related APIs.
- Merge ProfileIO into ProfileIOSample.
- Refactor analysis.py command line parsing to use argparse, etc.
- Move some header includes from headers into source files when possible.
- Change "POWER_PACKAGE" control name to "POWER_PACKAGE_LIMIT".
- Expose MSR PKG_POWER_LIMIT fields as signals.
- Reorder directory search in plugin load: load plugins from right to left so that the leftmost plugin wins when multiple IOGroups provide controls or signals with the same name.
- Use accumulator member in EpochRuntimeRegulator for MPI runtime.
- Changes to the launcher for mpiexec when using Hydra.
- Move set_policy_defaults to Agent interface.
- Aggregation functions have been moved out of PlatformIO and into their own class: Agg.
- Implement agg_function for IOGroups, including tutorial.
- Do not stop integration test in looper if one test fails.
- Increase shmem table size to 2MB per rank to reduce risk of overflow.
- Remove hash table structure in ProfileTable; all regions now use the same table entry.
- Change CpuinfoIOGroup to throw in constructor if cpuinfo could not be parsed.
- In python analysis do not parse traces if total size is more than half of memory.
- Remove redundant HDF5 cache from analysis.py.
- Remove TURBO_RATIO_LIMIT2 control for platforms where it is not in the whitelist.
- Read multiple samples for a short time in geopmread to support POWER signals.
- Narrow scope of warning message about cpufreq governor: only print warning when an attempt is made to write to a control that begins with POWER or FREQUENCY.
- Prevent MSRIOGroup from throwing when saving MSRs.
- Implement and use AgentConf in python code to create agent policies.
- Updated features:
- Add timestamp counter to available signals.
- Add --info option to geopmread and geopmwrite.
- Add check for invalid GEOPM_CTL values.
- Add temperature signals.
- Add Imbalancer interface to libgeopm and libgeopmpolicy: Imbalancer_*() -> geopm_imbalancer_*().
- Add some placeholder descriptions to MSRIOGroup and TimeIOGroup to support integration tests.
- Add methods to RegionAggregator to get region IDs and signals.
- Add methods to PlatformIO to provide signal/control descriptions: this will be used to augment geopmread/write with descriptions.
- Add description APIs for IOGroup: allows IOGroups to provide a user-friendly description of signals/controls.
- Add GEOPM_TIME_REF constant for use with geopm_time_*() APIs.
- Add INSTRUCTIONS_RETIRED alias signal.
- Add TIMESTAMP_COUNTER alias for MSRIOGroup.
- Add signal to enable reading of the RAPL lock bit.
- Add PKG_POWER_LIMIT MSR fields as a signal.
- Add expect_same aggregation function that returns NAN if any elements of the vector differ.
- Add average node frequency to EnergyEfficientAgent tree samples.
- Add support for POWER_* as signals that give meaningful results without runtime.
- Add module conflict with darshan to the Theta module file.
- Add psutils python dependency.
- Add warnings for system misconfiguration.
- Add read_file() to Helper.hpp.
- Add job start time to trace and report headers.
- Add outlier detector script.
- Add handling of NAN for default policy values to all agents.
- Add parsing for overhead fields to io.py.
- Add reading of the thread table through PlatformIO.
- Updated and extended integration tests:
- Ignore misconfigured system warnings in integration test.
- Remove ignore of multiple plugin load warnings that stopped occurring after removal of legacy code.
- Do not test epoch runtime in test_region_runtimes.
- Add all2all to power_balancer integration test.
- Adjust power_balancer test logic to compare Governor and Balancer relatively.
- Fix EnergyEfficientAgent integration test.
- Implement test decorators using the launcher, which forces the checks to run on the compute nodes.
- Update integration tests to reflect removal of legacy code path.
- Update test_power_consumption to use PowerGovernor.
- Fix integration test to exclude MPI and model-init regions from tests using traces.
- Fix integration test to use assertNear to account for new MPI region markup.
- Move GEOPM_EXEC_WRAPPER functionality into integration test.
- Updated unit tests:
- Add tests of domain aggregation for pushed signals.
- Add test for geopmread signal aggregation.
- Stop the unit tests from littering files.
- Fixed signed / unsigned comparison issue in PlatformIO test.
- Update unit tests to reflect removal of legacy code path.
- Add test of IOGroup factory that checks that an IOGroup's list of signal/control names are all valid.
- Updates to documentation:
- Update GEOPM main README.
- Add doxygen target for public interface files.
- Add man pages for all C++ headers that are now installed to support plugin development.
- Full man pages have been added for PluginFactory, PlatformIO, PlatformTopo, Agent, and IOGroup.
- Add documentation about aliasing signals and controls.
- Update launcher ronn to include references to env vars.
- Add README for outlier_detection.
- Update the tutorial README.md to reference geopmbench and point out the agent and iogroup subdirectories.
- Document how to build GEOPM with Intel Toolchain.
- Fix e...
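The plugin load-order change in this release candidate can be modeled with a small resolver. This sketch assumes plugin directories arrive as a colon-separated list, as with GEOPM_PLUGIN_PATH, which is assumed here to be colon-separated. It uses hypothetical `split_path()` and `resolve_signals()` helpers and models only the name-collision rule, not actual shared-object loading.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated search path such as "/a:/b" into directories,
// skipping empty entries.
std::vector<std::string> split_path(const std::string &path)
{
    std::vector<std::string> result;
    std::istringstream stream(path);
    std::string dir;
    while (std::getline(stream, dir, ':')) {
        if (!dir.empty()) {
            result.push_back(dir);
        }
    }
    return result;
}

// Map each signal name to the directory whose plugin provides it.
// `provided` is a hypothetical table of directory -> names offered there.
std::map<std::string, std::string> resolve_signals(
    const std::string &search_path,
    const std::map<std::string, std::vector<std::string> > &provided)
{
    std::map<std::string, std::string> owner;
    std::vector<std::string> dirs = split_path(search_path);
    // Walk the path from right to left. A later registration overwrites an
    // earlier one, so the leftmost directory's plugin ends up winning.
    for (auto it = dirs.rbegin(); it != dirs.rend(); ++it) {
        auto found = provided.find(*it);
        if (found == provided.end()) {
            continue;
        }
        for (const auto &name : found->second) {
            owner[name] = *it;
        }
    }
    return owner;
}
```

Because later registrations overwrite earlier ones, walking the list from right to left is a simple way to give the leftmost directory priority.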
GEOPM 0.6.1
- Mon Oct 29 2018 Christopher M. Cantalupo christopher.m.cantalupo@intel.com v0.6.1
- Hotfix for v0.6.0 release.
- Fix MPI functions called during startup getting assigned region 0.
- Fix missing profiling of some MPI functions when called from Fortran.
- Fix performance regression due to attempt to profile non-blocking MPI calls.
- Fix to remove unsupported MSR from Skylake platform definition (TURBO_RATIO_LIMIT2).
- Fix to prevent throw when trying to save/restore MSRs that are not supported on the system.
GEOPM 0.6.0
- Tue Oct 02 2018 Christopher M. Cantalupo christopher.m.cantalupo@intel.com v0.6.0
- Stabilized Agent code path.
- Last release with Decider/Platform/PlatformImp support.
- Modified implementations and interfaces:
- Modify PowerGovernor to ignore DRAM power and tune parameters for power balancer.
- Profile larger set of MPI functions including non-blocking routines.
- Removed push_region_signal_total() and sample_region_total() from PlatformIO.
- This functionality is available to Agents by creating an instance of RegionAggregator.
- Redesigned geopmanalysis command line interface so that the first argument selects the analysis type.
- Add options to geopmanalysis for min and max frequency for frequency sweep analysis types.
- Remove geopmanalysis --level option and replace with --summary and --plot.
- This allows summaries and/or plots to be generated separately.
- Add option to use agent code path to geopmanalysis (use_agent).
- Change EnergyEfficientAgent frequency map to use JSON format.
- Introduce the GEOPM_EXEC_WRAPPER environment variable, which is useful for inserting a debugger into the integration tests.
- Reuse the same index value for repeated pushes of the same signal or control.
- Write lscpu output to /tmp prior to running the job and avoid a popen() call inside of the MPI application.
- Change PowerGovernorAgent::wait() to use time instead of RAPL updates.
- Get rid of C-string from ProfileTable implementation.
- Add max_level() to TreeComm.
- Introducing the PowerGovernor class.
- Introducing Agent::aggregate_sample() static helper function for Agents.
- Add agent field to io.py dataframe index. Note: this will break compatibility with scripts that use the old index.
- Rename RAPL related MSR names: SOFT_POWER_LIMIT to PL1_POWER_LIMIT and HARD_POWER_LIMIT to PL2_POWER_LIMIT.
- Add geopm_time_since() method.
- Update the analysis.py energy references.
- Add RegionAggregator class for per-region signal totals.
- Update Reporter to use RegionAggregator.
- Changed region counts to start at -1 before first entry.
- Get rid of unused and undocumented environment variable GEOPM_REPORT_VERBOSITY.
- Modify launcher to set LD_PRELOAD only for application.
- Change some AppOutput methods to return pandas Dataframes instead of Report/Region objects.
- Add barrier in MPI_Init prior to GEOPM startup.
- Have RootRole throw if bad power cap is set.
- Updated features:
- Introducing the new PowerBalancer agent with many commits since v0.5.1 that tweak the algorithm.
- Ignore epoch calls when made inside of a region marked with the ignore hint.
- Add MSRIOGroup signals that return the raw value of an MSR.
- Use slurm option to select the performance power governor when using GEOPM.
- Add a spec file for building GEOPM for ALCF Theta.
- Add profile name and agent to trace header.
- Add CYCLES_THREAD and CYCLES_REFERENCE to trace.
- Add Agent support in python scripts.
- Add CORAL 2 version of AMG to examples.
- Update markup for miniFE example to set region ID once per region.
- Update nekbone patches for scaling studies.
- Suppress OMP warnings in launcher when using Intel toolchain.
- Add PowerSweepAnalysis type to geopmanalysis.
- Add BalancerAnalysis type to geopmanalysis.
- Add NodeEfficiencyAnalysis type to geopmanalysis.
- Add NodePowerAnalysis type to geopmanalysis.
- Introduce a plotter method to generate histograms.
- Have ManagerIO skip policy file parsing if agent has no policies.
- Add HDF5 caching for parsed reports and traces to io.py.
- Add summary features to analysis where summarized data is written to files in ascii tables.
- Updated and extended integration tests:
- Updates to integration tests to support the Agent / PlatformIO code path are a major feature of this release.
- Add back integration test for power balancer with increased time limit.
- Automatically infer architecture based on hostname.
- Add monitor as available agent to run integration tests.
- Use regular runtime for epoch in test_region_runtimes.
- Require balancer test to run in an allocation.
- Check that the average power limit across nodes is under the cap in test_power_balancer.
- Add integration test that runs GEOPM, but does not generate reports.
- Updates to documentation:
- Add documentation to the README about the scaling_governor.
- Add documentation of constructor attribute for plugins to geopm(7) man page.
- Add documentation for hint ignore interaction with geopm_prof_epoch().
- Add documentation for all of the supported region hints.
- Remove documentation about node barrier enforced by epoch call, this is no longer true.
- Remove reference to MPIEXEC from spec file.
- Add missing launcher options to help text.
- Updated unit tests:
- Add PowerBalancer unit tests.
- Add PowerBalancerAgent unit tests.
- Add analysis.py unit tests.
- Add more detailed checks of TreeComm calls to KontrollerTest.
- Add tests of geopmanalysis CLI.
- Fix tests for ControlMessage.
- Bug fixes:
- Fix catch-value warning from GCC 8.
- Fix possible C string truncation.
- Fix for null characters sometimes appearing in report header.
- Fix string sizing for strncpy and snprintf for gnu8.
- Fix null termination in case of string overflow.
- Fix in PowerGovernorAgent where fan_in could be accessed out of bounds.
- Fix Kontroller index into Agent array; the level 0 Agent should not do descend() or ascend().
- Fix issue where second region runtime is longer than first: move region exit barrier after call to sample.
- Fix geopmagent so it can create empty json files.
- Fix launcher to handle --cpu-bind as well as --cpu_bind.
- Fix failure to restore fixed counter MSRs at end of GEOPM runtime.
- Fix epoch region ID detection in io.py.
- Fix for test_trace_runtimes with agent code path.
- Fix performance issue: if power will be controlled, adjust one CPU per package.
- Fix EnergyEfficientAgent init().
- Fix issue where geopm would try to restore MSR MISC_ENABLE, which is read-only.
- Fix test_power_consumption to measure socket power only.
- Fix order of MSR save / agent init() to avoid failure to restore time window setting.
- Fix --enable-overhead configure option.
- Fix pthread launch for Agent code path.
- Fix Fortran comm initialization.
- Fix handling of bad OMP masks.
- Fix for klocwork error: missing null check.
- Fix pthread launch when using MPICH by enabling MPI_THREAD_MULTIPLE in environment.
- Fix pthread launch issue in Cray Linux by using secure versions of the CPU_SET macros.
- Fix hang when runtime is active but report has not been requested.
- Fix python scripts to support old data missing separate dram energy in report.
- Fix python scripts to handle new agent field in parsed header.
- Fix race in ControlMessage that could cause hang at GEOPM runtime start up.
- Fix for ompt region names in Reporter.
- Fix issue where slack was calculated prior to adding in extra power in PowerBalancingAgent.
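The v0.6.0 entry about reusing the same index for repeated pushes can be read as a lookup keyed on the full request. The sketch below assumes a request is identified by signal name, domain type, and domain index. `SignalTable` is hypothetical, and the integer domain values in the usage are illustrative; it is not the PlatformIO implementation.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical sketch of index reuse for repeated pushes.
class SignalTable {
    public:
        // Return the index for (name, domain_type, domain_idx), allocating
        // a new index only the first time that exact request is pushed.
        int push_signal(const std::string &name, int domain_type, int domain_idx)
        {
            Request key = std::make_tuple(name, domain_type, domain_idx);
            auto it = m_index.find(key);
            if (it != m_index.end()) {
                return it->second;  // repeated push: reuse the existing index
            }
            int idx = static_cast<int>(m_pushed.size());
            m_pushed.push_back(key);
            m_index.emplace(key, idx);
            return idx;
        }
        int num_pushed(void) const
        {
            return static_cast<int>(m_pushed.size());
        }
    private:
        typedef std::tuple<std::string, int, int> Request;
        std::map<Request, int> m_index;  // request -> assigned index
        std::vector<Request> m_pushed;   // requests in index order
};
```

Reusing indices keeps the number of values read per sample equal to the number of distinct requests, however many times a caller pushes the same one.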
GEOPM 0.5.1
- Sat Jun 23 2018 Brad Geltz brad.geltz@intel.com v0.5.1
GEOPM beta hotfix release!
- Introduce the PowerGovernorAgent. This agent is implemented and fully featured.
- Restoring the MSR values at the end of a run is now best effort since the system whitelist may prevent the write from being allowed.
- Allow min/max frequencies to be specified in the EnergyEfficientAgent's policy.
- Fix geopmread usages for tutorial.
- Fix MSR overflow logic, performance counter initialization, and MSR encode/decode functions.
- Fix integration tests for geopmwrite use cases.
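The 0.5.1 notes do not describe how the MSR overflow logic was fixed. As general background only, the sketch below shows a common way to accumulate an N-bit hardware counter that wraps to zero. `WrappingCounter` is hypothetical, and the sketch assumes at most one wraparound between consecutive reads.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical accumulator for an N-bit counter that wraps to zero;
// not GEOPM's MSR class.
class WrappingCounter {
    public:
        explicit WrappingCounter(int num_bit)
            : m_num_bit(num_bit)
            , m_is_first(true)
            , m_last(0)
            , m_total(0)
        {
            assert(num_bit > 0 && num_bit < 64);
        }
        // Feed the latest raw reading and return the total count since the
        // first reading, corrected for a single wraparound between reads.
        std::uint64_t update(std::uint64_t raw)
        {
            const std::uint64_t modulus = UINT64_C(1) << m_num_bit;
            raw &= modulus - 1;  // keep only the counter's N bits
            if (m_is_first) {
                m_is_first = false;
            }
            else if (raw >= m_last) {
                m_total += raw - m_last;
            }
            else {
                // Counter wrapped: distance to the top plus distance from zero.
                m_total += modulus - m_last + raw;
            }
            m_last = raw;
            return m_total;
        }
    private:
        int m_num_bit;
        bool m_is_first;
        std::uint64_t m_last;
        std::uint64_t m_total;
};
```

If a counter can wrap more than once between reads, no correction of this kind can recover the true count. The sampling period must be short relative to the wrap time.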
GEOPM 0.5.0
- Wed May 30 2018 Christopher M. Cantalupo christopher.m.cantalupo@intel.com v0.5.0
GEOPM beta release!
- Community updates:
- New landing page https://geopm.github.io
- New Slack channel https://geopm.slack.com
- New Code of Conduct
- New pull request template
- Contributing instructions updated with details of the Gerrit review process.
- Modified implementations and interfaces:
- Major refactor of the controller and plugin architecture is provided as an optional new code path.
- Most of the changes made to the implementation for this release modify the new code path.
- The old code path is still available for users as long as the controller is run without the GEOPM_AGENT environment variable set.
- The new code path will be active if the user selects an agent by name with the GEOPM_AGENT environment variable when launching the controller.
- The old code path is maintained in the current Controller object along with the Decider / Platform / PlatformImp plugins.
- The new code path is maintained in a replacement for the Controller which has been temporarily named the Kontroller.
- The Kontroller will be renamed the Controller after this release, and the old code path will no longer be available.
- Similar to the Kontroller/Controller replacement, the KprofileIOGroup, KprofileIOSample, and KruntimeRegulator are temporary replacements for their non-K counterparts and will be renamed.
- The beta release enables a new set of plugin interfaces named the IOGroup, Agent, and Comm.
- It is through the IOGroup, Agent and Comm plugins that the GEOPM runtime can be extended.
- The Decider / Platform / PlatformImp plugin extensions are deprecated and will be removed after this release.
- The IOGroup plugin enables a user to add new signal and control mechanisms for an Agent to read and write.
- The Agent plugin enables a user to add new monitor and control algorithms to the GEOPM runtime.
- MPI use by the GEOPM runtime that is not linked by the application has been completely encapsulated in the Comm object.
- The tutorial has been extended with two new directories: tutorial/agent and tutorial/iogroup.
- The tutorial/iogroup directory documents how to write an IOGroup plugin.
- The tutorial/agent directory documents how to write an Agent plugin.
- The interface to the resource manager has been made much more flexible for supporting the new Agent interfaces.
- The resource manager interface is documented in the geopm_agent_c(3) and geopm_endpoint_c(3) man pages.
- Additionally, command line tools have been proposed and partially implemented to support the interfaces documented in those man pages.
- The geopm_agent_c(3) APIs and geopmagent(1) CLI have software support.
- The endpoint interfaces are a work in progress that has not yet been integrated into the mainline source.
- The PlatformIO object provides the interface to the IOGroups.
- The PlatformIO C++ object will soon have an associated C interface documented as geopm_platformio_c(3).
- The geopmread and geopmwrite tools provide a CLI to the PlatformIO features.
- Introducing the MSRIOGroup which provides an implementation of the IOGroup for MSRs.
- Introducing the TimeIOGroup which provides an IOGroup for the time signal.
- Introducing the CpuinfoIOGroup which provides data from /proc/cpuinfo as signals.
- Introducing the ProfileIOGroup which provides profile data collected from the main compute application through the geopm_prof_c(3) APIs.
- The release includes three new installed binaries: geopmread, geopmwrite, and geopmagent.
- Each of these command line interfaces is documented with a man page and there is a man page for a future command line tool called geopmendpoint.
- Deprecated geopm_policy_*() interfaces that have been replaced with the geopm_agent_*() and geopm_endpoint_*() APIs.
- Introducing the first three Agent implementations: MonitorAgent, PowerBalancerAgent, and EnergyEfficientAgent.
- Introducing PlatformTopo, replacement for PlatformTopology.
- Introducing DefaultProfile singleton which supports geopm_prof_c(3) APIs for profiling.
- Added documentation for monitor, energy_efficient, and power_balancer Agents, but the implementation is not currently aligned.
- The monitor agent is implemented and fully featured.
- The energy_efficient agent will soon be extended to match the man page, and currently use of the network is not enabled.
- The existing implementation of the energy_efficient agent does currently provide similar functionality to the efficient_freq Decider.
- The power_balancer agent is a work in progress that is not well aligned with the man page, but will be feature complete soon.
- Reports and traces generated by Agent code path are designed to be backward compatible with reports and traces generated with the Decider code path.
- New environment variables documented in geopm(7): GEOPM_ENDPOINT, GEOPM_AGENT, GEOPM_TRACE_SIGNALS, and GEOPM_DISABLE_HYPERTHREADS.
- Remove GEOPM_ERROR_AFFINITY_IGNORE environment variable, no longer required for testing.
- New plugin registration mechanism has been put in place and new factory has been implemented.
- Replace independent factories with single templated class the PluginFactory.
- No longer register a plugin using a half instantiated object.
- Removed call to dlsym, and plugins now use __attribute__((constructor)) to specify a callback target used when the plugin is loaded.
- In this callback the plugin should register with its respective factory.
- Each plugin type has a make_plugin() static method that creates the plugin object and returns a pointer to the base class.
- The make_plugin() function pointer is what is registered with the factory.
- Extend the PluginFactory to require the registration of a dictionary (map<string,string>) to enable queries of plugin capabilities.
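The registration flow described in these items can be sketched in C++11 as follows. This is a minimal illustration under stated assumptions, not GEOPM's PluginFactory: the class names, the register_plugin()/dictionary() method names, and the "num_policy" dictionary key are hypothetical. Only the overall pattern (a static make_plugin() registered together with a map<string,string> from an __attribute__((constructor)) callback) comes from the notes above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical plugin base class standing in for an interface such as Agent.
class Agent {
    public:
        virtual ~Agent() = default;
        virtual std::string plugin_name(void) const = 0;
};

// One templated factory for every plugin type: each registration stores a
// make_plugin() creation function and a capability dictionary.
template <class Type>
class PluginFactory {
    public:
        using maker_t = std::function<std::unique_ptr<Type>(void)>;
        void register_plugin(const std::string &name, maker_t make_plugin,
                             const std::map<std::string, std::string> &dictionary)
        {
            if (m_maker.count(name) != 0) {
                throw std::invalid_argument("plugin already registered: " + name);
            }
            m_maker[name] = make_plugin;
            m_dictionary[name] = dictionary;
        }
        std::unique_ptr<Type> make_plugin(const std::string &name) const
        {
            auto it = m_maker.find(name);
            if (it == m_maker.end()) {
                throw std::invalid_argument("unknown plugin: " + name);
            }
            return it->second();
        }
        const std::map<std::string, std::string> &dictionary(const std::string &name) const
        {
            return m_dictionary.at(name);
        }
    private:
        std::map<std::string, maker_t> m_maker;
        std::map<std::string, std::map<std::string, std::string> > m_dictionary;
};

// Function-local static: constructed on first use, so it is safe to call
// from a load-time constructor callback.
PluginFactory<Agent> &agent_factory(void)
{
    static PluginFactory<Agent> instance;
    return instance;
}

class ExampleAgent : public Agent {
    public:
        static std::string static_name(void) { return "example"; }
        std::string plugin_name(void) const override { return static_name(); }
        // Creates the plugin object and returns a pointer to the base class.
        static std::unique_ptr<Agent> make_plugin(void)
        {
            return std::unique_ptr<Agent>(new ExampleAgent);
        }
};

// GCC/Clang extension: runs when the program or shared object is loaded,
// so no dlsym() lookup is needed. The dictionary key is illustrative only.
static void example_agent_load(void) __attribute__((constructor));
static void example_agent_load(void)
{
    agent_factory().register_plugin(ExampleAgent::static_name(),
                                    ExampleAgent::make_plugin,
                                    {{"num_policy", "0"}});
}
```

Registering the function pointer rather than a half-built object means the factory never holds plugin state; each make_plugin() call yields a fresh instance.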
- Use a stricter criterion for selecting plugin files to load: the name must be of the form libgeopmpi*.so.0.0.0, where 0.0.0 is the GEOPM ABI version.
- Moved geopm_plugin_description_s definition to geopm.h.
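The stricter plugin file-name rule can be expressed with POSIX fnmatch(3). The helper name is hypothetical and the ABI version is passed in as a string for illustration; GEOPM may implement the check differently.

```cpp
#include <cassert>
#include <fnmatch.h>
#include <string>

// Returns true if file_name matches "libgeopmpi*.so.<abi_version>".
// fnmatch() returns 0 on a match; '.' in the pattern matches literally.
bool is_plugin_file(const std::string &file_name, const std::string &abi_version)
{
    const std::string pattern = "libgeopmpi*.so." + abi_version;
    return fnmatch(pattern.c_str(), file_name.c_str(), 0) == 0;
}
```

A plugin loader would apply a check like this to each entry returned by readdir(3) before attempting to load it, so unrelated or mismatched-ABI libraries in the plugin directory are skipped.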
- Add a configure option to enable use of the msr-safe ioctl interface for writing with PlatformIO.
- The msr-safe ioctl interface should not be used for writing unless the system has an msr-safe installation that has fixed LLNL/msr-safe#38.
- Added APIs for manipulating hint bits in region id hash.
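Hint bits stored alongside a region ID hash can be pictured as bit masks on a 64-bit value. The layout below (name hash in the low 32 bits, hints in the upper bits), the constant and function names, and the zero-padded hex format are all assumptions for illustration, not definitions taken from geopm.h.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Assumed layout: bits 0-31 carry the region name hash and bits 32-63
// carry hint flags. These constants are hypothetical.
const uint64_t REGION_HASH_MASK = 0x00000000FFFFFFFFULL;
const uint64_t HINT_COMPUTE = 1ULL << 32;
const uint64_t HINT_MEMORY = 1ULL << 33;

uint64_t region_id_hash(uint64_t region_id)
{
    return region_id & REGION_HASH_MASK;
}

uint64_t region_id_hint(uint64_t region_id)
{
    return region_id & ~REGION_HASH_MASK;
}

// Replaces any hint bits already present while preserving the name hash.
uint64_t region_id_set_hint(uint64_t region_id, uint64_t hint)
{
    return region_id_hash(region_id) | (hint & ~REGION_HASH_MASK);
}

// Zero-padded hexadecimal text, in the spirit of the hex region IDs used
// in reports and traces (the exact format there may differ).
std::string region_id_hex(uint64_t region_id)
{
    std::ostringstream oss;
    oss << "0x" << std::hex << std::setfill('0') << std::setw(16) << region_id;
    return oss.str();
}
```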
- Many changes were made to modernize the use of C++.
- Change protected members of all classes to private where possible.
- Replace all raw pointer usage with C++11 smart pointers if possible.
- Use default keyword for constructors and destructors where appropriate.
- Use the delete keyword, rather than throwing an exception, to disable copy constructors.
- Add override keyword to derived classes.
- Use forward declaration of classes rather than include one header inside of another.
- Add and integrate make_unique implementation for C++11.
- Confirmed const correctness for all class methods.
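Several of the modernization items above fit in a few lines of C++11. The class names below are hypothetical; make_unique follows the well-known single-object form that C++14 later standardized as std::make_unique, placed in a separate namespace so it cannot collide with the standard one.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

namespace sketch {
    // C++11 has no std::make_unique; this is the usual single-object replacement.
    template <class T, class... Args>
    std::unique_ptr<T> make_unique(Args &&... args)
    {
        return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
    }
}

// Hypothetical interface class.
class IOGroupExample {
    public:
        IOGroupExample() = default;
        virtual ~IOGroupExample() = default;
        // Copying is rejected at compile time rather than by throwing at run time.
        IOGroupExample(const IOGroupExample &other) = delete;
        IOGroupExample &operator=(const IOGroupExample &other) = delete;
        virtual std::string plugin_name(void) const = 0;
};

class NamedIOGroup final : public IOGroupExample {
    public:
        explicit NamedIOGroup(const std::string &name) : m_name(name) {}
        // override: the compiler checks that this replaces a base-class virtual.
        std::string plugin_name(void) const override { return m_name; }
    private:
        std::string m_name;  // private rather than protected
};
```

Because the base class deletes its copy operations, every derived class is non-copyable as well, and an attempted copy is a compile error instead of a run-time exception.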
- Add public interface to register IOGroups with PlatformIO which enables IOGroups to be created at runtime.
- Standardize the IOGroup signal and control names so that they are prefixed by the IOGroup name and two colons.
- Agents should generally use high level aliases rather than these low level signals and controls.
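Splitting a fully qualified name into its IOGroup prefix and the remainder is simple string work. The function name is hypothetical and the sketch only assumes the "IOGROUP::NAME" convention described above; a name without the prefix is treated as a high-level alias and rejected.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Splits "IOGROUP::NAME" at the first "::" into {IOGROUP, NAME}.
// Throws if the prefix or the remainder is missing.
std::pair<std::string, std::string> split_signal_name(const std::string &full_name)
{
    const std::string sep = "::";
    std::size_t pos = full_name.find(sep);
    if (pos == std::string::npos || pos == 0 ||
        pos + sep.size() == full_name.size()) {
        throw std::invalid_argument("not an IOGroup-qualified name: " + full_name);
    }
    return {full_name.substr(0, pos), full_name.substr(pos + sep.size())};
}
```

Only the first "::" is used as the separator, so any single colons inside the remainder (for example, a register and field name) are left intact.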
- Introduce functions for converting between signals and bit-fields so that PlatformIO can provide full 64-bit integer signals such as the region ID.
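Because PlatformIO signals are reported as double, an integer that needs more than 53 significant bits, such as a 64-bit region ID, cannot survive an ordinary numeric cast. A common approach, assumed here, is to copy the raw bits with memcpy. The function names are hypothetical, and GEOPM's own helpers may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

// Stores the exact bit pattern of a 64-bit field inside a double.
double field_to_signal(uint64_t field)
{
    double signal;
    std::memcpy(&signal, &field, sizeof(signal));
    return signal;
}

// Recovers the original 64-bit field from a double produced above.
uint64_t signal_to_field(double signal)
{
    uint64_t field;
    std::memcpy(&field, &signal, sizeof(field));
    return field;
}
```

The value returned by field_to_signal() is meaningful only as a carrier; arithmetic on it (averaging, for instance) would corrupt the field, so such signals need an aggregation that selects a value rather than combining values.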
- Add overflow function type to MSR class.
- Change frequency APIs to use Hz to enforce uniform use of SI units.
- Use instruction offset in OMPT derived region name; this resolves a name ambiguity when more than one OpenMP region is discovered within the same function.
- Use gmock archive uploaded to the geopm organization on GitHub.
- PlatformTopo is built on top of lscpu and does not require hwloc.
- Throw on GlobalPolicy misconfiguration earlier in the runtime execution.
- Rename SimpleFreqDecider to EfficientFreqDecider which will be replaced by EnergyEfficientAgent.
- Update environment variables related to the efficient Decider and Agent to reflect the name changes above.
- The json-c library is no longer a dependency; all references to it have been removed.
- Now using the json11 library which is distributed in the "contrib" sub-directory.
- Updated features:
- Enable Agent to augment report and trace.
- Enable user to augment trace through environment variable GEOPM_TRACE_SIGNALS in new code path.
- Changes to PlatformIO to support non-CPU domains.
- Added MSR save/restore functionality to PlatformIO save/reset interfaces.
- Allow loading PlatformIO when some IOGroups fail to load.
- Add aggregation functions to PlatformIO to encode how to combine signals.
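An aggregation function reduces per-domain samples, for example one value per CPU, into a single value for a larger domain. The sketch below pairs signal names with reduction functions; the function names and the name-to-rule table are hypothetical, chosen to show why energy is summed while frequency is averaged.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

typedef double (*agg_func_t)(const std::vector<double> &operand);

double agg_sum(const std::vector<double> &operand)
{
    return std::accumulate(operand.begin(), operand.end(), 0.0);
}

double agg_average(const std::vector<double> &operand)
{
    return operand.empty() ? NAN : agg_sum(operand) / operand.size();
}

double agg_max(const std::vector<double> &operand)
{
    return operand.empty() ? NAN : *std::max_element(operand.begin(), operand.end());
}

// Hypothetical table: extensive quantities such as energy add up across
// domains, intensive ones such as frequency are averaged.
agg_func_t agg_function(const std::string &signal_name)
{
    static const std::map<std::string, agg_func_t> rules = {
        {"ENERGY_PACKAGE", agg_sum},
        {"FREQUENCY", agg_average},
        {"TEMPERATURE_MAX", agg_max},
    };
    auto it = rules.find(signal_name);
    if (it == rules.end()) {
        throw std::invalid_argument("no aggregation rule for " + signal_name);
    }
    return it->second;
}
```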
- Add PlatformTopo methods for converting domain to string and vice-versa.
- Add signal_names() and control_names() to PlatformIO and IOGroup.
- Add Skylake server (SKX) as a supported platform.
- Add Haswell and SandyBridge MSRs to PlatformIO interface.
- OMPT report region names include instruction offset, now two OpenMP regions within the same function can be distinguished.
- Add region runtime as default trace column.
- Simpler column names in trace; print some columns using old names.
- Change region ID to hex in report and trace.
- Order regions in report by runtime.
- Add application total ignore time to report.
- Replace tabs with spaces for report formatting.
- Enable PlatformIO to support Epoch based signals.
- Add power signals to PlatformIO using derivative calculation previously done in Region object.
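Computing power as the derivative of energy means estimating dE/dt from timestamped energy samples. The sketch below uses a least-squares slope over a sliding window of samples; the windowed fit, the window size, and the class name are assumptions for illustration, not a description of GEOPM's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <utility>

// Power (W) estimated as the slope of energy (J) versus time (s).
class DerivativeEstimator {
    public:
        explicit DerivativeEstimator(std::size_t window)
            : m_window(window)
        {
            if (window < 2) {
                throw std::invalid_argument("window must hold at least two samples");
            }
        }
        // Adds a sample and drops the oldest one once the window is full.
        void update(double time, double value)
        {
            m_samples.emplace_back(time, value);
            if (m_samples.size() > m_window) {
                m_samples.pop_front();
            }
        }
        // Least-squares slope of value versus time; NAN with fewer than two samples.
        double derivative(void) const
        {
            std::size_t num = m_samples.size();
            if (num < 2) {
                return NAN;
            }
            double mean_t = 0.0;
            double mean_v = 0.0;
            for (const auto &sample : m_samples) {
                mean_t += sample.first;
                mean_v += sample.second;
            }
            mean_t /= num;
            mean_v /= num;
            double numer = 0.0;
            double denom = 0.0;
            for (const auto &sample : m_samples) {
                double dt = sample.first - mean_t;
                numer += dt * (sample.second - mean_v);
                denom += dt * dt;
            }
            return denom == 0.0 ? NAN : numer / denom;
        }
    private:
        std::size_t m_window;
        std::deque<std::pair<double, double> > m_samples;
};
```

A fit over several samples is less sensitive to jitter in sample timing than a two-point difference, at the cost of responding more slowly to changes in power.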
- Add PlatformIO aliases for region ID, progress, frequency and energy.
- Add CombinedSignal class which is used to combine signals from different IOGroups.
- Allow for a user provided number o...
GEOPM 0.4.0
- Fri Jan 12 2018 Christopher M. Cantalupo christopher.m.cantalupo@intel.com v0.4.0
- Modified implementations and interfaces:
- Updated algorithm for choosing CPU affinity in the launcher: fill application CPUs from back to front, and never share physical cores between MPI ranks.
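The back-to-front affinity rule can be illustrated with a small core-assignment function. The geopmpy launcher itself is written in Python; this C++ version only illustrates the rule. It assumes one hardware thread per core, a caller-chosen number of reserved low-numbered cores, and rank order following core order. The function name and all three assumptions are hypothetical, not the launcher's documented behavior.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Returns, for each rank on a node, the physical cores assigned to it.
// Cores are handed out in whole units so no core is shared between ranks,
// and the assigned block is packed against the highest-numbered cores.
std::vector<std::vector<int> > assign_cores_back_to_front(int num_core, int num_rank,
                                                          int num_reserved)
{
    int num_avail = num_core - num_reserved;
    if (num_rank <= 0 || num_avail < num_rank) {
        throw std::invalid_argument("more ranks per node than available cores");
    }
    int cores_per_rank = num_avail / num_rank;
    // First core of the application block; all cores below it stay free.
    int first_core = num_core - num_rank * cores_per_rank;
    std::vector<std::vector<int> > result(num_rank);
    for (int rank = 0; rank < num_rank; ++rank) {
        for (int idx = 0; idx < cores_per_rank; ++idx) {
            result[rank].push_back(first_core + rank * cores_per_rank + idx);
        }
    }
    return result;
}
```

With 8 cores, 3 ranks, and 1 reserved core, each rank receives 2 cores (cores 2 through 7), leaving cores 0 and 1 unused by the application.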
- Created new abstraction for interfacing with MSRs and more broadly for abstracting hardware IO (PlatformIO, MSRIO, and MSR classes).
- Application region hints are now properly exposed to the decider.
- Added geopmanalysis executable to the geopmpy package; this executable runs applications and performs analysis of power and performance based on GEOPM report and trace data.
- Added geopmbench to the installed binaries; this is simply an installed version of the tutorial_6 executable.
- Added GEOPM_RM environment variable and --geopm-rm command line option to select geopmpy.launcher's back end resource manager.
- Updated man pages to include geopmanalysis and geopmbench.
- Removed handling of SIGCHLD signal in GEOPM runtime (commonly raised in non-error conditions when using popen(3)).
- Launcher will guess correct number of OpenMP threads if user has not specified.
- Added warning message at start up if report and trace files will not be created due to permissions issues.
- Added better error handling to tutorial sources.
- Added support for geopmctl to be run as a different user than application.
- Added support for user-provided shmkeys that do not begin with '/'.
- Added error checking in the launcher for when the user requests more ranks per node than there are cores per node.
- Added more robust error checking for command line issues in launcher.
- Added command line option to launcher to exclude use of hyperthreads: --geopm-disable-hyperthreads.
- If a plugin fails at registration time, do not bring down the controller; a warning is printed if debug is enabled.
- Remove -s parameter from geopmctl CLI (was being ignored).
- Encapsulated use of MPI by GEOPM inside of a class abstraction (IComm), but controller has not been modified to use the new class due to deadlock bug.
- Encapsulated in a class the handshake interface between the controller and the application across shared memory.
- General clean up of the geopmpy.plotter implementation.
- Added more error checking in Controller.
- Some fixes for issues exposed by static analysis.
- Updated features:
- Added new decider called "simple_freq" that adjusts CPU frequency to save energy with a small impact to performance; name will likely change to "efficient_freq" in the future.
- Added region runtime reporting to traces and Region objects based on the average execution time of a region by all of the ranks on a node.
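A node-level region runtime averaged over ranks can be sketched by tracking entry and exit times per rank. The class and method names are hypothetical, and using each rank's most recent completed execution of the region is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <stdexcept>

class RegionRuntimeTracker {
    public:
        void region_enter(int rank, double time)
        {
            m_entry[rank] = time;
        }
        void region_exit(int rank, double time)
        {
            auto it = m_entry.find(rank);
            if (it == m_entry.end()) {
                throw std::logic_error("region exit without matching entry");
            }
            m_runtime[rank] = time - it->second;
            m_entry.erase(it);
        }
        // Mean of each rank's latest completed runtime; NAN if none completed.
        double node_runtime(void) const
        {
            if (m_runtime.empty()) {
                return NAN;
            }
            double total = 0.0;
            for (const auto &rank_runtime : m_runtime) {
                total += rank_runtime.second;
            }
            return total / m_runtime.size();
        }
    private:
        std::map<int, double> m_entry;    // rank -> entry time of open region
        std::map<int, double> m_runtime;  // rank -> latest completed runtime
};
```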
- Added a method to the Region object that gives the decider access to telemetry time stamps.
- Added online learning approach to energy efficient frequency decider.
- Added support to geopmpy.launcher for launching with Intel(R) MPI's mpiexec.
- Added option to plotter to use all samples or just epoch samples.
- Modified the tutorials to enable use of the geopmpy launcher.
- Improved tutorial Makefile to allow user override of GNU Make standard variables.
- Added an RPM spec file for use with the OpenHPC distribution.
- Updated and extended integration tests:
- Moved Controller death test from the unit tests to the integration tests.
- Added integration tests for pthread and application launch of the controller.
- Added an isolated hardware test for RAPL power limit functionality.
- Updated documentation: both man pages and doxygen have been reviewed and cleaned up.
- Updated unit tests:
- Added unit test for SubsetOptionParser.
- Reduced dependence of unit tests on MPI runtime.
- Removed MPIProfileTest unit test which is covered by integration tests, and not really a unit test.
- Removed unused MPIControllerTest.
- Removed MVAPICH2 Fortran tests.
- Bug fixes:
- Fixed broken build in tutorials (tutorial_region.c).
- Fixed faulty argument parsing by the geopmpy launcher.
- Fixed error reporting when using geopmpy with python 3.x.
- Fixed issues with affinity when launching the controller as a pthread.
- Fixed issue in passing power budgets down a multi-level tree.
- Fixed issue in platform choice when head node architecture differs from the compute nodes.
- Fixed broken build if --disable-doc configuration option is passed.
- Fixed decider setup code to correctly propagate power bounds down tree.
- Fixed the way RAPL time window is set.
- Fixed the use of cached data by geopmpy.plotter.
- Fixed integration test issues related to systems with multiple cluster node partitions.
- Fixed process CPU affinity implementation (don't use hwloc) and added unit tests for this.
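One way to query process CPU affinity on Linux without hwloc is sched_getaffinity(2) with the cpu_set_t macros. This illustrates the general technique, not the code from the fix. It assumes Linux with glibc (the CPU_* macros require _GNU_SOURCE, which g++ defines by default) and no more than CPU_SETSIZE (1024) CPUs.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // cpu_set_t and the CPU_* macros are GNU extensions
#endif
#include <sched.h>
#include <cassert>
#include <stdexcept>
#include <vector>

// Returns the sorted list of logical CPUs the calling thread may run on.
std::vector<int> process_cpu_list(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    // A pid of 0 selects the calling thread.
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        throw std::runtime_error("sched_getaffinity() failed");
    }
    std::vector<int> result;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) {
            result.push_back(cpu);
        }
    }
    return result;
}
```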
- Fixed potential overflow issue with error messages in PlatformImp.cpp.
- Fixed race in SharedMemory test.
- Fixed markup patch for MiniFE.
- Fixed launcher when user explicitly requests OMP_NUM_THREADS=1.
- Fixed MPIInterfaceTests so it uses only mocked MPI interfaces, and does not explicitly require MPI.
- Fixed memory leaks in GlobalPolicy.
- Fixed linking order of libgeopm and libmpi.
- Fixed non-performance mode integration test launcher.
- Fixed issue where libgeopmpolicy had a false dependence on OMPT.cpp.
- Fixed the rpm Makefile target to avoid the rpmbuild -t option, which caused rpmbuild to try to use the OpenHPC spec file.
- Fixed issue where platform topology could be determined from nodes other than the ones that run the job.
- Fixed Intel(R) MPI launcher's use of host files and the --ppn CLI.
- Fixed incompatibility between MVAPICH2 affinity and srun affinity.
- Fixed test_progress_exit integration test to account for extrapolation error.
- Fixed integration test for MPI time accounting.
- Fixed launcher problem when node is listed in multiple queues by sinfo.
- Fixed and improved affinity assignment in corner cases.
- Fixed use of sched_getcpu() for Mac OS X.
GEOPM Alpha Release
v0.3.0 * Mon Jun 19 2017 Christopher M. Cantalupo <christopher.m.cantalupo@i…