Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dev and main branch from workflows. #404

Merged
merged 2 commits into from
Aug 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ name: Documentation

on:
push:
branches: [ main, amd-mainline ]
branches: [ amd-mainline ]
paths:
- 'docs/archive/docs-2.x/**'
- 'docs/archive/docs-1.x/**'
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/formatting.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,9 @@ name: Formatting

on:
push:
branches: [ main, dev, amd-mainline, amd-staging ]
branches: [ amd-mainline, amd-staging ]
pull_request:
branches: [ main, dev, amd-mainline, amd-staging ]
branches: [ amd-mainline, amd-staging ]

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/mi-rhel9.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ name: mi-rhel9

on:
push:
branches: [ main, amd-mainline ]
branches: [ amd-mainline ]

# Allows manual execution
workflow_dispatch:
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/rhel-8.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,9 @@ name: RHEL 8
# Controls when the workflow will run
on:
push:
branches: [ main, dev, amd-mainline, amd-staging ]
branches: [ amd-mainline, amd-staging ]
pull_request:
branches: [ main, dev, amd-mainline, amd-staging ]
branches: [ amd-mainline, amd-staging ]

# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/tarball.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ name: tarball

on:
push:
branches: [ main, amd-mainline ]
branches: [ amd-mainline ]
pull_request:

concurrency:
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/ubuntu-jammy.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,9 @@ name: Ubuntu 22.04

on:
push:
branches: [ main, dev, amd-mainline, amd-staging ]
branches: [ amd-mainline, amd-staging ]
pull_request:
branches: [ main, dev, amd-mainline, amd-staging ]
branches: [ amd-mainline, amd-staging ]

# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
Expand Down
2 changes: 1 addition & 1 deletion docs/archive/docs-1.x/analysis.md
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,7 @@ $ omniperf analyze -p workloads/vcopy/mi200/ --list-metrics gfx90a
├─────────┼─────────────────────────────┤
...
```
2. Choose your own customized subset of metrics with `-b` (a.k.a. `--metric`), or build your own config following [config_template](https://github.com/ROCm/omniperf/blob/main/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light).
2. Choose your own customized subset of metrics with `-b` (a.k.a. `--metric`), or build your own config following [config_template](https://github.com/ROCm/omniperf/blob/amd-mainline/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light).
```shell-session
$ omniperf analyze -p workloads/vcopy/mi200/ -b 2
--------
Expand Down
2 changes: 1 addition & 1 deletion docs/archive/docs-2.x/analysis.md
Original file line number Diff line number Diff line change
Expand Up @@ -181,7 +181,7 @@ Analysis mode = cli
2.1.30 -> L1I Fetch Latency
...
```
3. Choose your own customized subset of metrics with `-b` (a.k.a. `--block`), or build your own config following [config_template](https://github.com/ROCm/omniperf/blob/main/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light).
3. Choose your own customized subset of metrics with `-b` (a.k.a. `--block`), or build your own config following [config_template](https://github.com/ROCm/omniperf/blob/amd-mainline/src/omniperf_analyze/configs/panel_config_template.yaml). Below shows how to generate a report containing only metric 2 (a.k.a. System Speed-of-Light).
```shell-session
$ omniperf analyze -p workloads/vcopy/MI200/ -b 2
--------
Expand Down
16 changes: 8 additions & 8 deletions docs/archive/docs-2.x/performance_model.md
Original file line number Diff line number Diff line change
Expand Up @@ -2178,7 +2178,7 @@ A good discussion of coarse and fine grained memory allocations and what type of
(VALU_inst_mix_example)=
## VALU Arithmetic Instruction Mix

For this example, we consider the [instruction mix sample](https://github.com/ROCm/omniperf/blob/dev/sample/instmix.hip) distributed as a part of Omniperf.
For this example, we consider the [instruction mix sample](https://github.com/ROCm/omniperf/blob/amd-mainline/sample/instmix.hip) distributed as a part of Omniperf.

```{note}
This example is expected to work on all CDNA accelerators, however the results in this section were collected on an [MI2XX](2xxnote) accelerator
Expand Down Expand Up @@ -2269,7 +2269,7 @@ shows that we have exactly one of each type of VALU arithmetic instruction, by c
(Fabric_transactions_example)=
## Infinity-Fabric(tm) transactions

For this example, we consider the [Infinity Fabric(tm) sample](https://github.com/ROCm/omniperf/blob/dev/sample/fabric.hip) distributed as a part of Omniperf.
For this example, we consider the [Infinity Fabric(tm) sample](https://github.com/ROCm/omniperf/blob/amd-mainline/sample/fabric.hip) distributed as a part of Omniperf.
This code launches a simple read-only kernel, e.g.:

```c++
Expand Down Expand Up @@ -2826,7 +2826,7 @@ On an AMD [MI2XX](2xxnote) accelerator, for FP32 values this will generate a `gl
(flatmembench)=
### Global / Generic (FLAT)

For this example, we consider the [vector-memory sample](https://github.com/ROCm/omniperf/blob/dev/sample/vmem.hip) distributed as a part of Omniperf.
For this example, we consider the [vector-memory sample](https://github.com/ROCm/omniperf/blob/amd-mainline/sample/vmem.hip) distributed as a part of Omniperf.
This code launches many different versions of a simple read/write/atomic-only kernels targeting various address spaces, e.g. below is our simple `global_write` kernel:

```c++
Expand Down Expand Up @@ -2976,7 +2976,7 @@ The assembly in these experiments were generated for an [MI2XX](2xxnote) acceler
Next, we examine a generic write.
As discussed [previously](Flat_design), our `generic_write` kernel uses an address space cast to _force_ the compiler to choose our desired address space, regardless of other optimizations that may be possible.

We also note that the `filter` parameter passed in as a kernel argument (see [example](https://github.com/ROCm/omniperf/blob/dev/sample/vmem.hip), or [design note](Flat_design)) is set to zero on the host, such that we always write to the 'local' (LDS) memory allocation `lds`.
We also note that the `filter` parameter passed in as a kernel argument (see [example](https://github.com/ROCm/omniperf/blob/amd-mainline/sample/vmem.hip), or [design note](Flat_design)) is set to zero on the host, such that we always write to the 'local' (LDS) memory allocation `lds`.

Examining this kernel in the VMEM Instruction Mix table yields:

Expand Down Expand Up @@ -3339,7 +3339,7 @@ Next we examine the use of 'Spill/Scratch' memory.
On current CDNA accelerators such as the [MI2XX](2xxnote), this is implemented using the [private](mspace) memory space, which maps to ['scratch' memory](https://llvm.org/docs/AMDGPUUsage.html#amdgpu-address-spaces) in AMDGPU hardware terminology.
This type of memory can be accessed via different instructions depending on the specific architecture targeted. However, current CDNA accelerators such as the [MI2XX](2xxnote) use so called `buffer` instructions to access private memory in a simple (and typically) coalesced manner. See [Sec. 9.1, 'Vector Memory Buffer Instructions' of the CDNA2 ISA guide](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf) for further reading on this instruction type.

We develop a [simple kernel](https://github.com/ROCm/omniperf/blob/dev/sample/stack.hip) that uses stack memory:
We develop a [simple kernel](https://github.com/ROCm/omniperf/blob/amd-mainline/sample/stack.hip) that uses stack memory:
```c++
#include <hip/hip_runtime.h>
__global__ void knl(int* out, int filter) {
Expand Down Expand Up @@ -3404,7 +3404,7 @@ Here we see a single write to the stack (10.3.6), which corresponds to an L1-L2
(IPC_example)=
## Instructions-per-cycle and Utilizations example

For this section, we use the instructions-per-cycle (IPC) [example](https://github.com/ROCm/omniperf/blob/dev/sample/ipc.hip) included with Omniperf.
For this section, we use the instructions-per-cycle (IPC) [example](https://github.com/ROCm/omniperf/blob/amd-mainline/sample/ipc.hip) included with Omniperf.

This example is compiled using `c++17` support:

Expand Down Expand Up @@ -3824,7 +3824,7 @@ Finally, we note that our branch utilization (11.2.5) has increased slightly fro

## LDS Examples

For this example, we consider the [LDS sample](https://github.com/ROCm/omniperf/blob/dev/sample/lds.hip) distributed as a part of Omniperf.
For this example, we consider the [LDS sample](https://github.com/ROCm/omniperf/blob/amd-mainline/sample/lds.hip) distributed as a part of Omniperf.
This code contains two kernels to explore how both [LDS](lds) bandwidth and bank conflicts are calculated in Omniperf.

This example was compiled and run on an MI250 accelerator using ROCm v5.6.0, and Omniperf v2.0.0.
Expand Down Expand Up @@ -4037,7 +4037,7 @@ The bank conflict rate linearly increases with the number of work-items within a
## Occupancy Limiters Example


In this [example](https://github.com/ROCm/omniperf/blob/dev/sample/occupancy.hip), we will investigate the use of the resource allocation panel in the [Workgroup Manager](SPI)'s metrics section to determine occupancy limiters.
In this [example](https://github.com/ROCm/omniperf/blob/amd-mainline/sample/occupancy.hip), we will investigate the use of the resource allocation panel in the [Workgroup Manager](SPI)'s metrics section to determine occupancy limiters.
This code contains several kernels to explore how both various kernel resources impact achieved occupancy, and how this is reported in Omniperf.

This example was compiled and run on a MI250 accelerator using ROCm v5.6.0, and Omniperf v2.0.0:
Expand Down
2 changes: 1 addition & 1 deletion docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@

# frequently used external resources
extlinks = {
"dev-sample": ("https://github.com/ROCm/omniperf/blob/dev/sample/%s", "%s"),
"dev-sample": ("https://github.com/ROCm/omniperf/blob/amd-mainline/sample/%s", "%s"),
"prod-page": (
"https://www.amd.com/en/products/accelerators/instinct/%s.html",
"%s",
Expand Down
2 changes: 1 addition & 1 deletion docs/how-to/analyze/cli.rst
Original file line number Diff line number Diff line change
Expand Up @@ -186,7 +186,7 @@ Walkthrough

3. Choose your own customized subset of metrics with the ``-b`` (or ``--block``)
option. Or, build your own configuration following
`config_template <https://github.com/ROCm/omniperf/blob/main/src/omniperf_soc/analysis_configs/panel_config_template.yaml>`_.
`config_template <https://github.com/ROCm/omniperf/blob/amd-mainline/src/omniperf_soc/analysis_configs/panel_config_template.yaml>`_.
The following snippet shows how to generate a report containing only metric 2
(:doc:`System Speed-of-Light </conceptual/system-speed-of-light>`).

Expand Down
2 changes: 1 addition & 1 deletion docs/how-to/profile/mode.rst
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ Run ``omniperf profile -h`` for more details. See
Profiling example
-----------------

The `<https://github.com/ROCm/omniperf/blob/main/sample/vcopy.cpp>`__ repository
The `<https://github.com/ROCm/omniperf/blob/amd-mainline/sample/vcopy.cpp>`__ repository
includes source code for a sample GPU compute workload, ``vcopy.cpp``. A copy of
this file is available in the ``share/sample`` subdirectory after a normal
Omniperf installation, or via the ``$OMNIPERF_SHARE/sample`` directory when
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -623,7 +623,7 @@ manner. See
for further reading on this instruction type.

We develop a `simple
kernel <https://github.com/ROCm/omniperf/blob/dev/sample/stack.hip>`__
kernel <https://github.com/ROCm/omniperf/blob/amd-mainline/sample/stack.hip>`__
that uses stack memory:

.. code-block:: cpp
Expand Down
2 changes: 1 addition & 1 deletion docs/tutorial/profiling-by-example.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ Profiling by example
********************

The following examples refer to sample :doc:`HIP <hip:index>` code located in
:fab:`github` :dev-sample:`ROCm/omniperf/blob/dev/sample <>` and distributed
:fab:`github` :dev-sample:`ROCm/omniperf/blob/amd-mainline/sample <>` and distributed
as part of Omniperf.

.. include:: ./includes/valu-arithmetic-instruction-mix.rst
Expand Down