@machshev machshev commented Oct 3, 2025

Debug feature to dump a JSON representation of the Deploy objects just before they are sent to the scheduler. This feature is hidden behind an environment variable to avoid cluttering the CLI interface. The full Deploy object is not JSON serialisable in its current form, so we only serialise the attributes that the LocalLauncher uses.

To enable it, set DVSIM_DEPLOY_DUMP=true and run DVSim. A file called deploy-<timestamp>.json will be generated in the scratch directory, at scratch/<branch>/deploy-<timestamp>.json.
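
For anyone who wants a feel for the mechanism, here is a minimal sketch of an environment-gated dump. It is not the code in this PR: `dump_deploy_objects`, its parameters, the record layout and the timestamp format are all assumptions for illustration. Only the `DVSIM_DEPLOY_DUMP` variable and the `deploy-<timestamp>.json` naming come from the description above.

```python
import json
import os
import time
from pathlib import Path


def dump_deploy_objects(deploy_objects, out_dir, attrs):
    """Write selected attributes of each object to a JSON file.

    Hypothetical helper: returns the path written, or None when the
    dump is not enabled through the environment.
    """
    # Variable name taken from the PR description; the exact truthiness
    # check used by DVSim is an assumption.
    if os.environ.get("DVSIM_DEPLOY_DUMP", "").lower() != "true":
        return None

    # Copy only the named attributes, since the full object is not
    # JSON serialisable. The class name is kept to identify the job type.
    records = [
        {"class": type(obj).__name__,
         **{name: getattr(obj, name) for name in attrs}}
        for obj in deploy_objects
    ]

    # Timestamp format is a guess for illustration only.
    timestamp = time.strftime("%y.%m.%d_%H.%M.%S")
    out_path = Path(out_dir) / f"deploy-{timestamp}.json"
    # default=str falls back to str() for values json cannot encode.
    out_path.write_text(json.dumps(records, indent=2, default=str))
    return out_path
```

Gating on an environment variable keeps the debug hook out of the argument parser entirely, which matches the stated goal of not cluttering the CLI.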

This file can then be compared between runs to check for regressions in the flow leading up to job deployment. With --fixed-seed=1 --branch=baseline, the resulting JSON files should be identical. Setting the "branch" doesn't actually change the git branch; it just overrides what DVSim thinks is the branch as far as paths in the scratch directory are concerned. We need to set this to something fixed so that the deployment objects contain the same paths, otherwise a diff will fail.

Generating a diff or hash of the two files and comparing shows if the job deployments would be identical or not.
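
If you would rather script the comparison than run `sha512sum` by hand, something along these lines should work. This is a sketch using only the standard library; `file_sha512` and `dumps_match` are names made up here, not part of DVSim.

```python
import hashlib
from pathlib import Path


def file_sha512(path):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks so large dumps are not loaded at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dumps_match(baseline, candidate):
    """True when the two dump files are byte-for-byte identical."""
    return file_sha512(baseline) == file_sha512(candidate)
```

A matching hash only says the files are identical; when it differs you still need a diff to see what changed.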

Using this functionality I have confirmed, by backporting this commit to the first release tag, that none of the refactorings made so far in this repository have changed the deployment objects in such a way that the launched jobs are different in any way.

✦ ❯ sha512sum baseline.json deploy_25.10.03_11.35.33.json
af732c3011753cfc7dc40068e1ce9b6cf3624973ffbbd25f828e4d7727f27dd65b5ada19407500315ea6f279946222d185dc939fe4978b0b32b3e7e8b4f7f60c  baseline.json
af732c3011753cfc7dc40068e1ce9b6cf3624973ffbbd25f828e4d7727f27dd65b5ada19407500315ea6f279946222d185dc939fe4978b0b32b3e7e8b4f7f60c  deploy_25.10.03_11.35.33.json

The first JSON file was generated by backporting this feature to the tagged commit. The second file was generated with this branch and includes all the tidy-up work made so far to DVSim.

The DVSim command used is:

DVSIM_DEPLOY_DUMP=true dvsim \
    ~/base/opentitan/hw/top_earlgrey/dv/top_earlgrey_sim_cfgs.hjson \
    --proj-root ~/base/opentitan -i nightly --scratch ~/scratch \
    --branch baseline --fixed-seed 1 \
    --verbose=debug --max-parallel=50 \
    --cov

Note: your hashes will be different from mine as the directory paths will be different. However, if you run this against the same versions, your hashes should match each other.

@mkj121 mkj121 left a comment

Pirmin is away until Oct 20 so I suggest you remove him as a reviewer

@machshev machshev removed the request for review from vogelpi October 3, 2025 13:13
@machshev machshev force-pushed the deploy_object_dump branch 3 times, most recently from 4699d57 to 61776ff Compare October 3, 2025 16:17
machshev commented Oct 3, 2025

Turns out that the coverage merge directories are provided in a semi-random order, which means that if you run DVSim several times the hash can differ between runs.

I've updated the PR to sort the JSON objects in the deployment dump by the job full name, so the jobs themselves are in the same order each time. However, small variations may still be present due to issues like the above.

We could make sure these directory paths are sorted to make them more deterministic. But that's for another PR. For the moment this mechanism seems good enough.
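
To make the ordering idea concrete, here is a rough sketch of what sorting by job name, plus the possible follow-up of sorting the directory lists, might look like. It is purely illustrative: `normalise_records`, the `full_name` key and the `unordered_fields` parameter are assumptions, and the real dump may use different keys.

```python
def normalise_records(records, unordered_fields=()):
    """Return records in a deterministic order.

    records: list of dicts, each with a "full_name" key (assumed name).
    unordered_fields: keys whose list values carry no meaningful order,
    for example a list of coverage directories.
    """
    normalised = []
    for record in records:
        copy = dict(record)  # shallow copy so the input is not mutated
        for field in unordered_fields:
            if field in copy:
                # Only sort fields the caller names; sorting something
                # like command arguments would change its meaning.
                copy[field] = sorted(copy[field])
        normalised.append(copy)
    # Jobs are ordered by their full name, as described above.
    return sorted(normalised, key=lambda r: r["full_name"])
```

Only explicitly named fields are sorted, since most lists in a job description are order-sensitive.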

@machshev machshev mentioned this pull request Oct 7, 2025
Debug feature to dump a JSON representation of the `Deploy` objects,
just before they are sent to the scheduler. This feature is hidden behind
an environment variable to avoid cluttering the CLI interface. The full
`Deploy` object is not JSON serialisable in its current form, so we
only serialise the attributes that the `LocalLauncher` uses and some
identifying attributes like the full name of the job and the deployment
class name.

To enable it, set `DVSIM_DEPLOY_DUMP=true` and run DVSim. A file called
`deploy-<branch>-<timestamp>.json` will be generated in the scratch
directory, at `scratch/deploy-<branch>-<timestamp>.json`.

This file can then be compared between runs to check for regressions in
the flow leading up to job deployment. With `--fixed-seed=1 --branch=baseline`,
the resulting json files should be identical. Setting the "branch"
doesn't actually change the git branch, it just overrides what DVSim
thinks is the branch as far as paths in the scratch directory are
concerned. We need to set this to something fixed so that the
deployment objects contain the same paths, otherwise a diff will fail.

Generating a `diff` or hash of the two files and comparing shows if the
job deployments would be identical or not.

Using this functionality I have confirmed, by backporting this commit to
the first release tag, that none of the refactorings made so far in this
repository have changed the deployment objects in such a way that the
launched jobs are different in any way.

```
✦ ❯ sha512sum baseline.json deploy_25.10.03_11.35.33.json
af732c3011753cfc7dc40068e1ce9b6cf3624973ffbbd25f828e4d7727f27dd65b5ada19407500315ea6f279946222d185dc939fe4978b0b32b3e7e8b4f7f60c  baseline.json
af732c3011753cfc7dc40068e1ce9b6cf3624973ffbbd25f828e4d7727f27dd65b5ada19407500315ea6f279946222d185dc939fe4978b0b32b3e7e8b4f7f60c  deploy_25.10.03_11.35.33.json
```

The first JSON file was generated by backporting this feature to
the tagged commit. The second file was generated with this branch and
includes all the tidy-up work made so far to DVSim.

The DVSim command used is:

```
DVSIM_DEPLOY_DUMP=true dvsim \
    ~/base/opentitan/hw/top_earlgrey/dv/top_earlgrey_sim_cfgs.hjson \
    --proj-root ~/base/opentitan -i nightly --scratch ~/scratch \
    --branch baseline --fixed-seed 1 \
    --verbose=debug --max-parallel=50 \
    --cov
```

**Note:** *your hashes will be different from mine as the directory paths
will be different. However, if you run this against the same versions,
your hashes should match each other.*

**Note:** *Depending on the flags you pass to DVSim, there may be some
minor nondeterminism in the configuration. For example, with CovMerge the
coverage directories are not always given in the same order. It can take
several runs to get a different hash, or it could be different on every
run. In such cases it's worth using `diff` across the files to see what
the actual differences are; they may not be consequential.*

Signed-off-by: James McCorrie <james.mccorrie@lowrisc.org>
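
When the hashes differ, as the second note in the commit message anticipates, a line-level diff of the two dumps shows what changed. The sketch below uses `difflib` from the standard library; `diff_dumps` is a hypothetical helper, and re-serialising with `sort_keys=True` is a choice made here so that key order alone never shows up as a difference.

```python
import difflib
import json
from pathlib import Path


def diff_dumps(baseline, candidate):
    """Return unified-diff lines between two JSON dump files."""

    def pretty(path):
        # Re-serialise with a fixed layout so only content differs.
        data = json.loads(Path(path).read_text())
        return json.dumps(data, indent=2, sort_keys=True).splitlines()

    return list(
        difflib.unified_diff(
            pretty(baseline),
            pretty(candidate),
            fromfile=str(baseline),
            tofile=str(candidate),
            lineterm="",
        )
    )
```

An empty list means the two dumps carry the same content; otherwise printing the lines gives output in the familiar `diff -u` style.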
@machshev machshev force-pushed the deploy_object_dump branch from 61776ff to 591cbd3 Compare October 7, 2025 10:09
@hcallahan-lowrisc hcallahan-lowrisc left a comment

I will be using this to aid my own debugging, I am sure. Thanks a lot @machshev :)

The only thing I would say is that it would be nice to add this feature to the documentation somewhere so developers can discover it.

machshev commented Oct 8, 2025

Thanks @hcallahan-lowrisc 👍

Yes documentation would be good... right now there isn't a good place for this kind of thing.

I'm wondering about setting up sphinx to generate API docs? Then we could add this kind of thing in a debugging/developer page. Some architecture diagrams wouldn't go amiss either.

Vaguely recollect seeing something like this in the main opentitan docs. However if we have to maintain it there as well as here it's going to diverge. Might be worth migrating at least the bits that make sense?

@machshev machshev merged commit f682788 into lowRISC:master Oct 8, 2025
6 checks passed
@hcallahan-lowrisc

> Thanks @hcallahan-lowrisc 👍
>
> I'm wondering about setting up sphinx to generate API docs? Then we could add this kind of thing in a debugging/developer page. Some architecture diagrams wouldn't go amiss either.

Yes, yes, yes. All the yesses. API docs would be amazing.

> Vaguely recollect seeing something like this in the main opentitan docs. However if we have to maintain it there as well as here it's going to diverge. Might be worth migrating at least the bits that make sense?

Good point, I think there are still some dvsim-specific markdown docs in the OT repo. We should definitely migrate them.
