
Metrics dependency on deployment patterns #46

Closed
dgrenner opened this issue Dec 9, 2020 · 6 comments
Labels
priority: p3 (Desirable enhancement or fix. May not be included in next release.)
type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)

Comments

@dgrenner (Contributor) commented Dec 9, 2020

The statistics work as they should if each merge to the main branch is deployed to production separately. If there are multiple merges to the main branch before the deploy to production, either the lead time or the deployment frequency will be incorrect.

Option 1: Create a deployment event for each merge to the main branch. This includes all commits in the lead time, but generates a higher number of deployment events.
Option 2: Create one deployment event for the last merge to the main branch. This only includes the commits from the last merge in the lead time calculation, but generates the correct number of deployment events.

Example to hopefully make it more clear (using option 2):
Create branch a, make 2 commits, merge into master. Create branch b, make 2 commits, merge into master.

Git log afterwards:

git log --decorate=no --date-order --reverse --pretty=oneline

6bd52a46ab919384b043a40750e0aed8e0e0d43b Branch a, commit 1
5a93561896c7c04758df9fe05eaaa4f7154e53f6 Branch a, commit 2
807b8acfc1007e544941944df182d10e6f9f52fd Merge pull request #3 from org-name/branch-a
964198b26178b4203f14a77c77080af70a445750 B - 1
bb13c56e5e9b214682c28647adc787ff061183e2 B - 2
21e706a11d41e467fe055dbdb5fe21a609427a20 Merge pull request #4 from org-name/branch-b

Create a deployment with the GitHub API for the last merge:

curl -u $GITHUB_USER:$GITHUB_TOKEN \
  -X POST \
  -H "Accept: application/vnd.github.v3+json" \
  https://$HOSTNAME/api/v3/repos/org-name/test-four-keys/deployments \
  -d '{"ref":"21e706a11d41e467fe055dbdb5fe21a609427a20"}'

curl -u $GITHUB_USER:$GITHUB_TOKEN \
  -X POST \
  -H "Accept: application/vnd.github.v3+json" \
  https://$HOSTNAME/api/v3/repos/org-name/test-four-keys/deployments/8/statuses \
  -d '{"state":"success"}'

Resulting BigQuery contents in the deployments table:

  {
    "source": "github",
    "deploy_id": "8",
    "time_created": "2020-12-09 11:44:48 UTC",
    "repository": "org-name/test-four-keys",
    "changes": [
      "21e706a11d41e467fe055dbdb5fe21a609427a20",
      "964198b26178b4203f14a77c77080af70a445750",
      "bb13c56e5e9b214682c28647adc787ff061183e2"
    ]
  }

Note that the commits related to branch a are not included.

In this case only the changes on branch b will be included in the lead time dashboard, but deployment frequency will show 1 deployment. If another deployment event had been created for the merge commit of branch a, the frequency would be too high.
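
For contrast, a minimal sketch of option 1, creating one deployment (plus a success status) per merge commit. It reuses the endpoints from the curl calls above; the Python requests library and the loop are my own illustration, not part of Four Keys:

import os
import requests

# Same inputs as the curl examples above: HOSTNAME, GITHUB_USER and
# GITHUB_TOKEN come from the environment; repo and shas from the git log.
base = f"https://{os.environ['HOSTNAME']}/api/v3/repos/org-name/test-four-keys"
auth = (os.environ["GITHUB_USER"], os.environ["GITHUB_TOKEN"])
headers = {"Accept": "application/vnd.github.v3+json"}

merge_shas = [
    "807b8acfc1007e544941944df182d10e6f9f52fd",  # merge of branch-a
    "21e706a11d41e467fe055dbdb5fe21a609427a20",  # merge of branch-b
]
for sha in merge_shas:
    # Create one deployment per merge to the main branch...
    dep = requests.post(f"{base}/deployments", auth=auth, headers=headers,
                        json={"ref": sha}).json()
    # ...and mark it successful so it counts as a deployment event.
    requests.post(f"{base}/deployments/{dep['id']}/statuses", auth=auth,
                  headers=headers, json={"state": "success"})

With this approach both merges would count toward deployment frequency, which is the over-counting described above.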

@dinagraves (Contributor)

Interesting. If we're not continuously deploying, we have to find a way to associate the changes from branch A with the deployment that happened after branch B was merged. I imagine a lot of teams would have this same issue!

Thank you so much for bringing this up! I'd love to play with it. Would you be willing to share a CSV of your events_raw table (with all sensitive data stripped out)?

@dgrenner (Contributor, Author)

I think there are two ways it can be solved:

  1. The push event contains before, so it is possible to walk backwards until you find the previous sha that was deployed to the main branch (see the sketch after this list). That could work if deployments are only done from a single branch; with deploys from different branches it will most likely not work.
  2. Instead of using GitHub's deployment_status event, send in custom deploy events (similar to the testing setup) with the sha of the main commit, plus a list of additional commits (the structure could be different if it simplifies later steps; maybe a single list with all shas would be better):
    { "deployment": { "id": 8, "updated_at": "2020-12-09T11:44:48Z", "state": "success" "sha": "21e706a11d41e467fe055dbdb5fe21a609427a20", "additional_sha":[ "807b8acfc1007e544941944df182d10e6f9f52fd" ] } }

CSV of events_raw table:
events_raw_test-four-keys.zip

@jgrumboe

This issue is very interesting, because I think we have a similar problem.
I would probably also vote for a custom deploy event; this would make it more flexible for integrating other deployment solutions. For example, in our setup we do Heroku deployments for each PR, but the production deployment is done via Spinnaker to AWS.
The issue with this is that we get events for each PR, and our change lead time comes out incredibly low (because Heroku deploys every PR to a preview stage), but that's not the truth.
A custom deployment event would give us the possibility to trigger it exactly where the "real" prod deployment is happening.

@dinagraves (Contributor)

I think we can add support for custom deployments, but I do worry that it puts a lot of work on the developer to track and include all the changes associated with the deployment! I think it might be worth implementing both ideas -- working backwards with the before field to find all the associated changes, and adding the custom deployment support.

Also, for excluding deployments to staging, I would adjust the SQL script in your BQ project to filter them out.
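
As a minimal sketch of that filter, assuming the events_raw table keeps the raw webhook payload in a JSON metadata column (as in the Four Keys setup) and that staging deploys are tagged via GitHub's deployment environment field, both of which you would want to verify against your own data:

from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT *
    FROM four_keys.events_raw
    WHERE event_type = 'deployment_status'
      -- keep only deployments whose GitHub environment is production
      AND JSON_EXTRACT_SCALAR(metadata, '$.deployment.environment') = 'production'
"""
rows = client.query(sql).result()  # feed this into the deployments view instead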

@donmccasland added the priority: p3 and type: feature request labels on Feb 4, 2021
@StevePJ-Sainsburys

Interesting! We have been talking about this recently as we do not do continuous delivery, and we are still not sure whether we should be including each merge in the lead time/deployment metrics. At the moment we think the existing behaviour of only including branch b works best, but we are very happy to be corrected!

Our thoughts at the moment are that this is batching, which is exactly what deployment frequency is capturing: lots of changes but one prod deployment means a lower deployment frequency. So only including the batch of a + b as 1 deployment works well.

In terms of lead time, we think it's more about the interpretation of 'time since first commit' and 'time to prod'. Following Accelerate, we interpret it as measuring the predictable delivery process, where we can gain efficiency with better automation. For us, if we measure from merging b to the time merge_b is in prod, we are measuring our capability to deliver changes to customers: it's the length of a pipeline, or the time it takes for manual testing if changes are stuck in a UAT environment.

We may lose the time taken for merge_a to reach prod, but that is variable. It could be because a bug became apparent in E2E tests and required a patch, or because a feature wasn't complete but a pair wanted their changes on the trunk to avoid merge conflicts. If we capture that time, there will be a lot of variance and the trend will reflect ways of working more than capabilities. If we find ourselves batching a lot, then this will come up in the deployment frequency metric. If changes get blocked a lot (and don't require a patch or aren't part of a batch), then it will flag in our lead time metric.

There is a talk with Nicole Forsgren where this is touched on, and she even argues lead time could be the time from commit to preprod, when batching is desired.

It's very interesting how much variance there is online about when to count something as a deployment or what to measure for lead time; it adds a whole other layer of complexity when trying to generalise it.

@dinagraves (Contributor)

Thank you Steve! Please feel free to join our fourkeys channel on the GCP slack: https://join.slack.com/t/googlecloud-community/shared_invite/zt-m973j990-IMij2Xh8qKPu7SaHfOcCFg. I'm also doing a panel on Wednesday if you'd like to join! https://www.crowdcast.io/e/356fkadm/register

Whenever we can, we do like to track the first commit because we find that the longer a commit stays in the deployment pipeline, the more likely we are to see incidents and bugs show up. Also, first commit to prod is a predictive measurement for higher job satisfaction, lower burnout, etc.

But you're absolutely right that deployment frequency will capture this batching behavior! This is part of the reason why we have 4 metrics!

"If we capture that time, there will be a lot of variance and the trend will not match capabilities as much as ways of working." I agree and this is why we use medians, to better handle the variance! This is also why I like to use every commit to deploy, not just the first commit of a PR. If we look at the example below, if we use first commits only, the lead time appears higher than if we look at all commits. Measuring lead time in this way is more resistant to anomalies and outliers.
[image: chart comparing lead time measured from first commits only vs. from all commits]
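
To make that concrete, a small hypothetical calculation (the timestamps and the 25-day gap are invented to match the discussion, not read from the image):

from datetime import datetime
from statistics import median

deploy = datetime(2020, 12, 9)
commits = {  # hypothetical commit timestamps for two PRs, A and B
    "A1": datetime(2020, 12, 7), "A2": datetime(2020, 12, 8),
    "B1": datetime(2020, 11, 14), "B2": datetime(2020, 12, 8),
}
all_commits = [(deploy - t).days for t in commits.values()]  # [2, 1, 25, 1]
first_only = [(deploy - commits["A1"]).days,
              (deploy - commits["B1"]).days]                 # [2, 25]

print(median(all_commits))  # 1.5 days: the 25-day outlier is damped
print(median(first_only))   # 13.5 days: the outlier dominates

Measured from every commit, the median lead time stays low despite B1's 25-day-old commit; measured from first commits only, that single old commit drags the median up.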

However, not everyone agrees with this method. I know many teams that prefer first commit and first commit only. One PM expressed that he wanted to see the first commit method because it better captured how the developers were working, which was important to him. Why did the developer in B make one change and then do no more work on it for 25 days? How can we improve our prioritization so that this doesn't happen again? Or maybe it's a design or architecture problem? If we consider that our goal is to better serve our customers, then having this insight into the way our developers are working is very useful.

All that being said, if you are using these metrics to improve your own team's performance, the most important thing is that you consistently use the same metrics definitions and methodology. I like to compare this to using the bathroom scale vs the scale at the gym! It doesn't really matter if my bathroom scale is 5 lbs off, as long as it's the scale I use to measure my improvements -- it'll still be relatively correct. But if I go to the gym and compare that to my bathroom scale, then we have a problem!

I would be a little careful about the pre-prod environment argument. Obviously Dr. Forsgren is correct and it's important to acknowledge that one cannot always be deploying to certain environments (eg app stores), in which case it is completely acceptable to use a pre-prod environment. However, we want to be very mindful that we don't use this one example to give ourselves the leeway to count our dev or QA environments as "prod."

I think the important thing is to remember that the goal is to improve performance, not to improve metrics. The metrics are like a diagnostic tool -- they help us understand where we can improve and how far we've come. When we focus just on improving metrics, it can be very tempting to redefine the "prod" environment and "lead time" in ways that artificially inflate our numbers. If we hide our flaws in this way, we miss the opportunity to improve.

But again! If you define your "deployments", "changes", and "incidents" in a way that feels best aligned to your business and your operations, if you use that as your scale consistently, then you're 99% of the way there, and these little details are just academic.
