Metrics dependency on deployment patterns #46
Interesting. If we're not continuously deploying, we have to find a way to associate the changes from branch A with the deployment that happened after branch B was merged. I imagine a lot of teams would have this same issue! Thank you so much for bringing this up! I'd love to play with it. Would you be willing to share a CSV of your events_raw table (with all sensitive data stripped out)?
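To sketch what that association could look like: every change gets attributed to the first deployment at or after it, so a batch of merges that ships in one deploy all maps to that single deployment. The table and column names below follow the usual Four Keys shape (`changes` with `change_id` and `time_created`; `deployments` with `time_created`), but they are assumptions here, not a verified schema:

```sql
-- "Working backwards": attribute each change to the earliest deployment
-- that happened at or after the change landed.
SELECT
  c.change_id,
  MIN(d.time_created) AS deployed_at   -- first deployment after the change
FROM four_keys.changes c
JOIN four_keys.deployments d
  ON d.time_created >= c.time_created  -- non-equi join over deploy times
GROUP BY c.change_id;
```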
I think there are two ways it can be solved; see the two options described below.
CSV of events_raw table:
This issue is very interesting, because I think we have a similar problem.
I think we can add support for custom deployments, but I do worry that it puts a lot of work on the developer to track and include all the changes associated with the deployment! I think it might be worth implementing both ideas -- supporting custom deployments as well as working backwards from the deployment event to collect the associated changes.

Also, for excluding deployments to staging, I would adjust the SQL script in your BQ project to filter them out.
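For the staging filter, something along these lines might work -- assuming the GitHub deployment payload is stored as a JSON string in a `metadata` column of `events_raw` (the column name and JSON path here are assumptions):

```sql
-- Keep only production deployments when deriving metrics; staging
-- deployments are dropped. Column names and JSON path are assumed.
SELECT *
FROM four_keys.events_raw
WHERE event_type = 'deployment'
  AND JSON_EXTRACT_SCALAR(metadata, '$.deployment.environment') = 'production';
```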
Interesting! We have been talking about this recently as we do not do continuous delivery, and we are still really not sure if we should be including each merge in the lead time/deployment metrics. At the moment we think the existing behaviour of only including branch b works best, but we are very happy to be corrected!

Our thoughts at the moment are that this is batching, which is what the deployment frequency is capturing, i.e. lots of changes but one prod deployment means deployment frequency is lower. So only including the batch of a + b as 1 deployment works well.

In terms of lead time, we think it's more about the interpretation of 'time since first commit' and 'time to prod'. From Accelerate, we interpret it as measuring the predictable delivery process, where we can gain efficiency with better automation. For us, if we measure from merging b to the time merge_b is in prod, we are measuring our capability to deliver changes to customers. It's the length of a pipeline, or the time it takes for manual testing if stuck in a UAT environment. We may lose the time taken for merge_a to reach prod, but that is variable. It could be because a bug became apparent in E2E tests requiring a patch, or because a feature isn't complete but a pair want their changes on the trunk to avoid merge conflicts. If we capture that time, there will be a lot of variance and the trend will not match capabilities so much as ways of working.

If we find ourselves batching a lot, then this will come up in the deployment frequency metric. If changes get blocked a lot (and don't require a patch or aren't part of a batch), then it will flag in our lead time metric.

There is a talk with Nicole Forsgren where this is touched on, and she even argues lead time could be the time from commit to preprod, when batching is desired. It's very interesting how much variance there is online about when to include something as a deployment or what to measure for lead time; it adds a whole other layer of complexity when trying to generalise it.
Thank you Steve! Please feel free to join our fourkeys channel on the GCP slack: https://join.slack.com/t/googlecloud-community/shared_invite/zt-m973j990-IMij2Xh8qKPu7SaHfOcCFg. I'm also doing a panel on Wednesday if you'd like to join! https://www.crowdcast.io/e/356fkadm/register

Whenever we can, we do like to track the first commit, because we find that the longer a commit stays in the deployment pipeline, the more likely we are to see incidents and bugs show up. Also, first commit to prod is a predictive measurement for higher job satisfaction, lower burnout, etc. But you're absolutely right that deployment frequency will capture this batching behavior! This is part of the reason why we have 4 metrics!

"If we capture that time, there will be a lot of variance and the trend will not match capabilities as much as ways of working."

I agree, and this is why we use medians, to better handle the variance! This is also why I like to use every commit to deploy, not just the first commit of a PR. If we look at the example below, if we use first commits only, the lead time appears higher than if we look at all commits. Measuring lead time in this way is more resistant to anomalies and outliers.

However, not everyone agrees with this method. I know many teams that prefer first commit and first commit only. One PM expressed that he wanted to see the first commit method b/c it better captured how the developers were working, which was important to him. Why did the developer in B make one change and then do no more work on it for 25 days? How can we improve our prioritization so that this doesn't happen again? Or maybe it's a design or architecture problem? If we consider that our goal is to better serve our customers, then having this insight into the way our developers are working is very useful.

All that being said, if you are using these metrics to improve your own team's performance, the most important thing is that you consistently use the same metrics definitions and methodology. I like to compare this to using the bathroom scale vs the scale at the gym! It doesn't really matter if my bathroom scale is 5 lbs off, as long as it's the scale I use to measure my improvements -- it'll still be relatively correct. But if I go to the gym and compare that to my bathroom scale, then we have a problem!

I would be a little careful about the pre-prod environment argument. Obviously Dr. Forsgren is correct, and it's important to acknowledge that one cannot always be deploying to certain environments (e.g. app stores), in which case it is completely acceptable to use a pre-prod environment. However, we want to be very mindful that we don't use this one example to give ourselves the leeway to count our dev or QA environments as "prod."

I think the important thing is to remember that the goal is to improve performance, not to improve metrics. The metrics are like a diagnostic tool -- they help us understand where we can improve and how far we've come. When we focus just on improving metrics, it can be very tempting to redefine the "prod" environment and "lead time" in ways that artificially inflate our numbers. If we hide our flaws in this way, we miss the opportunity to improve.

But again! If you define your "deployments", "changes", and "incidents" in a way that feels best aligned to your business and your operations, and if you use that as your scale consistently, then you're 99% of the way there, and these little details are just academic.
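To make the first-commit vs. all-commits comparison concrete, here is a rough sketch of both median calculations in BigQuery SQL. It assumes the usual Four Keys shape (a `deployments` table with `deploy_id`, `time_created`, and a `changes` array of commit ids, and a `changes` table with `change_id` and `time_created`); these names are assumptions, not this repo's verified schema:

```sql
-- Median lead time over every commit in every deployment: one sample
-- per commit, so a single stale commit can't dominate the median.
SELECT
  APPROX_QUANTILES(
    TIMESTAMP_DIFF(d.time_created, c.time_created, HOUR), 100)[OFFSET(50)]
    AS median_lead_time_hours
FROM four_keys.deployments d,
  UNNEST(d.changes) AS cid
JOIN four_keys.changes c
  ON c.change_id = cid;

-- Median lead time using only the first (oldest) commit per deployment:
-- one sample per deployment, so a long-idle first commit pulls it up.
SELECT
  APPROX_QUANTILES(lead_time_hours, 100)[OFFSET(50)] AS median_lead_time_hours
FROM (
  SELECT
    d.deploy_id,
    TIMESTAMP_DIFF(d.time_created, MIN(c.time_created), HOUR) AS lead_time_hours
  FROM four_keys.deployments d,
    UNNEST(d.changes) AS cid
  JOIN four_keys.changes c
    ON c.change_id = cid
  GROUP BY d.deploy_id, d.time_created
) AS per_deploy;
```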
The statistics work as they should if each merge to the main branch is deployed to production separately. If there are multiple merges to the main branch before the deploy to production, either the lead time or the deployment frequency will be incorrect.
Option 1: Create a deployment event for each merge to the main branch. This will include all commits in the lead time calculation, but will generate a higher number of deployment events than actual deployments.
Option 2: Create one deployment event for the last merge to the main branch. This will only include the commits of the last merge in the lead time calculation, but will generate the correct number of deployment events.
An example to hopefully make this clearer (using option 2):
Create branch a, make 2 commits, merge into master. Create branch b, make 2 commits, merge into master.
Git log afterwards:
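An illustrative `git log --graph --oneline` for this sequence (the commit hashes here are made up):

```
*   f3e9d2a (HEAD -> master) Merge branch 'b'
|\
| * 7c1b4e0 Commit b2
| * 5a8d910 Commit b1
|/
*   d4c2f77 Merge branch 'a'
|\
| * 9b0e3c5 Commit a2
| * 2f6a1d8 Commit a1
|/
* 1a2b3c4 Initial commit
```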
Create deployment with GitHub API for last merge:
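For example, with GitHub's deployments endpoint (OWNER/REPO is a placeholder, and f3e9d2a is the made-up merge commit of branch b from the log above):

```
curl -X POST \
  -H "Authorization: token $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github.v3+json" \
  https://api.github.com/repos/OWNER/REPO/deployments \
  -d '{"ref": "f3e9d2a", "environment": "production"}'
```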
Resulting BigQuery contents in deployments table:
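Illustratively, continuing the made-up example (the column names are assumed from the Four Keys deployments table):

| deploy_id | time_created | changes |
| --- | --- | --- |
| f3e9d2a | 2021-03-05 10:00:00 UTC | 7c1b4e0, 5a8d910 |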
Note that the commits related to branch a are not included.
In this case only the changes on branch b will be included in the lead time dashboard, but deployment frequency will show 1 deployment. If another deployment event had been created for the merge commit of branch a, the frequency would be too high.