Release pipeline unreliability - improvements needed #3194

Closed
planetf1 opened this issue Jun 12, 2020 · 7 comments

@planetf1
Member

planetf1 commented Jun 12, 2020

Over a number of releases we have hit problems with the current release pipelines which need further investigation:

  • Missing artifacts in bintray

In #3132 we observed that two artifacts (including 'org.odpi.egeria:rex-view') were not present at the 1.8 level in bintray or JCenter, and were therefore not synced to maven central. They were present in other artifactory repos, including egeria-release-local.

No errors were shown in the logs that I could see. We need to understand how this happened, and whether we need explicit artifact-by-artifact checks to ensure the artifactory->bintray step has completed correctly, beyond trusting the jfrog cli.
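
An artifact-by-artifact check of this kind could be sketched roughly as below. This is an illustration only, not part of the existing pipeline: it assumes the repository being checked serves the standard Maven directory layout over HTTP, and the names `pom_url`, `missing_artifacts` and `http_exists`, along with any base URL passed in, are hypothetical.

```python
import urllib.request
from typing import Callable, Iterable, List, Tuple

# (groupId, artifactId, version)
Coordinate = Tuple[str, str, str]


def pom_url(base_url: str, coord: Coordinate) -> str:
    """Build the POM URL for a coordinate using the standard Maven layout."""
    group, artifact, version = coord
    group_path = group.replace(".", "/")
    return (
        f"{base_url.rstrip('/')}/{group_path}/{artifact}/{version}/"
        f"{artifact}-{version}.pom"
    )


def missing_artifacts(
    expected: Iterable[Coordinate],
    base_url: str,
    exists: Callable[[str], bool],
) -> List[Coordinate]:
    """Return every expected coordinate whose POM is not reported as present."""
    return [c for c in expected if not exists(pom_url(base_url, c))]


def http_exists(url: str) -> bool:
    """Assumed presence check: a HEAD request that answers 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors such as 404 as well as network failures.
        return False
```

A pipeline step could call `missing_artifacts(expected, base_url, http_exists)` after each hop and fail the job when the returned list is not empty, instead of relying on the absence of errors in the log.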

  • Whenever a new maven artifact is added we need to explicitly add to JCenter

There seems to be no way to programmatically add an artifact to JCenter. It would be useful if we could either automate this or invoke some kind of stewardship task to get a person to add it.

On occasion this may fail due to pom structural issues -- and in these cases we need to be able to build that artifact from the release git branch again and push through the process.

(originally raised in #2917 (IT-19687) - feel free to reopen, but tracking here only for now)

(Current workaround is a local/manual build and upload)

  • Syncing from JC times out

The syncing to maven central can take many hours. A pipeline job can only last 6 hours before it times out and is killed. The last release ended up taking 4-5 iterations purely due to time, and another 3 or so to address anomalies.

We need to automatically restart the job in some way, perhaps through an additional job that monitors the release. It should be able to resume from the right place. I can't see maven central performance changing any time soon, and we still want to publish there.

(originally raised in #2704 (IT-19687) - feel free to reopen, but tracking here only for now)
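
One way to let a restarted job resume from the right place is to record each completed artifact in a small state file and stop cleanly before the job limit is reached. The sketch below is only a possible shape, not the current pipeline: `resumable_sync`, `load_done`, the JSON state file and the `sync_one` callable are all hypothetical, and the state file is assumed to survive between runs.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterable, List, Set


def load_done(state_file: Path) -> Set[str]:
    """Read the set of artifacts already synced by earlier runs."""
    if state_file.exists():
        return set(json.loads(state_file.read_text()))
    return set()


def resumable_sync(
    artifacts: Iterable[str],
    sync_one: Callable[[str], bool],
    state_file: Path,
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Sync artifacts not yet recorded as done; return those still pending."""
    done = load_done(state_file)
    deadline = clock() + budget_seconds
    remaining: List[str] = []
    for artifact in artifacts:
        if artifact in done:
            continue  # finished in a previous run
        # Once the time budget is spent, sync_one is no longer called.
        if clock() >= deadline or not sync_one(artifact):
            remaining.append(artifact)
            continue
        done.add(artifact)
        # Persist after every success so a killed job loses no progress.
        state_file.write_text(json.dumps(sorted(done)))
    return remaining
```

With a budget a little under the 6 hour limit, a monitoring job could simply re-run this until the returned list is empty.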

  • Maven Central has outages

The JC->maven central job can also fail to sync artifacts due to outages at maven central. As above, we need to be able to recover from this.

  • Release isn't transactional

Currently we seem to sync artifact by artifact from artifactory->bintray->JCenter->maven central. Whilst the earlier 2 steps are fairly quick, the last in particular takes a long time (as above). This means a new egeria release drips out over many days. This can cause users issues as they see a new release (or use 'latest') and it is incomplete.

We need to figure out how to make this transactional, i.e. all or nothing.
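
As a rough illustration of what all or nothing could mean here, the sketch below stages and verifies every artifact before publishing any of them, and discards everything staged if a step fails. It is a generic pattern rather than a bintray or maven central API: `stage`, `verify`, `publish` and `discard` are hypothetical callables that the pipeline would have to supply.

```python
from typing import Callable, Iterable, List


class ReleaseAborted(Exception):
    """Raised when the release is abandoned before anything is published."""


def release_all_or_nothing(
    artifacts: Iterable[str],
    stage: Callable[[str], None],
    verify: Callable[[str], bool],
    publish: Callable[[str], None],
    discard: Callable[[str], None],
) -> List[str]:
    """Publish only if every artifact was staged and verified first."""
    staged: List[str] = []
    try:
        for artifact in artifacts:
            stage(artifact)
            staged.append(artifact)
        for artifact in staged:
            if not verify(artifact):
                raise ReleaseAborted(f"verification failed for {artifact}")
    except Exception:
        # Nothing has been published yet, so undoing the staging is enough.
        for artifact in staged:
            discard(artifact)
        raise
    for artifact in staged:
        publish(artifact)
    return staged
```

This only narrows the window: the final publish loop is still not atomic, so a true transaction would need the target repository to promote the whole set in one operation.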

  • We don't typically pick up any issues until release time

Since the pipelines are only run monthly, we often don't see issues until too late.

We need to consider how to make it easier to run these pipelines as a test, perhaps publishing to a different version tag or package name but keeping as close to the normal process as possible, maybe utilizing snapshots. The objective is to spot issues sooner.

@planetf1 planetf1 added cicd release (Work to create a new release) labels Jun 12, 2020
@planetf1
Member Author

Updated https://jira.linuxfoundation.org/servicedesk/customer/portal/2/IT-19687 with link to issues here

@planetf1 planetf1 mentioned this issue Jun 12, 2020
16 tasks
@planetf1
Member Author

planetf1 commented Jul 2, 2020

When running the 2.0 pipeline, surprisingly the release pipeline finished without error.

However, on inspection the sync had failed, for example:

2020-07-01T16:15:06.5935329Z [command]/bin/bash --noprofile --norc /home/vsts/work/_temp/f2b8bfca-b2cb-4243-bfe6-e58464d6cf40.sh
2020-07-01T16:15:06.6018613Z STARTING - Sync access-services:2.0
2020-07-01T16:15:06.7999816Z RESULT - Sync access-services:2.0 = {"message":"Version '2.0' was not found"}
2020-07-01T16:15:06.8021440Z STARTING - Sync access-services-fvt:2.0
2020-07-01T16:15:07.0121468Z RESULT - Sync access-services-fvt:2.0 = {"message":"Version '2.0' was not found"}
2020-07-01T16:15:07.0141333Z STARTING - Sync access-services-samples:2.0
2020-07-01T16:19:55.6504477Z RESULT - Sync access-services-samples:2.0 = {"status":"Validation Failed","messages":"[Failed to close repository: orgodpi-4081., Dropping existing partial staging repository.]"}
2020-07-01T16:19:55.6525164Z STARTING - Sync adapters:2.0
2020-07-01T16:19:55.7815634Z RESULT - Sync adapters:2.0 = {"message":"Version '2.0' was not found"}
2020-07-01T16:19:55.7840143Z STARTING - Sync ***-services:2.0
2020-07-01T16:19:55.9619449Z RESULT - Sync ***-services:2.0 = {"message":"Version '2.0' was not found"}
2020-07-01T16:19:55.9642375Z STARTING - Sync ***-services-api:2.0
2020-07-01T16:27:20.3623700Z RESULT - Sync ***-services-api:2.0 = {"status":"Validation Failed","messages":"[Failed to close repository: orgodpi-4082., Dropping existing partial staging repository.]"}
2020-07-01T16:27:20.3649265Z STARTING - Sync ***-services-client:2.0
2020-07-01T16:27:20.7612173Z RESULT - Sync ***-services-client:2.0 = {"message":"Version '2.0' was not found"}
2020-07-01T16:27:20.7649623Z STARTING - Sync ***-services-config-metadata-server-sample:2.0
2020-07-01T16:27:20.8901847Z RESULT - Sync ***-services-config-metadata-server-sample:2.0 = {"message":"Version '2.0' was not found"}
2020-07-01T16:27:20.8919201Z STARTING - Sync ***-services-registration:2.0
2020-07-01T16:32:24.0262650Z RESULT - Sync ***-services-registration:2.0 = {"status":"Validation Failed","messages":"[Failed to close repository: orgodpi-4083., Dropping existing partial staging repository.]"}
2020-07-01T16:32:24.0288041Z STARTING - Sync ***-services-samples:2.0
2020-07-01T16:32:24.1938399Z RESULT - Sync ***-services-samples:2.0 = {"message":"Version '2.0' was not found"}
2020-07-01T16:32:24.1959282Z STARTING - Sync ***-services-server:2.0
2020-07-01T16:32:38.2561809Z RESULT - Sync ***-services-server:2.0 = {"status":"Sync Failed","messages":"[Failed to close repository: orgodpi-4083. Server response:\n <nexus-error>\n  <errors>\n    <error>\n      <id>*<\u002fid>\n      <msg>Unhandled: Staging repository is already transitioning: orgodpi-4083<\u002fmsg>\n    <\u002ferror>\n  <\u002ferrors>\n<\u002fnexus-error>, Dropping existing partial staging repository.]"}
2020-07-01T16:32:38.2590350Z STARTING - Sync ***-services-spring:2.0
2020-07-01T16:36:57.0753612Z RESULT - Sync ***-services-spring:2.0 = {"status":"Sync Failed","messages":"[Could not sync artifact ***-services-spring-2.0.pom.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0-sources.jar. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0-javadoc.jar. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0.jar.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0.pom. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0.jar.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0.jar. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0-sources.jar.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0-sources.jar.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0.pom.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0-javadoc.jar.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0-sources.jar.md5. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0-javadoc.jar.md5. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0.pom.md5. Server response:\n Method not allowed during maintenance, Could not sync artifact ***-services-spring-2.0.jar.md5. Server response:\n Method not allowed during maintenance]"}
2020-07-01T16:36:57.0758997Z STARTING - Sync analytics-modeling:2.0
2020-07-01T16:36:58.2152373Z RESULT - Sync analytics-modeling:2.0 = {"status":"Sync Failed","messages":"[Could not sync artifact analytics-modeling-2.0.pom.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-2.0.pom. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-2.0.pom.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-2.0.pom.md5. Server response:\n Method not allowed during maintenance]"}
2020-07-01T16:36:58.2176204Z STARTING - Sync analytics-modeling-api:2.0
2020-07-01T16:37:03.9654822Z RESULT - Sync analytics-modeling-api:2.0 = {"status":"Sync Failed","messages":"[Could not sync artifact analytics-modeling-api-2.0.jar.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-sources.jar.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-javadoc.jar.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-tests.jar. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-sources.jar. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0.pom.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0.pom. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-tests.jar.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-javadoc.jar.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-javadoc.jar. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0.jar. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0.pom.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-sources.jar.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-tests.jar.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0.jar.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-tests.jar.md5. 
Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-sources.jar.md5. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0.pom.md5. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0-javadoc.jar.md5. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-api-2.0.jar.md5. Server response:\n Method not allowed during maintenance]"}
2020-07-01T16:37:03.9672457Z STARTING - Sync analytics-modeling-client:2.0
2020-07-01T16:37:04.2162835Z RESULT - Sync analytics-modeling-client:2.0 = {"message":"In order to sync to Maven Central your package must be included in the JCenter repository"}
2020-07-01T16:37:04.2185893Z STARTING - Sync analytics-modeling-server:2.0
2020-07-01T16:37:06.6061042Z RESULT - Sync analytics-modeling-server:2.0 = {"status":"Sync Failed","messages":"[Could not sync artifact analytics-modeling-server-2.0.jar.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0-sources.jar.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0.pom. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0.jar. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0-javadoc.jar.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0-sources.jar.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0.pom.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0.pom.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0.jar.asc. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0-javadoc.jar. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0-sources.jar. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0-javadoc.jar.sha1. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0.pom.md5. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0.jar.md5. Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0-javadoc.jar.md5. 
Server response:\n Method not allowed during maintenance, Could not sync artifact analytics-modeling-server-2.0-sources.jar.md5. Server response:\n Method not allowed during maintenance]"}
2020-07-01T16:37:06.6076758Z STARTING - Sync analytics-modeling-spring:2.0
2020-07-01T16:37:06.7363799Z RESULT - Sync analytics-modeling-spring:2.0 = {"

It's not clear how the staging repo issue happens. This was the first run of the job, and since that is all handled by jfrog it looks to be an issue at their end.

Maintenance is common with maven central - we need to detect it, or at least verify at the end of the job and iterate.

(kicked off job again manually for this release, but process needs improving)
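
Verifying at the end of the job could start from the log itself. The sketch below parses `RESULT - Sync` lines in the format shown above and groups them by the failure texts that appear in this log. It is a hedged illustration: the function names and category labels are hypothetical, and because the log above shows no successful result, any response shape it does not recognise (including, presumably, success) is simply labelled `unrecognised`.

```python
import json
import re
from typing import Dict, Iterable, List

# Matches e.g. "... RESULT - Sync access-services:2.0 = {...}"
RESULT_LINE = re.compile(r"RESULT - Sync (?P<artifact>\S+) = (?P<body>\{.*\})\s*$")

# Categories that look transient in the log above and are worth re-running.
RETRYABLE = {"maintenance", "staging"}


def classify(body: dict) -> str:
    """Map one parsed RESULT payload to a category based on its message text."""
    text = f"{body.get('message', '')} {body.get('messages', '')}"
    if "Method not allowed during maintenance" in text:
        return "maintenance"
    if "Failed to close repository" in text:
        return "staging"
    if "was not found" in text:
        return "version-not-found"
    if "must be included in the JCenter repository" in text:
        return "not-in-jcenter"
    return "unrecognised"


def classify_results(log_lines: Iterable[str]) -> Dict[str, str]:
    """Return {artifact: category} for every complete RESULT line."""
    results: Dict[str, str] = {}
    for line in log_lines:
        match = RESULT_LINE.search(line)
        if not match:
            continue  # STARTING lines and truncated results are skipped
        try:
            body = json.loads(match.group("body"))
        except json.JSONDecodeError:
            results[match.group("artifact")] = "unparseable"
            continue
        results[match.group("artifact")] = classify(body)
    return results


def retry_candidates(results: Dict[str, str]) -> List[str]:
    """List the artifacts whose failure category is considered transient."""
    return [name for name, kind in results.items() if kind in RETRYABLE]
```

A final pipeline step could exit with a non-zero status whenever `classify_results` reports any known failure category, so that a run like the one above no longer finishes without error.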

@github-actions

github-actions bot commented Sep 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Sep 1, 2020
@planetf1 planetf1 removed the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Sep 14, 2020
@planetf1
Member Author

planetf1 commented Nov 4, 2020

These problems persist, and make the release process take longer than ideal (a single click/approval).

In addition we have recently switched our PR process over to github actions, and started investigating gradle builds.

So other areas to consider include:

  • Should we move the release process itself over to github actions, perhaps driving the push of maven artifacts from the release create action in github?
  • Can we recode the distribution to maven central to be direct and in a single transaction? The creation of a new staging area for every artifact is slow and unreliable, and coincidentally whenever we do it there seems to be a maven central outage (cause or effect....)
  • We currently ship EVERY build directory as a maven artifact. This is unnecessary - we only need maven artifacts for those components other developers would consume as maven artifacts. Whilst it is technically possible to wrap everything up as such, I'm not convinced, for example, that our nodejs based UI application is a good fit. Nor are all the empty directories which are dummy pom modules used only for the build, rather than being, like our caller package, potentially used for dependencies. Reducing the number of components could speed up the build and improve distribution. Most likely we need to go to gradle to have a better separation between build & package. maven is too focussed on a 1:1 relationship.

@planetf1 planetf1 mentioned this issue Nov 4, 2020
20 tasks
@planetf1 planetf1 self-assigned this Nov 4, 2020
@planetf1
Member Author

planetf1 commented Nov 5, 2020

Opened up a request with the Linux Foundation to see what support they may be able to provide generally, and also specifically in terms of the bintray/JCenter/maven central sync, which is currently the biggest pain point.

https://jira.linuxfoundation.org/plugins/servlet/theme/portal/2/IT-20990

@github-actions

github-actions bot commented Jan 5, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Jan 5, 2021
@planetf1 planetf1 removed the no-issue-activity Issues automatically marked as stale because they have not had recent activity. label Jan 5, 2021
@planetf1
Member Author

planetf1 commented Feb 4, 2021

This is now superseded by #4664, as we (most likely) need to rewrite the pipeline completely.
