Releases: cloudfoundry/diego-release
Diego 0.1431.0
Changes from 0.1430.0 to 0.1431.0
- Depends on garden-linux-release v0.305.0.
Breaking changes
Auction request payloads
Associated with Diego story "The auction should only send resources and identifiers back and forth". This may cause instance downtime during an upgrade from 0.1430.0 and earlier.
Mutual SSL Auth to BBS
Associated with Diego story "All communication with the BBS should be secured via mutually-authenticated SSL". By default, the BBS will now require mutual SSL authentication for access to its API endpoints. If this is enabled, components on an older release will be unable to communicate with the BBS while an update is being deployed, so cells may be unable to evacuate.
To configure the BBS with SSL correctly, it is easiest to use the scripts/generate-bbs-certs script to generate a CA certificate and key and certificates and keys for the BBS server and its clients. The contents of these certificates and client and server keys must then be included in the deployment manifest. If using the spiff-based manifest-generation tooling, these values can be included in the property-overrides.yml stub once and will flow to the BBS server and its clients.
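As a rough illustration of the step above, the following Python sketch reads PEM files from disk and collects them under the BBS server property names listed in these notes. The function name and the file paths passed to it are hypothetical, and the output file names of scripts/generate-bbs-certs are not assumed here. How the values are nested inside property-overrides.yml is also not shown, so the result is kept as a flat mapping keyed by the dotted property names.

```python
from pathlib import Path


def bbs_server_ssl_properties(ca_cert_path, server_cert_path, server_key_path):
    """Collect BBS server SSL properties from PEM files on disk.

    The dotted keys are the property names from these release notes.
    The function itself and its arguments are illustrative only.
    """
    return {
        "diego.bbs.require_ssl": True,
        "diego.bbs.ca_cert": Path(ca_cert_path).read_text(),
        "diego.bbs.server_cert": Path(server_cert_path).read_text(),
        "diego.bbs.server_key": Path(server_key_path).read_text(),
    }
```

The client jobs would need the analogous diego.*.bbs.ca_cert, diego.*.bbs.client_cert, and diego.*.bbs.client_key values, which can be gathered the same way.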
Significant changes
- DesiredLRP data should be split across separate records
- As a BBS client, I can efficiently get frequently accessed data for all DesiredLRPs in a domain
- NSYNC's bulker should fetch the minimal set of DesiredLRP data
- Route-Emitter's bulk loop should fetch the minimal set of DesiredLRP data
- If a migration fails, BOSH aborts the deploy and I should be able to BOSH deploy the previously deployed release and recover.
- If no /version key is present in etcd, the BBS should not run any migrations
- As a Diego developer, I would like to run vizzini as an errand
- As a Diego operator, I can specify a set of decryption keys to use to decrypt data at rest, with the BBS migrating data to the designated active key in the set
- Diego etcd on bosh-lite should default to requiring ssl
- As a Diego operator, I can opt out of the SSH DATs that do not use the plugin
- vizzini test errand runs against BBS with mutual SSL auth enabled
- Provide vizzini job with BBS URL and local consul agent URL
BOSH job changes
- Added vizzini job to run the vizzini test suite as an errand.
BOSH property changes
- Added acceptance_tests.skip_ssh_without_plugin_tests: When true, skip SSH DATs that do not use the SSH plugin.
- Added properties for vizzini job:
  - vizzini.bbs.api_location: Address for vizzini to reach the BBS.
  - vizzini.routable_domain_suffix: Domain to use for vizzini to register routes during the test.
  - vizzini.nodes: Number of tests to run in parallel.
  - vizzini.verbose: Whether to log verbosely during the test run.
- Added BBS encryption properties:
  - diego.bbs.encryption_keys: List of label/passphrase pairs available to the BBS for encryption.
  - diego.bbs.active_key_label: Label of the encryption key to be used to encrypt the database.
- Added BBS mutual SSL auth properties:
  - Properties for BBS server job:
    - diego.bbs.require_ssl: Whether the BBS requires SSL for communication.
    - diego.bbs.ca_cert: CA certificate used to sign BBS client and server SSL certificates.
    - diego.bbs.server_cert: SSL certificate that the BBS presents.
    - diego.bbs.server_key: Private key paired with the BBS's SSL certificate.
  - New BBS properties for client jobs:
    - Properties:
      - diego.*.bbs.ca_cert
      - diego.*.bbs.client_cert
      - diego.*.bbs.client_key
      - diego.*.bbs.require_ssl
    - Jobs:
      - auctioneer
      - converger
      - nsync
      - receptor
      - rep
      - route_emitter
      - ssh_proxy
      - stager
      - tps
      - vizzini
- Changed diego.*.bbs.api_url to diego.*.bbs.api_location for all jobs using the old property.
- Removed etcd communication properties from Diego core jobs:
  - Properties:
    - diego.*.etcd.machines
    - diego.*.etcd.ca_cert
    - diego.*.etcd.client_cert
    - diego.*.etcd.client_key
    - diego.*.etcd.require_ssl
  - Jobs:
    - auctioneer
    - converger
    - receptor
    - rep
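The rename and removals listed above could be applied to an existing manifest along these lines. This is a minimal sketch, assuming the diego property tree has already been loaded into nested Python dictionaries keyed by job name. The function name is hypothetical. The old api_url value is carried over unchanged, which is an assumption: these notes do not say whether the expected value format differs for api_location, so the result should be checked against the job specs.

```python
import copy

# Jobs whose diego.*.etcd.* properties were removed in 0.1431.0,
# as listed in the release notes above.
ETCD_REMOVED_JOBS = ("auctioneer", "converger", "receptor", "rep")


def migrate_diego_properties(diego):
    """Return a copy of the `diego` property tree updated for 0.1431.0."""
    migrated = copy.deepcopy(diego)
    for job, props in migrated.items():
        if not isinstance(props, dict):
            continue
        bbs = props.get("bbs")
        # diego.*.bbs.api_url was renamed to diego.*.bbs.api_location.
        if isinstance(bbs, dict) and "api_url" in bbs:
            bbs["api_location"] = bbs.pop("api_url")
        # diego.*.etcd.* was removed from the listed Diego core jobs.
        if job in ETCD_REMOVED_JOBS:
            props.pop("etcd", None)
    return migrated
```

Working on a deep copy leaves the input untouched, so the old and new property trees can be compared before the manifest is redeployed.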
Diego 0.1430.0
Version 0.1430.0 of Diego is recommended for use with CF v218.
Changes from 0.1428.0 to 0.1430.0
- Depends on garden-linux-release v0.305.0.
Configuration notes
- If upgrading from 0.1428.0 to 0.1430.0, we recommend first deploying 0.1428.0 with the diego.bbs.serialization_format BOSH property set to proto. 0.1430.0 contains a BBS migration that encodes all the data in etcd as protobufs, which the first BBS server that receives the update will run. Setting this property to proto in advance guarantees that the other BBS servers will not accidentally write JSON-encoded records back into etcd before they also update to 0.1430.0. Note that this property is not configurable via the manifest-generation templates in 0.1428.0, but it can be added directly to the properties section of the BOSH manifest.
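Adding the property directly to the manifest might look like the following sketch, which assumes the BOSH manifest has already been parsed into a Python dictionary and that the dotted property name maps onto nested keys under properties. The function name is hypothetical.

```python
def set_proto_serialization(manifest):
    """Set diego.bbs.serialization_format to "proto" in a manifest dict.

    The manifest is modified in place and returned. Missing intermediate
    sections are created; existing sibling properties are left alone.
    """
    properties = manifest.setdefault("properties", {})
    bbs = properties.setdefault("diego", {}).setdefault("bbs", {})
    bbs["serialization_format"] = "proto"
    return manifest
```

The property is removed again in 0.1430.0, so it only belongs in the 0.1428.0 manifest used for the intermediate deploy.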
Significant changes
- cloudfoundry-incubator/diego-release #72: ./scripts/update should fail fast when permission denied
- All CC-Bridge communication should happen directly with the BBS
- All Route-Emitter communication should happen directly with the BBS
- All SSH-Proxy communication should happen directly with the BBS
- All access to the BBS should go through one, master-elected, BBS server
- BBS server should emit metrics, remove the metrics server
- After a BOSH deploy, all data in the BBS should be stored in base64 encoded protobuf format
- If the Rep repeatedly fails to mark its ActualLRPs as EVACUATING it should fail to drain and the BOSH deploy should abort.
- Bump up the converger http timeout to one minute
- Never log environment variables and commands/arguments
- BBS Client should retry requests that fail because the BBS is migrating/lost the lock
- update cflinuxfs2 rootfs to 1.8.0+
- cloudfoundry-incubator/diego-ssh #5: Add -skipCertVerify to ssh-proxy
- The windows app lifecycle bundle should include a dummy diego-sshd executable
- provide user with helpful error message when they push a non-valid app
BOSH job changes
- Remove runtime_metrics_server job: the BBS server now emits Task and LRP metrics during convergence, and periodically emits etcd metrics.
BOSH property changes
- Add diego.nsync.bbs.api_url: Address for the Nsync processes to contact the BBS server.
- Add diego.route_emitter.bbs.api_url: Address for the Route-Emitter to contact the BBS server.
- Add diego.ssh_proxy.bbs.api_url: Address for the SSH-Proxy to contact the BBS server.
- Add diego.ssh_proxy.diego_credentials: Credentials to be used with the Diego authentication method.
- Add diego.tps.bbs.api_url: Address for the TPS processes to contact the BBS server.
- Remove diego.bbs.serialization_format.
- Remove diego.nsync.diego_api_url.
- Remove diego.route_emitter.diego_api_url.
- Remove diego.ssh_proxy.diego_api_url.
- Remove diego.tps.diego_api_url.
Diego 0.1429.0
Create final release 0.1429.0
Diego 0.1428.0
Version 0.1428.0 of Diego is recommended for use with CF v217.
Changes from 0.1412.0 to 0.1428.0
- garden-linux-release is no longer a submodule of diego-release, therefore is not bundled with Diego and needs to be uploaded to BOSH separately.
- Depends on garden-linux-release v0.303.0.
Breaking changes
- As a Diego operator, I can colocate all the diego-release jobs onto a single VM: This changes the default ports of some of the Diego components, meaning that they will no longer be reachable on the previous ports. Specifically, the default port for the receptor component changed from 8888 to 8887.
- Refactor error handling in the BBS: This changes how errors are serialized internally and may cause inter-component communication to fail unexpectedly during a deploy. No major outage is predicted for this change.
- As a Diego operator, I can run the static file-server and the cc-uploader server on separate VMs: This introduces a new component, cc-uploader, split out of the file-server. This change may cause staging failures during the deploy until at least one of the cc-uploaders is up and running.
- Make BBS API consistently RPC-ful: This is a reorganization of the internal BBS API, changing every endpoint from a REST-like interface to an RPC-like interface. During the deploy, it will cause requests to fail, and may cause LRPs to lose routability temporarily.
Other significant changes
- Consul agents and servers communicate securely to the consul cluster
- DATs Java buildpack test should not specify the buildpack explicitly
- Update cflinuxfs2 stack to 1.4.0+
- The BBS server handles calling the task completion callback URL
- Remove extra disk allocation for staging tasks
- Diego should provide the appropriate instance environment variables to its containers
- As a CF app developer, when staging an app on Diego fails, I would like to see a specific error type depending on the type of failure
- Fix instance env var names
- As a Diego operator, I can configure garden's allow_host_access flag through the manifest-generation templates
- Update cflinuxfs2 rootfs to 1.5.0+
- clean up and integration test the Task BBS
- Clean up Task BBS code from runtime-schema
- Rename 'internal' packages to be compatible with Go 1.5 package conventions
- As a Diego operator, I want always to allow 'root' actions on the cells
- DiskMB limit should apply to total disk size for an instance with a docker rootfs
- Add test coverage for DesiredLRP event stream
- auctioneer should time out requests to cells correctly
- Executor GardenStore RemainingResources and TotalResources can now fail
- All Stager communication should happen directly with the BBS
- Only detect buildpack on java DAT
- Diego core components evacuate ActualLRPs through the BBS API server
- Diego core components create and update DesiredLRPs through the BBS API server
- Diego core components converge LRPs through the BBS API server
- Diego core components desire a task through the BBS API server
- Diego core components take a task through its lifecycle through the BBS API server
- Diego core components trigger task convergence through the BBS API server
- As a CF developer, I expect my app to have CC's global running environment variables in its environment when running on Diego
- The BBS server serializes different versions of Task data based on BOSH-deployed config
- As a CF user, I would like my cached docker images to start regardless of the IP addresses of the docker registry nodes
- All BBS serialization can be encoded based on BOSH-deployed config
- As a Diego operator or developer, I would like Diego to consume garden-linux-release
BOSH job changes
- Add cc_uploader job: contains cc-upload-brokering handlers formerly present in the file-server.
- Add rootfses job: unpackages the cflinuxfs2 rootfs.
- Remove garden-linux job.
BOSH property changes
- Move diego.file_server.cc.* to diego.cc_uploader.cc.*
  - This includes: diego.cc_uploader.cc.base_url, diego.cc_uploader.cc.basic_auth_password, diego.cc_uploader.cc.job_polling_interval_in_seconds, diego.cc_uploader.cc.staging_upload_user and diego.cc_uploader.cc.staging_upload_password.
  - Also keeps diego.file_server.log_level and diego.cc_uploader.log_level available.
- Add diego.cc_uploader.address: Address on which cc-uploader handles requests.
- Add diego.cc_uploader.debug_addr: Address for cc-uploader debug server.
- Add diego.cc_uploader.cc.external_port: CC Port for cc-uploader.
- Add diego.rep.evacuation_timeout_in_seconds: The time to wait for evacuation to complete in seconds.
- Add diego.bbs.serialization_format: Default format for BBS records.
- Add diego.converger.bbs.api_url: Address for the converger to contact the BBS server.
- Add diego.stager.bbs.api_url: Address for the stager to contact the BBS server.
- Add diego.stager.cc_uploader_url: Address for the stager to contact the cc-uploader.
- Add diego.stager.docker_registry_address: Address for stager to contact the caching docker registry.
- Remove diego.auctioneer.receptor_task_handler_url.
- Remove diego.converger.receptor_task_handler_url.
- Remove diego.rep.receptor_task_handler_url.
- Remove diego.stager.diego_api_url.
- Remove diego.executor.allow_privileged: Executor now always allows privileged actions (those running as 'root').
- Remove diego.garden-linux in favor of garden:
  - diego.garden-linux.listen_network => garden.listen_network
  - diego.garden-linux.listen_address => garden.listen_address
  - diego.garden-linux.allow_networks => garden.allow_networks
  - diego.garden-linux.insecure_docker_registry_list => garden.insecure_docker_registry_list
  - diego.garden-linux.mtu => garden.network_mtu
- Add garden.deny_networks: List of CIDR blocks to which containers will be denied access.
- A full list of the garden-linux-release properties can be found here
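The diego.garden-linux to garden mapping above is mechanical enough to script. The following is a minimal sketch, assuming the manifest's properties section is available as nested Python dictionaries. The function and constant names are hypothetical. Only the five renames listed in these notes are applied, and any other diego.garden-linux key is handed back to the caller rather than guessed at.

```python
# Old name under diego.garden-linux -> new name under garden,
# taken from the mapping in the release notes above.
GARDEN_RENAMES = {
    "listen_network": "listen_network",
    "listen_address": "listen_address",
    "allow_networks": "allow_networks",
    "insecure_docker_registry_list": "insecure_docker_registry_list",
    "mtu": "network_mtu",
}


def move_garden_properties(properties):
    """Move diego.garden-linux.* to garden.* in place.

    Returns a dict of old properties that have no listed mapping, so
    the caller can decide what to do with them.
    """
    old = properties.get("diego", {}).pop("garden-linux", None)
    if not old:
        return {}
    garden = properties.setdefault("garden", {})
    leftover = {}
    for name, value in old.items():
        new_name = GARDEN_RENAMES.get(name)
        if new_name is None:
            leftover[name] = value
        else:
            garden[new_name] = value
    return leftover
```

A non-empty return value signals properties that need to be looked up in the garden-linux-release job specs by hand.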
Diego 0.1427.1
Create final release 0.1427.1
Diego 0.1427.0
Create final release 0.1427.0
Diego 0.1426.0
Create final release 0.1426.0
Diego 0.1425.0
Create final release 0.1425.0
Diego 0.1424.1
Create final release 0.1424.1
Diego 0.1412.0
Version 0.1412.0 of Diego is recommended for use with CF v215.
Changes from 0.1398.0 to 0.1412.0
Known issues
- garden-linux-release v0.292.0 has a goroutine/memory leak associated with container creation. This leak was fixed in Garden story #100896804, which is included in v0.293.0 and later and will be used by the Diego version to be recommended for use with CF v216. We recommend that operators of any long-term deployments of this version of Diego monitor the goroutine counts of the garden-linux processes and restart them safely if needed.
Breaking changes
- The BBS API server provides handlers for starting LRP auctions and stopping LRP instances: This changes the internal API endpoints on the auctioneer that handle requests for Task and LRP auctions. As long as the active auctioneer and converger processes are from the same release during a rolling update of a Diego cluster, units of work will eventually get assigned to cells. This configuration should happen naturally during a rolling update of a 2-AZ deployment with 1 brain VM per zone.
Other significant changes
- For instances based on preloaded rootfses, Diego should apply disk limits to and report disk usage from only that container's read/write layer
- As a CF developer, I expect my app to have CC's global staging environment variables in its environment when staging on Diego
- As a CF developer, I expect my app to have CC's global running environment variables in its environment when running on Diego
- As a Diego operator, when I run the DATs in no-internet mode, I expect to run the language-specific buildpack tests
- Document setup required to run SSH DATs in DATs README
- Diego core components take an instance ActualLRP through its lifecycle through the BBS API server
- Diego core components read tasks through the BBS API server
- Bump to cflinuxfs2 1.3.0
BOSH property changes
- Add diego.bbs.auctioneer.api_url: Address for BBS server to connect to the auctioneer.