This repository has been archived by the owner on Nov 19, 2020. It is now read-only.

failed while deploying on AWS #9

Open
moyun opened this issue Sep 17, 2015 · 10 comments

Comments

@moyun

moyun commented Sep 17, 2015

I installed the newest version of BOSH (version 1.3072.0 (00000000)) on AWS.
I got the following error messages when running "bosh -n deploy":

Director task 36
  Started unknown
  Started unknown > Binding deployment. Done (00:00:00)

  Started preparing deployment
  Started preparing deployment > Binding releases. Done (00:00:00)
  Started preparing deployment > Binding existing deployment. Done (00:00:00)
  Started preparing deployment > Binding resource pools. Done (00:00:00)
  Started preparing deployment > Binding stemcells. Done (00:00:00)
  Started preparing deployment > Binding templates. Done (00:00:00)
  Started preparing deployment > Binding properties. Done (00:00:00)
  Started preparing deployment > Binding unallocated VMs. Done (00:00:00)
  Started preparing deployment > Binding instance networks. Done (00:00:00)

  Started preparing package compilation > Finding packages to compile. Done (00:00:00)

  Started compiling packages
  Started compiling packages > scala/c8d47f38692fcccedb90531059129e7404be3a62
  Started compiling packages > ruby/37f6db3eb29cd5b2288144c0539321f362797f34
  Started compiling packages > node/a763d1461eec1e18719969d07ce2aae9b58c7f19
   Failed compiling packages > ruby/37f6db3eb29cd5b2288144c0539321f362797f34: Unknown CPI error 'Unknown' with message 'Value () for parameter groupId is invalid. The value cannot be empty' (00:00:16)
   Failed compiling packages > node/a763d1461eec1e18719969d07ce2aae9b58c7f19: Unknown CPI error 'Unknown' with message 'Value () for parameter groupId is invalid. The value cannot be empty' (00:00:17)
   Failed compiling packages > scala/c8d47f38692fcccedb90531059129e7404be3a62: Unknown CPI error 'Unknown' with message 'Value () for parameter groupId is invalid. The value cannot be empty' (00:00:17)

Error 100: Unknown CPI error 'Unknown' with message 'Value () for parameter groupId is invalid. The value cannot be empty'
@moyun
Author

moyun commented Sep 17, 2015

I am new to BOSH, and I am not sure which parameter the message "parameter groupId is invalid" refers to.
I only modified the template deployment manifest, mesos-aws.yml,
and I have set the correct security group ID for the subnet.

The related debug message is this:

 DEBUG -- DirectorJobRunner: SENT: hm.director.alert {"id":"39b0783d-58eb-4bdd-9588-xxxxx","severity":3,"title":"director - error during update deployment","summary":"Error during update deployment for 'mesos' against Director '847328c8-6b9c-4ed6-be23-xxxx': # <Bosh::Clouds::ExternalCpi::UnknownError: Unknown CPI error 'Unknown' with message 'Value () for parameter groupId is invalid. The value cannot be empty'>","created_at":1442492174}

@frodenas
Contributor

Can you please paste your sanitized deployment manifest?

@moyun
Author

moyun commented Sep 18, 2015

Here it is:

<%
director_uuid = '847328c8-6b9c-4ed6-be23-8fb1a997e175'
deployment_name = 'mesos'
num_zookeepers = 3 # Odd number
num_masters = 3 # Odd number
num_marathons = 1
num_chronos = 1
num_jenkins = 1
num_storm = 1
num_slaves = 3
%>
---
name: <%= deployment_name %>
director_uuid: <%= director_uuid %>

releases:
 - name: mesos
   version: latest

compilation:
  workers: 3
  network: default
  reuse_compilation_vms: true
  cloud_properties:
    instance_type: m3.xlarge

update:
  canaries: 0
  canary_watch_time: 30000-60000
  update_watch_time: 30000-60000
  max_in_flight: 32
  serial: false

networks:
  - name: default
    type: dynamic
    cloud_properties:
     # subnet: subnet-3c999c48
      security_groups:
        - mygroup
        - <%= deployment_name %>

resource_pools:
  - name: default
    network: default
    stemcell:
      name: bosh-aws-xen-ubuntu-trusty-go_agent
      version: latest
    cloud_properties:
      instance_type: m3.medium

  - name: slave
    network: default
    stemcell:
      name: bosh-aws-xen-ubuntu-trusty-go_agent
      version: latest
    cloud_properties:
      instance_type: m3.xlarge

jobs:
  - name: zookeeper
    templates:
      - name: zookeeper
    instances: <%= num_zookeepers %>
    resource_pool: default
    persistent_disk: 10240
    networks:
      - name: default
        default: [dns, gateway]

  - name: mesos-master
    templates:
      - name: mesos-master
    instances: <%= num_masters %>
    resource_pool: default
    networks:
      - name: default
        default: [dns, gateway]

  - name: marathon
    templates:
      - name: marathon
    instances: <%= num_marathons %>
    resource_pool: default
    networks:
      - name: default
        default: [dns, gateway]

  - name: chronos
    templates:
      - name: chronos
    instances: <%= num_chronos %>
    resource_pool: default
    networks:
      - name: default
        default: [dns, gateway]

  - name: jenkins
    templates:
      - name: jenkins
    instances: <%= num_jenkins %>
    resource_pool: default
    persistent_disk: 20480
    networks:
      - name: default
        default: [dns, gateway]

  - name: storm
    templates:
      - name: storm
    instances: <%= num_storm %>
    resource_pool: default
    networks:
      - name: default
        default: [dns, gateway]

  - name: mesos-slave
    templates:
      - name: mesos-slave
    instances: <%= num_slaves %>
    resource_pool: slave
    persistent_disk: 65536
    networks:
      - name: default
        default: [dns, gateway]

properties:
  mesos:
    principal: "principal"
    secret: "secret"
    master:
      quorum: <%= (num_masters/2) + 1 %>
      authenticate_frameworks: true
      authenticate_slaves: true

  zookeeper:
    servers:
      <% num_zookeepers.times do |i| %>
      <%= "- #{i}.zookeeper.default.#{deployment_name}.microbosh\n" %>
      <% end %>

@moyun
Author

moyun commented Sep 18, 2015

Basically, I didn't change much. The changes are mainly in the networks section:

networks:
  - name: default
    type: dynamic
    cloud_properties:
     # subnet: subnet-3c999c48
      security_groups:
        - mygroup
        - <%= deployment_name %>

The name of my own security group is "mygroup".
If I set the subnet property, the compilation times out, although I can see 3 EC2 instances running on AWS.

@frodenas
Contributor

Do you also have a mesos security group? The <%= deployment_name %> in your deployment manifest is replaced with the name of the deployment (mesos), so if that group doesn't exist, AWS might complain about the groupId.

If you don't want to create the mesos security group, just remove the <%= deployment_name %> line.
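
To see which group names the manifest actually asks for, the networks section can be rendered locally. This is only a rough sketch in Ruby, assuming the manifest is evaluated as plain ERB before being parsed as YAML; the helper name `rendered_security_groups` and the constant `NETWORKS_TEMPLATE` are made up for illustration and are not part of BOSH.

```ruby
require 'erb'
require 'yaml'

# Fragment of the manifest from this thread: the variable is assigned in an
# ERB block at the top, then substituted into the security group list.
NETWORKS_TEMPLATE = <<~MANIFEST
  <% deployment_name = 'mesos' %>
  networks:
    - name: default
      type: dynamic
      cloud_properties:
        security_groups:
          - mygroup
          - <%= deployment_name %>
MANIFEST

# Hypothetical helper: render the ERB template, parse the resulting YAML,
# and return the security group names of the first network.
def rendered_security_groups(template)
  rendered = ERB.new(template).result(binding)
  YAML.safe_load(rendered)['networks'][0]['cloud_properties']['security_groups']
end

puts rendered_security_groups(NETWORKS_TEMPLATE).inspect
```

Both names in the resulting list would then have to exist as security groups in the target AWS account.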

@moyun
Author

moyun commented Sep 19, 2015

Yes, I have the "mesos" security group, which is why I find this very strange.

@frodenas
Contributor

Do those security groups belong to a VPC? If so, you'll need to set the subnet id. If the compilation times out, it is because there's something wrong with the bosh security group. Check this guide for the ports that are needed.
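
A quick way to spot a dynamic network with no subnet set is to inspect the parsed manifest. The following is a minimal sketch in Ruby, assuming the manifest has already been rendered to plain YAML; `networks_missing_subnet` and `NETWORKS_YAML` are hypothetical names used only for this example.

```ruby
require 'yaml'

# Hypothetical helper: return the names of dynamic networks whose
# cloud_properties do not contain a subnet key.
def networks_missing_subnet(manifest)
  (manifest['networks'] || []).select do |net|
    props = net['cloud_properties'] || {}
    net['type'] == 'dynamic' && !props.key?('subnet')
  end.map { |net| net['name'] }
end

# Networks section as posted in this thread, with the subnet commented out.
NETWORKS_YAML = <<~EOS
  networks:
    - name: default
      type: dynamic
      cloud_properties:
        # subnet: subnet-3c999c48
        security_groups:
          - mygroup
          - mesos
EOS

puts networks_missing_subnet(YAML.safe_load(NETWORKS_YAML)).inspect
```

With the subnet line commented out, as in the posted manifest, the `default` network is reported; uncommenting it makes the list empty.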

@moyun
Author

moyun commented Sep 30, 2015

Yes, they belong to a VPC.
I have allowed all traffic in the mesos security group,
but I still get a timeout:

Director task 9
  Started unknown
  Started unknown > Binding deployment. Done (00:00:00)

  Started preparing deployment
  Started preparing deployment > Binding releases. Done (00:00:00)
  Started preparing deployment > Binding existing deployment. Done (00:00:00)
  Started preparing deployment > Binding resource pools. Done (00:00:00)
  Started preparing deployment > Binding stemcells. Done (00:00:00)
  Started preparing deployment > Binding templates. Done (00:00:00)
  Started preparing deployment > Binding properties. Done (00:00:00)
  Started preparing deployment > Binding unallocated VMs. Done (00:00:00)
  Started preparing deployment > Binding instance networks. Done (00:00:00)

  Started preparing package compilation > Finding packages to compile. Done (00:00:00)

  Started compiling packages
  Started compiling packages > scala/c8d47f38692fcccedb90531059129e7404be3a62
  Started compiling packages > ruby/37f6db3eb29cd5b2288144c0539321f362797f34
  Started compiling packages > node/a763d1461eec1e18719969d07ce2aae9b58c7f19
   Failed compiling packages > scala/c8d47f38692fcccedb90531059129e7404be3a62: Timed out pinging to 11dc975e-d6e2-4c6f-ba58-3319f7d8bdb0 after 600 seconds (00:11:44)
   Failed compiling packages > node/a763d1461eec1e18719969d07ce2aae9b58c7f19: Timed out pinging to 6aa66904-068e-4273-8699-1a938d155d03 after 600 seconds (00:11:44)
   Failed compiling packages > ruby/37f6db3eb29cd5b2288144c0539321f362797f34: Timed out pinging to d9a50429-b641-4e80-a92f-a1958c2025cd after 600 seconds (00:11:48)

Error 450002: Timed out pinging to 11dc975e-d6e2-4c6f-ba58-3319f7d8bdb0 after 600 seconds

I also checked the VMs:

+-----------------+--------------------+---------------+-----+
| Job/index       | State              | Resource Pool | IPs |
+-----------------+--------------------+---------------+-----+
| unknown/unknown | unresponsive agent |               |     |
| unknown/unknown | unresponsive agent |               |     |
| unknown/unknown | unresponsive agent |               |     |
+-----------------+--------------------+---------------+-----+

It seems the agents are unresponsive, so the IP information could not be fetched. However, all the VMs show as healthy in EC2.

@frodenas
Contributor

frodenas commented Oct 5, 2015

The timeouts are usually caused by wrong security groups. The VMs need to reach the bosh director VM to get some metadata. There are several ports involved, but if the new VMs also have the bosh security group, and that group has all TCP/UDP ports open to traffic from within the same security group, then it should work.
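
The condition described here (all TCP and UDP ports open to traffic coming from the same security group) can be written down as a small check. This is only a sketch in Ruby over a made-up `Rule` structure; it does not call the AWS API, and the field names, the `'all'` protocol value, and the group id `sg-bosh` are assumptions for illustration.

```ruby
# Hypothetical, simplified view of an inbound security group rule.
Rule = Struct.new(:protocol, :from_port, :to_port, :source_group)

# True if some rule admits every port of the given protocol from group_id.
# A rule with protocol 'all' (an assumed marker) covers every protocol.
def open_from_self?(group_id, rules, protocol)
  rules.any? do |rule|
    rule.source_group == group_id &&
      (rule.protocol == 'all' ||
       (rule.protocol == protocol && rule.from_port <= 0 && rule.to_port >= 65_535))
  end
end

# True if both TCP and UDP are fully open to traffic from the group itself.
def all_tcp_udp_open_from_self?(group_id, rules)
  %w[tcp udp].all? { |protocol| open_from_self?(group_id, rules, protocol) }
end

rules = [
  Rule.new('tcp', 0, 65_535, 'sg-bosh'),
  Rule.new('udp', 0, 65_535, 'sg-bosh')
]
puts all_tcp_udp_open_from_self?('sg-bosh', rules)
```

A group that only opens ports to a different source group, or only for one of the two protocols, would fail this check even if it looks wide open at first glance.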

@yohanwadia88

We are facing the same error:

Director task 75
Started preparing deployment > Preparing deployment. Done (00:00:01)

Started preparing package compilation > Finding packages to compile. Done (00:00:00)

Started compiling packages
Started compiling packages > rtr/2d7de4f6fc25938c21c5be87174f95583feb14b5
Started compiling packages > cli/6fb52c578aad523ba3c78bc350313d4aa4db7da9
Started compiling packages > rootfs_cflinuxfs2/a7925f16851fa1cfe6b05f8353faf35d1e268ff0
Started compiling packages > buildpack_binary/e0c8736b073d83c2459519851b5736c288311d92
Started compiling packages > buildpack_staticfile/ac1a56c13b8e90a3bf466db7fc1ac03c71107648
Started compiling packages > buildpack_php/18abb2e373f0c8b98d395f815d3e61c4b4928725
Failed compiling packages > buildpack_staticfile/ac1a56c13b8e90a3bf466db7fc1ac03c71107648: Timed out pinging to 9e960a3c-5fee-4586-9c68-d5c19b014710 after 600 seconds (00:11:42)
Failed compiling packages > buildpack_php/18abb2e373f0c8b98d395f815d3e61c4b4928725: Timed out pinging to 06c424b8-3bac-48c7-a293-b012b2fb7fcd after 600 seconds (00:11:43)
Failed compiling packages > rootfs_cflinuxfs2/a7925f16851fa1cfe6b05f8353faf35d1e268ff0: Timed out pinging to abed38f4-b2d4-4092-947e-7316b467d477 after 600 seconds (00:11:43)
Failed compiling packages > cli/6fb52c578aad523ba3c78bc350313d4aa4db7da9: Timed out pinging to 1bb6a992-c999-4f8b-ab85-1ff68aa18943 after 600 seconds (00:11:44)
Failed compiling packages > buildpack_binary/e0c8736b073d83c2459519851b5736c288311d92: Timed out pinging to 2a03ba12-a91a-4f7f-9994-5f2e97bbe608 after 600 seconds (00:11:44)
Failed compiling packages > rtr/2d7de4f6fc25938c21c5be87174f95583feb14b5: Timed out pinging to 7f72936b-b36b-44ba-b157-c0926345b971 after 600 seconds (00:11:44)

Error 450002: Timed out pinging to 9e960a3c-5fee-4586-9c68-d5c19b014710 after 600 seconds


We have opened all the ports on the security group, but the issue persists.
Is there any resolution for this?
