"ERROR: more units requested" when --hyperconverged is used for OpenStack #293

Open
marino-mrc opened this issue Jan 9, 2025 · 4 comments · May be fixed by #298

Comments

@marino-mrc

Hello,
I'm trying to run the script with the --hyperconverged flag and I'm getting the following error:

$ ./generate-bundle.sh -r ussuri -s focal --ceph --ml2-ovs --default-binding oam  --hyperconverged -n teststs
Juju model 'teststs' already exists and is the current context - skipping create

Created focal-ussuri bundle and overlays:
  + openstack/glance.yaml
  + openstack/keystone.yaml
  + openstack/neutron-gateway.yaml
  + openstack/neutron-openvswitch.yaml
  + openstack/rabbitmq-source.yaml
  + ceph/ceph.yaml
  + openstack/openstack-ceph.yaml
  + openstack/neutron-ml2dns.yaml
  + openstack/neutron-ml2dns-gateway.yaml
  + openstack/neutron-ml2dns-openvswitch.yaml
  + mysql-innodb-cluster.yaml
  + mysql-innodb-cluster-router.yaml
  + openstack/placement.yaml

ERROR: more units requested (3) that machines available (2) (template=mysql.yaml) - hint: add more compute nodes/units

I think the problem is in common/generate_bundle_base, where ${num_placement_machines} is calculated.
Currently, we have the following:

num_placement_machines=$((${MOD_PARAMS[__NUM_NEUTRON_GATEWAY_UNITS__]} + ${MOD_PARAMS[__NUM_COMPUTE_UNITS__]}))

But this is conceptually wrong: it ignores the number of controller units (I'm assuming 3, because a MySQL cluster is the default) and it miscalculates the minimum number of ceph-osd units. For example, when the number of compute units is less than 3, we still need at least 3 ceph-osd units.
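
To make the "(3)" versus "(2)" in the error message concrete, here is a minimal sketch of the arithmetic. It uses plain variables as stand-ins for the two MOD_PARAMS entries quoted above, and it assumes a default of 1 for each, as stated elsewhere in this thread. The comparison at the end is only an illustration, not the script's actual check.

```bash
# Stand-ins for MOD_PARAMS[__NUM_NEUTRON_GATEWAY_UNITS__] and
# MOD_PARAMS[__NUM_COMPUTE_UNITS__]; the value 1 for each is an assumption
# based on the defaults mentioned in this thread.
num_neutron_gateway_units=1
num_compute_units=1
requested_units=3   # e.g. a three-unit MySQL cluster

# Same sum as the quoted line: gateways plus computes, nothing else.
num_placement_machines=$(( num_neutron_gateway_units + num_compute_units ))

# Illustrative comparison only; the real check lives in the bundle scripts.
if [ "$requested_units" -gt "$num_placement_machines" ]; then
    echo "short by $(( requested_units - num_placement_machines )) machine(s)"
fi
```

With these values the sum is 2, so a charm that asks for 3 units has nowhere to put the third one.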

Assuming that we want hyperconvergence for storage and compute, the calculation should follow this pseudo-code:

num_controller_units = 3  # We have 3 machines, and on top of them many LXD containers

# Check if the number of compute units has been defined, and use the appropriate number of num_ceph_osd_units
if num_compute_units < 3
  num_ceph_osd_units = 3
else
  num_ceph_osd_units = num_compute_units
end

num_placement_machines = num_controller_units + num_ceph_osd_units + num_neutron_gateway_units

I also tried working around this by hard-coding the value, and noticed that the compute and ceph-osd units are misplaced: the control plane's LXD containers are started on top of compute units.

Regards,
Marco

@dosaboy
Member

dosaboy commented Jan 10, 2025

@marino-mrc that message is printed when the bundle being deployed does not have enough physical machines to accommodate 3 units of a charm. This is because the default number of computes is 1. A simple fix would be to default to 3 computes if the deployment is hyperconverged. What do you think?
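
The default proposed above could be sketched roughly as follows. The function name and its arguments are hypothetical and do not come from stsstack-bundles; the real script keeps these values in MOD_PARAMS, and the linked pull request may implement the change differently.

```bash
# Hypothetical helper, for illustration only.
# $1: "true" if --hyperconverged was given, anything else otherwise
# $2: value passed to --num-compute, or empty if the flag was not used
effective_num_compute() {
    local hyperconverged="$1"
    local requested="${2:-}"

    if [ -n "$requested" ]; then
        # An explicit --num-compute always wins.
        echo "$requested"
    elif [ "$hyperconverged" = "true" ]; then
        # Hyperconverged: default to 3 so three-unit charms have machines.
        echo 3
    else
        # Existing default mentioned in this comment.
        echo 1
    fi
}

effective_num_compute true      # no --num-compute, hyperconverged
effective_num_compute false     # no --num-compute, not hyperconverged
effective_num_compute true 5    # explicit --num-compute 5
```

This keeps an explicit --num-compute untouched and only changes the default, which is one possible reading of the suggestion.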

dosaboy added a commit to dosaboy/stsstack-bundles that referenced this issue Jan 10, 2025
Hyperconverged mode needs at least 3 physical machines
to deploy so that applications with min 3 units have
somewhere to go.

Resolves: canonical#293
@dosaboy dosaboy linked a pull request Jan 10, 2025 that will close this issue
@marino-mrc
Author

marino-mrc commented Jan 10, 2025

@dosaboy I don't think your solution works, because even with 3 compute nodes we still have too few units. The script may run, but incompatible services will be placed on the same units.

Some assumptions should be made for the controller (I'm trying to align with what I think was the initial idea when --hyperconverged is used in the stsstack-bundles project):

  • MySQL is always installed as a cluster, which requires 3 units (these cannot overlap with ceph-osd, compute, or neutron-gateway, IMO)
  • The number of neutron-gateway units is always 1 by default (and cannot be changed from the CLI)
  • The neutron-gateway unit cannot be installed on the same units as ceph-osd or the controllers
  • Compute and ceph-osd will be installed on the same unit by default (and this cannot be changed from the CLI)
  • If the number of compute units is less than 3, we still need 3 units for ceph-osd. Note that the number of compute units can be passed through the CLI.

Under these assumptions, the minimum number of units is 7, and 3 of them MUST support LXD containers (I'm assuming --hyperconverged and --ml2-ovs):

  • 3 for the control plane (mysql + OpenStack related services)
  • 1 for neutron-gateway
  • 3 for ceph-osd (and possibly for the same number of nova-compute units)

If you agree with this view, the following pseudo-code works (only with --hyperconverged and --ml2-ovs):

# Define a static value for the number of controller units
num_controller_units = 3  # We have 3 machines, and on top of them many LXD containers
num_neutron_gateway_units = 1  # We don't need this in the code. It's defined in openstack/pipeline/02configure. This line is just for clarity

# Check if the number of compute units has been defined, and use the appropriate number of num_ceph_osd_units
if num_compute_units < 3
  num_ceph_osd_units = 3
else
  num_ceph_osd_units = num_compute_units
end

num_placement_machines = num_controller_units + num_ceph_osd_units + num_neutron_gateway_units
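
For reference, the pseudo-code above could be written in shell along these lines. This is only a sketch under the assumptions listed in this comment; the function name is hypothetical and the code is not taken from common/generate_bundle_base.

```bash
# Hypothetical helper, for illustration only.
# $1: number of compute units
# $2: number of neutron-gateway units (assumed to be 1 when omitted)
min_placement_machines() {
    local num_compute_units="$1"
    local num_neutron_gateway_units="${2:-1}"
    local num_controller_units=3   # assumption: 3 machines hosting LXD containers
    local min_ceph_osd_units=3     # assumption: at least 3 ceph-osd units
    local num_ceph_osd_units="$num_compute_units"

    # ceph-osd shares machines with compute, but never drops below the minimum.
    if [ "$num_ceph_osd_units" -lt "$min_ceph_osd_units" ]; then
        num_ceph_osd_units="$min_ceph_osd_units"
    fi

    echo $(( num_controller_units + num_ceph_osd_units + num_neutron_gateway_units ))
}

min_placement_machines 1   # one compute unit
min_placement_machines 5   # five compute units
```

With one compute unit this prints 7, matching the minimum counted above; with five compute units it prints 9.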

I tried setting the value to 7, and the script runs without errors.
But again, the units are misplaced (for example, MySQL units end up on the same machine as nova-compute). That needs additional work and should be fixed separately.

EDIT: The pseudo-code above should replace line 182.

@dosaboy
Member

dosaboy commented Jan 10, 2025

So, the placement of the units is something you can change to suit your needs by modifying the unit_placement.yaml file. After you change that file, you need to use the --replay option to avoid overwriting your changes. Whenever I do a hyperconverged deployment I have to do this anyway, because I have to set the machine constraints manually so that MAAS deploys to the correct machines. Does that not work for you?

@marino-mrc
Author

@dosaboy It works for me. Based on your suggestion, the current workaround is to simply pass --num-compute 3 to the CLI:

./generate-bundle.sh -r ussuri -s focal --ceph --ml2-ovs --default-binding oam  --hyperconverged --num-compute 3 -n teststs

Just tested and it works.

To summarize: if --hyperconverged is used, --num-compute should be forced to 3.

Thanks!
