Introduction

The Deployment Manager, i.e. the Scheduling Abstraction Layer (SAL), is an abstraction layer initially developed by Activeeon as part of the EU project Morphemic. Its development continued through the NebulOuS EU project. SAL aims to enhance the usability of the Execution Adapter, i.e. the ProActive Scheduler & Resource Manager, by providing an abstraction that makes it easier for users to interact with the scheduler and take advantage of its features. By supporting both REST calls and direct communication with the Execution Adapter, SAL gives users access to the scheduler's capabilities.

The SAL code repository and documentation can be found here. Information on how to obtain the ProActive licence, along with installation instructions, can be found here.

Report Issue

If you encounter an issue, please create a bug report here. For faster resolution of your problem, please include:

a description of the scenario, e.g. which NebulOuS sequence diagrams were executed
the date and time, and the SAL and ProActive environment
SAL logs (especially those inside the container)
ProActive logs, namely connector-iaas.log
the ProActive job ID (in case of an error during ProActive workflow execution)
a detailed description of the action during which the issue occurred

NebulOuS Development

Note that additional documentation for NebulOuS development is provided here. For the preset NebulOuS environment for testing and development, you can find more information on how to access SAL here, and on how to access ProActive here.

NebulOuS scenario

This section describes how the Deployment Manager and Execution Adapter support the NebulOuS scenario. It outlines the sequence of SAL operations provided to facilitate NebulOuS deployment and execution. For more information on using SAL endpoints, refer to the SAL Endpoint Documentation.

Developers can utilize the provided Postman collection to get started with the endpoints or consult the previous documentation for further details on the testing scenario.

1. Prerequisites

To use SAL, you must have the Execution Adapter (ProActive) installed and properly configured. In the configuration script, you only need to set:

<PROACTIVE_URL>
<USERNAME>
<PASSWORD>

The rest of the configuration is automatically handled by NebulOuS (see NebulOuS SAL deployment for more details).

For additional information on setting up the SAL Kubernetes deployment script, refer to this guide. You can find details on using the endpoints here.

1.1. Connect endpoint - Establishing the connection to ProActive server.

SAL must be connected to ProActive to use any of the endpoints. If you encounter an HTTP 500 when calling endpoints, which reports a NotConnectedException, it indicates that SAL is not connected to ProActive. You can verify this in the SAL logs (particularly those within the container). Keep in mind that the connection to ProActive may be lost during scenario execution and may need to be reestablished.
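
The reconnect-on-failure behaviour described above can be sketched as follows. This is a minimal illustration, not SAL client code: `call_endpoint` and `connect` are hypothetical callables supplied by the caller (for example, thin wrappers around your own HTTP client). The only details taken from this document are the HTTP 500 status and the NotConnectedException text in the response.

```python
from typing import Callable, Tuple

# A response is modelled as (HTTP status code, response body text).
Response = Tuple[int, str]


def is_not_connected(status: int, body: str) -> bool:
    """Return True when a response looks like SAL's 'not connected' error."""
    return status == 500 and "NotConnectedException" in body


def call_with_reconnect(
    call_endpoint: Callable[[], Response],
    connect: Callable[[], None],
    max_reconnects: int = 1,
) -> Response:
    """Call an endpoint; reconnect and retry if the connection was lost.

    Both callables are hypothetical placeholders for your own client code.
    """
    status, body = call_endpoint()
    attempts = 0
    while is_not_connected(status, body) and attempts < max_reconnects:
        connect()  # re-establish the SAL -> ProActive connection
        attempts += 1
        status, body = call_endpoint()  # retry the original request
    return status, body
```

Limiting the number of reconnect attempts keeps a persistent failure (for instance, wrong credentials) from looping forever; the last response is returned so the caller can inspect it.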

2. Cloud registration

2.1. Add cloud endpoint - Defining a cloud infrastructure.

To use this endpoint, you must specify a unique cloud_name that has not already been registered. Note that after a SAL restart, cloud information is erased from the SAL database, though it remains in the Execution Adapter. If you use a cloud_name that has already been registered, the infrastructure will not be updated with new information, and resources on the cloud provider may not be properly released. The only proper way to remove cloud resources is by using the Cloud deregistration endpoint.

For more information on setting up cloud providers for NebulOuS, refer to the Managing Cloud Providers documentation.

Additionally, an infrastructure that appears registered is not guaranteed to be correctly configured. Once registration is complete, an asynchronous process begins to retrieve images and node candidates, and this is the point at which the provided authentication can be validated (see how the isAnyAsyncNodeCandidatesProcessesInProgress and GetCloudImages endpoints can be used for validation). Note that SSH credentials are only used during Cluster Deployment.

Finally, keep in mind that the cloud should be properly deregistered using the Remove Clouds endpoint, so that the nodes in use are undeployed from the cloud provider and no 'hanging' clouds are left inside the ProActive server. Such leftovers can cause unexpected behaviour, especially if the same cloud name is used again.

2.2. isAnyAsyncNodeCandidatesProcessesInProgress endpoint - Checking for ongoing asynchronous processes for retrieving cloud images or node candidates.

You should wait until this endpoint returns false, indicating that the retrieval of cloud images and node candidates from the cloud provider is complete.
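
This wait can be sketched as a simple polling loop. The sketch makes no assumption about how the endpoint is called: `is_in_progress` is a hypothetical callable that you implement to query isAnyAsyncNodeCandidatesProcessesInProgress and return its boolean result. The interval and timeout values are illustrative defaults, not recommendations from SAL.

```python
import time
from typing import Callable


def wait_until_async_done(
    is_in_progress: Callable[[], bool],
    poll_interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until is_in_progress() returns False or the timeout expires.

    Returns True if the retrieval finished, False if the timeout was hit.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if not is_in_progress():
            return True  # images and node candidates have been retrieved
        if time.monotonic() >= deadline:
            return False  # still running; let the caller decide what to do
        sleep(poll_interval_s)
```

A False result is a good moment to check the Execution Adapter logs, since a retrieval that never completes may point to a problem with the registered cloud.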

2.3. GetCloudImages endpoint - Retrieving cloud images.

This endpoint can be used to verify that the cloud images and authentication settings are correct. If there is a problem with authentication, the endpoint will return an error. For issues related to incorrect credentials or insufficient permissions, consult the Execution Adapter logs. If an image retrieval problem occurs, the image will not be returned by this endpoint.

3. Edge device registration

3.1. RegisterNewEdgeNode endpoint - Registering a New Edge Device.

This endpoint is used to register a new edge device. Upon successful registration, it returns the defined edge node structure, the unique edge device ID, and the node candidate ID representing this device.

Note that during this process, the device is only registered with its associated information, while validation occurs during the actual Cluster Deployment, which uses the registered edge node. To fully deregister an edge device, you must use the Edge Deregistration endpoint, which ensures proper removal from the system.

3.2. GetEdgeNodes endpoint - Retrieving All Registered Edge Devices.

This endpoint retrieves all registered edge devices, providing all information initially returned during the device registration process.

4. Filtering of node candidates

4.1. findNodeCandidates endpoint - Filtering Node Candidates Based on Deployment Requirements.

This endpoint allows you to filter node candidates using various criteria to select suitable nodes for deployment. Specify the required conditions for master or worker nodes within the cluster and store the retrieved node candidate IDs for future use.

In NebulOuS, there are only two node types: IAAS for cloud nodes, and EDGE for nodes representing edge devices.

Example of Searching for Node Candidates in an OpenStack Cloud:

Node Type: IAAS (cloud node)
Cloud ID: Matches a specific cloud (use {{cloud_name}} to reference)
Operating System: Ubuntu, version 22
Region: bgo
Hardware Specifications: 8GB RAM and 4 CPU cores

[
    {
        "type": "NodeTypeRequirement",
        "nodeTypes": ["IAAS"]
    },
    {
        "type": "AttributeRequirement",
        "requirementClass": "cloud",
        "requirementAttribute": "id",
        "requirementOperator": "EQ",
        "value": "{{cloud_name}}"
    },
    {
        "type": "AttributeRequirement",
        "requirementClass": "image",
        "requirementAttribute": "operatingSystem.family",
        "requirementOperator": "IN",
        "value": "UBUNTU"
    },
    {
        "type": "AttributeRequirement",
        "requirementClass": "image",
        "requirementAttribute": "name",
        "requirementOperator": "INC",
        "value": "22"
    },
    {
        "type": "AttributeRequirement",
        "requirementClass": "location",
        "requirementAttribute": "name",
        "requirementOperator": "EQ",
        "value": "bgo"
    },
    {
        "type": "AttributeRequirement",
        "requirementClass": "hardware",
        "requirementAttribute": "ram",
        "requirementOperator": "EQ",
        "value": "8192"
    },
    {
        "type": "AttributeRequirement",
        "requirementClass": "hardware",
        "requirementAttribute": "cores",
        "requirementOperator": "EQ",
        "value": "4"
    }
]

Example of Searching for a Node Candidate Representing an EDGE Device:

[
    {
        "type": "NodeTypeRequirement",
        "nodeTypes": ["EDGE"]
    }
]

Note that for EDGE devices, the node candidate ID is returned during registration. If you target a specific edge device, either store this ID during the registration process, or include a unique identifier in the device name, which can then be searched for using an attribute requirement on name in the hardware class.
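
The name-based lookup mentioned above can be sketched with two small builders that produce the same JSON structure as the examples in this section. The helper names are hypothetical, and the use of the INC operator for the hardware name is an assumption carried over from the image-name filter in the OpenStack example; check the SAL endpoint documentation for the operators actually supported.

```python
import json
from typing import Any, Dict, List


def node_type_requirement(*node_types: str) -> Dict[str, Any]:
    """Build a NodeTypeRequirement entry, e.g. for IAAS or EDGE."""
    return {"type": "NodeTypeRequirement", "nodeTypes": list(node_types)}


def attribute_requirement(
    requirement_class: str, attribute: str, operator: str, value: Any
) -> Dict[str, Any]:
    """Build an AttributeRequirement entry; values are sent as strings."""
    return {
        "type": "AttributeRequirement",
        "requirementClass": requirement_class,
        "requirementAttribute": attribute,
        "requirementOperator": operator,
        "value": str(value),
    }


def edge_device_by_name(unique_id: str) -> List[Dict[str, Any]]:
    """Requirements matching an EDGE node whose device name contains unique_id.

    Assumption: INC is accepted for the hardware 'name' attribute.
    """
    return [
        node_type_requirement("EDGE"),
        attribute_requirement("hardware", "name", "INC", unique_id),
    ]


def to_request_body(requirements: List[Dict[str, Any]]) -> str:
    """Serialize the requirement list as the JSON body for findNodeCandidates."""
    return json.dumps(requirements, indent=4)
```

For example, `to_request_body(edge_device_by_name("my-device-id"))` yields a two-element JSON array that can be used as the request body, where `my-device-id` stands for whatever identifier you embedded in the device name.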

4.2. getLengthOfNodeCandidates endpoint - Returns total number of existing node candidates.

5. Cluster deployment

5.1. DefineCluster endpoint - Defining Kubernetes cluster.

This endpoint is used to define and configure Kubernetes cluster deployments. When setting up a Kubernetes cluster using this endpoint, scripts maintained by NebulOuS developers streamline the deployment process by installing essential software components within the cluster. These scripts and other parts of the deployment workflow can be debugged and tested using ProActive workflows, enabling seamless integration and troubleshooting.

The script templates provided by SAL offer predefined structures for deployment, allowing for efficient configuration. Ensure that any required environmental variables and their values are specified in the cluster definition; these variables are maintained by the owner of the component that uses them for NebulOuS development purposes.

5.2. DeployCluster endpoint - Deploying a Kubernetes Cluster.

This endpoint initializes the cluster deployment process. Once started, you can monitor the progress of the deployment.

If the deployment fails (i.e., SAL does not return true), consult the SAL logs (especially those inside the container) and the ProActive logs, namely connector-iaas.log.

Note that deployment failures can occur due to various factors. To ensure a successful deployment and execution, confirm that selected cloud and edge nodes are available. Additionally, the information regarding SSH credentials and Execution Adapter scripts used for edge devices during Cloud Registration or Edge Device Registration is validated only at the time of deployment execution.

If the deployment succeeds and returns true, you can track the ongoing progress and troubleshoot any issues using the Execution Adapter interface. Monitoring tools include:

The ProActive dashboard for an overview of the entire deployment,
The ProActive Scheduler for details on individual task execution,
The ProActive Resource Manager to monitor resource utilization.

5.3. GetCluster endpoint - Retrieving Cluster Deployment Status.

This endpoint provides detailed information on the current status of the Kubernetes cluster deployment.

5.4. DeleteCluster endpoint - Deleting a Cluster and Undeploying Resources.

This endpoint enables the deletion of an existing Kubernetes cluster deployment. It removes all associated resources, including worker nodes and applications, effectively undeploying the cluster. Use this endpoint to fully dismantle a cluster and free up resources once the deployment is no longer needed.

6. Application management

6.1. ManageApplication endpoint - Managing application deployment.

This endpoint is used to deploy and manage applications within a specified Kubernetes cluster. It supports both the initial deployment of applications and the reconfiguration of application replicas, allowing you to adjust the number of replicas as needed for scaling and performance optimization. Setting the number of replicas to 0 will effectively undeploy the application.

Additionally, the execution status of a deployed application can be checked using the getJobState endpoint.

7. Cluster reconfiguration

7.1. ScaleOut endpoint - Scaling out the Cluster.

This endpoint enables dynamic expansion of the Kubernetes cluster by adding additional worker nodes as needed. Use this endpoint to increase the cluster's processing capacity and accommodate higher workloads by scaling out with new resources.

7.2. ScaleIn endpoint - Scaling In the Cluster

This endpoint allows you to scale in the Kubernetes cluster by removing specified worker nodes. Use this endpoint to decrease the cluster's size, optimize resource usage, and reduce operational costs by deallocating unneeded nodes.

7.3. LabelNode endpoint - Managing Node Labels

This endpoint allows you to manage node labels within a Kubernetes cluster, enabling you to add, modify, or remove labels on specific nodes. Use this feature to organize and categorize nodes effectively, which can aid in scheduling, resource management, and targeting specific nodes for workloads.

Additionally, the execution status of node labeling can be checked using the getJobState endpoint.

Scaling Out the application

To scale out an application, follow these steps:

Add New Worker Nodes: First, use the ScaleOut endpoint to add additional worker nodes to the existing Kubernetes cluster.
Label the New Worker Nodes: Once the new worker nodes are successfully deployed within the cluster, apply appropriate labels to them using the LabelNode endpoint. Proper labeling is essential for organizing and targeting nodes for specific workloads.
Increase Application Replicas: Finally, to complete the scale-out process, adjust the number of application replicas by calling the ManageApplication endpoint. This will ensure the application takes advantage of the newly added worker nodes.

Scaling In the application

To scale in an application, follow these steps:

Label the Nodes for Removal: First, use the LabelNode endpoint to mark specific worker nodes as unavailable for new application replicas. This ensures that no new replicas are assigned to these nodes during the scaling process.
Adjust Application Replicas: Next, call the ManageApplication endpoint with a reduced number of replicas to gradually remove the application from the marked nodes.
Remove Worker Nodes: Finally, once the application replicas have been removed from the designated nodes, use the ScaleIn endpoint to remove the worker nodes from the cluster, optimizing resource usage and reducing operational costs.
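
The two procedures above differ mainly in the order of the same three calls, which the following sketch makes explicit. The callables `scale_out`, `scale_in`, `label_nodes` and `manage_application` are hypothetical wrappers around the ScaleOut, ScaleIn, LabelNode and ManageApplication endpoints; their signatures are assumptions for illustration, and each is assumed to return only once its step has completed.

```python
from typing import Callable, Sequence

NodeAction = Callable[[Sequence[str]], None]
ReplicaAction = Callable[[int], None]


def scale_out_application(
    scale_out: NodeAction,
    label_nodes: NodeAction,
    manage_application: ReplicaAction,
    new_nodes: Sequence[str],
    replicas: int,
) -> None:
    """Scale out: add nodes, label them, then raise the replica count."""
    scale_out(new_nodes)          # 1. add worker nodes to the cluster
    label_nodes(new_nodes)        # 2. label the newly deployed nodes
    manage_application(replicas)  # 3. increase application replicas


def scale_in_application(
    label_nodes: NodeAction,
    manage_application: ReplicaAction,
    scale_in: NodeAction,
    nodes_to_remove: Sequence[str],
    replicas: int,
) -> None:
    """Scale in: mark nodes, lower the replica count, then remove nodes."""
    label_nodes(nodes_to_remove)  # 1. mark nodes as unavailable for replicas
    manage_application(replicas)  # 2. reduce application replicas
    scale_in(nodes_to_remove)     # 3. remove the worker nodes
```

Passing the endpoint wrappers in as arguments keeps the ordering logic independent of any particular HTTP client and makes it easy to exercise with stubs.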

8. Edge device deregistration

8.1. DeleteEdgeNode endpoint

This endpoint is used to deregister an edge device using its ID, which is returned during the [registration process](https://github.com/eu-nebulous/sal/#3-edge-device-registration) and can be retrieved using the GetEdgeNodes endpoint.

9. Cloud deregistration

9.1. RemoveClouds endpoint

This endpoint allows you to deregister one or more cloud infrastructures and undeploy their nodes from the cloud provider.

10. SAL Persistence (for developers and project mentors)

SAL supports clean operations for clusters, clouds, edge devices and the SAL database. These are intended for maintenance and for use by NebulOuS developers, to ensure that all resources were undeployed and removed properly, not just from SAL but also from the ProActive server and cloud providers.

10.1. CleanAll endpoint

10.2. CleanAll Clusters endpoint

10.3. CleanAll Clouds endpoint

10.4. CleanAll Edges endpoint

10.5. Clean SAL Database endpoint

Restarting the SAL Database

To support data persistence, the SAL database is loaded as a Persistent Volume. A PVC restart is required in the following cases:

When the database schema changes due to a new SAL update.
When an issue occurs (e.g., SQL exception or Hibernate error) that cannot be resolved using the persistence endpoints.

Step 1: Delete the Persistent Volume Claim (PVC)

Run the following command to delete the existing PVC:

kubectl delete pvc nebulous-sal-mariadb-pvc -n <nebulous-env>

Step 2: Verify PVC Deletion

Check if the PVC has been successfully deleted:

kubectl get pvc -n <nebulous-env>

Step 3: Manually Remove Finalizers (If Stuck in Terminating State)

If the PVC remains in a Terminating state, manually edit and remove the finalizer:

kubectl edit pvc nebulous-sal-mariadb-pvc -n <nebulous-env>

Find the following section and delete it:

finalizers:
  - kubernetes.io/pvc-protection

Then, save and exit the editor.

Step 4: Restart the SAL Deployment

Once the PVC is deleted, restart the SAL deployment to reinitialize the database.

Step 5: Confirm PVC Recreation

Check if the PVC has been successfully recreated and bound:

kubectl get pvc -n <nebulous-env>
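
For scripted environments, the kubectl steps above can be assembled programmatically, as in the sketch below. It only builds the argument lists for steps 1, 2, 3 and 5; step 4 is omitted because the restart command depends on your deployment. The `kubectl patch` form used for step 3 is a common non-interactive alternative to `kubectl edit` and does not come from this document; the function names are hypothetical.

```python
import json
import subprocess
from typing import List

PVC_NAME = "nebulous-sal-mariadb-pvc"  # PVC name used in the steps above


def delete_pvc_cmd(namespace: str) -> List[str]:
    """Step 1: delete the existing PVC."""
    return ["kubectl", "delete", "pvc", PVC_NAME, "-n", namespace]


def list_pvc_cmd(namespace: str) -> List[str]:
    """Steps 2 and 5: list PVCs to verify deletion or recreation."""
    return ["kubectl", "get", "pvc", "-n", namespace]


def remove_finalizers_cmd(namespace: str) -> List[str]:
    """Step 3: clear finalizers without opening an editor.

    Assumption: a merge patch setting finalizers to null replaces the
    manual 'kubectl edit' step.
    """
    patch = json.dumps({"metadata": {"finalizers": None}})
    return [
        "kubectl", "patch", "pvc", PVC_NAME,
        "-n", namespace, "--type=merge", "-p", patch,
    ]


def run(cmd: List[str]) -> str:
    """Execute a command and return its standard output (requires kubectl)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Building the commands as lists rather than shell strings avoids quoting problems with the JSON patch, and lets you print and review each command before passing it to `run`.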

NebulOuS SAL Deployment (managed by 7Bulls)

NebulOuS SAL is deployed with the Helm chart managed at https://github.com/eu-nebulous/helm-charts/tree/main/charts/nebulous-sal. If a change is required, create a PR.

Values in the helm chart can be overwritten in the NREC deployment definition for different environments:

cd environment: https://github.com/eu-nebulous/nrec-flux-config/blob/main/clusters/primary/nebulous-cd/helm-releases/specific-patches/nebulous-sal.yaml

prod environment: https://github.com/eu-nebulous/nrec-flux-config/blob/main/clusters/primary/nebulous-prod/helm-releases/specific-patches/nebulous-sal.yaml

test environment: https://github.com/eu-nebulous/nrec-flux-config/blob/main/clusters/primary/nebulous-test/helm-releases/specific-patches/nebulous-sal.yaml

dev environment: https://github.com/eu-nebulous/nrec-flux-config/blob/main/clusters/primary/nebulous-dev/helm-releases/specific-patches/nebulous-sal.yaml
