[DISCUSS] Introduce new CRD as StorageNode for better usability #292

Closed
yikuaibro opened this issue Apr 4, 2023 · 5 comments · Fixed by #386
Labels: operator, type: discussion

Comments

@yikuaibro
Contributor

yikuaibro commented Apr 4, 2023

StorageNode Design

1. Background

At present, the CRDs of the ShardingSphere Operator meet the needs of rapid deployment and ShardingSphere PoC verification, but they offer insufficient support for scalability. For example, the Cluster concept is missing, so a logical ShardingSphere cluster cannot be defined through a CRD, and tasks such as storage node initialization and DistSQL execution still require manual intervention in ShardingSphere maintenance. In this context, we hope to optimize the ShardingSphere CRDs and improve the Operator's capabilities in the manner of DBRE.

2. Tasks

Effectively manage different types of database implementations, covering data source configuration and registration, physical database management, instance configuration for different topologies, and the assignment of database operation and maintenance tasks.

3. Description

3.1 StorageNode design

Just as ComputeNode represents the abstraction of a group of compute nodes, StorageNode should represent the abstraction of a group of storage nodes. According to ShardingSphere's current registration logic, a StorageNode belongs to a particular logical database and expresses a data source at the schema level. Using a StorageNode requires mapping to the attributes involved in REGISTER STORAGE UNIT:

# Register storage units
REGISTER STORAGE UNIT ds_1 (HOST="mysql1.default", PORT=3306, DB="ds_1", USER="root", PASSWORD="root"), ds_2 (HOST="mysql2.default", PORT=3306, DB="ds_2", USER="root", PASSWORD="root");

These attributes include:

  • HOST: data source address
  • PORT: data source port
  • DB: data source schema (physical database)
  • USER: username
  • PASSWORD: password
  • Properties: additional key-value properties, such as maximumPoolSize=10 and idleTimeout=30000

Note: there are two preconditions before registering a storage node:

  • The target logical database that uses this storage node has been registered on ShardingSphere
  • The corresponding physical database has been created on the data source

A StorageNode represents a group of storage nodes, so its attributes cover not only data sources and schemas but also the storage topology, type, and implementation. For example, a single MySQL instance, a MySQL cluster supporting MHA or MGR, an AWS RDS Instance, an AWS RDS Cluster, and AWS RDS Aurora each represent this information differently.

From an operation and maintenance perspective, a StorageNode can be created and specified manually, or created automatically by an in-tree or dynamic provisioner through a CRD similar to DatabaseClass. In the same way, it can support both manual and automatic operation and maintenance actions such as capacity expansion, configuration changes, backup and recovery, and decommissioning. ShardingSphere also supports manual circuit breaking and forced traffic routing for designated storage nodes. Automatic creation is preferred here.

Based on the above description, the proposed solutions are as follows:

  • Data source configuration properties are described in StorageNodeSpec; confidential information must be stored encrypted or referenced from a third-party component (a KMS, etc.).
  • The pre-registered logical database must be created manually by the user after creating the ComputeNode.
  • The physical database on the data source is created automatically based on the StorageNode created by the user, and is registered to the logical database once the StorageNode's availability is confirmed.
  • For different database implementations:
    • Existing data sources: currently not managed by StorageNode.
    • MySQL, PostgreSQL, etc. on Kubernetes: the corresponding supported operator must be deployed on the cluster; the matching CR is generated through DatabaseClass to complete creation, and the result is then synchronized to the StorageNode according to the CR's state.
    • RDS on public clouds, etc.: an in-tree controller implemented by the Operator creates the instance automatically according to the VPC, subnet, and security group information declared in annotations, and the engine, version, storage, and other configuration declared in the DatabaseClass.
  • For different topologies:
    • Single instance: the instance configuration, username, password, and physical database are generated from the Spec.
    • Multi-instance cluster: each instance's configuration is generated from the Spec, with a consistent username, password, and physical database; the Status distinguishes cluster-level status from instance-level status.
  • Database operation and maintenance tasks are completed by the corresponding controller according to the task type:
    • Operation and maintenance of the data source itself: completed by the controller corresponding to the DatabaseClass, e.g. configuration changes.
    • DistSQL: completed by the Operator calling the Proxy, e.g. TrafficRule and CircuitBreak (see the sketch after this list).
    • DistSQLWorkflow: completed by a separate DistSQLWorkflowJob, to be discussed later.
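To make the DistSQL item above concrete, here is a minimal sketch of how the Operator could issue REGISTER STORAGE UNIT against the Proxy over its MySQL-protocol port. The DSN, port, and identifier values are placeholder assumptions, not part of this proposal:

package distsql

import (
        "database/sql"
        "fmt"

        // The Proxy speaks the MySQL protocol, so a standard driver works.
        _ "github.com/go-sql-driver/mysql"
)

// RegisterStorageUnit connects to the Proxy and executes REGISTER STORAGE UNIT.
// proxyDSN is assumed to target the logical database, e.g.
// "root:root@tcp(shardingsphere-proxy.default:3307)/sharding_db".
func RegisterStorageUnit(proxyDSN, unit, host string, port int32, db, user, password string) error {
        conn, err := sql.Open("mysql", proxyDSN)
        if err != nil {
                return err
        }
        defer conn.Close()

        stmt := fmt.Sprintf(
                `REGISTER STORAGE UNIT %s (HOST="%s", PORT=%d, DB="%s", USER="%s", PASSWORD="%s")`,
                unit, host, port, db, user, password,
        )
        _, err = conn.Exec(stmt)
        return err
}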

3.1.1 Spec design

type StorageNodeSpec struct {
        // DatabaseClass is the name of the DatabaseClass used to provision
        // this StorageNode. The current version only supports DatabaseClass.
        DatabaseClass string `json:"databaseClass"`
        // Schema is the physical database name.
        Schema string `json:"schema"`
}

3.1.2 Status design

type StorageNodeStatus struct {
        // The generation observed by the storagenode controller.
        ObservedGeneration int64 `json:"observedGeneration,omitempty"`

        // Phase is a brief summary of the StorageNode life cycle
        // There are two possible phase values:
        // Ready: StorageNode can already provide external services
        // NotReady: StorageNode cannot provide external services
        // +optional
        Phase StorageNodePhaseStatus `json:"phase"`

        // Conditions is an array of current observed conditions,
        // with reason and message fields for each condition
        // +optional
        Conditions StorageNodeConditions `json:"conditions"`

        // Cluster contains the current status of the StorageNode cluster
        // +optional
        Cluster ClusterStatus `json:"cluster,omitempty"`

        // Instances contains the current status of each StorageNode instance
        // +optional
        Instances []InstanceStatus `json:"instances,omitempty"`
}

type StorageNodePhaseStatus string

const (
        StorageNodePhaseReady    StorageNodePhaseStatus = "Ready"
        StorageNodePhaseNotReady StorageNodePhaseStatus = "NotReady"
)

type StorageNodeConditionType string

// StorageNodeConditionType enumerates states during the startup process of a storage node
const (
        StorageNodeConditionInitialized StorageNodeConditionType = "Initialized"
        StorageNodeConditionStarted     StorageNodeConditionType = "Started"
        StorageNodeConditionReady       StorageNodeConditionType = "Ready"
        StorageNodeConditionUnknown     StorageNodeConditionType = "Unknown"
        StorageNodeConditionDeployed    StorageNodeConditionType = "Deployed"
        StorageNodeConditionFailed      StorageNodeConditionType = "Failed"
)

type StorageNodeConditions []StorageNodeCondition

type ConditionStatus string

const (
        ConditionStatusTrue    ConditionStatus = "True"
        ConditionStatusFalse   ConditionStatus = "False"
        ConditionStatusUnknown ConditionStatus = "Unknown"
)

// StorageNodeCondition describes one observed condition of a StorageNode
type StorageNodeCondition struct {
        Type           StorageNodeConditionType `json:"type"`
        Status         ConditionStatus          `json:"status"`
        LastUpdateTime metav1.Time              `json:"lastUpdateTime,omitempty"`
        Reason         string                   `json:"reason"`
        Message        string                   `json:"message"`
}
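
The following is a hypothetical helper, not part of the CRD definition, showing how a controller might maintain the Conditions array (upsert by condition type, refreshing LastUpdateTime via metav1, i.e. k8s.io/apimachinery/pkg/apis/meta/v1):

// SetCondition updates the condition of the given type in place,
// or appends it if no condition of that type exists yet.
func (s *StorageNodeStatus) SetCondition(cond StorageNodeCondition) {
        cond.LastUpdateTime = metav1.Now()
        for i, c := range s.Conditions {
                if c.Type == cond.Type {
                        s.Conditions[i] = cond
                        return
                }
        }
        s.Conditions = append(s.Conditions, cond)
}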

type ClusterStatus struct {
        Arn              string         `json:"arn"`
        Identifier       string         `json:"identifier"`
        Status           string         `json:"status"`
        PrimaryEndpoint  Endpoint       `json:"primaryEndpoint"`
        ReaderEndpoints  []Endpoint     `json:"readerEndpoints"`
        Credentials      CredentialType `json:"credentials"`
}

type CredentialType struct {
        BasicCredential
}

type BasicCredential struct {
        Username string `json:"user"`
        Password string `json:"pass"`
}

type InstanceStatus struct {
        Arn         string         `json:"arn"`
        Identifier  string         `json:"identifier"`
        Status      string         `json:"status"`
        Endpoint    Endpoint       `json:"endpoint"`
        // Role (e.g. primary or replica) is still an open question
        Credentials CredentialType `json:"credentials"`
}

type Endpoint struct {
        Address string `json:"address"`
        Port    int32  `json:"port"`
}

3.2 DatabaseClass design

Similar to Kubernetes' StorageClass and IngressClass, the XClass CRD declares which controller owns a resource: the specific configuration is declared by the corresponding object (a PVC or an Ingress), and the actual creation and other operations are performed by the controller that the XClass designates.

Feature requirements:

  • The XClassSpec definition needs to be abstract enough to be consumed by different controller implementations
  • Usually, an x-controller watches the public resource, determines through annotations and other configuration which XClass CRDs belong to it, and completes the subsequent creation actions according to that configuration
  • StorageNode and DatabaseClass are resources unique to the ShardingSphere Operator; third-party controllers cannot recognize them directly, so some conversion is required

type DatabaseClassSpec struct {
        // edb.pg.xxx/xxx
        // shardingsphere.apache.org/aws-rds-instance
        // shardingsphere.apache.org/aws-rds-cluster
        // shardingsphere.apache.org/aws-rds-aurora
        Provisioner        string                `json:"provisioner"`
        Parameters         map[string]string     `json:"parameters"`
        ReclaimPolicy      DatabaseReclaimPolicy `json:"reclaimPolicy"`
}

// DatabaseReclaimPolicy describes a policy for end-of-life maintenance of databases.
// +enum
type DatabaseReclaimPolicy string

const (
        // The database will be deleted with a final snapshot reserved.
        DatabaseReclaimDeleteWithFinalSnapshot DatabaseReclaimPolicy = "DeleteWithFinalSnapshot"
        // The database will be deleted.
        DatabaseReclaimDelete DatabaseReclaimPolicy = "Delete"
        // The database will be retained.
        // The default policy is Retain.
        DatabaseReclaimRetain DatabaseReclaimPolicy = "Retain"
)
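
A sketch of how a provisioner might honor the reclaim policy during teardown; deleteDatabase and takeFinalSnapshot are hypothetical helpers, not an existing API:

// reclaim applies the DatabaseClass reclaim policy when a StorageNode
// is deleted. The two helpers it calls are assumed to wrap the cloud
// provider's snapshot and deletion calls.
func reclaim(policy DatabaseReclaimPolicy, identifier string) error {
        switch policy {
        case DatabaseReclaimDeleteWithFinalSnapshot:
                if err := takeFinalSnapshot(identifier); err != nil {
                        return err
                }
                return deleteDatabase(identifier)
        case DatabaseReclaimDelete:
                return deleteDatabase(identifier)
        default:
                // DatabaseReclaimRetain: leave the database untouched.
                return nil
        }
}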

const (
        AnnotationsVPCSecurityGroupIds = "databaseclass.database-mesh.io/vpc-security-group-ids"
        AnnotationsSubnetGroupName     = "databaseclass.database-mesh.io/vpc-subnet-group-name"
        AnnotationsAvailabilityZones   = "databaseclass.database-mesh.io/availability-zones"
)

type DatabaseClassStatus struct{}
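
To illustrate the controller-side conventions above, here is a sketch of how an in-tree AWS provisioner might check whether a DatabaseClass belongs to it and read the VPC annotations defined by the constants; the comma-separated parsing is an assumption:

import "strings"

const awsRDSInstanceProvisioner = "shardingsphere.apache.org/aws-rds-instance"

// ownsClass reports whether this controller should act on the class,
// mirroring the StorageClass provisioner-matching convention.
func ownsClass(spec DatabaseClassSpec) bool {
        return spec.Provisioner == awsRDSInstanceProvisioner
}

// vpcConfigFromAnnotations extracts the VPC-related settings declared
// in the DatabaseClass annotations.
func vpcConfigFromAnnotations(anno map[string]string) (securityGroupIDs, zones []string, subnetGroup string) {
        securityGroupIDs = strings.Split(anno[AnnotationsVPCSecurityGroupIds], ",")
        subnetGroup = anno[AnnotationsSubnetGroupName]
        zones = strings.Split(anno[AnnotationsAvailabilityZones], ",")
        return securityGroupIDs, zones, subnetGroup
}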

3.3 StorageNode execution process

Take creating an AWS RDS instance as an example:

(Figure: StorageNode execution flow for creating an AWS RDS instance)

4 Expected outcome

Create a StorageNode CR and specify the name of a DatabaseClass:

apiVersion: shardingsphere.apache.org/v1alpha1
kind: StorageNode
metadata:
  labels:
    app: foo 
  name: foo 
spec:
  databaseClass: aws-rds-instance
  schema: db_0 
  selector:
    matchLabels:
      app: foo 
  portBindings:
  - name: server
    containerPort: 8000
    servicePort: 8000
    protocol: TCP
  serviceType: ClusterIP
---
apiVersion: core.database-mesh.io/v1alpha1
kind: DatabaseClass
metadata:
  name: aws-rds-instance
  annotations:
    "databaseclass.database-mesh.io/vpc-security-group-ids": "sg-xxx,sg-xxx"
    "databaseclass.database-mesh.io/vpc-subnet-group-name": "default"
    "databaseclass.database-mesh.io/availability-zones": "ap-southeast-1a,ap-southeast-1b,ap-southeast-1c"
spec:
  provisioner: shardingsphere.apache.org/aws-rds-instance
  parameters:
    "publicAccessible": "true"
    "autoGeneratedMasterPassword": "true"
    "defaultMasterUsername": "root"
    "engineName": "aurora-mysql"
    "engineVersion": "5.7.mysql_aurora.2.07.0"
    "instanceClass": "db.r5.large"
    "multiAZ": "true"
    "allocatedStorage": "50"
    "iops": "1000"
  reclaimPolicy: Delete

With this design, the ShardingSphere Operator gains the following:

  1. A new CRD definition that presents a global storage node cluster and a logical database supporting partitioning and sharding.

  2. Storage nodes that can be created manually or automatically, supporting different implementations.

  3. A storage node management plane with automatic registration, which can drive operation and maintenance actions such as capacity expansion, configuration changes, backup and recovery, and decommissioning, as well as manual circuit breaking and forced traffic routing.

  4. A controller module that executes storage node operation and maintenance tasks and DistSQL.

This improves the scalability and configuration management flexibility of the ShardingSphere Operator, making it better suited to cloud-native distributed architectures. It enables ShardingSphere to better support automated deployment and maintenance across cloud platforms, while improving development efficiency and reducing the difficulty of operation and maintenance.


@Xu-Wentao
Collaborator

#293

@Xu-Wentao
Collaborator

Xu-Wentao commented Apr 21, 2023

support aws rds instance
#314

@Xu-Wentao
Collaborator

Two things need to be fixed:

  1. When registering a storage unit, get the data source info, such as the database username and password, from the StorageNode's annotations. If these two parameters do not exist, fall back to the default values from the DatabaseClass.
  2. Check whether the AWS RDS instance CreateInstance works with setDBName.

@Xu-Wentao
Collaborator


@yikuaibro If you have free time, please check the two points above.

@yikuaibro
Contributor Author

@Xu-Wentao Thank you very much for your guidance. I'm honored to participate in the contribution, and I will try to fix them when I have time.
