This example shows how to use a federated learning job in a surface defect detection scenario. In surface defect detection, data is scattered across different places (such as server nodes and cameras) and cannot be aggregated because of data privacy and bandwidth constraints. As a result, the full data set cannot be used for centralized training. Federated learning solves this problem: each site trains the model on its own data, uploads the weights to the cloud for aggregation, and then receives the aggregated result to update its local model.
Assume that there are two edge nodes and a cloud node. Data on the edge nodes cannot be migrated to the cloud due to privacy issues. Based on this scenario, we demonstrate surface defect inspection.
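As an illustration of the aggregation step, one common rule is a sample-count-weighted average of the local weights (federated averaging). The text above does not state which rule this example uses, so the formula is an assumption, and the symbols $K$, $n_k$, and $w_k^{(t+1)}$ are illustrative only.

```latex
w^{(t+1)} = \sum_{k=1}^{K} \frac{n_k}{n}\, w_k^{(t+1)}, \qquad n = \sum_{k=1}^{K} n_k
```

Here $w_k^{(t+1)}$ denotes the weights that site $k$ obtains after local training in round $t$ on its $n_k$ samples, and $w^{(t+1)}$ is the aggregated result sent back to every site. In this scenario there are two edge nodes, so $K = 2$.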
CLOUD_NODE="cloud-node-name"
EDGE1_NODE="edge1-node-name"
EDGE2_NODE="edge2-node-name"
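Before continuing, it can help to confirm that the three variables above are actually set in the current shell. The following is a minimal sketch; the helper name `require_node_vars` is hypothetical and not part of Sedna.

```shell
# Hypothetical helper: fail early if any node-name variable is empty or unset.
# Note: it uses the plain shell variables name, value, and missing as scratch space.
require_node_vars() {
  missing=0
  for name in CLOUD_NODE EDGE1_NODE EDGE2_NODE; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Once `require_node_vars` succeeds, `kubectl get node "$CLOUD_NODE" "$EDGE1_NODE" "$EDGE2_NODE"` should list all three nodes. The real node names in a cluster are shown by `kubectl get nodes`.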
Follow the Sedna installation document to install Sedna.
Download the dataset and the label file to /data on EDGE1_NODE.
mkdir -p /data
cd /data
git clone https://github.com/abin24/Magnetic-tile-defect-datasets..git Magnetic-tile-defect-datasets
curl -o 1.txt https://raw.githubusercontent.com/kubeedge/sedna/main/examples/federated_learning/surface_defect_detection/data/1.txt
Download the dataset and the label file to /data on EDGE2_NODE.
mkdir -p /data
cd /data
git clone https://github.com/abin24/Magnetic-tile-defect-datasets..git Magnetic-tile-defect-datasets
curl -o 2.txt https://raw.githubusercontent.com/kubeedge/sedna/main/examples/federated_learning/surface_defect_detection/data/2.txt
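After the download, a quick check on each edge node can confirm that both the image repository and the label file are in place. This is a sketch under the directory layout used above; `check_dataset` is a hypothetical helper name.

```shell
# Hypothetical helper: verify that the image directory and label file exist.
# Usage: check_dataset <data_dir> <label_file_name>
check_dataset() {
  data_dir=$1
  label_file=$2
  if [ ! -d "$data_dir/Magnetic-tile-defect-datasets" ]; then
    echo "error: $data_dir/Magnetic-tile-defect-datasets not found" >&2
    return 1
  fi
  # -s is true only when the file exists and is not empty.
  if [ ! -s "$data_dir/$label_file" ]; then
    echo "error: $data_dir/$label_file is missing or empty" >&2
    return 1
  fi
  # Report how many lines the label file contains.
  echo "$data_dir/$label_file: $(wc -l < "$data_dir/$label_file") lines"
}
```

Run `check_dataset /data 1.txt` on EDGE1_NODE and `check_dataset /data 2.txt` on EDGE2_NODE. Because `curl -o` saves whatever the server returns, a failed download can still leave a non-empty file; adding `-f` to the `curl` command makes it exit with an error on HTTP failures instead.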
This example uses these images:
- aggregation worker: kubeedge/sedna-example-federated-learning-surface-defect-detection-aggregation:v0.3.0
- train worker: kubeedge/sedna-example-federated-learning-surface-defect-detection-train:v0.3.0
These images are generated by the script build_images.sh.
Create the dataset for $EDGE1_NODE:
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: Dataset
metadata:
  name: "edge1-surface-defect-detection-dataset"
spec:
  url: "/data/1.txt"
  format: "txt"
  nodeName: $EDGE1_NODE
EOF
Create the dataset for $EDGE2_NODE:
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: Dataset
metadata:
  name: "edge2-surface-defect-detection-dataset"
spec:
  url: "/data/2.txt"
  format: "txt"
  nodeName: $EDGE2_NODE
EOF
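The two Dataset manifests differ only in name, label file, and node, so they can be generated from one template. The sketch below prints the manifest instead of applying it; `render_dataset` is a hypothetical helper, not a Sedna command.

```shell
# Hypothetical helper: print a Sedna Dataset manifest for one edge node.
# Usage: render_dataset <dataset_name> <label_file_path> <node_name>
render_dataset() {
  cat <<EOF
apiVersion: sedna.io/v1alpha1
kind: Dataset
metadata:
  name: "$1"
spec:
  url: "$2"
  format: "txt"
  nodeName: $3
EOF
}
```

For example, `render_dataset edge1-surface-defect-detection-dataset /data/1.txt "$EDGE1_NODE" | kubectl create -f -` would create the first dataset. Printing the manifest first makes it easy to inspect the substituted node name before anything is sent to the cluster.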
Create the directory /model on the host of $EDGE1_NODE:
mkdir /model
Create the directory /model on the host of $EDGE2_NODE:
mkdir /model
Create the model:
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: Model
metadata:
  name: "surface-defect-detection-model"
spec:
  url: "/model"
  format: "pb"
EOF
Start the federated learning job:
kubectl create -f - <<EOF
apiVersion: sedna.io/v1alpha1
kind: FederatedLearningJob
metadata:
  name: surface-defect-detection
spec:
  aggregationWorker:
    model:
      name: "surface-defect-detection-model"
    template:
      spec:
        nodeName: $CLOUD_NODE
        containers:
          - image: kubeedge/sedna-example-federated-learning-surface-defect-detection-aggregation:v0.3.0
            name: agg-worker
            imagePullPolicy: IfNotPresent
            env: # user defined environments
              - name: "exit_round"
                value: "3"
            resources: # user defined resources
              limits:
                memory: 2Gi
  trainingWorkers:
    - dataset:
        name: "edge1-surface-defect-detection-dataset"
      template:
        spec:
          nodeName: $EDGE1_NODE
          containers:
            - image: kubeedge/sedna-example-federated-learning-surface-defect-detection-train:v0.3.0
              name: train-worker
              imagePullPolicy: IfNotPresent
              env: # user defined environments
                - name: "batch_size"
                  value: "32"
                - name: "learning_rate"
                  value: "0.001"
                - name: "epochs"
                  value: "2"
              resources: # user defined resources
                limits:
                  memory: 2Gi
    - dataset:
        name: "edge2-surface-defect-detection-dataset"
      template:
        spec:
          nodeName: $EDGE2_NODE
          containers:
            - image: kubeedge/sedna-example-federated-learning-surface-defect-detection-train:v0.3.0
              name: train-worker
              imagePullPolicy: IfNotPresent
              env: # user defined environments
                - name: "batch_size"
                  value: "32"
                - name: "learning_rate"
                  value: "0.001"
                - name: "epochs"
                  value: "2"
              resources: # user defined resources
                limits:
                  memory: 2Gi
EOF
Check the status of the federated learning job:
kubectl get federatedlearningjob surface-defect-detection
After the job completes, the generated model can be found in the directory /model on $EDGE1_NODE and $EDGE2_NODE.
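To check for the result on an edge node without re-running `ls` by hand, a small polling loop can wait until files appear in the model directory. This is only a sketch: `wait_for_model` is a hypothetical helper, the exact file names written by the training worker are not stated here, and the presence of files alone does not prove that the job has finished.

```shell
# Hypothetical helper: wait until a directory contains at least one entry.
# Usage: wait_for_model <dir> <timeout_seconds> <interval_seconds>
wait_for_model() {
  dir=$1
  timeout=$2
  interval=$3
  elapsed=0
  while :; do
    # ls -A lists every entry except . and ..; non-empty output means files exist.
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
      ls -l "$dir"
      return 0
    fi
    # Give up once the accumulated wait time reaches the timeout.
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "error: no files in $dir after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

For example, `wait_for_model /model 600 10` on $EDGE1_NODE checks every 10 seconds for up to 10 minutes and prints the directory listing as soon as something is written.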