Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIB: Add mindspore backend #169

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

LIB: Add mindspore backend #169

wants to merge 5 commits into from

Conversation

chou-shun
Copy link

@chou-shun chou-shun commented Aug 24, 2021

  1. Add mindspore backend
  2. If both GPU and NPU exist in the environment, NPU is preferred

Signed-off-by: zhangjun zhangjun286@huawei.com

@kubeedge-bot kubeedge-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 24, 2021
@chou-shun chou-shun force-pushed the main branch 2 times, most recently from 463b3b4 to 10a2ca7 Compare August 24, 2021 03:07
if 'CUDA_VISIBLE_DEVICES' in os.environ:

# NPU>GPU>CPU
if device_category == "ASCEND":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's better to change "ASCEND" to "NPU".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because mindspore takes “ASCEND" as "device_target", is it necessary to modify it?
https://www.mindspore.cn/docs/api/zh-CN/r1.3/api_python/mindspore.context.html?highlight=Ascend#module-mindspore.context

Copy link
Contributor

@JoeyHwong-gk JoeyHwong-gk Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, this variable device_category does not affect the framework, only use to guide our contributors to distinguish between processing units, as usually juxtapose NPU with CPU/GPU, "ASCEND" is only provided by huawei.org.

In the current scenario, if MindSpore supports ASCEND only, I think it is best not to set the limit here, avoid special requirements for other products that may support NPU.

"""todo: no support yet"""

def model_info(self, model, relpath=None, result=None):
_, _type = os.path.splitext(model)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be the same as the parent class, suggest to delete.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

@kubeedge-bot kubeedge-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 24, 2021
1. Add mindspore backend
2. If both GPU and NPU environments exist in the environment, NPU environment is preferred
3. Modify parameter "use_ascend": "use_ascend"->"use_npu"
4. Delete function "model_info" in mindscore backend

Signed-off-by: chou-shun <zhangjun286@huawei.com>
@chou-shun
Copy link
Author

@JoeyHwong-gk I have revised it according to your opinion. Please review it again. Thanks:)

@JoeyHwong-gk
Copy link
Contributor

@JoeyHwong-gk I have revised it according to your opinion. Please review it again. Thanks:)

thx for your job, it looks good to me

@@ -0,0 +1,74 @@
# Copyright 2021 The KubeEdge Authors.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your great job! BTW, should we provide some example about mindspore to end user? joint inference or incremental learning?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll upload a resnet example by tomorrow

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chou-shun please put your example in lib/examples/backend/mindspore/.

1. Develop a resnet example based on mindspore backend
2. scripts/: Script to start train, test and inference
   src/: ResNet network and model config
   train.py: Entrance to model training
   test.py: Entrance to model testing
   infrence.py Entrance to model inference
   interface.py: Implements class `Estimator``

Signed-off-by: chou-shun <zhangjun286@huawei.com>
@kubeedge-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign jaypume after the PR has been reviewed.
You can assign the PR to them by writing /assign @jaypume in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 27, 2021
from mindspore import Tensor
import mindspore.dataset.vision.c_transforms as C
import numpy as np
from lib.sedna.backend import set_backend
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change with from sedna.backend import set_backend better

import mindspore.dataset.vision.c_transforms as C
import numpy as np
from lib.sedna.backend import set_backend
import cv2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import cv2 before sedna

if callable(self.estimator):
self.estimator = self.estimator()

def train(self, train_data, valid_data=None, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on existing specifications, the first parameter in train is lib.sedna.datasources.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is because both formats (CSV, TXT) in datasource do not support cifar-10 datasets.
cifar-10: data_batch_1.bin, data_batch_2.bin, data_batch_3.bin,..., test_batch.bin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the misunderstanding. datasource inherited of BaseDataSource,as that core feature of sedna require identifying the features and labels from data input, we specify that the first parameter for train/evaluate of the ML framework must be a specific object (inherited of BaseDataSource).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I get it. However, should I develop a specific datasource for the cifar-10 dataset? This will be a little complicated...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a huge help to our community. And I would like you to give the professional advice on how to design DataSource.
In this example, it might be solved by:

from sedna.datasources import BaseDataSource

mnist_ds = ds.MnistDataset(train_data_path)

train_data = BaseDataSource(data_type="train")
train_data.x = []
train_data.y = []
for item in mnist_ds.create_dict_iterator():
     train_data.x.append(item["image"].asnumpy())
     train_data.y.append(item["label"].asnumpy())

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll solve it as soon as possible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried your method, but it didn't work very well. That is because, for method mindspore.Model.train, the data passed to it MUST BE a mindspore.Dataset, and can not be a BaseDataSource.
Here are some solutions I thought of:

    1. Abandon class mindspore.Model and adopt a more flexible approach: for data in dataloader: loss = network(data), loss.update().
    1. Repackage train_data into a mindspore.Dataset in interface.py.

For solution 1, it increases the difficulty of model development and is not a popular way.
For solution 2, actually it is counterintuitive.
Which do you prefer? Or do you have a more reasonable solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL @jaypume

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that BaseDataSorce is designed to compatible with existing Dataset such as mindspore.Dataset, torch.utils.data.Dataset, tf.data.Dataset. If the Dataset instance is passed to for example FederatedLearning.train(tfDatasetInstance), it should work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that BaseDataSorce is designed to compatible with existing Dataset such as mindspore.Dataset, torch.utils.data.Dataset, tf.data.Dataset. If the Dataset instance is passed to for example FederatedLearning.train(tfDatasetInstance), it should work.

You can help us raise an issue so we can track it.

self.estimator.load_weights(model_path)

def get_weights(self):
"""todo: no support yet"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return self.estimator.parameters_dict()

"""todo: no support yet"""

def set_weights(self, weights):
"""todo: no support yet"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for name, weight in weights.items():
weights[name] = mindspore.Parameter(weight, name=name)
mindspore.load_param_into_net(self.estimator, weights, strict_load=True)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.estimator is not a net. get_weights and set_weights would to be developed later:)
mindspore.load_param_into_net(net, parameter_dict, strict_load=False)

1. Add directory description in README.md.
2. Modify "test" to "eval".
3. Add some comments.

Signed-off-by: chou-shun <zhangjun286@huawei.com>
├── run_standalone_train_cpu.sh # launch cpu training
├── src
├── config.py # parameter configuration
├── dataset.py # data preprocessing
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move the way of data preprocessing for midnspore to sedna lib, instead of in examples? the other people may be want to use it when they develop a appliation based on mindspore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you think is the most appropriate ? sedna.datasource or sedna.backend.mindspore ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we develop preprocessing methods for each model, or develop a general preprocessing method?
In my assumption, for a certain scene (such as image classification), we can predefine several fixed preprocessing methods (such as normalize, resize), and the user only needs to pass some parameters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest not to integrate data preprocessing functions into Sedna Lib, for the reason:

  • There are various preprocessing functions, and they cannot be covered by Sedna.
  • The data preprocessing functions are always already defined in developer's training code or AI framework.

@JoeyHwong-gk
Copy link
Contributor

/assign @jaypume

@kubeedge-bot kubeedge-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 3, 2022
chou-shun added 2 commits July 6, 2022 21:11
Signed-off-by: chou-shun <zhangjun286@huawei.com>
Signed-off-by: chou-shun <zhangjun286@huawei.com>
@kubeedge-bot kubeedge-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 6, 2022
@kubeedge-bot kubeedge-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 1, 2022
@kubeedge-bot
Copy link
Collaborator

@chou-shun: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants