LIB: Add mindspore backend #169
lib/sedna/backend/__init__.py
Outdated
if 'CUDA_VISIBLE_DEVICES' in os.environ:

# NPU>GPU>CPU
if device_category == "ASCEND":
Maybe it's better to change "ASCEND" to "NPU".
This is because MindSpore takes "ASCEND" as its `device_target`. Is it necessary to modify it?
https://www.mindspore.cn/docs/api/zh-CN/r1.3/api_python/mindspore.context.html?highlight=Ascend#module-mindspore.context
In fact, the variable `device_category` does not affect the framework; it is only used to help our contributors distinguish between processing units. NPU is usually juxtaposed with CPU/GPU, whereas "ASCEND" is provided only by huawei.org.
In the current scenario, even if MindSpore supports only ASCEND, I think it is best not to set that limit here, so that we avoid special requirements for other products that may support NPU.
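One way to reconcile the two positions above is to keep the vendor-neutral label inside Sedna and translate it only at the MindSpore boundary. The sketch below is hypothetical: `pick_device_category`, `to_mindspore_target`, and the `MINDSPORE_DEVICE_TARGET` mapping are illustrative names, not code from this PR. The only external fact it relies on is that MindSpore's `device_target` accepts "Ascend", "GPU", and "CPU".

```python
import os

# Hypothetical mapping from vendor-neutral labels to the device_target
# strings that MindSpore accepts ("Ascend", "GPU", "CPU").
MINDSPORE_DEVICE_TARGET = {"NPU": "Ascend", "GPU": "GPU", "CPU": "CPU"}


def pick_device_category(use_npu=False, environ=None):
    """Return "NPU", "GPU", or "CPU" with the priority NPU > GPU > CPU."""
    environ = os.environ if environ is None else environ
    if use_npu:
        # An explicit NPU request wins even if a GPU is also visible.
        return "NPU"
    if "CUDA_VISIBLE_DEVICES" in environ:
        return "GPU"
    return "CPU"


def to_mindspore_target(device_category):
    """Translate the neutral label only where MindSpore needs it."""
    return MINDSPORE_DEVICE_TARGET[device_category]
```

With this split, contributors only ever see "NPU", "GPU", or "CPU", and a backend for another NPU vendor could supply its own mapping.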
"""todo: no support yet"""

def model_info(self, model, relpath=None, result=None):
    _, _type = os.path.splitext(model)
This seems to be the same as the parent class's implementation; I suggest deleting it.
Got it, thanks.
1. Add mindspore backend
2. If both GPU and NPU environments exist in the environment, NPU environment is preferred
3. Modify parameter "use_ascend": "use_ascend"->"use_npu"
4. Delete function "model_info" in mindspore backend
Signed-off-by: chou-shun <zhangjun286@huawei.com>
@JoeyHwong-gk I have revised it according to your opinion. Please review it again. Thanks:)
thx for your job, it looks good to me
@@ -0,0 +1,74 @@
# Copyright 2021 The KubeEdge Authors.
Thanks for your great work! BTW, should we provide an example of using MindSpore for end users, such as joint inference or incremental learning?
OK, I'll upload a resnet example by tomorrow
@chou-shun please put your example in `lib/examples/backend/mindspore/`.
1. Develop a resnet example based on mindspore backend
2. scripts/: Script to start train, test and inference
   src/: ResNet network and model config
   train.py: Entrance to model training
   test.py: Entrance to model testing
   infrence.py: Entrance to model inference
   interface.py: Implements class `Estimator`
Signed-off-by: chou-shun <zhangjun286@huawei.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED
from mindspore import Tensor
import mindspore.dataset.vision.c_transforms as C
import numpy as np
from lib.sedna.backend import set_backend
It might be better to change this to `from sedna.backend import set_backend`.
import mindspore.dataset.vision.c_transforms as C
import numpy as np
from lib.sedna.backend import set_backend
import cv2
Import `cv2` before `sedna`.
if callable(self.estimator):
    self.estimator = self.estimator()

def train(self, train_data, valid_data=None, **kwargs):
According to the existing specification, the first parameter of `train` should come from `lib.sedna.datasources`.
That is because neither of the formats (CSV, TXT) in `datasource` supports the cifar-10 dataset, which is distributed as binary files:
cifar-10: data_batch_1.bin, data_batch_2.bin, data_batch_3.bin,..., test_batch.bin
Sorry for the misunderstanding. `datasource` here means a class inherited from `BaseDataSource`. Because the core features of Sedna require identifying the `features` and `labels` in the input data, we specify that the first parameter of train/evaluate for the ML framework must be a specific object (inherited from `BaseDataSource`).
Ok, I get it. However, should I develop a specific `datasource` for the cifar-10 dataset? This will be a little complicated...
It would be a huge help to our community. I would also welcome your professional advice on how to design `DataSource`.
In this example, it might be solved by:
import mindspore.dataset as ds  # assumed meaning of the `ds` alias used below
from sedna.datasources import BaseDataSource

mnist_ds = ds.MnistDataset(train_data_path)
train_data = BaseDataSource(data_type="train")
train_data.x = []
train_data.y = []
# Copy every sample into the data source as NumPy arrays.
for item in mnist_ds.create_dict_iterator():
    train_data.x.append(item["image"].asnumpy())
    train_data.y.append(item["label"].asnumpy())
OK, I'll solve it as soon as possible.
I tried your method, but it didn't work very well. That is because the data passed to `mindspore.Model.train` MUST BE a `mindspore.Dataset`; it cannot be a `BaseDataSource`.
Here are some solutions I thought of:
1. Abandon class `mindspore.Model` and adopt a more flexible approach: `for data in dataloader: loss = network(data), loss.update()`.
2. Repackage `train_data` into a `mindspore.Dataset` in `interface.py`.
Solution 1 increases the difficulty of model development and is not a popular approach.
Solution 2 is actually counterintuitive.
Which do you prefer? Or do you have a more reasonable solution?
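Solution 2 can be kept fairly small. The sketch below is an illustration under stated assumptions, not code from this PR: `SimpleDataSource` is a hypothetical stand-in for a `BaseDataSource` holding `x` and `y`, and `RandomAccessSource` is a hypothetical adapter exposing `__getitem__` and `__len__`, the random-access protocol that `mindspore.dataset.GeneratorDataset` accepts as a source. The MindSpore call appears only in a comment so that the block runs without MindSpore installed; in practice the samples would be NumPy arrays rather than lists.

```python
class SimpleDataSource:
    """Hypothetical stand-in for a sedna BaseDataSource with x and y."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class RandomAccessSource:
    """Hypothetical adapter exposing (feature, label) samples by index."""

    def __init__(self, data_source):
        if len(data_source.x) != len(data_source.y):
            raise ValueError("x and y must have the same length")
        self._x = data_source.x
        self._y = data_source.y

    def __getitem__(self, index):
        return self._x[index], self._y[index]

    def __len__(self):
        return len(self._x)


source = RandomAccessSource(
    SimpleDataSource(x=[[0.1, 0.2], [0.3, 0.4]], y=[0, 1])
)

# Inside interface.py the adapter could then be wrapped (assumption):
#   import mindspore.dataset as ds
#   dataset = ds.GeneratorDataset(source, column_names=["image", "label"])
```

This keeps `train` accepting a data source as its first parameter while the repackaging stays hidden inside the estimator.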
PTAL @jaypume
I suggest that `BaseDataSource` be designed to be compatible with existing `Dataset` classes such as `mindspore.Dataset`, `torch.utils.data.Dataset`, and `tf.data.Dataset`. If a Dataset instance is passed to, for example, `FederatedLearning.train(tfDatasetInstance)`, it should work.
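A minimal sketch of that idea, assuming duck typing is acceptable: the hypothetical helper `iter_samples` below accepts either an object with `x` and `y` attributes (the shape used in the snippet earlier in this thread) or any iterable that already yields `(feature, label)` pairs. None of these names come from Sedna, and adapting a particular framework's dataset class to such an iterable is left out.

```python
def iter_samples(data):
    """Yield (feature, label) pairs from either supported input shape."""
    if hasattr(data, "x") and hasattr(data, "y"):
        # Data-source-like object: pair each feature with its label.
        yield from zip(data.x, data.y)
    else:
        # Assume an iterable that already yields (feature, label) pairs.
        for feature, label in data:
            yield feature, label


class FakeDataSource:
    """Hypothetical object mimicking the x/y attributes."""

    x = [[1.0], [2.0]]
    y = [0, 1]


pairs_from_source = list(iter_samples(FakeDataSource()))
pairs_from_iterable = list(iter_samples([([1.0], 0), ([2.0], 1)]))
```

Both calls produce the same list of pairs, so a training entry point built on such a helper would not need to care which kind of object it received.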
> I suggest that `BaseDataSource` be designed to be compatible with existing `Dataset` classes such as `mindspore.Dataset`, `torch.utils.data.Dataset`, and `tf.data.Dataset`. If a Dataset instance is passed to, for example, `FederatedLearning.train(tfDatasetInstance)`, it should work.
You can help us raise an issue so we can track it.
self.estimator.load_weights(model_path)

def get_weights(self):
    """todo: no support yet"""
return self.estimator.parameters_dict()
"""todo: no support yet"""

def set_weights(self, weights):
    """todo: no support yet"""
for name, weight in weights.items():
    weights[name] = mindspore.Parameter(weight, name=name)
mindspore.load_param_into_net(self.estimator, weights, strict_load=True)
`self.estimator` is not a net. `get_weights` and `set_weights` will be developed later :)
mindspore.load_param_into_net(net, parameter_dict, strict_load=False)
1. Add directory description in README.md.
2. Modify "test" to "eval".
3. Add some comments.
Signed-off-by: chou-shun <zhangjun286@huawei.com>
├── run_standalone_train_cpu.sh # launch cpu training
├── src
    ├── config.py # parameter configuration
    ├── dataset.py # data preprocessing
Should we move the MindSpore data preprocessing into the Sedna lib instead of keeping it in the examples? Other people may want to use it when they develop an application based on MindSpore.
Where do you think is the most appropriate place: `sedna.datasource` or `sedna.backend.mindspore`?
Should we develop preprocessing methods for each model, or develop a general preprocessing method?
In my assumption, for a certain scene (such as image classification), we can predefine several fixed preprocessing methods (such as normalize, resize), and the user only needs to pass some parameters.
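To make the "predefined methods plus user parameters" idea concrete, here is a hedged, framework-free sketch. Everything in it is hypothetical: the `PREPROCESSORS` registry, `build_pipeline`, and the two steps are illustrative names, and `rescale` stands in for an image operation such as resize so that the example can run on plain lists of pixel values.

```python
def normalize(pixels, mean, std):
    """Shift and scale each value: (v - mean) / std."""
    return [(v - mean) / std for v in pixels]


def rescale(pixels, factor):
    """Multiply each value by a constant factor."""
    return [v * factor for v in pixels]


# Hypothetical registry of predefined steps for one scenario.
PREPROCESSORS = {"normalize": normalize, "rescale": rescale}


def build_pipeline(steps):
    """Build a callable from (name, params) pairs supplied by the user."""
    for name, _ in steps:
        if name not in PREPROCESSORS:
            raise KeyError(f"unknown preprocessing step: {name}")

    def run(pixels):
        # Apply the configured steps in the order the user listed them.
        for name, params in steps:
            pixels = PREPROCESSORS[name](pixels, **params)
        return pixels

    return run


pipeline = build_pipeline([
    ("rescale", {"factor": 1 / 255}),
    ("normalize", {"mean": 0.5, "std": 0.5}),
])
result = pipeline([0, 255])
```

The user only names the steps and passes parameters; whether such a registry belongs in the library or in each example is the open question in this thread.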
I suggest not integrating data preprocessing functions into the Sedna lib, for these reasons:
- There are various preprocessing functions, and they cannot be covered by Sedna.
- The data preprocessing functions are always already defined in developer's training code or AI framework.
/assign @jaypume
Signed-off-by: chou-shun <zhangjun286@huawei.com>
@chou-shun: PR needs rebase.
Signed-off-by: zhangjun zhangjun286@huawei.com