
Hackathon 5th No.58 A physics-informed deep neural network for surrogate modeling in classical elasto-plasticity #558

Closed
wants to merge 2 commits

Conversation

co63oc (Contributor) commented Sep 29, 2023

PR types

Others

PR changes

Others

Describe

PaddlePaddle/Paddle#57262

The model has been modified using transform; the docs need to reference code lines, which will be added after the code changes are complete.

Training accuracy:

| torch | paddle |
| ----- | ------ |
| 0.098 | 0.097  |

torch: [screenshot]

paddle: [screenshot]

The dataset is at https://github.com/meghbali/ANNElastoplasticity/tree/main/Datasets/WG
dstate-16-plas.dat
dstress-16-plas.dat

paddle-bot commented Sep 29, 2023

Thanks for your contribution!

HydrogenSulfate (Collaborator) commented Oct 1, 2023

  1. PaddleScience already has multi-model and multi-optimizer APIs: https://paddlescience-docs.readthedocs.io/zh/latest/zh/api/arch/#ppsci.arch.ModelList and https://paddlescience-docs.readthedocs.io/zh/latest/zh/api/optimizer/#ppsci.optimizer.optimizer.OptimizerList. However, a SchedulerList class is missing; it could be added to ppsci/optimizer/lr_scheduler.py.
  2. This model looks like a serial (chained) multi-model structure, as shown below.

[screenshot]

PaddleScience has several chained-model examples, e.g. https://paddlescience-docs.readthedocs.io/zh/latest/zh/examples/deephpms/?h=model_list#4; you can follow their implementation and cleanly separate the logic into Arch.input_transform, Arch.output_transform, and Constraint.output_expr.
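The SchedulerList suggested in point 1 could mirror the OptimizerList pattern. A minimal sketch of a hypothetical API, with a dummy scheduler standing in for paddle.optimizer.lr.LRScheduler (this is not the actual ppsci implementation):

```python
class DummyScheduler:
    """Stand-in for paddle.optimizer.lr.LRScheduler in this sketch."""

    def __init__(self, learning_rate, gamma):
        self.last_lr = learning_rate
        self.gamma = gamma

    def step(self):
        # exponential decay, just to make the effect observable
        self.last_lr *= self.gamma


class SchedulerList:
    """Hypothetical SchedulerList: wraps several schedulers and steps
    them together, analogous to ppsci.optimizer.optimizer.OptimizerList."""

    def __init__(self, schedulers):
        self._schedulers = tuple(schedulers)

    def step(self):
        for sch in self._schedulers:
            sch.step()

    def __getitem__(self, idx):
        return self._schedulers[idx]

    def __len__(self):
        return len(self._schedulers)


schedulers = SchedulerList([DummyScheduler(0.1, 0.5), DummyScheduler(0.2, 0.1)])
schedulers.step()  # one call advances every wrapped scheduler
```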

co63oc (Contributor, author) commented Oct 2, 2023

The training mode is also different; see epnn_wrapper.py.

[screenshot]

A single training run is not one EPOCH; it is multiple nested loops.

HydrogenSulfate (Collaborator):

> The training mode is also different; see epnn_wrapper.py. A single training run is not one EPOCH; it is multiple nested loops.

ihlayers controls model depth, ineurons controls model width, itrain controls the size of the training set, and irepeat controls the number of experiment repetitions; each takes only a single fixed value, and main.mainfunc is called only once. It is normal for the original code to differ substantially from the toolkit; the hard part of reproducing a paper is understanding how the original code runs and implementing it equivalently with the toolkit.

co63oc (Contributor, author) commented Oct 6, 2023

> PaddleScience has several chained-model examples

[screenshot]

The models in model_list all take x as input, which is a parallel structure; a serial structure should be:

y = model1(x)
y2 = model2(y)

lijialin03 (Contributor):

> The models in model_list all take x as input, which is a parallel structure; a serial structure should be:
>
> y = model1(x)
> y2 = model2(y)

If model2 does not need to back-propagate, you can follow deephpms/burgers.py and write it directly as y2 = model2(y); note that the resulting y2 is a dict, and the input y should also be a dict.

[screenshot]
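The dict-in/dict-out chaining described above can be sketched as follows, with plain functions standing in for the actual Arch models (names are illustrative):

```python
import numpy as np

def model1(input_dict):
    """Stand-in for the first network: dict in, dict out."""
    x = input_dict["x"]
    return {"y": x * 2.0}

def model2(input_dict):
    """Stand-in for the second network, consuming model1's output dict."""
    y = input_dict["y"]
    return {"y2": y + 1.0}

# serial structure: y = model1(x); y2 = model2(y)
out1 = model1({"x": np.array([1.0, 2.0])})
out2 = model2(out1)
```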

co63oc (Contributor, author) commented Oct 11, 2023

> PaddleScience has several chained-model examples

The model has been modified using transform; the docs need to reference code lines, which will be added after the code changes are complete.

Training accuracy:

| torch | paddle |
| ----- | ------ |
| 0.098 | 0.097  |

torch: [screenshot]

paddle: [screenshot]

The dataset is at https://github.com/meghbali/ANNElastoplasticity/tree/main/Datasets/WG
dstate-16-plas.dat
dstress-16-plas.dat

lijialin03 (Contributor):

The docs only need to reference the code that is necessary to explain the approach clearly.

lijialin03 (Contributor) commented Oct 12, 2023

So far I have only looked at the code under ppsci; the code under example still looks like plain paddle code. The acceptance criterion is merging into paddlescience, so it needs to be replaced.
Much of this code has equivalents in paddlescience; functionality that is missing, such as reading .dat files, should be added under ppsci/data/dataset/.
The training loop needs to be understood and rewritten in paddlescience style. It looks like each training step draws a different amount of data, i.e. the batch_size varies, so a batch sampler that supports a different batch_size per draw would need to be added.
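The varying-batch_size idea could be sketched as an index sampler that yields a different number of indices per draw. This is illustrative only; a real version would presumably subclass paddle.io.BatchSampler:

```python
import numpy as np

def variable_batch_sampler(n_samples, batch_sizes, seed=0):
    """Yield index batches whose sizes follow the given schedule."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)  # shuffle once per pass
    start = 0
    for bs in batch_sizes:
        yield indices[start:start + bs].tolist()
        start += bs

# three draws with batch sizes 2, 3, and 5 over a 10-sample dataset
batches = list(variable_batch_sampler(10, [2, 3, 5]))
```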

co63oc (Contributor, author) commented Oct 13, 2023

> The docs only need to reference the code that is necessary to explain the approach clearly.

Docs have been added; the code-line references will be updated after the code changes are complete.

co63oc (Contributor, author) commented Oct 13, 2023

> So far I have only looked at the code under ppsci; the code under example still looks like plain paddle code. The acceptance criterion is merging into paddlescience, so it needs to be replaced. Functionality that is missing, such as reading .dat files, should be added under ppsci/data/dataset/. The training loop needs to be rewritten in paddlescience style; the batch_size varies, so a batch sampler supporting a different batch_size per draw would be needed.

The dataset format uses custom fields and is read with pickle.load; if support is added, should it go into ppsci.utils.reader.py?
The training dataset generates all batch data up front and uses NamedArrayDataset, so the batch_size for reading is always 1.
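Since the .dat files are read with pickle.load, the reader could be as simple as the following sketch (a hypothetical helper, not the actual ppsci.utils.reader code; the round-trip demo uses a temporary file):

```python
import os
import pickle
import tempfile

def load_dat_file(path):
    """Sketch: load a pickled dict from a .dat file."""
    with open(path, "rb") as f:
        return pickle.load(f)

# round-trip demo with an illustrative payload
sample = {"X": [[0.1, 0.2]], "y": [[0.3]]}
tmp = tempfile.NamedTemporaryFile(suffix=".dat", delete=False)
tmp.close()
with open(tmp.name, "wb") as f:
    pickle.dump(sample, f)
loaded = load_dat_file(tmp.name)
os.unlink(tmp.name)  # clean up the temporary file
```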

co63oc (Contributor, author) commented Oct 13, 2023

Replacing paddle.nn.L1Loss with ppsci.loss.L1Loss: ppsci.loss.L1Loss contains a loss.sum call, which raises an error when an element has shape 1.

[screenshot]

[screenshot]

lijialin03 (Contributor):

> The dataset format uses custom fields and is read with pickle.load; if support is added, should it go into ppsci.utils.reader.py? The training dataset generates all batch data up front and uses NamedArrayDataset, so the batch_size for reading is always 1.

Yes, the .dat reading needs to be added to ppsci.utils.reader.py.

lijialin03 (Contributor):

> Replacing paddle.nn.L1Loss with ppsci.loss.L1Loss: ppsci.loss.L1Loss contains a loss.sum call, which raises an error when an element has shape 1.

See https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/nn/functional/l1_loss_cn.html#l1-loss
paddle develop is continuously updated, and the paddlescience develop branch is kept in sync with the current paddle develop, so try reinstalling paddle develop.

co63oc (Contributor, author) commented Oct 16, 2023

> Yes, the .dat reading needs to be added to ppsci.utils.reader.py.

Done.

co63oc (Contributor, author) commented Oct 16, 2023

> See https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/nn/functional/l1_loss_cn.html#l1-loss; paddle develop is continuously updated, and the paddlescience develop branch is kept in sync with it, so try reinstalling paddle develop.

With the latest paddle the error still appears. It occurs in ppsci/loss/l1.py and is triggered when label_key is a List.

[screenshot]

[screenshot]

lijialin03 (Contributor):

> With the latest paddle the error still appears. It occurs in ppsci/loss/l1.py and is triggered when label_key is a List.

Print input and label just before the failing line and check whether they match the format specified below.

[screenshot]

co63oc (Contributor, author) commented Oct 17, 2023

> Print input and label just before the failing line and check whether they match the format specified below.

The shapes of output_dict and label_dict are both [187679, 1].

[screenshot]

lijialin03 (Contributor):

> The shapes of output_dict and label_dict are both [187679, 1].

Is output_dict a list now? Then the error is probably raised by output_dict[key], not by l1_loss. Per the type hint it should be a dict; run print(output_dict[key].shape) to check.

co63oc (Contributor, author) commented Oct 18, 2023

> Is output_dict a list now? Then the error is probably raised by output_dict[key], not by l1_loss. Per the type hint it should be a dict; run print(output_dict[key].shape) to check.

After training with ppsci.loss.L1Loss: the value computed by ppsci.loss.L1Loss matches paddle.nn.L1Loss, but the loss returned during training differs from the paddle.nn.L1Loss run. ppsci.loss.L1Loss adds some extra logic, so the backward pass may differ.

Using ppsci.loss.L1Loss: [screenshot]

Using paddle.nn.L1Loss: [screenshot]

lijialin03 (Contributor):

> After training with ppsci.loss.L1Loss: the value computed by ppsci.loss.L1Loss matches paddle.nn.L1Loss, but the loss returned during training differs. ppsci.loss.L1Loss adds some extra logic, so the backward pass may differ.

Yes. After checking: paddle.nn.L1Loss averages over every element, while ppsci.loss.L1Loss averages per batch. Thanks for the feedback; please add a MAELoss API modeled on ppsci.loss.MSELoss.
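The two averaging conventions discussed here can be shown with a small numpy stand-in (not the actual paddle/ppsci implementations):

```python
import numpy as np

pred = np.array([[1.0, 2.0], [3.0, 4.0]])   # shape (batch=2, dim=2)
label = np.zeros_like(pred)

abs_err = np.abs(pred - label)

# element-wise convention (as in paddle.nn.L1Loss with reduction="mean"):
# average over every element
elementwise_mean = abs_err.mean()           # (1+2+3+4)/4

# batch-averaged convention: sum within each sample, then average over batch
batch_mean = abs_err.sum(axis=1).mean()     # (3+7)/2
```

The two values diverge whenever a sample has more than one component, which explains why the training losses differed while single-value comparisons matched.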

co63oc (Contributor, author) commented Oct 18, 2023

> Yes. After checking: paddle.nn.L1Loss averages over every element, while ppsci.loss.L1Loss averages per batch. Please add a MAELoss API modeled on ppsci.loss.MSELoss.

MAELoss has been added.

)
dstrainel = dstrain_real - dstrainpl_real
dstrainelv = paddle.matmul(x=dstrainel, y=oneten1)
dstrainelvten = paddle.multiply(x=dstrainelv, y=paddle.to_tensor(oneten2))
Review comment (Contributor):

Combine what can be combined into a single expression.

)

model_list = functions.get_model_list(
IHLAYERS, INEURONS, data_train_state1, data_train_stress1
Review comment (Contributor):

Same as before:

  1. There is no need to pass data_train_state1 into this file.
  2. Looking inside get_model_list, only the data's shape is used, so it is enough to pass the shape; there seems to be no need to pass the data itself.

learning_rate=scheduler_stress,
weight_decay=0.0,
)(model_list[2])
optimizer_ratio = paddle.optimizer.Adam(
Review comment (Contributor):

Could this be wrapped in something like the following (just an example; the structure can be adjusted)?

[screenshot]

Returns:
paddle.Tensor: Loss value.
"""
criterion = val_loss_criterion
Review comment (Contributor):

Same as above, this is unnecessary; it could be written directly as a nested function, like this:

[screenshot]

Author reply (co63oc):

eval_loss_func needs to accumulate statistics, so it was not changed into a nested function.

min_stress = paddle.to_tensor(data=get_data["miny"])
range_stress = paddle.to_tensor(data=get_data["rangey"])
global common_param
common_param = [
Review comment (Contributor):

If set_params is going to be used, it is better to wrap it in a class; otherwise global is still being used and the change achieves little.
This value does not actually need to be passed back and forth between functions and epnn: wrap loss_function inside functions and, following hpinns, put common_param at the top of the file. The line functions.set_params(functions.Data.get_common_param(DATASET_STATE, DATASET_STRESS)) passes a value computed in functions back into the functions file, which is entirely unnecessary.

for i in range(epochs):
shuffled_indices = paddle.randperm(n=self.data_train_state1.x.shape[0])
input_dict_train["state_x"].append(
self.data_train_state1.x[shuffled_indices[0 : self.itrain]]
Review comment (Contributor):

OK, then let's not use a data sampler.

co63oc (Contributor, author) commented Oct 25, 2023

Tested with a nested get_opt function; the error increased, so get_opt was not used.

A global loss_log records the information; eval_during_train=True is set in the Solver so the eval error is computed every epoch, and functions.plot_loss draws the plot.

With epochs=10000, Loss is the ppsci.loss.MAELoss result and Error is the paddle.linalg.norm result: [screenshot]

Figure from the paper, epochs=10000: [screenshot]

"dataset": {
"name": "NamedArrayDataset",
"input": input_dict_train,
"label": label_dict_train,
Review comment (Contributor):

The code fails to run because the input type to NamedArrayDataset is wrong.

[screenshot]

The values in the dict read by ppsci.utils.reader.load_dat_file() are still np.ndarray, but by the time they reach here they have become List[Tensor, ...].
Does the data processing in functions really need to convert everything to tensors?
A list can be converted to an array with np.array(list).

get_data = ppsci.utils.reader.load_dat_file(dataset_state)
data_state = Data(
x=paddle.to_tensor(data=get_data["X"]),
y=paddle.to_tensor(data=get_data["y"]),
Review comment (Contributor):

Calling Data inside Data is odd.

shuffled_indices[n_train + n_cross_valid : n_train + n_cross_valid + n_test]
],
)
return data_train, data_cross_valid, data_test
Review comment (lijialin03, Oct 26, 2023):

This function looks like it only splits the data.
At first I thought Data merely wraps x and y, which is not really necessary, so the code could be written in a form where the Data class takes on more responsibility as a simple data-processing class; the code would then look roughly like this:

[screenshot]

But if Data must wrap the data in an extra layer, it becomes a class that encapsulates data features, which is a different role; in that case Data is not a good home for other functions, and get_data and set_common_param should be moved out.

self.y = y
self.n_samples = self.x.shape[0]

def get_data(dataset_state, dataset_stress, ntrain_size, irepeat):
Review comment (Contributor):

Move this and set_common_param out of Data.

).get(10)

def set_common_param(dataset_state, dataset_stress):
global common_param
Review comment (Contributor):

Within the same file you can assign to it directly; there is no need to write global here.
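For reference, Python's scoping rule in this situation: reading a module-level name inside a function needs no declaration, while rebinding it inside a function does require global. A small sketch, unrelated to the EPNN code itself:

```python
counter = 0

def read_counter():
    # reading a module-level name needs no global declaration
    return counter

def bump_counter():
    # rebinding a module-level name inside a function does need global
    global counter
    counter += 1

bump_counter()
result = read_counter()
```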


def plot_loss():
global loss_log
global OUTPUT_DIR
Review comment (Contributor):

No need for global here. Also, the solver has a plot_loss_history function; check whether it can be used instead.



def loss_func(output_dict, criterion) -> paddle.Tensor:
global gkratio, common_param
Review comment (Contributor):

No need for global here.

Returns:
paddle.Tensor: Loss value.
"""
global loss_log
Review comment (Contributor):

Same as above.

train_size = train_size_float.astype(int)
itrain = train_size[ntrain_size - 1]

ppsci.utils.misc.set_random_seed(seed=10 + irepeat)
Review comment (Contributor):

This line does not seem to be used.

co63oc (Contributor, author) commented Oct 26, 2023

This PR has grown long; it is being closed and a new PR opened: #606
