some questions #23

Open
kevin-xuan opened this issue Aug 8, 2021 · 6 comments

Comments

@kevin-xuan

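Hello. When I run the demo you provide on the PEMS03 dataset, the results look normal for the first 2 epochs, but at epoch 3 the train loss jumps to over 200 and stays above 200 until training ends, so I don't get correct experimental results. The DTW file is adj_PEMS03_001.csv, right? I also tried regenerating adj_PEMS03_001.csv with fast_DTW_gen.py, but I still hit the same problem at epoch 3. What could be the cause?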

@MengzhangLI
Owner

Hi, sorry for late reply.

If I remember correctly, I also ran into this problem (about a year ago). Do you still encounter it often?

I didn't look into it further because I never saw it again in the next version of my code. My guess is that it is an MXNet problem.

Looking forward to your reply.

Best,

@kevin-xuan
Author

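Yes, I ran it twice on PEMS03 and got the same result both times. Is it an MXNet version problem? My setup is Python 3.6, MXNet 1.5.0, CUDA 11.2.
I tried mxnet-cu100, mxnet-cu101, and mxnet-cu102. It runs with mxnet-cu101, while the other two fail with errors that are also related to the MXNet version.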

@MengzhangLI
Owner

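> Yes, I ran it twice on PEMS03 and got the same result both times. Is it an MXNet version problem? My setup is Python 3.6, MXNet 1.5.0, CUDA 11.2.
> I tried mxnet-cu100, mxnet-cu101, and mxnet-cu102. It runs with mxnet-cu101, while the other two fail with errors that are also related to the MXNet version.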

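I don't think so. A broken MXNet installation would fail to run at all, whereas what you describe, like what I ran into before, is the model failing to converge (most of the time it did converge for me, so I didn't pay much attention to it). Could you run it 5 to 10 times and see how many runs fail to converge? As I recall, it almost always converges.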
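To count non-converging runs as suggested, one rough approach is to flag a run as diverged when its training loss becomes non-finite or jumps well above the best loss seen so far (as in the original report, where the loss rose to 200+ at epoch 3 and stayed there). The sketch below assumes you have collected one list of per-epoch training losses per run; the function names and the 5x threshold are hypothetical choices, not part of this repository.

```python
import math
from typing import Iterable, Sequence


def diverged(losses: Iterable[float], factor: float = 5.0) -> bool:
    """Return True if a loss is NaN/inf or exceeds `factor` x the best loss so far.

    `factor` is an arbitrary, hypothetical threshold; tune it for your data.
    """
    best = math.inf
    for loss in losses:
        if not math.isfinite(loss) or loss > factor * best:
            return True
        best = min(best, loss)
    return False


def count_diverged(runs: Sequence[Sequence[float]], factor: float = 5.0) -> int:
    """Count how many runs (each a list of per-epoch losses) diverged."""
    return sum(diverged(run, factor) for run in runs)


if __name__ == "__main__":
    # Made-up loss curves for illustration only.
    runs = [
        [30.0, 25.0, 220.0, 215.0],  # spikes at the third epoch
        [30.0, 25.0, 22.0, 21.0],    # keeps decreasing
    ]
    print(count_diverged(runs))  # prints 1
```

A run that never trips the threshold is not guaranteed to have reached a good solution, so comparing the final validation MAE across runs is a useful complementary check.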

@kevin-xuan
Author

This code sometimes runs for me and sometimes fails with an MXNet version error...
val_loader.reset()
prediction = mod.predict(val_loader)[1].asnumpy()
loss = masked_mae_np(val_y, prediction, 0)
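I'd also like to ask: during data preprocessing, the inputs X of the training, test, and validation sets are normalized, but the labels Y are not.
In that case, the values predicted by prediction = mod.predict(val_loader)[1].asnumpy() should be de-normalized (converted back to traffic flow values) before computing MAE against Y, right? Where is this inverse normalization done?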
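For reference, here is a minimal sketch of the two pieces the question is about: z-score normalization with its inverse transform, and a masked MAE that skips entries equal to a null value. This is not the repository's code. The normalization scheme (a single mean/std computed on the training split), the function names, and the masking behavior (dropping labels equal to `null_val`, loosely mirroring the `masked_mae_np(val_y, prediction, 0)` call above) are all assumptions. Note that the inverse transform only matters if the network's outputs are on the normalized scale; if the training loss is computed directly against the unnormalized Y, the network is trained to output values on the original scale. Which case applies depends on how the training loss is set up in the code.

```python
import numpy as np


def fit_zscore(x_train):
    """Compute normalization statistics from the training split only."""
    return float(x_train.mean()), float(x_train.std())


def zscore(x, mean, std):
    """Normalize inputs: (x - mean) / std."""
    return (x - mean) / std


def inverse_zscore(x_norm, mean, std):
    """Map normalized values back to the original (traffic-flow) scale."""
    return x_norm * std + mean


def masked_mae(y_true, y_pred, null_val=0.0):
    """MAE over entries whose label is not `null_val` (assumed to mark missing data)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != null_val
    if not mask.any():
        return 0.0
    return float(np.abs(y_pred[mask] - y_true[mask]).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_train = rng.uniform(0.0, 500.0, size=(64, 12, 10))  # synthetic flows
    mean, std = fit_zscore(x_train)
    x_norm = zscore(x_train, mean, std)
    # The round trip recovers the original values (up to floating-point error).
    assert np.allclose(inverse_zscore(x_norm, mean, std), x_train)

    y_true = np.array([0.0, 10.0, 20.0])  # 0 treated as a missing label
    y_pred = np.array([5.0, 12.0, 17.0])
    print(masked_mae(y_true, y_pred, 0.0))  # (|12-10| + |17-20|) / 2 = 2.5
```

If the model's outputs were on the normalized scale, the comparison against raw labels would instead be `masked_mae(y_raw, inverse_zscore(pred_norm, mean, std))`.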

@kevin-xuan
Author

While debugging I often get the error [01:04:15] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\linalg_impl.h:213: Check failed: e == CUBLAS_STATUS_SUCCESS (13 vs. 0) : cuBLAS: CUBLAS_STATUS_EXECUTION_FAILED in this loop:
for idx, databatch in enumerate(train_loader):
    mod.forward_backward(databatch)
    mod.update_metric(metric, databatch.label)
    mod.update()
Could this error be the reason why the preds computed inside MXNet at the line mod.update_metric(metric, databatch.label) are all 0?
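One way to check whether the predictions really collapse to zero is to inspect the module outputs right after each forward/backward pass. The sketch below is a hypothetical NumPy helper (not part of the repository or of MXNet) that summarizes a prediction array; it is self-contained so it can be tried without a GPU.

```python
import numpy as np


def summarize_preds(preds):
    """Return simple statistics that reveal degenerate predictions."""
    preds = np.asarray(preds, dtype=float)
    finite = np.isfinite(preds)
    return {
        # True only if every entry is exactly zero (NaN compares unequal to 0).
        "all_zero": bool(np.all(preds == 0)),
        # Number of NaN or +/-inf entries.
        "n_nonfinite": int(preds.size - finite.sum()),
        # Largest absolute finite value, or NaN if nothing is finite.
        "abs_max": float(np.abs(preds[finite]).max()) if finite.any() else float("nan"),
    }


if __name__ == "__main__":
    print(summarize_preds(np.zeros((2, 3))))
    print(summarize_preds(np.array([1.0, -4.0, np.nan])))
```

Inside the training loop, the current outputs could be fetched with `mod.get_outputs()` after `mod.forward_backward(databatch)` and converted with `.asnumpy()`. Which output index holds the prediction depends on how the symbol's outputs were grouped (the validation code above uses index `[1]` on the result of `mod.predict`). Because MXNet executes operators asynchronously, a cuBLAS failure can surface at a later call than the one that triggered it, so predictions read after such an error may not be meaningful.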

@MengzhangLI
Owner

Sorry, I only just saw this. I think it is an MXNet version problem.
