[CPU-PSLIB] Add consistency inspection of use_var_list and data_generator data, test=develop #34463
…rator data, test=develop
…and data_generator
… and data_generator data
python/paddle/fluid/dataset.py
```python
if var_list[
        i].dtype == core.VarDesc.VarType.FP32 and not all(
            isinstance(ele, float) for ele in ele[1]):
```
1. Could `var_list[i]` be written on one line?
2. If `not all(isinstance(ele, float) ...)` is true, you'd better print `ele` out; otherwise the user doesn't know which element in the current line is wrong.
1. pre-commit formats it this way...
2. `ele` is already printed out; see the `raise TypeError`.
python/paddle/fluid/dataset.py
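```python
import paddle.fluid as fluid
# CTRDataset is the user-defined data generator class in dataset_generator.py
from dataset_generator import CTRDataset

dataset = fluid.DatasetFactory().create_dataset()
generator_class = CTRDataset()
# data and label are the variables passed to dataset.set_use_var
dataset.check_use_var_with_data_generator([data, label], generator_class, "data/part-00000")
```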
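The usage example imports a user-defined `CTRDataset`. A hypothetical stand-in might look like the sketch below. It assumes the shape used in Paddle's `MultiSlotDataGenerator` examples: `generate_sample(line)` returns a zero-argument function that yields one list of `(slot_name, values)` pairs per sample. The slot names `dense_input` and `click` and the comma-separated line format are illustrative assumptions, and the class does not subclass Paddle so that it runs on its own.

```python
class CTRDataset:
    """Hypothetical stand-in for the generator in dataset_generator.py.

    generate_sample(line) returns a zero-argument function that yields
    one list of (slot_name, values) pairs per sample.
    """

    def generate_sample(self, line):
        def local_iter():
            # Assumed line format: "<space-separated floats>,<click>"
            dense_part, click_part = line.strip().split(",")
            dense = [float(x) for x in dense_part.split()]
            click = int(click_part)
            yield [("dense_input", dense), ("click", [click])]

        return local_iter


for sample in CTRDataset().generate_sample("0.1 0.2,1\n")():
    print(sample)  # [('dense_input', [0.1, 0.2]), ('click', [1])]
```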
Can we do this check in the dataset's `load_into_memory`? Sometimes it's not easy to reproduce the error using just one file.
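The C++ side parses slot by slot (MultiSlotDataFeed::ParseOneInstance) and stops at the point where the error occurs. It also doesn't seem possible to add more informative messages at that point, because the value it has parsed is already wrong. So I still think reporting the error on the Python side is clearer.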
…erator data, test=develop
PR types
Function optimization
PR changes
Others
Describe
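Problem description:
Currently there is no consistency check between use_var_list and the data generated by pipe_command: the lengths may be inconsistent, or the types may not match.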
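Reproduction:
The error in the figure below is caused by an extra slot_value mistakenly added in data_generator, which shifts the types out of alignment with use_var_list. However, the error in the figure reports that the feasign of the 24th slot is 0, which does not tell the user the real cause.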
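To see why a positional parser blames the wrong slot, here is a toy model of one text line. It assumes the MultiSlot text format writes, for each slot, the value count followed by the values, and that the reader consumes slots purely by position in use_var_list order. `serialize` and `parse` are hypothetical, simplified helpers, not Paddle's `MultiSlotDataFeed`; in particular, this `parse` silently ignores trailing tokens, which a real reader may not.

```python
def serialize(sample):
    """Encode [(slot_name, values), ...] as one line: for each slot,
    the value count followed by the values."""
    parts = []
    for _, values in sample:
        parts.append(str(len(values)))
        parts.extend(str(v) for v in values)
    return " ".join(parts)


def parse(line, slot_types):
    """Read slots by position only, converting each value to the
    expected type of that position; slot names are never consulted."""
    tokens = line.split()
    pos = 0
    parsed = []
    for slot_type in slot_types:
        count = int(tokens[pos])
        pos += 1
        parsed.append([slot_type(t) for t in tokens[pos:pos + count]])
        pos += count
    return parsed


correct = [("dense_input", [0.5]), ("click", [1])]
shifted = [("dense_input", [0.5]), ("extra", [3]), ("click", [1])]
print(parse(serialize(correct), [float, int]))  # [[0.5], [1]]
print(parse(serialize(shifted), [float, int]))  # [[0.5], [3]]
```

With the extra slot, `click` silently receives the extra slot's value 3, and nothing in the parsed result says that the slots are misaligned.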
Changes:
Add a check_use_var_with_data_generator function to dataset.py, which checks the following:
TestCase1:
A value is missing from data_generator, so its length is inconsistent with var_list, and a length-mismatch error is reported, as follows:
TestCase2:
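An extra value is mistakenly added in data_generator, which shifts the types out of alignment with var_list: the type should be float, but the actual value at that position is int. The error is reported as follows: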
TestCase3:
A value in data_generator should be an int (click=1) but is set to a float (click=1.0), causing a type mismatch. The error is reported as follows:
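The three checks can be sketched without Paddle as below. This is a minimal, hypothetical sketch, not the function added in this PR: `check_samples` is an illustrative name, and the `(name, kind)` pairs are a simplified stand-in for the Paddle variables in use_var_list (where the real code compares `dtype` against `core.VarDesc.VarType.FP32`). Each sample is a list of `(slot_name, values)` pairs, as yielded by the data generator.

```python
_KINDS = {"float": float, "int": int}


def check_samples(var_specs, samples):
    """Check generator samples against the expected variables.

    var_specs: list of (var_name, kind) with kind "float" or "int"
               (hypothetical stand-in for use_var_list).
    samples:   iterable of [(slot_name, values), ...].
    """
    for sample_no, sample in enumerate(samples, start=1):
        # TestCase1: the number of slots must equal the number of variables.
        if len(sample) != len(var_specs):
            raise ValueError(
                "sample %d: var length mismatch: var_list has %d vars, "
                "data_generator yields %d slots"
                % (sample_no, len(var_specs), len(sample)))
        for (var_name, kind), (slot_name, values) in zip(var_specs, sample):
            expected = _KINDS[kind]
            # TestCase2/3: every value must have the variable's type.
            # bool is a subclass of int, so it is rejected explicitly.
            bad = [v for v in values
                   if not isinstance(v, expected) or isinstance(v, bool)]
            if bad:
                # Report the offending values so the user can find them.
                raise TypeError(
                    "sample %d: var %s expects %s, but slot %s contains %r; "
                    "check that var_list and data_generator are aligned"
                    % (sample_no, var_name, kind, slot_name, bad))


specs = [("dense_input", "float"), ("click", "int")]
check_samples(specs, [[("dense_input", [0.1, 0.2]), ("click", [1])]])  # passes
try:
    # TestCase3: click=1.0 instead of click=1
    check_samples(specs, [[("dense_input", [0.1, 0.2]), ("click", [1.0])]])
except TypeError as err:
    print(err)
```

In this sketch the length check runs first, so a sample with one extra slot is reported as a length mismatch; the type check catches misaligned slots when the count still matches, and reports the offending values rather than only a slot index.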