-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix distributed dbscan with empty partial results #657
base: main
Are you sure you want to change the base?
Fix distributed dbscan with empty partial results #657
Conversation
/intelci: run |
@@ -129,21 +129,23 @@ services::Status DistributedInput<step2Local>::check(const daal::algorithms::Par | |||
const size_t nBlocks = dcPartialData->size(); | |||
DAAL_CHECK_EX(nBlocks > 0, ErrorIncorrectNumberOfInputNumericTables, ArgumentName, partialDataStr()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to handle case, when nBlocks == 0
in the same manner as the empty partialData tables.
@@ -129,21 +129,23 @@ services::Status DistributedInput<step2Local>::check(const daal::algorithms::Par | |||
const size_t nBlocks = dcPartialData->size(); | |||
DAAL_CHECK_EX(nBlocks > 0, ErrorIncorrectNumberOfInputNumericTables, ArgumentName, partialDataStr()); | |||
|
|||
size_t nFeatures = 0; | |||
for (size_t i = 0; i < nBlocks; i++) | |||
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, this check is incorrect. It should be getNumberOfRows() != 0
to make sense.
Second, it is reasonable to put this check inside loop and just skip tables with 0
rows.
@@ -249,19 +251,17 @@ services::Status DistributedInput<step3Local>::check(const daal::algorithms::Par | |||
const size_t nDataBlocks = dcPartialData->size(); | |||
DAAL_CHECK_EX(nDataBlocks > 0, ErrorIncorrectNumberOfInputNumericTables, ArgumentName, partialDataStr()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to handle case, when nBlocks == 0 in the same manner as the empty partialData tables.
size_t nFeatures = 0; | ||
for (size_t i = 0; i < nDataBlocks; i++) | ||
size_t nFeatures = NumericTable::cast((*dcPartialData)[0])->getNumberOfColumns(); | ||
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is reasonable to put this check inside loop and just skip tables with 0 rows.
size_t nFeatures = 0; | ||
for (size_t i = 0; i < nDataBlocks; i++) | ||
size_t nFeatures = NumericTable::cast((*dcPartialData)[0])->getNumberOfColumns(); | ||
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is reasonable to put this check inside loop and just skip tables with 0 rows.
size_t nFeatures = 0; | ||
for (size_t i = 0; i < nDataBlocks; i++) | ||
size_t nFeatures = NumericTable::cast((*dcPartialData)[0])->getNumberOfColumns(); | ||
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is reasonable to put this check inside loop and just skip tables with 0 rows.
@@ -1197,14 +1198,17 @@ services::Status DistributedInput<step12Local>::check(const daal::algorithms::Pa | |||
const size_t nQueriesBlocks = dcOrders->size(); | |||
DAAL_CHECK_EX(nQueriesBlocks > 0, ErrorIncorrectNumberOfInputNumericTables, ArgumentName, step12PartialOrdersStr()); | |||
|
|||
for (size_t i = 0; i < nQueriesBlocks; i++) | |||
if (NumericTable::cast((*dcOrders)[0])->getNumberOfRows() != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is reasonable to put this check inside loop and just skip tables with 0 rows.
|
||
const int unexpectedLayouts = (int)packed_mask; | ||
|
||
for (size_t i = 0; i < nBlocks; i++) | ||
if (NumericTable::cast((*dcPartitionedData)[0])->getNumberOfRows() != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is reasonable to put this check inside loop and just skip tables with 0 rows.
@@ -152,26 +152,35 @@ Status DistributedPartialResultStep4::check(const daal::algorithms::Input * inpu | |||
DataCollectionPtr dcPartitionedPartialOrders = get(partitionedPartialOrders); | |||
|
|||
DAAL_CHECK_EX(dcPartitionedData.get(), ErrorNullPartialResult, ArgumentName, partitionedDataStr()); | |||
DAAL_CHECK_EX(dcPartitionedData->size() == nBlocks, ErrorIncorrectDataCollectionSize, ArgumentName, partitionedDataStr()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also add check for nBlocks == 0
and skip all further checks in such case.
@@ -184,6 +184,11 @@ void geometricPartitioning() | |||
sendCollectionAllToAll(beginId, endId, rankId, step4ResultPartitionedDataTag, curPartitionedData, partitionedData); | |||
sendCollectionAllToAll(beginId, endId, rankId, step4ResultPartitionedPartialOrdersTag, curPartitionedPartialOrders, partitionedPartialOrders); | |||
|
|||
if (partitionedData->size() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This workaround can be avoided using checks for nBlocks == 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be avoided. Because this code sequence makes sample less clear for the end user. @Pahandrovich please add nBlocks == 0 checks inside daal and remove those lines of code from the sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to remove changes in MPI sample. As samples should be as clear/simple as possible. You'll need to add checks for nBlocks == 0 to do so.
I'd also suggest to remove code duplications and make other changes listed in the comments.
@@ -183,6 +183,16 @@ Status DBSCANDistrStep3Kernel<algorithmFPType, method, cpu>::compute(const DataC | |||
|
|||
DAAL_OVERFLOW_CHECK_BY_MULTIPLICATION(size_t, totalNRows, sizeof(algorithmFPType)); | |||
|
|||
if (totalNRows == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to move this if() before the overflow check.
size_t nFeatures = 0; | ||
for (size_t i = 0; i < nDataBlocks; i++) | ||
size_t nFeatures = NumericTable::cast((*dcPartialData)[0])->getNumberOfColumns(); | ||
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() != 0) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of code duplicates the pieces at lines 132, 256, 660. Please consider removing those duplicates.
size_t nFeatures = 0; | ||
for (size_t i = 0; i < nDataBlocks; i++) | ||
size_t nFeatures = NumericTable::cast((*dcPartialData)[0])->getNumberOfColumns(); | ||
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree here
@@ -179,7 +179,14 @@ DAAL_EXPORT services::Status DistributedPartialResultStep6::allocate(const daal: | |||
|
|||
services::Status status; | |||
|
|||
set(step6ClusterStructure, HomogenNumericTable<int>::create(4, nRows, NumericTable::doAllocate, &status)); | |||
if (nRows != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if() statement looks meaningless to me. Why not revert to just
set(step6ClusterStructure, HomogenNumericTable::create(4, nRows, NumericTable::doAllocate, &status));
?
The same comment for other similar ifs in step8, step10 and step11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue is that creating table with 0
rows and doAllocate
flag leads to error raised.
@@ -184,6 +184,11 @@ void geometricPartitioning() | |||
sendCollectionAllToAll(beginId, endId, rankId, step4ResultPartitionedDataTag, curPartitionedData, partitionedData); | |||
sendCollectionAllToAll(beginId, endId, rankId, step4ResultPartitionedPartialOrdersTag, curPartitionedPartialOrders, partitionedPartialOrders); | |||
|
|||
if (partitionedData->size() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be avoided. Because this code sequence makes sample less clear for the end user. @Pahandrovich please add nBlocks == 0 checks inside daal and remove those lines of code from the sample.
for (size_t part = 0; part < nBlocks; part++) | ||
{ | ||
NumericTablePtr ntPartialSplit = NumericTable::cast((*dcPartialSplits)[part]); | ||
ReadRows<algorithmFPType, cpu> partialSplitRows(ntPartialSplit.get(), 0, 1); | ||
DAAL_CHECK_BLOCK_STATUS(partialSplitRows); | ||
const algorithmFPType * const partialSplit = partialSplitRows.get(); | ||
|
||
partialSplitValues[part] = partialSplit[0]; | ||
if ((size_t)partialSplit[1] == -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider re-writing this if(). It has hard-to-understand comparison of unsigned size_t value to a signed const -1. I'd rewrite to something like:
const size_t noSplitDim = -1;
...
if ((size_t)partialSplit[1] == noSplitDim)
No description provided.