Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix distributed dbscan with empty partial results #657

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Pahandrovich
Copy link
Contributor

No description provided.

@Pahandrovich
Copy link
Contributor Author

/intelci: run

@@ -129,21 +129,23 @@ services::Status DistributedInput<step2Local>::check(const daal::algorithms::Par
const size_t nBlocks = dcPartialData->size();
DAAL_CHECK_EX(nBlocks > 0, ErrorIncorrectNumberOfInputNumericTables, ArgumentName, partialDataStr());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to handle case, when nBlocks == 0 in the same manner as the empty partialData tables.

@@ -129,21 +129,23 @@ services::Status DistributedInput<step2Local>::check(const daal::algorithms::Par
const size_t nBlocks = dcPartialData->size();
DAAL_CHECK_EX(nBlocks > 0, ErrorIncorrectNumberOfInputNumericTables, ArgumentName, partialDataStr());

size_t nFeatures = 0;
for (size_t i = 0; i < nBlocks; i++)
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, this check is incorrect. It should be getNumberOfRows() != 0 to make sense.
Second, it is reasonable to put this check inside loop and just skip tables with 0 rows.

@@ -249,19 +251,17 @@ services::Status DistributedInput<step3Local>::check(const daal::algorithms::Par
const size_t nDataBlocks = dcPartialData->size();
DAAL_CHECK_EX(nDataBlocks > 0, ErrorIncorrectNumberOfInputNumericTables, ArgumentName, partialDataStr());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to handle case, when nBlocks == 0 in the same manner as the empty partialData tables.

size_t nFeatures = 0;
for (size_t i = 0; i < nDataBlocks; i++)
size_t nFeatures = NumericTable::cast((*dcPartialData)[0])->getNumberOfColumns();
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is reasonable to put this check inside loop and just skip tables with 0 rows.

size_t nFeatures = 0;
for (size_t i = 0; i < nDataBlocks; i++)
size_t nFeatures = NumericTable::cast((*dcPartialData)[0])->getNumberOfColumns();
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is reasonable to put this check inside loop and just skip tables with 0 rows.

size_t nFeatures = 0;
for (size_t i = 0; i < nDataBlocks; i++)
size_t nFeatures = NumericTable::cast((*dcPartialData)[0])->getNumberOfColumns();
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is reasonable to put this check inside loop and just skip tables with 0 rows.

@@ -1197,14 +1198,17 @@ services::Status DistributedInput<step12Local>::check(const daal::algorithms::Pa
const size_t nQueriesBlocks = dcOrders->size();
DAAL_CHECK_EX(nQueriesBlocks > 0, ErrorIncorrectNumberOfInputNumericTables, ArgumentName, step12PartialOrdersStr());

for (size_t i = 0; i < nQueriesBlocks; i++)
if (NumericTable::cast((*dcOrders)[0])->getNumberOfRows() != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is reasonable to put this check inside loop and just skip tables with 0 rows.


const int unexpectedLayouts = (int)packed_mask;

for (size_t i = 0; i < nBlocks; i++)
if (NumericTable::cast((*dcPartitionedData)[0])->getNumberOfRows() != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is reasonable to put this check inside loop and just skip tables with 0 rows.

@@ -152,26 +152,35 @@ Status DistributedPartialResultStep4::check(const daal::algorithms::Input * inpu
DataCollectionPtr dcPartitionedPartialOrders = get(partitionedPartialOrders);

DAAL_CHECK_EX(dcPartitionedData.get(), ErrorNullPartialResult, ArgumentName, partitionedDataStr());
DAAL_CHECK_EX(dcPartitionedData->size() == nBlocks, ErrorIncorrectDataCollectionSize, ArgumentName, partitionedDataStr());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also add check for nBlocks == 0 and skip all further checks in such case.

@@ -184,6 +184,11 @@ void geometricPartitioning()
sendCollectionAllToAll(beginId, endId, rankId, step4ResultPartitionedDataTag, curPartitionedData, partitionedData);
sendCollectionAllToAll(beginId, endId, rankId, step4ResultPartitionedPartialOrdersTag, curPartitionedPartialOrders, partitionedPartialOrders);

if (partitionedData->size() == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workaround can be avoided using checks for nBlocks == 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be avoided. Because this code sequence makes sample less clear for the end user. @Pahandrovich please add nBlocks == 0 checks inside daal and remove those lines of code from the sample.

Copy link
Contributor

@Vika-F Vika-F left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to remove changes in MPI sample. As samples should be as clear/simple as possible. You'll need to add checks for nBlocks == 0 to do so.

I'd also suggest to remove code duplications and make other changes listed in the comments.

@@ -183,6 +183,16 @@ Status DBSCANDistrStep3Kernel<algorithmFPType, method, cpu>::compute(const DataC

DAAL_OVERFLOW_CHECK_BY_MULTIPLICATION(size_t, totalNRows, sizeof(algorithmFPType));

if (totalNRows == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to move this if() before the overflow check.

size_t nFeatures = 0;
for (size_t i = 0; i < nDataBlocks; i++)
size_t nFeatures = NumericTable::cast((*dcPartialData)[0])->getNumberOfColumns();
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() != 0)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece of code duplicates the pieces at lines 132, 256, 660. Please consider removing those duplicates.

size_t nFeatures = 0;
for (size_t i = 0; i < nDataBlocks; i++)
size_t nFeatures = NumericTable::cast((*dcPartialData)[0])->getNumberOfColumns();
if (NumericTable::cast((*dcPartialData)[0])->getNumberOfRows() != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree here

@@ -179,7 +179,14 @@ DAAL_EXPORT services::Status DistributedPartialResultStep6::allocate(const daal:

services::Status status;

set(step6ClusterStructure, HomogenNumericTable<int>::create(4, nRows, NumericTable::doAllocate, &status));
if (nRows != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if() statement looks meaningless to me. Why not revert to just
set(step6ClusterStructure, HomogenNumericTable::create(4, nRows, NumericTable::doAllocate, &status));
?

The same comment for other similar ifs in step8, step10 and step11.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is that creating table with 0 rows and doAllocate flag leads to error raised.

@@ -184,6 +184,11 @@ void geometricPartitioning()
sendCollectionAllToAll(beginId, endId, rankId, step4ResultPartitionedDataTag, curPartitionedData, partitionedData);
sendCollectionAllToAll(beginId, endId, rankId, step4ResultPartitionedPartialOrdersTag, curPartitionedPartialOrders, partitionedPartialOrders);

if (partitionedData->size() == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be avoided. Because this code sequence makes sample less clear for the end user. @Pahandrovich please add nBlocks == 0 checks inside daal and remove those lines of code from the sample.

for (size_t part = 0; part < nBlocks; part++)
{
NumericTablePtr ntPartialSplit = NumericTable::cast((*dcPartialSplits)[part]);
ReadRows<algorithmFPType, cpu> partialSplitRows(ntPartialSplit.get(), 0, 1);
DAAL_CHECK_BLOCK_STATUS(partialSplitRows);
const algorithmFPType * const partialSplit = partialSplitRows.get();

partialSplitValues[part] = partialSplit[0];
if ((size_t)partialSplit[1] == -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider re-writing this if(). It has hard-to-understand comparison of unsigned size_t value to a signed const -1. I'd rewrite to something like:
const size_t noSplitDim = -1;
...
if ((size_t)partialSplit[1] == noSplitDim)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants