Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up cluster merge by batch copying arrays #280

Merged

Conversation

bouk
Copy link
Contributor

@bouk bouk commented Jun 15, 2023

I was looking through the code and found that a lot of time was being spent on 'make clusters' so I did a quick look through why that is the case. I found the merge_clusters function to be inefficient, so I optimized it to do batch memcopies wherever possible.

This gives me a ~20% reduction in time spent for a large image.

Before

Screenshot 2023-06-15 at 14 33 47

After

Screenshot 2023-06-15 at 14 33 56

Copy link
Collaborator

@christian-rauch christian-rauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this impressive speedup. I guess the improvement comes from changing the loop over source with element-wise calls to zarray_add, to the single memcpy?

I think the commit could use an explanation of what is going on and where the speed improvements come from.

common/zarray.h Outdated
@@ -437,26 +437,36 @@ static inline int zarray_index_of(const zarray_t *za, const void *p)
return -1;
}

/**
* Add elements from start up to and excluding end from 'source' into 'dest'.
* el_size must be the same for both lists
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "el_size" here be el_sz instead?

common/zarray.h Outdated
Comment on lines 467 to 470
static inline void zarray_add_all(zarray_t * dest, const zarray_t * source)
{
assert(dest->el_sz == source->el_sz);

// Don't allocate on stack because el_sz could be larger than ~8 MB
// stack size
char *tmp = (char*)calloc(1, dest->el_sz);

for (int i = 0; i < zarray_size(source); i++) {
zarray_get(source, i, tmp);
zarray_add(dest, tmp);
}

free(tmp);
zarray_add_range(dest, source, 0, source->size);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to keep the function zarray_add_all if it only maps to zarray_add_range? We could also just search & replace all calls to zarray_add_all instead.

merge_clusters was using zarray_add_all which was copying over elements
one-by-one doing 2 memcpys and a potential array resize per element.
Here we replace it by a range copy that does a single resize and memcpy
for the operation which is a lot faster.

In my testing it reduces the total runtime for an image that's 2000x3000
by 20%.
@bouk bouk force-pushed the bouk/faster-cluster-merge branch from 0ec3dbb to e1b143c Compare June 16, 2023 08:10
@bouk
Copy link
Contributor Author

bouk commented Jun 16, 2023

@christian-rauch addressed your comments!

@christian-rauch christian-rauch merged commit e2fd02f into AprilRobotics:master Jun 16, 2023
This was referenced Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants