A big refresh of the datatype engine #6695
I put together the datatype-related benchmarks we exchanged on the different datatype issues and PRs in a single repo (https://github.com/bosilca/ddt_bench).
bot:ompi:retest
As noted on the 2019-06-25 webex, this PR is both a performance enhancement for the DDT engine and a fix for #5540 (i.e., an issue that has been observed on the v4.0.x branch).
Just an idea: we could always define `CHECKSUM` to `0` or `1`, then write `if (CHECKSUM && convertor->flags & CONVERTOR_WITH_CHECKSUM)` and let the compiler handle the dead code.
That would make the code more compact and easier to maintain.
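A minimal sketch of that suggestion, assuming a build-time `CHECKSUM` macro that is always defined. The names `toy_convertor_t`, `TOY_WITH_CHECKSUM`, and `toy_pack` are hypothetical stand-ins for illustration, not the actual Open MPI convertor definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Always defined, to 0 or 1, rather than tested with #ifdef. */
#ifndef CHECKSUM
#define CHECKSUM 1
#endif

/* Hypothetical stand-ins for the convertor structure and its flag. */
#define TOY_WITH_CHECKSUM 0x1u

typedef struct {
    uint32_t flags;     /* run-time options */
    uint32_t checksum;  /* running byte sum, for illustration only */
} toy_convertor_t;

/* Copy len bytes; accumulate a checksum only when it is enabled at
 * build time AND requested at run time. With CHECKSUM defined to 0 the
 * condition is a compile-time constant, so an optimizing compiler can
 * drop the whole branch without any #ifdef in the function body. */
static void toy_pack(toy_convertor_t *conv, void *dst,
                     const void *src, size_t len)
{
    assert(conv != NULL);
    memcpy(dst, src, len);
    if (CHECKSUM && (conv->flags & TOY_WITH_CHECKSUM)) {
        const unsigned char *p = (const unsigned char *)src;
        for (size_t i = 0; i < len; i++) {
            conv->checksum += p[i];
        }
    }
}
```

Both variants of the code are always seen by the compiler, so a build with `CHECKSUM` set to 0 still type-checks the checksum path, which is part of the maintenance benefit.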
@derbeyn could you re-review?
@ggouaillardet could you review this PR when you have time?
Move toward a base type of vector (count, type, blocklen, extent, disp), with disp and extent applying to the count repetition and blocklen being a contiguous memory region of type type.
Implement 2 optimizations on this description, used during type_commit:
- collapse: successive similar datatype descriptions are collapsed together with an increased count.
- fusion: fuse successive datatype descriptions in order to minimize the number of resulting memcpy during pack/unpack.
Fixes at the OMPI datatype level, including:
- Fix the create_hindexed and vector creation.
- Fix the handling of [get|set]_elements and _count.
- Correctly compute the displacement for block indexed types.
- Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
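To make the two optimizations concrete, here is a rough sketch under simplifying assumptions. The `toy_desc_t` structure and the `toy_collapse` and `toy_fuse` helpers are hypothetical and only mirror the (count, type, blocklen, extent, disp) tuple from the commit message, with the type reduced to an element size; the real engine's descriptors and conditions are likely richer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified description: `count` repetitions of
 * `blocklen` contiguous elements of `elem_size` bytes. The first block
 * starts at byte offset `disp`; successive blocks are `extent` bytes
 * apart. */
typedef struct {
    size_t    count;
    size_t    blocklen;
    size_t    elem_size;
    ptrdiff_t extent;
    ptrdiff_t disp;
} toy_desc_t;

/* Collapse: if b has the same shape as a and starts exactly where a's
 * next repetition would start, absorb it by increasing a's count.
 * Returns 1 when b was absorbed into a, 0 otherwise. */
static int toy_collapse(toy_desc_t *a, const toy_desc_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->blocklen != b->blocklen || a->elem_size != b->elem_size ||
        a->extent != b->extent) {
        return 0;
    }
    if (b->disp != a->disp + (ptrdiff_t)a->count * a->extent) {
        return 0;
    }
    a->count += b->count;
    return 1;
}

/* Fusion: if a and b are single blocks of the same element size and b
 * begins at the byte where a ends, merge them into one longer block so
 * that pack/unpack needs one memcpy instead of two. */
static int toy_fuse(toy_desc_t *a, const toy_desc_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->count != 1 || b->count != 1 || a->elem_size != b->elem_size) {
        return 0;
    }
    if (b->disp != a->disp + (ptrdiff_t)(a->blocklen * a->elem_size)) {
        return 0;
    }
    a->blocklen += b->blocklen;
    return 1;
}
```

In this reading, collapse reduces the number of descriptions by growing `count`, while fusion reduces the number of copies by growing `blocklen`.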
Merge contiguous iov in order to minimize the number of returned iovec. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
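As an illustration of the idea, the sketch below merges adjacent entries of a POSIX `struct iovec` array in place. The helper name `toy_merge_iov` is hypothetical, and this is not the code from the commit, only one straightforward way to coalesce contiguous segments:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec (POSIX) */

/* Merge, in place, every entry that begins exactly where the previous
 * kept entry ends. Returns the number of entries left in iov. */
static size_t toy_merge_iov(struct iovec *iov, size_t n)
{
    assert(iov != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    size_t out = 0;  /* index of the last kept entry */
    for (size_t i = 1; i < n; i++) {
        char *end = (char *)iov[out].iov_base + iov[out].iov_len;
        if ((char *)iov[i].iov_base == end) {
            /* Contiguous with the previous entry: extend it. */
            iov[out].iov_len += iov[i].iov_len;
        } else {
            /* Gap: start a new entry. */
            iov[++out] = iov[i];
        }
    }
    return out + 1;
}
```

Fewer, larger entries mean fewer copies or fewer scatter/gather segments for whoever consumes the array.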
- optimize handling of contiguous with gaps datatypes.
- fixes a performance issue for all datatypes with a count of 1.
- optimize the pack/unpack of contiguous with gaps datatype.
- optimize the case of blocklen == 1
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Rework the to_self test to be able to be used as a benchmark. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Upon detecting a datatype loop representation, skip the entire loop according to the remaining space. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Optimize contiguous loops by collapsing them into a single element. During datatype optimization collapse similar elements into larger blocks. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Amazing how bad instruction scheduling can have such a drastic impact on code performance. With this change, we get a boost of at least 50% in the performance of data with a small blocklen and/or count. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Looks like both |
Copying the failed results here, because CI logs get recycled every so often:
Start optimizing the code. This commit divides the operations into 2 parts: the first, outside the critical path, deals with partial blocks of predefined elements, and the second, inside the critical path, deals only with full blocks of elements. This reduces the number of expensive operations in the critical path and results in a decent performance increase. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
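The split described in this commit can be sketched roughly as follows, assuming a simple byte-level strided layout. The function `toy_pack_strided` and its parameters are hypothetical; the real engine works on datatype descriptions rather than raw strides:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack `total` bytes from a strided source made of blocks of `blk`
 * bytes placed `stride` bytes apart, starting `skip` bytes into the
 * first block (for example when resuming a partially packed message). */
static void toy_pack_strided(char *dst, const char *src, size_t blk,
                             size_t stride, size_t skip, size_t total)
{
    assert(blk > 0 && skip < blk && stride >= blk);

    /* Outside the critical path: finish the partially consumed block. */
    if (skip != 0 && total > 0) {
        size_t n = blk - skip;
        if (n > total) {
            n = total;
        }
        memcpy(dst, src + skip, n);
        dst += n;
        total -= n;
        src += stride;
    }

    /* Critical path: full blocks only, no boundary checks per block. */
    for (size_t full = total / blk; full > 0; full--) {
        memcpy(dst, src, blk);
        dst += blk;
        src += stride;
    }

    /* Outside the critical path: trailing partial block, if any. */
    total %= blk;
    if (total > 0) {
        memcpy(dst, src, total);
    }
}
```

Keeping the partial-block bookkeeping out of the loop leaves a loop body that is a fixed-size copy plus two pointer increments, which is the kind of simplification the commit message credits for the speedup.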
@hppritcha and I were discussing this PR. Would you mind PRing this back to v4.0.x (if you agree it's appropriate for v4.0.2) to resolve Issue #5540? That issue was marked as blocking a v4.0.2 release.
@hppritcha and @gpaulsen here is the 4.0 PR #6863.
Faster and less error prone. This patch is a significant redesign of the internals of the datatype engine. No API or ABI changes.
Fixes #5540 (Issue with overlapping vector datatype)