Skip to content

Conversation

@maropu
Copy link
Member

@maropu maropu commented Jul 26, 2015

The current implementation of UnsafeExternalSort uses NoOpPrefixComparator for binary-typed data.
So, we need to add BinaryPrefixComparator in PrefixComparators.

@SparkQA
Copy link

SparkQA commented Jul 26, 2015

Test build #38475 has finished for PR 7676 at commit b9827d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BinaryPrefixComparator extends PrefixComparator

@sarutak
Copy link
Member

sarutak commented Aug 2, 2015

/CC @rxin
@maropu Could you update this to merge cleanly?

@maropu
Copy link
Member Author

maropu commented Aug 4, 2015

thanks, I'll fix it.

@maropu maropu force-pushed the BinaryTypePrefixComparator branch from b9827d6 to ecf3ac5 Compare August 4, 2015 05:08
@maropu
Copy link
Member Author

maropu commented Aug 4, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Aug 4, 2015

Test build #197 has finished for PR 7676 at commit ecf3ac5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BinaryPrefixComparator extends PrefixComparator
    • public static final class BinaryPrefixComparatorDesc extends PrefixComparator

@SparkQA
Copy link

SparkQA commented Aug 4, 2015

Test build #39667 has finished for PR 7676 at commit ecf3ac5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BinaryPrefixComparator extends PrefixComparator
    • public static final class BinaryPrefixComparatorDesc extends PrefixComparator

@SparkQA
Copy link

SparkQA commented Aug 4, 2015

Test build #39663 has finished for PR 7676 at commit ecf3ac5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BinaryPrefixComparator extends PrefixComparator
    • public static final class BinaryPrefixComparatorDesc extends PrefixComparator

@SparkQA
Copy link

SparkQA commented Aug 4, 2015

Test build #39698 has finished for PR 7676 at commit d943c04.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BinaryPrefixComparator extends PrefixComparator
    • public static final class BinaryPrefixComparatorDesc extends PrefixComparator

@maropu
Copy link
Member Author

maropu commented Aug 4, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Aug 4, 2015

Test build #209 has finished for PR 7676 at commit d943c04.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BinaryPrefixComparator extends PrefixComparator
    • public static final class BinaryPrefixComparatorDesc extends PrefixComparator

@SparkQA
Copy link

SparkQA commented Aug 4, 2015

Test build #39718 has finished for PR 7676 at commit d943c04.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BinaryPrefixComparator extends PrefixComparator
    • public static final class BinaryPrefixComparatorDesc extends PrefixComparator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get a word directly, similar to how we do it in utf8string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is hard to do that.
This byte access is needed to map signed bytes to unsigned ones, and
this mapping can make UnsignedLongs#compare compare them in a order-preserving way according to TypeUtils#compareBinary.

If we have direct word access here, BinaryPrefixComparator#compare needs to inefficiently compare them by using TypeUtils#compareBinary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use 1 as the initial value? I think use 0 could be better.

for (int i=0; i<minLen; i++) {
 p <<= 8;
 p |= 128L + UNSAFE.getByte(bytes, BYTE_ARRAY_OFFSET + i)
}
p <<= (8 - minLen) * 8;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@maropu
Copy link
Member Author

maropu commented Aug 5, 2015

@rxin The tests failed though, ISTM the hive-thriftserver tests are not related to this pr.
This failure is correct, or not?

@rxin
Copy link
Contributor

rxin commented Aug 5, 2015

cc @davies for review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need the cast here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@maropu
Copy link
Member Author

maropu commented Aug 5, 2015

@davies @rxin ok, all the comments applied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not used here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, they were: they're needed for BYTE_ARRAY_OFFSET.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This broke the build, so I'm hotfixing now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just missed that, thanks!

@davies
Copy link
Contributor

davies commented Aug 5, 2015

LGTM, merging this into master and 1.5.

asfgit pushed a commit that referenced this pull request Aug 5, 2015
…ExternalSort

The current implementation of UnsafeExternalSort uses NoOpPrefixComparator for binary-typed data.
So, we need to add BinaryPrefixComparator in PrefixComparators.

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #7676 from maropu/BinaryTypePrefixComparator and squashes the following commits:

fe6f31b [Takeshi YAMAMURO] Apply comments
d943c04 [Takeshi YAMAMURO] Add a codegen'd entry for BinaryType in SortPrefix
ecf3ac5 [Takeshi YAMAMURO] Support BinaryType in PrefixComparator

(cherry picked from commit 6d8a6e4)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
@asfgit asfgit closed this in 6d8a6e4 Aug 5, 2015
@SparkQA
Copy link

SparkQA commented Aug 5, 2015

Test build #39842 has finished for PR 7676 at commit fe6f31b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class BinaryPrefixComparator extends PrefixComparator
    • public static final class BinaryPrefixComparatorDesc extends PrefixComparator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants