
@LuciferYang
Contributor

@LuciferYang LuciferYang commented Mar 9, 2023

What changes were proposed in this pull request?

This PR uses BloomFilterAggregate to implement the bloomFilter function of DataFrameStatFunctions.

Why are the changes needed?

To increase the API coverage of the Spark Connect JVM client.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Added new tests
  • Manually checked with Scala 2.13

@LuciferYang LuciferYang marked this pull request as draft March 9, 2023 11:41
@LuciferYang LuciferYang changed the title [SPARK-42664][CONNECT] Support bloomFilter function for DataFrameStatFunctions [WIP][SPARK-42664][CONNECT] Support bloomFilter function for DataFrameStatFunctions Mar 9, 2023
numBits: Long,
fpp: Double): BloomFilter = {

val dataType = sparkSession
Contributor Author

@LuciferYang LuciferYang Mar 10, 2023


I added this in order to:

  1. check on the server side whether the column type is supported
  2. add a Cast for IntegerType/ShortType/ByteType
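The two steps above (reject unsupported column types, and widen the smaller integral types with a Cast) can be sketched as follows. This is a minimal illustration, not the PR's code: `DataType`, `Column`, `Cast`, and `BloomFilterInput.prepare` are hypothetical stand-ins rather than Spark's real classes, and the supported set (integral types plus strings) is an assumption.

```scala
// Hypothetical, simplified stand-ins for Spark SQL types and expressions;
// these are NOT Spark's real classes.
sealed trait DataType
case object LongType extends DataType
case object IntegerType extends DataType
case object ShortType extends DataType
case object ByteType extends DataType
case object StringType extends DataType
case object DoubleType extends DataType

sealed trait Expr
final case class Column(name: String, dataType: DataType) extends Expr
final case class Cast(child: Expr, to: DataType) extends Expr

object BloomFilterInput {
  // Reject unsupported column types up front, and wrap the smaller integral
  // types in a Cast to LongType so the aggregate only ever sees longs or strings.
  // The supported set (integral types + strings) is an assumption of this sketch.
  def prepare(col: Column): Expr = col.dataType match {
    case LongType | StringType              => col
    case IntegerType | ShortType | ByteType => Cast(col, LongType)
    case other =>
      throw new IllegalArgumentException(s"Bloom filter does not support $other")
  }
}
```

The reviewer's alternatives (a TypeOf expression, or resolving the type in the planner) would move this decision to the server instead of the client.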

Contributor


Can we use the TypeOf expression for this instead? Alternatively we can try to figure this out in the planner.

@LuciferYang LuciferYang changed the title [WIP][SPARK-42664][CONNECT] Support bloomFilter function for DataFrameStatFunctions [SPARK-42664][CONNECT] Support bloomFilter function for DataFrameStatFunctions Mar 10, 2023
@LuciferYang LuciferYang marked this pull request as ready for review March 10, 2023 15:38
@LuciferYang
Contributor Author

In the latest commit, I made BloomFilterAggregate explicitly support IntegerType/ShortType/ByteType and added the corresponding updaters, then removed the code that passed the dataType and added Cast nodes.
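Moving the widening into the aggregate through per-type updaters can be sketched roughly like this. It assumes a toy Bloom filter (`ToyBloomFilter`, using double hashing over Scala's Murmur3) and hypothetical `InputType`/`Updaters` names; Spark's actual BloomFilterAggregate and its sketch library differ in detail.

```scala
import java.nio.ByteBuffer
import java.util.BitSet
import scala.util.hashing.MurmurHash3

// Toy Bloom filter for illustration only (not Spark's implementation):
// numHashes bit positions are derived from two Murmur3 hashes (double hashing).
final class ToyBloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new BitSet(numBits)

  private def positions(item: Long): Seq[Int] = {
    val bytes = ByteBuffer.allocate(8).putLong(item).array()
    val h1 = MurmurHash3.bytesHash(bytes, 0)
    val h2 = MurmurHash3.bytesHash(bytes, h1)
    // floorMod keeps every index in [0, numBits) even when the hash is negative.
    (0 until numHashes).map(i => Math.floorMod(h1 + i * h2, numBits))
  }

  def putLong(item: Long): Unit = positions(item).foreach(p => bits.set(p))
  def mightContainLong(item: Long): Boolean = positions(item).forall(p => bits.get(p))
}

// Hypothetical input types. Each gets an updater that widens its value to Long,
// so no Cast node is needed in the plan.
sealed trait InputType
case object LongInput extends InputType
case object IntInput extends InputType
case object ShortInput extends InputType
case object ByteInput extends InputType

object Updaters {
  // Chosen once per input type, then applied to every row.
  def forType(t: InputType): (ToyBloomFilter, Any) => Unit = t match {
    case LongInput  => (bf, v) => bf.putLong(v.asInstanceOf[Long])
    case IntInput   => (bf, v) => bf.putLong(v.asInstanceOf[Int].toLong)
    case ShortInput => (bf, v) => bf.putLong(v.asInstanceOf[Short].toLong)
    case ByteInput  => (bf, v) => bf.putLong(v.asInstanceOf[Byte].toLong)
  }
}
```

Selecting the updater once per input type keeps the per-row path free of type checks, and the plan no longer needs an explicit Cast.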

@LuciferYang
Contributor Author

The GitHub Actions (GA) failure is not related to this PR.

numBits: Long,
fpp: Double): BloomFilter = {

val agg = if (!fpp.isNaN) {
Contributor Author

@LuciferYang LuciferYang Apr 7, 2023


Before this change, all 4 parameters were always passed; now only 3 are passed.

Specifically, (col, expectedNumItems, fpp) is passed if !fpp.isNaN; otherwise (col, expectedNumItems, numBits) is passed.
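The argument selection in this comment, together with the standard Bloom filter sizing formula m = -n * ln(p) / (ln 2)^2 that a receiver could use to turn an fpp into a bit count, can be sketched as below. `WithFpp`, `WithNumBits`, and `ArgSelection` are hypothetical names, and NaN is assumed to be the sentinel meaning "no fpp supplied".

```scala
// Hypothetical representation of the 3 arguments sent for the aggregate;
// names mirror the comment above, not Spark's proto messages.
sealed trait BloomFilterArgs
final case class WithFpp(col: String, expectedNumItems: Long, fpp: Double) extends BloomFilterArgs
final case class WithNumBits(col: String, expectedNumItems: Long, numBits: Long) extends BloomFilterArgs

object ArgSelection {
  // NaN is assumed to mean "the caller supplied numBits, not fpp".
  def choose(col: String, expectedNumItems: Long, numBits: Long, fpp: Double): BloomFilterArgs =
    if (!fpp.isNaN) WithFpp(col, expectedNumItems, fpp)
    else WithNumBits(col, expectedNumItems, numBits)

  // Standard sizing formula: m = -n * ln(p) / (ln 2)^2, truncated to a whole bit count.
  def optimalNumOfBits(n: Long, p: Double): Long =
    (-n * math.log(p) / (math.log(2) * math.log(2))).toLong
}
```

Sending only the three arguments that matter avoids shipping a meaningless placeholder for the unused parameter.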

@hvanhovell
Contributor

@LuciferYang, can we restart this effort?

I promise I will look at the proto PR :)

@LuciferYang
Contributor Author

This PR has been rebased many times; should I submit a new one?

@LuciferYang LuciferYang closed this Aug 9, 2023
@LuciferYang
Contributor Author

I made a clean one: #42414
