Skip to content

Conversation

@swapnilushinde
Copy link

What changes were proposed in this pull request?

Currently, DataType equals* methods has lot of code duplication. This PR adds -

  1. Helper function that removes code duplication and makes it easier to understand.
  2. It makes adding new variations of equals* methods easy by simply choosing right parameter combinations of helper function.

Current functionality -

DataTypes are matched on three factors - (fieldName, dataType, nullability)
Current APIs

  1. equalsIgnoreNullability - (fieldNameCheck: Yes, dataType: Yes, nullability: No)
  2. equalsIgnoreCompatibleNullability - (fieldNameCheck: Yes, dataType: Yes, nullability: Compatible)
  3. equalsIgnoreCaseAndNullability - (fieldNameCheck: Yes(case insensitive), dataType: Yes, nullability: No)
  4. equalsStructurally - (fieldNameCheck: No, dataType: Yes, nullability: Yes/No)

Same API contracts are maintained but just new helper functions equalsDataTypes, isSameFieldName, isSameNullability added to give more flexibility & code centralization.
Additional private vals are added to set behavior of helper functions.

private val NoNameCheck = 0
private val CaseSensitiveNameCheck = 1
private val CaseInsensitiveNameCheck = 2
private val NoNullabilityCheck = 0
private val NullabilityCheck = 1
private val CompatibleNullabilityCheck = 2

Any combination of above variables can be generated in future to add new equals* APIs. For instance,

equalsIgnoreCaseCompatibleNullability

can easily be generated by calling -

equalsDataTypes(left, right, CaseInsensitiveNameCheck, CompatibleNullabilityCheck)

New scalatests added to test missing API unit testing as well.

How was this patch tested?

Code was tested with existing scalatests. New scalatests added for more robust testing of this functionality.
All tests were ran locally to make sure it doesn't affect elsewhere.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much does this deduplicate? Seems overkill and even more complicated and not reducing the codes much. Roughly guess 60ish line deletion and 70ish line addition.

@swapnilushinde
Copy link
Author

swapnilushinde commented Jul 17, 2018

Thanks @HyukjinKwon! I removed old commented code in new comment.
Old code was repeating same recursive datatype checks again & again in every equals* function.

  • This commit abstracts that recursive equality checks and gives flexibility to easily add any combinations of equals* method in future. (eg. equalsIgnoreCaseCompatibleNullability() can be created with one line of code)
  • I have modularized dataType equality in smaller helper functions for more readability.
  • Equals* functions are just one line with this commit.

Please let me know if I need to change anything to make code changes more readable & less complex.

@HyukjinKwon
Copy link
Member

But the current code looks less readable and adding more lines. I would rather leave this as was.

@swapnilushinde
Copy link
Author

swapnilushinde commented Jul 18, 2018

true. Currently we have just 3 variations of comparing two datatypes for equality. Adding even one more equality function would easily cause writing same repetitive code which would negate observed increase in code lines.
Comparing two datasets' with different criteria for datatype equality is very handy for unit testing purposes too.
Can you please let me know what do you find unreadable/complex in new code so I can try to simplify it if possible?
@HyukjinKwon

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jul 18, 2018

Constant variables and squashing the logic into one function look not worth enough and overkill. Less duplication is good of course but it doesn't look worth enough by the overkill. I would focus on more important stuff. -1 from me.

@asfgit asfgit closed this in 1a4fda8 Jul 19, 2018
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#17422
Closes apache#17619
Closes apache#18034
Closes apache#18229
Closes apache#18268
Closes apache#17973
Closes apache#18125
Closes apache#18918
Closes apache#19274
Closes apache#19456
Closes apache#19510
Closes apache#19420
Closes apache#20090
Closes apache#20177
Closes apache#20304
Closes apache#20319
Closes apache#20543
Closes apache#20437
Closes apache#21261
Closes apache#21726
Closes apache#14653
Closes apache#13143
Closes apache#17894
Closes apache#19758
Closes apache#12951
Closes apache#17092
Closes apache#21240
Closes apache#16910
Closes apache#12904
Closes apache#21731
Closes apache#21095

Added:
Closes apache#19233
Closes apache#20100
Closes apache#21453
Closes apache#21455
Closes apache#18477

Added:
Closes apache#21812
Closes apache#21787

Author: hyukjinkwon <gurwls223@apache.org>

Closes apache#21781 from HyukjinKwon/closing-prs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants