Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change isnull field of Nullable into hasvalue #18510

Merged
merged 2 commits into from
Sep 29, 2016
Merged

Change isnull field of Nullable into hasvalue #18510

merged 2 commits into from
Sep 29, 2016

Conversation

nalimilan
Copy link
Member

This is consistent with most other languages and formats.

Fixes #18507.

@nalimilan nalimilan added the missing data Base.missing and related functionality label Sep 14, 2016
@nalimilan nalimilan force-pushed the nl/hasvalue branch 3 times, most recently from 2618e99 to 8da3c10 Compare September 14, 2016 21:09
@@ -3,7 +3,7 @@
immutable NullException <: Exception
end

Nullable{T}(value::T, isnull::Bool=false) = Nullable{T}(value, isnull)
Nullable{T}(value::T, hasvalue::Bool=true) = Nullable{T}(value, hasvalue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the second argument meant to be used?

@tkelman Indeed the two-argument form is mostly useful for tests, where using === requires controlling the contents of value (cf. #16923). Maybe it can also help for performance, instead of writing ifelse(hasvalue, Nullable(value), Nullable()), I would have to check. I guess @johnmyleswhite can tell.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We make use of this for null-safe operations where we can generate the value and hasvalue fields without branching.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, then it should be documented?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We should document it in such a way that it's clear that you have more freedom than you usually need and that its use is a performance optimization.

@@ -53,20 +53,20 @@ otherwise, returns `y` if provided, or throws a `NullException` if not.
"""
@inline function get{S,T}(x::Nullable{S}, y::T)
if isbits(S)
ifelse(x.isnull, y, x.value)
ifelse(isnull(x), y, x.value)
Copy link
Contributor

@TotalVerb TotalVerb Sep 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe run benchmarks to make sure any ! operations are successfully eliminated? Or perhaps this is nothing compared to the cost of the branch, so even if they aren't removed, it doesn't matter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad idea in general, but I don't think we have any benchmarks for Nullable yet. We really need to add some. We could also check the generated code manually in some cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened a PR to add benchmarks: JuliaCI/BaseBenchmarks.jl#24. Please suggest cases to add measurements for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no cause for concern in the generated code. It has not changed at all with this patch.

julia> @code_llvm get(Nullable{Int}(10), 100)

define i64 @julia_get_64687(%Nullable.7*, i64) #0 {
top:
  %2 = getelementptr inbounds %Nullable.7, %Nullable.7* %0, i64 0, i32 0
  %3 = load i8, i8* %2, align 1
  %4 = and i8 %3, 1
  %5 = icmp eq i8 %4, 0
  %6 = getelementptr inbounds %Nullable.7, %Nullable.7* %0, i64 0, i32 1
  %7 = load i64, i64* %6, align 8
  %8 = select i1 %5, i64 %7, i64 %1
  ret i64 %8
}

@nalimilan
Copy link
Member Author

The first commit only changes the field but keeps the constructor as it was, while the second also change the constructor. I think we want both for consistency, but separating them helps measuring the breakage it creates. Overall, very little changes are needed, which is promising as regards the broader ecosystem.

The two most disruptive cases are:

  1. the two-argument constructor (if we change it); see thread above
  2. the C API, where the isnull field is always passed explicitly (no constructor default value)

Luckily, these two cases appear to be relatively rare.

Copy link
Contributor

@TotalVerb TotalVerb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Since generated code for get has not changed, I suspect there will be no performance regressions (except possibly isnull, but that's unavoidable).

This is consistent with most other languages and formats.
This method is not currently documented, and the new form is more consistent
with the new structure of the type.
@nalimilan
Copy link
Member Author

I've run the benchmarks locally. As expected, they are slower with this PR if we keep the isnull field of NullableArray, but when reversing it to match hasvalue most regressions go away. What remains are regression in perf_countnulls, which is kind of logical since all it does is call isnull. I don't see this as a problem since it real cases, we generally have a branch/ifelse which the compiler automatically reverses for avoid the negation. We could change that benchmark into count_nonnulls, or add it in parallel; doesn't matter much IMHO.

So now it remains to find out whether many packages would be broken by the second commit. I've made a PR against NullableArrays, and it wasn't too painful. I think most packages should work fine.

@tkelman Could you please run PkgEval on this PR? Thanks!

@tkelman
Copy link
Contributor

tkelman commented Sep 27, 2016

Sure. I messed something up in a rebase of my first attempt, running again now.

@tkelman
Copy link
Contributor

tkelman commented Sep 28, 2016

Shouldn't be too surprised by this: CategoricalArrays, CSV, and SQLite are new failures. Git was a timeout, DynamicDiscreteModels is flaky. https://gist.github.com/7d1882b630429019667fece0cc82172f

@nalimilan
Copy link
Member Author

Great! These are all packages which were supposed to fail, and whose maintainers are in the loop. Will have a look. Cc: @quinnj

@quinnj
Copy link
Member

quinnj commented Sep 28, 2016

Noted. Will this be backported? Or just on 0.6?

@nalimilan
Copy link
Member Author

No, just on 0.6.

@quinnj
Copy link
Member

quinnj commented Sep 28, 2016

Great. Merge away. Should be a pretty easy compat.

@nalimilan
Copy link
Member Author

Should we merge this then? @johnmyleswhite? @davidagold?

Copy link
Member

@johnmyleswhite johnmyleswhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good by me.

@@ -3,7 +3,7 @@
immutable NullException <: Exception
end

Nullable{T}(value::T, isnull::Bool=false) = Nullable{T}(value, isnull)
Nullable{T}(value::T, hasvalue::Bool=true) = Nullable{T}(value, hasvalue)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We make use of this for null-safe operations where we can generate the value and hasvalue fields without branching.

@nalimilan nalimilan merged commit b36141f into master Sep 29, 2016
@nalimilan nalimilan deleted the nl/hasvalue branch September 29, 2016 16:09
StefanKarpinski pushed a commit that referenced this pull request Feb 8, 2018
StefanKarpinski pushed a commit that referenced this pull request Feb 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
missing data Base.missing and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants