Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deprecate count_ones and count_zeros to countones and countzeros #23531

Closed
wants to merge 1 commit into from

Conversation

Sacha0
Copy link
Member

@Sacha0 Sacha0 commented Aug 31, 2017

This pull request deprecates count_ones and count_zeros to countones and countzeros as part of the API consistency review's underscore audit. Best!

@Sacha0 Sacha0 added the deprecation This change introduces or involves a deprecation label Aug 31, 2017
@Sacha0 Sacha0 added this to the 1.0 milestone Aug 31, 2017
@fredrikekre
Copy link
Member

fredrikekre commented Sep 4, 2017

FWIW countones is not a very intuitive name; the first time I encountered this I thought countones(A) was convenience for something like count(x -> x == 1, A).

@KristofferC
Copy link
Member

Agreed. It is a quite generic name for an uncommon low level operation. It is also (AFAIK) unlikely to be overloaded a lot so it could in my opinion use a longer descriptive name.

@Sacha0
Copy link
Member Author

Sacha0 commented Sep 4, 2017

Agreed. Suggestions? :)

@nalimilan
Copy link
Member

nalimilan commented Sep 4, 2017

Hamming distance? :-) More seriously, maybe bitcount? Wikipedia says its used by some languages. Not sure about countzeros, though.

@Sacha0
Copy link
Member Author

Sacha0 commented Sep 4, 2017

Regarding changes beyond underscore removal, perhaps consistency with leading_{ones|zeros} and trailing_{ones|zeros} should be kept in mind as well?

The alternatives that occur to me immediately seem a bit unwieldy (countonebits/countzerobits, countonbits/countoffbits, bitcountones/bitcountzeros, and similarly for the leading/trailing functions), but perhaps that unwieldiness is fine given these operations' nicheness. Best!

@ararslan
Copy link
Member

ararslan commented Sep 4, 2017

We could do something like

bitcount(::typeof(isone), n::Integer) = # count_ones(n), likewise for iszero -> count_zeros

We could also define bitcount(n) = bitcount(isone, n) for convenience/compatibility with other languages.

I think popcnt is what the actual assembly instruction is called for counting set bits, but that's not exactly obvious, and I'm not sure there's an equivalent name for counting unset bits.

@Sacha0
Copy link
Member Author

Sacha0 commented Sep 4, 2017

How do you envision extending that bitcount proposal to {leading|trailing}{ones|zeros}?

The more I mull it over, the more I appreciate the simplicity and consistency of the status quo ({count|leading|trailing}{ones|zeros}).

@TotalVerb
Copy link
Contributor

If we make bits return a lazy view over the bits of a number instead of a string, this can just be written as a specialization of count(isone, bits(n)). The point about leading/trailing remains, however.

@nalimilan
Copy link
Member

Since count_zeros is defined as count_ones(~x), couldn't we deprecate it, e.g. in favor of bitcount(~x)? I'm not sure other languages provide both "one" and "zero" functions. Same for leading_ones/trailing_ones, which are defined as leading_zeros(~x)/trailing_zeros(~x). Common name for these seem to be clz and ctz ("count leading/trailing zeros"), which I admit isn't very explicit, but that's a low-level operation anyway.

@rfourquet
Copy link
Member

If we make bits return a lazy view over the bits of a number instead of a string

I thought the same (I have a module to do that kind of things), but is there other uses in base for this? otherwise, it may be too complicated for Base (also, close to that is: count(isone, digits(x, 2, sizeof(x)*8)), but which is inneficient).

The flipside is, why not have countones(x) defined as count(isone, x) for a general iterator x, and specialized for integers? (I think I don't like it to have a special case though, as integers are already iterable with a different interpretation than iterating over their bits).

@ararslan
Copy link
Member

ararslan commented Sep 5, 2017

As long as numbers are iterable, I don't think we change the meaning of count(isone, x::Integer) as that will be very subtly breaking.

Some data points:

Language count_ones leading_zeros trailing_zeros Equivalent for 0/1
Fortran 2008 popcnt leadz trailz No
GCC/Clang __builtin_popcount __builtin_clz __builtin_ctz No
Haskell popCount countLeadingZeros countTrailingZeros No
Java bitCount numberOfLeadingZeros numberOfTrailingZeros No
LLVM Assembly ctpop ctlz cttz No
Rust count_ones leading_zeros trailing_zeros Yes

The following languages don't have any such functions built in: Python, Ruby, Perl, Matlab, R

I could have done more but I got bored of Googling. My takeaway from this is that most languages don't seem to provide functionality for counting zeros, leading ones, or trailing ones. However, Rust does, and all of our functions are named identically to theirs. I typically take Rust's choices more seriously than other languages, as it's about as new as Julia so they're also trailblazing in some sense.

I'm still not sure what the best choice is, but at least we have a bit of perspective from other languages.

@Sacha0
Copy link
Member Author

Sacha0 commented Sep 5, 2017

Nice data! :D Of all those names, Rust/Julia's strike me as far and away the best. I lean progressively more heavily to the status quo less underscores. Thanks for compiling that list! :)

@ararslan
Copy link
Member

ararslan commented Sep 5, 2017

We could go with countones, leadingzeros, and trailingzeros, and deprecate count_zeros to countones(~x), leading_ones to leadingzeros(~x), and trailing_ones to trailingzeros(~x). Just an idea, anyway.

My only concern with count_*/count* is that the name can be confused with count(is*, x), which is not the same.

@nalimilan
Copy link
Member

Yes, I usually like following Rust's choices, but in that case count* sounds too easy to confuse with count(*. Maybe that's just because Rust and Julia have different user targets, but I don't like stealing so general names for such specific and low-level operations. I'd rather use Fortran's terminology.

@waldyrious
Copy link
Contributor

Could {count|leading|trailing}{on|off}bits be an option?

@ararslan
Copy link
Member

ararslan commented Sep 5, 2017

on/off is a little confusing

@waldyrious
Copy link
Contributor

waldyrious commented Sep 5, 2017

I find on/off more explicit (and even venture say more appropriate) than one/zero, when in a binary context. That's not to say I can't see how the former may throw people off, especially since most languages seem to use one/zero — however, the alternative would be {count|leading|trailing}{one|zero}bits, which creates another (more serious IMO) ambiguity: countonebits and countzerobits are confusing in a different way; i.e., the "one" and "zero" may be interpreted as modifiers of "count", rather as modifiers of "bits".

So while I do see the potential for some confusion, I'm not sure the alternatives are less confusing/ambiguous.

@nalimilan
Copy link
Member

nalimilan commented Sep 6, 2017

According to Wikipedia, the POSIX libc terminology speaks of "set bits". But even with that name, leadingsetbits isn't very readable...

It could make sense to handle the diversity of cases via a second argument. We could have countbits(x), countbits(x, :leading) and countbits(x, :trailing). Performance-wise, something like this seems to generate optimal code thanks to inlining:

@inline function countbits(x, kind=:all)
    if kind === :all
        count_ones(x)
    elseif kind === :leading
        leading_ones(x)
    elseif kind === :trailing
        trailing_ones(x)
    end
end
f(x) = countbits(x, :trailing)
@code_native f(1)

Unfortunately, throwing an error when kind does not match one of the three choices makes the generated code much more complex, but maybe there's a solution. EDIT: seems to work fine actually.

@Sacha0 Sacha0 added triage This should be discussed on a triage call needs decision A decision on this change is needed labels Sep 12, 2017
@Sacha0
Copy link
Member Author

Sacha0 commented Sep 12, 2017

At this point I favor the minimal evolution of the status quo necessary to address the comments that sparked this discussion (

FWIW countones is not a very intuitive name; the first time I encountered this I thought countones(A) was convenience for something like count(x -> x == 1, A).

Agreed. It is a quite generic name for an uncommon low level operation. It is also (AFAIK) unlikely to be overloaded a lot so it could in my opinion use a longer descriptive name.

), particularly {count|leading|trailing}{one|zero}bits. That set of names seems likely be to as simple, intuitive, and unambiguous to a broad population as any of the other suggestions. And chances are this thread's participants' bandwidth would be better spent on less marginal concerns. So perhaps we can call this set of names good enough and lay this bikeshed to rest? :) Best!

@ararslan
Copy link
Member

As much as I dislike extended bikeshedding sessions, I must say that IMHO trailingzerobits seems like quite a mouthful. But these functions are specialized enough that it probably doesn't matter too much.

@nalimilan
Copy link
Member

Any ideas about how to get rid of the performance cost of throwing an error when kind doesn't match any valid case in my countbits example above? So far it sounds like the most readable solution to me (if it is possible to implement efficiently).

Though I'm not opposed to trailingzerobits, which is verbose but at least explicit.

@JeffBezanson
Copy link
Member

Here's a similar possibility:

countbits(n, 0 or 1)
leadingbits(n, 0 or 1)
trailingbits(n, 0 or 1)

@Sacha0
Copy link
Member Author

Sacha0 commented Sep 14, 2017

Unfortunately, throwing an error when kind does not match one of the three choices makes the generated code much more complex, but maybe there's a solution.

A mock implementation of Jeff's suggestion seems fine. Am I missing something? Thanks!

julia> begin
           function countbits(x, kind)
               if kind === 0
                   count_zeros(x)
               elseif kind === 1
                   count_ones(x)
               else
                   error("cats!")
               end
           end

           f0(x) = countbits(x, 0)
           f1(x) = countbits(x, 1)
           g0(x) = count_zeros(x)
           g1(x) = count_ones(x)
       end
g1 (generic function with 1 method)

julia> @code_llvm f0(1)

; Function f0
; Location: REPL[1]
define i64 @julia_f0_61723(i64) #0 {
top:
; Location: REPL[1]:12
; Function countbits; {
; Location: REPL[1]:4
  %1 = xor i64 %0, -1
  %2 = call i64 @llvm.ctpop.i64(i64 %1)
;}
  ret i64 %2
}

julia> @code_llvm g0(1)

; Function g0
; Location: REPL[1]
define i64 @julia_g0_61738(i64) #0 {
top:
; Location: REPL[1]:14
  %1 = xor i64 %0, -1
  %2 = call i64 @llvm.ctpop.i64(i64 %1)
  ret i64 %2
}

@nalimilan
Copy link
Member

@Sacha0 You're right, I tried reproducing the problem with the generated code and I couldn't, even with my example from above. Not sure what I did wrong the first time. So we can choose whatever API looks the best.

@JeffBezanson
Copy link
Member

From triage: the current names aren't so bad and we should probably just keep them. Plus this is a pure renaming issue; it doesn't really affect broader API or design issues. The fact that Rust has adopted these names is a further reason to leave them as-is.

@vtjnash vtjnash closed this Sep 14, 2017
@nalimilan
Copy link
Member

So even underscores in function names are considered OK?

@TotalVerb
Copy link
Contributor

I think the consensus was that using underscores to separate words is not a big problem; the desire to avoid them is mostly to push toward simplifying function names to single words whenever possible.

@JeffBezanson
Copy link
Member

Surely we have bigger problems than the occasional presence of an underscore.

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Sep 14, 2017
@ararslan ararslan mentioned this pull request Jan 17, 2018
19 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deprecation This change introduces or involves a deprecation needs decision A decision on this change is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants