Replace min/max, make quantile_mut return None for empty arrays, and fix docs #13

jturner314 · 2018-11-18T21:21:42Z

This PR does a few things:

Replace the old min/max methods with the old min_partialord/max_partialord. (Rename min_partialord/max_partialord to min/max.) See this comment for more explanation. The primary issue with the old min/max methods is that they panicked on empty arrays.
Make quantile_mut return None for empty arrays. It's easy to forget about the possibility of empty arrays, so it's important to remind the user of the possibility and force them to handle it. See this comment for more justification.
Fix the conditions for panicking in the docs of histogram strategies and quantile_mut.

Edit: I'm uncomfortable with the histogram strategies panicking in so many cases, but we can fix that in a later version.

The problem with the old methods was that they panicked when the array was empty, which was very problematic. (See rust-ndarray/ndarray#512 for discussion.) The old `min_partialord` and `max_partialord` have been renamed to `min`/`max`.

LukeMathWalker · 2018-11-18T23:05:35Z

I'll review it tomorrow morning.

On histogram's panics: should we change the signature to return a Result?
We panic in those occasions where it is not possible to provide an answer that makes sense:

what is the optimal bin width for a constant/empty array?
how do I handle degenerate cases (e.g. IQR=0 in Freedman Diaconis)?

Returning a Result would probably give a chance to the user to specify a default that makes sense to them, especially if we provide a useful error.

jturner314 · 2018-11-19T00:16:29Z

For empty arrays, I think the right thing to do is to have Edges be an empty Vec (no bins).

For constant arrays, I think the right thing to do is have Edges be vec![the_constant_value] (no bins) or vec![the_constant_value, the_constant_value + 1] (one bin). Alternatively, if we change Edges to allow duplicate edges, we could use vec![the_constant_value; 2] (one bin).

For Freedman Diaconis with IQR=0, I'm not sure; that requires some thought. Panicking definitely isn't the right thing, because IMO panics shouldn't be a function of the data in the array. I think I'd return an Option/Result or have a fallback strategy. At first glance, the Option/Result appeals to me more.

By the way, in a future version, I'm thinking about replacing GridBuilder with a fit_array method on Grid and simplifying the BinsBuildingStrategy trait:

impl<A: Ord> Grid<A> {
    fn fit_array<S, B>(array: &ArrayBase<S, Ix2>, strategy: &B) -> Result<Self, Error>
    where
        S: Data<Elem = A>,
        B: BinsBuildingStrategy<A>,
    {
        // ...
    }
}

pub trait BinsBuildingStrategy<A: Ord> {
    fn fit_array<S>(&self, array: &ArrayBase<S, Ix1>) -> Result<Bins<A>, Error>
    where
        S: Data<Elem = A>;
}

The BinsBuildingStrategy types wouldn't contain the min/max/n_bins/etc. derived from a specific array; they'd contain whatever metadata would be necessary to calculate the Bins in fit_array. This would simplify handling of special cases and would simplify the public API a little. What do you think about this?

LukeMathWalker · 2018-11-19T08:09:16Z

For empty arrays, I think the right thing to do is to have Edges be an empty Vec (no bins).

It makes sense, I agree.

For constant arrays, I think the right thing to do is have Edges be vec![the_constant_value] (no bins) or vec![the_constant_value, the_constant_value + 1] (one bin). Alternatively, if we change Edges to allow duplicate edges, we could use vec![the_constant_value; 2] (one bin).

We could use a "default" width of 1, that could be the way. I'd avoid duplicate edges since they would still lead to empty bins and a degenerate grid (no observation would be captured), thus surprising the user.

By the way, in a future version, I'm thinking about replacing GridBuilder with a fit_array method on Grid and simplifying the BinsBuildingStrategy trait:
impl<A: Ord> Grid<A> {
    fn fit_array<S, B>(array: &ArrayBase<S, Ix2>, strategy: &B) -> Result<Self, Error>
    where
        S: Data<Elem = A>,
        B: BinsBuildingStrategy<A>,
    {
        // ...
    }
}

pub trait BinsBuildingStrategy<A: Ord> {
    fn fit_array<S>(&self, array: &ArrayBase<S, Ix1>) -> Result<Bins<A>, Error>
    where
        S: Data<Elem = A>;
}
The BinsBuildingStrategy types wouldn't contain the min/max/n_bins/etc. derived from a specific array; they'd contain whatever metadata would be necessary to calculate the Bins in fit_array. This would simplify handling of special cases and would simplify the public API a little. What do you think about this?

A fit_array method sounds good to me, as a shortcut on Grid to simplify the public API, but you still need extremes and a bin width to execute all the strategies we have implemented, hence I think the new strategy would end up quite similar to the ones we have right now. They would probably just squeeze everything in a single method.

jturner314 added 4 commits November 18, 2018 15:56

Remove min/max for A: Ord

71f416a

The problem with the old methods was that they panicked when the array was empty, which was very problematic. (See rust-ndarray/ndarray#512 for discussion.) The old `min_partialord` and `max_partialord` have been renamed to `min`/`max`.

Document axis_len == 0 panic for quantile_axis_mut

b523932

Make quantile_mut return None for empty arrays

2aa647b

Fix panic docs for histogram strategies

73a4819

jturner314 mentioned this pull request Nov 18, 2018

Roadmap #1

Open

17 tasks

LukeMathWalker merged commit 882a411 into master Nov 19, 2018

LukeMathWalker deleted the rework-min-max branch November 19, 2018 08:19

jturner314 mentioned this pull request Nov 19, 2018

Reduce cases where histogram strategies panic #16

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace min/max, make quantile_mut return None for empty arrays, and fix docs #13

Replace min/max, make quantile_mut return None for empty arrays, and fix docs #13

jturner314 commented Nov 18, 2018 •

edited

Loading

LukeMathWalker commented Nov 18, 2018

jturner314 commented Nov 19, 2018

LukeMathWalker commented Nov 19, 2018

Replace min/max, make quantile_mut return None for empty arrays, and fix docs #13

Replace min/max, make quantile_mut return None for empty arrays, and fix docs #13

Conversation

jturner314 commented Nov 18, 2018 • edited Loading

LukeMathWalker commented Nov 18, 2018

jturner314 commented Nov 19, 2018

LukeMathWalker commented Nov 19, 2018

jturner314 commented Nov 18, 2018 •

edited

Loading