Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invariant sizes #85

Merged
merged 14 commits into from
Feb 26, 2020
Merged

Invariant sizes #85

merged 14 commits into from
Feb 26, 2020

Conversation

ivirshup
Copy link
Contributor

@ivirshup ivirshup commented Feb 19, 2020

Trying to enforce length.(dims(da)) == size(da) == size(data(da)) following up from #83.

This is pretty messy at the moment, but it kept bugging me throughout the day. There are a few cases that need thinking about.

  • How "unevaluated" dimensions are used. That is, does it make sense to have an instance of X()?
  • I'm not sure how TransformedGrids are supposed to work. Why are the dims of an array on this grid the transformer, and not the original dimensions with the transform applied?
  • formatdims could be cleaner
  • Whyare cov and cor defined the way they are?

@codecov-io
Copy link

codecov-io commented Feb 19, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@4e4a69d). Click here to learn what that means.
The diff coverage is 82.35%.

Impacted file tree graph

@@          Coverage Diff           @@
##             master   #85   +/-   ##
======================================
  Coverage          ?   84%           
======================================
  Files             ?    12           
  Lines             ?   650           
  Branches          ?     0           
======================================
  Hits              ?   546           
  Misses            ?   104           
  Partials          ?     0
Impacted Files Coverage Δ
src/methods.jl 84.69% <100%> (ø)
src/primitives.jl 91.33% <75%> (ø)
src/array.jl 66.66% <80%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e4a69d...9483a43. Read the comment docs.

@ivirshup ivirshup changed the title [WIP] Invariant sizes Invariant sizes Feb 19, 2020
@ivirshup ivirshup requested a review from rafaqz February 19, 2020 12:53
@rafaqz
Copy link
Owner

rafaqz commented Feb 19, 2020

Sorry I wont be able to review this for a couple days.

@ivirshup
Copy link
Contributor Author

Not sure if you've started looking at this yet, but do you have a preference on dealing with conflicts? E.g. should I rebase or merge?

@rafaqz
Copy link
Owner

rafaqz commented Feb 22, 2020

rebase when its easy but I dont mind really.

maybe I should do this merge, but not much time currently so have a go.

This could be done more cleanly. I'm thinking it should probably only do this if the dimension type is passed, not Dim()
* Now can do `DimensionalArray(a, (X(), Y))` and similar
* Made the code paths for different types clearer
* Reduced redundant code
* Errors for too many and too few dimensions should be similar now
@ivirshup
Copy link
Contributor Author

No worries, it was a pretty easy rebase.

@ivirshup
Copy link
Contributor Author

Oh, and my point about the test on the TransformedGrid. My understanding of TransformedGrid is that it's supposed to allow regridding operations like xESMF.

It doesn't make sense to me that the value of the dimension would be the transformation. It seems more like a property of the dimension's grid. Right now the test fails since the value of the dimension doesn't have length.

Previously this would throw an error due to an intermediate subset DimensionalArray being created with invalid dimensions.
@rafaqz
Copy link
Owner

rafaqz commented Feb 23, 2020

Sure the transormation can be in a grid field. The choice was pretty arbitrary.

@rafaqz
Copy link
Owner

rafaqz commented Feb 23, 2020

My understanding of TransformedGrid is that it's supposed to allow regridding operations like xESMF.

Maybe eventually. Currently it's just to be able to index points using a transformed coordinate system, and to be able to actually represent gdal etc files than use a coordinate transform.

It's also not at all my area of expertise, I don't actually use these grids. But we need to be able to represent them somehow and I wanted to be sure that could happen with the existing architecture - and it can. But it's not a complete implementation.

@rafaqz
Copy link
Owner

rafaqz commented Feb 24, 2020

Also realised I never answered these (too busy...):

How "unevaluated" dimensions are used. That is, does it make sense to have an instance of X()?

The instance of X() is for indexing the x dim with Colon(). Or for passing to mean etc as a dimension.

I'm not sure how TransformedGrids are supposed to work. Why are the dims of an array on this grid the transformer, and not the original dimensions with the transform applied?

Because then you have to compute every transform for every dim combination, and its at minimum 2d, so that also means allocating a matrix.

formatdims could be cleaner

Absolutely. I have a rework of it in progress.

Why are cov and cor defined the way they are?

I have no idea, but also not sure exactly what you mean

@ivirshup
Copy link
Contributor Author

Does anyone currently use the grids? I'm not sure I'm the person to redefine them, since I definitely don't use them. Would it be alright to leave the test disabled and open an issue for fixing this?

About what the dim values for transformed grid are, I was thinking something like:

DimArray A with dims θ, r is on a transformed grid that maps (X, Y) -> (θ, r). So if you access A with θ(Between(τ//4, τ//2)), r(Near(3.)), no transformation occurs. But if indexed with: X(Near(5)), Y(Near(10)) the transformation would be applied to the coordinates being searched for before the lookup happens.


The instance of X() is for indexing the x dim with Colon(). Or for passing to mean etc as a dimension.

I think what I mean by this, is should X() and X mean different things? I've gone ahead and made them equivalent in the cases I've come across, since that fits the tests, but I think it's worth considering.

format dims

Oh, I ended up doing a bit of cleanup here while fixing a bug...

Why are cov and cor defined the way they are?

Basically, if you did cov(DimArray(ones(5, 4), (X, Y)), dims=X) you get an array of size (4, 4) but with two X dimensions of length five. I've changed this, but there's a comment in the tests (tests/methods.jl) that says:

    # Need to think about dims for these, currently (Y, Y) etc.
    # But you can't index (Y, Y) with dims as you get the
    # first Y both times. It will plot correctly at least.

And I wasn't quite sure what this meant.

@rafaqz
Copy link
Owner

rafaqz commented Feb 24, 2020

No-one uses the TransformedGrid as far as I know, but I use the others every day in GeoData.jl - they are what lets it abstract the array access over different base data types, with different grid orders, projections and range/array indices etc. It's the basis of a lot of my other modelling packages and the use case I wrote DD for.

I think the transform is from some coordinates to the actual index integers, so you would only use Between with the untransformed values A[X(Between(35.9, 47.4)), Y(Near(145))]. Otherwise you can just use a unit range A[θ(1:3), r(7)].

We need both X and X() because X() is more reliably type stable than X. But you should be able to use them interchangeably everywhere. I think the X syntax feels cleaner.

cov/cor just sounds like an error from rushing the implementation. That comment is from before I fixed #46, and can be deleted now.

@ivirshup
Copy link
Contributor Author

No-one uses the TransformedGrid as far as I know, but I use the others every day in GeoData.jl

Oh, I definitely just meant TransformedGrid. So, do you mind if I leave the tests for TransformedGrid commented out with a TODO?

We need both X and X() because X() is more reliably type stable than X.

Yeah, the typeof((X, Y)) === Tuple{UnionAll, UnionAll} had me frustrated for a bit.

I think the X syntax feels cleaner.

Me too, that's what I've been going for. I've got another commit that makes mean(da, dims=(X, Y)) work (for a few other reducers as well). Could I add that here, or should it be another PR?


Apart from deciding on the multidimensional reduction stuff, does this need any changes?

@rafaqz
Copy link
Owner

rafaqz commented Feb 24, 2020

I guess we can leave TransformedGrid tests out, it feels a bit dirty but at least we'll know if anyone else is using it...

Maybe I'll merge this and you can make a new PR for the reduce methods, we should try to keep these focussed.

Also are there actual tests for invariant size?

test/array.jl Outdated Show resolved Hide resolved
@ivirshup
Copy link
Contributor Author

Travis is just failing due to build errors for the plotting tests. The relevant tests are passing.

@rafaqz
Copy link
Owner

rafaqz commented Feb 26, 2020

Ok this is a Plots/FFMPEG_jll problem on windows builds. We might need to disable windows testing temporarily to get this passing.

Actually we should put a conditional block around plot tests to avoid windows.

if !Sys.iswindows() 
    include("plotrecipes.jl")
end

@rafaqz rafaqz merged commit bcef609 into rafaqz:master Feb 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants