Make sure pearson_correlation_scalar input is validated #753

JanisGailis · 2018-09-20T11:00:32Z

Input validation in pearson_correlation_scalar had faulty logic

JanisGailis · 2018-09-20T11:03:06Z

Closes #746

The pearson_correlation_scalar operation is meant to work only on 1D variables (much like the underlying scipy function). I had a check for 3D variables, but its logic was faulty. This PR fixes that check; the operation you were trying to carry out in #746 will still fail, just with a clearer error message.
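The kind of 1D-only input check described above can be sketched as follows. This is a hypothetical helper (`pearson_r_1d`), not cate's actual pearson_correlation_scalar code, and it uses numpy's `corrcoef` in place of the scipy function so that it only needs numpy.

```python
import numpy as np


def pearson_r_1d(x, y):
    """Hypothetical sketch: reject non-1D or unequal-length inputs with a
    clear message, then return the Pearson correlation coefficient."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Only 1D inputs are accepted, mirroring the 1D-only restriction.
    if x.ndim != 1 or y.ndim != 1:
        raise ValueError(
            f"expected 1D inputs, got {x.ndim}D and {y.ndim}D arrays")
    if x.shape != y.shape:
        raise ValueError(
            f"inputs must have the same length, got {x.size} and {y.size}")
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
    return float(np.corrcoef(x, y)[0, 1])
```

Passing 2D or 3D arrays raises a `ValueError` naming the offending dimensionality, which is the "nicer message" behaviour rather than an obscure failure deeper in the computation.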

codecov-io · 2018-09-20T11:18:34Z

Codecov Report

Merging #753 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master     #753   +/-   ##
=======================================
  Coverage   76.94%   76.94%           
=======================================
  Files          81       81           
  Lines       12498    12498           
=======================================
  Hits         9616     9616           
  Misses       2882     2882

Impacted Files	Coverage Δ
cate/ops/correlation.py	`100% <100%> (+1.06%)`	⬆️
cate/util/process.py	`90.15% <0%> (-0.76%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9af8fd2...a044555.

JanisGailis · 2018-09-20T11:19:26Z

@forman Merge when ready. If this is not merged in a day, I'll go ahead and do it myself.

forman

Why is that operation 1D only? And why specifically for time series?

Can't we just check whether both datasets have the same dims with equal sizes? If so, flatten both ND arrays and compute the correlation.
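A minimal in-memory sketch of the flatten-then-correlate idea, assuming plain numpy arrays (the name `pearson_r_flat` is hypothetical). numpy arrays carry no dimension names, so "same dims with equal sizes" reduces to comparing shapes here; labelled data would also need the dimension names compared. Note that the whole array must fit in memory, which is the concern raised in the reply below.

```python
import numpy as np


def pearson_r_flat(a, b):
    """Hypothetical sketch: correlate two equally shaped ND arrays by
    flattening them to 1D first."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Equal shapes imply the same number of dims and equal size along each.
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # ravel() turns ND into 1D (a view when possible); corrcoef then
    # treats the flattened arrays as two variables.
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
```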

forman · 2018-09-21T14:31:31Z

@JanisGailis merging this, but please answer my question once you have time!

JanisGailis · 2018-09-24T08:52:11Z

@forman Using .values.flatten() on an ND xarray.DataArray is a bad idea, as it loads the whole array into memory and will very likely raise a MemoryError on large inputs.

However, after a quick investigation, it looks like xr.DataArray.stack() could be used to achieve what you suggest in a memory-safe way. This probably means that the datasets will have to be coregistered beforehand for the operation to make any sense on ND inputs. This shouldn't take too long to implement. Should I go ahead?
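A rough sketch of the stack-based approach, assuming xarray and inputs that are already coregistered (identical dimension names, order and sizes). The function name `pearson_r_stacked` and the stacked dimension name `flat` are hypothetical, and NaN handling is omitted. Whether this stays out of memory depends on the arrays being dask-backed; with numpy-backed arrays the data is already in memory anyway.

```python
import numpy as np
import xarray as xr


def pearson_r_stacked(a: xr.DataArray, b: xr.DataArray) -> float:
    """Hypothetical sketch: Pearson r of two coregistered ND DataArrays,
    using DataArray.stack() to collapse all dims into one."""
    # Coregistration assumption: same dim names, order and sizes.
    if a.dims != b.dims or a.shape != b.shape:
        raise ValueError("inputs must share dims and shape; coregister first")
    # Collapse every dimension into a single new dimension called "flat".
    a1 = a.stack(flat=a.dims)
    b1 = b.stack(flat=b.dims)
    # Textbook Pearson formula expressed as xarray reductions.
    am = a1 - a1.mean()
    bm = b1 - b1.mean()
    r = (am * bm).sum() / np.sqrt((am ** 2).sum() * (bm ** 2).sum())
    return float(r)
```

Stacking is not strictly required for these whole-array reductions, but it gives the explicit "ND turned into 1D" view discussed in this thread.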

forman · 2018-09-24T10:27:32Z

I actually didn't mean numpy's flatten(), just flattening in the general sense, i.e. turning ND into 1D. So, yes, fine!

Make sure pearson_correlation_scalar input is validated

a044555

Input validation in pearson_correlation_scalar had faulty logic

JanisGailis requested a review from forman September 20, 2018 11:00

JanisGailis added the ops label Sep 20, 2018

forman reviewed Sep 21, 2018

forman merged commit 6540e61 into master Sep 21, 2018

JanisGailis deleted the jg-746-pearsonr branch September 27, 2018 07:42
