QC-aware transformations #703
Comments
I wonder if applying the QC before the transformation would be the best method. Xarray applies the same methods to QC data as to the other data, so integer dtypes are upconverted to float and the QC values are no longer whole numbers. We could encourage users to apply QC in the desired way before performing any transformations.
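To make the concern concrete, here is a minimal pure-Python sketch (not ACT or xarray code) of what a naive mean does to bit-packed QC integers, compared with masking first. The sample values, the bit meanings, and the rule that any nonzero QC value counts as bad are assumptions made up for illustration.

```python
from statistics import fmean

# Hypothetical samples that fall into one averaging bin.
data = [10.0, 12.0, 999.0, 11.0]
# Hypothetical bit-packed QC: 0 = passed every test,
# 1 and 4 = different failed tests (assumed meanings).
qc = [0, 0, 4, 1]

# Naive transformation: average everything, QC included.
naive_data_mean = fmean(data)  # includes the flagged 999.0
naive_qc_mean = fmean(qc)      # a float that is not a valid bit mask

# QC-first approach: keep only samples whose QC value is 0,
# then transform the data and leave the original QC behind.
good = [d for d, q in zip(data, qc) if q == 0]
masked_mean = fmean(good)

print(naive_data_mean, naive_qc_mean, masked_mean)
```

The naive QC mean comes out as 1.25, which does not correspond to any combination of test bits, while the masked mean only reflects the two samples that passed.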
@kenkehoe Yeah, I think we're more or less on the same page there. QC masks should be applied before the transformation(s) to mask out bad values, and the original QC variables should be dropped so you don't wind up with nonsense averages or interpolations of bitpacked QC values. So far this is all within the realm of user-written code and maybe an example or two of how to do this using ACT.
What I am suggesting is an extension of this which also provides output QC values that make sense given the transformation type, kind of like how ARM does it (examples of bitpacked transform QC below). For ARM, we assume that any QC check with a 'Bad' assessment should automatically get masked out, and if a certain threshold of points are bad for a given bin, then the output for that bin should also get QC'd as bad and masked out. There's similar logic for 'Indeterminate' assessments.
Detailed ARM QC
E.g., in the DOD interface for a particular variable you can add an ancillary variable
Summary ARM QC
E.g., using the summary transform QC
Moving forward with this is probably opening a whole can of worms, as there are a bunch of things to scope out, but in general I think this could be a very nice feature for ACT and its users. I'd be open to collaboration here.
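The bin-level logic described above could be sketched roughly as follows. This is a pure-Python illustration, not ADI or ACT code: the function name `bin_average`, the output bit values, the 50% threshold, and the choice to keep 'Indeterminate' points in the mean while flagging the bin are all assumptions, since the thread does not spell out ARM's actual definitions.

```python
from statistics import fmean

# Assumed output QC bits and threshold (hypothetical, not ARM's real values).
QC_BAD = 1            # output bin is unusable and masked out
QC_INDETERMINATE = 2  # output bin is suspect
MAX_FRACTION = 0.5


def bin_average(values, assessments, max_fraction=MAX_FRACTION):
    """Average one bin and return (mean_or_None, output_qc)."""
    if not values:
        return None, QC_BAD
    # Points assessed as 'Bad' never contribute to the average.
    good = [v for v, a in zip(values, assessments) if a != "Bad"]
    bad_fraction = 1 - len(good) / len(values)
    if not good or bad_fraction > max_fraction:
        # Too many bad points: mask the output for this bin.
        return None, QC_BAD
    qc = 0
    n_indeterminate = sum(a == "Indeterminate" for a in assessments)
    if n_indeterminate / len(values) > max_fraction:
        qc |= QC_INDETERMINATE
    return fmean(good), qc


print(bin_average([1.0, 2.0, 3.0, 100.0], ["Good", "Good", "Good", "Bad"]))
print(bin_average([1.0, 50.0, 60.0, 70.0], ["Good", "Bad", "Bad", "Bad"]))
```

Keeping the threshold as a parameter leaves room for users who do not trust the upstream assessments to loosen or tighten it for their own analysis.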
@maxwelllevin I can see arguments for both sides of this discussion on how much to do for the user. ARM's ability to provide correct and meaningful QC is not 100%. There are plenty of cases where limits are not set correctly for a long period of time. This results in good data being labeled as Indeterminate and Indeterminate data being labeled as Bad. I think we need to provide code/examples for both cases to ensure the users can get the results that work best for their analysis. I think you are suggesting we create a method to convert b level data to s level data? I'm down with that. Should be pretty simple to implement. Plus it will show off ACT's superior technology.
@kenkehoe @maxwelllevin thanks for the great discussion so far on this! I do want to get thoughts from @mgrover1 and @zssherman on this as well. I do want to go back to the original request, which included the item below.
A. data values QC'd as bad are excluded from consideration in the transformation
I think we could easily take care of A with what Ken's thinking and develop a
Overall, I see 3 needs here:
Does that sound right?
@AdamTheisen yeah, that all sounds right to me. I think "3. A method that transforms the data and adds value in the QC flagging" is more challenging than it first appears. Definitely would recommend discussing and defining/narrowing the scope on that for the first pass.
That sounds reasonable @AdamTheisen ! I agree with @maxwelllevin about being clear about the scope,
@AdamTheisen Sounds reasonable as well to me!
Note, we added an example in #734 on how users should filter with ACT before applying xarray transformations like resample. |
xarray provides powerful resample and groupby methods for transforming data onto a target coordinate grid, but it has no concept of QC variables so any transformations applied (e.g., mean, nearest) are performed naively and could result in the use of data that has been flagged as bad.
I think ACT would be a great place to host extensions to xarray's methods that do account for QC values in transformations. The proposed interface would be a series of methods that mirror the transformation types offered by the ARM Data Integrator (ADI) made available to ARM users in the PCM Interface:
The ADI library makes a few key decisions for QC-aware transformations that I think should be mirrored here:
I think this could be implemented as a method applied to an xarray `DatasetResample`/`DatasetGroupBy` object returned by `ds.resample`/`ds.groupby`, e.g.:
The transform functions/classes (`NearestNeighbor`, `Interpolate`, `BinAverage`) should take and return xarray `Dataset` objects. The input passed by the `apply` method contains all the points in the given bin and the output is expected to be a 0-coord Dataset with scalar values for each data variable (metadata included).
I'm totally open to any changes/feedback. This could probably use several iterations of revisions to make it easier for users. Let me know what you think!
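As a rough illustration of the interface described above, here is a sketch that uses only existing xarray calls. The function `bin_average`, the variable names `temp` and `qc_temp`, the 50% threshold, and the rule that any nonzero QC value means bad are hypothetical, and this is not an existing ACT API. It uses `.map`, since in current xarray `.apply` on resample/groupby objects is a deprecated alias for it.

```python
import numpy as np
import pandas as pd
import xarray as xr


def bin_average(ds, var="temp", qc_var="qc_temp", max_bad_fraction=0.5):
    """Hypothetical transform: reduce one bin to scalar data and QC values."""
    good = ds[qc_var] == 0  # assumption: any set bit means the point is bad
    bad_fraction = 1.0 - float(good.mean())
    is_bad = bad_fraction > max_bad_fraction
    # Average only the good points; attributes on the data variable are kept.
    mean = ds[var].where(good).mean("time", keep_attrs=True)
    if is_bad:
        mean = xr.full_like(mean, np.nan)  # mask the whole bin
    out_qc = xr.DataArray(np.int32(is_bad), attrs=ds[qc_var].attrs)
    return xr.Dataset({var: mean, qc_var: out_qc})


# Made-up input: nine samples at 5-minute spacing, three per 15-minute bin.
times = pd.date_range("2024-01-01", periods=9, freq="5min")
ds = xr.Dataset(
    {
        "temp": ("time", [10.0, 11.0, 12.0, 20.0, 99.0, 22.0, 30.0, 98.0, 97.0]),
        "qc_temp": ("time", [0, 0, 0, 0, 1, 0, 0, 4, 1]),
    },
    coords={"time": times},
)

# Each bin is passed to bin_average as a Dataset; the scalar results
# are combined along a new "time" dimension labelled by bin.
out = ds.resample(time="15min").map(bin_average)
print(out)
```

In this sketch the second bin drops its single bad point and averages the rest, while the third bin has two of three points flagged, so its output is masked and its QC value is set.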