
Values outside QC test domain are incorrectly flagged as in error #191

Closed · AdamRJensen opened this issue May 30, 2023 · 6 comments · Fixed by #214
Labels: enhancement (New feature or request)

@AdamRJensen (Member) commented May 30, 2023

I've found that the QC checks in pvanalytics flag values outside the testing domain as erroneous, which I strongly believe is contrary to the purpose of the testing domain.

As an example, the BSRN comparison and diffuse ratio tests are stated to be performed only for GHI > 50 W/m^2. The reason for this threshold is that the uncertainty at lower irradiance is too high to perform the test reliably. This does NOT mean that values lower than 50 W/m^2 should be flagged.

If values lower than 50 W/m^2 were meant to be flagged, it would make no sense for the BSRN lower limits on the individual components to be -4 W/m^2. For example, the data point below, which satisfies the closure equation exactly (i.e., GHI = DHI + DNI*cos(Z)), is flagged as erroneous simply because GHI is below 50 W/m^2.

    import pandas as pd
    from pvanalytics.quality.irradiance import check_irradiance_consistency_qcrad

    # A data point that satisfies closure exactly:
    # GHI = DHI + DNI*cos(Z) = 40 + 0 = 40 W/m^2
    qcrad_consistency_flag = check_irradiance_consistency_qcrad(
        solar_zenith=pd.Series([80]),
        ghi=pd.Series([40]),
        dhi=pd.Series([40]),
        dni=pd.Series([0]))

I have a series of QC checks that I would like to add, but this issue is a blocker for now.

This issue is related to #188 (comment) and #167 (comment).

@AdamRJensen changed the title from "Values outside QC test domain are incorrectly flagged as erroneously" to "Values outside QC test domain are incorrectly flagged as in error" on May 30, 2023
@cwhanse (Member) commented May 30, 2023

@AdamRJensen if the BSRN tests aren't applied for GHI < 50 W/m^2, what do the tests return for such values? I assume they don't roll up to a True/False flag, or do they?

I don't think False (in pvanalytics) should be interpreted as erroneous, it means not True, where True means the value passes the test. We could say more in the docstrings than "True where value passes limits test."

@AdamRJensen (Member, Author) replied

> @AdamRJensen if the BSRN tests aren't applied for GHI < 50 W/m^2, what do the tests return for such values? I assume they don't roll up to a True/False flag, or do they?
>
> I don't think False (in pvanalytics) should be interpreted as erroneous, it means not True, where True means the value passes the test. We could say more in the docstrings than "True where value passes limits test."

I think this is a problematic way of implementing QC tests, and it is where the real issue lies!

I think my concern is best explained with an example. Consider two hypothetical tests from two different authors, aimed at identifying two different issues. The first test is suitable for SZA < 75 and the second for SZA >= 75. Since their domains do not overlap, one of the two flags will always be False (following the pvanalytics convention), so there are only three possible outcomes: (True, False), (False, True), or (False, False). Once I've applied all of my QC tests, I want to combine them and make a final decision for each data point. Since I only want to use data points that didn't fail any test, I combine the flags with the AND operator, but for the example above the end result is always False, no matter the data point.
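To make the failure mode concrete, here is a minimal sketch (with hypothetical flags, not actual pvanalytics output) of how AND-combining two tests with non-overlapping domains rejects every point:

    import pandas as pd

    sza = pd.Series([30, 60, 80])

    # Under the current convention, each flag is True only where the point
    # is inside the test's domain AND passes; assume every point passes
    # the test whose domain it falls in.
    test_a = sza < 75   # domain of test A: SZA < 75
    test_b = sza >= 75  # domain of test B: SZA >= 75

    combined = test_a & test_b
    print(combined.tolist())  # [False, False, False]: every point is rejected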

I think the flawed part of the pvanalytics QC implementation is that it follows strategy (1), determining whether a data point is valid (defaulting to False), instead of strategy (2), determining whether a data point is suspicious (defaulting to True). QC tests are developed to find suspicious data points, not to validate data, so strategy (2), flagging suspect data, should be applied.

@cwhanse I'm having a hard time conveying this information and would be happy to have a quick chat about it.

@kandersolar (Member) commented

Put another way, the problem is that we are trying to fit three states (pass, fail, N/A) into a two-state variable.

> I don't think False (in pvanalytics) should be interpreted as erroneous, it means not True, where True means the value passes the test. We could say more in the docstrings than "True where value passes limits test."

I suspect most users will not grasp this distinction. I think most will assume "False = Bad", and I'm skeptical that including careful wording in the docstrings would be enough to disabuse them of this notion.

My first thought was to return np.nan for inputs outside the test domain, but pandas doesn't accept mixed boolean/nan masks with .loc (which I assume is the primary use case for these outputs): ValueError: Cannot mask with non-boolean array containing NA / NaN values.
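For illustration, a minimal reproduction of that error (an object-dtype mask holding True/False/NaN):

    import numpy as np
    import pandas as pd

    data = pd.Series([1.0, 2.0, 3.0])
    # Ternary "flag": True = pass, False = fail, NaN = outside test domain
    flags = pd.Series([True, False, np.nan])

    data.loc[flags]  # raises ValueError: Cannot mask with non-boolean
                     # array containing NA / NaN values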

In the absence of first-class ternary functionality in pandas, I think I'd support switching to a convention where False means failed and True means passed or N/A. But maybe we should poll users on how they want to be able to use these functions.

@cwhanse (Member) commented Jun 1, 2023

This would be a major design change, and I see the arguments in its favor. @kperrynrel can you discuss this with your colleagues?

The proposal is to return True for values outside the filter boundary, i.e., values for which the test doesn't apply. Currently, filter functions return False for such values.

@AdamRJensen (Member, Author) replied

@kperrynrel @cwhanse Here's an idea for a possible middle ground and, if anything, a solution that lets the addition of new tests proceed without causing any breaking changes.

I propose introducing an argument that lets the user decide what fill value to use outside the test domain. The existing tests return False for values outside the test domain, so that can be the default fill value, retaining the current functionality. At the same time, it would provide flexibility by letting users choose another fill value, e.g., True (as per my proposal) or NaN.

The concept is based on a discussion with @kandersolar, though he may not endorse it.
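As a rough sketch of the idea (the helper and its fill_value argument are hypothetical, not the actual pvanalytics API):

    import numpy as np
    import pandas as pd

    # Hypothetical QC helper illustrating the proposed fill-value argument.
    def bsrn_style_check(ghi, passed, fill_value=False):
        in_domain = ghi > 50  # test applies only for GHI > 50 W/m^2
        flags = pd.Series(fill_value, index=ghi.index, dtype=object)
        flags[in_domain] = passed[in_domain]
        return flags

    ghi = pd.Series([40, 100, 600])
    passed = pd.Series([True, True, False])  # pretend in-domain test results

    print(bsrn_style_check(ghi, passed).tolist())                    # [False, True, False]
    print(bsrn_style_check(ghi, passed, fill_value=True).tolist())   # [True, True, False]
    print(bsrn_style_check(ghi, passed, fill_value=np.nan).tolist()) # [nan, True, False]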

@cwhanse (Member) commented Jul 13, 2023

Good idea - fill_value, out_of_bounds, or something like that would work.
