modules: Add JSON output for db.univar and v.db.univar #2386

Merged
merged 4 commits into from
Jun 9, 2022

Member

@wenzeslaus wenzeslaus commented May 19, 2022

This adds JSON output for db.univar and v.db.univar.

v.db.univar map=roadsmajor column=SHAPE_LEN format=json -e percentile=80,90,95,99 | jq
{
  "statistics": {
    "n": 355,
    "min": 20.359027,
    "max": 64177.255429,
    "range": 64156.896402,
    "mean": 4934.153557109861,
    "mean_abs": 4934.153557109861,
    "variance": 38328715.30867731,
    "stddev": 6191.0189233015035,
    "coeff_var": 1.254727655238975,
    "sum": 1751624.5127740006,
    "first_quartile": 761.180256,
    "median": 1601.228177,
    "third_quartile": 9527.487778,
    "percentiles": [
      {
        "percentile": 80,
        "value": 11737.529039
      },
      {
        "percentile": 90,
        "value": 13883.001283
      },
      {
        "percentile": 95,
        "value": 14711.484257
      },
      {
        "percentile": 99,
        "value": 14943.396283
      }
    ]
  }
}
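As a minimal sketch of how a script might consume this output, the snippet below parses a subset of the JSON shown above and checks that the derived statistics agree with each other. The JSON text is embedded as a string here (an assumption for self-containment); in practice it would come from running the module.

```python
import json
import math

# Subset of the JSON printed above, embedded so the sketch runs on its own.
output = """
{"statistics": {"min": 20.359027, "max": 64177.255429, "range": 64156.896402,
 "mean": 4934.153557109861, "variance": 38328715.30867731,
 "stddev": 6191.0189233015035, "coeff_var": 1.254727655238975}}
"""

stats = json.loads(output)["statistics"]

# Each derived value should follow from the others within a small tolerance.
checks = {
    "range": math.isclose(
        stats["range"], stats["max"] - stats["min"], rel_tol=1e-9
    ),
    "stddev": math.isclose(
        stats["stddev"], math.sqrt(stats["variance"]), rel_tol=1e-9
    ),
    "coeff_var": math.isclose(
        stats["coeff_var"], stats["stddev"] / stats["mean"], rel_tol=1e-9
    ),
}

for name, passed in checks.items():
    print(name, "ok" if passed else "mismatch")
```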

The implementation introduces more duplication on top of the existing duplicated code. However, the computations are implemented directly in Python without any libraries, and the implemented method, even if implemented correctly, may need to be replaced by a more standard one. I'm therefore going with code that is bad but easier to write and that brings fewer changes to the current code.

This also adds tests, but the correctness checks against NumPy are limited, especially because of the different definitions of quartiles and percentiles.
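To illustrate why such checks are limited, here is a small sketch comparing two common percentile definitions on the same data. Neither function is taken from db.univar: `percentile_nearest_rank` is a hypothetical rank-based rule chosen for illustration, and `percentile_linear` is a hand-written version of linear interpolation between closest ranks, which is the rule NumPy applies by default.

```python
import math


def percentile_nearest_rank(values, percent):
    """Nearest-rank rule: always returns one of the input values."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent / 100 * len(ordered)))
    return ordered[rank - 1]


def percentile_linear(values, percent):
    """Linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * percent / 100
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(percentile_nearest_rank(data, 25))  # 3
print(percentile_linear(data, 25))  # 3.25
```

Both results are valid first quartiles under their respective definitions, so a test comparing one against the other only passes for data where the two happen to coincide.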

See also #2108.

@wenzeslaus wenzeslaus added this to the 8.4.0 milestone May 20, 2022
@wenzeslaus wenzeslaus added the Python (Related code is in Python) and enhancement (New feature or request) labels May 20, 2022
Fixed values for the tests were obtained from the plain output, so the new JSON output is checked against the original.

The tests which compute expected values using NumPy would fail with different test data, i.e., using n other than 10 fails the tests.
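The idea of checking the JSON output against the plain output can be sketched as follows. This is not the test code from the PR: the helper `parse_key_value` is hypothetical, the plain output is assumed to consist of key=value lines, and both outputs are embedded as strings (using values from the description) so the sketch runs without GRASS.

```python
import json
import math


def parse_key_value(text):
    """Parse key=value lines into a dict of floats (hypothetical helper)."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key] = float(value)
    return result


# Assumed plain output and the corresponding JSON output.
plain_output = """first_quartile=761.180256
median=1601.228177
third_quartile=9527.487778
"""
json_output = (
    '{"statistics": {"first_quartile": 761.180256, '
    '"median": 1601.228177, "third_quartile": 9527.487778}}'
)

expected = parse_key_value(plain_output)
actual = json.loads(json_output)["statistics"]

# Collect keys whose JSON value does not match the plain output.
mismatches = [
    key
    for key, value in expected.items()
    if not math.isclose(actual[key], value, rel_tol=1e-9)
]
print(mismatches)  # an empty list means the two outputs agree
```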
@wenzeslaus wenzeslaus marked this pull request as ready for review June 1, 2022 14:09
@wenzeslaus
Member Author

wenzeslaus commented Jun 1, 2022

This PR adds JSON output to v.db.univar. Please let me know what you think about the output structure.

The question is how to represent the user-provided percentiles, which are a list. Do we repeat them in the output? As a list of mappings (as in the description) or as two lists of values (as below)? What about the names (percentile versus percentile value)?

{
  "statistics": {
    "first_quartile": 761.180256,
    "median": 1601.228177,
    "third_quartile": 9527.487778,
    "percentiles_points": [
      80,
      90,
      95,
      99
    ],
    "percentiles_values": [
      11737.529039,
      13883.001283,
      14711.484257,
      14943.396283
    ]
  }
}

Created with:

diff --git a/scripts/db.univar/db.univar.py b/scripts/db.univar/db.univar.py
index d96f168700..278e83deb7 100755
--- a/scripts/db.univar/db.univar.py
+++ b/scripts/db.univar/db.univar.py
@@ -362,10 +362,13 @@ def main():
         result["median"] = q50
         result["third_quartile"] = q75
         if options["percentile"]:
-            percentiles = []
+            percentiles_points = []
+            percentiles_values = []
             for i, one_percentile in enumerate(perc):
-                percentiles.append({"percentile": one_percentile, "value": pval[i]})
-        result["percentiles"] = percentiles
+                percentiles_points.append(one_percentile)
+                percentiles_values.append(pval[i])
+        result["percentiles_points"] = percentiles_points
+        result["percentiles_values"] = percentiles_values
         json.dump({"statistics": result}, sys.stdout)
     else:
         sys.stdout.write("first_quartile=%.15g\n" % q25)

@wenzeslaus
Member Author

The current code produces percentiles, which repeats the percentile parameter, and percentile_values, which holds the actual values:

grass8 ~/grassdata/nc_spm_08_grass7/user1/ --exec v.db.univar precip_30ynormals column=annual format=json -e percentile=80,90,95,99 | jq
{
  "statistics": {
    "n": 136,
    "min": 947.42,
    "max": 2329.18,
    "range": 1381.7599999999998,
    "mean": 1289.311470588235,
    "mean_abs": 1289.311470588235,
    "variance": 39430.97231548565,
    "stddev": 198.57233522191768,
    "coeff_var": 0.15401424694633417,
    "sum": 175346.35999999996,
    "first_quartile": 1183.64,
    "median": 1234.44,
    "third_quartile": 1320.8,
    "percentiles": [
      80,
      90,
      95,
      99
    ],
    "percentile_values": [
      1381.76,
      1480.82,
      1661.16,
      2222.5
    ]
  }
}
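With the two-list structure, a consumer has to pair the lists by position. A minimal sketch, with the relevant part of the output above embedded as a string for self-containment:

```python
import json

# Percentile part of the output above, embedded so the sketch runs on its own.
output = """
{"statistics": {"percentiles": [80, 90, 95, 99],
 "percentile_values": [1381.76, 1480.82, 1661.16, 2222.5]}}
"""

stats = json.loads(output)["statistics"]
points = stats["percentiles"]
values = stats["percentile_values"]

# The two lists are only meaningful together, so guard against a mismatch.
if len(points) != len(values):
    raise ValueError("percentiles and percentile_values differ in length")

# Pair each requested percentile with its computed value.
by_percentile = dict(zip(points, values))
print(by_percentile[95])  # 1661.16
```

The trade-off compared to a list of mappings is that each list can be used directly as a plain array, at the cost of the consumer doing the pairing.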

@wenzeslaus
Member Author

JSON output for v.db.univar has tests and it is ready for feedback, review, or merge.

@wenzeslaus wenzeslaus merged commit 120f198 into OSGeo:main Jun 9, 2022
@wenzeslaus wenzeslaus deleted the json-for-db_univar branch June 9, 2022 16:03
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Oct 26, 2022
* Add JSON output to db.univar.
* Add JSON from db.univar to v.db.univar.
* All formats now handled through the format option.
* Output percentiles as two lists, not a mapping.

Tests:

* Old tests fixed.
* New tests are using pytest.
* Fixed values for the tests were obtained from the plain output, so the new JSON output is checked against the original.
* The tests which compute expected values using NumPy would fail with different test data, i.e., using n other than 10 fails the tests.
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Feb 17, 2023
neteler pushed a commit to nilason/grass that referenced this pull request Nov 7, 2023