modules: Add JSON output for db.univar and v.db.univar #2386
The fixed values for the tests were obtained from the plain output, so the new JSON output is checked against the original. The tests that compare against a NumPy computation would fail with different test data, i.e., using n other than 10 fails the tests.
This PR adds JSON output to v.db.univar. Please let me know what you think about the output structure. The question is how to represent the user-provided percentiles, which are a list. Do we repeat them in the output? As a list of mappings (as in the description) or as a list of values (as below)? What about the names (percentile versus percentile value)?
{
  "statistics": {
    "first_quartile": 761.180256,
    "median": 1601.228177,
    "third_quartile": 9527.487778,
    "percentiles_points": [
      80,
      90,
      95,
      99
    ],
    "percentiles_values": [
      11737.529039,
      13883.001283,
      14711.484257,
      14943.396283
    ]
  }
}
Created with:
diff --git a/scripts/db.univar/db.univar.py b/scripts/db.univar/db.univar.py
index d96f168700..278e83deb7 100755
--- a/scripts/db.univar/db.univar.py
+++ b/scripts/db.univar/db.univar.py
@@ -362,10 +362,13 @@ def main():
             result["median"] = q50
             result["third_quartile"] = q75
             if options["percentile"]:
-                percentiles = []
+                percentiles_points = []
+                percentiles_values = []
                 for i, one_percentile in enumerate(perc):
-                    percentiles.append({"percentile": one_percentile, "value": pval[i]})
-                result["percentiles"] = percentiles
+                    percentiles_points.append(one_percentile)
+                    percentiles_values.append(pval[i])
+                result["percentiles_points"] = percentiles_points
+                result["percentiles_values"] = percentiles_values
             json.dump({"statistics": result}, sys.stdout)
         else:
             sys.stdout.write("first_quartile=%.15g\n" % q25)
The current code produces:
grass8 ~/grassdata/nc_spm_08_grass7/user1/ --exec v.db.univar precip_30ynormals column=annual format=json -e percentile=80,90,95,99 | jq
{
  "statistics": {
    "n": 136,
    "min": 947.42,
    "max": 2329.18,
    "range": 1381.7599999999998,
    "mean": 1289.311470588235,
    "mean_abs": 1289.311470588235,
    "variance": 39430.97231548565,
    "stddev": 198.57233522191768,
    "coeff_var": 0.15401424694633417,
    "sum": 175346.35999999996,
    "first_quartile": 1183.64,
    "median": 1234.44,
    "third_quartile": 1320.8,
    "percentiles": [
      80,
      90,
      95,
      99
    ],
    "percentile_values": [
      1381.76,
      1480.82,
      1661.16,
      2222.5
    ]
  }
}
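As a rough illustration of the two shapes under discussion, the sketch below reads an abridged copy of the output above and pairs the two parallel lists back together. The variable names (`by_percentile`, `as_mappings`) are made up for this example and are not part of the PR; only the keys `percentiles` and `percentile_values` come from the output shown.

```python
import json

# Abridged copy of the JSON output quoted above (two parallel lists).
output = """
{
  "statistics": {
    "percentiles": [80, 90, 95, 99],
    "percentile_values": [1381.76, 1480.82, 1661.16, 2222.5]
  }
}
"""

stats = json.loads(output)["statistics"]
points = stats["percentiles"]
values = stats["percentile_values"]

# With parallel lists, the consumer is responsible for checking that they line up.
if len(points) != len(values):
    raise ValueError("percentiles and percentile_values differ in length")

# Look up a value by the requested percentile (hypothetical helper structure).
by_percentile = dict(zip(points, values))

# The alternative shape considered earlier: a list of mappings.
as_mappings = [{"percentile": p, "value": v} for p, v in zip(points, values)]

print(by_percentile[95])
print(as_mappings[0])
```

With two lists, the pairing is implicit in the order. The list of mappings makes it explicit at the cost of a more verbose output.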
JSON output for v.db.univar has tests and it is ready for feedback, review, or merge.
* Add JSON output to db.univar.
* Add JSON from db.univar to v.db.univar.
* All formats now handled through the format option.
* Output percentiles as two lists, not a mapping.

Tests:

* Old tests fixed.
* New tests are using pytest.
* Fixed values for the tests obtained from the plain output, so the new JSON output is checked to fit with the original.
* The tests with computation using NumPy would fail with different test data, i.e., using n other than 10 fails the tests.
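The idea of checking the JSON output against the plain output could be sketched roughly as follows. This is not the test code from the PR: the helper names `parse_key_value` and `outputs_agree` and the sample strings are hypothetical. Only the `key=value` line format (as in `first_quartile=%.15g`) is taken from the diff, and the `median` and `third_quartile` keys in the plain sample are assumed to match the JSON keys.

```python
import json
import math


def parse_key_value(text):
    """Parse plain output lines of the form key=value into a dict of floats."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, value = line.split("=", 1)
        result[key] = float(value)
    return result


def outputs_agree(plain_text, json_text, rel_tol=1e-9):
    """Return True if every plain key appears in the JSON with a close value."""
    plain = parse_key_value(plain_text)
    stats = json.loads(json_text)["statistics"]
    return all(
        key in stats and math.isclose(stats[key], value, rel_tol=rel_tol)
        for key, value in plain.items()
    )


# Hypothetical sample outputs, not captured from a real run.
plain_output = "first_quartile=1183.64\nmedian=1234.44\nthird_quartile=1320.8\n"
json_output = (
    '{"statistics": {"first_quartile": 1183.64, '
    '"median": 1234.44, "third_quartile": 1320.8}}'
)

print(outputs_agree(plain_output, json_output))
```

Comparing with a tolerance rather than exact equality guards against the plain output being rounded by `%.15g` while the JSON carries the full float.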
This adds JSON output for db.univar and v.db.univar.
v.db.univar map=roadsmajor column=SHAPE_LEN format=json -e percentile=80,90,95,99 | jq
The implementation introduces some more duplication (in addition to the existing duplicated code). However, the computations are implemented directly in Python without any libraries, and the implemented method, even if implemented correctly, may need to be changed to a more standard one. I'm therefore going with bad, but easier to write, code which brings fewer changes to the current code.
This also adds tests, but the correctness checks against NumPy are limited, especially due to different definitions of quartiles and percentiles.
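To illustrate why differing definitions limit such checks, here is a small sketch using only Python's standard `statistics` module (not NumPy and not the db.univar code). It computes quartiles of a made-up 10-value sample with two of the definitions that `statistics.quantiles` offers. Which definition db.univar follows is not stated here, so this only shows that two reasonable definitions can disagree on the same data.

```python
import statistics

# Hypothetical sample of 10 values; not the test data from the PR.
data = list(range(1, 11))

# Two common quartile definitions available in the standard library.
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

names = ("first_quartile", "median", "third_quartile")
for name, excl, incl in zip(names, exclusive, inclusive):
    marker = "same" if excl == incl else "differs"
    print(f"{name}: exclusive={excl} inclusive={incl} ({marker})")
```

On this sample the median agrees while the first and third quartiles do not, so a test comparing against a reference implementation needs to pin down the definition or pick data where the definitions coincide.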
See also #2108.