Make labels available in reduce, apply_dimension etc. #245

Closed
m-mohr opened this issue Nov 21, 2019 · 5 comments
Labels
process graphs processes Process definitions and descriptions


m-mohr commented Nov 21, 2019

We pass only the data to the callbacks in these functions: aggregate_polygon, aggregate_temporal, apply_dimension, merge_cubes, reduce, resample_cube_temporal. It is useful to also have the labels available, e.g. for the client band math "magic" or more advanced timeseries analysis. We should make the labels available for each value. Could be achieved either with an additional parameter or something like a labeled array data type.


m-mohr commented Nov 26, 2019

Seems to be useful and needs to be explored:

  1. Whether back-ends can actually provide the data (rasdaman may not be able to do it)
  2. How to pass the data to the reducer


m-mohr commented Nov 26, 2019

Telco: It seems useful, let's explore it.


m-mohr commented Dec 17, 2019

Idea

  1. Define a data type "assoc-array" (an ordered associative array based on the JSON data type array, i.e. an OrderedDict in Python, an associative array in PHP, a Map in JS; not sure about Java). Keys (strings or numbers) are dimension labels, values are pixel values. There's no JSON equivalent for this, but I don't think this is an issue. You could have { array: [{a: 1}, {b: 2}] } or { array: [ ["a", "b"], [1, 2] ] } or { array: { labels: ["a", "b"], values: [1, 2] } } ...
    Example (PHP): $data = ["a" => 123, "b" => 567]
  2. Allow easy access to it by extending the from_argument object with an index. This avoids heavy use of array_element or a similar process.
    Example: {from_argument: "data", index: "a"} to access a in data.
  3. Additionally, either allow array_element to be used on this data type (and objects?) or define separate processes.

This would be backward compatible, I think. Could be supported by from_node, too.

By default, index would be set to false so that an array without keys is returned (as it is now, for backward compatibility). Setting index to true returns the full dict. Setting index to a string or number returns the requested element of the array.
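
The three index modes described above can be sketched in Python. This is only an illustration of the proposal, not an existing openEO implementation: the helper name resolve_from_argument and the use of an OrderedDict to stand in for the "assoc-array" type are assumptions.

```python
from collections import OrderedDict


def resolve_from_argument(ref, arguments):
    """Resolve a {"from_argument": ..., "index": ...} reference.

    Hypothetical helper: `arguments` maps callback parameter names to
    values, and labeled data is modelled as an OrderedDict.
    """
    value = arguments[ref["from_argument"]]
    index = ref.get("index", False)  # proposed default: false

    if not isinstance(value, OrderedDict):
        # Unlabeled values are passed through unchanged.
        return value
    if index is False:
        # Backward-compatible behaviour: plain array without keys.
        return list(value.values())
    if index is True:
        # Full labeled structure.
        return value
    # A string or number selects a single element by its label.
    return value[index]


data = OrderedDict([("a", 123), ("b", 567)])
args = {"data": data}
plain = resolve_from_argument({"from_argument": "data"}, args)
full = resolve_from_argument({"from_argument": "data", "index": True}, args)
single = resolve_from_argument({"from_argument": "data", "index": "a"}, args)
```

The identity checks (`is False`, `is True`) keep the numeric labels 0 and 1 distinct from the boolean switches.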

cc @jdries

Example process graph

Changes: https://gist.github.com/m-mohr/ec69ca2fc27a003aa3bd78a8e4b512da/revisions

Before

{
  "dc": {
    "process_id": "load_collection",
    "description": "Loading the data; The order of the specified bands is important for the following reduce operation.",
    "arguments": {
      "id": "Sentinel-2",
      "spatial_extent": {
        "west": 16.1,
        "east": 16.6,
        "north": 48.6,
        "south": 47.2
      },
      "temporal_extent": ["2018-01-01", "2018-02-01"],
      "bands": ["B08", "B04", "B02"]
    }
  },
  "evi": {
    "process_id": "reduce",
    "description": "Compute the EVI. Formula: 2.5 * (NIR - RED) / (1 + NIR + 6*RED + -7.5*BLUE)",
    "arguments": {
      "data": {"from_node": "dc"},
      "dimension": "spectral",
      "reducer": {
        "callback": {
          "nir": {
            "process_id": "array_element",
            "arguments": {
              "data": {"from_argument": "data"},
              "index": 0
            }
          },
          "red": {
            "process_id": "array_element",
            "arguments": {
              "data": {"from_argument": "data"},
              "index": 1
            }
          },
          "blue": {
            "process_id": "array_element",
            "arguments": {
              "data": {"from_argument": "data"},
              "index": 2
            }
          },
          "sub": {
            "process_id": "subtract",
            "arguments": {
              "data": [{"from_node": "nir"}, {"from_node": "red"}]
            }
          },
          "p1": {
            "process_id": "product",
            "arguments": {
              "data": [6, {"from_node": "red"}]
            }
          },
          "p2": {
            "process_id": "product",
            "arguments": {
              "data": [-7.5, {"from_node": "blue"}]
            }
          },
          "sum": {
            "process_id": "sum",
            "arguments": {
              "data": [1, {"from_node": "nir"}, {"from_node": "p1"}, {"from_node": "p2"}]
            }
          },
          "div": {
            "process_id": "divide",
            "arguments": {
              "data": [{"from_node": "sub"}, {"from_node": "sum"}]
            }
          },
          "p3": {
            "process_id": "product",
            "arguments": {
              "data": [2.5, {"from_node": "div"}]
            },
            "result": true
          }
        }
      }
    }
  },
  "mintime": {
    "process_id": "reduce",
    "description": "Compute a minimum time composite by reducing the temporal dimension",
    "arguments": {
      "data": {"from_node": "evi"},
      "dimension": "temporal",
      "reducer": {
        "callback": {
          "min": {
            "process_id": "min",
            "arguments": {
              "data": {"from_argument": "data"}
            },
            "result": true
          }
        }
      }
    }
  },
  "save": {
    "process_id": "save_result",
    "arguments": {
      "data": {"from_node": "mintime"},
      "format": "GTiff"
    },
    "result": true
  }
}

After

{
  "dc": {
    "process_id": "load_collection",
    "description": "Loading the data; The order of the specified bands is important for the following reduce operation.",
    "arguments": {
      "id": "Sentinel-2",
      "spatial_extent": {
        "west": 16.1,
        "east": 16.6,
        "north": 48.6,
        "south": 47.2
      },
      "temporal_extent": ["2018-01-01", "2018-02-01"],
      "bands": ["B08", "B04", "B02"]
    }
  },
  "evi": {
    "process_id": "reduce",
    "description": "Compute the EVI. Formula: 2.5 * (NIR - RED) / (1 + NIR + 6*RED + -7.5*BLUE)",
    "arguments": {
      "data": {"from_node": "dc"},
      "dimension": "spectral",
      "reducer": {
        "callback": {
          "sub": {
            "process_id": "subtract",
            "arguments": {
              "data": [{"from_argument": "data", "index": "B08"}, {"from_argument": "data", "index": "B04"}]
            }
          },
          "p1": {
            "process_id": "product",
            "arguments": {
              "data": [6, {"from_argument": "data", "index": "B04"}]
            }
          },
          "p2": {
            "process_id": "product",
            "arguments": {
              "data": [-7.5, {"from_argument": "data", "index": "B02"}]
            }
          },
          "sum": {
            "process_id": "sum",
            "arguments": {
              "data": [1, {"from_argument": "data", "index": "B08"}, {"from_node": "p1"}, {"from_node": "p2"}]
            }
          },
            }
          },
          "div": {
            "process_id": "divide",
            "arguments": {
              "data": [{"from_node": "sub"}, {"from_node": "sum"}]
            }
          },
          "p3": {
            "process_id": "product",
            "arguments": {
              "data": [2.5, {"from_node": "div"}]
            },
            "result": true
          }
        }
      }
    }
  },
  "mintime": {
    "process_id": "reduce",
    "description": "Compute a minimum time composite by reducing the temporal dimension",
    "arguments": {
      "data": {"from_node": "evi"},
      "dimension": "temporal",
      "reducer": {
        "callback": {
          "min": {
            "process_id": "min",
            "arguments": {
              "data": {"from_argument": "data"}
            },
            "result": true
          }
        }
      }
    }
  },
  "save": {
    "process_id": "save_result",
    "arguments": {
      "data": {"from_node": "mintime"},
      "format": "GTiff"
    },
    "result": true
  }
}
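
To illustrate what the "After" graph gains, here is a rough Python sketch that evaluates the same EVI formula once by position and once by label. It assumes the labels are the band names passed to load_collection (B08, B04, B02); the function names and the sample pixel values are made up for the example.

```python
def evi_by_position(values):
    """EVI as in the "Before" graph: relies on the band order B08, B04, B02."""
    nir, red, blue = values[0], values[1], values[2]
    return 2.5 * (nir - red) / (1 + nir + 6 * red + -7.5 * blue)


def evi_by_label(pixel):
    """EVI as in the "After" graph: bands are looked up by label."""
    nir, red, blue = pixel["B08"], pixel["B04"], pixel["B02"]
    return 2.5 * (nir - red) / (1 + nir + 6 * red + -7.5 * blue)


# Made-up reflectance values for a single pixel.
ordered = {"B08": 0.5, "B04": 0.1, "B02": 0.05}
reordered = {"B02": 0.05, "B08": 0.5, "B04": 0.1}

# Label access gives the same result regardless of band order ...
same = evi_by_label(ordered) == evi_by_label(reordered)
# ... while positional access silently changes with the order.
differs = evi_by_position(list(ordered.values())) != evi_by_position(
    list(reordered.values())
)
```

With label access, the callback no longer depends on the order of the bands in load_collection.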

@m-mohr m-mohr transferred this issue from Open-EO/openeo-processes Dec 19, 2019
@m-mohr m-mohr added this to the v1.0-rc1 milestone Dec 19, 2019
@m-mohr m-mohr added process graphs processes Process definitions and descriptions labels Dec 19, 2019

m-mohr commented Jan 14, 2020

This can also be useful for the object-based schema in rename_labels' parameter labels.


m-mohr commented Jan 21, 2020

The subtype labeled-array is now available. It is an array that stores labels instead of indices. Labeled arrays can still be used as normal arrays, so you can still pass a labeled array to mean(), for example, without any change to the process graph. The labels can be accessed with the array_* functions, e.g. array_element, array_find and array_labels. Labels take precedence over indices.
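
A minimal Python sketch of how a back-end might model this behaviour. The LabeledArray class and the two helper functions are hypothetical and only mirror the description above (usable as a normal array, labels take precedence over indices); they are not the actual openEO process definitions.

```python
from statistics import mean


class LabeledArray(list):
    """A plain list that additionally carries one label per element."""

    def __init__(self, values, labels):
        super().__init__(values)
        if len(labels) != len(self):
            raise ValueError("need exactly one label per value")
        self.labels = list(labels)


def array_labels(data):
    """Return the labels of a labeled array (sketch of array_labels)."""
    return list(data.labels)


def array_element(data, index):
    """Return one element; a matching label wins over a positional index."""
    labels = getattr(data, "labels", None)
    if labels is not None and index in labels:
        return data[labels.index(index)]
    return data[index]  # plain positional access


bands = LabeledArray([0.5, 0.1, 0.05], ["B08", "B04", "B02"])
red = array_element(bands, "B04")  # access by label
avg = mean(bands)                  # still works as a normal array

# With numeric labels, the label 0 is found before the position 0 is used.
numeric = LabeledArray([1, 2, 3], [2, 1, 0])
by_label = array_element(numeric, 0)
```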

We don't need a JSON encoding yet. With the changes in #254 to rename_labels, we have no place yet where we need a JSON encoding for labeled arrays in process graphs. So I didn't invent one yet.

The shortcut to access data without array_element, e.g. {from_argument: "data", index: "a"} is not included yet. I guess I'll combine these changes with #161?!

@m-mohr m-mohr closed this as completed Jan 27, 2020