CQL2 JSON encoding #601

cportele · 2021-07-05T14:55:27Z

CQL2 JSON encoding options

There are two proposals for the CQL2 JSON encoding. They are currently described here and here.

The first option is the current text in the draft standard and uses JSON objects for predicates and JSON arrays for operands.

The second option was developed during one of the OGC Code Sprints and uses JSON arrays only (except for the GeoJSON geometries). It is inspired by the expressions in the Mapbox Style specification. One effect of this is differences in how operators are named (e.g., "==" instead of "=", "all" instead of "and", "get" instead of "property").

The Standards Working Group (SWG) wants to adopt only one JSON encoding for CQL2 and is looking for feedback which one should be adopted. This comment provides a side-by-side comparison of four examples to get a quick overview. The complete set of examples is available here.

The SWG currently does not have good reasons for picking one option over the other. Both options seem similar with the differences listed above (use of objects vs arrays, differences in the names). We are interested in feedback, if there are preferences in terms of the style or good reasons for preferring one option, because it is easier to process or create.

If you prefer the first option, please add a "thumbs up" in this comment.

If you prefer the second option, please add a "thumbs up" in this comment.

Comments on specific advantages or disadvantages of the options are very welcome, too.

Example 1

scene_id = 'LC82030282019133LGN00'

Option 1	Option 2
{ "=": [ { "property": "scene_id" }, "LC82030282019133LGN00" ] }	[ "==", [ "get", "scene_id" ], "LC82030282019133LGN00" ]

Example 2

eo:cloud_cover BETWEEN 0.1 AND 0.2
   AND landsat:wrs_row=28
   AND landsat:wrs_path=203

Option 1

Option 2

{
   "and": [
      {
         "between": [
            {
               "property": "eo:cloud_cover"
            },
            0.1,
            0.2
         ]
      },
      {
         "=": [
            {
               "property": "landsat:wrs_row"
            },
            28
         ]
      },
      {
         "=": [
            {
               "property": "landsat:wrs_path"
            },
            203
         ]
      }
   ]
}

[
   "all",
   [
      "between",
      [
         "get",
         "eo:cloud_cover"
      ],
      0.1,
      0.2
   ],
   [
      "==",
      [
         "get",
         "landsat:wrs_row"
      ],
      28
   ],
   [
      "==",
      [
         "get",
         "landsat:wrs_path"
      ],
      203
   ]
]

Example 3

eo:instrument LIKE 'OLI%' AND S_INTERSECTS(footprint, POLYGON((43.5845 -79.5442, 43.6079 -79.4893, 43.5677 -79.4632, 43.6129 -79.3925, 43.6223 -79.3238, 43.6576 -79.3163, 43.7945 -79.1178, 43.8144 -79.1542, 43.8555 -79.1714, 43.7509 -79.6390, 43.5845 -79.5442)))

Option 1

Option 2

{
   "and": [
      {
         "like": [
            {
               "property": "eo:instrument"
            },
            "OLI%"
         ]
      },
      {
         "s_intersects": [
            {
               "property": "footprint"
            },
            {
               "type": "Polygon",
               "coordinates": [
                  [
                     [43.5845,-79.5442],
                     [43.6079,-79.4893],
                     [43.5677,-79.4632],
                     [43.6129,-79.3925],
                     [43.6223,-79.3238],
                     [43.6576,-79.3163],
                     [43.7945,-79.1178],
                     [43.8144,-79.1542],
                     [43.8555,-79.1714],
                     [43.7509,-79.6390],
                     [43.5845,-79.5442]
                 ]
              ]
            }
         ]
      }
   ]
}

[
   "all",
   [
      "like",
      [
         "get",
         "eo:instrument"
      ],
      "OLI%"
   ],
   [
      "s_intersects",
      [
         "get",
         "footprint"
      ],
      {
         "type": "Polygon",
         "coordinates": [
            [
              [43.5845,-79.5442],
              [43.6079,-79.4893],
              [43.5677,-79.4632],
              [43.6129,-79.3925],
              [43.6223,-79.3238],
              [43.6576,-79.3163],
              [43.7945,-79.1178],
              [43.8144,-79.1542],
              [43.8555,-79.1714],
              [43.7509,-79.6390],
              [43.5845,-79.5442]
          ]
        ]
      }
   ]
]

Example 4

T_AFTER(built, DATE("2012-06-05"))

Option 1	Option 2
{ "t_after": [ { "property": "built" }, { "date": "2012-06-05" } ] }	[ "t_after", [ "get", "built" ], [ "date", "2012-06-05" ] ]

The text was updated successfully, but these errors were encountered:

pvretano · 2021-08-30T01:45:04Z

Current CQL examples re-encoded using @tschaub proposed JSON encoding created here. Hopefully I encoded them correctly! ;)

cportele · 2021-08-30T14:23:12Z

Meeting 2021-08-30: How do we get enough feedback / opinions which JSON encoding is easier to parse / generate / validate? @cportele will create a side-by-side comparison on key examples in this issue and then we can ask for comments / votes with thumbs-ups for each of the two options. Promote this at the member meeting and review the results in the meeting in 4 weeks.

cportele · 2021-09-13T08:16:35Z

There are two proposals for the CQL2 JSON encoding. See the first comment in this issue.

If you prefer the first option please add a "thumbs up" to this comment.

cportele · 2021-09-13T08:16:49Z

There are two proposals for the CQL2 JSON encoding. See the first comment in this issue.

If you prefer the second option please add a "thumbs up" to this comment.

m-mohr · 2021-09-13T08:38:04Z

IMHO this issue doesn't give enough context.

Are there any known (dis)advantages that the options have?
I'm especially wondering what the excessive use of "get" is useful for in Option 2?
Why is it "=" option 1 and "==" in option 2? Are there any other of those tiny differences that are not reflected here?
Is there any advantage aligning with the Mapbox Style specification? To me option 1 seems easier overall.

So right now voting for option 1 because it's the only one that I understand just by reading it.

PS: Example 1 - Option 2 seems broken.

cportele · 2021-09-13T09:13:37Z

@m-mohr - Thanks for the feedback.

I have added an explanation to address your items 2 and 3 (this is to align with the Mapbox Style expressions which uses different tokens/names). The advantage for aligning with Mapbox Style could be the reuse of existing code for parsing or generating such expressions or the fact that these expressions use a familiar style. But of course, this applies only to those that have been or are working with those styles.

As for other (dis)advantages of the options, we do not really have a clear view, which is one of the reasons for this issue and the request to look for broader feedback. I have implemented option 1, but parsing/processing option 2 expressions should work more or less the same way. I have also added a bit of text about this.

I have also fixed the property name in example 1 - option 2.

jerstlouis · 2021-09-13T10:05:36Z

Big preference for Option 1 on my side.
My understanding is that the whole point of having a JSON encoding for CQL2 (I would otherwise argue one is not needed) is that it does not require any special parsing and is an as-direct-as-possible object encoding of the CQL2 Abstract Syntax Tree.

Converters between MapboxGL style expressions can be written for those who would like this convenience.

In general I find the MapboxGL styles syntax extremely counter-intuitive, and the left-side/right-side expression rules (e.g. what is allowed on which side, and when is GET required or not) for different versions make things even more complicated.

philvarner · 2021-09-15T12:36:18Z

One thing I notice is that the difference between 1 & 2 conflate two different design decisions. Option 1 reuses the keywords (e.g., "and" is conjunction) from CQL Text and puts them in a JSON structure that feels pretty standard to me, whereas Option 2 changes the keywords (e.g., conjunction is "all" instead) and provides a JSON structure that is more LISP-like than I think is typical for "mainstream" use.

I'm strongly in favor of Option 1 between these two options, but I would be more amenable to Option 2 if it used the same keywords as CQL Text instead of the MapboxGL terms.

jisantuc · 2021-09-15T16:44:27Z

Parsing mapbox style expressions into data is really rough -- especially with the nesting rules (e.g. you can't ["get", ["get", "lookupKey"]]). Even with TypeScript support, correct non-trivial mapbox style expressions are (for me at least) pretty hard to author. It's way easier to imagine a schema for Option 1. Moreover, I think heterogeneous lists where the structure is encoded in array indices are a design smell -- there's obviously structure / semantics, but e.g. "an array where the first element is a string and the second is a double" rarely expresses the structure / semantics at all.

tschaub · 2021-09-16T20:34:53Z

I'm not really trying to lobby for Option 2 over Option 1 – I can tell that Option 2 is not popular with this crowd. But, I will drop a few comments anyway.

I assume that anyone who regularly works with JSON would see an object like {"or": ...} and wonder what it is supposed to mean of the object has more than one member like {"or": ..., "and": ...}.

Since JSON objects are unordered collections of one or more name/value members, there is no "first" member name. While the first object above might make for a nice (human-)readable "OR expression," the second object is ambiguous (OR?, AND?, maybe an implicit OR with a child OR and AND?).

I know that a validator can detect that the second form is invalid, but to me, using an object to represent a single name/value pair is JSON-smell. And if I see code that switches on the "first" member name in an object (to determine the expression type), that would strike me as code smell.

Disparaging remarks aside, I do see how {"or": ...} in JSON is tantalizingly similar to <Or ...> in XML. So I can understand how it seems like a nice incremental upgrade. And even if people don't have any exposure to OGC Filter Encoding, I also get how the LISP-like syntax of Option 2 is off-putting. All I can say is that after having used it for style expressions in other projects for a while, it has grown to be a syntax that feels nice to me – but I have no idea if others would find the same.

It is probably fair to say that the "type-safeness" of data in the two options is equally mediocre. But having written a parser for the second option in a type safe language, I can say that it wasn't hard.

jerstlouis · 2021-09-16T23:29:21Z

@tschaub I share some of your concerns for Option 1, and I was originally wondering where Option 3 was :)
I believe the intent is that there is always a single key inside those Option 1 objects (that should be clearly explained, if that is the case).
So in effect the key is the operator, and the value is the operands.

Personally, I would have preferred something that is even closer to the AST, e.g.:

{
   "operator": "and",
   "operands" : [
      {
         "operator" : "between",
         "operands" : [
            {
               "property": "eo:cloud_cover"
            },
            0.1,
            0.2
         ]
      },
      {
         "operator" : "=",
         "operands": [
            {
               "property": "landsat:wrs_row"
            },
            28
         ]
      },
      {
         "operator" : "=",
         "operands" : [
            {
               "property": "landsat:wrs_path"
            },
            203
         ]
      }
   ]

It could have used "op" instead of "operators", and "values" instead of "operands" as well for a more concise syntax.

tschaub · 2021-09-16T23:55:52Z

@jerstlouis - I agree that if you want to use JSON objects to represent an expression, it makes sense to have operator or type as a member to switch on. Though I would choose a different style for the accessor expression (your {"property": ...}) – an operator (or type) value of get makes sense to me there.

Then if you wanted something a bit less verbose, maybe you can imagine using positions in an array instead of named members in an object. So instead of expr.operator you would have expr[0] and instead of expr.operands you would have expr[1:] (or expr.slice(1)). And then you are at Option 2 😃

jerstlouis · 2021-09-17T00:34:57Z

Though I would choose a different style for the accessor expression (your {"property": ...}) – an operator (or type) value of get makes sense to me there.

I don't really see properties as an operation expression, they're more like identifier expressions.

But yes having something consistent to switch on like "type" might be good... "type" is already used for GeoJSON geometry, so if there is no overlap with "property" and the operators perhaps it's a good idea? I would prefer property to get though, e.g.

{ "type" : "property", "id" : "eo:cloud_cover" }

but that becomes a bit verbose...

maybe you can imagine using positions in an array instead of named members in an object. So instead of expr.operator you would have expr[0] and instead of expr.operands you would have expr[1:] (or expr.slice(1)).

That is the part I strongly dislike :) It was extremely difficult to parse and then work with that data model in our MapboxGL parser, where elements of the array are all sort of different things (operator type vs. value operands are very different kind of objects):

   Array<MBGLFilterValue> params = filter.type.type == blob ? (Array<MBGLFilterValue>)filter.b : null;
   String op = params && params.count && params[0].type.type == text ? params[0].s : null;
   if(op && params.count >= 2)
   {
      if(!strcmpi(op, "get"))
      {

cportele · 2021-09-27T14:35:25Z

Meeting 2021-09-27: From the two options there is a preference for the first option. However, there is value in considering the issues raised in the last comments and come back to this issue at the next meeting in two weeks and after we got more input from others that might use CQL2 in the future.

@philvarner - Any view from the STAC perspective on @jerstlouis proposal in #601 (comment), which is a variation of option 1 that is more verbose, but closer to the abstract syntax tree?

In any case we need to ensure that in option 1 { "and": [ ... ] , "or" : [ ... ] } is not valid (add additionalProperties: false and add text in the document).

philvarner · 2021-09-27T17:44:21Z

I don't have a strong opinion between them, and I think there's clear difference wrt STAC. Personally, I dislike when the "value" is used as an attribute name (as in option 1) and prefer when the value is an attribute value (as in jerstlouis's proposal). It's more verbose, but also has a more consistent structure.

m-mohr · 2021-09-28T08:30:48Z

Yes, I agree that @jerstlouis proposal seems best to me, too. That's most verbose, but can be created and transformed most easily, IMHO. Which would also be nice for openEO use cases.

cportele · 2021-10-11T14:09:23Z

Meeting 2021-10-11: Based on the feedback we tend to select the modified option 1 (we may pick difference keys than "operator" and "operands"). We will make a decision in the meeting on October 25.

cportele · 2021-10-25T14:42:55Z

Meeting 2021-11-25: No comments were made, so @pvretano will work on a pull request based on the decision. We also discussed whether we should follow the same approach for the property we will implement the decision. We will use "op" instead of "operator" and "args" instead of "operands". There was a discussion about the encoding of "property", "function", etc. Right now Peter uses the following approach in the draft PR:

{ "op" : "s_intersects", "args" : [ "myarg1", 123 ] }
{ "op" : "in", "args" : [ { "property" : "prop-name" }, [ 123, 456 ] ] }
{ "property" : "prop-name" }
{ "function" : "my-func", "args" : [ "myarg1", 123 ] }

We will discuss "property" etc. once the PR is ready for review.

philvarner · 2021-10-26T13:06:36Z

This is great. I'm working on updating the STAC examples now.

I just had one question -- is it correct that CQL2 Text only support symbolic comparison operators (e.g., >=), but that CQL2 JSON supports both symbolic and alphabetic (e.g., >= or gte)?

pvretano · 2021-10-26T13:29:11Z

@philvarner that is correct BUT I would prefer to align the text and JSON to use the symbolic operators. I'll create a PR for that for the SWG to consider at the next meeting.

philvarner · 2021-10-26T13:34:54Z

Thanks @pvretano -- I'll reflect that in the STAC spec, and can always add the alphabetic names later if they are retained.

philvarner · 2021-10-27T13:11:17Z

My updated examples for the STAC API Filter Extension are here: https://github.com/radiantearth/stac-api-spec/blob/f5da775080ff3ff46d454c2888b6e796ee956faf/fragments/filter/README.md
in this PR radiantearth/stac-api-spec#225
if anyone would care to look at them.

cportele added CQL2 CQL JSON labels Jul 5, 2021

cportele assigned pvretano Jul 5, 2021

cportele self-assigned this Aug 30, 2021

cportele changed the title ~~CQL JSON encoding~~ CQL2 JSON encoding Sep 13, 2021

philvarner mentioned this issue Oct 13, 2021

Update CQL2 JSON examples per changes to OAFeat Part 3 radiantearth/stac-api-spec#222

Closed

bitner mentioned this issue Nov 15, 2021

Support for CQL2 stac-utils/pgstac#67

Closed

pvretano linked a pull request Nov 16, 2021 that will close this issue

Issue 601: Revised JSON encoding using op/args keys. #655

Merged

pvretano closed this as completed in #655 Dec 6, 2021

cportele added this to Features Part 3: Filtering / Common Query Language (CQL2) Jun 3, 2024

cportele moved this to Done in Features Part 3: Filtering / Common Query Language (CQL2) Jun 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CQL2 JSON encoding #601

CQL2 JSON encoding #601

cportele commented Jul 5, 2021 •

edited

Loading

pvretano commented Aug 30, 2021

cportele commented Aug 30, 2021

cportele commented Sep 13, 2021 •

edited

Loading

cportele commented Sep 13, 2021

m-mohr commented Sep 13, 2021 •

edited

Loading

cportele commented Sep 13, 2021

jerstlouis commented Sep 13, 2021

philvarner commented Sep 15, 2021

jisantuc commented Sep 15, 2021

tschaub commented Sep 16, 2021 •

edited

Loading

jerstlouis commented Sep 16, 2021 •

edited

Loading

tschaub commented Sep 16, 2021

jerstlouis commented Sep 17, 2021 •

edited

Loading

cportele commented Sep 27, 2021

philvarner commented Sep 27, 2021

m-mohr commented Sep 28, 2021

cportele commented Oct 11, 2021

cportele commented Oct 25, 2021

philvarner commented Oct 26, 2021

pvretano commented Oct 26, 2021

philvarner commented Oct 26, 2021

philvarner commented Oct 27, 2021

CQL2 JSON encoding #601

CQL2 JSON encoding #601

Comments

cportele commented Jul 5, 2021 • edited Loading

CQL2 JSON encoding options

Example 1

Example 2

Example 3

Example 4

pvretano commented Aug 30, 2021

cportele commented Aug 30, 2021

cportele commented Sep 13, 2021 • edited Loading

cportele commented Sep 13, 2021

m-mohr commented Sep 13, 2021 • edited Loading

cportele commented Sep 13, 2021

jerstlouis commented Sep 13, 2021

philvarner commented Sep 15, 2021

jisantuc commented Sep 15, 2021

tschaub commented Sep 16, 2021 • edited Loading

jerstlouis commented Sep 16, 2021 • edited Loading

tschaub commented Sep 16, 2021

jerstlouis commented Sep 17, 2021 • edited Loading

cportele commented Sep 27, 2021

philvarner commented Sep 27, 2021

m-mohr commented Sep 28, 2021

cportele commented Oct 11, 2021

cportele commented Oct 25, 2021

philvarner commented Oct 26, 2021

pvretano commented Oct 26, 2021

philvarner commented Oct 26, 2021

philvarner commented Oct 27, 2021

cportele commented Jul 5, 2021 •

edited

Loading

cportele commented Sep 13, 2021 •

edited

Loading

m-mohr commented Sep 13, 2021 •

edited

Loading

tschaub commented Sep 16, 2021 •

edited

Loading

jerstlouis commented Sep 16, 2021 •

edited

Loading

jerstlouis commented Sep 17, 2021 •

edited

Loading