Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter Extension #128

Merged
merged 28 commits into from
May 26, 2021
Merged

Filter Extension #128

merged 28 commits into from
May 26, 2021

Conversation

philvarner
Copy link
Collaborator

@philvarner philvarner commented May 10, 2021

Related Issue(s): #32

Proposed Changes:

  1. Defines Filter Extension for allowing CQL queries.

Non-changes

  • @cholmes 's original PR removed intersects. I think that's a separate concern and should be in a different PR (and needs a lot of feedback and discussion)

Open Issues / Questions

  • The OpenAPI spec for it still has some validation problems, but it gets the correct ideas across.
  • How do the conformance classes relate to implementing filter on both /collections/{collectionId}/items and /search? Are both required?

PR Checklist:

  • This PR is made against the dev branch (all proposed changes except releases should be against dev, not master).
  • This PR has no breaking changes.
  • I have added my changes to the CHANGELOG or a CHANGELOG entry is not required.

@philvarner philvarner mentioned this pull request May 11, 2021
3 tasks
@philvarner philvarner marked this pull request as ready for review May 14, 2021 17:00
@philvarner philvarner changed the title Initial draft of Filter extension Filter Extension May 14, 2021
@jisantuc
Copy link
Contributor

I've been working on an implementation in Franklin using only CQL JSON and here are my comments so far:

Strange mix of types?

One thing that I keep tripping on is a pretty strange mixture of types. An expression for filtering a field is a single key at the root of an object, followed by one of a map also holding a single key, or an array of maps holding a single key. This singularity strikes me as odd, because objects can hold many keys, so if there's only supposed to be one thing, why does the data structure in principle hold more than one? Similarly, for or and and, the value behind the key is supposed to be, e.g.:

[
  {"eq": ...},
  {"neq": ...}
]

These string keys in a by-default plural structure are much less convenient to work with than an alternative in which they live under a kind key in each json object, like:

[
  {"kind": "eq", ...},
  {"kind": "neq", ...}
]

I think we're getting this from the OGC spec itself, so there might not be much we can do about it, but it's pretty weird to program with. I've been fighting a JSON round trip test for about 90 minutes while tabbing back and forth to the spec to try to get this to cooperate, and a better matching of types to what they represent would have made this part much easier.

queryables seem pretty tough to know ahead of time

This is becoming a recurring theme with the open-endedness of the spec, but it's not obvious to me what I'm supposed to do for queryables in a server implementation when the server will serve other people's data. It's also not clear to me whether queryables is required. I can accumulate item properties as someone fills Franklin's database with items (the dynamic case mentioned in the spec), but then it's not obvious to me what "queryable" means -- does it mean it's technically possible to query based on x, or that querying based on x will be fast? If the latter, then I need to expose some way for users to control what's queryable and what isn't, since Postgres JSONB indexing is really effective for equality at a key filters but pretty much worthless otherwise. There's a related issue for this in Franklin from before the CQL work here azavea/franklin#616 -- basically there's a trade-off in what fields we can search efficiently and index size / cost of inserting items, and it's hard to make a good choice about that without knowing what promise "x is queryable" represents.

It's also pretty difficult in the dynamic case to figure out what to do with fields that don't have a JSON schema. Let's say someone posts an item that has a string value in mySpecialField -- if I'm dynamically aggregating queryables, I don't know that all myVerySpecialField values are strings, just that I happen to have seen one.

Overall though pretty solid 👍🏻

This isn't really that different technically from the previous query extension in its implementation for us -- build up the cases that something of the type CQLFilter can have, destructure with a fold or a match to whatever the CQLFilter has to become. A bunch of stuff was copy-pastable from the query extension implementation we had in Franklin.

Random notes

I appreciated the clarification that "POST with a JSON body to the Items resource is not supported, as POST is used by the Transactions Extension for creating Item objects." Since I'm working 100% on the JSON side of things that was going to come up for me eventually, and when I got to thinking about it I saw it was already there. 💯

It wasn't too difficult I think to implement the RHS property name lookup Postgres. One thing that wasn't clear to me was whether the properties in any part of the queries were only top-level properties or could drill into the JSON object. I acted as if they could access nested fields, but only if it's strictly through a hierarchy of objects. Something I think would be a cool capability would be having some kind of JSON / textual representation of object traversal (in the traversal systems / optics sense) that would allow quite precise querying, e.g. "the name of the seventh band in eo:bands is aerosol", but that's way out of scope here 🤷🏻‍♂️

@m-mohr
Copy link
Collaborator

m-mohr commented May 18, 2021

@jisantuc This is great feedback, but it seems most should be actually be pointed towards OGC API - Features so that they may improve things depending on your feedback. :-)

@jisantuc
Copy link
Contributor

I was asked for comments on the PR via email so I put them here first

@philvarner
Copy link
Collaborator Author

Queryables

The spec uses the words that the queryables "may" and "can" be used in filter expressions, which I interpret to mean that using a parameter name that's not in queryables should result in an error. Maybe one recommendation would be to allow wildcards, like properties.*, as queryables, so a user could query for any value in Properties? Or maybe there's a wildcard option * for queryables that indicates that the filter is allowed to use property names not defined explicitly as queryables, but there's no guarantee they'll work (e.g., if I query foobar = 1 and foobar doesn't exist, I get a successful 0 results).

@jisantuc jisantuc mentioned this pull request May 18, 2021
1 task
@philvarner
Copy link
Collaborator Author

One thing that wasn't clear to me was whether the properties in any part of the queries were only top-level properties or could drill into the JSON object.

You could do this with queryables -- a queryable foo that actually maps to properties.bar.baz.foo

I acted as if they could access nested fields, but only if it's strictly through a hierarchy of objects. Something I think would be a cool capability would be having some kind of JSON / textual representation of object traversal (in the traversal systems / optics sense) that would allow quite precise querying, e.g. "the name of the seventh band in eo:bands is aerosol", but that's way out of scope here 🤷🏻‍♂️

Yes, I think we have many use cases like this, particularly eo:bands, as users search multiple products for assets that have the spectral bands they need. Might be something GraphQL could express well.

@philvarner
Copy link
Collaborator Author

Strange mix of types?

cql-text is pretty nailed down, since they're just formalizing something that's been used in practice for a while. But CQL JSON is new, and I think they're pretty open to suggestions still, particularly if they're related to implementation difficulties. I didn't look at it too closely because it looked good enough, and no one is going to actually be manually constructing these queries anyway. Also, I feel Scala exposes these type issues a lot more than other common languages, since there's just more correct automatic behavior if you can get the types structured right.

Copy link
Collaborator

@cholmes cholmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Just put up a bunch of review comments. I think the biggest ones are:

  • I think we should have two conformance classes - one for item-search, and one for ogcapi-features. The latter we just use OGC's class directly, and the other we define our own class (which I think you already did). But let people choose one or the other or both. Some people implement 'item-search' without ogcapi-features, so we should give them the option to just do the item-search CQL. I think this approach should simplify a few bits, as the fragment can talk through the common stuff, and then do more specifics in each section (like cross collection search in item search). I noted in comments that that was the intent behind the fragments, to define conformance classes in multiple places, but we never executed on it. fields and sort should do that next - like use them in ogcapi-features under stac conformance namespace (until we manage to fully align)

  • Clarity on what to do with the fact the spec may change with more conformance classes. This is mentioned a good bit, but I think we should try to make the recommendation of what to do right now clear. I think it's to use their conformance classes but it's ok to not fully implement everything, and in a future version there will be the right conformance classes for more of a subset.

  • Queryables - it reads now like there's potentially a lot of work to do here. Should focus on what the minimal requirements are, and make it clear for the STAC content. I think the examples helped make it a bit more clear to me, but we should just structure for an easy on ramp for new implementors, where they can just follow some simple recommendations and be 'good enough'.

fragments/filter/README.md Show resolved Hide resolved
fragments/filter/README.md Show resolved Hide resolved
fragments/filter/README.md Outdated Show resolved Hide resolved
fragments/filter/README.md Outdated Show resolved Hide resolved
fragments/filter/README.md Outdated Show resolved Hide resolved
fragments/filter/README.md Outdated Show resolved Hide resolved
fragments/filter/README.md Outdated Show resolved Hide resolved
fragments/filter/README.md Outdated Show resolved Hide resolved
fragments/filter/README.md Outdated Show resolved Hide resolved
fragments/filter/README.md Outdated Show resolved Hide resolved
@cholmes
Copy link
Collaborator

cholmes commented May 18, 2021

I saw @jisantuc's review come in as I was most of the way through mine, so there's some overlapping stuff, but I didn't go back and tweak them all. It seems like he had some overlapping thoughts/questions on queryables, and Phil answered some of them:

The spec uses the words that the queryables "may" and "can" be used in filter expressions, which I interpret to mean that using a parameter name that's not in queryables should result in an error. Maybe one recommendation would be to allow wildcards, like properties.*, as queryables, so a user could query for any value in Properties? Or maybe there's a wildcard option * for queryables that indicates that the filter is allowed to use property names not defined explicitly as queryables, but there's no guarantee they'll work (e.g., if I query foobar = 1 and foobar doesn't exist, I get a successful 0 results).

Something in those ideas to add flexibility would make sense to me, but not quite sure how either of those would work - may need to see some examples. But thinking about the easy 'on-ramp' for a STAC developer, it'd be useful if we could just provide them with almost a 'static' queryables document that they could just use. At the very least it could define datetime, geometry and collection. It seems like it'd be nice if we could also define common metadata and stable extensions as 'queryables' in some static way. But I'm not sure how that plays with OGC if the data doesn't actually have that field.

@philvarner
Copy link
Collaborator Author

@cholmes @jisantuc I just filed this issue with OGC opengeospatial/ogcapi-features#582

On the one hand, I like the tightness of having queryables defined and allowing that type of dynamic discovery, but on the other hand, it severely complicates and/or restricts filtering when the Items have arbitrary values in their properties. I think as OGC has defined it, it's too tight to work in many situations, and there needs to be a way for the client and/or server to determine the semantics of what they allow/deny.

fragments/filter/README.md Outdated Show resolved Hide resolved
extensions.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@lossyrob lossyrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great stuff - represents a lot of detailed work, really appreciate that you're sorting through this.

One high level comment is that you may want to consider splitting up the README into either separate Markdown documents, pulling some of the justification/point-in-time-specific information into the PR description, and/or ongoing issues or TBDs into GitHub Issues. This might make reading easier for users who want to grok what this extension is quickly, as well as make this document more maintainable as things continue to evolve

item-search/README.md Show resolved Hide resolved
fragments/filter/README.md Outdated Show resolved Hide resolved
known as Common Query Language) or the [Cassandra Query Language](https://cassandra.apache.org/doc/latest/cql/) used by the
Cassandra database.

## Limitations of Item Search
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is more a justification for the extension, and perhaps doesn't need to be so prominent. It was a little confusing to me reading through the spec as I was expecting this section to begin talking about what this extensions supports. I'd recommend making a section lower down dedicated to talking about other parts of the spec in comparison or justifying this extension, and have the beginning of this document focused on specifying what the filter extension is and does.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll think about that. I like having the "motivation" section first, to adequately set the stage for where we're starting with this extension.


These projects have or are developing CQL support:

- [GeoPython PyCQL](https://github.com/geopython/pycql/tree/master/pycql), and the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should https://github.com/geopython/pygeofilter be used instead, as it seams to be more aligned with the current OAFeatures Filtering spec? This is a bit of a guess, as I haven't grokked both libraries completely, but this seems to be under more active development

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer to that. I added in pygeofilter, but I also don't know what either provides. Looks like they do have a cql-json impl, whereas I think pyCQL is only doing cql-text right now.

fragments/filter/README.md Show resolved Hide resolved
fragments/filter/README.md Outdated Show resolved Hide resolved
fragments/filter/README.md Show resolved Hide resolved
fragments/filter/README.md Outdated Show resolved Hide resolved
@philvarner
Copy link
Collaborator Author

Really great stuff - represents a lot of detailed work, really appreciate that you're sorting through this.

One high level comment is that you may want to consider splitting up the README into either separate Markdown documents, pulling some of the justification/point-in-time-specific information into the PR description, and/or ongoing issues or TBDs into GitHub Issues. This might make reading easier for users who want to grok what this extension is quickly, as well as make this document more maintainable as things continue to evolve

I agree, it's so large with all the examples that it's a lot to process at once, and breaking it up would be good.

@philvarner
Copy link
Collaborator Author

philvarner commented May 26, 2021

Filed these for follow-up work, to reorganize the doc (based on @lossyrob feeback) and provide better guidance on queryables.

@philvarner
Copy link
Collaborator Author

I'd like to merge this unless anyone has any major objections -- I think any further work would be better done as individual PR changes rather than adding to this already-massive one.

@m-mohr
Copy link
Collaborator

m-mohr commented May 26, 2021

Unfortunately, I don't have the time right now to review it so leave it up to others. Will just do issues after the merge if something comes up...

@philvarner philvarner merged commit ed7f3f1 into radiantearth:dev May 26, 2021
@philvarner philvarner deleted the filter-extension branch January 31, 2023 00:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants