RFC: Feature identity for interactivity #6019

Closed
ansis opened this issue Jan 18, 2018 · 35 comments · Fixed by #8987

Comments

@ansis
Contributor

ansis commented Jan 18, 2018

Motivation

Vector tile sources split features across multiple tiles.

How do we tell if two features in different tiles are actually the same feature?
How do we highlight a feature across tile boundaries?
How do we only dispatch a feature event listener once, not once-per-tile?

We need some concept of what makes things "the same feature".

For GeoJSON sources we have access to whole features so we can assign them ids guaranteed to be unique.

Design Alternatives

Option 1: do nothing

Treat vector tile features from different tiles as different features (like we do currently).

Option 2: use the id field

Vector tile sources support an id field for features that SHOULD be unique. We could treat all features with the same id as the same feature. Users would need to provide this in order for interactivity to work well.

It could be possible to have multiple features with the same id but different properties, which could create some weird edge cases. How would a theoretical map.getFeatureProperties(sourceID, featureID) handle this case?

Would features with missing id fields get merged into one feature? Or never get merged?

Option 3: use the combination of the id field and properties

We could adapt the previous option and use the combination of the id and properties as a unique identifier. Features with the same id but different properties would be treated as different features. The nice thing about this is that a combined feature always has a well-defined set of properties. Also, merging could mostly work in many cases even if id is not provided.
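
A minimal sketch of how an Option 3 identity key might be computed, assuming features are plain objects with optional `id` and `properties` fields. The name `featureIdentityKey` is hypothetical and not part of mapbox-gl-js. Property keys are sorted so that key order does not affect the result.

```javascript
// Hypothetical helper: derive an identity key from a feature's id and properties.
// Assumes property values are JSON-serializable primitives.
function featureIdentityKey(feature) {
    const properties = feature.properties || {};
    // Sort keys so that two features with the same properties in a different
    // order produce the same key.
    const sortedEntries = Object.keys(properties)
        .sort()
        .map((key) => [key, properties[key]]);
    // A missing id is encoded as null, so it cannot collide with a number or string id.
    const id = feature.id === undefined ? null : feature.id;
    return JSON.stringify([id, sortedEntries]);
}

const a = { id: 123, properties: { name: 'Park', kind: 'green' } };
const b = { id: 123, properties: { kind: 'green', name: 'Park' } };
const c = { id: 123, properties: { name: 'Lake', kind: 'water' } };

console.log(featureIdentityKey(a) === featureIdentityKey(b)); // true
console.log(featureIdentityKey(a) === featureIdentityKey(c)); // false
```

Features without an id still get distinct keys as long as their properties differ, which is why they would not collapse into one giant feature.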

Option 4: Using a user defined function

We could extend the style spec to let users specify an expression to calculate a unique id. This expression could default to ["id"]. This would provide a way to merge features together but it would also add extra complexity to the style spec.

This would have the same problem as Option 2, where a single combined feature could have multiple different properties objects.

Also, would this apply to just vector tile sources, or also geojson?

Design

I think we should implement Option 3.
Option 2 could be fine too.

Advantages of Option 3 over Option 2:

  • the combined feature has a well-defined set of properties
  • features without ids don't get merged into one giant feature

Disadvantages of Option 3 compared to Option 2:

  • calculating the id could be more expensive since it has to process all the properties
  • Option 3 could be harder to explain than Option 2

Option 4 provides extra power but also extra complexity. I think that users that need this could implement it with event listeners and data joining.

Mock-Up

vector tiles

TileA contains: [ { id: 123, type: 'multipoint', coords: [[10, 20]], properties: { key: 'value' } } ]
TileB contains: [ { id: 123, type: 'multipoint', coords: [[40, 30]], properties: { key: 'value' } } ]

mouseover the feature in TileA
both the feature in TileA and in TileB get highlighted
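
The vector tile mock-up above could be sketched roughly as follows, assuming Option 2 (matching on `id` only). The `tiles` object and `findFeatureParts` are hypothetical stand-ins for whatever structure holds the loaded tiles, and `TileC` is added only to show a non-matching feature.

```javascript
// Hypothetical in-memory stand-in for loaded tiles: tile name -> features.
const tiles = {
    TileA: [{ id: 123, type: 'multipoint', coords: [[10, 20]], properties: { key: 'value' } }],
    TileB: [{ id: 123, type: 'multipoint', coords: [[40, 30]], properties: { key: 'value' } }],
    TileC: [{ id: 456, type: 'multipoint', coords: [[5, 5]], properties: { key: 'other' } }]
};

// Return every { tile, feature } pair whose feature id matches the hovered id.
// A missing id never matches, so features without ids are not merged together.
function findFeatureParts(loadedTiles, hoveredId) {
    const parts = [];
    if (hoveredId === undefined || hoveredId === null) return parts;
    for (const [tile, features] of Object.entries(loadedTiles)) {
        for (const feature of features) {
            if (feature.id === hoveredId) parts.push({ tile, feature });
        }
    }
    return parts;
}

const highlighted = findFeatureParts(tiles, 123).map((part) => part.tile);
console.log(highlighted); // [ 'TileA', 'TileB' ]
```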

geojson

Source contains [{
    id: 123,
    geometry: { type: 'Point', coordinates: [10, 20] },
    properties: { key: 'value' }
}, {
    id: 123,
    geometry: { type: 'Point', coordinates: [40, 30] },
    properties: { key: 'value' }
}]

mouseover the first feature
only the first feature gets highlighted

Concepts

Hopefully this should be mostly intuitive. Users encountering unexpected behavior for vector tile interaction could be told to add a unique id to each of their features.

@kkaefer @asheemmamoowala @mollymerp @anandthakker @mourner @lucaswoj

@asheemmamoowala
Contributor

Where a feature is not present contiguously in tiles, how would the set of tiles for a feature be computed? Does a tree of feature bounds need to be generated as tiles are loaded?

@jingsam
Contributor

jingsam commented Jan 19, 2018

I like Option 2, which covers most use cases while staying simple. In practice, I use tippecanoe to convert GeoJSON to vector tiles. The GeoJSON is generated with GDAL with the option -lco preserve-fid. In all generated vector tiles, each feature keeps the same unique $id in every tile it appears in. With this standard tile generation process, I can highlight features in a uniform way.

I think we should choose the simple approach over a complex one. The mapbox-gl-js API could provide a highlight feature based on $id. If the built-in highlighting is not what users want because their vector tiles lack unique ids, they could fall back to Option 1.

@ansis
Contributor Author

ansis commented Jan 19, 2018

@asheemmamoowala I'm not sure I understand the question, where would the set of tiles for a feature be used?

@jingsam thanks for the feedback!

@anandthakker
Contributor

calculating the id could be more expensive since it has to process all the properties

I think this expense could well be insurmountable for some cases: in some datasets, each feature has a lot of property data.

I lean towards option 4 -- i.e., add a new style-spec property for layers, feature-identity, which can be a data expression. It does add complexity, but I think it's worthwhile, especially if we default it to option 3 (["concat", ["id"], ["properties"]]) rather than just ["id"] so that in most simple use cases, the complexity wouldn't be in the user's face.
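
To make the idea concrete, here is a rough sketch of what evaluating such an identity expression could look like. `feature-identity` is only the property name proposed in this comment, and `evaluateIdentity` is a hypothetical toy evaluator covering a tiny subset of operators. It does not reuse the mapbox-gl-js expression engine and may not match real expression semantics.

```javascript
// Hypothetical evaluator for a tiny subset of expressions, for illustration only.
// Supported: ["id"], ["properties"], ["get", name], ["concat", ...args].
function evaluateIdentity(expression, feature) {
    if (!Array.isArray(expression)) return expression; // literal value
    const [operator, ...args] = expression;
    const properties = feature.properties || {};
    switch (operator) {
    case 'id':
        return feature.id === undefined ? null : feature.id;
    case 'properties':
        return properties;
    case 'get':
        return properties[args[0]] === undefined ? null : properties[args[0]];
    case 'concat':
        // Serialize each part so objects and missing values concatenate predictably.
        return args.map((arg) => JSON.stringify(evaluateIdentity(arg, feature))).join('|');
    default:
        throw new Error(`Unsupported operator: ${operator}`);
    }
}

const district = { properties: { iso: 'US-CA-037', population: 10000000 } };

// A cheap custom identity that reads one property instead of all of them.
console.log(evaluateIdentity(['get', 'iso'], district)); // US-CA-037
// The default suggested here; note that this sketch does not sort property keys.
console.log(evaluateIdentity(['concat', ['id'], ['properties']], district));
```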

@ansis
Contributor Author

ansis commented Jan 19, 2018

I lean towards option 4 -- i.e., add a new style-spec property for layers, feature-identity, which can be a data expression. It does add complexity, but I think it's worthwhile, especially if we default it to option 3 (["concat", ["id"], ["properties"]]) rather than just ["id"] so that in most simple use cases, the complexity wouldn't be in the user's face.

@anandthakker interesting! I hadn't considered that default. What are the main use cases you see for custom id expressions? Would these expressions be set at the source level or the layer level? Would these merge features just for built-in hovering or also event listeners and queries? Would a combined feature have multiple sets of properties instead of just one?

@lucaswoj
Contributor

If we implement Option 1, 2, or 3 now, it'd be relatively easy to implement Option 4 down the road.

@e-n-f
Contributor

e-n-f commented Jan 19, 2018

@flippmoke and I have been talking about this in the context of revising the vector tile spec. There is (as far as I know) no implementation yet, but our general idea was to add a unique ID (distinct from the GeoJSON feature ID) to features that are split between tiles, with additional IDs or serial numbers at the nodes where features transition in or out of the tile boundaries, so the original complete geometry can be reconstructed from its tiled components.

@asheemmamoowala
Contributor

I'm not sure I understand the question, where would the set of tiles for a feature be used?

When re-constituting features across tiles:

  • ensure that all parts of the feature in currently visible tiles are included in the updated appearance.
  • ensure that combined geometry can be used in an event handler in user code. For example, if I want to draw a bounding box when a feature is selected, I would need the entire geometry across visible and not-yet visible/loaded tiles.

additional IDs or serial numbers at the nodes where features transition in or out of the tile boundaries

@ericfischer not sure if your suggestion addresses the above problem.

@e-n-f
Contributor

e-n-f commented Jan 19, 2018

@asheemmamoowala I think what we were talking about would address the first need but not the second. We could add a complete bounding box as well as an ID to features when they are split though.

@ansis
Contributor Author

ansis commented Jan 19, 2018

ensure that all parts of the feature in currently visible tiles are included in the updated appearance.

I think we could check all visible tiles for whether they contain the feature. This way we wouldn't need to store the bounds.

I would need the entire geometry across visible and not-yet visible/loaded tiles.

This would involve loading all the missing tiles, right? Since that could be a massive number, the best solution here might be to just include the bbox in the features' properties when creating the custom tiles for that specific use.
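
As a sketch of that preprocessing step, the bbox could be computed from the complete GeoJSON feature before tiling and stored as a property. `computeBBox` and `withBBoxProperty` are hypothetical helper names. The sketch assumes geometries that have a `coordinates` array (so not GeometryCollections) and stores the bbox as a comma-separated string on the assumption that tile property values should be scalars.

```javascript
// Hypothetical preprocessing step, run on the full GeoJSON feature before tiling.
// Walks arbitrarily nested coordinate arrays and returns [minX, minY, maxX, maxY].
function computeBBox(coordinates, bbox = [Infinity, Infinity, -Infinity, -Infinity]) {
    if (typeof coordinates[0] === 'number') {
        const [x, y] = coordinates;
        bbox[0] = Math.min(bbox[0], x);
        bbox[1] = Math.min(bbox[1], y);
        bbox[2] = Math.max(bbox[2], x);
        bbox[3] = Math.max(bbox[3], y);
        return bbox;
    }
    for (const child of coordinates) computeBBox(child, bbox);
    return bbox;
}

// Return a copy of the feature with its bbox stored as a string property.
function withBBoxProperty(feature) {
    const bbox = computeBBox(feature.geometry.coordinates);
    return { ...feature, properties: { ...feature.properties, bbox: bbox.join(',') } };
}

const polygon = {
    type: 'Feature',
    id: 123,
    geometry: { type: 'Polygon', coordinates: [[[10, 20], [40, 20], [40, 30], [10, 30], [10, 20]]] },
    properties: { key: 'value' }
};

console.log(withBBoxProperty(polygon).properties.bbox); // 10,20,40,30
```

An event handler could then parse this property from any tile's copy of the feature without loading the remaining tiles.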

@andrewharvey
Collaborator

This is going off on a tangent but labelling is probably the most common use case of using GeoJSON GeometryCollections to have both a Point (label) and Polygon as part of the same feature sharing the common properties. If data came in like this it would make it easier to support hover style on the label even when hovering over the polygon only.

However, MVT doesn't support GeometryCollections, so these are split into different features and, I presume, get different IDs.

@anandthakker
Contributor

What are the main use cases you see for custom id expressions?

One case for this would be performance-related, for dealing with datasets where each feature has a prohibitively large set of properties, most of which are not needed to identify the feature. E.g., a dataset that doesn't have feature.id values, but does have properties for, say, ISO codes for country / province / district, along with dozens (hundreds?) of demographic and other data. However, it may be reasonable to argue that this should be solved at the dataset level by setting feature.id

Another use is one where the identity of 'features' users are interacting with is more dynamic -- e.g., maybe you want to switch, at runtime, whether hovering over a park represents the intent to focus on "this particular park (as opposed to other parks)" or "parks (as opposed to residences or businesses or schools)".

Would these expressions be set at the source level or the layer level? Would these merge features just for built-in hovering or also event listeners and queries? Would a combined feature have multiple sets of properties instead of just one?

Good questions. Initial thoughts:

  • set at the layer level, because user interactions with source data are always mediated by one or more map layers, and different layers may represent fundamentally different interaction semantics
  • we would use the custom id to include in a query result / event listener call any features from outside the query geometry that have the same id as those inside the query geometry. However, rather than merging features that share the same id, it might be simpler to return individual features (perhaps flat, or perhaps grouped by this custom id if that seems more convenient).

@asheemmamoowala
Contributor

With Option 2 for updating the appearance, the properties may be modified dynamically. How would the id be computed in Option 3 (generate feature ID from properties) - with only original properties or with modified values as well?
Would the computed id be presented back through the API as well? Would the id value be valid across sessions? A potential use case for this would be syncing state across multiple concurrent sessions of the same map.

@mapsam
Contributor

mapsam commented Jan 23, 2018

Neat! Just a little drive-by comment... we're implementing a dedupe path in vtquery (mapbox/vtquery#51) which adds a couple more checks. It only dedupes features that are from the same layer and have the same geometry type (since GeometryCollections aren't allowed in the spec). If so, it uses the ID to dedupe first, but falls back on comparing the tag integers from the pbf. Not sure if those are available at this point in gl-js, but it might reduce the cost of decoding every property in Option 3.
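
A rough sketch of a dedupe pass along these lines, not taken from vtquery itself. It assumes each query result is a plain object shaped like `{ layer, geometryType, id, properties }` and, as a simplification, compares decoded properties rather than raw pbf tag integers when the id is missing. `dedupeResults` is a hypothetical name.

```javascript
// Hypothetical dedupe pass over query results gathered from several tiles.
// Each result is assumed to look like { layer, geometryType, id?, properties }.
function dedupeResults(results) {
    const seen = new Set();
    const unique = [];
    for (const result of results) {
        // Prefer the id; fall back to the decoded properties when it is missing.
        const sortedProperties = Object.entries(result.properties || {})
            .sort(([x], [y]) => (x < y ? -1 : x > y ? 1 : 0));
        const identity = result.id !== undefined ? ['id', result.id] : ['props', sortedProperties];
        // Only results from the same layer with the same geometry type can match.
        const key = JSON.stringify([result.layer, result.geometryType, identity]);
        if (!seen.has(key)) {
            seen.add(key);
            unique.push(result);
        }
    }
    return unique;
}

const results = [
    { layer: 'parks', geometryType: 'Polygon', id: 7, properties: { name: 'A' } },
    { layer: 'parks', geometryType: 'Polygon', id: 7, properties: { name: 'A' } }, // same feature, other tile
    { layer: 'parks', geometryType: 'Point', id: 7, properties: { name: 'A' } }, // different geometry type
    { layer: 'roads', geometryType: 'LineString', properties: { ref: 'M1' } },
    { layer: 'roads', geometryType: 'LineString', properties: { ref: 'M1' } } // no id, same properties
];

console.log(dedupeResults(results).length); // 3
```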

@ansis
Contributor Author

ansis commented Feb 5, 2018

@mapsam thanks for the link. It's definitely helpful to read about similar problems. I think we can't use the tag integers in this case because they'll be different in different tiles.


We talked about this last week and it seems like the consensus is to go with Option 2 (id for vector tiles) for now. It's the simplest and we can always expand to one of the other options later.

@1ec5
Contributor

1ec5 commented Feb 9, 2018

For reference, the iOS and macOS SDKs effectively implement option 3 in mapbox/mapbox-gl-native#6559. (This affects direct calls to -[MGLFeature isEqual:], but nothing related to runtime styling at the moment.)

@asheemmamoowala
Contributor

#6263 introduces new feature state APIs for updating the appearance of features using the top level id field as described in Option 2 above.

While this works for some datasets, most of the datasets I used for demos/examples are missing the id field and had to be updated. In cases where the published data sets cannot be updated, it would render these new APIs unusable. Many of these datasets do have a unique identifier property in the properties field. Option 4 would enable the new APIs for these kinds of datasets, and vastly increase the number of cases where they can be used.

Results from Map#queryRenderedFeatures now also include additional fields (source and sourceLayer) so that they can be used directly as inputs to the feature state APIs. The feature state APIs accept a feature identifier object with id, source, and sourceLayer fields.

mapbox-gl-js/src/ui/map.js

Lines 1378 to 1383 in a09aeae

* @param {Object} [feature] Feature identifier. Feature objects returned from
* {@link Map#queryRenderedFeatures} or event handlers can be used as feature identifiers.
* @param {string} [feature.source] The Id of the vector source or GeoJSON source for the feature.
* @param {string} [feature.sourceLayer] (optional) *For vector tile sources, the sourceLayer is
* required.*
* @param {string} [feature.id] Unique id of the feature.

set at the layer level, because user interactions with source data are always mediated by one or more map layers, and different layers may represent fundamentally different interaction semantics

A layer-specific custom identifier would prevent results from being used among different layers (example(comment)). A feature should have a unique identifier per source that can be reused for any/all APIs regardless of layer.

With a custom identifier, a new field would be needed for the complete feature identifier. Some name options: feature-id, query-id, query-key, and feature-key. This way the entire feature object from Map#queryRenderedFeatures doesn't need to be used as a feature identifier, as is the case at present.

feature: {
  id: string|number,
  properties: {...},
  state: {...},
  // Use a new id field
  query-id: { source: string, sourceLayer?: string, id: any } ,
  
  // Generate hashed id from source, sourceLayer, custom id field
  feature-id: string,

  // Generate UUID for each feature and internally map to the feature using the custom identifier
  feature-key: UUID
}

@anandthakker @ansis @mollymerp

@asheemmamoowala
Contributor

@ansis @mollymerp and I discussed an alternative where GeoJSON sources generate internal feature IDs, or allow promotion of a property attribute to feature id. In conjunction with #6021, this could enable interactivity on a large portion of existing datasets.
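
Until something like this exists in the library, both ideas can be approximated by preprocessing the GeoJSON before it is added to the map. The sketch below is a hypothetical workaround, not the proposed implementation. `toIntegerId`, `promotePropertyToId` and `generateIds` are made-up names, and it assumes ids need to be non-negative integers to survive vector tile encoding.

```javascript
// Return a usable integer id, or undefined if the value cannot serve as one.
function toIntegerId(value) {
    if (typeof value === 'string' && /^\d+$/.test(value)) value = Number(value);
    return Number.isSafeInteger(value) && value >= 0 ? value : undefined;
}

// Promote properties[propertyName] to the top-level id where it is a usable integer.
// Features with a missing or non-integer property are returned unchanged.
function promotePropertyToId(featureCollection, propertyName) {
    const features = featureCollection.features.map((feature) => {
        const id = toIntegerId((feature.properties || {})[propertyName]);
        return id === undefined ? feature : { ...feature, id };
    });
    return { ...featureCollection, features };
}

// Alternative: assign each feature its index in the collection as its id.
function generateIds(featureCollection) {
    const features = featureCollection.features.map((feature, index) => ({ ...feature, id: index }));
    return { ...featureCollection, features };
}

const collection = {
    type: 'FeatureCollection',
    features: [
        { type: 'Feature', geometry: { type: 'Point', coordinates: [10, 20] }, properties: { gid: '42' } },
        { type: 'Feature', geometry: { type: 'Point', coordinates: [40, 30] }, properties: { gid: 'abc' } }
    ]
};

console.log(promotePropertyToId(collection, 'gid').features.map((f) => f.id)); // [ 42, undefined ]
console.log(generateIds(collection).features.map((f) => f.id)); // [ 0, 1 ]
```

Index-based ids are only stable as long as the order of features in the collection does not change.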

@andrewharvey
Collaborator

I wanted to mention that features created in the Mapbox Datasets Editor automatically get a Feature ID assigned. I just discovered that when you export that Dataset to a Tileset, the Tileset features get new and different IDs.

This makes it harder to do feature interactivity when you're working with both (e.g. a list of features populated from the Datasets API and a corresponding set of features on the map from the Tileset no longer share the same IDs).

@danielfornies

danielfornies commented Jul 13, 2018

Hi!
I am trying this very promising feature, but it seems not to work if there are no "id" fields in the features. It doesn't throw an error or a warning either. I see the updated "state" on the feature, but I don't see the visual effect being triggered (I am doing something similar to this example: https://www.mapbox.com/mapbox-gl-js/example/hover-styles/).
I am now trying to include such id fields in the GeoJSON input used to generate the vector tiles (which I am testing against) with tippecanoe, but after that I still can't see them appear when querying the rendered features on the map. Perhaps tippecanoe doesn't include such "id" fields, or anything outside the standard GeoJSON spec?
The pity is that I already have ids I could use, but they are inside the standard GeoJSON "properties" object. So for me, being able to specify the name of the "id" field would have solved this already.
Any tips meanwhile?
Thx!

@e-n-f
Contributor

e-n-f commented Jul 13, 2018

Tippecanoe will preserve integer feature IDs. Maybe your IDs that are being lost are strings (which can't be represented in the current vector tile format)? You can use tippecanoe-decode on the tileset that tippecanoe generates to make sure that the features in the tiles contain everything you expect.

@danielfornies

You are right, making them integer instead of String fixed it, thanks!

@BBegouin

Hi !

I am currently generating an MVT layer from PostGIS, using ST_AsMVT. The generated features don't have an id, but do have an id property.
Do you know if there is a way to promote this id property to the feature id?
I have already seen discussions about this topic, but nothing really usable. Did I miss something?
Or maybe do you know if there's a way of using ST_AsMVT to generate this id?
Actually, I'm having trouble creating a hover effect (https://www.mapbox.com/mapbox-gl-js/example/hover-styles/).

Thx for your help !

@devPuppySim

I'm a new user of mapbox-gl.js and am having trouble seeing whether this is resolved or not.

I need to join a JSON table that has my own ID attribute to vector tiles with polygons that have the same ID attribute. The feature IDs used to join in the https://www.mapbox.com/mapbox-gl-js/example/data-join/ example are not going to work. The feature.id values of the polygons I uploaded were generated automatically by Mapbox Studio when I created the vector tiles. Zooming in and out also changes the IDs of the polygons as found by
map.on('click', function (e) { var features = map.queryRenderedFeatures(e.point); console.log(features); });

Would someone mind linking to the resolution or letting me know if it's open or closed? I'm very new to source control as well.

@asheemmamoowala
Contributor

Would someone mind linking to the resolution or letting me know if it's open or closed?

@devPuppySim this ticket will remain open until a solution has been built and merged. Until then, if you need a workaround, try the latest release of tippecanoe to create vector tiles with the correct feature.id and upload those to Mapbox Studio.

@devPuppySim

@asheemmamoowala thank you for the response. I will try tippecanoe then

@shayke

shayke commented Dec 9, 2018

Does this ticket also address the issue where I call setFeatureState at zoom 14 to select some features, but after zooming out those same features are no longer selected?
How can we address this situation? Should I provide the same featureId across zoom levels?

EDIT:
Is it possible maybe to listen for tile load and then set state for all of the tile features (say from an external map where I keep the shared state)?

@asheemmamoowala
Contributor

Should I provide the same featureId across zoom levels?

@shayke That is correct - a feature should have a consistent ID across all tiles and zoom levels.

Is it possible maybe to listen for tile load and then set state for all of the tile features (say from an external map where I keep the shared state)?

A MapDataEvent is fired for every tile load (and reload). Setting state for all the tile features requires knowing all their IDs. You could use queryRenderedFeatures or querySourceFeatures to query all the features from the tile/layer if the IDs are not known.
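
A sketch of that approach, assuming the shared state lives in a `Map` keyed by feature id. `reapplyFeatureState`, `sharedState`, `'my-source'` and `'my-source-layer'` are hypothetical names, and `map` is assumed to be a mapboxgl.Map. The `sourcedata` wiring is shown only as a comment and should be checked against the events documentation.

```javascript
// Hypothetical external store of shared state, keyed by feature id.
const sharedState = new Map([[101, { selected: true }], [102, { selected: false }]]);

// Re-apply stored state to every currently loaded feature of a source layer.
// Returns the number of distinct features that were updated.
function reapplyFeatureState(map, source, sourceLayer, stateById) {
    const applied = new Set();
    for (const feature of map.querySourceFeatures(source, { sourceLayer })) {
        // The same feature can be returned once per tile; apply its state only once.
        if (feature.id === undefined || applied.has(feature.id)) continue;
        const state = stateById.get(feature.id);
        if (state) {
            map.setFeatureState({ source, sourceLayer, id: feature.id }, state);
            applied.add(feature.id);
        }
    }
    return applied.size;
}

// Usage with a real map instance (not executed here):
// map.on('sourcedata', (e) => {
//     if (e.sourceId === 'my-source' && e.isSourceLoaded) {
//         reapplyFeatureState(map, 'my-source', 'my-source-layer', sharedState);
//     }
// });
```

This only works if the ids are consistent across tiles and zoom levels, as noted above.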

@jacknkandy

I am currently generating an MVT layer from PostGIS, using ST_AsMVT. The generated features don't have an id, but do have an id property.
Do you know if there is a way to promote this id property to the feature id?

@BBegouin did you manage to solve this problem? As far as I can see the Postgis team are working on this feature, but it is scheduled for release in 3.0 (https://trac.osgeo.org/postgis/ticket/4128). Until this is updated, I can't figure out a way to take advantage of the feature state functions. Thanks!

@waissbluth

waissbluth commented Feb 20, 2019

I share @jacknkandy's problem, it would be massively helpful to promote properties.id to id.

Even if this is not currently supported as an option, is there a way to intercept the tiles on load to make this change?

cc @lobenichou

@lobenichou

lobenichou commented Feb 21, 2019

@waissbluth @jacknkandy Hey folks! There are a couple of things that come to mind:

So your best bet (I think) would be to export into a geoJSON instead of MVT using ST_AsGeoJSON and use tippecanoe to convert that into MBTiles using the option above. Hope this helps.

@lobenichou

AFAIK, note that feature.id can only be an integer or a string cast as an integer. It doesn't accept alphanumerical ids. More on that here: https://blog.mapbox.com/going-live-with-electoral-maps-a-guide-to-feature-state-b520e91a22d

@jacknkandy

@waissbluth @jacknkandy Hey folks! There are a couple of things that come to mind:

So your best bet (I think) would be to export into a geoJSON instead of MVT using ST_AsGeoJSON and use tippecanoe to convert that into MBTiles using the option above. Hope this helps.

Hi @lobenichou,

Thanks for your response and your suggestion. Unfortunately for us I don't think we will be able to move away from using the PostGIS MVT functions and generate our vector tiles using Tippecanoe.

I have currently built a NodeJS + Postgres/PostGIS based tile server which generates vector tiles on the fly based on dynamic queries (various filters, etc.). Also the underlying datasets are quite massive (some have 3 million+ features). So we would not be able to pre-generate all tiles due to the extremely large data size as well as the fact that we are unable to predict all possible combinations of queries. From my brief reading of the Tippecanoe documentation it appears to be a command-line based tool (so it would be difficult to integrate with a NodeJS based server) but also appears that it would not be so effective at generating dynamic tile sets on the fly.

Maybe mapbox/geojson-vt could be a possible alternative - but it seems to be more of a client-based library rather than server-side. I think I might just have to wait for the PostGIS 3.0 release.

@waissbluth

Is there a way to at least monkey patch the vector tile parser to promote properties.id to feature.id? If this is possible, can someone point me in the direction of where this would be done? Thanks!

@hoogw

hoogw commented May 14, 2019

I have GeoJSON with no ID or objectID field, and no other field that could serve as a unique value column.

The hover effect failed to work.

As of May 2019, a year and a half later, has anyone here worked out a solution?
This has been a pain point for me for a long time.

As a temporary fix, I add a new GeoJSON layer created from the hovered feature.

By the way, for a layer sourced from a Mapbox style, how do I make a hover effect?

map.on('mousemove', function (e) {
    var features = map.queryRenderedFeatures(e.point, {
        layers: [_layer_name] // replace this with the name of the layer
    });
});

Now I get features[0] as the hovered feature.
Can I use the setFeatureState() method just as with GeoJSON-sourced and tileset-sourced layers?
