-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How are property values that are objects/arrays encoded in vector tiles? #75
Comments
My understanding is that array and object values are not part of the specification, and if you want to use them you need to invent your own serialization protocol and ensure that the encoders and decoders in your part of the ecosystem understand it. In other words, there is no "should" here at the level of the VT spec. |
Thanks @jfirebaugh - this is the case. |
Mapbox Studio Classic and recent versions of Tippecanoe write them as stringified JSON, with no way to tell whether the value originally came from a string or an object. |
Interesting @ericfischer. Does that mean parsers must try to parse as JSON first, then assume it's a string if the JSON cannot be parsed? |
@mapsam I think most parsers just treat them as strings and don't make any attempt to reconstitute objects. But @ingalls may have more details on what ultimately happens with the values that prompted mapbox/tippecanoe#155. |
Ah, so it's up the user then to know their string/object-string values. Makes sense! |
Here is one place this conversation is happening in mapbox-gl-js. |
@mapsam our use case is such that we expect either an array or a stringified array given the property name so for us parsing is relatively easy. |
@mapsam mentioned reopening this to have a convo about adding how to handle objects and arrays into the spec. @flippmoke what are your thoughts on adding this in? |
+1 to adding it in - multiple people are adding ways to represent lists (e.g. tilezen/mapbox-vector-tile#64) |
I have been talking to a large number of people about this at SOTM 2016. The biggest hurdle to store the unlimited key-values (object/dictionary/hstore) and the array support is that Mapnik is fairly rigid in terms of non-simple-type columns, and the number of those columns, so adding this support to Mapnik would require substantial efforts. I would like to propose both the Mapnik workaround as well as the spec change. To generate an object or an array in vtile using Mapnik, one could generate the data as a single JSON-encoded string value, and add a custom server-side vector post-processing step that decodes JSON and converts it into values. This way, There are two ways to actually store it:
The Value message should have two additional fields:
So if the value contains One of the drawback is that it is now possible to have circular dependencies, so to avoid this and simplify client's data validation, data may only be two levels deep (root+one sublevel). Another unenforced but possibly valuable requirement might be to restrict all values in both object and list to have the same type. |
Sorry for my lack of responses on this ticket: Let me summarize some thoughts I have on this currently. Mapnik Free Vector Tiles
Eventually we do want to pull out all Mapbox vector tile encoding and decoding such that it is not required to be a part of Mapnik. In general mapnik-vector-tile issues should probably be discussed there. Here on the spec trying to specifically just cover ideas specific to the specification. ListsList support could be added by changing section 4.4 - which currently reads as follows:
In theory we could allow keys not to be unique per a feature and if more then one occurs we assume the item to be a list. This is not ideal for compression necessarily but could solve the problem. I do not think this change could qualify as anything but a major version incrementation (though we could debate on this). ObjectsObjects scare me quite a bit more because we suddenly are adding depth "values". This could be something that was very bad for performance overall if not implemented with thought and with some testing. Example Implementation:
Final ThoughtsI am going to leave this ticket open for continued discussion. Thanks for the feedback everyone! |
@nyurik part of the challenge here with both of our ideas is that that means that all non object values MUST be encoded first. This would put a variety of different requirements on both the encoders and decoders. |
@flippmoke could you elaborate on your last comment? I'm not sure I understood. |
Lets just assume the following structure:
Because
Edit: by encoded here I mean placed into values on the Layer.
|
@flippmoke ah, yes, we (@pnorman and I) were discussing this. I don't think we actually want to allow unlimited nest-ness. It has very limited practical value, whereas having just one level of unlimited key-value storage has a huge practical benefit - primarily for storing all possible translations of a given object (key=name, value={language-code => string}. The moment we put this restriction in place, the whole system becomes much more manageable and easy to implement:
|
If we're going to allow nesting at all, and it requires a version bump, why not allow arbitrary JSON objects? Values are already tagged with an integer for their type, and only 1 through 7 are defined. So type 8 could denote that the value is an array (encoded as an integer array of sub-values) and type 9 denote that it is a hash (encoded as an integer array of sub-keys alternating with sub-values). So a feature with this attribute:
would have these keys in the layer:
and these values:
with an outer key-value list of
|
I just realized that this was what @nyurik already said. Sorry for the needless duplication. |
@ericfischer I'm not sure it's the same - I actually think that we should NOT allow arbitrary nested levels, as it would create too many encoding and decoding problems, while not giving us much practical benefits. After all, vector tiles is a very domain-specific data storage format. My proposal is either to not change the spec at all (except that now a vector tile would have any number of non-predefined keys), or, if we change the spec, we only allow two new types (list and object) at the top level. Neither list nor object will allow storing other lists or objects inside of them for the reasons outlined above. |
Encoding example { "Features": [
{ "str": "aaa",
"dic": { "en": "aaa", "fr": "bbb"},
"arr": [ 100, 200 ]
}, {
"dic": { "en": "ccc", "gr": "bbb" },
"arr": [ 100, 300 ]
}
]} VTile structure: # 0 1 2 3 4 5
keys: [ "str", "en", "fr", "dic", "arr", "gr" ]
values: [
"aaa" (string) # 0
"bbb" (string) # 1
# key value pairs, value MUST NOT be object or array,
[1,0,2,1] (object) # 2 all value indexes MUST be less than the index of this element.
100 (int) # 3
200 (int) # 4
[3,4] (array) # 5 list of value indexes. Same restrictions as for Object
"ccc" (string) # 6
[1,6,5,1] (object) # 7
300 (int) # 8
[3,8] (array) # 9
]
features: [
{ tags: [0,0, 3,2, 4,5] },
{ tags: [3,7, 4,9] }
] @flippmoke, I'm not so sure that repeating keys to make a list is better than adding a new value type. The decoder would need to do an additional key lookup for every new key, to make sure it hasn't seen it before (performance), and because the key needs to be repeated (compression). |
Maybe to make validation easier, there could be a rule that a value can only refer to values that are numbered lower than itself? This would prevent reference loops. |
@ericfischer, while it would make the validation easier, it would also make encoding slightly slower and harder. Are there any practical reasons to allow infinite nesting? @pnorman mentioned that allowing nesting inside arrays would make multidimensional (Jagged) arrays possible, but I haven't seen a usage example yet. |
I have seen one case where someone embedded an entirely different GeoJSON feature as an attribute of a GeoJSON feature. I don't know how common this need is in reality though. From my perspective, guaranteeing that sub-values will be numbered below the value that refers to them seems easy, because you can't build your list of references until you have already assigned IDs to the sub-values, so the ID that then gets assigned to the list of references, once it is complete, will have to be greater than the IDs that were assigned earlier in the process to the things that went into it. But there might be reasons to want to write it the other way around instead. (I should try writing an implementation of this to make sure I'm not missing something important.) |
@nyurik Whether or not it is a good idea, here is my experimental implementation of it in Tippecanoe: mapbox/tippecanoe#303 |
@ericfischer, thanks. I will attempt to do the same with the https://github.com/mapbox/vector-tile-js . While we are introducing a major change, question for proto-buf gurus: does it make sense to make |
@ericfischer @nyurik could you both drop sample |
I recommend not adding null @flippmoke Can you create a dev branch in the repo which is the same as the master, except with 2.1 copied to 3.0? This will let us make PRs against it |
@pnorman yes, I will setup a 3.0 dev branch. I have a feeling that a 3.0 branch will be much harder to find a consensus though compared to the 2.0. So don't expect any quick merges |
I just re-posted my proposed change as #89. In the future, we should switch to 4 space tabs - easier to view diffs |
@pnorman I have no objection to dropping key-value pairs where the value is null, which Tippecanoe (and, I think, Mapnik) does now when it encounters them. What do you think the right thing is to do when someone specifies null as an element of an array, though? |
@pnorman why is that? @ericfischer Our current JS implementation interprets a missing value (e.g. a pbf message with no field set) as |
Thanks @kkaefer. I've updated mapbox/tippecanoe#303 to encode nulls as empty messages rather than as their own type. |
The problem with this that I just discovered is that empty packed arrays in protocol buffers are indistinguishable from an empty message, so empty lists and hashes are mistaken for null values. I think we need magic empty-list and empty-hash messages (or a magic value index that means the same). |
In mapbox/tippecanoe@9319f2e1b2022 I propose:
as a way to represent empty lists and hashes. |
It would be great to add an example of encoding an
Array
orObject
from the original GeoJSON to the PBF value (from the4.5
example) to make it clear how the v2 spec should encode these values. Are the values encoded as string values and required to be parsed by the user?i.e. for an array value:
Related to https://github.com/mapbox/vector-tiles/issues/12
cc @flippmoke @danswick
The text was updated successfully, but these errors were encountered: