GeoJSON to binary arrays improvements #703

kylebarron · 2020-03-21T01:29:44Z

Index array length fixed, where the last value is coordinate length
Mixed coordinate dimensions: Should be fixed; if mixed coordinate lengths, 0 is left for the missing values
Numeric properties added to typed arrays, where each property is contained within the points/lines/polygons.numericProps object. It was helpful to make it its own object, since I loop over the keys of numericProps a couple times.
Calculations from the first pass are combined with user-provided options. So, for example, if a user didn't want any numeric properties in binary arrays, they could pass numericProps = [].
Feature properties (minus any numeric keys) are added to properties arrays by geometry type. If the user passes numericProps = [], all properties are kept as JSON.
Both featureIndex and globalFeatureIndex arrays are created.
Add new tests/fix changes to existing tests
Use smallest data type necessary for TypedArrays
An option for user-defined missing values for positions and numeric feature properties? I.e. if non-zero, fill this as the default value for positions when it's initialized, and then I think the third coordinate won't get overwritten when filling a 2D point.
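The index and missing-dimension behavior described above can be illustrated with a minimal sketch. It is not the code in this PR: `flattenLineStrings`, `coordLength`, and `missingValue` are hypothetical names, and the sketch assumes the index array counts vertices (the description does not settle whether the real indices count vertices or flat array elements).

```javascript
// Hypothetical helper for illustration only; not the loaders.gl implementation.
// `lines` is an array of LineString coordinate arrays, each coordinate 2D or 3D.
function flattenLineStrings(lines, {coordLength = 3, missingValue = 0} = {}) {
  const vertexCount = lines.reduce((sum, line) => sum + line.length, 0);

  // Typed arrays are zero-initialized; pre-fill only for a non-zero default
  const positions = new Float32Array(vertexCount * coordLength);
  if (missingValue !== 0) {
    positions.fill(missingValue);
  }

  // One entry per line, plus a final entry marking the end of the last line
  const pathIndices = new Uint32Array(lines.length + 1);

  let vertex = 0;
  lines.forEach((line, i) => {
    pathIndices[i] = vertex;
    for (const coord of line) {
      // Only the dimensions present are copied, so a 2D coordinate leaves
      // its third slot at the pre-filled default
      positions.set(coord.slice(0, coordLength), vertex * coordLength);
      vertex++;
    }
  });
  pathIndices[lines.length] = vertex;

  return {positions, pathIndices};
}
```

With the final entry in place, line `i` always spans `pathIndices[i]` to `pathIndices[i + 1]`, so consumers need no special case for the last line.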

Closes #702

coveralls · 2020-03-21T03:09:39Z

Coverage decreased (-0.0005%) to 62.04% when pulling a4a0361 on kylebarron:geojson-to-binary into 9b8c059 on uber-web:master.

coveralls · 2020-03-21T03:09:39Z

Coverage increased (+0.2%) to 62.24% when pulling 262a1d4 on kylebarron:geojson-to-binary into 9b8c059 on uber-web:master.

kylebarron · 2020-03-25T17:25:56Z

@ibgreen any feedback?

ibgreen

@kylebarron This looks great!

ibgreen · 2020-03-26T02:29:44Z

modules/gis/src/lib/geojson-to-binary.js

+    if (feature.properties) {
+      for (const key in feature.properties) {
+        const val = feature.properties[key];
+        numericProps[key] = numericProps[key] ? isNumeric(val) : numericProps[key];


The logic may well be correct, but I get confused here. Do we want this to be true only if all the values in all features are numeric? If so maybe the following is easier to follow?

// initialize
for (const key in feature.properties) {
  propIsNumericKey[key] = true;
}
...
// each iteration, check if still considered numeric
if (propIsNumericKey[key]) {
  propIsNumericKey[key] = Number.isFinite(val);
}

> Do we want this to be true only if all the values in all features are numeric?

I think so, yes.

> If so maybe the following is easier to follow?

That snippet is initializing based on the properties in the first feature? The problem with that is that not all features are required to have all keys. With MVT in particular, if I'm not mistaken, a property could be defined in only a single feature within a layer.

So if we want numericKey to be true only if the values for that prop exist and are numeric in all features, then initializing based on the properties in the first feature is fine. Otherwise, to initialize based on all properties, I think it would be necessary to loop over all features just to do the initialization.

My current interpretation is that numericKey should be true if the values for that prop in all features are either numeric or do not exist. So I need to keep track of 1) whether I have seen this key in a previous feature, and 2) whether it has been non-numeric in any previous feature.

It might be good to add a comment like

// If property has been numeric in all previous features in which the
// property existed, check if numeric in this feature
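A minimal sketch of that interpretation, for readers following the thread: a key counts as numeric if it is numeric in every feature where it appears, and features that lack the key do not disqualify it. `findNumericPropKeys` is a hypothetical name, and this is not the code that landed in the PR.

```javascript
// Hypothetical helper for illustration; not the PR's firstPass implementation.
function findNumericPropKeys(features) {
  // key -> whether the key has been numeric in every feature seen so far
  const isNumericKey = new Map();

  for (const feature of features) {
    const properties = feature.properties || {};
    for (const key of Object.keys(properties)) {
      // Check the value if the key is new, or if it has been numeric in all
      // previous features in which it existed. Once false, it stays false.
      if (!isNumericKey.has(key) || isNumericKey.get(key)) {
        isNumericKey.set(key, Number.isFinite(properties[key]));
      }
    }
  }

  return [...isNumericKey]
    .filter(([, numeric]) => numeric)
    .map(([key]) => key);
}
```

A `Map` is used instead of a plain object so that a property named like an inherited key (for example `constructor`) is not mistaken for one that has already been seen.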

Sounds good. This can get pretty sophisticated, so you should draw the line here for now and then we can come back and make further improvements if we feel so inclined.

For later:

An approach used in binary columnar tables is to optionally generate a second null array to track which objects are not present (e.g. a Uint8Array).

We could detect the type of the array:

For each element, check whether it is an integer. If the column is all integers, generate a Uint32Array

Track the min/max of all values

If within [-32768, 32767], generate an Int16Array

If within [-128, 127], generate an Int8Array

etc

Maybe create follow-up tasks with such ideas for a later date?
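A rough sketch of the two follow-up ideas above (a companion null array and a type chosen from the observed value range) could look like this. `pickArrayType` and `toColumn` are hypothetical names, the `Float64Array` fallback is an assumption made to avoid losing precision, and none of this is part of the PR.

```javascript
// Hypothetical helpers for illustration; not part of loaders.gl.

// Pick the narrowest typed array constructor that can hold every value.
function pickArrayType(values) {
  let min = Infinity;
  let max = -Infinity;
  for (const value of values) {
    if (!Number.isInteger(value)) {
      return Float64Array; // assumption: non-integers fall back to 64-bit floats
    }
    if (value < min) min = value;
    if (value > max) max = value;
  }
  if (min >= 0) {
    if (max <= 0xff) return Uint8Array;
    if (max <= 0xffff) return Uint16Array;
    if (max <= 0xffffffff) return Uint32Array;
  } else {
    if (min >= -128 && max <= 127) return Int8Array;
    if (min >= -32768 && max <= 32767) return Int16Array;
    if (min >= -2147483648 && max <= 2147483647) return Int32Array;
  }
  return Float64Array;
}

// Build a value array plus a Uint8Array marking which rows had no value.
function toColumn(values) {
  const present = values.filter(value => value !== null && value !== undefined);
  const ArrayType = pickArrayType(present);
  const value = new ArrayType(values.length);
  const isNull = new Uint8Array(values.length);
  values.forEach((v, i) => {
    if (v === null || v === undefined) {
      isNull[i] = 1;
    } else {
      value[i] = v;
    }
  });
  return {value, isNull};
}
```

The null array lets a consumer tell a genuine 0 apart from a missing value, which a zero-filled typed array alone cannot express.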

> This can get pretty sophisticated, so you should draw the line here for now and then we can come back and make further improvements if we feel so inclined.

Good idea, it's clear this can be a black hole.

One more idea I thought of that can be a future task: should numericKeys depend on the geometry type?

For example, if I have a set of GeoJSON features where a numeric property x only exists on Point features, and I have 10 Point features but 1000 Polygon features, the current code instantiates a Float32Array for every Polygon vertex, when every vertex is missing data.

I don't think this would necessarily be uncommon either; I have a mountain_peak layer of points in my MVTs with a numeric elevation value, but no other layer has that property, so there would be # line positions + # polygon positions extra empty data just for that property.
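To make the idea concrete, here is a small sketch of tracking numeric keys separately per geometry type, so that a property seen only on Point features would not allocate arrays for lines or polygons. `findNumericPropKeysByGeometry` and the grouping table are hypothetical; the PR itself keeps a single set of numeric keys.

```javascript
// Hypothetical sketch of the proposed optimization; not implemented in this PR.
const GEOMETRY_GROUPS = new Map([
  ['Point', 'points'],
  ['MultiPoint', 'points'],
  ['LineString', 'lines'],
  ['MultiLineString', 'lines'],
  ['Polygon', 'polygons'],
  ['MultiPolygon', 'polygons']
]);

function findNumericPropKeysByGeometry(features) {
  // One key -> "numeric so far" map per output group
  const state = {points: new Map(), lines: new Map(), polygons: new Map()};

  for (const feature of features) {
    const group = feature.geometry && GEOMETRY_GROUPS.get(feature.geometry.type);
    if (!group) continue; // skip null or unsupported geometries

    const isNumericKey = state[group];
    for (const [key, val] of Object.entries(feature.properties || {})) {
      if (!isNumericKey.has(key) || isNumericKey.get(key)) {
        isNumericKey.set(key, Number.isFinite(val));
      }
    }
  }

  const result = {};
  for (const group of Object.keys(state)) {
    result[group] = [...state[group]]
      .filter(([, numeric]) => numeric)
      .map(([key]) => key);
  }
  return result;
}
```

In the `mountain_peak` example, `elevation` would then appear only under `points`, and the line and polygon outputs would carry no empty array for it.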

Yes that is a good observation, that can be another optimization.

Right now I think we should be concerned with the "combinatorial explosion" of options + input dependent outputs, specifically relating to the testability and ease-of-use of it all.

Less optionality is probably good for now

Yes, you're right. It's easy to go on a feature-adding spree. If there's anything you think should be taken out let me know.

Now I'm pivoting to adding more tests... My initial approach would be to add tests to firstPass and secondPass separately, to more easily test edge cases. Do you know if it's possible to write tests for a non-exported function in JS? Or is it ok to export the functions for use in tests? I saw one approach on the web like

export const _testing = [firstPass, secondPass]

Yes I sometimes add a TEST_EXPORTS: const TEST_EXPORTS = {firstPass, secondPass}

Also by using an object you can then do:

import {TEST_EXPORTS} from ...;
const {firstPass, secondPass} = TEST_EXPORTS;

ibgreen · 2020-03-26T02:43:08Z

@kylebarron Just mark the PR as "ready for review" and ping me and I will land.

kylebarron · 2020-03-28T01:23:23Z

@ibgreen
I think I covered a decent number of edge cases in the tests I added. I also updated the docs. Do you think they're clear enough?

ibgreen

@kylebarron This is a spectacular PR. I added a few comments, but will land this now.

I feel that collecting any remaining ideas that came up here in the tracker issue would be good.

Have you tried to build the website and made sure that your new docs show up? I could help with that.

ibgreen · 2020-03-28T13:37:07Z

modules/gis/docs/api-reference/geojson-to-binary.md

 `geojsonToBinary` returns an object containing typed arrays sorted by geometry
-type. `positions` corresponds to 2D or 3D coordinates; `objectIds` returns the
-index of the vertex in the initial `features` array.
+type. `positions` is a flat array of coordinates; `globalFeatureIds` references


For easier reading and a cleaner look, use a markdown table, or at least a bulleted list?

ibgreen · 2020-03-28T13:39:57Z

modules/gis/docs/api-reference/geojson-to-binary.md

    // Array of x, y or x, y, z positions
-    positions: {value: Float32Array, size: coordLength},
+    positions: {value: PositionDataType, size: coordLength},


We are in the process of introducing TypeScript definitions for loaders.gl 2.2. With minimal extra effort, we could actually turn what you wrote here into TypeScript types in a geojson-to-binary.d.ts. Look for other .d.ts files to see how it is done.

ibgreen · 2020-03-28T13:42:42Z

modules/gis/docs/api-reference/geojson-to-binary.md

-| PositionDataType | `Float32Array` or `Float64Array` | `Float32Array` | Data type used for positions arrays |
+| Option           | Type     | Default           | Description                                                                                                                                             |
+| ---------------- | -------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| PositionDataType | `class`  | `Float32Array`    | Data type used for positions arrays.                                                                                                                    |


This reminds me: It is best if the option values are representable in JSON, i.e. numbers, strings, etc. So you could support "float32" and "float64", or positionPrecision: 32 | 64

The reason is that loaders.gl is integrated with deck.gl, and deck.gl exposes loader options in its API, and for instance pydeck and non-JS language bindings go through the deck.gl JSON API intermediary.
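One way to follow this suggestion is to accept a string option and resolve it to a constructor internally. The sketch below assumes a hypothetical option named `positionDataType` with values `"float32"` and `"float64"`; the option name and the helper `resolvePositionArrayType` are illustrative, not the loaders.gl API.

```javascript
// Hypothetical option handling; names are assumptions for illustration only.
const POSITION_ARRAY_TYPES = new Map([
  ['float32', Float32Array],
  ['float64', Float64Array]
]);

function resolvePositionArrayType(options = {}) {
  const name = options.positionDataType || 'float32';
  const ArrayType = POSITION_ARRAY_TYPES.get(name);
  if (!ArrayType) {
    throw new Error(`Unknown positionDataType: ${name}`);
  }
  return ArrayType;
}

// The options survive a JSON round trip, unlike a class reference
const options = JSON.parse('{"positionDataType": "float64"}');
const PositionArrayType = resolvePositionArrayType(options);
const positions = new PositionArrayType(6);
```

Because the option is a plain string, it can be passed unchanged through a JSON layer, whereas a constructor such as `Float32Array` cannot be serialized.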

ibgreen · 2020-03-28T13:43:55Z

modules/gis/src/lib/geojson-to-binary.js

 }

+export const TEST_EXPORTS = {


Nit: I normally put this last in file

ibgreen · 2020-03-28T13:46:17Z

modules/gis/test/geojson-to-binary.spec.js

+
+  // Test value equality, missing third dimension imputed as 0
+  t.deepEqual(points.positions.value, [100, 0, 1, 100, 0, 0, 101, 1, 0]);
+  t.deepEqual(lines.positions.value, [


Use // prettier-ignore just before line 125 to prevent auto formatting?

Kyle Barron added 15 commits March 20, 2020 16:04

Record numeric properties in firstPass

91b79d1

Set last element of path/polygon indices as positions length

4e1dcd6

Instantiate numeric property arrays

2b5c499

Put numeric properties inside numericProps object

c60e7ad

Fill numeric properties

061b435

Fix tests for path/polygon indices being n+1

dd11bf6

Use Set for keeping track of coord lengths

cb47994

Combine user options with firstPassData

266e397

Rename objectIds -> globalFeatureIndex

133e93e

Add properties objects to arrays by geometry type

9d8ea74

Fill mixed length coordinates correctly

7ba3c10

Define featureIndex arrays

2302a37

Fix linePositions -> polygonPositions

4e2fb0c

Fix creating accessor objects

1ecd7a3

Fix existing tests

a4a0361

ibgreen approved these changes Mar 26, 2020

Kyle Barron added 10 commits March 27, 2020 12:52

Add Count suffix to counter variables

038839b

Use Uint16Array where possible

b5f3c8d

Count features by geometry type

82bb874

Rename numericProps to numericPropKeys

a2cc813

Explicitly copy top-level options

dbebb59

Fix rename numericProps -> numericPropKeys

62882b4

Rename FeatureIndex -> FeatureIds

0852a20

Fix existing tests

1905218

Add firstPass tests with property-less features

988fd59

Simplify testing position values

98ebc32

Kyle Barron added 3 commits March 27, 2020 16:59

Fix finding numericPropKeys

095222f

Test resulting data types with large features array

99dbd5f

Test properties

e8c3b04

kylebarron marked this pull request as ready for review March 28, 2020 00:46

Kyle Barron added 2 commits March 27, 2020 19:07

Update docs

2671d65

Test mixed-coordinate missing dimension imputation

262a1d4

ibgreen approved these changes Mar 28, 2020

ibgreen merged commit dd8da0f into visgl:master Mar 28, 2020

kylebarron mentioned this pull request Mar 31, 2020

Tracker: GIS Category Loaders / Flat Binary Array support #685

Closed

2 tasks

xintongxia pushed a commit that referenced this pull request May 6, 2020

GeoJSON to binary arrays improvements (#703)

783f127

kylebarron mentioned this pull request Sep 9, 2024

Add comparison to loaders.gl GeoArrowLoader geoarrow/deck.gl-layers#134

Open
