Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diff-style updates for geojson sources #1605

Merged
merged 21 commits into from
Oct 1, 2022

Conversation

mfedderly
Copy link
Contributor

This PR adds an additional public API to the geojson source which allows for diff-style updating (see #1236). I'm interested in feedback on the API before going much further. I'm happy to add tests/docs/etc once we agree on an approach.

This change adds a new method source.updateData() that takes in a GeoJSONSourceDiff that lets us represent changes to the geojson in a compact format. Updateable geojson requires that every feature contains a unique feature ID. A risk with this implementation is that updateData may throw an error if setData was called with geojson that is not updateable.

By keeping an up to date copy of the geojson in the worker, we can save a dramatic amount of time updating the source. In my testing on an M1 Macbook, the average time to update went from 11.5ms -> 1.8ms (see geojson-updates.html).

There are further optimizations that can be made on the worker side around handling partial updates with these diffs, but in this PR the worker processes the full geojson every time.

Launch Checklist

  • Confirm your changes do not include backports from Mapbox projects (unless with compliant license) - if you are not sure about this, please ask!
  • Briefly describe the changes in this PR.
  • Include before/after visuals or gifs if this PR includes visual changes.
  • Write tests for all new functionality.
  • Document any changes to public APIs.
  • Post benchmark scores.
  • Manually test the debug page.
  • Suggest a changelog category: bug/feature/docs/etc. or "skip changelog".
  • Add an entry inside this element for inclusion in the maplibre-gl-js changelog: <changelog></changelog>.

Add the ability to send partial updates to a geojson source. This is useful for large sources that get frequent small updates.
There are further optimizations that could be made on the worker side, but this saves a lot of time spent in JSON.stringify/parse.
src/source/geojson_source_diff.ts Outdated Show resolved Hide resolved
src/source/geojson_source.ts Outdated Show resolved Hide resolved
src/source/geojson_source_diff.ts Outdated Show resolved Hide resolved
src/source/geojson_source_diff.ts Outdated Show resolved Hide resolved
src/source/geojson_source_diff.ts Show resolved Hide resolved
src/source/geojson_source.ts Outdated Show resolved Hide resolved
@HarelM
Copy link
Collaborator

HarelM commented Sep 2, 2022

Isn't this somewhat covered by this PR:
#1238
I haven't looked too deeply, but they sound very similar from first impression...

@github-actions
Copy link
Contributor

github-actions bot commented Sep 2, 2022

Bundle size report:

Size Change: +575 B
Total Size Before: 204 kB
Total Size After: 205 kB

Output file Before After Change
maplibre-gl.js 195 kB 196 kB +575 B
maplibre-gl.css 9.09 kB 9.09 kB 0 B
ℹ️ View Details No major changes

@HarelM
Copy link
Collaborator

HarelM commented Sep 2, 2022

I might have simply remembered the discussion here: #1236 not sure... In anycase I'll see what I can do it terms of valuable input in the next few days...

@mfedderly
Copy link
Contributor Author

#1238 looks vaguely similar but does not cover the same thing as this PR. Looking forward to the feedback when you get the time.

src/source/geojson_source.ts Outdated Show resolved Hide resolved
src/source/geojson_source.ts Outdated Show resolved Hide resolved
src/source/geojson_source.ts Outdated Show resolved Hide resolved
@mfedderly
Copy link
Contributor Author

@HarelM I updated the PR based on your feedback.

A few flags:

  1. serialize() is now not super useful for the updatable sources
  2. The way we get feature ids means that setData cannot be updated unless we add code to setData to actually turn it into a diff style update. It also implies that we can never support loading data from a URL and then updating it, because we don't have ids.

src/source/geojson_source.ts Outdated Show resolved Hide resolved
@HarelM
Copy link
Collaborator

HarelM commented Sep 6, 2022

You mean geojson_source.serialize(), right?
This means there's no way to get the latest state of the data, right?
We should consider throwing in that case maybe, or warnOnce to let anyone who uses updateData with serialize know that they are not supported somehow, maybe even in the jsdocs above the method...?
IDK...

@HarelM
Copy link
Collaborator

HarelM commented Sep 6, 2022

As a side not, in most cases it's easier and more common to simply write if (someVariable) instead of if (someVariable != null). this is just nit picking though...

@HarelM
Copy link
Collaborator

HarelM commented Sep 6, 2022

I think I now understand the full flow of things and the limitations...
This is very problematic.
This means that the API we are creating will most probably be used in an incorrect way.
This is the opposite of the pit of success. :-((
Doing setData and then updateData will probably be the most common way to use this source.
I think we need to get back to the design board and define a way to facilitate for the above two calls so that they will work as expected.
Either by creating a new type of source - updatableGeoJsonSrouce (my least preferred option)
or by adding a new parameter to setData, maybe some sort of xPath that will be used for finding the id of the given data
or something else, IDK...

@westinrm
Copy link
Contributor

westinrm commented Sep 6, 2022

so you can't make setData and updateData both work without changing the way setData behaves. Even if you made setData require IDs (be it by a property, the real feature ID, or another way) that still doesn't insulate you against someone providing two features with the same ID. Currently if you provide two features with the same ID they are both written to the vector source and they both display. These features could be entirely distinct in terms of properties, and thus styling, so you can't merge them in the vector tile.

So to make setData always work with updateData you have to make setData reject features with duplicate IDs, which is a rather major break that would make everyone that doesn't use updateData angry. You could make it discard duplicate IDs, but that will also make people confused.

I can think of two approaches that might be acceptable that allow mixing of setData and updateData:

  1. add an updateable flag that must be set when the source is created. If true, setData discards duplicates and updateData is allowed. If false, setData doesn't discard duplicates and updateData isn't allowed. This is similar to making a new source but without having to write an entire new source, spec, and worker.
  2. allow storing multiple features per ID and when a patch/remove request is received for an ID apply that patch/remove request to all the features with that ID. This feels like it has the potential to be very confusing.

@HarelM
Copy link
Collaborator

HarelM commented Sep 8, 2022

I think that duplicate IDs is a "second" priority.
From my point of view making setData and updateData to work one after the other (maybe with a relevant optional parameter) is the highest priority.

@mfedderly
Copy link
Contributor Author

@HarelM let me know if I missed anything that needs testing. I tried getting maplibre-gl-js-docs set up on my local but I'm running into issues on npm start so the docs aren't done yet.

@HarelM
Copy link
Collaborator

HarelM commented Sep 24, 2022

@wipfli @birkskyum any specific issue you are aware of with the docs repo?

@birkskyum
Copy link
Member

birkskyum commented Sep 24, 2022

I can run the docs just fine as they are now, on M1 machine. It's not possible to test git branches with the docs like this in package.json:

"maplibre-gl": "git://github.com/maplibre/maplibre-gl-js.git",

Releasing a new prerelease will make it easier to work on the docs.

@wipfli
Copy link
Contributor

wipfli commented Sep 24, 2022

I don't know. @mfedderly feel free to post the error report

@birkskyum
Copy link
Member

The errors returned when using this version outside of a release look like this on npm install:

npm ERR! code 1
npm ERR! path /Users/admin/repos/maplibre-gl-js-docs/node_modules/maplibre-gl
npm ERR! command failed
npm ERR! command sh -c -- node ./postinstall.js
npm ERR! (node:2485) ExperimentalWarning: Custom ESM Loaders is an experimental feature. This feature could change at any time
npm ERR! (Use `node --trace-warnings ...` to show where the warning was created)
npm ERR! (node:2485) ExperimentalWarning: The Node.js specifier resolution flag is experimental. It could change or be removed at any time.
npm ERR! (node:2485) ExperimentalWarning: The Node.js specifier resolution in ESM is experimental.
npm ERR! RangeError [ERR_UNKNOWN_MODULE_FORMAT]: Unknown module format: null for URL file:///Users/admin/repos/maplibre-gl-js-docs/node_modules/maplibre-gl/build/generate-style-code.ts
npm ERR!     at new NodeError (node:internal/errors:393:5)
npm ERR!     at ESMLoader.load (node:internal/modules/esm/loader:647:13)
npm ERR!     at async ESMLoader.moduleProvider (node:internal/modules/esm/loader:458:11)
npm ERR!     at async link (node:internal/modules/esm/module_job:67:21) {
npm ERR!   code: 'ERR_UNKNOWN_MODULE_FORMAT'
npm ERR! }
npm ERR! node:internal/errors:863
npm ERR!   const err = new Error(message);
npm ERR!               ^
npm ERR! 
npm ERR! Error: Command failed: npm run codegen && npm run generate-query-test-fixtures
npm ERR! (node:2485) ExperimentalWarning: Custom ESM Loaders is an experimental feature. This feature could change at any time
npm ERR! (Use `node --trace-warnings ...` to show where the warning was created)
npm ERR! (node:2485) ExperimentalWarning: The Node.js specifier resolution flag is experimental. It could change or be removed at any time.
npm ERR! (node:2485) ExperimentalWarning: The Node.js specifier resolution in ESM is experimental.
npm ERR! RangeError [ERR_UNKNOWN_MODULE_FORMAT]: Unknown module format: null for URL file:///Users/admin/repos/maplibre-gl-js-docs/node_modules/maplibre-gl/build/generate-style-code.ts
npm ERR!     at new NodeError (node:internal/errors:393:5)
npm ERR!     at ESMLoader.load (node:internal/modules/esm/loader:647:13)
npm ERR!     at async ESMLoader.moduleProvider (node:internal/modules/esm/loader:458:11)
npm ERR!     at async link (node:internal/modules/esm/module_job:67:21) {
npm ERR!   code: 'ERR_UNKNOWN_MODULE_FORMAT'
npm ERR! }
npm ERR! 
npm ERR!     at checkExecSyncError (node:child_process:871:11)
npm ERR!     at execSync (node:child_process:943:15)
npm ERR!     at file:///Users/admin/repos/maplibre-gl-js-docs/node_modules/maplibre-gl/postinstall.js:3:1
npm ERR!     at ModuleJob.run (node:internal/modules/esm/module_job:193:25)
npm ERR!     at async Promise.all (index 0)
npm ERR!     at async ESMLoader.import (node:internal/modules/esm/loader:527:24)
npm ERR!     at async loadESM (node:internal/process/esm_loader:91:5)
npm ERR!     at async handleMainPromise (node:internal/modules/run_main:65:12) {
npm ERR!   status: 1,
npm ERR!   signal: null,
npm ERR!   output: [
npm ERR!     null,
npm ERR!     Buffer(330) [Uint8Array] [
npm ERR!        10,  62,  32, 109,  97, 112, 108, 105,  98, 114, 101,  45,
npm ERR!       103, 108,  64,  50,  46,  52,  46,  48,  32,  99, 111, 100,
npm ERR!       101, 103, 101, 110,  10,  62,  32, 110, 112, 109,  32, 114,
npm ERR!       117, 110,  32, 103, 101, 110, 101, 114,  97, 116, 101,  45,
npm ERR!       115, 116, 121, 108, 101,  45,  99, 111, 100, 101,  32,  38,
npm ERR!        38,  32, 110, 112, 109,  32, 114, 117, 110,  32, 103, 101,
npm ERR!       110, 101, 114,  97, 116, 101,  45, 115, 116, 114, 117,  99,
npm ERR!       116,  45,  97, 114, 114,  97, 121, 115,  32,  38,  38,  32,
npm ERR!       110, 112, 109,  32,
npm ERR!       ... 230 more items
npm ERR!     ],
npm ERR!     Buffer(884) [Uint8Array] [
npm ERR!        40, 110, 111, 100, 101,  58,  50,  52,  56,  53,  41,  32,
npm ERR!        69, 120, 112, 101, 114, 105, 109, 101, 110, 116,  97, 108,
npm ERR!        87,  97, 114, 110, 105, 110, 103,  58,  32,  67, 117, 115,
npm ERR!       116, 111, 109,  32,  69,  83,  77,  32,  76, 111,  97, 100,
npm ERR!       101, 114, 115,  32, 105, 115,  32,  97, 110,  32, 101, 120,
npm ERR!       112, 101, 114, 105, 109, 101, 110, 116,  97, 108,  32, 102,
npm ERR!       101,  97, 116, 117, 114, 101,  46,  32,  84, 104, 105, 115,
npm ERR!        32, 102, 101,  97, 116, 117, 114, 101,  32,  99, 111, 117,
npm ERR!       108, 100,  32,  99,
npm ERR!       ... 784 more items
npm ERR!     ]
npm ERR!   ],
npm ERR!   pid: 2481,
npm ERR!   stdout: Buffer(330) [Uint8Array] [
npm ERR!      10,  62,  32, 109,  97, 112, 108, 105,  98, 114, 101,  45,
npm ERR!     103, 108,  64,  50,  46,  52,  46,  48,  32,  99, 111, 100,
npm ERR!     101, 103, 101, 110,  10,  62,  32, 110, 112, 109,  32, 114,
npm ERR!     117, 110,  32, 103, 101, 110, 101, 114,  97, 116, 101,  45,
npm ERR!     115, 116, 121, 108, 101,  45,  99, 111, 100, 101,  32,  38,
npm ERR!      38,  32, 110, 112, 109,  32, 114, 117, 110,  32, 103, 101,
npm ERR!     110, 101, 114,  97, 116, 101,  45, 115, 116, 114, 117,  99,
npm ERR!     116,  45,  97, 114, 114,  97, 121, 115,  32,  38,  38,  32,
npm ERR!     110, 112, 109,  32,
npm ERR!     ... 230 more items
npm ERR!   ],
npm ERR!   stderr: Buffer(884) [Uint8Array] [
npm ERR!      40, 110, 111, 100, 101,  58,  50,  52,  56,  53,  41,  32,
npm ERR!      69, 120, 112, 101, 114, 105, 109, 101, 110, 116,  97, 108,
npm ERR!      87,  97, 114, 110, 105, 110, 103,  58,  32,  67, 117, 115,
npm ERR!     116, 111, 109,  32,  69,  83,  77,  32,  76, 111,  97, 100,
npm ERR!     101, 114, 115,  32, 105, 115,  32,  97, 110,  32, 101, 120,
npm ERR!     112, 101, 114, 105, 109, 101, 110, 116,  97, 108,  32, 102,
npm ERR!     101,  97, 116, 117, 114, 101,  46,  32,  84, 104, 105, 115,
npm ERR!      32, 102, 101,  97, 116, 117, 114, 101,  32,  99, 111, 117,
npm ERR!     108, 100,  32,  99,
npm ERR!     ... 784 more items
npm ERR!   ]
npm ERR! }
npm ERR! 
npm ERR! Node.js v18.9.1

@mfedderly
Copy link
Contributor Author

Yeah I'm getting the ESM errors that @birkskyum posted above. Sounds like we need to merge this and do a prerelease and then I can write up the docs.

src/source/geojson_source_diff.ts Show resolved Hide resolved
src/source/geojson_source_diff.ts Outdated Show resolved Hide resolved
@HarelM
Copy link
Collaborator

HarelM commented Sep 28, 2022

@westinrm can you check if all the code review fixes were addressed?
I think I resolved most of mine, there's still a need to open an issue to track a future change request.

@westinrm
Copy link
Contributor

all my comments are addressed. I can't seem to actually mark them resolved for some reason.

@HarelM
Copy link
Collaborator

HarelM commented Sep 29, 2022

I've added a few last comments.
Sorry for the late review.
After those are fixed I'll merge this.
Thanks for all the hard work!

@HarelM
Copy link
Collaborator

HarelM commented Sep 30, 2022

Can you add this PR to the changelog.md file?
Forgot about that...
I need to change the checklist to be more clear about this I guess...

@mfedderly mfedderly force-pushed the geojson-diff-to-worker branch from b759104 to 39e5456 Compare September 30, 2022 17:41
@HarelM
Copy link
Collaborator

HarelM commented Sep 30, 2022

Can you change the links to be "full" like the other links in the changelog?

@HarelM HarelM merged commit 9e2bed3 into maplibre:main Oct 1, 2022
mfedderly added a commit to mfedderly/maplibre-gl-js that referenced this pull request Jan 23, 2023
* Diff-style updates for geojson sources

Add the ability to send partial updates to a geojson source. This is useful for large sources that get frequent small updates.
There are further optimizations that could be made on the worker side, but this saves a lot of time spent in JSON.stringify/parse.

* updates per CR

* DirectGeometry -> Geometry

* Only keep a copy of the data on the worker thread

* remove the feature id requirement

* enforce that only empty feature collections become updateable

* Use ids and promoteIds

* clean up some PR noise

* more cleanup

* use Set and Map

* Fix existing tests by turning loadGeoJSON into a lambda

* geojson_source_diff tests + fix to the worker now that its a map

* geojson_worker_source tests

* Fix issue with null data payloads

* Fix rendering tests (and maybe the rest of them too)

* cr nits

* updates per CR

* changelog

* changelog again

Co-authored-by: Matt Fedderly <mfedderly@palantir.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants