Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use JavaScript for everything, split sources into features and resources #490

Closed
wants to merge 8 commits into from

Conversation

bhousel
Copy link
Member

@bhousel bhousel commented May 15, 2018

Much of this approach is copied from https://github.com/osmlab/osm-community-index
closes #168 #211 #214 #216 #217 (maybe #179 with the polygon deduplication)

There's a lot going on here, but I'll try to list out what has changed:

What's done:

Everything that was Python/Make is now JavaScript
This makes it much easier to setup your development environment and manage dependencies. The only prerequisite now is Node.

  • npm install to install the dependencies
  • npm run build to build the index
  • npm run test to test it
  • package.json contains all the dependencies and commands you can run

The files originally under sources/ have been split up:

  • features/ (.geojson files for the polygons)
  • resources/ (.json files for the imagery information - original files were git mv here, to preserve history)
    This lets us reuse polygons..

License is changed to ISC

  • it's just simpler
    We decided against this, to allow continued sync with JOSM

More goodies:

  • .geojson files are automatically rewound correctly
  • coordinates are truncated to 5 decimal places
  • feature and resource files are all prettified in a format that is easier to read and diffs better
  • npm run stats will show you what files are taking up the most space in the index
  • RELEASE.md contains instructions on how to prepare a release, publish to release branch, tag (github release) and npm publish
  • Translation file i18n/en.yaml is done

Still to do:

  • restore some of the tests that the old code did (checking for valid urls, etc)
  • restore a backwards compatible output for imagery.json / imagery.geojson
  • restore the imagery.xml output
  • actually go through and dedupe the polygons (right now I just copied them from source)
  • convert some features to "true" MultiPolygons (this is Support GeoJSON MultiPolygon #217)
  • fix the preview map
  • update CONTRIBUTING and README instructions

I also recommend:

  • switch the main branch from gh-pages to master
  • get rid of the Robot Fairhurst integration

@bhousel bhousel requested a review from grischard May 15, 2018 06:56
@grischard
Copy link
Collaborator

Cool, and impressive for one evening of work!

I'm not against retiring Robot Fairhurst, but what's your plan to build releases automatically when a source is updated? Something similar with Travis/Jenkins?

@imagico
Copy link
Contributor

imagico commented May 15, 2018

Trying to leave out the political aspect of this (which practically mostly means future updates from me would likely be untested) i would like to - since you change the source file format again - add a general comment regarding that - which probably also applies to https://github.com/osmlab/osm-community-index.

In a project like this were people are meant to be encouraged to contribute by editing and adding to the source data and not just the surrounding code it is universally a good idea to - i will borrow the terminology of the GPL here - maintain the source data in the preferred form of the work for making modifications.

Manually editing JSON/GeoJSON files is not really the most sensible form of doing this when you have fairly unstructured data like here. It deters contributions and makes people invest energy into making their work convenient for the programmer. A format for manual edits should be intuitive and fault tolerant, no nested structures where avoidable, etc. I mostly use simple KEY=Value formats with one item per line in such cases - no quotes, no braces. Separating the footprint geometries as not meant to be manually edited can be sensible but you don't really need a nested directory structure for that, just flatly store them with id as filename. It might also make sense to make this optional and allow simple footprint geometries to be inlined with the other attributes (after all the icons are inlined as well - and they are not human readable at all).

But the most important thing is not to change the format people are asked to contribute in whenever it might seem convenient from a data user perspective. As said the requirements of the data format are defined by the contribution ergonomics and these do not typically change considerably unless the scope of content changes.

@tyrasd
Copy link
Member

tyrasd commented May 15, 2018

fix the preview map

I can help with that, if needed.

get rid of the Robot Fairhurst integration

I assume you would prefer replacing it with a mechanism using npm publish, as outlined in openstreetmap/iD#4994, right?

@simonpoole
Copy link
Contributor

Aehmm ... how is the re-licensing supposed to work?

@bhousel
Copy link
Member Author

bhousel commented May 15, 2018

from @grischard

I'm not against retiring Robot Fairhurst, but what's your plan to build releases automatically when a source is updated? Something similar with Travis/Jenkins?

We could leave it in if you find it helpful, but I don't like how it introduces conflicts.
It doesn't really do that much.. npm run build && git add -A && git commit -m 'run build' right?
I just see it as something that saves you about 6 seconds of time until it breaks and you need to spend minutes to figure out why.

@bhousel
Copy link
Member Author

bhousel commented May 15, 2018

from @tyrasd

fix the preview map

I can help with that, if needed.

Great!

get rid of the Robot Fairhurst integration

I assume you would prefer replacing it with a mechanism using npm publish, as outlined in openstreetmap/iD#4994, right?

Yeah - the idea is that people would consume the index by pulling from tagged releases on GitHub or the release branch, or from npm, rather than "whatever is in master", and we would accept that master might be broken/unbuilt sometimes.

@bhousel
Copy link
Member Author

bhousel commented May 15, 2018

from @simonpoole

Aehmm ... how is the re-licensing supposed to work?

None of the original code remains, so there isn't really anything complicated here. I tend to use ISC these days because it is simpler and freer. Are you really attached to CC-BY-SA 3.0?

@bhousel
Copy link
Member Author

bhousel commented May 15, 2018

from @imagico:

maintain the source data in the preferred form of the work for making modifications.

I actually think this proposed change improves a lot on this idea.. As new imagery is updated, often the only thing being modified is the imagery url and name/description.

You can look under existing north_america/us/ to see that we have 3 TIGER centerlines, 2 USFS, and 2 USGS sources.. It's silly and wasteful to include 7 copies of the United States border polygon in this index.

@simonpoole
Copy link
Contributor

@bhousel I'm not attached to CC BY-SA at all, but I expect you will find that the original content creators for this index (JOSM devs and others) might not be happy with your unilateral decision to simply ignore the original licence.

@imagico
Copy link
Contributor

imagico commented May 15, 2018

As said separating the footprints from the rest of the metadata is not necessarily a bad idea. The US however is not really a good example for reusing footprints because most of the layers have different actual coverages which are not all the full US - large scale imagery is only CONUS, Forest Service data seems to not cover Puerto Rico and the topographic maps do not cover the outer Aleutians. Most image layers have specific coverages which are not exactly identical to each other or a national boundary. Practically multiple layers with exactly the same footprint are a rare exception, mostly in case of several related layers by the same provider (like visible light and NIR images). Allowing contributors a simple way to contribute national image layers with a simplified way to specify a national border as an approximate coverage footprint (like with the Nominatim boundary polygons database) would not hurt of course. But on the other hand there are many layers where the GeoJSON footprint code is smaller than the icon definition of the layer.

My comment was mostly about choosing a source data format with consideration for the contributor rather than for the downstream data users or conversion script development.

Frankly i currently see no good reason to change the format for reasons other than improving contribution ergonomics (which this change currently does not). Allowing to specify footprints by reference to an external file or to a national boundary could change that - but this should be optional and not mandatory as in your current draft. If in the process of addressing #130 more sophisticated handling of coverage polygons with additional meanings is to be added this might be a factor as well but without a concrete plan for that it is not really possible to draft a change that makes sense at this time.

@bhousel
Copy link
Member Author

bhousel commented May 15, 2018

@bhousel I'm not attached to CC BY-SA at all,

Cool, good to know..

but I expect you will find that the original content creators for this index (JOSM devs and others) might not be happy with your unilateral decision to simply ignore the original licence.

Please tag anyone who you think might care.

@bhousel
Copy link
Member Author

bhousel commented May 15, 2018

from @imagico

As said separating the footprints from the rest of the metadata is not necessarily a bad idea.

Cool, glad you can see the benefit..

But on the other hand there are many layers where the GeoJSON footprint code is smaller than the icon definition of the layer.

Yeah the icons can be kind of wasteful, especially in duplication. I was actually thinking of splitting out the icons too.. So they would sit in a separate folder and get merged into a spritesheet or something rather than existing as data urls attached to the imagery resources. Would this be helpful?

Frankly i currently see no good reason to change the format for reasons other than improving contribution ergonomics (which this change currently does not)

I think it does improve things. I mean, it's already not easy to contribute to this index - you need to know about tile servers and licenses and stuff. The kinds of people who want to add an entry here are not going to be deterred by needing to add both a .geojson and a metadata .json (instead of how things currently are where they add a .geojson with metadata in the properties).

@simonpoole
Copy link
Contributor

@Klumbumbus see above licence issue (imho not solvable without an immense amount of work, and in particular not worth the effort).

@imagico
Copy link
Contributor

imagico commented May 15, 2018

The kinds of people who want to add an entry here are not going to be deterred by needing to add both a .geojson and a metadata .json (instead of how things currently are where they add a .geojson with metadata in the properties).

That is an understandable but errant impression from a programmer perspective. From the viewpoint of a mapper licenses and tile server URLs are things you become familiar with automatically over time. But the hurdle to contribute to this index is unreachable for most mappers even as is. And it would be even higher if it is mandatory to provide two or three different files carefully named and located with respect to another in the directory structure or referencing each other.

The current format is anything but ideal but the mapper can take an existing layer as a template and go from there. For the footprint you can sketch it in JOSM, save it as GeoJSON and copy&paste the coordinate list. As said i would choose a different format probably but this is not a big deal for most i suppose.

Duplicating the same footprint or icon code several times is not a contribution hurdle. The advantage of having footprints in separate files would in terms of contributor ergonomics mainly be that they can be generated with standard tools like ogr2ogr. But copying and pasting a WKT or GeoJSON geometry is easy enough as well.

@Klumbumbus
Copy link
Contributor

None of the original code remains, so there isn't really anything complicated here. I tend to use ISC these days because it is simpler and freer. Are you really attached to CC-BY-SA 3.0?

Do you want to change the license for the whole ELI project or only for the "program code" leaving the license of the contents of https://github.com/osmlab/editor-layer-index/tree/gh-pages/sources untouched?

@bhousel
Copy link
Member Author

bhousel commented May 15, 2018

from @Klumbumbus

Do you want to change the license for the whole ELI project or only for the "program code" leaving the license of the contents of https://github.com/osmlab/editor-layer-index/tree/gh-pages/sources untouched?

Whole project, ideally.. I don't see any reason to have separate licenses for the program code and the sources.

@stoecker
Copy link

@Klumbumbus asked to comment on a possible license change in https://josm.openstreetmap.de/ticket/16294:

from @bhousel:

Whole project, ideally.. I don't see any reason to have separate licenses for the program code and the sources.

Well. To change the license you must:

  • Find all contributors (not simply all git users, but all the original authors)
  • Contact all the contributors and ask them to agree to that change
  • Drop any content where authors do not agree or aren't reachable
  • [See OSM license change in the years 2010-2012 to find out how it is done]

Especially point 1 may be a lot of work. And 2 partly impossible, as you don't have contact information for many of them.

And after the license for data is changed a sync between JOSM and ELI is no longer possible, as JOSM license is CC-BY-SA (and LGPL for any wiki contents after 2014 where possible). Also it may be impossible to use OSM data for boundary polygons afterwards, but I'm not sure about this.

So my suggestion: Don't do it. If you want ISC in any case drop all data and start from scratch.

@bhousel
Copy link
Member Author

bhousel commented May 16, 2018

And after the license for data is changed a sync between JOSM and ELI is no longer possible, as JOSM license is CC-BY-SA

Hey @stoecker - thanks for the context, this is a good reason to leave the index license alone. I'll change the PR so that the index stays CC-BY-SA, and the source code is ISC.

I was very confused about the license situation, since package.json claims BSD, the LICENSE file claims CC-BY-SA, and the scripts/LICENSE file claims unlicense.

@tyrasd
Copy link
Member

tyrasd commented May 16, 2018

But the hurdle to contribute to this index […] would be even higher if it is mandatory to provide two or three different files carefully named and located with respect to another in the directory structure or referencing each other.

I agree. I have raised similar issues a few weeks back also at the then brand new osm-community-index: osmlab/osm-community-index#7 (comment)

@andrewharvey
Copy link
Collaborator

switch the main branch from gh-pages to master

Please do, it makes contributing easier.

@Marc-marc-marc
Copy link
Collaborator

Marc-marc-marc commented May 21, 2018

this PR seems to do too many things at once when these elements have nothing to do with each other.
perhaps it would be necessary to list some goals of the separate issue in order to show the various objects well.
and above all, it is necessary to make them the most independent of each other in order to be able to merge what suits everyone without being blocked with a hudge PR.

  • switch python -> javascript
    I don't care myself due the fact I never code for this project.
    But of course, having a more uniform stack with other project is a good idea.
    other devs (~12) of this project should give a comment if the change 'll help or 'll retain them to work in this project.
    same question for the licence of new scripts.

  • solving some open issue
    of course it's a good idea :)
    but you can fix them separately if possible
    for example if versionning is still waned, let's talk about it in Start versioning the repo and output #214 and it is easy to add 2 more commands in travis to do it without the need to wait

  • changing the source format.
    I agree with imagico that this change doesn't help new contributor (and 'll also make some current contributor unhappy like me cause current tools I use 'll go in trash...)
    If you want to facilitate the contribution, you need a simple input format or a web interface where the user easily fills in the requested information. it is enough to see the number of tickets where information is missing at the first submission to realize the problem.
    asking people to deduplicate the shape at the input is an additional complexity and even useless (either they are offered a list of existing polygons, or make or equivalent can do the deduplication automatically)

  • output format
    existing output format 'll be keep to avoid consumer of eli the need to change their code.

  • changing gh-pages to master
    if wanted, that can be another issue/PR easy to merge

  • CONTRIBUTING.md
    you drop various distro command, at least they must be kept for git and add node.js

  • What's done:
    the sentence about node dependency should be changed to node.js because some think it was an osm node :)

@bhousel
Copy link
Member Author

bhousel commented May 21, 2018

@grischard are you mostly ok with this PR or should we abandon it? Not really sure how to proceed.

Since people seem really hung up on the "2 files" thing, I could put everything back under sources/ and have a single .geojson file per polygon, with multiple sources stored in an array in properties.

@simonpoole
Copy link
Contributor

The "two files" bit is what I actually like about this PR, but it needs to be accompanied with a solution for creating at least new entries that makes things a bit easier (its probably not really needed for later edits). Lets say a form for the attributes plus the ability to select one of the existing geojson features (and the ability to include one in a PR if none exist that fit).

@grischard
Copy link
Collaborator

This PR indeed tries to to many things at once, which makes it hard to review. I'll try to split them individually.

Switching to JS: I'm more familiar with python, but this should ultimately be a doocracy. If a javascript rewrite helps us close some issues without creating new ones, why not? I think our familiarity with js for you, python for me, might make us underestimate the complexity of setting up a new environment for a newbie. Ideally, I don't want contributors to have to install or run anything.

The input and output format change: I'm not in favour of it, mostly because it isn't supported by our ecosystem. It's really useful, for example, to be able to tell people to simply load their file onto geojson.io to adjust their imagery boundaries.

Having npm releases: it seems to me that we're partially solving an iD problem here. The current travis system guarantees that we'll only ever build valid files. If we ever change the output format in a breaking way - which we shouldn't - we could create an imagery_v2.geojson. We could also create github releases, which build systems or iD itself could download, but I don't really see the advantages. We could also keep the current system, and build npm releases with travis.

The goodies are all good. Simplifying to 5 decimals is something we already do.

As I said on IRC a few days ago, I'm still not sure I see the business case for the rewrite. It has an organisational cost to the project, and we should be sure that it's worth that cost and that there are no easier alternatives before we spend the energy on it. What does this do that is impossible to do more easily?

@bhousel
Copy link
Member Author

bhousel commented May 26, 2018

As I said on IRC a few days ago, I'm still not sure I see the business case for the rewrite. It has an organisational cost to the project, and we should be sure that it's worth that cost and that there are no easier alternatives before we spend the energy on it.

Ok, let's leave things as-is then..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Just make this JavaScript already
9 participants