Add support for diff uploads #140
I use HTTP Basic Auth over HTTPS most of the time when I need to use the redacted history API.
I expect there's a fair number of editors using that. Some even over HTTP. Right now, for feature parity, we'd need to either keep HTTP Basic requests going to the rails port or support them in cgimap. I wouldn't be opposed to dropping basic auth support, and would support dropping basic auth without HTTPS, but this issue tracker isn't the right place to make that policy change. |
A quick update from my side. I completed a first proof-of-concept implementation, which covers creating/modifying/deleting nodes/ways/relations, including tags, way nodes and relation members. The PoC also copies rows from the current_* tables to the history tables and updates the number of changes in the changeset. One of my test cases, with 4341 nodes and some 180'000 tags in total, took around 15-20 minutes on rails and now finishes in just 5 seconds. The code already survives a JOSM editing session as a drop-in replacement for the rails code. I see a number of follow-up activities (in no particular order): Review code
Actions
SELECT "user_blocks".* FROM "user_blocks" WHERE "user_blocks"."user_id" = $1 AND (needs_view or ends_at > (now() at time zone 'utc')) LIMIT $2
Nice to have:
Open issues: https://github.com/mmd-osm/diffupload/issues |
@zerebubuth : I have started integrating my code into cgimap now, please see https://github.com/mmd-osm/openstreetmap-cgimap/tree/feature/bulk_upload The biggest challenge at the moment is to include https://github.com/mmd-osm/diffupload/blob/master/src/diffuploader.cpp, as there's really no framework available in cgimap yet for HTTP POST request semantics. It doesn't really fit into the existing concept of a data_selection, and I would need the OsmChange message payload available (preferably as a file), along with the current UID and changeset number, and of course a database connection. The router needs some concept to distinguish HTTP GET / HEAD / OPTIONS / POST, and to restrict each path to certain request methods.
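To make the routing requirement above concrete, here is a minimal sketch of a router that matches on both path and HTTP method. It is written in Python for brevity and is not cgimap's actual design; `Router`, `upload_handler` and `map_handler` are hypothetical names, and the only detail taken from the OSM API is the `/api/0.6/changeset/#id/upload` path.

```python
import re
from typing import Callable, Dict, List, Pattern, Tuple

Response = Tuple[int, str]
Handler = Callable[["re.Match[str]", bytes], Response]


class Router:
    """Hypothetical router: each path pattern carries its own set of allowed methods."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Pattern[str], Dict[str, Handler]]] = []

    def add(self, pattern: str, **handlers: Handler) -> None:
        # Keyword names are the HTTP methods, e.g. add(r"/foo", GET=..., POST=...).
        by_method = {method.upper(): handler for method, handler in handlers.items()}
        self._routes.append((re.compile(pattern), by_method))

    def dispatch(self, method: str, path: str, body: bytes = b"") -> Response:
        for pattern, handlers in self._routes:
            match = pattern.fullmatch(path)
            if match is None:
                continue
            handler = handlers.get(method.upper())
            if handler is None:
                # The path exists, but not for this method: 405 plus the allowed methods.
                return 405, "Allow: " + ", ".join(sorted(handlers))
            return handler(match, body)
        return 404, "Not Found"


def upload_handler(match: "re.Match[str]", body: bytes) -> Response:
    # A real handler would also need the authenticated UID and a database connection.
    changeset_id = int(match.group(1))
    return 200, "changeset %d: %d bytes" % (changeset_id, len(body))


def map_handler(match: "re.Match[str]", body: bytes) -> Response:
    return 200, "map data"


router = Router()
router.add(r"/api/0\.6/changeset/(\d+)/upload", POST=upload_handler)
router.add(r"/api/0\.6/map", GET=map_handler, HEAD=map_handler)
```

With this shape, a `GET` on the upload path is rejected with 405 before any handler runs, while the read-only calls stay limited to `GET` and `HEAD`.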
That's undoubtedly true of the diff upload API call, and possibly several others. I hope it's no longer true of anything that goes through cgimap, as we had some awesome contributions from @jronak in #130 and #136 which meant cgimap could make a single database call for extracting nodes, one for extracting ways and another for extracting relations. No doubt it would be possible to get that down to a single call, but the extra saving of a handful of round-trips didn't seem to be worth the extra effort. According to #122 (comment), each round-trip adds about 0.2ms, so a few 10s of round-trips isn't too bad. We had issues when cgimap was making 1000s of round-trips for individual objects.
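As a back-of-the-envelope check of the round-trip argument, the sketch below multiplies the roughly 0.2 ms per round-trip quoted above by a few round-trip counts. The counts themselves (3, 30 and 3000) are illustrative assumptions, not measurements.

```python
ROUND_TRIP_MS = 0.2  # approximate per-round-trip latency quoted in the comment above


def round_trip_overhead_ms(round_trips: int, per_trip_ms: float = ROUND_TRIP_MS) -> float:
    """Latency spent purely on database round-trips, ignoring query execution time."""
    return round_trips * per_trip_ms


# Illustrative counts: one call per element type, a few tens, and one per object.
for count in (3, 30, 3000):
    print("%5d round-trips: about %.1f ms" % (count, round_trip_overhead_ms(count)))
```

A few tens of round-trips cost single-digit milliseconds, whereas thousands of per-object round-trips add up to hundreds of milliseconds, which is consistent with the earlier problems mentioned above.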
It's not an explicit goal of cgimap to only issue selects without joins or sorts. In fact, cgimap now does many joins, some of which require sorting. I've tried to avoid sorting on the whole query because I'm worried about the extra load on the database. I'd love to be proved wrong about that, and we could make a few simple changes to enable sorted output.
I agree totally, and I hope to see some awesome stuff from fastmap-wrapper in the future. Having multiple projects means we can experiment with better ideas. If I were designing cgimap from scratch today then it would look totally different, and there are a number of mistakes in the design which have either required a lot of effort to fix, or continue to require effort to work with. For example, the backend "independence" isn't something cgimap really uses, and it leads to some ugly code to try and continue supporting it. The data selection interface was intended to abstract over the data model, allowing handlers to be written in a style which concentrated on what was being queried, rather than how - ideally I wanted something which was close to a graph query like Gremlin, but unfortunately it didn't really work out that way. For sure, there's a lot of code in cgimap which ought to be provided by some HTTP standard library, as it would be in Go and any other sensible modern language for writing web services! Unfortunately, the origins of cgimap date back to 2009, so it carries some baggage because of assumptions which were true then but no longer are, and because of the languages and libraries which were available at the time.
@mmd-osm: Thanks so much for the work that you've been doing. I'm sorry I haven't had a chance to review it yet, but I'll try to get to that as soon as I can.
@mmd-osm I had a quick look through and it looks good so far, thanks! Your code style is a little different from the repo style, but that's not worth worrying about now. The biggest issues seem to be plugging it into the request flow (i.e: not I've started trying to add code so we can support POST requests. Will PR that into your branch as soon as it's ready. |
@zerebubuth : I think we've already achieved some extremely good progress with your refactoring work, so thanks for that. I'm continuing now in my branch: feature/bulk_upload...mmd-osm:feature/bulk_upload and created #143 to merge my changes back into your branch. List of changes
Current status
Here's a screenshot taken in wireshark for the very first successful changeset upload via cgimap, and JOSM was also happy with the result 💯 🥇 I did some local edits using Potlatch, Potlatch2 and iD as well. For testing, I can highly recommend applying a few changes to the lighttpd config file: https://github.com/mmd-osm/openstreetmap-cgimap/wiki/lighttpd-proxy It acts as a reverse proxy for both CGImap and the Rails port. All requests to http://localhost:31337 get automatically dispatched to the correct server; even OAuth works flawlessly.
Open topics --> solved
@tomhughes : at some point in the future we may want to deploy the changeset upload redesign on the dev instance to get some wider community exposure and identify bugs we couldn’t come up with in our own unit tests. Has cgimap ever been deployed on the dev instance before, alongside a working rails port installation, and some Apache / lighttpd to route requests to either one? How would you roughly assess the time and effort needed? TIA. |
No, that has never been done. It shouldn't be too hard, if we do it for all the dev apis. Doing it for just one will be much harder. |
@tomhughes : I have another short question on how to deploy cgimap on the dev instance. Some older commits in the chef repo indicate that the dev instance is also managed by chef and some PPA for cgimap would be needed for deployment. I don't seem to find any information describing how this should actually work. Do I have to go that route or would compiling the sources on the dev instance also be feasible? The new changeset upload doesn't introduce any new dependencies except for a C++11 compiler. So whatever worked in the past to deploy cgimap should basically still be valid. I have merged all 3 open PRs in this repo into https://github.com/mmd-osm/openstreetmap-cgimap/tree/demo - this could be used for testing. |
Ah I forgot cgimap was coming from a PPA now - it used to be built from source. I don't really want to reintroduce building from source, so a PPA is probably best, but it will need to be separate from our main one or the package will need to have a different name. |
Hmmm. ok, this is going to be a bit of a challenge then, as I can't find the PPA packaging instructions for CGImap anywhere. The As the openstreetmap-cgimap - 0.6.0-1~xenial1 package installs both shared libs and the cgimap binary, it doesn't make too much sense recreating those settings from scratch. Edit: It looks like the relevant details are available here: http://ppa.launchpad.net/zerebubuth/openstreetmap-cgimap/ubuntu/pool/main/o/openstreetmap-cgimap/openstreetmap-cgimap_0.6.0-1~xenial1.debian.tar.xz I believe I'd still need some support from @zerebubuth on how to create a ppa which could be used for testing on the dev instance. |
Yup, apologies - that's not documented anywhere. I think I was so exhausted by the time I finally got it working that I didn't write it down anywhere. The distro-specific packaging is kept on distro-specific branches, of which there's currently The process of building is something like (apologies if this doesn't work - mostly from memory):
Based on whether the package is formatted and signed correctly, Launchpad may or may not send you an email. If you don't get an email and the package doesn't show up in the build queue, it's worth checking the |
Ooops, looks like I had a whole bunch of local changes too. Pushed those to |
Many thanks for your very detailed description of the build process. I totally agree it's really mind-boggling. As I'm working with a fork of your repo, I also tried to follow https://unix.stackexchange.com/questions/324680/how-to-apply-a-patch-in-a-debian-package - The idea here is to use your package as a baseline and apply my changes as patches to it. This however got rejected with:
After a bit of further research there seems to be yet another way:
(instead of using tar directly, some guides recommend I'm not totally happy with this process yet, as it skips the local build. That's a huge pain on its own, as any issues will only show up in the launchpad build log half an hour later. NB: Later I found out that there's a way to run the build process locally via pbuilder.
pbuilder :: initial setup
pbuilder :: build binary package
After a few iterations, launchpad finally produced some binary files for a testing version named Build results
I added the xenial PPA to my local box, installed openstreetmap-cgimap and ran a few test cases, which seemed ok. For better visibility into failing test cases, I'd also propose the following change to debian/rules:
@tomhughes : now that we have PPAs for both xenial and bionic, I wonder what the next step would be. Open an issue on the operations tracker, maybe? |
We're likely to be extremely busy in the near future, and it's very much non-trivial to do, but open a ticket and I'll get to it when I can. |
Thanks, will do soon. Meanwhile, I also set up a mini test drive on http://static.64.206.46.78.clients.your-server.de:31337/user/mmd2 (= http://78.46.206.64:31337/user/mmd2) so people can already start playing around with the new upload without impacting the dev instance. To try it out create a new user, wait up to 5 min. for auto confirm process to run in background (no email confirm needed), and go ahead editing in Potlatch 2, iD or JOSM (see instructions on the user page on how to set up OAuth for JOSM). There's also a blog post out there now: https://www.openstreetmap.org/user/mmd/diary/44318 |
@mmd-osm not getting very far with testing uploads right now (getting a 400 without any further messaging), maybe a short interactive session on irc would be the easiest to debug this and avoid spamming this issue (downloads work and OAuth handshake seemed to run smoothly too). |
@simonpoole : many thanks for your feedback. Could you somehow extract the osmChange message? Which editor did you use for testing? We could arrange an irc session maybe later tonight. |
@mmd-osm The current diff upload mechanism in Vespucci uses OkHttp which uses Transfer-encoding chunked, could this be causing the issue? Capture from wireshark enclosed |
That's interesting, I guess you don't encounter this issue when testing against dev.openstreetmap.org? |
Works ok vs dev instance etc.. It is more than slightly painful to get OkHttp to change its ways so I'm not sure if I can do A-B testing today, but I intend to asap. |
@simonpoole : I feel like chasing chunked transfer encoding issues on lighttpd might not be worth it, so I decided to set up an Apache instance on port 80 now, following the Chef cookbook. It serves as a reverse proxy for fcgi (like in the cookbook), but also forwards requests to the Rails port running locally on port 3000. Unfortunately I cannot follow all steps in the cookbook, as I'm not really familiar with Passenger and don't know how to set that up properly. At least for JOSM and iD this doesn't seem to impact the upload: http://static.64.206.46.78.clients.your-server.de/changeset/2000000054 Could you please give this alternative URL a try: http://static.64.206.46.78.clients.your-server.de I'm getting the following in the logs now (response = HTTP/500)
Looks like you've created your first successful changeset in Vespucci: http://static.64.206.46.78.clients.your-server.de/changeset/2000000055 🏆 |
Works now, thanks. A simple diff upload completes now and seems to do the "right" thing..... now on to test if you meticulously copied all the error messages :-) (The error message you saw was because I was trying to convince OkHttp not to use chunking, which involves turning off gzipping and a lot of other stuff). |
Sigh the celebrations were a bit premature, with my standard code it now fails in the actual diff upload with the error Http header 'Content-Length' missing (which is true :-)). So likely you need to handle the chunked transfer in whatever you are using for cgi-map as a server too. |
Pretty good that we're discovering this in early testing, I was expecting some 'i don't know what i don't know' issues, that's why I set up this demo. None of the other editors I tested uses chunked encoding. Right now CGImap is running as daemon ( There's a check for the Content-Length HTTP Header in my code, and if it's missing, it will return an error message. I think I need to take a closer look on how to deal with this. Somewhat relevant maybe: https://bz.apache.org/bugzilla/show_bug.cgi?id=53332, https://bz.apache.org/bugzilla/show_bug.cgi?id=57087 Does this stop you from testing other things now, or can you go back to the same mode as for cs 2000000055? |
There's no problem in testing with the "working" code, but the problem still has to be resolved: on the one hand OkHttp is "fairly" popular (as in likely the most used HTTP library globally), on the other hand supporting HTTP 1.1 doesn't exactly fall into the newfangled stuff category. |
Strictly speaking, CGImap doesn't even talk HTTP but FastCGI protocol and Apache and its modules do all the translation between HTTP/* and FastCGI. Now the question boils down to how chunked encoding is getting mapped onto the FastCGI protocol. I wonder if you could make your Vespucci APK file available for testing somehow. After installing the released version from Github, I figured out that I'm missing proper endpoint information for OAuth, presumably this has to be added to https://github.com/MarcusWolschon/osmeditor4android/blob/master/src/main/res/values/apis.xml. |
@mmd-osm sure I can make an apk available for download, I assume this should be with chunked transfers (the OAuth key config is currently static and requires a recompile, which I've naturally done). APK https://drive.google.com/file/d/1aXcvPJ2DEDk43pqwoaPWE-mPLM7G96J3/view?usp=sharing Note this was built from code that is undergoing some refactoring wrt presets and still has a couple of remaining issues, so it shouldn't be used for anything else than testing. You will need to add an API entry (in the Advanced preferences) for the test server with the URL http://static.64.206.46.78.clients.your-server.de/api/0.6/ and then set that as the current API instance to use. |
@simonpoole : thanks a lot. I could install the apk now and already reproduce the "Content-length missing" error. |
@simonpoole : I have adjusted the logic to extract the POST payload a bit and could now successfully upload a change via Vespucci: http://static.64.206.46.78.clients.your-server.de/changeset/2000000073
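The exact change made in cgimap is not shown in this thread. As an illustration of the general idea, the sketch below reads a POST payload either by `Content-Length` or, when the header is absent (as with chunked uploads), until the end of the input stream. This relies on the FastCGI request body being delivered as a stream with a defined end. `read_payload`, `PayloadTooLarge` and the 50 MB cap are assumptions made for this example.

```python
import io
from typing import BinaryIO, Optional

MAX_PAYLOAD = 50 * 1024 * 1024  # assumed upper bound, not a cgimap setting


class PayloadTooLarge(Exception):
    pass


def read_payload(stream: BinaryIO, content_length: Optional[str],
                 limit: int = MAX_PAYLOAD) -> bytes:
    """Read a request body from a buffered stream, with or without Content-Length."""
    if content_length is not None:
        expected = int(content_length)
        if expected > limit:
            raise PayloadTooLarge(expected)
        data = stream.read(expected)
        if len(data) != expected:
            raise ValueError("payload shorter than Content-Length")
        return data

    # No Content-Length (e.g. the client used chunked transfer encoding):
    # read block by block until end of stream, enforcing the size cap.
    blocks = []
    total = 0
    while True:
        block = stream.read(8192)
        if not block:
            break
        total += len(block)
        if total > limit:
            raise PayloadTooLarge(total)
        blocks.append(block)
    return b"".join(blocks)


body = read_payload(io.BytesIO(b"<osmChange/>"), None)
```

The size cap matters more in the second branch, because without a declared length the server cannot reject an oversized upload up front.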
Thanks to @tomhughes' tireless work on the Chef repo, the changeset upload is now ready for general testing on the dev instance: https://upload.apis.dev.openstreetmap.org 🥇
Settings for JOSM
It supports both Basic auth and OAuth, as well as uncompressed and compressed messages for osmChange upload. Examples: https://upload.apis.dev.openstreetmap.org/user/mmd2/history
Patch for JOSM to test compressed upload
Index: src/org/openstreetmap/josm/io/OsmApi.java
===================================================================
--- src/org/openstreetmap/josm/io/OsmApi.java (Revision 14258)
+++ src/org/openstreetmap/josm/io/OsmApi.java (Arbeitskopie)
@@ -4,6 +4,9 @@
import static org.openstreetmap.josm.tools.I18n.tr;
import static org.openstreetmap.josm.tools.I18n.trn;
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
@@ -21,6 +24,8 @@
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
+import java.util.zip.GZIPInputStream;
+import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.ParserConfigurationException;
@@ -642,6 +647,17 @@
return sendRequest(requestMethod, urlSuffix, requestBody, monitor, true, false);
}
+
+ public static byte[] compress(String data) throws IOException {
+ ByteArrayOutputStream bos = new ByteArrayOutputStream(data.length());
+ GZIPOutputStream gzip = new GZIPOutputStream(bos);
+ gzip.write(data.getBytes(StandardCharsets.UTF_8));
+ gzip.close();
+ byte[] compressed = bos.toByteArray();
+ bos.close();
+ return compressed;
+ }
+
/**
* Generic method for sending requests to the OSM API.
*
@@ -682,7 +698,10 @@
}
if ("PUT".equals(requestMethod) || "POST".equals(requestMethod) || "DELETE".equals(requestMethod)) {
+ System.err.println(urlSuffix);
client.setHeader("Content-Type", "text/xml");
+ if (urlSuffix.endsWith("/upload"))
+ client.setHeader("Content-Encoding", "gzip"); // TEST ONLY
// It seems that certain bits of the Ruby API are very unhappy upon
// receipt of a PUT/POST message without a Content-length header,
// even if the request has no payload.
@@ -689,7 +708,14 @@
// Since Java will not generate a Content-length header unless
// we use the output stream, we create an output stream for PUT/POST
// even if there is no payload.
- client.setRequestBody((requestBody != null ? requestBody : "").getBytes(StandardCharsets.UTF_8));
+
+ if (urlSuffix.endsWith("/upload")) {
+
+ byte[] compressedRequestBody = compress(requestBody != null ? requestBody : ""); // TEST ONLY
+ client.setRequestBody(compressedRequestBody); // TEST ONLY
+ }
+ else
+ client.setRequestBody((requestBody != null ? requestBody : "").getBytes(StandardCharsets.UTF_8));
}
final HttpClient.Response response = client.connect();
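The patch above makes the client gzip the osmChange body and announce it with `Content-Encoding: gzip`. As a language-neutral illustration of both sides of that exchange, here is a small Python sketch; `encode_upload` and `decode_upload` are hypothetical helpers and do not reflect how cgimap or JOSM are actually structured.

```python
import gzip
from typing import Dict, Tuple


def encode_upload(osm_change: str) -> Tuple[Dict[str, str], bytes]:
    """Client side: gzip the osmChange document and set the matching headers."""
    body = gzip.compress(osm_change.encode("utf-8"))
    headers = {
        "Content-Type": "text/xml",
        "Content-Encoding": "gzip",
        "Content-Length": str(len(body)),  # length of the compressed bytes
    }
    return headers, body


def decode_upload(headers: Dict[str, str], body: bytes) -> str:
    """Server side: undo the content encoding before parsing the XML."""
    encoding = headers.get("Content-Encoding", "identity").lower()
    if encoding == "gzip":
        body = gzip.decompress(body)
    elif encoding != "identity":
        raise ValueError("unsupported Content-Encoding: " + encoding)
    return body.decode("utf-8")


document = '<osmChange version="0.6"><create/></osmChange>'
headers, payload = encode_upload(document)
restored = decode_upload(headers, payload)
```

Requests without a `Content-Encoding` header pass through unchanged, so uncompressed uploads keep working alongside compressed ones.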
Now that we have support for OAuth, we can add support for diff (i.e: osmChange) uploads. It seems like there's a considerable improvement to be made by batching queries to the database.
Considerations: