Depth data endpoint removed, need to get metadata elsewhere #8
Getting the relevant metadata is kind of annoying, because there is no longer just an easy URL that we can use. We have to get the data from a StreetViewService object through JavaScript. Fortunately, all the data we're talking about here is already saved in Project Sidewalk databases. I'm not going to bother getting the data into this repository for now, since we have it saved elsewhere. If we realize that we need that metadata in the same directory as the saved images at some point, I'll set up a way to get that data from an endpoint in Project Sidewalk. I'll also wait a little bit before removing the functionality for grabbing depth data on the off chance that the endpoint is added back in.
Hi Mikey, I was looking to run these tools to download the GSV data used in the paper "Deep Learning for Automatically Detecting Sidewalk Accessibility Problems Using Streetscape Imagery". I encountered the issue you described here when trying to run the DownloadRunner.py script (I can no longer grab metadata, as the URL is a bust). Is there a way the XML metadata can be shared (copyright issues etc. permitting)? I appreciate this is an open-source project and that maintaining it can be tricky, so I am really grateful for any help or advice you can provide! Rob
Hi Rob, Are you just looking for the metadata? Or the depth data as well? For the depth data, we wouldn't be able to help due to legal constraints, as you said. If it's just the metadata, that's something we can definitely get to you. I haven't set up the API endpoint yet to get that data, unfortunately... What's the timescale you're looking at for when you'd need the data? If you've got a few weeks, maybe I'll just go ahead and set up the API endpoint so we are set up for the future. If you're looking for something faster, I should be able to just run a SQL query on our database and send you a CSV.
Hi Mikey, Great to hear back from you so soon. I'm relatively familiar with DownloadRunner.py and CropRunner.py after spending some time looking over them, but it's probably best I explain my intended use case, and then you'll be best placed to say whether this is possible or not. Ideally I'm looking to replicate the original dataset (with some slight tweaks). That would involve downloading and processing the panoramas (on my end, using the project repo) to generate the cropped areas of interest (curb ramps, missing curb ramps, etc.). I do not require the depth maps for my work, but looking at CropRunner I would guess I need a distance value from the LiDAR/car to the x,y coordinate of the object in the scene, to allow for scaled crops using [size (pixels) = 4/15 * distance + 200]. I'm not sure if this distance value is something you could generate/provide via the metadata, or whether this becomes more work than it's worth for you... The goal is to use this dataset for computer vision and accessibility work I'm planning (for people with low vision). I can wait a couple of weeks for sure; at the moment this is the best dataset available out there by a long way. If I'm filling up your issue ticket too much, feel free to contact me directly via: robert.young2@ucdconnect.ie. Thanks again for all the help.
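For reference, the scaled-crop rule quoted above can be sketched as a small helper. The function name and the rounding are my own; the formula itself is the one from this comment, with distance in whatever units the labels use:

```python
def crop_size_px(distance: float) -> int:
    """Scaled crop side length: size (pixels) = 4/15 * distance + 200."""
    return int(round((4 / 15) * distance + 200))
```

For example, `crop_size_px(150)` gives 240.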
Hi @robertyoung2, I'd prefer to keep this conversation in this ticket if possible, with the goal of better understanding your problem and a potential fix (which will help others, I'm sure). Re: CropRunner.py: The key issue is that Google took down the unpublished endpoint for scraping depth data, and we were using depth information to generate a reasonable crop around the point label in the panorama. However, Mikey has worked on an alternative solution... perhaps he could share it here and you could try to implement it and report back. If it works, we could create a PR? Alternatively, Mikey could try to add his solution (which we've integrated into core Project Sidewalk) into CropRunner.py, but our team is generally pretty slammed, so your help would be beneficial. You can find more background on this here:
Re: DownloadRunner.py This is a different issue from the one above and may or may not be easier to fix. @misaugstad would have to comment. Perhaps we could get an undergrad on it, @misaugstad?
Hi @jonfroehlich, Good to hear from you; the help and discussion are appreciated, and I will keep our conversations to this ticket. Re: CropRunner.py:
Right, I wasn't involved in writing that code, so I didn't realize that the depth info was used there. But looking at it now, I see that it definitely is 😁 That method that @jonfroehlich linked to is the best approach we've got for that right now. Help incorporating it into CropRunner.py would be very welcome. Note that this method also requires the metadata that used to be in the XML files, which I would need to get to you. Would you like me to send you a CSV dump with that metadata? I would be able to get that to you this week. It would also make it easier for you to begin playing around with updating CropRunner.py.
Hey @misaugstad I'd be more than happy to have a go at implementing the approach you discussed with @jonfroehlich in the linked notebook into CropRunner.py. Just for my own understanding, and so I'm clear, could you describe what each of these variables relates to, and whether they come in the mentioned metadata dump or are values I need to calculate:
The sidewalk-panorama-tools repo has some of these variable names in sample_meta.xml and labeldata.csv. I can see in the tools there's a link to an SQL query which can pull the full label list (still correct?): getFullLabelList.sql. Just want to make sure my understanding is solid and the variable names have not been changed or updated :) Sending me a CSV dump of all the metadata also sounds great; I can tweak things from there. Thanks again for the quick responses and help :)
Sure thing. I'll start by saying that all of them will come in the CSV I'm going to send you; none of these need to be calculated.
So in Project Sidewalk, we've got a 480 px high by 720 px wide canvas through which you see/pan/zoom the GSV image.
Correct! The x/y coordinates on the GSV pano (of the label).
Oh, yes! Great, so you already have some sample data to work with! I'm going to write a slightly modified query, but the parameter names will all be the same and mean the same thing 👌
Sounds good!
This is great and simplifies things a lot.
Ok that makes sense, I can see how canvas size fits into the Project Sidewalk online interface. I'm also not 100% familiar with the repo, so maybe more work and thinking is needed on my side... 😬
I guess we can assume then that the pano is mostly made up of zoom level 5 tiles (and some level 3 tiles). I could then look at making an initial crop of height 480 px and width 720 px (the Project Sidewalk canvas size) around the x,y label point to simulate your regression equation's conditions as best as possible, and then see if the depth makes sense when making the final 'object' crop. Correct me if what I've said doesn't make sense or isn't compatible with the metadata + regression equation; I'm just working through the possible implementation out loud!
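The initial canvas-sized crop described above could look something like this sketch. It is pure arithmetic (so it works with any imaging library), and it assumes no wrap-around at the panorama seam, which a real implementation would need to handle:

```python
CANVAS_W, CANVAS_H = 720, 480  # Project Sidewalk viewport size, per this thread

def canvas_crop_box(x: int, y: int, pano_w: int, pano_h: int) -> tuple:
    """Return (left, top, right, bottom) for a 720x480 window centered on the
    label point (x, y), clamped so the window stays inside the pano bounds.
    Pass the result to an image library's crop call (e.g. PIL's Image.crop)."""
    left = min(max(x - CANVAS_W // 2, 0), pano_w - CANVAS_W)
    top = min(max(y - CANVAS_H // 2, 0), pano_h - CANVAS_H)
    return (left, top, left + CANVAS_W, top + CANVAS_H)
```

For example, a label at (8000, 4000) on a 16384x8192 pano gives the box (7640, 3760, 8360, 4240), while labels near the image edge get shifted windows rather than out-of-bounds ones.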
Hmm. I think that it might be best to add a little piece to that analysis document where I add a regression that just tries to predict the distance from the pano using the label's position in the image.
Just did a quick check and got very very similar results using that method. I'll more carefully add an analysis to that repo later, but for now you can use the formula:
Thanks for that equation. I've played around with it in a very rough notebook on the one sample panorama I had access to from the repo. There's going to be a lot of code I'll need to review and try to understand to integrate this well, as there seem to be some edge cases that will need to be accounted for. For example:
For now I've set the crop size to 50 in this case (this seems to be the approach taken in the code when the crop size is less than 50 pixels). I think your equation did a great job. See below for a comparison:
[image: original sample cropped curb ramp]
[image: crop of curb ramp using the regression equation]
I used the labels in sidewalk-cv-assets19/dataset_csvs/ for this panorama to get a few further examples:
[image: cropped curb ramp example 2]
[image: surface problem example]
Great! Really glad that it seems to be working in most cases at least! Please keep me updated as you continue! And I promise that I'll get you the full data set soon 😅 it's been a hectic couple of weeks!
It looks like it'll work well enough. It'll be great when you get the dataset compiled and sent over, as I can start to download some images and then use the metadata headings to play around with the regression equation a little more! Then I can start to update/rework the existing code to hopefully make it work. It's been a very hectic year already; I know the feeling...
Okay, I think I've probably got all the metadata you need here. I didn't include info about links to other panos, but I doubt that that's relevant :) Let me know if I missed anything or if you have questions! And definitely let me know if you get things working 😁
This is great, thanks for sorting this out and providing it. I'll spend the next few days/weeks working on this, and will keep you posted with my progress and any questions here! Rob
Sounds good. Thanks Rob!
Jon
Hi @misaugstad, An overdue update on where I've gotten to so far. I made a fork of this repo, which is where I am currently carrying out all my work: robertyoung2/sidewalk-panorama-tools, currently working from the branch meta_csv_implementation. Changes made and implemented so far over the last week:
DownloadRunner.py (majority of time spent here)
Completed:
To-Do:
Issues/Questions
GSV Pano IDs - Broken/Defunct
A lot of the gsv_pano_ids seem to be defunct. However, it might be possible to replace some of them using the lat and long coordinates of the pano_id and the Google Street View API/developer console with the command:
Raising this to see if it's something you were aware of or had considered. I've checked a couple of dead gsv_ids, and this does work to get the latest id and access the new image tiles (not tested beyond exploring).
sv_image_x and sv_image_y labels
My biggest problem once I complete DownloadRunner.py is that I don't seem to be able to get the sv_image_x and sv_image_y labels lining up correctly on the GSV pano, so maybe you can point me in the right direction as to where I'm going wrong. The tool currently locates the x and y points for given labels by doing the following:
The complete code for the above snippet can be seen here. This worked fine using your distance model and the sample labels I found in the repo, but I cannot replicate it with the csv-metadata-seattle.csv you provided me with. Looking at the values for one GSV pano I was testing with (vab8Pdz_crutOLFqS9r9Ew), it looks like the sv_image_x coordinates do not need altering and line up well with objects as-is. I can't seem to work out the sv_image_y value though (I currently end up too high above the object). Previously the tool took the midpoint and then added to it (a subtraction of a negative value) to get the y value for the label. Is a correction for pitch/working out the horizon line needed? Thanks in advance; hopefully this all makes sense and isn't too long a read... Rob
Wow, this is awesome. Just chiming in here quickly. Re: GSV Pano IDs - Broken/Defunct Re: sv_image_x and sv_image_y labels
@jonfroehlich after investigating a little bit more:
As @jonfroehlich said, the rest of the parameters that we have recorded for a label are unlikely to work on new imagery at that location.
Hi @misaugstad and @jonfroehlich, thanks for the quick responses!
This makes sense, and that was the case when I went back through some historical GSV panos. Thanks to you both for letting me know; it's a shame, as a lot of the labels seem to be failing (no longer found using the Google links).
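For the replace-dead-panos idea above, one option is the official Street View Image Metadata API, which returns the current pano_id nearest a given lat/lng. This is my own sketch rather than code from the repo, and it assumes you have a Google Maps API key:

```python
import json
import urllib.parse
import urllib.request

METADATA_URL = "https://maps.googleapis.com/maps/api/streetview/metadata"

def metadata_request_url(lat: float, lng: float, api_key: str) -> str:
    """Build the request URL for the Street View Image Metadata endpoint."""
    query = urllib.parse.urlencode({"location": f"{lat},{lng}", "key": api_key})
    return f"{METADATA_URL}?{query}"

def parse_pano_id(payload: dict):
    """The endpoint returns status ZERO_RESULTS (not an error) when no
    imagery exists near the point; only status OK carries a pano_id."""
    return payload["pano_id"] if payload.get("status") == "OK" else None

def latest_pano_id(lat: float, lng: float, api_key: str):
    """Return the current pano_id nearest (lat, lng), or None."""
    with urllib.request.urlopen(metadata_request_url(lat, lng, api_key), timeout=10) as resp:
        return parse_pano_id(json.load(resp))
```

A nice property for bulk checking is that, per Google's documentation, requests to the metadata endpoint are free of charge, unlike image requests.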
I was using the image size provided via the metadata csv, but I can see the actual saved pano image size depends on a conditional check in DownloadRunner.py:
Previously my downloaded pano images were of size (13,312, 6,656); I've switched to 16,384 by 8,192 pixels for downloads now. As Mikey pointed out, CropRunner.py accesses the values in GSVimage, which are set to (13312, 6656) for width and height. I think I've found what the problem is (to a degree), but I'm not sure quite why it works or whether it applies to all GSV panos (it depends on whether all images on the server are of size (16,384, 8,192), or there's a mixture of (16,384, 8,192) and (13,312, 6,656)). Sticking with GSV pano id vab8Pdz_crutOLFqS9r9Ew as the example, I did the following:
I've tried this on about 3-4 different GSV panos now, and the larger downloaded image size with the sv_x and sv_y scaling factor seems to be working well overall (about 10% of labels look wrong, which could be user tagging error). I made a simple gist of the code I have been testing with for image cropping, which you can see here if you want further background or want to check my work for mistakes 😅
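The coordinate rescaling being described can be sketched as below. The base resolution, and the idea that the stored sv_image_x/sv_image_y assume it, are taken from this thread; the function name is mine:

```python
BASE_W, BASE_H = 13312, 6656  # resolution the stored sv_image_x/y appear to assume

def scale_label(sv_x: float, sv_y: float, pano_w: int, pano_h: int) -> tuple:
    """Rescale stored label coordinates to the resolution of the pano that was
    actually downloaded. For a 16384x8192 pano the factor is
    16384/13312 = 8192/6656, i.e. about 1.2308."""
    return sv_x * pano_w / BASE_W, sv_y * pano_h / BASE_H
```

`scale_label(13312, 6656, 16384, 8192)` returns (16384.0, 8192.0), and the function is a no-op for panos already at the base resolution.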
Okay, so it looks like we are just assuming that the imagery has that higher resolution in Seattle, and that it doesn't elsewhere. I think this has pretty consistently been the case, which is why they created the conditional like that. So you are (probably) safe to use the higher-resolution values throughout. However, I discovered that we are currently just recording 13312x6656 no matter what, without even checking it (it seems like that was just the assumed resolution in 2016 when the code was written). But I figured out a way to check going forward (documented in this Github issue). That won't be super helpful for you right now, but thank you for catching this so that we don't continue making this mistake!
They are probably all that higher resolution. Hoping to design a method to check the resolution of old imagery while resolving the issue I linked to above. But I'm not sure what the timeline will be on that. For now, it is probably best to just assume a higher resolution, since the dataset I gave you is from Seattle.
If you send me screenshots of a few that look right and a few that look wrong (along with their pano IDs), I can check that they actually line up with where users place their labels!
That... is very weird! I'm very interested to know if that ends up working consistently and why. I really hope that we aren't recording the...
Hi Mikey, I've been working on getting DownloadRunner.py running asynchronously, so that image gathering etc. is faster (behind a proxy). I completed this today.
Having completed this and attempted to run it, I think there is still a slight issue with the GSV pano image sizes, which means the size can't always be assumed to be (16,384, 8,192). At the moment the loop for URL generation works like this:
See line 179 - DownloadRunner.py for the rest of the code relating to url generation. For images of assumed width and height - 16,384, 8,192:
For images of assumed width and height - 13,312, 6,656:
Assuming the large image size becomes an issue when the GSV pano does not extend that far (you end up getting errors or black squares back). My idea is to take some of the existing code that checks whether an image is all black (checking the first square at position 0,0), and use it to test for the extent of the greater range: if the square returned for, say, x = 31, y = 15 is black, then switch to the smaller image size for this pano.

An example of the smaller pano image size is WFb1J3rDtVya6h030JgyGQ. This has black squares (it does not extend past width = 25) at the larger default size, but the download works with the smaller size. The sv_image_x and sv_image_y crop is also correct for this smaller image (no multiplying factor needed). I still have a feeling the larger images need that factor, but I haven't been able to test extensively yet. It looks like a mix of different GSV pano sizes, plus sv_image_x and sv_image_y values that did not compensate/adapt for the larger images, might be the cause of the issues! It should be easy enough to correct with some logic in CropRunner.py (get the size of the loaded GSV image, etc.).

One final thing that might be worth looking over regards the CSV you shared with me: pano_id=stxXyCKAbd73DmkM2vsIHA has 4530 labels associated with it, so there might be some sort of error going on with how these labels were stored, or possible duplication 😬 As always, thanks again for the help and support; hope you're all keeping well!
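The probe described above might look something like this sketch. The grid math assumes 512x512 tiles (a 32x16 grid for a 16384x8192 pano, so the last tile is x=31, y=15); `fetch_tile_pixels` is a hypothetical callable standing in for the repo's actual tile-download code:

```python
def tile_is_black(pixels) -> bool:
    """`pixels` is an iterable of (r, g, b) tuples for one tile (e.g. from
    PIL's Image.getdata()). True only if every pixel is pure black."""
    return all(px == (0, 0, 0) for px in pixels)

def infer_pano_size(fetch_tile_pixels) -> tuple:
    """Probe the last tile of the 16384x8192 layout (tile x=31, y=15).
    If it comes back all black, assume the older 13312x6656 layout.
    `fetch_tile_pixels(x, y)` returns that tile's pixels."""
    if tile_is_black(fetch_tile_pixels(31, 15)):
        return (13312, 6656)
    return (16384, 8192)
```

A more robust version might also treat HTTP errors for the probe tile as evidence of the smaller layout, since the thread mentions both errors and black squares.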
Very cool!! Love this.
Yep that sounds like a good idea to me! I'm assuming you're talking about the code/link here!
Ugh, yes, so that is a panorama from the tutorial. For some reason, a sizable number of labels from the tutorial are not being marked as such. In fact, there is one street in DC that we added to our Seattle database (to use for the tutorial), so there should be a number of labels from that street in DC that you should probably filter out. I wouldn't add this to the scripts, but it would be worth making a new CSV by filtering out labels with lat/lng in DC (or just sufficiently east of Seattle would work).
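That filter could be sketched like this. The 'lng' column name and the -121 cutoff are my assumptions: anything east of longitude -121 cannot be Seattle imagery, while the DC tutorial street sits near longitude -77:

```python
import csv

SEATTLE_LNG_MAX = -121.0  # assumed cutoff: Seattle is ~-122.3, DC is ~-77.0

def drop_tutorial_rows(in_path: str, out_path: str) -> None:
    """Copy the metadata CSV, dropping rows whose longitude is far east of
    Seattle (i.e. the DC tutorial street). Assumes a 'lng' column."""
    with open(in_path, newline="") as f_in, open(out_path, "w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if float(row["lng"]) < SEATTLE_LNG_MAX:
                writer.writerow(row)
```

Filtering into a new CSV (rather than inside the scripts) matches the suggestion above, so the download/crop code stays unchanged.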
I'm so glad that you're willing and able to put so much time into this! I am doing well, and I hope you're having as much fun with this as I am 😊
I now have DownloadRunner.py running as intended with asyncio and aiohttp (and, optionally, proxies and random headers). From what I've seen so far, roughly 50% of the pano ids look to be dead using these links and methods, which is a shame, but not much can be done about that. That's it, yes. I have implemented the change/check and it runs well!
Ah that makes sense, thanks for letting me know.
I've had a look over the pano image size differences we spoke about before. There are three different possible sizes for the GSV pano:
I tested 10 images of the largest size, 10 of the older/smaller size, and 4 of the fallback size (all I've encountered so far) using the metadata you provided me with. After checking these images, I'm relatively confident that the sv_image_x and sv_image_y coordinates you provided relate to the older/smaller pano image size. This means that for this size of GSV pano the points line up correctly for cropping. For the largest GSV pano sizes and the fallback sizes, a scaling factor needs to be applied to get the correct positions:
I'm now in the process of updating CropRunner.py to work with the downloaded images. I hope everything I've said above makes sense; if not, or if you want more information/results to check, just let me know! And yep, having fun; it beats some of the more mundane work I have to do some weeks 😁
This all sounds great @robertyoung2! I am honestly expecting more issues than just scaling with the "Largest" and "Fallback" image sizes, given what I've seen in the core Sidewalk code. Once you've got everything set up, we should be able to see if there are patterns and determine whether that's an issue. Thank you for all of this!!
Hi @misaugstad, Hope you're keeping well! I've just about got CropRunner.py up and running using the CSV data you provided me with. I've implemented a couple of enhancements, plus workarounds for the image scaling/size issues we've discussed. I should be in a position to give you a review/list of updates for both of the main scripts in the next week or two, if you are interested in a pull request. I've run CropRunner.py on a couple of hundred GSV images for quality control. I would say 80-90% of the crops are really good and make sense. I am encountering two recurring issues in the crops relating to label positions, and I was hoping you could have a look to see if it's a labelling issue or a problem on my end that I need to work on. The first is that a chunk of the labels seem to sit slightly above the object of interest. I have tested these images with and without my compensation/scaling for GSV image size, and I don't think this is the issue. I've selected a couple of label-ids for you to check to see if this is just a labelling error:
The second recurring issue is label positions that are not on any actual object of interest. Again, I've tested this with and without scaling for image size, and I don't believe this is the cause of the issue:
I've created local issues on my fork of your repo for tracking my own work. These have some sample crops for each of the issues described above if you want a quick visual check before digging deeper: Hope this makes sense; if you have any questions or need more info, just let me know!
We would love a pull request!!
Great, thank you. The full list of
Yeah, this is a consistent problem we've been dealing with... I can't totally tell yet whether users don't want to cover up the thing they're trying to mark, so they label just above it, or whether it's a bug on our end. This issue is also much more common when people are marking things that are far in the distance. For example, if I just look at the first label you list here and zoom all the way out on the panorama, what we get is the first screenshot below; the second screenshot is the zoomed-in version. All that being said, we are currently in the process of looking into this. You can see our Github issue for it here. Any more examples you can send would be helpful!
Also in the process of looking into that here. I would love to know if you are seeing this on the 13312x6656 images at all (though I realize that there are fewer images of this size). If we don't see it at that image size at all, it would be strong evidence to confirm my suspicions about the cause of the problem (which is definitely on our end).
Great, thanks for the info! I've gone through some of the initial crops made by CropRunner.py from the GSV panos and made CSVs for both issues. I went directly to both issue tickets you linked to (hope this is ok) and added a comment explaining what I did, and attached the relevant CSV files so you have some samples to look into further. Let me know if you need more info/samples than I've provided!
@robertyoung2 just a quick update for you: I spent most of Wednesday/Thursday working on figuring out what's going on with those label placement issues. I think that they are actually both the same problem, and that the crop is only ever off in the y direction! After going through the algebra, it seems that multiplying by 1.230769231 is actually completely valid, and is the same as using the correct resolution from the start. I'll fix it soon so that we use the correct resolution from the start, but it's nice to know that the fix will work for now :)

In your list of images that are completely missing the mark, I see two things going on. The first is that maybe half of them are actually just places where users labeled something completely incorrectly. In the CSV I gave you, we don't filter out labels that have been given a bunch of downvotes, so you've just got some BS data in there. Those aren't cropped incorrectly; the label was just never supposed to be on a legitimate curb ramp 😉 The other ones in that list just seem to be off in the y direction to varying degrees!

I've been doing some more tests, and based on where in the image I place a label, the error in the y direction can be anywhere from 50 to 500 pixels on the same image! I've got more testing to do next week to figure out whether there is a pattern, and whether there is some correction we can do or something we are doing completely incorrectly right now. It's been pretty exhausting, but hopefully I can get it figured out next week!
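The reason the algebra works out is that the empirical factor is exactly the ratio between the two pano resolutions, which a quick check confirms:

```python
# Both the width and height ratios reduce to 16/13, so multiplying stored
# coordinates by ~1.230769 is the same as having recorded them at the
# larger resolution in the first place.
ratio_w = 16384 / 13312  # = 16/13
ratio_h = 8192 / 6656    # = 16/13
assert ratio_w == ratio_h
assert abs(ratio_w - 1.230769231) < 1e-9
```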
Morning @misaugstad, That sounds like brilliant work! It's great to see that the intuition about the image size and the scaling factor makes sense, and it has pointed us in the right direction for solving the issue of differing image sizes. I'll keep using it in the short term, confident in knowing it's a legitimate fix. With respect to images where labels are way off the mark, that makes sense; some of those labels are up in trees or in the middle of the road. Good to know it's just a poor user-placement issue rather than some more complicated geometry error... I wonder if there's a way to get a confidence measure based on downvotes to filter these labels out? It's a small issue, but it'd be good to separate the wheat from the chaff 😬 The final point about being off in the y-direction is an interesting one, and was actually another point I was going to raise this week (sorry for the constant stream of issues...). I have noticed that some objects have multiple labels, which aren't duplicate tags, as the coordinates are all slightly different, but which could be considered duplicates as they all label the same object. An example is below:
I was thinking about averaging such points, but this could cause an issue where you have a great label placement and then drag it to a poorer placement by using the average. My thinking for now is to just leave it as is, but I wanted to raise it so you were aware 😅 Thanks again for all the great work, constant updates, and help; I really do appreciate it! It feels like we're getting very close to getting a lot of the issues resolved!
I am hoping to add validation information (and filtering) into our API soon (issue 1668) which should take care of this for the most part.
Right, so with that, it might be useful for you to also use our API. The /attributesWithLabels API includes "accessibility attributes", which are just the result of a clustering algorithm run on the individual labels. This comes with a different set of problems for your use case, though; namely, we cluster labels that mark the same object from different panoramas, and you may want to keep one label per image. You could do something where you take one label per image per cluster instead of one label per cluster?
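The one-label-per-image-per-cluster idea can be sketched like this. The field names ('cluster_id', 'pano_id', 'label_id') are my assumptions for illustration, standing in for whatever the API/CSV join actually provides:

```python
def one_label_per_pano_per_cluster(labels):
    """Keep the first label seen for each (cluster, pano) pair, so each
    clustered object contributes at most one crop per panorama. `labels`
    is an iterable of dicts with 'cluster_id' and 'pano_id' keys."""
    kept, seen = [], set()
    for label in labels:
        key = (label["cluster_id"], label["pano_id"])
        if key not in seen:
            seen.add(key)
            kept.append(label)
    return kept
```

Compared with one label per cluster, this keeps the per-image variety useful for training while still collapsing near-duplicate placements on the same object in the same pano.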
And I want to thank YOU so much for all of your contributions and careful testing! It has been immensely helpful for us.
@robertyoung2 just wanted to give you a quick update: the timing of things didn't end up working out this week so I couldn't finish figuring out the bug we've been working on. Hopefully I'll have something for you late next week! |
Hi @misaugstad, That's no worry at all. I'm the same: it's the end of semester for teaching and corrections, and it's all a bit crazy! I still need to tidy up my commits and make pull requests so you have the updated code. I've also been implementing a new option in CropRunner which I'll share and explain once complete (data prep geared towards object detection algorithms). I have completed running DownloadRunner; for reference, it looks like about 51% of the panos are still active/valid and available for download. Looking forward to hearing from you again soon!
Hi @misaugstad, I've made a pull request with all the changes discussed over the months here. I've tried to make the pull request as detailed and as clear as possible. If anything isn't clear, or if you spot something that's incorrect or not working, let me know! Hopefully this brings the repo up to speed and fixes the issue. I'm also working on some other functionality for dataset processing which I can share with you once complete and tested (more of an aside). Thanks, Rob |
@robertyoung2 thank you so much for all of your contributions here! Looking at my schedule realistically: I've been busy working on a push for an ASSETS poster in addition to my normal responsibilities maintaining the core Project Sidewalk code, which is why I haven't yet finished the investigation into the incorrect label locations we've been talking about in this thread. This is most likely going to keep me busy through mid/late June. As of right now, this is my top priority for when I have more free time again, so you can expect real updates from me in late June / early July: first on the label placement issues, then on going through your PR. Apologies for my delays here; some things just end up taking more time than you'd expect 😅 I want to thank you again for all your work here! This has been incredibly helpful, both in the upgrades you're making and in the bugs you've pointed out.
Hi @misaugstad, Totally understand, not a worry at all; I hope the ASSETS poster went/is going ok! It would be great to get any updates on the label placement as soon as you have them, as I'm just about to use the data points for a model I'm going to train, and the fewer sky crops, the better :)
Google has taken down the endpoint that allowed us to get depth data. Not much we can do about that. I think that the important thing for right now is that we were using that endpoint to grab the metadata for the pano as well. The depth data was just included in that metadata XML. We are still going to want that metadata, and that should be available from another endpoint.