Evaluating Cityscapes #148
Are you able to reproduce the ground-truth numbers by running the provided script?
Not for the FCN scores. The pre-trained Caffe model doesn't seem to give correct outputs.
What's the number you are getting?
The provided script does not resize the images down to 256x256 (due to a commented-out line). When I run the script on the ground-truth images in "gtFine/val/frankfurt" and look at the images output by the pretrained model, I get: input: (1024x2048). Rescaling the images to 256x256 before feeding them to the pretrained model does not seem to help: input: (256x256). Did you get better-looking segmentation masks?
Evaluating on the first 20 images in "gtFine/val/frankfurt" using 256x256 scaling results in these scores:
So, pretty bad, but as expected (given that the segmentation masks classify almost everything as "road").
Just to make sure: to get the ground-truth number, did you first construct a folder of original Cityscapes images resized to 256x256 and then run the provided script without modification?
python ./scripts/eval_cityscapes/evaluate.py --cityscapes_dir /path/to/original/cityscapes/dataset/ --result_dir /path/to/resized/images/ --output_dir /path/to/output/directory/
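For reference, a minimal sketch of how such a resized folder could be built (the paths and the bicubic interpolation are assumptions here, not part of the repo's scripts):

```python
# Hypothetical helper: resize original Cityscapes images to 256x256 so the
# folder can be passed to evaluate.py via --result_dir. Paths are placeholders.
import os
from PIL import Image

src_dir = "/path/to/original/cityscapes/leftImg8bit/val/frankfurt"
dst_dir = "/path/to/resized/images"
os.makedirs(dst_dir, exist_ok=True)

for name in sorted(os.listdir(src_dir)):
    if name.endswith(".png"):
        img = Image.open(os.path.join(src_dir, name))
        img.resize((256, 256), Image.BICUBIC).save(os.path.join(dst_dir, name))
```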
These are also numbers from the first 20 images? Is it possible for you to run on the entire test set, or does it take too long?
What does seem to work is rescaling the images to 256x256 and then resizing them back to the original resolution (1024x2048) before feeding them to the network (as suggested by @FishYuLi). I get the following segmentations: And these scores on the frankfurt images:
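For concreteness, a sketch of that down-then-up resizing with PIL (the filenames are hypothetical; note that PIL sizes are given as (width, height)):

```python
# Downsample to 256x256, then upsample back to the original Cityscapes
# resolution (1024x2048) before feeding the image to the FCN.
from PIL import Image

img = Image.open("fake_image.png")                    # hypothetical filename
small = img.resize((256, 256), Image.BICUBIC)
restored = small.resize((2048, 1024), Image.BICUBIC)  # (width, height) in PIL
restored.save("fake_image_fullres.png")
```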
Glad that it worked out. But if you have a folder of 256x256 images, this line should do the scaling to the original resolution for you. Did you need to do an extra scaling before running the code?
Yes, thanks. @tinghuiz that's right (if you resize the images to 256x256 and keep the labels/ground-truth segmentations in their original higher resolution).
Hi @tychovdo, I have read the discussion here and the discussion here regarding generating the FCN score. Having followed what you did, I am still unable to get meaningful predictions from the FCN model. I am just trying the original validation images from the Cityscapes dataset (1024x2048): I resized them to 256x256 and then back to 1024x2048 before giving them to the model. I am using the […]
I appreciate your time.
We don't resize the ground truth prediction. Please see this note for more details.
Thanks for your response.
To make sure the problem is not from saving, I used […]. So my problem is mainly with the output of the semantic classifier. I will investigate it further, as I see in other threads that some people have managed to solve the issue (@FishYuLi, I would be happy to hear any thoughts you have on this).
Hi @MoeinSorkhei, just a guess: is it possible that your […]?
Hi @tinghuiz, I investigated it, and indeed the range of the output of […].
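For anyone hitting the same symptom, a quick sanity check on the value range may help. That Caffe FCN models expect [0, 255]-range, mean-subtracted BGR input is the usual convention, stated here as an assumption rather than something confirmed in this thread:

```python
import numpy as np
from PIL import Image

img = np.asarray(Image.open("input.png"), dtype=np.float32)
# Caffe FCN models are typically fed BGR images in [0, 255] with a dataset
# mean subtracted; an input accidentally scaled to [0, 1] tends to produce
# near-constant segmentations without raising any error.
print(img.min(), img.max())  # expect roughly 0 and 255
```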
Hi, I am giving an update in case this might be helpful to someone: what I did was to install Caffe (with GPU support) from this repository and to use exactly the […]. Although the corresponding […]
Hi, Did you run into any problems with memory whilst running the Caffe model? Any help would be highly appreciated! Kind regards, Erik
Hi, You should not get this error if you evaluate one image at a time. Are you using the provided code for evaluation? In that script, the images are evaluated one by one in a for loop, and the GPU that I used (with 11GB of memory) was able to perform the forward pass for evaluating the images.
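The pattern in evaluate.py is roughly the following (a paraphrased sketch, not verbatim repo code; the file names, blob names, and preprocess helper are assumptions):

```python
import caffe

caffe.set_mode_gpu()
net = caffe.Net("deploy.prototxt", "fcn-8s-cityscapes.caffemodel", caffe.TEST)

for path in image_paths:  # image_paths: list of result images, assumed defined
    blob = preprocess(path)                 # hypothetical helper -> 1x3xHxW float32
    net.blobs["data"].reshape(*blob.shape)  # reshape for this image only
    net.blobs["data"].data[...] = blob
    net.forward()                           # one image on the GPU at a time
    pred = net.blobs["score"].data[0].argmax(axis=0)  # per-pixel class ids
```

Because only one image is resident on the GPU per iteration, the full-resolution forward pass fits in the 11GB mentioned above.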
Hi, Thanks for the quick reply! I am running it on Colab, which should give 12GB (or even more, I think). I do run it with the provided evaluate.py file (under scripts/eval_cityscapes), which has the loop in it. I also followed your tips for the resizing, thanks for that! Weird that it worked for you with 11GB; it must be something else then. I am currently running it on CPU, which takes quite long, but it does seem to stay within the limit of 25GB memory (it uses 23GB now). Just to be sure: you also only resized (to 256x256) the images in leftImg8bit and the ones ending in _color.png in gtFine? Kind regards,
Hi, Actually, the amount of CPU memory that I allocate for running this job is at most 15GB, so I think you are doing something unnecessary here. Best,
Hi, Thanks again! I have got that working now. Last question: did you also resize the results from testing, or did you keep those at the original size as well? Kind regards,
Hi, What exactly do you mean by the results from testing?
The output of our trained model.
If you mean the images that are to be evaluated by the FCN model, the answer is yes. Let me know if I am still misunderstanding your question.
We updated the evaluation description. It might help.
It looks reasonable. Our paper's numbers are based on models trained with the Torch repo. We expect a slight difference between PyTorch models and Torch models: sometimes better, sometimes worse.
Ok, great! Thanks again for the help.
Hi, I am using another generative model to generate images at different sizes. When I generate at 128x256 (height 128, width 256), the FCN score is reasonable. However, when I evaluate generated images of size 256x512, I get scores that are higher than the ground truth. I thought evaluating 256x512 images with your FCN model would be fine, because I resize all the generated images to 256x256 before feeding them to the FCN model. It seems I can only evaluate images that are actually generated at 256x256; resizing images to 256x256 after generating them (before feeding them into the FCN model) gives wrong results. Do you have any thoughts on this? These are the numbers I get:
And this is the ground truth at 256x256 (similar to the paper):
I appreciate your thoughts.
Hi,
I'm having difficulties reproducing the results from the CycleGAN paper for the Cityscapes evaluation. For the city->label classification scores I get very similar results, but for the label->photo FCN score experiment I get really bad results. I used the code from the ./scripts/eval_cityscapes folder and trimmed it down a bit to find the error (see code below): I load a single image from the Cityscapes dataset, resize and preprocess it using the code from the repo, and then perform a forward pass through the pretrained Caffe model.
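(The trimmed-down code itself did not survive in this copy of the thread. Below is a rough reconstruction of such a single-image forward pass; the file names, mean values, and blob names are assumptions, not the original script.)

```python
import caffe
import numpy as np
from PIL import Image

# Assumed dataset mean (BGR); substitute the values used in the repo's scripts.
MEAN_BGR = np.array([72.0, 83.0, 73.0], dtype=np.float32)

net = caffe.Net("deploy.prototxt", "fcn-8s-cityscapes.caffemodel", caffe.TEST)

img = Image.open("frankfurt_000000_000294_leftImg8bit.png")    # hypothetical file
img = img.resize((2048, 1024), Image.BICUBIC)                  # PIL size is (width, height)
arr = np.asarray(img, dtype=np.float32)[:, :, ::-1] - MEAN_BGR # RGB -> BGR, subtract mean
blob = arr.transpose(2, 0, 1)[np.newaxis]                      # HxWx3 -> 1x3xHxW

net.blobs["data"].reshape(*blob.shape)
net.blobs["data"].data[...] = blob
net.forward()
seg = net.blobs["score"].data[0].argmax(axis=0)  # per-pixel class predictions
```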
Unfortunately, the Caffe model outputs mostly 0s. Do you have any suggestions?
[Images, left to right: "orig", "resized", "segmented"]
Thanks in advance.