
Implement random sunlight variation during dataset generation #91

Closed
2 of 3 tasks
andrefdre opened this issue Feb 10, 2023 · 49 comments

@andrefdre
Collaborator

andrefdre commented Feb 10, 2023

As suggested during the meeting, the next steps will consist of:

  • Open the mesh where there are windows or doors.
  • Use ROS topics to change directional lights.
  • In case the model doesn't behave correctly with blank images through the windows, add a background.

During a quick test at the meeting, we found that Gazebo only simulates shadows for directional lights, which we will use to simulate the sun. The lights present in the room won't have shadow simulation, but we argued that this won't be a problem due to their low interaction with the scene.

@andrefdre andrefdre added the enhancement New feature or request label Feb 10, 2023
@andrefdre andrefdre self-assigned this Feb 10, 2023
andrefdre added a commit that referenced this issue Feb 13, 2023
@andrefdre
Collaborator Author

I managed to change the sun's rotation using the SetLightProperties service from gazebo_msgs.
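For reference, a minimal sketch of this kind of service call is below. It assumes the ROS Noetic version of gazebo_msgs/SetLightProperties (which exposes a pose field); it is not the exact synfeal code, and the light name and orientation values are only illustrative.

```python
#!/usr/bin/env python3
# Minimal sketch (not the exact synfeal code): rotate the "sun" directional
# light through /gazebo/set_light_properties. Assumes ROS Noetic gazebo_msgs,
# where SetLightProperties includes a pose field.
import rospy
from gazebo_msgs.srv import SetLightProperties, SetLightPropertiesRequest
from geometry_msgs.msg import Pose, Quaternion
from std_msgs.msg import ColorRGBA
from tf.transformations import quaternion_from_euler

rospy.init_node('sun_rotator')
rospy.wait_for_service('/gazebo/set_light_properties')
set_light = rospy.ServiceProxy('/gazebo/set_light_properties', SetLightProperties)

req = SetLightPropertiesRequest()
req.light_name = 'sun'                       # name of the directional light in the world
req.cast_shadows = True
req.diffuse = ColorRGBA(0.8, 0.8, 0.8, 1.0)
req.attenuation_constant = 0.9
req.attenuation_linear = 0.01
req.attenuation_quadratic = 0.001
# Orientation only: pitch the sun by 0.5 rad (roll and yaw left at zero).
q = quaternion_from_euler(0.0, 0.5, 0.0)
req.pose = Pose(orientation=Quaternion(*q))
set_light(req)
```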

@miguelriemoliveira
Collaborator

Hi @andrefdre ,

that's good news. Can you post an image or a video to exemplify?

@andrefdre
Collaborator Author

Adding the -uvl option when running the dataset acquisition implemented by @DanielCoelho112 now also generates a random sunlight rotation. Should I think about realistic sun behavior? Right now, I'm only changing the pitch between -1.57 and 1.57 rad.
Demonstration:
Screencast from 13-02-2023 15:30:44.webm

@miguelriemoliveira
Collaborator

Hi @andrefdre ,

Looks good, but I think you can add more variability to the angles, the intensity, and the presence of each light.

Also, during the video, try to rotate the point of view so that it's easier to see that the light is coming through the windows.

@andrefdre
Collaborator Author

One thing I noticed: with some sun angles, the rendering gets weird.
Rendering with weird artifacts:
Screenshot from 2023-02-13 17-46-28
Screenshot from 2023-02-13 17-46-45

This also happens with the point lights, but with attenuation it isn't visible. I tried changing the sun's attenuation values, but it isn't affected by them.

@miguelriemoliveira
Collaborator

Hi @andrefdre ,

I would say this should be a separate issue. I have no idea what it is. Perhaps some Google searches...

@andrefdre
Collaborator Author

andrefdre commented Feb 14, 2023

I added more variation to the lights. The first video changes the lights with continuity in mind; in the second video, every time a new pose is generated the lights are also randomized.
In the first video, it's possible to see the light starting to appear in the window on the right.

light.demonstration.mp4
test.mp4

Would it be interesting to play around with having a background, maybe a sky, and changing the time of day?
Something like this:
Screenshot from 2023-02-14 17-19-20

Screenshot from 2023-02-14 17-18-20

@miguelriemoliveira
Collaborator

miguelriemoliveira commented Feb 15, 2023

Hi @andrefdre ,

the second video is very nice. I liked it a lot and saved it in my videos.

The first one, where the light changes progressively, is not very good. Two reasons:

  1. The direction of the light only changes the pitch angle, whereas it would normally be more realistic to change the yaw angle.
  2. The changes are very slow and the effects are not very visible.

Can you try to make a new one taking this into consideration? Remember, these videos are to be submitted alongside a paper where you want to make the point that these virtual changes introduce realistic lighting changes in the scene.

@andrefdre
Collaborator Author

The direction of the light only changes the pitch angle, whereas it would normally be more realistic to change the yaw angle.

The teacher suggested changing more angles, so in the video I was changing all of them.

The changes are very slow and the effects are not very visible.

I also thought about that and will experiment with reducing the ambient color so the light has more effect. I had experimented with this before and was afraid of having a dark scene most of the time.

Can you try to make a new one taking this into consideration? Remember, these videos are to be submitted alongside a paper where you want to make the point that these virtual changes introduce realistic lighting changes in the scene.

I will record a new video following the teacher's recommendations. Is there a way to move the GUI camera along a path?

@miguelriemoliveira
Collaborator

Hi @andrefdre ,

What I said is to try to change the angles in a more realistic way.

Some inspiration here

https://youtu.be/mMgTCocmeSQ

@andrefdre
Collaborator Author

I found the pvlib library, which calculates the sun's azimuth and zenith angles at a given time. So now, instead of incrementing angles with no relation between them, the angles accurately describe the sun's rotation.
I will now work on disabling the sun when the time is after sunset and before sunrise.
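As a rough illustration of how pvlib can be used here (a sketch under assumed coordinates and frame conventions, not the actual synfeal code):

```python
# Sketch: convert a simulated timestamp into sun angles with pvlib, then into
# the pitch/yaw used to orient a directional light. The location (Aveiro) and
# the mapping from elevation/azimuth to roll-pitch-yaw are assumptions; the
# exact mapping depends on the world frame convention.
import math
import pandas as pd
from pvlib import solarposition

LAT, LON = 40.63, -8.66  # assumed site coordinates

def sun_angles(sim_time):
    """Return (pitch, yaw) in radians for the directional light at sim_time."""
    times = pd.DatetimeIndex([sim_time], tz='Europe/Lisbon')
    pos = solarposition.get_solarposition(times, LAT, LON)
    elevation = math.radians(pos['apparent_elevation'].iloc[0])
    azimuth = math.radians(pos['azimuth'].iloc[0])
    pitch = elevation    # tilt the beam down according to the solar elevation
    yaw = -azimuth       # sign/offset depends on how the world is oriented
    return pitch, yaw

print(sun_angles(pd.Timestamp('2023-02-16 12:00:00')))
```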

Results:

sun_rotation.mp4

@DanielCoelho112
Owner

Hi @andrefdre,

great job! That's a very nice feature you've added to synfeal.
I would say that when you finish the sunset/sunrise handling, you could create the train and test datasets with this feature, and then a baseline train dataset without illumination changes.

@miguelriemoliveira
Collaborator

I agree. Very nice indeed. Congratulations.

Do you have the internal lights configured yet?

I also think you can move ahead with collecting a dataset.

@DanielCoelho112 , you talk about collecting a baseline train dataset, but why not use the ones you already captured?

@pmdjdias
Collaborator

Nice job these past few days! I think the internal lights will also be a very important addition, as in many cases they may have a strong influence (daylight will only affect some portions of the room most of the time). Anyway, training on some datasets to test the pipeline and see some preliminary results seems the way to go.

@DanielCoelho112
Owner

Hi @miguelriemoliveira,

@DanielCoelho112 , you talk about collecting a baseline train dataset, but why not use the ones you already captured?

I would say to create a new one because the 3D model was updated (holes in the windows). It wouldn't be fair to train with windows and then test without them. Do you agree?

@miguelriemoliveira
Collaborator

miguelriemoliveira commented Feb 17, 2023 via email

@andrefdre
Collaborator Author

The internal lights were already done by @DanielCoelho112; I just added them to the scene. In the previous videos they were already present and changing.

Right now, I'm having a problem when collecting the dataset: the code seems to get stuck every time at around pose 300. So it may delay the new dataset a bit.

@miguelriemoliveira
Collaborator

Hi @andrefdre ,

In the previous video the changing of the internal lights is not very noticeable, I think. But I was not looking for it, so it may be me.

What do you mean "get stuck"? In these cases, it is better to offer a detailed description of what happens.

Does any of the nodes crash?
Does any of the nodes stop printing to the terminal?

Since you say "around the 300", I would guess this may be a memory leak.

Can you run htop while you are collecting the dataset and see if the memory footprint of the process is continuously increasing?

I am sure @DanielCoelho112 will have better suggestions ...

@andrefdre
Collaborator Author

andrefdre commented Feb 18, 2023

I agree that the changes aren't very noticeable; I will increase the step.

I was still trying to figure out where it gets stuck; I managed to track it down to this line:

rgb_msg = rospy.wait_for_message('/kinect/rgb/image_raw', Image)

@DanielCoelho112 is there any reason why you preferred to use wait_for_message instead of using a Subscriber?
It just stays waiting forever for the message; even if I restart the program, it keeps waiting for camera information. Only when I restart everything does it work again, and then it stops again near pose 300.

But when I run the code without light manipulation, it doesn't get stuck there.

With this finding, I don't think it is a memory leak.
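For what it's worth, a Subscriber-based variant (just a sketch, not the current synfeal code) would look roughly like the block below. Note that if /clock stops being published, a simulated-time timeout would also never expire, which matches the suspicion that a Subscriber alone wouldn't fix the underlying problem; the sketch deliberately uses wall-clock time for its timeout.

```python
# Sketch of the Subscriber alternative: keep the latest image from a callback
# instead of blocking on rospy.wait_for_message for every pose.
import time

import rospy
from sensor_msgs.msg import Image

class LatestImage:
    def __init__(self, topic='/kinect/rgb/image_raw'):
        self.msg = None
        self.sub = rospy.Subscriber(topic, Image, self._callback, queue_size=1)

    def _callback(self, msg):
        self.msg = msg

    def get(self, timeout=5.0):
        """Return the most recent image, or None if nothing arrives in time.
        Uses wall-clock time on purpose, so a frozen /clock cannot block it."""
        deadline = time.time() + timeout
        while self.msg is None and time.time() < deadline:
            time.sleep(0.01)
        return self.msg
```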

@miguelriemoliveira
Collaborator

When it gets stuck, use a terminal to run

rostopic echo '/kinect/rgb/image_raw'

Do you receive any message?

@andrefdre
Collaborator Author

The message I receive is:

WARNING: no messages received and simulated time is active.
Is /clock being published?

I already tried researching this but didn't reach any conclusion.
I tried pausing and unpausing the simulation in Gazebo, and it doesn't change anything.

@miguelriemoliveira
Collaborator

How about

rostopic echo /clock

Before getting stuck, and after? Do you get clock messages?

@andrefdre
Collaborator Author

Before it gets stuck I receive clock messages, but afterwards I get the same warning as before.

WARNING: no messages received and simulated time is active.
Is /clock being published?

@miguelriemoliveira
Collaborator

I am out of ideas... It seems like it's a problem inside Gazebo. You wait for some time before collecting a new image; can you change that time to see if something gets better?

@DanielCoelho112
Owner

Hi @andrefdre,

@DanielCoelho112 is there any reason why you preferred to use wait_for_message instead of using a Subscriber?

I don't think so, but I think the Subscriber won't solve this problem...

This bug never happened to me. Since it only happens with lights and around pose 300, I would say it could be a CPU or memory issue, as @miguelriemoliveira suggested. Maybe the shadows are computationally expensive... Can you log the CPU, RAM, and maybe GPU usage over time until failure?
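A minimal way to do that logging (just a sketch; psutil is assumed to be available, and GPU usage could be sampled separately with nvidia-smi):

```python
# Log CPU and RAM usage once per second to a CSV until interrupted, so the
# trend up to the hang can be inspected afterwards.
import csv
import time

import psutil

with open('/tmp/resource_log.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['elapsed_s', 'cpu_percent', 'ram_percent'])
    start = time.time()
    try:
        while True:
            writer.writerow([round(time.time() - start, 1),
                             psutil.cpu_percent(interval=1.0),  # blocks ~1 s
                             psutil.virtual_memory().percent])
            f.flush()
    except KeyboardInterrupt:
        pass
```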

Another option would be to run the same code with a smaller mesh to see if the problem persists, maybe using the 024 room.

You should also write here the sequence of commands needed to replicate this problem so I or @miguelriemoliveira can test.

@andrefdre
Collaborator Author

Steps to replicate:

roslaunch synfeal_bringup bringup_mesh.launch world:=santuario_light.world
roslaunch synfeal_bringup bringup_camera.launch  
rosrun synfeal_collection data_collector -nf 1000 -m path -mc 'santuario.yaml' -s santuario_light_train -f -uvl

Santuario with windows cut out mesh: https://uapt33090-my.sharepoint.com/:u:/g/personal/andref_ua_pt/EY-COfZod9BJnjiEXdXRXFkB6wl9ePkyLWt5jzr0jPisRA?e=fis5y7

I already checked memory usage, and it doesn't get near the limits for RAM or the GPU. The CPU is mostly around 60% with occasional spikes to near 100% on some cores, but it is weird that it always stops around pose 300.
My computer specs:

  • Nvidia Geforce GTX 1050 4GB
  • 16 GB of Ram
  • Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz

@miguelriemoliveira
Collaborator

How many steps do we need? 300 is one per image? So we are aiming at 20000? Is that it?

@andrefdre
Collaborator Author

How many steps do we need? 300 is one per image? So we are aiming at 20000? Is that it?

I don't know if I fully understood the question. An image is saved at each step; the more images, the better.

@miguelriemoliveira
Collaborator

An image is saved at each step

That's what I wanted to know ... and we need a lot of images, so stopping at 300 is really not feasible.

@miguelriemoliveira
Collaborator

Can you try to run the experiment with Gazebo in debug mode? That outputs a lot of information; perhaps we can get some intel from there.

You can paste the terminal messages from when it gets blocked.

@miguelriemoliveira
Collaborator

Could it be something like this occurring?

gazebosim/gazebo-classic#2725

there is a workaround, using gazebo messages:

https://answers.gazebosim.org/question/24982/delete_model-hangs-for-several-mins-after-repeated-additionsdeletions-of-a-sdf-model-which-sometimes-entirely-vanishes-from-the-scene-too-in-gazebo/?answer=25003#post-id-25003

Also, if we really need to, we could just tell Gazebo to delete the model and spawn a new one every 250 iterations.
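If we go that route, a rough sketch of the delete-and-respawn fallback using the standard gazebo_ros services could look like this (the model name and SDF path would be placeholders):

```python
# Sketch: delete a model and respawn it from its SDF every N iterations.
import rospy
from gazebo_msgs.srv import DeleteModel, SpawnModel
from geometry_msgs.msg import Pose

rospy.wait_for_service('/gazebo/delete_model')
rospy.wait_for_service('/gazebo/spawn_sdf_model')
delete_model = rospy.ServiceProxy('/gazebo/delete_model', DeleteModel)
spawn_model = rospy.ServiceProxy('/gazebo/spawn_sdf_model', SpawnModel)

def recycle_model(model_name, sdf_path, pose=None):
    """Remove `model_name` and spawn it again from the SDF file at `sdf_path`."""
    delete_model(model_name)
    with open(sdf_path) as f:
        sdf_xml = f.read()
    spawn_model(model_name=model_name, model_xml=sdf_xml, robot_namespace='',
                initial_pose=pose or Pose(), reference_frame='world')
```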

@andrefdre
Collaborator Author

I tried changing the way @DanielCoelho112's code sends the messages that change the lights to the same method I use for the sun, and I managed to get past image 300. But it is still only at image 400, so it's kind of early to tell.

But this confuses me: how does changing the way the lights are modified make the simulation stop after a while?
The way it was previously done was this:

# my_str = f'name: "{name}" \nattenuation_quadratic: {light}'

# with open('/tmp/set_light.txt', 'w') as f:
#     f.write(my_str)

# os.system(
#     f'gz topic -p /gazebo/santuario/light/modify -f /tmp/set_light.txt')

And now I use the service '/gazebo/set_light_properties'

@miguelriemoliveira
Collaborator

Hope it works

@DanielCoelho112
Owner

And now I use the service '/gazebo/set_light_properties'

I prefer that option too. When I implemented this feature, I also tried to use the service, but Gazebo wasn't responding to it. If it works now, it is much better than saving the .txt and then publishing it.

@andrefdre
Collaborator Author

I managed to generate a dataset with 1k images, so I think it's solved. I will now generate a dataset with 10k images and train a model. For nighttime, I didn't find any reasonable way to turn the sun off and then back on, and it is not affected by attenuation. Maybe removing the sun and then adding it again is a possible way. For now, I just make it point directly at the ground, so no light enters the windows.

If it works now, it is much better than saving the .txt and then publishing it.

It looks to be; I also noticed that the code is much faster than with the previous implementation.

@miguelriemoliveira
Collaborator

Great. Let us know when you have news.

@andrefdre
Collaborator Author

I have trained two models: one on a dataset generated in a scene without light manipulation, and another on a dataset generated with light manipulation. I trained both models for 50 epochs with a batch size of 10. Both datasets had 10k images for training and 2.5k for testing. For validation of both models, I used a dataset with light manipulation containing 500 images.
These are the results:
Blank 6 Grids Collage (1)

@DanielCoelho112
Owner

Hi @andrefdre,

the plot of the loss is weird. Have you ever obtained something like this?
image

Also, which model are these results from? I would recommend PoseNet with the dynamic loss.

I think you should take a step back, and try to run one model that performs similarly to what we used to have. To give you an idea, these were the results we were having:
image

Why did you use a batch size of 10? Is it because you're training locally? Or something else? 10 seems really small.

@andrefdre
Collaborator Author

the plot of the loss is weird. Have you ever obtained something like this?

I don't think I ever had a negative loss.

Also, these results are with which model? I would recommend PoseNet with dynamic loss.

Sorry, I totally forgot to paste the command.

./rgb_training -fn santuario_light -mn posenetgooglenet -train_set santuario_light_train -test_set santuario_light_test -n_epochs 50 -batch_size 10  -loss 'BetaLoss(100)' -c -im 'PoseNetGoogleNet(True,0.8)' -lr_step_size 60 -lr 1e-4 -lr_gamma 0.5 -wd 1e-2 -gpu 0

I think you should take a step back, and try to run one model that performs similarly to what we used to have.

I will try over the next few days to see what it can be; I already looked at previous versions of the code but didn't find anything. Are you free to have a meeting at the end of the week in case I don't find anything? Maybe you could help me.

Why did you use a batch size of 10? Is it because you're training locally? Or something else? 10 seems really small.

Yes, I trained locally: when I looked at DeepLar, the GPUs were all occupied and I wanted results as quickly as possible.
Out of curiosity, does batch size influence the results? Wouldn't it only matter if it were an RNN, since the loss is still averaged over the batch?

@miguelriemoliveira
Collaborator

miguelriemoliveira commented Feb 22, 2023 via email

@miguelriemoliveira
Collaborator

Oops, sorry, I'm reading the emails in FIFO mode. I agree with what @DanielCoelho112 says.

I don't think you will get any meaningful results training locally.

@DanielCoelho112
Owner

I don't think I ever had a negative loss.

The negative loss is due to the dynamic loss used. My point is that in your experiments I've never seen the test loss coming down, which is not a good sign.
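For context, assuming DynamicLoss follows the usual learned-uncertainty formulation for camera pose regression (Kendall & Cipolla), the loss can indeed go below zero because of the trainable log-variance terms, which are not bounded from below:

```latex
% Learned-uncertainty ("dynamic") pose loss: \hat{s}_x and \hat{s}_q are
% trainable log-variances (cf. the DynamicLoss(sx=0, sq=-1) initialization),
% so the total loss is not bounded below by zero.
\mathcal{L} = \lVert \mathbf{p} - \hat{\mathbf{p}} \rVert \, e^{-\hat{s}_x} + \hat{s}_x
            + \lVert \mathbf{q} - \hat{\mathbf{q}} \rVert \, e^{-\hat{s}_q} + \hat{s}_q
```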

I will try over the next few days to see what it can be; I already looked at previous versions of the code but didn't find anything. Are you free to have a meeting at the end of the week in case I don't find anything? Maybe you could help me.

I can't this week, but I'm free next Monday or Tuesday.

Out of curiosity, does batch size influence the results? Wouldn't it only matter if it were an RNN, since the loss is still averaged over the batch?

The batch size influences the results for all models. If the batch size is too small, you cannot guarantee that the batch is statistically representative of the full dataset, which in turn can lead to inaccurate gradients. Usually, the bigger the better. However, there are two main problems when using a really large batch size: the computational resources required, and poorer generalization (it's currently not well understood why this happens).
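To make the batch-size point concrete (this is just the standard mini-batch estimate, nothing specific to this project):

```latex
% Mini-batch loss and gradient: with uniformly sampled batches, the gradient
% is an unbiased estimate of the full-dataset gradient, and its variance
% shrinks roughly as 1/B, so a very small B gives noisy updates.
L_B(\theta) = \frac{1}{B} \sum_{i=1}^{B} \ell(x_i, \theta), \qquad
\mathbb{E}\!\left[\nabla_\theta L_B\right] = \nabla_\theta L_{\text{full}}, \qquad
\operatorname{Var}\!\left[\nabla_\theta L_B\right] \propto \frac{1}{B}
```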

@andrefdre
Collaborator Author

I will train in DeepLar to see if anything changes.

I can't this week, but I'm free next Monday or Tuesday.

If you want, I will probably be in LAR.

The batch size influences the results for all models. If the batch size is too small, you cannot guarantee that the batch is statistically representative of the full dataset, which in turn can lead to inaccurate gradients. Usually, the bigger the better. However, there are two main problems when using a really large batch size: the computational resources required, and poorer generalization (it's currently not well understood why this happens).

Okay, I think I got it. I was thinking only about the loss and not the backpropagation. Thank you for the clarification.

@andrefdre
Collaborator Author

The negative loss is due to the dynamic loss used. My point is that in your experiments I've never seen the test loss coming down, which is not a good sign.

I trained on DeepLar with a dataset without my implementations, and still the test loss is not coming down. I tried looking at older versions of the code to see what is different, but I can't really find anything that would lead to this.

./rgb_training -fn santuario -mn posenetgooglenet -train_set santuario_10k -test_set santuario_2k -n_epochs 50 -batch_size 40  -loss 'DynamicLoss(sx=0,sq=-1)' -c -im 'PoseNetGoogleNet(True,0.6)' -lr_step_size 150 -lr 1e-4 -lr_gamma 0.5 -wd 1e-3 -gpu 1

losses

@miguelriemoliveira
Collaborator

miguelriemoliveira commented Feb 24, 2023 via email

@andrefdre
Collaborator Author

I have created 3 small datasets with 1k images each to ask for your opinion.
In all of them the sun isn't random; instead, its rotation is based on the simulated time. Between 8 am and 7 pm the sun is on, while at night the other lights are on.

The first dataset has random lights and the sun creates shadows:
https://youtu.be/gpvPMZATkhk

The second has sequential lights and the sun creates shadows:
https://youtu.be/NkxhXsxXJGk

The third has random lights and the sun doesn't create shadows:
https://youtu.be/YI001r7ZS_E

I like the third option best; with the sun creating shadows, most of the time it doesn't have an impact on the viewed images and just makes the image darker.

@andrefdre
Collaborator Author

Here is a video demonstrating the use of data augmentation; I only changed the brightness value, to make it a fair comparison:
https://youtu.be/Y8KI9bcpeD4
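For reference, the brightness-only augmentation could be expressed like this (a sketch assuming torchvision; the jitter value is illustrative, not necessarily the one used in the video):

```python
# Brightness-only augmentation, keeping every other photometric property fixed
# so the comparison against the simulated lighting changes stays fair.
from torchvision import transforms

augmentation = transforms.Compose([
    transforms.ColorJitter(brightness=0.5),  # vary brightness only
    transforms.ToTensor(),
])
```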

@miguelriemoliveira
Collaborator

miguelriemoliveira commented Apr 14, 2023

Hi @andrefdre ,

great work.

I like the third option best; with the sun creating shadows, most of the time it doesn't have an impact on the viewed images and just makes the image darker.

I'm not sure about that. I mean, the random lights in the third case are a global change in the scene illumination, which suggests that if we were to do it instead on an image that captured the scene, the result would be more or less the same.

I think the game changer here should be the directional light and shadows, because those are the ones that are not possible to mimic using transformations at the image level, since they are non-uniform changes of light in the scene.

So I would try to keep the sun and the shadows, possibly even increasing their presence in the test dataset.

@andrefdre
Collaborator Author

So I would try to keep the sun and the shadows, possibly even increasing their presence in the test dataset.

The only way I think we could increase shadows is by opening more windows; do you think I should try it?

I also noticed something: since I can't disable lights, what I do to disable the sun is to disable its shadows (which creates the texture problem) and point it downwards. The sun also isn't affected by attenuation like the other lights. This all works fine, but when switching shadows on and off, Gazebo sometimes doesn't apply the change, keeping shadows off.
