Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor simulate.cc in preparation for Python bindings #201

Merged

Conversation

aftersomemath
Copy link
Contributor

@aftersomemath aftersomemath commented Mar 23, 2022

Hi,

I have refactored simulate.cc so that the ui is a standalone set of functions, that do not use global variables, and can be called from a C++ application and soon a Python application as discussed in #190.

I will rebase soon, but I was hoping to get some feedback sooner rather than later. I'm willing to see this through so it gets released sooner rather than later. I am simultaneously working on the pybind11 bindings.

It seems that ideally "simulateui.so" would be a member of the Mujoco standard library, but since that source is not available yet, I kept the relevant files in the sample directory for now.

Ciao

@aftersomemath
Copy link
Contributor Author

The branch is now rebased on main.

@saran-t
Copy link
Member

saran-t commented Mar 24, 2022

Thanks for the PR!

We haven't gotten around to writing a style guide yet (we'll have one in place by the time we open source), but in short we intend to follow the Google C++ Style Guide for all "new" C++ code.

By "new" we refer to C++ source files that were not part of the original MuJoCo codebase that we inherited from Roboti. The entirety of the python directory comes under this category. We diverted from certain parts of that style guide (primarily around macro usage due to existing MuJoCo X macros, and around template metaprogramming which was used to generate large parts of the bindings), but generally speaking we expect to review code against the guidelines and requirements spelled out there.

Beyond the style guide, I think we wouldn't want to place the bulk of simulate logic under python since we'll want to continue maintaining it as a standalone application. What's probably a better idea is to create a new top-level directory, break the existing simulate.cc down into binary and library parts under that directory. The library part would then be shared between the native simulate application and the Python bindings. If we structure things this way, I'd be happy to treat the newly split simulate library code as legacy, in which case we'd only apply the style requirements to the newly written code specifically for Python.

Are you interested in going through a reasonably thorough review to see if we can get this merged? Or would you prefer to iterate with us but treat the PR as a prototype, with a view of having us reimplementing it later?

@aftersomemath
Copy link
Contributor Author

Thanks!

This PR is supposed to break simulate.cc into the binary and library parts without touching the python directory. It sounds like you may have noticed that an extra copy of simulateui.cc was originally in the python folder. That was a mistake and it was fixed with a force push last night. There is only supposed to be a single version in the sample directory (as there is now).

I'll make the library parts into a real shared library soon and in the top-level directory as requested.

I would definitely appreciate treating the split simulate library as legacy code. It is almost the same code as is, most changes are just accessing what used to be global variables through a pointer to a structure.

I would like to go through a detailed review so that this PR can be merged. Once it is merged I will make a second PR that adds Python bindings for the new shared library in the same way that bindings were done for the rest of MuJoCo (I have already started working on those bindings).

$(CC) -c -O2 -fPIC -I../include uitools.c
$(CXX) -c -O2 -fPIC -I../include mjsimulateui.cc
$(CXX) -shared -o libmjsimulateui.so mjsimulateui.o uitools.o
mv libmjsimulateui.so ../lib/libmjsimulateui.so
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just keep the shared library in this folder for now? Or should I put it in this directory as if the library were part of a full MuJoCo build?

Copy link
Contributor Author

@aftersomemath aftersomemath Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should make separate lib, include directories in mjsimualteui? On the other hand, then mjsimulateui would be a separate library that depends on mujoco instead of something that ships as part of mujoco.

@@ -0,0 +1,96 @@
// Copyright 2021 DeepMind Technologies Limited
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not come up with a good way to avoid copying this file from the sample directory. If it were a part of the MuJoCo headers, it would not need to be copied.

@@ -16,6 +16,3 @@ all:
clang++ $(ALLFLAGS) derivative.cc -lmujoco.2.1.3 -o derivative
clang++ $(ALLFLAGS) basic.cc -lmujoco.2.1.3 -lglfw -o basic
clang++ $(ALLFLAGS) record.cc -lmujoco.2.1.3 -lglfw -o record
clang -c $(CFLAGS) uitools.c
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a way to test the macos makefiles. I could do it for windows if I setup a new development environment, but that would take a noticeable amount of time.

Assistance with the windows and macos makefiles would be very welcome!

@saran-t
Copy link
Member

saran-t commented Mar 29, 2022

Sorry for the delay, we've been discussing this change within the team and we think it would be best to structure it as follows:

  • Name the directory simulate.
  • Put the library components in simulate.c, with a corresponding simulate.h header.
  • Put the logic for the actual simulate binary into main.c (but the built binary will still be called simulate).

Regarding array_safety.h, this is actually forked from an internal header that is due to be open source with the rest of the codebase in the next release. For now, let's just fork it into simulate directory and I'll try to clean up all these forks when we open source.

Does this all sound reasonable to you?

@aftersomemath
Copy link
Contributor Author

aftersomemath commented Mar 29, 2022

Yes, I would be happy to make those changes.

Changing the file name seems to imply that the struct's type name should change. How about:

  • mjSimulateUI will be renamed to mjSimulate
  • Instances of mjSimulateUI which are currently named sui will be renamed to just s.
  • mjsui_* functions will be renamed to mjs_*

I'm happy to name them anything else. Perhaps the team wants the mj(s) prefix dropped entirely?

The python bindings are about halfway done and depend on these names, so I was hoping we could settle on the names in this PR.

@saran-t
Copy link
Member

saran-t commented Mar 29, 2022

Yes, I would be happy to make those changes.

Changing the file name seems to imply that the struct's type name should change. How about:

  • mjSimulateUI will be renamed to mjSimulate

Agreed.

  • Instances of mjSimulateUI which are currently named sui will be renamed to just s.

How about simulate?

  • mjsui_* functions will be renamed to mjs_*

mjs_ sounds good.

@aftersomemath
Copy link
Contributor Author

Great, all requested changes have been implemented.

// See the License for the specific language governing permissions and
// limitations under the License.

#ifndef MUJOCO_SIMULATEUI_H_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please modify the header guard to reflect the new filename.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I renamed it to MUJOCO_SIMULATE_H which is the convention used in the rest of the MuJoCo include files.

@aftersomemath
Copy link
Contributor Author

I put everything in the mujoco namespace. The review comment about it seems to have disappeared.


#include "uitools.h"

//-------------------------------- global -----------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please place everything in a namespace mujoco

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should a nested namespace 'simulate' be created inside of the 'mujoco' namespace?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we generally prefer to keep to shallow namespaces. Rationale: https://abseil.io/tips/130

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said that I do admit to having used mujoco::python in my Python bindings code, I'm in the process of reconsidering that too.

//-------------------------------- global -----------------------------------------------

// constants
static constexpr int mjsMAXFILENAME = 1000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this a static constexpr member of mjSimulate and rename it kMaxFilenameLength to conform with the Google style guide (it's a new variable after all).

};
typedef struct mjSimulate mjSimulate;

// load mjb or xml model
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually since we're in C++ I think it would make more sense to turn mjSimulate into a class and make these member functions instead.


//-------------------------------- global -----------------------------------------------

const int maxgeom = 5000; // preallocated geom array in mjvScene
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and other constants, and all local functions) should be declared static or placed in an unnamed-namespace.

namespace mju = ::mujoco::sample_util;

// constants
const double syncmisalign = 0.1; // maximum time mis-alignment before re-sync
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: put everything apart from the main function in an unnamed-namespace

simulate/main.cc Outdated
}

// start exclusive access
mtx.lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well switch to using std::lock_guard while we're at it.

simulate/main.cc Outdated
// run event loop
while (!glfwWindowShouldClose(simulate.window) && !simulate.exitrequest) {
// start exclusive access (block simulation thread)
mtx.lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

@aftersomemath
Copy link
Contributor Author

aftersomemath commented Mar 29, 2022

All requested changes have been made.

I chose to leave all class members as public because all elements of all MuJoCo's structs are public, main assumes several things are public, and I expect future users from Python will want easy access to numerous elements of mjSimulate.

I did not create a constructor, move constructor, or destructor. It seems the defaults should be ok for such a simple class whose variables are all C structs or primitives. However, I'm not a C++ expert, so I thought I should mention this.

@saran-t
Copy link
Member

saran-t commented Mar 30, 2022

I did not create a constructor, move constructor, or destructor. It seems the defaults should be ok for such a simple class whose variables are all C structs or primitives. However, I'm not a C++ expert, so I thought I should mention this.

The trouble is that the simulate object as it stands is partially initialized (it contains a bunch of members that aren't initialized). I'd turn simulate.init() into the class' default constructor. This entails doing nontrivial work in the c'tor that can fail, but since we're right now we're mainly focussed on just pulling legacy code apart I think that's better than creating a struct with uninitialized members.

simulate/main.cc Outdated
//---------------------------------- simulation --------------------------------------

// sim thread synchronization
std::mutex mtx;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::mutex& GetMutex() {
  static std::mutex* mtx = new std::mutex();
  return *mtx;
}

Then replace all reference to mtx everywhere in the file with GetMutex() instead. Avoids potential issues arising from destruction ordering (see e.g. https://stackoverflow.com/a/469613/845532)


// Simulate states not contained in MuJoCo structures
struct mjSimulate {
class mjSimulate {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this to just Simulate since we have the namespace now.


// Simulate states not contained in MuJoCo structures
struct mjSimulate {
class mjSimulate {
public:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add one space before public:

struct mjSimulate {
class mjSimulate {
public:
// initalize simulate ui
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initialize (preexisting typo)

// initalize simulate ui
void mjSimulate::init(void) {
// print version, check compatibility
std::printf("MuJoCo version %s\n", mj_versionString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go in main rather than here.

simulate/main.cc Outdated
const double refreshfactor = 0.7; // fraction of refresh available for simulation

// model and data
mjModel* m = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In C++ source, let's change NULL to nullptr everywhere.

simulate/main.cc Outdated
// model and data
mjModel* m = NULL;
mjData* d = NULL;
mj::mjSimulate simulate;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my comment about mtx below, this should be

mj::Simulate& GetInstance() {
  static mj::Simulate* simulate = new mj::Simulate();
  return *simulate;
}

Then use GetInstance() everywhere where simulate is referred to now in this file.

@aftersomemath
Copy link
Contributor Author

aftersomemath commented Mar 30, 2022

Thanks for the great feedback. I was unaware of the destruction ordering problem.

All changes were implemented. The init method is now the constructor.

Due to the new changes to simulate.cc on main I think it would be easiest to squash and rebase in order to fix the merge conflicts. However, I do not want to squash the commits on this branch yet though as it will make the conversation on this PR difficult to read. Once everything is settled I can squash before a final lookover. If you would rather keep all these commits I can skip squashing and just rebase.

@saran-t
Copy link
Member

saran-t commented Mar 30, 2022

I've approved the PR so that our tooling will pick it up and start the migration process to our internal repo. If there are minor changes needed I'll do them myself on our side. If something bigger needs to be modified I'll get back to you.

In the meantime, please go ahead and rebase+squash.

@aftersomemath
Copy link
Contributor Author

Great thanks. The branch has been rebased and squashed.

@aftersomemath
Copy link
Contributor Author

Should I rebase this again? I don't want to conflict with what might be going on behind the scenes after #217 was fixed.

@aftersomemath
Copy link
Contributor Author

Rebased again to fix conflicts in the gcc makefile's and workaround the OpenGL shared library dependency being stripped by GNU's ld as a side effect of the switch to GLAD.

glfwGetWindowSize(this->window, this->windowsize, this->windowsize+1);

// make context current, set v-sync
glfwMakeContextCurrent(this->window);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the Simulate is supposed to encapsulate the entire window and rendering context, it really should create and maintain its own thread. As things stand, this class cannot be safely use alongside any other GLFW (or indeed OpenGL) code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, my inexperience with GLFW and OpenGL is showing.

I'll refactor the rendering thread currently run from main() so that it is a part of Simulate. The incorrectly named simulate_thread in main.cc will be run from main() instead.

Simulate's constructor will be modified to accept a mutex that will be used by the thread it launches to control access to the model and data structures.

The interaction between the rendering thread, the physics thread, and the ctrlnoise variable that occurs when a model xml file is drag and dropped will be reworked as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it'll be appropriate to launch the thread in the object's constructor. A better way to do it would be to create the window in the constructor, and have methods for launching and stopping the encapsulated thread.

About the mutex: why not have the Simulate class own it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree using methods to start/stop Simulate's thread is better than launching the thread from the constructor. While there are already things that can fail in the constructor, the thread would be too much.

I think the user of Simulate should own the mutex to because multiple Simulate instances should be able to launched simultaneously on the same model and data instances, and then it doesn't seem to make sense for Simulate to contain the mutex. Additionally, I can envision other objects with purpose like Simulate (maybe logger or simulated camera rendering pool) that would also require exclusive access to the model and data using the same mutex.

Because of this I also think the model and data should be arguments to Simulate's constructor.

The creation of the model and data struct's will need to be reworked to be outside of Simulate's direct control in order for this to make sense, but I think it is worth it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, I thought that reworking Simulate to support multiple Simulate's running on the same model and data should be a second PR since it requires a decent amount of change in the model loading code and seems to be more than a refactoring to me. But maybe it is more appropriate to just do that in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuvaltassa pointed out that there might be some limitation on which thread is allowed to make rendering calls on macOS. It might be necessary to create the window on the same thread on which you intend to make rendering calls.

I don't suppose you're able to check things on macOS? If that's the case, I can try it out myself once you have it working on Linux to confirm.

Copy link
Contributor Author

@aftersomemath aftersomemath Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried about something like that popping up. Would a fix be to make the window related variables private (everything will need to be made private anyway), and let Simulate's thread create the window instead of the constructor?

This seems like it would be fine given that, if Simulate creates its own thread, then the interface is going to change significantly to just be:

  • the constructor
  • the start thread function
  • the stop thread function
  • threadsafe getters of key Simulate variables that users may be interested in (perturbations, exitrequest, etc)

Copy link
Contributor Author

@aftersomemath aftersomemath Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer the question, no I don't have a macOS machine to check things (which is why there is no macOS Makefile in this PR either...)

@aftersomemath
Copy link
Contributor Author

aftersomemath commented Jul 8, 2022

No worries. I rebased the branch (and brought in all changes to simulate on main), do you want it squashed? There's not much in the commit history that is of interest to the main branch.

@nimrod-gileadi
Copy link
Collaborator

Hey, we're still working on getting this to work on mac and Windows, and verifying that the change is good.

I noticed that after this change, clicking the exit button on simulate hangs the program, both on linux and on mac. Any ideas?
I'll continue debugging on our side.

@nimrod-gileadi
Copy link
Collaborator

Never mind, resolved now (was missing a simulate.exitrequest = 1 before physicsthreadhandle.join()).

@aftersomemath
Copy link
Contributor Author

aftersomemath commented Jul 26, 2022

Great, sorry I missed that. Getting the exit button to work after the refactor was tricky for various reasons and so I missed that race condition (exit button was working for me on all platforms).

I did test on Mac and Windows before pushing back in June, all that changed since was rebasing. The CI build only failed because the main branch was broken at the time. So I am interested in whatever other issues have popped up.

@nimrod-gileadi
Copy link
Collaborator

On mac, there were issues with creating the app bundle:

  1. the new libmjsimulate was built for dynamic linking but not included in the bundle.
  2. the sample binaries were not included in the bundle.

On Windows, I haven't tested yet.

@yuvaltassa
Copy link
Collaborator

@aftersomemath

Unfortunately CI passing means less for simulate than for everything else. It's very difficult to automatically test that a GUI app actually does what you want. You'd be shocked to learn how many reputable GUI projects' "CI" is actually underpaid humans somewhere far away following a script clicking on buttons...

That's the main reason this is taking so long, but we totally agree with you on the importance and usefulness of this change. Personally, I can't wait for it to go through. Enabling researchers to easily build GUI apps with interactive physics is one of our main goals, and this is definitely an important first step towards that.

@aftersomemath
Copy link
Contributor Author

For sure GUI's are a pain, I can't claim to have tested every possible thing after this change (though I did try "many" and my group is using this in a few different projects now without issue). Glad to hear your side is doing more thorough testing.

To be clear, I did actually build and run this branch on Windows and Mac OS (x86) (which incidentally got CI to pass earlier). I did not mean to suggest it should be good to go just because CI passed. Next time I'll be more careful to make sure the app bundle is tested (though currently I don't know what a bundle is, I must have built it on Mac OS as a non-bundle before).

Looking forward to seeing this through. :)

@liuliu
Copy link
Contributor

liuliu commented Jul 26, 2022

Because MuJoCo has good separation of UI events v.s. the underlying framework (GLFW or other GL* rendering engine). It is possible to expose the event ingestion (I exposed that in my fork: https://github.com/liuliu/swift-mujoco/blob/main/Sources/Simulate.swift#L190) and write a few snapshot tests to validate the UI behaves as expected. The UI events separation can even enable some other use cases for me as well, here is an example of running simulate inside web browser: https://github.com/liuliu/sim2real/blob/main/examples/simulate2/main.swift#L10 (very much WIP though).

@saran-t
Copy link
Member

saran-t commented Aug 1, 2022

Sorry that this is still dragging on, we're very close to being able to merge this. There's some tidying up we had to do in CMake and some thread synchronisation logic but it's all working now. We also had to track down a mysterious crash on Windows, which is now resolved (we removed the stack size increase in CMakeLists, which led to a stack overflow due to the Simulate object being too big, but we didn't notice that we made that modification in the CMakeLists so were left scratching our heads for a while).

@aftersomemath
Copy link
Contributor Author

Great! All of those things were tricky... The stack overflow took me a while as well until I loaded up the debugger.

I am not sure if increasing the stack size on windows is a good decision. I did it because Windows was the odd one out compared to Mac and Linux.

@saran-t
Copy link
Member

saran-t commented Aug 1, 2022

I just moved the object to the heap instead (std::unique_ptr<mj::Simulate>)

@copybara-service copybara-service bot merged commit e40b9da into google-deepmind:main Aug 3, 2022
@saran-t
Copy link
Member

saran-t commented Aug 3, 2022

@aftersomemath this is now merged, you can check the diff to see what modifications we've made to your commit.

Note that I've decided to go with static linking of the simulate lib into the app since that means we won't have to handle a separate DLL. If this doesn't work for your use case, please make another PR to add an option to allow linkage of the simulate lib to be specified as a CMake option.

Thanks for your help and your patience again!

@aftersomemath
Copy link
Contributor Author

aftersomemath commented Aug 3, 2022

For sure, I was happy to work on this. Thanks for the teams patience with the minor issues found at the end. :)

I will make a PR soon to propose python bindings for simulate. If the team decides they want to go a different direction it can end with just adding the option to install simulate as a shared or static library. That option will make it much easier for those bindings to be maintained in our fork.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants