Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamics randomization for MuJoCo #109

Closed
wants to merge 30 commits into from
Closed

Conversation

ghost
Copy link

@ghost ghost commented May 30, 2018

A simple data structure consisting of a list of variation objects was
implemented. Each variation object is an instance of the Variation class
that works as a container for each of the fields used to randomized a
dynamic parameter within the simulation environment.
This list of variations is further tested in script
test_dynamics_rand.py to verify that fields within each variation can be
set and get. The former script was originally started at issue #61.

@ghost ghost assigned CatherineSue and ghost May 30, 2018
CatherineSue
CatherineSue previously approved these changes May 30, 2018
tosser = ET.parse(TOSSER_XML)

var_list = VariationsList().\
add_variation(".//motor[@name='a1']", "gear", "coefficient", "uniform", (0.5, 1.5)).\
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A fluent interface would be more like

variations = Variations()
variations.randomize()
          .attribute("gear")
          .at_xpath(".//motor[@name='a1']")
          .with_method(Variations.COEFFICIENT)
          .sampled_from(Variations.UNIFORM)
          .with_range(0.5, 1.5)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's what @CatherineSue suggested but I thought it's a little verbose, although more understandable.
So, Variations contains a list of variation objects right?
And if that is the case, is the randomize() method used to create a new variation object?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well they contain some way of storing variations. a list of objects seems fine. randomize() create a new internal record of a variation.

fluent interfaces are verbose but easy to read. yes, you could use a very long list of arguments, but what if the relevant arguments change based on other arguments?

for instance, if i want a uniform distribution i need a range=(min,max) argument, but if I use a guassian i need a (mean, std) argument. it gets messy very quickly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. @ryanjulian, please review the new implementation. Also, me and @CatherineSue discussed the implementation of the wrapper for the randomized environment and work is in progress on that side as well.


@method.setter
def method(self, method):
self._method = method
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variation objects should be immutable (i.e. there are no setters)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I just noticed this. There's a trade-off between having the fluent interface pattern and having immutable properties.
In order to set the members of the Variation object without setters, we have to pass their attributes in the constructor, and that breaks the interface pattern. Otherwise, without the setter decorators and with the members that correspond to each property not previously defined, we get this kind of errors:

Traceback (most recent call last):
  File "rllab/dynamics_randomization/trpo_swimmer.py", line 13, in <module>
    at_xpath(".//geom[@name='torso']").\
  File "/home/aigonzal/ivanWorkspace/rllab/rllab/dynamics_randomization/variation.py", line 114, in at_xpath
    self._list[-1].xpath = xpath
AttributeError: can't set attribute

Another way to do it is by setting the members of the class Variation directly in Variations, but that breaks the immutability of the property.
Any advice of how to achieve this is well appreciated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to use a dictionary to store the variation then add it to Variation class at last. But I found that we need the v.elem=e in MujocoModelGenerator, this needs to call the elem setter in Variation. So I didn’t think of a way to implement immutable Variation after all.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For how to implement it, see builder pattern.

To solve the problem with v.elem = e, the solution is to keep a local cache of those elems in MujocoModelGenerator, rather than storing the data inside variations. That would be the proper way to do it--I just used the dict in my example as a shortcut.

Note that you can use the variation object directly as a key in your dictionary, i.e.

elem_cache = {}

# caching elems
for v in variations:
    # find elem, = e
    elem_cache[v] = e

# making models
for v in variations:
    e = elem_cache[v]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Done

@CatherineSue
Copy link
Collaborator

The rllab/dynamics_randomization/trpo_swimmer.py tests the run of rllab.dynamics_randomization.RandomizeEnv. It runs successfully in
macOS, but it raises this exception in Linux:

mujoco_py.builder.MujocoException: Got MuJoCo Warning: Unknown warning type Time = 0.0000.

I have reinstalled the conda env, mujoco150 and mujoco-py to ensure
the test runs on the same env. So there may be missing configure in
Linux, or it's just a bug.

@CatherineSue
Copy link
Collaborator

@gonzaiva found someone with a similar error message here:
pfnet/gym-env-mujoco150#3

Copy link
Author

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, all the python files in rllab have file names in lowercase with each word separated by an underscore, so we should keep this standard with RandomizeEnv.py to randomized_env.py

import os.path as osp
import numpy as np
from rllab.dynamics_randomization import VariationMethods
from rllab.dynamics_randomization.variation import VariationDistributions
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to from rllab.dynamics_randomization import VariationDistributions

from mujoco_py import MjSim
from rllab.core import Serializable

MODEL_DIR = osp.abspath(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reuse this variable from mujoco_env.py

elif v.method == VariationMethods.ABSOLUTE:
e.attrib[v.attrib] = str(c)
else:
raise NotImplementedError("Unknown method")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add one NotImplementedError for the if statement of v.distribution as well.

self._wrapped_env.model = load_model_from_xml(model_xml)
self._wrapped_env.sim = MjSim(self._wrapped_env.model)
self._wrapped_env.data = self._wrapped_env.sim.data
self._wrapped_env.viewer = None
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only set those variables that depend on self._wrapped_env.sim

return self._wrapped_env.horizon


randomize = RandomizedEnv
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this line for?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be here. will delete in the new commit.

@ghost ghost requested a review from jonashen May 31, 2018 23:27
@ghost
Copy link
Author

ghost commented Jun 1, 2018

@ryanjulian we're able to run our current implementation of Dynamics Randomization in OS X (tested in both @CatherineSue and @jonashen laptops), however Linux is a hard cookie to crack!
According to mujoco-py, Linux installation requires some dependencies as specified here, while OS X does not require to install any dependency, at the first instance this seems to be problem. However, we tried installing the dependencies and the error keeps happening.
Another thing we tried is to downgrade the mujoco-py package but the error persists.
I still think there's a dependency we're missing but there's no clear clue of what it could be.
Another thing worth mentioning is that the error usually happens in the second episode of the first iteration, but sometimes it occurs in other iterations like iteration 19, so the error cannot be exactly reproduced all the time.
Full error is:

0% [######################        ] 100% | ETA: 00:00:00Traceback (most recent call last):
  File "rllab/dynamics_randomization/trpo_swimmer.py", line 39, in <module>
    algo.train()
  File "/home/aigonzal/ivanWorkspace/rllab/rllab/algos/batch_polopt.py", line 119, in train
    paths = self.sampler.obtain_samples(itr)
  File "/home/aigonzal/ivanWorkspace/rllab/rllab/algos/batch_polopt.py", line 30, in obtain_samples
    scope=self.algo.scope,
  File "/home/aigonzal/ivanWorkspace/rllab/rllab/sampler/parallel_sampler.py", line 125, in sample_paths
    show_prog_bar=True
  File "/home/aigonzal/ivanWorkspace/rllab/rllab/sampler/stateful_pool.py", line 181, in run_collect
    result, inc = collect_once(self.G, *args)
  File "/home/aigonzal/ivanWorkspace/rllab/rllab/sampler/parallel_sampler.py", line 94, in _worker_collect_one_path
    path = rollout(G.env, G.policy, max_path_length)
  File "/home/aigonzal/ivanWorkspace/rllab/rllab/sampler/utils.py", line 20, in rollout
    next_o, r, d, env_info = env.step(a)
  File "/home/aigonzal/ivanWorkspace/rllab/rllab/envs/normalized_env.py", line 86, in step
    wrapped_step = self._wrapped_env.step(scaled_action)
  File "/home/aigonzal/ivanWorkspace/rllab/rllab/dynamics_randomization/RandomizeEnv.py", line 74, in step
    return self._wrapped_env.step(action)
  File "/home/aigonzal/ivanWorkspace/rllab/rllab/envs/mujoco/swimmer_env.py", line 33, in step
    self.forward_dynamics(action)
  File "/home/aigonzal/ivanWorkspace/rllab/rllab/envs/mujoco/mujoco_env.py", line 180, in forward_dynamics
    self.sim.step()
  File "mjsim.pyx", line 119, in mujoco_py.cymj.MjSim.step
  File "cymj.pyx", line 115, in mujoco_py.cymj.wrap_mujoco_warning.__exit__
  File "cymj.pyx", line 75, in mujoco_py.cymj.c_warning_callback
  File "/home/aigonzal/anaconda2/envs/rllab3/lib/python3.5/site-packages/mujoco_py/builder.py", line 331, in user_warning_raise_exception
    raise MujocoException('Got MuJoCo Warning: {}'.format(warn))
mujoco_py.builder.MujocoException: Got MuJoCo Warning: Unknown warning type Time = 0.0000.

This is quite a show stopper in Linux, and it points to other bugs that may happen with mujoco in rllab. Any feedback you could give is highly appreciated.

from rllab.envs.mujoco.mujoco_env import MODEL_DIR
from rllab.core import Serializable
from rllab.dynamics_randomization import VariationMethods
from rllab.dynamics_randomization.variation import VariationDistributions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alphabetical order c->d->e

import os.path as osp
import xml.etree.ElementTree as ET

import numpy as np
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m->n

from rllab.baselines import LinearFeatureBaseline
from rllab.envs.mujoco import SwimmerEnv
from rllab.envs import normalize
from rllab.policies import GaussianMLPPolicy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alphabetical order

@ghost
Copy link
Author

ghost commented Jun 4, 2018

The steps to produce the issue in Linux are:

  1. Make a fresh install of rllab to include the update for mujoco 1.5 by executing the installers on tip of the feature branch named "dynamics_randomization".
  2. Activate the rllab3 environment
  3. Run the script rllab/dynamics_randomization/trpo_swimmer.py

@ghost ghost force-pushed the dynamics_randomization branch from 930f894 to 83c2d48 Compare June 4, 2018 23:27
@CatherineSue CatherineSue force-pushed the dynamics_randomization branch 2 times, most recently from 3b6f4a8 to 9fb3106 Compare June 5, 2018 00:31
Copy link
Owner

@ryanjulian ryanjulian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize that I don't have time today for a full review of the fluent interface, but please see my comment about your implementation of the multithreading.

I'm very curious to see benchmarks of the threaded vs non-threaded versions.

"""
if self._list:
self._list[-1].distribution = distribution
return self
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. if you return self then the API with_range() will be available after the dot operator.

e.g.

v.randomize()
 .attribute("gear")
 .at_xpath("something")
 .with_method(COEFFICIENT)
 .sampled_from(GAUSSIAN)
 .with_range(0.5, 1.5)

but .with_range() is not valid for .sampled_from(GAUSSIAN). you would want something like .with_parameters((1, 0.5)) or .with_mean_std(1, 0.5)

One way to solve this would be to return an instance of some new class without the method .with_range(), and then Python would throw an error at import time. That would be ideal but perhaps difficult to implement.

Another way is to change your API slightly to look like this

v.randomize()
 .attribute(gear)
 .at_xpath("something")
 .with_method(COEFFICIENT)
 .sampled_from(GAUSSIAN)
 .with_range(0.5, 1.5)
 .add()

Where add() is void (it doesn't return anything), and the new variation record is not constructed and added to the list of variations until add() is called, e.g.

class Variations():

    def randomize():
        return VariationSpec(self)


class Variation():

    def __init__(self, spec):
        # check a bunch of conditions on spec, raise an exception if it's wrong or incomplete or overspecified


class VariationSpec():
    
    def __init__(self, variations):
        self._variations = variations

    def with_xpath(xpath):
        self._xpath = xpath
        return self

    def add():
        self._variations.add(Variation(self))
        

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this turns out too much a pain to implement, on the other hand, it might be a sign that a fluent interface is not the right fit here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Implemented the first approach by segmenting the Variations class into different classes, one base class, and two classes for Gaussian and uniform distributions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To remove the setter in Variation, I changed the API using the second approach.

@@ -0,0 +1,96 @@
import os.path as osp
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this should be a subpackage of rllab.envs.mujoco, as it's only relevant to MuJoCo environments.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Should I replace the new rllab.env.mujoco import for all packages in rllab?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you're asking. Can you clarify?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To address this issue, I added osp in the modules of the package rllab.env.mujoco, so now we can import osp as from rllab.envs.mujoco import osp.
I noticed other modules in rllab import osp as import os.path as osp. My question is if I should replace those imports withfrom rllab.envs.mujoco import osp.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the dynamics_randomization package should be moved to rllab.envs.mujoco.dynamics_randomization?

for v in variations.get_list():
e = self._model.find(v.xpath)
if not e:
raise AttributeError("Can't find node in xml")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a ValueError, not an AttributeError.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your error messages would be more helpful if they told me which node could not be found in which file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

v.elem = e

if v.attrib not in e.attrib:
raise KeyError("Attribute doesn't exist")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite leading to a KeyError, this is probably still a ValueError, not a KeyError.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be more helpful if it told me which attribute could not be found on which node in which file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

self._model = etree.parse(self._file_path)

for v in variations.get_list():
e = self._model.find(v.xpath)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be careful to check for multiple results here.

This is an API question: what if an XPath matches multiples nodes
Should we:

  1. Raise an error
  2. Attempt to modify all matching nodes
  3. Something else?

I vote for (1) but I'm interested in your opinion. The problem I have with (2) is that it is hard to tell the difference between matching multiple nodes on accident (due to a malformed XPath) versus someone legitimately wanting to address multiple nodes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the lxml document. The ElementTree library in lxml supports:findall() returns a list of matching Elements, find() efficiently returns only the first match. So if we use find(), there will be no error.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!


variations = Variations()
variations.randomize().\
attribute("gear").\
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually when people format method chains onto multiple lines, they break before the .
e.g.

variations.randomize() \
          .attribute("gear")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on it.

from enum import Enum


class VariationMethod(Enum):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just call this Method since its use is clear inside the package.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ABSOLUTE = 2


class VariationDistribution(Enum):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just call this Distribution since its use is clear inside the package.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

import atexit
from queue import Queue
from threading import Event
from threading import Thread
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be important to benchmark this to ensure threads (as opposed to processes) actually speed things up.

Python has a design detail called the GIL (global interpreter lock), which means that usually only one thread inside a Python application may run at once. This does not necessarily apply to parts of Python programs which are implemented using native code (e.g. mujoco), but whether that code is blocked by the GIL is very implementation-specific.

self._mujoco_model = None
# Communicates the calling thread with the worker thread by awaking
# the worker thread so as to generate a new model.
self._model_requested = Event()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great implementation, but I think you can make it even simpler by using the maxsize parameter of Queue

When maxsize != 0, put() blocks when the queue is full, until get() is called. By default, get() blocks when the queue is empty, until something is put() onto the queue.

@CatherineSue
Copy link
Collaborator

Benchmarks of running trpo_swimmer

multi-thread

$ cat bench_multi_itr40.txt | head -40
         27602992 function calls (27506126 primitive calls) in 64.712 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   2469/1    0.108    0.000   64.868   64.868 {built-in method builtins.exec}
        1    0.000    0.000   64.868   64.868 trpo_swimmer.py:1(<module>)
        1    0.004    0.004   56.190   56.190 batch_polopt.py:114(train)
       40    0.000    0.000   45.798    1.145 batch_polopt.py:24(obtain_samples)
       40    0.000    0.000   45.796    1.145 parallel_sampler.py:98(sample_paths)
       40    0.002    0.000   45.786    1.145 stateful_pool.py:121(run_collect)
      320    0.293    0.001   45.761    0.143 parallel_sampler.py:92(_worker_collect_one_path)
      320    1.198    0.004   45.467    0.142 utils.py:10(rollout)
   160000    1.889    0.000   18.428    0.000 normalized_env.py:77(step)
   160763    9.610    0.000   18.323    0.000 function_module.py:743(__call__)
   160000    1.703    0.000   17.649    0.000 gaussian_mlp_policy.py:124(get_action)
      205    0.003    0.000   16.644    0.081 __init__.py:1(<module>)
   160000    0.132    0.000   15.236    0.000 randomized_env.py:46(step)
   160000    1.980    0.000   15.104    0.000 swimmer_env.py:32(step)
       40    0.001    0.000    9.903    0.248 npo.py:101(optimize_policy)
       40    0.006    0.000    8.754    0.219 conjugate_gradient_optimizer.py:229(optimize)
        6    0.000    0.000    8.112    1.352 ext.py:123(compile_function)
        6    0.000    0.000    8.112    1.352 function.py:74(function)
        6    0.000    0.000    8.109    1.351 pfunc.py:283(pfunc)
        6    0.000    0.000    8.079    1.346 function_module.py:1765(orig_function)
      440    0.010    0.000    6.604    0.015 conjugate_gradient_optimizer.py:49(eval)
   160000    1.690    0.000    6.388    0.000 mujoco_env.py:177(forward_dynamics)
       40    0.013    0.000    6.264    0.157 krylov.py:7(cg)
        6    0.001    0.000    6.227    1.038 function_module.py:1428(__init__)
        6    0.000    0.000    6.138    1.023 opt.py:102(__call__)
    485/6    0.002    0.000    6.138    1.023 opt.py:85(optimize)
     24/6    0.001    0.000    6.138    1.023 opt.py:223(apply)
   487207    0.758    0.000    5.508    0.000 op.py:891(rval)
  2600/70    0.013    0.000    5.354    0.076 <frozen importlib._bootstrap>:966(_find_and_load)
  2600/70    0.010    0.000    5.352    0.076 <frozen importlib._bootstrap>:939(_find_and_load_unlocked)
 3453/202    0.003    0.000    5.335    0.026 <frozen importlib._bootstrap>:214(_call_with_frames_removed)
 2259/137    0.010    0.000    5.297    0.039 <frozen importlib._bootstrap>:659(_load_unlocked)
      538    0.003    0.000    5.248    0.010 op.py:912(make_thunk)
      420    0.005    0.000    5.233    0.012 op.py:826(make_c_thunk)
  2005/72    0.007    0.000    5.171    0.072 <frozen importlib._bootstrap_external>:659(exec_module)

single thread

$ cat bench_single_itr40.txt | head -40
         27753481 function calls (27656615 primitive calls) in 55.331 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   2469/1    0.107    0.000   55.475   55.475 {built-in method builtins.exec}
        1    0.000    0.000   55.475   55.475 trpo_swimmer_single_thread.py:1(<module>)
        1    0.006    0.006   46.967   46.967 batch_polopt.py:114(train)
       40    0.000    0.000   36.747    0.919 batch_polopt.py:24(obtain_samples)
       40    0.000    0.000   36.745    0.919 parallel_sampler.py:98(sample_paths)
       40    0.002    0.000   36.734    0.918 stateful_pool.py:121(run_collect)
      320    0.237    0.001   36.711    0.115 parallel_sampler.py:92(_worker_collect_one_path)
      320    0.946    0.003   36.474    0.114 utils.py:10(rollout)
      205    0.003    0.000   16.372    0.080 __init__.py:1(<module>)
   160000    1.750    0.000   14.331    0.000 normalized_env.py:77(step)
   160761    7.742    0.000   14.002    0.000 function_module.py:743(__call__)
   160000    1.334    0.000   12.020    0.000 gaussian_mlp_policy.py:124(get_action)
   160000    0.115    0.000   11.403    0.000 randomized_env_single_thread.py:69(step)
   160000    1.733    0.000   11.288    0.000 swimmer_env.py:32(step)
       40    0.001    0.000    9.754    0.244 npo.py:101(optimize_policy)
       40    0.006    0.000    8.689    0.217 conjugate_gradient_optimizer.py:229(optimize)
        6    0.000    0.000    8.038    1.340 ext.py:123(compile_function)
        6    0.000    0.000    8.038    1.340 function.py:74(function)
        6    0.000    0.000    8.035    1.339 pfunc.py:283(pfunc)
        6    0.000    0.000    8.006    1.334 function_module.py:1765(orig_function)
      440    0.010    0.000    6.592    0.015 conjugate_gradient_optimizer.py:49(eval)
      320    0.001    0.000    6.279    0.020 normalized_env.py:51(reset)
      320    1.087    0.003    6.278    0.020 randomized_env_single_thread.py:36(reset)
        6    0.000    0.000    6.272    1.045 function_module.py:1428(__init__)
       40    0.013    0.000    6.256    0.156 krylov.py:7(cg)
        6    0.000    0.000    6.046    1.008 opt.py:102(__call__)
    485/6    0.002    0.000    6.046    1.008 opt.py:85(optimize)
     24/6    0.001    0.000    6.046    1.008 opt.py:223(apply)
  2600/70    0.013    0.000    5.351    0.076 <frozen importlib._bootstrap>:966(_find_and_load)
  2600/70    0.010    0.000    5.349    0.076 <frozen importlib._bootstrap>:939(_find_and_load_unlocked)
 3453/202    0.004    0.000    5.331    0.026 <frozen importlib._bootstrap>:214(_call_with_frames_removed)
 2259/137    0.010    0.000    5.290    0.039 <frozen importlib._bootstrap>:659(_load_unlocked)
  2005/72    0.007    0.000    5.158    0.072 <frozen importlib._bootstrap_external>:659(exec_module)
      538    0.003    0.000    5.116    0.010 op.py:912(make_thunk)
       48    0.073    0.002    5.101    0.106 opt.py:2394(apply)

from rllab.policies import GaussianMLPPolicy
from rllab.dynamics_randomization import RandomizedEnv
from rllab.dynamics_randomization import Variations
from rllab.dynamics_randomization import VariationMethod
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose we make this a wrapper which looks a lot like normalized().

So we put these classes in rllab.envs.mujoco.dynamics_randomization. At the bottom of dynamics_randomization.py you can put

# dynamics_randomization.py
randomize_dynamics = RandomizedDynamicsEnv

So that users can use it in a launcher like this

# in your launcher
from rllab.envs.mujoco.dynamics_randomization import randomize_dynamics    # currently RandomizedEnv
from rllab.envs.mujoco.dynamics_randomization import Variations
from rllab.envs.mujoco.dynamics_randomization import Method    # currently VariationMethod
from rllab.envs.mujoco.dynamics_randomization import Distribution    # currently VariationDistribution

variations = Variations()
variations.randomize()\
          .at_xpath(".//geom[@name='torso']")\
          .attribute("density")\
          .with_method(Method.COEFFICIENT)\
          .sampled_from(Distribution.UNIFORM)\
          .with_range(0.5, 1.5)

env = randomize_dynamics(SwimmerEnv(), variations)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, after trying the module I actually agree with some of your original naming. This module is general enough to randomize any part of an environment (good job!)

So the new vocabulary would be randomize = RandomizedEnv

@@ -0,0 +1,86 @@
<!--Author: vikash@openai.com-->
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed, it was only in my sample for testing purposes.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to use it in a test, you should be able to find it directly in the mujoco_py package. Or you can use some other mujoco XML file already in our own tree (preferable).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also just define the XML inline in a string at the top of the file. That's probably the easiest option.

with_method(VariationMethod.COEFFICIENT).\
sampled_from(VariationDistribution.UNIFORM).\
with_range(0.5, 1.5).\
randomize().\
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally you split new variations into different lines (e.g. no calling randomize twice in one statement). This is part of why I suggested that there be an add() method which is void to finish the statement.

See the builder pattern


@method.setter
def method(self, method):
self._method = method
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For how to implement it, see builder pattern.

To solve the problem with v.elem = e, the solution is to keep a local cache of those elems in MujocoModelGenerator, rather than storing the data inside variations. That would be the proper way to do it--I just used the dict in my example as a shortcut.

Note that you can use the variation object directly as a key in your dictionary, i.e.

elem_cache = {}

# caching elems
for v in variations:
    # find elem, = e
    elem_cache[v] = e

# making models
for v in variations:
    e = elem_cache[v]

self._list[-1].distribution = distribution
return self

def with_range(self, low, high):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what stops me from using this method along with with_distribution()?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two ideas to fix this. First one is to raise an error during the runtime. e.g.

variations = Variations()
variations.randomize() \
        .attribute("gear") \
        .at_xpath(".//motor[@name='a1']") \
        .with_method(Method.COEFFICIENT) \
        .sampled_from(Distribution.UNIFORM) \
        .with_mean_std(0.5, 1.5) \
        .add()

will raise

ValueError: Need to call with_range when sampled from Uniform

The current commit implements this idea.

Another idea is implement something like this
screen shot 2018-06-06 at 12 17 03 pm

There is no with_mean_std when user calls sampled_from(UNIFORM)
I tried using mixin to achieve this but failed. Because using inheritance will return either VariationsGaussian or VariationsUniform in sampled_from. There's no way to tell which one should be called before running time.

Chang and others added 24 commits June 7, 2018 21:00
All the python files have file names in lowercase. To keep this
standard, refactor RandomizeEnv.py to randomized_env.py.
Add error handling in constructor and reset().
Remove variables that doesn't depend on self._wrapped_env.sim in
reset().
Reuse MODEL_DIR in mujoco_env.py
Alphabetize imports.
Fix wrong AttributeError raising in constructor when there
is element in xml.

Add error handling towards the Variation.range attribute. When
the range shape isn't the same as the attribute value shape,
raise an AttributeError.
Finish the thread when the simulaton is interrupted.
Other miscellaneous changes include:
- Rename classes VariationsMethods and VariationDistributions to
VariationsMethod and VariationDistribution respectively.
- The parsing of the XML string and fetch of the dynamic parameters to
randomize is now done within the worker thread.
- The file randomize_env.py was renamed to randomized_env.py
Before this commit, when there is an error raised when loading the xml
object, only the worker_thread terminates. This commit fixes this bug by
terminating all the processes.

Fix some typo in the last commit.
Create an 10-length queue in MujocoModelGenerator to store the
mujoco_models.
- Renamed classes VariationMethod and VariationDistribution to Method
and
Distribution.
- Enforced the use of methods exclusive for uniform or normal
distributions in the fluent interface pattern provided in class
Variations by splitting the class into VariationsBase,
VariationsGaussian and VariationsUniform.
- Included the module os.path.osp in rllab.envs.mujoco modules.
- Changed error types and improved messages for two errors in class
MujocoModelGenerator.
 - Delete unused threading.Event in MujocoModelGenerator.
 - Correct error types in MujocoModelGenerator.
 - Renamed classes RandomizedEnv to RandomizedDynamicsEnv.
 - Delete wrong try-except in RandomizedDynamicsEnv.
 - Use randomize_dynamics() in the launcher.
 - Format method chains onto multiple lines.
 - Correct wrong param name in Variation.
Solve the problem with v.elem=e, which calls the setter method in
Variation. Replace this with a local cache of elements.

Same with v.default in MujocoModelGenerator.
 - Remove the setter in Variation
 - Add check of parameter shape in MujocoModelGenerator
 - Fix some typo
 - Add more detailed information in handling the shape of the sampled
 value with the default value

 - Add timeout in the Queue.get() in MujocoModelGenerator so the main
 thread will catch error raised in worker thread
The cached property action_space found in mujoco_env.py produces an
error in Linux for dynamics randomization. The idea behind the cached
property is to avoid doing an expensive computation several times, so
for regular execution, action_space is obtained from the model that is
used for the entire training once, improving the performance.
However, for dynamics randomization there's a new model for each
episode, and that requires that the action_space is updated accordingly,
but that does not happen because it's cached.
To update action_space and not make an invasive change, the attribute is
invalidated for each reset in the RandomizedEnv class.
 - Move the dynamics_randomization package to rllab.envs.mujoco.

 - Delete tosser.xml, use xml in rllab/vendor/mujoco_model for test

 - The old test_dynamics_rand.py only tests for the Variations API,
   so rewrite it to test for both Variations API and RandomizedEnv.

 - Reorder imports.

 - Delete import os.path as osp in rllab/envs/mujoco/__init__.py.
   Previously added by mistake.
Package names should follow class names.
test_dynamics_rand.py is enough for testing. Remove trpo_swimmer.py
The code to initialize the variations and to generate the randomized
parameters was moved into the Variations class. This will keep all the
current code related to variations in the same file to improve the API
of dynamics randomization, and will enable a more modular code for
further features in the module.
There is a bottleneck at function load_model_from_xml from mujoco-py
when using multi threading. In a single thread, a call to this function
takes units of milliseconds, while in multi threading it takes tens of
milliseconds. Maybe this is due to internal data structures that are
required for both loading the model in the worker thread and performing
the simulations in the main thread, causing the delay in
load_model_from_xml and other functions that can be perceived in the
cumulative time obtained by the profiler by running
test_dynamics_rand.py.
Due to this poor performance, the file mujoco_model_gen.py was removed
since it serves no purpose now that the variations.py file contains
methods to process the XML file, and the calls to obtain the randomized
model are done directly in the class RandomizedEnv.
The changes in this commit include:
- The modules in the __ini__.py file were sorted alphabetically.
- The new line at the end of file was added in randomized_env.py.
- Default values were assigned for VariationSpec, specifically for
fields method, distribution, mean_std and var_range. Fields xpath,
attrib, elem and default are specific to the model in the XML file
provided by the user, so they cannot be default parameters. Further
more, elem and default are obtained by parsing the XML file, so the user
won't set them.
@ghost ghost force-pushed the dynamics_randomization branch from bd93226 to ac0d449 Compare June 8, 2018 04:01
@ryanjulian
Copy link
Owner

Please reopen this PR against https://github.com/rlworkgroup/garage

@ghost
Copy link
Author

ghost commented Jun 14, 2018

Imported to garage. See rlworkgroup/garage#51

@ghost ghost closed this Jun 14, 2018
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants