This simulator builds upon ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. It currently allows multiple human users to concurrently control the agents in a single scene. It also includes an HTTP server that users can connect to in order to control the agents remotely.
All data collected from the human-AI experiments can be downloaded by following the instructions located in the data directory.
- Linux (we've tested TDW on Ubuntu 18, 20, and 22)
- Python 3.8+
- A GPU, the faster the better, with up-to-date drivers. It is possible to run TDW without a GPU but you will lose some speed and photorealism. NVIDIA drivers tend to work better.
For ease of installation, the environment has been containerized using Docker. If manual installation is required, scroll down.
This section is only needed if you want to run multiple instances at the same time; otherwise, skip it. You can use the create_x_file.sh script to create the corresponding X server files and then copy them to /etc/X11, or follow the steps in Manual File Creation.
Run `nvidia-smi` and check that each GPU has exactly one distinct X server running on it (no X server should be running on multiple GPUs). If that is not the case, follow these steps:
- Run `nvidia-xconfig --query-gpu-info`.
- Run `cd /etc/X11`. For each GPU, note its Bus ID and use it as an argument in the following command (`{}` indicates substitution): `sudo nvidia-xconfig --no-xinerama --probe-all-gpus --use-display-device=none --busid={BUS ID} -o xorg-{# of GPU}.conf`.
- To each `xorg-{# of GPU}.conf` file, add the following lines:
  Section "ServerFlags"
  Option "AutoAddGPU" "False"
  EndSection
- Run `ls /tmp/.X11-unix/` and note the highest number that appears after the X prefix in the file names. This number plus one will be the starting DISPLAY number. Also note the number that appears after the tty prefix when you run `cat /sys/class/tty/tty0/active`. This is your current virtual terminal.
- You can either run the X server in your current virtual terminal (useful when you have a screen connected) or allow the X server to run in different virtual terminals (useful on a headless server). If you want to run it in the current virtual terminal, run the following command for each GPU in your machine: `sudo nohup Xorg :{DISPLAY + # of GPU} vt{# VIRTUAL TERMINAL} -config /etc/X11/xorg-{# of GPU}.conf &`. Note that for each new GPU, `{DISPLAY + # of GPU}` should increase by one, but `{# VIRTUAL TERMINAL}` always stays the same. On a headless server, just remove the virtual terminal argument.
- If you run `nvidia-smi` again, you should now see that each GPU has exactly one X server running on it.
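The per-GPU launch commands from the steps above can be generated with a small script. This is only a sketch: the helper name `xorg_commands` is hypothetical, and it assumes GPUs are numbered from 0 and that the config files were written as `/etc/X11/xorg-0.conf`, `/etc/X11/xorg-1.conf`, and so on. It only prints the commands; it does not run them.

```python
from typing import List, Optional


def xorg_commands(num_gpus: int, start_display: int,
                  virtual_terminal: Optional[int] = None) -> List[str]:
    """Build one Xorg launch command per GPU (hypothetical helper).

    start_display is the highest number found in /tmp/.X11-unix/ plus one.
    Pass virtual_terminal=None on a headless server to omit the vt argument.
    Assumes GPUs are numbered from 0 (an assumption, not stated in the README).
    """
    commands = []
    for gpu in range(num_gpus):
        # The display number grows by one per GPU; the virtual terminal does not.
        parts = ["sudo", "nohup", "Xorg", f":{start_display + gpu}"]
        if virtual_terminal is not None:
            parts.append(f"vt{virtual_terminal}")
        parts += ["-config", f"/etc/X11/xorg-{gpu}.conf", "&"]
        commands.append(" ".join(parts))
    return commands


if __name__ == "__main__":
    # Example values only: two GPUs, first free display 1, virtual terminal 2.
    for command in xorg_commands(num_gpus=2, start_display=1, virtual_terminal=2):
        print(command)
```

Printing the commands first makes it easy to review the display numbers before starting any X server.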
- Install docker and nvidia-container-toolkit. Make sure to allow non-root use of Docker.
- Clone this repository.
At this point, you can either build the Docker images or pull them. To build them:
- Run `cd webrtc` and then `docker build -t XXXX/web .`. Go back to the root folder, run `cd simulator`, and then `docker build -t XXXX/simulator .`.
Otherwise, pull them from the repository:
- Run `docker pull XXXX/simulator` and `docker pull XXXX/web`.
Finally make sure to read the following: Virtual Video Devices.
- Go to the root folder and run the script with the following optional arguments: `./parallel_sims.sh -a {server address} -p {server port} -v {video index of first /dev/video* to use} -d {display number} -s {simulator port} -t {# of parallel instances}`. This runs parallel instances of the simulator with the configuration in simulator/config.yaml. To check that it works, it is recommended to first run it with the display number currently set in the DISPLAY environment variable.
Use `git clone --recurse-submodules https://github.com/XXXX/AI-Collab.git` to clone the repository with all the submodules.
- Create an environment with python >= 3.7.0.
- Run `pip install -r requirements.txt`.
- Change to the magnebot directory and run `pip install .`.
- Patch the TDW controller source file: run `pip show tdw`, copy the directory shown in Location, append "/tdw/controller.py" to it, and then copy the controller.py file from this repository to that path.
After this, you will be able to run the simulation by going into the simulator directory and running `python simulation.py --local --no_virtual_cameras`. This will display a simulator window with the third-person view camera, as well as an OpenCV window with a first-person view of one of the robots. You can control this robot by focusing on the simulator window and using the arrow keys. Check the file keysets.csv for all the keys available for each robot.
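The controller patch step in the manual installation amounts to building a target path from the `Location:` line printed by `pip show tdw`. The sketch below shows that path computation only. The function name `controller_target` and the sample output are illustrative, and the path is joined POSIX-style because the listed requirements target Linux.

```python
import posixpath
from typing import Optional


def controller_target(pip_show_output: str) -> Optional[str]:
    """Return the path of the installed tdw/controller.py (hypothetical helper).

    pip_show_output is the text printed by `pip show tdw`. The function looks
    for the "Location:" line and appends "tdw/controller.py" to it.
    Returns None if no Location line is present.
    """
    for line in pip_show_output.splitlines():
        if line.startswith("Location:"):
            location = line.split(":", 1)[1].strip()
            return posixpath.join(location, "tdw", "controller.py")
    return None


if __name__ == "__main__":
    # Made-up sample output; a real run would capture `pip show tdw` instead.
    sample = "Name: tdw\nLocation: /home/user/env/lib/python3.8/site-packages\n"
    print(controller_target(sample))
```

With the target path in hand, the repository's controller.py can be copied over it, for example with `shutil.copy("controller.py", target)`. Keeping a backup of the original file first is a sensible precaution.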
In order to allow us to stream the generated videos to the respective users through WebRTC, we need to create virtual video devices to which we send the generated frames and from which the HTTP server gets the frames as streams.
Follow the steps in https://github.com/umlaeute/v4l2loopback to build the v4l2loopback module needed to simulate these virtual video interfaces, and then run `modprobe v4l2loopback devices=4`, where the devices parameter can be changed to create as many virtual devices as you want (here it is 4). Be sure to use one of the tagged versions of v4l2loopback (0.12.7 in our case).
After this, you will be able to run the simulator with `python simulation.py --local`, which should not look any different from running with the --no_virtual_cameras option.
By default, v4l2loopback imposes a hard limit on the number of virtual video devices you can create. To override this limit, clone the repo, raise the MAX_DEVICES constant inside v4l2loopback.c to a high enough value, and then follow the instructions there to compile the module. Make sure it installs the module in the correct path.
Our web interface uses Node.js, as well as WebRTC and Socket.io.
Install:
- nodejs 16.17.0
- npm 8.15.0
Change to the webrtc directory and run `npm install`.
Before running the server, you will need to create a key and a self-signed certificate to enable HTTPS. To do this, run `openssl req -nodes -new -x509 -keyout server.key -out server.cert`. It will ask a series of questions. You can leave them blank, except for Common Name, where you should enter localhost, and the email address, where you should enter your own.
Be sure to change the address in server.js before running the server.
The implementation of the WebRTC server was based on https://github.com/TannerGabriel/WebRTC-Video-Broadcast.
Change to the ai_controller directory and install the gym environment by running `pip install -e gym_collab`.
- Run the server using `node server --address "address" --port "port"`. The simulator assumes the virtual devices to be used are the ones starting at /dev/video0, but if you already have some real webcams, you need to specify the parameter `--video-index <number>` with the index of the first simulated webcam created for the simulator.
- Run the simulator using `python simulation.py --address "https://address:port"`. A window will appear. Wait until a view of the scene appears in it.
- Using your web browser, go to https://address:port/broadcast.html. This will present a view with all the camera views being streamed.
- When you run the first command, there will be an output indicating a code that you need to use as password when connecting through the browser.
- Using your web browser in the same or a different computer, go to https://address:port/?client=1, where the client parameter controls which robot you get assigned. This parameter goes from 1 to the number of user controllable robots you have in the simulation.
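As a quick reference for the URLs mentioned in the steps above, the following sketch builds the broadcast page URL and one client URL per user-controllable robot. The helper name `client_urls` and the example address and port are illustrative only.

```python
from typing import Dict, List, Union


def client_urls(address: str, port: int,
                num_robots: int) -> Dict[str, Union[str, List[str]]]:
    """Build the browser URLs for a running server (hypothetical helper).

    The client parameter runs from 1 to the number of user-controllable robots.
    """
    base = f"https://{address}:{port}"
    return {
        "broadcast": f"{base}/broadcast.html",
        "clients": [f"{base}/?client={i}" for i in range(1, num_robots + 1)],
    }


if __name__ == "__main__":
    # Example values only; use the address and port passed to `node server`.
    urls = client_urls("localhost", 4000, num_robots=2)
    print(urls["broadcast"])
    for url in urls["clients"]:
        print(url)
```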
- Change to the ai_controller directory.
- The team configuration is determined by the team_structure.yaml file. Modify that file to make the agent act according to a particular team strategy.
- Run the server_command script. You also have to create a new certificate and key, as this script runs an HTTPS server to set up the WebRTC parameters. Inside server_command, specify the certificate, key, and host address associated with this server, as well as the address to connect to.
- Alternatively, if you want to run many agents at the same time, you can use ai_controller/multiple_robot_instantiation.sh, passing as command-line arguments the number of agents you want to instantiate and the port of the server to connect to. This will open a terminal with one tab per robot. The script just runs whatever you put in server_command and changes the --robot-number argument accordingly.
To make the HTTPS self-signed certificate work:
- Run server_command
- In your web browser, open the address provided by the HTTPS server and accept the certificate
- Run server_command again and it should work!
The ai_controller.py program uses an HTTPS server to negotiate the WebRTC parameters. Socket.IO is used for normal communication with the simulator server. The controller uses the same API functions defined in the Magnebot repository. To receive occupancy maps of a certain view radius instead of camera images, you can run the program as `python ai_controller.py --use-occupancy --view-radius <number>`. This way, you don't need to make use of the HTTPS server.
The action space consists of the following fields:
- "action" - argument: number of the action to be executed. There are two types of actions. Actions of different types can be executed concurrently (you can issue one while the other is completing), but actions of the same type can only be executed sequentially. If you issue an action of the same type as one that has not yet completed, the new action will be ignored. Actions may take different numbers of steps. You can always execute the wait action (wait = 26). The two types of actions are:
  - Locomotion/Actuation
    - move_up = 0
    - move_down = 1
    - move_left = 2
    - move_right = 3
    - move_up_right = 4
    - move_up_left = 5
    - move_down_right = 6
    - move_down_left = 7
    - grab_up = 8
    - grab_right = 9
    - grab_down = 10
    - grab_left = 11
    - grab_up_right = 12
    - grab_up_left = 13
    - grab_down_right = 14
    - grab_down_left = 15
    - drop_object = 16
  - Sensing/Communication
    - danger_sensing = 17
    - get_occupancy_map = 18
    - get_objects_held = 19
    - check_item = 20
    - check_robot = 21
    - get_messages = 22
    - send_message = 23
- "item" - argument: index of the object to be checked (useful for action = 20, check_item). The robot environment saves the object information collected so far, but to actually get the entries of any of these objects, you should specify the item number and execute the corresponding action. You can get the number of objects known so far by checking the corresponding observation output.
- "robot" - argument: index of the robot to be checked (useful for action = 21, check_robot). The robot environment saves information about other robots, and you can get it by specifying the index of the robot you want.
- "message" - argument: text message (useful for action = 23, send_message). If the action is to send a message, this is where to put the message. Use the "robot" field to specify the index of the robot that should receive the message, or 0 if you want everyone to get it.
Notes: action = 17 (danger_sensing) gets an estimation of the danger level of neighboring objects and updates the necessary information. To actually display this information, you need to issue action = 20 (check_item).
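The action numbering above can be captured in an enumeration so that code does not rely on bare integers. The sketch below is not taken from the gym_collab package. The names `Action`, `action_type`, and `make_action` are hypothetical, and the plain-dict layout with default values of 0 and an empty string is an assumption based only on the four field names listed above.

```python
from enum import IntEnum

# Action names in the order listed above; list position gives the number.
LOCOMOTION = [
    "move_up", "move_down", "move_left", "move_right",
    "move_up_right", "move_up_left", "move_down_right", "move_down_left",
    "grab_up", "grab_right", "grab_down", "grab_left",
    "grab_up_right", "grab_up_left", "grab_down_right", "grab_down_left",
    "drop_object",
]  # 0..16
SENSING = [
    "danger_sensing", "get_occupancy_map", "get_objects_held",
    "check_item", "check_robot", "get_messages", "send_message",
]  # 17..23

Action = IntEnum(
    "Action",
    {**{name: i for i, name in enumerate(LOCOMOTION + SENSING)}, "wait": 26},
)


def action_type(action: int) -> str:
    """Classify an action number into one of the two concurrency groups."""
    act = Action(action)
    if act <= Action.drop_object:
        return "locomotion/actuation"
    if act <= Action.send_message:
        return "sensing/communication"
    return "other"  # e.g. wait


def make_action(action: int, item: int = 0, robot: int = 0,
                message: str = "") -> dict:
    """Assemble an action with the four fields (dict layout is an assumption)."""
    return {"action": int(action), "item": item, "robot": robot,
            "message": message}


if __name__ == "__main__":
    # robot=0 addresses the message to every robot, as described above.
    print(make_action(Action.send_message, robot=0, message="hello"))
    print(action_type(Action.move_up), action_type(Action.check_item))
```

Because two actions of the same type cannot overlap, a controller could use `action_type` to check whether a pending action of that group has completed before issuing the next one.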
The observation space consists of the following fields:
- "frame" - an nxm map, with its actual dimensions determined by simulator parameters. At every step, this output shows only the location of the robot in grid world coordinates (value 5 in our occupancy map, as shown in the Occupancy Maps section). When taking action = 18 (get_occupancy_map), you will receive the entire occupancy map as part of this field's output. Remember that the occupancy map is limited by the field of view, which is a configurable parameter.
- "objects_held" - boolean value. Whether the robot is carrying an object in any of its arms.
- "action_status" - list of binary values of size 4. A positive value means the following, according to its position in the list:
  - 1: a locomotion/actuation action has completed
  - 2: a locomotion/actuation action failed to complete correctly
  - 3: a sensing/communication action has completed
  - 4: a sensing/communication action failed to complete correctly
- "item_output" - dictionary that contains the information requested when using action = 20 (check_item). It contains the following fields: "item_weight", "item_danger_level" (0 if unknown), and "item_location" (grid location as an x,y point represented with a list).
- "num_items" - number of items discovered so far.
- "neighbors_output" - dictionary that contains the information requested when using action = 21 (check_robot). It contains the following fields: "neighbor_type" (0 for human, 1 for AI) and "neighbor_location" (same format as "item_location").
- "strength" - current strength.
- "num_messages" - number of messages in the receiving queue. To get all messages, use action = 22 (get_messages), which will return the messages as part of the info output of the step function.
For occupancy maps, the map is divided into cells of the size defined in simulator/config.yaml. The parameter view_radius specifies how many of these cells make up the current view around the magnebot being controlled. The occupancy map uses the following values:
- -2: Unknown
- -1: Map boundaries
- 0: No obstacle present
- 1: Ambient obstacle (wall)
- 2: Manipulable object
- 3: Magnebot
- 4: Object being held by a magnebot
- 5: Magnebot being controlled
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 2 0 3 0 0 0 0 0 0 0 0]
[0 5 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
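To illustrate how the cell values can be read, the sketch below rebuilds the example map above as a list of lists and looks up cells by value. The helper names are hypothetical, and positions are reported as (row, column) indices of the printed matrix, which may not match the simulator's own x,y convention.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]

# Cell values used in the occupancy map, as listed above.
CELL_NAMES = {
    -2: "unknown", -1: "map boundary", 0: "free", 1: "wall",
    2: "manipulable object", 3: "magnebot", 4: "held object",
    5: "controlled magnebot",
}


def example_grid() -> Grid:
    """Rebuild the 20x15 example map shown above."""
    grid = [[0] * 15 for _ in range(20)]
    grid[9][4] = 2   # manipulable object
    grid[9][6] = 3   # another magnebot
    grid[10][1] = 5  # magnebot being controlled
    return grid


def find_cells(grid: Grid, value: int) -> List[Tuple[int, int]]:
    """Return the (row, column) indices of every cell holding `value`."""
    return [(r, c) for r, row in enumerate(grid)
            for c, cell in enumerate(row) if cell == value]


def summarize(grid: Grid) -> Dict[str, List[Tuple[int, int]]]:
    """Map each non-free cell type present in the grid to its positions."""
    present = sorted({cell for row in grid for cell in row if cell != 0})
    return {CELL_NAMES[value]: find_cells(grid, value) for value in present}


if __name__ == "__main__":
    for name, cells in summarize(example_grid()).items():
        print(name, cells)
```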
The web interface consists of the camera views assigned to your robot and a sidebar chat. This chat allows you to communicate with nearby robots and to get information about your neighbors and scanned objects.
To control the robot through the web interface, first click in the video area and then use one of the following keyboard commands:
- Arrows: To move the robot
- A: To grab a focused object or drop it with the left arm
- D: To grab a focused object or drop it with the right arm
- S: To move the camera downwards
- W: To move the camera upwards
- Q: To danger sense around you
- E: To focus on an object (this also gives you information about it)