Partially-observable taxi environment, with internal vectorization
Gymnax: Classic control, bsuite, MinAtar, FourRooms, MetaMaze, PointRobot, Bandits in JAX. Supports Podracer architecture
- Most interesting environments are probably MemoryChain, FourRooms, MetaMaze, PointRobot
ROOMS and C-ROOMS: ROOMS and C-ROOMS for reference
- Velocity-based vs just position
- Fixed layouts ahead of time. Random agent spawn. Fixed or set or random goal
- Discrete action (4 cardinal or 8 directions) vs continuous (2D)
- 2 forms of action failure: 0.2 chance of taking a random action (cardinal) or flipping signs (continuous); Gaussian movement noise with 0.2 standard deviation
- What to do for walls?
- Discrete case is easy. Don't move.
- Continuous case could be the same. Alternatively, draw the vector, stop right at wall.
- Observation?
- Non-continuous:
- Fully observable: grid discrete state. Goal state if random?
- Partially observable: 4D Hansen (adjacent), 8D Hansen, nxn grid
- Continuous:
- Fully observable: (x,y) coordinate. Need (dx, dy) if velocity-based. Goal state if random?
- Partially observable:
- (x,y) w/o velocity, (x,y) downsampled to grid
- 4/8D Hansen (0/1 walls in range 1M), 4/8D walls (distance of closest wall)
- Non-continuous:
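A minimal sketch of the ROOMS options above (action failure, wall handling, wall-based observations), stdlib Python only. Everything named here is hypothetical: the `LAYOUT` grid, the function names, and the `n_sub` sub-stepping are illustration choices, not taken from any ROOMS/C-ROOMS code. Assumptions: "flipping signs" flips both components of the move, the flip and the Gaussian noise are applied in the same step, and "stop right at wall" is approximated by walking the move vector in small sub-steps (which can tunnel through a thin wall if one sub-step is longer than a cell).

```python
import math
import random

# Hypothetical fixed layout: '#' = wall, '.' = free cell.
LAYOUT = [
    "#######",
    "#..#..#",
    "#.....#",
    "#..#..#",
    "#######",
]

# 4 cardinal moves as (d_row, d_col): up, right, down, left.
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def is_wall(row, col):
    """Cells outside the layout count as walls."""
    if not (0 <= row < len(LAYOUT) and 0 <= col < len(LAYOUT[0])):
        return True
    return LAYOUT[row][col] == "#"


def discrete_step(pos, action, rng, fail_prob=0.2):
    """Discrete case: random action on failure, don't move into walls."""
    if rng.random() < fail_prob:
        action = rng.randrange(len(MOVES))  # failure: uniformly random action
    d_row, d_col = MOVES[action]
    new_pos = (pos[0] + d_row, pos[1] + d_col)
    return pos if is_wall(*new_pos) else new_pos


def continuous_step(xy, move, rng, flip_prob=0.2, noise_std=0.2, n_sub=20):
    """Continuous case: sign flip + Gaussian noise, then stop at the wall.

    xy and move are (x, y) with x along columns and y along rows.
    """
    dx, dy = move
    if rng.random() < flip_prob:
        dx, dy = -dx, -dy  # assumption: flip both components
    dx += rng.gauss(0.0, noise_std)
    dy += rng.gauss(0.0, noise_std)
    x, y = xy
    # Walk along the move vector; keep the last point that is not in a wall.
    for i in range(1, n_sub + 1):
        nx = xy[0] + dx * i / n_sub
        ny = xy[1] + dy * i / n_sub
        if is_wall(math.floor(ny), math.floor(nx)):
            break
        x, y = nx, ny
    return (x, y)


def adjacent_walls(pos):
    """4D partial observation: 0/1 wall flags for the adjacent cells."""
    return [int(is_wall(pos[0] + dr, pos[1] + dc)) for dr, dc in MOVES]


def wall_distances(pos):
    """4D partial observation: cells to the closest wall in each direction."""
    dists = []
    for dr, dc in MOVES:
        d = 1
        while not is_wall(pos[0] + d * dr, pos[1] + d * dc):
            d += 1
        dists.append(d)
    return dists
```

Setting `fail_prob=0.0` (or `flip_prob=0.0, noise_std=0.0`) gives deterministic transitions, which helps when checking the wall logic on its own. The 8D variants would only need the four diagonal offsets appended to `MOVES`.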
Pocman/Pacman: Fully/partially-observable pocman from POMCP
Battleship: Partially observable battleship
Rocksample: Also has battleship
Isaacverse: GPU physics control
Mo-Gym: Multi-objective. Fancy fourrooms, reacher with more objectives
gym-sokoban: pixel-based though...
CARL: Context-adaptive RL, reconfigure envs (Mario, Brax, control)
highway-env: Must infer behaviors of others
Other
- SpaceRobot: Non-actuated base space robot
- Learn2Race: Needs GPU. Eh...
- tmrl: TrackMania racing, 19-D LIDAR option
- ShinRL: Future reference, interesting