Releases · thu-ml/tianshou
0.4.5
Bug Fix
- Fix tqdm issue (#481)
- Fix atari wrapper to be deterministic (#467)
- Add `writer.flush()` in TensorboardLogger to ensure real-time logging results (#485)
Enhancement
- Implement set_env_attr and get_env_attr for vector environments (#478); a usage sketch follows this list
- Implement BCQPolicy and offline_bcq example (#480)
- Enable `test_collector=None` in 3 trainers to turn off testing during training (#485)
- Fix an inconsistency in the implementation of Discrete CRR: it now uses the `Critic` class for its critic, following the convention of other actor-critic policies (#485)
- Update several offline policies to use the `ActorCritic` class for their optimizers to eliminate randomness caused by parameter sharing between actor and critic (#485)
- Move Atari offline RL examples to `examples/offline` and tests to `test/offline` (#485)
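As a hedged illustration of the new vector-environment accessors from #478 (a minimal sketch; the `get_env_attr(key, id=None)` / `set_env_attr(key, value, id=None)` signatures and the custom attribute name are assumptions):

```python
import gym
from tianshou.env import DummyVectorEnv

# four CartPole environments behind one vectorized interface
envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])

# read an attribute from every sub-environment
print(envs.get_env_attr("spec"))

# set a custom attribute on the first two sub-environments only, then read it back
envs.set_env_attr("my_flag", True, id=[0, 1])
print(envs.get_env_attr("my_flag", id=[0, 1]))
```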
0.4.4
API Change
- add a new class DataParallelNet for multi-GPU training (#461)
- add ActorCritic for deterministic parameter grouping for shared-head actor-critic networks (#458); see the sketch after this list
- collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this was done in the logger) (#459)
- rename WandBLogger -> WandbLogger (#441)
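A hedged sketch of the ActorCritic grouping from #458: the wrapper collects the actor and critic parameters in a deterministic order and, being a single nn.Module, yields each shared parameter only once; the network shapes below are illustrative.

```python
import torch
from tianshou.utils.net.common import Net, ActorCritic
from tianshou.utils.net.discrete import Actor, Critic

# shared feature extractor ("shared head") with separate actor/critic heads
net = Net(state_shape=(4,), hidden_sizes=[64, 64])
actor = Actor(net, action_shape=2)
critic = Critic(net)

# one optimizer over the deterministically grouped actor + critic parameters
actor_critic = ActorCritic(actor, critic)
optim = torch.optim.Adam(actor_critic.parameters(), lr=3e-4)
```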
Bug Fix
- fix logging in atari examples (#444)
0.4.3
Enhancement
- add Rainbow (#386)
- add WandbLogger (#427)
- add env_id in preprocess_fn (#391); see the sketch after this list
- update README, add new chart and bibtex (#406)
- add Makefile, now you can use `make commit-checks` to automatically perform almost all checks (#432)
- add isort and yapf, apply to existing codebase (#432)
- add spelling check by using `make spelling` (#432)
- update contributing.rst (#432)
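A hedged sketch of a preprocess_fn that uses the new `env_id` argument from #391; the keyword set (obs/obs_next/rew/done/info/env_id) and the convention of returning a Batch with the modified keys follow the Collector documentation, and the per-environment reward scaling is purely illustrative.

```python
import numpy as np
from tianshou.data import Batch

def preprocess_fn(**kwargs):
    # on reset only obs and env_id are passed; during a step the collector also
    # passes obs_next, rew, done, info (and policy)
    if "rew" in kwargs:
        env_id = np.asarray(kwargs["env_id"])
        rew = kwargs["rew"].copy()
        rew[env_id == 0] *= 0.5  # illustrative: rescale rewards from env 0 only
        return Batch(rew=rew)
    return Batch()  # nothing to change at reset time
```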
0.4.2
Enhancement
- Add model-free DQN-family algorithms: IQN (#371), FQF (#376)
- Add model-free on-policy algorithms: NPG (#344, #347), TRPO (#337, #340)
- Add offline RL algorithms: CQL (#359), CRR (#367)
- Support deterministic evaluation for on-policy algorithms (#354)
- Make trainer resumable (#350); see the checkpoint sketch after this list
- Support different state size and fix exception in venv.__del__ (#352, #384)
- Add vizdoom example (#384)
- Add numerical analysis tool and interactive plot (#335, #341)
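A hedged sketch of the checkpoint hook behind the resumable trainer (#350): the `save_checkpoint_fn(epoch, env_step, gradient_step)` signature and the `resume_from_log` flag are assumed from the trainer API, and the path and saved contents are illustrative.

```python
import os
import torch

def save_checkpoint_fn(epoch, env_step, gradient_step):
    # called periodically by the trainer; in practice you would also save
    # policy.state_dict() and optimizer state here so training can be restored
    os.makedirs("log", exist_ok=True)
    path = os.path.join("log", "checkpoint.pth")
    torch.save(
        {"epoch": epoch, "env_step": env_step, "gradient_step": gradient_step},
        path,
    )
    return path
```

Passing `save_checkpoint_fn=save_checkpoint_fn` together with `resume_from_log=True` to a trainer is then expected to restore the epoch/step counters from the logger on restart.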
0.4.1
API Change
- Add observation normalization in BaseVectorEnv (`norm_obs`, `obs_rms`, `update_obs_rms` and `RunningMeanStd`) (#308)
- Add `policy.map_action` to bound raw actions (e.g., mapping from (-inf, inf) to [-1, 1] by clipping or tanh squashing); the mapped action is not stored in the replay buffer (#313)
- Add `lr_scheduler` in on-policy algorithms, typically for `LambdaLR` (#318)
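A hedged sketch of wiring a `LambdaLR` scheduler into an on-policy policy via the new `lr_scheduler` argument; the linear decay mirrors the MuJoCo examples, the loop constants are illustrative, and the single dummy parameter only stands in for a real actor-critic optimizer.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

# illustrative training-loop constants
max_epoch, step_per_epoch, step_per_collect = 100, 30000, 2048

# stand-in optimizer; in practice this optimizes the actor-critic parameters
optim = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=3e-4)

# decay the learning rate linearly to zero over the total number of updates
max_update_num = (step_per_epoch // step_per_collect) * max_epoch
lr_scheduler = LambdaLR(optim, lr_lambda=lambda n: 1 - n / max_update_num)
# the scheduler is then handed to the policy, e.g. PPOPolicy(..., lr_scheduler=lr_scheduler)
```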
Note
To adapt to this version, change `action_range=...` to `action_space=env.action_space` in policy initialization.
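A hedged sketch of that migration for a continuous-control policy; the DDPG setup on Pendulum is illustrative and the constructor arguments are not meant to be exhaustive.

```python
import gym
import torch
from tianshou.policy import DDPGPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import Actor, Critic

env = gym.make("Pendulum-v0")
state_shape = env.observation_space.shape
action_shape = env.action_space.shape
max_action = env.action_space.high[0]

actor = Actor(Net(state_shape, hidden_sizes=[64, 64]), action_shape,
              max_action=max_action)
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic = Critic(Net(state_shape, action_shape, hidden_sizes=[64, 64], concat=True))
critic_optim = torch.optim.Adam(critic.parameters(), lr=1e-3)

policy = DDPGPolicy(
    actor, actor_optim, critic, critic_optim,
    action_space=env.action_space,  # previously: action_range=(-2.0, 2.0)
)
```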
Bug Fix
- Fix incorrect behaviors with on-policy algorithms (error when `n/ep==0` and the reward shown in tqdm) (#306, #328)
- Fix q-value mask_action error for obs_next (#310)
Enhancement
- Release SOTA Mujoco benchmark (DDPG/TD3/SAC: #305, REINFORCE: #320, A2C: #325, PPO: #330) and add corresponding notes in /examples/mujoco/README.md
- Fix `numpy>=1.20` typing issue (#323)
- Add cross-platform unittest (#331)
- Add a test on how to deal with finite env (#324)
- Add value normalization in on-policy algorithms (#319, #321)
- Separate advantage normalization and value normalization in PPO (#329)
0.4.0
This release contains several API and behavior changes.
API Change
Buffer
- Add ReplayBufferManager, PrioritizedReplayBufferManager, VectorReplayBuffer, PrioritizedVectorReplayBuffer, CachedReplayBuffer (#278, #280)
- Change the `buffer.add` API from `buffer.add(obs, act, rew, done, obs_next, info, policy, ...)` to `buffer.add(batch, buffer_ids)` in order to add data more efficiently (#280); see the sketch after this list
- Add `set_batch` method in buffer (#278)
- Add `sample_index` method, the same as `sample` but returning only the index instead of both the index and the batch data (#278)
- Add `prev` (one-step previous transition index), `next` (one-step next transition index) and `unfinished_index` (the last modified index whose `done==False`) (#278)
- Add internal method `_alloc_by_keys_diff` in batch to support any form of keys popping up (#280)
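A minimal sketch of the new add call, assuming the batch must carry at least obs/act/rew/done with a leading batch dimension and that `add` returns index and episode-statistic arrays.

```python
import numpy as np
from tianshou.data import Batch, ReplayBuffer

buf = ReplayBuffer(size=10)

# one transition, shaped with a leading batch dimension of 1
transition = Batch(
    obs=np.zeros((1, 4)),
    act=np.array([0]),
    rew=np.array([1.0]),
    done=np.array([False]),
    obs_next=np.zeros((1, 4)),
)

# buffer_ids selects which sub-buffer receives the data; a plain ReplayBuffer
# only has sub-buffer 0, which is also the default
ptr, ep_rew, ep_len, ep_idx = buf.add(transition, buffer_ids=[0])
```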
Collector
- Rewrite the original Collector and split the async functionality into AsyncCollector: Collector only supports sync mode, while AsyncCollector supports both modes (#280)
- Drop `collector.collect(n_episode=List[int])` because the new collector can collect episodes without bias (#280)
- Move `reward_metric` from Collector to trainer (#280)
- Change the `Collector.collect` logic: `AsyncCollector.collect` keeps the previous semantics, where `collect(n_step or n_episode)` will not collect exactly n_step or n_episode transitions; `Collector.collect(n_step or n_episode)` now collects exactly n_step or n_episode transitions (#280)
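A hedged sketch of the stricter sync-collector semantics; the RandomPolicy below is a throwaway stand-in (not part of the library) so the snippet is self-contained, and the buffer/env sizes are illustrative.

```python
import gym
import numpy as np
from tianshou.data import Batch, Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv
from tianshou.policy import BasePolicy

class RandomPolicy(BasePolicy):
    """Illustrative stand-in that samples uniform CartPole actions."""
    def forward(self, batch, state=None, **kwargs):
        return Batch(act=np.random.randint(0, 2, size=len(batch.obs)))
    def learn(self, batch, **kwargs):
        return {}

envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])
buf = VectorReplayBuffer(total_size=2000, buffer_num=4)
collector = Collector(RandomPolicy(), envs, buf)

result = collector.collect(n_step=100)  # sync mode: exactly 100 transitions
print(result["n/st"], result["n/ep"])
```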
Policy
- Add `policy.exploration_noise(action, batch) -> action` method instead of implementing the noise in `policy.forward()` (#280)
- Add `Timelimit.truncate` handler in `compute_*_returns` (#296)
- Remove `ignore_done` flag (#296)
- Remove `reward_normalization` option in off-policy algorithms (it will raise an error if set to True) (#298)
Trainer
- Change `collect_per_step` to `step_per_collect` (#293)
- Add `update_per_step` and `episode_per_collect` (#293); `onpolicy_trainer` now supports either step-collect or episode-collect (#293)
- Add BasicLogger and LazyLogger to log data more conveniently (#295)
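A hedged sketch of the renamed trainer options together with the new BasicLogger; the keyword values are illustrative and would be passed to a trainer such as offpolicy_trainer alongside the policy and collectors.

```python
from torch.utils.tensorboard import SummaryWriter
from tianshou.utils import BasicLogger

# the new logger wraps a TensorBoard SummaryWriter (log path is illustrative)
logger = BasicLogger(SummaryWriter("log/dqn"))

# illustrative keyword arguments reflecting the renamed options
trainer_kwargs = dict(
    max_epoch=10,
    step_per_epoch=10000,
    step_per_collect=10,   # renamed from collect_per_step
    update_per_step=1,     # new in #293
    episode_per_test=10,
    batch_size=64,
    logger=logger,
)
```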
0.3.2
0.3.1
API Change
- change `utils.network` args to support any form of MLP by default (#275): remove `layer_num` and `hidden_layer_size`, add `hidden_sizes` (a list of ints indicating the network architecture); see the sketch after this list
- add HDF5 save/load method for ReplayBuffer (#261)
- add offline_trainer (#263)
- move Atari-related network to `examples/atari/atari_network.py` (#275)
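A minimal sketch of the new network specification; the shapes are illustrative.

```python
from tianshou.utils.net.common import Net

# a three-layer MLP (128 -> 128 -> 64) from a 4-dim observation to 2 outputs,
# replacing the old layer_num / hidden_layer_size pair
net = Net(state_shape=(4,), action_shape=2, hidden_sizes=[128, 128, 64])
```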
Bug Fix
- fix a potential bug in discrete behavior cloning policy (#263)
0.3.0.post1
Several bug fixes (trainer, test and docs)
0.3.0
Since the code has changed substantially from v0.2.0 at this point, we release it as version 0.3 from now on.
API Change
- add policy.updating and clarify collecting state and updating state in training (#224)
- change `train_fn(epoch)` to `train_fn(epoch, env_step)` and `test_fn(epoch)` to `test_fn(epoch, env_step)` (#229); see the sketch after this list
- remove out-of-date APIs: collector.sample, collector.render, collector.seed, VectorEnv (#210)
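A hedged sketch of hooks matching the new signature; the epsilon schedule and `policy.set_eps` follow the DQN examples, and the constants are illustrative.

```python
def make_hooks(policy, eps_train=1.0, eps_test=0.05, decay_steps=50000):
    """Build train_fn/test_fn with the new (epoch, env_step) signature."""
    def train_fn(epoch, env_step):
        # linearly anneal epsilon-greedy exploration by global environment step
        policy.set_eps(max(0.1, eps_train * (1 - env_step / decay_steps)))

    def test_fn(epoch, env_step):
        policy.set_eps(eps_test)

    return train_fn, test_fn
```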
Bug Fix
- fix a bug in DDQN: target_q could not be sampled from np.random.rand (#224)
- fix a bug in DQN atari net: it should add a ReLU before the last layer (#224)
- fix a bug in collector timing (#224)
- fix a bug in the converter of Batch: deepcopy a Batch in to_numpy and to_torch (#213)
- ensure buffer.rew has a type of float (#229)