
Releases: pytorch/tensordict

v0.6.1: Minor fixes and perf improvements

04 Nov 15:23

This release offers minor bug fixes in bytes (#1059), better handling of edge cases in keys, values and items (#1058), better caching of grads (#1069), and removes a reference cycle in reductions during calls to export (#1056).

v0.6.0: Export, streaming and `CudaGraphModule`

21 Oct 16:38

What's Changed

TensorDict 0.6.0 makes the @dispatch decorator compatible with torch.export and related APIs,
allowing you to get rid of tensordict altogether when exporting your models:

import torch
import torch.nn as nn
from torch import distributions as dists
from torch.export import export

from tensordict.nn import (
    NormalParamExtractor,
    ProbabilisticTensorDictModule as Prob,
    TensorDictModule as Mod,
    TensorDictSequential as Seq,
)

model = Seq(
    # 1. A small network for embedding
    Mod(nn.Linear(3, 4), in_keys=["x"], out_keys=["hidden"]),
    Mod(nn.ReLU(), in_keys=["hidden"], out_keys=["hidden"]),
    Mod(nn.Linear(4, 4), in_keys=["hidden"], out_keys=["latent"]),
    # 2. Extracting params
    Mod(NormalParamExtractor(), in_keys=["latent"], out_keys=["loc", "scale"]),
    # 3. Probabilistic module
    Prob(
        in_keys=["loc", "scale"],
        out_keys=["sample"],
        distribution_class=dists.Normal,
    ),
)

x = torch.randn(1, 3)  # example input
model_export = export(model, args=(), kwargs={"x": x})

See our new tutorial to learn more about this feature.

The library's integration with the PT2 stack is also further improved by the introduction of CudaGraphModule,
which can be used to speed up model execution under a certain set of assumptions: mainly, that the inputs and outputs
are non-differentiable, that they are all tensors or constants, and that the whole graph can be executed on CUDA with
buffers of constant shape (i.e., dynamic shapes are not allowed).
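
Under these assumptions, wrapping a TensorDictModule is straightforward. Here is a minimal sketch (the wrapped module and shapes are illustrative, and a CUDA device is assumed):

import torch
import torch.nn as nn
from tensordict import TensorDict
from tensordict.nn import CudaGraphModule, TensorDictModule as Mod

module = Mod(nn.Linear(3, 4), in_keys=["x"], out_keys=["y"]).cuda()
graph_module = CudaGraphModule(module)

td = TensorDict({"x": torch.randn(8, 3, device="cuda")}, batch_size=[8])
with torch.no_grad():
    # the first calls warm up and capture the CUDA graph; later calls replay it
    for _ in range(5):
        out = graph_module(td)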

We also introduce a new tutorial on streaming tensordicts.

Note: The aarch64 binaries are attached to these release notes and are not available on PyPI at the moment.

Deprecations

  • [Deprecate] Make calls to make_functional error #1034 by @vmoens
  • [Deprecation] Act warned deprecations for v0.6 #1001 by @vmoens
  • [Refactor] make TD.get default to None, like dict (#948) by @vmoens

Features

Code improvements

Fixes

  • [BugFix] Add nullbyte in memmap files to make fbcode happy (#943) by @vmoens
  • [BugFix] Add sync to cudagraph module (#1026) by @vmoens
  • [BugFix] Another compiler fix for older pytorch #980 by @vmoens
  • [BugFix] Compatibility with non-tensor inputs in CudaGraphModule #1039 by @vmoens
  • [BugFix] Deserializing a consolidated TD reproduces a consolidated TD #1019 by @vmoens
  • [BugFix] Fix foreach_copy for older versions of PT #1035 by @vmoens
  • [BugFix] Fix buffer identity in Params._apply (#1027) by @vmoens
  • [BugFix] Fix key errors catch in del_ and related (#949) by @vmoens
  • [BugFix] Fix number check in array parsing (np>=2 compatibility) #999 by @vmoens
  • [BugFix] Fix pre 2.1 _apply compatibility #1050 by @vmoens
  • [BugFix] Fix select in tensorclass (#936) by @vmoens
  • [BugFix] Fix td device sync when error is raised #988 by @vmoens
  • [BugFix] Fix tree_leaves import for older versions of PT #995 by @vmoens
  • [BugFix] Fix vmap monkey patching #1009 by @vmoens
  • [BugFix] Make probabilistic sequential modules compatible with compile #1030 by @vmoens
  • [BugFix] Other dynamo fixes #977 by @vmoens
  • [BugFix] Propagate maybe_dense_stack in _stack #1036 by @vmoens
  • [BugFix] Regular swap_tensor for to_module in dynamo (#963) by @vmoens
  • [BugFix] Remove ForkingPickler to account for change of API in torch.mp #998 by @vmoens
  • [BugFix] Remove forkingpickler (#1049) by @bhack
  • [BugFix] Resilient deterministic_sample for CompositeDist #1000 by @vmoens
  • [BugFix] Simple syncs (#942) by @vmoens
  • [BugFix] Softly revert get changes (#950) by @vmoens
  • [BugFix] TDParams.to(device) works as nn.Module, not TDParams contained TD #1025 by @vmoens
  • [BugFix] Use separate streams for cudagraph warmup #1010 by @vmoens
  • [BugFix] dynamo compat refactors #975 by @vmoens
  • [BugFix] resilient _exclude_td_from_pytree #1038 by @vmoens
  • [BugFix] restrict usage of Buffers to non-batched, non-tracked tensors #979 by @vmoens

Doc

Performance

Not user facing

New Contributors

Full Changelog: v0.5.0...v0.6.0


v0.5.0: `consolidate`, compile compatibility and better non-tensor support

30 Jul 21:39

This release is packed with new features and performance improvements.

What's new

TensorDict.consolidate

There is now a TensorDict.consolidate method that puts all the tensors in a single storage. This greatly speeds up serialization in multiprocessed and distributed settings.
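
A minimal sketch of how this is used (keys and shapes are illustrative):

import torch
from tensordict import TensorDict

td = TensorDict({"a": torch.randn(10), "b": {"c": torch.randn(10)}}, batch_size=[10])
td_c = td.consolidate()  # all leaves now live in a single contiguous storage
assert (td_c["a"] == td["a"]).all()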

PT2 support

TensorDict common ops (get, set, index, arithmetic ops, etc.) now work within torch.compile.
The list of supported operations can be found in test/test_compile.py. We encourage users to report any graph break caused by tensordict, as we aim to improve the coverage as much as we can.
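
For instance, a function operating on a TensorDict can be compiled directly. This is a minimal sketch, assuming PyTorch 2.x (add_one is a hypothetical helper):

import torch
from tensordict import TensorDict

@torch.compile
def add_one(td: TensorDict) -> TensorDict:
    return td.set("y", td.get("x") + 1)

td = add_one(TensorDict({"x": torch.zeros(3)}, batch_size=[3]))
assert (td["y"] == 1).all()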

Python 3.12 support

#807 enables Python 3.12 support, a long-awaited feature!

Global reduction for mean, std and other reduction methods

It is now possible to get the grand average of a tensordict's content using tensordict.mean(reduce=True).
This applies to mean, nanmean, prod, std, sum, nansum and var.
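
For example (a minimal sketch; values are illustrative):

import torch
from tensordict import TensorDict

td = TensorDict({"a": torch.ones(4), "b": torch.full((4, 2), 3.0)}, batch_size=[4])
print(td.mean())             # a TensorDict of per-leaf means
print(td.mean(reduce=True))  # a single scalar averaging all leaves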

from_pytree and to_pytree

We made it easy to convert a tensordict to a given pytree structure, and to build one from any pytree, using to_pytree and from_pytree. #832
Similarly, conversion to namedtuple is now easy thanks to #788.
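
A minimal sketch of the round trip (assuming from_pytree is exposed as a classmethod, as #832 describes):

import torch
from tensordict import TensorDict

td = TensorDict({"a": torch.zeros(3), "nested": {"b": torch.ones(3)}}, batch_size=[3])
pytree = td.to_pytree()               # plain nested python containers
td2 = TensorDict.from_pytree(pytree)  # rebuild a tensordict from the pytree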

map_iter

One can now iterate over a TensorDict's batch dimension and apply a function on a separate process thanks to map_iter.
This should enable the construction of datasets using TensorDict, where the preprocessing step is executed on a separate process. #847
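
A minimal sketch, assuming map_iter accepts the same num_workers argument as map (preproc is a hypothetical function):

import torch
from tensordict import TensorDict

def preproc(td):
    td["x"] = td["x"] * 2
    return td

if __name__ == "__main__":
    td = TensorDict({"x": torch.arange(8.0)}, batch_size=[8])
    # each chunk of the batch dimension is processed on a worker process
    for chunk in td.map_iter(preproc, num_workers=2):
        print(chunk["x"])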

Using flatten and unflatten, flatten_keys and unflatten_keys as context managers

It is now possible to use flatten_keys and flatten as context managers (#908, #779):

with tensordict.flatten_keys() as flat_td:
    flat_td["flat.key"] = 0
assert tensordict["flat", "key"] == 0

Building a tensordict using keyword arguments

We made it easy to build tensordicts with simple keyword arguments, just like a dict is built in Python:

td = TensorDict(a=0, b=1)
assert td["a"] == torch.tensor(0)
assert td["b"] == torch.tensor(1)

The batch_size is now optional for both tensordict and tensorclasses. #905

Load tensordicts directly on device

Thanks to #769, it is now possible to load a tensordict directly on a destination device (including "meta" device):

td = TensorDict.load(path, device=device)

New features

  • [Feature,Performance] to(device, pin_memory, num_threads) by @vmoens in #846
  • [Feature] Allow calls to get_mode, get_mean and get_median in case mode, mean or median is not present by @vmoens in #804
  • [Feature] Arithmetic ops for tensorclass by @vmoens in #786
  • [Feature] Best attempt to densly stack sub-tds when LazyStacked TDS are passed to maybe_dense_stack by @vmoens in #799
  • [Feature] Better dtype coverage by @vmoens in #834
  • [Feature] Change default interaction types to DETERMINISTIC by @vmoens in #825
  • [Feature] DETERMINISTIC interaction mode by @vmoens in #824
  • [Feature] Expose call_on_nested to apply and named_apply by @vmoens in #768
  • [Feature] Expose stack / cat as class methods by @vmoens in #793
  • [Feature] Load tensordicts on device, incl. meta by @vmoens in #769
  • [Feature] Make Probabilistic modules aware of CompositeDistributions out_keys by @vmoens in #810
  • [Feature] Memory-mapped nested tensors by @vmoens in #618
  • [Feature] Multithreaded apply by @vmoens in #844
  • [Feature] Multithreaded pin_memory by @vmoens in #845
  • [Feature] Support for non tensor data in h5 by @vmoens in #772
  • [Feature] TensorDict.consolidate by @vmoens in #814
  • [Feature] TensorDict.numpy() by @vmoens in #787
  • [Feature] TensorDict.replace by @vmoens in #774
  • [Feature] out argument in apply by @vmoens in #794
  • [Feature] to for consolidated TDs by @vmoens in #851
  • [Feature] zero_grad and requires_grad_ by @vmoens in #901
  • [Feature] add_custom_mapping and NPE refactors by @vmoens in #910
  • [Feature] construct tds with kwargs by @vmoens in #905
  • [Feature] determinstic_sample for composite dist by @vmoens in #827
  • [Feature] expand_as by @vmoens in #792
  • [Feature] flatten and unflatten as decorators by @vmoens in #779
  • [Feature] from and to_pytree by @vmoens in #832
  • [Feature] from_modules expand_identical kwarg by @vmoens in #911
  • [Feature] grad and data for tensorclasses by @vmoens in #904
  • [Feature] isfinite, isnan, isreal by @vmoens in #829
  • [Feature] map_iter by @vmoens in #847
  • [Feature] map_names for composite dists by @vmoens in #809
  • [Feature] online edition of memory mapped tensordicts by @vmoens in #775
  • [Feature] remove distutils dependency and enable 3.12 support by @GaetanLepage in #807
  • [Feature] to_namedtuple and from_namedtuple by @vmoens in #788
  • [Feature] view(dtype) by @vmoens in #835

Performance

  • [Performance] Faster getattr in TC by @vmoens in #912
  • [Performance] Faster lock_/unclock_ when sub-tds are already locked by @vmoens in #816
  • [Performance] Faster multithreaded pin_memory by @vmoens in #919
  • [Performance] Faster tensorclass by @vmoens in #791
  • [Performance] Faster tensorclass set by @vmoens in #880
  • [Performance] Faster to-module by @vmoens in #914

Bug Fixes

  • [BugFix,CI] Fix storage filename tests by @vmoens in #850
  • [BugFix] @Property setter in tensorclass by @vmoens in #813
  • [BugFix] Allow any tensorclass to have a data field by @vmoens in #906
  • [BugFix] Allow fake-tensor detection pass through in torch 2.0 by @vmoens in #802
  • [BugFix] Avoid collapsing NonTensorStack when calling where by @vmoens in #837
  • [BugFix] Check if the current user has write access by @MateuszGuzek in #781
  • [BugFix] Ensure dtype is preserved with autocast by @vmoens in #773
  • [BugFix] FIx non-tensor writing in modules by @vmoens in #822
  • [BugFix] Fix (keys, values) in sub by @vmoens in #907
  • [BugFix] Fix _make_dtype_promotion backward compat by @vmoens in #842
  • [BugFix] Fix pad_sequence behavior for non-tensor attributes of tensorclass by @kurtamohler in #884
  • [BugFix] Fix builds by @vmoens in #849
  • [BugFix] Fix compile + vmap by @vmoens in #924
  • [BugFix] Fix deterministic fallback when the dist has no support by @vmoens in #830
  • [BugFix] Fix device parsing in augmented funcs by @vmoens in #770
  • [BugFix] Fix empty tuple index by @vmoens in #811
  • [BugFix] Fix fallback of deterministic samples when mean is not available by @vmoens in #828
  • [BugFix] Fix functorch dim mock by @vmoens in #777
  • [BugFix] Fix gather device by @vmoens in #815
  • [BugFix] Fix h5 auto batch size by @vmoens in #798
  • [BugFix] Fix key ordering in pointwise ops by @vmoens in #855
  • [BugFix] Fix lazy stack features (where and norm) by @vmoens in #795
  • [BugFix] Fix map by @vmoens in #862
  • [BugFix] Fix map test with fork on cuda by @vmoens in #765
  • [BugFix] Fix pad_sequence for non tensors by @vmoens in #784
  • [BugFix] Fix setting non-tensors as data in NonTensorData by @vmoens in https://github.com/pytorch/...

v0.4.0

25 Apr 16:14

What's Changed

This new version of tensordict comes with a wealth of new features:

  • You can now perform pointwise arithmetic operations on tensordicts (a quick sketch follows this list). For locked tensordicts and in-place operations such as += or data.mul_, fused CUDA kernels will be used, which drastically improves runtime.
    See

    • [Feature] Pointwise arithmetic operations using foreach by @vmoens in #722
    • [Feature] Mean, std, var, prod, sum by @vmoens in #751
  • Casting tensordicts to a device is now much faster out of the box, as data is cast asynchronously (and it's safe too!)
    [BugFix,Feature] Optional non_blocking in set, to_module and update by @vmoens in #718
    [BugFix] consistent use of non_blocking in tensordict and torch.Tensor by @vmoens in #734
    [Feature] non_blocking=None by default by @vmoens in #748

  • The non-tensor data API has also been improved, see
    [BugFix] Allow inplace modification of non-tensor data in locked tds by @vmoens in #694
    [BugFix] Fix inheritance from non-tensor by @vmoens in #709
    [Feature] Allow non-tensordata to be shared across processes + memmap by @vmoens in #699
    [Feature] Better detection of non-tensor data by @vmoens in #685

  • @tensorclass now supports automatic type casting: annotating a value as a tensor or an int can ensure that the value will be cast to that type if the tensorclass decorator takes the autocast=True argument
    [Feature] Type casting for tensorclass by @vmoens in #735

  • TensorDict.map now supports the "fork" start method. Preallocated outputs are also a possibility.
    [Feature] mp_start_method in tensordict map by @vmoens in #695
    [Feature] map with preallocated output by @vmoens in #667

  • Miscellaneous performance improvements
    [Performance] Faster flatten_keys by @vmoens in #727
    [Performance] Faster update_ by @vmoens in #705
    [Performance] Minor efficiency improvements by @vmoens in #703
    [Performance] Random speedups by @albanD in #728
    [Feature] Faster to(device) by @vmoens in #740

  • Finally, we have opened a discord channel for tensordict!
    [Badge] Discord shield by @vmoens in #736

  • We cleaned up the API a bit, adding save and load methods as well as some utils such as fromkeys. One can also check whether a key belongs to a tensordict, just as with a regular dictionary, using key in tensordict!
    [Feature] contains, clear and fromkeys by @vmoens in #721
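
As mentioned above, here is a minimal sketch of pointwise arithmetic (values are illustrative; the fused kernels kick in for locked tensordicts and in-place ops on CUDA):

import torch
from tensordict import TensorDict

td = TensorDict({"a": torch.ones(3), "b": torch.full((3,), 2.0)}, batch_size=[3])
td = td * 2 + 1   # applied leaf by leaf
td.mul_(0.5)      # in-place variant
print(td["a"], td["b"])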

Thanks to all our contributors and to our community for the support!

Other PRs


v0.3.2: Minor release

07 Apr 13:39

[BugFix,Feature] Optional non_blocking in set, to_module and update (#718)
[Refactor] Refactor contiguous (#716)
[Test] Add proper tests for torch.stack with lazy stacks (#715)
[BugFix] Fix dense stack usage in torch.stack (#714)
[BugFix] Dense stack lazy tds defaults to dense_stack_tds (#713)
[Feature] Store non tensor stacks in a single json (#711)
[Feature] TensorDict logger (#710)
[BugFix, Feature] tensorclass.to_dict and from_dict (#707)
[BugFix] Fix inheritance from non-tensor (#709)
[Performance] Faster update_ (#705)
[Benchmark] Benchmark update_ (#704)
[Performance] Minor efficiency improvements (#703)
[Feature] Allow non-tensordata to be shared across processes + memmap (#699)
[CI] Unpin mpmath (#702)
[CI] Remove snapshot from CI (#701)
[BugFix] Support empty tuple in lazy stack indexing (#696)
[CI] Pinning mpmath (#697)
[BugFix] Allow inplace modification of non-tensor data in locked tds (#694)
[Feature] Better detection of non-tensor data (#685)
[Feature] Warn when reset_parameters_recursive is a no-op (#693)
[BugFix,Feature] filter_empty in apply (#661)

See the release on PyPI: https://pypi.org/project/tensordict/0.3.2/

v0.3.1

27 Feb 01:33

Solves several bugs and performance issues.

List of changes:

v0.3.0: `MemoryMappedTensor`, pickle-free multithreaded serialization and more!

31 Jan 14:07

In this release we introduce a bunch of exciting features to TensorDict:

  • We deprecate MemmapTensor in favour of MemoryMappedTensor, which is fully backed by torch.Tensor and no longer by numpy. The new API is faster and far less bug-prone than it used to be. See #541

  • Saving tensordicts on disk can now be done via memmap, memmap_ and memmap_like, which all support multithreading. Whenever possible, serialization is pickle-free (memmap + json), and torch.save is only used for classes that cannot be serialized with json. Serializing models using tensordict is now 3-10x faster than using torch.save, even for SOTA LLMs such as LLaMA.

  • TensorDict can now carry non-tensor data through the NonTensorData class. Non-tensor data can be assigned via __setitem__ and retrieved via __getitem__ (see the sketch after this list). #601

  • A bunch of new operations have appeared too, such as named_apply (apply with key names) and tensordict.auto_batch_size_(), and operations like update can now be restricted to a subset of keys.

  • Almost all operations in the library are now faster!

  • We are slowly deprecating lazy classes except for LazyStackedTensorDict. Whereas torch.stack used to systematically return a lazy stack, it now returns a dense stack if the set_lazy_legacy(mode) decorator is set to False (which will be the default in the next release). The old behaviour can be restored with set_lazy_legacy(True). Lazy stacks can still be obtained using LazyStackedTensorDict.lazy_stack. Appropriate warnings are raised unless you have patched your code accordingly.
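
As promised above, here is a minimal sketch of non-tensor data assignment (the key and value are illustrative; NonTensorData is created automatically under the hood):

import torch
from tensordict import TensorDict

td = TensorDict({"obs": torch.zeros(3)}, batch_size=[3])
td["metadata"] = "collected on device 0"   # stored as NonTensorData
assert td["metadata"] == "collected on device 0"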

What's Changed

  • [Refactor] MemoryMappedTensor by @vmoens in #541
  • [Feature] Multithread memmap by @vmoens in #592
  • [Refactor] Graceful as_tensor by @vmoens in #549
  • [Test] Fix as_tensor test by @vmoens in #551
  • Fix assignment of str-typed value to _device attribute in MemmapTensor by @kurt-stolle in #552
  • [Refactor] Refactor split by @vmoens in #555
  • [Refactor] Refactor implement_for by @vmoens in #556
  • [Feature] Better constructors for MemoryMappedTensors by @vmoens in #557
  • [CI] Fix benchmark on gpu by @vmoens in #560
  • [CI] Add regular benchmarks to CI in PRs without upload by @vmoens in #561
  • [Refactor] Major refactoring of codebase by @vmoens in #559
  • [Benchmark] Benchmark split and chunk by @vmoens in #564
  • [Performance] Faster split, chunk and unbind by @vmoens in #563
  • [Feature] Consolidate functional calls by @vmoens in #565
  • [Refactor] Improve functional call efficiency by @vmoens in #567
  • [Refactor] Do not lock nested tensordict in tensordictparams by @vmoens in #568
  • [Performance] Faster params and buffer registration in TensorDictParams by @vmoens in #569
  • [BugFix] Graceful attribute error exit in TensorDictParams by @vmoens in #571
  • [Refactor] Upgrade pytree import by @vmoens in #573
  • [BugFix] Compatibility with missing _global_parameter_registration_hooks by @vmoens in #574
  • [Feature] Seed workers in TensorDict.map by @vmoens in #562
  • [Performance] Faster update by @vmoens in #572
  • [Performance] Faster to_module by @vmoens in #575
  • [BugFix] _FileHandler for windows by @vmoens in #577
  • [Performance] Faster __init__ by @vmoens in #576
  • [Feature, Test] Add tests for partial update by @vmoens in #578
  • [BugFix] No fallback on TensorDictModule.__getattr__ for private attributes by @vmoens in #579
  • [BugFix] Fix deepcopy of TensorDictParams by @vmoens in #580
  • Add MANIFEST.in by @vmoens in #581
  • [BugFix] Delete parameter/buffer before setting it with regular setattr in to_module by @vmoens in #583
  • [Feature] named_apply and default value in apply by @vmoens in #584
  • [BugFix] Faster empty_like for MemoryMappedTensor by @vmoens in #585
  • [BugFix] Faster empty_like for MemoryMappedTensor (dup) by @vmoens in #586
  • [BugFix] Adapt MemoryMappedTensor for torch < 2.0 by @vmoens in #587
  • [Performance] Make copy_ a no-op if tensors are identical by @vmoens in #588
  • [BugFix] Fix non-blocking arg in copy_ by @vmoens in #590
  • [Feature] Unbind and stack tds in map with chunksize=0 by @vmoens in #589
  • [Performance] Faster dispatch by @vmoens in #487
  • [Feature] Saving metadata of tensorclass by @vmoens in #582
  • [BugFix] Fix osx tests by @vmoens in #591
  • [Feature] Weakref for unlocking tds by @vmoens in #595
  • [BugFix] Fix pickling of weakrefs by @vmoens in #597
  • [Feature] Return early a tensordict created through memmap with multiple threads by @vmoens in #598
  • [CI] Depend on torch nightly for nightly releases by @vmoens in #599
  • [Feature] Storing non-tensor data in tensordicts by @vmoens in #601
  • [Feature, Test] FSDP and DTensors by @vmoens in #600
  • [Minor] Fix type deletion in tensorclass load_memmap by @vmoens in #602
  • [BugFix] Fix ellipsis check by @vmoens in #604
  • [Feature] Best intention stack by @vmoens in #605
  • [Feature] Remove and check for prints in codebase using flake8-print by @vmoens in #603
  • [Doc] Doc revamp by @vmoens in #593
  • [BugFix, Doc] Fix tutorial by @vmoens in #606
  • [BugFix] Fix gh-pages upload by @vmoens in #607
  • [BugFix] Upload content of html directly by @vmoens in #608
  • [Feature] Improve in-place ops for TensorDictParams by @vmoens in #609
  • [BugFix, CI] Fix GPU benchmarks by @vmoens in #611
  • [Feature] inplace to_module by @vmoens in #610
  • [Versioning] Bump v0.3.0 by @vmoens in #613
  • [Feature] Support group specification by @lucifer1004 in #616
  • [Refactor] Remove remaining MemmapTensor references by @vmoens in #617
  • [Tests] Reorder and regroup tests by @vmoens in #614
  • [Performance] Faster set by @vmoens in #619
  • [Performance] Better shared/memmap inheritance and faster exclude by @vmoens in #621
  • [Benchmark] Benchmark select, exclude and empty by @vmoens in #623
  • [Feature] Improve the empty method by @vmoens in #622
  • [BugFix] Fix is_memmap attribute for memmap_like and memmap by @vmoens in #625
  • Bump jinja2 from 3.1.2 to 3.1.3 in /docs by @dependabot in #626
  • [BugFix] Remove shared/memmap inheritance from clone / select / exclude by @vmoens in #624
  • [BugFix] Fix index in list error by @vmoens in #627
  • [Refactor] Make unbind call tensor.unbind by @vmoens in #628
  • [Feature] auto_batch_size_ by @vmoens in #630
  • [BugFix] Fix NonTensorData interaction by @vmoens in #631
  • [Doc] More doc on how to set and get non-tensor data by @vmoens in #632
  • [Feature] _auto_make_functional and _dispatch_td_nn_modules by @vmoens in #633
  • [BugFIx] Fix exclude indent by @vmoens in #637
  • [BugFix] Limit number of threads in workers for .map() by @vmoens in #638
  • [Feature] Robust to lazy_legacy set to false and context managers for reshape ops by @vmoens in #634
  • [Minor] Typo in lazy legacy warnings by @Vmoe...

v0.2.1

26 Oct 20:57

What's Changed

Full Changelog: v0.2.0...v0.2.1

0.2.0

05 Oct 06:54

New features

What's Changed


v0.1.2

09 May 15:38

What's Changed

Full Changelog: v0.1.1...v0.1.2