
Commit 2fb1af1

Fix tokenizer preview4 release notes (#9327)
* Fix Tokenizer Preview 4 Release Notes
* remove extra empty line
* Remove un-needed line
1 parent 379a948 commit 2fb1af1

1 file changed, +19 -17 lines changed


release-notes/9.0/preview/preview4/libraries.md


````diff
@@ -19,20 +19,20 @@ Libraries updates in .NET 9 Preview 4:
 
 ## New `Tensor<T>` type
 
-Tensors are the cornerstone data structure of artificial intelligence (AI). They can often be thought of as multidimensional arrays.
+Tensors are the cornerstone data structure of artificial intelligence (AI). They can often be thought of as multidimensional arrays.
 
 Tensors are used to:
 
-- Represent and encode data such as text sequences (tokens), images, video, and audio.
-- Efficiently manipulate higher-dimensional data.
-- Efficiently apply computations on higher-dimensional data.
-- Inside neural networks, they’re used to store weight information and intermediate computations.
+- Represent and encode data such as text sequences (tokens), images, video, and audio.
+- Efficiently manipulate higher-dimensional data.
+- Efficiently apply computations on higher-dimensional data.
+- Inside neural networks, they’re used to store weight information and intermediate computations.
 
-In .NET 9, we plan to introduce a new `Tensor<T>` exchange type that:
+In .NET 9, we plan to introduce a new `Tensor<T>` exchange type that:
 
-- Provides efficient interop with AI libraries like ML.NET, TorchSharp, and ONNX Runtime using zero copies where possible.
-- Builds on top of `TensorPrimitives` for efficient math operations.
-- Enables easy and efficient data manipulation by providing indexing and slicing operations.
+- Provides efficient interop with AI libraries like ML.NET, TorchSharp, and ONNX Runtime using zero copies where possible.
+- Builds on top of `TensorPrimitives` for efficient math operations.
+- Enables easy and efficient data manipulation by providing indexing and slicing operations.
 
 Below is a brief overview of some of the APIs included with the new `Tensor<T>` type:
 
````
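The release notes say `Tensor<T>` builds on `TensorPrimitives` for its math. As a minimal sketch of what that lower layer already does, the block below calls span-based `TensorPrimitives` methods directly. It assumes a reference to the `System.Numerics.Tensors` NuGet package; the input values are illustrative, and it deliberately does not use `Tensor<T>` itself, whose preview API may still change.

```csharp
using System;
using System.Numerics.Tensors; // assumes the System.Numerics.Tensors NuGet package is referenced

// Illustrative input vectors (not taken from the release notes).
ReadOnlySpan<float> x = new float[] { 1f, 2f, 3f };
ReadOnlySpan<float> y = new float[] { 10f, 20f, 30f };
Span<float> sum = new float[3];

// Element-wise addition into a caller-provided destination: sum[i] = x[i] + y[i].
// TensorPrimitives vectorizes this where the hardware allows it.
TensorPrimitives.Add(x, y, sum);

// Reductions over whole spans.
float total = TensorPrimitives.Sum(sum); // 11 + 22 + 33
float dot = TensorPrimitives.Dot(x, y);  // 1*10 + 2*20 + 3*30

Console.WriteLine(string.Join(", ", sum.ToArray())); // prints "11, 22, 33"
Console.WriteLine(total);                            // prints "66"
Console.WriteLine(dot);                              // prints "140"
```

Working on spans with an explicit destination is what lets a higher-level type like `Tensor<T>` reuse these kernels without extra allocations.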


````diff
@@ -69,12 +69,12 @@ var t11 = Tensor.Divide(t0, t0); // [[1, 1, 1]]
 
 Some things to note:
 
-- `Tensor<T>` is not a replacement for existing AI and Machine Learning libraries. Instead, it’s intended to provide enough of a common set of APIs that reduce code duplication, reduce dependencies, and where possible achieve better performance by using the latest runtime features.
+- `Tensor<T>` is not a replacement for existing AI and Machine Learning libraries. Instead, it’s intended to provide enough of a common set of APIs that reduce code duplication, reduce dependencies, and where possible achieve better performance by using the latest runtime features.
 - At the moment, the easiest way to try `Tensor<T>` is using .NET 8. If your application targets .NET 9, we recommend waiting until .NET 9 Preview 5. If you're eager to try it out in your .NET 9 applications, you can install the latest .NET nightly builds.
 
 To get started:
 
-1. Configure the following NuGet nightly feed:
+1. Configure the following NuGet nightly feed:
 
 ```text
 https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet9/nuget/v3/index.json
````

````diff
@@ -87,7 +87,7 @@ To get started:
 <LangVersion>preview</LangVersion>
 ```
 
-We can't wait to see what you build!
+We can't wait to see what you build!
 
 Try it out and [give us feedback](https://github.com/dotnet/runtime/issues)!
 
````

````diff
@@ -102,14 +102,14 @@ The following example demonstrates how to utilize the tokenizer with `Span<char>
 using Stream remoteStream = File.OpenRead(tokenizerModelPath));
 Tokenizer llamaTokenizer = Tokenizer.CreateLlama(remoteStream);
 
-Span<char> textSpan = "Hello World".AsSpan();
+ReadOnlySpan<char> textSpan = "Hello World".AsSpan();
 IReadOnlyList<int> ids = llamaTokenizer.EncodeToIds(textSpan, considerNormalization: false); // bypass the normalization
 
 Tokenizer tiktokenTokenizer = Tokenizer.CreateTiktokenForModel("gpt-4");
-IReadOnlyList<int> ids = tiktokenTokenizer.EncodeToIds(textSpan, considerPreTokenization: false); // bypass the PreTokenization
+ids = tiktokenTokenizer.EncodeToIds(textSpan, considerPreTokenization: false); // bypass the PreTokenization
 ```
 
-We've also introduced the CodeGen tokenizer, compatible with models such as [codegen-350M-mono](https://huggingface.co/Salesforce/codegen-350M-mono/tree/main) and [phi-2](https://huggingface.co/microsoft/phi-2/tree/main).
+We've also introduced the CodeGen tokenizer, compatible with models such as [codegen-350M-mono](https://huggingface.co/Salesforce/codegen-350M-mono/tree/main) and [phi-2](https://huggingface.co/microsoft/phi-2/tree/main).
 
 The following example demonstrates how to create and utilize this tokenizer.
 
````
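The tokenizer hunk above fixes two compile errors in the sample: `string.AsSpan()` returns `ReadOnlySpan<char>` (strings are immutable), so it cannot be assigned to a `Span<char>`, and `ids` was declared twice in the same scope. The sketch below isolates both rules without the tokenizer package; the integer arrays are hypothetical stand-ins for the `EncodeToIds` results.

```csharp
using System;

// string.AsSpan() yields ReadOnlySpan<char>; there is no implicit conversion to
// Span<char>, so the original line would fail with CS0029:
// Span<char> textSpan = "Hello World".AsSpan();
ReadOnlySpan<char> textSpan = "Hello World".AsSpan();

// Slicing views the same string memory without allocating a new string.
ReadOnlySpan<char> hello = textSpan.Slice(0, 5);
Console.WriteLine(hello.ToString()); // prints "Hello"

// A local can only be declared once per scope (CS0128 otherwise), which is why
// the corrected sample reassigns 'ids' instead of redeclaring it.
// These arrays are hypothetical placeholders for tokenizer output.
int[] ids = { 1, 2, 3 };
ids = new[] { 4, 5 };
Console.WriteLine(ids.Length); // prints "2"
```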


````diff
@@ -123,11 +123,13 @@ Tokenizer ph2Tokenizer = Tokenizer.CreateCodeGen(vocabStream, mergesStream);
 IReadOnlyList<int> ids = ph2Tokenizer.EncodeToIds("Hello, World");
 ```
 
+The [tokenizer library](https://github.com/dotnet/machinelearning/tree/main/src/Microsoft.ML.Tokenizers) is available on GitHub and can be accessed by referencing the [NuGet package](https://www.nuget.org/packages/Microsoft.ML.Tokenizers/0.22.0-preview.24271.1#readme-body-tab).
+
 ## OpenTelemetry: Make activity linking more flexible
 
 [Activity.AddLink](https://github.com/dotnet/runtime/blob/e1f98a13be27efbe0ee3b69aa4673e7e98c5c003/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.cs#L529) was added to enable linking an `Activity` object to other tracing contexts after `Activity` object creation. This change better aligns .NET with the [OpenTelemetry specifications](https://github.com/open-telemetry/opentelemetry-specification/blob/6360b49d20ae451b28f7ba0be168ed9a799ac9e1/specification/trace/api.md?plain=1#L804).
 
-`Activity` linking was previously only possible as part of [`Activity` creation](https://learn.microsoft.com/dotnet/api/system.diagnostics.activitysource.createactivity?view=net-8.0#system-diagnostics-activitysource-createactivity(system-string-system-diagnostics-activitykind-system-diagnostics-activitycontext-system-collections-generic-ienumerable((system-collections-generic-keyvaluepair((system-string-system-object))))-system-collections-generic-ienumerable((system-diagnostics-activitylink))-system-diagnostics-activityidformat)).
+`Activity` linking was previously only possible as part of [`Activity` creation](https://learn.microsoft.com/dotnet/api/system.diagnostics.activitysource.createactivity?view=net-8.0#system-diagnostics-activitysource-createactivity(system-string-system-diagnostics-activitykind-system-diagnostics-activitycontext-system-collections-generic-ienumerable((system-collections-generic-keyvaluepair((system-string-system-object))))-system-collections-generic-ienumerable((system-diagnostics-activitylink))-system-diagnostics-activityidformat)).
 
 ```C#
 var activityContext = new ActivityContext(ActivityTraceId.CreateRandom(), ActivitySpanId.CreateRandom(), ActivityTraceFlags.None);
````
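The release-notes sample is cut off at the end of this hunk. As a hedged sketch of the pattern the section describes, assuming .NET 9 (where `Activity.AddLink` is available in box), the block below starts an activity first and attaches a link to another trace afterwards. The activity name `ProcessBatch` and the random IDs are illustrative only.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Context of some other trace to link to; random IDs stand in for real ones.
var linkedContext = new ActivityContext(
    ActivityTraceId.CreateRandom(),
    ActivitySpanId.CreateRandom(),
    ActivityTraceFlags.None);

// Create and start the activity with no links supplied at creation time.
// "ProcessBatch" is a hypothetical operation name.
using var activity = new Activity("ProcessBatch");
activity.Start();

// Attach the link after the activity already exists (new in .NET 9).
activity.AddLink(new ActivityLink(linkedContext));

foreach (ActivityLink link in activity.Links)
{
    Console.WriteLine($"Linked to trace {link.Context.TraceId}");
}
Console.WriteLine(activity.Links.Count()); // prints "1"
```

Before `AddLink`, the only option was to pass the links into `ActivitySource.CreateActivity` or `StartActivity`, which required knowing them up front.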

````diff
@@ -156,7 +158,7 @@ public abstract partial class ModuleBuilder : System.Reflection.Module
 {
 public void MarkSequencePoint(ISymbolDocumentWriter document, int startLine, int startColumn, int endLine, int endColumn) { }
 }
-
+
 public abstract partial class LocalBuilder : LocalVariableInfo
 {
 public void SetLocalSymInfo(string name);
````
