-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
safetensors-parser: parse safetensors files
- Loading branch information
0 parents
commit 2295414
Showing
12 changed files
with
1,425 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
node_modules | ||
dist/tsconfig.tsbuildinfo | ||
dist/test |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
Copyright 2024 Reve AI Authors | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy of | ||
this software and associated documentation files (the “Software”), to deal in | ||
the Software without restriction, including without limitation the rights to | ||
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of | ||
the Software, and to permit persons to whom the Software is furnished to do so, | ||
subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS | ||
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR | ||
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER | ||
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN | ||
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,220 @@ | ||
# Safetensors Parser for Javascript | ||
|
||
"safetensors" is the highest-performance file format in wide use within the | ||
pytorch machine learning community. It's a very simple format, and the Python | ||
libraries for dealing with the format are generally of good quality. However, | ||
sometimes you need to send tensors around some web infrastructure that might | ||
include a browser, or a node or deno server, and then what do you do? | ||
|
||
This library includes a utility class to build and save safetensor files, as | ||
well as parse, load, and inspect them. It validates the files and throws an | ||
error when the file is somehow not up to snuff. Some of those validations can | ||
be turned off if you like not knowing things. | ||
|
||
Safetensors-parser attempts to be memory efficient, inasmuchas anything in this | ||
area can be. Each separate tensor from a parsed file references the underlying | ||
byte array. Each tensor is written as a separate chunk if you provide a write | ||
callback to saving tensors. This could allow you to stream tensor data over a | ||
network without having to keep all of them in RAM at once. | ||
|
||
## Usage: Loading | ||
|
||
```typescript | ||
import { parseSafeTensors } from "@reve-ai/safetensors-parser"; | ||
|
||
const stuff: UInt8array = ...; | ||
|
||
const tensorMap = parseSafeTensors(stuff); | ||
const myTensor = tensorMap.getTensor("my-tensor"); | ||
``` | ||
|
||
## Usage: Saving | ||
|
||
```typescript | ||
import { saveSafeTensors, TensorMap } from "@reve-ai/safetensors-parser"; | ||
|
||
const tensorMap = new TensorMap(); | ||
tensorMap.setMetadata("creator", "me"); | ||
tensorMap.addTensor( | ||
"identity", | ||
new UInt8array([1, 0, 0, 0, 1, 0, 0, 0, 1], "UINT8", [3, 3]) | ||
); | ||
|
||
// Use the default writer, which returns the full byte array. | ||
// If you use a custom write callback, nothing will be returned. | ||
const stuff: UInt8array = saveSafeTensors(tensorMap); | ||
``` | ||
|
||
# FAQ | ||
|
||
These questions might have been asked by some person at some point, making them | ||
more frequently asked than questions that nobody has asked. | ||
|
||
## Why does the package.json have no bundler? | ||
|
||
This package is a proper module, provided in a single file (dist/src/index.js) | ||
which you can just import as-is, no bundler required. | ||
|
||
## Why does the package.json have no test runner? | ||
|
||
The tests live in a single file that you can just run with node.js. | ||
|
||
## Can you add convnient dependency wrappers for all my favorite frameworks and image loading libraries? | ||
|
||
This package has no runtime dependencies, and the only development time | ||
dependency is typescript, and it will stay that way. | ||
|
||
## I like the Buffer class better than the UInt8array class. | ||
|
||
That's not a question. Also, `Buffer` is not available in the browser; this | ||
library is intended to be usable both in a browser and on a server. | ||
|
||
# Detailed Usage | ||
|
||
Other than `parseSafeTensors` and `saveSafeTensors`, the rest of the functions | ||
are largely internal but are exported in case you want to use them or test them. | ||
The `TensorMap` class is intended for direct usage, and you could also use | ||
`TensorRef` directly. | ||
|
||
## TensorMap | ||
|
||
```typescript | ||
class TensorMap { | ||
constructor(); | ||
getTensor(name: TensorName): TensorRef | undefined; | ||
addTensor( | ||
name: TensorName, | ||
bytes: Bytes, | ||
format: Format, | ||
shape: Shape | ||
): TensorRef; | ||
addTensor(name: TensorName, tensor: TensorRef): TensorRef; | ||
setTensor(name: TensorName, tensor: TensorRef): TensorRef; | ||
getOrMakeTensor(name: TensorName, factory: () => TensorRef): TensorRef; | ||
getMetaValue(name: TensorName): string | undefined; | ||
setMetaValue(name: TensorName, value: string): void; | ||
get allMetadata(): Map<string, string>; | ||
get allTensors(): Map<string, TensorRef>; | ||
setAllMetadata(metadata: Map<string, string>): void; | ||
removeTensor(name: TensorName | TensorRef): void; | ||
} | ||
``` | ||
|
||
`TensorMap` is a collection of multiple tensors. It can be manually constructed | ||
or loaded from a safetensors file. It provides methods to manage tensors and | ||
metadata. | ||
|
||
- `constructor()`: Creates a new empty TensorMap. | ||
- `getTensor(name)`: Retrieves a tensor by name, or returns undefined if not found. | ||
- `addTensor(name, ...)`: Adds a new tensor to the map. Throws an error if the name already exists. | ||
- `setTensor(name, tensor)`: Sets a tensor, replacing any existing tensor with the same name. | ||
- `setMetaValue(name, value)`: Sets a metadata value. | ||
- `allMetadata`: Getter that returns all metadata as a Map. | ||
- `allTensors`: Getter that returns all tensors as a Map. | ||
- `setAllMetadata(metadata)`: Sets all metadata at once. | ||
- `removeTensor(name)`: Removes a tensor from the map. | ||
|
||
## TensorRef | ||
|
||
```typescript | ||
class TensorRef { | ||
constructor(name: TensorName, bytes: Bytes, format: Format, shape: Shape); | ||
get parent(): TensorMap | undefined; | ||
get name(): string; | ||
set name(val: string); | ||
removeIfParented(): void; | ||
sanityCheck(): void; | ||
} | ||
``` | ||
|
||
`TensorRef` represents a specific tensor within a safetensors archive. It can be | ||
created standalone or obtained from a TensorMap. | ||
|
||
- `constructor(name, bytes, format, shape)`: Creates a new TensorRef with the given properties. | ||
- `parent`: Getter that returns the parent TensorMap, if any. | ||
- `name`: Getter and setter for the tensor name. Setting the name will update it in the parent TensorMap if present. | ||
- `removeIfParented()`: Removes the tensor from its parent TensorMap, if it has one. | ||
- `sanityCheck()`: Performs sanity checks on the tensor, ensuring its size matches the declared shape and format. | ||
|
||
Note: The `bytes`, `format`, and `shape` properties are read-only and can be accessed directly on the TensorRef instance. | ||
|
||
## parseSafeTensors | ||
|
||
```typescript | ||
function parseSafeTensors( | ||
bytes: Uint8Array, | ||
ignoreInvalid?: boolean | ||
): TensorMap; | ||
``` | ||
|
||
Parses a safetensors file (as a Uint8Array) and returns a TensorMap containing | ||
the tensors and metadata stored in that file. If the file is not fully compliant | ||
with the spec, an error will be thrown. You can pass `ignoreInvalid=true` to | ||
attempt parsing anyway. | ||
|
||
## saveSafeTensors | ||
|
||
```typescript | ||
function saveSafeTensors(tensorMap: TensorMap): Uint8Array; | ||
function saveSafeTensors( | ||
tensorMap: TensorMap, | ||
write: (data: Uint8Array) => void | ||
): undefined; | ||
``` | ||
|
||
Generates the contents of a safetensors file from a given TensorMap. It can | ||
either return a Uint8Array containing the file contents or call a provided write | ||
function with chunks of data. | ||
|
||
## sanityCheckTensorsHeader | ||
|
||
```typescript | ||
function sanityCheckTensorsHeader( | ||
ignoreInvalid: boolean, | ||
bytes: Uint8Array, | ||
filesize: number | ||
): number; | ||
``` | ||
|
||
Verifies that the header of a safetensors file seems legitimate. Returns the | ||
size of the JSON chunk that starts at offset 8. | ||
|
||
## unsafeGetHeaderSize | ||
|
||
```typescript | ||
function unsafeGetHeaderSize(bytes: Uint8Array): number; | ||
``` | ||
|
||
Retrieves the header size from the first 4 bytes of a safetensors file. This | ||
function is unsafe as it doesn't perform any checks. | ||
|
||
## sanityCheckTensorsParsed | ||
|
||
```typescript | ||
function sanityCheckTensorsParsed( | ||
ignoreInvalid: boolean, | ||
j: Object, | ||
chunksize: number | ||
): void; | ||
``` | ||
|
||
Performs sanity checks on the parsed JSON content of a safetensors file. | ||
|
||
## sanityCheck | ||
|
||
```typescript | ||
function sanityCheck(tensor: TensorRef): void; | ||
function sanityCheck(bytes: Uint8Array, format: string, shape: number[]): void; | ||
``` | ||
|
||
Performs sanity checks on a tensor, ensuring that its size matches its declared | ||
shape and format. | ||
|
||
## formatLength | ||
|
||
```typescript | ||
function formatLength(format: string): number; | ||
``` | ||
|
||
Returns the byte length of a given tensor format. This may be a fractional value | ||
for formats like 4-bit data. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
export type Bytes = Uint8Array; | ||
export type Format = string; | ||
export type Integer = number; | ||
export type OffsetPair = [Integer, Integer]; | ||
export type Shape = Integer[]; | ||
export type TensorName = string; | ||
export declare class IgnorableError extends Error { | ||
constructor(message: string); | ||
} | ||
export declare function parseSafeTensors(bytes: Bytes, ignoreInvalid?: boolean): TensorMap; | ||
export declare function saveSafeTensors(tensorMap: TensorMap): Uint8Array; | ||
export declare function saveSafeTensors(tensorMap: TensorMap, write: (data: Bytes) => void): undefined; | ||
export declare class TensorMap { | ||
refs: Map<string, TensorRef>; | ||
metadata: Map<string, string>; | ||
constructor(); | ||
getTensor(name: TensorName): TensorRef | undefined; | ||
addTensor(name: TensorName, bytes: Bytes, format: Format, shape: Shape): TensorRef; | ||
addTensor(name: TensorName, tensor: TensorRef): TensorRef; | ||
setTensor(name: TensorName, tensor: TensorRef): TensorRef; | ||
getOrMakeTensor(name: TensorName, factory: () => TensorRef): TensorRef; | ||
getMetaValue(name: TensorName): string | undefined; | ||
setMetaValue(name: TensorName, value: string): void; | ||
get allMetadata(): Map<string, string>; | ||
get allTensors(): Map<string, TensorRef>; | ||
setAllMetadata(metadata: Map<string, string>): void; | ||
removeTensor(name: TensorName | TensorRef): void; | ||
private _setTensor; | ||
} | ||
export declare class TensorRef { | ||
readonly bytes: Bytes; | ||
readonly format: Format; | ||
readonly shape: Shape; | ||
constructor(name: TensorName, bytes: Bytes, format: Format, shape: Shape); | ||
get parent(): TensorMap | undefined; | ||
private _name; | ||
_parent?: TensorMap; | ||
get name(): string; | ||
set name(val: string); | ||
removeIfParented(): void; | ||
sanityCheck(): void; | ||
} | ||
export declare function sanityCheckTensorsHeader(ignoreInvalid: boolean, bytes: Bytes, filesize: Integer): Integer; | ||
export declare function unsafeGetHeaderSize(bytes: Bytes): Integer; | ||
export declare function sanityCheckTensorsParsed(ignoreInvalid: boolean, j: Object, chunksize: Integer): void; | ||
export declare function sanityCheck(tensor: TensorRef): void; | ||
export declare function sanityCheck(bytes: Bytes, format: Format, shape: Shape): void; | ||
export declare function formatLength(format: Format): Integer; | ||
//# sourceMappingURL=index.d.ts.map |
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
Oops, something went wrong.