Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give API access to FileTypeParser#detectors #704

Merged
merged 8 commits into from
Dec 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 27 additions & 18 deletions core.d.ts
Original file line number Diff line number Diff line change
Expand Up @@ -114,31 +114,35 @@ console.log(await fileTypeFromBlob(blob));
export declare function fileTypeFromBlob(blob: Blob): Promise<FileTypeResult | undefined>;

/**
Function that allows specifying custom detection mechanisms.
A custom file type detector.

An iterable of detectors can be provided via the `fileTypeOptions` argument for the {@link FileTypeParser.constructor}.
Detectors can be added via the constructor options or by directly modifying `FileTypeParser#detectors`.

The detectors are called before the default detections in the provided order.
Detectors provided through the constructor options are executed before the default detectors.

Custom detectors can be used to add new `FileTypeResults` or to modify return behavior of existing `FileTypeResult` detections.
Custom detectors allow for:
- Introducing new `FileTypeResult` entries.
- Modifying the detection behavior of existing `FileTypeResult` types.

If the detector returns `undefined`, there are 2 possible scenarios:
### Detector execution flow

1. The detector has not read from the tokenizer, it will be proceeded with the next available detector.
2. The detector has read from the tokenizer (`tokenizer.position` has been increased).
In that case no further detectors will be executed and the final conclusion is that file-type returns undefined.
Note that this an exceptional scenario, as the detector takes the opportunity from any other detector to determine the file type.
If a detector returns `undefined`, the following rules apply:

Example detector array which can be extended and provided via the fileTypeOptions argument:
1. **No Tokenizer Interaction**: If the detector does not modify the tokenizer's position, the next detector in the sequence is executed.
2. **Tokenizer Interaction**: If the detector modifies the tokenizer's position (`tokenizer.position` is advanced), no further detectors are executed. In this case, the file type remains `undefined`, as subsequent detectors cannot evaluate the content. This is an exceptional scenario, as it prevents any other detectors from determining the file type.

### Example usage

Below is an example of a custom detector array. This can be passed to the `FileTypeParser` via the `fileTypeOptions` argument.

```
import {FileTypeParser} from 'file-type';

const customDetectors = [
async tokenizer => {
const unicornHeader = [85, 78, 73, 67, 79, 82, 78]; // 'UNICORN' as decimal string
const unicornHeader = [85, 78, 73, 67, 79, 82, 78]; // "UNICORN" in ASCII decimal

const buffer = Buffer.alloc(7);
const buffer = new Uint8Array(unicornHeader.length);
await tokenizer.peekBuffer(buffer, {length: unicornHeader.length, mayBeLess: true});
if (unicornHeader.every((value, index) => value === buffer[index])) {
return {ext: 'unicorn', mime: 'application/unicorn'};
Expand All @@ -148,15 +152,15 @@ const customDetectors = [
},
];

const buffer = Buffer.from('UNICORN');
const buffer = new Uint8Array([85, 78, 73, 67, 79, 82, 78]);
const parser = new FileTypeParser({customDetectors});
const fileType = await parser.fromBuffer(buffer);
console.log(fileType);
console.log(fileType); // {ext: 'unicorn', mime: 'application/unicorn'}
```

@param tokenizer - The [tokenizer](https://github.com/Borewit/strtok3#tokenizer) used to read the file content from.
@param fileType - The file type detected by the standard detections or a previous custom detection, or `undefined`` if no matching file type could be found.
@returns The detected file type, or `undefined` when there is no match.
@param tokenizer - The [tokenizer](https://github.com/Borewit/strtok3#tokenizer) used to read file content.
@param fileType - The file type detected by standard or previous custom detectors, or `undefined` if no match is found.
@returns The detected file type, or `undefined` if no match is found.
*/
export type Detector = (tokenizer: ITokenizer, fileType?: FileTypeResult) => Promise<FileTypeResult | undefined>;

Expand All @@ -180,7 +184,12 @@ This method can be handy to put in a stream pipeline, but it comes with a price.
export function fileTypeStream(webStream: AnyWebReadableStream<Uint8Array>, options?: StreamOptions): Promise<AnyWebReadableByteStreamWithFileType>;

export declare class FileTypeParser {
detectors: Iterable<Detector>;
/**
File type detectors.

Initialized with a single entry holding the built-in detector function.
*/
detectors: Detector[];

constructor(options?: {customDetectors?: Iterable<Detector>; signal?: AbortSignal});

Expand Down
14 changes: 5 additions & 9 deletions core.js
Original file line number Diff line number Diff line change
Expand Up @@ -109,19 +109,17 @@ export async function fileTypeStream(webStream, options) {

export class FileTypeParser {
constructor(options) {
this.detectors = options?.customDetectors;
this.detectors = [...(options?.customDetectors ?? []), this.parse];
this.tokenizerOptions = {
abortSignal: options?.signal,
};
this.fromTokenizer = this.fromTokenizer.bind(this);
this.fromBuffer = this.fromBuffer.bind(this);
this.parse = this.parse.bind(this);
}

async fromTokenizer(tokenizer) {
const initialPosition = tokenizer.position;

for (const detector of this.detectors || []) {
// Iterate through all file-type detectors
for (const detector of this.detectors) {
const fileType = await detector(tokenizer);
if (fileType) {
return fileType;
Expand All @@ -131,8 +129,6 @@ export class FileTypeParser {
return undefined; // Cannot proceed scanning of the tokenizer is at an arbitrary position
}
}

return this.parse(tokenizer);
}

async fromBuffer(input) {
Expand Down Expand Up @@ -215,7 +211,7 @@ export class FileTypeParser {
return this.check(stringToBytes(header), options);
}

async parse(tokenizer) {
parse = async tokenizer => {
this.buffer = new Uint8Array(reasonableDetectionSizeInBytes);

// Keep reading until EOF if the file size is unknown.
Expand Down Expand Up @@ -1647,7 +1643,7 @@ export class FileTypeParser {
};
}
}
}
};

async readTiffTag(bigEndian) {
const tagId = await this.tokenizer.readToken(bigEndian ? Token.UINT16_BE : Token.UINT16_LE);
Expand Down
42 changes: 27 additions & 15 deletions readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -340,33 +340,36 @@ Returns a `Set<string>` of supported MIME types.

## Custom detectors

A custom detector is a function that allows specifying custom detection mechanisms.
A custom file type detector.

sindresorhus marked this conversation as resolved.
Show resolved Hide resolved
An iterable of detectors can be provided via the `fileTypeOptions` argument for the `FileTypeParser` constructor.
Detectors can be added via the constructor options or by directly modifying `FileTypeParser#detectors`.

The detectors are called before the default detections in the provided order.
Detectors provided through the constructor options are executed before the default detectors.

Custom detectors can be used to add new `FileTypeResults` or to modify return behaviour of existing `FileTypeResult` detections.
Custom detectors allow for:
- Introducing new `FileTypeResult` entries.
- Modifying the detection behavior of existing `FileTypeResult` types.

If the detector returns `undefined`, there are 2 possible scenarios:
### Detector execution flow

1. The detector has not read from the tokenizer, it will be proceeded with the next available detector.
2. The detector has read from the tokenizer (`tokenizer.position` has been increased).
In that case no further detectors will be executed and the final conclusion is that file-type returns undefined.
Note that this an exceptional scenario, as the detector takes the opportunity from any other detector to determine the file type.
If a detector returns `undefined`, the following rules apply:

Example detector array which can be extended and provided to each public method via the `fileTypeOptions` argument:
1. **No Tokenizer Interaction**: If the detector does not modify the tokenizer's position, the next detector in the sequence is executed.
2. **Tokenizer Interaction**: If the detector modifies the tokenizer's position (`tokenizer.position` is advanced), no further detectors are executed. In this case, the file type remains `undefined`, as subsequent detectors cannot evaluate the content. This is an exceptional scenario, as it prevents any other detectors from determining the file type.

### Example usage

Below is an example of a custom detector array. This can be passed to the `FileTypeParser` via the `fileTypeOptions` argument.

```js
import {FileTypeParser} from 'file-type';

const customDetectors = [
async tokenizer => {
const unicornHeader = [85, 78, 73, 67, 79, 82, 78]; // 'UNICORN' as decimal string
const unicornHeader = [85, 78, 73, 67, 79, 82, 78]; // "UNICORN" in ASCII decimal

const buffer = new Uint8Array(7);
const buffer = new Uint8Array(unicornHeader.length);
await tokenizer.peekBuffer(buffer, {length: unicornHeader.length, mayBeLess: true});

if (unicornHeader.every((value, index) => value === buffer[index])) {
return {ext: 'unicorn', mime: 'application/unicorn'};
}
Expand All @@ -375,10 +378,19 @@ const customDetectors = [
},
];

const buffer = new Uint8Array(new TextEncoder().encode('UNICORN'));
const buffer = new Uint8Array([85, 78, 73, 67, 79, 82, 78]);
const parser = new FileTypeParser({customDetectors});
const fileType = await parser.fromBuffer(buffer);
console.log(fileType);
console.log(fileType); // {ext: 'unicorn', mime: 'application/unicorn'}
```

```ts
/**
@param tokenizer - The [tokenizer](https://github.com/Borewit/strtok3#tokenizer) used to read file content.
@param fileType - The file type detected by standard or previous custom detectors, or `undefined` if no match is found.
@returns The detected file type, or `undefined` if no match is found.
*/
export type Detector = (tokenizer: ITokenizer, fileType?: FileTypeResult) => Promise<FileTypeResult | undefined>;
```

## Abort signal
Expand Down
Loading