Merge pull request #42 from mmalecot/dev
v0.23.0
mmalecot authored Dec 11, 2023
2 parents e6aaae5 + 4e37015 commit 4804982
Showing 35 changed files with 999 additions and 652 deletions.
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,24 @@
+# Version 0.23.0 (2023-12-11)
+
+## Fixes
+
+- Fix Neo Geo Pocket ROM (NGP) extension
+
+## Improvements
+
+- Add precision to the JSON Feed signature
+
+## Internal changes
+
+- Improve performance and precision of all readers
+
+## New formats support
+
+- Empty
+- Microsoft Write (WRI)
+- Neo Geo Pocket Color ROM (NGC)
+- Picture Exchange (PCX)
+
 # Version 0.22.0 (2023-11-04)

 ## Internal changes
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "file-format"
-version = "0.22.0"
+version = "0.23.0"
 authors = ["Mickaël Malécot <mickael.malecot@gmail.com>"]
 edition = "2021"
 description = "Crate for determining the file format of a given file or stream."
40 changes: 36 additions & 4 deletions README.md
@@ -9,10 +9,11 @@
 Crate for determining the file format of a given file or stream.

 It provides a variety of functions for identifying a wide range of file formats, including ZIP,
-Compound File Binary (CFB), Extensible Markup Language (XML) and more.
+Compound File Binary (CFB), Extensible Markup Language (XML) and much more.

-It checks the signature of the file to determine its format. If it is not recognized by its
-signature, it returns the default file format which is Arbitrary Binary Data (BIN).
+It checks the signature of the file to determine its format and intelligently employs specific
+readers when available for accurate identification. If the signature is not recognized, the crate
+falls back to the default file format, which is Arbitrary Binary Data (BIN).

 ## Examples
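
As a rough illustration of the "check the signature, otherwise fall back to BIN" behaviour described in the README text above, the following self-contained sketch matches a buffer against a tiny table of magic bytes. The `Format` enum, the `SIGNATURES` table, and the `detect` function are hypothetical stand-ins written for this note; they are not the crate's API, and the real crate knows far more signatures and adds format-specific readers on top.

```rust
/// Hypothetical, heavily simplified stand-in for the crate's `FileFormat` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Format {
    Zip,
    Pdf,
    /// Fallback used when no signature matches.
    ArbitraryBinaryData,
}

/// Table of (magic bytes, format) pairs, checked in order.
const SIGNATURES: &[(&[u8], Format)] = &[
    (b"PK\x03\x04", Format::Zip),
    (b"%PDF-", Format::Pdf),
];

/// Returns the first format whose magic bytes prefix `bytes`,
/// or the binary fallback when nothing matches.
fn detect(bytes: &[u8]) -> Format {
    SIGNATURES
        .iter()
        .find(|(magic, _)| bytes.starts_with(magic))
        .map(|&(_, format)| format)
        .unwrap_or(Format::ArbitraryBinaryData)
}
```

Under these assumptions, calling `detect` on a buffer that begins with `PK\x03\x04` yields `Format::Zip`, while an unrecognized buffer yields the fallback variant.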

@@ -50,9 +51,36 @@ Add this to your `Cargo.toml`:

 ```toml
 [dependencies]
-file-format = "0.22"
+file-format = "0.23"
 ```

+## Crate features
+
+All features below are disabled by default.
+
+### Ecosystem features
+
+- `serde` - Adds the ability to serialize and deserialize a `FileFormat` and `Kind` using serde.
+
+### Reader features
+
+These features enable the detection of file formats that require a specific reader for
+identification.
+
+- `reader` - Enables all reader features.
+- `reader-asf` - Enables Advanced Systems Format (ASF) based file formats detection.
+- `reader-cfb` - Enables Compound File Binary (CFB) based file formats detection.
+- `reader-ebml` - Enables Extensible Binary Meta Language (EBML) based file formats detection.
+- `reader-exe` - Enables MS-DOS Executable (EXE) based file formats detection.
+- `reader-mp4` - Enables MPEG-4 Part 14 (MP4) based file formats detection.
+- `reader-pdf` - Enables Portable Document Format (PDF) based file formats detection.
+- `reader-rm` - Enables RealMedia (RM) based file formats detection.
+- `reader-txt` - Enables Plain Text (TXT) detection when the file format is not recognized by its
+signature. Please note that this feature only detects files containing ASCII/UTF-8-encoded text.
+- `reader-xml` - Enables Extensible Markup Language (XML) based file formats detection. Please note
+that these file formats may be detected without the feature in certain cases.
+- `reader-zip` - Enables ZIP-based file formats detection.
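
To make the `reader-txt` note above more concrete, here is a minimal sketch of what "only detects ASCII/UTF-8-encoded text" could look like. The function name `looks_like_plain_text` is hypothetical, and the crate's actual heuristic is not shown in this diff; it may well be stricter (for example by rejecting control characters).

```rust
/// Returns `true` when `bytes` is non-empty and decodes as valid UTF-8.
/// ASCII is a subset of UTF-8, so pure ASCII input passes as well.
///
/// This is an illustrative assumption, not the crate's real check.
fn looks_like_plain_text(bytes: &[u8]) -> bool {
    !bytes.is_empty() && std::str::from_utf8(bytes).is_ok()
}
```

A check like this would only run after signature detection has failed, which matches the "when the file format is not recognized by its signature" wording of the feature description.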

## Supported file formats

### Application
@@ -69,6 +97,7 @@ file-format = "0.22"
 - CD Audio (CDA)
 - Compound File Binary (CFB)
 - Digital Imaging and Communications in Medicine (DICOM)
+- Empty
 - Encapsulated PostScript (EPS)
 - Extensible Binary Meta Language (EBML)
 - Extensible Stylesheet Language Transformations (XSLT)
@@ -237,6 +266,7 @@ file-format = "0.22"
 - Microsoft Works 6 Spreadsheet (XLR)
 - Microsoft Works Spreadsheet (WKS)
 - Microsoft Works Word Processor (WPS)
+- Microsoft Write (WRI)
 - Office Open XML Document (DOCX)
 - Office Open XML Drawing (VSDX)
 - Office Open XML Presentation (PPTX)
@@ -359,6 +389,7 @@ file-format = "0.22"
 - OpenEXR (EXR)
 - OpenRaster (ORA)
 - Panasonic Raw (RW2)
+- Picture Exchange (PCX)
 - Portable Arbitrary Map (PAM)
 - Portable BitMap (PBM)
 - Portable FloatMap (PFM)
@@ -463,6 +494,7 @@ file-format = "0.22"
 - Game Boy ROM (GB)
 - Game Gear ROM (GG)
 - Mega Drive ROM (MD)
+- Neo Geo Pocket Color ROM (NGC)
 - Neo Geo Pocket ROM (NGP)
 - Nintendo 64 ROM (Z64)
 - Nintendo DS ROM (NDS)
Binary file modified fixtures/application/sample.asf
Binary file not shown.
Empty file.
Binary file modified fixtures/application/sample.rm
Binary file not shown.
Binary file modified fixtures/audio/sample.wma
Binary file not shown.
Binary file modified fixtures/audio/sample2.ra
Binary file not shown.
Binary file added fixtures/database/sample1.wdb
Binary file not shown.
File renamed without changes.
Binary file added fixtures/document/sample.wri
Binary file not shown.
Binary file added fixtures/document/sample1.wps
Binary file not shown.
File renamed without changes.
Binary file modified fixtures/image/sample.ai
Binary file not shown.
Binary file added fixtures/image/sample.pcx
Binary file not shown.
Binary file added fixtures/rom/sample.ngc
Binary file not shown.
16 changes: 0 additions & 16 deletions fixtures/subtitle/sample3.ttml

This file was deleted.

Binary file modified fixtures/video/sample.dvr-ms
Binary file not shown.
File renamed without changes.
Binary file modified fixtures/video/sample.rv
Binary file not shown.
Binary file modified fixtures/video/sample.wmv
Binary file not shown.
31 changes: 29 additions & 2 deletions src/formats.rs
@@ -646,6 +646,12 @@ formats! {
 extension = "eot"
 kind = Font

+format = Empty
+name = "Empty"
+media_type = "application/x-empty"
+extension = "empty"
+kind = Application
+
 format = EncapsulatedPostscript
 name = "Encapsulated PostScript"
 short_name = "EPS"
@@ -1436,6 +1442,13 @@ formats! {
 extension = "wps"
 kind = Document

+format = MicrosoftWrite
+name = "Microsoft Write"
+short_name = "WRI"
+media_type = "application/x-mswrite"
+extension = "wri"
+kind = Document
+
 format = Mobipocket
 name = "Mobipocket"
 short_name = "MOBI"
@@ -1529,7 +1542,7 @@ formats! {

 format = MpegDashManifest
 name = "MPEG-DASH Manifest"
-short_name= "MPD"
+short_name = "MPD"
 media_type = "application/dash+xml"
 extension = "mpd"
 kind = Playlist
@@ -1588,11 +1601,18 @@ formats! {
 extension = "mxl"
 kind = Application

+format = NeoGeoPocketColorRom
+name = "Neo Geo Pocket Color ROM"
+short_name = "NGC"
+media_type = "application/x-neo-geo-pocket-rom"
+extension = "ngc"
+kind = Rom
+
 format = NeoGeoPocketRom
 name = "Neo Geo Pocket ROM"
 short_name = "NGP"
 media_type = "application/x-neo-geo-pocket-rom"
-extension = "npg"
+extension = "ngp"
 kind = Rom

 format = NewExecutable
@@ -1958,6 +1978,13 @@ formats! {
 extension = "asc"
 kind = Application

+format = PictureExchange
+name = "Picture Exchange"
+short_name = "PCX"
+media_type = "image/x-pcx"
+extension = "pcx"
+kind = Image
+
 format = PlainText
 name = "Plain Text"
 short_name = "TXT"
56 changes: 28 additions & 28 deletions src/lib.rs
@@ -3,10 +3,12 @@ Crate for determining the file format of a [given file](`FileFormat::from_file`)
 It provides a variety of functions for identifying a wide range of file formats, including
 [ZIP](`FileFormat::Zip`), [Compound File Binary (CFB)](`FileFormat::CompoundFileBinary`),
-[Extensible Markup Language (XML)](`FileFormat::ExtensibleMarkupLanguage`) and [more](`FileFormat`).
+[Extensible Markup Language (XML)](`FileFormat::ExtensibleMarkupLanguage`) and
+[much more](`FileFormat`).
-It checks the signature of the file to determine its format. If it is not recognized by its
-signature, it returns the default file format which is
+It checks the signature of the file to determine its format and intelligently employs specific
+readers when available for accurate identification. If the signature is not recognized, the crate
+falls back to the default file format, which is
 [Arbitrary Binary Data (BIN)](`FileFormat::ArbitraryBinaryData`).
 # Examples
@@ -50,8 +52,8 @@ All features below are disabled by default.
 ## Reader features
-These features enable the detection of file formats that need a specific reader in order to be
-detected.
+These features enable the detection of file formats that require a specific reader for
+identification.
 - `reader` - Enables all reader features.
 - `reader-asf` - Enables [Advanced Systems Format (ASF)](`FileFormat::AdvancedSystemsFormat`) based
@@ -62,7 +64,6 @@ detected.
 - `reader-cfb` - Enables [Compound File Binary (CFB)](`FileFormat::CompoundFileBinary`) based file
 formats detection.
 * [3D Studio Max (MAX)](`FileFormat::ThreeDimensionalStudioMax`)
-* [Autodesk 123D (123DX)](`FileFormat::Autodesk123d`)
 * [Autodesk Inventor Assembly (IAM)](`FileFormat::AutodeskInventorAssembly`)
 * [Autodesk Inventor Drawing (IDW)](`FileFormat::AutodeskInventorDrawing`)
 * [Autodesk Inventor Part (IPT)](`FileFormat::AutodeskInventorPart`)
@@ -76,7 +77,6 @@ detected.
 * [Microsoft Word Document (DOC)](`FileFormat::MicrosoftWordDocument`)
 * [Microsoft Works 6 Spreadsheet (XLR)](`FileFormat::MicrosoftWorks6Spreadsheet`)
 * [Microsoft Works Database (WDB)](`FileFormat::MicrosoftWorksDatabase`)
-* [Microsoft Works Spreadsheet (WKS)](`FileFormat::MicrosoftWorksSpreadsheet`)
 * [Microsoft Works Word Processor (WPS)](`FileFormat::MicrosoftWorksWordProcessor`)
 * [SolidWorks Assembly (SLDASM)](`FileFormat::SolidworksAssembly`)
 * [SolidWorks Drawing (SLDDRW)](`FileFormat::SolidworksDrawing`)
@@ -114,11 +114,11 @@ detected.
 * [RealAudio (RA)](`FileFormat::Realaudio`)
 * [RealVideo (RV)](`FileFormat::Realvideo`)
 - `reader-txt` - Enables [Plain Text (TXT)](`FileFormat::PlainText`) detection when the file format
-is not recognized by its signature. Please note that this option only detects files that contain
+is not recognized by its signature. Please note that this feature only detects files containing
 ASCII/UTF-8-encoded text.
 - `reader-xml` - Enables [Extensible Markup Language (XML)](`FileFormat::ExtensibleMarkupLanguage`)
 based file formats detection. Please note that these file formats may be detected without the
-feature in some cases.
+feature in certain cases.
 * [AbiWord (ABW)](`FileFormat::Abiword`)
 * [AbiWord Template (AWT)](`FileFormat::AbiwordTemplate`)
 * [Additive Manufacturing Format (AMF)](`FileFormat::AdditiveManufacturingFormat`)
@@ -149,6 +149,7 @@ detected.
 * [3D Manufacturing Format (3MF)](`FileFormat::ThreeDimensionalManufacturingFormat`)
 * [Adobe Integrated Runtime (AIR)](`FileFormat::AdobeIntegratedRuntime`)
 * [Android Package (APK)](`FileFormat::AndroidPackage`)
+* [Autodesk 123D (123DX)](`FileFormat::Autodesk123d`)
 * [Circuit Diagram Document (CDDX)](`FileFormat::CircuitDiagramDocument`)
 * [Design Web Format XPS (DWFX)](`FileFormat::DesignWebFormatXps`)
 * [Electronic Publication (EPUB)](`FileFormat::ElectronicPublication`)
@@ -198,6 +199,7 @@ detected.
 */

 #![deny(missing_docs)]
+#![forbid(unsafe_code)]

 #[macro_use]
 mod macros;
@@ -209,7 +211,7 @@ mod signatures;
 use std::{
 fmt::{self, Display, Formatter},
 fs::File,
-io::{BufRead, BufReader, Cursor, Read, Result, Seek},
+io::{Cursor, Read, Result, Seek},
 path::Path,
 };

@@ -269,20 +271,18 @@ impl FileFormat {
 /// use file_format::FileFormat;
 ///
 /// let format = FileFormat::from_reader(std::io::empty())?;
-/// assert_eq!(format, FileFormat::default());
+/// assert_eq!(format, FileFormat::Empty);
 /// # Ok::<(), std::io::Error>(())
 ///```
-pub fn from_reader<R: Read + Seek>(reader: R) -> Result<Self> {
-// Maximum required size to read and detect the file format from its signature.
-const BUFFER_SIZE: usize = 36870;
-
-// Creates a buffered reader with the specified size.
-let mut reader = BufReader::with_capacity(BUFFER_SIZE, reader);
+pub fn from_reader<R: Read + Seek>(mut reader: R) -> Result<Self> {
+// Creates and fills a buffer.
+let mut buffer = [0; 36870];
+let bytes_read = reader.read(&mut buffer)?;

-// Attempts to detect the file format.
-Ok(if reader.fill_buf()?.is_empty() {
-Self::default()
-} else if let Some(format) = Self::from_signature(reader.buffer()) {
+// Determines file format.
+Ok(if bytes_read == 0 {
+Self::Empty
+} else if let Some(format) = Self::from_signature(&buffer[..bytes_read]) {
 Self::from_format_reader(format, &mut reader)
 .unwrap_or_else(|_| Self::from_generic_reader(&mut reader))
 } else {
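
The pattern introduced by the `from_reader` change above (fill a fixed-size buffer with one `read` call, treat zero bytes as "empty", and match the signature only against the bytes actually read) can be sketched in isolation as follows. The `Detected` enum and `detect_from_reader` function are hypothetical names used only for this illustration, the 64-byte buffer is an arbitrary choice, and the sketch omits the format-specific and generic readers that the real function falls through to.

```rust
use std::io::{Read, Result};

/// Hypothetical result type for this sketch; not part of the crate.
#[derive(Debug, PartialEq, Eq)]
enum Detected {
    Empty,
    Zip,
    Unknown,
}

/// Reads up to 64 bytes once and classifies the input from that prefix.
fn detect_from_reader<R: Read>(mut reader: R) -> Result<Detected> {
    // Fixed-size stack buffer; only the first `bytes_read` bytes are valid.
    let mut buffer = [0u8; 64];
    // Simplification: a single `read` call may return fewer bytes than are
    // available, and this sketch accepts whatever it gets.
    let bytes_read = reader.read(&mut buffer)?;

    Ok(if bytes_read == 0 {
        // Nothing could be read: report an empty input.
        Detected::Empty
    } else if buffer[..bytes_read].starts_with(b"PK\x03\x04") {
        // ZIP local file header magic.
        Detected::Zip
    } else {
        Detected::Unknown
    })
}
```

Slicing with `&buffer[..bytes_read]` matters here: without it, the zero padding left in the unused tail of the buffer would be treated as file content.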
@@ -321,7 +321,7 @@ pub enum Kind {
 /// Data which do not fit in any of the other kinds, and particularly for data to be processed
 /// by some type of application program.
 Application,
-/// Stored files and directories into a single file, possibly compressed.
+/// Files and directories stored in a single, possibly compressed, archive.
 Archive,
 /// Musics, sound effects, and spoken audio recordings.
 Audio,
@@ -335,10 +335,10 @@ pub enum Kind {
 Database,
 /// Floppy disk images, optical disc images and virtual machine disks.
 Disk,
-/// Word processing documents, spreadsheets, presentations, documents templates, diagrams,
+/// Word processing documents, spreadsheets, presentations, document templates, diagrams,
 /// charts, and other formatted documents.
 Document,
-/// Machine executable codes, virtual machine codes and shared libraries.
+/// Machine-executable codes, virtual machine codes and shared libraries.
 Executable,
 /// Typefaces used for displaying text on screen or in print.
 Font,
@@ -348,21 +348,21 @@ pub enum Kind {
 Image,
 /// 3D models, CAD drawings, and other types of files used for creating or displaying 3D images.
 Model,
-/// Archives or other containers that bundles programs and resources that can be run on target
+/// Archives or other containers that bundle programs and resources that can be run on target
 /// environments.
 Package,
 /// Lists of audio or video files that are played in a specific order.
 Playlist,
-/// Copies of a read-only memory chip of computers, cartridges or other electronic devices.
+/// Copies of a read-only memory chip of computers, cartridges, or other electronic devices.
 Rom,
 /// Subtitles and captions.
 Subtitle,
 /// Web feeds and syndication.
 Syndication,
-/// Plain text, source codes, markup languages, and other types of files that contain written
+/// Plain text, source codes, markup languages, and other types of files containing written
 /// text.
 Text,
-/// Movies, animations, and other types of files that contain moving images, possibly with color
+/// Movies, animations, and other types of files containing moving images, possibly with color
 /// and coordinated sound.
 Video,
 }