A lightweight, intuitive wrapper around Intl.Segmenter for seamless segment-aware string operations in TypeScript and JavaScript.
- Intuitive `Intl.Segmenter` Wrapper: Simplifies text segmentation with a clean API.
- Standards-Based: Built on native `Intl.Segmenter` for robust compatibility.
- Lightweight & Tree-Shakeable: Minimal footprint with optimal bundling.
- Highly Performant: Uses iterators for efficient, on-demand processing.
- Full TypeScript Support: Strict types for safe, predictable usage.
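The on-demand processing mentioned in the list comes from the iterator protocol. As a minimal sketch, a generator can yield segments one at a time so that a caller can stop early. It uses the native `Intl.Segmenter` directly rather than this package's internals, and `lazySegments` and `firstN` are hypothetical names, not part of `segment-string`.

```typescript
// Hypothetical helper: yields segment strings lazily from the native segmenter.
function* lazySegments(
  input: string,
  granularity: "grapheme" | "word" | "sentence",
  locales?: string | string[],
): Generator<string> {
  const segmenter = new Intl.Segmenter(locales, { granularity });
  for (const { segment } of segmenter.segment(input)) {
    yield segment;
  }
}

// Hypothetical helper: takes at most `n` items, then stops pulling from the source.
function firstN<T>(source: Iterable<T>, n: number): T[] {
  const out: T[] = [];
  if (n <= 0) return out;
  for (const item of source) {
    out.push(item);
    if (out.length >= n) break;
  }
  return out;
}

const firstThree = firstN(lazySegments("Hello, world!", "grapheme"), 3);
console.log(firstThree); // ['H', 'e', 'l']
```

Because the generator is only advanced three times, the rest of the string is never segmented.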
npm install segment-string
`segment-string` is a lightweight wrapper for `Intl.Segmenter`, designed to simplify locale-sensitive text segmentation in JavaScript and TypeScript. It lets you easily segment and manipulate text by graphemes, words, or sentences, ideal for handling complex cases like multi-character emojis or language-specific boundaries.
import { SegmentString } from "segment-string";
const str = new SegmentString("Hello, world! 👩‍👩‍👧‍👦🌍🌈");
// Segment by grapheme
console.log([...str.graphemes()]); // ['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈']
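For comparison, here is a sketch of the same kind of text handled with built-in tools only. It relies on nothing beyond the standard `Intl.Segmenter` API and shows why `length` and code-point spreading disagree with user-perceived characters. The emoji is written with escapes so the zero-width joiners are visible.

```typescript
// Family emoji: four people joined by three zero-width joiners (U+200D).
const family = "\u{1F469}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

// UTF-16 code units: four 2-unit emoji plus three joiners.
console.log(family.length); // 11

// Spreading iterates code points, so the sequence falls apart into 7 pieces.
console.log([...family].length); // 7

// Grapheme segmentation treats the whole sequence as one user-perceived character.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemes = Array.from(graphemeSegmenter.segment(family), (s) => s.segment);
console.log(graphemes.length); // 1
```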
The `SegmentString` class encapsulates a string and provides methods for segmentation, counting, and retrieving segments at specified indices, with locale and granularity options.
new SegmentString(str: string, locales?: Intl.LocalesArgument);
- str: The string to segment.
- locales: Optional locales argument for segmentation.
segments(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Iterable<string>
Segments the string by the specified granularity and returns the segments as strings.
rawSegments(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Intl.Segments | Iterable<Intl.SegmentData>
Returns raw `Intl.SegmentData` objects based on granularity and options.
segmentCount(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): number
Counts segments in the string based on the specified granularity.
segmentAt(index: number, granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): string | undefined
Retrieves the segment at a specific index, supporting negative indices.
rawSegmentAt(index: number, granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Intl.SegmentData | undefined
Returns the raw segment data at a specific index, supporting negative indices.
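Negative indices count from the last segment, in the same spirit as `Array.prototype.at`. One way this could work is sketched below. `segmentAtSketch` is a hypothetical helper built on the native `Intl.Segmenter`, not the package's implementation, and for simplicity it materializes every segment before indexing.

```typescript
// Hypothetical helper: resolves a possibly negative index against the segment list.
function segmentAtSketch(
  input: string,
  index: number,
  granularity: "grapheme" | "word" | "sentence",
): string | undefined {
  const segmenter = new Intl.Segmenter(undefined, { granularity });
  const all = Array.from(segmenter.segment(input), (s) => s.segment);
  // -1 maps to the last segment, -2 to the one before it, and so on.
  const resolved = index < 0 ? all.length + index : index;
  return all[resolved]; // undefined when out of bounds
}

// "ok" followed by a thumbs-up with a skin-tone modifier (one grapheme).
const sample = "ok\u{1F44D}\u{1F3FD}";
console.log(segmentAtSketch(sample, 0, "grapheme")); // 'o'
console.log(segmentAtSketch(sample, -1, "grapheme") === "\u{1F44D}\u{1F3FD}"); // true
console.log(segmentAtSketch(sample, 5, "grapheme")); // undefined
```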
Returns an iterable of grapheme segments as strings.
Returns an iterable of raw grapheme segments.
Counts grapheme segments in the string.
Returns the grapheme at a specific index, supporting negative indices.
Returns the raw grapheme data at a specific index, supporting negative indices.
Returns an iterable of word segments, with optional filtering for word-like segments.
Returns an iterable of raw word segments, with optional filtering for word-like segments.
Counts word segments in the string.
Returns the word at a specific index, supporting negative indices.
Returns the raw word data at a specific index, supporting negative indices.
Returns an iterable of sentence segments.
Returns an iterable of raw sentence segments.
Counts sentence segments in the string.
Returns the sentence at a specific index, supporting negative indices.
Returns the raw sentence data at a specific index, supporting negative indices.
Returns an iterator over the graphemes of the string.
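Iterating an object by graphemes, as in the last item above, only requires a `[Symbol.iterator]` method. The following is a rough sketch of that idea. `GraphemeIterable` is a hypothetical class written against the native `Intl.Segmenter`, not the package's `SegmentString`.

```typescript
// Hypothetical class: makes a string iterable by grapheme instead of by code point.
class GraphemeIterable implements Iterable<string> {
  private readonly value: string;

  constructor(value: string) {
    this.value = value;
  }

  *[Symbol.iterator](): Generator<string> {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    for (const { segment } of segmenter.segment(this.value)) {
      yield segment;
    }
  }
}

// "A", a flag made of two regional-indicator code points, then "B".
const flagged = new GraphemeIterable("A\u{1F1EF}\u{1F1F5}B");
console.log([...flagged].length); // 3
```

With such a method in place, spread syntax and `for...of` both see graphemes rather than code points.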
import { SegmentString } from "segment-string";
const text = new SegmentString("Hello, world! 👩‍👩‍👧‍👦🌍🌈");
// Segmenting by words
for (const word of text.words()) {
  console.log(word); // logs in turn: 'Hello', ',', ' ', 'world', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈'
}
// Segmenting graphemes and counting
console.log([...text.graphemes()]); // ['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈']
console.log("Grapheme count:", text.graphemeCount()); // 17
console.log("String length:", text.toString().length); // 29
// Accessing a specific word
const secondWord = text.wordAt(1, { isWordLike: true }); // 'world'
console.log(secondWord);
Alternatively, the `SegmentSplitter` class lets you create an instance that can be passed directly to JavaScript's `String.prototype.split` method for basic segmentation.
new SegmentSplitter<T extends Granularity>(granularity: T, options?: SegmentationOptions<T>);
- granularity: Specifies the segmentation granularity level (`'grapheme'`, `'word'`, or `'sentence'`).
- options: Optional settings to customize the segmentation for the given granularity.
import { SegmentSplitter } from "segment-string";
const str = "Hello, world!";
const wordSplitter = new SegmentSplitter("word", { isWordLike: true });
const words = str.split(wordSplitter);
console.log(words); // ["Hello", "world"]
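`String.prototype.split` accepts any object that defines a `[Symbol.split]` method and delegates to it, which is what makes a splitter instance usable this way. The class below is a hypothetical illustration of that protocol using the native `Intl.Segmenter`. It is not the package's `SegmentSplitter` and always filters to word-like segments.

```typescript
// Hypothetical splitter: split() calls [Symbol.split](input, limit) on the argument.
class WordSplitterSketch {
  [Symbol.split](input: string, limit?: number): string[] {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "word" });
    const words: string[] = [];
    for (const data of segmenter.segment(input)) {
      // Honor the optional limit argument of split().
      if (limit !== undefined && words.length >= limit) break;
      // Skip punctuation and whitespace segments.
      if (data.isWordLike) words.push(data.segment);
    }
    return words;
  }
}

const parts = "Hello, world!".split(new WordSplitterSketch());
console.log(parts); // ['Hello', 'world']
```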
function getRawSegments(
  str: string,
  granularity: Granularity,
  options?: SegmentationOptions | WordSegmentationOptions,
): Intl.Segments | Iterable<Intl.SegmentData>;
- Description: Returns raw `Intl.SegmentData` objects based on granularity and options.
- Parameters:
  - `str`: The string to segment.
  - `granularity`: Specifies the segmentation level (`'grapheme'`, `'word'`, or `'sentence'`).
  - `options`: Includes `locales` for specifying locale and `isWordLike` for filtering word-like segments.
- Returns: An iterable of raw `Intl.SegmentData`.
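To show what "raw" segment data contains, this sketch prints the standard `Intl.SegmentData` fields produced by the native segmenter at word granularity. It calls `Intl.Segmenter` directly and does not use `getRawSegments`.

```typescript
const wordSegmenter = new Intl.Segmenter("en", { granularity: "word" });
const raw = Array.from(wordSegmenter.segment("Hi there"));

for (const data of raw) {
  // index is the UTF-16 offset of the segment within the input string;
  // isWordLike is only populated at word granularity.
  console.log(data.index, JSON.stringify(data.segment), data.isWordLike);
}
// 0 "Hi" true
// 2 " " false
// 3 "there" true
```

Each object also carries an `input` field holding the original string.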
function getSegments(
  str: string,
  granularity: Granularity,
  options?: SegmentationOptions | WordSegmentationOptions,
): Iterable<string>;
- Description: Returns segments of the string as plain strings.
- Parameters: Similar to `getRawSegments`.
- Returns: An iterable of segments as strings.
function segmentCount(
  str: string,
  granularity: Granularity,
  options?: SegmentationOptions | WordSegmentationOptions,
): number;
- Description: Returns the count of segments based on granularity and options.
- Parameters: Similar to `getRawSegments`.
- Returns: Number of segments.
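Counting does not require building an array first. Here is a minimal sketch, assuming a plain loop over the native segmenter's output. `countSegmentsSketch` is a hypothetical name and may differ from how the package's `segmentCount` is actually written.

```typescript
// Hypothetical helper: counts segments by iterating, without storing them.
function countSegmentsSketch(
  input: string,
  granularity: "grapheme" | "word" | "sentence",
  locales?: string | string[],
): number {
  const segmenter = new Intl.Segmenter(locales, { granularity });
  let count = 0;
  for (const _segment of segmenter.segment(input)) {
    count += 1;
  }
  return count;
}

console.log(countSegmentsSketch("First one. Second one? Third!", "sentence", "en")); // 3
console.log(countSegmentsSketch("", "grapheme")); // 0
```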
function rawSegmentAt(
  str: string,
  index: number,
  granularity: Granularity,
  options?: SegmentationOptions | WordSegmentationOptions,
): Intl.SegmentData | undefined;
- Description: Returns the raw segment data at a specified index, supporting negative indices.
- Parameters: Similar to `getRawSegments`, plus an `index` parameter.
- Returns: The `Intl.SegmentData` at the specified index, or `undefined` if out of bounds.
function segmentAt(
  str: string,
  index: number,
  granularity: Granularity,
  options?: SegmentationOptions | WordSegmentationOptions,
): string | undefined;
- Description: Returns the segment at a specified index, supporting negative indices.
- Parameters: Similar to `getRawSegments`, plus an `index` parameter.
- Returns: The segment at the specified index, or `undefined` if out of bounds.
function filterRawWordLikeSegments(
  segments: Intl.Segments,
): Iterable<Intl.SegmentData>;
- Description: Filters and returns an iterable of raw word-like segment data where `isWordLike` is true.
- Parameters:
  - `segments`: The segments to filter.
- Returns: An iterable of `Intl.SegmentData` for each word-like segment.
function filterWordLikeSegments(segments: Intl.Segments): Iterable<string>;
- Description: Filters and returns an iterable of word-like segments as strings where `isWordLike` is true.
- Parameters:
  - `segments`: The segments to filter.
- Returns: An iterable of strings for each word-like segment.
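Word-like filtering amounts to keeping the segments whose `isWordLike` flag is set. A rough sketch of such a filter, written as a generator over native `Intl.Segmenter` output, could look like this. `wordLikeOnly` is a hypothetical name and not one of the functions documented above.

```typescript
// Hypothetical filter: lazily yields only the segments flagged as word-like.
function* wordLikeOnly(segments: Iterable<Intl.SegmentData>): Generator<string> {
  for (const data of segments) {
    if (data.isWordLike) {
      yield data.segment;
    }
  }
}

const phraseSegments = new Intl.Segmenter("en", { granularity: "word" }).segment(
  "Stop, look & listen.",
);
console.log([...wordLikeOnly(phraseSegments)]); // ['Stop', 'look', 'listen']
```

Punctuation, spaces, and the ampersand are dropped because the segmenter does not flag them as word-like.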
💝 This package was templated with `create-typescript-app`.