Regex-validated string types (feedback reset) #41160
Comments
Use case 1, URL path building libraries, /*snip*/
createTestCard : f.route()
.append("/platform")
.appendParam(s.platform.platformId, /\d+/)
.append("/stripe")
.append("/test-card")
/*snip*/ These are the constraints for
Use case 2,
Use case 3, safer new(pattern: string, flags?: PatternOf</^[gimsuy]*$/>): RegExp
|
A template string type can only be used in a conditional type, so it's really a "type validator", not a "type" itself. It also focuses more on manipulating strings, which I think is a different design goal from Regex-validated types. It's doable to use conditional types to constrain parameters, for example (taken from #6579 (comment)):
declare function takesOnlyHex<StrT extends string> (
hexString : Accepts<HexStringLen6, StrT> extends true ? StrT : {__err : `${StrT} is not a hex-string of length 6`}
) : void;
However, I think this pattern has several issues:
|
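For readers who want to try the conditional-type constraint described above, here is a minimal sketch. Only `takesOnlyHex` and the error message come from the thread; `HexChar`, `IsHexOfLength` and this definition of `HexStringLen6` are hypothetical stand-ins for the `Accepts<...>` machinery in #6579, which is not reproduced here.

```typescript
// Helper names below are hypothetical; only `takesOnlyHex` comes from the thread.
type HexChar =
  | "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
  | "a" | "b" | "c" | "d" | "e" | "f"
  | "A" | "B" | "C" | "D" | "E" | "F";

// Peel off one hex character per step and count the steps in a tuple.
type IsHexOfLength<S extends string, N extends number, Seen extends unknown[] = []> =
  S extends `${HexChar}${infer Rest}`
    ? IsHexOfLength<Rest, N, [...Seen, unknown]>
    : S extends ""
      ? (Seen["length"] extends N ? true : false)
      : false;

// Valid literals resolve to themselves; anything else resolves to an
// object type that carries a readable message.
type HexStringLen6<S extends string> =
  IsHexOfLength<S, 6> extends true
    ? S
    : { __err: `${S} is not a hex-string of length 6` };

function takesOnlyHex<S extends string>(hexString: HexStringLen6<S>): string {
  // The parameter type is conditional, so convert back to a plain string.
  return String(hexString);
}

const accepted = takesOnlyHex("ff00aa");
// @ts-expect-error: "zz00aa" contains non-hex characters
takesOnlyHex("zz00aa");
// @ts-expect-error: only five characters
takesOnlyHex("ff00a");
```

Note that the check only applies to string literals known at compile time; a value of type `string` is always rejected, which is one of the limitations this comment is pointing at.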
Would this allow me to define type constraints for String to match the XML specification's Name constructs (short summary) and QNames by expressing them as regular expressions? If so, I am all for it :-) |
@AnyhowStep It isn't the cleanest, but with conditional types now allowing recursion, it seems we can accomplish these cases with template literal types: playground link |
We can have compile-time regular expressions now. (Well, non-feature when I'm trying to use TypeScript for work. All personal projects have |
We have a strongly-typed filesystem library, where the user is expected to manipulate "clean types" like export interface PathUtils {
cwd(): PortablePath;
normalize(p: PortablePath): PortablePath;
join(...paths: Array<PortablePath | Filename>): PortablePath;
resolve(...pathSegments: Array<PortablePath | Filename>): PortablePath;
isAbsolute(path: PortablePath): boolean;
relative(from: PortablePath, to: PortablePath): P;
dirname(p: PortablePath): PortablePath;
basename(p: PortablePath, ext?: string): Filename;
extname(p: PortablePath): string;
readonly sep: PortablePath;
readonly delimiter: string;
parse(pathString: PortablePath): ParsedPath<PortablePath>;
format(pathObject: FormatInputPathObject<PortablePath>): PortablePath;
contains(from: PortablePath, to: PortablePath): PortablePath | null;
} I'm investigating template literals to remove the
The overhead sounds overwhelming, and makes it likely that there are side effects that would cause problems down the road - causing further pain if we need to revert. Ideally, the solution we're looking for would leave the code above intact, we'd just declare |
I have a strong use case for Regex-validated string types. AWS Lambda function names have a maximum length of 64 characters. This can be manually checked in a character counter, but that is unnecessarily cumbersome given that the function name is usually composed from identifying substrings. As an example, this function name can be partially composed with the new work done in 4.1/4.2. However, there is no easy way to produce a compiler error in TypeScript, even though the function name below is longer than 64 characters.
type LambdaServicePrefix = 'my-application-service';
type LambdaFunctionIdentifier = 'dark-matter-upgrader-super-duper-test-function';
type LambdaFunctionName = `${LambdaServicePrefix}-${LambdaFunctionIdentifier}`;
const lambdaFunctionName: LambdaFunctionName = 'my-application-service-dark-matter-upgrader-super-duper-test-function';
This StackOverflow Post I created was asking this very same question. With the continued rise of TypeScript in back-end related code, statically defined data would likely be a strong use case for validating the length or the format of a string. |
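As a rough illustration of what is possible today for the length limit above, the sketch below counts characters with a tuple "budget". All names (`TupleOf`, `FitsBudget`, `MaxLength`, `lambdaName`) are hypothetical, and it assumes TypeScript 4.5 or later, where tail-recursive conditional types can recurse this deep. It needs a helper function, so it is a workaround rather than the plain type annotation the comment asks for.

```typescript
// All helper names are hypothetical. Assumes TypeScript 4.5+.

// Build a tuple with N elements to act as a character budget.
type TupleOf<N extends number, T extends unknown[] = []> =
  T["length"] extends N ? T : TupleOf<N, [...T, unknown]>;

// Spend one budget element per character; fail if the budget runs out first.
type FitsBudget<S extends string, Budget extends unknown[]> =
  S extends `${infer _First}${infer Rest}`
    ? Budget extends [unknown, ...infer Remaining]
      ? FitsBudget<Rest, Remaining>
      : false
    : true;

type MaxLength<S extends string, N extends number> =
  FitsBudget<S, TupleOf<N>> extends true ? S : never;

// Identity helper: the intersection turns over-long literals into `never`.
function lambdaName<S extends string>(name: S & MaxLength<S, 64>): S {
  return name;
}

const shortName = lambdaName("my-application-service-resize");
// @ts-expect-error: this literal is 69 characters long
lambdaName("my-application-service-dark-matter-upgrader-super-duper-test-function");
```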
TypeScript supports literal types, template literal types, and enums. I think a string pattern type is a natural extension that allows for non-finite value restrictions to be expressed. I'm writing type definitions for an existing codebase. Many arguments and properties accept strings of a specific format:
|
I'd like to argue against @RyanCavanaugh's claim in the first post saying that:
As it stands, TypeScript can't even work with the following type:
type Digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9;
type Just5Digits = `${Digit}${Digit}${Digit}${Digit}${Digit}`;
This throws an "Expression produces a union type that is too complex to represent.(2590)" error. It is the equivalent of the regex /^\d{5}$/: just 5 digits in a row. Almost all useful regexes are more complicated than that, and TypeScript already gives up at this point. Hence I'd argue the opposite of that claim is true: a small number of use cases have been addressed, and the progress with template literals has been mostly orthogonal. |
What about validation of JSON schema's Possible syntax using a import { IJSONSchema, IJSONSchemaMap } from 'vs/base/common/jsonSchema';
export const UnscopedKeyPtn: string = '^[^\\[\\]]*$';
export type UnscopedKey = string & matchof RegExp(UnscopedKeyPtn);
export const tokenColorSchema: IJSONSchema = {
properties: {},
patternProperties: { [UnscopedKeyPtn]: { type: 'object' } }
};
export interface ITokenColors {
[colorId: UnscopedKey]: string;
} |
I just want to add to the need for this because template literals do not behave the way we think explicitly - type UnionType = {
kind: `kind_${string}`,
one: boolean;
} | {
kind: `kind_${string}_again`,
two: string;
}
const union: UnionType = {
// ~~~~~ > Error here -
/**
Type '{ kind: "type1_123"; }' is not assignable to type 'UnionType'.
Property 'two' is missing in type '{ kind: "type1_123"; }' but required in type '{ kind: `type1_${string}_again`; two: string; }'.ts(2322)
*/
kind: 'type1_123',
} this shows template literals are not unique and one can be a subset of another while that is not the intention of use. Regex would let us have a |
(CC @Igmat) It occurs to me that there's a leaning towards using regex tests as type literals in #6579, i.e. type CssColor = /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: CssColor = '#000000'; // OK It seems that regexes are usually interpreted as values by the TS compiler. When used as a type, this usually throws an error that keeps types and values as distinct as possible. What do you think of:
type CssColor = matchof /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: CssColor = '#000000'; // OK
TL;DR: regex literal types aren't intuitively and visibly types without an explicit regex->type cast. Can we propose that? |
I'm not sure what the benefit of a separate keyword is here. There doesn't seem to be a case where it could be ambiguous whether the regex is used as a type or as a value, unless I'm missing something? I think #6579 (comment) and the replies below it already sketch out a syntax that hits the sweet spot of being both succinct and addressing all the use cases. Regarding the intersection, the input to |
Good to know about The ambiguity seems straightforward to me. As we know, TypeScript is a JS superset & regex values can be used as variables. To me, a regex literal is just not an intuitive type - it doesn't imply "string that matches this regexp restriction". It's common convention to camelcase regex literals and add a "Regex" suffix, but that variable name convention as a type looks really ugly: export cssColorRegex: RegExp = /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: cssColorRegex = '#000000'; // OK
// ^ lc 👎 ^ two options:
// - A. use Regex for value clarity but type confusion or
// - B. ditch Regex for unclear value name but clear type name
The original proposal does suggest JSON schemas, which would use the regex as both a type and a value (if implemented). |
Perhaps I wasn't very clear, there doesn't seem to be a case where it would be ambiguous for the compiler whether a regex is a type or a value. Just as you can use string literals both as values and as types: const foo = "literal"; // Used as a value
const bar: "literal" = foo; // Used as a type The exact same approach can be applied for regex types without ambiguity. |
My concern is that the regex means two different things in the two contexts - literal vs "returns true from RegExp.test method". The latter seems like a type system feature exclusively - it wouldn't be intuitive unless there's syntax to cast the regex into a type |
There is also the issue of regex literals and regex types possibly being used as superclasses:
If all regex literals and type variables are cast into validators implicitly without a keyword, how do we use To me, context loss in #41160 (comment) is enough reason to add a keyword, but this is another reason. I'm unsure of the name I suggested but I do prefer the use of an explicit type cast. |
Another thing to add, this isn't just helpful for validation, but also for extracting information. E.g. type Id<
TVersion extends Id.Version = Id.Version,
TPartialId extends Id.PartialId = Id.PartialId,
TContext extends Id.Context | undefined = Id.Context | undefined
> = TContext extends undefined ? `${TVersion}:${TPartialId}` : `${TVersion}:${TContext}:${TPartialId}`
namespace Id {
export type Version = /v\d+/
export namespace Version {
export type Of<TId extends Id> = TId extends Id<infer TVersion> ? TVersion : never
}
export type PartialId = /\w+/
export namespace PartialId {
export type Of<TId extends Id> = TId extends Id<any, infer TPartialId> ? TPartialId : never
}
export type Context = /\w+/
export namespace Context {
export type Of<TId extends Id> = TId extends Id<any, any, infer TContext> ? TContext : never
}
}
type MyId = Id<'v1', 'myPartialId', 'myContext'> // 'v1:myContext:myPartialId'
type MyPartialId = Id.PartialId.Of<MyId> // 'myPartialId' This can be done with just |
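For comparison, here is a sketch of the extraction half of this example using only template literal inference, with plain `string` standing in for the proposed regex types. `ParseId` and `parseId` are hypothetical names, and the `v\d+` and `\w+` constraints from the comment are not enforced here.

```typescript
// `ParseId` and `parseId` are hypothetical; plain `string` replaces the regex types.
type ParseId<S extends string> =
  S extends `${infer V}:${infer C}:${infer P}`
    ? { version: V; context: C; partialId: P }
    : S extends `${infer V}:${infer P}`
      ? { version: V; context: undefined; partialId: P }
      : never;

function parseId<S extends string>(id: S): ParseId<S> {
  // Mirror the type-level split: first segment, second segment, then the remainder.
  const [version, second, ...rest] = id.split(":");
  const parsed =
    rest.length > 0
      ? { version, context: second, partialId: rest.join(":") }
      : { version, context: undefined, partialId: second };
  // The cast is needed because the compiler cannot relate `split` to the conditional type.
  return parsed as unknown as ParseId<S>;
}

const fullId = parseId("v1:myContext:myPartialId");
const shortId = parseId("v1:myPartialId");
```

Here `fullId.partialId` has the literal type `"myPartialId"`, but nothing stops a caller from passing `"oops:::"`, which is where a regex-validated segment type would help.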
This constructs a literal string type containing only the allowed characters. If you attempt to pass invalid characters you get back type HexDigit =
| 0
| 1
| 2
| 3
| 4
| 5
| 6
| 7
| 8
| 9
| 'a'
| 'b'
| 'c'
| 'd'
| 'e'
| 'f'
// Construct a string type with all characters not in union `HexDigit` removed.
export type OnlyHexDigits<Str, Acc extends string = ''> =
Str extends `${infer D extends HexDigit}${infer Rest}`
? OnlyHexDigits<Rest, `${Acc}${D}`>
: Acc
// Return given type `Hex` IFF it was unchanged (and thus valid) by `OnlyHexDigits`.
export type HexIntLiteral<
Hex,
FilteredHex = OnlyHexDigits<Hex>
> =
Hex extends FilteredHex
? Hex
: never
// Effectively an alias of `HexIntLiteral<'123'>`.
function hexInt<Hex extends string> (n: Hex & HexIntLiteral<Hex>) {
return n as HexIntLiteral<Hex>
}
// Without the 'alias' form.
declare const t1: HexIntLiteral<'123'> // '123'
declare const t2: HexIntLiteral<'cafebabe'> // 'cafebabe'
// Using the 'alias' form.
const t3 = hexInt('zzzz') // never
const t4 = hexInt('a_b_c_d') // never
const t5 = hexInt('9287319283712ababababdefffababa12312') // <-- that
// Remember, the type is a string literal so `let` is still (as far as TypeScript
// is concerned) immutable (not _really_).
let t6 = hexInt('cafe123')
t6 = '123' // We (humans) know '123' is valid, but `t6` is a string literal `cafe123`
// so this is an error (2322): type '123' not assignable to type 'cafe123'
// because we construct a _string literal_ type.
This can likely be simplified, but I waste a lot of time code golfing TypeScript types so I abstain this time. |
My case: const obj = {
_test1: '1',
test2: '2',
_test3: '3',
test4: '4',
};
function removeKeysStartingWith_(obj: Record<string, unknown>): Record<string, unknown> {
const x: Record<string, unknown> = {};
Object.keys(obj)
.filter(key => !/^_/i.test(key))
.forEach(key => x[key] = obj[key]);
return x;
}
// {"test2":"2", "test4":"4"} I cannot express the fact that the return object of a function cannot have keys starting with "_". I cannot define the precise keyof set without a RegExp (to be used in combination with conditional types). |
@mauriziocescon template literal strings work fine for this; you don't need regexes:
const obj1 = {
_test1: '1',
test2: '2',
_test3: '3'
};
type RemoveUnderscore<K> = K extends `_${string}` ? never : K;
type NoUnderscores<T> = {
[K in keyof T as RemoveUnderscore<K>]: T[K];
}
declare function removeKeysStartingWith_<T extends object>(obj: T): NoUnderscores<T>;
const p1 = removeKeysStartingWith_(obj1);
p1.test2; // ok
p1._test1; // not ok |
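The snippet above only declares the function. A runnable version might look like the sketch below; the body, the `Record<string, unknown>` constraint and the final cast are assumptions added for illustration, since the compiler cannot verify that the runtime filter agrees with the mapped type.

```typescript
type RemoveUnderscore<K> = K extends `_${string}` ? never : K;
type NoUnderscores<T> = {
  [K in keyof T as RemoveUnderscore<K>]: T[K];
};

function removeKeysStartingWith_<T extends Record<string, unknown>>(obj: T): NoUnderscores<T> {
  const result: Record<string, unknown> = {};
  for (const key of Object.keys(obj)) {
    // Same test as the original runtime code: skip keys that start with "_".
    if (!/^_/.test(key)) {
      result[key] = obj[key];
    }
  }
  // Assumption: the runtime filter and the mapped type agree; this is unchecked.
  return result as unknown as NoUnderscores<T>;
}

const filtered = removeKeysStartingWith_({ _test1: "1", test2: "2", _test3: "3" });
const kept: string = filtered.test2;
// @ts-expect-error: `_test1` was removed from the resulting type
filtered._test1;
```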
Thanks a lot for the instantaneous feedback! I missed that part... 😅 |
@mauriziocescon Be careful, though: that type means that you definitely do not know whether any keys beginning with |
Use case
I would like to use this type:
type Word = /^\w+$/
I use this as a building block for many template strings. E.g.:
// I mainly don't want `TPartialId` to contain ':',
// as that would interfere with my ability to parse this string
type Id<
TType extends Type,
TPartialId extends Word
> = `${TType}:${TPartialId}`
Answers to some of your questions
I use this in a mix of static and dynamic use cases. E.g.
const validId: Id = 'sometype:valid'
// this should not be allowed
const invalidId: Id = 'sometype:invalid:'
declare function createId<TType extends Type, TPartialId extends Word>(
type: TType,
partialId: TPartialId
): Id<TType, TPartialId>
declare function getPartialId<TId extends Id>(
id: TId
): TId extends Id<any, infer TPartialId> ? TPartialId : Word
declare function generateWord(): Word
I absolutely want to use regular expression types in template literals (as seen in the examples above). However, while it would be nice to have, I don't need to be able to use anything within my regular expression types. (e.g. I don't really need I would appreciate the ability to do something like this:
const WORD_REGEXP = /^\w+$/
export type Word = Regex<typeof WORD_REGEXP>
export function isWord(val: unknown): val is Word {
return typeof val === 'string' && WORD_REGEXP.test(val)
} However, if I had to write the same regular expression twice, it would still be better than the current state. I don't think the above part approaches nominal typing. At a high level, regular expression is basically a structural type for a string. You can determine if a string matches the regular expression solely based on the string's contents, ignoring any metadata about the string. With that being said, I do acknowledge that it is harder to determine if a type for a string matches a regular expression, which is where things get kind of nominal. Specifically, to your point:
If you are within one project, you should create one type with whatever the "right" regex for that project is and reference that everywhere. If you are working with a library, you should use the type from that library. Either way, you shouldn't have to recreate a regular expression type in the way that you think is "right." And if you want to add additional restrictions, just use intersection. I do recognize that without subtyping, things get pretty nominal when determining if types match a regular expression. However, we currently deal with that kind of problem with deferred evaluation of type parameters in functions/classes, so semi-nominal types in certain contexts don't seem to be a deal-breaker, although I acknowledge deferred type parameters are never fun to deal with.
To be fair, the canonical regex doesn't generally matter externally at the moment. If it did matter externally, e.g. it was used in a type, they would be more likely to publish it.
Alternative: template string enhancements
I do agree that enhancements to template strings could work. In my use case, these would be sufficient:
With these, I could do something like: type WordCharacter = 'a' | 'b' | ... (preferably this is built into TypeScript)
type Word = `${WordCharacter}${Word | ''}` // === /^\w+$/
type WordOrEmpty = Word | '' // === /^\w*$/ However, these would not work if I wanted to do this through negation, which I had thought about. E.g.: type PartialId = /^[^:]+$/ If you like these enhancements, I can put them in proposals in one or more separate issues |
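As a point of comparison, the negated pattern `/^[^:]+$/` mentioned at the end of this comment can be approximated today, but only as a constraint on a generic parameter rather than as a standalone type. In the sketch below, `NoColon` is a hypothetical helper and `createId` is simplified from the comment's declaration (plain `string` replaces `Type` and `Word`).

```typescript
// `NoColon` is hypothetical: it rejects the empty string and any literal containing ':'.
type NoColon<S extends string> =
  S extends "" ? never : S extends `${string}:${string}` ? never : S;

function createId<TType extends string, TPartialId extends string>(
  type: TType,
  partialId: TPartialId & NoColon<TPartialId>
): `${TType}:${TPartialId}` {
  // The template expression is asserted to the matching template literal type.
  return `${type}:${partialId}` as `${TType}:${TPartialId}`;
}

const validId = createId("sometype", "valid");
// @ts-expect-error: the partial id contains ':'
createId("sometype", "invalid:");
```

This covers the static case only. A value produced at runtime (the `generateWord()` case) still has type `string` and cannot pass through `NoColon`, which is the gap a real `Word` type would close.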
To add a very straightforward use case to this: custom element names. Custom element names must begin with a lowercase letter, must have a dash, and are frequently defined as string literals, not dynamically. This seems like something that TypeScript should absolutely be able to handle, it's easy for people to carelessly forget that the elements have to have a dash or must be lowercased, and it's annoying to only get it at runtime. Sometimes people define custom element names dynamically, but they define them as literals often too. It would be nice if we could at least check the literals, even if we can't check the dynamic ones. On the whole, the discussion of this proposal is extremely frustrating to read. The evaluation begins with "Checking string literals is easy and straightforward". Great. So why is adding an easy and straightforward thing being held up for literal years by discussion about maybe adding much less easy and much less straightforward things? I understand the general sentiment that you want to be careful about making a simple syntax for the easy case that accidentally blocks future extension of functionality when you get to the hard cases, but that doesn't look like an issue here. Maybe capture groups would be useful, maybe dynamic strings would be useful. But adding support for string literals and regex without capture groups is easy and doesn't block adding support for dynamic strings and capture groups later. |
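For the literal-only case described here, a partial check can already be written with template literal types. The sketch below encodes just the three rules named in this comment (starts with a lowercase letter, contains a dash, no uppercase letters); it is not the full naming grammar from the HTML specification. `LowerLetter`, `CustomElementName` and `elementName` are hypothetical names.

```typescript
type LowerLetter =
  | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m"
  | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z";

// Must start with a lowercase letter, contain a dash, and equal its own lowercase form.
type CustomElementName<S extends string> =
  S extends `${LowerLetter}${string}-${string}`
    ? (S extends Lowercase<S> ? S : never)
    : never;

// Identity helper: invalid literals collapse the parameter type to `never`.
function elementName<S extends string>(name: S & CustomElementName<S>): S {
  return name;
}

const tagName = elementName("my-element");
// @ts-expect-error: missing dash
elementName("myelement");
// @ts-expect-error: contains an uppercase letter
elementName("my-Element");
```

The helper function is the awkward part: with a regex-validated type, the same rule could be attached directly to the parameter of an API such as `customElements.define`.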
Another use-case: dynamically-typed reducers for event-based programming:
(an aside: the fact that typescript can infer the reduction of the declared union for the tooltip here is pretty darn impressive, though it falls back to the flat payload union if you change some of the intermediate types to use captures rather than explicit generic parameters) |
Nearly every use-case mentioned can already be implemented via built-in string template matching & extraction (see, for example, the "wouter" routing library for how they validate and extract route parameters from paths using this method). The only problem is that the solution in all of these cases requires something like: type ParseSomething<T extends string> = T extends `...${infer Something}...` ? T : never
function validateSomething<T extends string>(input: ParseSomething<T>) {
// input has now been validated, but we required this useless function to do it.
// also very painful to create "arrays of valid somethings"
} What we really want to be able to do is throw away the We want to be able to say (for example): type HexCode = <exists S extends string> ParseHex<S>
const hexCodes: HexCode[] = ['000000', 'FFFFFF']
// etc. So, if I'm not mistaken, it seems like this issue can mostly be reduced to the introduction of existential types issue. Please upvote that one! (The length-specific use-cases would probably also do well to upvote this length-specific issue.) |
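To make the "useless function" workaround concrete, here is one way to validate a whole list of literals today. This is a sketch under assumptions: `HexChar`, `IsHex`, `ValidHex` and `hexCodes` are hypothetical names, and the variadic helper stands in for the existential type that the comment would prefer.

```typescript
type HexChar =
  | "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
  | "a" | "b" | "c" | "d" | "e" | "f"
  | "A" | "B" | "C" | "D" | "E" | "F";

// True when every character of S is a hex digit.
type IsHex<S extends string> =
  S extends "" ? true : S extends `${HexChar}${infer Rest}` ? IsHex<Rest> : false;

// Distributes over unions and keeps only the non-empty, all-hex members.
type ValidHex<S extends string> =
  S extends "" ? never : IsHex<S> extends true ? S : never;

// S is inferred as the union of all argument literals; any invalid member
// is dropped by ValidHex, so the offending argument no longer type-checks.
function hexCodes<S extends string>(...codes: (S & ValidHex<S>)[]): S[] {
  return codes;
}

const palette = hexCodes("000000", "FFFFFF");
// @ts-expect-error: "GGGGGG" is not hexadecimal
hexCodes("000000", "GGGGGG");
```

The array still has to be built through `hexCodes(...)`; a plain annotation such as `const palette: HexCode[] = [...]` is exactly what cannot be expressed without something like existential types.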
@HansBrende how would you use existential types for the below use case? type Word = /^\w+$/ |
type WordChar = 'A'|'B'|'C'| ... |'Z'|'a'|'b'|'c'| ... |'z'|'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|'_'
type IsWord<S extends string> = S extends `${WordChar}${infer R}` ? R extends '' ? unknown : IsWord<R> : never
type Word = <exists S extends string> IsWord<S> & S (Note: |
A) This doesn’t account for Unicode characters in a concise way |
True, but the original regexp doesn't either since it did not include the Unicode-aware expressions in general would obviously be a bit more complicated, but still potentially doable in many cases, including this one (but in many cases not). That's why I stated existential types would cover most of the use-cases, but not all.
Nope. Try it and see! The compiler can handle something like 10K types in a union (not sure what the exact number is), but this is only 26 + 26 + 10 + 1 = 63 types. Your Hex color type was probably defined differently, something like If both of the issues I linked to (existential types and string literal length--again, please upvote) were implemented, then you could implement type HexDigit = `${0|1|2|3|4|5|6|7|8|9}`|'a'|'b'|'c'|'d'|'e'|'f'
type IsHexString<S extends string> = S extends '' ? unknown : S extends `${HexDigit}${infer R}` ? IsHexString<R> : never
type IsHexColor<S extends string> = S extends `#${infer R}` & {length: 7 | 9} ? IsHexString<R> : never
// TA-DA!
type HexColor = <exists S extends string> IsHexColor<S> & S |
This would be extremely useful for completely ditching ORMs and just using pure typed-SQL queries. Most devs use ORMs for the type safety, but that introduces a ton of method-chaining overhead, and you end up writing SQL anyway, just with methods. |
I'm not clear on how regex is useful for SQL queries; can you clarify? |
(One of the primary motivating use cases for tagged template literals was for being able to construct a context-aware DSL, including SQL queries and regular expressions) |
I can see both sides here:
Large table
Together with the constant string length of #34692 mentioned by @HansBrende and built-in types like utility intrinsic string manipulation types ( type UuidSegmentQuartett = `{(Alpha|number[1]){4}}`
type UUID = `{UuidSegmentQuartett[2]}-{UuidSegmentQuartett}-{UuidSegmentQuartett}-{UuidSegmentQuartett}-{UuidSegmentQuartett[3]}` Or, strings become generics (strings without type default to type UuidSegmentQuartett = `{(string<Alpha>|number[1]){4}}` // or `string<AlphaNum>[4]` // or even just `string<Hex>[4]`
type UUID = `{UuidSegmentQuartett[2]}-{UuidSegmentQuartett}-{UuidSegmentQuartett}-{UuidSegmentQuartett}-{UuidSegmentQuartett[3]}` |
@shaedrich Note: with existential types and string literal length (please upvote both of these issues), all the compiler problems in your first example go away: type IsRepeated<Char extends string, S extends string> =
S extends '' ? unknown :
S extends `${Char}${infer R}` ? IsRepeated<Char, R> : never;
// Note that here, an *existential type parameter* replaces the need for a large union:
type Repeat<Char> = <exists S extends string> IsRepeated<Char, S> & S
type HexDigit = `${0|1|2|3|4|5|6|7|8|9}`|'a'|'b'|'c'|'d'|'e'|'f'
type XXXX = Repeat<HexDigit> & {length: 4}
type XXXXXXXX = Repeat<HexDigit> & {length: 8}
type XXXXXXXXXXXX = Repeat<HexDigit> & {length: 12}
// Should work fine now since large union has been replaced with existential types:
type UUID = `${XXXXXXXX}-${XXXX}-${XXXX}-${XXXX}-${XXXXXXXXXXXX}` |
I forgot to mention that my example is a little oversimplified, since
|
By far the best feature for UUID is #43335 But again, how do you even get a malformed UUID in the first place? You can only ever copy-paste them. If you miss a digit from selecting wrong, you should get a more-or-less immediate exception. Why is this happening to people so often? |
@RyanCavanaugh to answer your question for my own use-cases, the ability to correctly type a UUID would be helpful mainly to ensure that UUIDs round-trip to the server without accidentally putting some other string identifier (whether that be some human-readable identifier, "code", or stringified serial ID) in the "id" field (which aligns with the whole point of using a typed language in the first place: fail at compile time instead of runtime). Of course, that problem is easily solved by using opaque symbol tags as well, though it feels kind of hacky, especially when certain "special" uuids are hardcoded and must be cast to fit the opaque type. A more compelling use-case in my mind is hex codes of a certain length, such as RGB or RGBA codes, especially when you are doing math on them and want to avoid writing extra code for error handling of non-valid inputs (or even worse: trying to support all possible formats to avoid error-handling). |
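The "opaque symbol tag" approach mentioned above might look like the following sketch. The tag name, the `Uuid` alias, the helper functions and the `User` shape are all hypothetical, and the pattern assumes the canonical 8-4-4-4-12 hexadecimal layout without checking version or variant bits.

```typescript
// The tag exists only at the type level; `declare` emits no runtime code.
declare const uuidTag: unique symbol;
type Uuid = string & { readonly [uuidTag]: true };

// Assumption: canonical 8-4-4-4-12 layout, case-insensitive.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isUuid(value: string): value is Uuid {
  return UUID_PATTERN.test(value);
}

// Hardcoded "special" UUIDs go through this check instead of a bare cast.
function toUuid(value: string): Uuid {
  if (!isUuid(value)) {
    throw new TypeError(`Not a UUID: ${value}`);
  }
  return value;
}

interface User {
  id: Uuid;
  code: string;
}

const user: User = { id: toUuid("123e4567-e89b-12d3-a456-426614174000"), code: "user-42" };
// @ts-expect-error: a plain string cannot be used where a Uuid is expected
const mixedUp: User = { id: "user-42", code: "user-42" };
```

This catches the "wrong identifier in the id field" mistake at compile time, but the validity of a hardcoded literal is still only checked when `toUuid` runs.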
This is a pickup of #6579. With the addition of #40336, a large number of those use cases have been addressed, but possibly some still remain.
Update 2023-04-11: Reviewed use cases and posted a write-up of our current evaluation
Search Terms
regex string types
Suggestion
Open question: For people who had upvoted #6579, what use cases still need addressing?
Note: Please keep discussion on-topic; moderation will be a bit heavier to avoid off-topic tangents
Examples
(please help)
Checklist
My suggestion meets these guidelines: