Regression: PR #17786 breaks model loading on Windows/MSVC for models with complex tokenizer regex

### Name and Version

version: 7310 (db9783738)
built with MSVC 19.50.35719.0 for x64

## Description

PR #17786 introduced `std::regex_constants::nosubs | std::regex_constants::optimize` flags to prevent segfaults on highly repetitive input. This fix works on Linux/GCC but breaks Windows/MSVC builds when loading models with complex tokenizer patterns (e.g., `gpt-4o`).

## Error
```
Failed to process regex: [^\r\n\p{L}\p{N}]?((?=[\p{L}])([^a-z]))*((?=[\p{L}])([^A-Z]))+(?:'[sS]|...
Regex error: regex_error(error_stack): There was insufficient memory to determine whether the regular expression could match the specified character sequence.
llama_model_load: error loading model: error loading model vocabulary: Failed to process regex
```

## Cause

MSVC's `std::regex` implementation has severe stack limitations with complex patterns containing nested lookaheads. The `nosubs | optimize` flags appear to trigger a code path that exhausts the stack during pattern compilation.

## Suggested Fix

Use platform-specific flags:
```cpp
#ifdef _MSC_VER
    // MSVC's std::regex has stack limitations with complex patterns
    constexpr auto regex_flags = std::regex_constants::ECMAScript;
#else
    // Prevents catastrophic backtracking on repetitive input
    constexpr auto regex_flags = std::regex_constants::nosubs | std::regex_constants::optimize;
#endif
```

This preserves the fix for Linux while avoiding the regression on Windows. Note that `_MSC_VER` is also defined for clang-cl, which uses the same broken MSVC STL implementation.

### First Bad Commit

1be97831e44a6335aca9c3f4f3edbb0e35bea98f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Regression: PR #17786 breaks model loading on Windows/MSVC for models with complex tokenizer regex #17830

Name and Version

Description

Error

Cause

Suggested Fix

First Bad Commit

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Regression: PR #17786 breaks model loading on Windows/MSVC for models with complex tokenizer regex #17830

Description

Name and Version

Description

Error

Cause

Suggested Fix

First Bad Commit

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions